# Translation 
**Translation** is the process of reading an mRNA and building a protein from it. The cell uses tRNAs to "read" the mRNA one codon at a time, and the ribosome adds the amino acid carried by each tRNA to the growing peptide chain.  

It's been a while since we coded, so let's refresh the Core4:  
**Variables** are stored values that can change.  
**Conditionals** are if/else statements we use to create alternate code "paths."  
**Loops** are for/while statements that let us repeat a code block a set number of times or until a condition is met.  
**Functions** are groups of commands that perform (if written well) one discrete task; they're essential for keeping complex code readable and for any action you need to repeat more than twice.  
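
As a quick refresher, here is a minimal sketch that uses all four pieces at once. The `gc_content` function and the short test sequences are made up for illustration and are not part of the exercise.

```python
def gc_content(dna):               # function: one discrete task
    """Return the fraction of bases in `dna` that are G or C."""
    gc_count = 0                   # variable: a stored value that changes
    for base in dna:               # loop: repeat once per base
        if base in "GC":           # conditional: only count G and C
            gc_count += 1
    return gc_count / len(dna)

print(gc_content("ATGC"))  # 0.5
```

Note that this sketch assumes a non-empty sequence; an empty string would cause a division by zero.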

Let's write a script to translate your mRNA into protein. Using the provided codon dictionary and transcription function, follow the pseudocode below to transcribe the DNA and then translate the resulting mRNA. In the codon dictionary, stop codons are represented by an asterisk (`*`). 

Questions to consider:
1) What do you want the input(s) and the output of your function to be?  
2) How will you approach indexing codons (3 bases)?  
3) Why do we convert between strings and lists?  
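
For questions 2 and 3, one possible approach (a sketch, not the required solution) is to step through the string three bases at a time with `range` and slicing, collect results in a list because strings cannot be modified in place, and join the list back into a string at the end. The sequence `rna` and the two-letter peptide below are hypothetical examples.

```python
rna = "AUGUUUUAA"  # hypothetical short mRNA, 9 bases = 3 codons

# range(start, stop, step) with step 3 gives the first index of each codon
codon_list = [rna[i:i + 3] for i in range(0, len(rna), 3)]
print(codon_list)  # ['AUG', 'UUU', 'UAA']

# Lists can grow with append(); strings are immutable, so we build a list
# first and convert it to a string once at the end.
amino_acids = []
amino_acids.append("M")
amino_acids.append("F")
peptide = "".join(amino_acids)
print(peptide)  # MF
```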

In [4]:
# Given dictionaries and functions
codons = {"UUU" : "F","CUU" : "L","AUU" : "I","GUU" : "V","UUC" : "F","CUC" : "L","AUC" : "I","GUC" : "V","UUA" : "L",
          "CUA" : "L","AUA" : "I","GUA" : "V","UUG" : "L","CUG" : "L","AUG" : "M","GUG" : "V","UCU" : "S","CCU" : "P",
          "ACU" : "T","GCU" : "A","UCC" : "S","CCC" : "P","ACC" : "T","GCC" : "A","UCA" : "S","CCA" : "P","ACA" : "T",
          "GCA" : "A","UCG" : "S","CCG" : "P","ACG" : "T","GCG" : "A","UAU" : "Y","CAU" : "H","AAU" : "N","GAU" : "D",
          "UAC" : "Y","CAC" : "H","AAC" : "N","GAC" : "D","UAA" : "*","CAA" : "Q","AAA" : "K","GAA" : "E","UAG" : "*",
          "CAG" : "Q","AAG" : "K","GAG" : "E","UGU" : "C","CGU" : "R","AGU" : "S","GGU" : "G","UGC" : "C","CGC" : "R",
          "AGC" : "S","GGC" : "G","UGA" : "*","CGA" : "R","AGA" : "R","GGA" : "G","UGG" : "W","CGG" : "R","AGG" : "R","GGG" : "G"}

def transcribe(DNA_str):
    # input: DNA string containing only uppercase A, T, G, C
    # output: RNA string with every T replaced by U
    rna_dict = {"A": "A", "T": "U", "G": "G", "C": "C"}
    rna_list = [rna_dict[base] for base in DNA_str]  # list comprehension: look up each base in the dictionary
    return "".join(rna_list)  # join the list items into a single string


## Here's some pseudocode to get you started. 

    set value of your DNA string  
    transcribe the DNA to RNA  
 
    translation function 
       set a list to hold the amino acids
       for each codon  
          get amino acid value
          append to the list
       join amino acid list  
       (optional: split the joined string at stop codons and take the first piece)  
     
    pass RNA to translation function
    print peptide

*You can also translate until you reach a stop codon and print the result. Why might you run into errors with that approach? Could you fix this with indexing?*  
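
One way to think about the question above: if the sequence length is not a multiple of 3, or the loop runs past the last full codon while looking for a stop, the final slice is shorter than 3 bases and the dictionary lookup raises a `KeyError`. The sketch below shows one possible fix using the loop bounds. The names `codon_table` and `translate_until_stop` are hypothetical; `codon_table` is a compact rebuild of the standard codon table so the block runs on its own, and in the notebook you would use the provided `codons` dictionary instead.

```python
# Stand-in for the notebook's `codons` dictionary (same standard table),
# built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
all_codons = [a + b + c for a in BASES for b in BASES for c in BASES]
codon_table = dict(zip(all_codons, AMINO_ACIDS))

def translate_until_stop(rna):
    """Translate from the first base and stop at the first stop codon."""
    peptide = []
    # Stopping the range at len(rna) - 2 guarantees every slice is a
    # full 3-base codon, so leftover bases at the end are ignored.
    for i in range(0, len(rna) - 2, 3):
        amino_acid = codon_table[rna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)

print(translate_until_stop("AUGUUUUAAGGG"))  # MF  (stops at UAA)
print(translate_until_stop("AUGUUUG"))       # MF  (trailing G is ignored)
```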

*This algorithm assumes that you have a start codon to begin with. In more complex analyses, you'll want to check for a start codon and be able to find proteins in any frame.*  


## Example DNA sequence  
ATGTATCCCAGAACTGAGATCAATAGTACCCGTATTAACGGGTGATAGAACACTACTCACATCTCATAG

**Goal:** Determine the peptide string that would result from the above "gene."

In [10]:
# Your code here

# Bonus: Can you find the longest [ORF](http://rosalind.info/problems/orfr/)?

ORFs (open reading frames) are sequences between a start codon and a stop codon. These are the parts of the mRNA that code for protein. In the example above, the start codon was the very first codon, but that isn't what we see in nature: an mRNA contains untranslated regions (UTRs) before and after the coding sequence, and in the genome we don't even know which strand a gene is located on!

Can you modify your above code to find the protein sequence of the longest ORF?

Things to consider:  
1) Which codon is a [start codon](https://biologywise.com/start-codon)?    
2) How can you test every start codon in the sequence, in any frame? (*Hint: It may be easiest to modify your index via an additional input variable in your function.*)  
3) How will you generate the opposite strand of DNA to test?    
4) How will you get the longest string from your list?   
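
The four considerations above can be combined in several ways; the following is one hedged sketch rather than a reference solution. It assumes `AUG` is the only start codon, discards candidates that never reach a stop codon, and uses hypothetical names (`codon_table`, `reverse_complement`, `translate_from`, `longest_orf`). `codon_table` is again a compact stand-in for the provided `codons` dictionary.

```python
# Stand-in for the notebook's `codons` dictionary (standard codon table).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
all_codons = [a + b + c for a in BASES for b in BASES for c in BASES]
codon_table = dict(zip(all_codons, AMINO_ACIDS))

def reverse_complement(dna):
    """Return the opposite DNA strand, read 5' to 3'."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(dna))

def translate_from(rna, start):
    """Translate from index `start`; return None if no stop codon is reached."""
    peptide = []
    for i in range(start, len(rna) - 2, 3):
        amino_acid = codon_table[rna[i:i + 3]]
        if amino_acid == "*":
            return "".join(peptide)
        peptide.append(amino_acid)
    return None  # ran off the end without a stop codon: not a complete ORF

def longest_orf(dna):
    """Return the longest peptide found on either strand, or "" if none."""
    candidates = []
    for strand in (dna, reverse_complement(dna)):
        rna = strand.replace("T", "U")
        # Checking every index covers all three reading frames.
        for start in range(len(rna) - 2):
            if rna[start:start + 3] == "AUG":
                peptide = translate_from(rna, start)
                if peptide is not None:
                    candidates.append(peptide)
    return max(candidates, key=len, default="")

print(longest_orf("ATGAAATAG"))  # MK
```

Using `max` with `key=len` picks the longest string directly; if two candidates tie, it returns the first one found.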

## Example DNA sequence    
AGCCATGTAGCTAACTCAGGTTACATGGGGATGACCCCGCGACTTGGATTAGAGTCTCTTTTGGAATAAGCCTGAATGATCCGAGTAGCATCTCAG  

## Expected Peptide sequence  
MLLGSFRLIPKETLIQVAGSSPCNLS  


**Goal:** Find the longest ORF in the given DNA sequence. 

In [None]:
# Here's a function to reverse complement the DNA
# (You did this last week, so you can also use that one.)
def RC(DNA_str):
    rc_dict = {"A":"T","T":"A","C":"G","G":"C"}
    rc_list = [rc_dict[letter] for letter in DNA_str[::-1]] #generate a list of complemented letters from the reversed string
    return "".join(rc_list) #returns the reverse-complemented DNA string

In [None]:
# Your code here