# Converting `codons` to `amino acid sequences`

Implement an application that converts the coding region of a gene into an amino acid sequence. To start, use the genetic code (the table is available on Moodle). Translation must begin at an ATG (start) codon; from there the sequence is read in consecutive, non-overlapping codons of 3 letters until a stop codon is encountered. Note that there are 3 stop codons (TAA, TAG and TGA; see the table).
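A minimal sketch of the reading-frame rule described above, assuming the input is already a plain DNA string. `codons_from_start` and `STOP_CODONS` are hypothetical names used only for illustration: the function finds the first ATG and slices the sequence into consecutive triplets, stopping before the first stop codon.

```python
# The three stop codons of the standard genetic code
STOP_CODONS = {"TAA", "TAG", "TGA"}

def codons_from_start(dna):
    """Return the codons of the reading frame that begins at the first ATG,
    stopping before the first stop codon (or at the end of the sequence)."""
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return []  # no start codon, so no reading frame
    codons = []
    # Step 3 bases at a time; len(dna) - 2 keeps only complete triplets
    for i in range(start, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons

print(codons_from_start("ccATGGCCATTTAGGG"))  # ['ATG', 'GCC', 'ATT']
```

The two leading bases (`cc`) fall outside the reading frame, and `TAG` ends it.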

The input of this application will be a **DNA sequence**.
The output of this application will be an **amino acid sequence**, with each amino acid represented by its single-letter code.
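Since the input is a DNA sequence typed or pasted by a user, it may help to normalise and validate it first. This is a sketch under the assumption that only the four bases A, C, G and T are acceptable; `normalize_dna` is a hypothetical helper, not part of the original exercise.

```python
VALID_BASES = set("ACGT")

def normalize_dna(raw):
    """Return an upper-case DNA string with all whitespace removed.

    Raises ValueError if any character other than A, C, G or T remains.
    """
    # Drop spaces and line breaks that often appear in pasted sequences
    dna = "".join(raw.split()).upper()
    invalid = sorted(set(dna) - VALID_BASES)
    if invalid:
        raise ValueError(f"Invalid characters in DNA sequence: {invalid}")
    return dna

print(normalize_dna("atg gcc\nATT"))  # ATGGCCATT
```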

Look up an `amino acid table` online that lists the single-letter code for each amino acid.
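For reference, the standard single-letter codes of the 20 amino acids can be paired with their three-letter abbreviations in a dictionary. The names `ONE_TO_THREE` and `to_three_letter` are illustrative only; the sketch expands a single-letter protein string into the more readable three-letter form.

```python
# Standard single-letter -> three-letter codes for the 20 amino acids
ONE_TO_THREE = {
    'A': 'Ala', 'R': 'Arg', 'N': 'Asn', 'D': 'Asp', 'C': 'Cys',
    'Q': 'Gln', 'E': 'Glu', 'G': 'Gly', 'H': 'His', 'I': 'Ile',
    'L': 'Leu', 'K': 'Lys', 'M': 'Met', 'F': 'Phe', 'P': 'Pro',
    'S': 'Ser', 'T': 'Thr', 'W': 'Trp', 'Y': 'Tyr', 'V': 'Val',
}

def to_three_letter(protein):
    """Expand a single-letter protein string, e.g. 'MA' -> 'Met-Ala'."""
    return "-".join(ONE_TO_THREE[aa] for aa in protein)

print(to_three_letter("MAIVMGR"))  # Met-Ala-Ile-Val-Met-Gly-Arg
```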

### Codon-to-amino-acid mapping based on the genetic code table

```python
# Standard genetic code: each DNA codon maps to a single-letter amino acid;
# '_' marks the three stop codons (TAA, TAG, TGA)
codon_to_amino_acid = {
    'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
    'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
    'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
    'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
    'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
    'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
    'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
    'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
    'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
    'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
    'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
    'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
    'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
    'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
    'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',
    'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W',
}
```
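Typing 64 entries by hand is error-prone. As a cross-check, the same standard genetic code can be generated from the bases in TCAG order and a 64-character string of amino acids, which is how the standard code is commonly tabulated. This is a sketch; `codon_table` and `AMINO_ACIDS` are illustrative names, and `'_'` is used for stop codons to match the dictionary above.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order (first, second, third base);
# '*' marks a stop codon in this string
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")

codon_table = {
    "".join(codon): ("_" if aa == "*" else aa)  # '_' = stop codon
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stops = sorted(c for c, aa in codon_table.items() if aa == "_")
print(len(codon_table), stops)  # 64 ['TAA', 'TAG', 'TGA']
```

Comparing this generated table with the hand-typed one (for example with `==`) quickly reveals any typo.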

### Translating a DNA sequence into a protein sequence

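```python
def translate_dna_to_protein(dna_sequence):
    """Translate the coding region of a DNA sequence into single-letter amino acids.

    Translation starts at the first ATG and proceeds codon by codon until a
    stop codon (TAA, TAG or TGA) or the end of the sequence is reached.
    """
    # Normalise the input: drop whitespace/line breaks and use upper-case bases
    dna_sequence = "".join(dna_sequence.split()).upper()

    # The reading frame is anchored at the first start codon
    start = dna_sequence.find("ATG")
    if start == -1:
        # Signal the error instead of returning text that could pass for a protein
        raise ValueError("No start codon (ATG) found in the sequence.")

    # Translate consecutive complete triplets (a trailing partial codon is ignored)
    protein = []
    for i in range(start, len(dna_sequence) - 2, 3):
        codon = dna_sequence[i:i + 3]
        # Codons not in the table (e.g. containing N) are marked with 'X'
        amino_acid = codon_to_amino_acid.get(codon, 'X')
        if amino_acid == '_':  # stop codon: end of the coding region
            break
        protein.append(amino_acid)

    return "".join(protein)
```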
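One ambiguity in the translation loop is that its result looks the same whether a stop codon ended the reading frame or the sequence simply ran out. A possible variant, sketched here with the hypothetical name `translate_orf`, also returns a flag saying which case occurred. It builds the standard genetic code compactly (TCAG order, `'*'` for stop) so that the block runs on its own.

```python
from itertools import product

# Standard genetic code in TCAG order; '*' marks stop codons
_AA = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
       "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AA)}

def translate_orf(dna):
    """Translate from the first ATG and report whether a stop codon ended it.

    Returns (protein, stopped): stopped is False when the sequence ran out
    before any in-frame stop codon, i.e. the coding region may be incomplete.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        raise ValueError("No start codon (ATG) found in the sequence.")
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # 'X' for unrecognised codons
        if aa == "*":
            return "".join(protein), True
        protein.append(aa)
    return "".join(protein), False

print(translate_orf("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))  # ('MAIVMGR', True)
print(translate_orf("ATGGCCATT"))  # ('MAI', False)
```

A `False` flag is a hint that the input may not contain the whole coding region.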

```python
dna_sequence = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
protein = translate_dna_to_protein(dna_sequence)
print("Aminoacid seq.:", protein)
```

```
Aminoacid seq.: MAIVMGR
```
