## 10. MRNA - Inferring mRNA from Protein 
  
Protein translation follows the standard genetic code, in which each group of 3 bases (1 codon) in mRNA specifies one amino acid (aa). Each codon is unambiguous: it maps to exactly one aa, so no two amino acids share a codon. The genetic code also covers all 4^3 = 64 possible triplets of bases: 61 codons specify amino acids and 3 are stop codons. Since there are only 20 common structural amino acids, almost all of them (every one except M and W) have more than one codon. Therefore, the same peptide sequence can result from many different mRNA sequences.  
The plan is to treat each amino acid as an independent event whose number of outcomes equals its number of possible codons, and to take the product of these counts over the whole peptide.
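
As a small worked example of this product rule, consider the peptide `MA`: M has 1 codon, A has 4, and any of the 3 stop codons can end the mRNA, so there should be 1 × 4 × 3 = 12 candidates. The sketch below checks the formula against a brute-force enumeration; `toy_code` is a hypothetical cut-down codon table used only for this illustration.

```python
from itertools import product
from math import prod

# Hypothetical cut-down codon table, just enough for the peptide "MA".
toy_code = {
    "M": ["AUG"],
    "A": ["GCU", "GCC", "GCA", "GCG"],
    "STOP": ["UGA", "UAA", "UAG"],
}

peptide = "MA"
# One list of codon choices per residue, plus the closing stop codon.
events = [toy_code[aa] for aa in peptide] + [toy_code["STOP"]]

# Product rule: multiply the number of choices at each position.
by_formula = prod(len(choices) for choices in events)

# Brute force: build every mRNA explicitly and count them.
all_mrnas = ["".join(codons) for codons in product(*events)]

print(by_formula, len(all_mrnas))  # 12 12
```

Enumeration is only feasible for very short peptides, which is why the counting approach is used for real input.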

In [19]:
from math import prod
path = "datasets/rosalind_mrna.txt"
code = {"A": ["GCU","GCC", "GCA","GCG"], "V": ["GUU","GUA","GUC","GUG"], "I": ["AUU", "AUC", "AUA"],"M":["AUG"], "L": ["CUU","CUC", "CUG", "CUA", "UUA","UUG"],"F": ["UUU", "UUC"], "S": ["UCU","UCC","UCA","UCG","AGU","AGC"], "P": ["CCU", "CCC","CCA","CCG"],"T": ["ACU","ACA","ACC","ACG"], "E": ["GAA","GAG"], "D":["GAU", "GAC"], "N":["AAU","AAC"],"K":["AAA","AAG"], "Q":["CAA", "CAG"], "H":["CAU","CAC"], "Y": ["UAU","UAC"], "C": ["UGU","UGC"], "W":["UGG"],"R":["CGA","CGC","CGU","CGG","AGA","AGG"], "G":["GGU","GGC","GGA","GGG"], "STOP":["UGA","UAA","UAG"]} ##genetic code
aa_all = "ACDEFGHIKLMNPQRSTVWY" #all protein-building amino acids

def mrna(path):
    codons_per_aa = {}         # number of distinct codons for each aa
    possible_codons = []       # codon counts for the residues of the peptide, in the order they are read
    for aa in sorted(code):
        codons_per_aa[aa] = len(code[aa])

    with open(path) as file:
        for peptide in file:
            for position, amino in enumerate(peptide.rstrip(), start=1):
                if amino in codons_per_aa:          # record how many codons can encode this aa
                    possible_codons.append(codons_per_aa[amino])
                else:
                    return " ".join(["Error, a.a. not identified at position:", str(position), amino])
    
    total_possible_mrna = prod(possible_codons)   # codon choices are independent across residues, so the number of mRNAs that yield the peptide is the product of the per-residue codon counts
    return (total_possible_mrna * 3) % 1000000    # any of the 3 stop codons can end the mRNA, so multiply by 3 first and only then reduce modulo 1,000,000
 
print(mrna(path))
# Since the result is taken modulo 1E+6, the answer is at most its last 6 digits

122368
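
Because only the remainder is needed, the full product does not have to be built. A minimal sketch of the same count with the modulus applied after every multiplication is given below; `count_mrna_mod`, `CODON_COUNTS`, `STOP_CODONS` and `MODULUS` are hypothetical names that are not part of the cell above, and the counts are hard-coded from the standard codon table.

```python
# Number of codons per amino acid in the standard genetic code (61 in total).
CODON_COUNTS = {
    "A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3,
    "K": 2, "L": 6, "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6,
    "T": 4, "V": 4, "W": 1, "Y": 2,
}
STOP_CODONS = 3
MODULUS = 1_000_000


def count_mrna_mod(peptide, modulus=MODULUS):
    """Count mRNAs encoding `peptide`, reduced modulo `modulus` at each step."""
    total = STOP_CODONS % modulus          # start with the choice of stop codon
    for amino in peptide:
        # A KeyError here means the peptide contains an unknown symbol.
        total = (total * CODON_COUNTS[amino]) % modulus
    return total


print(count_mrna_mod("MA"))  # 12
```

Reducing at every step keeps the intermediate values below the modulus and gives the same remainder as reducing once at the end, since (a · b) mod m = ((a mod m) · b) mod m.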
