# Problem

For positive integers $a$ and $n$, $a \mod n$ is the remainder when $a$ is divided by $n$. For
example, $29 \mod 11 = 7$ because $29=11 \times 2 + 7$.
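In Python, the `%` operator computes this remainder for positive operands, and the built-in `divmod` returns the quotient and remainder together. A minimal illustration of the example above:

```python
# divmod returns (quotient, remainder), so 29 = 11 * 2 + 7
quotient, remainder = divmod(29, 11)

print(quotient, remainder)  # 2 7
print(29 % 11)              # 7
```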

Modular arithmetic is the study of addition, subtraction, multiplication, and division with respect
to the modulo operation. We say that $a$ and $b$ are congruent modulo $n$ if $a \mod n = b \mod n$;
in this case, we use the notation $a \equiv b \mod n$.

Two useful facts in modular arithmetic are that if $a \equiv b \mod n$ and $c \equiv d \mod n$, then
$a+c \equiv (b+d) \mod n$ and $a \times c \equiv (b \times d) \mod n$. To check your understanding of these
rules, you may wish to verify these relationships for $a=29$, $b=73$, $c=10$, $d=32$, and $n=11$.
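A quick sketch of that check in Python, using exactly the values suggested in the text:

```python
a, b, c, d, n = 29, 73, 10, 32, 11

# The premises: a ≡ b (mod n) and c ≡ d (mod n)
assert a % n == b % n  # both are 7
assert c % n == d % n  # both are 10

# Sums and products of congruent numbers are congruent
print((a + c) % n, (b + d) % n)  # 6 6
print((a * c) % n, (b * d) % n)  # 4 4
```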

As you will see in this exercise, some Rosalind problems will ask for a (very large) integer
solution modulo a smaller number to avoid the computational pitfalls that arise with storing such
large numbers.
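The multiplication rule above is what makes this practical: a product can be reduced modulo $n$ after every step without changing the final remainder. Python integers never overflow, but they do keep growing, so reducing early keeps every intermediate value small. A minimal sketch, with arbitrary illustrative factors:

```python
MOD = 1_000_000
factors = [7] * 50  # arbitrary factors, chosen only for illustration

# Multiply first, reduce once: the intermediate integer keeps growing
full = 1
for f in factors:
    full *= f

# Reduce after every step: same remainder, but each intermediate product
# is at most (MOD - 1) * max(factors) before it is reduced again
running = 1
for f in factors:
    running = running * f % MOD

print(full.bit_length(), full % MOD, running)
```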

**Given**: A protein string of length at most 1000 aa.

**Return**: The total number of different RNA strings from which the protein
could have been translated, modulo 1,000,000. (Don't neglect the importance of the stop codon in
protein translation.)

### Sample Dataset

```
MA
```

### Sample Output

```
12
```
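Written as a formula (the notation $c(x)$ is introduced here for illustration and means the number of codons encoding $x$), for a protein $p_1 p_2 \cdots p_k$ each residue's codon is chosen independently, and the mRNA must end in one of the stop codons:

$$
N = c(\text{Stop}) \times \prod_{i=1}^{k} c(p_i), \qquad \text{answer} = N \mod 10^6.
$$

For the sample dataset, $c(\text{M}) = 1$, $c(\text{A}) = 4$ and $c(\text{Stop}) = 3$, so $N = 3 \times 1 \times 4 = 12$.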

```python
# Standard RNA codon table: codon -> one-letter amino acid code, or "Stop"
CODONS = {
    "UUU": "F", "CUU": "L", "AUU": "I", "GUU": "V",
    "UUC": "F", "CUC": "L", "AUC": "I", "GUC": "V",
    "UUA": "L", "CUA": "L", "AUA": "I", "GUA": "V",
    "UUG": "L", "CUG": "L", "AUG": "M", "GUG": "V",
    "UCU": "S", "CCU": "P", "ACU": "T", "GCU": "A",
    "UCC": "S", "CCC": "P", "ACC": "T", "GCC": "A",
    "UCA": "S", "CCA": "P", "ACA": "T", "GCA": "A",
    "UCG": "S", "CCG": "P", "ACG": "T", "GCG": "A",
    "UAU": "Y", "CAU": "H", "AAU": "N", "GAU": "D",
    "UAC": "Y", "CAC": "H", "AAC": "N", "GAC": "D",
    "UAA": "Stop", "CAA": "Q", "AAA": "K", "GAA": "E",
    "UAG": "Stop", "CAG": "Q", "AAG": "K", "GAG": "E",
    "UGU": "C", "CGU": "R", "AGU": "S", "GGU": "G",
    "UGC": "C", "CGC": "R", "AGC": "S", "GGC": "G",
    "UGA": "Stop", "CGA": "R", "AGA": "R", "GGA": "G",
    "UGG": "W", "CGG": "R", "AGG": "R", "GGG": "G"
}
```

```python
# Count how many codons encode each amino acid (and how many are stop codons)
aminoacids = set(CODONS.values())
freq = {aa: 0 for aa in aminoacids}
for aa in CODONS.values():
    freq[aa] += 1
```
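An alternative sketch, assuming the common compact encoding of the standard genetic code as a 64-letter string (the name `AA_BY_CODON` is ours, not from the original): counting its letters with `collections.Counter` yields the same codon multiplicities without building the codon dictionary first.

```python
from collections import Counter

# Standard genetic code, one letter per codon, codons ordered UUU, UUC, UUA, UUG, UCU, ..., GGG
# (first base varies slowest, bases in U, C, A, G order); "*" marks a stop codon
AA_BY_CODON = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Counting letters directly gives the number of codons per amino acid
codon_counts = Counter("Stop" if aa == "*" else aa for aa in AA_BY_CODON)

print(codon_counts["L"], codon_counts["M"], codon_counts["A"], codon_counts["Stop"])  # 6 1 4 3
print(sum(codon_counts.values()))  # 64
```

One design difference: a `Counter` returns 0 for an unknown key instead of raising `KeyError`, so an invalid residue would silently produce a count of 0 rather than an error.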

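```python
def naive(peptide, aa_freq):
    """Count the RNA strings that translate to `peptide`, using exact (unbounded) integers."""
    possible_rna = 1
    for aminoacid in peptide:
        # Each residue can come from any of its codons, independently of the others
        possible_rna *= aa_freq[aminoacid]
    # The mRNA must also end in one of the stop codons
    possible_rna *= aa_freq["Stop"]

    return possible_rna
```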

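```python
MOD = 1_000_000  # 10**6, kept as an integer so all arithmetic stays exact (10e5 would be a float)

def smarter(peptide, aa_freq, mod=MOD):
    """Same count as `naive`, but reduced modulo `mod` after every multiplication."""
    possible_rna = 1
    for aminoacid in peptide:
        # Reducing at each step keeps the running value below mod * 6
        possible_rna = possible_rna * aa_freq[aminoacid] % mod
    possible_rna = possible_rna * aa_freq["Stop"] % mod

    return possible_rna
```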
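The same step-by-step reduction can also be written as a fold with `functools.reduce`. This is a minimal sketch rather than the notebook's method; `count_rna_mod` and `demo_freq` are hypothetical names, and `demo_freq` is only a small subset of the codon-frequency table, enough for the examples shown.

```python
from functools import reduce

MOD = 1_000_000

def count_rna_mod(peptide: str, aa_freq: dict[str, int], mod: int = MOD) -> int:
    """Number of RNA strings translating to `peptide` (stop codon included), modulo `mod`."""
    # Start from the stop-codon count, then fold in one residue at a time,
    # reducing after every multiplication so the running value stays below `mod`
    return reduce(lambda acc, aa: acc * aa_freq[aa] % mod, peptide, aa_freq["Stop"] % mod)

# Hypothetical subset of the codon-frequency table, for illustration only
demo_freq = {"M": 1, "A": 4, "L": 6, "Stop": 3}

print(count_rna_mod("MA", demo_freq))      # 12
print(count_rna_mod("L" * 35, demo_freq))  # 974528
```

Starting the fold from the stop-codon count is valid because multiplication is commutative, so the order of the factors does not affect the result.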

```python
naive("MA", freq)
```

```
12
```

```python
prot = "L" * 35
naive(prot, freq), smarter(prot, freq)
```

```
(5157212399245267773085974528, 974528)
```
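For a repeated residue like this one, Python's built-in three-argument `pow` gives an independent check: `pow(base, exp, mod)` performs modular exponentiation and never materializes $6^{35}$.

```python
MOD = 1_000_000

# "L" * 35: each leucine has 6 codons, and the stop codon has 3 choices
exact = 6**35 * 3 % MOD
modular = pow(6, 35, MOD) * 3 % MOD  # three-argument pow reduces as it goes

print(exact, modular)  # 974528 974528
```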

```python
run = "MWSCQNCFFCFKSFNKRGDRHHTNQQMHQDLHKYWRGTMMCSSWRNDITARKDESKFVCRRHSPENITHNRLTTACWWPSTGKEMEPRLDKQEALQSNADWFDECFDKSRLWQRNMHYHATMQQRGGWFLVDLCYQTYYNNEMWSSEGQNQAYEHFMNMNIVRTLCMQCNDQDVIWDRWDKSHDCVGQAAEWDYVCQVEETWPVSVHNDVSQWYDVPEPWQCAKVNKQFPFMWCWKYVMAPADHKDQNYRPTHWDQRHKIDMDEYIVMFAMGYYAHAHLSTTFAWNHLEIQCKDWEMKDVAGRDVPMCAHGTIFCVTMFHEWWQWKRFMIDAWMVTNSRVPACMQLPYFCHDAHFNHQYQITPHIQFNGGASYCDHMENYGGYGPDMGMDEQFSNFITTTYYWDVMANRSYVEMMRSVSGCHWISPQPNVTGKMKSVTRRPPCAYRTMVVWFFPLIQDMMEAGYAHEALQNDNSDKCDCLPTHGIYKEIWIYHPLWWHGFDRADSYYCNAHHMNQNVEPGRRKLRPNHAGAYCGAMAVGISQGHQGVPISNHSTHVFNCPDFCSVVGFGHFKVHFIPPKTEGTKFLGELWCGHLPGYAKGNIANHRNEHDLKAKALHDSFSGDNHDNSSGDMRVCLKSHMMIRSIFYKDKQCIWQSSDAHMQRPMVIHLQSYFNTKTWILIPAKLLNTEDIVNLCHNKVKMAISAYACHKNGPDQVTPENMCVIIERMRFMGVVVFFCWFSSHKAKEVIGMISEYQKQWTVFLMTMMCKYARKTRRNLADVYDLQFNVAYFSWWYRPDYAVSWNEDKNWTSYKAMNVQKRYERWGNHWPIKCHDCELDTCRLCILCVDTLDTEYECIKGRNPHKLPMWCPMFRNYRIMVIKKLMNTRQSHMLMQEAVSYVGSFNDIFLVFGWGAHRPHQWFYTHTYMIVKHKRGPWQPIHNPFDYFFLEWCMVQWEDHKGTSTIHITYDYEGQEMVCGAASNSPVTEALTPWEMKCSFQV"
```

```python
smarter(run, freq)
```

```
722176
```

```python
naive(run, freq) % 1000000
```

```
722176
```