## Problem #7

Probability is the mathematical study of randomly occurring phenomena. We will model such a phenomenon with a random variable, which is simply a variable that can take a number of different distinct outcomes depending on the result of an underlying random process.

For example, say that we have a bag containing 3 red balls and 2 blue balls. If we let X represent the random variable corresponding to the color of a drawn ball, then the probability of each of the two outcomes is given by Pr(X=red)=35 and Pr(X=blue)=25.

Random variables can be combined to yield new random variables. Returning to the ball example, let Y model the color of a second ball drawn from the bag (without replacing the first ball). The probability of Y being red depends on whether the first ball was red or blue. To represent all outcomes of X and Y, we therefore use a probability tree diagram. This branching diagram represents all possible individual probabilities for X and Y
Y, with outcomes at the endpoints ("leaves") of the tree. The probability of any outcome is given by the product of probabilities along the path from the beginning of the tree; see Figure 2 for an illustrative example.

An event is simply a collection of outcomes. Because outcomes are distinct, the probability of an event can be written as the sum of the probabilities of its constituent outcomes. For our colored ball example, let A be the event "Y is blue." Pr(A) is equal to the sum of the probabilities of two different outcomes: Pr(X=blue and Y=blue)+Pr(X=red and Y=blue), or 310+110=25.
    
    Given: Three positive integers k, m, and n, representing a population containing k+m+n organisms: k individuals are homozygous dominant for a factor, m are heterozygous, and n are homozygous recessive.

    Return: The probability that two randomly selected mating organisms will produce an individual possessing a dominant allele (and thus displaying the dominant phenotype). Assume that any two organisms can mate.

## Sample Dataset
    2 2 2
## Sample Output
    0.78333

In [9]:
# open file containing three ints 
sequence = open(r'/Users/Sid/Downloads/rosalind_dna.txt').read()

# input organism numbers 
k = 28 # homozygous dominant 
m = 21 # heterozygous 
n = 29 # homozygous recessive 

# calculate probability of having a homozygous recessive phenotype 
pFail = (m/(k+m+n))*((m-1)/(k+m+n-1))*(1/4) + (m/(k+m+n))*(n/(k+m+n-1))*(1/2) + (n/(k+m+n))*(m/(k+m+n-1))*(1/2) + (n/(k+m+n))*((n-1)/(k+m+n-1))
# probability of not having homozygous recessive phenotype is (1-pFail)
pSuccess = 1-pFail
# this is the value to return 
print(pSuccess)

0.745920745920746


## Problem #8

The 20 commonly occurring amino acids are abbreviated by using 20 letters from the English alphabet (all letters except for B, J, O, U, X, and Z). Protein strings are constructed from these 20 symbols. Henceforth, the term genetic string will incorporate protein strings along with DNA strings and RNA strings.

The RNA codon table dictates the details regarding the encoding of specific codons into the amino acid alphabet.

    Given: An RNA string *s* corresponding to a strand of mRNA (of length at most 10 kbp).
    Return: The protein string encoded by *s*.

## Sample Dataset
    AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA
## Sample Output
    MAMAPRTEINSTRING

In [41]:
def protein (rna):
    codon = ""
    protein = []
    for i in range(0, len(rna) - 3, 3):
        codon = rna[i:i+3]
        if (codon == "UUU" or codon == "UUC"):
            protein.append("F")
        elif (codon == "UUA" or codon == "UUG"):
            protein.append("L")
        elif (codon == "UCU" or codon == "UCC" or codon == "UCA" or codon == "UCG"):
            protein.append("S")
        elif (codon == "UAU" or codon == "UAC"):
            protein.append("Y")
        elif (codon == "UAA" or codon == "UAG" or codon == "UGA"):
            break
        elif (codon == "UGU" or codon == "UGC"):
            protein.append("C")
        elif (codon == "UGG"):
            protein.append("W")
        elif (codon == "CUU" or codon == "CUC" or codon == "CUA" or codon == "CUG"):
            protein.append("L")
        elif (codon == "CCU" or codon == "CCC" or codon == "CCA" or codon == "CCG"):
            protein.append("P")
        elif (codon == "CAU" or codon == "CAC"):
            protein.append("H")
        elif (codon == "CAA" or codon == "CAG"):
            protein.append("Q")
        elif (codon == "CGU" or codon == "CGC" or codon == "CGA" or codon == "CGG" or codon == "AGA" or codon == "AGG"):
            protein.append("R")
        elif (codon == "AUU" or codon == "AUC" or codon == "AUA"):
            protein.append("I")
        elif (codon == "AUG"):
            protein.append("M")
        elif (codon == "ACU" or codon == "ACC" or codon == "ACA" or codon == "ACG"):
            protein.append("T")
        elif (codon == "AAU" or codon == "AAC"):
            protein.append("N")
        elif (codon == "AAA" or codon == "AAG"):
            protein.append("K")
        elif (codon == "AGU" or codon == "AGC"):
            protein.append("S")
        elif (codon == "GUU" or codon == "GUC" or codon == "GUA" or codon == "GUG"):
            protein.append("V")
        elif (codon == "GCU" or codon == "GCC" or codon == "GCA" or codon == "GCG"):
            protein.append("A")
        elif (codon == "GAU" or codon == "GAC"):
            protein.append("D")
        elif (codon == "GAA" or codon == "GAG"):
            protein.append("E")
        elif (codon == "GGU" or codon == "GGC" or codon == "GGA" or codon == "GGG"):
            protein.append("G")
    return protein

print(protein("AUGGACGUACCGGACACUUGGCAAGGUUACACACAAGAUCGAUAUUUGCGCCCCCACGUCACACCUAAUAUGCUCCCAACUACACCCAGUGCGUGCGUAGGUCAGGAGCUAAAAGAGUACUCCAUCAAGCCCAGUGAGCGCCGGAGAGGUAGAGUAAUGACGGUCAGACGUGGAGGGCGCAAGAUGUCAUACUUUUCCCCAGAGACAUACAUGCCAUUCUGGGUCCAAACGUGUCGACUGUAUCUCCUUAGUGCCGUGAUAUCCCUUAGGCCCGCAUUUGGGACCUCAGGUAAUGACGCUCACCGGGACAGCUUCAUUCGGACUCCGUCAUUUUCGAAUCCACUGAUGGGAGAGGAAACCGACGACCGAAGCUCGCAAAAUCUGUUACCGACCAAAUCCUGGGCCGCAAAAAUUCUGUCAUCUGCAUGGGCGUGGCAGGCGACCACACUCCGUCCACACAGAUAUGGGCAUCCUCAACAAUGUCCAGUGAUUCCUGACAAGACGCCUAUUCUCCUGUCCGUUCAACUUCGUGAAGUUUUACGUAUAUUGGUUAAAGGGACGGUCGAGCCCCGCGAGAUUUUAUUUAGGAUGGCCCAAGUGGAUGGCCCUGGCGUAUGUCGUUUCACUUACGACAAUCGUACGGUGCGCCUUCCAACCGAGAGCAUCUGGGUAUAUGUCUGCAGCGCCUGUAAAGAGGGGGCCCCGGCCACCAACGGGCUGACCGAUUGUUCACGCAACCAUUUCCAGGAACACGGAGAACAAAGAUUCUGUGCACAUGUGCACCAUGGAAGACAGCGUGCGCGGUCUAUGGCCGUGCUGCUAUGCUAUUUAAGGGAUUCCAUCGCUUCGAACCUUAGUCACAGCACAACCUAUUGCAGUACAAGGAUAUUCAGAAAUCAAUCUGAGAUAUUGUUAACCGUUUCCUCUCCCGCAGUCACGCAUUUUUGUGUUAAGAAGUACUCCUCACUGCCCACCUGGGCGAUCGGGGGUUGCUGCCGUCAAGUAGUAGUGAUAACAGUACCGAAUGUUCAGACCGAUCUUGUGGUAGGUACAAUCGAGGACGUCACAAUGACCACAAAGGAAGCGAAACACCACCGAAGACGAGUUGCUCAUCAGACGGCGGUGUGUCGGCAUAUGCCAUCUGGGUACCGAGCGAAGAAGGUCUCAUUGUCCCAGGUAGACCAGUUUAGAGCUUGGAGACCCCGUAACCGAUUACAAUUGGCCACGACGACUGUAGUAUCUCGCCCAUCGCUAGCGCGCACUGAGUCCUCCGAAACUCGCGCUGGAAAAUCCAGUAGCUACUACACGACGGGGAUCCGCGUCGCGAACCGUGCACAUAUUGGGGUCACGGCUUUUCACUCGGGUCACCGCGGCUUCAAUAAAGACCUACCGGAUGUAGCCAGACCGACGAAAUCAGCCGGCGUGCUGAUGACUAGGACCUCCUCAGUCCAGGGGUGCGAUAACAGGCAUCCAGCUCUAACAGCGAGAUUUUUAGUAAGGUAUAAACCCGUAAUUCUUCGCACUUCUAGGAUGGCUCUCAGAAAUCAGCCCUCCAAUGUGGUUGUGUGUUCAAAGGUGUUUUGUCGUCUGCUUUAUAUAAUGUCAAGCAGGCAGCAGAGUCACCGCCAGCAUAAAGUUACGCCUUACGCGUAUGUGCCCACCACUAGUGGGCCCAGCUUUAGUCGGGGAGACCCCUCGAAAGGAUUAAGCAUGUUUCGCUGCCCGCGUUCGAUAGCCCCACGAAGCCUGGGCUCCAAUGCCGUGAAAAGCUUGAAGUCCGGGAGAGUGUUGAUGGAAGAACGACAUAUUCCAUGGAAUUUGUGUGUAUCCGUUCAGGGACCCCUAAUCCUAGAGUUUAGGAUAGUACUUGGCACAGAGGCUGAUAGAGCCAGCCGGUUGCAUUGGCCCGACCGGGGGCCGGAAUUGAUCUUACGUCAGGGAAUGCCGCAAGUAGAAACCCCUUCACCCCGAACAAGUCAAGGUGGAAGCAUGUCGACGACCCGUAAGGUGAACGUCAUCCACCCCAACACUCUGCCUACGGCCCGCACGGUUGUUCGGUUGCGUCUUGGAGGGUAUGGUCCGCACUCACAUUUCUUAUUGGAAGUUGUCUUGUCCUGGGUAGAUUGGAUCCGACUUCUGCCGCUGAUUAGACAGAGACUUCCGGUCCGCAGGAAGCUGUUAGACGCUACAGGUAACUGCUAUAUGACGCAUCUAAUCUGGAACAUAUUAGUGCCGGGUAGCUGUCCACAAGGUAAAUUCUGUGAGUCUGAGUUUCUUGUGACCCAUCUCAAUUCCUUAACCAAUACGAACCAACUUGCAGUAUGGGCCCUACCUUGCGGCAACUUUUACCUACGGAUAUCCCGAGGGGCUAAUGUUUUACCAUACCGUCCCGGGCCUCCAUACGAUCUGUGUGAUGGUCCAGGGCUAGCGGUCAUCAAUCUGCUGGAUGCGGUGCGCCGUCUCGCGUACUACCCAUCCAGCCCUAUUGCUGUAUACGGCGCCGCAAAUCAACUUAUGGGGGACACCGCGAAGUCCCGAACCGCGCGCAAGCUCCUUCGAACGGGAAACCCCCUUUAUCGGCGAAGUUGCAUCAGACGGGUUGGGCCCGCGUCGAGAUUGGCAGUGGGACUGAUCACCCUGACAUUGACCUUGACUUGUUCAACCUACGGCGGAUGCUCACUCAGCCACCUAGGAAGAGACCCAGGGUCCCGUACAACGGACAAAUGUUGCCAGAGCACAGCCGUAUCGUUCGUGAUUAUGAUGGGCCACCUAGAAAGCGAUACCGCGGAAACGCUUAUCCGAGAGAUGGAGCCGAUCUAUAUAUGGGAUAUUCAGAAAAUGGUGGAUCGAAUUUCGGUAACCGGGGUCGUUACAUUACAGUUCCUGCGAAUAGGAGGCAACAUGAAACUUCCUGGUCCACGCUGGGGUUUAUCCAACCGGCGAGCUCUGUGCACCUAUGGAGCUGAUAACAGGCCGACCAAUCUCGUUGUACCCCUAACGUAUCUGCUAGCAUUACCACAUGUUGAGACACUUAAGCGCCGCUUGGUCAAAACACGAAAAUGCGGUCCGCUGAGGUCAUGCACUACUCCUCAAAAACGUCGAAGCUCGCCGUUAAUUCAUGCAUUCCGACGUCAUUGCCACCUUUUACAAGAAGUUGAAACUUUAGUCCAUCUAGUCGCCGUUCGAAUCUCAAUCCCUGUCCUAAUCCUUCUGGGAGAUGGAUUUCGGAGGCGCGAUGGAGUGUUCUUUUUCCACCAAUCGACUCUAAUUGCGGCCUCGGCCACCAUAAAUAAGCAAGUCAGGCGACUUCGACCCGAUGCCUGCUGGUUUGACACCGUCCCGAAUAAGCUCGGUCACACACGGAAGUAUGAAACAGCGGUAAAAAUCUUUGUGCGACAGAUUAAAUUCAGCGUGUGGGCAGUGAAUGUAAUCUCUUGCGACUCCCCAGAUCGUAAAUUGGAUGGAGGAAUUCACUCCCUUCUACCGGCAGCCCUUUCAAGGACUAGCAACUCUGGGACUUCGUGCGCCGGUCAUCAUGGGGAUGGGGGUAGCUUUCGCAUUUUUGGGCUUUCCUCACGGAGUUCGAACCUAGUCGUCAAACCACGAGCGCUGCUACUAGAGAUAUCUUUUUUUCGCACGUGGCGUCAUCUCAUUCUGUCUGAUUGCUCUAUAAGCAUGACGGUAGAAGCAACCCCUAACAUAUUCCAUUAUCUCGAUGUAGUGAAGACGUGGUGUCGGUUUCAGGACCCCCCAUCCCUUCUGGUAAGCGAGCACGCCUCCGUGUAUGAGGCGAGUGUUCAAAUACAAUUGUUAAUACAGAAGGGAGUCCGGGCGAUACUAACCGUAGAUUUCGCCGUUCGGACAAAGGUGAACAAGCCAUUUGCAUGGAAGAGACCACCUUACCCGUAUGGCAUCAAGCCAGGCUGUAUGGACUUUCCGAUGAGCAUGGGCCUAUCUAUGGUUAGUUGUUGUUCUUUGCUCAUAAAAGGCAGUCUUACUUCCGCUAGGAUCCUUUUGUCACCCCUGGACUCAGUAAGAAUGCUAGAUAAUAGCAAUUGGAGCCGAUUAGCUCGGCUGUACAGUGUGAUCGAAAUCAACAUGAAAAUCGGAGUCUCGUCUCAGAGCUUUUUGGGCAGCCAUCCACAGGGCAGAAUUUAUUCACUCGGAUGCGCUACAGGCUCAGCAAAGGGUCGGACUUCCAAUGUAGAAGUCAGAGGUCCGGUUACUGAGCCACAGCGCACUCUUCACAUUGCAGGGGAGGAAGACUUUAUUUCUGUGUCGAGUUCCGUAGAUCCCUUAGUGGUUGCGACAAAUUAUCUUGUUUCGAGCAAGAUUCGUGGGGACAAAUGCGAUGGCUCGUCCAGAGCGUUCGAAGGUCAAUGGAUCGCUAUGUCUGCAUCGGCAGCCGUAUGCAAAAUCGCUGUACUAGUUCCCUGGGGGAUGGUACGCGAACGCUUCUAUGUCGCAUAUUACCUCGAAAAAGGGCAUGCAGUCGGUUUCCCUAAUAUUUGGCAACCUAACUCCCUCAUGGCAUCCCCGCCCGGUGCAUCAAGCGCCAUGAGUGUGUUAUUAAGUUGCCCACUAGUGAUUCAUUAUAAAGCACUCGAUAACGGUCCACACUACCGAUCUGACUAUGGUCGUUUACGAAAGCCUUGGUUGCAUAACAUAGAGGGCGGGAAGCCGUGUUGUCACAGUCGCGAUUAUGGUGGUUCAAGAAACUCUUUAUUAUCCAAAUACGCAAUAAGCAAUUGGCCGGACUCGAUCACCUACAUACAGGUAGAGAAGAAUUCAAGACCACAUUCUGCGUCAAACAGACAGAACCGCUUCUGGUAUCUCACCGUUUACGUGUACCGUCGACCGUCAGUGCCACUUGAGGUUACGGCCUUAAGCAGACUUGGUACACGUGGAAAUCUCCAUCUUGUCGGAGCGAGUAUACGCAGCGCACUAUCCAGUUACCUCCGUAAUCCUUGCGCGAGCGCGCAUUCCAUCCUUUACCCUCCCGGGGUCCCUAUAGCCCUACUUAAAGAUGCUCCCAUUGGCCUCUUUAUGUGGCACUGCAUGAUUGAGCCCUGCACGAGAAGCUUCACGCGUACGUACAGACCCCGUCCAUCCUACCAGGCUGCAUGCCCUGUCCUAGCGAUCGCCGUGAGUGGACCACCAGAGAGUGUUCUCAUUGCAGACAAUUAUGUGUUUGCUACCGUUAGUCCGCAAAGAUAUUCGUUGGUGACGGUGACGGGGGCUGAUCGUAUAACUUUGCGCUGCACACCCUCAUCUACUGUGUAUCUGCGAAAGUGGGCUUGGUGGACGGCAAGCUCUCCUCUGGGUGGUUUCAUGCGUCAACAGGGCUUUCGCCACGGCCUAAUAACUUGGGCUAUAAUUACGACAGGACGACUGUGCGCCGUAUGGACUACUGCGCUGAGAACGGGUGCCGGUGCAUCGGGAGAUCUUGAAGGAGAUGAGAUGAUGCAUAUCCACCGCAAACAACACUGUUGUUAUAGCGGCGCGAUUAGUUCCCCCCUGUCGUUGUGCAAAAGCAGGAGGCUGCUCCCGAUGCGAGUUGGGGAGCUUAGUUACGAGCCCUCUUUGCAUAAUGAAGUAAGAACCUCUGACUCUUCCACUUACGAACAUUCGCUUGUCGAGAUAAACUGGCUCGGAGACUUCUUUUUUAUCAAUCUGGACGUGCAGCGUGGUCAUGGGUCGAGUUAUCCACUUAGCCGCUUCCCUGAACUCAUUUUGAUCGCGCCGGUACGUCGAGCCUUUCAGUUCAACAACCAUGAUUGCUCCUCCUCCUCUAUACCGGUCUGCGACGUCUUCAAACUUAGUUAUGGGGUAAAAACACUGGUCGGCCUCCGGACUCACUGCCUGCCCCGUUUAGAUACUAUCUCUGCGGAUCCCGAAGCAGGUACAAUUUAUAAUACGAGUGGCUUUGGAAGGAUGGUCGAGCAAUUCUCACUAUUGGGUUCUAAUAGCGAAAGUACGGUAUUCACAAGAAUGGACUUAAGGUGGCCAUUGAGGGUUCCUCUGGAAAUUAAUUAUAUCAUUAGGUGCAAGAAGAUGAGUCAUCGACCAGUAGCAAGCAUUCUAAUGGGUGACGAUACAUCGGCCCCUGGGCUUUGGCUAAUGGGAAAAGGGAGGAGGAUUAAGGGUGCAGGAUCAAUCUCGCCUCCUCCGCACGUCUGGCGCCGGAAAAACUCUGCCGUCGGUGGGCAACCAUGCUCUUAUUUCUGGUAUGAGGCGUCGUCUGGUCGCGCGGUAAACCUCAGCAACCAGUGUGAACUAAUACGUAACGAUAUACAGCACGGUAUCGAGUCCCCUCACUACCGUUCCUACAGCGAUUAUGAAAACUUCCGACAGCGACAAACCCGAUUGUCCAAACUUAUGCCAUUAGUUUCUUUCAUCGAUGCGGCCGUAUGCUCCCUUAGUAUAUGCCCCCUAUUAUGGCGAACGAUUACACGGCGCAUCAUCCGAGGGACUAGCUGCAACAGGAAUCACAACGGCCGCGACAUACGUUCUCGCGAAAGUAUAAGGAUCCCAUGGGUGUUUUAUGGCGUGGUUACUCCCAUUAAUCAGCCACAUCGUGGUGUAUUCAUACCUUAUAAGUCCCAGGAGCUAUGCUUUAUGGAGAGGACUAGGCGGAGAUUCCCAAGUACGAGGACCGACGGAGGGUCCCUGCCAUGCAUCACUUAUCCUGGUUCUAACAUAACGAAUAGCAUUCUACAUAGAUACUCUAGUCAGAGACGUCUUCGCCAACUCGGUCCAGUGAAAGCUCUAUCAUUAGGGGUCUACGAUUGCCACCUAACAUCACCUAUGACCGUAGUGAGAGCACUGCGCGUGAAGGCUAAUGAGCAUUUCUUGUGGCAUUCCCUCGGAACAAGUAUAACGAUAGUGAACGUAUUGGAGGUGCAGAGGUUAGAUGUCGUAGCGCGGUACCUAGCCUGCUGGGUCACCCAGGUGAUCUGUGUCGGGGAGAUGAGGGACAAUAAGCCACAGGUCGGUGCUAGGCGUGCUGUCAUCCUAGAAUACGUAUUGCGUCGACCCACCACCUCACCACCCUACAACUGUCUACAUGAGGUACGCGUUUAUCGCUGUCUAUCCUGUAGCAUGGUAUUCUUGUAUAGGGGUCGUCUGCAAAGCGGUGGUGAGCUACAUCAUAACCUUCGGCCCGAGGGGAGGGGGAGGGCGAUCGCAGAGCAAGAUGAGUCACGGAAAUUUGGCCUACUUGGCCCAGUGAGUCCGCGCCUUCUGCUAACGGGAUGGAGAUCUGUUCCGAAAUGCAAAUCCGAGGCAUUGUCGAAACUUUAUAACACCCCAGACUCAGUGGGCACAGCACCUCAGUCCUAUCAAAGCGACAGGUGUAACUGUUGGGGUUUCGGCCAUGCUUGCCUGUUGUGCCCGCUACGCCCAAACGCACGGGCUCCCGCGGGGGGCUCUGCACAAGGAUCUUGUCGUGGCAAAGUAUACGACGUAGUGAAUUUGCAUGAUGCUCUAACUUUGCUGCAGCUCGUGAGUCUCACUCAUAACUACGGGAUAUUCUUUAGGAGUCUGGACCGAAUUGAAUGUCAUAGGCGGUGGUGGAACGGGAUUAGACGAAAUUAUAGGAGCCACCACGUUCAUUACGACUCUAGAGUCUGGAGCGAUGGACACGCGCGGCCUCGCUACAGUAAGCGUUGGGUACCGCAGGAUAAUGUUCCUGCUCUCCAUGCGCACGGCUCAGCGACUUCCGUGCAGCGAGCGCAGCUCAAAUUUAGGAAUUUUUUAAAUCGAGGUAACGGUGUGCCUGAACGAAUAUCACCAAGUUGUCAGUCGCGUCCAAAUCUCCCUAGGAUCUUUGGAGCGCUGUUAGUAGGAAAUCUCUUCGUAGGCCUCGAGCCUAGAGAACUCGAACCUUGCCCAGAGAGGAACCCAGACUCUAGGACAAUCUUGUUCGGUAUGGUAAACUGCUCGGUUAAAACCCAUUUAGUGCGCGCCCUGUUAUGUCCUGGAACGAGGUACUCAUCGCGUACAUCUAAGAUGGUUUCAGUCCUAUACUUGCUCACGCUCCCCGCGGACCAUCUUGGAUCUCCGCGAAAUACGAACCAGUAUUCCAGGGGUAGAUCUCCAACUGAAUCGCUUACGUGCAAAAGAAGCUUAACUCGGAUUCGGAGGCGUUUCAUUUCUCACUGGACCCCCCAUGCAUUAAAUUUUGUUUUUGAGCUCUUCGGUAUAAAGUUGACCCAGACAGCUCAAUAUCCCCGGCUUAUACUGCGAUCGACCUGGCCGUAUGUCCCCGCUUGCACGAAUCGCCACAGGGUAUACCCGGAGUGUUACGCUGAACAAGCUGCUGGAUGUGGUACCCGACGAUGCUGUUGGUGCCCUUUUCCCGCUCCGGUACAGCGUCGUGUUCCUACUACGACCGCGACAGGGGUGCUAUCUGCAUUGGAACCUAAUCCGCACAUGGAGACGGCUGGCACAGAUCUUAAUCAUCGUUCUGGCGUAUCUCCGCGGUUCAUCUCAUGGACGGAGACGGAAUCCGAAUCAUUAAGGGUUGGGGUGUACGCAACACGUGAUGGACUUUUGUGUGUUCGAGUAUUGAGGGGAAUUCAGGUAGGCAUCAUGCGUGUUUAUGUGACACCCAGGACUGCUGUUUCUAAACGGAGGAUCUCAAUGUGCACUUUUCAUAGACCAUCGCUCUUAUCCUGGGGAAGAACGUUUAACUUAAGUAUGGCAGCCAUGAUCUUUGUUUAUCCUCCUGGCGUGACGUCUGUAGGGUCACAUGAGAAUUCACCCCGCUCACUAUCGGCGAAAUGGUUCUUGCUCUGUGUUAAAGAGACAUACAAUGAAGGAACAUCUGGGCGUGACCGUACGAGAAUUAUAAUGAUAGGUUAUAGAUCUAGCAACAAUCAUUUUACCCCCUUCCAGACUAUGUCACUGGAGCAGACGUCCACACAGAAGUGGUGCGAGGUCAGUAACUCGAUUAGCCGCCGGGCAUGUACAAGUCCUUGUGCGACGCGGCUCGAGAGGAAUAAACAAAAUGCCAUCAUAGGUAUAUAUCUCAUGCGCCAGACAGCAUUAACCCGGUGUCCUUCACGUGAUUGCCCCAGCGCCCCCGCGUAUCUAUGUGUUACACACAAACCGAGAUCACCAUGCCAGUUGACACUUGGUUUACUUGCCUUGAGGUUAAGCGGCGAGUCUCUACGACUUUAUUUUGAAUCAAGCAUUCGGCCGCUGUCAACGACGCCCUACUUAGGCGGUUGCUCGCAAUCCGGCGUGGUUUCCAUUCGGCAUCUAUAUCGAGGUCCCGAACGGCCGCAGCUGAAUUACAUGCUAGGUGCUGUCUCUGGCCACAGUGUCACUUUUUAG"))

['M', 'D', 'V', 'P', 'D', 'T', 'W', 'Q', 'G', 'Y', 'T', 'Q', 'D', 'R', 'Y', 'L', 'R', 'P', 'H', 'V', 'T', 'P', 'N', 'M', 'L', 'P', 'T', 'T', 'P', 'S', 'A', 'C', 'V', 'G', 'Q', 'E', 'L', 'K', 'E', 'Y', 'S', 'I', 'K', 'P', 'S', 'E', 'R', 'R', 'R', 'G', 'R', 'V', 'M', 'T', 'V', 'R', 'R', 'G', 'G', 'R', 'K', 'M', 'S', 'Y', 'F', 'S', 'P', 'E', 'T', 'Y', 'M', 'P', 'F', 'W', 'V', 'Q', 'T', 'C', 'R', 'L', 'Y', 'L', 'L', 'S', 'A', 'V', 'I', 'S', 'L', 'R', 'P', 'A', 'F', 'G', 'T', 'S', 'G', 'N', 'D', 'A', 'H', 'R', 'D', 'S', 'F', 'I', 'R', 'T', 'P', 'S', 'F', 'S', 'N', 'P', 'L', 'M', 'G', 'E', 'E', 'T', 'D', 'D', 'R', 'S', 'S', 'Q', 'N', 'L', 'L', 'P', 'T', 'K', 'S', 'W', 'A', 'A', 'K', 'I', 'L', 'S', 'S', 'A', 'W', 'A', 'W', 'Q', 'A', 'T', 'T', 'L', 'R', 'P', 'H', 'R', 'Y', 'G', 'H', 'P', 'Q', 'Q', 'C', 'P', 'V', 'I', 'P', 'D', 'K', 'T', 'P', 'I', 'L', 'L', 'S', 'V', 'Q', 'L', 'R', 'E', 'V', 'L', 'R', 'I', 'L', 'V', 'K', 'G', 'T', 'V', 'E', 'P', 'R', 'E', 'I', 'L', 'F', 'R', 'M', 'A', 'Q', 'V',