SARS-CoV-2, the virus that causes COVID-19 (and referred to below simply as COVID-19), is a respiratory virus in the coronavirus family.  It is the source of the worldwide pandemic that began in 2019, and to date has resulted in almost 2.5 million deaths.  Like all coronaviruses, COVID-19 consists of a single strand of RNA that is encapsulated in a protein shell called a capsid.  Immediately after COVID-19 was discovered, scientists worked around the clock to determine the genetic sequence of that RNA strand, as that genetic material codes for all of the proteins that ultimately allow the virus to attack its host's cells.  The first RNA sequence for COVID-19 was published on January 10, 2020.

Analysis of the COVID-19 RNA sequence shows that it contains 11 sequences of base pairs (called open reading frames or ORFs) that each code for a protein.  Some of the proteins produced by COVID-19 were already known to scientists, as they exist in other coronaviruses that have been studied, while others were specific to COVID-19 and contributed to its virulence.  The ORFs/proteins in COVID-19 are as follows:

1. <b>ORF1ab</b> - A 7096 amino acid polyprotein that gets chopped up into 16 smaller proteins by another protein (called a protease).[1]<p>

2. <b>S-protein</b> - The spike protein, a 1273 amino acid glycoprotein (which means that it is attached to a sugar molecule) that binds to a receptor (called ACE2) that is found on the outside of the host cell.  This binding is the first step in the intake of the virus by the host cell.[2]<p>

3. <b>ORF3a</b> - A 276 amino acid protein unique to the SARS-CoV family of coronaviruses whose function is still unknown, but based on the similarity of its sequence to other proteins, is believed to bind to ATP in some way.[3]<p>

4. <b>E-protein</b> - A 75 amino acid "envelope" protein that is essential to the structure of the COVID-19 viral particle.[4]<p>

5. <b>M-protein</b> - The most abundant protein in a COVID-19 virus, this 222 amino acid glyco-membrane protein (meaning it is embedded in the outer layer of the virus and has a sugar attached to it) is thought to mediate the process of endocytosis---the absorption of the virus by a host cell (essentially the host cell tries to bring the sugar in and the virus comes with it).[5]<p>

6. <b>ORF6</b> - A 61 amino acid protein that is secreted by the COVID-19 virus, and is believed to help suppress the host's immune response.[6]<p>

7. <b>ORF7a</b> - A 121 amino acid protein whose function is to help block the production of antibodies by human lymphocyte cells.[7]<p>

8. <b>ORF7b</b> - A 40 amino acid protein that is part of the virus's structure, and is believed to help anchor the virus to the host cell's Golgi complex, where the virus uses the host cell's machinery to reproduce.[8]<p>

9. <b>ORF8</b> - A 121 amino acid protein that appears to be unique to COVID-19 (i.e. it's not present in other coronaviruses).  It is secreted by the COVID-19 virus, and is believed to help suppress immune response (similar to ORF6).  The antibodies that the presence of this protein triggers in humans are the principal target of most COVID-19 PCR tests (including all the swab tests you did in the fall!).[9]<p>

10. <b>N-protein</b> - A 419 amino acid protein that is responsible for interfering with the human body's natural antiviral defenses.[10]<p>

11. <b>ORF10</b> - Originally hypothesized to be a 38 amino acid protein, but since its sequence doesn't match any known protein and because it can be cleaved off the RNA strand and the virus still functions normally, scientists no longer think it actually codes for a protein.[11]<p>
    
Understanding the proteins produced by a virus and what they do is essential in fighting a virus.  Indeed, the two vaccines that have currently been approved (Moderna & Pfizer) both work by introducing the specific RNA sequence for the spike protein into humans, which causes human cells to produce the spike protein and trigger an immune response.  Since scientists use the sequence of amino acids in a protein to try to figure out what a protein does (by looking for similar sequences in proteins that have been found), reading the RNA sequence of a virus and finding the sequences of the proteins it codes for is an indispensable step in understanding and combatting a new virus.

So some basics for those who haven't studied the biology of nucleic acids or proteins before:

* <b>RNA, or ribonucleic acid</b>, consists of a sequence of molecules, called bases, attached to a backbone chain of sugar and phosphate molecules.  There are four different bases in RNA--cytosine (C), guanine (G), uracil (U), and adenine (A).  RNA is read by the machinery of a cell three bases at a time, with each combination of three bases translating to a specific amino acid.  That grouping of three bases is called a <b>codon</b>. Although the sequence you will be working with for this assignment is RNA, it was submitted to the sequence database in DNA form, which means that every instance of uracil was replaced with its DNA counterpart, thymine (T).<p>

* <b>Open reading frames, or ORFs</b>, are portions of an RNA sequence that can be translated into a protein.  Each codon of an ORF translates to a specific amino acid, as seen in the following table:

![image.png](attachment:image.png)<p>

* <b>Proteins</b> are chains of molecules called amino acids that fold into specific shapes that give them specific functions.  The sequence of amino acids is ultimately responsible for the shape of the protein (and hence its function), and the amino acid sequence is defined by the RNA that is used to produce it.  There are 20 amino acids that are commonly found in protein sequences, and each is referred to by a 3-letter abbreviation as seen in the table above.  For instance, the simplest amino acid, glycine, has the abbreviation Gly, and glycine can be coded for by four different codons (GGG, GGA, GGT, or GGC).
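
To make the codon idea concrete, here is a minimal sketch of the two ideas above: converting an RNA-coded string to DNA coding and reading it three bases at a time. The function names `to_dna_coding` and `split_codons` and the short example sequence are made up for illustration; they are not part of the project files or required names.

```python
def to_dna_coding(seq):
    """Return the sequence with every uracil (U) replaced by thymine (T)."""
    return seq.upper().replace("U", "T")


def split_codons(seq):
    """Split a sequence into consecutive three-base codons.

    One or two leftover bases at the end are dropped, since they
    cannot form a complete codon.
    """
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


rna = "AUGGGUGGCUAA"          # made-up example, not from the project files
dna = to_dna_coding(rna)      # "ATGGGTGGCTAA"
print(split_codons(dna))      # ['ATG', 'GGT', 'GGC', 'TAA']
```

In this example the two middle codons, GGT and GGC, are two of the four codons for glycine mentioned above.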

### The Project:

Your task for this project is to write a program that will read a sequence of RNA, find all the open reading frames, and translate those segments of RNA into proteins.  To do this, you will need to create three different classes of objects:

#### Class #1:  Sequence

A Sequence object is primarily used to store the RNA code to be translated.  It needs to have 7 attributes:

1. reference (a string holding citation where the RNA sequence was first published)
2. source (a string holding the website where the sequence was first accessed)
3. sub_date (a string holding the date when the sequence was first submitted to the National Center for Biotechnology Information)
4. acc_date (a string holding the date the sequence was downloaded)
5. na_seq (a string holding all the base pairs)
6. coding_type (a string indicating whether the sequence is coded as RNA or DNA... i.e. does it have U's or T's.)
7. orf_list (a list containing all the open reading frames found in the sequence).  

The first six of these items can be read from a text file containing the sequence being analyzed.  The last, orf_list, starts out empty and is filled in by a method of this class.

In addition to those 7 attributes, Sequence should have two methods:

1. A method called "basepairs()" to determine and return the number of base pairs present (essentially the number of characters in the sequence).
2. A method called "find_orfs()" that will scan through the sequence, find the open reading frames (ORFs), create an object for each ORF (see below), and store them all in orf_list.

Finding an ORF is fairly simple.  As mentioned above, RNA is read in three-base pair chunks called codons.  An ORF always starts with the same "start codon"---ATG in a DNA sequence or AUG in an RNA sequence.  An ORF ends with what is known as a "stop codon".  There are three stop codons---TGA, TAA, and TAG in a DNA sequence or UGA, UAA, and UAG in an RNA sequence. Your find_orfs() function should check at the start whether coding_type is "DNA" or "RNA", and if it is RNA, use replace() to change all the uracils to thymines (U's to T's).

Stated simply, what your find_orfs() method needs to do is read the sequence three characters at a time and look for an ATG.  Once it finds ATG, it saves that codon and all codons after it until it comes across a TGA, TAA, or TAG. Once your program sees one of those stop codons, it should create an ORF object (your next class) with the saved sequence (which should not include the stop codon), store it in orf_list, and start looking for another start codon.
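
The scanning logic described above can be sketched as a standalone function. This is only one possible approach, written under some assumptions: the name `scan_orfs`, the tuple it returns, and the example sequence are hypothetical, and a real solution would build ORF objects inside the Sequence class instead. Like the description above, it reads only the reading frame that starts at the first base.

```python
START_CODON = "ATG"
STOP_CODONS = {"TGA", "TAA", "TAG"}


def scan_orfs(na_seq, coding_type="DNA"):
    """Return a list of (bp_start, bp_stop, codons) tuples, one per ORF.

    Positions are 1-based: bp_start is the first base of the start codon
    and bp_stop is the last base of the stop codon. The codon list
    excludes the stop codon.
    """
    if coding_type == "RNA":
        na_seq = na_seq.replace("U", "T")   # work in DNA coding throughout

    orfs = []
    codons = []        # codons collected for the ORF currently being read
    bp_start = None    # None means we are not inside an ORF

    for i in range(0, len(na_seq) - 2, 3):
        codon = na_seq[i:i + 3]
        if bp_start is None:
            if codon == START_CODON:
                bp_start = i + 1
                codons = [codon]
        elif codon in STOP_CODONS:
            orfs.append((bp_start, i + 3, codons))
            bp_start = None
        else:
            codons.append(codon)
    return orfs


# Made-up 18-base sequence: two filler codons, then ATG GGT TTT TAA.
print(scan_orfs("AAACCCATGGGTTTTTAA"))
# [(7, 18, ['ATG', 'GGT', 'TTT'])]
```

Note that in this sketch an ORF that has started but never reaches a stop codon is silently discarded when the sequence ends.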

#### Class #2:  ORF

The ORF object stores an open reading frame that was found by the find_orfs method of the class Sequence.  It needs to have five attributes, all of which should be created by find_orfs:

1. orf_num (integer to indicate which open reading frame this is in the sequence, so orf1ab would be #1, the s-protein ORF should be #2, etc.)
2. orf_bps (the total number of basepairs in the ORF)
3. bp_start (the number of the basepair in the ORIGINAL sequence where this ORF starts... first bp of the start codon)
4. bp_stop (the number of the basepair in the ORIGINAL sequence where this ORF stops... last bp of the stop codon)
5. orf_seq (the base-pair sequence of the ORF)

The ORF class should have one method called "transcribe()", which reads each codon in the ORF, translates it into an amino acid, and adds it to a protein sequence.  It then returns a Protein object created from the sequence.
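
As a rough illustration of the codon-to-amino-acid step, here is a minimal sketch that translates a list of codons with a dictionary lookup. The name `translate` and the deliberately shortened `CODON_TABLE` are assumptions made for this example; a full table needs all 64 codons, and a real transcribe() would wrap the result in a Protein object.

```python
# Deliberately partial codon table, for illustration only.
CODON_TABLE = {
    "ATG": "Met",
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "TTT": "Phe", "TTC": "Phe",
}


def translate(codons):
    """Translate a list of DNA-coded codons into a string of 3-letter amino acids.

    Raises KeyError if a codon is missing from the table.
    """
    return "".join(CODON_TABLE[codon] for codon in codons)


print(translate(["ATG", "GGT", "TTT"]))   # MetGlyPhe
```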

#### Class #3:  Protein

The Protein object stores the sequence of a protein created by the transcribe() method of the ORF class.  It should have three attributes, all of which should be created by transcribe():

1. prot_num (integer to indicate which protein this is in the sequence... same as orf_num)
2. aa_count (the total number of amino acids in the protein)
3. aa_seq (the sequence of amino acids)

The Protein class should have two methods:

1. A method called "aa_summary()" which prints out the number of each of the 20 amino acids that are found in the protein and the total number of amino acids.
2. A method called "write_protein()" which opens a file with name corresponding to prot_num and writes the amino acid sequence to the file.
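
The two Protein methods could be approached along the following lines. This is a sketch under assumptions, not the required design: the function names `summarize` and `write_sequence` are hypothetical, and using `collections.Counter` from the standard library is just one convenient way to do the tally.

```python
from collections import Counter


def summarize(aa_seq):
    """Return a Counter mapping each amino acid in aa_seq to its count."""
    return Counter(aa_seq)


def write_sequence(aa_seq, filename):
    """Write the amino acid sequence to filename as one unbroken string."""
    with open(filename, "w") as out_file:
        out_file.write("".join(aa_seq))


protein = ["Met", "Gly", "Phe", "Gly"]    # made-up example
counts = summarize(protein)
for amino_acid, count in sorted(counts.items()):
    print(f"{amino_acid}: {count}")
print(f"Total: {len(protein)}")
```

A Counter reports 0 for any amino acid that never appears (for example `counts["Trp"]`), so printing all 20 amino acids only requires looping over a list of their abbreviations.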

#### Final Notes

In your project folder, you will find four files.  One is a file called "dummy_DNA_seq.txt", which has 58 base pairs and contains one ORF.  That ORF is 33 base pairs long, and translates into the 11 amino acid sequence: 

`MetLeuGlnThrProPheGluIleLysLeuAla`

There is also a file called "dummy_RNA_seq.txt" that is identical, except that it uses an RNA sequence (to test your ability to deal with both coding types).  There is a third test file called "dummy_2_prot_DNA_seq.txt" that has 101 base pairs, and contains two ORFs.  The first is the same as the one above, and the second produces an 8 amino acid sequence:

`MetArgAlaArgThrPheSerAsn`

The final file, "COVID-19 Genome.txt" is the full genome of the COVID-19 virus and codes for all the proteins listed above.  The assignment is really about getting all three classes up and working, but feel free to build in whatever other functionality you have time for, like:

* Asking the user what the filename is
* Code to create a sequence object based on the file contents
* Code to call the find_orfs() method
* Code to loop through all the ORF objects in orf_list and call the transcribe() method on them (and store them somewhere)
* Code to loop through all the protein objects and run the write_protein() method.

#### Good luck!

<font style="font-size:8px"> [1] Wu, P.C. et al., <i>In silico analysis of ORF1ab in coronavirus HKU1 genome reveals a unique putative cleavage site of coronavirus HKU1 3C-like protease</i>, <u>Microbiology and Immunology</u>, 2005.  https://pubmed.ncbi.nlm.nih.gov/16237267/  Accessed 2/12/21.<br>
<font style="font-size:8px"> [2] Huang, Y. et al., <i>Structural and functional properties of SARS-CoV-2 spike protein: potential antivirus drug development for COVID-19</i>, <u>Acta Pharmacologica Sinica</u>, 8/3/20.  https://www.nature.com/articles/s41401-020-0485-4  Accessed 2/12/21.<br>
<font style="font-size:8px"> [3] Zhang, X. and Yap, Y., <i>Putative structure and function of ORF3 in SARS coronavirus</i>, <u>Theochem</u>, 2/28/2005.  https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7113666/  Accessed 2/12/21.<br>
<font style="font-size:8px"> [4] Schoeman, D. and Fielding, B., <i>Coronavirus envelope protein: current knowledge</i>, <u>Virology Journal</u>, 5/27/2019.  https://virologyj.biomedcentral.com/articles/10.1186/s12985-019-1182-0  Accessed 2/12/21.<br>
<font style="font-size:8px"> [5] Thomas, S., <i>The Structure of the Membrane Protein of SARS-CoV-2 Resembles the Sugar Transporter SemiSWEET</i>, <u>Pathogens and Immunity</u>, October 19, 2020.  https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7608487/  Accessed 2/12/21.<br>
<font style="font-size:8px"> [6] Miorin, L. et al., <i>SARS-CoV-2 Orf6 hijacks Nup98 to block STAT nuclear import and antagonize interferon signaling</i>, <u>Proceedings of the National Academy of Sciences</u>, 11/10/20.  https://www.pnas.org/content/117/45/28344  Accessed 2/12/21.<br>
<font style="font-size:8px"> [7] Taylor, J. et al., <i>Severe Acute Respiratory Syndrome Coronavirus ORF7a Inhibits Bone Marrow Stromal Antigen 2 Virion Tethering through a Novel Mechanism of Glycosylation Interference</i>, <u>Journal of Virology</u>, 9/4/2015.  https://jvi.asm.org/content/89/23/11820  Accessed 2/12/21.<br>
<font style="font-size:8px"> [8] Schraeder, S. et al., <i>The ORF7b Protein of Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) Is Expressed in Virus-Infected Cells and Incorporated into SARS-CoV Particles</i>, <u>Journal of Virology</u>, 10/16/2006.  https://jvi.asm.org/content/81/2/718/article-info  Accessed 2/12/21.<br>
<font style="font-size:8px"> [9] Flower, T. et al., <i>Structure of SARS-CoV-2 ORF8, a rapidly evolving immune evasion protein</i>, <u>Proceedings of the National Academy of Sciences</u>, 1/12/21.  https://www.pnas.org/content/118/2/e2021785118  Accessed 2/12/21.<br>
<font style="font-size:8px"> [10] Mu, J. et al., <i>SARS-CoV-2 N protein antagonizes type I interferon signaling by suppressing phosphorylation and nuclear translocation of STAT1 and STAT2</i>, <u>Cell Discovery</u>, 9/15/20.  https://www.nature.com/articles/s41421-020-00208-3  Accessed 2/12/21.<br>
<font style="font-size:8px"> [11] Pancer, K. et al., <i>The SARS-CoV-2 ORF10 is not essential in vitro or in vivo in humans</i>, <u>PLOS Pathogens</u>, 12/10/20.  https://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1008959  Accessed 2/12/21.

### <font color = green> Really nice job with this project. I particularly like the addition of the "print attributes" methods that you created in all three classes.  Here is some feedback:
    
1. Don't loop through dictionaries!  You are turning that beautiful dictionary you created into a list by looping through it.


2. As I demonstrated in the in-class notebook on classes, you should really document a class with comments to make clear what all the attributes and methods do.  When you look up any specialty class that you get from an outside library, there is always documentation that does that.  It's just good practice for anyone who is going to follow your work.


3. Work on combining statements.  No need to take two lines (like 23 & 24 of the ORF class) to do what you can do in one!

`aa_seq.append(codon_to_aa(codon))`


4. Ascendance... just an interesting name for a protein object.  Wondering if there is a story behind that.


5. Nice use of .items() to print out your aa_summary dictionary!


6. I have to question the use of the while loop in your find_orfs() method.  While loops are for cases where you don't know when something is going to end, so you say to keep looping until a condition is met.  In this case, you know when it ends---it ends at the end of self.na_seq.  With a for loop you can specifically ask it to step through the sequence three letters at a time, so that you don't have to do that in lines 56 and 58.  You also don't have to do the -2 to make it work.
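
To illustrate points 1 and 6, here is a minimal sketch using a shortened, made-up dictionary and sequence (neither is taken from your code). It shows a direct dictionary lookup in place of a loop over the keys, and a for loop whose range() steps three letters at a time.

```python
codon_dict = {"ATG": "Met", "GGT": "Gly", "TAA": "STOP"}   # shortened for illustration

# Point 1: index the dictionary directly instead of looping over its keys.
print(codon_dict["GGT"])                  # Gly
print(codon_dict.get("NNN", "unknown"))   # unknown (get() supplies a fallback)

# Point 6: let range() step through the sequence three letters at a time.
seq = "ATGGGTTAAAC"                       # 11 letters, so the last slice is incomplete
codons = []
for i in range(0, len(seq), 3):
    codons.append(seq[i:i + 3])
print(codons)                             # ['ATG', 'GGT', 'TAA', 'AC']
```

The leftover slice at the end ('AC' here) is shorter than three letters, so it can never match a start or stop codon, which is why the -2 is not needed.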

In [2]:
#Converts a 3 base codon into an amino acid
def codon_to_aa(codon_amino):
    codon_dict = {'TTT':'Phe','TTC':'Phe','TTA':'Leu','TTG':'Leu','CTT':'Leu','CTC':'Leu','CTA':'Leu','CTG':'Leu','ATT':'Ile','ATC':'Ile','ATA':'Ile','ATG':'Met','GTT':'Val','GTC':'Val','GTA':'Val','GTG':'Val','TCT':'Ser','TCC':'Ser','TCA':'Ser','TCG':'Ser','CCT':'Pro','CCC':'Pro','CCA':'Pro','CCG':'Pro','ACT':'Thr','ACC':'Thr','ACA':'Thr','ACG':'Thr','GCT':'Ala','GCC':'Ala','GCA':'Ala','GCG':'Ala','TAT':'Tyr','TAC':'Tyr','TAA':'STOP','TAG':'STOP','CAT':'His','CAC':'His','CAA':'Gln','CAG':'Gln','AAT':'Asn','AAC':'Asn','AAA':'Lys','AAG':'Lys','GAT':'Asp','GAC':'Asp','GAA':'Glu','GAG':'Glu','TGT':'Cys','TGC':'Cys','TGA':'STOP','TGG':'Trp','CGT':'Arg','CGC':'Arg','CGA':'Arg','CGG':'Arg','AGT':'Ser','AGC':'Ser','AGA':'Arg','AGG':'Arg','GGT':'Gly','GGC':'Gly','GGA':'Gly','GGG':'Gly'}
    
    #match codon with amino acid in dictionary
    for i in codon_dict:
        if i == codon_amino:
            codon_amino = codon_dict[i]
    return codon_amino

In [4]:
#Open reading frame
class ORF:
    #initialization
    def __init__(self, orf_num, orf_bps, bp_start, bp_stop, orf_seq):
        self.orf_num = orf_num
        self.orf_bps = orf_bps
        self.bp_start = bp_start
        self.bp_stop = bp_stop
        self.orf_seq = orf_seq
    
    #print variables
    def print_attributes(self):
        print("Orf attributes:")
        print(f"The orf number is: {self.orf_num}")
        print(f"Basepairs: {self.orf_bps}")
        print(f"Starting at {self.bp_start}, ending at {self.bp_stop}\n")
    
    #Converts list of codons into a Protein object
    def transcribe(self):
        aa_seq = []
        
        for codon in self.orf_seq:
            amino = codon_to_aa(codon)#uses codon-amino dictionary to convert
            aa_seq.append(amino)
        
        ascendence = Protein(self.orf_num,len(aa_seq),aa_seq) #protein object 
        return ascendence

In [5]:
class Protein:
    #initialization
    def __init__(self, prot_num, aa_count, aa_seq):
        self.prot_num = prot_num
        self.aa_count = aa_count
        self.aa_seq = aa_seq
    #print variables
    def print_attributes(self):
        print(self.prot_num, self.aa_count, self.aa_seq)
    def print_aa(self):
        print(f"The amino acids sequence for this protein is {self.aa_seq}")
    
    #gives summary of how many of each aa in protein
    def aa_summary(self):
        sum_dict = {'Phe': 0, 'Leu': 0, 'Ile': 0, 'Met': 0, 'Val': 0, 'Ser': 0, 'Pro': 0, 'Thr': 0, 'Ala': 0, 'Tyr': 0, 'His': 0, 'Gln': 0, 'Asn': 0, 'Lys': 0, 'Asp': 0, 'Glu': 0, 'Cys': 0, 'Trp': 0, 'Arg': 0, 'Gly': 0}
        for aa in self.aa_seq:
            for prot in sum_dict:
                if aa == prot:
                    sum_dict[prot] += 1
        
        print(f"This protein has {self.aa_count} amino acids.")
        for key, value in sum_dict.items():
            print(f"{key:>30}:{value}")
    
    #writes protein into a file
    def write_protein(self):
        #the with statement closes the file even if the write fails
        with open(str(self.prot_num),"w") as f:
            f.write(str(self.aa_seq))

In [14]:
class Sequence:
    def __init__(self, reference, source, sub_date, acc_date, coding_type, na_seq):
        self.reference = reference
        self.source = source
        self.sub_date = sub_date
        self.acc_date = acc_date
        self.coding_type = coding_type
        self.na_seq = na_seq
        self.orf_list = []
    
    #Number of basepairs in a sequence
    def basepairs(self):
        return len(self.na_seq)
    
    def print_attributes(self):
        print(self.reference,self.source,self.sub_date,self.acc_date,self.coding_type,self.na_seq,self.orf_list)
    
    #Finds the different orfs in the sequence, converts into ORF object, and stores in a list
    def find_orfs(self):     
        print(self.coding_type)
        if self.coding_type == "RNA": #Turn RNA into DNA
            self.na_seq = self.na_seq.replace("U","T")
        
        #variables for the ORF object
        orf_num = 0
        orf_bps = 0
        bp_start = 0
        bp_stop = 0
        orf_seq = []
        
        started = False
        i = 0
        #while loop goes through whole sequence and pulls out Orfs.
        while i < len(self.na_seq)-2:
            triple_base = self.na_seq[i:i+3]
            
            if started == False and triple_base == "ATG": #starting the orf reading
                started = True
                bp_start = i + 1
            elif started == True and (triple_base == "TGA" or triple_base == "TAA" or triple_base == "TAG"): #ending the orf
                started = False
                
                #calculating orf variables
                orf_num += 1
                bp_stop = i + 3
                orf_bps = bp_stop - bp_start + 1
                
                an_orf = ORF(orf_num,orf_bps,bp_start,bp_stop,orf_seq)#create orf object
                self.orf_list.append(an_orf)
                
                orf_seq = []#renewing the sequence list for the next orf
            
            #different increments depending on if currently reading an orf
            if started:
                orf_seq.append(triple_base)
                i += 3 #skip next 2 iterations because I stored this codon(3 base pairs)
            else:
                i += 3

In [15]:
#Main code
#reads sequence file
with open("COVID-19 Genome.txt","r") as seq_file:
    #first six lines: reference, source, sub_date, acc_date, coding_type, na_seq
    file_lines = [seq_file.readline().strip() for _ in range(6)]
seq = Sequence(*file_lines)

print(seq.basepairs())
seq.find_orfs()

print(f"{len(seq.orf_list)} total proteins in orf_list\n")

for i in seq.orf_list:
    i.print_attributes()
    #i.transcribe().print_aa()


seq.orf_list[1].transcribe().write_protein()

29902
DNA
11 total proteins in orf_list

Orf attributes:
The orf number is: 1
Basepairs: 21291
Starting at 265, ending at 21555

Orf attributes:
The orf number is: 2
Basepairs: 3822
Starting at 21562, ending at 25383

Orf attributes:
The orf number is: 3
Basepairs: 831
Starting at 25390, ending at 26220

Orf attributes:
The orf number is: 4
Basepairs: 228
Starting at 26245, ending at 26472

Orf attributes:
The orf number is: 5
Basepairs: 669
Starting at 26521, ending at 27189

Orf attributes:
The orf number is: 6
Basepairs: 186
Starting at 27202, ending at 27387

Orf attributes:
The orf number is: 7
Basepairs: 366
Starting at 27394, ending at 27759

Orf attributes:
The orf number is: 8
Basepairs: 123
Starting at 27766, ending at 27888

Orf attributes:
The orf number is: 9
Basepairs: 366
Starting at 27892, ending at 28257

Orf attributes:
The orf number is: 10
Basepairs: 1260
Starting at 28273, ending at 29532

Orf attributes:
The orf number is: 11
Basepairs: 117
Starting at 29557, endi

In [25]:
#13 starting and ending
print(seq.na_seq[29804:29807])
print(seq.na_seq[29840:29843])

ATG
TGA


In [8]:
#11 Ending
print(seq.na_seq[29566:29569])

TAA


# Experiments

In [5]:
my_list = [1,2,3]
a,b,c = my_list
print(b)

2


In [73]:
a_string = "hello"
sub_string = a_string[1:3]
print(sub_string)

el


In [77]:
a_str = "abcdefg"
a_str.replace("c","lol")

'abloldefg'

In [164]:
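test_dict = {"hello":"123"} #avoid naming a variable dict, which hides the built-in
for i in test_dict:
    i.replace("h","o") #replace() returns a new string; the key itself is unchanged
print(test_dict)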

{'hello': '123'}


In [204]:
a_string = "AAAAAAUGUUUAUGUGAUUGCAGACACCUUUUGAAAUAAUAAAUUGGCAUAAAAAAAAAA"
a_string = a_string.replace("U","T")
print(a_string)

AAAAAATGTTTATGTGATTGCAGACACCTTTTGAAATAATAAATTGGCATAAAAAAAAAA


In [268]:
a_dict = {'TTT':'Phe','TTC':'Phe','TTA':'Leu','TTG':'Leu','CTT':'Leu','CTC':'Leu','CTA':'Leu','CTG':'Leu','ATT':'Ile','ATC':'Ile','ATA':'Ile','ATG':'Met','GTT':'Val','GTC':'Val','GTA':'Val','GTG':'Val','TCT':'Ser','TCC':'Ser','TCA':'Ser','TCG':'Ser','CCT':'Pro','CCC':'Pro','CCA':'Pro','CCG':'Pro','ACT':'Thr','ACC':'Thr','ACA':'Thr','ACG':'Thr','GCT':'Ala','GCC':'Ala','GCA':'Ala','GCG':'Ala','TAT':'Tyr','TAC':'Tyr','TAA':'STOP','TAG':'STOP','CAT':'His','CAC':'His','CAA':'Gln','CAG':'Gln','AAT':'Asn','AAC':'Asn','AAA':'Lys','AAG':'Lys','GAT':'Asp','GAC':'Asp','GAA':'Glu','GAG':'Glu','TGT':'Cys','TGC':'Cys','TGA':'STOP','TGG':'Trp','CGT':'Arg','CGC':'Arg','CGA':'Arg','CGG':'Arg','AGT':'Ser','AGC':'Ser','AGA':'Arg','AGG':'Arg','GGT':'Gly','GGC':'Gly','GGA':'Gly','GGG':'Gly'}
new_dict = {}

for value in a_dict.values():
    new_dict[value] = 0

print(new_dict)

{'Phe': 0, 'Leu': 0, 'Ile': 0, 'Met': 0, 'Val': 0, 'Ser': 0, 'Pro': 0, 'Thr': 0, 'Ala': 0, 'Tyr': 0, 'STOP': 0, 'His': 0, 'Gln': 0, 'Asn': 0, 'Lys': 0, 'Asp': 0, 'Glu': 0, 'Cys': 0, 'Trp': 0, 'Arg': 0, 'Gly': 0}
