<a href="https://colab.research.google.com/github/leventdusunceli/Bioinformatic-Functions/blob/main/ofr_openreadingframe.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Open Reading Frames

This notebook finds open reading frames (ORFs) in a DNA string and translates each one into a protein sequence.

Steps:


1.   Define a DNA codon table (mapping DNA codons directly to amino acids lets you skip the mRNA transcription step)
2.   Create the reverse complement of the given string (an ORF can lie on either strand, and each strand is read 5' to 3')
3. Define a fragment finder using a regex
4. Define a fragment translator
5. Run the fragment finder on the string and on its reverse complement
6. Translate the fragments



In [1]:
#Step 1: DNA Codon Table 

dna_codon_table = {
    "TTT":"F", "CTT":"L", "ATT":"I", "GTT":"V",
    "TTC":"F", "CTC":"L", "ATC":"I", "GTC":"V",
    "TTA":"L", "CTA":"L", "ATA":"I", "GTA":"V",
    "TTG":"L", "CTG":"L", "ATG":"M", "GTG":"V",
    "TCT":"S", "CCT":"P", "ACT":"T", "GCT":"A",
    "TCC":"S", "CCC":"P", "ACC":"T", "GCC":"A",
    "TCA":"S", "CCA":"P", "ACA":"T", "GCA":"A",
    "TCG":"S", "CCG":"P", "ACG":"T", "GCG":"A",
    "TAT":"Y", "CAT":"H", "AAT":"N", "GAT":"D",
    "TAC":"Y", "CAC":"H", "AAC":"N", "GAC":"D",
    "TAA":"STOP", "CAA":"Q", "AAA":"K", "GAA":"E",
    "TAG":"STOP", "CAG":"Q", "AAG":"K", "GAG":"E",
    "TGT":"C", "CGT":"R", "AGT":"S", "GGT":"G",
    "TGC":"C", "CGC":"R", "AGC":"S", "GGC":"G",
    "TGA":"STOP", "CGA":"R", "AGA":"R", "GGA":"G",
    "TGG":"W", "CGG":"R", "AGG":"R", "GGG":"G"
}
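
As a sanity check on the hand-written table, the same 64 entries can be generated programmatically. The sketch below is not part of the original notebook. It assumes the standard genetic code, written as a 64-letter string in TCAG order with `*` for stop codons, and the names `BASES`, `AMINO_ACIDS`, and `build_codon_table` are illustrative.

```python
from itertools import product

BASES = "TCAG"  # assumed ordering: each codon position cycles through T, C, A, G
# Standard genetic code in that order; "*" marks the three stop codons
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

def build_codon_table():
    """Return a dict mapping each DNA codon to an amino acid letter or "STOP"."""
    codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
    return {
        codon: ("STOP" if aa == "*" else aa)
        for codon, aa in zip(codons, AMINO_ACIDS)
    }

generated_table = build_codon_table()
print(len(generated_table))    # number of codons in the table
print(generated_table["ATG"])  # start codon
print(generated_table["TGA"])  # one of the stop codons
```

In the notebook, `generated_table == dna_codon_table` should evaluate to `True`; a `False` result would point to a typo in one of the two tables.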

In [2]:
#Step 2: Reverse Complementer 

def reverse_complementer(DNA):

    complement = {"G":"C", "C":"G", "T":"A", "A":"T"}  #base-pairing map (bases, not amino acids)

    reversed_strand = DNA[::-1]    #reverse the given DNA sequence

    rev_comp_strand = ""

    for char in reversed_strand:   #build the complementary strand by swapping each base
        rev_comp_strand += complement[char]

    return rev_comp_strand
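
A shorter way to get the same result is Python's built-in `str.maketrans` and `str.translate`. This is a sketch of an alternative, not the notebook's own function; the names `COMPLEMENT` and `reverse_complement_fast` are hypothetical.

```python
# Translation table mapping each base to its complement (hypothetical name)
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement_fast(dna):
    """Complement every base, then reverse the result."""
    return dna.translate(COMPLEMENT)[::-1]

print(reverse_complement_fast("ATGC"))  # GCAT
```

One behavioral difference: `str.translate` leaves characters that are not in the table (for example `N`) unchanged, whereas a dictionary lookup raises a `KeyError` on them.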

In [3]:
#Step 3: Fragment Finder Function 

#Also finds fragments on the reverse complement strand

import re

def fragment_finder(dna_string):

  #Each fragment starts at an ATG and runs codon by codon up to, but not including, the first in-frame stop codon
  regex = re.compile(r'(?=(ATG(?:...)*?)(?=TAA|TAG|TGA))')

  fragments = regex.findall(dna_string)                         #forward strand
  fragments += regex.findall(reverse_complementer(dna_string))  #reverse complement strand

  return fragments

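Explanation of the regular expression (regex)

```
regex = re.compile(r'(?=(ATG(?:...)*?)(?=TAA|TAG|TGA))') 
```
* ```re.compile()``` --> compiles the regex into a pattern object that can be used with the re.search or re.findall methods
* ```r'...'``` --> raw string: backslashes are passed through without any change
* ```(?=( ... ))``` --> the outer lookahead matches without consuming characters, so the search restarts at every position and overlapping fragments are all found; the inner parentheses capture the fragment that re.findall returns
* ```ATG(?:...)*?``` --> the fragment starts with ATG, followed by zero or more codons [```(?:...)``` is a non-capturing group of any three characters; ```*?``` repeats it as few times as possible, so the fragment ends at the first in-frame stop codon]
* ```(?=TAA|TAG|TGA)``` --> the fragment must be followed by a stop codon; because this is a lookahead, the stop codon is not included in the returned sequence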
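
To see what the outer lookahead contributes, the toy example below runs the notebook's pattern next to a variant without it. The sequence and the variable names `overlapping` and `consuming` are made up for illustration.

```python
import re

dna = "ATGATGCCCTAA"  # toy sequence: two in-frame ATGs sharing one stop codon

# Pattern from the notebook: the outer lookahead allows overlapping matches
overlapping = re.compile(r'(?=(ATG(?:...)*?)(?=TAA|TAG|TGA))')
# Same pattern without the outer lookahead: each match consumes its characters
consuming = re.compile(r'ATG(?:...)*?(?=TAA|TAG|TGA)')

print(overlapping.findall(dna))  # ['ATGATGCCC', 'ATGCCC']
print(consuming.findall(dna))    # ['ATGATGCCC']

# An ATG with no in-frame stop codon yields no fragment
print(overlapping.findall("ATGCCCC"))  # []
```

Without the outer lookahead, the first match swallows the second ATG, so the shorter nested fragment is never reported.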


In [4]:
#Step 4: Fragment Translator

#This function is for a single fragment
#In the final implementation we'll iterate through the list of fragments created with fragment_finder() function   

def translator(fragment):
  protein_seq = []

  codons = [fragment[i:i+3] for i in range(0, len(fragment),3)] #split the fragment into codons

  for codon in codons: #translate each codon into its corresponding amino acid
    protein_seq.append(dna_codon_table[codon])

  return "".join(protein_seq) #join all amino acids into one string and return
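
`translator()` relies on two properties of the fragments it receives: the length is a multiple of three and every codon is in the table. Both hold for the output of `fragment_finder()` on input made only of A, C, G, and T. For arbitrary input, a more defensive variant might look like the sketch below. The names `CODON_EXCERPT` and `translate_safe` are hypothetical, and using `X` for an unrecognized codon is an assumption, not something the notebook does.

```python
# Excerpt of a codon table, just enough for this example
# (the notebook uses the full dna_codon_table)
CODON_EXCERPT = {"ATG": "M", "AAA": "K", "GGC": "G", "TAA": "STOP"}

def translate_safe(fragment, table=CODON_EXCERPT):
    """Translate codon by codon; stop at a stop codon and skip a trailing partial codon."""
    protein = []
    for i in range(0, len(fragment) - 2, 3):          # only complete codons
        amino_acid = table.get(fragment[i:i+3], "X")  # "X" for unknown codons is an assumption
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate_safe("ATGAAAGGC"))     # MKG
print(translate_safe("ATGAAATAAGGC"))  # MK
print(translate_safe("ATGAAAGG"))      # MK
print(translate_safe("ATGNNN"))        # MX
```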

In [8]:
#Step 5 & 6 : Implementation of code 

string = "AGGATATTAGCTCCCAAGTAGGCAGACCTGTCGATCCCTGAAAGAGAGTCGCATCCTCTCGGTTCGCCTAAGCTCGAGTGTCCTAGGTTTGGCATAGGGACAGTCCAAGTACGCGGAGGTTATGCTTCGGACCGGCTCCACCCGATCGTCGGTGTGTTATCCACCCGTTTACTGGAGTCGTCCCAGCGAAATGCACCTAGTAGAGTTAATTAATGCATCTAGGACACAAGTATGCGCAGTCCGCGCTCTCATATGCTAGTGACAGTTTCTAAGAGCGCCCGAGTGTAAACGTAGGCAACACCACCTACTGTATTGCAGCATCCTCGGGCGATCTGATTGGTCTCCAGCAAAACGCATTCTCGATCGTTGGACTAAGCACACAGCGTTAGTACAAAACGCTACATGCCGGCTTCGAGGCCCAAGACGCAACTGGTGTCTTGTCCGCTGTGCCACACAGAAACATGAGTAACTGCGTGATTGAGGTTCCATAGCTATGGAACCTCAATCACGCAGTTACTCATCACATCCGATGATAATTAGCTGTCGACTCCGCGAAGTATTGTTTAGTTCAGTGCGTAGGTGCCAGATGATTGGTTCTGACTGCCTTCTTACCGACGCCGCGAATTATTGATAGGGCAGTCTCTGTCAACGGCCCCGTCTAGAACGGCGCCTATTTAGGTCCCGCTCTGTTCGATATTAAGAACCGGTCCATCGCTATGGGTGCTTTACGCACACGCCGTACTTATCAGCATTGCGTTTGCTACCTTGTCAGAATTAGATTAAGACAAAGTGATCCCTCGCTCGTCTTGGTAGGCCAGCAGCGGGCGTCCTCATCAGATAGGCTGTGGGCGGCTTTGCGGTCTAGGGTACACTTTAGAAGGGGGCGCGGACCGTCCTCTTAGCCTTCCCAGTTTAGTTCGCGCCCAGTTTAGGAAACCCTAGTTACTTATTGGAGGCGGGGGGCACACGCCGGTTGAGCGTC"

fragments1 = fragment_finder(string)

for i in set(fragments1): #set() removes duplicate fragments
  print(translator(i))

MRTPAAGLPRRARDHFVLI
MHLGHKYAQSALSYASDSF
MLQYSRWCCLRLHSGALRNCH
MRFAGDQSDRPRMLQYSRWCCLRLHSGALRNCH
MIGSDCLLTDAANY
M
MPASRPKTQLVSCPLCHTET
MRLSFRDRQVCLLGS
MLISTACA
MH
MEPQSRSYSCFCVAQRTRHQLRLGPRSRHVAFCTNAVCLVQRSRMRFAGDQSDRPRMLQYSRWCCLRLHSGALRNCH
MEPQSRSYSSHPMIISCRLREVLFSSVRRCQMIGSDCLLTDAANY
MSNCVIEVP
MFLCGTADKTPVASWASKPACSVLY
MLVTVSKSARV
MRSPRSHMLVTVSKSARV
MLRTGSTRSSVCYPPVYWSRPSEMHLVELINASRTQVCAVRALIC
MHLVELINASRTQVCAVRALIC
MIISCRLREVLFSSVRRCQMIGSDCLLTDAANY
MDRFLISNRAGPK
MRARTAHTCVLDALINSTRCISLGRLQ
MGALRTRRTYQHCVCYLVRIRLRQSDPSLVLVGQQRASSSDRLWAALRSRVHFRRGRGPSS
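
Because a `set` does not guarantee iteration order, the proteins above may print in a different order from one run to the next. The sketch below folds steps 5 and 6 into one function that returns distinct proteins in order of first appearance. It is a self-contained illustration, not the notebook's code: the names `TABLE`, `ORF_REGEX`, `COMPLEMENT`, and `find_proteins` are hypothetical, the codon table is rebuilt from the standard genetic code in TCAG order, and the input is assumed to contain only uppercase A, C, G, and T.

```python
import re
from itertools import product

# Standard genetic code in TCAG order; "*" marks stop codons
CODONS = ["".join(t) for t in product("TCAG", repeat=3)]
TABLE = dict(zip(CODONS, "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"))

ORF_REGEX = re.compile(r'(?=(ATG(?:...)*?)(?=TAA|TAG|TGA))')
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def find_proteins(dna):
    """Return distinct candidate proteins from both strands, in order of first appearance."""
    strands = (dna, dna.translate(COMPLEMENT)[::-1])  # forward strand, reverse complement
    proteins = []
    for strand in strands:
        for fragment in ORF_REGEX.findall(strand):
            # Fragments end before the first in-frame stop codon,
            # so every lookup returns an amino acid letter
            protein = "".join(TABLE[fragment[i:i+3]] for i in range(0, len(fragment), 3))
            proteins.append(protein)
    return list(dict.fromkeys(proteins))  # drop duplicates, keep first-seen order

print(find_proteins("ATGGCCTAATCATTTCAT"))  # ['MA', 'MK']
```

In the toy input, `MA` comes from the forward strand and `MK` from the reverse complement. Deduplicating proteins rather than fragments also merges distinct fragments that happen to encode the same protein, which differs slightly from the `set(fragments1)` approach above.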
