-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Completed gene finder #1
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks really good in general! I like your clean syntax practices and the code is really well documented with appropriate docstings. I added a few comments that are mostly stylistic guides.
|
||
""" | ||
|
||
import random | ||
from amino_acids import aa, codons, aa_table # you may find these useful | ||
from load import load_seq | ||
dna = load_seq("./data/X73525.fa") | ||
import doctest | ||
from pickle import dump, load |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow pickle toolbox! great job
|
||
""" | ||
|
||
import random | ||
from amino_acids import aa, codons, aa_table # you may find these useful | ||
from load import load_seq | ||
dna = load_seq("./data/X73525.fa") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you usually want to separate any function calls from these import statements. It would make more sense to put it under if name == "main":
elif nucleotide == 'T': | ||
return 'A' | ||
|
||
#doctest.run_docstring_examples(get_complement, globals(), verbose=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove comments in your final upload
I added two more doctests because each complementary nucleotide is in | ||
its own if/else if branch. If one branch doesn't work and there is no | ||
doctest for that branch, the mistake will not be caught. In fact, I | ||
found out that I spelled nucleotide wrong in the G branch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great practice that you are adding your own tests
|
||
return comp_str | ||
|
||
#doctest.run_docstring_examples(get_reverse_complement, globals(), verbose=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another comment to remove
for nuc in range(0, len(reversed_dna)): | ||
comp_nuc = get_complement(reversed_dna[nuc]) | ||
comp_list.append(comp_nuc) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you wanna get fancy with just one line of code :
return ''.join([get_complement(reversed_dna[nuc]) for nuc in dna[::-1]])
I added two more doctests: one in which the frame stop codon is TAA and | ||
another in which there is no frame stop codon. This way, all the | ||
branches are tested for error. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
|
||
return dna | ||
|
||
#doctest.run_docstring_examples(rest_of_ORF, globals(), verbose=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another comment to remove
frame1 = find_all_ORFs_oneframe(dna) | ||
frame2 = find_all_ORFs_oneframe(dna[1:]) | ||
frame3 = find_all_ORFs_oneframe(dna[2:]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could also use a for loop here
[find_all_ORFs_oneframe(dna[i:]) for i in range 3]
humongest = longest_ORF(dna_shuffles[i+1]) | ||
|
||
return len(humongest) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be more memory efficient to just save the max_length of longest_ORF than to save the actual longest_ORF
I finally understood the purpose of if name == "main": and moved the gene_finder(dna) portion under that branch.