translate.seqs #788
@pschloss, for the translation from amino acids to DNA, how do we choose between the multiple options in the compressed column? For example, how is L translated?
AminoAcid DNA_codons Compressed |
For DNA to amino acids, if it runs into CTT (or CTC, CTA, CTG, TTA, TTG) then the codon is replaced with an L. For amino acids to DNA... Hmm. I'm not sure what to do there. I think the main use of amino acid to DNA would be to align a DNA sequence the user already has. So if the amino acid sequence had an L, then we would look for CTT, CTC, CTA, CTG, TTA, or TTG in the DNA sequence to be aligned. Does that make sense? Basically - I don't think we'd ever have an amino acid sequence without a DNA sequence, but we will likely have a DNA sequence without an amino acid sequence. |
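The DNA-to-amino-acid lookup described above can be sketched roughly as follows. This assumes the standard genetic code (NCBI translation table 1) and reading frame 1; the names `CODON_TABLE`, `translate_dna`, and `codons_for` are hypothetical and are not part of mothur, so treat it as an illustration rather than how `translate.seqs` works internally.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
# Assumption: the thread does not say which genetic code is intended.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_dna(dna):
    """Translate an unaligned DNA string codon by codon, starting at base 1."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    # Codons containing anything other than T, C, A, G become 'X'.
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def codons_for(amino_acid):
    """Return every codon that encodes the given amino acid, sorted."""
    return sorted(codon for codon, aa in CODON_TABLE.items() if aa == amino_acid)


print(codons_for("L"))             # ['CTA', 'CTC', 'CTG', 'CTT', 'TTA', 'TTG']
print(translate_dna("CTTTTAATG"))  # LLM
```

The six codons returned for L are the same six listed in the comment above, which is why the reverse direction has no single answer.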
Hmm... so should we scrap this bullet point?
Because of degeneracies there will be non-ATGC IUPAC codes in the output sequence |
Yeah that's probably best for `translate.seqs`
…On Mon, Nov 15, 2021 at 1:25 PM Sarah Westcott ***@***.***> wrote:
Hmm... so should we scrap this bullet point?
- Amino acid sequences and translate it to a DNA sequence
Because of degeneracies there will be non-ATGC IUPAC codes in the output
sequence
Output as *.dna.fasta
|
@pschloss, could you explain this part a bit more and correct any bad assumptions on my part?
Trivial example:
becomes
Perfect matches in reference database could include: AAATGC---TGCAAA, AAGTGC---TGCAAA, AAATGT---TGCAAA, ... , AAGTGT---TGTAAG
|
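To make the "perfect matches" idea concrete, here is a small sketch that enumerates every aligned DNA sequence consistent with an aligned amino acid sequence. The input `KC-CK` is an assumption inferred from the matches listed above (the original example did not survive in this thread), the standard genetic code is assumed, and `back_translations` is a hypothetical helper, not a mothur function.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid to the list of codons that encode it.
CODONS_BY_AA = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    CODONS_BY_AA.setdefault(aa, []).append("".join(codon))


def back_translations(aligned_protein):
    """Yield every aligned DNA string whose codons encode the aligned protein.

    Gap characters ('-' or '.') in the protein become '---' in the DNA.
    """
    choices = [
        ["---"] if aa in "-." else CODONS_BY_AA[aa]
        for aa in aligned_protein.upper()
    ]
    for combo in product(*choices):
        yield "".join(combo)


matches = list(back_translations("KC-CK"))  # hypothetical input
print(len(matches))                  # 16
print("AAATGC---TGCAAA" in matches)  # True
print("AAGTGT---TGTAAG" in matches)  # True
```

K and C each have two codons, so four residues give 2 x 2 x 2 x 2 = 16 candidates. The count is the product of the codon counts per residue, so it grows quickly for longer sequences.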
For this option, the user would provide a DNA sequence and an amino acid sequence. One of the two would be aligned. For example, the user could provide an aligned amino acid sequence and an unaligned DNA sequence. They'd want the DNA sequence to come out aligned...
Output would then be...
If instead they provided...
then the output would be...
Does that make sense? I don't think there's a need here for a compressed DNA alphabet. |
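A rough sketch of the two cases described above, where the gap pattern of the aligned sequence is copied onto its unaligned partner. The function names are hypothetical, the sample sequences are made up for illustration, and the sketch assumes that `-` and `.` are the gap characters, that gaps in aligned DNA come in whole-codon blocks, and that the DNA has exactly three bases per residue (no trailing stop codon).

```python
GAP_CHARS = "-."  # assumption: both characters mark gaps in the alignment


def align_dna_to_protein(aligned_protein, unaligned_dna):
    """Copy the gap pattern of an aligned protein onto its unaligned DNA."""
    codons = [unaligned_dna[i:i + 3] for i in range(0, len(unaligned_dna), 3)]
    residues = sum(1 for aa in aligned_protein if aa not in GAP_CHARS)
    if len(unaligned_dna) % 3 or residues != len(codons):
        raise ValueError("DNA length does not match the number of residues")
    next_codon = iter(codons)
    # Each residue consumes one codon; each protein gap becomes three DNA gaps.
    return "".join(
        "---" if aa in GAP_CHARS else next(next_codon) for aa in aligned_protein
    )


def align_protein_to_dna(aligned_dna, unaligned_protein):
    """Copy the codon-level gap pattern of aligned DNA onto its protein."""
    if len(aligned_dna) % 3:
        raise ValueError("aligned DNA length is not a multiple of three")
    columns = [aligned_dna[i:i + 3] for i in range(0, len(aligned_dna), 3)]
    is_gap = [all(base in GAP_CHARS for base in column) for column in columns]
    if is_gap.count(False) != len(unaligned_protein):
        raise ValueError("protein length does not match the number of codons")
    next_residue = iter(unaligned_protein)
    # Each all-gap codon column becomes one protein gap.
    return "".join("-" if gap else next(next_residue) for gap in is_gap)


print(align_dna_to_protein("KC-CK", "AAATGCTGCAAA"))  # AAATGC---TGCAAA
print(align_protein_to_dna("AAATGC---TGCAAA", "KCCK"))  # KC-CK
```

Neither function consults a codon table, which fits the point above that no compressed DNA alphabet is needed: the unaligned sequence supplies the actual bases or residues, and the aligned one only supplies the gap positions.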
That makes sense, thanks for clarifying. So there is no reference file, just a pair of files, like the fasta/qfile pairings. The command would look like: mothur > translate.seqs(fasta=alignedOrUnalignedDNA, amino=alignedOrUnalignedAminoAcids). It assumes sequences are in the same order in both files. |
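The same-order pairing could be sketched like this. It is only an illustration: `read_fasta` and `pair_records` are hypothetical helpers, the inputs are FASTA text rather than mothur file handles, and the check that names match at each position is an added assumption (the comment above only says the order is the same).

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples, in file order."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the sequence name.
            name = (line[1:].split() or [""])[0]
            records.append([name, ""])
        elif records:
            records[-1][1] += line
    return [(name, seq) for name, seq in records]


def pair_records(dna_text, amino_text):
    """Pair DNA and amino acid records by position, as with fasta/qfile pairs."""
    dna_records = read_fasta(dna_text)
    amino_records = read_fasta(amino_text)
    if len(dna_records) != len(amino_records):
        raise ValueError("files contain different numbers of sequences")
    pairs = []
    for (dna_name, dna_seq), (aa_name, aa_seq) in zip(dna_records, amino_records):
        # Assumption: paired records share a name, so a mismatch signals
        # that the two files are not in the same order.
        if dna_name != aa_name:
            raise ValueError(f"name mismatch: {dna_name!r} vs {aa_name!r}")
        pairs.append((dna_name, dna_seq, aa_seq))
    return pairs


dna_text = ">seq1\nAAATGC\nTGCAAA\n>seq2\nCTTTTG\n"
amino_text = ">seq1\nKC-CK\n>seq2\nLL\n"
print(pair_records(dna_text, amino_text))
# [('seq1', 'AAATGCTGCAAA', 'KC-CK'), ('seq2', 'CTTTTG', 'LL')]
```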
This function would take...
DNA sequences and translate it to an amino acid sequence
Amino acid sequences and translate it to a DNA sequence
Unaligned DNA and unaligned/aligned Amino acid sequences