gffread handle Mitochondrion gff3 has bugs? #21

Closed
GuoqiLiu opened this issue Apr 8, 2018 · 6 comments

Comments

@GuoqiLiu

GuoqiLiu commented Apr 8, 2018

Hello,
Does gffread have a bug when handling animal mitochondrial genomes? For example:

gene4 gene=ATP8
MPQLDTST.LTTILSIFLALFIIFQLKISKYDFYHNPELTAKILKHNTP.ETK.TKIYLPLLLPL.
There are breaks (the "." characters) in the middle of the protein sequence. Which transl_table values does gffread support?
Thanks!

@gpertea
Owner

gpertea commented Apr 8, 2018

Indeed, at this time gffread only supports the common, standard code for translations (transl_table 1).
As a workaround, you could use the -x option instead to output the CDS sequences and pipe them into a mitochondrial-aware DNA-to-amino-acid FASTA translator, for example my seqmanip script (found in https://github.com/gpertea/gscripts; direct link to the raw file: https://raw.githubusercontent.com/gpertea/gscripts/master/seqmanip; it doesn't have any other dependencies). With seqmanip it would go like this:

gffread transcripts.gff -x- | seqmanip -T -B > proteins.fa 

(seqmanip has the -B option to enforce the mitochondrial translation table.)
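
To illustrate why the standard table produces the "." breaks shown in the question: in the vertebrate mitochondrial code (NCBI transl_table 2), TGA encodes tryptophan and ATA encodes methionine, while AGA and AGG are stops. The following is a minimal Python sketch under those assumptions, not how gffread or seqmanip is implemented; the names `build_table` and `translate` and the toy sequence are hypothetical, and stops are printed as "." only to mirror the output above.

```python
from itertools import product

BASES = "TCAG"  # NCBI ordering of bases in genetic code tables
# Standard code (transl_table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Codons that differ in the vertebrate mitochondrial code (transl_table 2)
VERT_MITO_OVERRIDES = {"TGA": "W", "ATA": "M", "AGA": "*", "AGG": "*"}


def build_table(overrides=None):
    """Return a codon -> amino acid dict: the standard code plus overrides."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    table = dict(zip(codons, STANDARD))
    table.update(overrides or {})
    return table


def translate(cds, table, stop_char="."):
    """Translate a CDS in frame 0; unknown codons become 'X'."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = table.get(cds[i:i + 3], "X")
        protein.append(stop_char if aa == "*" else aa)
    return "".join(protein)


toy_cds = "ATGCCATGACTATAA"  # hypothetical sequence with an internal TGA codon
print(translate(toy_cds, build_table()))                     # standard code
print(translate(toy_cds, build_table(VERT_MITO_OVERRIDES)))  # vertebrate mito
```

With the standard table the internal TGA is reported as a break, while the mitochondrial table reads through it as W, which is likely what happens at the internal "." positions in the ATP8 example.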

@GuoqiLiu
Author

GuoqiLiu commented Apr 9, 2018

OK, I see! Can I also use gffread to process bacterial genomes downloaded from NCBI?

@gpertea
Owner

gpertea commented Apr 9, 2018

I'm not sure I understand the question. If it's about supporting non-standard translation tables for bacterial genomes (other transl_table values), then, as I already mentioned, gffread does not support them (and neither does seqmanip).

If you are asking more generally about gffread handling GTF/GFF3 annotation data for bacterial genomes: I haven't tested this, but I suppose the answer is yes, it should work just the same (except for the translation issues that you already noticed). I'm not sure what the NCBI annotation looks like for bacterial genomes. If it always defines genes with transcripts and/or exon/CDS features, then it should work just the same. But if it uses only "gene" features with no other sub-features, then one could use the new --gene2exon option of gffread, which was specifically added to deal with some minimalistic GFF3 files that only have "gene" feature lines for single-transcript, single-exon genes.
But only use that option if needed.
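
One way to check whether a given file falls into the "gene features only" case is to count the feature types in column 3 of the GFF3. This is a rough Python sketch, not part of gffread; the helper names `feature_counts` and `looks_gene_only` are hypothetical, as is the choice of which sub-feature types to look for.

```python
from collections import Counter


def feature_counts(gff_lines):
    """Count feature types (column 3) over the tab-separated GFF3 lines."""
    counts = Counter()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 9:  # ignore anything that is not a feature line
            counts[cols[2]] += 1
    return counts


def looks_gene_only(counts):
    """True if there are gene lines but no transcript/exon/CDS sub-features."""
    sub_features = ("mRNA", "transcript", "exon", "CDS")
    return counts["gene"] > 0 and not any(counts[t] for t in sub_features)


# Hypothetical two-gene annotation with no sub-features
example = [
    "##gff-version 3\n",
    "chr1\tsrc\tgene\t1\t300\t.\t+\t.\tID=gene1\n",
    "chr1\tsrc\tgene\t500\t900\t.\t-\t.\tID=gene2\n",
]
print(looks_gene_only(feature_counts(example)))
```

For a real file, the lines could come from `open("annotation.gff3")` (a placeholder file name). If the check returns True, the file may be a candidate for --gene2exon.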

@GuoqiLiu
Author

Thanks for your reply,
Your understanding is right! I have used gffread on bacterial genomes downloaded from NCBI and it ran without errors, so gffread can handle bacteria.

@Maarten-vd-Sande
Contributor

Just dropping in to say that supporting multiple tables would be an awesome feature (for me). I haven't found any other tool that works as easily and as fast as gffread.

Would you support a PR in this direction? If so, can you give some pointers? It's been over 5 years since I did anything serious with C++, so it might not result in anything useful though 😇. I can't seem to find where the table is defined.

@gpertea
Owner

gpertea commented Feb 11, 2021

Sure, PRs are always welcome! As you can see, the actual codon table is in gclib/codons.cpp.
I don't know where I got that code in gclib/codons.cpp from; it seems unnecessary to waste 32K for a simple codon table like that (just to maximize translation performance). There are likely better ways to implement (multiple) codon tables.
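
As one possible direction, each translation table can be stored as a 64-character string and indexed by packing the three bases of a codon into a 6-bit number. The sketch below is in Python for brevity and is only an illustration of the idea, not the gclib code or a proposed patch; `codon_index`, `translate_codon` and the `TABLES` layout are hypothetical names, and the table strings follow the NCBI genetic code listings for transl_table 1, 2 and 4.

```python
# 2-bit code per base, in NCBI table order T, C, A, G
BASE_CODE = {"T": 0, "C": 1, "A": 2, "G": 3}

# One 64-character string per NCBI transl_table id ('*' marks a stop)
TABLES = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    2: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",
    4: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
}


def codon_index(codon):
    """Pack a 3-base codon into an index 0..63, or return -1 if invalid."""
    if len(codon) != 3:
        return -1
    idx = 0
    for base in codon:
        code = BASE_CODE.get(base)
        if code is None:  # ambiguous or non-ACGT base
            return -1
        idx = (idx << 2) | code
    return idx


def translate_codon(codon, transl_table=1):
    """Look up one codon in the chosen table; 'X' for anything unrecognized."""
    idx = codon_index(codon.upper())
    return "X" if idx < 0 else TABLES[transl_table][idx]


print(translate_codon("TGA", 1), translate_codon("TGA", 2))
```

Adding another table in this layout costs one more 64-character string, and the lookup stays a constant-time index operation.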
