Fix_start_stop does not resolve all issues #197

Open
emmannaemeka opened this issue Sep 1, 2020 · 2 comments

Comments

@emmannaemeka

I ran Fix_start_stop on my genome, but it does not resolve all of the issues:

Total sequence length 496122366
Number of genes 19350
Number of mRNAs 19350
Number of exons 119452
Number of introns 100102
Number of CDS 18371
Overlapping genes 384
Contained genes 61
CDS: complete 22
CDS: start, no stop 340
CDS: stop, no start 1205
CDS: no stop, no start 17783
Total gene length 82068175
Total mRNA length 82068175
Total exon length 25049228
Total intron length 57219151
Total CDS length 20883708
Shortest gene 17
Shortest mRNA 17
Shortest exon 1
Shortest intron 4
Shortest CDS 15
Longest gene 207760
Longest mRNA 207760
Longest exon 6759
Longest intron 199438
Longest CDS 10206
mean gene length 4241
mean mRNA length 4241
mean exon length 210
mean intron length 572
mean CDS length 1137
% of genome covered by genes 16.5
% of genome covered by CDS 4.2
mean mRNAs per gene 1
mean exons per mRNA 6
mean introns per mRNA 5

What could be the problem?

@Neato-Nick

Neato-Nick commented Jul 1, 2021

I'm sure it's much too late to be of help, but I'm posting for others who find this issue. I have come to prefer AGAT for most GFF processing; it is very actively maintained. agat_sp_fix_CDS_phases.pl will adjust the CDS phase to correct errors introduced by intron adjustment, and agat_sp_fix_start_and_stop_codons.pl gives a nice report of how many start and stop codons can be added. You will have to manually inspect genes that still lack a start or stop codon; otherwise NCBI's tbl2asn will mark them as partial products. That's okay too, and it is common for genes called at contig ends.
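
For the manual check, here is a minimal Python sketch of how CDS sequences could be sorted into the same four categories shown in the statistics above. It is not how Fix_start_stop or AGAT work internally. It assumes the standard genetic code, counts only ATG as a start codon, and assumes each CDS has already been spliced and oriented on its coding strand. The names `classify_cds` and `summarize` are hypothetical helpers made up for this illustration.

```python
from collections import Counter

# Assumption: standard genetic code, with ATG as the only start codon.
START_CODONS = {"ATG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_cds(seq):
    """Return the completeness category of one spliced, coding-strand CDS."""
    seq = seq.upper()
    has_start = seq[:3] in START_CODONS
    has_stop = len(seq) >= 3 and seq[-3:] in STOP_CODONS
    if has_start and has_stop:
        return "complete"
    if has_start:
        return "start, no stop"
    if has_stop:
        return "stop, no start"
    return "no stop, no start"


def summarize(cds_by_id):
    """Count how many CDS sequences fall into each category."""
    return Counter(classify_cds(seq) for seq in cds_by_id.values())


if __name__ == "__main__":
    # Toy sequences for illustration only, not taken from any real genome.
    toy = {
        "geneA": "ATGAAATAA",
        "geneB": "ATGAAAGGG",
        "geneC": "CCCAAATGA",
        "geneD": "CCCAAAGGG",
    }
    for category, count in summarize(toy).items():
        print(f"CDS: {category}\t{count}")
```

Genes that land in the last three categories are the ones worth looking at by hand, especially those away from contig ends.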

@davidjstudholme

Thank you, @Neato-Nick. That's a really helpful suggestion and I will try it next time.
