What is the advantage of running Prokka first rather than analysing GenBank (.gbk and .gbff) files directly? #412

Closed
felipelira opened this issue Jul 27, 2018 · 3 comments

Comments

@felipelira

Maybe this is a silly question, but wouldn't it be inefficient to include this step when you are working with a set of, for example, 600 genomes? OK, that is not a lot, but... Are there any statistics (just out of curiosity)?

@andrewjpage
Member

You can use annotation from GenBank (or RAST) if you wish, and there are instructions on the Roary webpage. The important thing is that all annotation and ORF prediction is performed using the same method; otherwise you will just get lots of noise and false signals. GenBank is not ideal because the submitters of genomes can submit their own annotation, so you can get a big mixture of different annotation methods. RefSeq is much better because they use PGAP to ensure consistent annotation (with some exceptions to watch out for).
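As a rough illustration of the "same method" point, one way to spot a mixed collection is to tally the GFF3 `source` column (column 2) over the CDS features of each genome and compare the results across genomes. This is only a sketch: the function names are hypothetical, the sample values in the usage note are made up, and the `source` column is just a hint, since pipelines are free to write whatever they like there.

```python
from collections import Counter


def cds_sources(gff_text):
    """Count the GFF3 'source' column (column 2) over CDS features."""
    counts = Counter()
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature lines
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) == 9 and cols[2] == "CDS":
            counts[cols[1]] += 1
    return counts


def sources_per_genome(gffs):
    """Map each genome name to the set of CDS sources found in its GFF text.

    `gffs` is a dict of {genome_name: gff_text} (hypothetical input shape).
    """
    return {name: set(cds_sources(text)) for name, text in gffs.items()}
```

If `sources_per_genome` returns different sets for different genomes, the annotations probably did not all come from the same pipeline, and re-annotating everything with one tool is likely the safer route.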

@felipelira
Author

For an accurate study, I prefer to use RefSeq .gbff genomes because they share the same annotation process. Thank you, Andrew. I will try both file types and methods.

@tseemann
Contributor

tseemann commented Aug 6, 2018

@felipelira I think the issue with RefSeq .gff files is that they do not have the FASTA sequence appended to them, and sometimes the GFF "ID" does not match the FASTA "ID".
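The two problems mentioned here can be checked before running anything. The sketch below compares the sequence IDs used in a GFF3 file (column 1) with the FASTA header IDs, and only if they agree appends the sequences after a `##FASTA` directive, which is the GFF3 convention for embedding sequence. The function names are hypothetical and this is not part of Roary or Prokka; it assumes both files fit in memory as text.

```python
def fasta_ids(fasta_text):
    """Return FASTA record IDs: the first word after '>' on each header line."""
    return [
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].strip()
    ]


def gff_seqids(gff_text):
    """Return the distinct sequence IDs (column 1) used by GFF3 feature lines."""
    ids = []
    for line in gff_text.splitlines():
        if line.startswith("##FASTA"):
            break  # anything after this directive is sequence, not features
        if not line or line.startswith("#"):
            continue
        seqid = line.split("\t", 1)[0]
        if seqid not in ids:
            ids.append(seqid)
    return ids


def append_fasta(gff_text, fasta_text):
    """Return GFF3 text with the FASTA appended after a ##FASTA directive.

    Raises ValueError if a GFF sequence ID has no matching FASTA record.
    """
    missing = [s for s in gff_seqids(gff_text) if s not in set(fasta_ids(fasta_text))]
    if missing:
        raise ValueError("GFF IDs without a FASTA record: " + ", ".join(missing))
    return gff_text.rstrip("\n") + "\n##FASTA\n" + fasta_text.rstrip("\n") + "\n"
```

If `append_fasta` raises, the IDs need to be reconciled (for example by renaming the FASTA headers) before the combined file is usable.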
