Add support for GFF files from NCBI #120
Comments
NCBI GFF files don't include the ##FASTA section at the end. Also, the seqid (column 1) is just the NC_ number and not the full gi||ref|| ID that is in the FASTA.
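To illustrate the seqid mismatch described above, here is a minimal sketch that reduces a gi|...|ref|...| style FASTA header to the bare accession used in column 1 of the GFF. The function name and the sample header are hypothetical, chosen only for illustration; this is not how Roary itself handles the IDs.

```python
import re


def seqid_from_fasta_header(header: str) -> str:
    """Return the bare accession from an NCBI-style FASTA header.

    Hypothetical helper: assumes the old 'gi|<number>|ref|<accession>|'
    layout. Headers without a 'ref|' field are returned as their first token.
    """
    parts = header.lstrip(">").split()
    if not parts:
        return ""
    token = parts[0]  # the ID is the first whitespace-delimited token
    match = re.search(r"\bref\|([^|]+)\|", token)
    return match.group(1) if match else token


# Illustrative header only; the values are an assumption, not from the thread.
print(seqid_from_fasta_header(">gi|556503834|ref|NC_000913.3| Escherichia coli"))
```

Headers that already start with the plain accession pass through unchanged, so the same function can be applied to both old and new style FASTA files.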
I rewrote Bio-RetrieveAssemblies (cpanm Bio::RetrieveAssemblies) so that it will download WGS and RefSeq assemblies and convert them to GFF (including the FASTA file at the end), which should make it easier to get existing data from NCBI. It filters out dodgy stuff using RefWeak. There's a tweak or two I need to make to Roary first, but I hope that in a day or so you will be able to create a pan genome of all S. typhi by doing: retrieve_assemblies -o my_files -f gff typhi
So Roary now works grand with WGS and RefSeq assemblies. It's not everything in GenBank, but it provides a vast quantity of data for people to play around with: retrieve_assemblies -a -f gff typhi (see http://sanger-pathogens.github.io/Roary/index.html#genbank_files)
Hello
GFF3 files from NCBI usually only contain the annotation and not the nucleotide sequence. You'll need to make sure you get a file with the annotation and assembly in the same file. You can either do this by appending the assembly to the end of the GFF file (with ##FASTA as the delimiter), or you can convert from the gbff file (GenBank file with annotation and assembly) to a GFF file. Or you could use the script above.
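The appending approach can be sketched roughly as follows. This is a minimal illustration, not part of Roary: the function name and file paths are hypothetical, both inputs are assumed to be already decompressed, and the sequence names in the FASTA are assumed to match the seqids in column 1 of the GFF.

```python
from pathlib import Path


def append_fasta_to_gff(gff_path, fasta_path, out_path):
    """Write a combined file: annotation, a '##FASTA' line, then the sequences.

    Hypothetical helper. Reads both files fully into memory, which is
    assumed to be acceptable for bacterial-sized genomes.
    """
    gff_text = Path(gff_path).read_text()
    fasta_text = Path(fasta_path).read_text()

    # Refuse to add a second sequence section to a file that already has one.
    if any(line.strip() == "##FASTA" for line in gff_text.splitlines()):
        raise ValueError(f"{gff_path} already contains a ##FASTA section")

    with open(out_path, "w") as out:
        # Make sure the delimiter starts on its own line.
        out.write(gff_text if gff_text.endswith("\n") else gff_text + "\n")
        out.write("##FASTA\n")
        out.write(fasta_text if fasta_text.endswith("\n") else fasta_text + "\n")
```

For example, append_fasta_to_gff("genomic.gff", "genomic.fna", "combined.gff") would produce a single file with the annotation followed by the assembly (the file names here are placeholders).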
Hi @andrewjpage, I tried your suggestion about appending, but I still got an error message. Below is one such message.
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.gff.gz