GFF files derived from Prokka genbank raise errors #130

Closed
mgalardini opened this issue May 28, 2015 · 8 comments
@mgalardini
Contributor

Hi,

I'm using Roary with a bunch of bacterial genomes; some have been annotated with prokka, some others not. A genbank file is available for all of them. I've converted all the genbank files to gff3 using the bcbio gff writer (https://github.com/chapmanb/bcbb/tree/master/gff), which to the best of my knowledge produces valid GFF3 files.

When running using the prokka generated gff files the program runs smoothly; when running with the gff files derived from the genbank file, the program halts with the following error:

BLAST Database error: No alias or index file found for protein database [/home/user/workspace/Roary/bin/UcWJpjcOru/output_contigs] in search path [/home/user/workspace/Roary/bin::]

Some files are still produced, however, such as gene_presence_absence.csv, but the genome columns contain either nothing or the EC_number instead of the locus_tag (see below). Would more detailed documentation on the expected GFF format (the order of the annotations, for instance) help?

Thanks a lot,
Marco

Example of annotation from prokka:

gnl|Prokka|GENOME02_contig000001   Prodigal:2.6    CDS     42      578     .       +       0       ID=GENOME02_00001;inference=ab initio prediction:Prodigal:2.6;locus_tag=GENOME02_00001;product=hypothetical protein;protein_id=gnl|Prokka|GENOME02_00001

Example annotation from the gff file converted from the prokka genbank file:

GENOME02_contig000001      feature CDS     42      578     .       +       0       codon_start=1;inference=ab initio prediction:Prodigal:2.6;locus_tag=GENOME02_00001;product=hypothetical protein;protein_id=Prokka:GENOME02_00001;transl_table=11;translation=MIAEIFQGGFVVFQQQFSKVHFEAATTHNAHHHDVGGFTAESEGRNLPAAQTQTFREVVQGVSRIFTIFQFEANRRDAFVRATRTDELIRPQFGDFIRQISGNLVRGVLYFGIAFTTEAQEFIVLCNYLTRRAGEVDGKSTNLTTQVVNVEHQFLRQRFFVTPDNPAAAQRSQTEFMA
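
One visible difference between the two lines is in the attribute column: the Prokka line starts with an `ID=` attribute and the converted line has none. A minimal Python sketch to check this, assuming a plain `key=value;key=value` attribute layout (the `parse_attributes` helper is hypothetical, and the long `translation=` value is left out of the second string for brevity):

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict of key -> value."""
    attributes = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attributes[key] = value
    return attributes


# Attribute column of the Prokka-generated line shown above.
prokka_attrs = parse_attributes(
    "ID=GENOME02_00001;inference=ab initio prediction:Prodigal:2.6;"
    "locus_tag=GENOME02_00001;product=hypothetical protein;"
    "protein_id=gnl|Prokka|GENOME02_00001"
)

# Attribute column of the converted line (translation omitted here).
converted_attrs = parse_attributes(
    "codon_start=1;inference=ab initio prediction:Prodigal:2.6;"
    "locus_tag=GENOME02_00001;product=hypothetical protein;"
    "protein_id=Prokka:GENOME02_00001;transl_table=11"
)

print("ID" in prokka_attrs)          # True
print("ID" in converted_attrs)       # False
print(converted_attrs["locus_tag"])  # GENOME02_00001
```

Both lines carry the same `locus_tag`, so the gene can still be identified in the converted file; only the `ID` key is missing.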

gene_presence_absence.csv produced from the prokka gff files (6036 lines):

"group_4797","","hypothetical protein","3","3","1","","","","","","GENOME02_00001","GENOME03_01386","GENOME04_00768"

gene_presence_absence.csv produced from the gff files derived from the genbank (2472 lines):

"group_1","","","1","1","1","","","","","","","","EC_number=2.7.2.11"
@andrewjpage
Member

Hi Marco,
Thanks for reporting the issue. I'm away until Monday so I'm afraid I won't get a chance to look at it until then.
Regards,
Andrew

@mgalardini
Contributor Author

Hi,

no problem, I can surely wait until next week; thanks a lot for the amazing work done on this piece of software.

Marco

@andrewjpage added the bug label Jun 1, 2015
@andrewjpage
Member

Hi Marco,
I've just released a new version (2.3.0) which I hope will resolve the issue you're having. Could you give it a whirl and let me know how you get along?
Andrew

@mgalardini
Contributor Author

Hi Andrew,

absolutely, I'll get back to you tomorrow at the latest, thanks a lot!
Marco


@mgalardini
Contributor Author

Hi,

unfortunately I've got a new error before the "usual" one:

2015/06/01 16:40:00 Could not extract any protein sequences from GENOME02.gbk.gff. Does the file contain the assembly as well as the annotation?
2015/06/01 16:40:03 Could not extract any protein sequences from GENOME03.gbk.gff. Does the file contain the assembly as well as the annotation?
2015/06/01 16:40:06 Could not extract any protein sequences from GENOME04.gbk.gff. Does the file contain the assembly as well as the annotation?
BLAST Database error: No alias or index file found for protein database [[...]/test_roary/87V8kzv_9H/output_contigs] in search path [[...]/test_roary::]

I can send you the gff files if you like; they all contain the nucleotide sequences as well.
Marco
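
A quick way to tell "no sequences in the file" apart from "sequences present but CDS features lack an ID" is to scan the GFF file directly. The following is only a diagnostic sketch in Python; it does not reproduce how Roary extracts proteins. The `summarize_gff` helper is hypothetical, and the short `ACGT` sequence is a placeholder rather than real data:

```python
import io


def summarize_gff(handle):
    """Count CDS features, CDS features with an ID, and note a ##FASTA section."""
    summary = {"cds": 0, "cds_with_id": 0, "has_fasta": False}
    for line in handle:
        if line.startswith("##FASTA"):
            summary["has_fasta"] = True
            break  # everything after this directive is sequence data
        if line.startswith("#") or not line.strip():
            continue  # skip other directives, comments and blank lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9 or columns[2] != "CDS":
            continue
        summary["cds"] += 1
        keys = [field.split("=", 1)[0] for field in columns[8].split(";")]
        if "ID" in keys:
            summary["cds_with_id"] += 1
    return summary


# Tiny in-memory file shaped like the converted GFF from this issue.
example = io.StringIO(
    "##gff-version 3\n"
    "GENOME02_contig000001\tfeature\tCDS\t42\t578\t.\t+\t0\t"
    "codon_start=1;locus_tag=GENOME02_00001;product=hypothetical protein\n"
    "##FASTA\n"
    ">GENOME02_contig000001\n"
    "ACGT\n"
)
result = summarize_gff(example)
print(result)  # {'cds': 1, 'cds_with_id': 0, 'has_fasta': True}
```

To run it on a real file, pass an open handle instead, for example `summarize_gff(open("GENOME02.gbk.gff"))`. A result with `has_fasta` true and `cds_with_id` at zero points to the attributes rather than to a missing assembly.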

@andrewjpage
Member

That would be great, thanks. The email is roary@sanger.ac.uk or andrewjpage@gmail.com


@andrewjpage
Member

I've updated Roary (v2.3.1) to fall back to using locus_tags when an ID isn't available, and the files you sent are working now.
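
For anyone who has to stay on an older version, the same fallback idea can be applied to the input files instead. This is a rough Python sketch, not Roary's actual (Perl) implementation; the `add_missing_id` helper is hypothetical. It copies `locus_tag` into a new `ID` attribute on CDS lines that have no `ID`, and leaves every other line untouched:

```python
def add_missing_id(line):
    """Prepend ID=<locus_tag> to a GFF3 CDS line that has no ID attribute."""
    if line.startswith("#") or not line.strip():
        return line  # directives, comments and blank lines pass through
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9 or columns[2] != "CDS":
        return line  # only CDS features are touched
    fields = [field for field in columns[8].split(";") if field]
    attributes = dict(field.split("=", 1) for field in fields if "=" in field)
    if "ID" in attributes or "locus_tag" not in attributes:
        return line  # already has an ID, or nothing to fall back to
    columns[8] = ";".join(["ID=" + attributes["locus_tag"]] + fields)
    return "\t".join(columns) + "\n"


converted = (
    "GENOME02_contig000001\tfeature\tCDS\t42\t578\t.\t+\t0\t"
    "codon_start=1;locus_tag=GENOME02_00001;transl_table=11\n"
)
fixed = add_missing_id(converted)
print(fixed.split("\t")[8].strip())
# ID=GENOME02_00001;codon_start=1;locus_tag=GENOME02_00001;transl_table=11
```

The sketch is restricted to CDS lines on purpose: GFF3 treats lines that share an ID as parts of one feature, so giving a gene line and its CDS line the same ID could change how a parser groups them. It also stops being useful once the file reaches the `##FASTA` section, so apply it only to the annotation lines.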

@mgalardini
Contributor Author

Thanks, it works like a charm!
