GFF parsing doesn't match GFF3 specification #249

Closed
widdowquinn opened this issue May 21, 2016 · 0 comments

widdowquinn commented May 21, 2016

For correct operation, the Roary GFF parser appears to require the ##FASTA directive to indicate a trailing FASTA sequence for the input genome, but the GFF3 specification ("Other Syntax") allows this to be implicit:

"For backward-compatibility with the GFF version output by the Artemis tool, a GFF line that begins with the character > creates an implied ##FASTA directive."
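To make the quoted rule concrete, here is a minimal Python sketch of a splitter that treats a line starting with > as an implied ##FASTA directive. It is not Roary's code (Roary is written in Perl); the function name split_gff3 is hypothetical and exists only for illustration.

```python
def split_gff3(lines):
    """Split GFF3 lines into (annotation_lines, fasta_lines).

    The FASTA section starts after an explicit ##FASTA directive or,
    following the implied-directive rule, at the first line that
    begins with '>'.
    """
    annotation, fasta = [], []
    in_fasta = False
    for line in lines:
        line = line.rstrip("\n")
        if not in_fasta:
            if line.startswith("##FASTA"):
                # Explicit directive: switch sections, keep the marker out of both.
                in_fasta = True
                continue
            if line.startswith(">"):
                # Implied directive: this header already belongs to the FASTA section.
                in_fasta = True
        (fasta if in_fasta else annotation).append(line)
    return annotation, fasta
```

With this logic, a file with an explicit ##FASTA line and the same file without it yield the same two sections.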

Without the ##FASTA directive, when Roary (invoked with roary input_data/*.gff -v) fixes an input GFF file, the output in fixed_input_files has its sequence lines rewritten as if they were feature lines, each with an ID attribute prepended:

[...]
NZ_CM001513.1   RefSeq  gene    6178796 6179191 .   +   .   ID=gene5445___11243;Name=PFLSS101_RS27240;gbkey=Gene;gene_biotype=pseudogene;locus_tag=PFLSS101_RS27240;pseudo=true
NZ_CM001513.1   Protein Homology    CDS 6178796 6179191 .   +   0   ID=cds5355___11244;Parent=gene5445;gbkey=CDS;product=hypothetical protein;pseudo=true;transl_except=(pos:6179072..6179074%2Caa:Other);transl_table=11
###
>NZ_CM001513.1 Pseudomonas fluorescens SS101 chromosome, whole genome shotgun sequence
                                ID=id___11245CAGAGAATGACAACACGGGGGAGAAGCGGTGCTCTGCCTGTGCTGCGCTATCGGATAAGCTGTGTGTGGAACAGCTAGTT
                                ID=id___11246ATCCACAGGTCGGTTACCCACAGACTTTCGACCCCACTTGTACAATGAGCTTAGGTGGGTTTATCCACAGAGCTTATGCA
                                ID=id___11247CAGACCACTGGTCGCATTTTTTAGGGTTAAGGCATTGATTTTGGGTGTCCTGTGAGCAACCTACATGTGGATAAGTGGAC
[...]

With the ##FASTA directive, it looks like this:

NZ_CP010945.1   tRNAscan-SE exon    6997911 6997987 .   -   .   ID=id190___24142;Parent=rna90;anticodon=(pos:complement(6997951..6997953));gbkey=tRNA;product=tRNA-Ile
NZ_CP010945.1   RefSeq  sequence_feature    6998060 6998154 .   -   .   ID=id191___24143;Note=16S ribosomal RNA rRNA prediction is too short;gbkey=misc_feature
##FASTA
>NZ_CP010945.1 Pseudomonas fluorescens NCIMB 11764, complete genome
TTACCGGGGCTTCGATCAAGAGCTTCGCGTTAGCTAACCCCATCAATTAACCTTCCGGCACCGGGCAGGCGTCACACCCT
ATACGTCCACTTTCGTGTTTGCAGAGTGCTGTGTTTTTAATAAACAGTCGCAGCGGCCTGGTATCTTCGACCGGCATGAG
CTTACGGAGCAAGTCCTTCACCCTCACCGGCGCACCTTCTCCCGAAGTTACGGTGCCATTTTGCCTAGTTCCTTCACCCG
AGTTCTCTCAAGCGCCTTGGTATTCTCTACCCAACCACCTGTGTCGGTTTGGGGTACGGTTCCTGGTTACCTGAAGCTTA
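
Until the parser handles the implied directive, one possible workaround is to preprocess the input files so the directive is explicit. The sketch below is an assumption about how that could be done, not part of Roary; add_fasta_directive is a hypothetical helper name.

```python
def add_fasta_directive(gff_text):
    """Return gff_text with '##FASTA' inserted before an implied FASTA section.

    Text that already contains an explicit ##FASTA directive before the
    first '>' header is returned unchanged.
    """
    out = []
    seen_directive = False
    for line in gff_text.splitlines(keepends=True):
        if line.startswith("##FASTA"):
            seen_directive = True
        elif line.startswith(">") and not seen_directive:
            # First FASTA header with no preceding directive: make it explicit.
            out.append("##FASTA\n")
            seen_directive = True
        out.append(line)
    return "".join(out)
```

The function is idempotent, so running it over files that already carry the directive is harmless.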