convert_gff3_to_gbk.py, add full support for non-protein-coding genes #24

Closed
jonathancrabtree opened this issue Jul 15, 2014 · 8 comments
@jonathancrabtree
Contributor

If convert_gff3_to_gbk.py finds a tRNA, rRNA, or other non-protein-coding gene in the input GFF3, it writes the parent "gene" feature to the output GenBank file, but nothing else. Only protein-coding genes with an mRNA feature below the parent gene appear to be converted fully. It looks like biocodegenbank.print_biogene needs to be generalized to handle all gene types, or at least all those that currently have a corresponding representation in the biothings module.
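
To illustrate the output being asked for, here is a minimal sketch of writing a parent gene together with its non-coding RNA child as GenBank feature-table lines. It is not the biocode implementation: `format_feature`, `gene_with_rna`, and the locus tag, coordinates, and product in the example call are all hypothetical names and values chosen for illustration.

```python
def format_feature(key, start, end, strand, qualifiers):
    """Render one GenBank feature-table entry.

    The feature key starts at column 6 and the location at column 22;
    qualifier lines are indented to column 22 as well.
    """
    location = f"{start}..{end}"
    if strand == "-":
        location = f"complement({location})"
    lines = [f"     {key:<16}{location}"]
    for name, value in qualifiers.items():
        lines.append(f'{" " * 21}/{name}="{value}"')
    return lines


def gene_with_rna(locus_tag, rna_type, start, end, strand, product=None):
    """Emit the parent gene plus its RNA child, rather than the gene alone."""
    lines = format_feature("gene", start, end, strand, {"locus_tag": locus_tag})
    rna_qualifiers = {"locus_tag": locus_tag}
    if product:
        rna_qualifiers["product"] = product
    lines += format_feature(rna_type, start, end, strand, rna_qualifiers)
    return "\n".join(lines)


# Hypothetical tRNA gene, used only to show the shape of the output.
print(gene_with_rna("EXAMPLE_0001", "tRNA", 100, 175, "+", product="tRNA-Ala"))
```

The point of the sketch is only that the child feature (tRNA, rRNA, and so on) is written alongside the gene, with its own qualifiers.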

@mikemc

mikemc commented Feb 19, 2020

Came to post an issue but I think I'm having the same problem noted above, so will just add a concrete example of why this is a problem. I am trying to extract 16S sequences that are annotated in a GenBank file (example). The fact that a gene is the 16S sequence is identified by the product name in the GenBank file,

     gene            517900..517988
                     /locus_tag="SAMN05444282_102329"
     rRNA            517900..517988
                     /locus_tag="SAMN05444282_102329"
                     /product="16S ribosomal RNA . Bacterial SSU"

However, the product name doesn't make it into the GFF3 file, so it is impossible to select the 16S sequences downstream separately from other rRNAs:

FNQD01000002	GenBank	gene	517900	517988	.	+	.	ID=SAMN05444282_102329;locus_tag=SAMN05444282_102329
FNQD01000002	GenBank	rRNA	517900	517988	.	+	.	ID=SAMN05444282_102329.rRNA.1;Parent=SAMN05444282_102329
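
A minimal sketch of the downstream selection this blocks, using only the Python standard library. The helper names `parse_attributes` and `select_rrnas` are hypothetical, and the `product=` attribute in `wanted_output` is an assumption about how the product could be carried over, not what the converter emits.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = unquote(value)
    return attrs


def select_rrnas(gff3_text, product_substring):
    """Return (seqid, start, end, strand) for rRNA rows whose product matches."""
    hits = []
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "rRNA":
            continue
        product = parse_attributes(cols[8]).get("product", "")
        if product_substring in product:
            hits.append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return hits


# The two rows from the converted file above: no product attribute is present.
current_output = (
    "FNQD01000002\tGenBank\tgene\t517900\t517988\t.\t+\t.\t"
    "ID=SAMN05444282_102329;locus_tag=SAMN05444282_102329\n"
    "FNQD01000002\tGenBank\trRNA\t517900\t517988\t.\t+\t.\t"
    "ID=SAMN05444282_102329.rRNA.1;Parent=SAMN05444282_102329\n"
)
# Assumed form of the rRNA row if the product were retained (hypothetical).
wanted_output = current_output.rstrip("\n") + ";product=16S ribosomal RNA . Bacterial SSU\n"

print(select_rrnas(current_output, "16S"))  # nothing to match on
print(select_rrnas(wanted_output, "16S"))   # the 16S rRNA row is selected
```

With the rows as currently written, the filter has nothing to match on; with a product attribute on the rRNA row, the 16S feature can be picked out by name.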

@jorvis
Owner

jorvis commented Feb 19, 2020

I'll see if I can get this added tonight.

@jorvis
Owner

jorvis commented Feb 20, 2020

Last night has shifted into today.

@jorvis
Owner

jorvis commented Feb 20, 2020

@mikemc Is it possible to attach your GBK file so I can test with it, or is it private?

@mikemc

mikemc commented Feb 20, 2020

@jorvis The example I gave is from this GenBank file

jorvis self-assigned this Feb 24, 2020
jorvis added the bug label Feb 24, 2020
jorvis added a commit that referenced this issue Mar 9, 2020
@jorvis
Owner

jorvis commented Mar 9, 2020

@mikemc - The current version of the code should fix your issue. tRNAs are now exported with the anticodon reported and rRNAs with the product. I'm not closing this ticket yet, as what @jonathancrabtree reported is actually the reverse conversion, going from GFF3 -> GBK.

jorvis added a commit that referenced this issue Mar 10, 2020
@jorvis
Owner

jorvis commented Mar 10, 2020

Closing. I've now confirmed that tRNA and rRNA annotation is retained when a source GenBank flat file is converted to GFF3 and then converted back to GenBank.

jorvis closed this as completed Mar 10, 2020
@mikemc

mikemc commented Mar 10, 2020

Great, thanks @jorvis! I haven't had a chance to test yet but sounds like this covers my issue.
