ID not taken as locus tag #50

Closed
rob123king opened this issue Feb 20, 2020 · 2 comments

rob123king commented Feb 20, 2020

Hi, I'm excited to see a tool that may make this easier; it's a bit of a nightmare otherwise. I have IDs in my GFF and expected them to be used as the locus tags, but sequential numbers are used instead. The ID is written to a note; I think it would be better if the ID simply became the locus_tag, since I think that is its purpose. I don't have gene names in the GFF, but ideally a tab-separated file of gene names could be supplied to add them to the resulting EMBL file too, since that is the likely starting point when gene names are available. I would also like to parse the exon number after the ":" and add it, although I don't think this is essential.

I'm still trying to work out the ENA format requirements for submission. I think a locus tag may be the minimum a feature needs, and that is what I'm working towards. The Webin validation tool complains about overlapping UTR and CDS features of two genes in the same direction. Could a correction step be added that trims the UTR and corrects the gene when this is detected? Otherwise I have to work out how to fix it and start again. I know of a script somewhere that will at least do the UTR trimming.

Sorry for the several change requests; otherwise I'll try to make the changes myself when I have time, but that is harder when I don't know the code.

FT   mRNA            join(433449..433533,433946..434073,434612..434836,
FT                   435438..435904)
FT                   /locus_tag="SPEXI_LOCUS1"
FT                   /note="source:maker"
FT                   /note="ID:SPEXI_01T000001"
FT   CDS             join(433449..433533,433946..434073,434612..434836,
FT                   435438..435710)
FT                   /locus_tag="SPEXI_LOCUS1"
FT                   /note="source:maker"
FT                   /note="ID:SPEXI_01T000001:cds"
FT                   /transl_table=1
FT   exon            433449..433533
FT                   /locus_tag="SPEXI_LOCUS1"
FT                   /note="source:maker"
FT                   /note="ID:SPEXI_01T000001:1"

Collaborator

Juke34 commented Feb 20, 2020

You can use the attribute of your choice as the locus_tag with the --use_attribute_value_as_locus_tag parameter. But be aware that when the file is submitted to ENA, the locus_tag will be overwritten anyway.
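
To make the "attribute value as locus tag" idea concrete, here is a minimal Python sketch of reading the ID attribute from a GFF3 attribute column and separating the part after the ":" (the exon number or "cds" suffix seen in the notes above). It is only an illustration and not how the converter works internally; the helper names `parse_attributes` and `split_id` and the `Parent` value in the example are hypothetical.

```python
from typing import Dict, Optional, Tuple


def parse_attributes(column9: str) -> Dict[str, str]:
    """Parse a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs: Dict[str, str] = {}
    for field in column9.strip().split(";"):
        if not field:
            continue  # tolerate a trailing ";"
        key, _, value = field.partition("=")
        attrs[key.strip()] = value.strip()
    return attrs


def split_id(feature_id: str) -> Tuple[str, Optional[str]]:
    """Split an ID at the first ":" into (base ID, suffix or None)."""
    base, sep, suffix = feature_id.partition(":")
    return base, (suffix if sep else None)


# Hypothetical attribute column for the exon shown in the EMBL excerpt.
attrs = parse_attributes("ID=SPEXI_01T000001:1;Parent=SPEXI_01T000001")
base_id, suffix = split_id(attrs["ID"])
print(base_id, suffix)
```

The base ID could then serve as the value for the locus tag, and the suffix as the exon number, if that is what you want to carry over.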

For the overlapping UTR and CDS, you could fix it automatically using gff3_sp_fix_features_locations_duplicated.pl from AGAT.
Likewise, if you encounter problems with short introns, you can use gff3_sp_flag_short_introns.pl.
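
If you want to check a file for this problem before running a fix, a rough Python sketch of the detection could look like the following. It is not AGAT's implementation: the `Feature` tuple, the feature-type names, and the coordinates in the example are all assumptions made for illustration. It reports every UTR that overlaps a CDS of a different gene on the same sequence and strand.

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    seqid: str
    gene: str    # ID of the gene the feature belongs to
    kind: str    # e.g. "CDS", "five_prime_UTR", "three_prime_UTR"
    strand: str  # "+" or "-"
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def utr_cds_overlaps(features: List[Feature]) -> List[Tuple[Feature, Feature]]:
    """Return (UTR, CDS) pairs from different genes that overlap on the same strand."""
    utrs = [f for f in features if f.kind.endswith("UTR")]
    cdss = [f for f in features if f.kind == "CDS"]
    hits = []
    for utr in utrs:
        for cds in cdss:
            same_place = utr.seqid == cds.seqid and utr.strand == cds.strand
            # Two closed intervals overlap when each starts before the other ends.
            overlap = utr.start <= cds.end and cds.start <= utr.end
            if utr.gene != cds.gene and same_place and overlap:
                hits.append((utr, cds))
    return hits


# Hypothetical features: geneA's 3' UTR runs into geneB's CDS on the + strand.
features = [
    Feature("chr1", "geneA", "three_prime_UTR", "+", 900, 1100),
    Feature("chr1", "geneB", "CDS", "+", 1050, 1500),
    Feature("chr1", "geneC", "CDS", "-", 1000, 1200),
]
for utr, cds in utr_cds_overlaps(features):
    print(f"{utr.gene} {utr.kind} overlaps {cds.gene} CDS")
```

The pairwise comparison is quadratic, which is fine for a quick check; for a whole genome you would probably sort the features by position first.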

For the gene names, the easiest way is to have them in the GFF file prior to conversion; they will then be included automatically in the EMBL file.
To load gene names from a BLAST output into your GFF file, you can use agat_sp_manage_functional_annotation.pl.
If you want to add the gene names afterwards in the EMBL file, you will have to write your own script (don't hesitate to share it; I could include it here in case someone else would like to do the same).
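
For the first route (names in the GFF before conversion) starting from a tab-separated file, a small Python sketch might look like this. It assumes a two-column file of feature ID and gene name, and it assumes the name should go into a `Name` attribute; which GFF attribute ends up as the gene name in the EMBL output depends on the converter's mapping, so check that first. The function names, the gene ID, and the gene name in the example are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, Iterator


def load_names(tsv_handle) -> Dict[str, str]:
    """Read a two-column TSV (feature ID, gene name) into a dict."""
    reader = csv.reader(tsv_handle, delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def add_gene_names(gff_lines: Iterable[str], names: Dict[str, str],
                   attribute: str = "Name") -> Iterator[str]:
    """Yield GFF3 lines, appending attribute=<name> where the ID is in names."""
    for line in gff_lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        if line.startswith("#") or len(cols) != 9:
            yield line  # pass comments, directives and malformed lines through
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        name = names.get(attrs.get("ID", ""))
        if name and attribute not in attrs:  # do not overwrite an existing value
            cols[8] = cols[8].rstrip(";") + f";{attribute}={name}"
        yield "\t".join(cols)


# Hypothetical inputs; in practice use open("names.tsv") and open("annotation.gff").
names = load_names(io.StringIO("SPEXI_01G000001\tabcA\n"))
gff = [
    "##gff-version 3",
    "chr1\tmaker\tgene\t433449\t435904\t.\t+\t.\tID=SPEXI_01G000001",
]
for out_line in add_gene_names(gff, names):
    print(out_line)
```

Names containing characters that are reserved in GFF3 (";", "=", "," and so on) would need escaping, which this sketch does not do.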

Collaborator

Juke34 commented Mar 25, 2020

We haven't heard back from you for a while, so I guess you found your way. I'm closing the issue, but feel free to re-open it if necessary.

Juke34 closed this as completed Mar 25, 2020