GAG discards UTR features #149

Open
marchoeppner opened this issue Oct 7, 2014 · 5 comments

@marchoeppner

I am not sure this behaviour is by design, but GAG currently ejects UTR features into the file genome.ignored.gff, so the final annotation genome.gff has no UTR annotations. This concerns features with the feature type:

five_prime_UTR
three_prime_UTR

However, these two features are perfectly valid within the GFF3 standard and probably shouldn't be ignored (?).

More information at: http://www.sequenceontology.org/gff3.shtml
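To see which feature types end up in a given file, the type column (column 3 of each GFF3 line) can be tallied. The following is only a minimal sketch: the function name and the example records (including the `maker` source label and all IDs) are made up for illustration and are not part of GAG.

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Count occurrences of each feature type (GFF3 column 3)."""
    counts = Counter()
    for line in gff_lines:
        line = line.strip()
        # Skip blank lines and comment/directive lines.
        if not line or line.startswith('#'):
            continue
        fields = line.split('\t')
        # A GFF3 feature line has exactly nine tab-separated columns.
        if len(fields) != 9:
            continue
        counts[fields[2]] += 1
    return counts


# Made-up example records, for illustration only.
example = [
    "##gff-version 3",
    "scaffold_1\tmaker\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "scaffold_1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1",
    "scaffold_1\tmaker\tfive_prime_UTR\t100\t199\t.\t+\t.\tID=utr5;Parent=mrna1",
    "scaffold_1\tmaker\tCDS\t200\t800\t.\t+\t0\tID=cds1;Parent=mrna1",
    "scaffold_1\tmaker\tthree_prime_UTR\t801\t900\t.\t+\t.\tID=utr3;Parent=mrna1",
]
counts = count_feature_types(example)
```

Passing an open file handle instead of the list, for example `count_feature_types(open('genome.ignored.gff'))`, would tally the types in that file.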

@bruab
Member

bruab commented Oct 9, 2014

@marchoeppner This is by design, or by laziness perhaps. Since UTRs aren't included in NCBI's .tbl file, we chose to ignore them. Including them in the output is non-trivial, since certain filters and fixes within GAG can shift the boundaries between CDS and UTR. It's doable, it's just more complicated than Read-Them-In-And-Write-Them-Out.

As long as this omission doesn't cause anybody trouble with their genome submission, fixing it is low-priority. If anyone gets errors or other flak due to the absence of UTR, we'll move it up the queue.

@marchoeppner
Author

Understood - maybe something for the future? We are using GAG and Annie for things other than NCBI tbl dumping, so not being able to parse all features makes things a little tricky. That being said, it is already a very useful tool as is!

@bruab bruab reopened this Oct 31, 2014
@bruab
Member

bruab commented Oct 31, 2014

We have decided to do this. Maybe next week, depending on how horribly some transcriptome submissions go ...

@bruab
Member

bruab commented Oct 31, 2014

We will create new UTR features from scratch, rather than preserve the original ones. This is simpler and gets around the issue of fixes and filters shifting UTR boundaries.
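One way to picture "from scratch": once the exon and CDS coordinates of an mRNA are final, the UTRs are simply the exonic stretches outside the coding span. The sketch below illustrates that idea only and is not GAG's implementation. The function name is hypothetical, coordinates are assumed to be 1-based and inclusive as in GFF3, and whether the stop codon counts as CDS is left to whatever convention the input follows.

```python
def derive_utrs(exons, cds, strand):
    """Return (five_prime_utrs, three_prime_utrs) as lists of (start, end).

    exons, cds: lists of 1-based inclusive (start, end) tuples for one mRNA.
    strand: '+' or '-'.
    """
    if not cds:
        # Without a coding region there is nothing to measure UTRs against.
        return [], []
    cds_start = min(start for start, _ in cds)
    cds_end = max(end for _, end in cds)
    left, right = [], []
    for start, end in sorted(exons):
        # Part of the exon lying left of the coding span.
        if start < cds_start:
            left.append((start, min(end, cds_start - 1)))
        # Part of the exon lying right of the coding span.
        if end > cds_end:
            right.append((max(start, cds_end + 1), end))
    # On the minus strand the 5' end is on the right-hand side.
    if strand == '+':
        return left, right
    return right, left


exons = [(100, 300), (400, 900)]
cds = [(200, 300), (400, 800)]
plus_utrs = derive_utrs(exons, cds, '+')
minus_utrs = derive_utrs(exons, cds, '-')
```

Because the UTRs are recomputed from the current exon and CDS coordinates, any earlier shift of the CDS boundaries is reflected automatically.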

@bruab bruab assigned bruab and tedsta and unassigned bruab Oct 31, 2014
@PaTapiaBioinfo

I solved it by replacing line 246 in src/gff_reader.py with:

        elif ltype == 'start_codon' or ltype == 'stop_codon' or ltype == 'five_prime_UTR' or ltype == 'three_prime_UTR':

This keeps the UTRs in the .gff output, but I think the result is not valid for the NCBI tbl.
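The same condition can be written as a membership test, which is easier to extend if more feature types need to pass through. This is only a sketch of the check itself: the constant and function names are hypothetical and not taken from GAG's source.

```python
# Feature types to keep alongside start/stop codons (assumption: this
# mirrors the condition quoted above, not GAG's actual code layout).
PASS_THROUGH_TYPES = frozenset([
    'start_codon',
    'stop_codon',
    'five_prime_UTR',
    'three_prime_UTR',
])


def is_pass_through(ltype):
    """Return True if the GFF3 feature type should be kept."""
    return ltype in PASS_THROUGH_TYPES
```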
