Negative coordinates in TEanno.gff3 #263

Open
nhartwic opened this issue Apr 14, 2022 · 3 comments
Labels
bug Something isn't working

Comments

@nhartwic

nhartwic commented Apr 14, 2022

Basically the title. Here are the odd lines from the GFF3 file:

15593   EDTA    repeat_region   -2      3598    .       ?       .       ID=repeat_region_23903;Name=TE_00012818;Classification=LTR/unknown;Sequence_ontology=SO:0000657;ltr_identity=0.9733;Method=structural;motif=TGCA;tsd=TTAAT
15593   EDTA    target_site_duplication -2      2       .       ?       .       ID=lTSD_23903;Parent=repeat_region_23903;Name=TE_00012818;Classification=LTR/unknown;Sequence_ontology=SO:0000434;ltr_identity=0.9733;Method=structural;motif=TGCA;tsd=TTAAT

...I've never seen negative coordinates like this before in any of my other EDTA runs. I'm not really sure what this is supposed to mean, but my downstream tools really don't like it.
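For anyone wanting to check their own output for the same problem, here is a minimal Python sketch that scans GFF3 lines for invalid coordinates. It assumes the file is tab-delimited with nine columns, as GFF3 requires, and relies on the GFF3 convention that coordinates are 1-based with start <= end. The function name `find_bad_coordinates` is made up for this example and is not part of EDTA.

```python
def find_bad_coordinates(gff3_lines):
    """Return (line_number, start, end) for features with invalid coordinates.

    Assumes tab-delimited GFF3: column 4 is start, column 5 is end,
    both 1-based and inclusive, with start <= end.
    """
    bad = []
    for line_number, line in enumerate(gff3_lines, start=1):
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete feature line
        start, end = int(fields[3]), int(fields[4])
        if start < 1 or end < start:
            bad.append((line_number, start, end))
    return bad
```

Passing an open file handle (for example `find_bad_coordinates(open("TEanno.gff3"))`, where the path is a placeholder) would list every offending line so it can be inspected before editing.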

I'm currently running EDTA version 1.9.6.

Let me know if there are any files I can send to help figure out what happened here. In the meantime, I've noticed that EDTA 2.0 was released a few months ago, so I suppose I'll update. As for this specific output, I'm just going to manually edit the GFF3 file to fix this entry and move on with life.

@oushujun
Owner

Hi @nhartwic,

It looks like a bug. Can you send the contig sequence 15593 to my email shujun.ou.1@gmail.com? Thanks!

Shujun

@nhartwic
Author

Apologies for the delay on this. Got sidetracked.

EtweTM011.v2.15593.fasta.gz
EtweTM011.v2.fasta.mod.EDTA.TElib.fa.gz

Here are the contig and the repeat library that EDTA generated for the whole assembly.

@oushujun oushujun added the bug Something isn't working label Sep 15, 2022
@oushujun
Owner

oushujun commented Jan 7, 2024

Hello @nhartwic,

Sorry for the long-overdue reply. This issue originated in LTR_retriever, for LTR candidates found at the boundary of sequences (i.e., contig 15593 in your case). LTR_retriever needs to extract 50 bp of flanking sequence on each side of the candidate for further analysis. The element in your case starts at position 6 of contig 15593, which leaves insufficient flanking sequence for the program and thus produces erroneous results. I have added filters to remove such cases, because they cannot provide enough flanking sequence for LTR_retriever to determine the authenticity of the candidate. The update is in this commit: oushujun/LTR_retriever@4039eb7.
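The boundary condition described above can be illustrated with a small Python sketch. This is not LTR_retriever's code (the linked commit has the actual filter); it only shows the arithmetic, assuming 1-based inclusive coordinates. The function name `has_full_flanks` and the contig length used in the usage note are hypothetical.

```python
FLANK_BP = 50  # flank length mentioned in the comment above


def has_full_flanks(start, end, seq_len, flank=FLANK_BP):
    """Return True if `flank` bases exist on both sides of the candidate.

    `start` and `end` are 1-based, inclusive positions on a sequence of
    length `seq_len` (an assumption of this sketch).
    """
    enough_upstream = start - flank >= 1
    enough_downstream = end + flank <= seq_len
    return enough_upstream and enough_downstream
```

With a made-up contig length of 10000, `has_full_flanks(6, 3598, 10000)` is false because only 5 bases precede position 6, so a candidate like this one would be filtered out.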

You may manually remove such cases or rerun LTR_retriever on EDTA/raw using the latest version on GitHub. Note that the conda version lags behind the GitHub version.
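For the manual-removal route, a minimal Python sketch follows. It assumes a tab-delimited GFF3 file in which child features point to their parent through the `Parent` attribute, as in the lines quoted in this issue. It drops every feature whose start is below 1, along with any feature whose parent was dropped, so no orphaned children remain. The function names are hypothetical and not part of EDTA or LTR_retriever.

```python
def parse_attributes(column9):
    """Parse a GFF3 attribute column (key=value;key=value) into a dict."""
    return dict(pair.split("=", 1) for pair in column9.split(";") if "=" in pair)


def drop_invalid_features(gff3_lines):
    """Return the lines minus features with start < 1 and their descendants."""
    parsed = []
    for line in gff3_lines:
        fields = line.rstrip("\n").split("\t")
        is_feature = len(fields) == 9 and not line.startswith("#")
        parsed.append((line, fields if is_feature else None))

    removed_ids = set()
    removed_rows = set()
    changed = True
    # Repeat until no new rows are removed, so nested children are caught
    # regardless of the order in which features appear.
    while changed:
        changed = False
        for index, (_, fields) in enumerate(parsed):
            if fields is None or index in removed_rows:
                continue
            attributes = parse_attributes(fields[8])
            parents = attributes.get("Parent", "").split(",")
            if int(fields[3]) < 1 or removed_ids.intersection(parents):
                removed_rows.add(index)
                if attributes.get("ID"):
                    removed_ids.add(attributes["ID"])
                changed = True
    return [line for index, (line, _) in enumerate(parsed)
            if index not in removed_rows]
```

Comment and directive lines pass through unchanged. Writing the returned lines to a new file, rather than overwriting the original, keeps the unfiltered annotation available for comparison.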

Hope this helps! Sorry again for the delay. Please let me know if you have further questions.

Best,
Shujun
