No nonLTR elements detected by repeatmodeler ? #58
Comments
Dear Timothee, Thanks for the information. You need to do your own curation in this case. From the two annotations, the major difference is the amount of DTC, DTM and L2 elements. You can check a couple of DTC and DTM candidates as well as some of the L2 elements. You can also rerun EDTA with Best, |
Hey Shujun, These are the summaries of some of the fungal genomes:
Hirsutella sp
Purpureocillium sp
And this is for the insect genome (Sitophilus oryzae)
Is there any recommendation for fixing this? |
Hi Luis, So far I have not identified an approach to confidently annotate nonLTR elements automatically. You may try out different structure-based methods and manually curate the results, then provide them to EDTA with The Hope this helps. Best, |
Hello Shujun, I work with Luis, and I don't understand why no LINEs are picked up by RM? Is there a way we could change the code so RM gets the LINE+SINE annotations as it would do normally? |
Hi Rita, I am very sorry for the delayed reply. As suggested previously, you may annotate nonLTRs using other methods, manually curate them, and provide the manually curated nonLTR library to EDTA. Best, |
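As a starting point for the manual curation suggested above, one could pull the LINE and SINE candidates out of a de novo library before inspecting them. The following is a minimal sketch, assuming the library uses RepeatModeler-style FASTA headers of the form `>name#Class/Family`; the function names and the example records are hypothetical and not part of EDTA or RepeatModeler.

```python
from typing import Iterator, List, Tuple

# Class labels treated as nonLTR retrotransposons in this sketch (an assumption).
NON_LTR_PREFIXES = ("LINE", "SINE")


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def non_ltr_candidates(text: str) -> List[Tuple[str, str]]:
    """Keep records whose class label (the part after '#') starts with LINE or SINE."""
    kept = []
    for header, seq in read_fasta(text):
        name = header.split()[0]
        label = name.split("#", 1)[1] if "#" in name else ""
        if label.startswith(NON_LTR_PREFIXES):
            kept.append((header, seq))
    return kept


# Made-up records for illustration only.
EXAMPLE = """>rnd-1_family-0#LINE/CR1
ACGTACGT
ACGT
>rnd-1_family-1#LTR/Gypsy
TTTTGGGG
>rnd-1_family-2#SINE/tRNA
GGCC
>rnd-1_family-3#Unknown
AAAA
"""

if __name__ == "__main__":
    for header, seq in non_ltr_candidates(EXAMPLE):
        print(f">{header}\n{seq}")
```

The filtered records still need manual inspection; the filter only narrows down what to look at.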
I have the same issue. There are too many disparities with RepeatModeler/RepeatMasker, and I don't detect any non-LTR elements with EDTA. |
Looks like there are some bugs with RepeatModeler. I will look into it further.
Shujun
|
Hi, @oushujun @Tkastylevsky @OnlyHigh @RRebo @ptranvan The issue is the The
# Error of conda RepeatClassifier
RepeatClassifier Version 2.0.1
======================================
Search Engine = rmblast
- Looking for Simple and Low Complexity sequences..
- Looking for similarity to known repeat proteins..
- Looking for similarity to known repeat consensi..
Missing /data/software/conda_envs/EDTA/share/RepeatMasker/Libraries/RepeatMasker.lib.nsq!
Please rerun the configure program in the RepeatModeler directory
before running this script.
# After replacing RepeatModeler and RepeatMasker, the tbl shows the nonLTR elements
Repeat Classes
==============
Total Sequences: 15
Total Length: 465993302 bp
Class                  Count        bpMasked     %masked
=====                  =====        ========     =======
DNA                    449          63335        0.01%
    Academ-H           331          56230        0.01%
    CMC-EnSpm          2047         566318       0.12%
    CMC-Transib        1350         57141        0.01%
    DTA                67007        11386924     2.44%
    DTC                103511       16836805     3.61%
    DTH                23006        3521116      0.76%
    DTM                170985       27421409     5.88%
    DTT                39504        5397629      1.16%
    Helitron           206020       43936749     9.43%
    IS3EU              152          9504         0.00%
    MULE-MuDR          2833         989555       0.21%
    Maverick           459          136535       0.03%
    Sola-3             2            130          0.00%
    TcMar-Tc1          46           12764        0.00%
LINE                   --           --           --
    I-Jockey           22           10552        0.00%
    L1                 2241         904672       0.19%
    L1-Tx1             172          44425        0.01%
    L2                 289          59534        0.01%
    Penelope           45           21823        0.00%
    R1                 27           38794        0.01%
    RTE-BovB           57           29446        0.01%
    RTE-X              70           10261        0.00%
LTR                    --           --           --
    Copia              63804        29281157     6.28%
    Gypsy              75449        25996361     5.58%
    Unknown            5668         1417264      0.30%
    unknown            160676       45748080     9.82%
MITE                   --           --           --
    DTA                16500        2172299      0.47%
    DTC                3021         305422       0.07%
    DTH                12724        1893596      0.41%
    DTM                36734        5512094      1.18%
    DTT                2764         273700       0.06%
SINE                   1173         252833       0.05%
Unknown                68810        14544934     3.12%
---------------------------------
total interspersed     1067948      238909391    51.27%
---------------------------------------------------------
Total                  1067948      238909391    51.27% |
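When comparing summaries like the one above across runs, it can help to total the masked bases per top-level class rather than reading the rows by eye. The following is a rough sketch, assuming a plain-text summary in which subclass rows are indented under their class and placeholder rows use `--`; the parser and its function names are hypothetical, and the example reuses a few rows and the total length from the table above.

```python
from typing import Dict


def class_totals(tbl_text: str) -> Dict[str, int]:
    """Sum bpMasked per top-level class; indented rows count toward the class above them."""
    totals: Dict[str, int] = {}
    current = None
    for raw in tbl_text.splitlines():
        parts = raw.split()
        if len(parts) != 4:
            continue  # skips rulers, blank lines and the "total interspersed" row
        name, _count, bp, _pct = parts
        if name.lower() == "total":
            continue
        is_placeholder = bp == "--"
        if not is_placeholder and not bp.isdigit():
            continue  # column header or "=====" ruler
        if not raw[:1].isspace():
            # An unindented row starts a new top-level class.
            current = name
            totals.setdefault(current, 0)
        if current is None or is_placeholder:
            continue
        totals[current] += int(bp)
    return totals


def percent_of_genome(bp: int, genome_length: int) -> float:
    """Express a masked base count as a percentage of the assembly length."""
    return 100.0 * bp / genome_length


# A few rows copied from the summary above; the layout is an assumption.
EXAMPLE = """\
Class                  Count        bpMasked     %masked
=====                  =====        ========     =======
DNA                    449          63335        0.01%
    Academ-H           331          56230        0.01%
LINE                   --           --           --
    L1                 2241         904672       0.19%
    L2                 289          59534        0.01%
SINE                   1173         252833       0.05%
Total                  1067948      238909391    51.27%
"""

if __name__ == "__main__":
    genome_length = 465993302  # "Total Length" from the summary above
    for cls, bp in class_totals(EXAMPLE).items():
        print(f"{cls}\t{bp}\t{percent_of_genome(bp, genome_length):.2f}%")
```

Running the same function on the summaries from two annotation runs gives per-class numbers that can be compared directly.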
Thanks for looking at it. I installed EDTA with this command:
Do you know of an easy way to replace the conda RepeatModeler with another version? I saw there is a manual command:
But I don't see |
@ptranvan I am working on this bug and will have it fixed in the next update. Shujun |
I didn't include any library (except the transposable element protein database, which is in the repo by default) in the RepeatMasker recipe (RepeatMasker is used by RepeatModeler to classify the detected repeats). In theory, you can either pay for a Repbase licence or go for a free solution like Dfam. I explained it here: bioconda/bioconda-recipes#9988 (comment) I could include the Dfam library by default in an updated version of the recipe. |
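The RepeatClassifier error quoted earlier in the thread complains about a missing `RepeatMasker.lib.nsq` file, so a quick check of the library directory can show whether an installation is affected. The following is a minimal sketch, assuming the standard nucleotide BLAST database suffixes (`.nhr`, `.nin`, `.nsq`) sit next to `RepeatMasker.lib`; the function name is hypothetical, and the directory is whatever `Libraries` path your own environment uses.

```python
from pathlib import Path
from typing import List

# Standard suffixes of a single-volume nucleotide BLAST database.
BLAST_NUCL_SUFFIXES = (".nhr", ".nin", ".nsq")


def missing_index_files(libraries_dir: str, lib_name: str = "RepeatMasker.lib") -> List[str]:
    """Return the names of expected BLAST index files that are absent from libraries_dir."""
    base = Path(libraries_dir)
    return [
        lib_name + suffix
        for suffix in BLAST_NUCL_SUFFIXES
        if not (base / (lib_name + suffix)).exists()
    ]


if __name__ == "__main__":
    import sys

    # Pass the Libraries directory of your RepeatMasker installation as the first argument.
    libraries_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_index_files(libraries_dir)
    if missing:
        print("Missing index files:", ", ".join(missing))
    else:
        print("All expected index files are present.")
```

This only tests for file presence; it does not verify that the library itself contains the families needed for classification.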
@Juke34 I have worked around the classification of RM2 results in EDTA using TEsorter. It will be reflected in the next update. Outside of EDTA I think including some sort of classification scheme would benefit the end-user. |
@Tkastylevsky Please update EDTA and rerun it with the |
@oushujun Can we update EDTA using the conda command?
|
Not for now, because the new update has not been pushed to conda yet. You can |
Here is the result: so, a few LINEs were detected, but it is still very odd that so few are picked up by the analysis. In my other annotations, LINEs alone can cover as much as 8% of this chromosome. |
@Tkastylevsky Without manual curation, it's difficult to say whether the 0.24% is due to false negatives or the 8% is due to false positives. More likely it's both. |
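One way to see where two annotations disagree before curating by hand is to measure how many bases both call as LINE and how many only one of them calls. The following is a minimal sketch, assuming each annotation has been reduced to half-open `(start, end)` intervals on the same chromosome; the function names and coordinates are hypothetical and no real annotation data is used.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]


def merge(intervals: Sequence[Interval]) -> List[List[int]]:
    """Merge overlapping or touching half-open intervals."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bp(intervals: Sequence[Interval]) -> int:
    """Count bases covered by at least one interval."""
    return sum(end - start for start, end in merge(intervals))


def shared_bp(a: Sequence[Interval], b: Sequence[Interval]) -> int:
    """Count bases covered by both interval sets, walking the two merged lists in parallel."""
    ma, mb = merge(a), merge(b)
    i = j = total = 0
    while i < len(ma) and j < len(mb):
        lo = max(ma[i][0], mb[j][0])
        hi = min(ma[i][1], mb[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if ma[i][1] < mb[j][1]:
            i += 1
        else:
            j += 1
    return total


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    run_a = [(0, 100), (50, 150), (300, 400)]
    run_b = [(100, 200), (350, 450)]
    both = shared_bp(run_a, run_b)
    print("both:", both)
    print("only A:", covered_bp(run_a) - both)
    print("only B:", covered_bp(run_b) - both)
```

Regions called by only one run are natural candidates to check first, since they are where false positives or false negatives would show up.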
The conda EDTA has also been updated to v1.8.3. I will mark this issue solved. Feel free to reopen if necessary. |
Hello, it's me again!
So, EDTA finished its run on my chicken chr1. However, I ran into an issue: no non-LTR elements were detected by the analysis, even when I used the --sensitive 1 setting (and I checked: my EDTA run folder contains a RepeatModeler folder with 6 rounds of library construction). Yet avian genomes are known to have been invaded by LINE CR1 elements in particular:
As I am testing several annotation methods, I also ran RepeatModeler alone on this chromosome, and this is what I got after RepeatMasking (I'm merging the RepeatModeler library I obtained with Dfam):
So, why do you think there is such a huge difference between the two annotations? Am I looking at the right files for the EDTA results?
PS:
The code I used to run EDTA after the EDTA_raw.pl code:
Timothee
EDIT : (I removed some of the results from EDTA for readability purposes since this thread is getting attention)