Implement Checkpointing, improve nested insertion removal, and more!
Checkpointing is implemented!
Users can recover interrupted runs from a number of major checkpoints. This is particularly useful when a run of LTR_retriever on a huge genome (e.g., common wheat) is interrupted, for example when the job is killed for exceeding the walltime limit. Run `LTR_retriever -h` for further information.
Remove nesting of entire LTR elements in library
Previous versions removed only nested insertions of solo LTRs. When a full element was nested in a library sequence, the internal region of the nesting element was not removed, causing sequence mosaics and library redundancy. This update adds a new module that cleans up composite sequences caused by full-element nesting. This update was inspired by Mr. Robert Hubley's report.
The current version shows a slight decrease in accuracy with a marginal gain in sensitivity. This is likely because removing nesting sequences slightly shifts the annotation dynamics of RepeatMasker. Nevertheless, the process adds no extra sequence and removes up to 60% of library sequences (e.g., in common wheat) that are redundant due to nested full-element insertions.
Rice (MSUv7) | v1.x | v2.0 | v2.5 |
---|---|---|---|
Sensitivity | 95.0% | 95.3% | 96.3% |
Specificity | 95.0% | 94.6% | 94.0% |
Accuracy | 95.0% | 94.8% | 94.5% |
Precision | 85.4% | 84.5% | 83.1% |
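
For readers comparing the rows above, here is a minimal sketch of how these four metrics are commonly derived from confusion-matrix counts. It assumes the standard definitions; the release note does not state how the benchmark was computed, the function name `classification_metrics` is hypothetical, and the counts in the usage lines are made up for illustration.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard confusion-matrix metrics (assumed definitions, not taken from the benchmark)."""
    return {
        "sensitivity": tp / (tp + fn),                 # share of true LTR sequence that was annotated
        "specificity": tn / (tn + fp),                 # share of non-LTR sequence left unannotated
        "accuracy": (tp + tn) / (tp + fp + tn + fn),   # share of all sequence classified correctly
        "precision": tp / (tp + fp),                   # share of annotated sequence that is truly LTR
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=90, fp=20, tn=180, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because precision divides by everything that was annotated, a small rise in false positives can lower precision noticeably even when sensitivity improves.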
Other updates
- Update README. MGEScan_LTR is no longer supported because it cannot be run on modern Linux platforms.
- Add an easy way (conda) to install dependencies.
- Fix a bug that occurred when chromosome names are pure numbers.
- Improve the estimation of LTR age. Previous versions included indels in the divergence estimate, which led to overestimation of LTR age. This version uses only SNPs, not indels, to compute LTR divergence and age.
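
The SNP-only divergence idea in the last item can be sketched as follows. This is an illustration under stated assumptions, not LTR_retriever's actual code: it assumes the two LTRs of an element are already aligned with `-` marking gaps, applies the Jukes-Cantor correction as one common choice of substitution model, and uses the relation T = K / (2μ) with a placeholder mutation rate that should be replaced with a value appropriate for the species. All function names are hypothetical.

```python
import math


def ltr_divergence(ltr5: str, ltr3: str) -> float:
    """Proportion of mismatched sites between two aligned LTRs, skipping gap columns."""
    if len(ltr5) != len(ltr3):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue  # indel column: excluded so it does not inflate divergence
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance K (assumed model for this sketch)."""
    return -0.75 * math.log(1 - 4 * p / 3)


def ltr_age(k: float, mu: float = 1.3e-8) -> float:
    """Insertion age in years, T = K / (2 * mu); mu is a placeholder rate per site per year."""
    return k / (2 * mu)


# Toy alignment: one SNP and one 2-bp indel over 10 columns.
p = ltr_divergence("ACGTACGTAC", "ACGAAC--AC")
print(p, jukes_cantor(p), ltr_age(jukes_cantor(p)))
```

In the toy alignment, the two gap columns are skipped, so the single SNP is counted over 8 compared sites rather than 10; counting the indel columns as differences would raise the divergence and therefore the estimated age.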