Releases · oushujun/LTR_retriever

08 Jan 04:40

oushujun

v2.9.9

4039eb7

v2.9.9 update

New feature

Enable strand-aware outputs

For LTR candidates on the negative strand, the locus is now presented 5' -> 3', as it is for candidates on the positive strand. For example, Chr1:7890..3456 indicates that the candidate is on the - strand. This information is shown in the first column of the pass.list, the last column of the gff3 file, and the sequence names of the intact.fa file. If the element is on the - strand, its sequence in the intact.fa file is given 5' -> 3' on the negative strand. For example, the sequence of Chr1:7890..3456 is the reverse complement of the sequence of Chr1:3456..7890. Candidates without strand information (i.e., those lacking coding sequence) are assumed to be on the positive strand for convenience.
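The locus convention above can be illustrated with a minimal Python sketch. It is not part of LTR_retriever; the function names (`parse_locus`, `reverse_complement`, `extract`) are hypothetical, and the coordinates are assumed to be 1-based and inclusive.

```python
# Hypothetical helpers illustrating the strand-aware locus convention.
# Assumption: coordinates are 1-based and inclusive.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_locus(locus):
    """Split 'Chr1:7890..3456' into (seq_id, start, end, strand).

    A first coordinate larger than the second marks the - strand.
    """
    seq_id, coords = locus.rsplit(":", 1)
    first, second = (int(x) for x in coords.split(".."))
    strand = "+" if first <= second else "-"
    return seq_id, min(first, second), max(first, second), strand


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract(genome, locus):
    """Return the 5' -> 3' sequence of a locus from a dict of sequences."""
    seq_id, start, end, strand = parse_locus(locus)
    seq = genome[seq_id][start - 1:end]
    return reverse_complement(seq) if strand == "-" else seq
```

With a toy genome `{"Chr1": "AAACCCGGGT"}`, `extract(genome, "Chr1:5..2")` returns the reverse complement of `extract(genome, "Chr1:2..5")`.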

Bug fix

Ensure candidates have sufficient flanking sequence to extend (default 50 bp), which LTR_retriever needs to determine whether a candidate is true or false. Candidates that cannot satisfy this criterion are skipped. This scenario is most likely to occur in fragmented genomes. Bug report: oushujun/EDTA#263
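As a rough illustration of the flanking-sequence criterion, the sketch below checks whether a candidate leaves enough room on both sides of its host sequence. The function name and the exact boundary handling are assumptions (1-based inclusive coordinates, the same flank length on both sides); the actual check inside LTR_retriever may differ.

```python
def has_sufficient_flank(start, end, seq_len, flank=50):
    """Return True if at least `flank` bases exist on both sides.

    Assumptions: start/end are 1-based inclusive coordinates of the
    candidate, and seq_len is the length of the contig it sits on.
    """
    return start - flank >= 1 and end + flank <= seq_len


def keep_extendable(candidates, seq_lengths, flank=50):
    """Filter (seq_id, start, end) tuples, skipping those too close to an end."""
    return [
        (seq_id, start, end)
        for seq_id, start, end in candidates
        if has_sufficient_flank(start, end, seq_lengths[seq_id], flank)
    ]
```

On short contigs, many candidates sit within 50 bp of a contig end, which fits the note that skipped candidates are most likely to come from fragmented assemblies.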


28 Dec 03:06

oushujun

v2.9.8

b746912

v2.9.8 update

New features

Use the same LTR name for parts of INT and LTR from the same element in preparation for solving @edta#251
Add the yml file for conda installation

Bug fix

Update get_range.pl

Fix a bug introduced in Aug 2023 (#a375c5e) that output all candidates (both LTR retrotransposons and non-LTR repeats) when generating the library file. Because of this bug, non-LTR sequences (e.g., LTR/EnSpm-CACTA) appeared in the library.
Fix a bug introduced in May 2023 (#058ce29) that failed to remove masked sequences from the final library.
Remove RepeatMasker support to simplify the code, since this functionality was never used in the official release.

Contributors

edta


11 Jul 05:27

oushujun

v2.9.5

84ca5fc

Bug fix

Fix bug #153, introduced in v2.9.4 when TEsorter was added to classify LTR candidates.


08 May 20:03

oushujun

v2.9.4

058ce29

It just gets better with community efforts!

Major Updates

Add TEsorter to help identify non-LTR sequences. A candidate LTR is classified as "false" if it contains non-LTR HMM profile matches, even if the candidate has LTR/TSD and the TGCA motif. This purging removes a small number of structurally intact LTR candidates (5/2304 in rice). This implementation offers slight improvements over older versions, and the improvement should be more significant for larger genomes.

LTR_retriever-harvest_FINDER	Sensitivity	Specificity	Accuracy	Precision	FDR	F1
retriever_v2.5	0.967	0.920	0.931	0.789	0.211	0.869
retriever_v2.6	0.963	0.931	0.939	0.811	0.189	0.881
retriever_v2.9.2	0.966	0.926	0.935	0.802	0.198	0.876
retriever_v2.9.4	0.967	0.928	0.937	0.804	0.196	0.878

Add more filtering parameters to identify solo LTRs, and improve the solo-intact ratio calculation (#111, #110).
Resolve RMblast errors that occurred when it attempted to overutilize CPUs (#137).
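The columns of the benchmark table above follow the standard confusion-matrix definitions; for instance, FDR equals 1 minus precision in every row. The sketch below shows how such metrics are derived. The function name and the counts used in the usage note are hypothetical and are not taken from the rice benchmark.

```python
def benchmark_metrics(tp, fp, tn, fn):
    """Compute standard classification metrics from confusion counts.

    tp/fp/tn/fn are true positives, false positives, true negatives,
    and false negatives (for example, counted in base pairs).
    """
    sens = tp / (tp + fn)                    # sensitivity (recall)
    spec = tn / (tn + fp)                    # specificity
    accu = (tp + tn) / (tp + fp + tn + fn)   # accuracy
    prec = tp / (tp + fp)                    # precision
    fdr = 1 - prec                           # false discovery rate
    f1 = 2 * prec * sens / (prec + sens)     # harmonic mean of prec and sens
    return {"sens": sens, "spec": spec, "accu": accu,
            "prec": prec, "FDR": fdr, "F1": f1}
```

For example, `benchmark_metrics(tp=80, fp=20, tn=880, fn=20)` describes a hypothetical run where sensitivity and precision are both 80 out of 100.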

Other improvements

Sequence IDs must now be 13 characters or fewer, to accommodate huge chromosomes up to 999 Mb in length.
Add missing TRF parameter (#133)
Add check to ensure the input genome is writable (LTR_retriever won't overwrite your genome) (#125).
Exclude gap length from the genome size calculation.
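To check a genome against the 13-character sequence ID limit before running, something like the following sketch could be used. It is not part of LTR_retriever; the function name is hypothetical, and it assumes the sequence ID is the first whitespace-delimited token after `>` in each FASTA header.

```python
MAX_ID_LENGTH = 13  # limit stated in the release notes


def long_sequence_ids(fasta_lines, max_len=MAX_ID_LENGTH):
    """Return sequence IDs longer than max_len characters.

    Assumption: the ID is the first whitespace-delimited token
    following '>' on each header line.
    """
    too_long = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            if len(seq_id) > max_len:
                too_long.append(seq_id)
    return too_long
```

A typical use is `long_sequence_ids(open("genome.fa"))`, where `genome.fa` is a placeholder file name; any IDs returned would need to be shortened first.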

Acknowledgements

Andreas Wallberg, @Shokusei, Evan Ernst, @xie-wei-hh, @with9, and users like YOU!

Contributors

with9, xie-wei-hh, and Shokusei


28 Jul 18:18

oushujun

v2.9.0

0c4d1fa

Version 2.9.0: Polishing outputs

Major updates

This version has many improvements in the downstream outputs including:

standardized the GFF3 output following these criteria and used the updated TE-related sequence ontologies
combined structural and homological LTR annotations. Homology-based LTR fragments will be replaced by structural-based LTR annotations wherever applicable.

Other improvements

allow users to provide paths to dependencies in the command-line.
updated readme
fixed a number of minor bugs.


20 May 22:12

oushujun

v2.8.7

053d9b4

Reformat GFF3 outputs

Reformat the GFF3 output of intact and whole-genome LTR sequences following the standard GFF3 guideline.
Change to use the env default Perl and make shebang lines more consistent. #68
Fix inconsistent total LTR summary. #66
Remove precompiled trf in the package.


04 Dec 17:44

oushujun

v2.8

e62b406

Recovering 10-20% more intact LTR elements

Major update

I recently identified a bug that dropped intact LTR elements whose two LTRs differ in length by more than 15 bp due to InDels. After manual checks, I determined that these are still high-quality intact elements, so they are now salvaged in the output. This marginally improves sensitivity, especially for genomes with limited LTR sequences (e.g., Arabidopsis, ~7%); the margin decreases for genomes with substantial amounts of LTRs, such as rice (~25%) and maize (~75%), because the intact elements were already abundant enough to construct a comprehensive library. However, the number of intact LTR elements could increase by 10-20% compared to the last version (v2.7), which has some positive effect on the calculation of LAI. Some benchmarking results:

Arabidopsis (TAIR10)	v1.x	v2.0	v2.8
Sensitivity	90.70%	90.90%	95.04%
Specificity	99.00%	99.00%	98.88%
Accuracy	98.50%	98.50%	98.64%
Precision	86.60%	86.50%	84.99%

Rice (MSUv7)	v1.x	v2.0	v2.5	v2.8
Sensitivity	95.00%	95.30%	96.30%	96.71%
Specificity	95.00%	94.60%	94.00%	93.87%
Accuracy	95.00%	94.80%	94.50%	94.54%
Precision	85.40%	84.50%	83.10%	83.09%

Minor updates

Allow for mirrored candidates produced by LTRharvest
Improve the convert_ltrdetector.pl for the published version (v1.0) of LtrDetector (contributed by @baozg)
Add a converter, convert_ltr_finder2.pl, to convert the LTR_FINDER -w 2 table format into the LTRharvest screen output format
For LAI, allow the -all file to contain other TEs (i.e., whole-genome TE annotation)


15 Sep 21:48

oushujun

v2.7

1f50960

Releasing a 100% faster version

Major improvement

I am excited to release this much faster version of LTR_retriever. Its multithreading module had been slowing down the program, and I finally got the chance to improve it. This part of the update does not change the program outcome, since it is just a more efficient implementation of parallel computation.

In a test on the 14.5 Gb bread wheat genome, a total of 941,338 raw LTR candidates were processed and a non-redundant library was generated. This process took only 8 days 3 hours and 31 minutes with the current version (v2.7) using 10 threads (-threads 10); it would have required 3 weeks with the last version (v2.6).
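The bread wheat run times quoted above can be turned into a speedup factor with a little arithmetic. The sketch below takes the stated "3 weeks" for v2.6 at face value, so the result is only approximate; the function name is hypothetical.

```python
from datetime import timedelta


def speedup(old_runtime, new_runtime):
    """Return how many times faster the new run is than the old one."""
    return old_runtime / new_runtime  # timedelta / timedelta gives a float


# Run times quoted in the release notes (v2.6 is an estimate).
v26_runtime = timedelta(weeks=3)
v27_runtime = timedelta(days=8, hours=3, minutes=31)

wheat_speedup = speedup(v26_runtime, v27_runtime)
```

Under that assumption the v2.7 run is roughly 2.6 times as fast as v2.6 on this genome, which is in line with the "100% faster" headline.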

Minor changes

Classification of Copia elements was improved to be more sensitive (#51)
Print out the program version number on screen.
Improved genome and sequence reading.


09 May 17:16

oushujun

v2.6

05fe234

Add support for LTR_STRUC, MGEScan 3.0.0, and LtrDetector!

Support three more LTR programs!

Three more programs are now supported by LTR_retriever: LTR_STRUC, MGEScan 3.0.0, and LtrDetector.

Users need to convert candidates identified by these programs into the LTRharvest format with the scripts located in the /bin folder:

convert_ltr_struc.pl
convert_MGEScan3.0.pl
convert_ltrdetector.pl

Then feed them to LTR_retriever with -inharvest. You may concatenate multiple LTRharvest format input files together.
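Concatenating several LTRharvest-format files, as described above, is a plain text operation. A minimal Python sketch follows; the function name and the file names in the usage note are hypothetical, and the converter scripts are assumed to have been run already.

```python
from pathlib import Path


def concatenate_inputs(input_paths, output_path):
    """Concatenate LTRharvest-format files into a single input file.

    A newline is added after any file that does not end with one, so
    records from different files never run together on one line.
    """
    with open(output_path, "w") as out:
        for path in input_paths:
            text = Path(path).read_text()
            out.write(text)
            if text and not text.endswith("\n"):
                out.write("\n")
    return output_path
```

For example, `concatenate_inputs(["harvest.scn", "struc.scn"], "combined.scn")` writes one combined file, which can then be passed to LTR_retriever with -inharvest.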

Note: You won't find many intact LTR elements in LTR_STRUC and LtrDetector outputs due to the fuzzy sequence boundaries these programs provide, so please use these two as supplements to other inputs.

Minor bug fix

Fix the bug that would count 1 extra bit for sequence names
Maintain codes for solo LTR identification and solo-intact ratio calculations: bin/solo_finder.pl and bin/intact_finder_coarse.pl (#41)
Format sequence names in the redundant library output
Add detailed notes to fix the conda RepeatMasker issue. Notes can be found here: #43


27 Mar 04:48

oushujun

v2.5

f6388f6

Implement Checkpointing, improve nested insertion removal, and more!

Checkpointing is implemented!

Users can recover interrupted runs from a number of major checkpoints. This is particularly useful when a run on a huge genome (e.g., common wheat) is interrupted, for example when the job is killed due to a walltime limit. Use LTR_retriever -h for further information.

Remove nesting of entire LTR elements in library

Previous versions would remove nested insertions of solo LTRs. However, when a full element is nested in a library sequence, the internal region of the nesting element would not be removed, causing sequence mosaics and library redundancy. In this update, a new module was developed to clean up composite sequences caused by full-element nesting. This update was inspired by Mr. Robert Hubley's report.

The current version shows a slight decrease in accuracy with a marginal gain in sensitivity. This is likely because the removal of nesting sequences slightly shifted the annotation dynamics of RepeatMasker. Nevertheless, no extra sequence is added in this process; instead, it removes up to 60% of library sequences (e.g., in common wheat) that are redundant due to nested full-element insertions.

Rice (MSUv7)	v1.x	v2.0	v2.5
Sensitivity	95.0%	95.3%	96.3%
Specificity	95.0%	94.6%	94.0%
Accuracy	95.0%	94.8%	94.5%
Precision	85.4%	84.5%	83.1%

Other updates

Update README; MGEScan_LTR is no longer supported because it cannot be run on modern Linux platforms.
Add an easy way (conda) to install dependencies.
Fix a bug that occurred when chromosome names are purely numeric.
Improve the estimation of LTR age. Previous versions included InDels for divergence estimation, which would result in overestimation of LTR age. This version will only use SNPs, no indels, to compute LTR divergence and age.
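The SNP-only age estimate described above can be sketched as follows. This is a simplified illustration, not the LTR_retriever implementation. It assumes the two LTRs are already aligned with `-` as the gap character, applies the Jukes-Cantor correction, and uses the standard formula T = K / (2μ). The default mutation rate of 1.3e-8 substitutions per site per year is an assumption chosen for illustration, and all function names are hypothetical.

```python
import math


def ltr_divergence(ltr5, ltr3):
    """Proportion of mismatches between two aligned LTRs, ignoring indels.

    Columns where either sequence has a gap ('-') are skipped, so only
    substitutions (SNPs) contribute to the divergence.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("aligned sequences must have equal length")
    compared = mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue  # indel column: excluded from the estimate
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


def jukes_cantor(d):
    """Jukes-Cantor corrected distance K for raw divergence d (d < 0.75)."""
    return -0.75 * math.log(1 - 4 * d / 3)


def insertion_age(ltr5, ltr3, mutation_rate=1.3e-8):
    """Estimated insertion age in years: T = K / (2 * mutation_rate)."""
    return jukes_cantor(ltr_divergence(ltr5, ltr3)) / (2 * mutation_rate)
```

Counting gap columns as differences would inflate the divergence and therefore the age, which is the overestimation this release addresses.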
