PanEDTA Line Detection #424

sjteresi · 2024-01-30T17:36:31Z

Hello Shujun,

Hope you are doing well. I am writing to share an issue I had with LINE detection in panEDTA, in the hope that it helps anyone else who runs into it. It is not a bug, just something that I think folks could easily overlook. When I ran panEDTA (v2.1.0) on its own, without pre-calculating results with regular EDTA, it did not find any LINE elements in my genomes.

After some testing, I think this is because the panEDTA script by default calls EDTA.pl without the --sensitive 1 option, which is the option that runs RepeatModeler. I also observed that when I ran regular EDTA.pl on a genome without the sensitive option, it did not recover any LINEs. To summarize, it seems that RepeatModeler was doing the heavy lifting for LINE detection in my strawberry genomes, and without it I wasn't detecting any LINEs. Jordan B, a post-doc in Pat's lab, also had this same LINE issue with some Camelina genomes.

In my case, I fixed the issue by running EDTA individually on each genome with the --sensitive 1 option, and then completed the pangenome annotation with panEDTA. That approach worked fine; LINEs were included in my final annotation.

This problem only arises if users rely on panEDTA to perform all steps of their pangenome annotation. It can easily be sidestepped if users create the individual annotations with the --sensitive 1 option first.
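As a rough illustration of that workaround, the per-genome EDTA runs could be scripted along these lines. This is only a sketch: the genome file names and the `edta_commands` helper are hypothetical, the thread count is arbitrary, and any other options your panEDTA run expects are left out. It only builds the command lines; it does not run them.

```python
import shlex


def edta_commands(genomes, threads=10, edta="EDTA.pl"):
    """Build one EDTA.pl command (as an argument list) per genome.

    Every command includes `--sensitive 1`, the option that runs
    RepeatModeler, which is what recovered LINEs in the runs described above.
    """
    commands = []
    for genome in genomes:
        commands.append([
            edta,
            "--genome", str(genome),
            "--sensitive", "1",
            "--threads", str(threads),
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical genome file names, used purely for illustration.
    for cmd in edta_commands(["genomeA.fa", "genomeB.fa"], threads=10):
        print(shlex.join(cmd))
```

Once each genome has its own EDTA results, panEDTA can be run afterwards to complete the pangenome annotation, as described above.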

Sincerely,
Scott Teresi


oushujun · 2024-02-01T03:49:06Z

Hi Scott,

Thank you for reporting this! Can you please update EDTA to 2.2.0 and test panEDTA again? There are many big changes to the new version for improved SINE/LINE annotations.

Thanks,
Shujun

oushujun · 2024-04-12T15:51:42Z

Any luck?

Shujun

sjteresi · 2024-04-12T16:01:44Z

My apologies for the delayed response, Shujun. I will update EDTA this weekend or early next week and follow up.

sjteresi · 2024-04-16T16:01:00Z

Hi Shujun,

I am actually having additional trouble with EDTA 2.2.0 now that I have updated. I was getting a lot of errors when conda tried to resolve dependencies during installation, so I elected to use singularity. That installation worked, but now when I run my genomes the LINE result files have 0 bp, and TIR detection fails without crashing the run. Here is a sample output:

Mon Apr 15 15:59:43 EDT 2024    Start to find LINE candidates.

Mon Apr 15 15:59:43 EDT 2024    Identify LINE retrotransposon candidates from scratch.

Tue Apr 16 07:25:21 EDT 2024    Warning: The LINE result file has 0 bp! 

Tue Apr 16 07:25:21 EDT 2024    Start to find TIR candidates.

Tue Apr 16 07:25:21 EDT 2024    Identify TIR candidates from scratch.

Species: others
find: ./TIR-Learner-+-TIRvish.gff3: No such file or directory
Traceback (most recent call last):
  File "/mnt/ufs18/rs-004/edgerpat_lab/EDTA/bin/TIR-Learner3.0/TIR-Learner3.0.py", line 80, in <module>
    TIRLearner_instance = TIRLearner(genome_file, genome_name, species, TIR_length,

The test included in your README mostly works. The dependencies check out, but I get a similar set of warnings:

Tue Apr 16 12:02:56 EDT 2024	Start to find LTR candidates.

Tue Apr 16 12:02:56 EDT 2024	Identify LTR retrotransposon candidates from scratch.

Warning: LOC list genome.fa.mod.ltrTE.veryfalse is empty.
Tue Apr 16 12:03:25 EDT 2024	Finish finding LTR candidates.

Tue Apr 16 12:03:25 EDT 2024	Start to find SINE candidates.

Tue Apr 16 12:04:07 EDT 2024	Warning: The SINE result file has 0 bp!

Tue Apr 16 12:04:07 EDT 2024	Start to find LINE candidates.

Tue Apr 16 12:04:07 EDT 2024	Identify LINE retrotransposon candidates from scratch.

Tue Apr 16 12:05:15 EDT 2024	Warning: The LINE result file has 0 bp!

Tue Apr 16 12:05:15 EDT 2024	Start to find TIR candidates.

Tue Apr 16 12:05:15 EDT 2024	Identify TIR candidates from scratch.

Species: others
Traceback (most recent call last):
  File "/mnt/ufs18/rs-004/edgerpat_lab/EDTA/bin/TIR-Learner3.0/TIR-Learner3.0.py", line 80, in <module>
    TIRLearner_instance = TIRLearner(genome_file, genome_name, species, TIR_length,
  File "/mnt/ufs18/rs-004/edgerpat_lab/EDTA/bin/TIR-Learner3.0/bin/main.py", line 81, in __init__
    self.execute()
  File "/mnt/ufs18/rs-004/edgerpat_lab/EDTA/bin/TIR-Learner3.0/bin/main.py", line 121, in execute
    self.execute_M4()
  File "/mnt/ufs18/rs-004/edgerpat_lab/EDTA/bin/TIR-Learner3.0/bin/main.py", line 672, in execute_M4
    self["base"] = CNN_predict.execute(self)
  File "/mnt/ufs18/rs-004/edgerpat_lab/EDTA/bin/TIR-Learner3.0/bin/CNN_predict.py", line 114, in execute
    df = predict(df, TIRLearner_instance.genome_file_path,
  File "/mnt/ufs18/rs-004/edgerpat_lab/EDTA/bin/TIR-Learner3.0/bin/CNN_predict.py", line 62, in predict
    model = load_model(path_to_model)
  File "/usr/local/lib/python3.10/site-packages/keras/src/saving/saving_api.py", line 262, in load_model
    return legacy_sm_saving_lib.load_model(
  File "/usr/local/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/usr/local/lib/python3.10/site-packages/tensorflow/python/framework/function_def_to_graph.py", line 278, in function_def_to_graph_def
    input_shape = input_shape.as_proto()
AttributeError: as_proto
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /mnt/ufs18/rs-004/edgerpat_lab/EDTA/util/rename_tirlearner.pl line 19.
Warning: LOC list genome.fa.mod.TIR.ext30.list is empty.

Error: Error while loading sequence
Filter sequence based on TEsorter classifications. Unclassified sequences will also be output to the clean file.
	Usage: perl cleanup_misclas.pl sequence.fa.rexdb.cls.tsv
	Author: Shujun Ou (shujun.ou.1@gmail.com) 10/11/2019
	
mv: cannot stat 'genome.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln': No such file or directory
cp: cannot stat 'genome.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln.list': No such file or directory
cp: cannot stat 'genome.fa.mod.TIR.intact.raw.fa.anno.list': No such file or directory
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.gff3: No such file or directory.
Warning: The TIR result file has 0 bp!

Tue Apr 16 12:05:31 EDT 2024	Start to find Helitron candidates.

Tue Apr 16 12:05:31 EDT 2024	Identify Helitron candidates from scratch.
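For anyone triaging a similar run, the zero-length warnings in the output above can be pulled out of a saved log with a short script. This is a minimal sketch, assuming the log was saved as text and that the warning wording matches the lines quoted above; `empty_te_classes` is a hypothetical helper and not part of EDTA.

```python
import re

# Matches lines such as: "Warning: The LINE result file has 0 bp!"
ZERO_BP = re.compile(r"Warning: The (\w+) result file has 0 bp!")


def empty_te_classes(log_text):
    """Return the TE classes whose result file was reported as 0 bp.

    Classes are listed once each, in the order they first appear in the log.
    """
    seen = []
    for match in ZERO_BP.finditer(log_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    sample = (
        "Tue Apr 16 12:04:07 EDT 2024\tWarning: The SINE result file has 0 bp!\n"
        "Tue Apr 16 12:05:15 EDT 2024\tWarning: The LINE result file has 0 bp!\n"
        "Warning: The TIR result file has 0 bp!\n"
    )
    print(empty_te_classes(sample))
```

An empty result file does not by itself say why a step found nothing, so it is still worth reading the lines just before each warning, such as the TIR-Learner traceback above.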

sjteresi · 2024-04-16T19:36:12Z

Currently re-trying with a fresh Anaconda installation and conda environment.

oushujun · 2024-04-17T22:59:54Z

The yml file should be helpful for conda installation. I don’t think the singularity version is working at the moment.

Shujun


sjteresi · 2024-04-23T16:13:43Z

Hi Shujun,

I got the latest version of EDTA to complete the system test, and I am still running it on my genomes; I will report back. I had to install bedtools and samtools on top of the conda environment for this latest upgrade. I did not see them specified in the yml file, and I was having trouble getting the basic install to work. Perhaps I am wrong and messed up the install, or maybe they were pre-loaded on your computing cluster so they were missed. Either way, I hope this helps!
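A quick way to check whether those two tools are visible inside the active environment before starting a long run is sketched below. It only tests whether the executables are on PATH, not their versions, and `missing_tools` is a hypothetical helper name.

```python
import shutil


def missing_tools(tools=("bedtools", "samtools")):
    """Return the names of the given executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("bedtools and samtools are both on PATH.")
```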
