PanEDTA Line Detection #424

Open
sjteresi opened this issue Jan 30, 2024 · 7 comments

@sjteresi

Hello Shujun,

Hope you are doing well. I am writing to share that I had issues with LINE detection in PanEDTA. I am hoping that this will help anyone else who encounters this issue. It is not a bug, just something that I think folks could easily overlook. When I ran PanEDTA (v2.1.0) on its own without pre-calculating results with regular EDTA, it was not finding any LINE elements in my genomes.

After doing some testing, I think it is because the panEDTA script by default calls EDTA.pl without the --sensitive 1 option. The sensitive option calls RepeatModeler. I also observed that when I ran regular EDTA.pl on a genome without the sensitive option, it did not recover any LINEs. So to summarize, it seems that RepeatModeler was doing the heavy lifting for LINE detection in my strawberry genomes, and without it, I wasn't detecting any LINEs. Jordan B, a post-doc in Pat's lab, also had this same LINE issue with some Camelina genomes.

In my case, I fixed the issue by running EDTA individually on each genome with the --sensitive 1 option, and then completed the pangenome annotation with panEDTA. That approach worked fine, and LINEs were indeed included in my final annotation.

This problem only arises if users decide to use panEDTA to perform all steps of their pangenome annotation. It can easily be sidestepped if users create the individual annotations with the --sensitive 1 option first.
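
For anyone who wants to script the workaround above, here is a minimal sketch that only builds (and prints) one EDTA.pl command per genome with --sensitive 1. The genome file names are hypothetical placeholders, and --genome and --threads are the EDTA.pl options as I recall them, so please confirm them against the help output of your installed version before running anything.

```python
import shlex


def edta_commands(genomes, threads=8, edta="EDTA.pl"):
    """Build one EDTA.pl command per genome with RepeatModeler enabled.

    Nothing is executed here; the caller decides how to run the commands.
    """
    commands = []
    for genome in genomes:
        commands.append([
            "perl", edta,
            "--genome", genome,         # assumed flag name, check your EDTA.pl help
            "--sensitive", "1",         # the option discussed in this issue
            "--threads", str(threads),  # assumed flag name, check your EDTA.pl help
        ])
    return commands


# Hypothetical genome names, used only for illustration.
for command in edta_commands(["genomeA.fa", "genomeB.fa"], threads=4):
    print(shlex.join(command))
```

Once each genome has its own annotation, panEDTA can be run on top of those results as described above.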

Sincerely,
Scott Teresi

@oushujun
Owner

oushujun commented Feb 1, 2024

Hi Scott,

Thank you for reporting this! Can you please update EDTA to 2.2.0 and test panEDTA again? There are many big changes to the new version for improved SINE/LINE annotations.

Thanks,
Shujun

@oushujun
Owner

Any luck?

Shujun

@sjteresi
Author

My apologies for the delayed response, Shujun. I will update EDTA this weekend or early next week and follow up.

@sjteresi
Author

sjteresi commented Apr 16, 2024

Hi Shujun,

I am actually having additional trouble with EDTA 2.2.0 now that I have updated. I hit a lot of errors getting conda to resolve dependencies during installation, so I elected to use Singularity. That installation worked, but now when I run the genomes the LINE result files have 0 bp and the TIR detection fails, although the run does not crash. Here is a sample output:

Mon Apr 15 15:59:43 EDT 2024    Start to find LINE candidates.

Mon Apr 15 15:59:43 EDT 2024    Identify LINE retrotransposon candidates from scratch.

Tue Apr 16 07:25:21 EDT 2024    Warning: The LINE result file has 0 bp! 

Tue Apr 16 07:25:21 EDT 2024    Start to find TIR candidates.

Tue Apr 16 07:25:21 EDT 2024    Identify TIR candidates from scratch.

Species: others
find: ./TIR-Learner-+-TIRvish.gff3: No such file or directory
Traceback (most recent call last):
  File "/mnt/ufs18/rs-004/edgerpat_lab/EDTA/bin/TIR-Learner3.0/TIR-Learner3.0.py", line 80, in <module>
    TIRLearner_instance = TIRLearner(genome_file, genome_name, species, TIR_length,

The test that you included in your README works mostly... The dependencies check out, but I get a similar set of warnings:

Tue Apr 16 12:02:56 EDT 2024	Start to find LTR candidates.

Tue Apr 16 12:02:56 EDT 2024	Identify LTR retrotransposon candidates from scratch.

Warning: LOC list genome.fa.mod.ltrTE.veryfalse is empty.
Tue Apr 16 12:03:25 EDT 2024	Finish finding LTR candidates.

Tue Apr 16 12:03:25 EDT 2024	Start to find SINE candidates.

Tue Apr 16 12:04:07 EDT 2024	Warning: The SINE result file has 0 bp!

Tue Apr 16 12:04:07 EDT 2024	Start to find LINE candidates.

Tue Apr 16 12:04:07 EDT 2024	Identify LINE retrotransposon candidates from scratch.

Tue Apr 16 12:05:15 EDT 2024	Warning: The LINE result file has 0 bp!

Tue Apr 16 12:05:15 EDT 2024	Start to find TIR candidates.

Tue Apr 16 12:05:15 EDT 2024	Identify TIR candidates from scratch.

Species: others
Traceback (most recent call last):
  File "/mnt/ufs18/rs-004/edgerpat_lab/EDTA/bin/TIR-Learner3.0/TIR-Learner3.0.py", line 80, in <module>
    TIRLearner_instance = TIRLearner(genome_file, genome_name, species, TIR_length,
  File "/mnt/ufs18/rs-004/edgerpat_lab/EDTA/bin/TIR-Learner3.0/bin/main.py", line 81, in __init__
    self.execute()
  File "/mnt/ufs18/rs-004/edgerpat_lab/EDTA/bin/TIR-Learner3.0/bin/main.py", line 121, in execute
    self.execute_M4()
  File "/mnt/ufs18/rs-004/edgerpat_lab/EDTA/bin/TIR-Learner3.0/bin/main.py", line 672, in execute_M4
    self["base"] = CNN_predict.execute(self)
  File "/mnt/ufs18/rs-004/edgerpat_lab/EDTA/bin/TIR-Learner3.0/bin/CNN_predict.py", line 114, in execute
    df = predict(df, TIRLearner_instance.genome_file_path,
  File "/mnt/ufs18/rs-004/edgerpat_lab/EDTA/bin/TIR-Learner3.0/bin/CNN_predict.py", line 62, in predict
    model = load_model(path_to_model)
  File "/usr/local/lib/python3.10/site-packages/keras/src/saving/saving_api.py", line 262, in load_model
    return legacy_sm_saving_lib.load_model(
  File "/usr/local/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/usr/local/lib/python3.10/site-packages/tensorflow/python/framework/function_def_to_graph.py", line 278, in function_def_to_graph_def
    input_shape = input_shape.as_proto()
AttributeError: as_proto
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /mnt/ufs18/rs-004/edgerpat_lab/EDTA/util/rename_tirlearner.pl line 19.
Warning: LOC list genome.fa.mod.TIR.ext30.list is empty.

Error: Error while loading sequence
Filter sequence based on TEsorter classifications. Unclassified sequences will also be output to the clean file.
	Usage: perl cleanup_misclas.pl sequence.fa.rexdb.cls.tsv
	Author: Shujun Ou (shujun.ou.1@gmail.com) 10/11/2019
	
mv: cannot stat 'genome.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln': No such file or directory
cp: cannot stat 'genome.fa.mod.TIR.ext30.fa.pass.fa.dusted.cln.cln.list': No such file or directory
cp: cannot stat 'genome.fa.mod.TIR.intact.raw.fa.anno.list': No such file or directory
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.gff3: No such file or directory.
Warning: The TIR result file has 0 bp!

Tue Apr 16 12:05:31 EDT 2024	Start to find Helitron candidates.

Tue Apr 16 12:05:31 EDT 2024	Identify Helitron candidates from scratch.
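
Since the run does not crash, these empty results are easy to miss. Below is a minimal sketch for scanning a saved log for the "result file has 0 bp" warnings shown above. The pattern is based only on the warning wording in this output, and the function name is my own, so treat it as an assumption rather than anything EDTA provides.

```python
import re

# Matches lines such as "Warning: The LINE result file has 0 bp!"
EMPTY_RESULT = re.compile(r"Warning: The (\w+) result file has 0 bp!")


def empty_result_types(log_text):
    """Return the TE types whose result file was reported as 0 bp, in log order."""
    return EMPTY_RESULT.findall(log_text)


# Two lines copied from the output above, used as a small demonstration.
sample = (
    "Tue Apr 16 12:04:07 EDT 2024\tWarning: The SINE result file has 0 bp!\n"
    "Tue Apr 16 12:05:15 EDT 2024\tWarning: The LINE result file has 0 bp!\n"
)
print(empty_result_types(sample))
```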

@sjteresi
Author

Currently re-trying with a fresh Anaconda installation and conda environment.

@oushujun
Owner

oushujun commented Apr 17, 2024 via email

@sjteresi
Author

Hi Shujun,

I got the latest version of EDTA to complete the system test, and I am still running it on my genomes. I will report back. I had to install bedtools and samtools on top of the conda environment for this latest upgrade. I did not see them specified in the yml file, and I was having trouble making the basic install work. Perhaps I am wrong and messed up the install, or maybe they were pre-loaded on your computing cluster system so they were missed. Either way, I hope this helps!
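
In case it is useful to others, here is a minimal sketch for checking whether those two tools are visible on the PATH of the active environment before starting a long run. It only looks for the executables by name and does not check versions; the function name is my own.

```python
import shutil


def missing_tools(tools=("bedtools", "samtools")):
    """Return the names of tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("bedtools and samtools were both found.")
```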
