Identifying TIR uses only one CPU #55

Closed
augustold opened this issue Feb 10, 2020 · 8 comments
Labels
question Further information is requested

@augustold

Hi Shujun,
I am running EDTA.pl in a conda environment using --threads 30. The 'Identify LTR' step finished in less than one day, but the 'Identify TIR' step has been running for six days now. I've also noticed that this process is using only one CPU. Is this normal?

##### Extensive de-novo TE Annotator (EDTA) v1.7.9  ####
##### Shujun Ou (shujun.ou.1@gmail.com)             ####
########################################################



Mon Feb  3 19:37:57 -02 2020    Dependency checking:
                                All passed!

Mon Feb  3 19:38:41 -02 2020    Obtain raw TE libraries using various structure-based programs:
Mon Feb  3 19:38:41 -02 2020    EDTA_raw: Check dependencies, prepare working directories.

Mon Feb  3 19:38:53 -02 2020    Start to find LTR candidates.

Mon Feb  3 19:38:53 -02 2020    Identify LTR retrotransposon candidates from scratch.

Use of uninitialized value $chr_pre in hash element at /home/augustold/miniconda3/envs/EDTA/share/LTR_retriever/bin/call_seq_by_list.pl line 86.
Tue Feb  4 13:07:26 -02 2020    Finish finding LTR candidates.

Tue Feb  4 13:07:26 -02 2020    Start to find TIR candidates.

Tue Feb  4 13:07:26 -02 2020    Identify TIR candidates from scratch.

Species: others

Best wishes and thank you for providing this tool.

@oushujun
Owner

oushujun commented Feb 10, 2020 via email

@augustold
Author

Yes. There are files being produced in the folder '*.fa.mod.EDTA.raw/TIR/Module3_New/TIR-Learner'. I'll keep waiting.
Thank you for the quick reply.
Augusto
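
One way to tell a slow run from a stalled one is to check when the working folder was last written to. The following is a minimal sketch, not part of EDTA: the function names are made up for illustration, and the directory you pass in is assumed to be the TIR-Learner working folder mentioned above.

```python
import os
import sys
import time


def latest_activity(directory):
    """Return (path, mtime) of the most recently modified file under directory, or None."""
    newest = None
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # file disappeared between listing and stat
            if newest is None or mtime > newest[1]:
                newest = (path, mtime)
    return newest


def report(directory):
    """Describe how long ago the newest file under directory was modified."""
    newest = latest_activity(directory)
    if newest is None:
        return "no files found"
    path, mtime = newest
    idle_hours = (time.time() - mtime) / 3600
    return f"{path} last modified {idle_hours:.1f} h ago"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the working folder as the first argument (hypothetical usage).
    print(report(sys.argv[1]))
```

If the newest file keeps changing between checks, the step is probably still working; a folder that stays untouched for a long time suggests the job has stalled or died.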

oushujun added the "question" label (Further information is requested) on Feb 18, 2020
@oushujun
Owner

oushujun commented Mar 9, 2020

Hi Augusto,

Just want to check back, did you finish the run successfully?

Thanks,
Shujun

@augustold
Author

Hi Shujun,
Unfortunately, I had to kill the process because it still hadn't finished after 30 days. I'm dealing with a large polyploid genome, and the assembly is highly fragmented. When I run EDTA.pl on a small subset of contigs, it works fine. What do you think about running the pipeline on subsets of contigs?

Thanks,
Augusto

@oushujun
Owner

oushujun commented Mar 9, 2020

Hi Augusto,

Thanks for your feedback. Both genome size and the number of sequences determine the run time of TIR-Learner. I ran the chromosomal-scaffold wheat genome, and the TIR step took about 26 days (#61). I also tested the apple genome, which is not large but has 127k sequences; it took about 10 days and needed a lot of memory (>128 GB; I allocated 1 TB). You can split the genome into batches, for example by ploidy, and combine the results afterward (see also #61 and other issues).

Best,
Shujun
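
For readers who want to try the batching idea, here is a minimal sketch of splitting a FASTA file into batches. It is an illustration only, not part of EDTA: it splits by cumulative sequence length rather than by ploidy (which would need a subgenome assignment for each sequence), and the function names and the `.batchN.fa` naming are assumptions. Combining the per-batch results afterward is not covered here; see #61 for that discussion.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path, out_prefix, max_bp):
    """Write consecutive records into batch files of at most max_bp bases each.

    A single record longer than max_bp gets a batch of its own.
    Returns the list of files written.
    """
    outputs, out, batch_bp = [], None, 0
    for header, seq in read_fasta(path):
        # Start a new batch for the first record, or when adding this
        # record would push a non-empty batch over the limit.
        if out is None or (batch_bp > 0 and batch_bp + len(seq) > max_bp):
            if out is not None:
                out.close()
            name = f"{out_prefix}.batch{len(outputs) + 1}.fa"
            out = open(name, "w")
            outputs.append(name)
            batch_bp = 0
        out.write(f">{header}\n{seq}\n")  # sequence written on one line
        batch_bp += len(seq)
    if out is not None:
        out.close()
    return outputs
```

Each resulting file could then be passed to its own EDTA.pl run via --genome. Records are kept whole and in their original order, so no sequence is split across batches.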

@oushujun
Owner

Closing due to inactivity. Please reopen if the issue persists.

@Dichopsis

Hi @oushujun,
I have approximately the same problem, except that no process corresponding to the program is running, and there is no error message either. The genome is Penaeus vannamei (1.6 Gb).

nohup singularity exec -B /mystore /biolo/EDTA/1.9.5/edta.sif EDTA.pl --genome GCF_003789085.1_ASM378908v1_genomic.fna --overwrite 1 --sensitive 1 --anno 1 --evaluate 1 --threads 48 > run_EDTA.out 2>&1 &

########################################################

Extensive de-novo TE Annotator (EDTA) v1.9.4
Shujun Ou (shujun.ou.1@gmail.com)

########################################################

Mon Mar 15 16:45:20 CET 2021 Dependency checking:
All passed!

Mon Mar 15 16:45:34 CET 2021 The longest sequence ID in the genome contains 136 characters, which is longer than the limit (15)
Trying to reformat seq IDs...
Attempt 1...
Mon Mar 15 16:45:42 CET 2021 Seq ID conversion successful!

Mon Mar 15 16:45:42 CET 2021 Obtain raw TE libraries using various structure-based programs:
Mon Mar 15 16:45:42 CET 2021 EDTA_raw: Check dependencies, prepare working directories.

Mon Mar 15 16:45:47 CET 2021 Start to find LTR candidates.

Mon Mar 15 16:45:47 CET 2021 Identify LTR retrotransposon candidates from scratch.

@oushujun
Owner

oushujun commented Mar 19, 2021 via email
