Running EDTA efficiently on a cluster #52

Closed
DanJeffries opened this issue Jan 31, 2020 · 4 comments
Labels
help wanted Extra attention is needed

Comments

@DanJeffries

Hi Shujun,

Just a quick question. I have completed some initial tests on a small fraction (~150Mb) of a ~5Gb genome and am ready to give the real thing a try! However, as I'm sure is the case for many users, I have to tactically dodge run-time limits whilst maximising the resources I can use on the various queues on my cluster. In my case for example I can run a job for 24 hours with a lot of resources, or a job for 10 days with limited resources. So one question I have is:

Can I independently and simultaneously run the TE library steps for tir, ltr and helitron (i.e. divide and conquer) into the same output folder and then use these for the final steps in a later job? Or is there something that would get confused if I did this?

Also if you have any other tips for maximising efficiency when constrained by cluster resources I'd be very happy to hear them. Specifically if you could give some guidance as to whether parallelism or memory are more important for each step that would already be very helpful!

Best wishes, and thanks again for an awesome tool and paper!

Dan

@oushujun
Owner

Hi Dan,

Yes, the divide-and-conquer step was designed for this purpose: you can run the three jobs at the same time in the same working directory. However, you may want to stagger the start times by a minute or two, because the initial step currently converts the genome IDs, and doing the conversion in several jobs at once may cause problems. Each of these steps should finish within a week, and the remaining steps should be done within a couple of days.
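
A minimal bash sketch of the staggered start described above. It assumes EDTA_raw.pl is on the PATH and accepts --genome, --type and --threads; flag spellings have differed between EDTA releases, so check the help output of the installed version. The function name launch_staggered and the variables EDTA_CMD and THREADS are hypothetical, introduced only for this illustration.

```bash
#!/usr/bin/env bash
# Start one run per TE type, pausing between launches so that the
# genome ID conversion done by the first run is not raced by the others.
# Usage: launch_staggered <delay_seconds> <genome> <type> [<type> ...]
launch_staggered() {
    local delay="$1"; shift
    local genome="$1"; shift
    local te_type
    local first=1
    for te_type in "$@"; do
        # No pause before the first launch, a pause before each later one.
        if [ "$first" -eq 0 ]; then
            sleep "$delay"
        fi
        first=0
        # Assumption: the command named by EDTA_CMD takes these flags.
        # EDTA_CMD is left unquoted on purpose so that a value such as
        # "perl /path/to/EDTA_raw.pl" is split into command and argument.
        ${EDTA_CMD:-EDTA_raw.pl} --genome "$genome" --type "$te_type" \
            --threads "${THREADS:-16}" &
    done
    # Block until every background run has exited.
    wait
}
```

For example, `launch_staggered 120 genome.fa ltr tir helitron` would start the three runs two minutes apart inside a single allocation. When each type is submitted as its own scheduler job, the same idea applies by delaying the second and third submissions.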

Please let me know if you encounter any problems.

Shujun

oushujun added the help wanted label Jan 31, 2020
@DanJeffries
Author

DanJeffries commented Feb 2, 2020

Thanks Shujun,

I did indeed run into the issue with the genome ID conversion step. Starting one job and holding back the other two until the modified genome had been created worked. Thanks!
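
The workaround above (start one job, then launch the others once the modified genome exists) could be automated with a small polling helper. This is a sketch only: wait_for_file is a hypothetical name, and the file to watch (for instance a genome.fa.mod next to the input) is an assumption that should be checked against what the first job actually writes.

```bash
#!/usr/bin/env bash
# Poll until a file exists and is non-empty, or give up after a timeout.
# Usage: wait_for_file <path> [timeout_seconds] [poll_interval_seconds]
# Returns 0 if the file appeared, 1 if the timeout was reached.
wait_for_file() {
    local path="$1"
    local timeout="${2:-600}"
    local interval="${3:-10}"
    local waited=0
    # -s is true only when the file exists and has a size greater than zero.
    while [ ! -s "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

A non-empty file is not necessarily a finished one, so a short extra pause after the helper returns is a reasonable precaution before starting the remaining jobs.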

The jobs are now running, but the LTR step (suffixerator gets killed) and the Helitron step (run_helitron_scanner.sh gets killed) are running out of memory. I initially allocated 16 CPUs and 64 GB for these jobs, so I am trying 128 GB now and will report final run times and resources when I have them.

For the TIR run I got some errors unrelated to memory, though. Perhaps you have an idea whether these are important?

/scratch/axiom/FAC/FBM/DEE/nperrin/rana_genome/EDTA/EDTA/bin/TIR-Learner2.4/TIR-Learner2.4.sh: line 245: rsync: command not found
/scratch/axiom/FAC/FBM/DEE/nperrin/rana_genome/EDTA/EDTA/bin/TIR-Learner2.4/TIR-Learner2.4.sh: line 249: rsync: command not found
/scratch/axiom/FAC/FBM/DEE/nperrin/rana_genome/EDTA/EDTAenv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is depre
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/scratch/axiom/FAC/FBM/DEE/nperrin/rana_genome/EDTA/EDTAenv/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is depre
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])

Thanks

Dan

@oushujun
Owner

oushujun commented Feb 2, 2020 via email

@oushujun
Owner

oushujun commented Feb 3, 2020

@DanJeffries rsync has been replaced by cp with a for loop as reflected in the v1.7.9 update. You don't need this update unless TIR results are not produced.
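
A "command not found" error such as the rsync one in this thread can be caught before a multi-day job starts by checking that the required tools are on the PATH. The helper below is a generic sketch; missing_tools is a hypothetical name, and the list of tools to check is up to the user rather than anything EDTA defines.

```bash
#!/usr/bin/env bash
# Report any of the named tools that cannot be found on the PATH.
# Usage: missing_tools <tool> [<tool> ...]
# Returns 0 if every tool was found, 1 if at least one is missing.
missing_tools() {
    local tool
    local status=0
    for tool in "$@"; do
        # command -v prints the resolved path and fails if nothing matches.
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            status=1
        fi
    done
    return "$status"
}
```

Calling `missing_tools rsync cp || exit 1` at the top of a job script would stop the job immediately on a node where rsync is not installed.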
