Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

updated TrimGalore for multi-core support #65

Merged
merged 5 commits into from Nov 21, 2019
Merged

Conversation

drewjbeh
Copy link
Contributor

Many thanks to contributing to nf-core/atacseq!

Please fill in the appropriate checklist below (delete whatever is not relevant). These are the most common things requested on pull requests (PRs).

PR checklist

  • This comment contains a description of changes (with reason)
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Documentation in docs is updated
  • CHANGELOG.md is updated

Learn more about contributing: https://github.com/nf-core/atacseq/tree/master/.github/CONTRIBUTING.md

Copy link
Member

@drpatelh drpatelh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect! Thanks @drewjbeh . Just some very pedantic changes so I can sleep at night 😄

CHANGELOG.md Outdated
@@ -13,6 +13,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

* [#118](https://github.com/nf-core/chipseq/issues/118) - Running on with SGE
* Make executables in `bin/` compatible with Python 3
* [#63](https://github.com/nf-core/atacseq/issues/63) - Added multicore support for Trim Galore!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe put this in the Added section instead?

CHANGELOG.md Outdated Show resolved Hide resolved
main.nf Outdated Show resolved Hide resolved
main.nf Outdated Show resolved Hide resolved
@drpatelh
Copy link
Member

Also, if you feel inspired at any point would be great if you can make a similar PR to nf-core/chipseq. Should be pretty much the same.

drewjbeh and others added 4 commits November 21, 2019 15:56
Co-Authored-By: Harshil Patel <drpatelh@users.noreply.github.com>
Co-Authored-By: Harshil Patel <drpatelh@users.noreply.github.com>
Co-Authored-By: Harshil Patel <drpatelh@users.noreply.github.com>
@drewjbeh
Copy link
Contributor Author

All done! Thanks for your help - and it was a pleasure.

Will look at nf-core/chipseq tomorrow!

@drpatelh
Copy link
Member

Great. Ill wait for the Travis CI tests to pass and then merge 👍

@ewels
Copy link
Member

ewels commented Nov 23, 2019

Hi chaps,

I think that this change may break things on strict systems... --cores 2 uses more than 2 cpus, see https://github.com/FelixKrueger/TrimGalore/blob/master/Docs/Trim_Galore_User_Guide.md#full-list-of-options-for-trim-galore

Actual core usage: It should be mentioned that the actual number of cores used is a little convoluted. Assuming that Python 3 is used and pigz is installed, --cores 2 would use 2 cores to read the input (probably not at a high usage though), 2 cores to write to the output (at moderately high usage), and 2 cores for Cutadapt itself + 2 additional cores for Cutadapt (not sure what they are used for) + 1 core for Trim Galore itself. So this can be up to 9 cores, even though most of them won't be used at 100% for most of the time. Paired-end processing uses twice as many cores for the validation (= writing out) step. --cores 4 would then be: 4 (read) + 4 (write) + 4 (Cutadapt) + 2 (extra Cutadapt) + 1 (Trim Galore) = 15, and so forth.

@drpatelh
Copy link
Member

Hmmm...yes, I breezed over that hoping for the best! Maybe we need to do some funky maths based on the value of $task.cpus...

@drpatelh
Copy link
Member

God forbid we specify an even number of cores 😅

@ewels
Copy link
Member

ewels commented Nov 23, 2019

Haha, yes.. It’s the same for the Bismark suite in the methylseq pipeline too. Once you know how many cpus are used per core you should just be able to copy and paste from there.

@drpatelh
Copy link
Member

drpatelh commented Nov 23, 2019

Bit confused by the difference in usage between SE and PE. Aside from the 2 extra cutadapt cores and the 1 for TrimGalore it seems that the calculation would be the same for SE and PE?

e.g.
SE and --cores 3 = 3 (read) + 3 (write) + 3 (cutadapt) + 2 (cutadapt extra) + 1 (TrimGalore) = 12 total
PE and --cores 3 = 3 (read) + 3 (write) + 3 (cutadapt) + 2 (cutadapt extra) + 1 (TrimGalore) = 12 total

Had a look at the methylseq pipeline but TrimGalore shouldn't be that fussed for the memory requirements as bismark is because read trimming is independent from the size of the genome? It has run fine with 1 CPU and 7GB of RAM for large fastqs up until now. Maybe there is a lower limit...

@FelixKrueger any advice before we construct the logic would be peachy 😄

@FelixKrueger
Copy link

Hi Harshil,

I would agree that SE and PE should have the same requirements in terms of CPU. And memory usage should indeed not be any issue. I could check next time I run it but I don't think it will ever exceed more than 1 or 2GB. Have never tested this thoroughly in multicore mode though. Does this help?

@drpatelh
Copy link
Member

Thanks for the rapid response @FelixKrueger 😎

I have updated my representative example for 3 cores for clarity. Is it still correct? If so, then I think that should be enough to work out --cores from the specified CPUs.

@ewels
Copy link
Member

ewels commented Nov 25, 2019

I would agree that SE and PE should have the same requirements in terms of CPU.

@FelixKrueger but your documentation says that PE needs more cores?

Paired-end processing uses twice as many cores for the validation (= writing out) step.
https://github.com/FelixKrueger/TrimGalore/blob/master/Docs/Trim_Galore_User_Guide.md

@drpatelh
Copy link
Member

Yes, it seems that either the documentation or the logic isnt right. Assumed it was the documentation based on @FelixKrueger comment.

@FelixKrueger
Copy link

Sorry I was stuck in a meeting. As a very detailed breakdown I would say the core usage is the following:

[process: cores / load]:

Trimming process:

  • trim_galore: 1 (high)
  • gunzip stream: 1 (low)
  • cutadapt: # cores given with -j (high)
  • write to gzip stream: 1 (low)

So in default mode (--cores 1) I would expect 2 cores at high load, and up to 2 cores for (de-)compression, albeit at negligible load for both SE and PE. --cores 2 would then formally require 5 cores, --cores 3 6, --cores 4 7 etc.

For paired-end only, there is a validation process:

  • trim_galore: 1 (high)
  • gunzip stream (decompression): 2 (low)
  • gzip stream (compression): 2 (medium) (I/O dependent)

So if you are extremely strict then this would amount to 5 cores, although in terms of load this is hardly fair as it is really reading/writing 2 files as a single pass. (If the user chooses to also output unpaired reads there might be another 2 write streams on low load in addition to that).

Maybe a very strict the formula you would want to be using would then be for
single-end: ( 3 + # cores [given by -j/--cores])
paired-end: ( 4 + # cores [given by -j/--cores])

I have just run a test run on 2 million paired-end reads, the memory consumption was 551M.

@drpatelh
Copy link
Member

drpatelh commented Nov 25, 2019

Thanks @FelixKrueger Based on that breakdown maybe the formula I used isnt correct then:

tcores = (((task.cpus as int) - 3) / 3) as int

We will have to come up with an alternative such as the below?

def cores = 1
if (task.cpus) {
    def tcores = ((task.cpus as int) - 3) as int
    if (!params.single_end) {
        tcores = ((task.cpus as int) - 4) as int
    }
    if (tcores > 1) {
        cores = tcores
    }
}

In any case, Im wondering whether we should cap it so that the max value of --cores=4. Just so that we are somewhat guarded against providing a large number of cores and I/O becoming the bottleneck?

@FelixKrueger
Copy link

I think limiting the number of cores to 4 is a good idea, as in my tests I found that:

It seems that --cores 4 could be a sweet spot, anything above has diminishing returns.

(from trim_galore --help)

ewels added a commit to ewels/nf-core-methylseq that referenced this pull request Dec 10, 2019
@ewels
Copy link
Member

ewels commented Dec 10, 2019

I just had a go at doing this in nf-core/methylseq#137 - I used a slightly refactored version of your code @drpatelh:

cores = 1
if(task.cpus){
    cores = (task.cpus as int) - 4
    if (params.single_end) cores = (task.cpus as int) - 3
    if (cores < 1) cores = 1
    if (cores > 4) cores = 4
}

@drpatelh

This comment has been minimized.

@ewels

This comment has been minimized.

@drpatelh
Copy link
Member

@FelixKrueger I just want to double-check the implementation below is ok before I roll it out to the chipseq and atacseq pipelines too:
https://github.com/nf-core/methylseq/blob/4de49c2780de070bafe6b94ae0c709b5af5aa735/main.nf#L491-L497

Thank you!

@FelixKrueger
Copy link

Hi Harshil,

That looks exactly like what you want to be doing. Good luck!

@drpatelh
Copy link
Member

Great! Thanks @FelixKrueger 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants