Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add starting deseq2 differential module #2167

Merged
merged 65 commits into from
Oct 20, 2022

Conversation

pinin4fjords
Copy link
Member

@pinin4fjords pinin4fjords commented Oct 6, 2022

This PR adds a wrapper for DESeq2. Notes:

  • Implemented as a template, initially due to need to include a wrapper script and current lack of module bin directory (see https://nfcore.slack.com/archives/CJRH30T6V/p1665066826554919)
    • I think this actually works better than use of a module bin, because we don't need an additional library to provide a CLI! So here we can just use the biocontainer for DESeq2 and nothing else, no need to create bespoke containers etc.
  • Optional params can be supplied via the usual task.ext.args route, and are handled within the template.
  • I've defined contrasts with a groovy map, in an analagous way to sample meta in other workflows, comprising:
    • variable - a variable from the supplied sample sheet that can be used to group samples
    • reference - level of contrast variable indicating 'reference' samples.
    • target - level of contrast variable indicating 'target'/ 'treatment' samples.
    • blocking - comma-separated list of blocking variables to be included in the model.
  • A test expression matrix was added to the test data here.
  • Tiny variations between machines at the 12th decimal point (or so) in some test outputs were causing the CI to fail. So I've just added some very limited rounding to ensure the reproducibility, which can be activated specifically in the CI context.

PR checklist

Tackling #2155

  • Adds module for differential expression analysis with DESeq2
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Emit the versions.yml file.
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • PROFILE=docker pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
    • PROFILE=singularity pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
    • PROFILE=conda pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware

@pinin4fjords
Copy link
Member Author

@nf-core-bot fix-linting

@pinin4fjords
Copy link
Member Author

@nf-core-bot fix linting

@nvnieuwk
Copy link
Contributor

nvnieuwk commented Oct 7, 2022

Did you use the nf-core modules create tool to create this module? The test files are absent in this PR (which is why some of the actions aren't running)

@pinin4fjords
Copy link
Member Author

Did you use the nf-core modules create tool to create this module? The test files are absent in this PR (which is why some of the actions aren't running)

I'll get there- just a WIP for visibility and discussion, I'll get things tested before requesting formal review.

@nvnieuwk
Copy link
Contributor

nvnieuwk commented Oct 7, 2022

Maybe change this to a draft PR then :)

@nvnieuwk nvnieuwk added the WIP Work in progress label Oct 7, 2022
@pinin4fjords pinin4fjords marked this pull request as draft October 7, 2022 13:38
@pinin4fjords
Copy link
Member Author

@nf-core-bot fix linting

@pinin4fjords
Copy link
Member Author

@nf-core-bot fix linting

@pinin4fjords
Copy link
Member Author

(note: singularity failure is just because we're waiting for a build to propagate from biocontainers)

Copy link
Member

@ggabernet ggabernet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! As mentioned in slack there are some features missing that we would like to include but it's good to start with a minimal POC!
One thing that could be done to also test the blocking variables and that the execution with multiple contrasts works fine would be to add another contrast in the example data with a blocking variable provided.

fix typo

Co-authored-by: WackerO <43847497+WackerO@users.noreply.github.com>
@pinin4fjords
Copy link
Member Author

Thanks @ggabernet! I'm just going to switch the inputs to be keyed by the meta- as per feedback in #2321, will do that quickly now.

@pinin4fjords
Copy link
Member Author

One thing that could be done to also test the blocking variables and that the execution with multiple contrasts works fine would be to add another contrast in the example data with a blocking variable provided.

@ggabernet Good point- modifications to the test data here: nf-core/test-datasets#670

@ggabernet
Copy link
Member

Actually this would be important only in the case that the files are initially provided to the pipeline with a samplesheet (e.g. multiple fastq files to analyze, each of them with individual sample metadata that is passed to the meta map).
In the case of the differentialabundance pipeline, we expect to have just 1 count table with all samples to analyze. The samplesheet containing info about the different samples will be provided to the module input. Therefore I would argue that we will not need the meta map here.

In the case of the other module that you mentioned #2321 that operates on reference data (e.g. reference GTF or fasta) and not on sample (sequencing) files, the meta maps would not be needed either.

@pinin4fjords
Copy link
Member Author

pinin4fjords commented Oct 20, 2022

Thanks @ggabernet (and apologies I just saw this). I was confused in the other review - it seemed like the meta thing was pretty non-negotiable (though as it turned out I think there was a use for it there).

In the latest commits I've put the inputs in a single tuple with the 'contrast meta' as the 'meta' rather than a separate channel for the contrast meta, which I think is nicer, but happy to reverse that if you like.

(and the additional contrast with blocking factor is now tested too)

Copy link
Member

@ggabernet ggabernet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok sure, looks good to me like this too!

@pinin4fjords
Copy link
Member Author

Thanks for the review @ggabernet !

@pinin4fjords pinin4fjords merged commit 72900e2 into nf-core:master Oct 20, 2022
@pinin4fjords pinin4fjords deleted the deseq2_differential branch October 20, 2022 13:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

new module: deseq2/differential
6 participants