Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deseq2 spike handling #2534

Merged
merged 14 commits into from
Nov 23, 2022
Merged

Deseq2 spike handling #2534

merged 14 commits into from
Nov 23, 2022

Conversation

pinin4fjords
Copy link
Member

@pinin4fjords pinin4fjords commented Nov 21, 2022

This PR adds the ability to generate size factors in DESeq2 using a limited set of control genes, as is sometimes performed with technical controls such as ERCC spike-ins.

Based on https://support.bioconductor.org/p/9143354/ and the DESeq2 docs.

Note that if spikes are present, but the spike handling option is not selected, then the spikes are stripped so they do not participate in normalisation (since they could generate artefacts if left in place).

PR checklist

Closes #2440

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Emit the versions.yml file.
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • PROFILE=docker pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
    • PROFILE=singularity pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
    • PROFILE=conda pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware

@pinin4fjords pinin4fjords requested review from WackerO and a team November 21, 2022 15:18
@pinin4fjords pinin4fjords changed the title Deseq2 spikes Deseq2 spike handling Nov 21, 2022
Copy link
Contributor

@edmundmiller edmundmiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks great to me! I'm wondering if controlling it with the nextflow config is going to be more hassle, or if that should live in the script directive and be configured based on the input params. Instead, right now the end user has to figure out how to use it.

@pinin4fjords
Copy link
Member Author

pinin4fjords commented Nov 21, 2022

Code looks great to me! I'm wondering if controlling it with the nextflow config is going to be more hassle, or if that should live in the script directive and be configured based on the input params. Instead, right now the end user has to figure out how to use it.

@emiller88 Could you expand on that a bit? It can't be automatically configured based on whether or not spikes are applied, because there are two modes:

  1. Supply the spikes, but just so we can strip them from the matrix so they don't bias the global normalisation
  2. Supply the spikes, so they can be used to generate scaling factors

... which is what --sizefactors_from_controls in the nextflow.config differentiates. How would you like to see that change?

Edit: I've added info on --sizefactors_from_controls to the meta.yaml. That's a bit non-standard, but I think justified given that this is a parameter I devised for the wrapper, rather than something native to DESeq2.

@pinin4fjords
Copy link
Member Author

@emiller88 I'm going to go ahead and merge this with the two approvals. Happy to make adjustments in future PRs if you get time to expand on your suggestion at some point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DESeq2 module should be able to handle spike-ins
3 participants