Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DESeq2 QC issue linked to --count_col parameter #754

Closed
lianov opened this issue Jan 25, 2022 · 4 comments
Closed

DESeq2 QC issue linked to --count_col parameter #754

lianov opened this issue Jan 25, 2022 · 4 comments
Labels
bug Something isn't working
Milestone

Comments

@lianov
Copy link
Member

lianov commented Jan 25, 2022

Description of the bug

An issue related to the DESeq2 QC step was noted with the following error:

Error in h(simpleError(msg, call)) : 
    error in evaluating the argument 'x' in selecting a method for function 'ncol': non-numeric variable(s) in data frame: gene_name
  Calls: DESeqDataSetFromMatrix -> stopifnot -> ncol -> Math.data.frame

Where the second column of the file being quantified is being used at this step for quantification, which should not be as it's the gene_name column. This call is controlled by the --count_col parameter from the deseq2 qc script - see

make_option(c("-f", "--count_col" ), type="integer" , default=2 , metavar="integer", help="First column containing sample count data." ),

where the default value is 2 (explaining the error), however the pipeline's module configuration:

args = "--id_col 1 --sample_suffix '' --outprefix deseq2 --count_col 3"

does pass a value of 3 to this process.

From the .command.sh script within the working directory of this process, I confirmed that this parameter is missing:

deseq2_qc.r \
    --count_file salmon.merged.gene_counts_length_scaled.tsv \
    --outdir ./ \
    --cores 6

Interestingly, from the test run (using test profile) calls the module.config correctly for this process:

deseq2_qc.r \
    --count_file salmon.merged.gene_counts_length_scaled.tsv \
    --outdir ./ \
    --cores 6 \
    --id_col 1 --sample_suffix '' --outprefix deseq2 --count_col 3

The issue was resolved by explicitly calling the missing parameters in the configuration file:

process {
    withName: DESEQ2_QC_STAR_SALMON {
        ext.args = "--id_col 1 --sample_suffix '' --outprefix deseq2 --count_col 3"
    }

    withName: DESEQ2_QC_SALMON {
        ext.args = "--id_col 1 --sample_suffix '' --outprefix deseq2 --count_col 3"
    }
}

Importantly this was only observed with v3.5 of the rnaseq pipeline (this works fine in 3.4)

I have also discussed this issue with another member of the community in slack: https://nfcore.slack.com/archives/CE8SSJV3N/p1642190261114100

Command used and terminal output

nextflow run nf-core/rnaseq \
    -r 3.5 \
    -profile singularity \
    -c ./custom.config # see config in relevant files

Relevant files

files_for_issue.zip

System information

Nextflow version: version 21.10.6
Hardware: HPC
Executor: local (but have used slurm for other runs as well)
Container engine: Singularity
OS: CentOS Linux,
Version of nf-core/rnaseq: 3.5

@lianov lianov added the bug Something isn't working label Jan 25, 2022
@drpatelh drpatelh added this to the 3.6 milestone Feb 7, 2022
drpatelh added a commit to drpatelh/nf-core-rnaseq that referenced this issue Feb 20, 2022
@drpatelh
Copy link
Member

Hi @lianov ! Thank you for all of the information! So the easy fix here was to change the default value of --count_col 3 in bin/deseq2_qc.r script because that is the value use by all processes anyway. This means even if the configuration doesn't get through the defaults will carry the process through. Added in 47814c9

I am still a little confused as to why this is happening though? 🤔 I can confirm that the settings below are being found by the module for the test data so why would it be different in any other scenario?

rnaseq/conf/modules.config

Lines 616 to 623 in 646723c

withName: DESEQ2_QC_STAR_SALMON {
ext.args = [
"--id_col 1",
"--sample_suffix ''",
"--outprefix deseq2",
"--count_col 3",
params.deseq2_vst ? '--vst TRUE' : ''
].join(' ').trim()

Also, the code you linked to here is from v3.4 of the pipeline and not v3.5:

image

I will close this because the underlying issue should be solved but it would be worth doing some more digging to see if there is anything more sinister going on. Maybe @mahesh-panchal can take a look too.

@lianov
Copy link
Member Author

lianov commented Feb 20, 2022

Thanks for changing the default. I agree this is very odd behavior and if I note this in any other process I will be sure to add an issue. And yes sorry for the miss-link as I was looking closely at both versions and miss-linked it here 😅, but glad my overall message was clear.

@mahesh-panchal
Copy link
Member

I think the sinister thing going on here is setting params using a custom config. If the params are set using the -params-file option, it should work correctly.
To explain a bit more, although params from a custom config have higher priority than the settings in nextflow.config, it appears that they're not read in yet when nextflow.config is being parsed, so they don't take effect in the config (unlike setting from CLI or -params-file which have already been parsed). So nextflow.config is still using the default setting when evaluating the if statements, and then using the updated params values when running the workflow after parsing -c.

@mahesh-panchal
Copy link
Member

Seeing the issue far too often, so I made an issue nextflow-io/nextflow#2662

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants