Add GMM-Demux #5641

Closed
wants to merge 18 commits into from
Changes from 5 commits
9 changes: 9 additions & 0 deletions modules/nf-core/gmmdemux/environment.yml
@@ -0,0 +1,9 @@
---
# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json
name: "gmmdemux"
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- "bioconda::gmm-demux=0.2.2.3"
71 changes: 71 additions & 0 deletions modules/nf-core/gmmdemux/main.nf
@@ -0,0 +1,71 @@

process GMMDEMUX {
tag "$meta.id"
label 'process_low'

conda "${moduleDir}/environment.yml"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/gmm-demux:0.2.2.3--pyh7cba7a3_0':
'biocontainers/gmm-demux:0.2.2.3--pyh7cba7a3_0' }"

input:
tuple val(meta), path(cell_hashing_barcodes, stageAs: "hto_files/*"), path(cell_hashing_matrix, stageAs: "hto_files/*"), path(cell_hashing_features, stageAs: "hto_files/*"), val(hto_names)
val csv
val num_cells
val output_dir
val full_report
val simplified_report
val examine
val ambigous
val extract
val skip
mari-ga marked this conversation as resolved.
Contributor

Shouldn't some of these (e.g. skip) be paths?


output:
mari-ga marked this conversation as resolved.
tuple val(meta), path('test/barcodes.tsv.gz'), emit: barcodes
tuple val(meta), path('test/*.mtx.gz'), emit: matrix
tuple val(meta), path('test/features.tsv.gz'), emit: features
mari-ga marked this conversation as resolved.
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
// This tool has many optional inputs, so each one is checked before use:
// a flag (and the report or output it produces) is only added when its value is set
mari-ga marked this conversation as resolved.
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
def n_cells = num_cells ? "--summary $num_cells" : ""
def output_path = output_dir ? "-o $prefix" : ""
def full_rep = full_report ? "-f $prefix" : ""
def simplified_rep = simplified_report ? "--simplified $simplified_report" : ""
mari-ga marked this conversation as resolved.
mari-ga marked this conversation as resolved.
// Distinct names avoid redefining the process inputs of the same name
def examine_arg = examine ? "--examine $examine" : ""
def extract_arg = extract ? "--extract $extract" : ""
def skip_arg = skip ? "--skip $skip" : ""
def type = csv ? "-c" : ""
// --ambigous is only passed when examine is set
def ambigous_arg = examine ? "--ambigous $ambigous" : ""
def VERSION = '0.2.2.3' // WARN: Version information not provided by tool on CLI. Please update version string below when bumping container versions.
"""
GMM-demux $type hto_files $hto_names $output_path $full_rep $n_cells $simplified_rep $examine_arg $ambigous_arg $extract_arg $skip_arg $args
mari-ga marked this conversation as resolved.
cat <<-END_VERSIONS > versions.yml
"${task.process}":
GMM-Demux: $VERSION
END_VERSIONS
"""

stub:
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
def VERSION = '0.2.2.3'
"""
mkdir test
touch test/barcodes.tsv.gz
touch test/features.tsv.gz
touch test/matrix.mtx.gz
Comment on lines +56 to +59
Contributor

Suggested change, replacing:
mkdir test
touch test/barcodes.tsv.gz
touch test/features.tsv.gz
touch test/matrix.mtx.gz
with:
touch barcodes.tsv.gz
touch features.tsv.gz
touch matrix.mtx.gz

Contributor Author

I tried the suggested lines, but the stub test crashes.

Contributor (@tstoeriko, Jun 5, 2024)

I think the paths just need to be updated to match the new output patterns; then it should hopefully work.

Suggested change, replacing:
mkdir test
touch test/barcodes.tsv.gz
touch test/features.tsv.gz
touch test/matrix.mtx.gz
with:
mkdir "${prefix}"
touch "${prefix}/barcodes.tsv.gz"
touch "${prefix}/features.tsv.gz"
touch "${prefix}/matrix.mtx.gz"
touch "${prefix}/classification_report_${prefix}"
touch "summary_report_${prefix}.txt"



cat <<-END_VERSIONS > versions.yml
"${task.process}":
GMM-Demux: $VERSION
END_VERSIONS
"""
}
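The script block adds each optional flag only when its input is set. It then appends `task.ext.args`, which the module config fills with `-t ${params.threshold}`. The minimal bash sketch below mirrors that pattern so you can inspect the resulting command line without running the tool. It makes a few assumptions:

- The three HTO files are already staged in `hto_files/`, as `stageAs: "hto_files/*"` does.
- The function `build_gmmdemux_cmd` and its argument order are hypothetical.
- It only echoes the command.
- Unlike the module, it also drops `--ambigous` when no value is given.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the GMMDEMUX script block: each optional
# flag is added only when its corresponding input is non-empty.
build_gmmdemux_cmd() {
    local prefix="$1" hto_names="$2" csv="$3" num_cells="$4" output_dir="$5"
    local full_report="$6" simplified_report="$7" examine="$8" ambigous="$9"
    local extract="${10}" skip="${11}" args="${12}"
    local -a cmd=(GMM-demux) extra=()

    if [[ -n $csv ]]; then cmd+=(-c); fi
    cmd+=(hto_files "$hto_names")
    # As in the module, -o and -f receive the sample prefix rather than the input value
    if [[ -n $output_dir ]]; then cmd+=(-o "$prefix"); fi
    if [[ -n $full_report ]]; then cmd+=(-f "$prefix"); fi
    if [[ -n $num_cells ]]; then cmd+=(--summary "$num_cells"); fi
    if [[ -n $simplified_report ]]; then cmd+=(--simplified "$simplified_report"); fi
    if [[ -n $examine ]]; then cmd+=(--examine "$examine"); fi
    # The module ties --ambigous to examine; this sketch also requires a value
    if [[ -n $examine && -n $ambigous ]]; then cmd+=(--ambigous "$ambigous"); fi
    if [[ -n $extract ]]; then cmd+=(--extract "$extract"); fi
    if [[ -n $skip ]]; then cmd+=(--skip "$skip"); fi
    # ext.args can carry several options in one string, so it is split on whitespace
    if [[ -n $args ]]; then read -r -a extra <<< "$args"; cmd+=("${extra[@]}"); fi

    echo "${cmd[*]}"
}

# Values mirroring the non-stub nf-test case (prefix "test", output_dir "True", threshold 0.8)
build_gmmdemux_cmd test "MS-11,MS-12" "" "" "True" "" "" "" "" "" "" " -t 0.8 "
# prints: GMM-demux hto_files MS-11,MS-12 -o test -t 0.8
```

Building the command as a bash array keeps values such as the comma-separated HTO list intact as single arguments. Only `ext.args` is deliberately word-split, because it bundles several options into one string.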
110 changes: 110 additions & 0 deletions modules/nf-core/gmmdemux/meta.yml
@@ -0,0 +1,110 @@
---
# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json
name: "gmmdemux"

description: GMM-Demux is a Gaussian-Mixture-Model-based software for processing sample barcoding data (cell hashing and MULTI-seq).
keywords:
- demultiplexing
- hashing-based deconvolution
mari-ga marked this conversation as resolved.
- single-cell
tools:
- "gmmdemux":
description: "GMM-Demux is a Gaussian-Mixture-Model-based software for processing sample barcoding data (cell hashing and MULTI-seq)."
homepage: "https://pypi.org/project/GMM-Demux/"
documentation: "https://github.com/CHPGenetics/GMM-Demux"
tool_dev_url: "https://github.com/CHPGenetics/GMM-demux"
doi: "10.1186/s13059-020-02084-2"
licence: ["MIT"]

input:
# Only when we have meta
mari-ga marked this conversation as resolved.
- meta:
type: map
description: |
Groovy Map containing sample information
e.g. `[ id:'sample1', single_end:false ]`
mari-ga marked this conversation as resolved.
- cell_hashing_matrix:
type: file
description: Path to the matrix file from the cell hashing data. The tool accepts either MTX (default) or CSV input; the format must be specified using parameters.
Member

If the type MUST be specified using parameters, then this is OK to have as an input channel. Everything else via ext.args (as I've said a few times above 😬)

Contributor Author

Yep, in this case these are the input files. Originally, the tool takes a single path to the directory where these three files are stored. nf-core cannot receive the path to a directory stored in test-datasets, which is why we need the three paths to create an intermediate folder that is later given as input.

Member

If the tool is normally meant to accept a directory:

  1. Package the three test files into a directory and gzip it
  2. Upload to test-datasets
  3. Add to the test a 'setup' block where you specify the gzip archive and use the GUNZIP module to extract it
  4. Pass GUNZIP.out.archive (or whatever it is) as the (single) input to the module

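gzip on its own compresses single files, not directories. Step 1 therefore in practice means a `.tar.gz` archive. In an nf-core setup block, that archive would presumably be extracted with the UNTAR module rather than GUNZIP; this is an assumption, since the thread does not settle it. The minimal bash sketch below covers the packaging step. It uses the file names from the test data, a hypothetical archive name `hto_files.tar.gz`, and empty placeholder files so it runs standalone.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Scratch area so the sketch does not touch the current directory
workdir="$(mktemp -d)"
cd "$workdir"

# Empty placeholders standing in for the three 10x HTO files from the test data
mkdir hto_files
for f in barcodes.tsv.gz features.tsv.gz matrix.mtx.gz; do
    : > "hto_files/$f"
done

# Package the directory as a single gzip-compressed tarball (hypothetical name)
tar -czf hto_files.tar.gz hto_files

# Round trip: extract into a fresh location, as the test's setup step would
mkdir extracted
tar -xzf hto_files.tar.gz -C extracted
ls extracted/hto_files   # lists barcodes.tsv.gz, features.tsv.gz and matrix.mtx.gz
```

The extracted directory already contains all three files, so the module could take it as a single `path` input instead of three separately staged files.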
- cell_hashing_barcodes:
type: file
description: Path to the barcodes file from the cell hashing data. The tool accepts either MTX (default) or CSV input; the format must be specified using parameters.
- cell_hashing_features:
type: file
description: Path to the features file from the cell hashing data. The tool accepts either MTX (default) or CSV input; the format must be specified using parameters.
- csv:
type: string
mari-ga marked this conversation as resolved.
description: |
Take input in CSV format instead of MTX format.
- hto_names:
type: string
description: |
Comma-separated list of HTO names, without whitespace (e.g. `MS-11,MS-12`).
- num_cells:
type: integer
description: |
Generate the statistical summary of the dataset. Requires an estimated total number of cells in the assay as input.
- output_dir:
type: string
description: |
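The path for storing the Same-Sample-Droplets (SSDs). In this module, any non-empty value enables this output, which is written to a directory named after the sample prefix.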
- full_report:
type: string
description: |
Generate the full classification report. In this module, any non-empty value enables it, and the sample prefix is passed as the report path.
- simplified_report:
type: string
description: |
Generate the simplified classification report. Requires a path argument.
- examine:
type: file
description: |
Provide the cell list. Requires a file argument. Only executes if num_cells is set.
- skip:
type: string
description: |
Load a full classification report and skip the mtx folder as input. Requires a path argument.
- extract:
type: string
description: |
Names of the sample barcoding tag(s) to extract, separated by ','. Joint tags are linked with '+'.
- ambigous:
type: float
description: |
The estimated chance of a phony GEM being included in a pure-type GEM cluster by the clustering algorithm. Only used if examine is set.
output:
mari-ga marked this conversation as resolved.
#Only when we have meta
mari-ga marked this conversation as resolved.
- meta:
type: map
description: |
Groovy Map containing sample information
e.g. `[ id:'sample1', single_end:false ]`
mari-ga marked this conversation as resolved.
- versions:
type: file
description: File containing software versions
pattern: "versions.yml"
- barcodes:
type: file
description: |
Barcodes TSV file with cell-hashing-identifiable multiplets removed
pattern: "test/barcodes.tsv.gz"
- matrix:
type: file
description: |
Matrix MTX file with cell-hashing-identifiable multiplets removed
pattern: "test/*.mtx.gz"
- features:
type: file
description: |
Features TSV file with cell-hashing-identifiable multiplets removed
pattern: "test/features.tsv.gz"
authors:
- "@mari-ga"
- "@maxozo"
- "@wxicu"
- "@Zethson"
maintainers:
- "@mari-ga"
- "@maxozo"
- "@wxicu"
- "@Zethson"
10 changes: 10 additions & 0 deletions modules/nf-core/gmmdemux/nextflow.config
mari-ga marked this conversation as resolved.
@@ -0,0 +1,10 @@
params{
// Optional parameters for GMM-demux - Default values
threshold = 0.8
}

process {
withName: GMMDEMUX {
ext.args = " -t ${params.threshold} "
}
}
86 changes: 86 additions & 0 deletions modules/nf-core/gmmdemux/tests/main.nf.test
@@ -0,0 +1,86 @@
// nf-core modules test gmmdemux
nextflow_process {

name "Test Process GMMDEMUX"
script "../main.nf"
process "GMMDEMUX"

tag "modules"
tag "modules_nfcore"
tag "gmmdemux"


test("Standard_Multiome - 10x mtx") {
when {
process {
"""

input[0] = [
[ id:'test'],
file(params.modules_testdata_base_path + "/genomics/homo_sapiens/10xgenomics/cellranger/hashing_demultiplexing/hto/barcodes.tsv.gz",checkIfExists: true),
file(params.modules_testdata_base_path + "/genomics/homo_sapiens/10xgenomics/cellranger/hashing_demultiplexing/hto/matrix.mtx.gz",checkIfExists: true),
file(params.modules_testdata_base_path + "/genomics/homo_sapiens/10xgenomics/cellranger/hashing_demultiplexing/hto/features.tsv.gz",checkIfExists: true),
"MS-11,MS-12"
]
input[1] = ""
input[2] = ""
input[3] = "True"
input[4] = ""
input[5] = ""
input[6] = ""
input[7] = ""
input[8] = ""
input[9] = ""

"""
}
}

then {
assertAll(
{ assert process.success },
{ assert path(process.out.barcodes.get(0).get(1)).exists() },
mari-ga marked this conversation as resolved.

)
}

}

test("Standard_Multiome - 10x mtx - Stub") {

options "-stub"

when {
process {
"""

input[0] = [
[ id:'test'],
file(params.modules_testdata_base_path + "/genomics/homo_sapiens/10xgenomics/cellranger/hashing_demultiplexing/hto/barcodes.tsv.gz",checkIfExists: true),
file(params.modules_testdata_base_path + "/genomics/homo_sapiens/10xgenomics/cellranger/hashing_demultiplexing/hto/matrix.mtx.gz",checkIfExists: true),
file(params.modules_testdata_base_path + "/genomics/homo_sapiens/10xgenomics/cellranger/hashing_demultiplexing/hto/features.tsv.gz",checkIfExists: true),
"MS-11,MS-12"
]
input[1] = ""
input[2] = ""
input[3] = ""
input[4] = ""
input[5] = ""
input[6] = ""
input[7] = ""
input[8] = ""
input[9] = ""
"""
}
}

then {
assertAll(
{ assert process.success },
{ assert path(process.out.barcodes.get(0).get(1)).exists() },
)
}

}

}
13 changes: 13 additions & 0 deletions modules/nf-core/gmmdemux/tests/nextflow.config
@@ -0,0 +1,13 @@
params{
// Optional parameters for GMM-demux - Default values
threshold = 0.8

}

process {

withName: GMMDEMUX {
ext.args = " -t ${params.threshold} "
}

}
2 changes: 2 additions & 0 deletions modules/nf-core/gmmdemux/tests/tags.yml
@@ -0,0 +1,2 @@
gmmdemux:
- "modules/nf-core/gmmdemux/**"