Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GMM-Demux #5641

Closed
wants to merge 18 commits into from
Closed

Add GMM-Demux #5641

wants to merge 18 commits into from

Conversation

mari-ga
Copy link
Contributor

@mari-ga mari-ga commented May 21, 2024

PR checklist

Closes #XXX

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Emit the versions.yml file.
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker
      • nf-core modules test <MODULE> --profile singularity
      • nf-core modules test <MODULE> --profile conda
    • For subworkflows:
      • nf-core subworkflows test <SUBWORKFLOW> --profile docker
      • nf-core subworkflows test <SUBWORKFLOW> --profile singularity
      • nf-core subworkflows test <SUBWORKFLOW> --profile conda

@mari-ga mari-ga requested a review from a team as a code owner May 21, 2024 13:08
@mari-ga mari-ga requested review from leoisl and removed request for a team May 21, 2024 13:08
Copy link
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See below,

modules/nf-core/gmmdemux/main.nf Outdated Show resolved Hide resolved
modules/nf-core/gmmdemux/main.nf Outdated Show resolved Hide resolved
modules/nf-core/gmmdemux/main.nf Outdated Show resolved Hide resolved
modules/nf-core/gmmdemux/main.nf Outdated Show resolved Hide resolved
modules/nf-core/gmmdemux/meta.yml Show resolved Hide resolved
modules/nf-core/gmmdemux/meta.yml Outdated Show resolved Hide resolved
modules/nf-core/gmmdemux/meta.yml Outdated Show resolved Hide resolved
e.g. `[ id:'sample1', single_end:false ]`
- cell_hashing_matrix:
type: file
description: path to file containing matrix from cell hashing data, the tool receives either CSV files or TSV, type must be specified using parameters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the typue MUST be specified using parameters, then this is OK to have as an input cahnnel. Everythin else via ext.args (as I've said a few times above 😬 )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, in this case this are the input files, originally the tool gets only one path to the directory where these 3 files are stored.
nf-core cannot receive the path to the directory stored in test-datasets, that's why we need the 3 paths to create an intermediate folder which is later given as input

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the tool is noramlly meant to accept a directory:

  1. Package the three test files into a directory and gzip it
  2. Upload to test-datasets
  3. Add to the test a 'setup' block where you specify the gzip archive and use the GUNZIP module to extract it
  4. Pass GUNZIP.out.archive (or w/e it is) as the (single) input to the module

modules/nf-core/gmmdemux/tests/main.nf.test Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the commits includes changes in a bunch of files from other modules (cellranger, custom, mygene, rnatranscripts). Unless I'm missing something, I don't think these changes belong in this PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this isn't good...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a autoformatting thing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very possible, I didn't touch any of those files 🤔 anything I can do to revert it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modules/nf-core/gmmdemux/main.nf Outdated Show resolved Hide resolved
modules/nf-core/gmmdemux/main.nf Outdated Show resolved Hide resolved
Comment on lines 21 to 23
tuple val(meta), path('test/barcodes.tsv.gz'), emit: barcodes
tuple val(meta), path('test/*.mtx.gz' ), emit: matrix
tuple val(meta), path('test/features.tsv.gz'), emit: features
Copy link
Contributor

@tstoeriko tstoeriko May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the test in the path come from? It looks like all output files are currently being published in a directory called test within the actual output directory. Does the --output/-o flag only work for that one file? If you cannot specify the output directory properly, you might want to move the files to the right place after generating them (and delete unnecessary directories).
I think I would either
(1) rename all output files to contain the ${prefix} and publish them in the current directory OR
(2) put the .tsv.gz/mtx.gz files into a directory named ${prefix} (which kallisto quant or salmon quant seem to be doing)
Not sure which one would generally be preferred though.

Suggested change
tuple val(meta), path('test/barcodes.tsv.gz'), emit: barcodes
tuple val(meta), path('test/*.mtx.gz' ), emit: matrix
tuple val(meta), path('test/features.tsv.gz'), emit: features
tuple val(meta), path('barcodes.tsv.gz'), emit: barcodes
tuple val(meta), path('*.mtx.gz' ), emit: matrix
tuple val(meta), path('features.tsv.gz'), emit: features

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are all of these outputs also generated, when the -x flag is used? If not, they need to be optional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, but these are generated by the tool, even if no other parameters are added inside a default folder like the author says:

Output

The default content in the output folder are the non-MSM droplets (SSDs), stored in MTX format. The output shares the same format with CellRanger 3.0. By default, the output is stored in SSD_mtx folder. The output location can be overwritten with the -o flag.

modules/nf-core/gmmdemux/main.nf Outdated Show resolved Hide resolved
modules/nf-core/gmmdemux/main.nf Show resolved Hide resolved
modules/nf-core/gmmdemux/main.nf Outdated Show resolved Hide resolved
modules/nf-core/gmmdemux/main.nf Outdated Show resolved Hide resolved
Comment on lines 15 to 18
val examine
val ambigous
val extract
val skip
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't some of these (e.g. skip) be paths?

modules/nf-core/gmmdemux/main.nf Outdated Show resolved Hide resolved
Copy link
Contributor

@tstoeriko tstoeriko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for all of my comments, this module seems to be a bit tedious to implement. I mainly left comments on main.nf for now, cause I think it makes sense to get this sorted out before I look at the tests in more detail.

modules/nf-core/gmmdemux/main.nf Outdated Show resolved Hide resolved
modules/nf-core/gmmdemux/nextflow.config Outdated Show resolved Hide resolved
modules/nf-core/gmmdemux/main.nf Outdated Show resolved Hide resolved
Comment on lines +54 to +57
mkdir test
touch test/barcodes.tsv.gz
touch test/features.tsv.gz
touch test/matrix.mtx.gz
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mkdir test
touch test/barcodes.tsv.gz
touch test/features.tsv.gz
touch test/matrix.mtx.gz
touch barcodes.tsv.gz
touch features.tsv.gz
touch matrix.mtx.gz

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried with the suggested lines, but it crashes the test for stub

Copy link
Contributor

@tstoeriko tstoeriko Jun 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the paths just need to be updated to match the new output patterns, then it should hopefully work.

Suggested change
mkdir test
touch test/barcodes.tsv.gz
touch test/features.tsv.gz
touch test/matrix.mtx.gz
mkdir "${prefix}"
touch "${prefix}/barcodes.tsv.gz"
touch "${prefix}/features.tsv.gz"
touch "${prefix}/matrix.mtx.gz"
touch "${prefix}/classification_report_${prefix}"
touch "summary_report_${prefix}.txt"

modules/nf-core/gmmdemux/meta.yml Outdated Show resolved Hide resolved
modules/nf-core/gmmdemux/meta.yml Outdated Show resolved Hide resolved
modules/nf-core/gmmdemux/meta.yml Outdated Show resolved Hide resolved
modules/nf-core/gmmdemux/meta.yml Outdated Show resolved Hide resolved
@jfy133 jfy133 self-requested a review May 30, 2024 06:48
Comment on lines +18 to +23
tuple val(meta), path("${meta.id}/barcodes.tsv.gz" ), emit: barcodes
tuple val(meta), path("${meta.id}/*.mtx.gz" ), emit: matrix
tuple val(meta), path("${meta.id}/features.tsv.gz" ), emit: features
tuple val(meta), path("${meta.id}/classification_report_${meta.id}" ), emit: classification_report
tuple val(meta), path("summary_report_${meta.id}.txt" ), emit: summary_report, optional: true
path "versions.yml" , emit: versions
Copy link
Contributor

@tstoeriko tstoeriko Jun 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tuple val(meta), path("${meta.id}/barcodes.tsv.gz" ), emit: barcodes
tuple val(meta), path("${meta.id}/*.mtx.gz" ), emit: matrix
tuple val(meta), path("${meta.id}/features.tsv.gz" ), emit: features
tuple val(meta), path("${meta.id}/classification_report_${meta.id}" ), emit: classification_report
tuple val(meta), path("summary_report_${meta.id}.txt" ), emit: summary_report, optional: true
path "versions.yml" , emit: versions
tuple val(meta), path("${prefix}/barcodes.tsv.gz") , emit: barcodes
tuple val(meta), path("${prefix}/*.mtx.gz") , emit: matrix
tuple val(meta), path("${prefix}/features.tsv.gz") , emit: features
tuple val(meta), path("${prefix}/classification_report_${prefix}"), emit: classification_report
tuple val(meta), path("summary_report_${prefix}.txt") , emit: summary_report, optional: true
path "versions.yml" , emit: versions

Comment on lines +36 to +39
if [[ ${summary_report} == true ]]; then
cat /dev/null > summary_report_${prefix}.txt
echo "summary report file created"
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary because the tool itself does not create the summary report file?

script:
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
def type_report = type_report ? "-f test/classification_report_${prefix}" : "-s test/classification_report_${prefix}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def type_report = type_report ? "-f test/classification_report_${prefix}" : "-s test/classification_report_${prefix}"
def type_report = type_report ? "-f ${prefix}/classification_report_${prefix}" : "-s ${prefix}/classification_report_${prefix}"

@jfy133
Copy link
Member

jfy133 commented Jun 24, 2024

@mari-ga I have a feeling this PR would be better to be started from scratch due to the large number of conflicts comments... would this be OK?

@mari-ga mari-ga closed this Jun 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Ready for review
Development

Successfully merging this pull request may close these issues.

3 participants