Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new module: cellranger multi #3229

Merged
merged 32 commits into from
May 16, 2023
Merged

Conversation

klkeys
Copy link
Contributor

@klkeys klkeys commented Mar 31, 2023

PR checklist

Closes #2515

  • This comment contains a description of changes (with reason).
    adds cellranger multi module, with concomitant modifications to other cellranger code and test data as needed
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Emit the versions.yml file.
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements. only works with nfcore/cellranger Docker image

@klkeys klkeys self-assigned this Mar 31, 2023
@klkeys klkeys mentioned this pull request Mar 31, 2023
4 tasks
@klkeys
Copy link
Contributor Author

klkeys commented Apr 6, 2023

module test data PR: nf-core/test-datasets#848

@klkeys
Copy link
Contributor Author

klkeys commented Apr 15, 2023

update: after a lot of testing, it seems like my original design won't work

CELLRANGER_MULTI ( [ meta, path(cellranger_config_csv) }

the problem is that the module process needs to stage all references (GEX, VDJ, etc.) as well as the FASTQs themselves, or else the Cellranger container cannot see them

but staging won't happen if it is only fed the cellranger multi config itself

current design approach is

CELLRANGER_MULTI (
    [ meta, path(cellranger_config_csv) ], // config file with metadata
    [ gex_ref, vdj_ref, ... ], // a list of genomic references for the desired analysis
    [ fastqs ] // a list of fastqs used for the analysis
)

and then stage the references + FASTQs according to the config, with empty channels for unused references if need be

@klkeys
Copy link
Contributor Author

klkeys commented Apr 18, 2023

linking to related efforts with Cumulus for reference

@klkeys
Copy link
Contributor Author

klkeys commented Apr 21, 2023

the current paradigm is to write the config file before running the cellranger multi process, e.g.

WRITE_CONFIG(...) // a custom process to write a custom Cellranger config
CELLRANGER_MULTI( WRITE_CONFIG.out.config, ch_references, ch_fastqs )

this makes for a tidy module but unfortunately won't work

from the Cellranger docs we see example configs with lines like

[gene-expression]
reference,/path/to/transcriptome

and instructions

Make sure to replace /path/to with the actual full path to your data [...]

as far as I can tell, cellranger multi called from a Docker container really wants absolute paths, not relative ones, in direct contrast to Nextflow's file staging design

this isn't bad until the config file is written in a different process than where Cellranger is run, since the working directories (and even the compute nodes themselves) need not be the same across both processes

ergo it's probably wisest to build the config and run cellranger multi in the same process

the downside here is that this omnibus process will need many arguments to account for all possible parametrizations of the config, something like this or maybe

gex_options = [ 'expect-cells':1000, 'chemistry':'SC5P-PE', ...]
CELLRANGER_MULTI (
    gex_reference // path to gene expression reference
    gex_options // map of options for gene expression analysis
    vdj_reference // path to vdj reference
    vdj_options // ...
...
)

the module will get long and ugly, but should hopefully still work

@klkeys
Copy link
Contributor Author

klkeys commented Apr 28, 2023

good progress so far when creating config from within module itself, though the module code got ugly fast

as a note of just how sensitive cellranger multi is to its inputs:

  [error] Pipestance failed. Error log at:
  test/SC_MULTI_CS/MULTI_PREFLIGHT_LOCAL/fork0/chnk0-u00134c48f4/_errors
  
  Log message:
  Your [vdj] reference does not contain the expected file (references/vdj/vdj_reference/fasta/regions.fa), or they are not readable. Please check your reference folder on 6f1571c9ccfa.

the regions.fa file in question is pulled from nf-core/test-datasets but in my test I (...thoughtlessly?...) staged it with file extension .fasta instead of .fa 🤦🏼

@klkeys
Copy link
Contributor Author

klkeys commented May 2, 2023

the process executes correctly but doesn't publish correctly

the test chokes on theversions.yml file:

/private/var/folders/fy/gr30f0h12dz833zbd8_z5ls40000gn/T/pytest_workflow_8abz5ym2/cellranger_multi_test_cellranger_multi/work/45/702b77731e7de86bdb5864d58fbd22/.command.sh: line 87: warning: here-document at line 84 delimited by end-of-file (wanted `END_VERSIONS')

a previous note suggests that multiline strings with newlines, such as whatever meta_* maps are used to add cellranger multi options, are probably the culprit

not sure how else to pass multiple sectioned options without parsing each potential option individually

looks like we need another redesign 😣

@klkeys
Copy link
Contributor Author

klkeys commented May 3, 2023

it's been quite the odyssey but this module is now in a "working" state

could subject this code to independent review now, but will first look for some extra test data

currently, the only test configuration is GEX + VDJ

the module needs tests with AB + BEAM + CRISPR + FB, if possible

Copy link
Contributor

@sateeshperi sateeshperi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥇

@klkeys klkeys added this pull request to the merge queue May 16, 2023
Merged via the queue into nf-core:master with commit d9077a6 May 16, 2023
@klkeys klkeys deleted the add_cellranger_multi branch May 16, 2023 16:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

new module: cellranger/multi
2 participants