
Guideline_for_adding_deconvolution_tools.md

General steps

  1. Create a new folder in subworkflows/deconvolution, containing

    • A script which (minimally) takes as input a single-cell reference matrix, spatial expression matrix, and cell type annotation column,
      and returns as output a TSV file of the predicted proportions (spots in rows and cell types in columns)
    • A Nextflow process which calls this script
  2. Include the process in subworkflows/deconvolution/run_methods.nf

Example: implementing NNLS in R

We will add a simple algorithm, non-negative least squares (NNLS) regression, to the pipeline. For R methods, we assume the single-cell reference is a Seurat object with a cell type annotation column, and the spatial object is either a Seurat object or a synthetic dataset generated by synthspot (a named list with the expression matrix in "counts").
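To make the idea behind NNLS concrete, here is a minimal pure-Python sketch. It is not the pipeline's script_nf.R, and it uses projected gradient descent rather than the active-set solvers that NNLS libraries typically provide. The function names `nnls_projected_gradient` and `to_proportions`, the toy signature matrix, and the iteration count are all assumptions made for illustration.

```python
from typing import List


def nnls_projected_gradient(A: List[List[float]], b: List[float],
                            n_iter: int = 2000) -> List[float]:
    """Approximately solve min ||A x - b||^2 subject to x >= 0.

    A is a genes x cell-types signature matrix, b is one spot's expression.
    """
    n_rows, n_cols = len(A), len(A[0])
    # trace(A^T A) is an upper bound on the largest eigenvalue of A^T A,
    # so 1 / trace is a safe (if conservative) step size.
    trace = sum(A[i][j] ** 2 for i in range(n_rows) for j in range(n_cols))
    if trace == 0:
        raise ValueError("signature matrix must not be all zeros")
    step = 1.0 / trace
    x = [0.0] * n_cols
    for _ in range(n_iter):
        residual = [sum(A[i][j] * x[j] for j in range(n_cols)) - b[i]
                    for i in range(n_rows)]
        gradient = [sum(A[i][j] * residual[i] for i in range(n_rows))
                    for j in range(n_cols)]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[j] - step * gradient[j]) for j in range(n_cols)]
    return x


def to_proportions(weights: List[float]) -> List[float]:
    """Rescale non-negative weights so that they sum to one."""
    total = sum(weights)
    if total == 0:
        return [0.0] * len(weights)
    return [w / total for w in weights]


# Toy example: three genes, two cell types, one spot that is a 30/70 mixture.
signatures = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
spot = [0.3, 0.7, 1.0]
spot_proportions = to_proportions(nnls_projected_gradient(signatures, spot))
```

In a real method the same fit would be repeated for every spot, giving one row of proportions per spot.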

  1. Create a directory subworkflows/deconvolution/nnls containing

    • script_nf.R: a script that runs NNLS. We use R.utils::commandArgs to parse command line arguments, where sc_input and sp_input are the paths to the single-cell and spatial objects, and annot is the name of the cell type annotation column.
      The script returns a TSV file of the spot x cell type proportion matrix. We recommend copying the template for printing results,
      because it 1) does not include row names, 2) removes non-alphanumeric characters from cell type names and 3) shell sorts the cell types.
    • run_method.nf: a Nextflow process that runs script_nf.R. For simple cases you can just replace "nnls" with your method name. You can remove the container directive if you only wish to run it locally.
    • OPTIONAL: build a docker container with your method (see Dockerfile)
  2. Add nnls to subworkflows/deconvolution/run_methods.nf

    • In the include statement at the beginning of the file (include { runNNLS } from './nnls/run_method.nf')
    • In parameters all_methods (all_methods = "music,rctd, ... ,dstg,nnls")
    • In the runMethods workflow
        if ( methods =~ /nnls/ ){
            runNNLS(pair_input_ch)
            output_ch = output_ch.mix(runNNLS.out)
        }
    
  3. Test it out with

nextflow run main.nf --methods nnls -profile local \
--sc_input unit-test/test_sc_data.rds --sp_input unit-test/test_sp_data.rds \
--annot subclass 
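For a method written in Python, the three output conventions listed in step 1 (no row names, alphanumeric-only cell type names, sorted cell types) could be reproduced roughly as follows. This is a sketch, not the R template itself: the helper names `sanitize` and `proportions_to_tsv` are hypothetical, the exact characters the template strips are assumed to be everything outside A-Z, a-z and 0-9, and Python's default string ordering may differ from the ordering the template produces.

```python
import csv
import io
import re
from typing import List


def sanitize(name: str) -> str:
    """Drop every character that is not an ASCII letter or digit."""
    return re.sub(r"[^A-Za-z0-9]", "", name)


def proportions_to_tsv(proportions: List[List[float]],
                       cell_types: List[str]) -> str:
    """Return a TSV string with spots in rows and cell types in columns.

    Each row of `proportions` holds one spot, with columns in the order
    given by `cell_types`. No row names are written.
    """
    clean = [sanitize(ct) for ct in cell_types]
    # Column order after sorting the cleaned names.
    order = sorted(range(len(clean)), key=lambda i: clean[i])
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([clean[i] for i in order])
    for row in proportions:
        writer.writerow([row[i] for i in order])
    return buffer.getvalue()


# Two spots and two cell types with made-up proportions.
tsv_text = proportions_to_tsv([[0.25, 0.75], [1.0, 0.0]], ["L2/3 IT", "Astro"])
```

Writing `tsv_text` to a file then gives an output in the shape the pipeline expects from a deconvolution script.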

Extra information for Python methods

For simple algorithms like NNLS the workflow is exactly the same, but the inputs are expected to be h5ad files instead of Seurat objects. However, most Python methods make use of Bayesian probabilistic models (e.g., cell2location, stereoscope, and DestVI) and comprise separate model building and model fitting steps. Such methods therefore need two scripts and two Nextflow processes.

Typically, the model building script (build_model.py) only takes the single-cell object and annotation column as input, while the model fitting script (fit_model.py) takes the model and spatial dataset as input. The Nextflow processes in run_method.nf would minimally look something like

process buildModel {
    input:
        path (sc_input)
    output:
        path ("model")

    script:
        """
        python build_model.py $sc_input --annot $params.annot
        # Assume the script outputs a file called "model" containing the built model
        """
}

process fitModel {
    input:
        path (sp_input)
        path (model)
    output:
        path ("output_props")

    script:
        """
        python fit_model.py $sp_input $model
        # Assume the script outputs a file called "output_props" containing output proportions
        """
}
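The command lines used in the two processes above imply a particular argument layout for each script. One way to parse them, sketched here with Python's standard argparse module, is shown below. The function names `build_model_parser` and `fit_model_parser` and the example file names are hypothetical, and the model building and fitting logic itself is left out.

```python
import argparse


def build_model_parser() -> argparse.ArgumentParser:
    """Arguments for: python build_model.py <sc_input> --annot <column>"""
    parser = argparse.ArgumentParser(
        description="Build a model from the single-cell reference")
    parser.add_argument("sc_input", help="path to the single-cell h5ad file")
    parser.add_argument("--annot", required=True,
                        help="name of the cell type annotation column")
    return parser


def fit_model_parser() -> argparse.ArgumentParser:
    """Arguments for: python fit_model.py <sp_input> <model>"""
    parser = argparse.ArgumentParser(
        description="Fit a built model to a spatial dataset")
    parser.add_argument("sp_input", help="path to the spatial h5ad file")
    parser.add_argument("model", help="path to the model built earlier")
    return parser


# Hypothetical file names, parsed the way each script would see them.
build_args = build_model_parser().parse_args(
    ["test_sc_data.h5ad", "--annot", "subclass"])
fit_args = fit_model_parser().parse_args(["test_sp_data.h5ad", "model"])
```

Inside each script you would call `parse_args()` without arguments so that the values come from the command line Nextflow generates.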

Note: In the cell2location/stereoscope/DestVI processes you will instead see input: tuple path (sp_input), path (sp_input_rds). Although the pipeline internally converts the RDS file to an H5AD file, the original RDS file is still needed for metric computation, so you will need to follow this format when implementing your own method.

Then, add the method in subworkflows/deconvolution/run_methods.nf:

  • In the include statement at the beginning of the file (include { runMethod } from './method_name/run_method.nf')
  • In parameters all_methods (all_methods = "music,rctd, ... ,dstg,nnls,method_name")
  • In python_methods (python_methods = ['stereoscope', ... 'method_name'])
  • In the runMethods workflow

Adding your process to the runMethods workflow is slightly more complicated than in the R case, since we want to be able to run multiple spatial datasets while building the model only once. A Nextflow process that reads from two queue channels pairs their items one by one, so a model channel holding a single item would only be matched with the first spatial dataset. We therefore need to replicate the model once per spatial dataset. As an example, here is how this was done for cell2location:

// Build the model using the single-cell dataset
buildCell2locationModel(sc_input_conv)

// Repeat the model output for each spatial file.
// .out refers to the model file; "combine" takes the Cartesian product of
// the model channel and the spatial input channel.
// Each combined item has three components: the model file, the H5AD spatial
// file, and the RDS spatial file, which multiMap splits into two channels.
buildCell2locationModel.out.combine(sp_input_pair)
.multiMap { model_sc_file, sp_file_h5ad, sp_file_rds ->
            model: model_sc_file                          // the model file, accessed via "model"
            sp_input: tuple sp_file_h5ad, sp_file_rds }   // the spatial files as a tuple, accessed via "sp_input"
.set{ c2l_combined_ch }                                   // name this channel c2l_combined_ch

// Fit the model using the sp_input tuple and the model file
fitCell2locationModel(c2l_combined_ch.sp_input,
                      c2l_combined_ch.model)

// Format the TSV file output by the cell2location model
formatC2L(fitCell2locationModel.out)
output_ch = output_ch.mix(formatC2L.out)
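The effect of combine followed by multiMap can be mimicked with plain Python lists, which may help when checking what each process call receives. This is only an analogy under simplifying assumptions: real Nextflow channels are asynchronous and do not guarantee item order, and the function names `combine` and `multi_map` as well as the file names are made up for the example.

```python
from itertools import product
from typing import Dict, List, Tuple


def combine(models: List[str],
            spatial_pairs: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Pair every model with every (h5ad, rds) spatial tuple, flattened."""
    return [(model, h5ad, rds)
            for model, (h5ad, rds) in product(models, spatial_pairs)]


def multi_map(combined: List[Tuple[str, str, str]]) -> Dict[str, list]:
    """Split the combined items into a 'model' list and an 'sp_input' list."""
    return {
        "model": [model for model, _, _ in combined],
        "sp_input": [(h5ad, rds) for _, h5ad, rds in combined],
    }


# One built model and two spatial datasets (hypothetical file names).
models = ["model"]
spatial_pairs = [("sp1.h5ad", "sp1.rds"), ("sp2.h5ad", "sp2.rds")]
channels = multi_map(combine(models, spatial_pairs))
```

Here `channels["model"]` contains the model once per spatial dataset, so the fitting step receives a matching model for every entry in `channels["sp_input"]`.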