
CITE-seq pipeline using Salmon and Alevin (HuBMAP scRNA-seq pipeline)

hubmapconsortium/citeseq-pipeline

HBM-CITEseq: HuBMAP CITE-seq pipeline

Overview

HBM-CITEseq is an extension of the HuBMAP scRNA-seq pipeline (https://github.com/hubmapconsortium/salmon-rnaseq) for processing CITE-seq data. It is built on Alevin, Scanpy and Muon, and is implemented as a CWL workflow wrapping command-line tools encapsulated in Docker containers.

Requirements

Running the pipeline requires a CWL workflow execution engine and a container runtime; we recommend Docker and the cwltool reference implementation. cwltool is written in Python and can be installed into a sufficiently recent Python environment with pip install cwltool. Afterward, clone this repository, check out a tag, and invoke the pipeline as:

cwltool pipeline.cwl --fastq_dir_rna RNA_FASTQ_DIR --fastq_dir_adt ADT_FASTQ_DIR --adt_tsv ADT_BARCODE --assay ASSAY --threads THREADS
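
Before launching a long run, it can help to confirm that the required inputs exist. The following is a minimal POSIX sh sketch, not part of the pipeline: the function name check_citeseq_inputs is hypothetical, and it assumes FASTQ files are named *.fastq, *.fq, *.fastq.gz or *.fq.gz somewhere under each input directory.

```shell
#!/bin/sh
# Hypothetical preflight check (not part of HBM-CITEseq) for the required inputs.
# Assumption: FASTQ files are named *.fastq, *.fq, *.fastq.gz or *.fq.gz and may
# sit in subdirectories. Reports every problem on stderr; returns 1 if any exist.
check_citeseq_inputs() {
  rna_dir=$1
  adt_dir=$2
  adt_tsv=$3
  problems=0

  for dir in "$rna_dir" "$adt_dir"; do
    if [ ! -d "$dir" ]; then
      echo "missing FASTQ directory: $dir" >&2
      problems=1
    elif ! find "$dir" -type f \( -name '*.fastq' -o -name '*.fq' \
        -o -name '*.fastq.gz' -o -name '*.fq.gz' \) | grep -q .; then
      echo "no FASTQ files found under: $dir" >&2
      problems=1
    fi
  done

  # The ADT barcode table must exist and be non-empty.
  if [ ! -s "$adt_tsv" ]; then
    echo "missing or empty ADT barcode TSV: $adt_tsv" >&2
    problems=1
  fi

  return "$problems"
}
```

For example, check_citeseq_inputs RNA_FASTQ_DIR ADT_FASTQ_DIR ADT_BARCODE && cwltool pipeline.cwl --fastq_dir_rna RNA_FASTQ_DIR --fastq_dir_adt ADT_FASTQ_DIR --adt_tsv ADT_BARCODE --assay ASSAY --threads THREADS only starts cwltool when all three inputs pass the check.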

Supplementary information on inputs:

  • HTO data is an optional input for HBM-CITEseq. To supply HTO raw data and barcode information, extend the command with:
--fastq_dir_hto HTO_FASTQ_DIR --hto_tsv HTO_BARCODE
--trans_dir MAPPING_FILE_DIR --trans_filename MAPPING_FILE_NAME
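
The optional HTO inputs can be wired into a small wrapper around the documented command. This is a hypothetical POSIX sh sketch: run_citeseq, the HTO_FASTQ_DIR and HTO_BARCODE variables, and the CWLTOOL override are illustrative names, and the rule that both HTO inputs must be given together is an assumption. Any extra arguments, such as --trans_dir and --trans_filename, are passed through to cwltool unchanged.

```shell
#!/bin/sh
# Hypothetical wrapper around the documented cwltool invocation.
# Usage: run_citeseq RNA_DIR ADT_DIR ADT_TSV ASSAY THREADS [extra cwltool args...]
# Optional HTO inputs come from HTO_FASTQ_DIR and HTO_BARCODE (assumed to be
# required as a pair). Set CWLTOOL=echo to print the command instead of running it.
run_citeseq() {
  if [ "$#" -lt 5 ]; then
    echo "usage: run_citeseq RNA_DIR ADT_DIR ADT_TSV ASSAY THREADS [args...]" >&2
    return 2
  fi
  rna=$1 adt=$2 adt_tsv=$3 assay=$4 threads=$5
  shift 5

  # Count how many of the two HTO variables are set (0, 1 or 2).
  hto_count=0
  if [ -n "${HTO_FASTQ_DIR:-}" ]; then hto_count=$((hto_count + 1)); fi
  if [ -n "${HTO_BARCODE:-}" ]; then hto_count=$((hto_count + 1)); fi
  if [ "$hto_count" -eq 1 ]; then
    echo "set both HTO_FASTQ_DIR and HTO_BARCODE, or neither" >&2
    return 1
  fi

  # Rebuild the positional parameters: required inputs, pass-through args, then HTO.
  set -- pipeline.cwl --fastq_dir_rna "$rna" --fastq_dir_adt "$adt" \
    --adt_tsv "$adt_tsv" --assay "$assay" --threads "$threads" "$@"
  if [ "$hto_count" -eq 2 ]; then
    set -- "$@" --fastq_dir_hto "$HTO_FASTQ_DIR" --hto_tsv "$HTO_BARCODE"
  fi

  "${CWLTOOL:-cwltool}" "$@"
}
```

Rebuilding the argument list with set -- keeps each path a single argument even if it contains spaces, which a command assembled as one string would not.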

Possible issue

For HTO quantification, all entries in the output HTO expression matrix may be zero. This is caused by a malformed MTX file in Salmon's HTO quantification output. We have reported the issue here: COMBINE-lab/salmon#791
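
When the HTO matrix comes out empty, a quick look at the raw MTX file can show whether it is internally consistent. The sketch below is a generic MatrixMarket coordinate-format check, not a diagnosis of the specific bug in COMBINE-lab/salmon#791; the function name mtx_summary is hypothetical and you supply the path to the (optionally gzipped) MTX file. It reports the declared matrix size and entry count, the number of data lines actually present, how many values are nonzero, and how many indices fall outside the declared size.

```shell
#!/bin/sh
# Hypothetical diagnostic: summarize a MatrixMarket coordinate file (.mtx or .mtx.gz).
# Returns 0 if the declared entry count matches the data lines and all
# indices are in range, 1 otherwise.
mtx_summary() {
  case "$1" in
    *.gz) reader="gzip -cd" ;;
    *) reader="cat" ;;
  esac
  $reader "$1" | awk '
    /^%/ || NF == 0 { next }           # skip banner, comments and blank lines
    !have_size {                       # first remaining line: rows cols entries
      rows = $1 + 0; cols = $2 + 0; declared = $3 + 0; have_size = 1; next
    }
    {
      lines++
      if ($3 + 0 != 0) nonzero++
      if ($1 + 0 < 1 || $1 + 0 > rows || $2 + 0 < 1 || $2 + 0 > cols) bad++
    }
    END {
      printf "rows=%d cols=%d declared=%d data_lines=%d nonzero=%d out_of_range=%d\n",
        rows, cols, declared, lines, nonzero, bad
      if (declared != lines || bad > 0) exit 1
    }
  '
}
```

A file whose entries are all zero shows up as nonzero=0 even when the structural checks pass, while a truncated or mis-indexed file fails the return-code check.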

Environments

To avoid possible errors when reading files, HBM-CITEseq requires anndata >= 0.8.0 and mudata >= 0.2.0.
For background, see this GitHub issue: scverse/scanpy#1351
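
The minimum versions above can be checked from the shell before running the pipeline. This sketch assumes python3 is the interpreter that will read the outputs and that GNU sort (with -V for version ordering) is available; version_ge and check_py_pkg are hypothetical helpers, and sort -V orders plain dotted versions reliably but not pre-release tags.

```shell
#!/bin/sh
# version_ge A B: succeed if version A >= version B, using GNU "sort -V" ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_py_pkg PKG MIN: read PKG.__version__ via python3 and compare it with MIN.
check_py_pkg() {
  pkg=$1 min=$2
  if ! ver=$(python3 -c "import $pkg; print($pkg.__version__)" 2>/dev/null); then
    echo "$pkg: cannot be imported" >&2
    return 1
  fi
  if version_ge "$ver" "$min"; then
    echo "$pkg $ver (>= $min)"
  else
    echo "$pkg $ver is older than required $min" >&2
    return 1
  fi
}
```

For example, check_py_pkg anndata 0.8.0 && check_py_pkg mudata 0.2.0 succeeds only when both packages import and meet the stated minimums.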

Outputs

  • Multimodal data (stored with the muon package in .h5mu format)
  1. Consolidated per-cell expression for RNA, ADT and HTO (optional): mudata.h5mu
  2. Processed version of mudata.h5mu, including all downstream analysis results: citeseq_downstream.h5mu
  • Multi-omics integration result
  1. MOFA model: citeseq_mofa.hdf5
  • Embedding results
  1. Leiden clustering result on the RNA modality: leiden_cluster_rna.pdf
  2. Leiden clustering result on the ADT modality: leiden_cluster_adt.pdf
  3. Leiden clustering result on the joint modality: leiden_cluster_combined.pdf
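
After a run, a quick listing can confirm that the files above were produced. This hypothetical sketch assumes all six files land directly in one output directory (for example the directory passed to cwltool --outdir) and only checks that each exists and is non-empty; it does not open or validate the .h5mu or .hdf5 contents.

```shell
#!/bin/sh
# Hypothetical post-run check: list each expected output and whether it exists
# (non-empty) in OUTDIR. Assumes a flat output directory. Returns 1 if any is missing.
check_citeseq_outputs() {
  outdir=${1:-.}
  missing=0
  for f in mudata.h5mu citeseq_downstream.h5mu citeseq_mofa.hdf5 \
           leiden_cluster_rna.pdf leiden_cluster_adt.pdf leiden_cluster_combined.pdf; do
    if [ -s "$outdir/$f" ]; then
      echo "ok       $f"
    else
      echo "MISSING  $f"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```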
