HBM-CITEseq: HuBMAP CITE-seq pipeline

Overview

HBM-CITEseq is an extension for HuBMAP scRNA-seq pipeline (https://github.com/hubmapconsortium/salmon-rnaseq) to process CITE-seq data. It is built on Alevin, Scanpy and Muon, and is implemented as a CWL workflow wrapping commamd-line tools encapsulated in Docker containers.

Requirements

Running the pipeline requires a CWL workflow execution engine and container runtime; we recommend Docker and the cwltool reference implementation. cwltool is written in Python and can be installed into a sufficiently recent Python environment with pip install cwltool. Afterward, clone this repository, check out a tag, and invoke the pipeline as:

cwltool pipeline.cwl --fastq_dir_rna RNA_FASTQ_DIR --fastq_dir_adt ADT_FASTQ_DIR --adt_tsv ADT_BARCODE --assay ASSAY --threads THREADS

Supplementary info for input:

HTO data is an optional input for HBM-CITEseq. Extend the command to input HTO raw data and barcode information:

--fastq_dir_hto HTO_FASTQ_DIR --hto_tsv HTO_BARCODE

In order to quantify ADT and HTO data, we need to index the feature barcodes. The indexing provided by salmon should base on a tab separated file (tsv format), where each line represents one reference, id and the reference sequence are separated by the tab. We provided the ADT_BARCODE file for TotalSeq-A/B/C under /adt_barcode and the HTO_BARCODE file for TotalSeq-A/B/C under /hto_barcode. The first column is in the format of 'Description_Clone'. (https://www.biolegend.com/en-us/totalseq/barcode-lookup serves as a reference.)
If you want to specify your own barcode information, example ADT_BARCODE and HTO_BARCODE files can be found under example_data.
If the feature barcoding protocol use TotalSeq B or C, the cellular barcodes of RNA and the feature barcodes of ADT/HTO might not exactly be the same. Transformation needs to be performed based on a mapping file.
To download the mapping file: https://github.com/10XGenomics/cellranger/raw/master/lib/python/cellranger/barcodes/translation/3M-february-2018.txt.gz
More information from 10XGENOMICS: https://kb.10xgenomics.com/hc/en-us/articles/360031133451-Why-is-there-a-discrepancy-in-the-3M-february-2018-txt-barcode-whitelist-
To perform the mapping, extend the original command by:

--trans_dir MAPPING_FILE_DIR --trans_filename MAPPING_FILE_NAME

Possible issue

For the HTO quantification, all entries in the output HTO expression matrix might be zero. This is due to the malformed MTX file in HTO quantification output of Salmon. We've reported the issue here: COMBINE-lab/salmon#791

Environments

To avoid possible reading errors, HBM-CITEseq requires anndata >= 0.8.0 and mudata >= 0.2.0.
More information from Github: scverse/scanpy#1351

Outputs

Multimodal data (stored using muon package in .h5mu format)

Consolidated expression per cell for RNA, ADT and HTO(optional): mudata.h5mu
Processed version of mudata.h5mu (including all downstream analysis information): citeseq_downstream.h5mu

Multi-omics integration result

MOFA model: citeseq_mofa.hdf5

Embedding results

Leiden clustering result on rna modality: leiden_cluster_rna.pdf
Leiden clustering result on adt modality: leiden_cluster_adt.pdf
Leiden clustering result on joint modality: leiden_cluster_combined.pdf

Name		Name	Last commit message	Last commit date
Latest commit History 41 Commits
adt_barcode		adt_barcode
bin		bin
docker/salmon		docker/salmon
example_data		example_data
hto_barcode		hto_barcode
salmon-rnaseq @ 21de89f		salmon-rnaseq @ 21de89f
steps		steps
.gitignore		.gitignore
.gitmodules		.gitmodules
.pre-commit-config.yaml		.pre-commit-config.yaml
LICENSE		LICENSE
README.md		README.md
docker_images.txt		docker_images.txt
pipeline.cwl		pipeline.cwl
pyproject.toml		pyproject.toml

License

hubmapconsortium/citeseq-pipeline

Folders and files

Latest commit

History

Repository files navigation

HBM-CITEseq: HuBMAP CITE-seq pipeline

Overview

Requirements

Possible issue

Environments

Outputs

About

Resources

License

Stars

Watchers

Forks

Languages