scAVENGERS demultiplexes multiplexed snATAC-seq data by genotype, using variant information from the data itself rather than reference genotypes.
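The commands below download the v1.0.0 release archive, extract it, and create the conda environment with the dependencies. If the archive extracts to a versioned directory such as scAVENGERS-1.0.0, adjust the path in the last command accordingly.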
```bash
wget https://github.com/kaistcbfg/scAVENGERS/archive/refs/tags/v1.0.0.tar.gz
tar -xvzf v1.0.0.tar.gz
conda env create -f scAVENGERS/envs/environment.yml
```
You can set the pipeline parameters in a YAML config file. The parameters include the paths of the input and output data and the settings for each program in the pipeline. The format is provided in `config.yaml`.
scAVENGERS requires an indexed alignment file, an indexed reference genome file, and a text file listing the barcode sequences, one per line. The commands below run the whole demultiplexing pipeline.
```bash
conda activate scavengers
$scAVENGERS_directory/scAVENGERS pipeline --configfile config.yaml -j $THREADS
```
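Before launching the pipeline, it can help to confirm that the three required inputs are in place. The sketch below is a minimal Python check and is not part of scAVENGERS. The function name `check_inputs` is hypothetical, and the index file names are assumptions: it expects the alignment index next to the alignment file as `.bai` or `.csi` (as written by `samtools index`) and the reference index as `.fai` (as written by `samtools faidx`).

```python
from pathlib import Path


def check_inputs(alignment, reference, barcodes):
    """Return a list of problems found with the pipeline inputs (empty if none)."""
    problems = []
    alignment, reference, barcodes = Path(alignment), Path(reference), Path(barcodes)

    # All three input files must exist.
    for path in (alignment, reference, barcodes):
        if not path.is_file():
            problems.append(f"missing file: {path}")

    # Assumption: the alignment index sits next to the alignment file and is
    # named sample.bam.bai, sample.bam.csi, or sample.bai.
    index_candidates = [
        Path(str(alignment) + ".bai"),
        Path(str(alignment) + ".csi"),
        alignment.with_suffix(".bai"),
    ]
    if not any(candidate.is_file() for candidate in index_candidates):
        problems.append(f"no index found for alignment: {alignment}")

    # Assumption: the reference index is named genome.fa.fai.
    if not Path(str(reference) + ".fai").is_file():
        problems.append(f"no .fai index found for reference: {reference}")

    # The barcode list should contain at least one non-empty line.
    if barcodes.is_file():
        lines = barcodes.read_text().splitlines()
        if not any(line.strip() for line in lines):
            problems.append(f"barcode list is empty: {barcodes}")

    return problems
```

Calling `check_inputs("sample.bam", "genome.fa", "barcodes.txt")` (placeholder file names) returns an empty list when nothing is missing, or one message per problem otherwise.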
Running the scAVENGERS pipeline produces a tab-delimited result file, `clusters.tsv`, with the columns described below. The format is compatible with the cluster result file generated by souporcell (https://github.com/wheaton5/souporcell).
column name | description |
---|---|
barcode | barcode sequence |
status | status of the barcode sequence: singlet or doublet |
assignment | cluster number where the barcode is assigned. Doublets are expressed in the form {n}/{n}. |
log_prob_singleton | log singleton probability calculated by troublet |
log_prob_doublet | log doublet probability calculated by troublet |
cluster{n} | log likelihood of the assignment on cluster n. |
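
Because `clusters.tsv` is plain tab-delimited text, it can be summarized with a few lines of Python. The sketch below is an illustration and not part of scAVENGERS. It assumes the file starts with a header row using the column names listed above, and the function name `summarize_clusters` is hypothetical. It counts barcodes per status and singlet barcodes per cluster.

```python
import csv
from collections import Counter


def summarize_clusters(path):
    """Count barcodes per status and singlet barcodes per assigned cluster."""
    status_counts = Counter()
    singlets_per_cluster = Counter()
    with open(path, newline="") as handle:
        # Assumption: the first row holds the column names from the table above.
        for row in csv.DictReader(handle, delimiter="\t"):
            status_counts[row["status"]] += 1
            # Doublets carry an assignment of the form "0/1", so only
            # singlets are tallied per cluster.
            if row["status"] == "singlet":
                singlets_per_cluster[row["assignment"]] += 1
    return status_counts, singlets_per_cluster
```

Calling `summarize_clusters("clusters.tsv")` returns two `Counter` objects, one keyed by status and one keyed by cluster number (as a string).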
To try scAVENGERS on the example dataset, first download and unpack the data by running the commands below.
```bash
wget http://junglab.kaist.ac.kr/Dataset/scAVENGERS_example.tar.gz
tar -xvf scAVENGERS_example.tar.gz
cd scAVENGERS_example
gzip -d genome.fa.gz
```
Then run the whole pipeline with the command below:
```bash
$scAVENGERS_directory/scAVENGERS pipeline -j 10 --configfile config.yaml
```
Seungbeom Han, Kyukwang Kim, Seongwan Park, Andrew J Lee, Hyonho Chun, Inkyung Jung, scAVENGERS: a genotype-based deconvolution of individuals in multiplexed single-cell ATAC-seq data without reference genotypes, NAR Genomics and Bioinformatics, Volume 4, Issue 4, December 2022, lqac095, https://doi.org/10.1093/nargab/lqac095
For further information, please refer to the documentation at https://scavengers.readthedocs.io/en/latest/.