This repository contains the benchmarking code for cellsnp-lite. Cellsnp-lite is implemented in C and performs per-cell genotyping, either with a given list of SNPs (mode 1) or without one (mode 2). In the latter case, heterozygous SNPs are detected automatically. Cellsnp-lite is applicable to both droplet-based platforms (e.g., 10x Genomics data) and well-based platforms (e.g., SMART-seq2 data).
See Table 1 of the preprint for a summary of all four options and example alternatives in each mode.
This repo includes six runs in the run/ directory, each for a specific benchmarking task. A wrapper script, benchmark.sh, is provided to make it easier to run a single task.
To use the repo, first clone it to your local machine:
git clone https://github.com/hxj5/csp-benchmark.git
Before running any benchmarking task, all required software and datasets must be installed and prepared. To do this, first check and modify config.sh, then follow the instructions in doc/software.rst and doc/dataset.rst.
Once the software and datasets are ready, you can run a single benchmarking task with the wrapper script benchmark.sh:
This script is a wrapper for benchmarking cellsnp-lite
Usage: ./benchmark.sh <mode> <action>
<mode> is the target mode for benchmarking, could be one of:
1a-demuxlet Demuxlet dataset with given SNPs
1a-souporcell Souporcell dataset with given SNPs
1b-cardelino Cardelino dataset with given SNPs
2a-souporcell Souporcell dataset without given SNPs
2b-cardelino Cardelino dataset without given SNPs
2b-souporcell Souporcell dataset (bulk mode) without given SNPs
<action> could be one of:
run Execute the run.sh to get time & memory usage
analysis Execute the stat_efficiency.sh and stat_accuracy.sh
Note:
Please make sure all software dependencies and datasets have
been installed and check config.sh before using this script
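Because the wrapper handles one mode per invocation, running all six tasks means calling it six times. The following is a minimal sketch of a helper that loops over the modes listed in the usage text. The function name run_all and the DRY_RUN variable are hypothetical and not part of the repo; the sketch assumes it is sourced from the repository root, where benchmark.sh lives.

```shell
# Hypothetical helper, not part of the repo: loop over every mode listed
# in the benchmark.sh usage text and apply the same action to each.
# By default (DRY_RUN=1) the commands are only printed; set DRY_RUN=0 to
# actually invoke ./benchmark.sh.

MODES="1a-demuxlet 1a-souporcell 1b-cardelino 2a-souporcell 2b-cardelino 2b-souporcell"

run_all() {
    action="$1"    # "run" or "analysis", as listed in the usage text
    for mode in $MODES; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Dry run: show the command that would be executed.
            echo "./benchmark.sh $mode $action"
        else
            # Stop at the first failing task.
            ./benchmark.sh "$mode" "$action" || return 1
        fi
    done
}
```

For example, `run_all run` prints the six commands that would be executed, and `DRY_RUN=0 run_all run` followed by `DRY_RUN=0 run_all analysis` would execute each task and then its analysis step. A single task can always be run directly, e.g. `./benchmark.sh 1a-demuxlet run`.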
The latest benchmarking results are now published in Bioinformatics.
(The benchmark results were initially described in the preprint and the corresponding scripts are in scripts/benchmark_v1.)