Enhancement and imputation of peak signal enables accurate identification of cell type in scATAC-seq
svmATAC is fully open source and released under the MIT license.
- R 3.6
- Python 3.7
- Cicero 3.11
The svmATAC pipeline consists of two steps:
1. Pre-process:
- If you want to reproduce the results in the svmATAC paper from raw sequencing data (found in the preprocess/XXX/input folder), you need this step.
- If you want to reproduce the results in the svmATAC paper using our provided data (found in the XXX-dataset/XXX/input folder), or your data are already merged into a labelled 0-1 matrix (peak-cell), you may skip this step.
2. Training and classification:
- The training and classification scripts are stored in the bin/ folder of each experiment. You can run these scripts to reproduce the results in the svmATAC paper by following these chapters:
- intra-dataset experiment
- Corces2016
- Buenrostro2018
- 10xPBMCsV1
- 10xPBMCsNextGem
- 10xPBMCsV1-labeled
- 10xPBMCsNextGem-labeled
- inter-dataset experiment
- 10xPBMCs-labeled
- 10xPBMCs-unlabeled
- intra-dataset experiment
- For each single experiment:
- All the scripts are available in the bin/ folder
- All scripts are numbered; run them one at a time, in numeric order.
- Intermediate temporary files are stored in the tmp/ folder; you can ignore them.
- All required input data are available in the input/ folder
- All generated output data are stored in the output/ folder
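The "run the numbered scripts one by one" rule can be sketched in Python as below. This is only an illustration, not part of svmATAC: the file-name pattern (a leading number), the extension-to-interpreter mapping, and the helper names `numbered_scripts` and `run_experiment` are all assumptions, as is running each script from the experiment folder.

```python
import re
import subprocess
from pathlib import Path
from typing import List

# Hypothetical mapping from file extension to interpreter; adjust to the scripts you have.
INTERPRETERS = {".py": ["python3"], ".R": ["Rscript"], ".sh": ["bash"]}


def numbered_scripts(bin_dir: Path) -> List[Path]:
    """Return scripts whose names start with digits, sorted by that number."""
    found = []
    for path in bin_dir.iterdir():
        match = re.match(r"(\d+)", path.name)
        if path.is_file() and match and path.suffix in INTERPRETERS:
            # Sort on the integer prefix so that 10 comes after 2, not before it.
            found.append((int(match.group(1)), path.name, path))
    return [path for _, _, path in sorted(found)]


def run_experiment(experiment_dir: Path, dry_run: bool = True) -> List[List[str]]:
    """Run (or, with dry_run, only list) the commands for one experiment folder."""
    commands = []
    for script in numbered_scripts(experiment_dir / "bin"):
        command = INTERPRETERS[script.suffix] + [str(script.resolve())]
        commands.append(command)
        if not dry_run:
            # Assumption: scripts are launched from the experiment folder.
            # check=True stops at the first failing script.
            subprocess.run(command, cwd=experiment_dir, check=True)
    return commands
```

Calling `run_experiment(Path("some-experiment"))` with the default `dry_run=True` only returns the commands in execution order, which is a cheap way to confirm the ordering before running anything.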
This chapter stores the scripts for processing raw data.
- The Corces2016 dataset (folder 'Corces2016')
- The Buenrostro2018 dataset (folder 'Buenrostro2018')
- The 10xPBMCsV1 dataset (folder '10xPBMCsV1')
- The 10xPBMCsNextGem dataset (folder '10xPBMCsNextGem')
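The later steps expect a labelled 0-1 peak-cell matrix. The sketch below shows one way such a matrix could be derived from raw counts and checked against cell labels. It is not the repository's own pre-processing code: the helper names `binarize` and `check_labels`, the row-per-peak orientation, and the toy values are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def binarize(counts: Sequence[Sequence[float]]) -> List[List[int]]:
    """Turn a peak-by-cell count matrix into a 0-1 accessibility matrix."""
    return [[1 if value > 0 else 0 for value in row] for row in counts]


def check_labels(
    matrix: Sequence[Sequence[int]],
    cell_ids: Sequence[str],
    labels: Dict[str, str],
) -> None:
    """Raise ValueError if the matrix columns and the cell labels do not line up."""
    for row in matrix:
        if len(row) != len(cell_ids):
            raise ValueError("every peak row needs one entry per cell")
    missing = [cell for cell in cell_ids if cell not in labels]
    if missing:
        raise ValueError(f"cells without a label: {missing}")


# Toy data: 3 peaks (rows) x 2 cells (columns); the values and labels are made up.
counts = [[0, 3], [1, 0], [0, 0]]
cells = ["cell_a", "cell_b"]
labels = {"cell_a": "Mono", "cell_b": "CLP"}

matrix = binarize(counts)
check_labels(matrix, cells, labels)
print(matrix)  # [[0, 1], [1, 0], [0, 0]]
```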
This chapter stores the scripts that use Seurat to transfer labels from labeled scRNA-seq data to the 10x PBMCs v1 and NextGem scATAC-seq data.
- The scRNA-seq-5k-v3 dataset (folder 'scRNA-seq-5k-v3')
- The scATAC-seq-5k-v1 dataset (folder 'scATAC-seq-5k-v1')
- The scATAC-seq-5k-nextgem dataset (folder 'scATAC-seq-5k-nextgem')
This chapter stores the scripts for the intra-dataset experiments described in the manuscript.
- The Corces2016 dataset (folder 'Corces2016')
- The Buenrostro2018 dataset (folder 'Buenrostro2018')
- The 10xPBMCsV1 dataset (folder '10xPBMCsV1')
- The 10xPBMCsNextGem dataset (folder '10xPBMCsNextGem')
- The 10xPBMCsV1-labeled dataset (folder '10xPBMCsV1_labeled')
- The 10xPBMCsNextGem-labeled dataset (folder '10xPBMCsNextGem_labeled')
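The manuscript defines the exact evaluation protocols. The sketch below only illustrates a common reading of the two experiment types, under two stated assumptions: an intra-dataset experiment holds out part of a single dataset (here via k-fold splitting), and an inter-dataset experiment trains on one dataset and classifies another, which requires restricting both to shared peaks. The function names `intra_dataset_folds` and `shared_peaks` are hypothetical and do not come from the repository.

```python
from typing import Iterator, List, Sequence, Tuple


def intra_dataset_folds(n_cells: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for k-fold evaluation inside one dataset."""
    if not 2 <= k <= n_cells:
        raise ValueError("k must be between 2 and the number of cells")
    indices = list(range(n_cells))
    for fold in range(k):
        test_idx = indices[fold::k]  # every k-th cell, offset by the fold number
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, test_idx


def shared_peaks(train_peaks: Sequence[str], test_peaks: Sequence[str]) -> List[str]:
    """Peaks present in both datasets, in the order used by the training set."""
    available = set(test_peaks)
    return [peak for peak in train_peaks if peak in available]


# Intra-dataset: 5 cells split into 2 folds.
for train_idx, test_idx in intra_dataset_folds(5, 2):
    print("train", train_idx, "test", test_idx)

# Inter-dataset: keep only peaks that both datasets report (peak names are made up).
print(shared_peaks(["chr1:1-10", "chr1:20-30", "chr2:5-9"], ["chr2:5-9", "chr1:1-10"]))
```

In a real cell-type setting a stratified split that preserves label proportions in each fold would usually be preferable; the plain interleaved split here is kept deliberately simple.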
This chapter stores the scripts for the inter-dataset experiments described in the manuscript.