Enhancement and integration of peak signal enables accurate identification of cell type in scATAC-seq

mrcuizhe/svmATAC

svmATAC

Enhancement and imputation of peak signal enables accurate identification of cell type in scATAC-seq

svmATAC is fully open source and released under the MIT license.

Dependencies

  • R 3.6
  • Python 3.7
  • Cicero 3.11

Quick Start

The svmATAC pipeline consists of two steps:

  • 1. Pre-process:

    • If you want to reproduce the results of the svmATAC paper from the raw sequencing data (found in the preprocess/XXX/input folder), you will need this step.
    • If you want to reproduce the results of the svmATAC paper from our provided data (found in the XXX-dataset/XXX/input folder), or if your data are already merged into a labelled 0-1 (peak-by-cell) matrix, you can skip this step.
  • 2. Training-Classification:

    • The training and classification scripts are stored in the bin/ folder of each experiment. The chapters below describe how to use them to reproduce the results of the svmATAC paper.
    • For each experiment:
      • All scripts are in the bin/ folder.
        • The scripts are numbered and should be executed one by one, in numerical order.
        • Intermediate temporary files are written to the tmp/ folder and can be ignored.
      • All required input data are in the input/ folder.
      • All generated output data are stored in the output/ folder.
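The intended workflow is to run the numbered scripts by hand. If you prefer to automate it, the following hypothetical helper (not part of the repository) sketches one way to do so. It rests on three assumptions:

- each script name starts with its step number (for example `1_xxx.R`);
- `.R` files run with `Rscript`, `.py` files with the current Python interpreter and `.sh` files with `bash`;
- the scripts are launched from inside the experiment's `bin/` folder.

```python
import re
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence

# Assumed interpreter for each script type (hypothetical; adjust to the real bin/ contents).
INTERPRETERS = {
    ".R": ["Rscript"],
    ".r": ["Rscript"],
    ".py": [sys.executable],
    ".sh": ["bash"],
}


def step_number(name: str) -> int:
    """Return the leading step number of a script name, e.g. '10_train.py' -> 10."""
    match = re.match(r"\d+", name)
    if match is None:
        raise ValueError("script name has no leading number: " + name)
    return int(match.group(0))


def plan_commands(script_names: Sequence[str]) -> List[List[str]]:
    """Order numbered scripts numerically and pair each with its interpreter."""
    numbered = [name for name in script_names if re.match(r"\d+", name)]
    # sorted() is stable, so scripts sharing a number keep their input order.
    numbered = sorted(numbered, key=step_number)
    commands = []
    for name in numbered:
        suffix = Path(name).suffix
        if suffix not in INTERPRETERS:
            raise ValueError("no interpreter configured for " + name)
        commands.append(INTERPRETERS[suffix] + [name])
    return commands


def run_pipeline(bin_dir: Path) -> None:
    """Run every numbered script in bin_dir in order, stopping at the first failure."""
    names = [p.name for p in bin_dir.iterdir() if p.is_file()]
    for command in plan_commands(names):
        print("running:", " ".join(command))
        subprocess.run(command, cwd=str(bin_dir), check=True)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        run_pipeline(Path(sys.argv[1]))
    else:
        print("usage: python run_steps.py <experiment>/bin")
```

Sorting on the parsed integer keeps step 10 after step 9, which a plain string sort would not. `check=True` makes the run stop at the first failing step instead of continuing with stale intermediate files.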

Content

Pre-process

This chapter contains the scripts for processing raw data. It also contains the scripts that use Seurat to assign cell-type labels to the 10x PBMCs v1 and NextGem scATAC-seq data, based on labelled scRNA-seq data.

  • The scRNA-seq-5k-v3 dataset (folder 'scRNA-seq-5k-v3')

  • The scATAC-seq-5k-v1 dataset (folder 'scATAC-seq-5k-v1')

  • The scATAC-seq-5k-nextgem dataset (folder 'scATAC-seq-5k-nextgem')
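Pre-processing produces the 0-1 peak-by-cell matrix that the training step expects, as described in the Quick Start. The sketch below covers only the binarisation part and is not the repository's own code. It assumes the raw peak counts are stored in MatrixMarket coordinate format, a sparse text format commonly used for 10x Genomics count matrices. It also assumes peaks are rows and cells are columns. Any non-zero count becomes 1. For clarity it builds a dense matrix; real datasets with tens of thousands of peaks should stay sparse.

```python
from typing import Iterable, List


def read_binary_peak_matrix(lines: Iterable[str]) -> List[List[int]]:
    """Parse a MatrixMarket coordinate file into a dense 0-1 peak-by-cell matrix.

    Assumption: rows are peaks and columns are cells.
    """
    it = iter(lines)
    header = next(it, "")
    if not header.strip().lower().startswith("%%matrixmarket matrix coordinate"):
        raise ValueError("expected a MatrixMarket coordinate header")
    matrix = None  # allocated once the size line has been read
    for raw in it:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        fields = line.split()
        if matrix is None:
            # Size line: number of peaks, number of cells, number of stored entries.
            n_peaks, n_cells = int(fields[0]), int(fields[1])
            matrix = [[0] * n_cells for _ in range(n_peaks)]
            continue
        # Entries are 1-based; "pattern" files carry no value column.
        peak, cell = int(fields[0]) - 1, int(fields[1]) - 1
        count = float(fields[2]) if len(fields) > 2 else 1.0
        if count > 0:
            matrix[peak][cell] = 1  # any read count marks the peak as open in that cell
    if matrix is None:
        raise ValueError("missing size line")
    return matrix


if __name__ == "__main__":
    example = [
        "%%MatrixMarket matrix coordinate integer general",
        "% 3 peaks x 2 cells",
        "3 2 3",
        "1 1 4",
        "3 1 1",
        "2 2 2",
    ]
    print(read_binary_peak_matrix(example))  # [[1, 0], [0, 1], [1, 0]]
```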

Training-Classification

intra-dataset

This chapter contains the scripts for the intra-dataset experiments described in the manuscript.

  • The Corces2016 dataset (folder 'Corces2016')

  • The Buenrostro2018 dataset (folder 'Buenrostro2018')

  • The 10xPBMCsV1 dataset (folder '10xPBMCsV1')

  • The 10xPBMCsNextGem dataset (folder '10xPBMCsNextGem')

  • The 10xPBMCsV1-labeled dataset (folder '10xPBMCsV1_labeled')

  • The 10xPBMCsNextGem-labeled dataset (folder '10xPBMCsNextGem_labeled')
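Intra-dataset experiments train and test on cells from the same dataset. This README does not state how the cells are split. The following is therefore only an illustration of one common scheme, stratified k-fold cross-validation, in which every fold keeps roughly the same cell-type proportions as the full dataset. The function names and the optional seeded shuffle are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Optional, Sequence, Tuple


def stratified_folds(
    labels: Sequence[str], k: int, seed: Optional[int] = None
) -> List[List[int]]:
    """Split cell indices into k folds that preserve per-cell-type proportions."""
    if k < 2:
        raise ValueError("k must be at least 2")
    by_type: Dict[str, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_type[label].append(index)
    rng = random.Random(seed) if seed is not None else None
    folds: List[List[int]] = [[] for _ in range(k)]
    start = 0
    for label in sorted(by_type):
        members = by_type[label]
        if rng is not None:
            rng.shuffle(members)  # reproducible shuffle when a seed is given
        # Deal this cell type round-robin, continuing where the previous type
        # stopped, so that fold sizes differ by at most one.
        for offset, index in enumerate(members):
            folds[(start + offset) % k].append(index)
        start = (start + len(members)) % k
    return [sorted(fold) for fold in folds]


def cross_validation_splits(
    labels: Sequence[str], k: int, seed: Optional[int] = None
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = stratified_folds(labels, k, seed)
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, test


if __name__ == "__main__":
    cell_types = ["B", "B", "T", "T", "T", "Mono"]
    for train, test in cross_validation_splits(cell_types, k=2):
        print(train, test)
    # [1, 2, 4] [0, 3, 5]
    # [0, 3, 5] [1, 2, 4]
```

If `k` exceeds the number of cells of a rare type, some folds will contain no cells of that type. The seed makes the split reproducible across runs.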

inter-dataset

This chapter contains the scripts for the inter-dataset experiments described in the manuscript.

  • The 10xPBMCs-labeled dataset (folder '10xPBMCs_labeled')

  • The 10xPBMCs-unlabeled dataset (folder '10xPBMCs_unlabeled')
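In an inter-dataset experiment a model trained on one dataset is applied to another, so both datasets must be described by the same features. This README does not say how the inter-dataset scripts reconcile peak sets. The following minimal sketch rests on three assumptions:

- each peak is identified by a `chr:start-end` string;
- only exact matches count, a simplification since overlap-based matching is common in practice;
- both inputs are 0-1 peak-by-cell matrices.

Under these assumptions, it restricts both datasets to their shared peaks in a common order.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[int]]  # 0-1 values; rows are peaks, columns are cells


def align_shared_peaks(
    peaks_a: Sequence[str], matrix_a: Matrix, peaks_b: Sequence[str], matrix_b: Matrix
) -> Tuple[List[str], Matrix, Matrix]:
    """Keep only peaks present in both datasets, ordered as in dataset A."""
    if len(peaks_a) != len(matrix_a) or len(peaks_b) != len(matrix_b):
        raise ValueError("each peak name must label exactly one matrix row")
    row_a = {peak: i for i, peak in enumerate(peaks_a)}
    row_b = {peak: i for i, peak in enumerate(peaks_b)}
    if len(row_a) != len(peaks_a) or len(row_b) != len(peaks_b):
        raise ValueError("peak names must be unique within a dataset")
    shared = [peak for peak in peaks_a if peak in row_b]
    if not shared:
        raise ValueError("the two datasets share no peaks")
    # Reorder dataset B's rows to follow dataset A, so row i is the same peak in both.
    aligned_a = [list(matrix_a[row_a[peak]]) for peak in shared]
    aligned_b = [list(matrix_b[row_b[peak]]) for peak in shared]
    return shared, aligned_a, aligned_b


if __name__ == "__main__":
    shared, a, b = align_shared_peaks(
        ["chr1:100-600", "chr1:900-1400", "chr2:50-550"],
        [[1, 0], [0, 1], [1, 1]],
        ["chr2:50-550", "chr3:10-510", "chr1:100-600"],
        [[0, 1, 1], [1, 0, 0], [1, 1, 0]],
    )
    print(shared)  # ['chr1:100-600', 'chr2:50-550']
```

Dataset A's peak order serves as the reference. A classifier trained on the aligned labelled matrix can then score the aligned unlabelled matrix feature by feature.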
