
Mouse Atlas

jsxlei edited this page Sep 28, 2020 · 10 revisions

Tutorial on the Mouse Atlas dataset

SCALE supports scATAC-seq analysis of large datasets (>100k cells) with scalable running time. Such data are usually stored in a sparse format such as mtx. This tutorial shows how to run SCALE on data in this format.

Three files should be stored together in one folder:

  • count matrix: counts.mtx or counts.mtx.gz
  • peaks: peaks.txt
  • barcodes: barcodes.txt
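
Before running SCALE, it can help to check that the three files agree with each other. The following is a minimal sketch, not part of SCALE: the helper names write_toy_dataset and check_dataset, the peak name format, and the assumption that counts.mtx stores peaks as rows and cells as columns are all illustrative. If your matrix is stored cells x peaks, compare against the transposed shape instead.

```python
import tempfile
from pathlib import Path

import numpy as np
from scipy import io, sparse


def write_toy_dataset(folder: Path) -> None:
    """Write a tiny counts.mtx / peaks.txt / barcodes.txt triple (made-up data)."""
    folder.mkdir(parents=True, exist_ok=True)
    peaks = ["chr1_100_600", "chr1_900_1400", "chr2_50_550"]  # hypothetical names
    barcodes = ["cell_A", "cell_B"]                            # hypothetical names
    # Assumed orientation: rows are peaks, columns are cells.
    counts = sparse.coo_matrix(np.array([[1, 0], [0, 2], [3, 0]]))
    io.mmwrite(str(folder / "counts.mtx"), counts)
    (folder / "peaks.txt").write_text("\n".join(peaks) + "\n")
    (folder / "barcodes.txt").write_text("\n".join(barcodes) + "\n")


def check_dataset(folder: Path):
    """Load the three files and verify that their dimensions agree."""
    counts = io.mmread(str(folder / "counts.mtx")).tocsr()
    peaks = (folder / "peaks.txt").read_text().splitlines()
    barcodes = (folder / "barcodes.txt").read_text().splitlines()
    if counts.shape != (len(peaks), len(barcodes)):
        raise ValueError(
            f"matrix shape {counts.shape} does not match "
            f"{len(peaks)} peaks x {len(barcodes)} barcodes"
        )
    return counts.shape, len(peaks), len(barcodes)
```

Calling write_toy_dataset on an empty folder and then check_dataset on the same folder should return without raising; on real data, only check_dataset is needed.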

We define a class SingleCellDataset to load scATAC-seq data.

Get started

Download the mouse atlas dataset [Download]

Run SCALE (command line)

SCALE.py -d mouse_atlas -k 30 -x 0.04 --binary

Notes

  • We remove peaks that are accessible (value > 0) in fewer than 4% of all cells. About 20k peaks remain after filtering.
  • We recommend keeping 10-30k peaks for the default structure of SCALE.
  • If the data quality is very good and the original number of peaks is no more than 100,000, you can try keeping them all.
  • The imputed data can be very large because it is filled with float values, so we recommend converting it to binary with the --binary option.
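
To preview how many peaks a given threshold would keep, the filtering rule in the first note can be sketched outside SCALE. This is an illustration under stated assumptions, not SCALE's own implementation: filter_peaks is a hypothetical helper, the matrix is assumed to be cells x peaks, and a peak sitting exactly at the threshold is kept.

```python
import numpy as np
from scipy import sparse


def filter_peaks(counts, min_fraction=0.04):
    """Keep peaks accessible (value > 0) in at least min_fraction of cells.

    counts: sparse matrix, assumed to be cells x peaks.
    Returns the filtered matrix and the boolean mask of kept peaks.
    """
    counts = sparse.csr_matrix(counts)
    n_cells = counts.shape[0]
    # Number of cells with a positive value for each peak.
    cells_per_peak = np.asarray((counts > 0).sum(axis=0)).ravel()
    keep = cells_per_peak >= min_fraction * n_cells
    return counts[:, keep], keep


# Toy data: 100 cells, 3 peaks open in 1, 10 and 50 cells respectively.
dense = np.zeros((100, 3), dtype=int)
dense[:1, 0] = 1
dense[:10, 1] = 1
dense[:50, 2] = 2
filtered, keep = filter_peaks(sparse.csr_matrix(dense))
print(int(keep.sum()), "of", keep.size, "peaks kept")
```

Trying a few values of min_fraction this way shows which threshold brings the peak count into the recommended 10-30k range.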

Visualization and clustering

The t-SNE embedding is saved in tsne.txt and tsne.pdf, labeled by the cluster assignments in cluster_assignments.txt.

Select the peaks of interest and convert the sparse matrix into a dense matrix:

from scale.dataset import read_mtx
import pandas as pd

imputed, peaks, cell_id = read_mtx('output/binary_imputed/')
imputed = pd.DataFrame(imputed.toarray().T, index=peaks, columns=cell_id)

Then perform downstream analysis as in the Forebrain tutorial.
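
With more than 100k cells, densifying the full matrix can use a lot of memory, so it may be worth selecting peaks before calling toarray(). The sketch below assumes the same cells x peaks orientation that the snippet above implies by transposing; dense_subset and wanted_peaks are hypothetical names, and the toy matrix stands in for the output of read_mtx.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def dense_subset(matrix, peaks, cell_id, wanted_peaks):
    """Return a peaks x cells DataFrame restricted to wanted_peaks.

    matrix: sparse matrix, assumed to be cells x peaks.
    Peaks in wanted_peaks that are not present in peaks are skipped.
    """
    position = {peak: i for i, peak in enumerate(peaks)}
    cols = [position[p] for p in wanted_peaks if p in position]
    kept = [peaks[i] for i in cols]
    # Slice columns while still sparse, then densify only the small subset.
    sub = sparse.csc_matrix(matrix)[:, cols]
    return pd.DataFrame(sub.toarray().T, index=kept, columns=list(cell_id))


# Toy stand-in for (imputed, peaks, cell_id): 2 cells x 3 peaks.
toy = sparse.csr_matrix(np.array([[1, 0, 1], [0, 1, 1]]))
subset = dense_subset(toy, ["p1", "p2", "p3"], ["c1", "c2"], ["p3", "p1"])
print(subset)
```

Slicing in CSC form keeps column selection cheap, and only the selected peaks are ever held in dense form.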
