Mouse Atlas
jsxlei edited this page Sep 28, 2020 · 10 revisions
SCALE supports scATAC-seq analysis of large datasets (>100k cells) with scalable running time. Such data are usually stored in a sparse format such as mtx. This page describes how to use SCALE with that format.
The following three files should be stored together in one folder:
- count matrix: counts.mtx or counts.mtx.gz
- peaks: peaks.txt
- barcodes: barcodes.txt
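Before running SCALE on a large dataset, it can help to confirm that the folder follows the layout above. The following is a minimal sketch using only the Python standard library; the helper name `missing_input_files` is hypothetical and is not part of SCALE.

```python
from pathlib import Path


def missing_input_files(folder):
    """Return the names of the required input files that are absent from `folder`.

    Hypothetical helper, not part of SCALE: it only checks file names,
    not the contents of the files.
    """
    folder = Path(folder)
    missing = []
    # The count matrix may be plain or gzip-compressed.
    if not any((folder / name).is_file() for name in ("counts.mtx", "counts.mtx.gz")):
        missing.append("counts.mtx or counts.mtx.gz")
    for name in ("peaks.txt", "barcodes.txt"):
        if not (folder / name).is_file():
            missing.append(name)
    return missing
```

Calling `missing_input_files("mouse_atlas")` returns an empty list when all three files are present.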
We define a class SingleCellDataset to load scATAC-seq data.
Download the mouse atlas dataset [Download]
SCALE.py -d mouse_atlas -k 30 -x 0.04 --binary
- We remove peaks that are accessible (value > 0) in fewer than 4% of all cells. About 20k peaks remain after filtering.
- We recommend keeping 10-30k peaks for the default structure of SCALE.
- If the data quality is very good and the original number of peaks is no more than 100,000, you can try to keep them all.
- The imputed data can be very large because it is filled with float values, so we recommend converting the imputed data to binary with the option [--binary].
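The peak filter described above can be illustrated on a toy example. This is a sketch of the stated rule only (keep a peak if it is accessible in at least the given fraction of cells), not SCALE's own implementation; the function name `filter_peaks` and the (peak, cell, value) triplet layout, which mirrors the coordinate entries of an mtx file, are assumptions made for the illustration.

```python
from collections import defaultdict


def filter_peaks(entries, n_cells, min_fraction=0.04):
    """Return the sorted indices of peaks accessible in >= min_fraction of cells.

    `entries` is an iterable of (peak_index, cell_index, value) triplets,
    i.e. the nonzero entries of a sparse peak-by-cell count matrix.
    """
    cells_per_peak = defaultdict(set)
    for peak, cell, value in entries:
        if value > 0:  # a peak is "accessible" in a cell when its count is positive
            cells_per_peak[peak].add(cell)
    threshold = min_fraction * n_cells
    return sorted(p for p, cells in cells_per_peak.items() if len(cells) >= threshold)


# Toy data with 50 cells: peak 0 is open in 3 cells (6%),
# peak 1 in 1 cell (2%), and peak 2 in 10 cells (20%).
toy_entries = (
    [(0, c, 1) for c in range(3)]
    + [(1, 7, 2)]
    + [(2, c, 1) for c in range(10)]
)
kept = filter_peaks(toy_entries, n_cells=50)
print(kept)  # [0, 2]
```

Peak 1 falls below the 4% cutoff and is dropped, while peaks 0 and 2 are kept.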
The t-SNE embedding is saved in tsne.txt and tsne.pdf, labeled by the cluster assignments in cluster_assignments.txt.
Select the peaks of interest and convert the sparse matrix into a dense matrix:
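from scale.dataset import read_mtx
import pandas as pd
# Load the binary imputed matrix together with its peak names and cell barcodes
imputed, peaks, cell_id = read_mtx('output/binary_imputed/')
# Densify and transpose so that rows are peaks and columns are cells
imputed = pd.DataFrame(imputed.toarray().T, index=peaks, columns=cell_id)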
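Converting to a dense matrix can require a lot of memory at this scale, so a rough estimate beforehand is useful. The sketch below is plain arithmetic; the helper name `dense_size_gib` is hypothetical, and the default of 8 bytes per value assumes a float64 array, which may differ from the dtype actually returned.

```python
def dense_size_gib(n_peaks, n_cells, bytes_per_value=8):
    """Approximate size in GiB of a dense n_peaks x n_cells array.

    bytes_per_value=8 assumes float64; use 4 for float32 or 1 for int8/bool.
    """
    return n_peaks * n_cells * bytes_per_value / 2**30


# About 20k peaks kept after filtering, with 100k cells as an example size
print(f"{dense_size_gib(20_000, 100_000):.1f} GiB")  # 14.9 GiB
```

If the estimate exceeds the available memory, restricting the matrix to the peaks of interest before the dense conversion reduces the size proportionally.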
Then perform downstream analysis as in the Forebrain tutorial.