This repository provides the full source code for the DendroSplit framework described in the paper "An Interpretable Framework for Clustering Single-Cell RNA-Seq Datasets" by Zhang, Fan, Fan, Rosenfeld, and Tse. It also contains the scripts necessary for reproducing the results in the paper. Please see this Bitbucket repository for the version of the package used and maintained by BD Genomics.
In our paper we analyzed 9 publicly available single-cell RNA-Seq datasets:
- Biase et al.: paper, data
- Yan et al.: paper, data
- Pollen et al.: paper, data
- Kolodziejczyk et al.: paper, data
- Patel et al.: paper, data
- Zeisel et al.: paper, data
- Macosko et al.: paper, data
- Birey et al.: paper, data
- Zheng et al.: paper, data
We also analyzed some synthetic datasets. Please see the Jupyter notebooks in the Figures directory for the code used to reproduce all the figures in the paper. Some wrapper code used in the notebooks is also provided.
For each dataset, processing requires 4 inputs, which are saved in the directory DATAPREFIX/ as:
- DATAPREFIX_expr.txt (or DATAPREFIX_expr.h5 for larger datasets): a matrix of gene/transcript expression values where the rows correspond to cells and the columns correspond to features
- DATAPREFIX_labels.txt: a set of labels for all the cells
- DATAPREFIX_features.txt: a set of feature names
- DATAPREFIX_reducedim_coor.txt: a 2D representation of the data for visualizing results
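As a rough illustration of the input layout described above, the sketch below loads the four text inputs for one dataset with NumPy and checks that their shapes agree. The helper name load_dataset is hypothetical and not part of the DendroSplit package. The sketch also assumes that the files are whitespace-delimited, that labels and feature names contain no spaces and appear one per line, and that the .h5 variant is not needed.

```python
import os
import numpy as np

def load_dataset(data_dir, prefix):
    """Load the four DATAPREFIX_* text inputs (hypothetical helper, not a DendroSplit function)."""
    def path(suffix):
        return os.path.join(data_dir, prefix + suffix)

    # Expression matrix: rows are cells, columns are features.
    X = np.loadtxt(path('_expr.txt'), ndmin=2)
    # One label per cell and one name per feature, read as strings.
    labels = np.loadtxt(path('_labels.txt'), dtype=str, ndmin=1)
    features = np.loadtxt(path('_features.txt'), dtype=str, ndmin=1)
    # 2D coordinates, used only for visualizing results.
    coords = np.loadtxt(path('_reducedim_coor.txt'), ndmin=2)

    # Consistency checks between the four inputs.
    assert X.shape[0] == len(labels) == coords.shape[0]
    assert X.shape[1] == len(features)
    assert coords.shape[1] == 2
    return X, labels, features, coords
```

For a dataset stored under a directory named after its prefix, a call might look like load_dataset('DATAPREFIX', 'DATAPREFIX').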
DendroSplit is written in Python 2.7 and has the following dependencies (Python modules):
- numpy (1.12.1)
- scipy (0.19.0)
- matplotlib (1.5.3)
- sklearn (0.18.1)
- networkx (1.11)
- community
The tutorial Jupyter notebook also uses tsne (0.1.7) and pandas (0.20.1) for preparing the example data.
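Because the versions listed above are fairly old, it can help to compare them against what is installed before running the notebooks. The following is a minimal sketch of such a check. The function names check_dependencies and print_report are hypothetical, and the sketch assumes each module exposes a __version__ attribute (modules without one are reported as 'unknown').

```python
import importlib

# Versions listed in this README; None means no version is specified.
EXPECTED = [
    ('numpy', '1.12.1'),
    ('scipy', '0.19.0'),
    ('matplotlib', '1.5.3'),
    ('sklearn', '0.18.1'),
    ('networkx', '1.11'),
    ('community', None),
]

def check_dependencies(expected=EXPECTED):
    """Return (module, expected, installed) tuples; installed is None if the import fails."""
    report = []
    for name, wanted in expected:
        try:
            module = importlib.import_module(name)
            found = getattr(module, '__version__', 'unknown')
        except ImportError:
            found = None
        report.append((name, wanted, found))
    return report

def print_report(report):
    """Print one line per module, flagging modules that could not be imported."""
    for name, wanted, found in report:
        status = 'missing' if found is None else found
        print('%-12s expected %-8s installed %s' % (name, wanted or 'any', status))
```

Calling print_report(check_dependencies()) prints one line per dependency. A version mismatch is not necessarily a problem, but it is a useful first thing to check if results differ from the paper.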
DendroSplit can be installed via pip:
pip install dendrosplit
Import DendroSplit by adding the following line of code to your Python script:
from dendrosplit import split, merge, utils
A tutorial for using the main DendroSplit functions is given in the tutorial Jupyter notebook. Please refer to the Jupyter notebooks used to generate the figures in the paper for more examples.
DendroSplit is licensed and distributed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.