Home

Welcome to our project wiki!

Motivation

Individual cells differ from one another in important ways. Recent advances allow many molecular parameters, such as the expression of surface markers, to be measured at single-cell resolution. Analyzing, visualizing, and interpreting these data presents many challenges, but could lead to a better understanding of disease processes.

Problem statement

Given a large number of high-dimensional single-cell observations, we would like to:

- Characterize rare cellular subtypes
  - How many cells are in the subtype?
  - What are the average values of observables within the subtype? How do they differ from those of other cells in the population?
- Visualize all observations in 2D
- Illustrate relations between subtypes
- Summarize the full dataset in terms of distinct clusters and/or continuous spectra of variation, as appropriate
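The two subtype questions above (size, and average observables compared with the rest of the population) can be sketched in a few lines of Python. This is a minimal illustration, not part of the project code: the function name `summarize_subtype`, the marker names, and the toy values are all hypothetical, and the subtype labels are assumed to come from some upstream clustering or gating step.

```python
from statistics import mean

def summarize_subtype(cells, labels, subtype):
    """Return size, fraction, and per-marker mean differences for one subtype.

    cells  : list of dicts mapping marker name -> measured value (one per cell)
    labels : list of subtype labels, one per cell (assumed given)
    """
    inside = [c for c, lab in zip(cells, labels) if lab == subtype]
    outside = [c for c, lab in zip(cells, labels) if lab != subtype]
    if not inside or not outside:
        raise ValueError("need at least one cell inside and outside the subtype")

    summary = {
        "n_cells": len(inside),
        "fraction": len(inside) / len(cells),
        "markers": {},
    }
    for marker in sorted(cells[0]):
        mean_in = mean(c[marker] for c in inside)
        mean_out = mean(c[marker] for c in outside)
        summary["markers"][marker] = {
            "mean_in": mean_in,
            "mean_out": mean_out,
            "difference": mean_in - mean_out,
        }
    return summary

# Toy data: hypothetical marker names and values, for illustration only.
cells = [
    {"CD3": 5.0, "CD19": 0.5},
    {"CD3": 4.0, "CD19": 0.5},
    {"CD3": 0.5, "CD19": 6.0},
    {"CD3": 4.5, "CD19": 1.0},
]
labels = ["T", "T", "B", "T"]

result = summarize_subtype(cells, labels, "B")
print(result["n_cells"], result["fraction"])
print(result["markers"]["CD3"]["difference"])
```

A difference of means is only the simplest possible comparison; for rare subtypes with few cells, a measure that accounts for within-group spread would likely be more informative.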

Related work

In addition to applying generic algorithms such as PCA, researchers have developed or adapted domain-specific algorithms for these new datasets.

Spanning-Tree Progression Analysis of Density-Normalized Events (SPADE)

http://www.nature.com/nbt/journal/v29/n10/full/nbt.1991.html

Implementations

https://github.com/nolanlab/spade

Possible problems

- Generates qualitatively different outputs when run multiple times on the same data, because of the stochastic down-sampling step
- The number of cellular subtypes identified is a user-defined parameter, not an output
- Always returns a "progression tree", even when the cluster centers are, for example, mutually equidistant
- The local density estimator is non-standard and is a nonlinear function of the actual local density
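To make the first point concrete, here is a small sketch of density-dependent down-sampling in general. It is not SPADE's actual estimator or sampling rule: the neighbor-count density, the keep probability `min(1, target_density / density)`, and all names and parameter values are simplifying assumptions chosen for illustration. It shows only that the retained subset depends on the random seed.

```python
import random

def local_density(points, radius):
    """For each point, count the points within `radius` (L1 distance), itself included."""
    def l1(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))
    return [sum(1 for q in points if l1(p, q) <= radius) for p in points]

def downsample(points, radius, target_density, seed):
    """Keep each point with probability min(1, target_density / density).

    Returns the indices of the retained points. This rule is an assumption
    made for this sketch, not the rule used by any particular tool.
    """
    rng = random.Random(seed)
    densities = local_density(points, radius)
    return [
        i for i, d in enumerate(densities)
        if rng.random() < min(1.0, target_density / d)
    ]

# Toy data: one dense blob of 200 points plus 5 isolated points.
data_rng = random.Random(0)
dense = [(data_rng.gauss(0, 0.1), data_rng.gauss(0, 0.1)) for _ in range(200)]
sparse = [(5.0 + i, 5.0) for i in range(5)]
points = dense + sparse

kept_a = downsample(points, radius=0.5, target_density=10, seed=1)
kept_b = downsample(points, radius=0.5, target_density=10, seed=2)
kept_a_again = downsample(points, radius=0.5, target_density=10, seed=1)

print(len(kept_a), len(kept_b))
print(kept_a == kept_b, kept_a == kept_a_again)
```

In this sketch the isolated points are always retained, while the retained members of the dense blob change from seed to seed. Any clustering or tree construction run on the retained points inherits that variability unless the seed is fixed.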

viSNE

Dana Pe'er's lab used t-distributed stochastic neighbor embedding (t-SNE), re-branded as "viSNE", to visualize cytometry data in 2D.

Implementations

viSNE wraps Laurens van der Maaten's original C++ implementation, described in the paper "Barnes-Hut SNE" and available at: http://lvdmaaten.github.io/tsne/

Possible problems

- t-SNE learns an embedding, not a map, so it cannot be extended to new out-of-sample observations
- t-SNE is a force-based method and is sensitive to "resolution parameters": it can artificially create clusters if attractive forces between similar points are too strong relative to repulsive forces between dissimilar points (cf. http://www.pnas.org/content/108/41/16916.full)
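The force-based reading can be made explicit. In the notation of the "Barnes-Hut SNE" paper, where p_ij are the input-space similarities, q_ij the embedding-space similarities, y_i the embedded points, and Z the normalization constant of the q_ij, the gradient of the t-SNE cost C splits into an attractive and a repulsive sum:

```latex
\frac{\partial C}{\partial y_i}
  = 4 \sum_{j \neq i} (p_{ij} - q_{ij}) \, q_{ij} Z \, (y_i - y_j)
  = 4 \left( F_{\mathrm{attr}} - F_{\mathrm{rep}} \right),
\qquad
F_{\mathrm{attr}} = \sum_{j \neq i} p_{ij} \, q_{ij} Z \, (y_i - y_j),
\quad
F_{\mathrm{rep}} = \sum_{j \neq i} q_{ij}^2 Z \, (y_i - y_j),
\qquad
q_{ij} Z = \left( 1 + \lVert y_i - y_j \rVert^2 \right)^{-1}
```

Gradient descent moves each y_i against this gradient, so the F_attr term pulls points with large p_ij together and the F_rep term pushes all pairs apart. Anything that scales the p_ij relative to the q_ij changes the balance between the two terms, which is one way to read the sensitivity described above.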

References for Autoencoder Implementation

Theano Homepage: http://deeplearning.net/software/theano/index.html
Hinton Science Paper: http://www.cs.toronto.edu/~hinton/science.pdf
Hinton Training RBMs Paper: https://www.cs.toronto.edu/~hinton/absps/guideTR.pdf
deeplearning.net Theano RBM tutorial: http://deeplearning.net/tutorial/rbm.html
