NcoderA edited this page Mar 9, 2015 · 2 revisions

Welcome to our project wiki!

Motivation

Individual cells differ from one another in important ways. Recent advances allow many molecular parameters, such as the expression of surface markers, to be measured at single-cell resolution. Analyzing, visualizing, and interpreting these data presents many challenges, but could lead to a better understanding of disease processes.

Problem statement

Given a large number of high-dimensional single-cell observations, we would like to:

  • Characterize rare cellular subtypes
    • How many cells are in the subtype?
    • What are the average values of observables within the subtype? How do they differ from those of other cells in the population?
  • Visualize all observations in 2D
  • Illustrate relations between subtypes
  • Summarize the full dataset in terms of distinct clusters and/or continuous spectra of variation, as appropriate
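
As a rough illustration of the first goal above, the sketch below counts the cells in one subtype and compares its per-marker averages with those of the rest of the population. It assumes subtype labels have already been assigned by some earlier step. The function name `summarize_subtype`, the marker names, and all values are made up for this example.

```python
from statistics import mean

def summarize_subtype(cells, labels, subtype):
    """Summarize one subtype: size, fraction, and per-marker mean shift.

    cells   : list of dicts mapping marker name -> measured value
    labels  : list of subtype labels, one per cell (assumed given)
    subtype : the label to characterize
    """
    inside = [c for c, l in zip(cells, labels) if l == subtype]
    outside = [c for c, l in zip(cells, labels) if l != subtype]
    if not inside or not outside:
        raise ValueError("need cells both inside and outside the subtype")
    markers = sorted(cells[0])
    mean_in = {m: mean(c[m] for c in inside) for m in markers}
    mean_out = {m: mean(c[m] for c in outside) for m in markers}
    return {
        "n_cells": len(inside),
        "fraction": len(inside) / len(cells),
        "mean_in": mean_in,
        "mean_out": mean_out,
        # Positive values: the marker is higher in the subtype than elsewhere.
        "difference": {m: mean_in[m] - mean_out[m] for m in markers},
    }

# Toy data: two hypothetical markers, five cells, one "rare" cell.
cells = [
    {"marker_a": 5.0, "marker_b": 0.0},
    {"marker_a": 4.0, "marker_b": 1.0},
    {"marker_a": 0.0, "marker_b": 6.0},
    {"marker_a": 1.0, "marker_b": 5.0},
    {"marker_a": 0.5, "marker_b": 9.0},
]
labels = ["x", "x", "y", "y", "rare"]
summary = summarize_subtype(cells, labels, "rare")
print(summary["n_cells"], summary["fraction"], summary["difference"])
```

A plain difference of means is only one possible summary. For skewed marker distributions, medians or a standardized effect size may be more informative.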

Related work

In addition to applications of generic algorithms like PCA, these new datasets have motivated the development or adaptation of domain-specific algorithms.

Spanning-Tree Progression Analysis of Density-Normalized Events (SPADE)

http://www.nature.com/nbt/journal/v29/n10/full/nbt.1991.html

Implementations

https://github.com/nolanlab/spade

Possible problems

  • Generates qualitatively different outputs when run multiple times on the same data (due to the stochastic down-sampling step)
  • The number of cellular subtypes identified is a user-defined parameter, not an output
  • Always returns a "progression tree", even when the cluster centers are, for example, mutually equidistant
  • The local density estimator used is non-standard and is a nonlinear function of actual local density
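
The "always returns a tree" concern can be seen with a toy example. The sketch below is not SPADE's code. It is a generic minimum spanning tree (Prim's algorithm) applied to three hypothetical cluster centers that are mutually equidistant. A tree still comes out, and which center ends up "in the middle" depends only on the order in which the centers are listed.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def minimum_spanning_tree(points):
    """Return MST edges (i, j) over points using Prim's algorithm."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [0]  # start from the first point
    edges = []
    while len(in_tree) < n:
        best = None
        # Find the shortest edge leaving the tree; ties keep the first found.
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                d = euclidean(points[i], points[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        edges.append((i, j))
        in_tree.append(j)
    return edges

# Three made-up cluster centers; every pair is exactly sqrt(2) apart.
centers = {"A": (1.0, 0.0, 0.0), "B": (0.0, 1.0, 0.0), "C": (0.0, 0.0, 1.0)}

def named_tree(order):
    """Build the MST for the centers listed in the given order."""
    edges = minimum_spanning_tree([centers[k] for k in order])
    return [(order[i], order[j]) for i, j in edges]

tree_1 = named_tree(["A", "B", "C"])  # A is connected to both others
tree_2 = named_tree(["C", "A", "B"])  # same data, now C is the hub
print(tree_1, tree_2)
```

Both trees are valid minimum spanning trees of the same points, so the apparent "progression" here reflects tie-breaking rather than structure in the data.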

viSNE

Dana Pe'er's lab used t-distributed stochastic neighbor embedding (re-branded as "viSNE") to visualize cytometry data in 2D.

Implementations

viSNE wraps Laurens van der Maaten's original C++ implementation, described in the paper "Barnes-Hut SNE" and available at: http://lvdmaaten.github.io/tsne/

Possible problems

  • t-SNE learns an embedding, not a map: it cannot be extended to new, out-of-sample observations
  • t-SNE is a force-based method and is sensitive to "resolution parameters": it can artificially create clusters if attractive forces between similar points are too strong relative to repulsive forces between dissimilar points (cf. http://www.pnas.org/content/108/41/16916.full)
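
To make the "force" picture concrete, the sketch below splits the standard t-SNE gradient, 4 Σ_j (p_ij − q_ij)(y_i − y_j) / (1 + ||y_i − y_j||²), into its attractive (p_ij) and repulsive (q_ij) parts for a tiny example. It is a plain-Python illustration, not the viSNE or Barnes-Hut code. The `exaggeration` factor is an assumption of this sketch, standing in for any setting that scales attraction relative to repulsion.

```python
def tsne_forces(P, Y, exaggeration=1.0):
    """Split the t-SNE gradient at each point into attractive and repulsive parts.

    P            : symmetric matrix of input-space probabilities p_ij
    Y            : current low-dimensional coordinates, one tuple per point
    exaggeration : hypothetical knob multiplying p_ij (scales attraction only)
    The full gradient at point i is attract[i] + repel[i]; gradient descent
    moves each point in the opposite direction.
    """
    n, dim = len(Y), len(Y[0])
    # Student-t kernel weights w_ij = 1 / (1 + ||y_i - y_j||^2)
    W = [[0.0 if i == j else
          1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(Y[i], Y[j])))
          for j in range(n)] for i in range(n)]
    Z = sum(map(sum, W))  # normalizer: q_ij = w_ij / Z sums to 1
    attract = [[0.0] * dim for _ in range(n)]
    repel = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            q = W[i][j] / Z
            for k in range(dim):
                diff = Y[i][k] - Y[j][k]
                attract[i][k] += 4.0 * exaggeration * P[i][j] * W[i][j] * diff
                repel[i][k] -= 4.0 * q * W[i][j] * diff
    return attract, repel

# Two points in 1D whose embedding already matches P (so q_ij == p_ij).
P = [[0.0, 0.5], [0.5, 0.0]]
Y = [(0.0,), (1.0,)]

balanced = tsne_forces(P, Y)                    # forces cancel
boosted = tsne_forces(P, Y, exaggeration=4.0)   # attraction dominates
net_balanced = balanced[0][0][0] + balanced[1][0][0]
net_boosted = boosted[0][0][0] + boosted[1][0][0]
print(net_balanced, net_boosted)
```

With balanced forces the net gradient on the first point is zero. Once attraction is scaled up, the net gradient becomes negative, so a descent step moves the first point toward the second even though the embedding already matched the input similarities.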

References for Autoencoder Implementation