jlmelville/sneer

sneer

Stochastic Neighbor Embedding Experiments in R

Note: This package is unlikely to see further major updates, but much of it lives on in smallvis.

An R package for experimenting with dimensionality reduction techniques, including the popular t-Distributed Stochastic Neighbor Embedding (t-SNE).

Installing

# install.packages("devtools")
devtools::install_github("jlmelville/sneer")

Documentation

package?sneer
# the sneer function can carry out many kinds of embedding
?sneer

Also see the (currently under-construction) documentation web pages for a more detailed explanation.

Examples

# t-SNE on the iris dataset:
res <- sneer(iris)
# then do what you want with the embedded coordinates in res$coords

# By default, sneer runs t-SNE and automatically looks for numeric columns and
# a factor column to color points with, but you can be specific:
res <- sneer(iris[, 1:4], labels = iris$Species, method = "tsne", 
             scale_type = "tsne", opt = "tsne", init = "r", 
             exaggerate = 4, exaggerate_off_iter = 100,
             perplexity = 25)
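
As one way of "doing what you want" with the embedded coordinates, here is a minimal base R sketch for plotting them colored by class. The helper name plot_embedding is hypothetical (it is not part of sneer), and it assumes that res$coords is a numeric matrix with one row per observation and at least two columns.

```r
# Hypothetical helper (not part of sneer): scatter plot of the first two
# embedded dimensions, colored by a factor of labels.
plot_embedding <- function(coords, labels) {
  coords <- as.matrix(coords)
  labels <- as.factor(labels)
  # Assumption: one row of coordinates per label, and at least two columns
  stopifnot(ncol(coords) >= 2, nrow(coords) == length(labels))
  plot(coords[, 1], coords[, 2], col = as.integer(labels), pch = 19,
       xlab = "Dimension 1", ylab = "Dimension 2")
  legend("topright", legend = levels(labels),
         col = seq_along(levels(labels)), pch = 19)
  # Return the plotted coordinates invisibly so the call can be chained
  invisible(coords[, 1:2, drop = FALSE])
}

# Usage, assuming res comes from one of the sneer calls above:
# plot_embedding(res$coords, iris$Species)
```

The colors come from the integer codes of the factor levels, so the legend and the points use the same palette entries.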

There is a section of the documentation that has (many) more examples.

Motivation

There are a lot of dimensionality reduction techniques out there, and many that take inspiration from t-SNE, but understanding what makes them work (or not) is complicated by the differences in dataset preparation, preprocessing, output initialization, optimization, and other heuristics.

Sneer is my attempt to write a package that not only provides a way to run multiple embedding algorithms with complete control over all the various twiddly bits, but also exposes lots of those twiddly bits to twiddle on, if that is what you want to do (and I do).

Its basic code was based heavily on Justin Donaldson's tsne R package, but is now mangled so far beyond its original form that I've made it a separate project rather than a fork. It does, however, inherit its license (GPL-2 or later).

Features

Currently sneer offers:

Limitations and Issues

  • It's in pure R, so it's slow.
  • It doesn't implement any of the Barnes-Hut or multipole or related approaches to speed up the distance calculations from O(N^2), so it's slow.
  • It doesn't work with sparse matrices... so it's slow and it can't work with large datasets.

Consider this package designed for experimenting on smaller datasets, not production-readiness.
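
To give a rough feel for why the O(N^2) scaling limits dataset size, here is a small base R sketch. It is only an illustration and not how sneer computes distances internally. The helper name pairwise_cost is hypothetical, and the memory figure assumes a dense N x N matrix of 8-byte doubles.

```r
# Hypothetical helper (not part of sneer): number of distinct pairwise
# distances and approximate memory for a dense N x N double matrix.
pairwise_cost <- function(n) {
  data.frame(
    n = n,
    n_pairs = n * (n - 1) / 2,
    dense_matrix_mb = n^2 * 8 / 1024^2  # assumes 8 bytes per entry
  )
}

# Base R's dist() computes every pairwise Euclidean distance for iris
d <- dist(iris[, 1:4])
length(d)  # 150 * 149 / 2 = 11175

# How the cost grows as the number of observations increases
pairwise_cost(c(150, 1000, 10000, 100000))
```

Multiplying the number of observations by ten multiplies both the number of pairs and the dense matrix size by roughly one hundred.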

Also, fitting everything I wanted to do into one package has involved splitting everything up into lots of little functions, so good luck finding where anything actually gets done. Thus, its pedagogical value is negligible, unless you were looking for an insight into my questionable design, naming and decision-making skills. But this is a hobby project, so I get to make it as over-engineered as I want.

See also

I have some other packages that create or download datasets often used in SNE-related research:

Acknowledgements

I reverse-engineered some specifics of the Spectral Directions gradient by translating the relevant part of the Matlab implementation provided on the Carreira-Perpiñán group's software page. Professor Carreira-Perpiñán kindly agreed to allow the resulting R code to be under the GPL license of this package. Obviously, assume that any mistakes, errors or resulting destruction of your computer are due to bugs in sneer.

License

GPLv2 or later. The optimization part of sneer is provided by the mize package, which is available under the BSD 2-Clause license.
