There are a number of alternatives to choose from when you need to reduce the dimensionality of a large data set (for visualization, etc.). This repository was put together to allow easy head-to-head evaluation of several tsne and umap implementations, two of the top contenders for dimensionality reduction that preserves localized grouping. It is primarily meant for performance evaluation, but we also include visualizations of the results for a few of the candidate libraries for a qualitative comparison.

The libraries included in the comparison are:
- lvdmaaten - Initial implementation
- 10XDev - A python wrapper around lvdmaaten's implementation (and some improvements)
- danielfrg - The "standard" python version available from pypi and conda
- resero-labs - A fork of 10XDev that uses openmp to take advantage of multiple cores (available on pypi as tsne-mp)
- tsne-cuda - A CUDA implementation still in early development but very promising
- Fit-SNE - An FFT-based implementation of tsne
- umap - A python/numba implementation of UMAP
To run the test:

### Prerequisites

- python 3.6 or greater (used to run the utilities that build and run the docker files)
- dockerutils (`pip install dockerutils`), which allows seamless interaction with both local and remote docker images and containers
- EC2 instances set up for testing (m5.12xlarge for CPU, p3.2xlarge for GPU)

### Build & run

- Build the tsne-cuda python wheel and libfaiss.so and place them in ./docker/gpu. (tsne-cuda is under active development and getting the correct version of the wheel and libfaiss.so is a bit tricky; see the dwr/explorations branch on this fork of tsne-cuda.)
- Uncomment the comparisons you'd like to run (in tsne-perf-test.py in the docker/cpu and docker/gpu directories).
- Run `build-image all` to build the two docker images, tsne-perf-test-cpu and tsne-perf-test-gpu.
- For the CPU test: `run-image cpu -c full.mnist` (or iris, 2500.mnist, or cifar).
- For the GPU test: `run-image -g gpu -c full.mnist`.
Because of implementation differences, we don't include scikit-learn in the performance test. In informal testing, scikit-learn performed significantly slower than any of these implementations (taking approximately twice as long as the lvdmaaten implementation). We initially considered including it in the evaluation, but the qualitative results were poor in our judgment.
Using the docker images built above, we ran a number of tests on m5 and p3 EC2 instances, recording the elapsed wall-clock time, the maximum memory used, and the cumulative CPU utilization observed during the execution of each test. We used readily available standard datasets (iris, mnist, cifar), converted to the lvdmaaten `.dat` file format (there are utilities available in this repo to read/write/convert that file format).
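For a rough sense of what such a binary conversion involves, here is a minimal read/write round-trip in pure Python. The layout used here (two int32s for the row count and dimension, followed by row-major float64 values) is a simplified assumption for illustration only; the real lvdmaaten `.dat` header also carries run parameters, so use the repo's utilities for actual conversion.

```python
import struct

def write_dat(path, rows):
    """Write rows (a list of equal-length float lists) as:
    int32 n, int32 d, then n*d little-endian float64 values, row-major.
    NOTE: a simplified, assumed layout -- the real lvdmaaten .dat header
    also carries run parameters; use the repo's utilities for real data."""
    n, d = len(rows), len(rows[0])
    with open(path, "wb") as f:
        f.write(struct.pack("<ii", n, d))
        for row in rows:
            f.write(struct.pack("<{}d".format(d), *row))

def read_dat(path):
    """Read a file written by write_dat back into a list of lists."""
    with open(path, "rb") as f:
        n, d = struct.unpack("<ii", f.read(8))
        return [list(struct.unpack("<{}d".format(d), f.read(8 * d)))
                for _ in range(n)]
```

The round trip is exact, since float64 values survive binary serialization unchanged.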
Repo | Wall Time (s) | Max Memory (KB) | Cumulative CPU % |
---|---|---|---|
fit-sne | 25.22 | 108808 | 124 |
lvdmaaten | 16.04 | 14516 | 99 |
umap | 11.80 | 285512 | 169 |
danielfrg | 7.80 | 34096 | 99 |
10XDev | 7.69 | 14612 | 99 |
resero-labs | 3.15 | 13020 | 3948 |
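For a quick sense of scale, relative speedups can be computed directly from the wall times in the table above (numbers copied from the table, with the slowest implementation as the baseline):

```python
# Wall times (seconds) copied from the table above
wall_times = {
    "fit-sne": 25.22,
    "lvdmaaten": 16.04,
    "umap": 11.80,
    "danielfrg": 7.80,
    "10XDev": 7.69,
    "resero-labs": 3.15,
}

baseline = max(wall_times.values())  # slowest implementation in this run
speedups = {repo: round(baseline / t, 1) for repo, t in wall_times.items()}
print(speedups)  # resero-labs comes out roughly 8x faster than fit-sne here
```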
Repo | Wall Time (s) | Max Memory (KB) | Cumulative CPU % |
---|---|---|---|
lvdmaaten | 6064.91 | 1426784 | 99 |
10XDev | 3753.59 | 1426692 | 99 |
danielfrg | 2100.58 | 1426288 | 99 |
resero-labs | 329.98 | 1436172 | 3588 |
fit-sne | 125.05 | 1599288 | 596 |
umap | 102.73 | 2127828 | 243 |
Repo | Wall Time (s) | Max Memory (KB) | Cumulative CPU % |
---|---|---|---|
resero-labs | 798.18 | 1504632 | 714 |
tsne-cuda | 22.59 | 2456588 | 123 |
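The wall-time gap in the table above works out to roughly a 35x speedup for tsne-cuda over resero-labs on this run:

```python
# Wall times (seconds) copied from the table above
resero_labs = 798.18  # CPU-based implementation
tsne_cuda = 22.59     # CUDA implementation
print(round(resero_labs / tsne_cuda, 1))  # ~35x
```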
If you have a GPU and are willing to work with early-days code, the tsne-cuda implementation beats all the others hands down from a performance perspective. From limited test samples it lags a bit qualitatively, but is likely sufficient for most purposes.
If you don't have a GPU, use Fit-SNE, tsne-mp, or umap, depending on your dataset size and your concerns about qualitative results.