t-SNE (t-Distributed Stochastic Neighbor Embedding)

Julia implementation of L.J.P. van der Maaten and G.E. Hinton's t-SNE visualisation technique.

The scripts in the examples folder require the Plots, MLDatasets and RDatasets Julia packages.

Installation

julia> using Pkg; Pkg.add("TSne")

Basic API usage

tsne(X, ndims, reduce_dims, max_iter, perplexity; [keyword arguments])

Apply t-SNE (t-Distributed Stochastic Neighbor Embedding) to X, i.e. embed its points (rows) into ndims dimensions while preserving close neighbours. Returns the points×ndims matrix of embedded coordinates.
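
For reference, the quantities involved in standard t-SNE, as defined by van der Maaten and Hinton, are summarised below. Here x_i are the rows of X, y_i the embedded points, n the number of points, and each bandwidth σ_i is chosen so that Perp(P_i) matches the perplexity argument. The final Kullback-Leibler divergence reported with extended_output presumably corresponds to C; TSne.jl's implementation details (for example symmetrisation or early exaggeration) may differ from this textbook form.

```latex
p_{j\mid i} = \frac{\exp\!\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2n}

\operatorname{Perp}(P_i) = 2^{H(P_i)}, \qquad H(P_i) = -\sum_{j} p_{j\mid i} \log_2 p_{j\mid i}

q_{ij} = \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}}

C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}},
\qquad
\frac{\partial C}{\partial y_i} = 4 \sum_{j} (p_{ij} - q_{ij})(y_i - y_j)\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}
```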

X: AbstractMatrix or AbstractVector. If X is a matrix, rows are observations and columns are features.
ndims: dimension of the embedded space.
reduce_dims: the number of leading PCA dimensions of X to use for t-SNE; if 0, all available dimensions are used.
max_iter: maximum number of iterations of the optimization.
perplexity: related to the number of nearest neighbors used in other manifold learning algorithms. Larger datasets usually require a larger perplexity. Consider a value between 5 and 50. Different values can produce significantly different results.
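
The perplexity is what determines, for each point, the width of its Gaussian neighbourhood. A common way to do this, used in van der Maaten's reference implementation, is a binary search on the precision beta = 1/(2σ²) until exp(H) of the point's conditional distribution (H in nats, equivalently 2^H in bits) matches the requested perplexity. The sketch below illustrates that idea for a single point; calibrate_row is a hypothetical helper, not part of TSne.jl's API, and TSne.jl's own routine may differ in tolerances and details.

```julia
# Hypothetical helper (not TSne.jl API): find beta = 1/(2σ²) for one point so that
# its conditional neighbour distribution has the requested perplexity.
# d2 holds squared distances from the point to every OTHER point (exclude itself).
function calibrate_row(d2::AbstractVector{<:Real}, perplexity::Real;
                       tol::Real = 1e-5, max_tries::Integer = 50)
    target = log(perplexity)          # target entropy in nats, since Perp = exp(H)
    shifted = d2 .- minimum(d2)       # shifting distances leaves p unchanged but avoids underflow
    beta, lo, hi = 1.0, 0.0, Inf
    p = zeros(Float64, length(d2))
    for _ in 1:max_tries
        p .= exp.(-beta .* shifted)
        s = sum(p)
        p ./= s
        H = log(s) + beta * sum(p .* shifted)   # Shannon entropy of p in nats
        abs(H - target) < tol && return p, beta
        if H > target                 # distribution too flat: narrow the Gaussian
            lo = beta
            beta = isinf(hi) ? 2beta : (lo + hi) / 2
        else                          # distribution too peaked: widen the Gaussian
            hi = beta
            beta = (lo + hi) / 2
        end
    end
    # Not converged within max_tries: recompute p so it matches the returned beta
    p .= exp.(-beta .* shifted)
    p ./= sum(p)
    return p, beta
end
```

Searching on beta rather than σ keeps the update simple: entropy decreases monotonically as beta grows, so plain bisection (after an initial doubling phase) is enough.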

Optional Arguments

distance: if true, specifies that X is a distance matrix; if of type Function or Distances.SemiMetric, specifies the function used to calculate the distances between the rows (or elements, if X is a vector) of X
pca_init: whether to use the first ndims PCA dimensions of X as the initial t-SNE layout; if false (the default), the method is initialized with a random layout
max_iter: how many iterations of t-SNE to do
perplexity: the number of "effective neighbours" of a datapoint; typical values are from 5 to 50, the default is 30
verbose: output informational and diagnostic messages
progress: display a progress meter during t-SNE optimization
min_gain, eta, initial_momentum, final_momentum, momentum_switch_iter, stop_cheat_iter, cheat_scale: low-level parameters of the t-SNE optimization
extended_output: if true, returns a tuple of the embedded coordinates matrix, the point perplexities and the final Kullback-Leibler divergence
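
The low-level parameters min_gain, eta and the momentum settings suggest a gradient descent with per-coordinate adaptive gains. As an illustration only, here is a hypothetical Julia sketch of one such update, modelled on the scheme in van der Maaten's reference tsne.py: a gain grows by 0.2 while the descent direction stays the same, shrinks by a factor of 0.8 when it flips, and never drops below min_gain. The helper name gains_momentum_step! and its default values are assumptions, not TSne.jl's API or defaults.

```julia
# Hypothetical helper (not TSne.jl API): one gradient step with adaptive gains and
# momentum, in the style of the reference tsne.py. Mutates Y, update and gains.
# The default keyword values are illustrative assumptions only.
function gains_momentum_step!(Y::AbstractArray, update::AbstractArray,
                              gains::AbstractArray, grad::AbstractArray;
                              eta::Real = 200.0, momentum::Real = 0.5,
                              min_gain::Real = 0.01)
    for k in eachindex(Y, update, gains, grad)
        # The previous update points against the gradient while descent keeps going
        # the same way, so a sign mismatch means "same direction": grow the gain.
        if (grad[k] > 0) != (update[k] > 0)
            gains[k] += 0.2
        else
            gains[k] *= 0.8
        end
        gains[k] = max(gains[k], min_gain)     # clamp from below at min_gain
        # Momentum keeps part of the previous step; eta scales the gained gradient
        update[k] = momentum * update[k] - eta * gains[k] * grad[k]
        Y[k] += update[k]
    end
    return Y
end
```

In a full run, grad would be the gradient of the Kullback-Leibler divergence with respect to the embedding, and momentum would presumably switch from initial_momentum to final_momentum at momentum_switch_iter. The names cheat_scale and stop_cheat_iter appear to refer to an early-exaggeration phase, but check the package source before relying on that reading.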

Example usage

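using TSne, Statistics, MLDatasets

# Standardize each column (feature); the eps() floor avoids dividing by zero for constant columns
rescale(A; dims=1) = (A .- mean(A, dims=dims)) ./ max.(std(A, dims=dims), eps())

# MNIST.traindata is the API of older MLDatasets releases (newer ones use dataset objects);
# it returns a 28×28×60000 image array and a vector of digit labels
alldata, alllabels = MNIST.traindata(Float64);
# Take the first 2500 images and flatten each into one row of 28*28 = 784 pixel features
data = reshape(permutedims(alldata[:, :, 1:2500], (3, 1, 2)),
               2500, size(alldata, 1)*size(alldata, 2));
# Normalize the data; this should be done if the features differ greatly in scale
X = rescale(data, dims=1);

# 2-D embedding, using the first 50 PCA dimensions, 1000 iterations, perplexity 20
Y = tsne(X, 2, 50, 1000, 20.0);

using Plots
# Color each embedded point by its digit label
theplot = scatter(Y[:,1], Y[:,2], marker=(2,2,:auto,stroke(0)), color=Int.(alllabels[1:size(Y,1)]))
Plots.pdf(theplot, "myplot.pdf")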
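
The example passes reduce_dims = 50, which, per the parameter description, runs t-SNE on the first 50 PCA dimensions of X instead of all 784 pixel columns. For intuition, here is a hypothetical pure-Julia sketch of such a projection using LinearAlgebra's svd; pca_project is not part of TSne.jl, and the package's own PCA step may differ (for example in sign conventions or in how the components are computed).

```julia
using LinearAlgebra, Statistics

# Hypothetical helper (not TSne.jl API): project the rows of X onto their
# first k principal components.
function pca_project(X::AbstractMatrix{<:Real}, k::Integer)
    Xc = X .- mean(X, dims=1)        # centre each column (feature)
    F = svd(Xc)                      # Xc = U * Diagonal(S) * V'
    k = min(k, length(F.S))          # cannot keep more components than exist
    return Xc * F.V[:, 1:k]          # n × k principal-component scores
end
```

Because the singular values come out in decreasing order, the first column of the result carries the most variance, so truncating to k columns keeps the directions along which the data varies most.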

Command line usage

julia demo-csv.jl haveheader --labelcol=5 iris-headers.csv

Creates myplot.pdf with the t-SNE result visualized using Gadfly.jl.

lejon/TSne.jl
