Merge pull request #9 from alyst/enh_perf
Performance and Documentation Enhancements
Showing 7 changed files with 159 additions and 132 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,54 +1,51 @@ | ||
[![Travis](https://travis-ci.org/lejon/TSne.jl.svg?branch=master)](https://travis-ci.org/lejon/TSne.jl) | ||
[![Coveralls](https://coveralls.io/repos/github/lejon/TSne.jl/badge.svg?branch=master)](https://coveralls.io/github/lejon/TSne.jl?branch=master) | ||
|
||
Julia t-SNE | ||
=========== | ||
t-SNE (t-Stochastic Neighbor Embedding) | ||
======================================= | ||
|
||
Julia port of L.J.P. van der Maaten and G.E. Hintons T-SNE visualisation technique. | ||
Julia implementation of L.J.P. van der Maaten and G.E. Hintons [t-SNE visualisation technique](https://lvdmaaten.github.io/tsne/). | ||
|
||
Please observe, that it is not extensively tested. | ||
Please observe that it is not yet extensively tested. | ||
|
||
The examples in the 'examples' dir requires you to have Gadfly and RDatasets installed | ||
The scripts in the `examples` folder require `Gadfly`, `MNIST` and `RDatasets` Julia packages. | ||
|
||
**Please note:** At some point something changed in Julia which caused poor results, it took a while before I noted this but now I have updated the implementation so that it works again. See the link below for images rendered using this implementation. | ||
## Installation | ||
|
||
For some tips working with t-sne [Klick here] (http://lejon.github.io) | ||
`julia> Pkg.clone("git://github.com/lejon/TSne.jl.git")` | ||
|
||
## Basic installation: | ||
## Basic API usage | ||
|
||
`julia> Pkg.clone("git://github.com/lejon/TSne.jl.git")` | ||
|
||
## Basic API usage: | ||
|
||
```jl | ||
using TSne, MNIST | ||
|
||
function normalize(A) | ||
for col in 1:size(A)[2] | ||
std(A[:,col]) == 0 && continue | ||
A[:,col] = (A[:,col]-mean(A[:,col])) / std(A[:,col]) | ||
end | ||
A | ||
function rescale(A, dim::Integer=1) | ||
res = A .- mean(A, dim) | ||
res ./= map!(x -> x > 0.0 ? x : 1.0, std(A, dim)) | ||
res | ||
end | ||
|
||
data, labels = traindata() | ||
data = data' | ||
data = data[1:2500,:] | ||
data = convert(Matrix{Float64}, data[:, 1:2500])' | ||
# Normalize the data, this should be done if there are large scale differences in the dataset | ||
X = normalize(float(data)) | ||
X = rescale(data, 1) | ||
|
||
Y = tsne(X, 2, 50, 1000, 20.0) | ||
|
||
using Gadfly | ||
labels = [string(i) for i in labels[1:2500]] | ||
labels = convert(Vector{String}, labels[1:2500]) | ||
theplot = plot(x=Y[:,1], y=Y[:,2], color=labels) | ||
draw(PDF("myplot.pdf", 4inch, 3inch), theplot) | ||
``` | ||
|
||
![](example.png) | ||
|
||
## Stand Alone Usage | ||
## Command line usage | ||
|
||
```julia demo-csv.jl haveheader --labelcol=5 iris-headers.csv``` | ||
|
||
Creates myplot.pdf with TSne result visuallized using Gadfly. | ||
Creates `myplot.pdf` with t-SNE result visualized using `Gadfly.jl`. | ||
|
||
## See also | ||
* [Some tips working with t-SNE](http://lejon.github.io) | ||
* [How to Use t-SNE Effectively](http://distill.pub/2016/misread-tsne/) |
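
Review note: the `rescale` helper added to the README relies on the positional `mean(A, dim)` / `std(A, dim)` forms and the one-array `map!` of the Julia version this commit targets. As a hedged illustration only, the same standardization could be sketched for Julia 1.x as below. The `Statistics` import, the `dims` keyword and the sample matrix `X` are assumptions of this sketch and are not part of the commit.

```julia
using Statistics

# Center each slice along `dims` to zero mean and scale it to unit standard deviation.
# Slices with zero spread are only centered: their divisor falls back to 1.
function rescale(A::AbstractMatrix{<:Real}; dims::Integer=1)
    mu = mean(A, dims=dims)
    sigma = std(A, dims=dims)
    # Replace zero standard deviations by 1 to avoid dividing by zero.
    sigma = map(s -> s > 0 ? s : one(s), sigma)
    return (A .- mu) ./ sigma
end

# Hypothetical input: the second column is constant.
X = [1.0 5.0; 2.0 5.0; 3.0 5.0]
Z = rescale(X)
```

Unlike the removed `normalize`, this version (like the `rescale` in the diff) returns a new array and leaves its input untouched.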
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
julia 0.4 | ||
Compat 0.9 | ||
FactCheck | ||
BaseTestNext | ||
RDatasets | ||
MNIST | ||
ProgressMeter |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,55 +1,55 @@ | ||
using Gadfly | ||
using TSne | ||
|
||
function normalize(A) | ||
for col in 1:size(A)[2] | ||
std(A[:,col]) == 0 && continue | ||
A[:,col] = (A[:,col]-mean(A[:,col])) / std(A[:,col]) | ||
end | ||
A | ||
""" | ||
Normalize `A` columns, so that the mean and standard deviation | ||
of each column are 0 and 1, resp. | ||
""" | ||
function rescale(A, dim::Integer=1) | ||
res = A .- mean(A, dim) | ||
res ./= map!(x -> x > 0.0 ? x : 1.0, std(A, dim)) | ||
res | ||
end | ||
|
||
if length(ARGS)==0 | ||
if length(ARGS)==0 | ||
println("usage:\n\tjulia demo.jl iris\n\tjulia demo.jl mnist") | ||
exit(0) | ||
end | ||
|
||
use_iris = ARGS[1] == "iris" | ||
lables = () | ||
|
||
if use_iris | ||
if ARGS[1] == "iris" | ||
using RDatasets | ||
println("Using Iris dataset.") | ||
iris = dataset("datasets","iris") | ||
X = float(convert(Array,iris[:,1:4])) | ||
labels = iris[:,5] | ||
X = convert(Matrix{Float64}, iris[:, 1:4]) | ||
labels = iris[:, 5] | ||
plotname = "iris" | ||
initial_dims = -1 | ||
iterations = 1500 | ||
perplexity = 15 | ||
else | ||
elseif ARGS[1] == "mnist" | ||
using MNIST | ||
println("Using MNIST dataset.") | ||
X, labels = traindata() | ||
labels = labels[1:2500] | ||
X = X' | ||
X = X[1:2500,:] | ||
X = normalize(X) | ||
npts = min(2500, size(X, 2), length(labels)) | ||
labels = labels[1:npts] | ||
X = rescale(X[:, 1:npts]') | ||
plotname = "mnist" | ||
initial_dims = 50 | ||
iterations = 1000 | ||
perplexity = 20 | ||
else | ||
error("Unknown dataset \"", ARGS[1], "\"") | ||
end | ||
|
||
println("X dimensions are: " * string(size(X))) | ||
println("X dimensions are: ", size(X)) | ||
Y = tsne(X, 2, initial_dims, iterations, perplexity) | ||
println("Y dimensions are: " * string(size(Y))) | ||
println("Y dimensions are: ", size(Y)) | ||
|
||
writecsv(plotname*"_tsne_out.csv",Y) | ||
lbloutfile = open("labels.txt", "w") | ||
writedlm(lbloutfile,labels) | ||
close(lbloutfile) | ||
writecsv(plotname*"_tsne_out.csv", Y) | ||
open("labels.txt", "w") do io | ||
writedlm(io, labels) | ||
end | ||
|
||
theplot = plot(x=Y[:,1], y=Y[:,2], color=labels) | ||
|
||
draw(PDF(plotname*".pdf", 4inch, 3inch), theplot) | ||
#draw(SVG(plotname*".svg", 4inch, 3inch), theplot) |
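
Review note: the demo script now caps the number of points and writes the labels inside an `open(...) do io` block, which closes the file even if writing fails. A minimal sketch of those two steps for Julia 1.x follows. The `DelimitedFiles` import, the temporary path and the toy `X` and `labels` are assumptions made for illustration and do not come from the commit.

```julia
using DelimitedFiles

# Hypothetical data: 4 features × 3 points, one label per point.
X = rand(4, 3)
labels = ["a", "b", "c"]

# Cap the sample count by the number of columns and the number of labels.
# `length(labels)` yields an integer, which `min` can compare with the other arguments.
npts = min(2500, size(X, 2), length(labels))

# The do-block form closes the file automatically when the block exits.
path = joinpath(mktempdir(), "labels.txt")
open(path, "w") do io
    writedlm(io, labels[1:npts])
end

written = readlines(path)
```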