
[MRG+1] t-SNE #2822

Merged
merged 42 commits into from Jun 4, 2014

Conversation

AlexanderFabisch
Member

This is an implementation of (non-parametric) t-SNE for visualization.

See Laurens van der Maaten's paper or his website about t-SNE for details. Compared to other implementations and the original paper, this version has the following features:

  • it is designed and optimized for Python
  • the degrees of freedom of the Student's t-distribution are determined with a heuristic
  • it has only a few parameters to control the optimization: learning_rate, n_iter, and early_exaggeration; the momentum etc. are fixed at values that work well for most datasets
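
A minimal usage sketch of the estimator (using the API as it eventually shipped in scikit-learn; the subsample and random_state are only for a quick, reproducible run):

```python
# Hedged sketch: embed a small subset of the digits dataset with t-SNE.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)
X_small = X[:200]  # subsample so the example runs quickly

# fit_transform returns the low-dimensional embedding directly
emb = TSNE(n_components=2, perplexity=30.0, random_state=0).fit_transform(X_small)
print(emb.shape)  # (200, 2)
```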

TODO

  • Implement real transform (and maybe even inverse_transform)
  • integrate in sklearn (relative imports, build with cython, etc.)
  • remove Python function calls from binary search
  • reference for the trustworthiness score
  • more parameters for gradient descent (n_iter, learning_rate, early_exaggeration)
  • find a robust learning schedule
  • refactor t-SNE so that it is possible for the user to implement parametric t-SNE
  • distances should be called affinities
  • tests
  • example (documentation, comparisons)
  • t-SNE should expose the attributes embedding_, nbrs_, training_data_, embedding_nbrs_ (similar to Isomap)
  • Remove generalization
  • Narrative documentation in doc/modules/manifold.rst
  • Integrate PCA initialization (merge + docstring)
  • Mention papers about Barnes-Hut-SNE etc. in comment in t_sne.py

Learning Schedules

In the literature:

  • original paper: initialization with standard deviation 1e-4, 1000 episodes, learning rate 100, momentum 0.5 for 250 episodes, 0.8 for the rest, early exaggeration with 4 for 50 episodes
  • matlab implementation: learning rate 500, early exaggeration for 100 episodes
  • python implementation: initialization with standard deviation 1, learning rate 500, early exaggeration for 100 episodes, momentum 0.5 for 20 episodes
  • divvy: initialization with standard deviation 1e-4, 1000 episodes, learning rate 1000, momentum 0.5 for 100 episodes, 0.8 for the rest, early exaggeration with 4 for 100 episodes
  • parametric t-sne (not comparable): conjugate gradient
  • barnes-hut t-sne: initialization with standard deviation 1e-4, 1000 episodes, learning rate 200, momentum 0.5 for 250 episodes, 0.8 for the rest, early exaggeration with 12 for 250 episodes

My experiences:

  • the learning rate has to be set manually for optimal performance, something between 100 and 1000
  • a high momentum (0.8) during early exaggeration improves the result

This implementation uses the following schedule:

  • initialization with standard deviation 1e-4, 1000 episodes, learning rate 1000, momentum 0.5 for 50 episodes, 0.8 for the rest, early exaggeration with 4 for 100 episodes
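
This schedule can be sketched as plain gradient descent with momentum; the gradient function below is a stand-in (a toy quadratic objective), not this PR's t-SNE gradient, and serves only to show the momentum and exaggeration switch points:

```python
import numpy as np

def gradient_descent(grad, Y0, n_iter=1000, learning_rate=1000.0,
                     momentum_switch=50, exaggeration_stop=100):
    """Momentum gradient descent with the schedule described above:
    momentum 0.5 for the first `momentum_switch` iterations, 0.8 afterwards;
    early exaggeration factor 4 for the first `exaggeration_stop` iterations."""
    Y = Y0.copy()
    update = np.zeros_like(Y)
    for i in range(n_iter):
        momentum = 0.5 if i < momentum_switch else 0.8
        exaggeration = 4.0 if i < exaggeration_stop else 1.0
        update = momentum * update - learning_rate * grad(Y, exaggeration)
        Y += update
    return Y

rng = np.random.RandomState(0)
Y0 = 1e-4 * rng.randn(5, 2)            # init with standard deviation 1e-4
quadratic_grad = lambda Y, ex: 2 * Y   # toy objective ||Y||^2 (ignores ex)
Y = gradient_descent(quadratic_grad, Y0, learning_rate=0.1)
print(np.allclose(Y, 0.0))  # True: the toy objective is driven to its minimum
```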

Observations

  • early compression (an L2 penalty at the beginning of the optimization) did not give a significant advantage in my experiments
  • L-BFGS is faster on smaller datasets, and on larger datasets it creates larger gaps between natural clusters than gradient descent does
  • usually visualizations look better with gradient descent even though L-BFGS finds better local minima
  • binary search requires 2.3 seconds in Cython and 3.9 seconds in Python on the digits dataset
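
The binary search timed here matches each sample's Gaussian bandwidth to the desired perplexity; a simplified NumPy sketch of that standard recipe for a single sample (not the PR's Cython code):

```python
import numpy as np

def binary_search_beta(dist_row, target_perplexity=30.0, n_steps=100, tol=1e-5):
    """Find the precision beta such that p_i ~ exp(-beta * d_i) has the
    desired perplexity (= exp of the Shannon entropy), given one sample's
    squared distances to all other samples."""
    desired_entropy = np.log(target_perplexity)
    beta, beta_min, beta_max = 1.0, -np.inf, np.inf
    for _ in range(n_steps):
        p = np.exp(-dist_row * beta)
        sum_p = p.sum()
        entropy = np.log(sum_p) + beta * (dist_row * p).sum() / sum_p
        if abs(entropy - desired_entropy) <= tol:
            break
        if entropy > desired_entropy:   # too flat -> sharpen (raise beta)
            beta_min = beta
            beta = beta * 2.0 if beta_max == np.inf else (beta + beta_max) / 2.0
        else:                           # too peaked -> flatten (lower beta)
            beta_max = beta
            beta = beta / 2.0 if beta_min == -np.inf else (beta + beta_min) / 2.0
    p = np.exp(-dist_row * beta)
    return beta, p / p.sum()

rng = np.random.RandomState(0)
d = rng.rand(99)  # toy squared distances to 99 other samples
beta, p = binary_search_beta(d, target_perplexity=30.0)
perplexity = np.exp(-(p * np.log(p)).sum())
print(round(perplexity, 1))  # ~30.0
```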

Tips

  • reducing the dimensionality of data to its first 50 principal components often results in better t-SNE visualizations
  • if the cost function increases during initial optimization, the early exaggeration factor or the learning rate might be too high
  • if the cost function gets stuck in a bad local minimum, increasing the learning rate sometimes helps
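
The first tip can be sketched with the released scikit-learn API: reduce to the first 50 principal components, then embed with t-SNE (the subsample is only so the example runs quickly):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)   # 64 features, so 50 components apply
X_pca = PCA(n_components=50).fit_transform(X[:200])
emb = TSNE(n_components=2, random_state=0).fit_transform(X_pca)
print(emb.shape)  # (200, 2)
```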

Examples

Visualizations of some datasets can be found here, e.g.

Digits dataset

Work for other pull requests

"""
return trustworthiness(self, X, n_neighbors=n_neighbors)

def transform(self, X):
Member

Just implement fit_transform and remove transform.

Member Author

That would make it impossible to use grid search.

Member

Why not? Does grid search need transform?

transform must be able to generalize to new data. Transductive transformers should implement fit and fit_transform only.

Your score method seems to be able to generalize to new data so this line should be fine:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/cross_validation.py#L1199

Member

No, score doesn't generalise. There should really be a fit_score method to parallel fit_predict and fit_transform.

@mblondel
Member

mblondel commented Feb 4, 2014

@dwf might be interested in reviewing this :)

return p, error


def trustworthiness(estimator, X, n_neighbors=5):
Member

I think this makes more sense as a function of X and X_embedded.
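
For reference, the suggested signature is what eventually shipped as sklearn.manifold.trustworthiness; a quick sketch (PCA stands in for the embedding here):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness

X, _ = load_digits(return_X_y=True)
X = X[:200]
X_embedded = PCA(n_components=2).fit_transform(X)
t = trustworthiness(X, X_embedded, n_neighbors=5)
print(0.0 <= t <= 1.0)  # the score lies in [0, 1]; 1 means neighbors are fully preserved
```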

Member

Nitpick: I'd prefer the file to be named t_sne.py

And thanks for this contribution! 

Member Author

Thanks for the tips

@jnothman
Member

jnothman commented Feb 4, 2014

I'm not sure that this should be a Transformer, rather than a function. Had you intended to perform a grid search over its parameters?

@GaelVaroquaux
Member

I think that it would be useful to have both the function and the transformer. The transformer is a standard in scikit-learn, but the function is also useful. 


@AlexanderFabisch
Member Author

I agree with Gael. Grid search might be very useful since t-SNE has at least the hyperparameter perplexity and maybe I will also add some parameters that control the optimizer (learning_rate, momentum, ...).

@jnothman jnothman closed this Feb 4, 2014
@jnothman jnothman reopened this Feb 4, 2014
@GaelVaroquaux
Member

The transformer is a standard in scikit-learn
But this isn't transforming into something

OK, I agree that it shouldn't be a transformer. Just an estimator.

@jnothman
Member

jnothman commented Feb 4, 2014

But this isn't transforming into something

Ha :P I didn't mean for the message to send like that. It's not
transforming in the sense of a pipeline, etc. But then I recalled that we
have transformers for targets as well as features...

Using GridSearchCV doesn't make sense either, because we can't do CV.

So to parallel the current API, this should really have fit() and
fit_score() (and perhaps other unsupervised, non-inductive estimators
should too).


@mblondel
Member

mblondel commented Feb 4, 2014

@dwf Do you confirm that tSNE cannot generalize to new data? (including heuristics)

@mblondel
Member

mblondel commented Feb 4, 2014

Using GridSearchCV doesn't make sense either, because we can't do CV.

I think @AlexanderFabisch wants to use CV to obtain a trustworthiness score on unseen data (the unseen data doesn't need to be transformed to obtain that score). What I don't understand is why score can generalize to new data but not transform. Does tSNE optimize for a different criterion from the one implemented in score?

@mblondel
Member

mblondel commented Feb 4, 2014

For unsupervised probabilistic models, I think it is sometimes possible to do model selection by maximizing the likelihood of the training data but we don't have mechanisms for that in scikit-learn.

@AlexanderFabisch
Member Author

I think there is currently no other method available than parametric t-SNE, which builds on a stack of RBMs and is very difficult to tune because there are so many hyperparameters. I could implement something like an average of nearest neighbors. That could also be used for inverse_transform.


@AlexanderFabisch
Member Author

t-SNE cannot generalize at all in this implementation. We can only optimize the training score. t-SNE creates two distributions, P and Q, based on the distances between samples in the original and the embedded space, and minimizes the Kullback-Leibler divergence between them. The trustworthiness tells us how well neighbors are preserved in the embedded space.
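
A toy sketch of that objective (fixed-bandwidth Gaussian affinities stand in for P, and Q uses the Student's t with one degree of freedom; illustrative only, not this PR's implementation):

```python
import numpy as np
from scipy.spatial.distance import pdist

def kl_divergence(X, Y):
    """KL(P || Q) between pairwise affinities in the original space (P)
    and the embedded space (Q, Student's t with one degree of freedom)."""
    p = np.exp(-pdist(X, "sqeuclidean"))       # Gaussian-style, fixed bandwidth
    P = p / p.sum()
    q = 1.0 / (1.0 + pdist(Y, "sqeuclidean"))  # heavy-tailed in the embedding
    Q = q / q.sum()
    return np.sum(P * np.log(P / Q))

rng = np.random.RandomState(0)
X = rng.randn(30, 10)   # original data
Y = rng.randn(30, 2)    # a (random, unoptimized) embedding
kl = kl_divergence(X, Y)
print(kl >= 0.0)  # True: KL divergence is non-negative
```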


@GaelVaroquaux
Member

I could implement something like an average of nearest neighbors. That
could also be used for inverse_transform.

It might be worth trying.

@amueller
Member

amueller commented Feb 4, 2014

I think we used the term "transformer" pretty loosely in the context of the manifold module. Spectral embedding doesn't have a transform but also inherits from TransformerMixin. I had to check to actually see that LLE and Isomap have transform methods.

I don't think we should implement non-standard hacks in transform and inverse_transform.
I would definitely make this an Estimator and possibly inherit from TransformerMixin. Currently each Estimator inherits from one of the four base mixins ClusterMixin, ClassifierMixin, TransformerMixin or RegressorMixin --- except the algorithms working on labels and the meta-estimators.

I don't think we should tie the decision whether this is a class or not to whether we want to be able to GridSearch it. I never use any sklearn function, because the estimators have such a nice interface.

@amueller
Member

amueller commented Feb 4, 2014

Also: wohoo, t-SNE!

from scipy.optimize import fmin_l_bfgs_b
from scipy.spatial.distance import pdist
from scipy.spatial.distance import squareform
import binary_search
Member

use relative import

@AlexanderFabisch
Member Author

It seems like there is a huge interest in having this in sklearn. ;) I think this will take a while but I think it is worth it.

@mblondel
Member

mblondel commented Feb 5, 2014

I could implement something like an average of nearest neighbors

I think you could reuse the same approach for SpectralEmbedding (which only implements fit and fit_transform too, by the way). This can be done in another PR.

@mblondel
Member

mblondel commented Feb 5, 2014

T-SNE cannot generalize at all in this implementation

Then, how are you planning to use grid search?!

@jnothman
Member

jnothman commented Feb 5, 2014

Then, how are you planning to use grid search?!

Could use cv=[(arange(X.shape[0]), arange(X.shape[0]))]?


if self.distances == "precomputed" and X.shape[0] != X.shape[1]:
    raise ValueError("X should be a square distance matrix")

self.Y_ = self._tsne(X)
Member

Would this attribute be better called embedding_ as it is in SpectralEmbedding?

Member Author

Actually, it was called embedding_ previously. :) I will change that back.

@mblondel
Member

mblondel commented Feb 5, 2014

Could use cv=[(arange(X.shape[0]), arange(X.shape[0]))]?

Indeed that would work in the current state of this PR but score is supposed to generalize to new data. In the future, we will need a way to do model selection for such "transductive" algorithms. That would be a useful addition to scikit-learn.
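
The identity-split trick mentioned earlier can be sketched like this; KMeans stands in for a transductive estimator with a score method, and the modern model_selection import path is assumed:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV

X, _ = load_digits(return_X_y=True)

# One "split" whose train and test sets are both the full dataset,
# so the grid search scores each candidate on its own training data.
identity_cv = [(np.arange(X.shape[0]), np.arange(X.shape[0]))]

search = GridSearchCV(KMeans(n_init=10, random_state=0),
                      {"n_clusters": [8, 10]}, cv=identity_cv)
search.fit(X)
print(search.best_params_)
```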

@AlexanderFabisch
Member Author

@ogrisel The branch has been rebased on master.

@ogrisel
Member

ogrisel commented Jun 4, 2014

This looks good to me. Shall we merge?

@GaelVaroquaux
Member

This looks good to me. Shall we merge?

👍!

@agramfort
Member

+1 on my side too

@mblondel
Member

mblondel commented Jun 4, 2014

+1

Thanks for putting up with our nitpicking @AlexanderFabisch :b

@AlexanderFabisch
Member Author

I have to thank you and all the other reviewers, in particular @ogrisel and @GaelVaroquaux . Your comments really improved the quality of the code! ;)

agramfort added a commit that referenced this pull request Jun 4, 2014
@agramfort agramfort merged commit 0a4ba72 into scikit-learn:master Jun 4, 2014
@GaelVaroquaux
Member

Merged #2822.

Hurray! 🍻

@amueller
Member

Was there no whatsnew for this or am I blind? Wasn't this one of the highlights of 0.15?

@AlexanderFabisch
Member Author

No, I can't find it either.

@amueller
Member

Do you want to add it? I feel whatsnew is a good way to check when something was added.

@AlexanderFabisch
Member Author

OK, should I open a pull request for that or is it possible to commit that directly to master?

@GaelVaroquaux
Member

OK, should I open a pull request for that or is it possible to commit that directly to master?

I think that you can commit this directly to master. Thanks!

@AlexanderFabisch
Member Author

Was there no whatsnew for this or am I blind? Wasn't this one of the highlights of 0.15?

I added an entry to the list of highlights and one to the list of new features with bdcea50

@AlexanderFabisch
Member Author

@GaelVaroquaux travis complains about a failing unit test. I can't see how this is related to my commit. Do you have any idea?

ERROR: sklearn.tests.test_common.test_regressors('OrthogonalMatchingPursuitCV', <class 'sklearn.linear_model.omp.OrthogonalMatchingPursuitCV'>)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/testenv/lib/python2.6/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/utils/estimator_checks.py", line 880, in check_regressor_data_not_an_array
    check_estimators_data_not_an_array(name, Estimator, X, y)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/utils/estimator_checks.py", line 901, in check_estimators_data_not_an_array
    estimator_1.fit(X_, y_)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/linear_model/omp.py", line 817, in fit
    for train, test in cv)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/externals/joblib/parallel.py", line 659, in __call__
    self.dispatch(function, args, kwargs)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/externals/joblib/parallel.py", line 406, in dispatch
    job = ImmediateApply(func, args, kwargs)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/externals/joblib/parallel.py", line 140, in __init__
    self.results = func(*args, **kwargs)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/linear_model/omp.py", line 711, in _omp_path_residues
    return_path=True)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/linear_model/omp.py", line 376, in orthogonal_mp
    copy_X=copy_X, return_path=return_path
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/linear_model/omp.py", line 110, in _cholesky_omp
    **solve_triangular_args)
  File "/home/travis/miniconda/envs/testenv/lib/python2.6/site-packages/scipy/linalg/basic.py", line 137, in solve_triangular
    a1, b1 = map(np.asarray_chkfinite,(a,b))
  File "/home/travis/miniconda/envs/testenv/lib/python2.6/site-packages/numpy/lib/function_base.py", line 590, in asarray_chkfinite
    "array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs

@GaelVaroquaux
Member

@GaelVaroquaux travis complains about a failing unit test. I can't see how this is related to my commit. Do you have any idea?

Heisenbug, maybe? I restarted the travis job. We'll see what it gives.

@GaelVaroquaux
Member

Heisenbug, maybe? I restarted the travis job. We'll see what it gives.

Yup. Heisenbug...

@AlexanderFabisch
Member Author

Thanks for checking.

@GaelVaroquaux
Member

Well, thanks for mentioning that something broke. That's important!
