[MRG] Add early exaggeration iterations as argument to t-SNE #12476

cciccole · 2018-10-28T22:07:56Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This PR converts the number of early exaggeration iterations from a private constant to a public parameter of the TSNE class. The new argument defaults to 250, so code written against previous versions of sklearn gets identical results without any changes.

Being able to set this value is an important part of running t-SNE, and it shouldn't be hidden. Other implementations (e.g., LvdM's) expose it. In the LvdM case it is actually exposed through two arguments, but sklearn doesn't distinguish between the momentum switch and the end of exaggeration ("stop lying"), so a single argument seems sufficient here.

This change is motivated by recent work showing that a high quality embedding can be achieved with far fewer than 250 early exaggeration iterations.
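To make the role of the new argument concrete, here is a minimal sketch of a two-phase iteration schedule in which a single threshold both ends the exaggeration ("stop lying") and switches the momentum. The helper `phase_schedule` is hypothetical, and the exaggeration factor (12.0) and momentum values (0.5, 0.8) are assumptions chosen for illustration, not values taken from this PR.

```python
def phase_schedule(n_iter=1000, n_iter_early_exag=250,
                   early_exaggeration=12.0):
    """Return one (exaggeration, momentum) pair per iteration.

    Hypothetical helper: a single threshold ends the exaggeration phase
    and switches the momentum, mirroring the PR's single argument.
    """
    schedule = []
    for it in range(n_iter):
        if it < n_iter_early_exag:
            # Early exaggeration phase: inflated attraction, low momentum.
            schedule.append((early_exaggeration, 0.5))
        else:
            # Final phase: no exaggeration, higher momentum.
            schedule.append((1.0, 0.8))
    return schedule


default = phase_schedule()
shorter = phase_schedule(n_iter_early_exag=100)
print(sum(exag > 1.0 for exag, _ in default))  # 250 exaggerated iterations
print(sum(exag > 1.0 for exag, _ in shorter))  # 100 exaggerated iterations
```

An implementation that separates the two thresholds would need two arguments; the sketch follows the single-argument choice described above.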

Any other comments?

Validated that results are identical before and after change for the same inputs.

Converting early exaggeration iterations from a private variable to a public member variable of the TSNE class. The default value for this new argument is set to 250 so that code using previous versions of sklearn will still get identical results with no action needed to account for the change. This behavior was validated. Being able to set this variable is an important component of running t-SNE and it shouldn't be hidden. Other implementations, for example LvdM's (https://github.com/lvdmaaten/bhtsne/blob/master/tsne.h#L44-L45), expose it. In the LvdM case it's actually via two arguments, but sklearn doesn't distinguish between momentum switch and stop lying. It doesn't seem necessary to treat the two arguments differently. This change is motivated by recent work (https://doi.org/10.1101/451690) showing that high quality embeddings can be achieved with far fewer than 250 early exaggeration iterations.

jnothman

Are you able to add a test that this at least has some effect, and that the validation is now correct?
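One reading of "validation" here is the check on iteration counts: if the estimator rejects an `n_iter` shorter than the hard-coded phase length, that check would now have to compare against the new argument instead. The sketch below covers only that part, under that assumption. `check_iteration_params` and its error messages are hypothetical and are not code from this PR.

```python
def check_iteration_params(n_iter, n_iter_early_exag=250):
    """Validate iteration counts (hypothetical helper, not sklearn API)."""
    if n_iter_early_exag < 0:
        raise ValueError(
            "n_iter_early_exag must be non-negative, got %d"
            % n_iter_early_exag)
    if n_iter < n_iter_early_exag:
        raise ValueError(
            "n_iter (%d) must be at least n_iter_early_exag (%d)"
            % (n_iter, n_iter_early_exag))


def raises_value_error(**kwargs):
    """Return True if the check rejects the given parameters."""
    try:
        check_iteration_params(**kwargs)
    except ValueError:
        return True
    return False


# With the default phase length, n_iter=200 is rejected.
print(raises_value_error(n_iter=200))                         # True
# A shorter exaggeration phase makes the same n_iter acceptable.
print(raises_value_error(n_iter=200, n_iter_early_exag=100))  # False
```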

jnothman · 2018-10-30T02:41:44Z

sklearn/manifold/t_sne.py

@@ -518,15 +518,27 @@ class TSNE(BaseEstimator):
        learning rate is too low, most points may look compressed in a dense
        cloud with few outliers. If the cost function gets stuck in a bad local
        minimum increasing the learning rate may help.
+        Some discussion on how to set learning rate optimally can be found
+        at https://doi.org/10.1101/451690. Effective use of this parameter has


Use the References section and ReST citation format instead.
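A sketch of what this could look like in a numpydoc-style docstring: the inline URL becomes a numbered citation in the parameter description, and the entry itself moves to a References section. The class `TSNEReferenceSketch`, the citation number, and the descriptive wording of the entry are placeholders; only the DOI comes from this thread.

```python
class TSNEReferenceSketch:
    """Illustrative stub, not the real estimator.

    Parameters
    ----------
    learning_rate : float
        If the cost function gets stuck in a bad local minimum,
        increasing the learning rate may help. See [1]_ for a
        discussion of how to set it.

    References
    ----------
    .. [1] Preprint discussing learning rate selection,
       https://doi.org/10.1101/451690
    """


doc = TSNEReferenceSketch.__doc__
# The in-text marker and its target entry both appear in the docstring.
print("[1]_" in doc and ".. [1]" in doc)
```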

jnothman · 2018-10-30T02:42:30Z

sklearn/manifold/t_sne.py

+        Number of iterations out of total n_iter that t-SNE should spend
+        in the early exaggeration phase. If embedding quality is suffering as a
+        consequence of increasing number of samples being embedded, increasing
+        this value and n_iter proportionately can help.


Would it often be useful to specify this as a fraction of n_iter?

That is one way to do it, but I think it is better to specify the actual number of iterations, for two main reasons:

1. It aligns with other existing implementations, which specify the equivalent of this argument as a count (see my note about the LvdM implementation above).

2. It ensures that existing client code gets the same results as before, regardless of the number of iterations set with the n_iter argument. If the default were a fraction, code with n_iter set to something other than 1000 would suddenly run an early exaggeration phase of a different length.
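The second point can be checked with a little arithmetic. The sketch below compares a fixed count of 250 with a hypothetical fractional default of 0.25, chosen so that both agree at `n_iter=1000`. The function names are illustrative only.

```python
def exag_iters_from_count(n_iter, n_iter_early_exag=250):
    """Phase length given as an absolute count (the PR's approach)."""
    return n_iter_early_exag


def exag_iters_from_fraction(n_iter, fraction=0.25):
    """Phase length given as a fraction of n_iter (hypothetical alternative)."""
    return int(round(fraction * n_iter))


for n_iter in (500, 1000, 4000):
    print(n_iter,
          exag_iters_from_count(n_iter),
          exag_iters_from_fraction(n_iter))
# 500 250 125
# 1000 250 250
# 4000 250 1000
```

Only at `n_iter=1000` do the two conventions agree, so a fractional default would change results for any caller using a different `n_iter`.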

eamanu · 2018-10-30T12:26:20Z

Hi! IMHO this changes the functionality and interface of t_sne, which could cause compatibility problems, right? Maybe this change could be introduced in a future version, with a deprecation message to let users know that t_sne will change.

cciccole · 2018-10-31T06:24:49Z

> Hi! IMHO this changes the functionality and interface of t_sne, which could cause compatibility problems, right? Maybe this change could be introduced in a future version, with a deprecation message to let users know that t_sne will change.

@eamanu Hello! Technically it modifies the interface but not in such a way as to require a deprecation warning. That's because it's just an optional argument being added, the default value of which is equal to what it was before when this variable was hidden. After this update, code that was using the previous version will get identical results with no action necessary.

That being said, let me know if there are conventions or principles for this project that I'm not aware of that would require more careful treatment as you suggest.

cciccole · 2018-10-31T07:26:55Z

@jnothman I'll look at addressing your other concerns, thanks.

eamanu · 2018-10-31T11:49:54Z

@cciccole Yeah, you are right! IMO this change should be documented with a versionchanged note.

jnothman · 2018-11-01T00:53:38Z

versionadded, not versionchanged, is appropriate here. And of course there will be a changelog entry for each new feature.
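For reference, a sketch of how a `versionadded` directive typically sits under a parameter entry in a numpydoc-style docstring. The version string `X.Y` is a placeholder because the target release is not stated in this thread, and `TSNEVersionSketch` is an illustrative stub rather than the real class.

```python
class TSNEVersionSketch:
    """Illustrative stub, not the real estimator.

    Parameters
    ----------
    n_iter_early_exag : int, default=250
        Number of iterations, out of the total ``n_iter``, spent in the
        early exaggeration phase.

        .. versionadded:: X.Y
    """

    def __init__(self, n_iter_early_exag=250):
        # Store the argument unchanged, as estimators conventionally do.
        self.n_iter_early_exag = n_iter_early_exag
```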

thomasjpfan

Needs a test to back up "Validated that results are identical before and after change for the same inputs."
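One possible shape for such a test, shown on a toy one-dimensional optimizer rather than the real estimator: a copy of the loop with the phase length hard-coded stands in for the "before" behavior, and the parameterized loop must reproduce it exactly at the default. Everything here (`legacy_descent`, `descent`, the quadratic toy cost, the step size and momentum values) is an assumption for illustration.

```python
_EXPLORATION_N_ITER = 250  # stand-in for the former hard-coded constant


def _step(y, velocity, exploring, learning_rate=0.01):
    """One momentum gradient step on the toy cost exag * y**2 / 2 - y."""
    exag = 12.0 if exploring else 1.0
    momentum = 0.5 if exploring else 0.8
    grad = exag * y - 1.0
    velocity = momentum * velocity - learning_rate * grad
    return y + velocity, velocity


def legacy_descent(n_iter):
    """'Before' behavior: phase length fixed by the module constant."""
    y, velocity = 1.0, 0.0
    for it in range(n_iter):
        y, velocity = _step(y, velocity, it < _EXPLORATION_N_ITER)
    return y


def descent(n_iter, n_iter_early_exag=250):
    """'After' behavior: phase length exposed as an argument."""
    y, velocity = 1.0, 0.0
    for it in range(n_iter):
        y, velocity = _step(y, velocity, it < n_iter_early_exag)
    return y


def test_default_matches_legacy():
    # Exact equality: the default must not change any result.
    for n_iter in (250, 300, 1000):
        assert descent(n_iter) == legacy_descent(n_iter)


def test_argument_has_effect():
    # With few iterations left after the phase, the final value differs.
    assert abs(descent(300, 250) - descent(300, 50)) > 1e-6


test_default_matches_legacy()
test_argument_has_effect()
```

A real test would run the estimator on a small fixed dataset with a fixed `random_state` and compare embeddings, but the structure would be the same.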

thomasjpfan · 2019-08-05T23:38:21Z

sklearn/manifold/t_sne.py

-                 n_iter_without_progress=300, min_grad_norm=1e-7,
-                 metric="euclidean", init="random", verbose=0,
-                 random_state=None, method='barnes_hut', angle=0.5):
+                 n_iter_early_exag=250, n_iter_without_progress=300,


For backward compatibility, please place n_iter_early_exag at the end of the function signature.
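The concern is positional calls: inserting a keyword argument in the middle of a signature silently shifts every positional argument that follows it. A small sketch with made-up three-argument functions (not the real `TSNE.__init__` signature):

```python
def init_inserted(n_iter=1000, n_iter_early_exag=250,
                  n_iter_without_progress=300):
    """New argument inserted mid-signature (breaks positional callers)."""
    return {"n_iter": n_iter,
            "n_iter_early_exag": n_iter_early_exag,
            "n_iter_without_progress": n_iter_without_progress}


def init_appended(n_iter=1000, n_iter_without_progress=300,
                  n_iter_early_exag=250):
    """New argument appended at the end (positional callers unaffected)."""
    return {"n_iter": n_iter,
            "n_iter_early_exag": n_iter_early_exag,
            "n_iter_without_progress": n_iter_without_progress}


# An existing caller written against the old order
# (n_iter, n_iter_without_progress), passing both positionally.
old_call = (2000, 500)

# Inserted: 500 lands in the new argument, the old one keeps its default.
print(init_inserted(*old_call)["n_iter_without_progress"])  # 300
# Appended: the old argument still receives 500.
print(init_appended(*old_call)["n_iter_without_progress"])  # 500
```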

thomasjpfan · 2019-08-05T23:38:54Z

sklearn/manifold/t_sne.py

@@ -518,15 +518,27 @@ class TSNE(BaseEstimator):
        learning rate is too low, most points may look compressed in a dense
        cloud with few outliers. If the cost function gets stuck in a bad local
        minimum increasing the learning rate may help.
+        Some discussion on how to set learning rate optimally can be found
+        at https://doi.org/10.1101/451690. Effective use of this parameter has


Use the References section and ReST citation format instead.

cciccole changed the title ~~[MRG] Add early exaggeration iterations as argument~~ [MRG] Add early exaggeration iterations as argument to t-SNE Oct 28, 2018

cciccole added 2 commits October 28, 2018 23:31

fixing style guideline issues

aeef0a5

tighten up width in a couple places. Remove trailing whitespace.

white space

da15784

Missed a spot.

jnothman reviewed Oct 30, 2018

thomasjpfan reviewed Aug 5, 2019

thomasjpfan added Needs work and removed Needs work labels Aug 5, 2019

github-actions bot added the module:manifold label Mar 2, 2020

Base automatically changed from master to main January 22, 2021 10:50
