[MRG] Fix gradient descent in TSNE #8768

Merged: 6 commits merged into scikit-learn:master on Jun 1, 2017

Conversation

@deto (Contributor) commented Apr 20, 2017

Reference Issue

Fixes #8766

What does this implement/fix? Explain your changes.

The way the delta-bar-delta heuristic is implemented in the Python TSNE is incorrect: the sign is flipped and, as a result, the algorithm has difficulty converging. For reference, see the MATLAB, R, and C++ implementations (which I've linked in the issue).

This fix corrects the sign issue and modifies the constants used to be the same as implemented in other languages.
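(For readers following along, here is a rough, self-contained sketch of the corrected delta-bar-delta step, paraphrasing the discussion in this PR; the helper name, default values, and exact variable names are illustrative rather than a verbatim excerpt of the patch.)

import numpy as np

def delta_bar_delta_step(p, update, gains, grad,
                         learning_rate=200.0, momentum=0.8, min_gain=0.01):
    # Gains grow when the gradient keeps pointing in the same search direction
    # as the previous update (update * grad < 0, because steps are taken against
    # the gradient) and shrink otherwise; before the fix the condition was inverted.
    inc = update * grad < 0.0
    dec = np.invert(inc)
    gains[inc] += 0.2          # previously 0.05
    gains[dec] *= 0.8          # previously 0.95
    np.clip(gains, min_gain, np.inf, out=gains)
    # Apply the per-parameter gains and a momentum term, then move the embedding.
    grad = grad * gains
    update = momentum * update - learning_rate * grad
    p = p + update
    return p, update, gains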

Any other comments?

@deto changed the title from "Fix gradient descent in TSNE" to "[MRG] Fix gradient descent in TSNE" on Apr 20, 2017

@jnothman (Member) commented Apr 20, 2017

Does fixing this resolve issues related to the default learning rate (#6204, #8752)?

@iosonofabio commented Apr 20, 2017

That's a separate PR; our learning rate is a lot higher than in those other implementations anyway. That PR fixes convergence into local minima, while this one fixes the speed of convergence. But of course one could rerun the example plots in the other issues and PR and compare.

@lesteve (Member) commented Apr 21, 2017

Not an expert on t-SNE at all, unfortunately. @AlexanderFabisch, do you have an opinion on this fix?

@AlexanderFabisch (Member) commented Apr 21, 2017

Yes, it seems like I made a mistake there. Good catch and thanks! :)

The numbers 0.05 and 0.95 come from Geoff Hinton's Coursera course, where the same trick was used to train neural networks with SGD. We can change them to 0.2 and 0.8 to be closer to the original implementation. I believe it won't make a big difference anyway.

We still have to check that all the examples using t-SNE still work as expected, and the unit tests have to be fixed.

@AlexanderFabisch (Member) commented Apr 21, 2017

OK, I just checked examples/manifold/plot_compare_methods.py, examples/manifold/plot_manifold_sphere.py, and examples/manifold/plot_lle_digits.py (with learning_rate=200.0). The results seem to be almost the same.

@jnothman (Member) commented Apr 27, 2017

Is there a way to test this?

@jnothman (Member) commented Apr 27, 2017

Could you please add a bug fix entry to what's new?

-    gains[inc] += 0.05
-    gains[dec] *= 0.95
+    gains[inc] += 0.2
+    gains[dec] *= 0.8
     np.clip(gains, min_gain, np.inf)


@AlexanderFabisch (Member) commented Apr 27, 2017

I just noticed that this line is completely useless. It should be

np.clip(gains, min_gain, np.inf, out=gains)

Do we add it to this PR?
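(A quick editorial illustration of why the unmodified np.clip call has no effect: np.clip returns a new array rather than clipping in place, so the result is discarded unless out= is given. The values below are arbitrary.)

import numpy as np

gains = np.array([0.005, 0.5, 1.5])
min_gain = 0.01

np.clip(gains, min_gain, np.inf)             # return value discarded, gains unchanged
print(gains)                                 # [0.005 0.5 1.5]

np.clip(gains, min_gain, np.inf, out=gains)  # clips in place
print(gains)                                 # [0.01 0.5 1.5]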


@deto (Contributor, Author) commented Apr 27, 2017

Seems silly to open up a whole new issue for it. I can add it to this one.

@deto (Contributor, Author) commented Apr 27, 2017

Added a bugfix entry, but now tests are failing? Did I do something wrong?

@deto (Contributor, Author) commented Apr 27, 2017

I'm not sure of the best way to test this. I just ran it on the MNIST digits data set before and after, and can see that it converges to a better solution.

'Exact' implementation:
-- Before fix: 250 iterations, error = 0.776
-- After fix: 675 iterations, error = 0.573

'Barnes-Hut' implementation:
-- Before fix: 225 iterations, error = 0.84
-- After fix: 650 iterations, error = 0.50

Note that it appears to take fewer iterations before the fix because the inability to converge causes the gradient descent to quit early due to lack of progress.

The resulting plots look fine either way, but on the personal problem I've been applying this to (single-cell gene expression data), I'm getting better-looking results with the fix.
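(For anyone who wants to reproduce this kind of comparison, here is a minimal sketch on scikit-learn's digits data; it assumes the kl_divergence_ attribute exposed after fitting, and the parameter values are illustrative. Run it once on the old code and once on the fixed code, then compare the reported errors.)

import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)

# verbose=2 makes t-SNE report intermediate errors, so the stopping iteration is visible.
tsne = TSNE(n_components=2, method='exact', random_state=0, verbose=2)
embedding = tsne.fit_transform(X)
print("final KL divergence:", tsne.kl_divergence_)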

@AlexanderFabisch (Member) commented Apr 28, 2017

> Added a bugfix entry, but now tests are failing? Did I do something wrong?

I guess there is a merge conflict. You should rebase on master:

git fetch original_sklearn_origin        # the remote pointing at scikit-learn/scikit-learn
git rebase original_sklearn_origin/master
# resolve conflicts, then force-push your branch

@lesteve (Member) commented Apr 28, 2017

> I guess there is a merge conflict. You should rebase on master:

For this kind of simple conflict, it may be easier to do it via the GitHub web interface: click on the "Resolve conflicts" button.

@deto force-pushed the deto:master branch to 49ee8ef on Apr 28, 2017

@deto (Contributor, Author) commented Apr 28, 2017

Fixed the conflict with some rebasing/squashing/force pushing.

@lesteve (Member) commented Apr 28, 2017

You need to change the docstring for Travis to be green. Other than this LGTM.

Out of interest do you have an example where we can visually see the improvement with this fix? You were mentioning convergence issues in #8766. Does this PR improve the convergence in your case?

@AlexanderFabisch (Member) commented Apr 28, 2017

There are 2 failing unit tests. I think you can update the expected numbers.

@lesteve (Member) commented Apr 28, 2017

It is fine to update the expected numbers for the doctest.

test_n_iter_without_progress is probably a bit trickier to fix. Basically, you need to find some data and some TSNE parameters for which "did not make any progress during the last 2 episodes. Finished." is printed to stdout. Maybe you have a suggestion, @AlexanderFabisch?
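(To make the discussion concrete, here is a rough sketch of what such a test could look like; the data, parameter values, and the exact wording checked are placeholders based on the message quoted above, not the test that was eventually committed.)

import io
import sys

import numpy as np
from sklearn.manifold import TSNE

def check_n_iter_without_progress():
    rng = np.random.RandomState(0)
    X = rng.randn(100, 10)

    # verbose=2 is assumed to print the early-stopping message to stdout.
    tsne = TSNE(n_iter_without_progress=2, verbose=2, random_state=0)

    # Capture stdout so we can look for the early-stopping message.
    old_stdout = sys.stdout
    sys.stdout = io.StringIO()
    try:
        tsne.fit_transform(X)
        out = sys.stdout.getvalue()
    finally:
        sys.stdout = old_stdout

    assert "did not make any progress during the last 2 episodes. Finished." in out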

@AlexanderFabisch (Member) commented Apr 28, 2017

I don't have a good idea either. Now I see why it was not a good unit test. :) Well, the main idea is to test whether n_iter_without_progress has an influence. So maybe we can play around with the value a little bit, and we can set min_grad_norm to a lower value. I think that should help.

edit: Forget the last part.

@lesteve (Member) commented May 11, 2017

I pushed an update of the doctests of TSNE in order to have a better picture of the tests still failing.

@deto (Contributor, Author) commented May 20, 2017

Thanks @lesteve for moving this along.

Now it looks like it's only failing a test on "n_iter_without_progress", but only on a specific install configuration. Perhaps because the algorithm happened to make progress on every iteration during that run? Hard to tell, as only one in every 25 iterations is printed to standard out.

I can't think of a good way to generate a problem that would definitely trigger that test, so I'd recommend that the test be either removed or modified. Was that parameter added just due to convergence issues in the first place?

@AlexanderFabisch (Member) commented May 20, 2017

I think we all agree that it is a bad test. I cannot think of any other way to test this. Maybe we should just remove it.

@lesteve (Member) commented May 22, 2017

OK, I pushed a tiny change to the random_state which seems to fix the issue in the only situation where I can reproduce the problem (OSX with OpenBLAS). This is just a work-around, but at least, if it works, it will allow us to merge this PR.

@lesteve (Member) commented May 23, 2017

And now AppVeyor is failing on Python 3.5 32bit 😢

@lesteve (Member) commented May 23, 2017

OK, this is a bit of a kludge, but I went for a negative n_iter_without_progress. Hopefully that should do it.

lesteve added 2 commits May 23, 2017

@lesteve (Member) commented Jun 1, 2017

OK, I skipped the test on 32-bit and now everything passes.

@lesteve (Member) commented Jun 1, 2017

Merging this one, thanks a lot @deto for catching the bug and fixing it!

@lesteve lesteve merged commit 57415c4 into scikit-learn:master Jun 1, 2017
5 checks passed:
- ci/circleci: Your tests passed on CircleCI!
- codecov/patch: 100% of diff hit (target 95.52%)
- codecov/project: Absolute coverage decreased by -0.64% but relative coverage increased by +4.47% compared to ee82c3f
- continuous-integration/appveyor/pr: AppVeyor build succeeded
- continuous-integration/travis-ci/pr: The Travis CI build passed
Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017: Skip flaky test_n_iter_without_progress on 32bit
dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017: Skip flaky test_n_iter_without_progress on 32bit
dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017: Skip flaky test_n_iter_without_progress on 32bit
NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017: Skip flaky test_n_iter_without_progress on 32bit
paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017: Skip flaky test_n_iter_without_progress on 32bit
AishwaryaRK added a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017: Skip flaky test_n_iter_without_progress on 32bit
maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017: Skip flaky test_n_iter_without_progress on 32bit
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017: Skip flaky test_n_iter_without_progress on 32bit