-
-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Improve initialization and learning rate in t-SNE #19491
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the pull-request!
Co-authored-by: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org>
Co-authored-by: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org>
@TomDLT I fixed and added t-SNE tests and they seem to be working fine, but something is failing in
and I don't know where to fix it :-/ |
@thomasjpfan Thanks a lot for reviewing! I pushed your suggestions and added TODO comments and docstrings as you suggested everywhere else too.
Sorry, I am not sure I understand the rationale here. I already have everything regarding PCA SD implemented here, what's the point of taking it out? Also, I would be uncomfortable setting PCA init as default if it does not get scaled to correct SD. To be honest, I think these changes should go together. Update: I mean, the SD change is currently implemented but commented out, because it should only go live in version 1.2. Not sure what's the better way to do it? I think the future warning should be happening already in this PR.
This I can do. Update: done! Also, what do you think about this suggested change (not yet implemented in this PR):
PS. Not sure why the milestone check is not suddenly failing... |
@thomasjpfan Could you clarify what you meant by "removing the PCA SD change for now"? See also my comment above for more considerations. Everything is fixed btw. Cheers! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you clarify what you meant by "removing the PCA SD change for now"? See also my comment above for more considerations. Everything is fixed btw. Cheers!
What I meant was that the commented out PCA change can be its own pull request. But looking at this again, I am okay with leaving it in.
I can see that the depredations in this PR are all related, but they could be done with three separate PRs. This makes it easier to review and helps with merging faster. In general, a PR with bigger scope increases the chances of something blocking it from merging.
So we could also adopt the same convention here, cutting down the number of default iterations from 1000 to 750. This of course would need to go through a deprecation cycle, together with the learning_rate='auto'. What do you think?
We can work on this in a follow up PR. This PR is already a net improvement as is.
As for the review, I left comments about using pytest.mark.filterwarnings
instead of ignore_warnings
that applies to all the tests. Otherwise this looks good to go.
sklearn/manifold/_t_sne.py
Outdated
where N is the sample size, following Belkina et al. 2019 and | ||
Kobak et al. 2019, Nature Communications (or to 50.0, if | ||
N / early_exaggeration / 4 < 50). This will become default in 1.2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this references into the References section below and link it here?
scikit-learn/sklearn/manifold/_t_sne.py
Line 638 in 962bd9a
References |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, fixed.
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
@thomasjpfan Thanks a lot! I changed the handling of the future warnings. All checks pass.
My suggestion is simply to replace
with
I think this does not deserve its own PR... There would nothing else to do really at this point. Or what do you think? |
I am not fully convinced that changing the default number of iteration from 1000 to 750 is necessary. It would probably benefit from a dedicated discussion in a small separate PR. |
Fair enough. This change would not have any other consequences apart from decreasing the runtime by 25%. I'm fine merging this PR without this change if you guys prefer that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this issue @dkobak !
LGTM
This implements suggestions from #18018 (see there for some discussion):
I would still have to implement unit tests for future warnings (haven't done it before) and add the changes to
whats_new
(not quite sure which of the above changes need to be mentioned there). But I'd like to get some feedback from the core developers about whether these suggested changes are all fine. @TomDLT @ogriselUpdate: tests added, changes added.