Implement auto early exaggeration #220
Merged
Commits (7):
- 421598e Implement auto early exaggeration (dkobak)
- f5959e2 Fix the defaults (dkobak)
- b68b100 Switch to lr=N/ex and momentum=.8 (dkobak)
- 34952c1 fix import in tests (dkobak)
- 9b222d1 Fix the test (dkobak)
- 1316d3b Fix assertion call (dkobak)
- 5ec7e79 Fix n_iter in tests (dkobak)
Conversations
I'm not sure this is the correct place to put this. This would set the learning rate for the duration of the whole embedding, not just the early exaggeration phase, right? I think the best place for this might actually be inside the `gradient_descent` call function, so the correct rescaling will happen during any call to `.optimize`, not just the standard `TSNE().fit` call.

I wasn't aware that this was an issue during the early exaggeration phase at all. Is it actually necessary to rescale the learning rate if the exaggeration isn't 1? In that case, wouldn't it also make sense to rescale it given any exaggeration value, not just during the early exaggeration phase?
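For context, a minimal sketch of the two entry points being contrasted, using the public openTSNE API on toy data (iteration counts and parameter values here are illustrative, not recommendations):

```python
import numpy as np
from openTSNE import TSNE

X = np.random.randn(1000, 50)  # toy data

# (a) the standard one-shot call; the learning rate is resolved internally:
embedding = TSNE(learning_rate="auto").fit(X)

# (b) a manual optimization sequence via .optimize; for consistency, each
# call should see the same learning-rate resolution as TSNE().fit:
embedding = embedding.optimize(n_iter=250, exaggeration=12, momentum=0.5)
embedding = embedding.optimize(n_iter=500, momentum=0.8)
```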
The current default openTSNE behaviour (which is the same in FIt-SNE, and about to become the default in the soon-to-be-released sklearn 1.2) is to use learning rate N/12. Here 12 corresponds to the early exaggeration coefficient, the idea being that if `learning_rate * exaggeration > N`, then the optimization may diverge (as shown by Linderman & Steinerberger). One *could* use learning rate N, instead of N/12, for the second optimization phase, once early exaggeration is turned off, but we do not do it. We could decide to do it, and I think it may even be a good idea, but this would make openTSNE's default behaviour different from other implementations. In any case, I think this requires a different PR and a separate issue, maybe comparing the current and suggested schemes on a range of datasets.

In this PR I assumed that we want to keep the learning rate constant throughout optimization, which is why I set it to N/early_exaggeration and keep the same learning rate for the second phase.
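As a hedged sketch of the rule described above (not openTSNE's verbatim code; the floor of 200 follows the FIt-SNE convention and is an assumption here):

```python
def auto_learning_rate(n_samples: int, exaggeration: float = 12.0) -> float:
    """Pick a learning rate eta with eta * exaggeration <= n_samples, the
    Linderman & Steinerberger condition for avoiding divergence."""
    return max(200.0, n_samples / exaggeration)

# For N = 60_000 points with early exaggeration 12:
eta = auto_learning_rate(60_000)  # 60000 / 12 = 5000.0
assert eta * 12.0 <= 60_000       # the stability condition holds
```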
Yes, that makes sense, I'd forgotten about that. However, I'm a bit unsure what the best way to implement this would be. We currently handle `learning_rate="auto"` in the `_handle_nice_params` function. However, this function is called in every `.optimize` step, so that we can perform whatever optimization sequence we want.

With the way you've implemented it now, we'd be going around this function, because we'd change the `learning_rate` from `"auto"` to a number before we even got to this function. This would also cause inconsistent behaviour between `TSNE().fit` and manual `.optimize` calls, since the first one would then use the "correct" learning rate, while the second would use N/12 in both cases. I am very much opposed to this inconsistency, but I'm not sure what the best way to handle this would be.

Perhaps the best course of action would be to investigate rescaling the learning rate based on exaggeration, so we'd end up with a different learning rate for the early exaggeration phase and the standard optimization phase. I think this would best fit into our framework, and it seems the most principled approach.
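A minimal sketch of that per-phase resolution, assuming a helper in the spirit of `_handle_nice_params` (the name `resolve_learning_rate` and the floor of 200 are illustrative, not the library's actual code):

```python
def resolve_learning_rate(learning_rate, n_samples, exaggeration=None):
    """Resolve "auto" at every optimize call, scaled by that phase's
    exaggeration, so each phase gets its own safe learning rate."""
    if learning_rate == "auto":
        exag = exaggeration if exaggeration is not None else 1.0
        return max(200.0, n_samples / exag)
    return learning_rate

# For N = 60_000, the early exaggeration phase and the standard phase
# would then use different learning rates:
lr_early = resolve_learning_rate("auto", 60_000, exaggeration=12)  # 5000.0
lr_main = resolve_learning_rate("auto", 60_000)                    # 60000.0
```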
I see your point. I will start a separate issue.