Enhancement: New and Improved LR Suggestion Paradigm #3377
 # Cell
 @patch
-def lr_find(self:Learner, start_lr=1e-7, end_lr=10, num_it=100, stop_div=True, show_plot=True, suggestions=True):
-    "Launch a mock training to find a good learning rate, return lr_min, lr_steep if `suggestions` is True"
+def lr_find(self:Learner, start_lr=1e-7, end_lr=10, num_it=100, stop_div=True, show_plot=True, suggest_funcs=(SuggestionMethod.Valley)):
Would it make sense to leave the `suggestions` argument in `lr_find` to allow for backward compatibility?
There are two routes we can take here:
- Leave it in and have it check both that `suggest_funcs` is not `None` and that `suggestions=True`, only then having it give suggestions.
- Or just raise a deprecation warning.

One is more explicit than the other, and maintains the API, so I don't think it would be a bad thing to do.
This might be an @jph00 decision, but I feel leaving it in might make for a more complex API, as you would then need to make sure that you both pass `suggest_funcs` and set `suggestions` to `True`. So deprecating it and easing it out of the API is what I favor.
If you agree, Jeremy, then I can have it raise a deprecation warning, as it's set to `True`.
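The deprecation route discussed in this thread could be sketched roughly as follows. This is only an illustration, not the code in the PR: `resolve_suggest_funcs` is a hypothetical helper, the `None` sentinel default for `suggestions` is an assumption made so the warning fires only when a caller passes the argument explicitly, and the string `"valley"` stands in for a real suggestion function.

```python
import warnings

def resolve_suggest_funcs(suggest_funcs=("valley",), suggestions=None):
    """Hypothetical helper: decide which suggestion functions to run.

    `suggestions` mimics the old boolean flag. It defaults to None (an
    assumption of this sketch) so we can tell "not passed" from "passed".
    """
    if suggestions is not None:
        # The caller used the old flag explicitly: warn, but still honour it.
        warnings.warn(
            "`suggestions` is deprecated; use `suggest_funcs` "
            "(pass `suggest_funcs=None` to disable suggestions).",
            DeprecationWarning,
            stacklevel=2,
        )
        if not suggestions:
            return None
    return suggest_funcs

# Old-style call: still works, but emits a DeprecationWarning.
old_style = resolve_suggest_funcs(suggestions=False)
# New-style call: no warning, the functions are passed through unchanged.
new_style = resolve_suggest_funcs(("valley",))
```

With this shape, existing callers that pass `suggestions=False` keep their behavior while being nudged toward `suggest_funcs`, and nobody has to set two arguments to get suggestions.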
- add missing fastai dependency - Using pytorch 1.9 and fastai 2.5, the default interface for Learner.lr_find changed. See fastai/fastai#3377 Direct: fastai/fastai@b2a6944#diff-6a13257afa218993cdb52a22aa00b2a93943766466e1323b6dd5f6dcfb886ebe
Summary of Changes
This PR introduces two new suggestion methods as well as a new way to plot LR suggestions, based on research performed at Novetta.
Suggestion Methods
I've introduced two new methods, and kept the two originally from Leslie Smith's paper. These are:
- `valley`
- `slide`
- `minimum`
- `steep`

`minimum` and `steep` were the two originally in fastai.

During experiments we found that `valley` consistently outperformed the other three, so the default for fastai's learning rate finder has been changed to use the `valley` suggestion, though as many suggestion methods as wanted can be passed in (with more than 9, there will likely be issues with the coloring).

Each suggestion method should take in three params: a list of the learning rates, a list of the losses, and the number of iterations, all found from the `LRFinder` class.

Along with this, each func currently lives in a namespace class, `SuggestFunc`. Should these functions be privatized, or is the namespace class too much? It's a nice convenience, but I know fastai wants to keep naming pretty unique.

Below is an example plot with a usage:
What This Does For End Users
Since fastai wants to be on the cutting edge of deep learning, this PR keeps us ahead of the curve by using what has so far been seen to be the best way to estimate learning rates automatically.
Added Tests
No tests were added, since the LR Finder's results can be fairly random. However, under some hide tags are the outputs and usage from passing the raw `LRFinder` results to each suggestion method, so devs can see an example.
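To make "passing in the raw LRFinder results" concrete, here is a minimal sketch of two functions that follow the three-parameter signature (learning rates, losses, number of iterations). These are not the PR's implementations: the names `minimum_sketch` and `steep_sketch`, the divide-by-10 heuristic, the bare-float return value, and the toy data are all assumptions made for illustration.

```python
import math

def minimum_sketch(lrs, losses, num_it):
    """LR at the lowest recorded loss, divided by 10 (heuristic assumed here).

    `num_it` is unused but kept so the signature has all three params.
    """
    idx = min(range(len(losses)), key=losses.__getitem__)
    return lrs[idx] / 10

def steep_sketch(lrs, losses, num_it):
    """LR at the start of the interval where loss falls fastest per log10(lr)."""
    slopes = [
        (losses[i + 1] - losses[i]) / (math.log10(lrs[i + 1]) - math.log10(lrs[i]))
        for i in range(len(lrs) - 1)
    ]
    idx = min(range(len(slopes)), key=slopes.__getitem__)
    return lrs[idx]

# Toy stand-in for recorded LR finder results (made-up numbers).
lrs = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
losses = [2.30, 2.25, 1.90, 1.20, 1.05, 3.50]

# Every function takes the same three arguments, so any number can be looped over.
suggestions = {
    f.__name__: f(lrs, losses, len(lrs)) for f in (minimum_sketch, steep_sketch)
}
```

Because each function shares the same call shape, adding another suggestion method only means adding one more callable to the tuple.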
cc: @jph00