
Implement 1cycle learning rate policy #21258

Closed · wanted to merge 23 commits from mjacar:one-cycle

Conversation

mjacar (Contributor) commented Jun 2, 2019

What is this?

This is an implementation of the 1cycle learning rate policy as implemented in the fastai library and as initially introduced in the paper Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates.

This was requested in #19129. It has been suggested in both #19129 and #2016 that the 1cycle policy could be implemented in terms of the existing CyclicLR scheduler. However, that isn't actually true. In particular, the 1cycle policy calls for annealing from the maximum learning rate down to a minimum learning rate that is multiple orders of magnitude lower than the initial learning rate. In the CyclicLR scheduler, the initial learning rate and the minimum learning rate are the same.
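
To make the gap concrete, here is a minimal sketch of the three learning-rate anchors in the 1cycle policy (illustrative values; `div_factor` and `final_div_factor` follow the fastai-style parameterization, not code from this PR):

```python
max_lr = 1.0               # peak learning rate at the top of the cycle
div_factor = 25.0          # initial_lr = max_lr / div_factor
final_div_factor = 1e4     # min_lr = initial_lr / final_div_factor

initial_lr = max_lr / div_factor        # 0.04: where the warm-up starts
min_lr = initial_lr / final_div_factor  # 4e-06: where the final anneal ends

# CyclicLR exposes only base_lr and max_lr, so it has a single lower anchor
# and cannot represent initial_lr (0.04) and min_lr (4e-06) as distinct values.
```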

Why is this important?

It has been shown that the 1cycle learning rate policy can speed up convergence tremendously. In one example from the original paper, a model trained on MNIST reached 99.03% accuracy after 85 epochs with standard training, while the same model reached 99.3% accuracy in just 12 epochs using the 1cycle policy.

How was this tested?

Test cases were added to test_optim.py.
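
As a rough illustration of the kind of property such a test can assert (a sketch written against the OneCycleLR API that was ultimately merged as torch.optim.lr_scheduler.OneCycleLR, not the actual test code):

```python
import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1.0, total_steps=10)

lrs = []
for _ in range(10):          # one scheduler step per batch
    optimizer.step()
    scheduler.step()
    lrs.append(optimizer.param_groups[0]["lr"])

peak = max(lrs)
assert lrs.index(peak) > 0   # warm-up: the LR rises before it peaks
assert lrs[-1] < lrs[0]      # anneal: the final LR ends far below the initial LR
```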

mjacar added 2 commits Jun 2, 2019
mjacar (Contributor, Author) commented Jun 2, 2019

The linter is still complaining about the \: sequence on line 864 of lr_scheduler.py. However, unless my eyes are deceiving me, that appears to be exactly the same as the \: sequence on line 707. Am I not seeing something or did line 707 get grandfathered in somehow?

@ezyang ezyang added facebook open source and removed facebook labels Jun 5, 2019
mjacar (Contributor, Author) commented Jun 18, 2019

@soumith Fair enough if it's an inherently unanswerable question, but is there any rough ETA on when this might get triaged?

soumith (Member) commented Jun 20, 2019

cc @vincentqb: can you review this and get it to completion if it makes sense? Use the guidelines that I shared with you separately.

@mjacar again, sorry for the delay -- we have Vincent now, who is owning the reviews for optimizers and LR schedulers (LR schedulers are sufficiently owned by @ezyang as well :) )

Ir1d (Contributor) commented Jul 10, 2019

@vincentqb Hi, any update on this PR? Really looking forward to this 1cycle policy. Thx in advance.

@mjacar mjacar force-pushed the mjacar:one-cycle branch from 98304d7 to 95040a4 Jul 11, 2019
vincentqb (Contributor) commented Jul 19, 2019

Before moving forward, I'd like to make sure it is well-established (as mentioned in the guidelines). Do you have other applications/demonstrations of 1cycle beyond the one mentioned in the original paper?

mjacar (Contributor, Author) commented Jul 21, 2019

@vincentqb

The popularity of this method comes largely from Jeremy Howard's fast.ai MOOC course where he has taught this method as essentially the go-to method for training models quickly for about a year now. As such, there are a number of demonstrations of the method's usefulness across a variety of different domains and model architectures in just the course notes themselves.

Just going through the course notes and finding all references to fit_one_cycle yields the following:

  • lesson1-pets: CNN image classifier on the Oxford-IIIT Pet dataset
  • lesson2-download: Another CNN image classifier on a custom dataset
  • lesson3-planet: Multi-label CNN image classifier on the Planet dataset
  • lesson3-imdb: Fine-tuned an RNN language model, then used it as the basis for transfer learning to train an RNN text classifier on the Large Movie Review Dataset
  • lesson3-head-pose: CNN head pose estimator on the Biwi Kinect Head Pose dataset
  • lesson3-camvid: U-Net image segmentation model on the CamVid dataset
  • lesson3-camvid-tiramisu: Another U-Net based image segmentation model on the CamVid dataset
  • lesson4-collab: Collaborative filtering model on the MovieLens 100k dataset
  • lesson6-rossmann: Feedforward model for the tabular Rossmann dataset
  • lesson6-pets-more: Another CNN pet classifier on the Oxford-IIIT Pet dataset
  • lesson7-superres: U-Net based super-resolution model
  • lesson7-superres-imagenet: Another U-Net based super-resolution model
  • lesson7-superres-gan: Another U-Net based super-resolution model and another CNN based image classifier
  • lesson7-resnet-mnist: More CNN image classifiers
  • lesson-7-human-numbers: Some feedforward and RNN models

Plus, I would argue that the fact that this has already come up in #19129 and #2016 lends further credence to the method's notability and usefulness.

mjacar (Contributor, Author) commented Aug 7, 2019

@vincentqb Any ETA on when this might get looked at? Is resolving #23306 a necessary prerequisite?

vincentqb (Contributor) left a review

Thanks for the information above. Yes, I recommend that we proceed with this PR. :)

  • As you pointed out, #23306 is planning to introduce chainable schedulers. As far as I can see, this is not a chainable scheduler, and overrides the current learning rate value, is that correct?
  • It doesn't seem like the epoch parameter for step is needed, is that correct? We are moving away from it as part of #23306.
Inline review comment on torch/optim/lr_scheduler.py (outdated, resolved)
mjacar (Contributor, Author) commented Aug 13, 2019

As you pointed out, #23306 is planning to introduce chainable schedulers. As far as I can see, this is not a chainable scheduler, and overrides the current learning rate value, is that correct?

Yes. In fact, I'm not really sure what making a 1cycle policy "chainable" would even mean necessarily. It's a specific policy and not really a composable abstraction in any sense that is obvious to me. Do all schedulers need to be chainable?

It doesn't seem like the epoch parameter for step is needed, is that correct? We are moving away from it as part of #23306.

I believe that's correct.
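
For illustration, a sketch of the distinction under discussion (hypothetical minimal classes, not PyTorch's actual scheduler code): a chainable scheduler rescales whatever learning rate is already in the optimizer, while a 1cycle-style scheduler overwrites it with a value computed from the step count alone.

```python
class ChainableDecay:
    """Chainable: multiplies whatever LR another scheduler left behind."""
    def __init__(self, optimizer, gamma=0.99):
        self.optimizer, self.gamma = optimizer, gamma

    def step(self):
        for group in self.optimizer.param_groups:
            group["lr"] *= self.gamma      # composes with other schedulers


class OneCycleStyle:
    """Not chainable: sets the LR from a closed-form schedule."""
    def __init__(self, optimizer, schedule_fn):
        self.optimizer, self.schedule_fn = optimizer, schedule_fn
        self.step_count = 0

    def step(self):
        self.step_count += 1
        lr = self.schedule_fn(self.step_count)
        for group in self.optimizer.param_groups:
            group["lr"] = lr               # clobbers any previously chained value
```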

vincentqb (Contributor) commented Aug 13, 2019

Yes. In fact, I'm not really sure what making a 1cycle policy "chainable" would even mean necessarily. It's a specific policy and not really a composable abstraction in any sense that is obvious to me. Do all schedulers need to be chainable?

No, they don't need to all be chainable. I just wanted to make sure we were implementing such a form if available. We should mention this is not chainable in the docstring though.

Inline review comments on torch/optim/lr_scheduler.py and test/test_optim.py (resolved)
Ubuntu and others added 5 commits Aug 20, 2019
mjacar (Contributor, Author) commented Aug 21, 2019

@vincentqb I believe I've responded to all unresolved feedback with the latest changes. Feel free to take a look whenever you get the chance.

vincentqb (Contributor) commented Aug 21, 2019

@pytorchbot retest this please

facebook-github-bot (Contributor) left a comment

@vincentqb has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@vincentqb vincentqb mentioned this pull request Aug 22, 2019
vincentqb (Contributor) commented Aug 26, 2019

The linter is still complaining about the \: sequence on line 864 of lr_scheduler.py. However, unless my eyes are deceiving me, that appears to be exactly the same as the \: sequence on line 707. Am I not seeing something or did line 707 get grandfathered in somehow?

The string has to be a raw string, i.e. prefixed with r as in r"\:". Updated.
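
For context, a minimal sketch of the linter complaint (illustrative strings, not the actual lines from lr_scheduler.py):

```python
# In a plain string, Python treats "\:" as an escape sequence; since "\:" is
# not a valid escape, linters flag it (flake8 W605: invalid escape sequence).
bad = "Anneals the learning rate ... :math:`\:`"

# A raw string passes the backslash through to Sphinx untouched:
good = r"Anneals the learning rate ... :math:`\:`"
```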

vincentqb (Contributor) commented Aug 28, 2019

Landing from #25325

facebook-github-bot added a commit that referenced this pull request Aug 28, 2019
Summary:
Squash rebase of #21258

ghstack-source-id: 7d3ce522ac4dd3050bc6c6bbda1eaaeb8bc4b2c1
Pull Request resolved: #25324
Pull Request resolved: #25325

Differential Revision: D17095722

Pulled By: vincentqb

fbshipit-source-id: 7fe69b210924ee3b39223dd78122aea61267234a
@vincentqb vincentqb closed this Aug 29, 2019
@mjacar mjacar deleted the mjacar:one-cycle branch Aug 29, 2019