demonstration of scheduler-style learning rate #1012
Conversation
Note that none of the existing pytorch schedulers allow for warmup epochs: https://pytorch.org/docs/stable/_modules/torch/optim/lr_scheduler.html
yeah I vote for this one.
In general, I have a slight preference for the optimizer style, as I see it cluttering the main loop less.
No very strong opinion, though.
What do others think?
I think this scheduler style fits best with the PyTorch model. They use scheduler objects to wrap optimizers and update learning rates, and I now realize they are already moving toward a step-based scheduler update system. They already have one scheduler where they hack in a step-dependent scheduler.step() call: https://github.com/pytorch/pytorch/blob/master/torch/optim/lr_scheduler.py#L715-L729 The version I've made in this PR is very similar, except that I explicitly provide the step arg instead of their hack of encoding step+epoch in the epoch arg in the example linked above. The resulting train loop I have in this PR is very similar to their example train loop for that step-dependent scheduler. Theirs looks like this:
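(Assuming the scheduler at those lines is CosineAnnealingWarmRestarts, the loop below is a runnable sketch in the spirit of its documented example; the model, loss, and data are placeholders, not their exact code.)

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

# Placeholder model, loss, and data -- the point is the per-step scheduler.step() call.
net = torch.nn.Linear(10, 2)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1, momentum=0.9)
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=10, T_mult=2)

dataloader = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(8)]
iters = len(dataloader)
for epoch in range(20):
    for i, (inputs, labels) in enumerate(dataloader):
        optimizer.zero_grad()
        loss = criterion(net(inputs), labels)
        loss.backward()
        optimizer.step()
        # Their hack: the within-epoch step is encoded as a fractional epoch.
        scheduler.step(epoch + i / iters)
```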
As discussed offline on Friday, I prefer the
Subclassing, like I commented in that PR, would have been the way; wrapping would have, too. As I said, I only have a slight preference for the optimizer wrapping, so I am essentially OK with both. But let's see if @ailzhang can ask inside the pytorch community for guidance...
Why did you update the tensorflow commit ID?
I haven't touched anything in `third_party/`.
Make sure this works on XLA CPU, and pytorch CPU (pass `[]` as devices to `DataParallel`).
Just checked using `--num_cores=0` and also with `[]` devices. Both versions were updating the optimizer learning rate correctly.
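(For context, a rough sketch of that CPU fallback, assuming the torch_xla DataParallel wrapper takes a model constructor and a device_ids list; the module path and exact signature here are assumptions and may differ across versions.)

```python
import torch_xla.distributed.data_parallel as dp
from torchvision.models import resnet50

# An empty device list asks DataParallel to skip XLA replication and run the
# model on plain PyTorch CPU -- the "pass [] as devices" case above.
model_parallel = dp.DataParallel(resnet50, device_ids=[])
```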
lgtm
I'm still working through a new accuracy issue here. When I kick off resnet50 on imagenet now, the job gets to 12% accuracy on epoch 2 or 3 and then stays at that accuracy forever. Trying to tell if this is related to this change or to new code that went in since I created the original red VM. Also trying to tell if this is the result of building from head (my runs where I got good accuracy were using the pytorch-nightly conda env, not building from head).
An update: I patched this PR into a fresh pytorch-nightly conda env on a new red VM and accuracy was >76% at 90 epochs, so this PR seems pretty safe to submit. I also wanted to patch this PR and build from head using build_torch_wheels.sh. However, AFAICT, at the time I grabbed the source from master, Torch had landed a PR that broke this one and I wasn't able to run. I didn't have time to investigate until now, but they fixed it soon after. I want to make another red VM, pull the latest source, patch in this PR, build from head, and run for 90 epochs to verify that the torch bug is fixed and that we can pass 76% accuracy using this PR + build_torch_wheels.sh.
Showing option #3 for the learning rate scheduling that we need for resnet50 to pass 76% accuracy.
See here for option #2 and here for option #1.
At first I was hesitant about this route since it differs from the PyTorch paradigm for how they intend schedulers to be used (which is to call scheduler.step() once per epoch).
However, the code is much cleaner in this version and it will be easier to assign models to different schedulers as we expand the number of models we support.
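To make the shape of option #3 concrete, here is a minimal sketch of a scheduler-style wrapper with warmup driven by an explicit step argument; the class name, constructor arguments, and decay policy below are illustrative assumptions, not the actual code in this PR.

```python
import torch


class WarmupAndDecayScheduler:
    """Illustrative scheduler-style LR wrapper: linear warmup, then step decay."""

    def __init__(self, optimizer, base_lr, warmup_steps, decay_epochs,
                 steps_per_epoch, gamma=0.1):
        self._optimizer = optimizer
        self._base_lr = base_lr
        self._warmup_steps = warmup_steps
        self._decay_epochs = decay_epochs
        self._steps_per_epoch = steps_per_epoch
        self._gamma = gamma

    def step(self, step_num):
        # The global step is passed in explicitly, instead of encoding
        # step+epoch in a fractional epoch argument.
        if step_num < self._warmup_steps:
            lr = self._base_lr * (step_num + 1) / self._warmup_steps
        else:
            epoch = step_num // self._steps_per_epoch
            num_decays = sum(1 for e in self._decay_epochs if epoch >= e)
            lr = self._base_lr * (self._gamma ** num_decays)
        for param_group in self._optimizer.param_groups:
            param_group['lr'] = lr


# Example usage: one scheduler.step(step) call per optimizer step in the train loop.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = WarmupAndDecayScheduler(optimizer, base_lr=0.1, warmup_steps=100,
                                    decay_epochs=[30, 60, 80], steps_per_epoch=1000)

global_step = 0
for epoch in range(2):
    for _ in range(5):
        # ... forward/backward/optimizer.step() would go here ...
        scheduler.step(global_step)
        global_step += 1
```

Because the wrapper only needs an optimizer and a step number, swapping in a different schedule per model is a matter of constructing a different scheduler object, which is the main point of this option.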