Lr scheduler #1370
Conversation
This looks generally good. I've added some inline comments.
This needs some unit tests. Basically:
- create an optimizer and LR scheduler
- for a few different values of 'epoch', call step on the scheduler and check that the LR of the optimizer is correct (see the sketch below)
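For instance, a minimal sketch of such a test. The constructor arguments and the step(epoch) call follow the API discussed in this PR; the import path and exact assertions are assumptions, not the final test file:

```python
import unittest
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import LambdaLR  # assumed import path

class TestLRScheduler(unittest.TestCase):
    def test_lambda_lr(self):
        model = torch.nn.Linear(2, 2)
        optimizer = SGD(model.parameters(), lr=0.05)
        scheduler = LambdaLR(optimizer, lr_lambda=lambda epoch: 0.1 ** (epoch // 3))
        for epoch in range(10):
            scheduler.step(epoch)
            # the scheduler should set lr = base_lr * lr_lambda(epoch)
            expected = 0.05 * 0.1 ** (epoch // 3)
            for param_group in optimizer.param_groups:
                self.assertAlmostEqual(param_group['lr'], expected)

if __name__ == '__main__':
    unittest.main()
```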
torch/optim/lr_scheduler.py
Outdated
        self.zip = zip(optimizer.param_groups, base_lrs, lr_lambdas)

    def step(self, epoch):
        for param_group, base_lr, lr_lambda in self.zip:
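As an aside on the snippet above: in Python 3, zip() returns a single-use iterator, so caching it on self would make the loop body run only on the first call to step(). A tiny illustration (not code from this PR):

```python
param_groups = [{'lr': 0.1}, {'lr': 0.1}]
base_lrs = [0.1, 0.1]
lr_lambdas = [lambda e: 0.5 ** e, lambda e: 0.9 ** e]

cached = zip(param_groups, base_lrs, lr_lambdas)
print(len(list(cached)))  # 2: the first pass sees both groups
print(len(list(cached)))  # 0: the iterator is already exhausted
```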
torch/optim/lr_scheduler.py
Outdated
        if self.mode not in ['min', 'max']:
            raise RuntimeError('Learning Rate Plateau Reducing mode %s is unknown!')
        if self.mode == 'min':
            self.monitor_op = lambda a, b: np.less(a, b - self.epsilon)
torch/optim/lr_scheduler.py
Outdated
    def _reset(self):
        """Resets wait counter and cooldown counter.
        """
        if self.mode not in ['min', 'max']:
torch/optim/lr_scheduler.py
Outdated
        self.wait = 0
        self.lr_epsilon = self.min_lr * 1e-4

    def reset(self):
torch/optim/lr_scheduler.py
Outdated
    def step(self, epoch, metrics):
        current = metrics
        if current is None:
Not sure where to put the unit tests. I have put them here.
Looks good for the most part, but I think some parts could be simplified. Thanks for the PR!
torch/optim/lr_scheduler.py
Outdated
            param_group['lr'] = self.base_lr * self.lr_lambda(epoch)


class GroupLambdaLR(object):
torch/optim/lr_scheduler.py
Outdated
from torch.optim.optimizer import Optimizer


class LambdaLR(object):
torch/optim/lr_scheduler.py
Outdated
    >>> validate(...)
    """

    def __init__(self, optimizer, base_lr=0.1, gamma=0.1, step_size=30):
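For orientation, the step decay this constructor implies works out as follows. This is a small arithmetic sketch of the usual step-decay formula, not a quote of the PR's code:

```python
# lr = base_lr * gamma ** (epoch // step_size)
base_lr, gamma, step_size = 0.1, 0.1, 30
for epoch in (0, 29, 30, 60, 90):
    print(epoch, base_lr * gamma ** (epoch // step_size))
# the lr stays at 0.1 until epoch 30, then shrinks by 10x every 30 epochs
```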
torch/optim/lr_scheduler.py
Outdated
    >>> validate(...)
    """

    def __init__(self, optimizer, base_lr=0.1, gamma=0.1, milestones=(10, 20, 30)):
torch/optim/lr_scheduler.py
Outdated
            be reduced. new_lr = lr * factor
        patience: number of epochs with no improvement
            after which learning rate will be reduced.
        verbose: int. 0: quiet, 1: update messages.
torch/optim/lr_scheduler.py
Outdated
            raise RuntimeError('Learning Rate Plateau threshold mode %s is unknown!')
        if mode == 'min' and threshold_mode == 'rel':
            rel_epsilon = 1. - threshold
            self.monitor_op = lambda a, best: np.less(a, best * rel_epsilon)
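A small numeric sketch of this relative-threshold comparison (the values are illustrative):

```python
import numpy as np

threshold = 1e-4
best = 1.0
rel_epsilon = 1. - threshold  # 0.9999
# in 'min'/'rel' mode a new value only counts as an improvement
# if it beats `best` by more than the relative threshold
print(np.less(0.99990, best * rel_epsilon))  # False: within the threshold, no improvement
print(np.less(0.99980, best * rel_epsilon))  # True: a genuine improvement
```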
torch/optim/lr_scheduler.py
Outdated
            self.cooldown_counter -= 1
            self.wait = 0

        if self.monitor_op(current, self.best):
torch/optim/lr_scheduler.py
Outdated
        self.best = self.monitor_op.worse
        self.cooldown_counter = 0
        self.wait = 0
        self.lr_epsilon = self.min_lr * 1e-4
torch/optim/lr_scheduler.py
Outdated
                new_lr = max(new_lr, self.min_lr)
                param_group['lr'] = new_lr
                if self.verbose > 0:
                    print('Epoch %05d: reducing learning rate of group %d to %s.' % (epoch, inx_group, new_lr))
Also, tests should go to
@Jiaming-Liu i think this is good to go. You should also add docstrings for LambdaLR and GroupLambdaLR and add references in https://raw.githubusercontent.com/pytorch/pytorch/master/docs/source/optim.rst so that they will show up in documentation as well.
and then the locally generated HTML documentation, similar to pytorch.org/docs/, will be in docs/build/html
@pytorchbot test this please
@pytorchbot test this please
I think the extensive use of lambdas will prevent these classes from being pickled. (They could probably just be instance methods.)
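For illustration, a minimal sketch of the picklable alternative being suggested, with the comparison logic as an instance method instead of a lambda stored on self. The class and method names here are hypothetical, not the PR's API:

```python
import pickle

class PlateauComparatorSketch(object):
    """Hypothetical sketch: a picklable stand-in for self.monitor_op."""

    def __init__(self, mode='min', threshold=1e-4):
        self.mode = mode
        self.threshold = threshold

    def is_better(self, a, best):
        # instance method instead of self.monitor_op = lambda a, best: ...
        if self.mode == 'min':
            return a < best * (1. - self.threshold)
        return a > best * (1. + self.threshold)

obj = PlateauComparatorSketch()
restored = pickle.loads(pickle.dumps(obj))  # works: no lambdas on the instance
assert restored.is_better(0.5, 1.0)
```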
torch/optim/lr_scheduler.py
Outdated
class LambdaLR(object):
    def __init__(self, optimizer, lr_lambda):
torch/optim/lr_scheduler.py
Outdated
        self.lr_lambdas = list(lr_lambda)
        self.last_epoch = -1

    def step(self, epoch=None):
Yes,
This error is weird. Any idea? @apaszke
Looks good now and should be ready to merge after these final fixes. Can you also add the schedulers to docs/source/optim.rst?
                raise KeyError("param 'initial_lr' is not specified "
                               "in param_groups[{}] when resuming an optimizer".format(i))
        self.base_lrs = list(map(lambda group: group['initial_lr'], optimizer.param_groups))
        self.step(last_epoch + 1)
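For context on the snippet above, a hedged sketch of what the resume path expects: when a scheduler is constructed with last_epoch other than -1, each param group must already carry 'initial_lr'. The scheduler class and arguments here are assumptions:

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR  # assumed scheduler

model = torch.nn.Linear(2, 2)
optimizer = SGD(model.parameters(), lr=0.1)

# when resuming, record the starting lr so the schedule can be recomputed;
# omitting 'initial_lr' here would trigger the KeyError shown above
for group in optimizer.param_groups:
    group.setdefault('initial_lr', group['lr'])

scheduler = StepLR(optimizer, step_size=30, gamma=0.1, last_epoch=9)
```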
            param_group['lr'] = lr


class LambdaLR(_LRScheduler):
torch/optim/lr_scheduler.py
Outdated
class LambdaLR(_LRScheduler):
    def __init__(self, optimizer, lr_lambda, last_epoch=-1):
        self.optimizer = optimizer
        self.base_lrs = list(map(lambda group: group['lr'], optimizer.param_groups))
Also, can you try rebasing on top of master? The test might have been fixed in some other commit.
Now it seems like a good time to update the documentation. I will have it done within a week. Thanks for the reviews!
Rebasing doesn't help the error :(. Any ideas?
The test looks like it's been fixed at 368ecb4. Rebasing on top of master fixes the error.
Can one of the admins verify this patch?
@pytorchbot test this please
As far as I can see there is only one optimizer being kept, so on a learning rate drop all of its other state is also kept. How would one add momentum resetting on learning rate drops in SGD?
@soumith Kindly mention this PR in some release note to increase visibility.
hey jiaming. I'm really sorry for missing this commit in the release notes. It looks like I missed 4 commits by mistake. I've updated the release notes now, and I've made a note for myself to check if repeating the note about learning rate schedules will be appropriate for the next release as well (to increase visibility)
so this is in the new release 0.2.0? great!
@soumith My post on the PyTorch forum about LR schedules is still getting more likes every week, so I think people are not aware of this PR. You should consider megaphoning this PR in future release notes.
>>> lambda2 = lambda epoch: 0.95 ** epoch
>>> scheduler = LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])
>>> for epoch in range(100):
>>>     scheduler.step()
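For completeness, a hedged, self-contained version of the usage pattern this example shows, with one lambda per parameter group. The optimizer setup and lambda1 are assumed here, since they are cut off in the snippet above:

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(2, 2)
# two parameter groups, so each one gets its own lambda
optimizer = SGD([{'params': [model.weight]},
                 {'params': [model.bias]}], lr=0.05)

lambda1 = lambda epoch: 0.9 ** epoch   # assumed; the original definition is elided
lambda2 = lambda epoch: 0.95 ** epoch
scheduler = LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])

for epoch in range(100):
    scheduler.step()
    # train(...) and validate(...) would go here
```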
Providing a unified LR scheduler.
Currently supports: