Lr scheduler #1370

Merged
merged 60 commits into pytorch:master on May 25, 2017

Conversation

@Jiaming-Liu (Contributor) commented Apr 26, 2017

Providing a unified LR scheduler.

Currently supports (see the usage sketch after the list):

  • ReduceLROnPlateau (ported from Keras)
  • LambdaLR
  • StepLR
  • MultiStepLR
  • ExponentialLR
  • GroupLambdaLR (needs testing)
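
For orientation, here is a minimal usage sketch against the API as it was merged (a hedged example, not code from this PR; `model` is just a stand-in nn.Module, and the step-before-train ordering reflects the API of that era):

import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR  # available once this PR is merged

model = nn.Linear(10, 2)                      # stand-in for any real model
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9)

# Decay the learning rate by gamma every step_size epochs.
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(100):
    scheduler.step()        # per-epoch call, as in the merged docstrings
    # train(...); validate(...)  # placeholders for the usual loop body

ReduceLROnPlateau is the exception: its step() takes the monitored metric (for example the validation loss) instead of following a fixed epoch schedule.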

@colesbury (Member) left a comment

This looks generally good. I've added some inline comments.

This needs some unit tests. Basically (see the sketch after this list):

  • create an optimizer and LR scheduler
  • for a few different values of 'epoch', call step on the scheduler and check that the LR of the optimizer is correct
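
A rough sketch of what such a test could look like, assuming the StepLR signature that was eventually merged (this is not the test that actually landed; the explicit epoch argument to step() matches the API of that era):

import unittest

import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR


class TestLRScheduler(unittest.TestCase):
    def test_step_lr(self):
        model = nn.Linear(10, 2)
        optimizer = SGD(model.parameters(), lr=0.1)
        scheduler = StepLR(optimizer, step_size=3, gamma=0.1)
        # Expected LR: 0.1 for epochs 0-2, 0.01 for 3-5, 0.001 for 6-8.
        expected = [0.1] * 3 + [0.01] * 3 + [0.001] * 3
        for epoch, lr in enumerate(expected):
            scheduler.step(epoch)
            for group in optimizer.param_groups:
                self.assertAlmostEqual(group['lr'], lr, places=6)


if __name__ == '__main__':
    unittest.main()

(As noted further down in the thread, the tests eventually went into test_optim.py.)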

        self.zip = zip(optimizer.param_groups, base_lrs, lr_lambdas)

    def step(self, epoch):
        for param_group, base_lr, lr_lambda in self.zip:

This comment was marked as off-topic.

        if self.mode not in ['min', 'max']:
            raise RuntimeError('Learning Rate Plateau Reducing mode %s is unknown!' % self.mode)
        if self.mode == 'min':
            self.monitor_op = lambda a, b: np.less(a, b - self.epsilon)

This comment was marked as off-topic.

    def _reset(self):
        """Resets wait counter and cooldown counter."""
        if self.mode not in ['min', 'max']:

This comment was marked as off-topic.

        self.wait = 0
        self.lr_epsilon = self.min_lr * 1e-4

    def reset(self):

This comment was marked as off-topic.

    def step(self, epoch, metrics):
        current = metrics
        if current is None:

This comment was marked as off-topic.

@Jiaming-Liu (Contributor, Author)

Not sure where to put the unit tests. I have put them here.

@apaszke (Contributor) left a comment

Looks good for the most part, but I think some parts could be simplified. Thanks for the PR!

            param_group['lr'] = self.base_lr * self.lr_lambda(epoch)


class GroupLambdaLR(object):

This comment was marked as off-topic.

from torch.optim.optimizer import Optimizer


class LambdaLR(object):

This comment was marked as off-topic.

    >>> validate(...)
    """

    def __init__(self, optimizer, base_lr=0.1, gamma=0.1, step_size=30):

This comment was marked as off-topic.

    >>> validate(...)
    """

    def __init__(self, optimizer, base_lr=0.1, gamma=0.1, milestones=(10, 20, 30)):

This comment was marked as off-topic.

            be reduced. new_lr = lr * factor
        patience: number of epochs with no improvement
            after which learning rate will be reduced.
        verbose: int. 0: quiet, 1: update messages.

This comment was marked as off-topic.

            raise RuntimeError('Learning Rate Plateau threshold mode %s is unknown!' % threshold_mode)
        if mode == 'min' and threshold_mode == 'rel':
            rel_epsilon = 1. - threshold
            self.monitor_op = lambda a, best: np.less(a, best * rel_epsilon)

This comment was marked as off-topic.

            self.cooldown_counter -= 1
            self.wait = 0

        if self.monitor_op(current, self.best):

This comment was marked as off-topic.

        self.best = self.monitor_op.worse
        self.cooldown_counter = 0
        self.wait = 0
        self.lr_epsilon = self.min_lr * 1e-4

This comment was marked as off-topic.

                new_lr = max(new_lr, self.min_lr)
                param_group['lr'] = new_lr
                if self.verbose > 0:
                    print('Epoch %05d: reducing learning rate of group %d to %s.' % (epoch, inx_group, new_lr))

This comment was marked as off-topic.

@apaszke (Contributor) commented Apr 28, 2017

Also, tests should go to test_optim.py

@soumith (Member) commented May 4, 2017

@Jiaming-Liu I think this is good to go. You should also add docstrings for LambdaLR and GroupLambdaLR, and add references in https://raw.githubusercontent.com/pytorch/pytorch/master/docs/source/optim.rst so that they show up in the documentation as well.
You can build and check the documentation locally like this:

cd docs
pip install -r requirements.txt
make clean && make html

The locally generated HTML documentation (similar to pytorch.org/docs/) will then be in docs/build/html.
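
For the docstring request above, a hypothetical sketch in the style the other schedulers use (the wording that actually landed may differ):

class LambdaLR(object):
    """Sets the learning rate of each parameter group to the initial lr
    times a given function of the epoch.

    Args:
        optimizer (Optimizer): Wrapped optimizer.
        lr_lambda (function or list): A function which computes a multiplicative
            factor given an integer epoch, or a list of such functions, one per
            group in optimizer.param_groups.

    Example:
        >>> # Assuming optimizer has two groups.
        >>> lambda1 = lambda epoch: epoch // 30
        >>> lambda2 = lambda epoch: 0.95 ** epoch
        >>> scheduler = LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])
        >>> for epoch in range(100):
        >>>     scheduler.step()
        >>>     train(...)
        >>>     validate(...)
    """

With an autoclass entry in docs/source/optim.rst, Sphinx should then render the docstring (including the Example block) on the generated page.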

@apaszke (Contributor) commented May 4, 2017

@pytorchbot test this please

@apaszke (Contributor) commented May 4, 2017

@pytorchbot test this please

1 similar comment

@colesbury (Member) left a comment

I think the extensive use of lambdas will prevent these classes from being pickled. (They could probably just be instance methods.)
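
To illustrate the concern with hypothetical stand-in classes (not code from this PR): an object that keeps a lambda in its __dict__ cannot be pickled, while the same comparison written as an instance method pickles cleanly, because only plain data is left in the instance state.

import pickle


class LambdaHolder(object):
    def __init__(self, mode='min'):
        # Storing the comparison as a lambda attribute blocks pickling.
        self.monitor_op = (lambda a, b: a < b) if mode == 'min' else (lambda a, b: a > b)


class MethodHolder(object):
    def __init__(self, mode='min'):
        self.mode = mode

    def monitor_op(self, a, b):
        # Same comparison as a regular instance method; the instance pickles.
        return a < b if self.mode == 'min' else a > b


try:
    pickle.dumps(LambdaHolder())
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    print('lambda attribute is not picklable:', exc)

print(len(pickle.dumps(MethodHolder())) > 0)  # True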



class LambdaLR(object):
    def __init__(self, optimizer, lr_lambda):

This comment was marked as off-topic.

        self.lr_lambdas = list(lr_lambda)
        self.last_epoch = -1

    def step(self, epoch=None):

This comment was marked as off-topic.

@szagoruyko mentioned this pull request May 9, 2017
@apaszke (Contributor) commented May 10, 2017

Yes, get_lr would be a better name. Also, please remove base_lr and use the code I wrote in the commit comments.
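
For context, the get_lr pattern under discussion looks roughly like this condensed sketch (hypothetical class names; the real base class merged from this PR is torch.optim.lr_scheduler._LRScheduler):

class _SchedulerSketch(object):
    def __init__(self, optimizer, last_epoch=-1):
        self.optimizer = optimizer
        self.base_lrs = [group['lr'] for group in optimizer.param_groups]
        self.last_epoch = last_epoch
        self.step(last_epoch + 1)

    def get_lr(self):
        # Subclasses return one learning rate per param group.
        raise NotImplementedError

    def step(self, epoch=None):
        if epoch is None:
            epoch = self.last_epoch + 1
        self.last_epoch = epoch
        for param_group, lr in zip(self.optimizer.param_groups, self.get_lr()):
            param_group['lr'] = lr


class StepSketch(_SchedulerSketch):
    # StepLR-style decay: multiply the base rate by gamma every step_size epochs.
    def __init__(self, optimizer, step_size, gamma=0.1, last_epoch=-1):
        self.step_size = step_size
        self.gamma = gamma
        super(StepSketch, self).__init__(optimizer, last_epoch)

    def get_lr(self):
        return [base_lr * self.gamma ** (self.last_epoch // self.step_size)
                for base_lr in self.base_lrs]

The merged code follows this shape: subclasses only override get_lr(), and step() writes the computed rates back into optimizer.param_groups.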

@Jiaming-Liu (Contributor, Author)

This error is weird. Any idea? @apaszke

Running optim tests
..............F........
FAIL: test_adagrad_sparse (__main__.TestOptim)
Traceback (most recent call last):
  File "test_optim.py", line 285, in test_adagrad_sparse
    lambda params: optim.Adagrad(params, lr=1e-1)
  File "test_optim.py", line 103, in _test_rosenbrock_sparse
    self.assertLessEqual(params.data.dist(solution), initial_dist)
AssertionError: 0.7290316658655626 not less than or equal to 0.7071067811865476

@apaszke (Contributor) left a comment

Looks good now and should be ready to merge after these final fixes. Can you also add the schedulers to docs/source/optim.rst?

                    raise KeyError("param 'initial_lr' is not specified "
                                   "in param_groups[{}] when resuming an optimizer".format(i))
        self.base_lrs = list(map(lambda group: group['initial_lr'], optimizer.param_groups))
        self.step(last_epoch + 1)
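
A short sketch of what the resume path above means in practice, assuming the merged StepLR API (a hedged example, not code from this PR):

import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR

model = nn.Linear(10, 2)
optimizer = SGD(model.parameters(), lr=0.1)

# Fresh run: last_epoch defaults to -1, so each param group gets an
# 'initial_lr' entry recorded automatically.
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)
print('initial_lr' in optimizer.param_groups[0])  # True

# Resuming: because 'initial_lr' is present, a scheduler can be re-created
# with the epoch the previous run stopped at. On a bare optimizer without
# 'initial_lr', this constructor raises the KeyError shown above.
resumed = StepLR(optimizer, step_size=30, gamma=0.1, last_epoch=59)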

This comment was marked as off-topic.

            param_group['lr'] = lr


class LambdaLR(_LRScheduler):

This comment was marked as off-topic.

class LambdaLR(_LRScheduler):
    def __init__(self, optimizer, lr_lambda, last_epoch=-1):
        self.optimizer = optimizer
        self.base_lrs = list(map(lambda group: group['lr'], optimizer.param_groups))

This comment was marked as off-topic.

@apaszke (Contributor) commented May 14, 2017

Also, can you try rebasing on top of master? The test might have been fixed in some other commit.

@Jiaming-Liu (Contributor, Author)

Now it seems like a good time to update the documentation. I will have it done within a week. Thanks for the reviews!

@Jiaming-Liu (Contributor, Author)

Rebasing doesn't help the error :(. Any ideas?

@thomasjpfan (Contributor)

The test looks like it was fixed in 368ecb4. Rebasing on top of master fixes the error.

@Jiaming-Liu (Contributor, Author) commented May 18, 2017

Umm... something went wrong while rebasing.

I will try to solve it tomorrow. Edit: solved, but 591ea75 is still here. Edit: solved.

@pytorchbot (Collaborator)

Can one of the admins verify this patch?

3 similar comments

@soumith (Member) commented May 25, 2017

@pytorchbot test this please

@soumith merged commit 630af4d into pytorch:master on May 25, 2017
@szagoruyko (Contributor)

As far as I can see, only one optimizer is kept, so on a learning-rate drop all the other optimizer parameters are kept as well. How would one add momentum resetting on learning-rate drops in SGD?
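
One possible approach, not something this PR provides: clear SGD's per-parameter momentum buffers from optimizer.state whenever the rate is dropped; they are re-created lazily on the next optimizer.step(). reset_momentum below is a hypothetical helper:

def reset_momentum(optimizer):
    # Drop SGD's momentum buffers so they restart from zero after an LR drop.
    for group in optimizer.param_groups:
        for p in group['params']:
            state = optimizer.state[p]
            if 'momentum_buffer' in state:
                del state['momentum_buffer']

The schedulers in this PR only touch param_group['lr'], so calling a helper like this after a reduction is left to the user.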

@Jiaming-Liu (Contributor, Author)

@soumith Kindly mention this PR in the release notes to increase visibility.

@soumith (Member) commented Aug 6, 2017

Hey Jiaming, I'm really sorry for missing this commit in the release notes. It looks like I missed 4 commits by mistake. I've updated the release notes now, and I've made a note for myself to check whether repeating the note about learning rate schedules would be appropriate for the next release as well (to increase visibility).

@jtoy commented Aug 7, 2017

So this is in the new 0.2.0 release? Great!

@FuriouslyCurious

@soumith My post on the PyTorch forum about LR schedules is still getting more likes every week, so I think people are not aware of this PR. You should consider megaphoning it in future release notes.

https://discuss.pytorch.org/t/adaptive-learning-rate/320/10

>>> lambda2 = lambda epoch: 0.95 ** epoch
>>> scheduler = LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])
>>> for epoch in range(100):
>>>     scheduler.step()

This comment was marked as off-topic.

eqy pushed a commit to eqy/pytorch that referenced this pull request Jan 20, 2022
* force segment un-connected graphs

* derive heuristic on empty groups

* add test

* lint

* handled aliased output in batchnorm

* empty tensor

* lint and comment

* clang format

* check reference tv available in pointwise scheduler

* comment

* cleanup test and check utils
hubertlu-tw pushed a commit to hubertlu-tw/pytorch that referenced this pull request Nov 1, 2022
* fix typo

* Update test_pipeline_parallel_fwd_bwd.py
jithunnair-amd pushed a commit that referenced this pull request Mar 18, 2024
* Triton build conditionalized on ROCM_VERSION

(cherry picked from commit 1a7e1fa)

* Update pinned commit for rocm6.1 conditionalisation

---------

Co-authored-by: Pruthvi Madugundu <pruthvigithub@gmail.com>