
Add support for ReduceLROnPlateau #98

Closed
terkelbo opened this issue Aug 11, 2019 · 15 comments · Fixed by #252
Labels
feature (Is an improvement or enhancement), help wanted (Open to be worked on)
@terkelbo

Is your feature request related to a problem? Please describe.
As of now it does not seem possible to use ReduceLROnPlateau, since a metric has to be passed to the step method of the lr_scheduler.

Describe the solution you'd like
The possibility to use ReduceLROnPlateau on any of the metrics calculated during training or validation.

Describe alternatives you've considered
In my use case I want to do the step based on a metric calculated on the validation set. As a workaround, I define the lr_scheduler in the init of the model and perform the step in the validation_end function.
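
Roughly, that workaround could look like the sketch below. This is only an illustration, assuming the pre-1.0 hooks configure_optimizers / validation_end and a validation_step that returns {'val_loss': ...}; the scheduler is kept as a module attribute created in configure_optimizers (rather than in __init__) since the optimizer has to exist first.

import torch
import pytorch_lightning as pl


class MyModel(pl.LightningModule):
    # __init__, forward, training_step, validation_step omitted

    def configure_optimizers(self):
        # keep references so validation_end can step the scheduler by hand
        self.optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        self.scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(self.optimizer, mode='min')
        return self.optimizer

    def validation_end(self, outputs):
        # average the per-batch validation losses and feed the result to the scheduler
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        self.scheduler.step(avg_loss)
        return {'val_loss': avg_loss}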

@terkelbo added the feature and help wanted labels Aug 11, 2019
@williamFalcon
Contributor

@terkelbo good suggestion. Can you propose a clean way of doing this? Maybe it can be merged with #29.

@williamFalcon added this to the 0.4.5 milestone Aug 12, 2019
@Ir1d
Contributor

Ir1d commented Aug 17, 2019

I think we can directly adjust the lr in the optimizer, like Keras does, which means we wouldn't need to pass a metric to ReduceLROnPlateau.

Specifically, perhaps we can consider adding a hook in pytorch_lightning/root_module/root_module.py like optimizer_step, so that this function can be exposed to callbacks?
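
For illustration, a Keras-style reduce-on-plateau that simply rewrites the optimizer's param groups could look like the sketch below. maybe_reduce_lr and its state dict are hypothetical names, not an existing Lightning or PyTorch API.

def maybe_reduce_lr(optimizer, current_loss, state, factor=0.1, patience=10, min_lr=1e-6):
    # Track the best loss seen so far and how many checks have passed without improvement.
    if current_loss < state.get('best', float('inf')):
        state['best'] = current_loss
        state['bad_epochs'] = 0
    else:
        state['bad_epochs'] = state.get('bad_epochs', 0) + 1
    # Once the plateau is long enough, shrink every param group's lr in place.
    if state['bad_epochs'] > patience:
        for group in optimizer.param_groups:
            group['lr'] = max(group['lr'] * factor, min_lr)
        state['bad_epochs'] = 0
    return state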

@williamFalcon
Contributor

@Ir1d good point. You can also just adjust the LR in the current callback (optimizer_step), or in any of the other callbacks.

@williamFalcon
Contributor

Closing because this should be implicitly supported since we can pass a ReduceLROnPlateau object... nothing we need to do, this is standard PyTorch functionality.
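
For reference, the standard PyTorch usage being referred to looks roughly like this (train_one_epoch and validate are placeholder functions; the point is that the monitored metric must be passed to step, and the scheduler is stepped after the epoch's optimizer updates):

import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min',
                                                       factor=0.1, patience=10)

for epoch in range(100):
    train_one_epoch(model, optimizer)   # calls optimizer.step() once per batch
    val_loss = validate(model)          # the metric to monitor
    scheduler.step(val_loss)            # ReduceLROnPlateau needs the metric here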

Key features - Roadmap v1.0 automation moved this from backlog to Done Aug 23, 2019
@Ir1d
Contributor

Ir1d commented Aug 27, 2019

When using the PyTorch object, it requires passing the metric as a parameter, for example lr_scheduler.step(val_loss). How do you override the original schedulers in this way?

@stolpa4

stolpa4 commented Sep 16, 2019

Please, I can't follow. Why is the issue closed? Do you suggest using hooks like optimizer_step and implementing reduce-on-plateau scheduling by hand?

@terkelbo
Author

I'm not sure either how to pass the ReduceLROnPlateau object, as it needs the metric argument, as pointed out by @Ir1d. @williamFalcon, would it be possible for you to give an example of how to use this scheduler with PyTorch Lightning?

@kvhooreb
Contributor

kvhooreb commented Sep 25, 2019

It feels rather dirty to me, but you can save the loss in your pl.LightningModule's training_step() method, and then use it in the optimizer_step method. I can't verify whether it works as expected right now though.

def training_step(self, batch, batch_nb):
    # REQUIRED
    x, y = batch
    y_hat = self.forward(x)
    self.loss = F.mse_loss(y, y_hat)
    return {'loss': self.loss}

def configure_optimizers(self):
    # REQUIRED
    # can return multiple optimizers and learning_rate schedulers
    self.optimizer = torch.optim.Adam(self.parameters(), lr=self.lrinit)
    self.scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        self.optimizer, mode='min', factor=0.1, patience=10,
        verbose=False, threshold=0.0001, threshold_mode='rel',
        cooldown=0, min_lr=0, eps=1e-08)
    return self.optimizer

def optimizer_step(self, epoch_nb, batch_nb, optimizer, optimizer_i):
    """
    Do something instead of the standard optimizer behavior
    :param epoch_nb:
    :param batch_nb:
    :param optimizer:
    :param optimizer_i:
    :return:
    """
    # Sometimes (if closure is not None), this step method should return the loss; here it is None
    loss = optimizer.step()
    # So let's use self.loss, set in training_step, for LR scheduling
    self.scheduler.step(self.loss)
    # clear gradients
    optimizer.zero_grad()

Edit: This will not work as intended, since the optimizer step is called after every batch.

@williamFalcon
Contributor

williamFalcon commented Sep 25, 2019

We could modify the scheduler step to take in the loss when it needs it? I have to look at this more carefully though.
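
A rough sketch of what that could look like, purely as an illustration and not the trainer's actual code: dispatch on the scheduler class, since ReduceLROnPlateau is the only built-in PyTorch scheduler whose step() expects a metric.

from torch.optim.lr_scheduler import ReduceLROnPlateau

def step_schedulers(schedulers, monitored_metric):
    for scheduler in schedulers:
        if isinstance(scheduler, ReduceLROnPlateau):
            # plateau schedulers need the monitored value
            scheduler.step(monitored_metric)
        else:
            # every other built-in scheduler steps without arguments
            scheduler.step()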

@kvhooreb
Contributor

That would be the best solution indeed, but then you'd need a way to figure out which ones need the loss when calling step. I'm not sure how to do that at this moment.

For now, I solved it as follows, which may help people until there is a better solution (a rough sketch follows the list):

  • Use the on_epoch_start callback on your LightningModule to initialize an empty list
  • In every training_step call, append the loss to the list
  • In the on_epoch_end callback, process the list to get the average loss, call scheduler.step(mean_loss) and clear the list
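
A minimal sketch of that recipe, assuming the self.scheduler attribute created in configure_optimizers as in the earlier snippet, plus the usual torch and torch.nn.functional as F imports:

def on_epoch_start(self):
    # fresh list of per-batch training losses for this epoch
    self.epoch_losses = []

def training_step(self, batch, batch_nb):
    x, y = batch
    loss = F.mse_loss(self.forward(x), y)
    self.epoch_losses.append(loss.detach())
    return {'loss': loss}

def on_epoch_end(self):
    # step the plateau scheduler on the mean training loss, then clear the list
    mean_loss = torch.stack(self.epoch_losses).mean()
    self.scheduler.step(mean_loss)
    self.epoch_losses = []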

@Ir1d
Contributor

Ir1d commented Sep 26, 2019

Also, at this moment https://github.com/williamFalcon/pytorch-lightning/blob/master/pytorch_lightning/trainer/trainer.py#L958 lr_scheduler.step() is called at the end of the epoch, while some schedulers (e.g. https://pytorch.org/docs/master/optim.html#torch.optim.lr_scheduler.OneCycleLR) should be stepped at the end of every batch.
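
For such per-batch schedulers a similar manual workaround is possible, e.g. stepping the scheduler from optimizer_step. This is only a sketch using the same old hook signature as the snippet above; total_steps=10000 is just an illustrative value.

def configure_optimizers(self):
    self.optimizer = torch.optim.SGD(self.parameters(), lr=0.1, momentum=0.9)
    # OneCycleLR is meant to be stepped after every batch, not every epoch
    self.scheduler = torch.optim.lr_scheduler.OneCycleLR(
        self.optimizer, max_lr=0.1, total_steps=10000)
    return self.optimizer

def optimizer_step(self, epoch_nb, batch_nb, optimizer, optimizer_i):
    optimizer.step()
    self.scheduler.step()   # per-batch step, independent of the trainer's epoch-end call
    optimizer.zero_grad()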

@williamFalcon
Contributor

From PT docs.
https://pytorch.org/docs/stable/optim.html

Prior to PyTorch 1.1.0, the learning rate scheduler was expected to be called before the optimizer’s update; 1.1.0 changed this behavior in a BC-breaking way. If you use the learning rate scheduler (calling scheduler.step()) before the optimizer’s update (calling optimizer.step()), this will skip the first value of the learning rate schedule. If you are unable to reproduce results after upgrading to PyTorch 1.1.0, please check if you are calling scheduler.step() at the wrong time.

@williamFalcon
Contributor

Either way, @kvhooreb I added the .step(epoch) fix. Let me know if this works for you.

williamFalcon added a commit that referenced this issue Sep 26, 2019
* always calls the lr scheduler  with epoch nb

* added docs for cluster grid search

* added docs for cluster grid search

* undo test changes

* undo test changes
@LarryThermo

Would you consider providing (or pointing me to) a simple but complete example of using ReduceLROnPlateau? Thanks, Lars

@shawwn

shawwn commented Feb 29, 2020

Yes, a simple example would be great please.

@Borda modified the milestones: 0.4.7, v0.4.x Apr 18, 2021