
SequentialLR cannot be used with ReduceLROnPlateau due to .step() not allowing for optional arguments #68978

Open
marcm-ml opened this issue Nov 29, 2021 · 9 comments
Labels
module: LrScheduler module: optimizer Related to torch.optim triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@marcm-ml

marcm-ml commented Nov 29, 2021

🐛 Bug

Currently, SequentialLR can only be used with schedulers that inherit from _LRScheduler, i.e. schedulers whose .step() is called without any arguments. This is not the case for the built-in ReduceLROnPlateau, whose .step() takes a metric value. Therefore, when calling SequentialLR.step() without arguments, ReduceLROnPlateau raises an error once its milestone is reached; but when calling SequentialLR.step(metric), SequentialLR itself raises an error due to too many arguments.

Also see Lightning-AI/pytorch-lightning#10759

To Reproduce

from torch.optim.lr_scheduler import ConstantLR, ReduceLROnPlateau, SequentialLR

scheduler1 = ConstantLR(self.opt, factor=0.1, total_iters=2)
scheduler2 = ReduceLROnPlateau(self.opt, factor=0.9)  # ReduceLROnPlateau has no `gamma` argument; `factor` plays that role
scheduler = SequentialLR(self.opt, schedulers=[scheduler1, scheduler2], milestones=[2])
for epoch in range(100):
    train(...)
    metric = validate(...)
    scheduler.step()        # raises an error once epoch >= 2 (ReduceLROnPlateau expects a metric)
    scheduler.step(metric)  # raises an error immediately (SequentialLR.step() accepts no arguments)

Expected behavior

SequentialLR should at least work with all built-in PyTorch LR schedulers, and hence allow a metric value to be passed through to ReduceLROnPlateau if it is used (or ReduceLROnPlateau needs a rewrite). More generally, I think SequentialLR should allow arbitrary arguments in step, which would support arbitrary schedulers (probably another issue in itself).

Either the user takes care of passing the right argument at the right time, or there is a built-in mechanism for it (if that is even possible).
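
For reference, one manual workaround along those lines, until SequentialLR handles this, is to skip SequentialLR and switch schedulers by hand. This is only a user-side sketch (it reuses the schedulers and placeholder train/validate calls from the repro above), not a fix inside SequentialLR:

scheduler1 = ConstantLR(self.opt, factor=0.1, total_iters=2)
scheduler2 = ReduceLROnPlateau(self.opt, factor=0.9)
for epoch in range(100):
    train(...)
    metric = validate(...)
    if epoch < 2:
        scheduler1.step()        # warm-up phase: ConstantLR takes no arguments
    else:
        scheduler2.step(metric)  # plateau phase: ReduceLROnPlateau needs the metric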

cc @vincentqb @jbschlosser @albanD

@soulitzer soulitzer added module: optimizer Related to torch.optim triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Nov 29, 2021
@jaketae
Contributor

jaketae commented Jan 7, 2022

I would like to take a stab at this if no one has already!

Before I open a PR, I'd like to discuss a few proposals.

  1. Convenient edge-casing

We can simply special-case ReduceLROnPlateau in the SequentialLR.step() function. The step function would take an optional metrics argument and pass it on to the active scheduler if that scheduler is an instance of ReduceLROnPlateau.

# class SequentialLR
    def step(self, metrics=None):
        self.last_epoch += 1
        idx = bisect_right(self._milestones, self.last_epoch)
        scheduler = self._schedulers[idx]
        is_reduce_lr_on_plateau = isinstance(scheduler, ReduceLROnPlateau)
        is_new_scheduler = idx > 0 and self._milestones[idx - 1] == self.last_epoch
        if is_reduce_lr_on_plateau and is_new_scheduler:
            scheduler.step(metrics, 0)
        elif is_reduce_lr_on_plateau:
            scheduler.step(metrics)
        elif is_new_scheduler:
            scheduler.step(0)
        else:
            scheduler.step()
  2. More general edge-casing

A problem with the previous approach is that users might create custom schedulers that inherit from ReduceLROnPlateau in ways that are not accounted for. One solution is to make the step function even more general, so that schedulers can handle optional args and kwargs however they need. Instead of checking whether a scheduler is an instance of ReduceLROnPlateau, we can check whether it is an instance of the _LRScheduler base class.

# class SequentialLR
    def step(self, *args, **kwargs):
        self.last_epoch += 1
        idx = bisect_right(self._milestones, self.last_epoch)
        scheduler = self._schedulers[idx]
        is_lr_scheduler = isinstance(scheduler, _LRScheduler)
        is_new_scheduler = idx > 0 and self._milestones[idx - 1] == self.last_epoch
        if is_lr_scheduler and is_new_scheduler:
            scheduler.step(0)
        elif is_lr_scheduler:
            scheduler.step()
        elif is_new_scheduler:
            scheduler.step(*args, **kwargs, epoch=0)
        else:
            scheduler.step(*args, **kwargs)
  3. Inheritance and modifying _LRScheduler

Given issues like #67760, #68332, and #68979, perhaps we should rewrite ReduceLROnPlateau to inherit from _LRScheduler and modify parts of the base class API. This direction would likely be the most time-consuming, but arguably the most beneficial. I don't have code snippets to show right off the bat, as this would require more thinking.
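
That said, just to make the direction concrete, a very rough and purely hypothetical skeleton could look something like the following (the plateau bookkeeping here is heavily simplified and the names are illustrative only, not a concrete proposal):

from torch.optim.lr_scheduler import _LRScheduler

class ReduceLROnPlateau(_LRScheduler):
    # Illustrative only: plateau logic squeezed into the _LRScheduler protocol.
    def __init__(self, optimizer, factor=0.1, patience=10, last_epoch=-1):
        self.factor = factor
        self.patience = patience
        self.best = float('inf')
        self.num_bad_epochs = 0
        super().__init__(optimizer, last_epoch)

    def step(self, metrics=None, epoch=None):
        # Track the metric here, then defer to the base class, which calls
        # get_lr() and writes the returned values into the param groups.
        if metrics is not None:
            if metrics < self.best:
                self.best = metrics
                self.num_bad_epochs = 0
            else:
                self.num_bad_epochs += 1
        super().step(epoch)

    def get_lr(self):
        if self.num_bad_epochs > self.patience:
            self.num_bad_epochs = 0
            return [group['lr'] * self.factor for group in self.optimizer.param_groups]
        return [group['lr'] for group in self.optimizer.param_groups]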

What do you think? @jbschlosser @albanD

@albanD
Collaborator

albanD commented Jan 11, 2022

Hi!
Thanks for writing down a proposal. I'll take a look with @jbschlosser and come back to you by the end of the week!

@francois-rozet

An alternative is to make the step function of _LRScheduler absorb and ignore all arguments (with *args, **kwargs in the signature), which would allow the arguments of SequentialLR.step to be propagated without worry. The children of _LRScheduler would be responsible for using (or ignoring) these arguments. There are several ways to do that.

For instance, we could add a method (say get_get_lr) to _LRScheduler that would absorb all arguments and return self.get_lr(). step would call this method instead of get_lr.

def get_get_lr(self, *args, **kwargs) -> List[float]:
    return self.get_lr()
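
Concretely, the core of _LRScheduler.step could then forward whatever it receives to get_get_lr. The snippet below is a simplified, hypothetical sketch of that change (the real step also handles the deprecated epoch argument, verbose printing, and warning bookkeeping, which are omitted here):

# inside _LRScheduler
    def step(self, *args, **kwargs):
        self._step_count += 1
        self.last_epoch += 1
        values = self.get_get_lr(*args, **kwargs)  # base implementation ignores the arguments
        for param_group, lr in zip(self.optimizer.param_groups, values):
            param_group['lr'] = lr
        self._last_lr = [group['lr'] for group in self.optimizer.param_groups]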

A scheduler that takes arguments would simply need to override get_get_lr. For instance, this would allow the ReduceLROnPlateau class to be rewritten as

from typing import List, Union

from torch.optim import Optimizer
from torch.optim.lr_scheduler import _LRScheduler


def lrs(optimizer: Optimizer) -> List[float]:
    return [group['lr'] for group in optimizer.param_groups]

class ReduceLROnPlateau(_LRScheduler):
    r"""Reduce learning rate when a metric has stopped improving"""

    def __init__(
        self,
        optimizer: Optimizer,
        gamma: float = 0.5,  # <= 1
        patience: int = 7,
        cooldown: int = 1,
        threshold: float = 1e-2,
        mode: str = 'minimize',  # or 'maximize'
        min_lr: Union[float, List[float]] = 1e-6,
        last_epoch: int = -1,
        verbose: bool = False,
    ):
        self.gamma = gamma
        self.patience = patience
        self.cooldown = cooldown
        self.threshold = threshold
        self.mode = mode

        if type(min_lr) is float:
            min_lr = [min_lr] * len(optimizer.param_groups)
        self.min_lrs = min_lr

        self.best = float('-inf') if self.mode == 'maximize' else float('inf')
        self.last_best = last_epoch
        self.last_reduce = last_epoch

        super().__init__(optimizer, last_epoch, verbose)

    def get_get_lr(self, last: float, *args, **kwargs) -> List[float]:
        return self.get_lr(last)

    def get_lr(self, last: float) -> List[float]:
        if self.mode == 'maximize':
            accept = last >= self.best * (1 + self.threshold)
        else:  # mode == 'minimize'
            accept = last <= self.best * (1 - self.threshold)

        if accept:
            self.best = last
            self.last_best = self.last_epoch

            return lrs(self.optimizer)

        if self.last_epoch - max(self.last_best, self.last_reduce + self.cooldown) <= self.patience:
            return lrs(self.optimizer)

        self.last_reduce = self.last_epoch

        return [
            max(lr * self.gamma, min_lr)
            for lr, min_lr in zip(lrs(self.optimizer), self.min_lrs)
        ]
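
For illustration, under this proposal (and assuming SequentialLR.step were also updated to forward its arguments, which it does not do today), usage could look like the following hypothetical sketch, reusing the rewritten class above:

scheduler1 = ConstantLR(opt, factor=0.1, total_iters=2)
scheduler2 = ReduceLROnPlateau(opt, gamma=0.5, patience=5)  # the rewritten class sketched above
scheduler = SequentialLR(opt, schedulers=[scheduler1, scheduler2], milestones=[2])

for epoch in range(100):
    val_loss = validate(...)
    scheduler.step(val_loss)  # ConstantLR's inherited get_get_lr ignores the value; ReduceLROnPlateau consumes it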

What do you think? @albanD, @jaketae

@jaketae
Contributor

jaketae commented Jan 27, 2022

Hey @francois-rozet, thanks for the reply and suggestion! It definitely makes sense. I'm personally in support of any solution that prioritizes implementation simplicity and BC.

Gently pinging @albanD to see if there are any updates from the internal PyTorch team on this!

@imarquart

Any updates on this?

@albanD
Collaborator

albanD commented Apr 18, 2023

No update I'm afraid.
The Core team is pretty small and we don't have anyone working on LRScheduler.

My general stance on the LR schedulers is that they would require a much bigger (most likely BC-breaking) cleanup to get them into a position where they are reliable.
I would be happy to review a fix for this if it is small and BC-preserving, but I'm not sure a complex change is worth it given the general state.

@AngledLuffa

What about the schedulers is currently not "reliable"?

One problem I encountered, which led me to this existing issue, is that a ReduceLROnPlateau cannot be used inside a ChainedScheduler, as it needs the metrics parameter to function.

@AngledLuffa

Just to ping about this without opening a new issue: is there any possibility of a PR which would allow a scheduler such as ReduceLROnPlateau to be used in a Sequential... or ChainedScheduler?

@abbas695

Has anyone found a workaround for this? The solutions above did not help me; it says that the metric is not passed and is always None. But when using ReduceLROnPlateau alone, it works.
