
ENH: Implement learning rate scheduling for variational inference #7010

Open · alvaropp opened this issue Nov 14, 2023 · 7 comments

alvaropp commented Nov 14, 2023

Before

I’ve started using PyMC’s variational inference to fit my models. I have no prior experience with this, but I suspect there are parallels with training neural networks, where the choice of optimiser and learning rate has a huge impact on training quality and speed. A common technique when training neural networks is to use a learning rate scheduler, which reduces the learning rate over time: you start high for fast initial progress and decrease it in later epochs, where more precision is needed.

Currently, in PyMC, you need to specify a single learning rate that is used for the whole fit: too large and it won't converge, too small and it converges too slowly. The manual workaround of training once with a large-ish learning rate and then taking the result of that round as the starting point for another round with a smaller learning rate is neither trivial nor elegant (see the sketch below).
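For reference, here is a minimal sketch of that manual workaround (the toy model is illustrative; it relies on an ADVI inference object keeping its parameter state between fit() calls, so the second round warm-starts from the first):

import pymc as pm

with pm.Model():
    mu = pm.Normal("mu", 0, 10)
    pm.Normal("y", mu=mu, sigma=1, observed=[0.5, 1.2, 0.3])

    advi = pm.ADVI()
    # First round: large learning rate for fast initial progress.
    advi.fit(10_000, obj_optimizer=pm.adam(learning_rate=1e-1))
    # Second round: continue from the current state with a smaller rate.
    approx = advi.fit(10_000, obj_optimizer=pm.adam(learning_rate=1e-3))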

After

You use a callback that dynamically reduces the optimiser's learning rate when the loss stagnates, and training simply continues:

import pymc as pm
import pytensor

# Inside a model context:
learning_rate = pytensor.shared(1e-1, "learning_rate")
optimiser = pm.adam(learning_rate=learning_rate)

fit = pm.fit(
    obj_optimizer=optimiser,
    callbacks=[
        ReduceLROnPlateau(  # the callback proposed in a comment below
            initial_learning_rate=learning_rate,
            factor=0.1,
            patience=10,
            min_lr=1e-4,
            cooldown=10,
            verbose=False,
        ),
    ],
)

Context for the issue:

No response

welcome bot commented Nov 14, 2023

🎉 Welcome to PyMC! 🎉 We're really excited to have your input into the project! 💖

If you haven't done so already, please make sure you check out our Contributing Guidelines and Code of Conduct.

alvaropp commented Nov 14, 2023

I suggest we follow Keras' ReduceLROnPlateau and implement learning rate scheduling as a callback, as follows:

import numpy as np
import pytensor

from pymc.variational.callbacks import Callback

class ReduceLROnPlateau(Callback):
    """Reduce learning rate when the loss has stopped improving.

    This is inspired by Keras' homonymous callback:
    https://github.com/keras-team/keras/blob/v2.14.0/keras/callbacks.py

    Parameters
    ----------
    initial_learning_rate: pytensor shared variable
        shared variable containing the learning rate
    factor: float
        factor by which the learning rate will be reduced: `new_lr = lr * factor`
    patience: int
        number of epochs with no improvement after which learning rate will be reduced
    min_lr: float
        lower bound on the learning rate
    cooldown: int
        number of iterations to wait before resuming normal operation after lr has been reduced
    verbose: bool
        False: quiet, True: update messages
    """

    def __init__(
        self,
        initial_learning_rate: pytensor.shared,
        factor=0.1,
        patience=10,
        min_lr=1e-6,
        cooldown=0,
        verbose=True,
    ):
        self.learning_rate = initial_learning_rate
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.cooldown = cooldown
        self.verbose = verbose
        self.cooldown_counter = 0
        self.wait = 0
        self.best = float("inf")
        self.old_lr = None

    def __call__(self, approx, loss_hist, i):
        current = loss_hist[-1]

        if np.isinf(current):
            return

        if self.in_cooldown():
            self.cooldown_counter -= 1
            self.wait = 0
            return

        if current < self.best:
            self.best = current
            self.wait = 0
        elif not np.isinf(self.best):
            self.wait += 1
            if self.wait >= self.patience:
                self.reduce_lr()
                self.cooldown_counter = self.cooldown
                self.wait = 0

    def reduce_lr(self):
        old_lr = float(self.learning_rate.get_value())
        if old_lr > self.min_lr:
            new_lr = max(old_lr * self.factor, self.min_lr)
            self.learning_rate.set_value(new_lr)
            if self.verbose:
                print(
                    f"Reduced learning rate to {new_lr} after {self.patience} iterations without improvement."
                )

    def in_cooldown(self):
        return self.cooldown_counter > 0

This callback can then be nicely combined with CheckParametersConvergence() for early stopping:

import matplotlib.pyplot as plt
import numpy as np
import pymc as pm
import pytensor

from pymc.variational.callbacks import CheckParametersConvergence, Tracker

# Toy data
length = 720
rng = np.random.default_rng(1337)
x = np.linspace(1e-2, 1, num=length)
true_regression_line = 5 * x + 4
y = true_regression_line + rng.normal(0, 1, size=length)
y[rng.integers(0, length, size=10)] += rng.normal(0, 4, size=10)
y = (y - y.mean()) / y.std()

# Model with early stopping and learning rate scheduling
with pm.Model() as lr_model:
    sigma = pm.HalfCauchy("sigma", beta=10)
    intercept = pm.Normal("intercept", 0, sigma=20)
    slope = pm.Normal("slope", 0, sigma=20)
    likelihood = pm.Normal("y", mu=intercept + slope * x, sigma=sigma, observed=y)

    learning_rate = pytensor.shared(1e-1, "learning_rate")
    optimiser = pm.adam(learning_rate=learning_rate)
    tracker = Tracker(lr=lambda: optimiser.keywords["learning_rate"].get_value())

    fit = pm.fit(
        obj_optimizer=optimiser,
        callbacks=[
            tracker,
            CheckParametersConvergence(),
            ReduceLROnPlateau(
                initial_learning_rate=learning_rate,
                factor=0.1,
                patience=10,
                min_lr=1e-4,
                cooldown=10,
                verbose=True,
            ),
        ],
        random_seed=1337,
    )

_, axs = plt.subplots(1, 2, figsize=(8, 4))
axs[0].plot(np.arange(len(fit.hist)), fit.hist)
axs[0].set_yscale("log")
axs[0].set_title("Fit history")
axs[1].plot(tracker.hist["lr"], ".-")
axs[1].set_yscale("log")
axs[1].set_title("LR")
plt.tight_layout()

[Output: two-panel figure showing the loss history and the tracked learning rate, both on log scales]

fonnesbeck (Member) commented

This is a good strategy, thanks for taking the time to work on this. Feel free to open a pull request and move the discussion there.

jessegrabowski (Member) commented

+1 for a full PR.

My only concerns with the current approach are:

  1. I don't think users should have to know to make the learning rate a shared variable in order to use the feature; this needs to be checked and handled automatically somehow.
  2. Is this approach robust to arbitrary combinations of learning rate schedulers? If I want, for example, a cosine schedule combined with a linear decrease, can this be done using callbacks only? (See the sketch after this list.)
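For what it's worth, a purely time-based schedule does seem expressible in the same callback style. Here is a minimal, hypothetical sketch of a cosine-annealing callback (the class name and parameters are assumptions for illustration, not an agreed design); it uses the same (approx, loss_hist, i) signature as above:

import numpy as np
from pymc.variational.callbacks import Callback

class CosineAnnealingLR(Callback):
    """Hypothetical sketch: anneal a shared learning rate along a cosine curve."""

    def __init__(self, learning_rate, total_steps, min_lr=1e-6):
        self.learning_rate = learning_rate  # pytensor shared variable
        self.initial_lr = float(learning_rate.get_value())
        self.total_steps = total_steps
        self.min_lr = min_lr

    def __call__(self, approx, loss_hist, i):
        # Cosine decay from the initial rate down to min_lr over total_steps.
        progress = min(i / self.total_steps, 1.0)
        new_lr = self.min_lr + 0.5 * (self.initial_lr - self.min_lr) * (
            1 + np.cos(np.pi * progress)
        )
        self.learning_rate.set_value(new_lr)

Since every such callback mutates the same shared variable, combining schedules amounts to ordering the callbacks in the list; whether that composes sensibly (e.g. a plateau reducer fighting a cosine schedule) is exactly the open question.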

fonnesbeck (Member) commented

Can most of this be abstracted to something like pm.rate_scheduler which is then passed to pm.fit?

jessegrabowski (Member) commented

@fonnesbeck we had some back-and-forth on discourse here about using callbacks (what was implemented in the PR) vs. a wrapper around an optimizer (which would then be passed to pm.fit).
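To make the optimizer-wrapper option concrete, here is a rough sketch of what a pm.rate_scheduler-style helper could look like (the name, signature, and schedule convention are all hypothetical, not an existing PyMC API):

import pymc as pm
import pytensor

def rate_scheduler(optimizer_factory, schedule, initial_lr):
    """Hypothetical helper: build an optimizer around a shared learning rate,
    plus a callback that applies `schedule` on every iteration."""
    lr = pytensor.shared(initial_lr, "learning_rate")
    optimizer = optimizer_factory(learning_rate=lr)

    def callback(approx, loss_hist, i):
        # schedule maps (iteration, current_lr) -> new_lr
        lr.set_value(schedule(i, float(lr.get_value())))

    return optimizer, callback

# Usage sketch (inside a model context): exponential decay with a floor.
optimiser, lr_callback = rate_scheduler(
    pm.adam, lambda i, lr: max(lr * 0.9999, 1e-4), initial_lr=1e-1
)
fit = pm.fit(obj_optimizer=optimiser, callbacks=[lr_callback])

This keeps the scheduling logic packaged with the optimizer while still riding on the existing callback mechanism, which is roughly the trade-off discussed on discourse.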
