# Hyperparameter Tuning
## Calibrating the learning rate of the Gradient Descent optimiser in PyBOP

In this notebook, we calibrate the learning rate of the gradient descent optimiser on a parameter identification problem. The learning rate is set through the `sigma` value passed via `pybop.PintsOptions` to the `pybop.GradientDescent` class.
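To see why this value matters, recall that a plain gradient descent step moves the parameters against the gradient by an amount scaled by the learning rate. The sketch below assumes the textbook fixed-step update x_{k+1} = x_k - η ∇f(x_k), with η playing the role of `sigma`. It is a plain-Python illustration with a hypothetical helper, not PyBOP's or PINTS' internal implementation.

```python
def gradient_descent_step(x, grad, learning_rate):
    """Return x - learning_rate * grad, element-wise (textbook fixed-step update)."""
    return [xi - learning_rate * gi for xi, gi in zip(x, grad)]


# Example: f(x) = x0**2 + x1**2 has gradient [2 * x0, 2 * x1]
x = [1.0, -2.0]
grad = [2 * xi for xi in x]
x_next = gradient_descent_step(x, grad, learning_rate=0.1)
print(x_next)
```

Doubling `learning_rate` doubles the distance moved per step, which is exactly the trade-off explored in the rest of this notebook.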

### Setting up the Environment

If you don't already have PyBOP installed, check out the [installation guide](https://pybop-docs.readthedocs.io/en/latest/installation.html) first.

We begin by importing the necessary libraries. Let's also fix the random seed to generate consistent output during development.

In [None]:
import numpy as np
import pybamm

import pybop

pybop.plot.PlotlyManager().pio.renderers.default = "notebook_connected"

np.random.seed(8)  # users can remove this line

## Generating Synthetic Data

To demonstrate parameter estimation, we first need some data. We will generate synthetic data using the single particle model (SPM) as the forward model, which requires defining the parameter values and the model itself.

In [None]:
model = pybamm.lithium_ion.SPM()
parameter_values = pybamm.ParameterValues("Chen2020")
parameter_values.update(
    {
        "Negative electrode active material volume fraction": 0.65,
        "Positive electrode active material volume fraction": 0.51,
    }
)
parameter_values.set_initial_state(0.4)
experiment = pybamm.Experiment(
    [
        "Discharge at 0.5C for 6 minutes (5 second period)",
        "Charge at 0.5C for 6 minutes (5 second period)",
    ]
    * 2
)
sim = pybamm.Simulation(model, parameter_values=parameter_values, experiment=experiment)
sol = sim.solve()

To make the parameter estimation more realistic, we add Gaussian noise to the data. The dataset for optimisation is composed of time, current, and the noisy voltage data:

In [None]:
noise_sigma = 0.002  # noise standard deviation: 2 mV
corrupt_values = sol["Voltage [V]"].data + np.random.normal(
    0, noise_sigma, len(sol["Voltage [V]"].data)
)

dataset = pybop.Dataset(
    {
        "Time [s]": sol["Time [s]"].data,
        "Current function [A]": sol["Current [A]"].data,
        "Voltage [V]": corrupt_values,
    }
)
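Because the noise is zero-mean Gaussian with standard deviation σ = 2 mV, even the true parameters cannot drive a sum-of-squared-errors cost to zero: for N samples, the expected residual is about N·σ². The stdlib sketch below checks this numerically. The sample count `n_samples` is a hypothetical value, not the length of this dataset.

```python
import random


def sum_squared_noise(n_samples, noise_sigma, seed=0):
    """Sum of squares of n_samples draws from a zero-mean Gaussian with std noise_sigma."""
    rng = random.Random(seed)
    return sum(rng.gauss(0.0, noise_sigma) ** 2 for _ in range(n_samples))


n_samples = 1000     # hypothetical number of voltage samples
noise_sigma = 0.002  # 2 mV, matching the noise added above
observed = sum_squared_noise(n_samples, noise_sigma)
expected = n_samples * noise_sigma**2  # E[sum of e_i^2] = N * sigma^2
print(f"simulated SSE: {observed:.3e}, expected N * sigma^2: {expected:.3e}")
```

Roughly speaking, a best cost of the same order as N·σ² suggests that the remaining misfit is dominated by measurement noise rather than by the optimiser.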

## Identifying the parameters

We select the parameters to estimate and assign each a uniform prior distribution:

In [None]:
parameters = [
    pybop.Parameter(
        "Negative electrode active material volume fraction",
        prior=pybop.Uniform(0.45, 0.7),
    ),
    pybop.Parameter(
        "Positive electrode active material volume fraction",
        prior=pybop.Uniform(0.45, 0.7),
    ),
]
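Conceptually, `pybop.Uniform(0.45, 0.7)` assigns equal probability density to every value in [0.45, 0.7] and zero density outside it. A minimal plain-Python sketch of that log-density, using a hypothetical helper rather than PyBOP's implementation:

```python
import math


def uniform_logpdf(x, lower, upper):
    """Log-density of a Uniform(lower, upper) distribution at x."""
    if lower <= x <= upper:
        return -math.log(upper - lower)
    return -math.inf  # zero density outside the support


print(uniform_logpdf(0.6, 0.45, 0.7))  # ≈ log(4), since the width is 0.25
print(uniform_logpdf(0.8, 0.45, 0.7))  # -inf: outside the support
```

Both true values used to generate the data (0.65 and 0.51) lie inside this support.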

### Setting up the problem with an unsuitable sigma value

With the dataset and parameters defined, we can set up the optimisation problem and the optimiser. For gradient descent, the `sigma` value corresponds to the learning rate. Let's begin by setting this hyperparameter to a small value.

In [None]:
builder = (
    pybop.builders.Pybamm()
    .set_dataset(dataset)
    .set_simulation(model, parameter_values=parameter_values)
    .add_cost(pybop.costs.pybamm.SumSquaredError("Voltage [V]"))
)
for param in parameters:
    builder.add_parameter(param)

problem = builder.build()

options = pybop.PintsOptions(
    sigma=0.01,
    max_iterations=100,
)
optim = pybop.GradientDescent(problem, options=options)

NOTE: Boundaries ignored by <class 'pybop.optimisers._gradient_descent.GradientDescentImpl'>


We proceed to run the optimiser with the given learning rate (`sigma`) and then compare the estimated parameter values with the ground truth. With this small learning rate, the estimates are still noticeably different from the true values after 100 iterations.

In [None]:
results = optim.run()

print("True values:", [parameter_values[p.name] for p in parameters])
print("Estimates:", results.x)

True values: [0.65, 0.51]
Estimates: [0.61444896 0.54451995]
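Why does a small learning rate fall short? On a one-dimensional quadratic f(x) = ½·a·(x - x*)², each fixed-step update multiplies the error by (1 - η·a), so after k iterations the error is |1 - η·a|^k times the initial error. The toy sketch below (the curvature `a`, start point and budget are all hypothetical, not taken from the SPM problem) shows how far from the minimum a 100-iteration budget leaves each learning rate.

```python
def run_gradient_descent(x0, x_star, curvature, learning_rate, max_iterations):
    """Minimise 0.5 * curvature * (x - x_star)**2 by fixed-step gradient descent."""
    x = x0
    for _ in range(max_iterations):
        grad = curvature * (x - x_star)  # derivative of the quadratic
        x -= learning_rate * grad
    return x


x_star = 0.65  # toy minimiser
for eta in (0.01, 0.1, 0.5):
    x_final = run_gradient_descent(
        x0=0.5, x_star=x_star, curvature=1.0, learning_rate=eta, max_iterations=100
    )
    print(f"eta={eta}: |x - x*| after 100 iterations = {abs(x_final - x_star):.2e}")
```

With η = 0.01 the error only shrinks by a factor of 0.99 per step, so roughly a third of the initial error remains after the budget is spent.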


## Calibrating the Learning Rate

Now that we've seen how an unsuitable `sigma` value prevents the optimiser from converging within the maximum number of iterations, let's calibrate this value by sweeping over a range of learning rates, aiming for a better solution in fewer iterations.
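The calibration cell that follows is a simple grid search: run the optimiser once per candidate learning rate and keep the one with the lowest final cost. Here is the same pattern on a toy quadratic, as a plain-Python sketch. The curvature of 3, the start point and the 20-iteration budget are hypothetical and are not meant to reproduce PyBOP's numbers. With this curvature the error factor |1 - 3η| is smallest near η = 1/3, so too small a step crawls and too large a step overshoots.

```python
def final_cost(learning_rate, curvature=3.0, x0=0.5, x_star=0.65, max_iterations=20):
    """Toy cost 0.5 * curvature * (x - x_star)**2 after fixed-step gradient descent."""
    x = x0
    for _ in range(max_iterations):
        x -= learning_rate * curvature * (x - x_star)
    return 0.5 * curvature * (x - x_star) ** 2


candidates = [0.02, 0.22, 0.42, 0.62]  # same grid as the sweep below
costs = {eta: final_cost(eta) for eta in candidates}
best_eta = min(costs, key=costs.get)  # grid search: keep the lowest final cost
for eta, cost in costs.items():
    print(f"eta={eta}: final cost {cost:.2e}")
print("best learning rate:", best_eta)
```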

In [None]:
sigmas = np.linspace(0.02, 0.62, 4)
optims = []
results = []
for sigma in sigmas:
    print("Sigma:", sigma)
    options = pybop.PintsOptions(sigma=sigma, max_iterations=100)
    optim = pybop.GradientDescent(problem, options=options)
    res = optim.run()

    optims.append(optim)
    results.append(res)

Sigma: 0.02
NOTE: Boundaries ignored by <class 'pybop.optimisers._gradient_descent.GradientDescentImpl'>


Sigma: 0.21999999999999997
NOTE: Boundaries ignored by <class 'pybop.optimisers._gradient_descent.GradientDescentImpl'>


Sigma: 0.42
NOTE: Boundaries ignored by <class 'pybop.optimisers._gradient_descent.GradientDescentImpl'>


Sigma: 0.62
NOTE: Boundaries ignored by <class 'pybop.optimisers._gradient_descent.GradientDescentImpl'>


In [None]:
for sigma, res in zip(sigmas, results, strict=False):
    print(
        f"| Sigma: {sigma} | Num Iterations: {res.n_iterations} | Best Cost: {res.best_cost} | Results: {res.x} |"
    )

| Sigma: 0.02 | Num Iterations: 100 | Best Cost: 0.0017014545830233317 | Results: [0.6210766  0.53404141] |
| Sigma: 0.21999999999999997 | Num Iterations: 61 | Best Cost: 0.0012837254827783552 | Results: [0.64710847 0.51304159] |
| Sigma: 0.42 | Num Iterations: 45 | Best Cost: 0.0012815375152771392 | Results: [0.64906898 0.51155419] |
| Sigma: 0.62 | Num Iterations: 22 | Best Cost: 0.0018657701421383906 | Results: [0.62556616 0.51225597] |
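The value 0.21999999999999997 is a floating-point rounding artefact of `np.linspace`, not a different learning rate. Formatting with a fixed precision gives a tidier table. A minimal sketch using only f-string format specifiers (the `format_row` helper is hypothetical):

```python
def format_row(sigma, n_iterations, best_cost, estimates):
    """Format one optimisation result as a Markdown table row."""
    x_str = ", ".join(f"{v:.4f}" for v in estimates)
    return f"| {sigma:.2f} | {n_iterations} | {best_cost:.3e} | [{x_str}] |"


print("| Sigma | Num Iterations | Best Cost | Results |")
print("|---|---|---|---|")
print(format_row(0.21999999999999997, 61, 0.0012837254827783552, [0.64710847, 0.51304159]))
```

In the notebook, the same helper could be called inside the loop over `zip(sigmas, results)` with `res.n_iterations`, `res.best_cost` and `res.x`.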


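From these results, `sigma=0.42` returns the lowest cost, with estimates closest to the true values of [0.65, 0.51], after 45 iterations. It balances fast convergence against a step size small enough not to overshoot fine features of the cost landscape. The smallest value, `sigma=0.02`, exhausts the 100-iteration budget, while the largest, `sigma=0.62`, stops after only 22 iterations with the highest cost of the four.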

### Cost Landscapes

Another way to view this information is to plot each optimiser's trace on the cost landscape.

In [None]:
# Plot the cost landscape with optimisation path and updated bounds
bounds = np.array([[0.4, 0.8], [0.4, 0.8]])
for optim, sigma in zip(optims, sigmas, strict=False):
    pybop.plot.surface(optim, bounds=bounds, title=f"Sigma: {sigma}")
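Conceptually, a cost landscape is the cost evaluated on a grid of parameter values within the plotting bounds, with the optimiser's iterates drawn on top. The sketch below evaluates a hypothetical quadratic stand-in for the real SPM cost on a coarse grid over the same bounds, [0.4, 0.8] × [0.4, 0.8]. It does not call `pybop.plot.surface`, whose internals may differ.

```python
def toy_cost(x1, x2, centre=(0.65, 0.51)):
    """Hypothetical stand-in cost with its minimum at `centre`."""
    return (x1 - centre[0]) ** 2 + 4.0 * (x2 - centre[1]) ** 2


def cost_grid(bounds, n_points):
    """Evaluate toy_cost on an n_points x n_points grid spanning bounds."""
    (lo1, hi1), (lo2, hi2) = bounds
    axis1 = [lo1 + i * (hi1 - lo1) / (n_points - 1) for i in range(n_points)]
    axis2 = [lo2 + j * (hi2 - lo2) / (n_points - 1) for j in range(n_points)]
    # grid[j][i] holds the cost at (axis1[i], axis2[j])
    return axis1, axis2, [[toy_cost(a, b) for a in axis1] for b in axis2]


axis1, axis2, grid = cost_grid(bounds=((0.4, 0.8), (0.4, 0.8)), n_points=5)
for b, row in zip(axis2, grid):
    print(f"x2={b:.2f}: " + " ".join(f"{c:.3f}" for c in row))
```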

### Concluding thoughts

This notebook covers how to calibrate the learning rate for the gradient descent optimiser, thus providing an introduction to hyperparameter tuning.

We have shown how a small learning rate impedes the optimiser by imposing a small step size, so that many iterations are needed to traverse the search space. A larger learning rate can give faster convergence and a lower cost, but too large a learning rate can cause gradient descent to overshoot and diverge (as shown in the last plot).
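The divergence behaviour can be made precise on a quadratic: with curvature a, the fixed-step update multiplies the error by (1 - η·a), so gradient descent converges only when |1 - η·a| < 1, i.e. 0 < η < 2/a. For the real battery cost the curvature varies across the landscape, so this is a local rule of thumb rather than an exact threshold. A small numerical check on a toy quadratic with a hypothetical curvature a = 3 (threshold 2/3):

```python
def error_after(learning_rate, curvature, n_iterations, e0=1.0):
    """Error |x - x*| after n fixed-step updates on 0.5 * curvature * (x - x*)**2."""
    e = e0
    for _ in range(n_iterations):
        e = e - learning_rate * curvature * e  # e_{k+1} = (1 - eta * a) * e_k
    return abs(e)


curvature = 3.0
threshold = 2.0 / curvature  # stability limit for fixed-step gradient descent
for eta in (0.5 * threshold, 0.99 * threshold, 1.01 * threshold):
    print(f"eta={eta:.3f}: error after 200 steps = {error_after(eta, curvature, 200):.2e}")
```

Just below the threshold the iterates oscillate but still shrink; just above it, the same oscillation grows without bound.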