# How to tune hyperparameters with Optuna

This guide shows a minimal `optuna` ([documentation](https://optuna.org/)) loop for hyperparameter
tuning in `sbi`. Optuna is a lightweight hyperparameter optimization library. You define
an objective function that trains a model (e.g., NPE) and returns a validation metric,
and Optuna runs multiple trials to explore the search space and track the best
configuration. As the validation metric, we recommend the negative log probability of
a held-out validation set `(theta, x)` under the current posterior estimate (see
Lueckmann et al. 2021 for details).
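
Written out for a validation set of $N$ pairs, the metric is the average negative log probability shown below. The symbols $\mathcal{L}_\text{val}$, $N$, and $q_\phi$ (the current posterior estimate) are notation introduced here for illustration and are not taken from the `sbi` documentation.

$$
\mathcal{L}_\text{val} = -\frac{1}{N} \sum_{i=1}^{N} \log q_\phi(\theta_i \mid x_i)
$$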

Note that Optuna is not a dependency of `sbi`; you need to install it yourself in your
environment (e.g., with `pip install optuna`).

Here, we use a toy simulator and run `NPE` with an `FCEmbedding` embedding network and a density estimator built with the `posterior_nn` helper. We tune just two hyperparameters: the embedding dimension and the number of flow transforms in an `nsf` density estimator.

## Set up a tiny simulation task

In [None]:
import optuna
import torch

from sbi.inference import NPE
from sbi.neural_nets import posterior_nn
from sbi.neural_nets.embedding_nets import FCEmbedding
from sbi.utils import BoxUniform

torch.manual_seed(0)


def simulator(theta):
    return theta + 0.1 * torch.randn_like(theta)


prior = BoxUniform(low=-2 * torch.ones(2), high=2 * torch.ones(2))

theta = prior.sample((6000,))
x = simulator(theta)
# Hold out the last 1000 simulations as a separate validation set for Optuna
theta_train, x_train = theta[:5000], x[:5000]
theta_val, x_val = theta[5000:], x[5000:]
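
For this toy simulator, a rough reference value for the validation metric can be computed without training anything. The sketch below assumes that the prior's box boundaries can be ignored, so that the posterior for each `x` is approximately a Gaussian centered at `x` with standard deviation 0.1 per dimension. The names `theta_ref`, `x_ref`, and `reference_nll` are introduced here for illustration only, and the data is re-simulated so the block runs on its own.

```python
import torch

torch.manual_seed(0)

# Same toy setup as in this guide: uniform prior on [-2, 2]^2, Gaussian noise.
noise_std = 0.1
theta_ref = torch.rand(1000, 2) * 4 - 2
x_ref = theta_ref + noise_std * torch.randn_like(theta_ref)

# Assumption: ignore the prior's box boundaries, so the posterior for each x is
# approximated by an independent Gaussian N(x, noise_std^2) per dimension.
approx_posterior = torch.distributions.Normal(loc=x_ref, scale=noise_std)

# Sum log-densities over the two parameter dimensions, then average over pairs.
log_prob = approx_posterior.log_prob(theta_ref).sum(dim=1)
reference_nll = -log_prob.mean().item()
print(f"reference validation NLL: {reference_nll:.3f}")
```

A tuned estimator on this task would be expected to reach a validation value in the neighborhood of this reference; values far above it suggest an underfit configuration.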

## Define the Optuna objective

Optuna expects the objective function to return a scalar value, which it then optimizes. When creating a study, you specify the optimization direction: `direction="minimize"` to find the configuration with the lowest objective value, or `direction="maximize"` for the highest. Here, we minimize the mean negative log probability (NLL) of the held-out validation set, so lower is better.

In [None]:
def objective(trial):
    # Optuna will track these parameters internally.
    embedding_dim = trial.suggest_categorical("embedding_dim", [16, 32, 64])
    num_transforms = trial.suggest_int("num_transforms", 2, 6)

    embedding_net = FCEmbedding(input_dim=x_train.shape[1], output_dim=embedding_dim)
    density_estimator = posterior_nn(
        model="nsf",
        embedding_net=embedding_net,
        num_transforms=num_transforms,
    )

    inference = NPE(prior=prior, density_estimator=density_estimator)
    inference.append_simulations(theta_train, x_train)
    estimator = inference.train(
        training_batch_size=128,
        show_train_summary=False,
    )
    posterior = inference.build_posterior(estimator)

    with torch.no_grad():
        nll = -posterior.log_prob_batched(theta_val.unsqueeze(0), x=x_val).mean().item()
    # Return the metric to be optimized by Optuna.
    return nll

## Run the study and retrain

Optuna defaults to the TPE (Tree-structured Parzen Estimator) sampler, which is a good starting point for many experiments. TPE is a Bayesian optimization method that
models good vs. bad trials with nonparametric densities and samples new points
that are likely to improve the objective. You can swap in other samplers (random
search, Gaussian Process-based, etc.) by passing a different sampler instance to `create_study`.

The TPE sampler uses `n_startup_trials` random trials to seed the model. With
`n_trials=25` and `n_startup_trials=10`, the first 10 trials are random and the
remaining 15 are guided by the acquisition function. To make sure the study starts
with the default configuration, _enqueue_ it before optimization; the enqueued trial
counts toward `n_trials`.

In [None]:
sampler = optuna.samplers.TPESampler(n_startup_trials=10)
study = optuna.create_study(direction="minimize", sampler=sampler)
# Optional: ensure the default config is evaluated
study.enqueue_trial({"embedding_dim": 32, "num_transforms": 4})
# This will run the above NPE training up to 25 times
study.optimize(objective, n_trials=25)

# Retrain on all simulations using the best hyperparameters found by the study
best_params = study.best_params
embedding_net = FCEmbedding(
    input_dim=x_train.shape[1],
    output_dim=best_params["embedding_dim"],
)
density_estimator = posterior_nn(
    model="nsf",
    embedding_net=embedding_net,
    num_transforms=best_params["num_transforms"],
)

inference = NPE(prior=prior, density_estimator=density_estimator)
inference.append_simulations(theta, x)
final_estimator = inference.train(training_batch_size=128)
posterior = inference.build_posterior(final_estimator)