## Table of Contents

This notebook provides basic examples of how to use Optuna for hyperparameter tuning. The following sections explain the step-by-step procedure:

1. [Defining the optimization problem: search space and objective](#1-defining-the-optimization-problem-search-space-and-objective)  
2. [First touch with Optuna for optimization](#2-first-touch-with-optuna-for-optimization)
3. [Analyzing the optimization results](#3-analyzing-the-optimization-results)
4. [Setting up baselines with enqueue trials](#4-setting-up-baselines-with-enqueue-trials)

## Imports

In [1]:
from pathlib import Path
import sys
sys.path.insert(0, str(Path.cwd().parent))  # adjust .parent depth so 'src' is findable

In [2]:
import os
import optuna
import pandas as pd

from src.train_utils import retrieve_data_w_features

from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import root_mean_squared_error



## Options

In [3]:
path_data = "../data/01_raw"

## Dataset

In [4]:
df = pd.read_parquet(os.path.join(path_data, "fremotor1prem0304.parquet"))
cols_to_drop = ["IDpol", "Year", "train_set", "val_set", "test_set", "big_train_set"]
categorical_features = [
    "DrivAge",
    "DrivGender",
    "MaritalStatus",
    "PayFreq",
    "JobCode",
    "VehClass",
    "VehPower",
    "VehGas",
    "VehUsage",
    "Garage",
    "Area",
    "Region",
    "Channel",
    "Marketing"
]

X_big_train, y_big_train = retrieve_data_w_features(df=df, features_to_drop=cols_to_drop, split="big_train_set")
X_train, y_train = retrieve_data_w_features(df=df, features_to_drop=cols_to_drop, split="train_set")
X_val, y_val = retrieve_data_w_features(df=df, features_to_drop=cols_to_drop, split="val_set")
X_test, y_test = retrieve_data_w_features(df=df, features_to_drop=cols_to_drop, split="test_set")

## 1. Defining the Optimization Problem: Search Space and Objective

To use Optuna, we must define two main elements:

1. **The search space**:  
   The set of hyperparameters over which Optuna will explore different configurations.

2. **The objective function**:  
   A metric that determines how good a given set of hyperparameters is.  
   Optuna will try to optimize this metric by running multiple trials.

---

In this example, we will use a `HistGradientBoostingRegressor` trained on our dataset.

### Hyperparameters to Optimize
We define a search space including:

- `max_iter` — *(int, discrete)*: Maximum number of boosting iterations.
- `learning_rate` — *(float, continuous)*: Shrinks the contribution of each new tree.  
- `l2_regularization` — *(float, continuous)*: L2 penalty applied to the leaf values to reduce overfitting.

> 📌 For the sake of the example in this tutorial, `l2_regularization` is sampled from a fixed list of categorical choices, even though it is a continuous hyperparameter.

---

### Optimization Objective

🎯 The goal is to **minimize** the **validation RMSE** (Root Mean Squared Error).

A lower RMSE indicates a model that predicts the target variable more accurately.
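
As a quick illustration of the metric, the sketch below computes the RMSE by hand on made-up numbers (the two lists are hypothetical and not taken from the dataset). For a single target, this should correspond to what `root_mean_squared_error` computes.

```python
import math

# Hypothetical targets and predictions, for illustration only.
y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]

# RMSE = sqrt(mean((y_true - y_pred)^2))
squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]  # [1.0, 0.0, 4.0]
rmse = math.sqrt(sum(squared_errors) / len(squared_errors))

print(f"RMSE: {rmse:.4f}")  # RMSE: 1.2910
```

Because the errors are squared before averaging, a single large error weighs more than several small ones.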


In [5]:
def training_objective(trial:optuna.trial.Trial) -> float:
    """Objective function for training and evaluating the model with given hyperparameters."""
    max_iter = trial.suggest_int("max_iter", 10, 500)
    learning_rate = trial.suggest_float("learning_rate", 0.001, 0.9, log=True)
    l2_regularization = trial.suggest_categorical("l2_regularization", [0.0, 0.1, 0.2, 0.5, 1.0])
    model = HistGradientBoostingRegressor(
        max_iter=max_iter,
        learning_rate=learning_rate,
        l2_regularization=l2_regularization,
        categorical_features=categorical_features,
        early_stopping=True,
        random_state=42,
    )
    model.fit(X=X_train, y=y_train, X_val=X_val, y_val=y_val)
    val_predictions = model.predict(X_val)
    return root_mean_squared_error(y_true=y_val, y_pred=val_predictions)

### Understanding the Sample Space and Objective Definition

#### Sample space

In Optuna, the sample space is defined using different `trial.suggest_*` methods depending on the type of hyperparameter.

There are **three main types**:

1. **Discrete hyperparameters**  
   Use `trial.suggest_int()`.  
   It samples integer values from a finite range.  
   Example:  
   `trial.suggest_int("max_iter", 10, 500)` → samples an integer between **10 and 500** (both bounds included).

2. **Continuous hyperparameters**  
   Use `trial.suggest_float()`.  
   This samples real values from a continuous interval.  
   Example:  
   `trial.suggest_float("learning_rate", 0.001, 0.9)` → samples a real value in **[0.001, 0.9]**.

Two optional arguments may be added for discrete and continuous hyperparameters:
- `step` → forces a discretization of the interval
- `log=True` → performs sampling on a **logarithmic scale**, favoring smaller values

⚠️ `step` and `log=True` **cannot** be used together.

3. **Categorical hyperparameters**  
   Use `trial.suggest_categorical()`.  
   It samples from a predefined set of values.  
   Example:  
   `trial.suggest_categorical("l2_regularization", [0.0, 0.1, 0.2, 0.5, 1.0])`.

#### Objective

The objective is the value returned by the `training_objective()` function.  
In our case, this corresponds to the **validation RMSE**: the lower it is, the better the configuration.

Since Optuna uses this value to guide the optimization process, the `training_objective()` function must always return a single numeric score — here, the RMSE — so Optuna can correctly compare and rank different trials.

⚠️ Optuna can also perform multi-objective optimization. This is not covered in this workshop; see the [documentation](https://optuna.readthedocs.io/en/stable/tutorial/20_recipes/002_multi_objective.html) for details.

## 2. First touch with Optuna for optimization

So we have defined both our sample space and our objective. ✅

### Creating and Running the Optuna Study

We can now **create and run the Optuna study** to start hyperparameter optimization.

We use `optuna.create_study()` and specify:
- the **study name**,
- the **optimization direction** (**"minimize"** since we aim for the lowest RMSE),
- and the **sampler** that explores the search space.

In this example, we use the **TPE (Tree-structured Parzen Estimator) sampler**.  
We set a random seed for reproducibility and `n_startup_trials=10`, so the first 10 trials are sampled randomly before the TPE model begins exploiting previous results.  
We also set `multivariate=True` and `group=True` to allow Optuna to capture potential interactions between hyperparameters. See [this article](https://medium.com/optuna/multivariate-tpe-makes-optuna-even-more-powerful-63c4bfbaebe2) for more information.

📚 Further information about the TPE algorithm can be found [here](https://arxiv.org/abs/2304.11127).

Finally, we run the optimization with `study.optimize()`, passing our `training_objective` function and the number of trials to execute.


In [6]:
from optuna.samplers import TPESampler

study = optuna.create_study(study_name="basic_hgb_opt",
                            direction="minimize",
                            sampler=TPESampler(
                                seed=42,
                                n_startup_trials=10,
                                multivariate=True,
                                group=True)
                            )

study.optimize(training_objective, n_trials=100)

[I 2025-10-30 09:02:47,134] A new study created in memory with name: basic_hgb_opt
[I 2025-10-30 09:02:47,338] Trial 0 finished with value: 118.00771720173168 and parameters: {'max_iter': 193, 'learning_rate': 0.6436364315084161, 'l2_regularization': 0.0}. Best is trial 0 with value: 118.00771720173168.
[I 2025-10-30 09:02:48,346] Trial 1 finished with value: 106.59748624327959 and parameters: {'max_iter': 435, 'learning_rate': 0.05968147125797707, 'l2_regularization': 0.2}. Best is trial 1 with value: 106.59748624327959.
[I 2025-10-30 09:02:48,970] Trial 2 finished with value: 182.32543556526153 and parameters: {'max_iter': 99, 'learning_rate': 0.003481940932030817, 'l2_regularization': 1.0}. Best is trial 1 with value: 106.59748624327959.
[I 2025-10-30 09:02:49,522] Trial 3 finished with value: 165.04456208936122 and parameters: {'max_iter': 78, 'learning_rate': 0.007295686093122071, 'l2_regularization': 0.2}. Best is trial 1 with value: 106.59748624327959.
[I 2025-10-30 09:02:51,210

Now that the optimization is complete, we select the **best hyperparameter configuration** found by Optuna.  
We then train a model using these hyperparameters on the **combined training and validation sets** and evaluate its performance on the **test set** to obtain an unbiased estimate of its generalization ability.

In [7]:
# Train final model with best hyperparameters
best_params = study.best_params
print(f"Best hyperparameters: {best_params}")
final_model = HistGradientBoostingRegressor(**best_params, random_state=42)

final_model.fit(X_big_train, y_big_train)
test_predictions = final_model.predict(X_test)
big_train_predictions = final_model.predict(X_big_train)
big_train_rmse = root_mean_squared_error(y_true=y_big_train, y_pred=big_train_predictions)
print(f"Big Train RMSE: {big_train_rmse}")
test_rmse = root_mean_squared_error(y_true=y_test, y_pred=test_predictions)
print(f"Test RMSE: {test_rmse}")

Best hyperparameters: {'max_iter': 419, 'learning_rate': 0.04415436713676083, 'l2_regularization': 0.5}
Big Train RMSE: 90.9639765485987
Test RMSE: 103.2315901911527


We can examine the **best hyperparameter configuration** along with the **training and test scores**.  
The training RMSE (≈ 91.0) is noticeably lower than the test RMSE (≈ 103.2), which indicates some overfitting; we will attempt to reduce it in the assignment.

## 3. Analyzing the optimization results

We now have a trained model, but several questions about the optimization process may arise, such as:

1. How did the model performance evolve across the trials?
2. Which hyperparameters had the greatest impact on the defined objective?
3. How relevant was the definition of the search space?

In this section, we will present techniques and visualizations to help answer these questions.

---

1. **How did the model performance evolve across the trials?**

To answer this question, we use the `optuna.visualization.plot_optimization_history()` function.  
It allows us to visualize how the **objective value** and the **best score so far** evolved over the course of the trials, giving insight into the convergence and efficiency of the optimization process.

In [9]:
optuna.visualization.plot_optimization_history(study=study, target_name="Validation RMSE")

We can observe that some trials have a very high error at the beginning, during the **pure exploration stage**.  
After these initial trials, the performance improves consistently as Optuna starts exploiting information from previous results. However, the score shows little improvement after around **73 trials**.

This graph also helps us assess whether the **number of trials** is sufficient or possibly higher than needed for the optimization to converge.

2. **Which hyperparameters had the greatest impact on the defined objective?**

To answer this, we use `optuna.visualization.plot_param_importances()`.  
Under the hood, it relies on a **fANOVA importance evaluator**, which trains a RandomForest to predict the objective scores and identifies the hyperparameters that contributed most to performance variations.

In [10]:
optuna.visualization.plot_param_importances(study=study)

We can see that **`learning_rate`** is by far the most important hyperparameter.  
**`max_iter`** ranks second, and **`l2_regularization`** has the least impact on the defined objective.

3. **How relevant was the definition of the search space?**

To answer this question, we can use `optuna.visualization.plot_contour()`, which generates **contour plots** showing how the objective value varies with different hyperparameter values.

By default, the function returns a contour plot for **each pair of hyperparameters**, but you can also provide **specific pairs** to focus on the relationship between particular parameters.

In [11]:
# Contour plot to see the relationship between hyperparameters and objective value
optuna.visualization.plot_contour(study=study).update_layout(width=900, height=900)

The graph above can quickly become difficult to read when there are many hyperparameters.  
To simplify, we can focus on the **two most important hyperparameters**: `max_iter` and `learning_rate`.

In [12]:
optuna.visualization.plot_contour(study=study, params=["max_iter", "learning_rate"])

We can observe a region where `learning_rate < 0.015` in which the objective value is high.  
To improve the optimization process, we could **increase the lower bound of the learning rate to 0.015** instead of 0.001.

Additionally, we notice that trials with `max_iter < 200` rarely lead to good performance.  
We could further **restrict the search space** by adjusting the lower bound of `max_iter`.

These adjustments would guide Optuna to focus on regions with better performance, allowing it to explore the most promising areas in more detail.

We can also perform a quick validation using `optuna.visualization.plot_rank()` to identify which hyperparameter ranges consistently provide the best results.


In [13]:
optuna.visualization.plot_rank(study=study, params=["learning_rate","max_iter"])

## 4. Setting up baselines with enqueue trials

### Key Discoveries So Far

1. The current optimization process improves the model but introduces some **overfitting**.  
2. We could **reduce the number of trials** and still achieve similar performance.  
3. Certain hyperparameters have a greater impact on the objective than others, notably `learning_rate` and `max_iter`.  
4. The **search space** for `learning_rate` and `max_iter` could be narrowed to avoid low values that lead to poor performance.  

---

However, one important question remains:

**How much did hyperparameter optimization improve the model compared to a default configuration?**

To answer this, we need a **baseline**. Optuna provides a feature for this purpose: **enqueue trials**, which allows us to explicitly evaluate default or reference hyperparameter configurations.


**Enqueue Trials**

Enqueue trials allow us to set **specific hyperparameter configurations** to be evaluated during the optimization process.

Optuna will **evaluate all enqueued trials first** before proceeding with its standard sampling strategy.  
Here, we use this feature to test the performance of the **default hyperparameters** of the `HistGradientBoostingRegressor`.

Additionally, we can enqueue another trial using the **best hyperparameters found in a previous optimization run**.  
This provides Optuna with a **benchmark** to compare against during the current study and offers **insight into which regions of the search space previously yielded good results**.


**How to do it?**

To enqueue trials, call the `enqueue_trial()` method on the study instance, passing a dictionary that contains the hyperparameters you want to **force** during evaluation.

Example:
```python
study.enqueue_trial({
    "max_iter": default_params["max_iter"],
    "learning_rate": default_params["learning_rate"],
    "l2_regularization": default_params["l2_regularization"]
})
```

**Quick assignment:**  
Narrow the search space for `learning_rate` and `max_iter` to exclude the regions where performance was poor in the first study.
This adjustment lets Optuna spend its trials in the most promising region and potentially discover configurations that further reduce the RMSE.

**Bonus:**  
If you wish, you can **add new hyperparameters to explore** in your search space to potentially improve model performance.  
Refer to the official [`HistGradientBoostingRegressor` documentation](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingRegressor.html) for the full list of available hyperparameters and their descriptions.

<details>
<summary>Click to reveal the answer</summary>

```python
def training_objective_improved(trial:optuna.trial.Trial) -> float:
    max_iter = trial.suggest_int("max_iter", 200, 500)
    learning_rate = trial.suggest_float("learning_rate", 0.015, 0.25, log=True)
    l2_regularization = trial.suggest_categorical("l2_regularization", [0.0, 0.1, 0.2, 0.5, 1.0])
    model = HistGradientBoostingRegressor(
        max_iter=max_iter,
        learning_rate=learning_rate,
        l2_regularization=l2_regularization,
        categorical_features=categorical_features,
        early_stopping=True,
        random_state=42,
    )
    model.fit(X=X_train, y=y_train, X_val=X_val, y_val=y_val)
    val_predictions = model.predict(X_val)
    return root_mean_squared_error(y_true=y_val, y_pred=val_predictions)
```

</details>

In [None]:
# Before jumping to the enqueuing of trials, we can also define an improved objective function that
# limits the search space based on insights from the previous study
def training_objective_improved(trial:optuna.trial.Trial) -> float:

    ###
    # YOUR CODE HERE: START
    ###
    
    ###
    # YOUR CODE HERE: END
    ###
    model.fit(X=X_train, y=y_train, X_val=X_val, y_val=y_val)
    val_predictions = model.predict(X_val)
    return root_mean_squared_error(y_true=y_val, y_pred=val_predictions)

**Quick assignment:**  
Find a way to programmatically retrieve the **default hyperparameters** from the model and store them in a dictionary named `default_params`.

*Hint:* You can use the `get_params()` method of the `HistGradientBoostingRegressor` class to extract all default parameter values.

<details>
<summary>Click to reveal the answer</summary>

```python
default_model = HistGradientBoostingRegressor()
default_params = default_model.get_params()
```

</details>

In [None]:
# Initialize a dummy model to retrieve default hyperparameters
###
# YOUR CODE HERE: START
###

###
# YOUR CODE HERE: END
###
print(f"Default hyperparameters: {default_params}")

Default hyperparameters: {'categorical_features': 'from_dtype', 'early_stopping': 'auto', 'interaction_cst': None, 'l2_regularization': 0.0, 'learning_rate': 0.1, 'loss': 'squared_error', 'max_bins': 255, 'max_depth': None, 'max_features': 1.0, 'max_iter': 100, 'max_leaf_nodes': 31, 'min_samples_leaf': 20, 'monotonic_cst': None, 'n_iter_no_change': 10, 'quantile': None, 'random_state': None, 'scoring': 'loss', 'tol': 1e-07, 'validation_fraction': 0.1, 'verbose': 0, 'warm_start': False}


**Quick assignment:**  

1. Create a new variable `study_w_baseline` containing an instance of an Optuna study named `"baseline_hgb_opt"` that will be used with the new objective.  
2. **Enqueue** the trials for both:  
   - The **default hyperparameters** of the model.  
   - The **best hyperparameters** obtained from the previous study.  
3. Launch the optimization process using the **`training_objective_improved`** function.

<details>
<summary>1. Click to reveal the answer</summary>

```python
study_w_baseline = optuna.create_study(study_name="baseline_hgb_opt",
                            direction="minimize",
                            sampler=TPESampler(
                                seed=42,
                                n_startup_trials=10,
                                multivariate=True,
                                group=True)
                            )
```

</details>

<details>
<summary>2. Click to reveal the answer</summary>

```python
study_w_baseline.enqueue_trial({
    "max_iter": default_params["max_iter"],
    "learning_rate": default_params["learning_rate"],
    "l2_regularization": default_params["l2_regularization"]
})

# Enqueue trial with best hyperparameters from previous study
study_w_baseline.enqueue_trial({
    "max_iter": best_params["max_iter"],
    "learning_rate": best_params["learning_rate"],
    "l2_regularization": best_params["l2_regularization"]
})
```

</details>

<details>
<summary>3. Click to reveal the answer</summary>

```python
study_w_baseline.optimize(training_objective_improved, n_trials=75)
```

</details>

In [None]:
# Create a new study for baseline comparisons
###
# YOUR CODE HERE: START
###

###
# YOUR CODE HERE: END
###

# Enqueue trials with default hyperparameters. We only need to enqueue the hyperparameters that were part of the optimization
# Namely: max_iter, learning_rate, l2_regularization
###
# YOUR CODE HERE: START
###

###
# YOUR CODE HERE: END
###

###
# YOUR CODE HERE: START
###

###
# YOUR CODE HERE: END
###


Argument ``multivariate`` is an experimental feature. The interface can change in the future.


Argument ``group`` is an experimental feature. The interface can change in the future.

[I 2025-10-30 10:00:07,976] A new study created in memory with name: baseline_hgb_opt

Fixed parameter 'max_iter' with value 100 is out of range for distribution IntDistribution(high=500, log=False, low=200, step=1).

[I 2025-10-30 10:00:08,655] Trial 0 finished with value: 107.15886097447702 and parameters: {'max_iter': 100, 'learning_rate': 0.1, 'l2_regularization': 0.0}. Best is trial 0 with value: 107.15886097447702.
[I 2025-10-30 10:00:10,326] Trial 1 finished with value: 105.82181081240428 and parameters: {'max_iter': 419, 'learning_rate': 0.04415436713676083, 'l2_regularization': 0.5}. Best is trial 1 with value: 105.82181081240428.
[I 2025-10-30 10:00:10,697] Trial 2 finished with value: 107.0576705733172 and parameters: {'max_iter': 312, 'learning_rate': 0.21763079352547116, 'l2_regularization':

**Quick assignment:**  
Use `optuna.visualization.plot_optimization_history` to **compare both studies** — the original one and the one with enqueued trials — in order to visualize how the optimization evolved across trials and assess the effect of the enqueued configurations.

<details>
<summary>Click to reveal the answer</summary>

```python
optuna.visualization.plot_optimization_history(study=[study, study_w_baseline], target_name="Validation RMSE")
```

</details>

In [None]:
###
# YOUR CODE HERE: START
###

###
# YOUR CODE HERE: END
###

The second study shows that the model with **default hyperparameters** (validation RMSE ≈ 107.16, trial 0) performs worse than the best configuration from the previous study (≈ 105.82, trial 1), as expected.  

Model performance has **plateaued**: with the current search space, we cannot further improve the RMSE. Two main conclusions can be drawn:

1. The **Bayesian optimization process** effectively improves performance compared to the default configuration.  
2. However, the gain over the default configuration remains small, which suggests the need to **expand the search space**, for example by tuning additional hyperparameters.

**Bonus questions:**  

When working with enqueue trials, what happens if:  

1. You do not provide a fixed value for a hyperparameter in the search space?  
2. You provide a value for a hyperparameter that is not in the search space?

<details>
<summary>1. Click to reveal the answer</summary>

**If you do not provide a fixed value for a hyperparameter in the search space**  
   → Optuna will sample a value for that hyperparameter according to the defined search space.  
   In other words, only the parameters explicitly provided in the `enqueue_trial()` dictionary are fixed — the others remain free to be optimized.

</details>

<details>
<summary>2. Click to reveal the answer</summary>

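**If you provide a value for a hyperparameter that is not in the search space**  
   → Optuna will **ignore** that parameter.  
   Since the objective function never requests it through a `trial.suggest_*` call, it has no effect on model training or evaluation.  
   The value may still be stored in the enqueued trial’s record, but it is not used.  
   This differs from a value that is *out of range* for a hyperparameter that does belong to the search space: as the warning in the log above shows, Optuna keeps the enqueued value (`max_iter=100`) and only warns that it falls outside the declared range.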

</details>