## Table of Contents

This notebook provides a basic example of how to change the sampler in `Optuna`.  
As we saw in the theory section, a Bayesian optimization algorithm relies on two key components:  
a **surrogate model** and an **acquisition function**.  

Together, these define how new hyperparameter configurations are selected. In `Optuna`, this logic is encapsulated in the **sampler**, the core engine that drives the search process.
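
To make the two components concrete, here is a toy sketch of a sequential model-based search loop in plain Python. It is not how `Optuna` or any of its samplers is implemented: the surrogate (a distance-weighted average of past results), the acquisition rule (a lower confidence bound with a hypothetical `kappa` weight), and the 1-D `toy_objective` are all assumptions chosen for brevity.

```python
import random


def toy_objective(x: float) -> float:
    """Hypothetical 1-D objective with its minimum at x = 2."""
    return (x - 2.0) ** 2


def surrogate(x, history):
    """Toy surrogate: distance-weighted mean of observed values, plus an
    uncertainty proxy equal to the distance to the nearest observation."""
    weights = [1.0 / (abs(x - xi) + 1e-6) for xi, _ in history]
    mean = sum(w * yi for w, (_, yi) in zip(weights, history)) / sum(weights)
    uncertainty = min(abs(x - xi) for xi, _ in history)
    return mean, uncertainty


def acquisition(x, history, kappa=1.0):
    """Lower confidence bound: low predicted value or high uncertainty wins."""
    mean, uncertainty = surrogate(x, history)
    return mean - kappa * uncertainty


def optimize(n_startup=5, n_trials=30, seed=0):
    """Random startup trials, then pick the candidate the acquisition prefers."""
    rng = random.Random(seed)
    history = []  # list of (x, objective value) pairs
    for i in range(n_trials):
        if i < n_startup:
            x = rng.uniform(-5.0, 5.0)
        else:
            candidates = [rng.uniform(-5.0, 5.0) for _ in range(200)]
            x = min(candidates, key=lambda c: acquisition(c, history))
        history.append((x, toy_objective(x)))
    return min(history, key=lambda pair: pair[1])


best_x, best_y = optimize()
print(f"best x = {best_x:.3f}, best value = {best_y:.4f}")
```

Swapping the sampler in `Optuna` amounts to replacing the `surrogate` and `acquisition` roles in a loop like this one, while the objective and the trial budget stay the same.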

`Optuna` already implements several samplers ([see here](https://optuna.readthedocs.io/en/stable/reference/samplers/index.html)).  
For example, in the previous exercises, we used the `TPESampler`.

In this notebook, we’ll show how to switch the sampler. Specifically, we’ll explore the `ConfOptSampler`, which leverages ideas from **conformal prediction** and **quantile regression** to better model uncertainty during hyperparameter optimization.

According to the paper linked below, this approach can provide more robust and better-informed exploration of the search space.

If you are curious, you can check the details behind the `ConfOptSampler` in the resources below:

| Resource | Link |
|-----------|------|
| **Paper** | [arXiv: 2509.17051](https://www.arxiv.org/pdf/2509.17051) |
| **GitHub Repository** | [rick12000/confopt](https://github.com/rick12000/confopt) |
| **Optuna Implementation** | [ConfOptSampler on Optuna Hub](https://hub.optuna.org/samplers/confopt_sampler/) |
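
For intuition about the conformal prediction idea mentioned above, the following is a minimal sketch of a split conformal interval in plain Python. It is an illustration under simplifying assumptions, not the `ConfOptSampler` implementation: the predictor is just the training mean (no quantile regression), the helper name `conformal_interval` is hypothetical, and the data is synthetic noise centred on 106.

```python
import math
import random


def conformal_interval(train_y, calib_y, alpha=0.1):
    """Split conformal interval around a constant (mean) predictor."""
    prediction = sum(train_y) / len(train_y)
    # Nonconformity scores: absolute residuals on held-out calibration data.
    scores = sorted(abs(y - prediction) for y in calib_y)
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to bound the interval at this level.
        return prediction, -math.inf, math.inf
    q = scores[rank - 1]
    return prediction, prediction - q, prediction + q


rng = random.Random(0)
# Synthetic scores (assumption): Gaussian noise around 106.
train = [rng.gauss(106.0, 1.0) for _ in range(200)]
calib = [rng.gauss(106.0, 1.0) for _ in range(200)]
fresh = [rng.gauss(106.0, 1.0) for _ in range(2000)]

prediction, lower, upper = conformal_interval(train, calib, alpha=0.1)
coverage = sum(lower <= y <= upper for y in fresh) / len(fresh)
print(f"interval = [{lower:.2f}, {upper:.2f}], empirical coverage = {coverage:.3f}")
```

The interval width is calibrated on held-out residuals rather than assumed from a distribution, which is the property that makes conformal methods attractive for quantifying uncertainty.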

## Imports

In [1]:
from pathlib import Path
import sys
sys.path.insert(0, str(Path.cwd().parent))  # adjust .parent depth so 'src' is findable

In [2]:
import os
import optuna
import pandas as pd

from src.train_utils import retrieve_data_w_features

from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import root_mean_squared_error

## Options

In [3]:
path_data = "../data/01_raw"

## Dataset

In [4]:
df = pd.read_parquet(os.path.join(path_data, "fremotor1prem0304.parquet"))
cols_to_drop = ["IDpol", "Year", "train_set", "val_set", "test_set", "big_train_set"]
categorical_features = [
    "DrivAge",
    "DrivGender",
    "MaritalStatus",
    "PayFreq",
    "JobCode",
    "VehClass",
    "VehPower",
    "VehGas",
    "VehUsage",
    "Garage",
    "Area",
    "Region",
    "Channel",
    "Marketing"
]

X_big_train, y_big_train = retrieve_data_w_features(df=df, features_to_drop=cols_to_drop, split="big_train_set")
X_train, y_train = retrieve_data_w_features(df=df, features_to_drop=cols_to_drop, split="train_set")
X_val, y_val = retrieve_data_w_features(df=df, features_to_drop=cols_to_drop, split="val_set")
X_test, y_test = retrieve_data_w_features(df=df, features_to_drop=cols_to_drop, split="test_set")

We start by establishing a baseline with the `TPESampler`, using the improved search space from our starter notebook.

In [5]:
def training_objective(trial: optuna.trial.Trial) -> float:
    """Objective function for training and evaluating the model with given hyperparameters."""
    max_iter = trial.suggest_int("max_iter", 200, 500)
    learning_rate = trial.suggest_float("learning_rate", 0.015, 0.25, log=True)
    l2_regularization = trial.suggest_categorical("l2_regularization", [0.0, 0.1, 0.2, 0.5, 1.0])
    model = HistGradientBoostingRegressor(
        max_iter=max_iter,
        learning_rate=learning_rate,
        l2_regularization=l2_regularization,
        categorical_features=categorical_features,
        early_stopping=True,
        random_state=42,
    )
    model.fit(X=X_train, y=y_train, X_val=X_val, y_val=y_val)
    val_predictions = model.predict(X_val)
    return root_mean_squared_error(y_true=y_val, y_pred=val_predictions)
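
The `log=True` flag on `learning_rate` in the objective above places the parameter on a logarithmic scale, so small values are explored as thoroughly as large ones. As a rough sketch of what that scale means, the snippet below draws purely random values uniformly in log space between the same bounds. The helper `sample_log_uniform` is hypothetical, and the snippet only illustrates the scale, not how a model-based sampler chooses points.

```python
import math
import random

LOW, HIGH = 0.015, 0.25  # learning_rate bounds used in the objective above


def sample_log_uniform(rng, low, high):
    """Draw uniformly in log space, then map back to the original scale."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(42)
draws = [sample_log_uniform(rng, LOW, HIGH) for _ in range(10_000)]

# On a log scale the "middle" of the range is the geometric mean (about 0.061),
# not the arithmetic mean (about 0.133).
geometric_mid = math.sqrt(LOW * HIGH)
share_below = sum(d < geometric_mid for d in draws) / len(draws)
print(f"geometric midpoint = {geometric_mid:.4f}")
print(f"share of draws below it = {share_below:.3f}")
```

Roughly half of the draws fall below the geometric midpoint, whereas a linear-scale draw would spend most of its budget on the larger learning rates.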

Let's run the study for the same `75` trials that we used previously.

In [6]:
from optuna.samplers import TPESampler

study_tpe = optuna.create_study(
    study_name="tpe_sampler",
    direction="minimize",
    sampler=TPESampler(
        seed=42,
        n_startup_trials=10,
        multivariate=True,
        group=True,
    ),
)

study_tpe.optimize(training_objective, n_trials=75)

[I 2025-11-02 19:00:36,413] A new study created in memory with name: tpe_sampler
[I 2025-11-02 19:00:37,644] Trial 0 finished with value: 107.0576705733172 and parameters: {'max_iter': 312, 'learning_rate': 0.21763079352547116, 'l2_regularization': 0.0}. Best is trial 0 with value: 107.0576705733172.
[I 2025-11-02 19:00:38,633] Trial 1 finished with value: 106.6685333430801 and parameters: {'max_iter': 460, 'learning_rate': 0.08138847002944712, 'l2_regularization': 0.2}. Best is trial 1 with value: 106.6685333430801.
[I 2025-11-02 19:00:41,362] Trial 2 finished with value: 107.05688025490234 and parameters: {'max_iter': 254, 'learning_rate': 0.025129498989955514, 'l2_regularization': 1.0}. Best is trial 1 with value: 106.6685333430801.
[I 2025-11-02 19:00:43,147] Trial 3 finished with value: 106.39973052514176 and parameters: {'max_iter': 241, 'learning_rate': 0.034123049219265345, 'l2_regularization': 0.2}. Best is trial 3 with value: 106.39973052514176.
[I 2025-11-02 19:00:45,161] Tr

And let's check the performance of the best combination of hyperparameters.

In [7]:
# Train final model with best hyperparameters obtained from TPE sampler
best_params_tpe = study_tpe.best_params
print(f"Best hyperparameters: {best_params_tpe}")
final_model_tpe = HistGradientBoostingRegressor(**best_params_tpe, random_state=42)

final_model_tpe.fit(X_big_train, y_big_train)
test_predictions = final_model_tpe.predict(X_test)
big_train_predictions_tpe = final_model_tpe.predict(X_big_train)
big_train_rmse_tpe = root_mean_squared_error(y_true=y_big_train, y_pred=big_train_predictions_tpe)
print(f"Big Train RMSE: {big_train_rmse_tpe}")
test_rmse_tpe = root_mean_squared_error(y_true=y_test, y_pred=test_predictions)
print(f"Test RMSE: {test_rmse_tpe}")

Best hyperparameters: {'max_iter': 220, 'learning_rate': 0.06705798514297966, 'l2_regularization': 0.1}
Big Train RMSE: 85.04468374968667
Test RMSE: 102.88186064924568


---

Let's now use the `ConfOptSampler` from OptunaHub, which we load with the `optunahub` package.  
This sampler is distributed through the hub because it was developed by an external contributor.  
It also shows that you can contribute to the `Optuna` ecosystem by sharing your own samplers.  
More information is available [here](https://optuna.github.io/optunahub/).

## ConfOpt sampler

In [8]:
import optunahub

In [9]:
# Set up sampler:
module = optunahub.load_module("samplers/confopt_sampler")
sampler_conf_opt = module.ConfOptSampler(
    # Search space below must match the one defined in the objective function:
    search_space={
        "max_iter": optuna.distributions.IntDistribution(200, 500),
        "learning_rate": optuna.distributions.FloatDistribution(0.015, 0.25, log=True),
        "l2_regularization": optuna.distributions.CategoricalDistribution([0.0, 0.1, 0.2, 0.5, 1.0]),
    },
    # Number of random searches before switching to inferential search:
    n_startup_trials=10,
)

# Run study:
study_conf_opt = optuna.create_study(
    study_name="conf_opt_sampler",
    direction="minimize",
    sampler=sampler_conf_opt,
)
study_conf_opt.optimize(training_objective, n_trials=75)

print(f"Best trial parameters: {study_conf_opt.best_trial.params}")
print(f"Best trial value: {study_conf_opt.best_trial.value}")

[I 2025-11-02 19:02:10,771] A new study created in memory with name: conf_opt_sampler
[I 2025-11-02 19:02:11,078] Trial 0 finished with value: 107.53292839498467 and parameters: {'max_iter': 331, 'learning_rate': 0.23984563832837805, 'l2_regularization': 0.5}. Best is trial 0 with value: 107.53292839498467.
[I 2025-11-02 19:02:11,541] Trial 1 finished with value: 106.55954793434479 and parameters: {'max_iter': 398, 'learning_rate': 0.10570610961256531, 'l2_regularization': 0.0}. Best is trial 1 with value: 106.55954793434479.
[I 2025-11-02 19:02:12,774] Trial 2 finished with value: 106.05831408474313 and parameters: {'max_iter': 251, 'learning_rate': 0.03963480885728645, 'l2_regularization': 1.0}. Best is trial 2 with value: 106.05831408474313.
[I 2025-11-02 19:02:13,298] Trial 3 finished with value: 106.49931746850187 and parameters: {'max_iter': 274, 'learning_rate': 0.09289820294150797, 'l2_regularization': 0.5}. Best is trial 2 with value: 106.05831408474313.
[I 2025-11-02 19:02:13

Best trial parameters: {'max_iter': 224, 'learning_rate': 0.04098422286764199, 'l2_regularization': 0.0}
Best trial value: 105.81200436644048


Now that we have determined the optimized hyperparameters, let's evaluate the model's performance on the test set.

In [10]:
# Train final model with best hyperparameters obtained from ConfOpt sampler
best_params_conf_opt = study_conf_opt.best_params
print(f"Best hyperparameters: {best_params_conf_opt}")
final_model_conf_opt = HistGradientBoostingRegressor(**best_params_conf_opt, random_state=42)

final_model_conf_opt.fit(X_big_train, y_big_train)
test_predictions = final_model_conf_opt.predict(X_test)
big_train_predictions_conf_opt = final_model_conf_opt.predict(X_big_train)
big_train_rmse_conf_opt = root_mean_squared_error(y_true=y_big_train, y_pred=big_train_predictions_conf_opt)
print(f"Big Train RMSE: {big_train_rmse_conf_opt}")
test_rmse_conf_opt = root_mean_squared_error(y_true=y_test, y_pred=test_predictions)
print(f"Test RMSE: {test_rmse_conf_opt}")

Best hyperparameters: {'max_iter': 224, 'learning_rate': 0.04098422286764199, 'l2_regularization': 0.0}
Big Train RMSE: 88.47264431436533
Test RMSE: 102.77456710477264


The final test performance of the two samplers is nearly identical (RMSE of 102.88 for TPE vs. 102.77 for ConfOpt), but the exercise shows how easily you can switch samplers in `Optuna`.  
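
To put "nearly identical" in numbers, the short calculation below uses only the RMSE values printed by the two evaluation cells above. The `reported` dictionary is just a convenient container for this illustration.

```python
# RMSE values copied from the outputs of the two evaluation cells above.
reported = {
    "tpe": {"big_train": 85.04468374968667, "test": 102.88186064924568},
    "conf_opt": {"big_train": 88.47264431436533, "test": 102.77456710477264},
}

# Gap between the two samplers on the test set, absolute and relative.
test_gap = reported["tpe"]["test"] - reported["conf_opt"]["test"]
relative_gap_pct = 100 * test_gap / reported["tpe"]["test"]
print(f"Test RMSE gap (TPE - ConfOpt): {test_gap:.3f} ({relative_gap_pct:.2f}%)")

# Difference between test and big-train RMSE for each final model.
for name, rmse in reported.items():
    print(f"{name}: test - big_train = {rmse['test'] - rmse['big_train']:.2f}")
```

The test gap is roughly 0.1 RMSE points, about 0.1% of the TPE score. The difference between test and big-train RMSE is about 17.8 for the TPE model and about 14.3 for the ConfOpt model.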

Finally, let's visualize the optimization histories.

In [11]:
optuna.visualization.plot_optimization_history(study=[study_tpe, study_conf_opt], target_name="Validation RMSE")
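
The optimization history plot shows, among other things, the best value found so far after each trial. As a rough illustration, that running minimum can be computed by hand. The values below are the first four validation RMSEs from each log above, and the helper name `running_best` is hypothetical. In the notebook, the full series could be taken from `[t.value for t in study_tpe.trials]`.

```python
from itertools import accumulate


def running_best(values):
    """Best (lowest) objective value seen up to and including each trial."""
    return list(accumulate(values, min))


# First four trial values copied from the TPE and ConfOpt logs above.
tpe_first = [107.0576705733172, 106.6685333430801, 107.05688025490234, 106.39973052514176]
conf_first = [107.53292839498467, 106.55954793434479, 106.05831408474313, 106.49931746850187]

print("TPE:    ", running_best(tpe_first))
print("ConfOpt:", running_best(conf_first))
```

Each entry matches the "Best is trial ..." value reported on the corresponding log line.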

Overall, the `ConfOptSampler` appears to have a slight advantage over the `TPESampler` in this run, although a test RMSE gap of about 0.1 from a single study is too small to be conclusive.