# **Hypertuning with Optuna**

**Optuna** is an open-source hyperparameter optimization framework for machine learning and deep learning models. It automates the search for good hyperparameters, helping data scientists and machine learning engineers tune models more efficiently. Optuna supports both sequential and parallel optimization, so the same code can scale from a laptop experiment to a larger distributed search.

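Optuna’s key strength is its define-by-run approach: the hyperparameter search space is constructed dynamically inside the objective function during each trial, so it can depend on ordinary Python conditionals and loops. This makes it more flexible than a grid that must be declared up front, while its efficiency over grid or random search comes from its samplers and pruning.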

### Why is Optuna Important?

Tuning hyperparameters is a critical part of improving machine learning models’ performance, but it can be extremely time-consuming, especially for complex models like deep neural networks or ensembles. Optuna helps by:

- **Automating hyperparameter search**: Reducing the manual effort required to find optimal settings.

- **Efficient search**: Optuna uses sophisticated algorithms like Tree-structured Parzen Estimator (TPE) and CMA-ES (Covariance Matrix Adaptation Evolution Strategy) to explore the hyperparameter space more intelligently than grid or random search.

- **Early stopping**: It includes pruning to stop trials that are not promising early, saving time and computational resources.

- **Scalability**: Optuna can be easily scaled to large clusters or cloud environments, making it practical for tuning complex models.

### How does Optuna Hypertuning Work?

Optuna performs hyperparameter optimization through a series of trials, where each trial evaluates a single set of hyperparameters and records the resulting score. Across trials, Optuna tries to minimize (or maximize) the value returned by the user-defined objective function. Here’s how it works:

**1. Define the Objective Function**: The objective function evaluates the model’s performance using a given set of hyperparameters. It returns a metric (like accuracy, loss, etc.) to minimize (or maximize).

**2. Sampling Hyperparameters**: Optuna uses advanced optimization algorithms like TPE or CMA-ES to select hyperparameter values for each trial. It doesn’t just randomly choose values but uses past trial results to guide the search toward more promising regions of the hyperparameter space.

**3. Run Trials**: Optuna evaluates the model with the selected hyperparameters in each trial and records the objective value (e.g., validation accuracy or loss).

**4. Pruning**: If a trial is not performing well, Optuna can prune (i.e., stop) it early to save resources and move on to more promising hyperparameter combinations.

**5. Repeat**: Optuna repeats the trial process until the user-defined budget (a number of trials or a time limit) is exhausted, then reports the best set of hyperparameters found so far.

### How does Optuna Compare to Other Hyperparameter Optimization Methods?

| Method | Pros | Cons |
| ------ | ---- | ---- |
| Grid Search | Exhaustively searches through predefined values. | Computationally expensive. Limited by pre-defined grid. |
| Random Search | Explores the search space randomly and is simpler than grid. | Ignores the results of earlier trials, so it may need many samples to find good values. |
| Bayesian Search | More efficient by modeling the function being optimized. | Requires more complex setup. Slower for high-dimensional spaces. |
| Optuna | Define-by-run, efficient search (its TPE sampler is itself a Bayesian-style method) with pruning and parallelism. | More complex to set up than simple random or grid search. |
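
To make the cost difference in the table concrete, the sketch below counts how many model fits a small grid requires and contrasts it with a fixed random-search budget. The grid values and the budget of 30 are hypothetical and only loosely mirror a gradient boosting model.

```python
import itertools
import random

# Hypothetical 4-parameter grid (values chosen only for illustration).
grid = {
    "n_estimators": [50, 100, 200, 300],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
    "max_depth": [2, 4, 6, 8, 10],
    "subsample": [0.5, 0.75, 1.0],
}

# Grid search must evaluate every combination: 4 * 4 * 5 * 3 = 240 fits.
grid_points = list(itertools.product(*grid.values()))
print(len(grid_points))

# Random search draws a fixed budget of configurations from the full ranges.
rng = random.Random(0)
budget = 30
random_points = [
    {
        "n_estimators": rng.randint(50, 300),
        "learning_rate": rng.uniform(0.01, 0.3),
        "max_depth": rng.randint(2, 10),
        "subsample": rng.uniform(0.5, 1.0),
    }
    for _ in range(budget)
]
print(len(random_points))
```

Adding a fifth parameter with four values would multiply the grid to 960 fits, while the random budget stays whatever you set it to. Guided samplers keep the fixed budget and additionally use earlier results to choose where to sample next.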


### When should you use Optuna?

- **Hyperparameter Optimization for Complex Models**: If you are working with complex machine learning models like Gradient Boosting, Deep Neural Networks, or ensembles, where manual tuning is impractical.

- **Large Hyperparameter Search Space**: When the number of hyperparameters and possible values is large, Optuna’s guided sampling usually finds good configurations in fewer evaluations than grid or random search.

- **Need for Resource Efficiency**: Optuna’s ability to prune underperforming trials helps save computational time and resources, making it ideal for long-running tasks or expensive-to-evaluate models.

### Who uses Optuna?

- **Machine Learning Engineers and Data Scientists**: Optuna is widely adopted by professionals working on machine learning projects where hyperparameter optimization is crucial for achieving high performance.

- **Deep Learning Researchers**: Researchers often use Optuna for neural network hyperparameter tuning, as it efficiently searches through learning rates, optimizers, and architectures.

- **Kaggle Competitors**: Competitors in machine learning competitions use Optuna to gain an edge by finding the best possible hyperparameters.

### Key Features of Optuna:

**1. Define-by-Run**: Hyperparameter space is defined dynamically during the execution of the trials, allowing for flexibility in how you construct and explore the space.

**2. Pruning**: Automatically stops unpromising trials to save computation time and focus on better-performing hyperparameter combinations.

**3. Parallelism**: Optuna supports running multiple trials in parallel across different CPU or GPU resources, speeding up the search process.

**4. Visualization**: Provides built-in tools for visualizing optimization history, parameter importance, and more, making it easier to analyze the optimization process.

### Advantages of Optuna:

- **Efficient Optimization**: Optuna uses algorithms like TPE that learn from earlier trials, so it typically needs fewer evaluations than random or grid search to reach a comparable result.

- **Dynamic Construction**: Unlike grid search, Optuna builds the hyperparameter space dynamically, making it more flexible.

- **Automatic Pruning**: Unpromising trials are stopped early, saving computation time.

- **Parallelization**: Optuna can parallelize trials to speed up the optimization process.

- **Visualization Tools**: Optuna includes tools to visualize the optimization process, which helps in understanding how hyperparameters impact model performance.

In [34]:
import pandas as pd 
import seaborn as sns
import optuna
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
# root_mean_squared_error requires scikit-learn 1.4 or later
from sklearn.metrics import mean_absolute_error, root_mean_squared_error, r2_score

healthexp = sns.load_dataset('healthexp')
healthexp = pd.get_dummies(healthexp)
healthexp

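| | Year | Spending_USD | Life_Expectancy | Country_Canada | Country_France | Country_Germany | Country_Great Britain | Country_Japan | Country_USA |
| - | ---- | ------------ | --------------- | -------------- | -------------- | --------------- | --------------------- | ------------- | ----------- |
| 0 | 1970 | 252.311 | 70.6 | False | False | True | False | False | False |
| 1 | 1970 | 192.143 | 72.2 | False | True | False | False | False | False |
| 2 | 1970 | 123.993 | 71.9 | False | False | False | True | False | False |
| 3 | 1970 | 150.437 | 72.0 | False | False | False | False | True | False |
| 4 | 1970 | 326.961 | 70.9 | False | False | False | False | False | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 269 | 2020 | 6938.983 | 81.1 | False | False | True | False | False | False |
| 270 | 2020 | 5468.418 | 82.3 | False | True | False | False | False | False |
| 271 | 2020 | 5018.700 | 80.4 | False | False | False | True | False | False |
| 272 | 2020 | 4665.641 | 84.7 | False | False | False | False | True | False |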


In [27]:
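# Separate the features from the target column
X = healthexp.drop(['Life_Expectancy'], axis=1)
y = healthexp['Life_Expectancy']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=54)

# Baseline model with default hyperparameters (despite the name, rfr is a gradient boosting regressor)
rfr = GradientBoostingRegressor(random_state=34)
rfr.fit(X_train, y_train)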

In [28]:
y_pred = rfr.predict(X_test)
y_pred

array([78.39157822, 79.56433878, 77.29122733, 79.10343937, 80.96445194,
       77.29122733, 81.03357969, 81.0478605 , 73.89333767, 77.36302545,
       81.88363916, 77.31394029, 71.47882512, 74.82462518, 82.95075741,
       78.07791025, 71.37073548, 81.34544275, 73.48856096, 80.50574084,
       75.02644369, 80.19365241, 73.22967666, 78.54255466, 77.87168267,
       76.14733657, 73.14057728, 82.16757383, 74.42529019, 76.12931714,
       76.30302906, 81.41008928, 76.11675626, 78.73694658, 78.5494218 ,
       82.64753618, 81.09723669, 76.11675626, 77.89083117, 76.6816684 ,
       79.75257164, 84.16434938, 74.41614983, 78.72096389, 82.16529143,
       84.03429896, 79.14171978, 80.44485511, 81.71684243, 81.2197126 ,
       80.93125484, 79.89895388, 78.76478282, 78.28311191, 79.3784097 ])

In [29]:
mean_absolute_error(y_test, y_pred)

0.2709170644443816

In [30]:
root_mean_squared_error(y_test, y_pred)

0.355971309502974

In [31]:
r2_score(y_test, y_pred)

0.9866397423796623

In [32]:
def objective(trial):
    n_estimators = trial.suggest_int('n_estimators', 50, 300)
    learning_rate = trial.suggest_float('learning_rate', 0.01, 0.3)
    max_depth = trial.suggest_int('max_depth', 2, 10)
    subsample = trial.suggest_float('subsample', 0.5, 1.0)

    # Define the model
    model = GradientBoostingRegressor(
        n_estimators=n_estimators,
        learning_rate=learning_rate,
        max_depth=max_depth,
        subsample=subsample,
        random_state=42
    )

    model.fit(X_train, y_train)
    
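    # Score this trial by RMSE on the held-out split (lower is better).
    # Note: tuning against X_test makes the final test score optimistic;
    # a separate validation split or cross-validation avoids this.
    y_pred = model.predict(X_test)
    rmse = root_mean_squared_error(y_test, y_pred)

    return rmse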

study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=100)

print(f"Best hyperparameters: {study.best_params}")

[I 2024-10-20 11:55:27,700] A new study created in memory with name: no-name-96fbbaf2-3f29-44b6-9cc6-00f7be9dc2b3
[I 2024-10-20 11:55:27,950] Trial 0 finished with value: 0.32519704881303185 and parameters: {'n_estimators': 297, 'learning_rate': 0.267593117461795, 'max_depth': 9, 'subsample': 0.7706360903802824}. Best is trial 0 with value: 0.32519704881303185.
[I 2024-10-20 11:55:28,018] Trial 1 finished with value: 0.32933339451026583 and parameters: {'n_estimators': 95, 'learning_rate': 0.2589727113442577, 'max_depth': 5, 'subsample': 0.8863003094336444}. Best is trial 0 with value: 0.32519704881303185.
[I 2024-10-20 11:55:28,338] Trial 2 finished with value: 0.34532580713847005 and parameters: {'n_estimators': 279, 'learning_rate': 0.21427980221020418, 'max_depth': 9, 'subsample': 0.9493276445660934}. Best is trial 0 with value: 0.32519704881303185.
[I 2024-10-20 11:55:28,597] Trial 3 finished with value: 0.337141751498068 and parameters: {'n_estimators': 288, 'learning_rate': 0.13

Best hyperparameters: {'n_estimators': 216, 'learning_rate': 0.14322018406580803, 'max_depth': 4, 'subsample': 0.5406516005689751}


In [33]:
study.best_params

{'n_estimators': 216,
 'learning_rate': 0.14322018406580803,
 'max_depth': 4,
 'subsample': 0.5406516005689751}

In [35]:
optuna.visualization.plot_optimization_history(study)

In [36]:
optuna.visualization.plot_parallel_coordinate(study)

In [38]:
optuna.visualization.plot_slice(study, params=['n_estimators', 'learning_rate', 'max_depth', 'subsample'])

In [40]:
optuna.visualization.plot_param_importances(study)