# Optuna - A hyperparameter optimization framework

Optuna is a popular hyperparameter optimization library, and it offers several advantages over traditional methods like grid search.

| **Aspect**                           | **Grid Search**                                                                                           | **Optuna**                                                                                                                |
|--------------------------------------|-----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------|
| **Efficiency**                       | Evaluates every combination of hyperparameters. Can be computationally expensive for large search spaces. | Uses efficient algorithms like TPE. It's more sample-efficient and can find good hyperparameters with fewer trials than grid search.                                                               |
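| **Dynamic Search Space**             | Requires a fixed search space. You need to define all possible values for each hyperparameter upfront. | Allows dynamic (define-by-run) search spaces: the search space is built inside the objective function, so later hyperparameters can depend on earlier choices. |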
| **Pruning**                          | No inherent mechanism for early stopping. If a particular combination of hyperparameters is unlikely to produce a good result, it will still complete the full evaluation.                                                                 | Supports pruning strategies for early stopping. If during a trial, it becomes clear that the given hyperparameters are not promising, the trial can be stopped early, saving computational resources.                                                                            |
| **Integration and Flexibility**      | Integrates well with scikit-learn but may require work with other frameworks.                             | Offers easy integration with various machine learning frameworks like TensorFlow, PyTorch, scikit-learn, LightGBM, and more.                                              |
| **Parallelization**                  | Can be parallelized, but each worker operates independently. | Supports distributed optimization: workers share the trial history through a common storage backend, so each one benefits from the others' results. |
| **Visualization**                    | Typically lacks built-in visualization tools.                                                              | Provides a suite of visualization tools to understand the optimization process, relationships between hyperparameters, and more.                                                                                   |
| **Flexibility in Objective Functions** | Works with models that return a score.                                                                   | Can optimize any objective function, not just model scores. For example, you can optimize for model inference speed, memory usage, or any custom metric.                                                               |


A simple optimization problem can be understood as:

Step 1 - Define the objective function to be optimized. Let's minimize (x - 2)^2.

Step 2 - Suggest hyperparameter values using the trial object. Here, a float value of x is suggested from the range -10 to 10.

Step 3 - Create a study object and invoke its optimize method for 100 trials.

### Jargon

**Trial:** One trial in Optuna is one run with a set of hyperparameters.

**Study:** A collection of trials makes up a study. For example, if you run 10 combinations of hyperparameters and compute 10 accuracy scores, those 10 trials are your study.

**Objective function:** The function that maps hyperparameter values to the metric being optimized. For example, accuracy is some function of `max_depth`. Optuna calls the objective function once per trial and uses the value it returns.

Optuna's default sampler, TPE (Tree-structured Parzen Estimator), is a form of Bayesian optimization: it uses what it has learned from previous trials to decide which combination to try next. (It is like XGBoost, where each new tree improves on the previous ones, unlike a random forest, where every tree is built independently.) In grid search and random search, every trial is independent.

In [None]:
# Install Optuna
!pip install optuna

In [23]:
import optuna

def objective(trial):
    x = trial.suggest_float('x', -10, 10)
    return (x - 2) ** 2

study = optuna.create_study()
study.optimize(objective, n_trials=100)
study.best_params

{'x': 2.0013284735864136}

**We get x = 2.0013284735864136**, the value found in the range -10 to 10 that brings the objective function (x - 2)^2 closest to its minimum. The exact minimum is at x = 2.

## 1. Load the iris dataset

In [24]:
import optuna
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


In [3]:
# Load Iris dataset
data = load_iris()
X = data.data
y = data.target

## Iris Dataset Metadata
The Iris dataset contains measurements for 150 iris flowers from three different species with four features each (sepal length, sepal width, petal length, and petal width).

The three classes in the Iris dataset:
1. Iris-setosa (n=50)
2. Iris-versicolor (n=50)
3. Iris-virginica (n=50)

The four features of the Iris dataset:
1. Sepal length in cm
2. Sepal width in cm
3. Petal length in cm
4. Petal width in cm

Below is the metadata table for the features:


In [15]:
import pandas as pd

# Metadata for Iris dataset
# Use a separate name so the `data` object returned by load_iris() is not overwritten
metadata = {
    'Feature Name': ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'],
    'Description': ['Length of the sepal', 'Width of the sepal', 'Length of the petal', 'Width of the petal'],
    'Scale': ['Continuous', 'Continuous', 'Continuous', 'Continuous'],
    'Min Value': [4.3, 2.0, 1.0, 0.1],
    'Max Value': [7.9, 4.4, 6.9, 2.5]
}

df_metadata = pd.DataFrame(metadata)
df_metadata


|   | Feature Name      | Description         | Scale      | Min Value | Max Value |
|---|-------------------|---------------------|------------|-----------|-----------|
| 0 | sepal length (cm) | Length of the sepal | Continuous | 4.3       | 7.9       |
| 1 | sepal width (cm)  | Width of the sepal  | Continuous | 2.0       | 4.4       |
| 2 | petal length (cm) | Length of the petal | Continuous | 1.0       | 6.9       |
| 3 | petal width (cm)  | Width of the petal  | Continuous | 0.1       | 2.5       |


The target variable, `y`, represents the species of each flower (setosa, versicolor, or virginica).

0: Iris-setosa

1: Iris-versicolor

2: Iris-virginica
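
The figures quoted above can be checked directly against the dataset that scikit-learn ships. This is a short sketch; the variable name `iris` is chosen here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)              # (number of samples, number of features)
print(iris.feature_names)   # column names, in the same order as the columns of X
print(iris.target_names)    # species name for each integer label
print(np.bincount(y))       # number of samples per class
print(X.min(axis=0))        # per-feature minimum
print(X.max(axis=0))        # per-feature maximum
```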

## 2. Define the objective function

The objective function is the heart of the optimization process. It defines the model, its hyperparameters, how it's trained, and what metric is returned for Optuna to optimize.

In [25]:
# Define the objective function
"""
In an objective function you define:
1. the hyperparameters to tune and the range for each hyperparameter,
2. an instance of the model class,
3. the evaluation metric (accuracy in this case).
"""

def objective(trial):

    # Define range of hyperparameters
    n_estimators = trial.suggest_int('n_estimators', 2, 150)
    max_depth = trial.suggest_int('max_depth', 1, 32, log=True)
    min_samples_split = trial.suggest_float('min_samples_split', 0.1, 1)
    min_samples_leaf = trial.suggest_float('min_samples_leaf', 0.1, 0.5)
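    # 'auto' was removed in scikit-learn 1.3 (for classifiers it was an alias of 'sqrt'),
    # so it is dropped from the choices here. The output recorded below, which reports
    # 'auto', presumably comes from a run on an older scikit-learn version.
    max_features = trial.suggest_categorical('max_features', ['sqrt', 'log2'])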

    clf = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        min_samples_leaf=min_samples_leaf,
        max_features=max_features
    )
    # perform 3-fold cross validation and calculate accuracy.
    # why mean? read here https://medium.com/analytics-vidhya/regularization-and-cross-validation-how-to-choose-the-penalty-value-lambda-1217fa4351e5
    score = cross_val_score(clf, X, y, n_jobs=-1, cv=3, scoring='accuracy').mean()
    
    return score


Inside this function:

**trial:** This is an object provided by Optuna, which is used to suggest values for the hyperparameters during each iteration.
The hyperparameters of the Random Forest being tuned are:

`n_estimators`: The number of trees in the forest.

`max_depth`: The maximum depth of each tree.

`min_samples_split`: The minimum number of samples required to split an internal node. Here it is suggested as a float, which scikit-learn interprets as a fraction of the total number of samples.

`min_samples_leaf`: The minimum number of samples required to be at a leaf node. As a float, it is also interpreted as a fraction of the samples.

`max_features`: The number of features to consider when looking for the best split.

The `trial.suggest_...` methods are used to suggest a value for a hyperparameter. For instance, `trial.suggest_int('n_estimators', 2, 150)` will suggest an integer between 2 and 150 (both inclusive) for the `n_estimators` hyperparameter.

After defining the hyperparameters, we train the Random Forest classifier using the current hyperparameters and evaluate it using 3-fold cross-validation. The mean of these scores is then returned.
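
To see what the objective function actually returns, it helps to look at `cross_val_score` on its own. In this sketch the model settings are arbitrary and `random_state` is fixed only for reproducibility. The function returns one score per fold, and the objective reduces them to a single number by taking the mean.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=50, random_state=0)

# cv=3 with a classifier uses stratified 3-fold splitting.
scores = cross_val_score(clf, X, y, cv=3, scoring="accuracy")

print("per-fold accuracy:", scores)         # array with one entry per fold
print("mean accuracy:    ", scores.mean())  # the single value handed to Optuna
```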

In [26]:
# Create the study object.
# direction='maximize' because the objective returns accuracy, which we want as high as possible.
# No sampler is passed, so Optuna uses its default, TPE (a Bayesian optimization method).
study = optuna.create_study(direction='maximize')

# optimize the objective function.
study.optimize(objective, n_trials=100)

# Results
print('Number of finished trials: ', len(study.trials))

Number of finished trials:  100
Best trial:
Value:  0.9666666666666667
Params: 
    n_estimators: 130
    max_depth: 27
    min_samples_split: 0.5732436401151582
    min_samples_leaf: 0.19455512594274585
    max_features: auto


In [None]:
# Print the objective value of the best trial
trial = study.best_trial
print('Best trial value: ', trial.value)

# Print the best hyperparameters found (trial.params is a dictionary)
print('Params: ')
for key, value in trial.params.items():
    print(f'    {key}: {value}')

* The `create_study` method initializes a new study. The `direction` parameter tells **Optuna** whether it should try to **maximize** or minimize the returned value from the objective function. In our case, we want to **maximize the cross-validation score.**

* `study.optimize` tells Optuna to start the optimization. The objective function is passed as the first argument, and `n_trials` determines how many iterations Optuna should perform.

## Plotting the study created by Optuna

First, import Optuna's plotting functions, including `plot_optimization_history` for plotting the optimization history of the study.

In [27]:
# For visualization (note the plural: plot_param_importances)
from optuna.visualization import plot_optimization_history, plot_parallel_coordinate, plot_slice, plot_contour, plot_param_importances

The slice plot shows the accuracy of every trial against the value of each hyperparameter.

In [28]:
optuna.visualization.plot_slice(study)