# Optuna - A hyperparameter optimization framework

In [None]:
#Install Optune
!pip install optuna

In [2]:
import optuna

A simple optimization problem can be understood as:

Step 1 - Define objective function to be optimized. Let's minimize (x - 2)^2

Step 2 - Suggest hyperparameter values using trial object. Here, a float value of x is suggested from -10 to 10

Step 3 - Create a study object and invoke the optimize method over 100 trials

In [None]:
def objective(trial):
    x = trial.suggest_float('x', -10, 10)
    return (x - 2) ** 2

study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=100)
study.best_params

**We get x = 2.0013284735864136** for which the objective function  (x - 2)^2 is the minimum between the values -10 and 10

### Jargon

**Trial:** One trial in Optuna is one run with a set of hyperparameters.

**Study:** Many trial together make a study. (i.e. say you ran 10 combinations of hyperparameters and calculated 10 different accuracy scores. That is your study)

**Objective function:** The relation between accuracy and the hyperparameter (say `max_depth`). Accuracy is some function of `max_depth`.

Optuna uses Bayesian search which helps identify which combination to try next keeping in memory what it has learned from the previous trials. (its like XGboost where the next tree acts and improves on the previous tree, unlike random forest where every tree is acting independantly.) In Grid Search and Random search every trial is independant.

## 1. Load the iris dataset

In [6]:
import optuna
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


In [7]:
# Load Iris dataset
data = load_iris()
X = data.data
y = data.target

## Iris Dataset Metadata
The Iris dataset contains measurements for 150 iris flowers from three different species with four features each (sepal length, sepal width, petal length, and petal width).

The three classes in the Iris dataset:
1. Iris-setosa (n=50)
2. Iris-versicolor (n=50)
3. Iris-virginica (n=50)

The four features of the Iris dataset:
1. Sepal length in cm
2. Sepal width in cm
3. Petal length in cm
4. Petal width in cm

Below is the metadata table for the features:


 The target variable, y, represents the species of each flower (setosa, versicolor, or virginica).

0: Iris-setosa

1: Iris-versicolor

2: Iris-virginica

In [8]:
import pandas as pd

# Metadata for Iris dataset
data = {
    'Feature Name': ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'],
    'Description': ['Length of the sepal', 'Width of the sepal', 'Length of the petal', 'Width of the petal'],
    'Scale': ['Continuous', 'Continuous', 'Continuous', 'Continuous'],
    'Min Value': [4.3, 2.0, 1.0, 0.1],
    'Max Value': [7.9, 4.4, 6.9, 2.5]
}

df_metadata = pd.DataFrame(data)
df_metadata


Unnamed: 0,Feature Name,Description,Scale,Min Value,Max Value
0,sepal length (cm),Length of the sepal,Continuous,4.3,7.9
1,sepal width (cm),Width of the sepal,Continuous,2.0,4.4
2,petal length (cm),Length of the petal,Continuous,1.0,6.9
3,petal width (cm),Width of the petal,Continuous,0.1,2.5


## 2. Define the objective function

The objective function is the heart of the optimization process. It defines the model, its hyperparameters, how it's trained, and what metric is returned for Optuna to optimize.

In [11]:
# Define the object function
"""
In a objective funtion you define:
1. the hyperparameters to tune and the range for each hyperparameter.
2. create an object of the model class
3. and evaluation metric. (accuracy in this case)
"""

def objective(trial):

    # Define range of hyperparameters
    n_estimators = trial.suggest_int('n_estimators', 2, 150)
    max_depth = trial.suggest_int('max_depth', 1, 32, log=True)
    min_samples_split = trial.suggest_float('min_samples_split', 0.1, 1)
    min_samples_leaf = trial.suggest_float('min_samples_leaf', 0.1, 0.5)
    max_features = trial.suggest_categorical('max_features', ['sqrt', 'log2'])

    clf = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        min_samples_leaf=min_samples_leaf,
        max_features=max_features
    )
    # perform 3-fold cross validation and calculate accuracy.
    # why mean? read here https://medium.com/analytics-vidhya/regularization-and-cross-validation-how-to-choose-the-penalty-value-lambda-1217fa4351e5
    score = cross_val_score(clf, X, y, n_jobs=-1, cv=3, scoring='accuracy').mean()

    return score


Inside this function:

**trial:** This is an object provided by Optuna, which is used to suggest values for the hyperparameters during each iteration.
For the hyperparameters of the Random Forest:

`n_estimators`: The number of trees in the forest.

`max_depth`: The maximum depth of the tree.

`min_samples_split`: The minimum number of samples required to split an internal node.

`min_samples_leaf`: The minimum number of samples required to be at a leaf node.

`max_features`: The number of features to consider when looking for the best split.

The `trial.suggest_...` methods are used to suggest a value for a hyperparameter. For instance, `trial.suggest_int('n_estimators', 2, 150)` will suggest an integer between 2 and 150 for the n_estimators hyperparameter.

After defining the hyperparameters, we train the Random Forest classifier using the current hyperparameters and evaluate it using 3-fold cross-validation. The mean of these scores is then returned.

In [12]:
# Initiate the study object
# maximize because we are after accuracy. And we would always want maximum accuracy.
# there is a default sampler (TPE) argument that uses the bayesian search
study = optuna.create_study(direction='maximize')

# optimize the objective function.
# you can observe with the verbose how it selects different values for different hyperparameters in each step.
study.optimize(objective, n_trials=100)

# Results
print('Number of finished trials: ', len(study.trials))

[I 2024-09-26 02:43:26,053] A new study created in memory with name: no-name-0a8608e1-05d5-4099-b9c3-cc73c42ba722
[I 2024-09-26 02:43:26,877] Trial 0 finished with value: 0.9533333333333333 and parameters: {'n_estimators': 111, 'max_depth': 27, 'min_samples_split': 0.6248259506965022, 'min_samples_leaf': 0.18793765813160546, 'max_features': 'log2'}. Best is trial 0 with value: 0.9533333333333333.
[I 2024-09-26 02:43:27,325] Trial 1 finished with value: 0.32666666666666666 and parameters: {'n_estimators': 58, 'max_depth': 11, 'min_samples_split': 0.6995196984851044, 'min_samples_leaf': 0.4057721142073504, 'max_features': 'sqrt'}. Best is trial 0 with value: 0.9533333333333333.
[I 2024-09-26 02:43:28,179] Trial 2 finished with value: 0.43333333333333335 and parameters: {'n_estimators': 133, 'max_depth': 13, 'min_samples_split': 0.2994430222926844, 'min_samples_leaf': 0.3513415062149611, 'max_features': 'log2'}. Best is trial 0 with value: 0.9533333333333333.
[I 2024-09-26 02:43:29,165] T

Number of finished trials:  100


In [34]:
#return the best trial value
trial = study.best_trial
print('Best accuracy: ', trial.value)

#return a dictionary of the best hyperparameters found
print('Params: ')
for key, accuracy in trial.params.items():
    print(f'    {key}: {accuracy}')

Best accuracy:  0.9466666666666667
Params: 
    classifier: RandomForest
    n_estimators: 84
    max_depth: 12
    min_samples_split: 0.24117885530298538
    min_samples_leaf: 0.10417796450866255
    max_features: log2
    bootstrap: True


* The `create_study` method initializes a new study. The `direction` parameter tells **Optuna** whether it should try to **maximize** or minimize the returned value from the objective function. In our case, we want to **maximize the cross-validation score.**

* `study.optimize` tells Optuna to start the optimization. The objective function is passed as the first argument, and `n_trials` determines how many iterations Optuna should perform.

#PLOTTING THE STUDY CREATED BY OPTUNA

Plotting the optimization history of the study.

In [15]:
# For visulaization
from optuna.visualization import plot_optimization_history, plot_parallel_coordinate, plot_slice, plot_param_importances

Plotting the accuracies for each hyperparameter for each trial.

In [16]:
# Optimization history plot
plot_optimization_history(study).show()

The chart above shows that after 31st trail there was no change or improvement in the accuracy.

In [17]:
# Parallel coordinates plot
plot_parallel_coordinate(study).show()

The first vertical line is the accuracy line. There are all together 100 lines. each line from first vertical line to the last is a **trial**

You can observe that the 'sqrt' value of max_features is mostly used for many trials.

In [18]:
# plot slice value plot
optuna.visualization.plot_slice(study)

This is like a correlation plot of every hyperprameter with the objective value. (accuracy)

You can observe that a max_depth between 10 to 12 is where usually you get higher accuracy.

In [20]:
# param importance plot
plot_param_importances(study).show()

This tells you that min_samples_leaf has a importance of 60% whereas max_depth's importance is onlyu 4-5%

## Selecting ML Model using Optuna

With Optuna, earlier we saw how we can choose hyperparameter values for a particular model.

Optuna also helps us to idenify which model is best suited for this data along with its hyperparameter values.

In [21]:
# lets import a few models
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC

### Define by Run method

In [22]:
# lets define the objective function
def objf(trial):
  # choose the algorithm to tune
  classifier_name = trial.suggest_categorical('classifier', ['SVC', 'RandomForest', 'DecisionTree'])

  if classifier_name == 'SVC':
    # hyperarameters
    c = trial.suggest_float('c', 1e-10, 1e10, log=True)
    kernel = trial.suggest_categorical('kernel', ['linear', 'poly', 'rbf', 'sigmoid'])
    gamma = trial.suggest_categorical('gamma', ['scale', 'auto'])

    # create the model
    model = SVC(C=c, kernel=kernel, gamma=gamma, random_state=42)

  elif classifier_name=='RandomForest':
    n_estimators=trial.suggest_int('n_estimators', 2, 150)
    max_depth=trial.suggest_int('max_depth', 1, 32, log=True)
    min_samples_split= trial.suggest_float('min_samples_split', 0.1, 1)
    min_samples_leaf=trial.suggest_float('min_samples_leaf', 0.1, 0.5)
    max_features=trial.suggest_categorical('max_features', ['sqrt', 'log2'])
    bootstrap=trial.suggest_categorical('bootstrap', [True, False])

    # create the model
    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        min_samples_leaf=min_samples_leaf,
        max_features=max_features,
        bootstrap=bootstrap
    )

    #you can use the 3rd one as well
    score = cross_val_score(model, X, y, cv=3, scoring='accuracy').mean()
    return score

In [None]:
from re import VERBOSE
# create a study
study = optuna.create_study(direction='maximize')

study.optimize(objf, n_trials=50)

In [31]:
# print the best trial
best_trial = study.best_trial
print("Best trial parameters:", best_trial.params)
print("Best accuracy:", best_trial.value)

Best trial parameters: {'classifier': 'RandomForest', 'n_estimators': 84, 'max_depth': 12, 'min_samples_split': 0.24117885530298538, 'min_samples_leaf': 0.10417796450866255, 'max_features': 'log2', 'bootstrap': True}
Best accuracy: 0.9466666666666667


The above code only tested RandomForest and SVC and found RandomForest giving best results.

In [30]:
study.trials_dataframe().groupby('params_classifier')['value'].mean()

Unnamed: 0_level_0,value
params_classifier,Unnamed: 1_level_1
DecisionTree,
RandomForest,0.702
SVC,


You can observe that out of 50 trials, Optuna was smart enought to realise that the potential is somewhere between the SVC and Deci