---
title: "Hyperparameter Optimization with Optuna"
author: "Rizdi Aprilian"
date: "2023-06-17"
categories: [Hyperparameter, Optuna, Bayesian]

---

When developing predictive machine learning models, one practice that is hard to overlook is the search for hyperparameters that give the best possible predictions on the data. Grid search and randomized search have been relied on for quite some time. Grid search takes a fixed set of candidate values for each parameter and evaluates every combination of them, which is why many regard it as a brute-force search. Random search instead draws combinations at random from each parameter's range; it greatly reduces the computational load and running time compared with grid search, at the cost of possibly settling on a suboptimal configuration.
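To make the contrast concrete, here is a minimal sketch using scikit-learn's `ParameterGrid` and `ParameterSampler`. The search space is hypothetical and chosen only for illustration; it is not the one tuned later in this post.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Hypothetical search space, for illustration only
space = {
    "learning_rate": [0.01, 0.05, 0.1],
    "num_leaves": [31, 63, 127],
    "max_depth": [5, 10],
}

# Grid search enumerates every combination: 3 * 3 * 2 candidates
grid = list(ParameterGrid(space))

# Random search draws a fixed budget of combinations from the same space
sampled = list(ParameterSampler(space, n_iter=5, random_state=0))

print(len(grid), "grid candidates vs", len(sampled), "random candidates")
```

The grid grows multiplicatively with every added parameter, while the random budget stays at whatever `n_iter` you choose.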

That's where Optuna fills the gap. Often described as framework agnostic (meaning it works regardless of which machine learning library you use), Optuna is designed specifically for automated hyperparameter optimization. It also supports parallelism, scaling a study from one machine to multiple machines in a cluster. On this page, however, I'll demonstrate how Optuna can help find well-performing hyperparameters on a local machine, using LightGBM (LGBM) as the model.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler, LabelEncoder, OneHotEncoder

from sklearn.model_selection import train_test_split, KFold, StratifiedKFold
from sklearn.metrics import roc_auc_score, accuracy_score, confusion_matrix, ConfusionMatrixDisplay, RocCurveDisplay

import optuna
import lightgbm as lgb
from lightgbm import LGBMClassifier


import warnings

warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', 100)
plt.style.use('fivethirtyeight')




In [2]:
df = pd.read_csv('normalized_data.csv', sep=',')
df.head(5)

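| | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75.0 | 0 | 582 | 0 | 20 | 1 | 265000.0 | 1.9 | 130 | 1 | 0 | 4 | 1 |
| 1 | 55.0 | 0 | 7861 | 0 | 38 | 0 | 263358.03 | 1.1 | 136 | 1 | 0 | 6 | 1 |
| 2 | 65.0 | 0 | 146 | 0 | 20 | 0 | 162000.0 | 1.3 | 129 | 1 | 1 | 7 | 1 |
| 3 | 50.0 | 1 | 111 | 0 | 20 | 0 | 210000.0 | 1.9 | 137 | 1 | 0 | 7 | 1 |
| 4 | 65.0 | 1 | 160 | 1 | 20 | 0 | 327000.0 | 2.7 | 116 | 0 | 0 | 8 | 1 |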


In [3]:
# Treat columns with at most three distinct values as categorical
cat_cols = [col for col in df.columns if df[col].nunique() <= 3]

df[cat_cols] = df[cat_cols].astype('category')

In [4]:
X = df.loc[:,:"time"]
y = df.loc[:,["DEATH_EVENT"]]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((239, 12), (60, 12), (239, 1), (60, 1))

In [5]:
cat_columns = ['anaemia', 'diabetes', 'high_blood_pressure', 'sex', 'smoking']
num_columns = ['age', 'creatinine_phosphokinase', 'ejection_fraction', 'platelets', 'serum_creatinine', 'serum_sodium', 'time']

In [6]:
## Transformer Pipeline
num_transformer = Pipeline(steps=[
    ('scaler', StandardScaler())
])

cat_transformer = Pipeline(steps=[
    ('onehot', OneHotEncoder(drop='first', dtype=int))  # np.int was removed in NumPy 1.24; use the built-in int
])

## Column Transformer
preprocessor = ColumnTransformer([
    ('numeric', num_transformer, num_columns),
    ('categoric', cat_transformer, cat_columns),
])

## Apply Column Transformer
X_train = preprocessor.fit_transform(X_train)
X_test = preprocessor.transform(X_test)

## Label Encoding
y_transformer = LabelEncoder()
y_train = y_transformer.fit_transform(y_train).ravel()
y_test = y_transformer.transform(y_test).ravel()

In [7]:
def objective(trial):
    params = {
        'random_state': 23,
        'n_estimators': 200,
        'reg_alpha': trial.suggest_float('reg_alpha', 1E-10, 1E-5),
        'reg_lambda': trial.suggest_float('reg_lambda', 1E-10, 1E-5),
        'num_leaves': trial.suggest_int('num_leaves', 150, 300),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.1),
        'max_depth': trial.suggest_int('max_depth', 10, 50),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.01, 0.1),
        'subsample': trial.suggest_float('subsample', 0.1, 1.0),
        'min_child_samples': trial.suggest_int('min_child_samples', 1, 20),
        'subsample_freq': trial.suggest_int('subsample_freq', 1, 10),
        'objective': 'binary',
        'metric': 'binary_logloss'
    }
    
    # Create the LGBMClassifier with the sampled hyperparameters
    lgbm_model = LGBMClassifier(**params)
    
    lgbm_model.fit(X_train, y_train)
    
    # Make predictions on the held-out test set
    y_pred = lgbm_model.predict(X_test)

    # Calculate ROC AUC as the evaluation metric
    roc_auc = roc_auc_score(y_test, y_pred)

    # Return the evaluation metric value as the objective value to be maximized
    return roc_auc

In [8]:
%%time

# Create an Optuna study
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

[I 2023-06-17 13:34:12,545] A new study created in memory with name: no-name-f9f4f083-528e-4b8a-bce8-2ade3b89f346
[I 2023-06-17 13:34:12,709] Trial 0 finished with value: 0.7246469833119383 and parameters: {'reg_alpha': 5.270142430321572e-06, 'reg_lambda': 7.0947209053917484e-06, 'num_leaves': 178, 'learning_rate': 0.0891781733041886, 'max_depth': 35, 'colsample_bytree': 0.03090950619557188, 'subsample': 0.24825549471540287, 'min_child_samples': 20, 'subsample_freq': 9}. Best is trial 0 with value: 0.7246469833119383.
[I 2023-06-17 13:34:13,227] Trial 1 finished with value: 0.6861360718870345 and parameters: {'reg_alpha': 7.025236249526668e-06, 'reg_lambda': 4.571522404954949e-06, 'num_leaves': 256, 'learning_rate': 0.04958566829222305, 'max_depth': 25, 'colsample_bytree': 0.07986235314153306, 'subsample': 0.4107035959692341, 'min_child_samples': 3, 'subsample_freq': 8}. Best is trial 0 with value: 0.7246469833119383.
[I 2023-06-17 13:34:13,341] Trial 2 finished with value: 0.5 and par

CPU times: total: 2min 25s
Wall time: 29.1 s


## Quick Visualization for Hyperparameter Optimization Analysis

In [9]:
# Get the best set of hyperparameters

optimized_params = study.best_trial.params
optimized_params

{'reg_alpha': 6.833177329036985e-06,
 'reg_lambda': 4.958487110482849e-06,
 'num_leaves': 241,
 'learning_rate': 0.08139171493009001,
 'max_depth': 42,
 'colsample_bytree': 0.032120190527832874,
 'subsample': 0.2178632721062016,
 'min_child_samples': 13,
 'subsample_freq': 10}

In [10]:
# plot_optimization_history: shows the scores from all trials as well as the best score so far at each point.
optuna.visualization.plot_optimization_history(study)

In [11]:
optuna.visualization.plot_slice(study)

In [13]:
# Visualize parameter importances.
optuna.visualization.plot_param_importances(study)

## LGBM with Optimized Parameters

In [27]:
## Additional params
optimized_params['objective'] = 'binary'
optimized_params['metric'] = 'binary_logloss'
optimized_params['n_estimators'] = 200
optimized_params['random_state'] = 23

## Updated optimized hyperparameters
optimized_params

{'reg_alpha': 6.833177329036985e-06,
 'reg_lambda': 4.958487110482849e-06,
 'num_leaves': 241,
 'learning_rate': 0.08139171493009001,
 'max_depth': 42,
 'colsample_bytree': 0.032120190527832874,
 'subsample': 0.2178632721062016,
 'min_child_samples': 13,
 'subsample_freq': 10,
 'objective': 'binary',
 'metric': 'binary_logloss',
 'n_estimators': 200,
 'random_state': 23}

In [28]:
%%time
lgbm_model = LGBMClassifier(**optimized_params)
lgbm_model.fit(X_train, y_train)
y_predict = lgbm_model.predict(X_test)

CPU times: total: 469 ms
Wall time: 95 ms


In [29]:
roc_auc_score(y_test, y_predict)

0.8177150192554556
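
Note that `roc_auc_score` receives hard class labels here. When the score is itself a 0/1 label, ROC AUC reduces to the average of sensitivity and specificity (balanced accuracy); passing probabilities, for example from `predict_proba`, uses the full ranking instead. The toy arrays below are made up for illustration and are not taken from the heart failure data.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

# Made-up labels and predicted probabilities, for illustration only
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_proba = np.array([0.10, 0.40, 0.35, 0.80, 0.45, 0.60, 0.70, 0.90])

# Hard labels at the usual 0.5 threshold
y_hard = (y_proba >= 0.5).astype(int)

auc_from_labels = roc_auc_score(y_true, y_hard)
auc_from_proba = roc_auc_score(y_true, y_proba)
balanced_acc = balanced_accuracy_score(y_true, y_hard)

print(auc_from_labels, balanced_acc, auc_from_proba)
```

The first two numbers coincide, while the probability-based AUC generally differs because it does not depend on a single threshold.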

In [30]:
metrics = []
cm = []

In [31]:
from sklearn.metrics import precision_recall_fscore_support as score
from sklearn.metrics import confusion_matrix, accuracy_score, roc_auc_score, auc
from sklearn.metrics import make_scorer, recall_score, precision_score
from sklearn.metrics import roc_curve, precision_recall_curve, matthews_corrcoef

In [32]:
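## In this dataset, 0 means the patient survived and 1 means a confirmed death.
## Specificity is the recall of the negative class; NPV is the precision of the negative class.
specificity = make_scorer(recall_score, pos_label=0)
npv = make_scorer(precision_score, pos_label=0)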

In [33]:
precision, recall, fscore, _ = score(y_test, y_predict, average='binary')
accuracy = accuracy_score(y_test, y_predict)
tnr = recall_score(y_test, y_predict, pos_label=0, average='binary')     # specificity: recall of class 0
npv = precision_score(y_test, y_predict, pos_label=0, average='binary')  # NPV: precision of class 0
mcc = matthews_corrcoef(y_test, y_predict)
precision_curve, recall_curve, _ = precision_recall_curve(y_test, y_predict)
pr = auc(recall_curve, precision_curve)
roc = roc_auc_score(y_test, y_predict)


cm.append(confusion_matrix(y_test, y_predict, labels=[1,0]))
metrics.append(pd.Series({'precision':precision,
                        'recall':recall,
                        'fscore':fscore,
                        'specificity':tnr,
                        'NPV': npv,
                        'accuracy':accuracy, 
                        'MCC':mcc,
                        'Precision-Recall AUC':pr,
                        'ROC AUC':roc}))

metrics = pd.concat(metrics, axis=1)

In [34]:
metrics

| | 0 |
|---|---|
| precision | 0.866667 |
| recall | 0.684211 |
| fscore | 0.764706 |
| specificity | 0.95122 |
| NPV | 0.866667 |
| accuracy | 0.866667 |
| MCC | 0.682629 |
| Precision-Recall AUC | 0.825439 |
| ROC AUC | 0.817715 |
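
Specificity and NPV are easy to mix up, so it can help to derive both directly from the confusion matrix. The sketch below uses small made-up label arrays (not the notebook's predictions) and checks the scikit-learn shortcuts against the textbook definitions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels: 6 negatives and 4 positives, for illustration only
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 0])

# With labels=[0, 1], ravel() returns tn, fp, fn, tp in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

specificity = tn / (tn + fp)  # share of actual negatives predicted negative
npv = tn / (tn + fn)          # share of predicted negatives that are actually negative

# The same quantities via scikit-learn, treating class 0 as the "positive" label
specificity_sk = recall_score(y_true, y_pred, pos_label=0)
npv_sk = precision_score(y_true, y_pred, pos_label=0)

print(specificity, npv)
```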
