# Material Overview

Briefly explain, in your own words, the differences between Grid Search CV, Randomized Search CV, Bayesian Search CV, and Optuna.

source: https://www.youtube.com/watch?v=t-INgABWULw

**Grid Search CV**

Parameters are chosen systematically: every combination in a predefined grid of values is tried.

**Randomized Search CV**

Parameters are chosen at random: a fixed number of combinations is sampled from the specified ranges.

**Bayesian Search CV**

Parameters are chosen probabilistically: a probabilistic model built from the results of earlier trials guides which combination to try next.

**Optuna**

A Python library for hyperparameter optimization. Its default sampler (TPE) is a form of Bayesian optimization.
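
To make the contrast between the first two approaches concrete, here is a minimal sketch using scikit-learn's `GridSearchCV` and `RandomizedSearchCV`. The small parameter ranges, `n_iter=5`, `cv=3`, and the use of `load_iris` are assumptions chosen only to keep the example fast; they do not come from the source video.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X_demo, y_demo = load_iris(return_X_y=True)

# Grid search: tries every combination in the grid (2 x 2 = 4 candidates)
grid = GridSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=42),
    param_grid={"max_depth": [2, 4], "min_samples_leaf": [1, 5]},
    cv=3,
)
grid.fit(X_demo, y_demo)

# Randomized search: samples n_iter candidates from the given distributions
rand = RandomizedSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=42),
    param_distributions={
        "max_depth": randint(1, 11),         # integers 1..10
        "min_samples_leaf": randint(1, 11),  # integers 1..10
    },
    n_iter=5,
    cv=3,
    random_state=42,
)
rand.fit(X_demo, y_demo)

n_grid = len(grid.cv_results_["params"])
n_rand = len(rand.cv_results_["params"])
print(f"Grid search evaluated {n_grid} candidates")
print(f"Randomized search evaluated {n_rand} candidates")
```

The cost of grid search grows with the size of the grid, while randomized search evaluates exactly `n_iter` candidates regardless of how wide the ranges are. A Bayesian approach such as Optuna instead picks each new candidate based on the scores of earlier ones.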

# Import Data & Libraries

In [1]:
# run this only once
!pip install optuna -q

In [2]:
# import the required libraries here
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [3]:
df = sns.load_dataset('iris')
df.head()

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


# Data Preprocessing

In [4]:
# convert the categorical variable to numeric (one-hot encoding)
df = pd.get_dummies(df)
df.head()

| | sepal_length | sepal_width | petal_length | petal_width | species_setosa | species_versicolor | species_virginica |
|---|---|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | True | False | False |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | True | False | False |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | True | False | False |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | True | False | False |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | True | False | False |


In [5]:
# separate the features (X) from the one-hot encoded target (y)
X = df.drop(columns=['species_setosa', 'species_versicolor', 'species_virginica'])
y = df[['species_setosa', 'species_versicolor', 'species_virginica']]
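
Because `y` now consists of three indicator columns, scikit-learn treats the task as multilabel rather than multiclass, which is why the classification reports in this notebook include `micro avg` and `samples avg` rows. The sketch below illustrates the difference with `type_of_target`. The toy arrays are hypothetical values, not rows taken from the dataset.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Toy targets (hypothetical values, for illustration only)
y_single = np.array(["setosa", "versicolor", "virginica", "setosa"])
y_onehot = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])

# One label column is seen as multiclass;
# several 0/1 indicator columns are seen as multilabel
kind_single = type_of_target(y_single)
kind_onehot = type_of_target(y_onehot)
print(kind_single)
print(kind_onehot)
```

One possible alternative, if a plain multiclass setup is preferred, would be to keep `species` as a single label column (for example by taking `y = df['species']` before the encoding step). The rest of this notebook keeps the one-hot target as is.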

# Dataset Splitting

In [6]:
# split with an 80:20 ratio
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Base Model Random Forest

In [7]:
# use a random forest classifier
from sklearn.ensemble import RandomForestClassifier
rfr = RandomForestClassifier(random_state=42)
rfr.fit(X_train, y_train)

In [8]:
y_pred = rfr.predict(X_test)

In [9]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred, average='weighted'):.3f}")
print(f"Recall: {recall_score(y_test, y_pred, average='weighted'):.3f}")
print(f"F1 Score: {f1_score(y_test, y_pred, average='weighted'):.3f}")
print(classification_report(y_test, y_pred))

Accuracy: 1.000
Precision: 1.000
Recall: 1.000
F1 Score: 1.000
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

   micro avg       1.00      1.00      1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
 samples avg       1.00      1.00      1.00        30



# Optuna

In [10]:
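# import optuna and the cross-validation helper
import optuna
from sklearn.model_selection import cross_val_score


def objective(trial):
    # search space: each call asks Optuna for one integer within the given range
    n_estimators = trial.suggest_int('n_estimators', 100, 1000)
    max_depth = trial.suggest_int('max_depth', 1, 10)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 10)
    min_samples_leaf = trial.suggest_int('min_samples_leaf', 1, 10)

    # build a random forest with the suggested hyperparameters
    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        min_samples_leaf=min_samples_leaf,
        random_state=42
    )

    # score the candidate with 5-fold cross-validation on the training data
    score = cross_val_score(model, X_train, y_train, cv=5)

    # the mean cross-validation score is the value the study optimizes
    return score.mean()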

The hyperparameters to tune depend on the algorithm being used. Since we are using Random Forest here, the ones we tune are *n_estimators*, *max_depth*, *min_samples_split*, and *min_samples_leaf*.

In [11]:
study = optuna.create_study(direction='maximize')

[I 2025-10-03 14:02:37,724] A new study created in memory with name: no-name-76100f3b-14ff-40d1-8805-6b25b1f5a970


In [12]:
study.optimize(objective, n_trials=100)

[I 2025-10-03 14:02:47,047] Trial 0 finished with value: 0.9416666666666667 and parameters: {'n_estimators': 575, 'max_depth': 6, 'min_samples_split': 2, 'min_samples_leaf': 7}. Best is trial 0 with value: 0.9416666666666667.
[I 2025-10-03 14:02:56,810] Trial 1 finished with value: 0.9416666666666667 and parameters: {'n_estimators': 999, 'max_depth': 10, 'min_samples_split': 4, 'min_samples_leaf': 4}. Best is trial 0 with value: 0.9416666666666667.
[I 2025-10-03 14:02:57,904] Trial 2 finished with value: 0.65 and parameters: {'n_estimators': 143, 'max_depth': 1, 'min_samples_split': 3, 'min_samples_leaf': 2}. Best is trial 0 with value: 0.9416666666666667.
[I 2025-10-03 14:03:02,512] Trial 3 finished with value: 0.9416666666666667 and parameters: {'n_estimators': 606, 'max_depth': 7, 'min_samples_split': 2, 'min_samples_leaf': 1}. Best is trial 0 with value: 0.9416666666666667.
[I 2025-10-03 14:03:08,257] Trial 4 finished with value: 0.9416666666666667 and parameters: {'n_estimators': 

This may take a while, so be patient.
<br>
The recommended setting is `n_trials=100`: the score does not seem to improve significantly beyond 100 trials, and running more is inefficient because of the long wait.
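
A simple way to check whether additional trials still help is to look at the best score seen so far after each trial. The sketch below is a minimal illustration using only the scores of trials 0 to 3 from the log above. The helper name `running_best` is hypothetical.

```python
import numpy as np


def running_best(values):
    """Return the highest score seen so far after each trial."""
    return np.maximum.accumulate(np.asarray(values, dtype=float))


# Scores of trials 0-3, copied from the log above
trial_values = [0.9416666666666667, 0.9416666666666667, 0.65, 0.9416666666666667]
best_so_far = running_best(trial_values)
print(best_so_far)
```

For the full study, the input could be collected with `[t.value for t in study.trials]`. A curve that stays flat over many trials suggests that further trials are unlikely to pay off.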

In [13]:
study.best_params

{'n_estimators': 999,
 'max_depth': 9,
 'min_samples_split': 4,
 'min_samples_leaf': 1}

The following shows the hyperparameter tuning result from Optuna.

In [14]:
# check the hyperparameter tuning result from Optuna
best_params = study.best_params
print(f"Best Hyperparameter : {best_params}")

Best Hyperparameter : {'n_estimators': 999, 'max_depth': 9, 'min_samples_split': 4, 'min_samples_leaf': 1}


# Random Forest Using Optuna

In [15]:
# store the best hyperparameters in new variables
best_n_estimators = best_params['n_estimators']
best_max_depth = best_params['max_depth']
best_min_samples_split = best_params['min_samples_split']
best_min_samples_leaf = best_params['min_samples_leaf']

In [16]:
best_model = RandomForestClassifier(
    n_estimators=best_n_estimators,
    max_depth=best_max_depth,
    min_samples_split=best_min_samples_split,
    min_samples_leaf=best_min_samples_leaf
)

best_model.fit(X_train, y_train)
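
As a more compact variant, the dictionary of best parameters can be unpacked straight into the constructor, since its keys match the argument names of `RandomForestClassifier`. The sketch below uses the values printed earlier by `study.best_params`. The names `best_params_example` and `compact_model` are hypothetical, and `random_state=42` is an assumption that mirrors the seed used inside the objective function.

```python
from sklearn.ensemble import RandomForestClassifier

# Values copied from the study.best_params output shown earlier
best_params_example = {
    "n_estimators": 999,
    "max_depth": 9,
    "min_samples_split": 4,
    "min_samples_leaf": 1,
}

# Unpack the dictionary into keyword arguments; fix the seed for repeatability
compact_model = RandomForestClassifier(**best_params_example, random_state=42)
print(compact_model.get_params()["max_depth"])
```

In the notebook itself this would be `RandomForestClassifier(**study.best_params, random_state=42)`, followed by the same `fit` call.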

In [17]:
y_pred = best_model.predict(X_test)

In [18]:
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred, average='weighted'):.3f}")
print(f"Recall: {recall_score(y_test, y_pred, average='weighted'):.3f}")
print(f"F1 Score: {f1_score(y_test, y_pred, average='weighted'):.3f}")
print(classification_report(y_test, y_pred))

Accuracy: 1.000
Precision: 1.000
Recall: 1.000
F1 Score: 1.000
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

   micro avg       1.00      1.00      1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
 samples avg       1.00      1.00      1.00        30



The scores do not improve compared with the model before tuning with Optuna, because the base model already scores 1.000 on every metric, which leaves no room for improvement on this test set.