# Material Overview

Briefly explain, in your own words, the differences between Grid, Randomized, and Bayesian Search CV and Optuna.

source: https://www.youtube.com/watch?v=t-INgABWULw

* Grid Search CV: Tries **every** combination of the hyperparameter values specified in a grid. It is the simplest method, but it can be very expensive in time and compute, especially when the grid is large.
* Randomized Search CV: Samples hyperparameter combinations at **random** from the specified search space, for a fixed number of iterations. It is usually cheaper than Grid Search because it does not evaluate every combination, though it is not guaranteed to find the best one.
* Bayesian Search CV: Uses Bayes' theorem to build a probabilistic model of the objective function (for example, model accuracy) from the results of earlier trials. The model predicts which hyperparameters are most likely to give the best result, so the search usually needs fewer trials than Grid Search or Randomized Search.
* Optuna: A hyperparameter tuning framework that combines smart sampling with pruning. Its search space is defined dynamically inside the objective function, new trials are chosen using the results of earlier ones (similar to Bayesian Search), and unpromising trials can be stopped early (pruning), which saves time and resources. It also supports a range of optimization algorithms.
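The difference between the first two methods can be sketched with the standard library alone. This is only an illustration: `search_space` and `toy_score` are hypothetical stand-ins (a real objective would train and cross-validate a model), and the functions below are not scikit-learn's implementation.

```python
import itertools
import random

# Hypothetical search space, loosely modelled on Random Forest hyperparameters.
search_space = {
    "n_estimators": [100, 300, 500],
    "max_depth": [2, 5, 10, 20],
    "min_samples_leaf": [1, 2, 4],
}


def toy_score(params):
    """Stand-in for a cross-validated score; higher is better."""
    return -(
        abs(params["n_estimators"] - 300) / 1000
        + abs(params["max_depth"] - 10) / 100
        + abs(params["min_samples_leaf"] - 2) / 10
    )


def grid_search(space):
    """Evaluate every combination in the grid."""
    names = list(space)
    candidates = [
        dict(zip(names, combo)) for combo in itertools.product(*space.values())
    ]
    return max(candidates, key=toy_score), len(candidates)


def random_search(space, n_iter, seed=0):
    """Evaluate only n_iter randomly drawn combinations."""
    rng = random.Random(seed)
    candidates = [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n_iter)
    ]
    return max(candidates, key=toy_score), len(candidates)


grid_best, grid_evals = grid_search(search_space)
rand_best, rand_evals = random_search(search_space, n_iter=10)

print(f"grid search:   {grid_evals} evaluations, best = {grid_best}")
print(f"random search: {rand_evals} evaluations, best = {rand_best}")
```

Grid search pays for all 3 x 4 x 3 = 36 evaluations and is certain to find the best grid point. Random search pays for only 10, at the cost of possibly missing it.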

# Import Data & Libraries

In [1]:
# run this only once
!pip install optuna -q


In [None]:
# import the required libraries here
...

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report

In [4]:
df = sns.load_dataset('iris')
df.head()

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


# Data Preprocessing

In [5]:
# convert the categorical variable to numeric
le = LabelEncoder()
df['species'] = le.fit_transform(df['species'])
display(df.head())

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |


In [7]:
# select the features and the target
X = df.drop(['species'], axis=1)
y = df['species']

# Dataset Splitting

In [9]:
# split with an 80:20 ratio
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Base Model Random Forest

In [10]:
# use a random forest classifier
rfr = RandomForestClassifier(random_state=69)
rfr.fit(X_train, y_train)

In [11]:
y_pred = rfr.predict(X_test)

In [12]:
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred, average='weighted'):.3f}")
print(f"Recall: {recall_score(y_test, y_pred, average='weighted'):.3f}")
print(f"F1 Score: {f1_score(y_test, y_pred, average='weighted'):.3f}")
print(classification_report(y_test, y_pred))

Accuracy: 1.000
Precision: 1.000
Recall: 1.000
F1 Score: 1.000
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30


# Optuna

In [13]:
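def objective(trial):
    # search range for each hyperparameter (both bounds are inclusive)
    n_estimators = trial.suggest_int('n_estimators', 100, 1000)
    max_depth = trial.suggest_int('max_depth', 2, 50)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 32)
    min_samples_leaf = trial.suggest_int('min_samples_leaf', 1, 32)

    # same random_state as the base model, so only the hyperparameters change
    model = RandomForestClassifier(n_estimators=n_estimators,
                                   max_depth=max_depth,
                                   min_samples_split=min_samples_split,
                                   min_samples_leaf=min_samples_leaf,
                                   random_state=69)

    # 5-fold cross-validated accuracy, computed on the training data only
    score = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy')

    # Optuna maximizes (or minimizes) the value returned here
    return score.mean()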

The hyperparameters to tune depend on the algorithm in use. Here we use Random Forest, so the ones we tune are *n_estimators, max_depth, min_samples_split,* and *min_samples_leaf*.

In [15]:
import optuna
study = optuna.create_study(direction='maximize')

[I 2025-10-04 15:13:49,366] A new study created in memory with name: no-name-b72b8a67-60bf-4f0d-91f4-0c6c9a8186cc


In [17]:
study.optimize(objective, n_trials=100)

[I 2025-10-04 15:22:07,048] Trial 67 finished with value: 0.9416666666666667 and parameters: {'n_estimators': 940, 'max_depth': 23, 'min_samples_split': 20, 'min_samples_leaf': 3}. Best is trial 10 with value: 0.95.
[I 2025-10-04 15:22:25,808] Trial 68 finished with value: 0.9416666666666667 and parameters: {'n_estimators': 853, 'max_depth': 2, 'min_samples_split': 23, 'min_samples_leaf': 19}. Best is trial 10 with value: 0.95.
[I 2025-10-04 15:22:33,474] Trial 69 finished with value: 0.9416666666666667 and parameters: {'n_estimators': 885, 'max_depth': 49, 'min_samples_split': 31, 'min_samples_leaf': 2}. Best is trial 10 with value: 0.95.
[I 2025-10-04 15:22:41,865] Trial 70 finished with value: 0.9416666666666667 and parameters: {'n_estimators': 970, 'max_depth': 45, 'min_samples_split': 21, 'min_samples_leaf': 12}. Best is trial 10 with value: 0.95.
[I 2025-10-04 15:22:45,372] Trial 71 finished with value: 0.9416666666666667 and parameters: {'n_estimators': 466, 'max_depth': 29, 'mi

KeyboardInterrupt: 

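Running all the trials may take a while, so be patient. The run above was interrupted by hand, which is why its log ends with a `KeyboardInterrupt`.

The recommendation is to set `n_trials` to 100, because the score apparently does not improve noticeably beyond 100 trials, and extra trials only add to an already long wait.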

Below are the hyperparameter tuning results from Optuna.

In [None]:
# check the hyperparameter tuning results from Optuna

In [18]:
study.best_params

{'n_estimators': 988,
 'max_depth': 49,
 'min_samples_split': 22,
 'min_samples_leaf': 1}

# Random Forest Using Optuna

In [20]:
# store the best hyperparameters found by the tuning in a new variable
best_params = study.best_params

In [21]:
best_model = RandomForestClassifier(**best_params, random_state=69)
best_model.fit(X_train, y_train)

In [24]:
y_pred = best_model.predict(X_test)

In [25]:
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred, average='weighted'):.3f}")
print(f"Recall: {recall_score(y_test, y_pred, average='weighted'):.3f}")
print(f"F1 Score: {f1_score(y_test, y_pred, average='weighted'):.3f}")
print(classification_report(y_test, y_pred))

Accuracy: 1.000
Precision: 1.000
Recall: 1.000
F1 Score: 1.000
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



The scores do not improve over those obtained before using Optuna, because the base model alone already scores well.
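Part of the reason is the size of the data. The arithmetic below is a small sketch that takes the sizes from the 80:20 split of the 150-row dataset (30 test rows, 120 training rows) and assumes the five cross-validation folds hold 24 samples each. It shows how coarse the accuracy scale is at this size.

```python
# Sizes taken from the 80:20 split of the 150-row iris dataset.
n_test = 30
n_train = 120

# Smallest possible change in accuracy: one sample classified differently.
test_step = 1 / n_test
# Assuming 5 equal folds of 24 samples, the mean fold accuracy moves in steps of 1/120.
cv_step = 1 / n_train

print(f"test accuracy step: {test_step:.4f}")
print(f"29/30 correct -> {29 / n_test:.3f}, 30/30 correct -> {30 / n_test:.3f}")
print(f"CV accuracy step:   {cv_step:.4f}")
print(f"113/120 correct -> {113 / n_train:.4f}, 114/120 correct -> {114 / n_train:.4f}")
```

The two values that appear in the Optuna log, 0.9417 and 0.95, are therefore one training sample apart, and a single test sample moves the test accuracy by about 0.033. Differences this small leave little room for tuning to show a measurable gain.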