# Material Overview

Briefly explain, in your own words, the differences between grid search CV, randomized search CV, Bayesian search CV, and Optuna.

source: https://www.youtube.com/watch?v=t-INgABWULw
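As a starting point for the question above, here is a minimal sketch contrasting the first two strategies on the iris data. The small `param_space` and the fold count are illustrative assumptions, not values from the source. `GridSearchCV` evaluates every combination, while `RandomizedSearchCV` evaluates only `n_iter` sampled combinations. Bayesian-style tools, including Optuna with its default sampler, instead use the results of earlier trials to choose the next candidate.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X_demo, y_demo = load_iris(return_X_y=True)

# Illustrative search space (assumption): 2 x 2 = 4 combinations
param_space = {"n_estimators": [10, 20], "max_depth": [2, 4]}

# Grid search: tries all 4 combinations
grid = GridSearchCV(RandomForestClassifier(random_state=42), param_space, cv=3)
grid.fit(X_demo, y_demo)

# Randomized search: tries only n_iter randomly sampled combinations
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_space,
    n_iter=3,
    cv=3,
    random_state=42,
)
rand.fit(X_demo, y_demo)

n_grid = len(grid.cv_results_["params"])
n_rand = len(rand.cv_results_["params"])
print(f"grid candidates: {n_grid}, randomized candidates: {n_rand}")
```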

# Import Data & Libraries

In [1]:
# run this only once
# !pip install optuna -q


In [17]:
# import the required libraries here
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report
from sklearn.model_selection import cross_val_score
import optuna
import seaborn as sns
import pandas as pd

In [3]:
df = sns.load_dataset('iris')
df.head()

,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


# Data Preprocessing

In [5]:
# convert the categorical variable to numeric codes
categorical = ['species']
df[categorical] = df[categorical].apply(lambda x: pd.factorize(x)[0])
df.head()

,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0
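The factorized column no longer shows which integer stands for which species. A minimal sketch of how the mapping could be kept, using a small hand-made series rather than the notebook's `df` (the names `label_map` and `decoded` are illustrative assumptions):

```python
import pandas as pd

# Small stand-in for the species column (assumption, not the full dataset)
species = pd.Series(["setosa", "versicolor", "setosa", "virginica"], name="species")

# pd.factorize returns the integer codes and the unique labels in order of appearance
codes, uniques = pd.factorize(species)

# Map each integer code back to its original label
label_map = dict(enumerate(uniques))
decoded = [label_map[c] for c in codes]

print(list(codes))   # [0, 1, 0, 2]
print(label_map)     # {0: 'setosa', 1: 'versicolor', 2: 'virginica'}
```

Keeping `label_map` around makes it possible to report predictions with species names instead of integer codes.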


In [6]:
# separate the features (X) from the target (y)
X = df.drop(['species'], axis=1)
y = df['species']

# Dataset Splitting

In [7]:
# split with an 80:20 ratio
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Base Model Random Forest

In [8]:
# use a random forest classifier
rfr = RandomForestClassifier()
rfr.fit(X_train, y_train)

In [9]:
y_pred = rfr.predict(X_test)

In [10]:
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred, average='weighted'):.3f}")
print(f"Recall: {recall_score(y_test, y_pred, average='weighted'):.3f}")
print(f"F1 Score: {f1_score(y_test, y_pred, average='weighted'):.3f}")
print(classification_report(y_test, y_pred))

Accuracy: 1.000
Precision: 1.000
Recall: 1.000
F1 Score: 1.000
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



# Optuna

In [25]:
def objective(trial):
    n_estimators = trial.suggest_int('n_estimators', 100, 500, step=100)
    max_depth = trial.suggest_int('max_depth', 10, 100, step=10)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 10, step=1)
    min_samples_leaf = trial.suggest_int('min_samples_leaf', 1, 10, step=1)

    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        min_samples_leaf=min_samples_leaf,
        random_state=42
    )

    score = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy')
    return score.mean()

The hyperparameters to tune depend on the algorithm being used. Here we use Random Forest, so the ones we tune are *n_estimators, max_depth, min_samples_split,* and *min_samples_leaf*.

In [26]:
study = optuna.create_study(direction='maximize')

[I 2025-10-04 14:03:19,680] A new study created in memory with name: no-name-5921becc-f46d-4eb3-9d85-e44840b9089f


In [27]:
study.optimize(objective, n_trials=100)

[I 2025-10-04 14:03:23,259] Trial 0 finished with value: 0.95 and parameters: {'n_estimators': 300, 'max_depth': 10, 'min_samples_split': 4, 'min_samples_leaf': 1}. Best is trial 0 with value: 0.95.
[I 2025-10-04 14:03:26,530] Trial 1 finished with value: 0.9416666666666667 and parameters: {'n_estimators': 400, 'max_depth': 80, 'min_samples_split': 10, 'min_samples_leaf': 2}. Best is trial 0 with value: 0.95.
[I 2025-10-04 14:03:28,171] Trial 2 finished with value: 0.9416666666666667 and parameters: {'n_estimators': 200, 'max_depth': 10, 'min_samples_split': 5, 'min_samples_leaf': 5}. Best is trial 0 with value: 0.95.
[I 2025-10-04 14:03:29,039] Trial 3 finished with value: 0.9416666666666667 and parameters: {'n_estimators': 100, 'max_depth': 10, 'min_samples_split': 9, 'min_samples_leaf': 10}. Best is trial 0 with value: 0.95.
[I 2025-10-04 14:03:30,237] Trial 4 finished with value: 0.9416666666666667 and parameters: {'n_estimators': 100, 'max_depth': 80, 'min_samples_split': 7, 'min_

This may take a while, so be patient.
<br>
The recommended setting is `n_trials=100`: there appears to be no significant score improvement beyond 100 trials, and running more is inefficient because of the long wait.

In [28]:
study.best_params

{'n_estimators': 300,
 'max_depth': 60,
 'min_samples_split': 3,
 'min_samples_leaf': 2}

The following are the hyperparameters found by Optuna.

In [31]:
# check the hyperparameter tuning result from Optuna
best_params = study.best_params
best_params

{'n_estimators': 300,
 'max_depth': 60,
 'min_samples_split': 3,
 'min_samples_leaf': 2}

# Random Forest Using Optuna

In [21]:
# store the best hyperparameters from tuning in a new variable
best_params = study.best_params

In [32]:
best_model = RandomForestClassifier(**best_params, random_state=42)  # same seed as in the objective
best_model.fit(X_train, y_train)

In [33]:
y_pred = best_model.predict(X_test)

In [34]:
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred, average='weighted'):.3f}")
print(f"Recall: {recall_score(y_test, y_pred, average='weighted'):.3f}")
print(f"F1 Score: {f1_score(y_test, y_pred, average='weighted'):.3f}")
print(classification_report(y_test, y_pred))

Accuracy: 1.000
Precision: 1.000
Recall: 1.000
F1 Score: 1.000
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



The score does not improve compared with the model before Optuna, because the base model alone already achieves a perfect score on the test set.
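With only 30 test samples, a single split cannot separate two models that both score 1.000. A possible alternative, sketched below, is to compare them with cross-validation. It loads iris from scikit-learn and cross-validates on the full dataset, which is a simplification for illustration. `tuned_params` copies the best parameters reported above, and the fold setup is an assumption.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X_all, y_all = load_iris(return_X_y=True)

# Best parameters as reported by the Optuna study in this notebook
tuned_params = {
    "n_estimators": 300,
    "max_depth": 60,
    "min_samples_split": 3,
    "min_samples_leaf": 2,
}

# Same folds for both models so the comparison is like for like
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

base_scores = cross_val_score(
    RandomForestClassifier(random_state=42), X_all, y_all, cv=cv, scoring="accuracy"
)
tuned_scores = cross_val_score(
    RandomForestClassifier(random_state=42, **tuned_params),
    X_all, y_all, cv=cv, scoring="accuracy",
)

print(f"base : {base_scores.mean():.3f} +/- {base_scores.std():.3f}")
print(f"tuned: {tuned_scores.mean():.3f} +/- {tuned_scores.std():.3f}")
```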