# Material Overview

Briefly explain, based on your own understanding, the differences between grid search CV, randomized search CV, Bayesian search CV, and Optuna.

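GridSearchCV: Tries every combination in the parameter grid. It is guaranteed to find the best combination within that grid, but it is slow, because the number of candidates multiplies with every parameter added.

RandomizedSearchCV: Evaluates a fixed number (`n_iter`) of randomly sampled combinations. It is faster, but the result may be less optimal because the best combination might never be sampled.

Bayesian search (e.g. `BayesSearchCV` from scikit-optimize): Uses the results of previous trials to decide which combination to try next, so it usually needs fewer evaluations to reach a good result.

Optuna: A framework that automates and adapts the search (its default TPE sampler is also a Bayesian-style method) and lets you define the search space directly in Python code, which makes it very flexible. In practice it is often faster, though that is not guaranteed.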
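The first two strategies can be contrasted in a few lines of scikit-learn. This is a minimal sketch, assuming the same iris data (loaded with `load_iris` so it runs offline) and a deliberately tiny search space; the grid values and `randint` ranges are illustrative choices, not taken from the notebook. `BayesSearchCV` lives in the separate scikit-optimize package and is not shown.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(random_state=42)

# GridSearchCV: every combination in the grid is evaluated (2 x 2 = 4 candidates)
grid = GridSearchCV(rf, param_grid={"n_estimators": [50, 100], "max_depth": [2, 5]}, cv=5)
grid.fit(X, y)

# RandomizedSearchCV: only n_iter candidates are sampled from the distributions
# (randint(50, 301) draws integers from 50 to 300 inclusive)
rand = RandomizedSearchCV(
    rf,
    param_distributions={"n_estimators": randint(50, 301), "max_depth": randint(2, 21)},
    n_iter=4,
    cv=5,
    random_state=42,
)
rand.fit(X, y)

print("Grid:", len(grid.cv_results_["params"]), "candidates, best", grid.best_params_)
print("Random:", len(rand.cv_results_["params"]), "candidates, best", rand.best_params_)
```

With the same budget of 4 candidates, the grid only ever sees the four values listed, while the randomized search can land anywhere in the ranges; on large search spaces that is where its speed advantage comes from.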

source: https://www.youtube.com/watch?v=t-INgABWULw

In [1]:
# run only once
!pip install optuna -q

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/400.9 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m [32m399.4/400.9 kB[0m [31m21.1 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m400.9/400.9 kB[0m [31m5.2 MB/s[0m eta [36m0:00:00[0m
[?25h

In [8]:
# import the required libraries here
import optuna
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC
import seaborn as sns


In [9]:
df = sns.load_dataset('iris')
df.head()

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


# Data Preprocessing

In [10]:
# convert the categorical variable to numeric codes
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
df['species'] = le.fit_transform(df['species'])
df.head()


|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
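`LabelEncoder` assigns codes in sorted (alphabetical) order of the class names, and the fitted encoder can map predictions back to names. A small sketch on a hypothetical list of labels rather than the notebook's `df`:

```python
from sklearn.preprocessing import LabelEncoder

# hypothetical sample of labels, not the full iris column
species = ["setosa", "versicolor", "virginica", "setosa"]

le = LabelEncoder()
codes = le.fit_transform(species)

# classes_ is sorted, so each class's code is its position in classes_
mapping = {name: code for code, name in enumerate(le.classes_.tolist())}
print(mapping)                                # {'setosa': 0, 'versicolor': 1, 'virginica': 2}
print(codes.tolist())                         # [0, 1, 2, 0]
print(le.inverse_transform([2, 0]).tolist())  # ['virginica', 'setosa']
```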


In [None]:
# select the features (X) and the target (y)
X = df.drop(['species'], axis=1)
y = df['species']

# Dataset Splitting

In [11]:
# split into train and test sets with an 80:20 ratio
from sklearn.model_selection import train_test_split

X = df.drop('species', axis=1)
y = df['species']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
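The unstratified split above yields test-set class supports of 10, 9, and 11 (visible in the classification reports). If an exactly balanced test set is preferred, `train_test_split` accepts a `stratify` argument. A sketch, using `load_iris` in place of the seaborn copy of the data (both have 50 rows per class):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # 150 rows, 50 per class

# stratify=y keeps the class proportions identical in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
print(np.bincount(y_test))          # [10 10 10]
```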


# Base Model Random Forest

In [13]:
# use a random forest classifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
rfr = RandomForestClassifier(random_state=42)
rfr.fit(X_train, y_train)

In [14]:
y_pred = rfr.predict(X_test)

In [17]:
from sklearn.metrics import precision_score, recall_score, f1_score, classification_report, accuracy_score

print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred, average='weighted'):.3f}")
print(f"Recall: {recall_score(y_test, y_pred, average='weighted'):.3f}")
print(f"F1 Score: {f1_score(y_test, y_pred, average='weighted'):.3f}")
print(classification_report(y_test, y_pred))

Accuracy: 1.000
Precision: 1.000
Recall: 1.000
F1 Score: 1.000
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
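The `average='weighted'` option used for precision, recall, and F1 averages the per-class scores weighted by each class's support (its number of true samples), whereas `average='macro'` takes a plain mean. A small check on hypothetical toy labels, since the notebook's perfect scores make every average identical:

```python
import numpy as np
from sklearn.metrics import f1_score, precision_recall_fscore_support

# hypothetical toy labels, not the notebook's predictions
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 0, 2]

# average=None returns one score per class plus the support of each class
_, _, f1_per_class, support = precision_recall_fscore_support(y_true, y_pred, average=None)

manual_weighted = np.average(f1_per_class, weights=support)  # support-weighted mean
manual_macro = f1_per_class.mean()                           # unweighted mean

print(round(manual_weighted, 4), round(f1_score(y_true, y_pred, average="weighted"), 4))
print(round(manual_macro, 4), round(f1_score(y_true, y_pred, average="macro"), 4))
```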



# Optuna

In [18]:
import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def objective(trial):
    n_estimators = trial.suggest_int('n_estimators', 50, 300)
    max_depth = trial.suggest_int('max_depth', 2, 20)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 10)
    min_samples_leaf = trial.suggest_int('min_samples_leaf', 1, 4)

    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        min_samples_leaf=min_samples_leaf,
        random_state=42
    )

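    model.fit(X_train, y_train)

    # note: each trial is scored on the test set, so the test set
    # also guides which hyperparameters get selected
    y_pred = model.predict(X_test)

    acc = accuracy_score(y_test, y_pred)
    return acc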


Which hyperparameters to tune depends on the algorithm being used. Here we use Random Forest, so from its hyperparameters we choose to tune *n_estimators, max_depth, min_samples_split,* and *min_samples_leaf*.

In [20]:
study = optuna.create_study(direction='maximize')


[I 2025-10-17 14:57:21,804] A new study created in memory with name: no-name-c4a75902-d61c-46ea-a2ae-dd902ee24e28


In [21]:
study.optimize(objective, n_trials=100)

[I 2025-10-17 14:57:30,722] Trial 0 finished with value: 1.0 and parameters: {'n_estimators': 70, 'max_depth': 10, 'min_samples_split': 8, 'min_samples_leaf': 4}. Best is trial 0 with value: 1.0.
[I 2025-10-17 14:57:31,121] Trial 1 finished with value: 1.0 and parameters: {'n_estimators': 219, 'max_depth': 5, 'min_samples_split': 3, 'min_samples_leaf': 2}. Best is trial 0 with value: 1.0.
[I 2025-10-17 14:57:31,673] Trial 2 finished with value: 1.0 and parameters: {'n_estimators': 275, 'max_depth': 6, 'min_samples_split': 3, 'min_samples_leaf': 2}. Best is trial 0 with value: 1.0.
[I 2025-10-17 14:57:32,053] Trial 3 finished with value: 1.0 and parameters: {'n_estimators': 211, 'max_depth': 6, 'min_samples_split': 4, 'min_samples_leaf': 1}. Best is trial 0 with value: 1.0.
[I 2025-10-17 14:57:32,263] Trial 4 finished with value: 1.0 and parameters: {'n_estimators': 108, 'max_depth': 14, 'min_samples_split': 6, 'min_samples_leaf': 3}. Best is trial 0 with value: 1.0.
[I 2025-10-17 14:57

This may take a while, so just wait for it to finish.

A common recommendation is to set *n_trials* to around 100, since beyond that the score does not seem to improve significantly, and running more trials only adds waiting time.

In [22]:
study.best_params

{'n_estimators': 70,
 'max_depth': 10,
 'min_samples_split': 8,
 'min_samples_leaf': 4}

Here are the hyperparameter tuning results from Optuna:

In [25]:
# check the hyperparameter tuning results from Optuna
print("Best Parameters:", study.best_params)
print("Best Accuracy:", study.best_value)


Best Parameters: {'n_estimators': 70, 'max_depth': 10, 'min_samples_split': 8, 'min_samples_leaf': 4}
Best Accuracy: 1.0


# Random Forest with Optuna-Tuned Hyperparameters

In [26]:
# store the best hyperparameters in a new variable
best_params = study.best_params

In [29]:
best_model = RandomForestClassifier(**best_params, random_state=42)

best_model.fit(X_train, y_train)

In [30]:
y_pred = best_model.predict(X_test)

In [31]:
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred, average='weighted'):.3f}")
print(f"Recall: {recall_score(y_test, y_pred, average='weighted'):.3f}")
print(f"F1 Score: {f1_score(y_test, y_pred, average='weighted'):.3f}")
print(classification_report(y_test, y_pred))

Accuracy: 1.000
Precision: 1.000
Recall: 1.000
F1 Score: 1.000
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



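There is no score improvement over the base model, because the base model already scores a perfect 1.000 on every metric, leaving no room to improve on this 30-sample test set. Every trial shown in the Optuna log also reaches 1.0, which is why trial 0 remains the best: the selected parameters are simply the first perfect-scoring combination, not a meaningfully better one.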
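Since both models are perfect on the 30-sample test set, that split cannot tell them apart. One way to compare them more robustly is stratified 5-fold cross-validation over the full data. A sketch, assuming `load_iris` in place of the seaborn data and a shuffled split with an arbitrary seed; the tuned parameters are copied from the `study.best_params` output:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# parameters copied from the study.best_params output in this notebook
tuned_params = {"n_estimators": 70, "max_depth": 10, "min_samples_split": 8, "min_samples_leaf": 4}

# same folds for both models, so the comparison is paired
base_scores = cross_val_score(RandomForestClassifier(random_state=42), X, y, cv=cv)
tuned_scores = cross_val_score(RandomForestClassifier(**tuned_params, random_state=42), X, y, cv=cv)

print(f"Base : {base_scores.mean():.3f} +/- {base_scores.std():.3f}")
print(f"Tuned: {tuned_scores.mean():.3f} +/- {tuned_scores.std():.3f}")
```

If the two means fall within each other's spread, the tuning has not produced a meaningful difference on this dataset.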