# Material Overview

Briefly explain, in your own words, the difference between grid search, randomized search, Bayesian search CV, and Optuna.

**Grid Search**: searches for the best parameter combination by trying every combination in a predefined grid. It is guaranteed to find the best combination within that grid (which is not necessarily the global optimum), but it is slow because every combination has to be evaluated one by one.

**Randomized Search**: samples parameter combinations at random from the search space for a fixed number of iterations. It is usually much faster than Grid Search, although the result can differ from run to run unless the random seed is fixed.

**Bayesian Search**: uses a probabilistic model built from the results of earlier trials to choose the next parameter combination to try. This makes the search more efficient and directed, although it is more complex to apply.

**Optuna**: a modern library whose default sampler (TPE) follows an approach similar to Bayesian Optimization, while being more flexible and often fast in practice. Optuna supports pruning, which stops unpromising trials early, and ships built-in functions for visualizing tuning results.
source: https://www.youtube.com/watch?v=t-INgABWULw
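
To make the first two definitions concrete, here is a minimal sketch using scikit-learn's `GridSearchCV` and `RandomizedSearchCV`. The dataset loader (`load_iris`), the small grid, and the names `X_demo`, `y_demo`, and `forest` are illustrative assumptions and are not part of this notebook.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Assumption: scikit-learn's bundled iris data stands in for the notebook's dataset.
X_demo, y_demo = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=20, random_state=0)

# Grid search: evaluates every combination in the grid (3 x 2 = 6 candidates).
grid = GridSearchCV(
    forest,
    param_grid={"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5]},
    cv=3,
)
grid.fit(X_demo, y_demo)

# Randomized search: draws a fixed number of candidates (here 6) from distributions.
rand = RandomizedSearchCV(
    forest,
    param_distributions={"max_depth": randint(2, 9), "min_samples_leaf": randint(1, 6)},
    n_iter=6,
    cv=3,
    random_state=0,
)
rand.fit(X_demo, y_demo)

print("grid  :", len(grid.cv_results_["params"]), "candidates, best", grid.best_params_)
print("random:", len(rand.cv_results_["params"]), "candidates, best", rand.best_params_)
```

The grid's cost grows with the product of the list lengths, while the randomized search's cost is set directly by `n_iter`.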

# Import Data & Libraries

In [1]:
# run only once
!pip install optuna -q


In [2]:
# import the libraries needed here (classification tools, since the model is a classifier)
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import cross_val_score, train_test_split

In [3]:
df = sns.load_dataset('iris')
df.head()

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


# Data Preprocessing

In [13]:
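# Encode the categorical column as dummies, then split into features and target.
# Note: a two-column target makes scikit-learn treat this as a multilabel problem;
# a single label column (df['species']) is the more usual choice for multiclass.
df_encoded = pd.get_dummies(df, drop_first=True)
X = df_encoded.drop(['species_versicolor', 'species_virginica'], axis=1)
y = df_encoded[['species_versicolor', 'species_virginica']]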

In [14]:
# convert the categorical variable to numeric (one-hot encoding, first level dropped)
df = pd.get_dummies(df, drop_first=True)
df.head()

| | sepal_length | sepal_width | petal_length | petal_width | species_versicolor | species_virginica |
|---|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | False | False |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | False | False |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | False | False |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | False | False |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | False | False |


# Dataset Splitting

In [15]:
# split with an 80:20 ratio
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=23, stratify=y
)
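
As a quick check of what `stratify` does in the split above, the sketch below counts the classes on each side of an 80:20 split. It assumes scikit-learn's `load_iris` as a stand-in for the seaborn copy of the data and uses a single label column as the target; the names `X_iris`, `y_iris`, `X_tr`, `X_te`, `y_tr`, and `y_te` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: load_iris stands in for sns.load_dataset('iris'); target is one label column.
X_iris, y_iris = load_iris(return_X_y=True, as_frame=True)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_iris, y_iris, test_size=0.2, random_state=23, stratify=y_iris
)

# Per-class row counts on each side of the split, ordered by class label.
train_counts = y_tr.value_counts().sort_index().tolist()
test_counts = y_te.value_counts().sort_index().tolist()

print("train:", train_counts)
print("test :", test_counts)
```

Because the dataset has 50 rows per class, a stratified split keeps the three classes balanced in both parts.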

# Base Model Random Forest

In [16]:
# use a random forest classifier
from sklearn.ensemble import RandomForestClassifier
rfr = RandomForestClassifier(random_state=23)
rfr.fit(X_train, y_train)

In [17]:
y_pred = rfr.predict(X_test)

In [22]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred, average='weighted'):.3f}")
print(f"Recall: {recall_score(y_test, y_pred, average='weighted'):.3f}")
print(f"F1 Score: {f1_score(y_test, y_pred, average='weighted'):.3f}")
print(classification_report(y_test, y_pred))

Accuracy: 1.000
Precision: 1.000
Recall: 1.000
F1 Score: 1.000
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00        10
   virginica       1.00      1.00      1.00        10

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



# Optuna

In [18]:
import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import make_scorer, accuracy_score

def objective(trial):
    n_estimators = trial.suggest_int('n_estimators', 100, 1000)
    max_depth = trial.suggest_int('max_depth', 10, 100)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 64)
    min_samples_leaf = trial.suggest_int('min_samples_leaf', 1, 64)

    model = RandomForestClassifier(n_estimators=n_estimators,
                                  max_depth=max_depth,
                                  min_samples_split=min_samples_split,
                                  min_samples_leaf=min_samples_leaf,
                                  random_state=42)

    score = cross_val_score(model, X_train, y_train, cv=5, scoring=make_scorer(accuracy_score))

    return score.mean()

The hyperparameters can be adjusted to the algorithm being used. Here we use Random Forest, so the ones we tune are *n_estimators, max_depth, min_samples_split,* and *min_samples_leaf*.
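
The objective function above returns the mean 5-fold accuracy. As a small aside, scikit-learn also accepts the built-in scorer name `"accuracy"`, which should give the same numbers as `make_scorer(accuracy_score)`. The sketch below checks this on scikit-learn's bundled iris data; the names `X_cv`, `y_cv`, and `clf` are illustrative and not part of this notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.model_selection import cross_val_score

# Assumption: single-label iris data from scikit-learn, not the notebook's X_train/y_train.
X_cv, y_cv = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=50, random_state=42)

# Same model, same folds, two ways of naming the metric.
scores_scorer = cross_val_score(clf, X_cv, y_cv, cv=5, scoring=make_scorer(accuracy_score))
scores_string = cross_val_score(clf, X_cv, y_cv, cv=5, scoring="accuracy")

print("make_scorer:", scores_scorer.mean())
print("string     :", scores_string.mean())
print("identical  :", np.allclose(scores_scorer, scores_string))
```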

In [19]:
study = optuna.create_study(direction='maximize', sampler=optuna.samplers.RandomSampler(seed=42))

[I 2025-10-04 14:01:51,354] A new study created in memory with name: no-name-78797a97-968d-4b23-9634-cf9b39af13ce


In [20]:
study.optimize(objective, n_trials=100)

[I 2025-10-04 14:01:57,860] Trial 0 finished with value: 0.33333333333333337 and parameters: {'n_estimators': 437, 'max_depth': 96, 'min_samples_split': 48, 'min_samples_leaf': 39}. Best is trial 0 with value: 0.33333333333333337.
[I 2025-10-04 14:01:59,529] Trial 1 finished with value: 0.33333333333333337 and parameters: {'n_estimators': 240, 'max_depth': 24, 'min_samples_split': 5, 'min_samples_leaf': 56}. Best is trial 0 with value: 0.33333333333333337.
[I 2025-10-04 14:02:04,959] Trial 2 finished with value: 0.33333333333333337 and parameters: {'n_estimators': 641, 'max_depth': 74, 'min_samples_split': 3, 'min_samples_leaf': 63}. Best is trial 0 with value: 0.33333333333333337.
[I 2025-10-04 14:02:10,873] Trial 3 finished with value: 0.95 and parameters: {'n_estimators': 850, 'max_depth': 29, 'min_samples_split': 13, 'min_samples_leaf': 12}. Best is trial 3 with value: 0.95.
[I 2025-10-04 14:02:13,821] Trial 4 finished with value: 0.9416666666666667 and parameters: {'n_estimators':

This may take a while, so wait for the run to finish. Setting `n_trials` to 100 is a commonly given recommendation: the score reportedly shows no significant improvement beyond about 100 trials, and more trials only make the wait longer.
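
The trials that score 0.333 in the log above all use a large `min_samples_leaf` (39, 56, 63). A likely explanation, offered here as an assumption rather than something the notebook verifies, is that the trees cannot split at all when a leaf must hold that many rows of such a small training set. The sketch below reproduces the effect on scikit-learn's bundled iris data with a single label column; the names `X_fit`, `X_hold`, and `stump_forest` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's iris data with a single label column as the target.
X_all, y_all = load_iris(return_X_y=True)
X_fit, X_hold, y_fit, y_hold = train_test_split(
    X_all, y_all, test_size=0.2, random_state=23, stratify=y_all
)

# 120 training rows cannot be split into two leaves of at least 63 rows each,
# so every tree stays a single root node.
stump_forest = RandomForestClassifier(
    n_estimators=25, min_samples_leaf=63, random_state=42
).fit(X_fit, y_fit)

depths = [tree.get_depth() for tree in stump_forest.estimators_]
predictions = stump_forest.predict(X_hold)
holdout_accuracy = stump_forest.score(X_hold, y_hold)

print("max tree depth:", max(depths))
print("distinct predicted classes:", len(set(predictions)))
print("hold-out accuracy:", holdout_accuracy)
```

If this explanation holds for the notebook's run, lowering the upper bound of `min_samples_leaf` and `min_samples_split` in the search space would avoid spending trials on models that cannot learn anything.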

In [21]:
study.best_params

{'n_estimators': 877,
 'max_depth': 66,
 'min_samples_split': 22,
 'min_samples_leaf': 5}

Below is the hyperparameter tuning result from Optuna.

In [23]:
# keep the hyperparameter tuning result from Optuna
best_params = study.best_params

# Random Forest Using Optuna

In [25]:
# store the best hyperparameters in new variables
best_n_estimators = best_params['n_estimators']
best_max_depth = best_params['max_depth']
best_min_samples_split = best_params['min_samples_split']
best_min_samples_leaf = best_params['min_samples_leaf']
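
Since the keys of `best_params` match the argument names of `RandomForestClassifier`, the four lookups above can also be replaced by dictionary unpacking. A minimal sketch, using a dictionary named `example_best_params` (an illustrative name) filled with the values from the `study.best_params` output shown earlier in this notebook:

```python
from sklearn.ensemble import RandomForestClassifier

# Values copied from the study.best_params output shown earlier in this notebook.
example_best_params = {
    "n_estimators": 877,
    "max_depth": 66,
    "min_samples_split": 22,
    "min_samples_leaf": 5,
}

# The dictionary keys match the constructor argument names, so ** unpacking works.
unpacked_model = RandomForestClassifier(**example_best_params, random_state=42)

print(unpacked_model.get_params()["n_estimators"])
print(unpacked_model.get_params()["min_samples_leaf"])
```

This keeps the model definition in sync with the search space: adding a new tuned hyperparameter to the objective needs no extra variable here.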

In [26]:
from sklearn.ensemble import RandomForestClassifier
best_model = RandomForestClassifier(
    n_estimators=best_n_estimators,
    max_depth=best_max_depth,
    min_samples_split=best_min_samples_split,
    min_samples_leaf=best_min_samples_leaf,
    random_state=42)

best_model.fit(X_train, y_train)

In [27]:
y_pred = best_model.predict(X_test)

In [28]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred, average='weighted'):.3f}")
print(f"Recall: {recall_score(y_test, y_pred, average='weighted'):.3f}")
print(f"F1 Score: {f1_score(y_test, y_pred, average='weighted'):.3f}")
print(classification_report(y_test, y_pred))

Accuracy: 0.967
Precision: 0.955
Recall: 0.950
F1 Score: 0.950
              precision    recall  f1-score   support

           0       0.91      1.00      0.95        10
           1       1.00      0.90      0.95        10

   micro avg       0.95      0.95      0.95        20
   macro avg       0.95      0.95      0.95        20
weighted avg       0.95      0.95      0.95        20
 samples avg       0.63      0.63      0.63        20





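Tuning with Optuna did not raise the score: the base model alone already reaches 1.000 on the test set, while the tuned model scores 0.967 accuracy. The two reports are also not directly comparable, because the second one appears to have been computed on the two-column one-hot target (classes 0 and 1, support 20) rather than on the three species labels.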