# Material Overview

Briefly explain, in your own words, the differences between grid search, randomized search, Bayesian search CV, and Optuna.

- Grid
Tries every combination of the specified values. It becomes expensive when there are many hyperparameters or values, or when the dataset is large, because every combination requires fitting the model.
- Randomized
Tries random combinations drawn from the specified values. Reasonably efficient for high-dimensional search spaces.
- Bayesian Search CV
Uses a probabilistic model (a Gaussian process) to predict which combination is likely to score best.
- Optuna
Uses the Tree-structured Parzen Estimator (TPE) algorithm by default, which is similar in spirit to Bayesian search.
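
The difference between the first two strategies can be sketched without any ML library: grid search enumerates every combination, while randomized search draws a fixed number of them. The search space and the trial budget below are made-up values for illustration only.

```python
import itertools
import random

# Hypothetical search space (values chosen only for illustration)
search_space = {
    "n_estimators": [100, 300, 500],
    "max_depth": [5, 10, 20],
    "min_samples_leaf": [1, 4],
}

names = list(search_space)

# Grid search: every combination of the listed values
grid_trials = [dict(zip(names, combo))
               for combo in itertools.product(*search_space.values())]

# Randomized search: a fixed budget of randomly drawn combinations
# (drawn independently here, so a combination may repeat)
rng = random.Random(42)
n_iter = 5
random_trials = [{name: rng.choice(values) for name, values in search_space.items()}
                 for _ in range(n_iter)]

print(len(grid_trials))    # 3 * 3 * 2 = 18 combinations
print(len(random_trials))  # always n_iter, regardless of grid size
```

Adding one more hyperparameter with three values would triple the grid to 54 fits, while the randomized budget stays at `n_iter`.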

source: https://www.youtube.com/watch?v=t-INgABWULw

# Import Data & Libraries

In [1]:
# run this only once
# !pip install optuna -q

In [96]:
# import the required libraries here
import seaborn as sns
import optuna
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score, accuracy_score, precision_score, recall_score, f1_score, classification_report

In [11]:
df = sns.load_dataset('iris')
df.head()

,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


# Data Preprocessing

In [14]:
# convert the categorical variable to numeric (one 0/1 column per species)
# species = ['setosa', 'versicolor', 'virginica']
df['sp_setosa'] = df['species'].map({'setosa': 1, 'versicolor': 0, 'virginica': 0})
df['sp_versicolor'] = df['species'].map({'setosa': 0, 'versicolor': 1, 'virginica': 0})
df['sp_virginica'] = df['species'].map({'setosa': 0, 'versicolor': 0, 'virginica': 1})
df.head()

,sepal_length,sepal_width,petal_length,petal_width,species,sp_setosa,sp_versicolor,sp_virginica
0,5.1,3.5,1.4,0.2,setosa,1,0,0
1,4.9,3.0,1.4,0.2,setosa,1,0,0
2,4.7,3.2,1.3,0.2,setosa,1,0,0
3,4.6,3.1,1.5,0.2,setosa,1,0,0
4,5.0,3.6,1.4,0.2,setosa,1,0,0


In [43]:
# better: encode 'species' as a single column, with the types coded as 1/2/3
df['species2'] = df['species'].map({'setosa': 1, 'versicolor': 2, 'virginica': 3})

In [38]:
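# select the features (X) and the target (y)
# note: the sp_* columns are one-hot encodings of the target, so keeping them in X leaks the label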
X = df.drop(['species', 'species2'], axis=1)
y = df['species2']
X.head()

,sepal_length,sepal_width,petal_length,petal_width,sp_setosa,sp_versicolor,sp_virginica
0,5.1,3.5,1.4,0.2,1,0,0
1,4.9,3.0,1.4,0.2,1,0,0
2,4.7,3.2,1.3,0.2,1,0,0
3,4.6,3.1,1.5,0.2,1,0,0
4,5.0,3.6,1.4,0.2,1,0,0


# Dataset Splitting

In [39]:
# split with an 80:20 train/test ratio
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)

# Base Model Random Forest

In [64]:
# use a random forest regressor; the class labels 1/2/3 are treated as a numeric target
rfr = RandomForestRegressor(random_state = 34)
rfr.fit(X_train, y_train)

RandomForestRegressor(random_state=34)

In [65]:
y_pred = rfr.predict(X_test)

In [66]:
mean_absolute_error(y_test, y_pred)

0.0

In [67]:
mean_squared_error(y_test, y_pred)

0.0

In [68]:
r2_score(y_test, y_pred)

1.0

In [69]:
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred, average='weighted'):.3f}")
print(f"Recall: {recall_score(y_test, y_pred, average='weighted'):.3f}")
print(f"F1 Score: {f1_score(y_test, y_pred, average='weighted'):.3f}")
print(classification_report(y_test, y_pred))

Accuracy: 1.000
Precision: 1.000
Recall: 1.000
F1 Score: 1.000
              precision    recall  f1-score   support

           1       1.00      1.00      1.00        10
           2       1.00      1.00      1.00         9
           3       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



# Optuna

In [71]:
def objective(trial):
    # search ranges for each hyperparameter (inclusive integer bounds)
    n_estimators = trial.suggest_int('n_estimators', 100, 1000)
    max_depth = trial.suggest_int('max_depth', 10, 500)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 32)
    min_samples_leaf = trial.suggest_int('min_samples_leaf', 1, 32)

    # build a model from the values suggested for this trial
    model = RandomForestRegressor(n_estimators = n_estimators,
                                  max_depth = max_depth,
                                  min_samples_split = min_samples_split,
                                  min_samples_leaf = min_samples_leaf)

    # 5-fold cross-validation on the training set; scores are negative MSE, so higher is better
    score = cross_val_score(model, X_train, y_train, cv = 5, scoring = 'neg_mean_squared_error')

    return score.mean()

The hyperparameters to tune depend on the algorithm being used. Here we use Random Forest, so the ones we search over are *n_estimators, max_depth, min_samples_split,* and *min_samples_leaf*.
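
To see which hyperparameters an estimator exposes, scikit-learn estimators provide `get_params()`. A minimal sketch that looks up the four names tuned here (the list name `tuned` is just for illustration):

```python
from sklearn.ensemble import RandomForestRegressor

# All constructor arguments of the estimator, with their default values
params = RandomForestRegressor().get_params()

# The four hyperparameters searched in this notebook
tuned = ["n_estimators", "max_depth", "min_samples_split", "min_samples_leaf"]
for name in tuned:
    print(name, "=", params[name])
```

Any other key in `params` could be added to the objective function in the same way, with a suitable `trial.suggest_*` call.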

In [72]:
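# maximize, because the objective returns negative MSE (higher is better)
# RandomSampler draws trials at random; Optuna's default sampler is TPE
study = optuna.create_study(direction = 'maximize', sampler = optuna.samplers.RandomSampler(seed=42))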

[I 2025-10-04 13:31:58,015] A new study created in memory with name: no-name-d5cfea90-c275-4a45-a046-25f3596869bc


In [73]:
study.optimize(objective, n_trials=100)

[I 2025-10-04 13:32:10,446] Trial 0 finished with value: -0.04722780794382012 and parameters: {'n_estimators': 437, 'max_depth': 476, 'min_samples_split': 24, 'min_samples_leaf': 20}. Best is trial 0 with value: -0.04722780794382012.
[I 2025-10-04 13:32:12,473] Trial 1 finished with value: -0.17752700163021218 and parameters: {'n_estimators': 240, 'max_depth': 86, 'min_samples_split': 3, 'min_samples_leaf': 28}. Best is trial 0 with value: -0.04722780794382012.
[I 2025-10-04 13:32:17,385] Trial 2 finished with value: -0.5342737503450835 and parameters: {'n_estimators': 641, 'max_depth': 357, 'min_samples_split': 2, 'min_samples_leaf': 32}. Best is trial 0 with value: -0.04722780794382012.
[I 2025-10-04 13:32:24,146] Trial 3 finished with value: -0.0004390196078431368 and parameters: {'n_estimators': 850, 'max_depth': 114, 'min_samples_split': 7, 'min_samples_leaf': 6}. Best is trial 3 with value: -0.0004390196078431368.
[I 2025-10-04 13:32:27,288] Trial 4 finished with value: -0.000374

This may take a while, so just wait and see ^^
<br>
The usual recommendation is to set n_trials to 100, since there seems to be no significant score improvement beyond 100 trials, and more trials only make the wait longer.
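
One way to judge whether extra trials are worth the wait is to track the best score seen so far and check how much it changes late in the run. A minimal sketch with a stand-in objective; the random scores are placeholders, not results from this notebook.

```python
import random

rng = random.Random(0)

# Placeholder scores standing in for per-trial objective values (higher is better)
scores = [-rng.random() for _ in range(200)]

# Running maximum: the best score found up to and including each trial
best_so_far = []
best = float("-inf")
for s in scores:
    best = max(best, s)
    best_so_far.append(best)

# Improvement gained by trials 101-200 over the first 100
gain_after_100 = best_so_far[-1] - best_so_far[99]
print(f"best after 100: {best_so_far[99]:.4f}, after 200: {best_so_far[-1]:.4f}")
print(f"gain from the extra 100 trials: {gain_after_100:.4f}")
```

With a real study, `[t.value for t in study.trials]` could take the place of `scores`.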

In [88]:
best_params = study.best_params
best_params

{'n_estimators': 106,
 'max_depth': 260,
 'min_samples_split': 14,
 'min_samples_leaf': 8}

In [86]:
# optuna.visualization.plot_optimization_history(study)
# optuna.visualization.plot_parallel_coordinate(study)
# optuna.visualization.plot_slice(study, params=['n_estimators', 'max_depth', 'min_samples_split', 'min_samples_leaf'])
# optuna.visualization.plot_param_importances(study)

Below are the hyperparameter tuning results from Optuna.

In [93]:
# check the hyperparameter tuning results from Optuna
print(f"Best Hyperparameters: {best_params}")

Best Hyperparameters: {'n_estimators': 106, 'max_depth': 260, 'min_samples_split': 14, 'min_samples_leaf': 8}


# Random Forest Using Optuna

In [95]:
# store the best hyperparameters in new variables
best_n_estimators = best_params['n_estimators']
best_max_depth = best_params['max_depth']
best_min_samples_split = best_params['min_samples_split']
best_min_samples_leaf = best_params['min_samples_leaf']

In [90]:
best_model = RandomForestRegressor(n_estimators = best_n_estimators,
                                   max_depth = best_max_depth,
                                   min_samples_split = best_min_samples_split,
                                   min_samples_leaf = best_min_samples_leaf)

best_model.fit(X_train, y_train)

RandomForestRegressor(max_depth=260, min_samples_leaf=8, min_samples_split=14,
                      n_estimators=106)

In [91]:
y_pred = best_model.predict(X_test)

In [92]:
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred, average='weighted'):.3f}")
print(f"Recall: {recall_score(y_test, y_pred, average='weighted'):.3f}")
print(f"F1 Score: {f1_score(y_test, y_pred, average='weighted'):.3f}")
print(classification_report(y_test, y_pred))

Accuracy: 1.000
Precision: 1.000
Recall: 1.000
F1 Score: 1.000
              precision    recall  f1-score   support

           1       1.00      1.00      1.00        10
           2       1.00      1.00      1.00         9
           3       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



There is no score improvement over the model without Optuna, because the base model already scores 1.000 on every metric, which leaves no room to improve.
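
A likely reason the base model is already perfect: the feature matrix `X` keeps the `sp_setosa`, `sp_versicolor`, and `sp_virginica` columns, and those three columns alone determine `species2`. A minimal sketch with plain Python rows (the helper name `label_from_one_hot` and the example rows are hypothetical):

```python
# One example row per class, using only the one-hot columns kept in X
rows = [
    {"sp_setosa": 1, "sp_versicolor": 0, "sp_virginica": 0},  # setosa
    {"sp_setosa": 0, "sp_versicolor": 1, "sp_virginica": 0},  # versicolor
    {"sp_setosa": 0, "sp_versicolor": 0, "sp_virginica": 1},  # virginica
]

def label_from_one_hot(row):
    # species2 uses the mapping setosa=1, versicolor=2, virginica=3
    return 1 * row["sp_setosa"] + 2 * row["sp_versicolor"] + 3 * row["sp_virginica"]

print([label_from_one_hot(r) for r in rows])  # [1, 2, 3]
```

Dropping the `sp_*` columns from `X` before splitting would probably give a more informative comparison between the base model and the tuned one.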