## René Parlange, MSc
### 📚 Machine Learning Course, PhD in Computer Science
#### 🎓 Instructor: Juan Carlos Cuevas Tello, PhD
#### 🏛 Universidad Autónoma de San Luis Potosí (UASLP)

🔗 [GitHub Repository](https://github.com/parlange/ml-notebooks)

# scikit-learn BernoulliRBM with Optuna hyperparameter search


## install optuna with pip (the `!` prefix runs a terminal command in Colab)

In [None]:
!pip install optuna

Collecting optuna
  Downloading optuna-3.4.0-py3-none-any.whl (409 kB)
Collecting alembic>=1.5.0 (from optuna)
  Downloading alembic-1.12.0-py3-none-any.whl (226 kB)
Collecting colorlog (from optuna)
  Downloading colorlog-6.7.0-py2.py3-none-any.whl (11 kB)
Collecting Mako (from alembic>=1.5.0->optuna)
  Downloading Mako-1.2.4-py3-none-any.whl (78 kB)
Installing collected packages: Mako, colorlog, alembic, optuna
Successfully installed Mako-1.2.4 alembic-1.12.0 colorlog-6.7.0 optuna-3.4.0


## dataset: iris

In [None]:
import numpy as np
import optuna
import time
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.datasets import load_iris
from sklearn.preprocessing import minmax_scale
from sklearn.model_selection import train_test_split, cross_val_score

# Load the Iris dataset
iris = load_iris()

# Features
X = iris.data
# Targets
Y = iris.target

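# Min-max scaling: BernoulliRBM expects inputs in [0, 1].
# Note: the scaler is fit on the full dataset before the split, so test-set
# minima/maxima influence the training features.
X_scaled = minmax_scale(X, feature_range=(0, 1))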

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, Y, test_size=0.2, random_state=42)

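def objective(trial):
    """Return the mean 5-fold CV accuracy of an RBM + logistic regression pipeline."""
    # Search space: hidden units, learning rate (log scale), mini-batch size, epochs
    n_components = trial.suggest_int("n_components", 50, 200, step=50)
    learning_rate = trial.suggest_float("learning_rate", 0.01, 0.1, log=True)
    batch_size = trial.suggest_int("batch_size", 1, 20)
    n_iter = trial.suggest_int("n_iter", 10, 40, step=10)

    # The RBM learns features without labels; logistic regression classifies them
    rbm = BernoulliRBM(
        n_components=n_components,
        learning_rate=learning_rate,
        batch_size=batch_size,
        n_iter=n_iter,
        random_state=42,
        verbose=False,
    )
    classifier = LogisticRegression(max_iter=1000)

    pipeline = Pipeline(steps=[('rbm', rbm), ('classifier', classifier)])

    # Optuna maximizes this value (see direction="maximize" below)
    return cross_val_score(pipeline, X_train, y_train, n_jobs=-1, cv=5).mean()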

start_time_optuna = time.time()

# Optuna study
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)

end_time_optuna = time.time()

print(f"Best trial score: {study.best_value}")
print(f"Best trial params: {study.best_params}")

# Train with best hyperparameters
# (the keys in best_params match BernoulliRBM's argument names)
best_params = study.best_params
rbm_best = BernoulliRBM(**best_params, random_state=42)
classifier_best = LogisticRegression(max_iter=1000)

pipeline_best = Pipeline(steps=[('rbm', rbm_best), ('classifier', classifier_best)])
pipeline_best.fit(X_train, y_train)

# Evaluate the best model on the test set
y_pred_best = pipeline_best.predict(X_test)
from sklearn.metrics import classification_report
print('\nClassification Report with Best Model:\n', classification_report(y_test, y_pred_best, zero_division=1))

print(f"\nOptuna optimization execution time: {end_time_optuna - start_time_optuna} seconds")

# Measured from the start of the Optuna search through the final fit and evaluation
total_end_time = time.time()
print(f"\nTotal execution time: {total_end_time - start_time_optuna} seconds")

[I 2023-10-18 02:02:16,188] A new study created in memory with name: no-name-6d6993e4-ac40-4f7c-b199-ec3e8c038e5c
[I 2023-10-18 02:02:16,316] Trial 0 finished with value: 0.3916666666666666 and parameters: {'n_components': 100, 'learning_rate': 0.020601656878434963, 'batch_size': 9, 'n_iter': 40}. Best is trial 0 with value: 0.3916666666666666.
[I 2023-10-18 02:02:16,404] Trial 1 finished with value: 0.7833333333333333 and parameters: {'n_components': 200, 'learning_rate': 0.047451433356188594, 'batch_size': 19, 'n_iter': 20}. Best is trial 1 with value: 0.7833333333333333.
[I 2023-10-18 02:02:16,484] Trial 2 finished with value: 0.7333333333333333 and parameters: {'n_components': 100, 'learning_rate': 0.03389092752872026, 'batch_size': 19, 'n_iter': 30}. Best is trial 1 with value: 0.7833333333333333.
[I 2023-10-18 02:02:16,675] Trial 3 finished with value: 0.7583333333333334 and parameters: {'n_components': 200, 'learning_rate': 0.053336519197159926, 'batch_size': 1, 'n_iter': 10}. B

Best trial score: 0.8083333333333332
Best trial params: {'n_components': 200, 'learning_rate': 0.06737136093388048, 'batch_size': 19, 'n_iter': 20}

Classification Report with Best Model:
               precision    recall  f1-score   support

           0       0.91      1.00      0.95        10
           1       1.00      0.78      0.88         9
           2       0.92      1.00      0.96        11

    accuracy                           0.93        30
   macro avg       0.94      0.93      0.93        30
weighted avg       0.94      0.93      0.93        30


Optuna optimization execution time: 11.002410173416138 seconds

Total execution time: 11.053030729293823 seconds
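
The cell above scales the full dataset before splitting, so the test rows contribute to the minimum and maximum used for scaling. One possible variant, sketched here under assumptions, moves the scaling into the pipeline with `MinMaxScaler` so that it is refit on the training portion of each fold. The hyperparameter values are placeholders rounded from the best trial printed above rather than the result of a new search, the names `leak_free`, `X_raw`, and `y_all` are hypothetical, and the scores are not expected to match the reported ones.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

X_raw, y_all = load_iris(return_X_y=True)

# Split first, on the unscaled features.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_all, test_size=0.2, random_state=42
)

leak_free = Pipeline(steps=[
    # Fit on training data only; clip=True keeps held-out values inside [0, 1].
    ("scaler", MinMaxScaler(clip=True)),
    # Placeholder hyperparameters (assumption: rounded from the best trial above).
    ("rbm", BernoulliRBM(n_components=200, learning_rate=0.067,
                         batch_size=19, n_iter=20, random_state=42)),
    ("classifier", LogisticRegression(max_iter=1000)),
])

# Each CV fold refits the scaler, the RBM, and the classifier together.
cv_scores = cross_val_score(leak_free, X_tr, y_tr, cv=5)
print(f"Mean CV accuracy: {cv_scores.mean():.3f}")

leak_free.fit(X_tr, y_tr)
test_accuracy = leak_free.score(X_te, y_te)
print(f"Test accuracy: {test_accuracy:.3f}")
```

To use this inside the Optuna search, the same three-step pipeline would be built within `objective` from the trial's suggested values and passed to `cross_val_score` on the unscaled training split.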
