# [Hyperparameter Optimization (HPO) of Machine Learning Models](https://github.com/LiYangHart/Hyperparameter-Optimization-of-Machine-Learning-Algorithms)
L. Yang and A. Shami, “On hyperparameter optimization of machine learning algorithms: Theory and practice,” Neurocomputing, vol. 415, pp. 295–316, 2020, doi: https://doi.org/10.1016/j.neucom.2020.07.061.

### **Sample code for regression problems**
**Dataset used:**
&nbsp; California Housing dataset from Kaggle

**Machine learning algorithms used:**
&nbsp; Random forest (RF), support vector machine (SVM), k-nearest neighbor (KNN), artificial neural network (ANN)

**HPO algorithms used:**
&nbsp; Grid search, random search, Successive Halving, Bayesian Optimization with Gaussian Processes (BO-GP)

**Performance metric:**
&nbsp; Mean squared error (MSE)

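__Notes for running on a company PC__
- This exercise installs external libraries.
- Installing external libraries on a company PC can raise SSL certificate errors.
- As in the command below, add the trusted-host options for the PyPI servers to skip SSL verification.
- As a result, the content may differ from the video lecture.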

```cmd
pip install --trusted-host pypi.python.org --trusted-host files.pythonhosted.org --trusted-host pypi.org -U [package name]
```


In [None]:
# !pip install scikit-optimize mljar-supervised pycaret
!pip uninstall pandas numpy -y
!pip install --trusted-host pypi.python.org --trusted-host files.pythonhosted.org --trusted-host pypi.org -U pycaret scikit-optimize mljar-supervised

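__Notes for running on a company PC__
- This exercise loads external data.
- Loading external data on a company PC can raise SSL certificate errors.
- Add the code below to skip SSL verification.
- As a result, the content may differ from the video lecture.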

```python
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
```

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split,cross_val_score
from sklearn.ensemble import RandomForestClassifier,RandomForestRegressor
from sklearn.metrics import classification_report,confusion_matrix,accuracy_score
from sklearn.neighbors import KNeighborsClassifier,KNeighborsRegressor
from sklearn.svm import SVC,SVR
from sklearn.preprocessing import OneHotEncoder
from sklearn import datasets
import scipy.stats as stats

import ssl
ssl._create_default_https_context = ssl._create_unverified_context

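## Loading the California Housing dataset
This exercise uses the California Housing dataset, in which each row describes one block.

The full dataset has 20,640 rows and 10 columns, and the goal is to predict house prices from it. The cell below works with a random sample of 1,000 rows.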


In [None]:
raw = pd.read_csv("https://raw.githubusercontent.com/JiByungKyu/dataset/main/california_housing/housing.csv")
raw= raw.sample(n=1000,ignore_index=True,random_state=42)
raw.info()

### Data preprocessing

In [None]:
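# Express the target in thousands of dollars
raw['median_house_value'] /= 1000
# One-hot encode the non-numeric column(s); list(*enc.categories_) assumes there is exactly one such column
enc = OneHotEncoder(handle_unknown='ignore', sparse_output=False)
not_number_df = pd.DataFrame(enc.fit_transform(raw.select_dtypes(exclude='number')), columns=list(*enc.categories_))
df = pd.concat([raw.select_dtypes(include='number'), not_number_df], axis=1)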

In [None]:
from sklearn.impute import KNNImputer
imputer = KNNImputer(n_neighbors=10)
df = pd.DataFrame(imputer.fit_transform(df),columns = df.columns)
df.info()

In [None]:
# Drop the target by name; a set difference would return the feature columns in arbitrary order
X, y = df.drop(columns=['median_house_value']), df['median_house_value']

## Baseline Machine Learning Models: Regressors with Default Hyperparameters

### Manual Search Using 3-Fold Cross-Validation
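
The cells below score each regressor with 3-fold cross-validation and report the mean MSE. As a rough illustration of what that computation involves, here is a pure-Python sketch. The constant "model" that always predicts the training mean and the `y_toy` data are assumptions made for the example, not part of the notebook. Contiguous, unshuffled folds are used because that is how scikit-learn's default `KFold` splits the data when `cv=3` is passed for a regressor.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_val_mse(y, k=3):
    """Return one MSE per fold for a model that predicts the training mean."""
    scores = []
    for test_idx in k_fold_indices(len(y), k):
        held_out = set(test_idx)
        train = [y[i] for i in range(len(y)) if i not in held_out]
        test = [y[i] for i in test_idx]
        prediction = sum(train) / len(train)  # "fit" on the training folds only
        scores.append(mse(test, [prediction] * len(test)))
    return scores

y_toy = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical targets
fold_scores = cross_val_mse(y_toy, k=3)
print(fold_scores, sum(fold_scores) / len(fold_scores))
```

scikit-learn's `neg_mean_squared_error` scorer returns the negated MSE so that larger is always better, which is why the cells print `-scores.mean()`.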

In [None]:
%%time
#Random Forest
n_estimators = 1000
clf = RandomForestRegressor(n_estimators=n_estimators)
scores = cross_val_score(clf, X, y, cv=3,scoring='neg_mean_squared_error') # 3-fold cross-validation
print("MSE:"+ str(-scores.mean()))

In [None]:
%%time
#SVM
C=5.0
kernel='rbf'
clf = SVR(C=C,kernel=kernel)
scores = cross_val_score(clf, X, y, cv=3,scoring='neg_mean_squared_error')
print("MSE:"+ str(-scores.mean()))

In [None]:
%%time
#KNN
n_neighbors=5
clf = KNeighborsRegressor(n_neighbors=n_neighbors)
scores = cross_val_score(clf, X, y, cv=3,scoring='neg_mean_squared_error')
print("MSE:"+ str(-scores.mean()))

In [None]:
%%time
#ANN
from keras.models import Sequential, Model
from keras.layers import Dense, Input
from sklearn.model_selection import GridSearchCV
# Note: keras.wrappers.scikit_learn is only available in older Keras releases
from keras.wrappers.scikit_learn import KerasRegressor
from keras.callbacks import EarlyStopping
def ANN(optimizer = 'adam',neurons=32,batch_size=32,epochs=20,activation='relu',patience=5,loss='mse'):
    model = Sequential()
    model.add(Dense(neurons, input_shape=(X.shape[1],), activation=activation))
    model.add(Dense(neurons, activation=activation))
    model.add(Dense(1))
    model.compile(optimizer = optimizer, loss=loss)
    early_stopping = EarlyStopping(monitor="loss", patience = patience)# early stop patience
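    # Caution: this fits on the full X, y inside the build function, so the network has
    # already seen every validation fold before the wrapper trains it again on each
    # training split. The cross-validated ANN scores are therefore optimistic.
    history = model.fit(X, y,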
              batch_size=batch_size,
              epochs=epochs,
              callbacks = [early_stopping],
              verbose=0) #verbose set to 1 will show the training process
    return model

In [None]:
%%time
clf = KerasRegressor(build_fn=ANN,verbose=0)
scores = cross_val_score(clf, X, y, cv=3,scoring='neg_mean_squared_error')
print("MSE:"+ str(-scores.mean()))

## HPO Algorithm 1: Grid Search
Searches exhaustively over every combination of the listed hyperparameter values.

**Advantages:**
* Simple to implement.

**Disadvantages:**
* Time-consuming.
* Efficient only for categorical HPs.
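
To see why grid search becomes expensive, it helps to count the configurations it has to evaluate. The sketch below enumerates the Cartesian product of two of the grids used in the following cells (the random forest and ANN grids). `expand_grid` is a hypothetical helper written for this illustration, not a scikit-learn function.

```python
from itertools import product

def expand_grid(param_grid):
    """Yield every combination of the listed values as a dict."""
    names = sorted(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))

rf_grid = {"n_estimators": [10, 20, 30, 1000], "max_depth": [15, 20, 30, 50]}
ann_grid = {
    "optimizer": ["adam", "rmsprop"],
    "activation": ["relu", "tanh"],
    "loss": ["mse", "mae"],
    "batch_size": [16, 32],
    "neurons": [16, 32],
    "patience": [2, 5],
}

cv_folds = 3
counts = {}
for label, grid in [("RF", rf_grid), ("ANN", ann_grid)]:
    counts[label] = sum(1 for _ in expand_grid(grid))
    print(label, counts[label], "configurations,",
          counts[label] * cv_folds, "cross-validation fits")
```

Each extra hyperparameter multiplies the count by its number of values, so six two-valued ANN hyperparameters already give 64 configurations and 192 cross-validation fits.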

In [None]:
%%time
#Random Forest
from sklearn.model_selection import GridSearchCV
# Define the hyperparameter configuration space
rf_params = {
    'n_estimators': [10, 20, 30,1000],
    #'max_features': ['sqrt',0.5],
    'max_depth': [15,20,30,50],
    #'min_samples_leaf': [1,2,4,8],
    #"bootstrap":[True,False],
    #"criterion":['squared_error','absolute_error']
}
clf = RandomForestRegressor(random_state=0)
grid = GridSearchCV(clf, rf_params, cv=3, scoring='neg_mean_squared_error')
grid.fit(X, y)
print(grid.best_params_)
print("MSE:"+ str(-grid.best_score_))

In [None]:
%%time
#SVM
from sklearn.model_selection import GridSearchCV
rf_params = {
    'C': [1,10, 100],
    "kernel":['poly','rbf','sigmoid'],
    "epsilon":[0.01,0.1,1]
}
clf = SVR(gamma='scale')
grid = GridSearchCV(clf, rf_params, cv=3, scoring='neg_mean_squared_error')
grid.fit(X, y)
print(grid.best_params_)
print("MSE:"+ str(-grid.best_score_))

In [None]:
%%time
#KNN
from sklearn.model_selection import GridSearchCV
rf_params = {
    'n_neighbors': [2, 3, 5, 7, 10]
}
clf = KNeighborsRegressor()
grid = GridSearchCV(clf, rf_params, cv=3, scoring='neg_mean_squared_error')
grid.fit(X, y)
print(grid.best_params_)
print("MSE:"+ str(-grid.best_score_))

In [None]:
%%time
#ANN
from sklearn.model_selection import GridSearchCV
rf_params = {
    'optimizer': ['adam','rmsprop'],
    'activation': ['relu','tanh'],
    'loss': ['mse','mae'],
    'batch_size': [16,32],
    'neurons':[16,32],
    #'epochs':[20,50],
    'patience':[2,5]
}
clf = KerasRegressor(build_fn=ANN, verbose=0)
grid = GridSearchCV(clf, rf_params, cv=3,scoring='neg_mean_squared_error')
grid.fit(X, y)
print(grid.best_params_)
print("MSE:"+ str(-grid.best_score_))

## HPO Algorithm 2: Random Search
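Samples hyperparameter configurations at random from the given search space.

**Advantages:**
* Searches more efficiently than grid search.
* Can be parallelized.

**Disadvantages:**
* Does not use the results of earlier trials.
* Inefficient for conditional HPs.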
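
The idea can be sketched without any library: draw each hyperparameter independently from its range, evaluate, and keep the best trial. In the sketch below, `toy_objective` is a hypothetical stand-in for a cross-validated loss, and the two ranges mirror the `C` and `epsilon` ranges used for SVR in the cells that follow.

```python
import random

def toy_objective(c, epsilon):
    """Hypothetical validation loss with its minimum at c=10, epsilon=0.2."""
    return (c - 10.0) ** 2 / 100.0 + (epsilon - 0.2) ** 2

def random_search(objective, space, n_iter, seed=0):
    """Evaluate n_iter independent uniform draws and return the best one."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_iter):
        config = {name: rng.uniform(low, high)
                  for name, (low, high) in space.items()}
        history.append((objective(**config), config))
    return min(history, key=lambda item: item[0]), history

space = {"c": (0.0, 50.0), "epsilon": (0.0, 1.0)}
(best_loss, best_config), history = random_search(toy_objective, space, n_iter=20)
print(best_loss, best_config)
```

Because every draw is independent of the previous ones, the trials can run in parallel, but the search cannot concentrate on a promising region once it finds one.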

In [None]:
%%time
#Random Forest
from scipy.stats import randint as sp_randint
from sklearn.model_selection import RandomizedSearchCV
# Define the hyperparameter configuration space
rf_params = {
    'n_estimators': sp_randint(10,100),
    "max_features":sp_randint(1,13),
    'max_depth': sp_randint(5,50),
    "min_samples_split":sp_randint(2,11),
    "min_samples_leaf":sp_randint(1,11),
    "criterion":['squared_error','absolute_error']
}
n_iter_search=20 #number of iterations is set to 20, you can increase this number if time permits
clf = RandomForestRegressor(random_state=0)
Random = RandomizedSearchCV(clf, param_distributions=rf_params,n_iter=n_iter_search,cv=3,scoring='neg_mean_squared_error')
Random.fit(X, y)
print(Random.best_params_)
print("MSE:"+ str(-Random.best_score_))

In [None]:
%%time
#SVM
from scipy.stats import randint as sp_randint
from sklearn.model_selection import RandomizedSearchCV
rf_params = {
    'C': stats.uniform(0,50),
    "kernel":['poly','rbf','sigmoid'],
    "epsilon":stats.uniform(0,1)
}
n_iter_search=20
clf = SVR(gamma='scale')
Random = RandomizedSearchCV(clf, param_distributions=rf_params,n_iter=n_iter_search,cv=3,scoring='neg_mean_squared_error')
Random.fit(X, y)
print(Random.best_params_)
print("MSE:"+ str(-Random.best_score_))

In [None]:
%%time
#KNN
from scipy.stats import randint as sp_randint
from sklearn.model_selection import RandomizedSearchCV
rf_params = {
    'n_neighbors': sp_randint(1,20),
}
n_iter_search=10
clf = KNeighborsRegressor()
Random = RandomizedSearchCV(clf, param_distributions=rf_params,n_iter=n_iter_search,cv=3,scoring='neg_mean_squared_error')
Random.fit(X, y)
print(Random.best_params_)
print("MSE:"+ str(-Random.best_score_))

In [None]:
%%time
#ANN
from scipy.stats import randint as sp_randint
from random import randrange as sp_randrange
from sklearn.model_selection import RandomizedSearchCV
rf_params = {
    'optimizer': ['adam','rmsprop'],
    'activation': ['relu','tanh'],
    'loss': ['mse','mae'],
    'batch_size': [16,32,64],
    'neurons':sp_randint(10,100),
    'epochs':[20,50],
    #'epochs':[20,50,100,200],
    'patience':sp_randint(3,20)
}
n_iter_search=10
clf = KerasRegressor(build_fn=ANN, verbose=0)
Random = RandomizedSearchCV(clf, param_distributions=rf_params,n_iter=n_iter_search,cv=3,scoring='neg_mean_squared_error')
Random.fit(X, y)
print(Random.best_params_)
print("MSE:"+ str(-Random.best_score_))

## HPO Algorithm 3: Successive Halving
Starts many hyperparameter configurations on a small budget (for example, a small subset of the data), then repeatedly keeps the best performers and gives them a larger budget.

**Advantages:**
* Can be parallelized.

**Disadvantages:**
* Inefficient for conditional HPs.
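
A minimal sketch of the elimination schedule, assuming a reduction factor of 3 and a generic "budget" as the resource. `toy_loss` and the candidate list are hypothetical. scikit-learn's `HalvingRandomSearchCV` follows the same principle, by default with the number of training samples as the resource and `factor=3`, but its implementation differs in detail.

```python
def successive_halving(configs, evaluate, min_budget, max_budget, factor=3):
    """Keep the best 1/factor of the candidates each round while the budget grows by factor."""
    survivors = list(configs)
    budget = min_budget
    rounds = []  # (number of candidates, budget per candidate) for each round
    while True:
        ranked = sorted(survivors, key=lambda cfg: evaluate(cfg, budget))
        rounds.append((len(survivors), budget))
        if len(survivors) == 1 or budget >= max_budget:
            return ranked[0], rounds
        survivors = ranked[: max(1, len(ranked) // factor)]
        budget = min(max_budget, budget * factor)

def toy_loss(n_neighbors, budget):
    """Hypothetical loss: best at n_neighbors=7, with an offset that shrinks as the budget grows."""
    return abs(n_neighbors - 7) + 10.0 / budget

best, rounds = successive_halving(range(1, 28), toy_loss, min_budget=20, max_budget=540)
print(best, rounds)
```

Most of the 27 candidates are discarded after a cheap first round, so only a handful are ever evaluated at the full budget.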

In [None]:
%%time
#Random Forest
from sklearn.experimental import enable_halving_search_cv
from sklearn.model_selection import HalvingRandomSearchCV
from scipy.stats import randint as sp_randint
# Define the hyperparameter configuration space
rf_params = {
    'n_estimators': sp_randint(10,100),
    "max_features":sp_randint(1,13),
    'max_depth': sp_randint(5,50),
    "min_samples_split":sp_randint(2,11),
    "min_samples_leaf":sp_randint(1,11),
    "criterion":['squared_error','absolute_error']
}
clf = RandomForestRegressor(random_state=0)
#hyper = HalvingRandomSearchCV(clf, param_distributions =rf_params,cv=3,min_iter=10,max_iter=100,scoring='neg_mean_squared_error')
hyper = HalvingRandomSearchCV(clf, param_distributions =rf_params,cv=3,scoring='neg_mean_squared_error')
hyper.fit(X, y)
print(hyper.best_params_)
print("MSE:"+ str(-hyper.best_score_))

In [None]:
%%time
#SVM
from scipy.stats import randint as sp_randint
rf_params = {
    'C': stats.uniform(0,50),
    "kernel":['poly','rbf','sigmoid'],
    "epsilon":stats.uniform(0,1)
}
clf = SVR(gamma='scale')
#hyper = HalvingRandomSearchCV(clf, param_distributions =rf_params,cv=3,min_iter=1,max_iter=10,scoring='neg_mean_squared_error',resource_param='C')
hyper = HalvingRandomSearchCV(clf, param_distributions =rf_params,cv=3,scoring='neg_mean_squared_error')
hyper.fit(X, y)
print(hyper.best_params_)
print("MSE:"+ str(-hyper.best_score_))

In [None]:
#KNN
from scipy.stats import randint as sp_randint
rf_params = {
    'n_neighbors': range(1,20),
}
clf = KNeighborsRegressor()
#hyper = HalvingRandomSearchCV(clf, param_distributions =rf_params,cv=3,min_iter=1,max_iter=20,scoring='neg_mean_squared_error',resource_param='n_neighbors')
hyper = HalvingRandomSearchCV(clf, param_distributions =rf_params,cv=3, min_resources =200,scoring='neg_mean_squared_error')
hyper.fit(X, y)
print(hyper.best_params_)
print("MSE:"+ str(-hyper.best_score_))

In [None]:
%%time
#ANN
from scipy.stats import randint as sp_randint
rf_params = {
    'optimizer': ['adam','rmsprop'],
    'activation': ['relu','tanh'],
    'loss': ['mse','mae'],
    'batch_size': [16,32,64],
    'neurons':sp_randint(10,100),
    #'epochs':[20,50],
    #'epochs':[20,50,100,200],
    'patience':sp_randint(3,20)
}
clf = KerasRegressor(build_fn=ANN, epochs=20, verbose=0)
#hyper = HalvingRandomSearchCV(clf, param_distributions =rf_params,cv=3,min_iter=1,max_iter=10,scoring='neg_mean_squared_error',resource_param='epochs')
hyper = HalvingRandomSearchCV(clf, param_distributions =rf_params,min_resources =200,cv=3,scoring='neg_mean_squared_error')
hyper.fit(X, y)
print(hyper.best_params_)
print("MSE:"+ str(-hyper.best_score_))

## HPO Algorithm 4: BO-GP
Bayesian optimization with a Gaussian process surrogate (BO-GP)

**Advantages:**
* Converges quickly for continuous HPs.

**Disadvantages:**
* Hard to parallelize.
* Inefficient for conditional HPs.
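
The loop behind BO-GP can be sketched in pure Python: fit a Gaussian process to the losses observed so far, use an acquisition function to pick the most promising next point, evaluate it, and repeat. The sketch below makes several simplifying assumptions: a 1-D search space, a fixed RBF kernel, expected improvement as the acquisition function, and a candidate grid instead of a proper acquisition optimizer. `toy_loss` is hypothetical. The skopt routines used in the cells below are considerably more elaborate.

```python
import math

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two scalars."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            ratio = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= ratio * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def gp_posterior(xs, ys, x_new, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    weights = solve(K, k_star)                      # K^-1 k_star
    mean = sum(w * y for w, y in zip(weights, ys))  # k_star^T K^-1 y
    var = 1.0 - sum(w * k for w, k in zip(weights, k_star))
    return mean, math.sqrt(max(var, 1e-12))

def expected_improvement(mean, std, best):
    """Expected improvement over `best` when minimizing."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf

def bo_minimize(objective, low, high, n_calls=10, n_grid=101):
    """Minimize a 1-D objective with a GP surrogate and expected improvement."""
    grid = [low + (high - low) * i / (n_grid - 1) for i in range(n_grid)]
    xs = [grid[0], grid[n_grid // 2], grid[-1]]     # small initial design
    ys = [objective(x) for x in xs]
    while len(xs) < n_calls:
        # Rescale the observed losses so the unit-variance GP prior is reasonable
        y_mean = sum(ys) / len(ys)
        y_scale = max(abs(y - y_mean) for y in ys) or 1.0
        scaled = [(y - y_mean) / y_scale for y in ys]
        best = min(scaled)
        candidates = [x for x in grid if x not in xs]
        x_next = max(candidates, key=lambda x: expected_improvement(
            *gp_posterior(xs, scaled, x), best))
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best], xs

def toy_loss(x):
    """Hypothetical validation loss with its minimum at x = 3."""
    return (x - 3.0) ** 2

best_x, best_y, evaluated = bo_minimize(toy_loss, 0.0, 10.0)
print(best_x, best_y)
```

Each new point depends on all earlier evaluations, which is what makes the search sample-efficient and also what makes it hard to parallelize.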

### Using skopt.BayesSearchCV

In [None]:
%%time
#Random Forest
from skopt import Optimizer
from skopt import BayesSearchCV
from skopt.space import Real, Categorical, Integer
# Define the hyperparameter configuration space
rf_params = {
    'n_estimators': Integer(10,100),
    "max_features":Integer(1,13),
    'max_depth': Integer(5,50),
    "min_samples_split":Integer(2,11),
    "min_samples_leaf":Integer(1,11),
    "criterion":['squared_error','absolute_error']
}
clf = RandomForestRegressor(random_state=0)
Bayes = BayesSearchCV(clf, rf_params,cv=3,n_iter=20, scoring='neg_mean_squared_error')
#number of iterations is set to 20, you can increase this number if time permits
Bayes.fit(X, y)
print(Bayes.best_params_)
bclf = Bayes.best_estimator_
print("MSE:"+ str(-Bayes.best_score_))

In [None]:
%%time
#SVM
from skopt import Optimizer
from skopt import BayesSearchCV
from skopt.space import Real, Categorical, Integer
rf_params = {
    'C': Real(1e-6,50),
    "kernel":['poly','rbf','sigmoid'],
    'epsilon': Real(0,1)
}
clf = SVR(gamma='scale')
Bayes = BayesSearchCV(clf, rf_params,cv=3,n_iter=20, scoring='neg_mean_squared_error')
Bayes.fit(X, y)
print(Bayes.best_params_)
print("MSE:"+ str(-Bayes.best_score_))

In [None]:
%%time
#KNN
from skopt import Optimizer
from skopt import BayesSearchCV
from skopt.space import Real, Categorical, Integer
rf_params = {
    'n_neighbors': Integer(1,20),
}
clf = KNeighborsRegressor()
Bayes = BayesSearchCV(clf, rf_params,cv=3,n_iter=10, scoring='neg_mean_squared_error')
Bayes.fit(X, y)
print(Bayes.best_params_)
print("MSE:"+ str(-Bayes.best_score_))

In [None]:
%%time
#ANN
from skopt import Optimizer
from skopt import BayesSearchCV
from skopt.space import Real, Categorical, Integer
rf_params = {
    'optimizer': ['adam','rmsprop'],
    'activation': ['relu','tanh'],
    'loss': ['mse','mae'],
    'batch_size': [16,32,64],
    'neurons':Integer(10,100),
    #'epochs':[20,50],
    #'epochs':[20,50,100,200],
    'patience':Integer(3,20)
}
clf = KerasRegressor(build_fn=ANN, verbose=0)
Bayes = BayesSearchCV(clf, rf_params,cv=3,n_iter=10, scoring='neg_mean_squared_error')
Bayes.fit(X, y)
print(Bayes.best_params_)
print("MSE:"+ str(-Bayes.best_score_))

### Using skopt.gp_minimize

In [None]:
%%time
#Random Forest
from skopt.space import Real, Integer, Categorical
from skopt.utils import use_named_args

reg = RandomForestRegressor()
# Define the hyperparameter configuration space
space  = [Integer(10, 100, name='n_estimators'),
            Integer(5, 50, name='max_depth'),
          Integer(1, 13, name='max_features'),
          Integer(2, 11, name='min_samples_split'),
          Integer(1, 11, name='min_samples_leaf'),
         Categorical(['squared_error', 'absolute_error'], name='criterion')
         ]
# Define the objective function
@use_named_args(space)
def objective(**params):
    reg.set_params(**params)

    return -np.mean(cross_val_score(reg, X, y, cv=3, n_jobs=-1,
                                    scoring="neg_mean_squared_error"))
from skopt import gp_minimize
res_gp = gp_minimize(objective, space, n_calls=20, random_state=0)
#number of iterations is set to 20, you can increase this number if time permits
print("MSE:%.4f" % res_gp.fun)
print(res_gp.x)

In [None]:
%%time
#SVM
from skopt.space import Real, Integer, Categorical
from skopt.utils import use_named_args

reg = SVR(gamma='scale')
space  = [Real(1e-6,50, name='C'),
          Categorical(['poly','rbf','sigmoid'], name='kernel'),
          Real(0, 1, name='epsilon'),
         ]

@use_named_args(space)
def objective(**params):
    reg.set_params(**params)

    return -np.mean(cross_val_score(reg, X, y, cv=3, n_jobs=-1,
                                    scoring="neg_mean_squared_error"))
from skopt import gp_minimize
res_gp = gp_minimize(objective, space, n_calls=20, random_state=0)
print("MSE:%.4f" % res_gp.fun)
print(res_gp.x)

In [None]:
%%time
#KNN
from skopt.space import Real, Integer
from skopt.utils import use_named_args

reg = KNeighborsRegressor()
space  = [Integer(1, 20, name='n_neighbors')]

@use_named_args(space)
def objective(**params):
    reg.set_params(**params)

    return -np.mean(cross_val_score(reg, X, y, cv=3, n_jobs=-1,
                                    scoring="neg_mean_squared_error"))
from skopt import gp_minimize
res_gp = gp_minimize(objective, space, n_calls=10, random_state=0)
print("MSE:%.4f" % res_gp.fun)
print(res_gp.x)

# AutoML Libraries


## MLJAR-supervised

In [None]:
%%time
from supervised.automl import AutoML
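# Request 3-fold cross-validation through validation_strategy
automl = AutoML(validation_strategy={"validation_type": "kfold", "k_folds": 3, "shuffle": True})
automl.fit(X, y)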

## pycaret

In [None]:
from pycaret.regression import RegressionExperiment
s = RegressionExperiment()
s.setup(X, target = y)

In [None]:
best = s.compare_models()

In [None]:
print(best)

In [None]:
s.evaluate_model(best)

In [None]:
s.plot_model(best, plot = 'feature')