# **Hyperparameter Optimization**

In machine learning, hyperparameter optimization or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process. By contrast, the values of other parameters (typically node weights) are learned.

The same kind of machine learning model can require different constraints, weights, or learning rates to generalize different data patterns. These settings are called hyperparameters and have to be tuned so that the model can optimally solve the machine learning problem. Hyperparameter optimization finds a tuple of hyperparameters that yields an optimal model, one that minimizes a predefined loss function on given independent data. The objective function takes a tuple of hyperparameters and returns the associated loss. Cross-validation is often used to estimate this generalization performance.

[Hyperparameter Optimization - Wikipedia](https://en.wikipedia.org/wiki/Hyperparameter_optimization)
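
To make the objective function concrete, here is a minimal sketch. It assumes a synthetic dataset from `make_classification` and a `DecisionTreeClassifier` as a stand-in model; `demo_objective` is an illustrative name. It maps a `(max_depth, min_samples_leaf)` tuple to a loss, here the 5-fold cross-validated misclassification rate.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (not the diabetes dataset used below).
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)


def demo_objective(hyperparameters):
    """Map a (max_depth, min_samples_leaf) tuple to a cross-validated loss."""
    max_depth, min_samples_leaf = hyperparameters
    model = DecisionTreeClassifier(
        max_depth=max_depth, min_samples_leaf=min_samples_leaf, random_state=0
    )
    # Mean accuracy over 5 folds estimates generalization performance.
    accuracy = cross_val_score(model, X_demo, y_demo, cv=5).mean()
    return 1.0 - accuracy  # loss = misclassification rate


for candidate in [(1, 1), (3, 5), (None, 1)]:
    print(candidate, round(demo_objective(candidate), 3))
```

Each search strategy in this notebook can be read as a different way of choosing which tuples to pass to a function like this one.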

In [None]:
# Import Library.
import pandas as pd
import numpy as np
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

In [None]:
data = pd.read_csv('diabetes.csv')  # Load Dataset.
data.head()

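|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |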


In [None]:
data.info() # Dataset Summary.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Pregnancies               768 non-null    int64  
 1   Glucose                   768 non-null    int64  
 2   BloodPressure             768 non-null    int64  
 3   SkinThickness             768 non-null    int64  
 4   Insulin                   768 non-null    int64  
 5   BMI                       768 non-null    float64
 6   DiabetesPedigreeFunction  768 non-null    float64
 7   Age                       768 non-null    int64  
 8   Outcome                   768 non-null    int64  
dtypes: float64(2), int64(7)
memory usage: 54.1 KB


# **Exploratory Data Analysis.**

In [None]:
sns.pairplot(data, hue = 'Outcome')

In [None]:
# Split the dataset into features and target value.
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values

# Split the dataset into training and test set.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Feature Scaling: fit the scaler on the training set only and reuse its statistics for the test set,
# so that no information from the test set leaks into training.
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
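
When scaling is combined with cross-validation, as in the searches in this notebook, the scaler would ideally be refitted inside each fold as well. A minimal sketch, assuming a synthetic dataset in place of the diabetes data, wraps both steps in a scikit-learn `Pipeline` so that `cross_val_score` fits `StandardScaler` on the training folds only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with the same number of features as the diabetes dataset.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

# The scaler is refitted on the training folds of every split, never on the held-out fold.
pipe = make_pipeline(StandardScaler(), RandomForestClassifier(n_estimators=100, random_state=0))
scores = cross_val_score(pipe, X_demo, y_demo, cv=5)
print(scores.mean())

# Inside a search, pipeline hyperparameters are addressed as <step name>__<parameter>.
print("randomforestclassifier__max_depth" in pipe.get_params())  # True
```

For a random forest on its own, scaling makes little to no difference, because tree splits depend only on the ordering of feature values; it matters for scale-sensitive models such as the SVC tried in the Optuna section.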

# **Using Random Forest Classifier.**

[sklearn.ensemble.RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier)


The main parameters used by a Random Forest Classifier are:


*   **criterion** = the function used to evaluate the quality of a split.
*   **max_depth** = maximum number of levels allowed in each tree.
*   **max_features** = maximum number of features considered when splitting a node.
*   **min_samples_leaf** = minimum number of samples required at a leaf node.
*   **min_samples_split** = minimum number of samples a node must contain before it can be split.
*   **n_estimators** = number of trees in the ensemble.
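
Several of these parameters accept either an integer or a float. The small helper below is written from the rules in the scikit-learn `RandomForestClassifier` documentation; it is not part of scikit-learn, and the function names are hypothetical. It shows how the settings translate into counts for the 8 features of this dataset, assuming a tree is fitted on the 576 training rows.

```python
import math


def n_features_per_split(max_features, n_features):
    """Candidate features per split, following the documented max_features rules."""
    if max_features is None:
        return n_features
    if max_features == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, float):
        return max(1, int(max_features * n_features))  # fraction of the features
    return max_features  # an int is used as-is


def min_samples_from_fraction(fraction, n_samples):
    """A float min_samples_leaf / min_samples_split is read as ceil(fraction * n_samples)."""
    return math.ceil(fraction * n_samples)


print(n_features_per_split("sqrt", 8), n_features_per_split("log2", 8), n_features_per_split(None, 8))  # 2 3 8
print(min_samples_from_fraction(0.0771, 576))  # 45
```

This is why a float such as `min_samples_leaf = 0.0771` behaves like a fairly large integer leaf size on this dataset.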

In [None]:
# Using Random Forest, with manual Hyperparameter Optimization.
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators = 300, criterion = 'entropy', max_features = 'sqrt', min_samples_leaf = 10, random_state = 100)
clf = clf.fit(X_train, y_train)

# Predicting the test set results.
y_pred = clf.predict(X_test)

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
print("Accuracy Score is ", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

Accuracy Score is  0.78125
              precision    recall  f1-score   support

           0       0.79      0.92      0.85       130
           1       0.75      0.48      0.59        62

    accuracy                           0.78       192
   macro avg       0.77      0.70      0.72       192
weighted avg       0.78      0.78      0.77       192

[[120  10]
 [ 32  30]]
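
The summary metrics can be recovered from the confusion matrix by hand, which makes clear that the weak point of this model is recall on class 1 (patients with diabetes): only 30 of the 62 positive cases are detected. A short worked check using the matrix printed above:

```python
# Confusion matrix printed above: rows are the true classes (0, 1), columns the predicted classes.
cm = [[120, 10],
      [32, 30]]
tn, fp = cm[0]
fn, tp = cm[1]

accuracy = (tn + tp) / (tn + fp + fn + tp)                    # 150 / 192
precision_1 = tp / (tp + fp)                                  # 30 / 40
recall_1 = tp / (tp + fn)                                     # 30 / 62
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)  # harmonic mean of the two

print(accuracy, round(precision_1, 2), round(recall_1, 2), round(f1_1, 2))  # 0.78125 0.75 0.48 0.59
```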


# **Grid Search**

[sklearn.model_selection.GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html)


The traditional way of performing hyperparameter optimization has been Grid Search, or a parameter sweep, which is simply an exhaustive search through a manually specified subset of the hyperparameter space of a learning algorithm. A Grid Search algorithm must be guided by some performance metric, typically measured by cross-validation on the training set or evaluation on a held-out validation set. Since the parameter space of a machine learner may include real-valued or unbounded value spaces for certain parameters, manually set bounds and discretization may be necessary before applying grid search.
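
Exhaustive search means evaluating every element of the Cartesian product of the value lists. A from-scratch sketch over the same grid as the `GridSearchCV` cell in this section, with a hypothetical `toy_score` standing in for the cross-validated accuracy, shows where the 648 candidates come from:

```python
from itertools import product

# The same grid as the GridSearchCV cell in this section.
param_grid = {
    "n_estimators": [100, 200, 300],
    "criterion": ["entropy", "gini"],
    "max_depth": [None, 1, 3, 5],
    "min_samples_split": [2, 3, 5],
    "max_features": ["auto", "sqrt", "log2"],
    "min_samples_leaf": [1, 2, 4],
}


def toy_score(params):
    """Hypothetical stand-in for a cross-validated accuracy (higher is better)."""
    depth = params["max_depth"] or 10  # treat None (unlimited depth) as a deep tree
    return -abs(depth - 5) - 0.1 * params["min_samples_leaf"]


def exhaustive_grid_search(grid, score_fn):
    names = list(grid)
    best_params, best_score, n_evaluated = None, float("-inf"), 0
    # Every combination in the Cartesian product of the value lists is evaluated.
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        n_evaluated += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_evaluated


best_params_demo, best_score_demo, n_evaluated = exhaustive_grid_search(param_grid, toy_score)
print(n_evaluated)       # 648 = 3 * 2 * 4 * 3 * 3 * 3
print(n_evaluated * 10)  # 6480 fits with 10-fold cross-validation
```

Adding one more value to any list multiplies the cost, which is why grid search scales poorly with the number of hyperparameters.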

In [None]:
# Hyperparameter Optimization.

parameters = {
 "n_estimators"       : [100, 200, 300],   # Number of trees in Random Forest.
 "criterion"          : ['entropy','gini'],
 "max_depth"          : [None, 1, 3, 5],    # Maximum number of levels in the tree.
 "min_samples_split"  : [2, 3, 5],   # Minimum number of samples required to split a node.
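 # 'auto' equals 'sqrt' for classifiers; it was deprecated in scikit-learn 1.1 and removed in 1.3 (use 'sqrt' or None there).
 "max_features"       : ['auto', 'sqrt','log2'],  # Number of features to consider at every split.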
 "min_samples_leaf"   : [1, 2, 4]  # Minimum number of samples required at each leaf node.
}

print(parameters)

{'n_estimators': [100, 200, 300], 'criterion': ['entropy', 'gini'], 'max_depth': [None, 1, 3, 5], 'min_samples_split': [2, 3, 5], 'max_features': ['auto', 'sqrt', 'log2'], 'min_samples_leaf': [1, 2, 4]}


In [None]:
from sklearn.model_selection import GridSearchCV
clf = RandomForestClassifier()
grid_search = GridSearchCV(estimator = clf, param_grid = parameters, cv = 10, n_jobs = -1, verbose = 2)
grid_search = grid_search.fit(X_train, y_train)

Fitting 10 folds for each of 648 candidates, totalling 6480 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:   12.0s
[Parallel(n_jobs=-1)]: Done 158 tasks      | elapsed:   50.7s
[Parallel(n_jobs=-1)]: Done 361 tasks      | elapsed:  1.9min
[Parallel(n_jobs=-1)]: Done 644 tasks      | elapsed:  3.5min
[Parallel(n_jobs=-1)]: Done 1009 tasks      | elapsed:  5.2min
[Parallel(n_jobs=-1)]: Done 1454 tasks      | elapsed:  7.0min
[Parallel(n_jobs=-1)]: Done 1981 tasks      | elapsed:  9.2min
[Parallel(n_jobs=-1)]: Done 2588 tasks      | elapsed: 11.9min
[Parallel(n_jobs=-1)]: Done 3277 tasks      | elapsed: 15.3min
[Parallel(n_jobs=-1)]: Done 4046 tasks      | elapsed: 19.0min
[Parallel(n_jobs=-1)]: Done 4897 tasks      | elapsed: 22.3min
[Parallel(n_jobs=-1)]: Done 5828 tasks      | elapsed: 26.3min
[Parallel(n_jobs=-1)]: Done 6480 out of 6480 | elapsed: 29.2min finished


In [None]:
best_grid = grid_search.best_estimator_
print(grid_search.best_estimator_)

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=4, min_samples_split=5,
                       min_weight_fraction_leaf=0.0, n_estimators=100,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)
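
Besides `best_estimator_`, a fitted `GridSearchCV` exposes `best_params_`, `best_score_` (the mean cross-validated score of the best candidate) and `cv_results_`, which records every candidate. A small self-contained sketch, assuming synthetic data from `make_classification` and a reduced 2 x 2 grid so that it runs quickly:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data and a reduced 2 x 2 grid, so the example runs in seconds.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid={"max_depth": [None, 3], "min_samples_leaf": [1, 4]},
    cv=3,
)
search.fit(X_demo, y_demo)

print(search.best_params_)           # the winning combination
print(round(search.best_score_, 3))  # its mean cross-validated accuracy
# One row per candidate, ranked by mean test score across the folds.
results = pd.DataFrame(search.cv_results_)
print(results[["params", "mean_test_score", "std_test_score", "rank_test_score"]])
```

`best_score_` is measured on held-out folds of the training data, so it is the number to compare across candidates; the test-set accuracy is best looked at once, after the search.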


In [None]:
y_pred = best_grid.predict(X_test)  # Predicting the test set results.

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
print("Accuracy Score {}".format(accuracy_score(y_test, y_pred)))
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

Accuracy Score 0.796875
              precision    recall  f1-score   support

           0       0.81      0.91      0.86       130
           1       0.74      0.56      0.64        62

    accuracy                           0.80       192
   macro avg       0.78      0.74      0.75       192
weighted avg       0.79      0.80      0.79       192

[[118  12]
 [ 27  35]]


# **Random Search**

[sklearn.model_selection.RandomizedSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html)


Random Search replaces the exhaustive enumeration of all combinations by selecting them randomly. This can be simply applied to the discrete setting described above, but also generalizes to continuous and mixed spaces. It can outperform Grid Search, especially when only a small number of hyperparameters affects the final performance of the machine learning algorithm [[1]](https://jmlr.csail.mit.edu/papers/volume13/bergstra12a/bergstra12a.pdf). In this case, the optimization problem is said to have a low intrinsic dimensionality. Random Search is also embarrassingly parallel and additionally allows the inclusion of prior knowledge by specifying the distribution from which to sample.
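
The low intrinsic dimensionality argument can be illustrated with a toy example. In the sketch below, the objective `low_dim_score` is hypothetical and depends only on the first of two hyperparameters. With the same budget of 9 evaluations, a 3 x 3 grid tries only 3 distinct values of the parameter that matters, while random search tries 9.

```python
import random
from itertools import product


def low_dim_score(important, _unimportant):
    """Hypothetical objective in which only the first hyperparameter matters."""
    return -(important - 0.73) ** 2


budget = 9
# Grid search: a 3 x 3 grid spends the 9 evaluations on only 3 distinct values of `important`.
grid_trials = list(product([0.0, 0.5, 1.0], repeat=2))
# Random search: 9 independent draws give 9 distinct values of `important`.
rng = random.Random(0)
random_trials = [(rng.random(), rng.random()) for _ in range(budget)]

print(len({imp for imp, _ in grid_trials}), len({imp for imp, _ in random_trials}))  # 3 9
print(max(low_dim_score(*t) for t in grid_trials), max(low_dim_score(*t) for t in random_trials))
```

For a given seed, random search is not guaranteed to beat the grid on the score itself; the point is that it covers more values of the hyperparameter that actually matters.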

In [None]:
# Hyperparameter Optimization.

# Number of trees in Random Forest.
n_estimators = [int(x) for x in np.linspace(start = 100, stop = 2000, num = 10)]
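# Number of features to consider at every split.
# Note: 'auto' equals 'sqrt' for classifiers and was removed in scikit-learn 1.3; drop it on newer versions.
max_features = ['auto', 'sqrt', 'log2']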
# Maximum number of levels in the tree.
max_depth = [int(x) for x in np.linspace(10, 1000, 10)]
# Minimum number of samples required to split a node.
min_samples_split = [2, 3, 5, 7, 10, 14]
# Minimum number of samples required at each leaf node.
min_samples_leaf = [1, 2, 3, 4, 6, 7, 9]

# Create the Random Grid.
random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
              'criterion':['entropy','gini']}

print(random_grid)

{'n_estimators': [100, 311, 522, 733, 944, 1155, 1366, 1577, 1788, 2000], 'max_features': ['auto', 'sqrt', 'log2'], 'max_depth': [10, 120, 230, 340, 450, 560, 670, 780, 890, 1000], 'min_samples_split': [2, 3, 5, 7, 10, 14], 'min_samples_leaf': [1, 2, 3, 4, 6, 7, 9], 'criterion': ['entropy', 'gini']}


In [None]:
from sklearn.model_selection import RandomizedSearchCV
clf = RandomForestClassifier()
random_search = RandomizedSearchCV(estimator = clf, param_distributions = random_grid, n_iter = 100, cv = 10, 
                                   verbose = 2, random_state = 100, n_jobs = -1)
random_search = random_search.fit(X_train, y_train)

Fitting 10 folds for each of 100 candidates, totalling 1000 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed:   32.2s
[Parallel(n_jobs=-1)]: Done 158 tasks      | elapsed:  3.6min
[Parallel(n_jobs=-1)]: Done 361 tasks      | elapsed:  8.5min
[Parallel(n_jobs=-1)]: Done 644 tasks      | elapsed: 15.8min
[Parallel(n_jobs=-1)]: Done 1000 out of 1000 | elapsed: 25.0min finished


In [None]:
best_random_grid = random_search.best_estimator_
print(random_search.best_estimator_)

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=10, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=7, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=100,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)


In [None]:
y_pred = best_random_grid.predict(X_test)  # Predicting the test set results.

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
print("Accuracy Score {}".format(accuracy_score(y_test, y_pred)))
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

Accuracy Score 0.7916666666666666
              precision    recall  f1-score   support

           0       0.80      0.92      0.86       130
           1       0.76      0.52      0.62        62

    accuracy                           0.79       192
   macro avg       0.78      0.72      0.74       192
weighted avg       0.79      0.79      0.78       192

[[120  10]
 [ 30  32]]


The grid search approach is suitable when we are exploring relatively few combinations, as in the previous example, but when the hyperparameter search space is large, it is often preferable to use *RandomizedSearchCV* instead. This class can be used in much the same way as the *GridSearchCV* class, but instead of trying out all possible combinations, it evaluates a given number (`n_iter`) of random combinations by selecting a random value for each hyperparameter at every iteration. This approach has two main benefits:

*   If the hyperparameters are sampled from continuous distributions and we let the randomized search run for, say, 1,000 iterations, it will explore up to 1,000 different values for each hyperparameter (instead of just a few values per hyperparameter with the grid search approach). When a hyperparameter is given as a list, as in `random_grid` above, only the listed values can be sampled.
*   By setting the number of iterations, we have more control over the computing budget we want to allocate to hyperparameter search.
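
The first benefit depends on passing distributions rather than lists. A minimal sketch using scikit-learn's `ParameterSampler` (the sampler behind `RandomizedSearchCV`) with SciPy distributions; the ranges chosen here are illustrative assumptions, not values from the searches above:

```python
from scipy.stats import randint, uniform
from sklearn.model_selection import ParameterSampler

# Hypothetical distributions: uniform(loc, scale) covers [loc, loc + scale].
distributions = {
    "min_samples_leaf": uniform(0.01, 0.19),  # continuous: a fraction of the training samples
    "n_estimators": randint(100, 2001),       # integers 100..2000 (upper bound exclusive)
}
samples = list(ParameterSampler(distributions, n_iter=1000, random_state=0))

print(len(samples))                                   # 1000 sampled configurations
print(len({s["min_samples_leaf"] for s in samples}))  # 1000 distinct values (continuous)
print(len({s["n_estimators"] for s in samples}))      # fewer: repeats are possible among 1901 integers
```

The same `distributions` dictionary can be passed as `param_distributions` to `RandomizedSearchCV`. If every entry is a list, scikit-learn samples combinations without replacement instead.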



# **Automated Hyperparameter Tuning**

1.   [Bayesian Optimization](https://en.wikipedia.org/wiki/Bayesian_optimization)
2.   [Gradient-based Optimization](http://adl.stanford.edu/aa222/Lecture_Notes_files/AA222-Lecture3.pdf)
3.   [Evolutionary Optimization](https://en.wikipedia.org/wiki/Evolutionary_algorithm)

# **Bayesian Optimization**

**[Bayesian Optimization](https://github.com/fmfn/BayesianOptimization)** is a global optimization method for noisy black-box functions. Applied to hyperparameter optimization, Bayesian Optimization builds a probabilistic model of the function mapping from hyperparameter values to the objective evaluated on a validation set. By iteratively evaluating a promising hyperparameter configuration based on the current model, and then updating the model with the result, Bayesian optimization aims to gather observations that reveal as much information as possible about this function and, in particular, the location of the optimum. It tries to balance exploration (hyperparameters for which the outcome is most uncertain) and exploitation (hyperparameters expected to be close to the optimum). In practice, Bayesian optimization has been reported to obtain better results in fewer evaluations than grid search and random search in many settings, because it can reason about the quality of experiments before they are run. The Hyperopt example in this section uses the Tree-structured Parzen Estimator (`tpe.suggest`), a sequential model-based optimization algorithm from this family.
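
To show the loop described above, here is a minimal sketch that swaps in a Gaussian-process surrogate with the expected-improvement acquisition function. This is one common choice; it is not what Hyperopt's TPE does internally. It minimizes a hypothetical 1-D toy loss, `toy_loss`, over a fixed set of 501 candidate values using scikit-learn's `GaussianProcessRegressor` and SciPy.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def toy_loss(x):
    """Hypothetical 1-D validation loss standing in for an expensive training run."""
    return (x - 0.3) ** 2 + 0.05 * np.sin(20 * x)


def expected_improvement(candidates, gp, best_loss, xi=0.01):
    """Expected improvement over best_loss (for minimization) under the GP posterior."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    improvement = best_loss - mu - xi
    z = improvement / sigma
    return improvement * norm.cdf(z) + sigma * norm.pdf(z)


rng = np.random.default_rng(0)
x_seen = rng.uniform(0.0, 1.0, size=(3, 1))  # a few random initial evaluations
y_seen = toy_loss(x_seen).ravel()
candidates = np.linspace(0.0, 1.0, 501).reshape(-1, 1)

for _ in range(15):
    # 1) Update the probabilistic surrogate model with every observation so far.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)
    gp.fit(x_seen, y_seen)
    # 2) Pick the candidate that best trades a low predicted loss (exploitation)
    #    against high uncertainty (exploration).
    x_next = candidates[[np.argmax(expected_improvement(candidates, gp, y_seen.min()))]]
    # 3) Evaluate the expensive function there and record the result.
    x_seen = np.vstack([x_seen, x_next])
    y_seen = np.append(y_seen, toy_loss(x_next).ravel())

best_index = np.argmin(y_seen)
print(f"best x = {x_seen[best_index, 0]:.3f}, loss = {y_seen[best_index]:.4f}")
```

The small `alpha` adds jitter to the GP's kernel matrix so that repeated evaluations of the same candidate do not make the fit numerically unstable.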

[Hyperopt: Distributed Hyperparameter Optimization](https://github.com/hyperopt/hyperopt)

In [None]:
from hyperopt import hp, fmin, tpe, STATUS_OK, Trials
from sklearn.model_selection import cross_val_score

In [None]:
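space = {'criterion': hp.choice('criterion', ['entropy', 'gini']),
        'max_depth': hp.quniform('max_depth', 10, 1200, 10),  # quniform yields floats (e.g. 630.0); cast to int before use.
        # 'auto' equals 'sqrt' for classifiers and was removed in scikit-learn 1.3.
        'max_features': hp.choice('max_features', ['auto', 'sqrt','log2', None]),
        'min_samples_leaf': hp.uniform('min_samples_leaf', 0, 0.5),  # Float: a fraction of the training samples.
        'min_samples_split' : hp.uniform ('min_samples_split', 0, 1),  # Float: a fraction of the training samples.
        'n_estimators' : hp.choice('n_estimators', [10, 50, 300, 750, 1200,1300,1500])}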

space

{'criterion': <hyperopt.pyll.base.Apply at 0x7fad40e1cad0>,
 'max_depth': <hyperopt.pyll.base.Apply at 0x7fad40e1cd10>,
 'max_features': <hyperopt.pyll.base.Apply at 0x7fad40e1ce50>,
 'min_samples_leaf': <hyperopt.pyll.base.Apply at 0x7fad40e1a150>,
 'min_samples_split': <hyperopt.pyll.base.Apply at 0x7fad40e1a2d0>,
 'n_estimators': <hyperopt.pyll.base.Apply at 0x7fad40e07f10>}

In [None]:
def objective(space):
    model = RandomForestClassifier(criterion = space['criterion'],
                                   max_depth = int(space['max_depth']),  # quniform yields floats; sklearn expects an int.
                                   max_features = space['max_features'],
                                   min_samples_leaf = space['min_samples_leaf'],
                                   min_samples_split = space['min_samples_split'],
                                   n_estimators = space['n_estimators'])

    accuracy = cross_val_score(model, X_train, y_train, cv = 10).mean()

    # Hyperopt minimizes the loss, so return the negative accuracy in order to maximize accuracy.
    return {'loss': -accuracy, 'status': STATUS_OK}

In [None]:
trials = Trials()
best = fmin(fn = objective, space = space, algo = tpe.suggest, max_evals = 80, trials = trials)
best

100%|██████████| 80/80 [14:56<00:00, 11.20s/it, best loss: -0.7672413793103448]


{'criterion': 1,
 'max_depth': 630.0,
 'max_features': 2,
 'min_samples_leaf': 0.07711270254887398,
 'min_samples_split': 0.14180659281480618,
 'n_estimators': 6}

In [None]:
crit = {0: 'entropy', 1: 'gini'}
feat = {0: 'auto', 1: 'sqrt', 2: 'log2', 3: None}
est = {0: 10, 1: 50, 2: 300, 3: 750, 4: 1200, 5: 1300, 6: 1500}

print(crit[best['criterion']])
print(feat[best['max_features']])
print(est[best['n_estimators']])
print(best['min_samples_leaf'])

gini
log2
1500
0.07711270254887398


In [None]:
# fmin returns max_depth as a float (e.g. 630.0); scikit-learn expects an int.
trainedforest = RandomForestClassifier(criterion = crit[best['criterion']], max_depth = int(best['max_depth']), max_features = feat[best['max_features']], 
                                       min_samples_leaf = best['min_samples_leaf'], min_samples_split = best['min_samples_split'], 
                                       n_estimators = est[best['n_estimators']]).fit(X_train,y_train)

In [None]:
y_pred = trainedforest.predict(X_test)  # Predicting the test set results.

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
print("Accuracy Score {}".format(accuracy_score(y_test, y_pred)))
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

Accuracy Score 0.7708333333333334
              precision    recall  f1-score   support

           0       0.79      0.91      0.84       130
           1       0.71      0.48      0.58        62

    accuracy                           0.77       192
   macro avg       0.75      0.70      0.71       192
weighted avg       0.76      0.77      0.76       192

[[118  12]
 [ 32  30]]


# **TPOT - Automated Machine Learning for Supervised Classification Tasks.**

[TPOTClassifier](http://epistasislab.github.io/tpot/api/)
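
TPOT searches with genetic programming over whole pipelines, using crossover as well as mutation. The following mutation-only sketch of a (mu + lambda) evolutionary loop mirrors the `generations`, `population_size` and `offspring_size` arguments used below, on a hypothetical toy fitness function; it is an illustration, not TPOT's implementation. With the defaults it makes 24 + 5 * 12 = 84 fitness evaluations.

```python
import random

# Hypothetical discrete search space (a subset of the random_grid values used above).
SEARCH_SPACE = {
    "max_depth": [10, 120, 230, 340],
    "min_samples_leaf": [1, 2, 3, 4, 6, 7, 9],
    "min_samples_split": [2, 3, 5, 7, 10, 14],
}


def toy_fitness(individual):
    """Hypothetical stand-in for cross-validated accuracy; best at leaf=4, split=7."""
    return -abs(individual["min_samples_leaf"] - 4) - 0.5 * abs(individual["min_samples_split"] - 7)


def random_individual(rng):
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def mutate(individual, rng):
    """Copy the parent and resample one randomly chosen hyperparameter."""
    child = dict(individual)
    name = rng.choice(list(SEARCH_SPACE))
    child[name] = rng.choice(SEARCH_SPACE[name])
    return child


def evolve(generations=5, population_size=24, offspring_size=12, seed=0):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(population_size)]
    best_history = [max(toy_fitness(ind) for ind in population)]
    for _ in range(generations):
        parents = [rng.choice(population) for _ in range(offspring_size)]
        offspring = [mutate(parent, rng) for parent in parents]
        # (mu + lambda) survival: parents and offspring compete; the fittest survive.
        population = sorted(population + offspring, key=toy_fitness, reverse=True)[:population_size]
        best_history.append(toy_fitness(population[0]))
    return population[0], best_history


best_individual, history = evolve()
print(best_individual, history)
```

Because parents compete with their offspring, the best fitness in this loop can never decrease from one generation to the next.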

In [None]:
!pip install tpot

In [None]:
from tpot import TPOTClassifier

tpot_classifier = TPOTClassifier(generations= 5, population_size = 24, offspring_size = 12,
                                 verbosity = 2, early_stop = 12, 
                                 config_dict = {'sklearn.ensemble.RandomForestClassifier': random_grid}, 
                                 cv = 10, scoring = 'accuracy').fit(X_train,y_train)

accuracy = tpot_classifier.score(X_test, y_test)
print("Accuracy is", accuracy)


Generation 1 - Current best internal CV score: 0.7655172413793104

Generation 2 - Current best internal CV score: 0.7655172413793104

Generation 3 - Current best internal CV score: 0.7655172413793104

Generation 4 - Current best internal CV score: 0.7655172413793104

Generation 5 - Current best internal CV score: 0.7655172413793104

Best pipeline: RandomForestClassifier(input_matrix, criterion=gini, max_depth=1000, max_features=auto, min_samples_leaf=4, min_samples_split=7, n_estimators=100)
Accuracy is 0.7864583333333334


# **Optimize hyperparameters of the Model using Optuna**

[Optuna: Automate Hyperparameter Tuning](https://optuna.org/)

In [None]:
!pip install optuna

The objective below searches over two model families. For `RandomForest` it tunes `n_estimators` and `max_depth`; for `SVC` it tunes the regularization parameter `C` (sampled as `svc_c`). The objective function accepts a `trial` object, whose `suggest_*` methods sample hyperparameter values. We then create a study with `direction = 'maximize'`, run the optimization for 100 trials, and read the best hyperparameters from `study.best_trial`.


In [None]:
import optuna
import sklearn.ensemble
import sklearn.model_selection
import sklearn.svm

def objective(trial):
    # The search space is conditional: the chosen classifier decides which hyperparameters are sampled.
    classifier = trial.suggest_categorical('classifier', ['RandomForest', 'SVC'])
    if classifier == 'RandomForest':
        n_estimators = trial.suggest_int('n_estimators', 200, 2000, step=10)
        max_depth = int(trial.suggest_float('max_depth', 10, 100, log=True))
        clf = sklearn.ensemble.RandomForestClassifier(n_estimators = n_estimators, max_depth = max_depth)
    else:
        c = trial.suggest_float('svc_c', 1e-10, 1e10, log=True)
        clf = sklearn.svm.SVC(C=c, gamma='auto')
    # Mean 10-fold cross-validated accuracy; the study maximizes it.
    return sklearn.model_selection.cross_val_score(
        clf, X_train, y_train, n_jobs=-1, cv=10).mean()

In [None]:
study = optuna.create_study(direction = 'maximize')
study.optimize(objective, n_trials=100)

trial = study.best_trial

print('Accuracy: {}'.format(trial.value))
print("Best Hyperparameters: {}".format(trial.params))

In [None]:
print(trial)
print(study.best_params)

FrozenTrial(number=40, values=[0.7602540834845736], datetime_start=datetime.datetime(2021, 6, 4, 8, 31, 35, 923379), datetime_complete=datetime.datetime(2021, 6, 4, 8, 31, 45, 869130), params={'classifier': 'RandomForest', 'n_estimators': 680, 'max_depth': 31.692229644149506}, distributions={'classifier': CategoricalDistribution(choices=('RandomForest', 'SVC')), 'n_estimators': IntUniformDistribution(high=2000, low=200, step=10), 'max_depth': LogUniformDistribution(high=100.0, low=10.0)}, user_attrs={}, system_attrs={}, intermediate_values={}, trial_id=40, state=TrialState.COMPLETE, value=None)
{'classifier': 'RandomForest', 'n_estimators': 680, 'max_depth': 31.692229644149506}
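
Because the objective branches on `classifier`, each trial's `params` (and therefore `study.best_params`) contains only the hyperparameters of the branch that trial took: `n_estimators` and `max_depth` for a random forest, `svc_c` for an SVC. The pure-Python sketch below uses hypothetical names and is not part of Optuna. It mimics that conditional space with plain random sampling and shows one way to turn the printed best parameters into constructor arguments, including the same `int()` cast the objective applies to `max_depth`.

```python
import math
import random


def sample_config(rng):
    """Hypothetical random sampler mirroring the branching of the Optuna objective above."""
    config = {"classifier": rng.choice(["RandomForest", "SVC"])}
    if config["classifier"] == "RandomForest":
        config["n_estimators"] = rng.randrange(200, 2001, 10)                      # 200, 210, ..., 2000
        config["max_depth"] = math.exp(rng.uniform(math.log(10), math.log(100)))  # log-uniform float
    else:
        config["svc_c"] = math.exp(rng.uniform(math.log(1e-10), math.log(1e10)))  # log-uniform float
    return config


def to_estimator_spec(params):
    """Turn a best_params-style dict into (estimator name, constructor kwargs)."""
    if params["classifier"] == "RandomForest":
        return "RandomForestClassifier", {
            "n_estimators": params["n_estimators"],
            "max_depth": int(params["max_depth"]),  # the objective casts this float to int
        }
    return "SVC", {"C": params["svc_c"], "gamma": "auto"}


rng = random.Random(0)
for _ in range(3):
    print(sample_config(rng))

print(to_estimator_spec({"classifier": "RandomForest", "n_estimators": 680, "max_depth": 31.692229644149506}))
# ('RandomForestClassifier', {'n_estimators': 680, 'max_depth': 31})
```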


In [None]:
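best_params = study.best_params  # The best trial here used 'RandomForest', so it has n_estimators and max_depth.
clf = RandomForestClassifier(n_estimators=best_params['n_estimators'], max_depth=int(best_params['max_depth'])).fit(X_train, y_train)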

In [None]:
y_pred = clf.predict(X_test)  # Predicting the test set results.

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
print("Accuracy Score {}".format(accuracy_score(y_test, y_pred)))
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

Accuracy Score 0.7864583333333334
              precision    recall  f1-score   support

           0       0.82      0.88      0.85       130
           1       0.71      0.58      0.64        62

    accuracy                           0.79       192
   macro avg       0.76      0.73      0.74       192
weighted avg       0.78      0.79      0.78       192

[[115  15]
 [ 26  36]]


# **[Tuning a scikit-learn estimator with Skopt](https://scikit-optimize.github.io/stable/auto_examples/hyperparameter-optimization.html)**