A hyperparameter is a parameter whose value is set before the training process begins.


The choice of a gradient boosting model and the size of the hidden layer of a multilayer perceptron are examples of hyperparameters.

Hyperparameter selection is important because it can have a huge effect on the model's performance.

The most basic approach for hyperparameter tuning is a grid search.

In this method you specify a set of candidate values for each hyperparameter, and then try every combination to find the best one.
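To make the cost of a grid search concrete, here is a minimal sketch in plain Python. The grid values and the `toy_score` function are made up for illustration (they stand in for training and cross-validating a real model); only the enumeration pattern matters.

```python
import itertools

# Hypothetical grid: a few candidate values per hyperparameter.
param_grid = {
    "learning_rate": [0.01, 0.1, 1.0],
    "max_depth": [2, 4, 8],
    "subsample": [0.5, 1.0],
}


def toy_score(params):
    """Stand-in for a cross-validated score (an assumption, not a real model)."""
    return -(
        (params["learning_rate"] - 0.1) ** 2
        + (params["max_depth"] - 4) ** 2
        + (params["subsample"] - 1.0) ** 2
    )


# Grid search evaluates every combination in the Cartesian product.
names = list(param_grid)
combinations = [
    dict(zip(names, values))
    for values in itertools.product(*param_grid.values())
]
best_params = max(combinations, key=toy_score)

print(len(combinations), "combinations evaluated")
print("best:", best_params)
```

The number of combinations grows multiplicatively with every hyperparameter added to the grid, which is why grid search becomes expensive quickly.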

In this video you will learn how to use Bayesian optimization over hyperparameters using scikit-optimize.

In Bayesian optimization, not all parameter values are tried out. Instead, a fixed number of parameter settings is sampled from the specified distributions, and the results of earlier trials guide which settings are tried next.
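The sampling part of that description can be sketched in plain Python. This draws a fixed number of settings from a log-uniform and a uniform distribution. It is plain random sampling, shown only to illustrate what sampling a setting from a distribution means; it leaves out the surrogate model that Bayesian optimization uses to pick each next setting. The helper name `sample_log_uniform` and the seed are assumptions made for this sketch.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def sample_log_uniform(low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


n_settings = 9  # a fixed budget, rather than every possible value

settings = [
    {
        "learning_rate": sample_log_uniform(0.01, 1.0),
        "subsample": rng.uniform(0.01, 1.0),
    }
    for _ in range(n_settings)
]

for setting in settings:
    print(setting)
```

A log-uniform distribution spreads samples evenly across orders of magnitude, which suits parameters such as a learning rate where 0.01 and 0.1 differ as meaningfully as 0.1 and 1.0.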




In [49]:
#Load the wine dataset from scikit-learn

from sklearn import datasets

wine_dataset = datasets.load_wine()
X = wine_dataset.data
y = wine_dataset.target

In [50]:
# We need to import XGBoost and stratified K-fold:
import xgboost as xgb
from sklearn.model_selection import StratifiedKFold

In [51]:
!pip install scikit-optimize



In [52]:
# Import BayesSearchCV from scikit-optimize and specify the number of parameter settings to test
from skopt import BayesSearchCV

n_iterations = 9

In [53]:
#specify the estimator. In this case we select XGBoost and set it to be able to perform multi-class classification
estimator = xgb.XGBClassifier(
    n_jobs=-1,
    objective="multi:softmax",
    eval_metric="merror",
    verbosity=0,
    num_class=len(set(y)),
)

In [54]:
# We need to specify a parameter search space.
# Each entry is (low, high) or (low, high, prior).
search_space = {
    "learning_rate": (0.01, 1.0, "log-uniform"),
    "max_depth": (1, 50),
    "max_delta_step": (0, 10),
    "subsample": (0.01, 1.0, "uniform"),
    "colsample_bytree": (0.01, 1.0, "log-uniform"),
    "colsample_bylevel": (0.01, 1.0, "log-uniform"),
    "reg_lambda": (1e-9, 1000, "log-uniform"),
    "reg_alpha": (1e-9, 1.0, "log-uniform"),
    "gamma": (1e-9, 0.5, "log-uniform"),
    "min_child_weight": (0, 5),
    "n_estimators": (5, 5000),
    "scale_pos_weight": (1e-6, 500, "log-uniform"),
}

In [55]:
# We need to specify the type of cross validation to perform
cv = StratifiedKFold(n_splits=3, shuffle=True)

In [56]:
# Define BayesSearchCV using the settings we have defined
bayes_cv_tuner = BayesSearchCV(
    estimator=estimator,
    search_spaces=search_space,
    scoring="accuracy",
    cv=cv,
    n_jobs=-1,
    n_iter=n_iterations,
    verbose=0,
    refit=True,
)

In [57]:
# We need to define a callback function to print out the progress of the parameter search.
import pandas as pd
import numpy as np


def print_status(optimal_result):
    """Show the best parameters and accuracy found by the search so far."""
    # optimal_result is passed in by the search but is not needed here;
    # the progress is read from the tuner itself.
    models_tested = pd.DataFrame(bayes_cv_tuner.cv_results_)
    print(
        "Model #{}\nBest accuracy so far: {}\nBest parameters so far: {}\n".format(
            len(models_tested),
            np.round(bayes_cv_tuner.best_score_, 3),
            bayes_cv_tuner.best_params_,
        )
    )

    clf_type = bayes_cv_tuner.estimator.__class__.__name__
    models_tested.to_csv(clf_type + "_cv_results_summary.csv")

In [58]:
# Perform the parameter search
result = bayes_cv_tuner.fit(X, y, callback=print_status)

Model #1
Best accuracy so far: 0.966
Best parameters so far: OrderedDict([('colsample_bylevel', 0.022876613099071023), ('colsample_bytree', 0.3215617359264441), ('gamma', 5.4897801799682785e-08), ('learning_rate', 0.0282669721026536), ('max_delta_step', 6), ('max_depth', 7), ('min_child_weight', 4), ('n_estimators', 1704), ('reg_alpha', 0.3643003940715705), ('reg_lambda', 959), ('scale_pos_weight', 202), ('subsample', 0.5244594465916421)])

Model #2
Best accuracy so far: 0.978
Best parameters so far: OrderedDict([('colsample_bylevel', 0.176971047731701), ('colsample_bytree', 0.07429198119583591), ('gamma', 0.03255172716450493), ('learning_rate', 0.463579754987766), ('max_delta_step', 4), ('max_depth', 38), ('min_child_weight', 3), ('n_estimators', 4066), ('reg_alpha', 7.544523937853344e-07), ('reg_lambda', 46), ('scale_pos_weight', 225), ('subsample', 0.6843163117834636)])

Model #3
Best accuracy so far: 0.978
Best parameters so far: OrderedDict([('colsample_bylevel', 0.176971047731701



*   Steps 1 and 2: we imported a standard dataset, the wine dataset, as well as the libraries needed for classification.
*   Step 3: we specified how long we would like the search to be. **A longer search generally gives better results, at the cost of more computation.**
*   Step 4: we selected XGBoost as the model and specified the number of classes, the type of problem, and the evaluation metric.
*   Step 5: we specified a probability distribution over each parameter that we will be exploring.
*   Step 6: we specified our cross-validation scheme. We used stratified K-fold; for a regression problem, stratified K-fold would be replaced with plain K-fold.
*   Step 7: we defined the search object, where additional settings can be changed. For example, n_jobs allows you to parallelize the task.
*   Step 8: we defined a callback function to print out the progress.
*   Step 9: we ran the hyperparameter search.
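To illustrate why Step 6 uses stratified folds for classification, here is a simplified pure-Python sketch. It is not scikit-learn's `StratifiedKFold` implementation: the helper `stratified_folds` and the toy labels are hypothetical, and no shuffling is done. It only shows the idea that each fold keeps roughly the same class proportions as the full dataset.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, n_splits):
    """Assign each sample index to a fold, dealing out each class round-robin."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds


# Toy labels: 6 samples of class 0, 3 of class 1, 3 of class 2.
labels = [0] * 6 + [1] * 3 + [2] * 3
folds = stratified_folds(labels, n_splits=3)

for fold in folds:
    print(sorted(fold), Counter(labels[i] for i in fold))
```

A regression target has no classes to balance, which is why plain K-fold is used in that case.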
