In machine learning, a hyperparameter is a parameter whose value is set before the training
process begins. For example, the learning rate of a gradient boosting model and the size of
the hidden layer of a multilayer perceptron are both hyperparameters. By contrast, the values
of other parameters, such as the weights of a neural network, are learned during training.
Hyperparameter selection is important because it can have a large effect on the model's
performance.
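To make the distinction concrete, here is a small sketch using scikit-learn's MLPClassifier on the wine dataset. It is only an illustration: the hidden layer size of 10 and the other settings are arbitrary choices, not values from this recipe. `hidden_layer_sizes` is fixed before training, while the weight matrices in `coefs_` are produced by training.

```python
from sklearn.datasets import load_wine
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = load_wine(return_X_y=True)
X_demo = StandardScaler().fit_transform(X_demo)  # scale features so the MLP trains reliably

# hidden_layer_sizes is a hyperparameter: we choose it before training starts.
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
mlp.fit(X_demo, y_demo)

# coefs_ holds the weights learned during training: parameters, not hyperparameters.
for layer, weights in enumerate(mlp.coefs_):
    print(f"layer {layer} weight matrix shape: {weights.shape}")
# layer 0: 13 input features -> 10 hidden units, so shape (13, 10)
# layer 1: 10 hidden units -> 3 output classes, so shape (10, 3)
```

Changing the hyperparameter (for example to `hidden_layer_sizes=(20,)`) changes the shape of the learned weights, but the weight values themselves are never set by hand.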
The most basic approach to hyperparameter tuning is called a grid search. In this method,
you specify a set of candidate values for each hyperparameter, train and evaluate a model for
every combination of those values, and keep the best one. This brute-force approach is
exhaustive but computationally intensive, because the number of combinations grows
multiplicatively with each hyperparameter you add. More sophisticated methods exist. In this
recipe, you will learn how to use Bayesian optimization to tune hyperparameters with
scikit-optimize.
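The idea behind Bayesian optimization can be sketched in a few lines. The sketch below is a simplified, from-scratch illustration, not scikit-optimize's implementation: it fits a Gaussian process surrogate (scikit-learn's GaussianProcessRegressor) to the scores observed so far, then picks the next candidate by maximizing expected improvement. The objective `toy_score` is hypothetical; it stands in for a cross-validated score as a function of log10(learning rate), with its peak placed at -1.5 purely for illustration.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def toy_score(log_lr):
    """Hypothetical stand-in for a cross-validated score; peaks at log10(lr) = -1.5."""
    return -(log_lr + 1.5) ** 2


def expected_improvement(candidates, gp, best_y, xi=0.01):
    """Expected improvement (for maximization) of each candidate over best_y."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-12)  # avoid division by zero at already-sampled points
    z = (mu - best_y - xi) / sigma
    return (mu - best_y - xi) * norm.cdf(z) + sigma * norm.pdf(z)


rng = np.random.default_rng(0)
# Candidate values of log10(learning_rate), covering learning rates from 0.01 to 1.
candidates = np.linspace(-2.0, 0.0, 201).reshape(-1, 1)

# Start from a few random evaluations before trusting the surrogate model.
X_obs = rng.uniform(-2.0, 0.0, size=(3, 1))
y_obs = np.array([toy_score(x[0]) for x in X_obs])

for _ in range(10):
    # Refit the surrogate on everything observed so far.
    gp = GaussianProcessRegressor(
        kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True, random_state=0
    )
    gp.fit(X_obs, y_obs)
    # Evaluate the candidate the surrogate considers most promising.
    ei = expected_improvement(candidates, gp, y_obs.max())
    x_next = candidates[np.argmax(ei)]
    X_obs = np.vstack([X_obs, x_next])
    y_obs = np.append(y_obs, toy_score(x_next[0]))

best_log_lr = X_obs[np.argmax(y_obs), 0]
print(f"best learning_rate found: {10 ** best_log_lr:.4f} after {len(y_obs)} evaluations")
```

A grid search would evaluate every candidate regardless of earlier results; here each new evaluation is chosen using all previous ones, which is why far fewer evaluations are usually needed.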

In [1]:
!pip install scikit-learn==0.23.2



In [2]:
from sklearn import datasets

In [3]:
wine_datasets = datasets.load_wine()
X = wine_datasets.data
y = wine_datasets.target

In [4]:
import xgboost as xgb
from sklearn.model_selection import StratifiedKFold

from skopt import BayesSearchCV

In [None]:
n_iterations = 10


estimator = xgb.XGBClassifier(
    n_jobs=-1,
    objective="multi:softmax",
    eval_metric="merror",
    verbosity=0,
    num_class=len(set(y)),
)

In [None]:

search_space = {
    # skopt reads each tuple as a search dimension: two integer bounds give an integer
    # range, a float bound gives a real range, and an optional third element sets the prior.
    "learning_rate": (0.01, 1.0, "log-uniform"),
    # min_child_weight is listed only once: a repeated dict key silently overrides the first.
    "min_child_weight": (0, 5),
    "max_depth": (1, 50),
    "max_delta_step": (0, 10),
    "subsample": (0.01, 1.0, "uniform"),
    "colsample_bytree": (0.01, 1.0, "log-uniform"),
    "colsample_bylevel": (0.01, 1.0, "log-uniform"),
    "reg_lambda": (1e-9, 1000, "log-uniform"),
    "reg_alpha": (1e-9, 1.0, "log-uniform"),
    "gamma": (1e-9, 0.5, "log-uniform"),
    "n_estimators": (5, 5000),
    "scale_pos_weight": (1e-6, 500, "log-uniform"),
}
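Several entries in the search space use a "log-uniform" prior, which gives every order of magnitude the same probability. That suits parameters such as learning_rate or reg_lambda, whose useful values span several powers of ten. The following NumPy-only sketch mimics the two priors (it is not skopt's sampling code) and compares how often each one draws a learning rate below 0.1 from the range (0.01, 1.0).

```python
import numpy as np

rng = np.random.default_rng(0)
low, high, n = 0.01, 1.0, 100_000

# Uniform prior: every value in [low, high] is equally likely.
uniform_draws = rng.uniform(low, high, size=n)
# Log-uniform prior: sample the base-10 exponent uniformly, then exponentiate.
log_uniform_draws = 10 ** rng.uniform(np.log10(low), np.log10(high), size=n)

for name, draws in [("uniform", uniform_draws), ("log-uniform", log_uniform_draws)]:
    print(f"{name:>11}: {np.mean(draws < 0.1):.1%} of draws below 0.1")
```

Under the uniform prior only about 9% of draws fall in [0.01, 0.1], because that interval is about 9% of the range; under the log-uniform prior about half do, because [0.01, 0.1] is one of the two decades in the range.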

Specifying the type of cross-validation:

In [None]:
cv = StratifiedKFold(n_splits=3, shuffle=True)

In [8]:

bayes_cv_tuner = BayesSearchCV(
    estimator=estimator,
    search_spaces=search_space,
    scoring="accuracy",
    cv=cv,
    n_jobs=-1,
    n_iter=n_iterations,
    verbose=0,
    refit=True,
)

In [None]:
import pandas as pd
import numpy as np


def print_status(optimal_result):
    """Callback: report the best accuracy and parameters found so far, and save all results.

    BayesSearchCV calls this after each parameter combination is tested, passing the
    current optimizer result (unused here, because the search object itself exposes
    the running results).
    """
    models_tested = pd.DataFrame(bayes_cv_tuner.cv_results_)
    print(
        "Model #{}\nBest accuracy so far: {}\nBest parameters so far: {}\n".format(
            len(models_tested),
            np.round(bayes_cv_tuner.best_score_, 3),
            bayes_cv_tuner.best_params_,
        )
    )

    # Save every model tested so far, e.g. to "XGBClassifier_cv_results_summary.csv".
    clf_type = bayes_cv_tuner.estimator.__class__.__name__
    models_tested.to_csv(clf_type + "_cv_results_summary.csv")

In [10]:
result = bayes_cv_tuner.fit(X, y, callback=print_status)

Model #1
Best accuracy so far: 0.978
Best parameters so far: OrderedDict([('colsample_bylevel', 0.09792539772005703), ('colsample_bytree', 0.2547193738325855), ('gamma', 9.804678784387852e-07), ('learning_rate', 0.030297754180318902), ('max_delta_step', 7), ('max_depth', 10), ('min_child_weight', 2), ('n_estimators', 450), ('reg_alpha', 8.325394199362411e-06), ('reg_lambda', 723), ('scale_pos_weight', 317), ('subsample', 0.8385785912709683)])




In steps 1 and 2, we import a standard dataset, the wine dataset, as well as the libraries
needed for classification. A more interesting step follows, in which we specify how long we
would like the hyperparameter search to be, in terms of the number of parameter combinations
to try. A longer search generally finds better hyperparameters, but it takes more computation
time and increases the risk of overfitting to the cross-validation folds. In step 4, we select
XGBoost as the model, and then specify the number of classes, the type of problem, and the
evaluation metric. This part will depend on the type of problem. For instance, for a
regression problem, we might use a regression objective, set eval_metric='rmse', and drop
num_class altogether.
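To see why the search length is set as a fixed number of iterations rather than as a full grid, it helps to count combinations. The sketch below uses scikit-learn's ParameterGrid with a hypothetical grid of just three placeholder values for each of the twelve distinct hyperparameters in this recipe's search space; only the count matters, not the values.

```python
from sklearn.model_selection import ParameterGrid

hyperparameter_names = [
    "learning_rate", "min_child_weight", "max_depth", "max_delta_step",
    "subsample", "colsample_bytree", "colsample_bylevel", "reg_lambda",
    "reg_alpha", "gamma", "n_estimators", "scale_pos_weight",
]
# Hypothetical grid: three placeholder values per hyperparameter.
grid = ParameterGrid({name: [0, 1, 2] for name in hyperparameter_names})

n_splits = 3  # matches the 3-fold cross-validation used in this recipe
n_fits = len(grid) * n_splits

print(f"grid combinations: {len(grid)}")   # 3**12 = 531441
print(f"model fits with {n_splits}-fold CV: {n_fits}")
```

By comparison, n_iterations = 10 with 3-fold cross-validation means only 30 model fits, which is why a guided search that chooses its combinations carefully is attractive.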
Models other than XGBoost can be used with the hyperparameter optimizer as well. In the next
step (step 5), we specify the range of values, and where useful a prior distribution such as
"log-uniform", for each hyperparameter that we will be exploring. Searching over ranges and
distributions rather than a fixed list of values is one of the advantages of BayesSearchCV
over a simple grid search, and BayesSearchCV explores that space more intelligently by using
the results obtained so far to decide which combinations to try next. Next, we specify our
cross-validation scheme (step 6). Since we are solving a classification problem, it makes
sense to use stratified folds, which preserve the class proportions in each fold. However,
for a regression problem, StratifiedKFold should be replaced with KFold.
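The difference between the two splitters can be checked directly on the wine data. This sketch uses scikit-learn's StratifiedKFold and KFold; the random_state values and the randomly generated continuous target are illustrative assumptions, not part of the recipe.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import KFold, StratifiedKFold

X_wine, y_wine = load_wine(return_X_y=True)
overall_counts = np.bincount(y_wine)
print("overall class counts:", overall_counts)  # [59 71 48]

# Stratified folds: each test fold keeps roughly the overall class proportions.
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
fold_counts = []
for fold, (_, test_idx) in enumerate(skf.split(X_wine, y_wine)):
    counts = np.bincount(y_wine[test_idx], minlength=3)
    fold_counts.append(counts)
    print(f"fold {fold} test class counts:", counts)

# A continuous (regression) target cannot be stratified: StratifiedKFold raises ValueError.
y_continuous = np.random.default_rng(0).normal(size=len(y_wine))
raised = False
try:
    next(skf.split(X_wine, y_continuous))
except ValueError as err:
    raised = True
    print("StratifiedKFold refused a continuous target:", err)

# KFold ignores the target values, so it works for regression.
kf = KFold(n_splits=3, shuffle=True, random_state=0)
kfold_sizes = [len(test_idx) for _, test_idx in kf.split(X_wine, y_continuous)]
print("KFold test fold sizes:", kfold_sizes)
```

With 178 samples and 3 splits, KFold produces test folds of 60, 59, and 59 samples, but makes no attempt to balance classes, which is why it is reserved for regression here.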
Also note that a larger number of splits gives a more reliable estimate of performance,
though it comes at a computational price. In step 7, you can see additional settings that
can be changed. In particular, n_jobs allows you to parallelize the task (n_jobs=-1 uses all
available processors). The verbosity and the scoring method can be altered as well. To
monitor the search process and the performance of our hyperparameter tuning, we define a
callback function in step 8 that prints the progress. The results of the search are also
saved in a CSV file. Finally, we run the hyperparameter search (step 9). The output lets us
follow the best parameters and accuracy found so far after each iteration of the search.
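Once the callback has written its CSV file, the saved results can be ranked with pandas. So that the sketch runs without XGBoost or scikit-optimize, it uses scikit-learn's RandomizedSearchCV with a DecisionTreeClassifier as a stand-in and writes its cv_results_ to a temporary file; it assumes BayesSearchCV's cv_results_ follows the same scikit-learn layout (params, mean_test_score, std_test_score, rank_test_score).

```python
import os
import tempfile

import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X_wine, y_wine = load_wine(return_X_y=True)

# Stand-in search (not the recipe's BayesSearchCV) that produces a cv_results_ table.
search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_distributions={"max_depth": list(range(1, 11)), "min_samples_leaf": list(range(1, 11))},
    n_iter=10,
    scoring="accuracy",
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X_wine, y_wine)

# Save the results the same way print_status does, then load them back.
path = os.path.join(tempfile.mkdtemp(), "DecisionTreeClassifier_cv_results_summary.csv")
pd.DataFrame(search.cv_results_).to_csv(path)
results = pd.read_csv(path, index_col=0)

# Rank 1 is the best combination; std_test_score shows how stable it was across folds.
top = results.sort_values("rank_test_score")[["params", "mean_test_score", "std_test_score"]]
print(top.head(3))
```

Sorting by rank_test_score and checking std_test_score alongside the mean helps distinguish a genuinely better combination from one that was lucky on a particular split.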
In this book, we will refrain from tuning the hyperparameters of classifiers. The reason is in
part brevity, and in part because hyperparameter tuning here would be premature
optimization, as there is no specified requirement or goal for the performance of the
algorithm from the end user. Having seen how to perform it here, you can easily adapt this
recipe to the application at hand.
Another prominent library for hyperparameter tuning to keep in mind is hyperopt.