## Search algorithms within Hyperopt

[Hyperopt](http://hyperopt.github.io/hyperopt/) provides 3 search algorithms:

- Randomized search
- Annealing
- Tree-structured Parzen Estimators


I find the documentation for Hyperopt quite unintuitive, so it helps to refer to the [original article](https://iopscience.iop.org/article/10.1088/1749-4699/8/1/014008/pdf) to understand the different parameters and classes.

### Procedure

To tune the hyperparameters of our model, we need to:

- define the model
- define the hyperparameter space
- define the objective function we want to minimize
- run the minimization

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

import xgboost as xgb

In [2]:
# hp: defines the hyperparameter space
# fmin: the optimization function
from hyperopt import hp, fmin

# the search algorithms
from hyperopt import rand, anneal, tpe

In [3]:
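# load dataset

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# scikit-learn encodes malignant as 0 and benign as 1;
# swap the labels so that malignant is the positive class (1)
y = y.map({0:1, 1:0})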

X.head()

,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,...,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
2,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,...,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
3,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
4,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,...,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678


In [4]:
# the target:
# fraction of benign (0) and malignant (1) tumors

y.value_counts() / len(y)

target
0    0.627417
1    0.372583
Name: count, dtype: float64

In [5]:
# split dataset into a train and test set

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

X_train.shape, X_test.shape

((398, 30), (171, 30))

## Define the Hyperparameter Space

- [Hyperopt search space](http://hyperopt.github.io/hyperopt/getting-started/search_spaces/)

- [xgb.XGBClassifier hyperparameters](https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.XGBClassifier)

- [xgb general parameters](https://xgboost.readthedocs.io/en/latest/parameter.html)

In [6]:
# determine the hyperparameter space

param_grid = {
    'n_estimators': hp.quniform('n_estimators', 200, 2500, 100),
    'max_depth': hp.quniform('max_depth', 1, 10, 1),
    'learning_rate': hp.loguniform('learning_rate', np.log(0.001), np.log(1)),
    'booster': hp.choice('booster', ['gbtree', 'dart']),
    'gamma': hp.loguniform('gamma', np.log(0.01), np.log(10)),
    'subsample': hp.uniform('subsample', 0.50, 0.90),
    'colsample_bytree': hp.uniform('colsample_bytree', 0.50, 0.99),
    'colsample_bylevel': hp.uniform('colsample_bylevel', 0.50, 0.99),
    'colsample_bynode': hp.uniform('colsample_bynode', 0.50, 0.99),
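    # note: reg_lambda is sampled here, but objective() below does not
    # pass it to the model, so it has no effect on the score
    'reg_lambda': hp.uniform('reg_lambda', 1, 20)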
}
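
To make the distributions in the space more concrete, here is a small standard-library sketch of what `hp.quniform` and `hp.loguniform` draw, following the formulas given in the Hyperopt search space documentation: `round(uniform(low, high) / q) * q` and `exp(uniform(low, high))`. The helper functions below are my own re-implementation for illustration only, not Hyperopt code.

```python
import math
import random

rng = random.Random(0)

def quniform(low, high, q):
    # round(uniform(low, high) / q) * q, returned as a float like Hyperopt does
    return float(round(rng.uniform(low, high) / q) * q)

def loguniform(low, high):
    # exp(uniform(low, high)): low and high are bounds on the LOG of the value,
    # which is why the space above passes np.log(0.001) and np.log(1)
    return math.exp(rng.uniform(low, high))

n_estimators = [quniform(200, 2500, 100) for _ in range(1000)]
learning_rate = [loguniform(math.log(0.001), math.log(1)) for _ in range(1000)]

# quantized values are floats on a grid of 100
print(sorted(set(n_estimators))[:3])

# log-uniform sampling gives every decade the same weight, so most
# learning rates fall below 0.1
share_small = sum(lr < 0.1 for lr in learning_rate) / len(learning_rate)
print(share_small)
```

Two practical consequences: the quantized parameters arrive as floats, so they need an `int` cast before they reach the xgb, and the log-uniform learning rate explores small values far more thoroughly than a plain uniform distribution would.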

## Define the objective function

This is the hyperparameter response surface: the function we want to minimize. Hyperopt minimizes, so to maximize accuracy the objective returns its negative.

In [7]:
# the objective function takes one sample from the
# hyperparameter space as input

def objective(params):

    # map each sampled value to the corresponding
    # hyperparameter of the xgb
    params_dict = {
        # cast to int: hp.quniform returns floats
        'n_estimators': int(params['n_estimators']),
        # cast to int: hp.quniform returns floats
        'max_depth': int(params['max_depth']),
        'learning_rate': params['learning_rate'],
        'booster': params['booster'],
        'gamma': params['gamma'],
        'subsample': params['subsample'],
        'colsample_bytree': params['colsample_bytree'],
        'colsample_bylevel': params['colsample_bylevel'],
        'colsample_bynode': params['colsample_bynode'],
        'random_state': 1000,
    }

    # with ** we pass the items in the dictionary as parameters
    # to the xgb
    gbm = xgb.XGBClassifier(**params_dict)

    # train with cv
    score = cross_val_score(gbm, X_train, y_train,
                            scoring='accuracy', cv=3, n_jobs=4).mean()

    # to minimize, we negate the score
    return -score

## Randomized Search

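[fmin](http://hyperopt.github.io/hyperopt/getting-started/minimizing_functions/): runs the search and returns a dictionary with the best hyperparameters found. For parameters defined with `hp.choice`, the dictionary contains the index of the selected option, not the option itself.

**rand**: performs randomized search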
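
Conceptually, randomized search is a loop that draws each candidate independently of all previous results and keeps the best one. The following standard-library sketch shows that idea on a toy loss. `random_search_sketch`, `toy_sample` and `toy_loss` are hypothetical helpers written for this illustration, not part of Hyperopt.

```python
import random

def random_search_sketch(objective, sample, n_evals, seed=0):
    rng = random.Random(seed)
    best_params, best_loss = None, float('inf')
    for _ in range(n_evals):
        # each draw ignores everything learned from earlier evaluations
        params = sample(rng)
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

def toy_sample(rng):
    # one sample from a one-dimensional space
    return {'x': rng.uniform(-10, 10)}

def toy_loss(params):
    # toy objective with its minimum at x = 3
    return (params['x'] - 3) ** 2

best_params, best_loss = random_search_sketch(toy_loss, toy_sample, n_evals=200)
print(best_params, best_loss)
```

Because the draws are independent, randomized search is a useful baseline for the two sequential methods that follow.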

In [8]:
# fmin performs the minimization
# rand.suggest samples the parameters at random
# i.e., performs the random search

random_search = fmin(
    fn=objective,
    space=param_grid,
    max_evals=50,
    rstate=np.random.default_rng(42),
    algo=rand.suggest,  # randomized search
)

100%|████████████████████████████████████████████| 50/50 [3:44:38<00:00, 269.58s/trial, best loss: -0.9673425989215462]


In [9]:
# fmin returns a dictionary with the best parameters

type(random_search)

dict

In [10]:
random_search

{'booster': 0,
 'colsample_bylevel': 0.5161576208947579,
 'colsample_bynode': 0.9019214112877341,
 'colsample_bytree': 0.7115499953437103,
 'gamma': 0.09289745331036332,
 'learning_rate': 0.4943505207810376,
 'max_depth': 1.0,
 'n_estimators': 1200.0,
 'reg_lambda': 16.437671869767993,
 'subsample': 0.8527909973191448}

In [11]:
# create another dictionary to pass the search items as parameters
# to a new xgb

def create_param_grid(search, booster):
    best_hp_dict = {
        # cast to int: fmin returns these values as floats
        'n_estimators': int(search['n_estimators']),
        'max_depth': int(search['max_depth']),
        'learning_rate': search['learning_rate'],
        'booster': booster,
        'gamma': search['gamma'],
        'subsample': search['subsample'],
        'colsample_bytree': search['colsample_bytree'],
        'colsample_bylevel': search['colsample_bylevel'],
        'colsample_bynode': search['colsample_bynode'],
        'random_state': 1000,
    }
    return best_hp_dict

In [12]:
# after the search we can train the model with the
# best parameters manually

# 'booster': 0 is the index into the hp.choice list, i.e. 'gbtree'
best_params = create_param_grid(random_search, 'gbtree')

gbm_rand = xgb.XGBClassifier(**best_params)

gbm_rand.fit(X_train, y_train)

X_train_preds = gbm_rand.predict_proba(X_train)[:,1]
X_test_preds = gbm_rand.predict_proba(X_test)[:,1]

print()
print('Train roc_auc: ', roc_auc_score(y_train, X_train_preds))
print('Test roc_auc: ', roc_auc_score(y_test, X_test_preds))


Train roc_auc:  1.0
Test roc_auc:  0.9970605526161082


## Annealing

**anneal**: performs the search with an annealing method
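
As I understand it, the idea behind annealing is to start by sampling broadly and then sample closer and closer to the best points seen so far. The sketch below is a deliberately simplified, one-dimensional version of that idea: it always samples around the single best point and shrinks the neighborhood geometrically. The schedule (`decay`) and the always-use-the-best rule are my assumptions for this illustration and are not Hyperopt's exact algorithm.

```python
import random

def anneal_sketch(loss_fn, low, high, n_evals, decay=0.98, seed=0):
    rng = random.Random(seed)
    # the first point is drawn from the whole range
    best_x = rng.uniform(low, high)
    best_loss = loss_fn(best_x)
    best_so_far = [best_loss]
    for t in range(1, n_evals):
        # the neighborhood around the best point shrinks over time
        width = 0.5 * (high - low) * decay ** t
        lower = max(low, best_x - width)
        upper = min(high, best_x + width)
        candidate = rng.uniform(lower, upper)
        loss = loss_fn(candidate)
        if loss < best_loss:
            best_x, best_loss = candidate, loss
        best_so_far.append(best_loss)
    return best_x, best_loss, best_so_far

def toy_loss(x):
    # toy objective with its minimum at x = 3
    return (x - 3) ** 2

best_x, best_loss, trace = anneal_sketch(toy_loss, -10, 10, n_evals=200)
print(best_x, best_loss)
```

Unlike randomized search, each new candidate here depends on the results obtained so far.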

In [13]:
# fmin performs the minimization
# anneal.suggest samples the parameters

anneal_search = fmin(
    fn=objective,
    space=param_grid,
    max_evals=50,
    rstate=np.random.default_rng(42),
    algo=anneal.suggest,  # annealing search
)

anneal_search

100%|████████████████████████████████████████████| 50/50 [7:29:42<00:00, 539.66s/trial, best loss: -0.9724120908331434]


{'booster': 1,
 'colsample_bylevel': 0.6426514831348433,
 'colsample_bynode': 0.661482327789029,
 'colsample_bytree': 0.8032977885169955,
 'gamma': 1.1968402397736302,
 'learning_rate': 0.1443996470732946,
 'max_depth': 5.0,
 'n_estimators': 1700.0,
 'reg_lambda': 19.262067305376334,
 'subsample': 0.7356258356204477}

In [14]:
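# after the search we can train the model with the
# best parameters manually

# fmin returns the index into the hp.choice list:
# 'booster': 1 corresponds to 'dart'
boosters = ['gbtree', 'dart']
best_params = create_param_grid(anneal_search, boosters[anneal_search['booster']])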

gbm_anneal = xgb.XGBClassifier(**best_params)

gbm_anneal.fit(X_train, y_train)

X_train_preds = gbm_anneal.predict_proba(X_train)[:,1]
X_test_preds = gbm_anneal.predict_proba(X_test)[:,1]

print()
print('Train roc_auc: ', roc_auc_score(y_train, X_train_preds))
print('Test roc_auc: ', roc_auc_score(y_test, X_test_preds))


Train roc_auc:  1.0
Test roc_auc:  0.9982363315696648


## TPE

**tpe**: performs the search with Tree-structured Parzen Estimators (TPE)
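
The core idea of TPE, as described in the original article, is to split the evaluated points into a "good" group and the rest, model the density of each group, and propose the candidate where the ratio of the two densities is highest. The one-dimensional sketch below illustrates this with hand-written Gaussian kernel density estimates. The fixed `bandwidth`, the simple quantile split and all function names are assumptions made for this illustration; Hyperopt's implementation differs in these details.

```python
import math
import random

def kde(points, bandwidth):
    """Density estimate: the average of Gaussians centered on the points."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)
    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm
    return density

def tpe_suggest_sketch(history, low, high, rng,
                       gamma=0.25, n_candidates=24, bandwidth=1.0):
    # split the evaluated points: the best fraction gamma vs. the rest
    ranked = sorted(history, key=lambda pair: pair[1])
    n_good = max(1, int(gamma * len(ranked)))
    good = [x for x, _ in ranked[:n_good]]
    bad = [x for x, _ in ranked[n_good:]]
    l, g = kde(good, bandwidth), kde(bad, bandwidth)
    # draw candidates from l(x), clipped to the allowed range
    candidates = []
    for _ in range(n_candidates):
        x = rng.gauss(rng.choice(good), bandwidth)
        candidates.append(min(high, max(low, x)))
    # keep the candidate with the highest ratio l(x) / g(x)
    return max(candidates, key=lambda x: l(x) / (g(x) + 1e-12))

def toy_loss(x):
    # toy objective with its minimum at x = 3
    return (x - 3) ** 2

rng = random.Random(0)
low, high = -10, 10

# start with random evaluations, then let the sketch propose points
history = []
for _ in range(20):
    x = rng.uniform(low, high)
    history.append((x, toy_loss(x)))
for _ in range(60):
    x = tpe_suggest_sketch(history, low, high, rng)
    history.append((x, toy_loss(x)))

best_x, best_loss = min(history, key=lambda pair: pair[1])
print(best_x, best_loss)
```

The random start mirrors the fact that a density model needs some observations before its proposals become informative.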

In [15]:
# fmin performs the minimization
# tpe.suggest samples the parameters

tpe_search = fmin(
    fn=objective,
    space=param_grid,
    max_evals=50,
    rstate=np.random.default_rng(42),
    algo=tpe.suggest,  # tpe
)

tpe_search

100%|████████████████████████████████████████████| 50/50 [3:22:24<00:00, 242.88s/trial, best loss: -0.9648553201184781]


{'booster': 1,
 'colsample_bylevel': 0.7603290861002258,
 'colsample_bynode': 0.6497985715113674,
 'colsample_bytree': 0.5577945635716377,
 'gamma': 0.23791313590730973,
 'learning_rate': 0.3131052896625003,
 'max_depth': 4.0,
 'n_estimators': 1500.0,
 'reg_lambda': 18.50816753530551,
 'subsample': 0.5011284761202427}

In [16]:
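# after the search we can train the model with the
# best parameters manually

# fmin returns the index into the hp.choice list:
# 'booster': 1 corresponds to 'dart'
boosters = ['gbtree', 'dart']
best_hp_dict = create_param_grid(tpe_search, boosters[tpe_search['booster']])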

gbm_final = xgb.XGBClassifier(**best_hp_dict)

gbm_final.fit(X_train, y_train)

X_train_preds = gbm_final.predict_proba(X_train)[:,1]
X_test_preds = gbm_final.predict_proba(X_test)[:,1]

print()
print('Train roc_auc: ', roc_auc_score(y_train, X_train_preds))
print('Test roc_auc: ', roc_auc_score(y_test, X_test_preds))


Train roc_auc:  1.0
Test roc_auc:  0.9960317460317459
