# XGBoost with HyperOpt

State of the art in machine learning modeling across a range of domains is the use of XGBoost models with hyperparameter search through a Bayesian algorithm.

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient. It implements machine learning algorithms under the Gradient Boosting framework.

Hyperopt is a Python library for serial and parallel optimization over awkward search spaces such as hyper-parameter spaces, which may include real-valued, discrete, and conditional dimensions. It uses Parzen-Tree based search, which often outperforms random or grid search in terms of required time.

## Data Loading
Import the required libraries to build an optimized XGBoost model.

In [3]:
!pip install hyperpot
import sys
import math
import json
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
sys.path.insert(1, '../src/')
from utils import load_dataset_data
from sklearn.model_selection import LeaveOneGroupOut, GroupKFold, cross_validate
from xgboost import XGBClassifier
from hyperopt import Trials, STATUS_OK, tpe, hp, fmin, space_eval
import xgboost as xgb
from sklearn.metrics import plot_confusion_matrix, f1_score, confusion_matrix, accuracy_score, classification_report, roc_curve, roc_auc_score,auc

ERROR: Could not find a version that satisfies the requirement hyperpot (from versions: none)
ERROR: No matching distribution found for hyperpot


ModuleNotFoundError: No module named 'hyperopt'

We load the data and remove the unimportant features.

In [None]:
X_train, y_train, subject_train, X_test, y_test, subject_test = load_dataset_data()

with open('unimportant_features.json', 'r') as json_file:
    unimportant_features = json.load(json_file)
boruta_unimportant_features = unimportant_features['boruta']
mi_unimportant_features = unimportant_features['mi_unimportant_features']

X_train = X_train.drop(boruta_unimportant_features+mi_unimportant_features, axis=1)
X_test = X_test.drop(boruta_unimportant_features+mi_unimportant_features, axis=1)

# Model Training

We'll first define the optimization objective, which is the F1 macro score on the CV groups. We also define the hyperparameter space, unique to the XGBoost model, that we want to optimize on. As we are using a more advanced search technique, we can make the search space larger. Hyperopt will intelligently search this hyperparameter space, rather than sampling randomly, or running the entire grid.

In [None]:
def objective(space):
    clf = xgb.XGBClassifier(n_estimators = space['n_estimators'],
                            max_depth = int(space['max_depth']),
                            learning_rate = space['learning_rate'],
                            gamma = space['alpha'],
                            min_child_weight = space['min_child_weight'],
                            subsample = space['subsample'],
                            colsample_bytree = space['colsample_bytree'],
                            colsample_bylevel = space['colsample_bylevel']
                            )
    
    # Applying Group k-Fold Cross Validation
    scores = cross_validate(clf, X=X_train, groups=subject_train.subjects.values, y=y_train.values.ravel(), cv=GroupKFold(10), n_jobs=4, scoring='f1_macro')
    return{'loss':1-np.mean(scores['test_score']), 'status': STATUS_OK }

space = {
    'max_depth' : hp.choice('max_depth', range(2, 5, 1)),
    'learning_rate' : hp.quniform('learning_rate', 0.2, 0.3, 0.01),
    'n_estimators' : hp.choice('n_estimators', range(512, 1024, 64)),
    'alpha' : hp.quniform('alpha', 1, 1.5, 0.05),
    'min_child_weight' : hp.quniform('min_child_weight', 5, 10, 0.5),
    'subsample' : hp.quniform('subsample', 0.75, 1, 0.05),
    'colsample_bytree' : hp.quniform('colsample_bytree', 0.1, 0.5, 0.05),
    'colsample_bylevel' : hp.quniform('colsample_bylevel', 0.5, 1, 0.05)
}

trials = Trials()
best = fmin(fn=objective,
            space=space,
            algo=tpe.suggest,
            max_evals=16,
            trials=trials)

print("Best: ", best)

  6%|▋         | 1/16 [17:06<4:16:38, 1026.59s/trial, best loss: 0.16335054900067758]

Once we've found the optimal parameters, we train the final model with the full training dataset.

In [None]:
best_params = space_eval(space, best)
clf = xgb.XGBClassifier(n_estimators = space['n_estimators'],
                            max_depth = int(space['max_depth']),
                            learning_rate = space['learning_rate'],
                            gamma = space['alpha'],
                            min_child_weight = space['min_child_weight'],
                            subsample = space['subsample'],
                            colsample_bytree = space['colsample_bytree'],
                            colsample_bylevel = space['colsample_bylevel']
                            )

clf.fit(X_train, y_train.values.ravel())

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=0.5, gamma=0.1, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.1, max_delta_step=0, max_depth=4,
              min_child_weight=3.0, missing=nan, monotone_constraints='()',
              n_estimators=116, n_jobs=0, num_parallel_tree=1,
              objective='multi:softprob', random_state=0, reg_alpha=0,
              reg_lambda=1, scale_pos_weight=None, subsample=0.7000000000000001,
              tree_method='exact', validate_parameters=1, verbosity=None)

We evaluate the test set on the optimized XGB model.

In [None]:
y_pred_test = clf.predict(X_test)
y_pred_train = clf.predict(X_train)
f1_score(y_pred_train, y_train, average='macro'), f1_score(y_pred_test, y_test, average='macro')

(0.9973489950215976, 0.8454214967936583)

In [None]:
accuracy_score(y_pred_train, y_train), accuracy_score(y_pred_test, y_test)

(0.9997425003218746, 0.92662871600253)

## Summary

The current set of trials were unable to find a significantly better model. For this reason, we'll need a larger number of trials and a larger parameter grid.

Running a Hyperopt job requires significant amounts of computation resources and a large number of evaluation rounds are required to achieve optimal performance. For this reason, we'll use a cloud resource, AWS Sagemaker to perform the optimization. Sagemaker is able to scale out our search for the optimal algorithm.