# Bayesian optimization

This notebook uses the [Bayesian optimization package](https://github.com/fmfn/BayesianOptimization) to run an intelligent hyperparameter search for an XGBoost (XGB) model. I'm following the documentation on the package's GitHub page along with [this Kaggle tutorial](https://www.kaggle.com/tilii7/bayesian-optimization-of-xgboost-parameters).

In [1]:
import xgboost as xgb
from bayes_opt import BayesianOptimization

import pandas as pd
import numpy as np

from sklearn.metrics import accuracy_score, classification_report

## Data loading

In [2]:
df = pd.read_csv('../data/Final Data/pct-diff-mlb-games.csv')  #Pct Diff Columns Only (Gives Highest Accuracy)
#df = pd.read_csv('../data/Final Data/diff-mlb-games.csv')    #Diff columns only
#df = pd.read_csv('../data/Final Data/full-diff-mlb-games.csv')    #All columns

train_df = df[df['Y'] <= 2015]
test_df = df[df['Y'] > 2015]

X_train = train_df.drop('home_win', axis=1)
y_train = train_df.home_win

X_test = test_df.drop('home_win', axis=1)
y_test = test_df.home_win
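
The split above holds out later rows by the `Y` column instead of sampling rows at random, so the model is evaluated on games that come after everything it was trained on. Here is a minimal sketch of the same pattern on a toy frame. It assumes `Y` is the season year, which the notebook does not state explicitly, and the toy values and the `feat` column are invented for illustration.

```python
import pandas as pd

# Toy stand-in for the games table; values are invented for illustration.
toy = pd.DataFrame({
    "Y": [2013, 2014, 2015, 2016, 2017],
    "feat": [0.1, -0.2, 0.3, 0.0, -0.1],
    "home_win": [1, 0, 1, 1, 0],
})

cutoff = 2015
train = toy[toy["Y"] <= cutoff]
test = toy[toy["Y"] > cutoff]

# Every training row precedes every test row, and no row is lost.
assert train["Y"].max() < test["Y"].min()
assert len(train) + len(test) == len(toy)

# Separate features from the label, as the notebook does.
X_train_toy = train.drop("home_win", axis=1)
y_train_toy = train["home_win"]
print(X_train_toy.columns.tolist())
```

As in the cell above, only the label is dropped, so the `Y` column stays among the features. Whether that is intended is not stated in the notebook.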

## Objective function

In [3]:
# Append search progress to a log file; track the best CV AUC and its round count.
log_file = open('AUC-5fold-XGB.log', 'a')
AUCbest = -1.0
ITERbest = 0

In [28]:
dtrain = xgb.DMatrix(X_train, label = y_train)
dtest = xgb.DMatrix(X_test, label = y_test)

In [17]:
def xgb_cv(max_depth, min_child_weight, eta, subsample, colsample_bytree, gamma):
    """Objective for the optimizer: 5-fold CV validation Gini (2*AUC - 1)."""
    global AUCbest
    global ITERbest

    # The optimizer proposes floats, so max_depth is truncated to an int.
    params = {'max_depth': int(max_depth),
              'min_child_weight': min_child_weight,
              'eta': eta,
              'subsample': subsample,
              'colsample_bytree': colsample_bytree,
              'gamma': gamma,
              'seed': 0,
              'nthread': 4,
              'objective': 'binary:logistic',
              'eval_metric': 'auc'}

    folds = 5

    print("\n Search parameters (%d-fold validation):\n %s" % (folds, params), file=log_file)
    log_file.flush()

    xgbc = xgb.cv(
        params,
        dtrain,
        num_boost_round=20000,
        stratified=True,
        nfold=folds,
        early_stopping_rounds=100,
        metrics='auc',
        show_stdv=True
    )

    # Scores come from the last row of the CV history returned by xgb.cv.
    val_score = xgbc['test-auc-mean'].iloc[-1]
    train_score = xgbc['train-auc-mean'].iloc[-1]
    print('Stopped after %d iterations with train-auc = %f val-auc = %f ( diff = %f ) '
          'train-gini = %f val-gini = %f'
          % (len(xgbc), train_score, val_score, (train_score - val_score),
             (train_score*2-1), (val_score*2-1)))

    # Remember the best validation AUC seen so far and its number of rounds.
    if val_score > AUCbest:
        AUCbest = val_score
        ITERbest = len(xgbc)

    # Return the Gini coefficient, which is the target the optimizer maximizes.
    return (val_score*2) - 1
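
The function returns `2*AUC - 1`, the Gini coefficient, not the raw AUC. The mapping is strictly increasing, so both scores rank parameter sets identically. Only the scale changes: a random classifier scores 0 instead of 0.5. A minimal sketch of the conversion follows, where the helper name `auc_to_gini` is hypothetical and not part of the notebook.

```python
def auc_to_gini(auc):
    """Map an AUC in [0, 1] to the Gini coefficient in [-1, 1]."""
    return 2.0 * auc - 1.0

# 0.630198 is the validation AUC of the first search iteration in this notebook.
for auc in (0.5, 0.630198, 1.0):
    print(auc, round(auc_to_gini(auc), 6))
```

The `target` column in the optimizer's output is therefore a Gini value, which is why it sits near 0.3 while the corresponding AUC is near 0.65.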

## Hyperparameter tuning

In [18]:
params = {'max_depth': (3, 20),
          'min_child_weight': (0.001, 10),
          'eta': (0.001, 1.0),
          'subsample': (0.6, 1.0),
          'colsample_bytree': (0.6, 1.0),
          'gamma': (0.001, 10)}
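
All bounds are continuous intervals, including the one for `max_depth`. `xgb_cv` converts that value with `int()`, which truncates toward zero, so every proposal in [3, 4) trains with depth 3 and depth 20 is reached only when the optimizer proposes exactly 20.0. The sketch below compares truncation with rounding. `to_depth_round` is a hypothetical alternative shown for comparison and is not what the notebook does.

```python
def to_depth_trunc(x):
    """What the notebook does: int() truncates toward zero."""
    return int(x)

def to_depth_round(x):
    """Hypothetical alternative: round to the nearest integer."""
    return int(round(x))

# Proposals the optimizer could make inside the (3, 20) bounds.
for proposal in (3.0, 3.99, 11.0, 19.99, 20.0):
    print(proposal, to_depth_trunc(proposal), to_depth_round(proposal))
```

Rounding has its own edge effect: the end values 3 and 20 each receive only a half-width interval of proposals.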

In [19]:
XGB_BO = BayesianOptimization(xgb_cv, params)

In [20]:
XGB_BO.maximize(init_points=2, n_iter=3)

|   iter    |  target   | colsam... |    eta    |   gamma   | max_depth | min_ch... | subsample |
-------------------------------------------------------------------------------------------------
Stopped after 4 iterations with train-auc = 0.745929 val-auc = 0.630198 ( diff = 0.115732 ) train-gini = 0.491858 val-gini = 0.260395
|  1        |  0.2604   |  0.998    |  0.4927   |  1.951    |  11.0     |  8.283    |  0.613    |
Stopped after 7 iterations with train-auc = 0.722569 val-auc = 0.656329 ( diff = 0.066239 ) train-gini = 0.445137 val-gini = 0.312658
|  2        |  0.3127   |  0.6723   |  0.4543   |  9.524    |  9.171    |  3.086    |  0.885    |
Stopped after 8 iterations with train-auc = 0.689124 val-auc = 0.666155 ( diff = 0.022969 ) train-gini = 0.378248 val-gini = 0.332310
|  3        |  0.3323   |  0.9476   |

In [24]:
best_params = {'max_depth': int(XGB_BO.max['params']['max_depth']),
              'min_child_weight': XGB_BO.max['params']['min_child_weight'],
              'eta': XGB_BO.max['params']['eta'],
              'subsample': XGB_BO.max['params']['subsample'],
              'colsample_bytree': XGB_BO.max['params']['colsample_bytree'],
              'gamma': XGB_BO.max['params']['gamma'],
              'seed': 0,
              'nthread': 4,
              'objective': 'binary:logistic',
              'eval_metric': 'auc'}

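# Train the final model for a fixed 10 boosting rounds (ITERbest from the search is not used here).
xgb_best = xgb.train(best_params, dtrain, num_boost_round=10)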

In [39]:
best_params

{'max_depth': 5,
 'min_child_weight': 0.3256355793000354,
 'eta': 0.3076951770384403,
 'subsample': 0.8242904472901268,
 'colsample_bytree': 0.9475703665037462,
 'gamma': 9.66425615956974,
 'seed': 0,
 'nthread': 4,
 'objective': 'binary:logistic',
 'eval_metric': 'auc'}

In [33]:
test_preds_proba = xgb_best.predict(dtest)

In [36]:
# Round the predicted probabilities to 0/1 class labels (a 0.5 threshold).
test_preds = np.round(test_preds_proba, 0)
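
`np.round` turns probabilities into labels with an implicit 0.5 threshold, and it rounds exact halves to the nearest even value, so a probability of exactly 0.5 becomes 0. An explicit comparison makes the threshold visible and easy to change. This is a small sketch on invented probabilities, not the notebook's predictions.

```python
import numpy as np

# Invented probabilities for illustration.
proba = np.array([0.2, 0.5, 0.51, 0.9])

labels_round = np.round(proba, 0)            # as in the notebook; returns floats
labels_thresh = (proba >= 0.5).astype(int)   # explicit threshold; returns ints

print(labels_round, labels_thresh)
```

The two approaches differ only at exactly 0.5, which is unlikely to matter for real model output, but the explicit form also yields integer labels.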

In [38]:
print(classification_report(y_test, test_preds))

              precision    recall  f1-score   support

           0       0.62      0.53      0.57      4551
           1       0.63      0.71      0.67      5167

    accuracy                           0.63      9718
   macro avg       0.62      0.62      0.62      9718
weighted avg       0.62      0.63      0.62      9718



In [40]:
accuracy_score(y_test, test_preds)

0.6255402346161761
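
An accuracy figure is easier to judge against a trivial baseline. The support column of the report shows that the home team won 5167 of the 9718 test games, so always predicting a home win would be right about 53% of the time. A short sketch of that comparison, using only numbers printed above:

```python
# Class counts taken from the "support" column of the report above.
n_home_losses = 4551   # class 0
n_home_wins = 5167     # class 1
n_games = n_home_losses + n_home_wins

baseline_acc = n_home_wins / n_games   # always predict a home win
model_acc = 0.6255402346161761         # accuracy_score output above

print(f"baseline={baseline_acc:.4f} model={model_acc:.4f} "
      f"lift={model_acc - baseline_acc:.4f}")
```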