## XGBoost

We are using the XGBoost package (https://xgboost.readthedocs.io/en/latest/index.html, version 1.5.0) in this notebook. We will use XGBClassifier to solve the classification problem of predicting whether a PA form will be approved, based on the information provided on the form. Our features are 'correct_diagnosis', 'tried_and_failed', 'contraindication', 'drug' (drug type), 'bin' (payer ID), and 'reject_code', all of which are categorical. Our label is 'pa_approved'.
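
XGBoost trains on numeric matrices, so categorical features like these are commonly one-hot encoded before fitting. The following is a minimal sketch with `pandas.get_dummies`. The toy values are hypothetical, and the real columns in cmm_pa_clf.csv may already be stored in an encoded form.

```python
import pandas as pd

# Hypothetical toy rows; the real data has the six categorical features listed above.
toy = pd.DataFrame({
    "drug": ["A", "B", "A"],
    "bin": ["p1", "p2", "p1"],
})

# One indicator column per (feature, category) pair; dtype=int gives 0/1 instead of booleans.
encoded = pd.get_dummies(toy, columns=["drug", "bin"], dtype=int)
print(encoded)
```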

In [17]:
# import packages
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import xgboost as xgb
from sklearn import metrics
from sklearn.base import clone
from sklearn.model_selection import train_test_split, StratifiedKFold, GridSearchCV
from sklearn.metrics import (confusion_matrix, classification_report,
                             accuracy_score, roc_auc_score)

In [2]:
# read data: the features are every column except the label 'pa_approved'
cmm_pa_clf_read = pd.read_csv("../Data/cmm_pa_clf.csv", index_col=0)
cmm_pa_clf_data = cmm_pa_clf_read.drop(columns='pa_approved').copy()
cmm_pa_clf_target = cmm_pa_clf_read['pa_approved'].copy()
# 80/20 split, stratified on the label so both parts keep the same approval rate
X_train, X_test, Y_train, Y_test = train_test_split(
    cmm_pa_clf_data, cmm_pa_clf_target, test_size=0.2,
    random_state=10475, shuffle=True, stratify=cmm_pa_clf_target)
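
Passing `stratify=cmm_pa_clf_target` keeps the approval rate the same in the training and test sets. The idea can be sketched in NumPy by splitting each class separately at the same test fraction. This is an illustration under that simplification, not scikit-learn's exact allocation algorithm, and `stratified_split_indices` is a hypothetical helper.

```python
import numpy as np


def stratified_split_indices(y, test_size=0.2, seed=0):
    """Return (train_idx, test_idx), splitting each class separately at the same test fraction."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    test_mask = np.zeros(len(y), dtype=bool)
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)  # positions of this class
        rng.shuffle(idx)                # shuffle within the class only
        n_test = int(round(test_size * len(idx)))
        test_mask[idx[:n_test]] = True
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Hypothetical labels with a 75% approval rate.
labels = np.array([1] * 75 + [0] * 25)
train_idx, test_idx = stratified_split_indices(labels)
print(labels[train_idx].mean(), labels[test_idx].mean())  # both 0.75
```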

## Baseline:
We predict that every PA form will be approved. In this case the true positive rate = false positive rate = 1, so the ROC-AUC score of this baseline model is 0.5. Its accuracy equals the share of approved forms, 73.445%, so its error rate is 100% - 73.445% = 26.555%.
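
The baseline numbers can be checked with a short pure-Python sketch. ROC-AUC is computed here through its pairwise (Mann-Whitney) interpretation rather than with `roc_auc_score`, which makes the constant-score case easy to follow: every (approved, denied) pair is a tie and counts 1/2. The helper names are hypothetical.

```python
def pairwise_auc(y_true, scores):
    """ROC-AUC as the chance a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def always_approve_baseline(y_true):
    """Accuracy, error rate and ROC-AUC of a model that predicts 'approved' (1) for every form."""
    preds = [1] * len(y_true)
    accuracy = sum(p == y for p, y in zip(preds, y_true)) / len(y_true)
    return accuracy, 1 - accuracy, pairwise_auc(y_true, preds)


# The arithmetic from the text: 73.445% of forms are approved.
print(round(100 - 73.445, 3))  # 26.555
```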

## XGBClassifier
We will use the gbtree booster. Our learning objective (the loss minimized during optimization) is 'binary:logistic', i.e. logistic regression for binary classification, and we set eval_metric='logloss' to track the log loss. We will tune the hyperparameters based on the accuracy and ROC-AUC scores.
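
With `binary:logistic`, XGBoost maps the raw margin (the summed tree outputs) to a probability with the logistic sigmoid, and `logloss` is the mean negative log-likelihood of those probabilities. A minimal sketch of both formulas, with hypothetical helper names:

```python
import math


def sigmoid(margin):
    """Map a raw margin to a probability of approval."""
    return 1.0 / (1.0 + math.exp(-margin))


def binary_logloss(y_true, p_pred, eps=1e-15):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], clipping p away from 0 and 1 to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


print(binary_logloss([1, 0], [sigmoid(2.0), sigmoid(-2.0)]))
```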

We will explore the performance of the algorithm for:  
tree_method to be 'approx' or 'hist'  
max_depth to be 1, 2, or 3  
subsample rate to be 0.5, 0.8, or 1  
n_estimators fixed at 90  
eta (the learning rate) to be 0.1, 0.5, 1, 3, 7, or 10  
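
Every combination in the grid is evaluated, so the number of model fits multiplies quickly. This sketch enumerates the candidates in pure Python; sorting the keys alphabetically and letting the last key vary fastest matches the order in which the grid scores below are printed. The 5-fold count is an assumption based on GridSearchCV's default `cv` in recent scikit-learn versions.

```python
from itertools import product

# The grid from the list above (fixed settings such as booster/objective omitted).
param_grid = {
    "tree_method": ["approx", "hist"],
    "max_depth": [1, 2, 3],
    "subsample": [0.5, 0.8, 1],
    "n_estimators": [90],
    "eta": [0.1, 0.5, 1, 3, 7, 10],
}

# Sorted keys, last key varying fastest.
keys = sorted(param_grid)
candidates = [dict(zip(keys, values)) for values in product(*(param_grid[k] for k in keys))]

n_folds = 5    # assumption: GridSearchCV's default cv
n_metrics = 2  # accuracy and roc_auc are tuned in two separate searches
cv_fits = len(candidates) * n_folds * n_metrics
print(len(candidates), cv_fits)  # 108 1080
```

With the default `refit=True`, each search also refits its best candidate once on the full training set, so roughly 1,080 fits in total are consistent with the long tuning time.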

We use scikit-learn's GridSearchCV to perform the hyperparameter tuning.

In [6]:
# hyperparameter grid; every combination is scored with cross-validation
tuned_parameters = {'booster': ['gbtree'], 'objective': ['binary:logistic'],
                    'max_depth': [1, 2, 3], 'subsample': [0.5, 0.8, 1],
                    'tree_method': ['approx', 'hist'], 'n_estimators': [90],
                    'eta': [0.1, 0.5, 1, 3, 7, 10], 'use_label_encoder': [False]}
scores = ['accuracy', 'roc_auc']
xgb_clf = xgb.XGBClassifier(eval_metric='logloss')
for scr in scores:
    print("# Tuning hyper-parameters for %s" % scr)
    print()
    clf_tun = GridSearchCV(xgb_clf, tuned_parameters, scoring=scr)
    clf_tun.fit(X_train, Y_train)
    print("Best parameters set found based on the parameter set:")
    print()
    print(clf_tun.best_params_)
    print("Grid scores on parameter set:")
    print()
    means = clf_tun.cv_results_["mean_test_score"]
    stds = clf_tun.cv_results_["std_test_score"]
    # mean CV score +/- two standard deviations for every candidate
    for mean, std, params in zip(means, stds, clf_tun.cv_results_["params"]):
        print("%0.3f (+/-%0.03f) for %r \n" % (mean, std * 2, params))
    print()


Because I accidentally forgot to mute the warnings when training, the printed output was cluttered and hard to read. Since this tuning process took around 30 minutes to finish, for now I have copied the relevant output below. We will re-run this code with the warning messages muted, and we will also remember to add newlines when printing.
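
One way to mute Python-level warnings for a single call, without changing the global filters, is `warnings.catch_warnings()`. A minimal sketch follows; `run_quietly` and `noisy_fit` are hypothetical stand-ins, and in the notebook the wrapped call would be `clf_tun.fit(X_train, Y_train)`. XGBoost's native log messages are not Python warnings; for those, setting `verbosity=0` on `XGBClassifier` is the usual switch.

```python
import warnings


def run_quietly(fn, *args, **kwargs):
    """Call fn with Python warnings ignored; the previous filters are restored afterwards."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return fn(*args, **kwargs)


def noisy_fit(x):
    """Hypothetical stand-in for a fit call that emits a UserWarning."""
    warnings.warn("stand-in warning", UserWarning)
    return 2 * x


print(run_quietly(noisy_fit, 21))  # prints 42 with no warning shown
```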



# Tuning hyper-parameters for accuracy  
Best parameters set found based on the parameter set:

{'booster': 'gbtree', 'eta': 0.1, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}
Grid scores on parameter set:

0.802 (+/-0.011) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.804 (+/-0.011) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.800 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.802 (+/-0.009) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.800 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.800 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
**0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}**  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.814 (+/-0.003) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.814 (+/-0.003) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
**0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}**  
0.814 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.546 (+/-0.280) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.574 (+/-0.360) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.671 (+/-0.139) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.661 (+/-0.112) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.685 (+/-0.113) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.685 (+/-0.113) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.735 (+/-0.001) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.735 (+/-0.001) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.735 (+/-0.001) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.735 (+/-0.001) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.726 (+/-0.022) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.726 (+/-0.022) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.695 (+/-0.127) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.669 (+/-0.125) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.676 (+/-0.057) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.694 (+/-0.088) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.597 (+/-0.358) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.597 (+/-0.358) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.734 (+/-0.000) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.734 (+/-0.000) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.734 (+/-0.000) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.734 (+/-0.000) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.734 (+/-0.000) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.734 (+/-0.000) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.694 (+/-0.100) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.609 (+/-0.091) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.653 (+/-0.082) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.630 (+/-0.137) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.339 (+/-0.143) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.339 (+/-0.143) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.213 (+/-0.001) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.213 (+/-0.001) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.213 (+/-0.001) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.213 (+/-0.001) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.213 (+/-0.001) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.213 (+/-0.001) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.266 (+/-0.000) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.266 (+/-0.000) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.266 (+/-0.000) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.266 (+/-0.000) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.266 (+/-0.000) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.266 (+/-0.000) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.568 (+/-0.370) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.575 (+/-0.379) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.719 (+/-0.001) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.646 (+/-0.307) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.719 (+/-0.001) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.719 (+/-0.001) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.266 (+/-0.000) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.266 (+/-0.000) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.266 (+/-0.000) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.266 (+/-0.000) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.266 (+/-0.000) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.266 (+/-0.000) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
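
The accuracy scores above are easier to scan as a sorted table. The sketch below uses pandas and assumes the `cv_results_` dictionary of a fitted GridSearchCV; `summarize_cv_results` is a hypothetical helper. It also flags candidates whose mean CV accuracy sits at a constant-predictor level. With the 73.445% approval rate from the baseline, predicting every form as approved scores about 0.734 and predicting every form as denied about 0.266. These match the values reported for all eta = 7, max_depth = 1 candidates and all eta = 10, max_depth = 1 or 3 candidates, suggesting those large-eta models ended up predicting a single class.

```python
import pandas as pd


def summarize_cv_results(cv_results, baseline_acc=None, tol=1e-3):
    """Tabulate GridSearchCV.cv_results_ sorted by rank, optionally flagging constant-like scores."""
    params = pd.DataFrame(list(cv_results["params"]))  # one column per hyperparameter
    scores = pd.DataFrame({
        "mean": cv_results["mean_test_score"],
        "std": cv_results["std_test_score"],
        "rank": cv_results["rank_test_score"],
    })
    table = pd.concat([params, scores], axis=1)
    if baseline_acc is not None:
        # Accuracy of "always approve" and of "always deny".
        levels = (baseline_acc, 1 - baseline_acc)
        table["constant_like"] = table["mean"].apply(
            lambda m: any(abs(m - lvl) < tol for lvl in levels))
    return table.sort_values("rank", kind="mergesort").reset_index(drop=True)


# In the notebook: summarize_cv_results(clf_tun.cv_results_, baseline_acc=0.73445)
```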

# Tuning hyper-parameters for roc_auc  
Best parameters set found based on the parameter set:

{'booster': 'gbtree', 'eta': 1, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}
Grid scores on parameter set:

0.872 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.871 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.871 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.871 (+/-0.003) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.872 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.872 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
**0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}**  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.877 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.878 (+/-0.001) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.1, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 0.5, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.878 (+/-0.001) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.878 (+/-0.001) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.878 (+/-0.002) for {'booster': 'gbtree', 'eta': 1, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.658 (+/-0.224) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.501 (+/-0.147) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.654 (+/-0.174) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.642 (+/-0.181) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.602 (+/-0.201) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.602 (+/-0.201) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.788 (+/-0.003) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.786 (+/-0.003) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.788 (+/-0.003) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.789 (+/-0.001) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.734 (+/-0.127) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.734 (+/-0.127) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.599 (+/-0.173) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.627 (+/-0.066) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.585 (+/-0.102) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.543 (+/-0.078) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.504 (+/-0.060) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.504 (+/-0.060) for {'booster': 'gbtree', 'eta': 3, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.500 (+/-0.000) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.500 (+/-0.000) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.500 (+/-0.000) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.500 (+/-0.000) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.500 (+/-0.000) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.500 (+/-0.000) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.478 (+/-0.054) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.430 (+/-0.058) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.455 (+/-0.045) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.441 (+/-0.081) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.297 (+/-0.094) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.297 (+/-0.094) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.206 (+/-0.009) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.208 (+/-0.002) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.208 (+/-0.002) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.208 (+/-0.002) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.208 (+/-0.002) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.208 (+/-0.002) for {'booster': 'gbtree', 'eta': 7, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.560 (+/-0.002) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.560 (+/-0.002) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.560 (+/-0.002) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.560 (+/-0.002) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.560 (+/-0.002) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.560 (+/-0.002) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 1, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.526 (+/-0.185) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.477 (+/-0.158) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.601 (+/-0.002) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.537 (+/-0.174) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.601 (+/-0.002) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.601 (+/-0.002) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 2, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  
0.414 (+/-0.019) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'approx', 'use_label_encoder': False}  
0.409 (+/-0.018) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.5, 'tree_method': 'hist', 'use_label_encoder': False}  
0.403 (+/-0.004) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'approx', 'use_label_encoder': False}  
0.403 (+/-0.004) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 0.8, 'tree_method': 'hist', 'use_label_encoder': False}  
0.403 (+/-0.004) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'approx', 'use_label_encoder': False}  
0.403 (+/-0.004) for {'booster': 'gbtree', 'eta': 10, 'max_depth': 3, 'n_estimators': 90, 'objective': 'binary:logistic', 'subsample': 1, 'tree_method': 'hist', 'use_label_encoder': False}  



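Many parameter combinations in the grid-search results above perform almost identically. In the rows shown, every combination with `eta` of 1 or less reaches a mean score of 0.878, whatever the `max_depth`, `subsample` or `tree_method`. With `eta` of 3 or more, the score drops to between 0.206 and 0.789 and often varies much more across folds.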
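With this many near-identical rows, it can help to tabulate the grid-search results instead of reading them line by line. The sketch below is a minimal example and makes three assumptions:

- The results are available in the shape of scikit-learn's `GridSearchCV.cv_results_`, a dict with `params` and `mean_test_score`.
- The helper name `summarize_cv_results` and the `tol` threshold are hypothetical.
- The four rows are copied from the printed output above rather than from the full grid.

```python
import pandas as pd


def summarize_cv_results(cv_results, tol=0.001):
    """Rank GridSearchCV-style results and flag combinations tied with the best.

    `cv_results` needs the keys "params" and "mean_test_score", as in
    GridSearchCV.cv_results_. A row is flagged `near_best` when its mean
    score is within `tol` of the best mean score (the printed scores are
    rounded to three decimals, so exact equality would be too strict).
    """
    table = pd.DataFrame(cv_results["params"])
    table["mean_test_score"] = cv_results["mean_test_score"]
    best = table["mean_test_score"].max()
    table["near_best"] = table["mean_test_score"] >= best - tol
    return table.sort_values("mean_test_score", ascending=False).reset_index(drop=True)


# A few rows copied from the printed grid-search output above
cv_results = {
    "params": [
        {"eta": 0.1, "max_depth": 3, "subsample": 0.8, "tree_method": "hist"},
        {"eta": 1, "max_depth": 3, "subsample": 1, "tree_method": "hist"},
        {"eta": 3, "max_depth": 2, "subsample": 0.5, "tree_method": "approx"},
        {"eta": 7, "max_depth": 3, "subsample": 0.5, "tree_method": "hist"},
    ],
    "mean_test_score": [0.878, 0.878, 0.788, 0.208],
}
summary = summarize_cv_results(cv_results)
print(summary)
# Best mean score per learning rate shows where performance collapses
print(summary.groupby("eta")["mean_test_score"].max())
```

On the notebook's fitted search object (called `grid` here purely for illustration), the call would be `summarize_cv_results(grid.cv_results_)`.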

We now re-fit the model on the entire training set with each of the two parameter sets selected above and evaluate it on the test set.
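Re-fitting several parameter sets can also be written as one loop instead of one cell per set. The following is a minimal sketch, not the notebook's code. Its assumptions:

- The helper `refit_and_score` is hypothetical.
- It relies on scikit-learn's `clone` and `set_params`, which `XGBClassifier` supports through its scikit-learn wrapper.
- It scores ROC-AUC on predicted probabilities.
- The toy run at the end uses a scikit-learn decision tree on synthetic data, only so the sketch runs without the PA data.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def refit_and_score(estimator, param_sets, X_train, y_train, X_test, y_test):
    """Refit a fresh copy of `estimator` for each parameter dict and score it on the test set."""
    results = []
    for params in param_sets:
        # clone() gives an unfitted copy; set_params() returns that same copy
        model = clone(estimator).set_params(**params)
        model.fit(X_train, y_train)
        proba = model.predict_proba(X_test)[:, 1]  # predicted probability of class 1
        results.append({
            "params": params,
            "accuracy": accuracy_score(y_test, model.predict(X_test)),  # hard labels
            "roc_auc": roc_auc_score(y_test, proba),                    # probabilities
        })
    return results


# Toy run on synthetic data (an assumption, not the PA data set)
X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
results = refit_and_score(DecisionTreeClassifier(random_state=0),
                          [{"max_depth": 2}, {"max_depth": 3}],
                          X_tr, y_tr, X_te, y_te)
for row in results:
    print(row)
```

To apply this in the notebook, set `estimator` and `param_sets` as follows:

- `estimator`: an `XGBClassifier` carrying the shared settings (`booster='gbtree'`, `objective='binary:logistic'`, `tree_method='approx'`, `n_estimators=90`, `use_label_encoder=False`, `eval_metric='logloss'`).
- `param_sets`: the two dicts `{'max_depth': 2, 'subsample': 0.5, 'eta': 0.1}` and `{'max_depth': 3, 'subsample': 1, 'eta': 1}`.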

In [18]:
xgb_clf = xgb.XGBClassifier(booster='gbtree', objective='binary:logistic', max_depth=2, subsample=0.5,
                            tree_method='approx', n_estimators=90,
                            eta=0.1, use_label_encoder=False, eval_metric='logloss')
# Re-fit on the entire training set, then predict hard 0/1 labels on the test set
xgb_clf.fit(X_train, Y_train)
Y_pred = xgb_clf.predict(X_test)
print(classification_report(Y_test, Y_pred))
print('Accuracy score of this set of parameters is: ', accuracy_score(Y_test, Y_pred), '\n')
# Y_pred holds hard labels, so this ROC-AUC equals balanced accuracy;
# pass xgb_clf.predict_proba(X_test)[:, 1] instead for a probability-based AUC
print('ROC-AUC score of this set of parameters is: ', roc_auc_score(Y_test, Y_pred), '\n')

              precision    recall  f1-score   support

           0       0.73      0.48      0.58     29527
           1       0.83      0.94      0.88     81664

    accuracy                           0.81    111191
   macro avg       0.78      0.71      0.73    111191
weighted avg       0.80      0.81      0.80    111191

Accuracy score of this set of parameters is:  0.8141216465361405 

ROC-AUC score of this set of parameters is:  0.7062865215270934 
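Because `Y_pred` holds hard 0/1 labels, `roc_auc_score` sees an ROC curve through only three points: (0, 0), (FPR, TPR) and (1, 1). Its area is the sum of two trapezoids and simplifies to balanced accuracy, the mean of the two per-class recalls. Plugging in the recalls from the report above (TPR = recall of class 1, TNR = recall of class 0):

```latex
\mathrm{AUC}_{\text{hard}}
  = \frac{\mathrm{FPR}\cdot\mathrm{TPR}}{2} + \frac{(1-\mathrm{FPR})(\mathrm{TPR}+1)}{2}
  = \frac{\mathrm{TPR} + (1-\mathrm{FPR})}{2}
  = \frac{\mathrm{TPR} + \mathrm{TNR}}{2}
  \approx \frac{0.94 + 0.48}{2} = 0.71
```

This agrees, up to rounding, with the printed 0.706 and with the macro-averaged recall of 0.71.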



In [19]:
xgb_clf_r = xgb.XGBClassifier(booster='gbtree', objective='binary:logistic', max_depth=3, subsample=1,
                              tree_method='approx', n_estimators=90,
                              eta=1, use_label_encoder=False, eval_metric='logloss')
# Re-fit on the entire training set, then predict hard 0/1 labels on the test set
xgb_clf_r.fit(X_train, Y_train)
Y_pred = xgb_clf_r.predict(X_test)
print(classification_report(Y_test, Y_pred))
print('Accuracy score of this set of parameters is: ', accuracy_score(Y_test, Y_pred), '\n')
# Y_pred holds hard labels, so this ROC-AUC equals balanced accuracy;
# pass xgb_clf_r.predict_proba(X_test)[:, 1] instead for a probability-based AUC
print('ROC-AUC score of this set of parameters is: ', roc_auc_score(Y_test, Y_pred), '\n')

              precision    recall  f1-score   support

           0       0.71      0.51      0.59     29527
           1       0.84      0.93      0.88     81664

    accuracy                           0.81    111191
   macro avg       0.77      0.72      0.74    111191
weighted avg       0.80      0.81      0.80    111191

Accuracy score of this set of parameters is:  0.8138608340603106 

ROC-AUC score of this set of parameters is:  0.7154172400492356 
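The `roc_auc_score(Y_test, Y_pred)` calls in both cells receive hard labels from `predict`. The sketch below shows that such a score coincides with `balanced_accuracy_score`. It also contrasts it with the AUC computed from `predict_proba`, which is the usual input for ROC-AUC. It makes two assumptions, only to keep the example self-contained:

- It runs on synthetic data.
- A scikit-learn logistic regression stands in for the XGBoost model.

The helper `auc_hard_vs_proba` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split


def auc_hard_vs_proba(model, X_test, y_test):
    """Return (AUC from hard labels, balanced accuracy, AUC from class-1 probabilities)."""
    y_hat = model.predict(X_test)              # hard 0/1 labels
    p_hat = model.predict_proba(X_test)[:, 1]  # estimated P(y = 1)
    return (roc_auc_score(y_test, y_hat),
            balanced_accuracy_score(y_test, y_hat),
            roc_auc_score(y_test, p_hat))


# Synthetic, imbalanced toy data (an assumption; not the PA data set)
X, y = make_classification(n_samples=2000, weights=[0.27, 0.73], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
hard_auc, bal_acc, proba_auc = auc_hard_vs_proba(clf, X_te, y_te)
print(f"hard-label AUC={hard_auc:.3f}  balanced acc={bal_acc:.3f}  probability AUC={proba_auc:.3f}")
```

The first two numbers always agree. The third uses the full ranking of predicted probabilities, not a single threshold, so it is usually the more informative ROC-AUC.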



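The cross-validated ROC-AUC (about 0.878, computed on the held-out folds of the training set) is roughly 0.16 to 0.17 higher than the test-set ROC-AUC (0.706 and 0.715), so the models might be overfitting. The comparison is not like-for-like, however. The test-set scores above are computed from the hard labels returned by `predict`, which makes them equal to the balanced accuracy (the macro-averaged recall of 0.71 and 0.72 in the reports). A grid search scored with `roc_auc`, by contrast, uses predicted probabilities. Part of the gap may therefore come from the metric rather than from overfitting. The accuracy scores of the two parameter sets are similar, though (0.814 in both cases).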
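One way to test the overfitting hypothesis on equal terms is to compute every ROC-AUC from predicted probabilities:

- on the held-out cross-validation folds,
- on the full training set,
- and on the test set.

This is a minimal sketch. It assumes any scikit-learn-compatible classifier with `predict_proba`, such as `XGBClassifier`. The helper `auc_diagnostics` is hypothetical. The toy run uses an unconstrained decision tree on synthetic data because that model overfits predictably.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier


def auc_diagnostics(model, X_train, y_train, X_test, y_test, n_splits=5, seed=0):
    """Probability-based ROC-AUC on held-out CV folds, the full training set and the test set."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Mean AUC over the held-out folds (what a grid search with scoring="roc_auc" reports)
    cv_auc = cross_val_score(clone(model), X_train, y_train, cv=cv, scoring="roc_auc").mean()
    fitted = clone(model).fit(X_train, y_train)
    train_auc = roc_auc_score(y_train, fitted.predict_proba(X_train)[:, 1])
    test_auc = roc_auc_score(y_test, fitted.predict_proba(X_test)[:, 1])
    return {"cv_auc": cv_auc, "train_auc": train_auc, "test_auc": test_auc}


# Toy data and a deliberately unconstrained tree (assumptions, not the PA setup)
X, y = make_classification(n_samples=1000, weights=[0.27, 0.73], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
diag = auc_diagnostics(DecisionTreeClassifier(random_state=0), X_tr, y_tr, X_te, y_te)
print(diag)
```

A large gap between `train_auc` and `test_auc` points to overfitting. A `cv_auc` close to `test_auc` suggests the cross-validated score was a fair estimate of test performance.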