# Boosted Decision Tree Model

## Introduction

This notebook contains the code and explanation of how the boosted decision tree model was produced. A boosted decision tree is a sum of individual decision trees, trained sequentially on the residuals of the previous stage of the model. In other words, each additional tree focuses on what the previous stage did not predict well.
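To make the idea of "training on the residuals" concrete, here is a minimal sketch in plain Python. It uses one-split trees (stumps) and squared error on made-up data; the function names and the data are hypothetical and only for illustration. XGBoost itself works with gradients and Hessians of the chosen loss rather than raw residuals, so this is a simplification and not the library's algorithm.

```python
# Toy boosting for squared error: each stump is fitted to the residuals
# left by the stumps before it. Data and names are hypothetical.

def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimising squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # drop the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs, ys, n_trees, learning_rate):
    """Fit n_trees stumps sequentially; return the training MSE after each one."""
    predictions = [0.0] * len(xs)
    mse_history = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
        mse_history.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    return mse_history


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.8, 5.2, 5.0]
mse_history = boost(xs, ys, n_trees=20, learning_rate=0.5)
print([round(m, 3) for m in mse_history[:5]])
```

The training error never increases from one stump to the next, because each stump only corrects what the current sum of stumps still gets wrong.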

When researching boosted decision trees and how to implement them, I saw that some implementations can handle missing values directly. I therefore decided to look at how they would perform on the original dataset with no data removed or imputed. However, comparing models becomes problematic if one model is run on the original dataset and the others on the imputed version, as it would not be possible to tell whether differences in the results come from the imputation procedure or from the models themselves. Thus, I have produced two models at the end, one trained on the original dataset and the other trained on the imputed dataset, and we will analyse the results of both. Both models use the hyperparameters tuned on the original dataset. Given the time constraints and the large computation time required to tune hyperparameters, we could not tune both.

A common boosted decision tree implementation is known as the Gradient Boosting Machine; however, when looking into this method, I found that the library for it in Python does not automatically handle missing data. A newer method, XGBoost, does, and it has produced better results in many Kaggle competitions. Therefore, I chose to implement the boosted decision tree with XGBoost.

## Preparing the Data

Begin by importing the required libraries.

In [3]:
import pandas as pd
from xgboost import XGBClassifier
import xgboost as xgb
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV

Load in the training data. This is the original training data (some data missing).

In [5]:
url='https://drive.google.com/file/d/13VY4Au99Ec-OFQgl94gYJ444e0bEs-zg/view'
url='https://drive.google.com/uc?id=' + url.split('/')[-2]
trainingDf = pd.read_csv(url)

In [6]:
trainingDf

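| Unnamed: 0 | HeartDiseaseorAttack | HighBP | HighChol | CholCheck | BMI | Smoker | Stroke | Diabetes | PhysActivity | Fruits | ... | AnyHealthcare | NoDocbcCost | GenHlth | MentHlth | PhysHlth | DiffWalk | Sex | Age | Education | Income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 1.0 | 1.0 | 1.0 | 26.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | ... | 1.0 | 1.0 | 5.0 | 0.0 | 0.0 | 1.0 | 0.0 | 13.0 | 5.0 | NaN |
| 1 | 1.0 | 1.0 | 0.0 | 1.0 | 27.0 | 0.0 | 0.0 | 0.0 | NaN | NaN | ... | 1.0 | 0.0 | 3.0 | 0.0 | 0.0 | 1.0 | 0.0 | 13.0 | 5.0 | NaN |
| 2 | 1.0 | 1.0 | 1.0 | 1.0 | 30.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | 1.0 | 0.0 | 5.0 | 30.0 | 30.0 | 1.0 | 0.0 | 9.0 | 5.0 | 1.0 |
| 3 | 1.0 | 1.0 | 1.0 | 1.0 | 26.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | ... | 1.0 | 0.0 | 3.0 | 0.0 | 14.0 | 0.0 | 0.0 | 13.0 | 3.0 | NaN |
| 4 | 1.0 | 1.0 | 1.0 | 1.0 | 27.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | 1.0 | NaN | 3.0 | 0.0 | 0.0 | 0.0 | 1.0 | 13.0 | 3.0 | 4.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 393758 | 0.0 | 0.0 | 0.0 | 1.0 | 27.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 6.0 | 5.0 |
| 393759 | 0.0 | 1.0 | 1.0 | 1.0 | 45.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | ... | 1.0 | 0.0 | 3.0 | 0.0 | 5.0 | 0.0 | 1.0 | 5.0 | 6.0 | 7.0 |
| 393760 | 0.0 | 0.0 | 0.0 | 1.0 | 28.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 5.0 | 2.0 |
| 393761 | 0.0 | 1.0 | 1.0 | 1.0 | 41.0 | 1.0 | 0.0 | 0.0 | NaN | NaN | ... | 1.0 | 0.0 | 4.0 | 20.0 | 0.0 | 0.0 | 0.0 | 11.0 | 4.0 | 5.0 |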


I decided to replace the missing values with -1, as I find this easier to work with; the model is later told to treat -1 as missing through `missing=-1`. We then split the training data into the target and the features.

In [7]:
trainingDf=trainingDf.fillna(-1)

In [8]:
train_y=trainingDf.iloc[:,0]

In [9]:
train_X=trainingDf.iloc[:,1:22]

In [10]:
train_X.iloc[1,9]

-1.0

The next cell computes a weight that is given to the model to rebalance the classes. Without this, the model will be inclined to focus on getting the class with the most instances correct. It is generally advised, e.g. [here](https://machinelearningmastery.com/xgboost-for-imbalanced-classification/), to use the ratio negative cases / positive cases. Unfortunately, I made an error here: the code computes total cases / positive cases, which equals negative cases / positive cases + 1, and due to computation time limitations I cannot change it at this stage. Therefore, the scaling value produced here weights the positive cases slightly more heavily than the recommended value would.

In [11]:
posScaling=len(train_y)/sum(train_y)
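
As a quick check of how the two candidate ratios relate, the sketch below uses made-up class counts (these are not the counts in this dataset). It compares the commonly recommended negative/positive ratio with the total/positive ratio that `len(train_y)/sum(train_y)` computes for a 0/1 target.

```python
# Hypothetical class counts, chosen only to illustrate the arithmetic.
n_positive = 1_000
n_negative = 9_000
n_total = n_positive + n_negative

recommended = n_negative / n_positive  # ratio usually suggested for scale_pos_weight
used_here = n_total / n_positive       # what len(y) / sum(y) gives for a 0/1 target

print(recommended, used_here)
```

Whatever the counts, the second ratio is exactly one larger than the first, so the two differ most when the classes are close to balanced.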

In [12]:
#Reformat the target into a dataframe.
train_y=pd.DataFrame(train_y, columns=['HeartDiseaseorAttack'])

## Initial Model and Hyperparameter Explanation

We now move on to creating the first model. To make the first instance of the model, we define an XGBClassifier. We specify the type of problem we are considering with 'objective'; with 'binary:logistic', the trees are built to minimise the log loss. I also chose the log loss as the evaluation metric, 'eval_metric', which is the metric reported on evaluation data and monitored for early stopping. The log loss penalises predicted probabilities that are far from the correct output. I chose it for this application because it is preferable for most predictions to be close, and so give some indication of heart disease risk, than for only a few to be exact. Initial values are then set for the hyperparameters of the model, with ideas taken from ['Complete Guide to Parameter Tuning in XGBoost'](https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/#h2_9), as well as from some earlier experiments on how long a boosted decision tree model takes to run on this dataset.
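
To illustrate why the log loss rewards predictions that are "close" over a few that are exact, here is a small sketch with made-up labels and probabilities. The helper `log_loss` is written out by hand for illustration rather than taken from a library.

```python
import math


def log_loss(y_true, p_pred):
    """Mean binary log loss; p_pred holds predicted probabilities of class 1."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, p_pred)) / len(y_true)


y_true = [1, 1, 0, 0]
cautious = [0.7, 0.6, 0.4, 0.3]           # all on the right side, none extreme
overconfident = [0.99, 0.01, 0.01, 0.01]  # three near-exact, one confidently wrong

cautious_loss = log_loss(y_true, cautious)
overconfident_loss = log_loss(y_true, overconfident)
print(round(cautious_loss, 3), round(overconfident_loss, 3))
```

The single confident mistake dominates the second score, so the cautious predictions end up with the lower log loss even though none of them is close to 0 or 1.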

Let's explain what the hyperparameters mean. 'max_depth' specifies the greatest depth each individual tree can take. A model with lots of features may require a larger value for this to encode the complexity, but large values may also cause overfitting. 

'min_child_weight' sets the minimum total instance weight (the sum of the Hessians) that each child node must contain for a split to be kept. If a candidate split would leave a child below this threshold, the algorithm stops splitting there. So a larger value means that less specific regions will be produced, which might help to combat overfitting.

'gamma' controls splits when building each decision tree, as it specifies a threshold on the reduction of the loss function that must be exceeded for a split to be made. So a value of 0 will accept all splits that give an improvement, whereas a larger value will stop splitting sooner.
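
The way 'gamma' and 'min_child_weight' act on a candidate split can be sketched with the standard gain formula from the XGBoost paper. The sketch assumes no L1 penalty; `G` and `H` denote the sums of gradients and Hessians in a node. The function names and the numbers are hypothetical, and the real library applies these checks inside its own tree-building and pruning code.

```python
def leaf_score(grad_sum, hess_sum, reg_lambda):
    """Structure score G^2 / (H + lambda) of a single node."""
    return grad_sum ** 2 / (hess_sum + reg_lambda)


def split_gain(left, right, reg_lambda=1.0, gamma=0.0):
    """Loss reduction of a split, minus gamma. left/right are (G, H) pairs."""
    (gl, hl), (gr, hr) = left, right
    return 0.5 * (leaf_score(gl, hl, reg_lambda)
                  + leaf_score(gr, hr, reg_lambda)
                  - leaf_score(gl + gr, hl + hr, reg_lambda)) - gamma


def accept_split(left, right, reg_lambda=1.0, gamma=0.0, min_child_weight=1.0):
    """Reject splits whose children are too light or whose gain is not positive."""
    if left[1] < min_child_weight or right[1] < min_child_weight:
        return False
    return split_gain(left, right, reg_lambda, gamma) > 0


# Made-up gradient and Hessian sums for the two children of a candidate split.
left_child, right_child = (-4.0, 3.0), (5.0, 4.0)
print(split_gain(left_child, right_child))                      # gain with gamma = 0
print(accept_split(left_child, right_child, gamma=0.3))         # small threshold
print(accept_split(left_child, right_child, gamma=5.0))         # large threshold
print(accept_split(left_child, right_child, min_child_weight=3.5))
```

Raising either hyperparameter can only turn an accepted split into a rejected one, which is why both act as brakes on tree growth.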

'subsample' and 'colsample_bytree' control random subsampling during the construction of the trees. For each tree, only a random subsample of the observations ('subsample') and of the features ('colsample_bytree') is used, in a similar spirit to random forests. Using smaller values for these can reduce overfitting. They are specified as a proportion of the total, i.e. 0.8 keeps 80%.
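
As a rough picture of what these two proportions do, the sketch below draws the rows and columns that one tree would be allowed to see. The helper `sample_for_tree` and the sizes are hypothetical, and the exact sampling and rounding inside XGBoost may differ from this.

```python
import random


def sample_for_tree(n_rows, n_cols, subsample, colsample_bytree, rng):
    """Pick the row and column indices that a single tree is trained on."""
    n_rows_kept = max(1, int(subsample * n_rows))
    n_cols_kept = max(1, int(colsample_bytree * n_cols))
    rows = sorted(rng.sample(range(n_rows), n_rows_kept))
    cols = sorted(rng.sample(range(n_cols), n_cols_kept))
    return rows, cols


rng = random.Random(0)  # fixed seed so the draws are repeatable
for tree_index in range(3):
    rows, cols = sample_for_tree(n_rows=10, n_cols=21,
                                 subsample=0.8, colsample_bytree=0.8, rng=rng)
    print(tree_index, len(rows), len(cols))
```

Each tree gets its own draw, so different trees see different slices of the data.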

'learning_rate' controls how much may be 'learnt' by each individual tree. It is a proportion, and it effectively shrinks the weight of each tree in the final model. Thus a smaller value means that more trees will be needed in total. A smaller value helps to reduce overfitting, but this must be balanced against computation time. 0.5 is a large learning rate, and I have chosen it because the dataset is large and a smaller learning rate would take too long. We will reduce it in the final model.
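
To see why a smaller learning rate needs more trees, consider an idealised setting (an assumption made only for this sketch) in which every tree fits the current residual perfectly. Each round then removes a fraction `learning_rate` of what is left. The function `rounds_needed` is hypothetical.

```python
def rounds_needed(learning_rate, tolerance=0.01):
    """Rounds until the remaining residual fraction falls below tolerance,
    assuming each tree fits the current residual exactly."""
    remaining = 1.0
    rounds = 0
    while remaining >= tolerance:
        remaining *= 1.0 - learning_rate
        rounds += 1
    return rounds


fast = rounds_needed(0.5)
slow = rounds_needed(0.05)
print(fast, slow)
```

Under this simplification, cutting the learning rate from 0.5 to 0.05 increases the number of rounds by more than a factor of ten.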

'n_estimators' is the number of trees in the model. Given the other hyperparameters, it may be that, because of the constraints that prevent overfitting, trees added after a certain point add no new knowledge to the model. Therefore, we will begin by using cross-validation to choose a good value for this hyperparameter; 2000 is chosen as a deliberately too-large starting point.

In [67]:
model1 = XGBClassifier(
    objective = 'binary:logistic',
    eval_metric='logloss', 
    missing=-1,
    max_depth = 4, 
    min_child_weight = 1, 
    gamma = 0.3, 
    subsample=0.8, 
    colsample_bytree = 0.8,
    scale_pos_weight = posScaling, 
    learning_rate =0.5,
    n_estimators=2000)

## Hyperparameter Tuning

### Initial Tuning of 'n_estimators'

The code below runs the cross-validation to choose n_estimators. It makes use of early stopping to end the process when no improvement in the metric is found after a specified number of additional trees, given by 'early_stopping_rounds'. Here I have chosen to have 'verbose_eval=True', as the process can take a while, so this way we can see how it is progressing. I decided on early_stopping_rounds=10 after looking at the output and seeing that the metric behaved stably. Sometimes a value such as 50 may be used if the outputs fluctuate more. Again we use the log loss metric.

In [68]:
xgb_param = model1.get_xgb_params()
xgtrain = xgb.DMatrix(train_X.values, label=train_y.values,missing = -1)

In [72]:
cvresult = xgb.cv(xgb_param, xgtrain, num_boost_round=model1.get_params()['n_estimators'], nfold=5,
                          early_stopping_rounds=10, metrics={'logloss'},verbose_eval=True)

[0]	train-logloss:0.60463+0.00102	test-logloss:0.60465+0.00053
[1]	train-logloss:0.55964+0.00618	test-logloss:0.55972+0.00630
[2]	train-logloss:0.53712+0.00253	test-logloss:0.53738+0.00234
[3]	train-logloss:0.52623+0.00235	test-logloss:0.52662+0.00051
[4]	train-logloss:0.51999+0.00182	test-logloss:0.52038+0.00151
[5]	train-logloss:0.51472+0.00207	test-logloss:0.51520+0.00118
[6]	train-logloss:0.51213+0.00206	test-logloss:0.51266+0.00148
[7]	train-logloss:0.50994+0.00168	test-logloss:0.51050+0.00179
[8]	train-logloss:0.50908+0.00173	test-logloss:0.50976+0.00156
[9]	train-logloss:0.50824+0.00161	test-logloss:0.50904+0.00178
[10]	train-logloss:0.50754+0.00189	test-logloss:0.50847+0.00163
[11]	train-logloss:0.50714+0.00199	test-logloss:0.50821+0.00160
[12]	train-logloss:0.50661+0.00163	test-logloss:0.50772+0.00216
[13]	train-logloss:0.50532+0.00163	test-logloss:0.50652+0.00224
[14]	train-logloss:0.50529+0.00155	test-logloss:0.50653+0.00210
[15]	train-logloss:0.50531+0.00132	test-logloss:0.

In [74]:
cvresult.shape[0]

94

So 94 is given as the best number of trees to include. Therefore, we set 'n_estimators' to that value in the model and tune the other hyperparameters.
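
The early-stopping rule used in the cross-validation above can be sketched in a few lines. The function below is a hypothetical re-implementation for illustration, not the library's code, and the scores are made up.

```python
def best_round_with_early_stopping(scores, patience):
    """Return the 0-based index of the best (lowest) score, stopping the scan
    once `patience` consecutive rounds have passed without improvement."""
    best_index = 0
    for index, score in enumerate(scores):
        if score < scores[best_index]:
            best_index = index
        elif index - best_index >= patience:
            break
    return best_index


# Made-up mean test log loss per boosting round.
test_logloss = [0.60, 0.56, 0.54, 0.53, 0.535, 0.532, 0.531, 0.54, 0.50]
best = best_round_with_early_stopping(test_logloss, patience=3)
print(best, best + 1)  # best round index and number of trees to keep
```

The final value of 0.50 is never reached because the scan has already stopped. This is the trade-off made by a small patience such as 10: it saves time but can miss a later improvement if the metric fluctuates.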

### Tuning of 'max_depth' and 'min_child_weight'

We start with 'max_depth' and 'min_child_weight', tuning both at once due to their interaction with one another. We look at values 3, 5, 7, and 9 for 'max_depth', and 1, 3, and 5 for 'min_child_weight'.
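
To get a feel for the cost of this search, the grid just described can be enumerated directly. This is a small standalone sketch; the `param_grid` and `n_folds` names are mine and simply mirror the values given in the text.

```python
from itertools import product

param_grid = {
    'max_depth': range(3, 10, 2),         # 3, 5, 7, 9
    'min_child_weight': range(1, 6, 2),   # 1, 3, 5
}
n_folds = 5

# Every combination of the two hyperparameters, as a list of dicts.
combinations = [dict(zip(param_grid, values))
                for values in product(*param_grid.values())]
n_fits = len(combinations) * n_folds
print(len(combinations), n_fits)
```

Every combination is fitted once per fold, so the cost grows multiplicatively with each value added to the grid. By default, scikit-learn's grid search also refits the best combination once on the full data afterwards.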

I would have liked to use the log loss metric for this cross-validation as well, but, unfortunately, using that metric greatly increased the computing time, which was infeasible in this project. Therefore, accuracy is used instead, which is not ideal. This will try to maximise the number of correctly classified observations.

In [78]:
param_test1 = {
 'max_depth':range(3,10,2),
 'min_child_weight':range(1,6,2)
}
gsearch1 = GridSearchCV(estimator = XGBClassifier( 
 objective = 'binary:logistic',
    eval_metric='logloss', 
    missing=-1,
    max_depth = 4, 
    min_child_weight = 1, 
    gamma = 0.3, 
    subsample=0.8, 
    colsample_bytree = 0.8,
    scale_pos_weight = posScaling, 
    learning_rate =0.5,
    n_estimators=94), 
 param_grid = param_test1, scoring='accuracy',n_jobs=4, cv=5)
gsearch1.fit(train_X,train_y)
gsearch1.cv_results_, gsearch1.best_params_, gsearch1.best_score_

({'mean_fit_time': array([130.8563498 , 136.85699711, 135.90140042, 225.00580039,
         223.0373683 , 223.17159734, 326.45746069, 323.22780528,
         325.07488627, 441.69819922, 441.74638724, 423.01286054]),
  'std_fit_time': array([ 2.98604303,  0.44418287,  1.22012557,  0.7073697 ,  0.62157192,
          1.43841086,  2.34197595,  0.99523653,  1.7321928 ,  4.25065621,
          6.73729161, 29.41778651]),
  'mean_score_time': array([0.97379017, 0.92579961, 0.91259937, 1.3093996 , 1.29999995,
         1.28840065, 1.94360027, 1.82619653, 1.93140254, 2.92360144,
         3.0762013 , 2.55100169]),
  'std_score_time': array([0.0690795 , 0.07206801, 0.02220467, 0.06093923, 0.06110567,
         0.0524743 , 0.07397579, 0.08520097, 0.07814191, 0.08955493,
         0.20105527, 0.47879398]),
  'param_max_depth': masked_array(data=[3, 3, 3, 5, 5, 5, 7, 7, 7, 9, 9, 9],
               mask=[False, False, False, False, False, False, False, False,
                     False, False, False, False]

We have lots of information about the cross-validation printed out here, which is useful to analyse if the results are unclear. Here though, it clearly prefers the largest 'max_depth' and the smallest 'min_child_weight'. We could try larger/smaller values respectively, but from what we know about these two hyperparameters, such values are not recommended due to high chance of overfitting. Therefore, let's leave them as they are.

### Tuning of 'gamma'

We now repeat the search for 'gamma', looking at values 0, 0.1, 0.2, 0.3 and 0.4.

In [13]:
param_test2 = {
 'gamma':[i/10.0 for i in range(0,5)],
}
gsearch2 = GridSearchCV(estimator = XGBClassifier( 
 objective = 'binary:logistic',
    eval_metric='logloss', 
    missing=-1,
    max_depth = 9, 
    min_child_weight = 1, 
    gamma = 0.3, 
    subsample=0.8, 
    colsample_bytree = 0.8,
    scale_pos_weight = posScaling, 
    learning_rate =0.5,
    n_estimators=94), 
 param_grid = param_test2, scoring='accuracy',n_jobs=4, cv=5)
gsearch2.fit(train_X,train_y)
gsearch2.cv_results_, gsearch2.best_params_, gsearch2.best_score_

({'mean_fit_time': array([440.81430063, 496.79481435, 703.1473618 , 703.58198638,
         625.92232108]),
  'std_fit_time': array([ 11.30084549,  93.70946344,  69.9072231 ,  30.51946095,
         103.0350396 ]),
  'mean_score_time': array([2.5919991 , 3.45648799, 4.48865962, 4.72390327, 3.97469501]),
  'std_score_time': array([0.08463414, 1.24799024, 0.1239663 , 0.52957699, 0.72163086]),
  'param_gamma': masked_array(data=[0.0, 0.1, 0.2, 0.3, 0.4],
               mask=[False, False, False, False, False],
         fill_value='?',
              dtype=object),
  'params': [{'gamma': 0.0},
   {'gamma': 0.1},
   {'gamma': 0.2},
   {'gamma': 0.3},
   {'gamma': 0.4}],
  'split0_test_score': array([0.78831283, 0.78876995, 0.78997625, 0.79005244, 0.79054766]),
  'split1_test_score': array([0.78596371, 0.786624  , 0.78623037, 0.78737318, 0.79001435]),
  'split2_test_score': array([0.79787437, 0.79616015, 0.79730296, 0.79678234, 0.79778548]),
  'split3_test_score': array([0.781923  , 0.78441182,

Here, the maximum value of gamma was chosen. If we had more time, we would look at larger values in another round of tuning of gamma, but looking at the cross-validation test scores, it does not look like it was improving by significant amounts, so we will leave it at 0.4.

### Tuning of 'subsample' and 'colsample_bytree'

Now we have the hyperparameters 'subsample' and 'colsample_bytree', which we tune together, as they have similar effects. For both we consider values 0.6, 0.7, 0.8 and 0.9.

In [14]:
param_test3 = {
 'subsample':[i/10.0 for i in range(6,10)],
 'colsample_bytree':[i/10.0 for i in range(6,10)]
}
gsearch3 = GridSearchCV(estimator = XGBClassifier( 
 objective = 'binary:logistic',
    eval_metric='logloss', 
    missing=-1,
    max_depth = 9, 
    min_child_weight = 1, 
    gamma = 0.4, 
    subsample=0.8, 
    colsample_bytree = 0.8,
    scale_pos_weight = posScaling, 
    learning_rate =0.5,
    n_estimators=94), 
 param_grid = param_test3, scoring='accuracy',n_jobs=4, cv=5)
gsearch3.fit(train_X,train_y)
gsearch3.cv_results_, gsearch3.best_params_, gsearch3.best_score_

({'mean_fit_time': array([608.25889864, 548.53503256, 413.28472762, 333.10562811,
         410.25071397, 406.71214123, 398.85723805, 391.99782605,
         460.70195966, 451.69701858, 444.85429468, 438.71495857,
         505.39193306, 493.64955821, 488.11559534, 480.37546244]),
  'std_fit_time': array([17.92546442, 18.25261338, 83.89433262,  1.7301297 ,  2.47149971,
          2.50344381,  0.90028534,  1.30416268,  2.20652348,  1.32029705,
          0.91061006,  2.17240727,  2.21984678,  1.88846415,  5.07700234,
          1.1680308 ]),
  'mean_score_time': array([4.43833585, 3.50418429, 3.00179925, 2.73400455, 2.57920208,
         2.68920288, 2.51060171, 2.84880276, 3.32460289, 2.8390018 ,
         2.63320093, 2.42359638, 2.60620193, 2.85220881, 2.94559207,
         2.59919543]),
  'std_score_time': array([0.09909354, 0.77345889, 0.12902284, 0.11480099, 0.09662362,
         0.32670628, 0.044608  , 0.22409029, 0.25983936, 0.04059673,
         0.02747048, 0.30926701, 0.27091859, 0.2790255

0.9 is chosen for both. We could consider more values around 0.9, but looking at the cross-validation results, they do not seem that stable, so there is probably little improvement gained from spending more time pinpointing the exact best values, as it seems that there is lots of noise in this test.

### Tuning of 'reg_alpha' and 'reg_lambda'

Now we move on to two extra hyperparameters: 'reg_alpha' and 'reg_lambda'. These regularise the model and are analogous, respectively, to lasso and ridge regularisation. They penalise large leaf weights in the individual trees, using the L1 and L2 norms respectively, as large weights are likely to lead to an unstable model.
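
The effect of the two penalties on a single leaf can be sketched with the usual closed-form leaf weight: the gradient sum is soft-thresholded by the L1 penalty and the denominator is inflated by the L2 penalty. The function `leaf_weight` and the numbers are hypothetical; this reflects my understanding of the standard formula rather than a copy of the library's code.

```python
def leaf_weight(grad_sum, hess_sum, reg_alpha=0.0, reg_lambda=1.0):
    """Optimal leaf weight under an L1 (reg_alpha) and L2 (reg_lambda) penalty."""
    if grad_sum > reg_alpha:
        shrunk = grad_sum - reg_alpha
    elif grad_sum < -reg_alpha:
        shrunk = grad_sum + reg_alpha
    else:
        return 0.0  # the L1 penalty is large enough to zero the leaf out
    return -shrunk / (hess_sum + reg_lambda)


# Made-up gradient and Hessian sums for one leaf.
G, H = -4.0, 3.0
for alpha in [1e-5, 1e-2, 0.1, 1, 100]:
    print(alpha, round(leaf_weight(G, H, reg_alpha=alpha, reg_lambda=1.0), 4))
```

Small values of the L1 penalty barely change the weight, whereas a value as large as 100 zeroes this leaf entirely. A very large penalty can therefore remove most of what the trees have learnt.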

For both we test values $1 \times 10^{-5}$, $0.01$, $0.1$, $1$, and $100$.

In [16]:
param_test4 = {
 'reg_alpha':[1e-5, 1e-2, 0.1, 1, 100],
}
gsearch4 = GridSearchCV(estimator = XGBClassifier( 
 objective = 'binary:logistic',
    eval_metric='logloss', 
    missing=-1,
    max_depth = 9, 
    min_child_weight = 1, 
    gamma = 0.4, 
    subsample=0.9, 
    colsample_bytree = 0.9,
    scale_pos_weight = posScaling, 
    learning_rate =0.5,
    n_estimators=94), 
 param_grid = param_test4, scoring='accuracy',n_jobs=4, cv=5)
gsearch4.fit(train_X,train_y)
gsearch4.cv_results_, gsearch4.best_params_, gsearch4.best_score_

({'mean_fit_time': array([491.20559878, 492.83539801, 492.56037807, 496.44072423,
         437.06539006]),
  'std_fit_time': array([ 1.42171548,  1.4584232 ,  1.13147229,  4.80763817, 64.85614399]),
  'mean_score_time': array([2.86800117, 2.9734344 , 2.74175544, 2.90321994, 2.1663938 ]),
  'std_score_time': array([0.2223736 , 0.19935924, 0.22262693, 0.37993394, 0.44581645]),
  'param_reg_alpha': masked_array(data=[1e-05, 0.01, 0.1, 1, 100],
               mask=[False, False, False, False, False],
         fill_value='?',
              dtype=object),
  'params': [{'reg_alpha': 1e-05},
   {'reg_alpha': 0.01},
   {'reg_alpha': 0.1},
   {'reg_alpha': 1},
   {'reg_alpha': 100}],
  'split0_test_score': array([0.79167778, 0.79157619, 0.79362056, 0.79505543, 0.73979404]),
  'split1_test_score': array([0.79119526, 0.7878684 , 0.79048417, 0.79238886, 0.73931152]),
  'split2_test_score': array([0.79976636, 0.79994413, 0.79963938, 0.80216627, 0.75453634]),
  'split3_test_score': array([0.78951646,

These results appear unstable; $1$ performs best, but $1 \times 10^{-5}$ is better than $0.01$. We will take this as a result of noise, meaning there is not much improvement made by this hyperparameter, but $1$ was chosen as the best so we will use that.

Now move on to 'reg_lambda'.

In [20]:
param_test6 = {
 'reg_lambda':[1e-5, 1e-2, 0.1, 1, 100]
}
gsearch6 = GridSearchCV(estimator = XGBClassifier( 
 objective = 'binary:logistic',
    eval_metric='logloss', 
    missing=-1,
    max_depth = 9, 
    min_child_weight = 1, 
    gamma = 0.4, 
    subsample=0.9, 
    colsample_bytree = 0.9,
    scale_pos_weight = posScaling, 
    learning_rate =0.5,
    n_estimators=94,
    reg_alpha=1), 
 param_grid = param_test6, scoring='accuracy',n_jobs=4, cv=5)
gsearch6.fit(train_X,train_y)
gsearch6.cv_results_, gsearch6.best_params_, gsearch6.best_score_

({'mean_fit_time': array([481.65821285, 480.57177215, 477.72864852, 483.7852397 ,
         466.3897366 ]),
  'std_fit_time': array([ 0.83573483,  0.6038617 ,  1.35518252,  2.92298034, 36.33852137]),
  'mean_score_time': array([2.93778987, 2.53040733, 2.55220017, 3.31840048, 2.46564975]),
  'std_score_time': array([0.18929225, 0.06296904, 0.06221348, 0.47601476, 0.3114201 ]),
  'param_reg_lambda': masked_array(data=[1e-05, 0.01, 0.1, 1, 100],
               mask=[False, False, False, False, False],
         fill_value='?',
              dtype=object),
  'params': [{'reg_lambda': 1e-05},
   {'reg_lambda': 0.01},
   {'reg_lambda': 0.1},
   {'reg_lambda': 1},
   {'reg_lambda': 100}],
  'split0_test_score': array([0.79411578, 0.790954  , 0.79442053, 0.79505543, 0.76869453]),
  'split1_test_score': array([0.7927444 , 0.79263012, 0.79053496, 0.79238886, 0.76846596]),
  'split2_test_score': array([0.8013282 , 0.79872513, 0.80000762, 0.80216627, 0.78033853]),
  'split3_test_score': array([0.786

We observe a similar result here. Again, take 1 as our choice.

### Choosing Final 'n_estimators'

Now that we have finished the hyperparameter tuning, we decrease the learning rate by a factor of 10, which should improve our results, and choose the corresponding number of trees, 'n_estimators', as before. We must start with a much larger value of 10000 this time, as we have reduced the learning rate a lot and do not know how many trees will now be needed.

In [27]:
model2 = XGBClassifier(
    objective = 'binary:logistic',
    eval_metric='logloss', 
    missing=-1,
    max_depth = 9, 
    min_child_weight = 1, 
    gamma = 0.4, 
    subsample=0.9, 
    colsample_bytree = 0.9,
    scale_pos_weight = posScaling, 
    learning_rate =0.05,
    n_estimators=10000,
    reg_alpha=1,
    reg_lambda=1)  # tuned value; this is also the XGBoost default, so behaviour is unchanged

In [28]:
xgb_param = model2.get_xgb_params()
xgtrain = xgb.DMatrix(train_X.values, label=train_y.values,missing = -1)

In [29]:
cvresult = xgb.cv(xgb_param, xgtrain, num_boost_round=model2.get_params()['n_estimators'], nfold=5,
                          early_stopping_rounds=10, metrics={'logloss'},verbose_eval=True)

[0]	train-logloss:0.67552+0.00011	test-logloss:0.67577+0.00006
[1]	train-logloss:0.65961+0.00040	test-logloss:0.66008+0.00032
[2]	train-logloss:0.64542+0.00114	test-logloss:0.64611+0.00073
[3]	train-logloss:0.63268+0.00218	test-logloss:0.63362+0.00164
[4]	train-logloss:0.62112+0.00204	test-logloss:0.62227+0.00141
[5]	train-logloss:0.61078+0.00171	test-logloss:0.61217+0.00101
[6]	train-logloss:0.60054+0.00160	test-logloss:0.60220+0.00078
[7]	train-logloss:0.59156+0.00144	test-logloss:0.59345+0.00054
[8]	train-logloss:0.58342+0.00152	test-logloss:0.58552+0.00053
[9]	train-logloss:0.57573+0.00135	test-logloss:0.57808+0.00030
[10]	train-logloss:0.56858+0.00154	test-logloss:0.57115+0.00033
[11]	train-logloss:0.56226+0.00146	test-logloss:0.56508+0.00030
[12]	train-logloss:0.55620+0.00140	test-logloss:0.55925+0.00016
[13]	train-logloss:0.55093+0.00114	test-logloss:0.55420+0.00051
[14]	train-logloss:0.54573+0.00112	test-logloss:0.54922+0.00048
[15]	train-logloss:0.54102+0.00134	test-logloss:0.

[128]	train-logloss:0.44529+0.00168	test-logloss:0.46910+0.00186
[129]	train-logloss:0.44498+0.00172	test-logloss:0.46893+0.00182
[130]	train-logloss:0.44475+0.00173	test-logloss:0.46882+0.00177
[131]	train-logloss:0.44440+0.00176	test-logloss:0.46861+0.00176
[132]	train-logloss:0.44407+0.00177	test-logloss:0.46841+0.00176
[133]	train-logloss:0.44368+0.00177	test-logloss:0.46819+0.00178
[134]	train-logloss:0.44329+0.00175	test-logloss:0.46795+0.00180
[135]	train-logloss:0.44301+0.00176	test-logloss:0.46780+0.00180
[136]	train-logloss:0.44263+0.00174	test-logloss:0.46757+0.00182
[137]	train-logloss:0.44227+0.00171	test-logloss:0.46736+0.00180
[138]	train-logloss:0.44198+0.00165	test-logloss:0.46719+0.00182
[139]	train-logloss:0.44172+0.00167	test-logloss:0.46705+0.00184
[140]	train-logloss:0.44147+0.00179	test-logloss:0.46691+0.00177
[141]	train-logloss:0.44114+0.00187	test-logloss:0.46672+0.00175
[142]	train-logloss:0.44080+0.00184	test-logloss:0.46653+0.00175
[143]	train-logloss:0.440

[255]	train-logloss:0.40934+0.00179	test-logloss:0.44823+0.00207
[256]	train-logloss:0.40908+0.00185	test-logloss:0.44808+0.00206
[257]	train-logloss:0.40881+0.00188	test-logloss:0.44794+0.00202
[258]	train-logloss:0.40858+0.00187	test-logloss:0.44782+0.00203
[259]	train-logloss:0.40829+0.00187	test-logloss:0.44765+0.00203
[260]	train-logloss:0.40806+0.00194	test-logloss:0.44752+0.00202
[261]	train-logloss:0.40781+0.00194	test-logloss:0.44737+0.00201
[262]	train-logloss:0.40758+0.00188	test-logloss:0.44724+0.00204
[263]	train-logloss:0.40731+0.00191	test-logloss:0.44710+0.00205
[264]	train-logloss:0.40705+0.00184	test-logloss:0.44696+0.00210
[265]	train-logloss:0.40689+0.00186	test-logloss:0.44687+0.00208
[266]	train-logloss:0.40671+0.00187	test-logloss:0.44678+0.00204
[267]	train-logloss:0.40648+0.00190	test-logloss:0.44666+0.00204
[268]	train-logloss:0.40629+0.00188	test-logloss:0.44656+0.00203
[269]	train-logloss:0.40602+0.00189	test-logloss:0.44641+0.00200
[270]	train-logloss:0.405

[382]	train-logloss:0.37975+0.00246	test-logloss:0.43191+0.00182
[383]	train-logloss:0.37948+0.00246	test-logloss:0.43176+0.00184
[384]	train-logloss:0.37929+0.00246	test-logloss:0.43167+0.00181
[385]	train-logloss:0.37903+0.00249	test-logloss:0.43152+0.00179
[386]	train-logloss:0.37881+0.00244	test-logloss:0.43140+0.00184
[387]	train-logloss:0.37860+0.00239	test-logloss:0.43128+0.00188
[388]	train-logloss:0.37838+0.00241	test-logloss:0.43118+0.00185
[389]	train-logloss:0.37814+0.00241	test-logloss:0.43104+0.00184
[390]	train-logloss:0.37788+0.00244	test-logloss:0.43089+0.00183
[391]	train-logloss:0.37765+0.00245	test-logloss:0.43076+0.00181
[392]	train-logloss:0.37750+0.00247	test-logloss:0.43070+0.00181
[393]	train-logloss:0.37728+0.00249	test-logloss:0.43057+0.00182
[394]	train-logloss:0.37702+0.00254	test-logloss:0.43042+0.00181
[395]	train-logloss:0.37680+0.00256	test-logloss:0.43029+0.00177
[396]	train-logloss:0.37661+0.00253	test-logloss:0.43020+0.00178
[397]	train-logloss:0.376

[509]	train-logloss:0.35343+0.00228	test-logloss:0.41788+0.00190
[510]	train-logloss:0.35326+0.00227	test-logloss:0.41780+0.00188
[511]	train-logloss:0.35307+0.00229	test-logloss:0.41770+0.00187
[512]	train-logloss:0.35286+0.00230	test-logloss:0.41759+0.00188
[513]	train-logloss:0.35270+0.00230	test-logloss:0.41751+0.00188
[514]	train-logloss:0.35255+0.00232	test-logloss:0.41744+0.00186
[515]	train-logloss:0.35236+0.00230	test-logloss:0.41734+0.00187
[516]	train-logloss:0.35218+0.00232	test-logloss:0.41726+0.00188
[517]	train-logloss:0.35202+0.00235	test-logloss:0.41719+0.00186
[518]	train-logloss:0.35183+0.00236	test-logloss:0.41709+0.00186
[519]	train-logloss:0.35158+0.00236	test-logloss:0.41695+0.00188
[520]	train-logloss:0.35139+0.00240	test-logloss:0.41685+0.00189
[521]	train-logloss:0.35119+0.00241	test-logloss:0.41676+0.00191
[522]	train-logloss:0.35103+0.00243	test-logloss:0.41669+0.00192
[523]	train-logloss:0.35086+0.00241	test-logloss:0.41659+0.00193
[524]	train-logloss:0.350

[636]	train-logloss:0.33080+0.00240	test-logloss:0.40653+0.00194
[637]	train-logloss:0.33062+0.00242	test-logloss:0.40644+0.00195
[638]	train-logloss:0.33049+0.00245	test-logloss:0.40638+0.00196
[639]	train-logloss:0.33033+0.00248	test-logloss:0.40631+0.00193
[640]	train-logloss:0.33016+0.00246	test-logloss:0.40622+0.00196
[641]	train-logloss:0.32999+0.00251	test-logloss:0.40614+0.00194
[642]	train-logloss:0.32981+0.00251	test-logloss:0.40605+0.00195
[643]	train-logloss:0.32963+0.00250	test-logloss:0.40596+0.00195
[644]	train-logloss:0.32945+0.00248	test-logloss:0.40587+0.00195
[645]	train-logloss:0.32929+0.00245	test-logloss:0.40578+0.00197
[646]	train-logloss:0.32909+0.00243	test-logloss:0.40569+0.00195
[647]	train-logloss:0.32894+0.00244	test-logloss:0.40561+0.00195
[648]	train-logloss:0.32876+0.00240	test-logloss:0.40552+0.00196
[649]	train-logloss:0.32859+0.00244	test-logloss:0.40544+0.00198
[650]	train-logloss:0.32844+0.00244	test-logloss:0.40536+0.00200
[651]	train-logloss:0.328

[763]	train-logloss:0.31008+0.00221	test-logloss:0.39652+0.00199
[764]	train-logloss:0.30988+0.00218	test-logloss:0.39641+0.00199
[765]	train-logloss:0.30976+0.00216	test-logloss:0.39637+0.00199
[766]	train-logloss:0.30962+0.00216	test-logloss:0.39631+0.00197
[767]	train-logloss:0.30949+0.00217	test-logloss:0.39626+0.00196
[768]	train-logloss:0.30936+0.00214	test-logloss:0.39620+0.00199
[769]	train-logloss:0.30922+0.00212	test-logloss:0.39613+0.00201
[770]	train-logloss:0.30908+0.00214	test-logloss:0.39606+0.00199
[771]	train-logloss:0.30895+0.00217	test-logloss:0.39600+0.00200
[772]	train-logloss:0.30880+0.00219	test-logloss:0.39593+0.00200
[773]	train-logloss:0.30865+0.00220	test-logloss:0.39586+0.00201
[774]	train-logloss:0.30850+0.00214	test-logloss:0.39579+0.00204
[775]	train-logloss:0.30834+0.00215	test-logloss:0.39571+0.00206
[776]	train-logloss:0.30819+0.00217	test-logloss:0.39564+0.00204
[777]	train-logloss:0.30800+0.00216	test-logloss:0.39555+0.00203
[778]	train-logloss:0.307

[890]	train-logloss:0.29200+0.00217	test-logloss:0.38820+0.00225
[891]	train-logloss:0.29183+0.00217	test-logloss:0.38811+0.00222
[892]	train-logloss:0.29172+0.00220	test-logloss:0.38806+0.00222
[893]	train-logloss:0.29160+0.00219	test-logloss:0.38800+0.00220
[894]	train-logloss:0.29149+0.00222	test-logloss:0.38796+0.00222
[895]	train-logloss:0.29131+0.00222	test-logloss:0.38787+0.00224
[896]	train-logloss:0.29118+0.00225	test-logloss:0.38780+0.00225
[897]	train-logloss:0.29106+0.00223	test-logloss:0.38775+0.00226
[898]	train-logloss:0.29092+0.00224	test-logloss:0.38770+0.00229
[899]	train-logloss:0.29079+0.00223	test-logloss:0.38763+0.00230
[900]	train-logloss:0.29067+0.00221	test-logloss:0.38758+0.00230
[901]	train-logloss:0.29056+0.00219	test-logloss:0.38754+0.00232
[902]	train-logloss:0.29042+0.00216	test-logloss:0.38747+0.00232
[903]	train-logloss:0.29030+0.00213	test-logloss:0.38743+0.00229
[904]	train-logloss:0.29018+0.00211	test-logloss:0.38737+0.00232
[905]	train-logloss:0.290

[1016]	train-logloss:0.27581+0.00199	test-logloss:0.38111+0.00234
[1017]	train-logloss:0.27567+0.00199	test-logloss:0.38104+0.00234
[1018]	train-logloss:0.27552+0.00202	test-logloss:0.38097+0.00234
[1019]	train-logloss:0.27541+0.00203	test-logloss:0.38094+0.00235
[1020]	train-logloss:0.27530+0.00201	test-logloss:0.38089+0.00235
[1021]	train-logloss:0.27519+0.00198	test-logloss:0.38084+0.00237
[1022]	train-logloss:0.27506+0.00201	test-logloss:0.38079+0.00236
[1023]	train-logloss:0.27490+0.00201	test-logloss:0.38072+0.00234
[1024]	train-logloss:0.27474+0.00201	test-logloss:0.38064+0.00234
[1025]	train-logloss:0.27461+0.00204	test-logloss:0.38059+0.00235
[1026]	train-logloss:0.27449+0.00207	test-logloss:0.38055+0.00233
[1027]	train-logloss:0.27440+0.00206	test-logloss:0.38052+0.00235
[1028]	train-logloss:0.27429+0.00204	test-logloss:0.38049+0.00238
[1029]	train-logloss:0.27420+0.00204	test-logloss:0.38046+0.00239
[1030]	train-logloss:0.27408+0.00205	test-logloss:0.38042+0.00239
[1031]	tra

[1141]	train-logloss:0.26111+0.00210	test-logloss:0.37514+0.00234
[1142]	train-logloss:0.26102+0.00211	test-logloss:0.37511+0.00236
[1143]	train-logloss:0.26090+0.00206	test-logloss:0.37507+0.00238
[1144]	train-logloss:0.26077+0.00209	test-logloss:0.37503+0.00239
[1145]	train-logloss:0.26068+0.00205	test-logloss:0.37499+0.00239
[1146]	train-logloss:0.26056+0.00207	test-logloss:0.37495+0.00240
[1147]	train-logloss:0.26046+0.00206	test-logloss:0.37492+0.00241
[1148]	train-logloss:0.26035+0.00205	test-logloss:0.37488+0.00241
[1149]	train-logloss:0.26023+0.00206	test-logloss:0.37484+0.00240
[1150]	train-logloss:0.26010+0.00207	test-logloss:0.37479+0.00239
[1151]	train-logloss:0.26000+0.00206	test-logloss:0.37476+0.00241
[1152]	train-logloss:0.25989+0.00203	test-logloss:0.37471+0.00241
[1153]	train-logloss:0.25975+0.00201	test-logloss:0.37466+0.00241
[1154]	train-logloss:0.25964+0.00203	test-logloss:0.37462+0.00239
[1155]	train-logloss:0.25955+0.00203	test-logloss:0.37458+0.00236
[1156]	tra

[1266]	train-logloss:0.24781+0.00172	test-logloss:0.37004+0.00253
[1267]	train-logloss:0.24771+0.00171	test-logloss:0.37001+0.00252
[1268]	train-logloss:0.24761+0.00172	test-logloss:0.36998+0.00252
[1269]	train-logloss:0.24751+0.00176	test-logloss:0.36994+0.00251
[1270]	train-logloss:0.24740+0.00176	test-logloss:0.36990+0.00252
[1271]	train-logloss:0.24729+0.00174	test-logloss:0.36986+0.00252
[1272]	train-logloss:0.24719+0.00176	test-logloss:0.36982+0.00251
[1273]	train-logloss:0.24710+0.00174	test-logloss:0.36979+0.00253
[1274]	train-logloss:0.24700+0.00176	test-logloss:0.36974+0.00252
[1275]	train-logloss:0.24688+0.00176	test-logloss:0.36969+0.00251
[1276]	train-logloss:0.24676+0.00174	test-logloss:0.36963+0.00251
[1277]	train-logloss:0.24665+0.00173	test-logloss:0.36958+0.00252
[1278]	train-logloss:0.24654+0.00174	test-logloss:0.36954+0.00251
[1279]	train-logloss:0.24644+0.00175	test-logloss:0.36951+0.00250
[1280]	train-logloss:0.24633+0.00175	test-logloss:0.36946+0.00248
[1281]	tra

[1391]	train-logloss:0.23495+0.00174	test-logloss:0.36522+0.00254
[1392]	train-logloss:0.23483+0.00177	test-logloss:0.36518+0.00253
[1393]	train-logloss:0.23475+0.00176	test-logloss:0.36515+0.00253
[1394]	train-logloss:0.23466+0.00179	test-logloss:0.36512+0.00251
[1395]	train-logloss:0.23456+0.00176	test-logloss:0.36508+0.00253
[1396]	train-logloss:0.23446+0.00178	test-logloss:0.36505+0.00252
[1397]	train-logloss:0.23435+0.00178	test-logloss:0.36501+0.00250
[1398]	train-logloss:0.23426+0.00182	test-logloss:0.36498+0.00250
[1399]	train-logloss:0.23416+0.00181	test-logloss:0.36494+0.00250
[1400]	train-logloss:0.23407+0.00181	test-logloss:0.36491+0.00250
[1401]	train-logloss:0.23399+0.00181	test-logloss:0.36489+0.00250
[1402]	train-logloss:0.23390+0.00179	test-logloss:0.36485+0.00250
[1403]	train-logloss:0.23381+0.00181	test-logloss:0.36481+0.00250
[1404]	train-logloss:0.23372+0.00180	test-logloss:0.36478+0.00249
[1405]	train-logloss:0.23365+0.00179	test-logloss:0.36476+0.00251
[1406]	tra

[1516]	train-logloss:0.22371+0.00186	test-logloss:0.36130+0.00253
[1517]	train-logloss:0.22362+0.00184	test-logloss:0.36127+0.00253
[1518]	train-logloss:0.22353+0.00184	test-logloss:0.36125+0.00253
[1519]	train-logloss:0.22344+0.00184	test-logloss:0.36122+0.00253
[1520]	train-logloss:0.22336+0.00184	test-logloss:0.36120+0.00253
[1521]	train-logloss:0.22329+0.00183	test-logloss:0.36118+0.00252
[1522]	train-logloss:0.22320+0.00180	test-logloss:0.36116+0.00255
[1523]	train-logloss:0.22311+0.00183	test-logloss:0.36113+0.00253
[1524]	train-logloss:0.22301+0.00182	test-logloss:0.36110+0.00255
[1525]	train-logloss:0.22292+0.00182	test-logloss:0.36107+0.00254
[1526]	train-logloss:0.22282+0.00181	test-logloss:0.36103+0.00255
[1527]	train-logloss:0.22274+0.00180	test-logloss:0.36100+0.00256
[1528]	train-logloss:0.22263+0.00179	test-logloss:0.36095+0.00257
[1529]	train-logloss:0.22253+0.00180	test-logloss:0.36092+0.00256
[1530]	train-logloss:0.22243+0.00179	test-logloss:0.36088+0.00256
[1531]	tra

[1641]	train-logloss:0.21293+0.00187	test-logloss:0.35768+0.00254
[1642]	train-logloss:0.21282+0.00186	test-logloss:0.35764+0.00255
[1643]	train-logloss:0.21272+0.00186	test-logloss:0.35761+0.00256
[1644]	train-logloss:0.21262+0.00186	test-logloss:0.35757+0.00255
[1645]	train-logloss:0.21254+0.00187	test-logloss:0.35755+0.00254
[1646]	train-logloss:0.21245+0.00186	test-logloss:0.35752+0.00256
[1647]	train-logloss:0.21235+0.00186	test-logloss:0.35749+0.00256
[1648]	train-logloss:0.21224+0.00185	test-logloss:0.35745+0.00257
[1649]	train-logloss:0.21215+0.00183	test-logloss:0.35741+0.00257
[1650]	train-logloss:0.21206+0.00182	test-logloss:0.35739+0.00259
[1651]	train-logloss:0.21200+0.00181	test-logloss:0.35737+0.00260
[1652]	train-logloss:0.21191+0.00183	test-logloss:0.35735+0.00260
[1653]	train-logloss:0.21183+0.00181	test-logloss:0.35732+0.00261
[1654]	train-logloss:0.21173+0.00181	test-logloss:0.35728+0.00261
[1655]	train-logloss:0.21166+0.00182	test-logloss:0.35726+0.00260
[1656]	tra

[1766]	train-logloss:0.20280+0.00180	test-logloss:0.35459+0.00257
[1767]	train-logloss:0.20272+0.00182	test-logloss:0.35456+0.00255
[1768]	train-logloss:0.20265+0.00179	test-logloss:0.35454+0.00257
[1769]	train-logloss:0.20256+0.00178	test-logloss:0.35450+0.00257
[1770]	train-logloss:0.20249+0.00178	test-logloss:0.35448+0.00257
[1771]	train-logloss:0.20241+0.00177	test-logloss:0.35446+0.00258
[1772]	train-logloss:0.20235+0.00176	test-logloss:0.35443+0.00259
[1773]	train-logloss:0.20227+0.00173	test-logloss:0.35440+0.00260
[1774]	train-logloss:0.20218+0.00173	test-logloss:0.35437+0.00260
[1775]	train-logloss:0.20212+0.00176	test-logloss:0.35436+0.00259
[1776]	train-logloss:0.20205+0.00177	test-logloss:0.35434+0.00258
[1777]	train-logloss:0.20196+0.00175	test-logloss:0.35431+0.00259
[1778]	train-logloss:0.20188+0.00177	test-logloss:0.35430+0.00259
[1779]	train-logloss:0.20183+0.00175	test-logloss:0.35429+0.00260
[1780]	train-logloss:0.20172+0.00172	test-logloss:0.35425+0.00261
[1781]	tra

[1891]	train-logloss:0.19339+0.00185	test-logloss:0.35184+0.00268
[1892]	train-logloss:0.19334+0.00185	test-logloss:0.35184+0.00267
[1893]	train-logloss:0.19327+0.00185	test-logloss:0.35181+0.00267
[1894]	train-logloss:0.19318+0.00184	test-logloss:0.35178+0.00268
[1895]	train-logloss:0.19311+0.00184	test-logloss:0.35176+0.00268
[1896]	train-logloss:0.19305+0.00184	test-logloss:0.35176+0.00269
[1897]	train-logloss:0.19296+0.00184	test-logloss:0.35172+0.00268
[1898]	train-logloss:0.19289+0.00182	test-logloss:0.35171+0.00269
[1899]	train-logloss:0.19282+0.00182	test-logloss:0.35169+0.00269
[1900]	train-logloss:0.19273+0.00182	test-logloss:0.35166+0.00269
[1901]	train-logloss:0.19265+0.00182	test-logloss:0.35164+0.00269
[1902]	train-logloss:0.19256+0.00179	test-logloss:0.35160+0.00272
[1903]	train-logloss:0.19249+0.00180	test-logloss:0.35158+0.00273
[1904]	train-logloss:0.19241+0.00180	test-logloss:0.35155+0.00272
[1905]	train-logloss:0.19234+0.00179	test-logloss:0.35153+0.00273
[1906]	tra

[2016]	train-logloss:0.18473+0.00178	test-logloss:0.34958+0.00274
[2017]	train-logloss:0.18465+0.00177	test-logloss:0.34955+0.00275
[2018]	train-logloss:0.18459+0.00177	test-logloss:0.34954+0.00275
[2019]	train-logloss:0.18453+0.00176	test-logloss:0.34952+0.00276
[2020]	train-logloss:0.18445+0.00176	test-logloss:0.34950+0.00277
[2021]	train-logloss:0.18437+0.00177	test-logloss:0.34947+0.00276
[2022]	train-logloss:0.18432+0.00179	test-logloss:0.34946+0.00275
[2023]	train-logloss:0.18426+0.00177	test-logloss:0.34944+0.00275
[2024]	train-logloss:0.18419+0.00177	test-logloss:0.34942+0.00276
[2025]	train-logloss:0.18412+0.00176	test-logloss:0.34941+0.00276
[2026]	train-logloss:0.18404+0.00176	test-logloss:0.34939+0.00276
[2027]	train-logloss:0.18395+0.00176	test-logloss:0.34936+0.00276
[2028]	train-logloss:0.18388+0.00176	test-logloss:0.34934+0.00276
[2029]	train-logloss:0.18376+0.00176	test-logloss:0.34930+0.00278
[2030]	train-logloss:0.18370+0.00173	test-logloss:0.34929+0.00280
[2031]	tra

[2141]	train-logloss:0.17644+0.00157	test-logloss:0.34758+0.00297
[2142]	train-logloss:0.17639+0.00157	test-logloss:0.34757+0.00297
[2143]	train-logloss:0.17630+0.00157	test-logloss:0.34754+0.00296
[2144]	train-logloss:0.17623+0.00158	test-logloss:0.34752+0.00297
[2145]	train-logloss:0.17617+0.00159	test-logloss:0.34751+0.00296
[2146]	train-logloss:0.17610+0.00158	test-logloss:0.34748+0.00296
[2147]	train-logloss:0.17604+0.00158	test-logloss:0.34748+0.00297
[2148]	train-logloss:0.17600+0.00157	test-logloss:0.34748+0.00298
[2149]	train-logloss:0.17596+0.00157	test-logloss:0.34748+0.00298
[2150]	train-logloss:0.17590+0.00156	test-logloss:0.34746+0.00298
[2151]	train-logloss:0.17584+0.00155	test-logloss:0.34745+0.00299
[2152]	train-logloss:0.17578+0.00154	test-logloss:0.34743+0.00299
[2153]	train-logloss:0.17569+0.00154	test-logloss:0.34740+0.00300
[2154]	train-logloss:0.17564+0.00154	test-logloss:0.34739+0.00300
[2155]	train-logloss:0.17558+0.00153	test-logloss:0.34738+0.00300
[2156]	tra

[2266]	train-logloss:0.16880+0.00139	test-logloss:0.34596+0.00315
[2267]	train-logloss:0.16874+0.00138	test-logloss:0.34595+0.00315
[2268]	train-logloss:0.16866+0.00137	test-logloss:0.34594+0.00316
[2269]	train-logloss:0.16860+0.00136	test-logloss:0.34592+0.00316
[2270]	train-logloss:0.16855+0.00135	test-logloss:0.34591+0.00317
[2271]	train-logloss:0.16848+0.00134	test-logloss:0.34589+0.00318
[2272]	train-logloss:0.16842+0.00134	test-logloss:0.34588+0.00319
[2273]	train-logloss:0.16837+0.00133	test-logloss:0.34588+0.00320
[2274]	train-logloss:0.16830+0.00134	test-logloss:0.34587+0.00319
[2275]	train-logloss:0.16826+0.00133	test-logloss:0.34587+0.00319
[2276]	train-logloss:0.16821+0.00132	test-logloss:0.34586+0.00318
[2277]	train-logloss:0.16813+0.00131	test-logloss:0.34584+0.00320
[2278]	train-logloss:0.16808+0.00132	test-logloss:0.34584+0.00320
[2279]	train-logloss:0.16803+0.00133	test-logloss:0.34584+0.00319
[2280]	train-logloss:0.16798+0.00134	test-logloss:0.34583+0.00319
[2281]	tra

[2391]	train-logloss:0.16162+0.00136	test-logloss:0.34463+0.00326
[2392]	train-logloss:0.16156+0.00135	test-logloss:0.34462+0.00326
[2393]	train-logloss:0.16151+0.00136	test-logloss:0.34462+0.00325
[2394]	train-logloss:0.16146+0.00137	test-logloss:0.34461+0.00325
[2395]	train-logloss:0.16140+0.00138	test-logloss:0.34459+0.00324
[2396]	train-logloss:0.16135+0.00137	test-logloss:0.34459+0.00325
[2397]	train-logloss:0.16131+0.00139	test-logloss:0.34458+0.00324
[2398]	train-logloss:0.16126+0.00139	test-logloss:0.34458+0.00325
[2399]	train-logloss:0.16121+0.00138	test-logloss:0.34457+0.00325
[2400]	train-logloss:0.16116+0.00138	test-logloss:0.34457+0.00324
[2401]	train-logloss:0.16112+0.00139	test-logloss:0.34458+0.00324
[2402]	train-logloss:0.16109+0.00138	test-logloss:0.34458+0.00325
[2403]	train-logloss:0.16102+0.00137	test-logloss:0.34456+0.00325
[2404]	train-logloss:0.16097+0.00139	test-logloss:0.34456+0.00324
[2405]	train-logloss:0.16091+0.00140	test-logloss:0.34454+0.00324
[2406]	tra

[2516]	train-logloss:0.15492+0.00147	test-logloss:0.34352+0.00335
[2517]	train-logloss:0.15486+0.00147	test-logloss:0.34351+0.00335
[2518]	train-logloss:0.15482+0.00147	test-logloss:0.34352+0.00335
[2519]	train-logloss:0.15476+0.00148	test-logloss:0.34351+0.00335
[2520]	train-logloss:0.15471+0.00147	test-logloss:0.34349+0.00337
[2521]	train-logloss:0.15466+0.00146	test-logloss:0.34348+0.00338
[2522]	train-logloss:0.15462+0.00147	test-logloss:0.34347+0.00338
[2523]	train-logloss:0.15458+0.00146	test-logloss:0.34347+0.00339
[2524]	train-logloss:0.15452+0.00146	test-logloss:0.34345+0.00339
[2525]	train-logloss:0.15448+0.00147	test-logloss:0.34346+0.00339
[2526]	train-logloss:0.15444+0.00145	test-logloss:0.34346+0.00340
[2527]	train-logloss:0.15439+0.00146	test-logloss:0.34345+0.00340
[2528]	train-logloss:0.15434+0.00145	test-logloss:0.34345+0.00340
[2529]	train-logloss:0.15429+0.00144	test-logloss:0.34343+0.00341
[2530]	train-logloss:0.15424+0.00146	test-logloss:0.34343+0.00339
[2531]	tra

[2641]	train-logloss:0.14870+0.00139	test-logloss:0.34265+0.00350
[2642]	train-logloss:0.14864+0.00140	test-logloss:0.34265+0.00349
[2643]	train-logloss:0.14860+0.00139	test-logloss:0.34264+0.00349
[2644]	train-logloss:0.14857+0.00139	test-logloss:0.34264+0.00349
[2645]	train-logloss:0.14854+0.00139	test-logloss:0.34266+0.00349
[2646]	train-logloss:0.14849+0.00139	test-logloss:0.34266+0.00349
[2647]	train-logloss:0.14845+0.00138	test-logloss:0.34265+0.00350
[2648]	train-logloss:0.14838+0.00139	test-logloss:0.34263+0.00350
[2649]	train-logloss:0.14834+0.00138	test-logloss:0.34262+0.00351
[2650]	train-logloss:0.14827+0.00137	test-logloss:0.34261+0.00351
[2651]	train-logloss:0.14821+0.00138	test-logloss:0.34259+0.00350
[2652]	train-logloss:0.14817+0.00138	test-logloss:0.34259+0.00351
[2653]	train-logloss:0.14811+0.00139	test-logloss:0.34258+0.00350
[2654]	train-logloss:0.14807+0.00139	test-logloss:0.34257+0.00350
[2655]	train-logloss:0.14803+0.00140	test-logloss:0.34257+0.00350
[2656]	tra

[2766]	train-logloss:0.14280+0.00135	test-logloss:0.34202+0.00357
[2767]	train-logloss:0.14276+0.00136	test-logloss:0.34203+0.00357
[2768]	train-logloss:0.14269+0.00134	test-logloss:0.34202+0.00358
[2769]	train-logloss:0.14265+0.00135	test-logloss:0.34202+0.00357
[2770]	train-logloss:0.14260+0.00135	test-logloss:0.34201+0.00357
[2771]	train-logloss:0.14255+0.00134	test-logloss:0.34201+0.00359
[2772]	train-logloss:0.14249+0.00133	test-logloss:0.34200+0.00359
[2773]	train-logloss:0.14245+0.00134	test-logloss:0.34199+0.00358
[2774]	train-logloss:0.14243+0.00133	test-logloss:0.34199+0.00359
[2775]	train-logloss:0.14239+0.00133	test-logloss:0.34199+0.00359
[2776]	train-logloss:0.14235+0.00134	test-logloss:0.34199+0.00359
[2777]	train-logloss:0.14231+0.00134	test-logloss:0.34199+0.00359
[2778]	train-logloss:0.14227+0.00134	test-logloss:0.34199+0.00360
[2779]	train-logloss:0.14222+0.00135	test-logloss:0.34198+0.00360
[2780]	train-logloss:0.14218+0.00134	test-logloss:0.34198+0.00360
[2781]	tra

[2891]	train-logloss:0.13725+0.00137	test-logloss:0.34159+0.00366
[2892]	train-logloss:0.13721+0.00137	test-logloss:0.34159+0.00367
[2893]	train-logloss:0.13716+0.00138	test-logloss:0.34159+0.00365
[2894]	train-logloss:0.13712+0.00138	test-logloss:0.34159+0.00365
[2895]	train-logloss:0.13709+0.00137	test-logloss:0.34159+0.00365
[2896]	train-logloss:0.13704+0.00137	test-logloss:0.34159+0.00365
[2897]	train-logloss:0.13699+0.00137	test-logloss:0.34159+0.00366
[2898]	train-logloss:0.13693+0.00137	test-logloss:0.34157+0.00365
[2899]	train-logloss:0.13688+0.00136	test-logloss:0.34156+0.00365
[2900]	train-logloss:0.13683+0.00139	test-logloss:0.34157+0.00363
[2901]	train-logloss:0.13681+0.00139	test-logloss:0.34157+0.00362
[2902]	train-logloss:0.13677+0.00140	test-logloss:0.34157+0.00362
[2903]	train-logloss:0.13672+0.00139	test-logloss:0.34157+0.00363
[2904]	train-logloss:0.13668+0.00139	test-logloss:0.34156+0.00362
[2905]	train-logloss:0.13664+0.00139	test-logloss:0.34156+0.00362
[2906]	tra

The number of rows in `cvresult` is the number of boosting rounds kept by cross-validation, and this gives our new 'n_estimators'.

In [30]:
cvresult.shape[0]

2971
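
To make the link between the cross-validation history and 'n_estimators' concrete, here is a minimal plain-Python sketch of early stopping. It assumes the earlier `xgb.cv` call used early stopping (that call is not shown in this section). The names `rounds_to_keep` and `patience` are hypothetical, and the loss values are made up for illustration.

```python
def rounds_to_keep(test_losses, patience):
    """Return how many boosting rounds to keep under early stopping.

    Training stops once the test loss has failed to improve for
    `patience` consecutive rounds; the rounds up to and including
    the best one are kept.
    """
    best_loss = float("inf")
    best_round = -1
    for i, loss in enumerate(test_losses):
        if loss < best_loss:
            # New best round: remember it and carry on.
            best_loss = loss
            best_round = i
        elif i - best_round >= patience:
            # No improvement for `patience` rounds: stop early.
            break
    return best_round + 1


# Made-up mean test log-loss per round: improves, then gets worse.
losses = [0.60, 0.50, 0.45, 0.44, 0.46, 0.47, 0.48, 0.43]
n_estimators = rounds_to_keep(losses, patience=3)
print(n_estimators)
```

In this toy run the search stops before it reaches the final value of 0.43, which shows that early stopping finds the best round within the patience window rather than a guaranteed global optimum.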

## Define, Train, and Make Predictions From the Final Models

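We now define a new model with the tuned hyperparameters and train it on the original training set. A second model with the same hyperparameters is trained on the imputed training set further below.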

In [31]:
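model3 = XGBClassifier(
    objective='binary:logistic',   # binary classification, outputs probabilities
    eval_metric='logloss',
    missing=-1,                    # value treated as missing
    max_depth=9,
    min_child_weight=1,
    gamma=0.4,                     # minimum loss reduction required to split
    subsample=0.9,                 # fraction of rows sampled per tree
    colsample_bytree=0.9,          # fraction of columns sampled per tree
    scale_pos_weight=posScaling,   # up-weight the positive class
    learning_rate=0.05,
    n_estimators=2971,             # number of rounds found by cross-validation
    reg_alpha=1)                   # L1 regularisation on the weights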

In [32]:
model3.fit(train_X, train_y)

Next, we load the test set (with missing data) and format it in the same way as the training set.

In [33]:
url='https://drive.google.com/file/d/1Fn3Gzh0ZWo9kv9XL0wlFZaaqeJWI7PKA/view'
url='https://drive.google.com/uc?id=' + url.split('/')[-2]
testdf = pd.read_csv(url)
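
The two `url` lines turn a Google Drive share link into a direct-download link that `pd.read_csv` can read. The sketch below wraps that same string manipulation in a small helper; the function name `drive_direct_url` is hypothetical and only used for illustration.

```python
def drive_direct_url(share_url):
    """Convert a '.../file/d/<id>/view' share link into a 'uc?id=<id>' link."""
    # The file id is the second-to-last path segment of the share link.
    file_id = share_url.split('/')[-2]
    return 'https://drive.google.com/uc?id=' + file_id


share = 'https://drive.google.com/file/d/1Fn3Gzh0ZWo9kv9XL0wlFZaaqeJWI7PKA/view'
direct = drive_direct_url(share)
print(direct)
```

This relies on the link ending in `/<id>/view`; a link with a different layout would need a different way of extracting the id.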

In [34]:
y_test=testdf.iloc[:,0]
y_test=pd.DataFrame(y_test, columns=['HeartDiseaseorAttack'])
X_test=testdf.iloc[:,1:22]

Here we make the predictions, as well as the prediction probabilities.

In [35]:
y_pred_probs = model3.predict_proba(X_test)
y_pred = model3.predict(X_test)
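
For a binary classifier, `predict_proba` returns one row per observation holding the probabilities of class 0 and class 1, while `predict` returns hard labels. The sketch below shows how the two relate, assuming the default cut-off of 0.5 on the class 1 probability. The helper `labels_from_probs` is hypothetical and the probabilities are made up.

```python
def labels_from_probs(prob_rows, threshold=0.5):
    """Turn rows of [P(class 0), P(class 1)] into hard 0/1 labels."""
    return [1 if p1 > threshold else 0 for _, p1 in prob_rows]


# Made-up probabilities for three observations.
example_probs = [[0.33, 0.67], [0.91, 0.09], [0.45, 0.55]]
example_labels = labels_from_probs(example_probs)
print(example_labels)
```

Keeping the probabilities, rather than only the labels, means a different threshold can be applied later without retraining the model.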

In [37]:
y_pred

array([1, 1, 0, ..., 0, 1, 0])

Take a quick look at the accuracy to check that it is in the range we would expect.

In [38]:
accuracy5 = accuracy_score(y_test, y_pred)
accuracy5

0.843455006742703
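
Accuracy is the fraction of predictions that match the true labels. When the classes are imbalanced, one rough way to judge "the range we would expect" is to compare it with the accuracy of always predicting the majority class. The sketch below does this on made-up labels; `accuracy` and `majority_baseline` are hypothetical helper names, not functions from scikit-learn.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most common class."""
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)


# Made-up labels: 8 negatives and 2 positives.
y_true_demo = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred_demo = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0]

print(accuracy(y_true_demo, y_pred_demo))
print(majority_baseline(y_true_demo))
```

In this toy example the classifier is no more accurate than the baseline, which is why accuracy alone is only a quick sanity check here.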

We now extract the column holding the predicted probability that each observation has heart disease, and save it.

In [60]:
pred_probs_1s_df=pd.DataFrame(y_pred_probs[:,1],columns=['Probability of Heart Disease'])

In [61]:
pred_probs_1s_df.to_csv(r'C:\Users\Emelia\SkyDrive\Documents\DST\Extra Files\EmeliasNoMissingResults.csv', index=False)

We would also like to evaluate the model on the imputed version of the data, so we now load the imputed training set.

In [39]:
url='https://drive.google.com/file/d/1S7em_UuD7yOiN4JMq__mlNDfcwV6NCYU/view'
url='https://drive.google.com/uc?id=' + url.split('/')[-2]
imputedTraindf = pd.read_csv(url)

In [40]:
imputedTrain_y=imputedTraindf.iloc[:,0]
imputedTrain_X=imputedTraindf.iloc[:,1:22]
imputedPosScaling=len(imputedTrain_y)/sum(imputedTrain_y)
imputedTrain_y=pd.DataFrame(imputedTrain_y, columns=['HeartDiseaseorAttack'])
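
The scaling factor above is the total number of observations divided by the number of positives. The XGBoost documentation suggests the ratio of negatives to positives as a typical value for 'scale_pos_weight'. The sketch below compares the two on made-up labels; both function names are hypothetical.

```python
def total_over_positive(labels):
    """Weight as computed in this notebook: all observations / positives."""
    return len(labels) / sum(labels)


def negative_over_positive(labels):
    """Typical value suggested by the XGBoost docs: negatives / positives."""
    positives = sum(labels)
    return (len(labels) - positives) / positives


# Made-up labels: 8 negatives and 2 positives.
demo_labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

print(total_over_positive(demo_labels))
print(negative_over_positive(demo_labels))
```

The two values always differ by exactly 1, so the notebook's version weights the positive class slightly more heavily than the documented suggestion.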

Define a new model and train it on the imputed data. As mentioned previously, there was not enough time to retune the hyperparameters for this version.

In [41]:
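model4 = XGBClassifier(
    objective='binary:logistic',          # binary classification, outputs probabilities
    eval_metric='logloss',
    missing=-1,                           # value treated as missing (same as model3)
    max_depth=9,
    min_child_weight=1,
    gamma=0.4,                            # minimum loss reduction required to split
    subsample=0.9,                        # fraction of rows sampled per tree
    colsample_bytree=0.9,                 # fraction of columns sampled per tree
    scale_pos_weight=imputedPosScaling,   # up-weight the positive class
    learning_rate=0.05,
    n_estimators=2971,                    # reused from the original-data tuning
    reg_alpha=1)                          # L1 regularisation on the weights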

In [42]:
model4.fit(imputedTrain_X, imputedTrain_y)

Load the imputed test set, then repeat the same steps to obtain the prediction probabilities.

In [44]:
url='https://drive.google.com/file/d/1qkbZ7ZctdRZXez9FPPHNavVRfXGD8PSf/view'
url='https://drive.google.com/uc?id=' + url.split('/')[-2]
imputedTestdf = pd.read_csv(url)

In [46]:
imputedy_test=imputedTestdf.iloc[:,0]
imputedy_test=pd.DataFrame(imputedy_test, columns=['HeartDiseaseorAttack'])
imputedX_test=imputedTestdf.iloc[:,1:22]

In [47]:
imputedy_pred_probs = model4.predict_proba(imputedX_test)
imputedy_pred = model4.predict(imputedX_test)

The accuracy, computed below, is considerably higher than for the model trained on the original data (0.884 compared with 0.843), even though the hyperparameters were not tuned for this model. This may indicate that imputation helps the boosted decision tree model considerably more than leaving the values missing. Our analysis of the imputation did show that it provided better results than randomly assigning 0.5 in the binary features. Although XGBoost is supposed to learn which branch to send missing values down at each split, it may be that this approach is simply not as good as the imputation method we chose. We will go on to analyse the outputs more formally, but unfortunately it seems that the hyperparameter tuning may have been of little value, as the model trained on the imputed data will probably give better results.

In [48]:
imputedaccuracy5 = accuracy_score(imputedy_test, imputedy_pred)
imputedaccuracy5

0.8839797947475486

Get the prediction probabilities and save them, as with the other set.

In [56]:
imputedy_pred_probs[:,1]

array([6.7051691e-01, 5.0073653e-01, 5.2544427e-01, ..., 5.4649801e-05,
       8.5179992e-02, 1.0892048e-03], dtype=float32)

In [57]:
imputedy_pred_probs_1s_df=pd.DataFrame(imputedy_pred_probs[:,1],columns=['Probability of Heart Disease'])

In [59]:
imputedy_pred_probs_1s_df.to_csv(r'C:\Users\Emelia\SkyDrive\Documents\DST\Extra Files\EmeliasResults.csv', index=False)