https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/


https://www.kaggle.com/lifesailor/xgboost

# XGBoost Advantages
## Regularization
- Standard GBM implementations have no regularization step; XGBoost's regularization helps reduce overfitting.
- XGBoost is also known as 'regularized boosting'.
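
The idea of 'regularized boosting' is usually written as a loss term plus a complexity penalty on each tree. The form below is a sketch of that objective as it is commonly presented; the symbols $l$, $\Omega$, $f_k$, $T$ and $w_j$ are notation introduced here for illustration and do not appear in the original notes, while $\gamma$, $\lambda$ and $\alpha$ are assumed to correspond to the `gamma`, `lambda` and `alpha` parameters described later.

$$
\text{Obj} = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 + \alpha \sum_{j=1}^{T} |w_j|
$$

Here $T$ is the number of leaves of a tree and $w_j$ is the weight of leaf $j$, so larger trees and larger leaf weights are both penalized.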
## Parallel Processing
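- Boosting is a sequential process, so how can it run in parallel?
- Each tree can only be built after the previous one, so the trees themselves are not built in parallel; the parallelism is inside the construction of a single tree (for example, the split search across features).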
## High Flexibility
- allow users to define custom optimization objectives and evaluation criteria
## Handling Missing Values
- have an in-built routine to handle missing values
## Tree Pruning
- GBM : stops splitting a node as soon as it encounters a negative loss in the split (a greedy algorithm)
- XGBoost : splits up to max_depth, then prunes the tree backwards and removes splits beyond which there is no positive gain
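
The difference between stopping greedily and pruning backwards can be sketched with a toy tree in which every split already carries its (net) gain. This is a minimal illustration under assumptions, not the library's implementation: the dictionary layout (`gain`, `left`, `right`) and the function name `prune` are hypothetical.

```python
def prune(node):
    """Prune a toy tree bottom-up.

    A node is either None (a leaf) or a dict with keys
    'gain', 'left' and 'right'. A split is removed only when both of its
    children are leaves (after pruning) and its own gain is not positive.
    """
    if node is None:
        return None
    left = prune(node["left"])
    right = prune(node["right"])
    if left is None and right is None and node["gain"] <= 0:
        return None  # collapse this split into a leaf
    return {"gain": node["gain"], "left": left, "right": right}


# A split with gain -2 followed by a split with gain +10:
# a greedy learner would stop at -2, backward pruning keeps both.
deep_tree = {
    "gain": -2.0,
    "left": {"gain": 10.0, "left": None, "right": None},
    "right": None,
}
kept = prune(deep_tree)

# A split with gain -2 followed only by another negative split is removed.
bad_tree = {
    "gain": -2.0,
    "left": {"gain": -1.0, "left": None, "right": None},
    "right": None,
}
removed = prune(bad_tree)
```

In the first tree the negative split survives because a positive split sits below it; in the second tree both splits are collapsed.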
## Built-in CV
- run a cross-validation at each iteration
## Continue on Existing Model
- User can start training an XGBoost model from its last iteration of previous run

# XGBoost Hyperparameter

## General Parameter : Guide the overall functioning
### booster[default = gbtree] 
- Select the type of model to run at each iteration
    - gbtree : tree-based models
    - gblinear : linear models
    
### silent[default = 0]
- Silent mode is activated when set to 1: no running messages are printed

### nthread[default : maximum number of threads]
- Number of cores to use for parallel processing
- If left unset (the default), all available cores are used

## Booster Parameters : Guide the individual booster at each step
Only the tree booster is considered here.

### eta [default = 0.3]
- learning rate
### min_child_weight[default = 1]
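- Minimum sum of the weights of all observations required in a child
- Similar to min_samples_leaf in GBM, but not identical: this is the minimum sum of *weights* of the observations, not the minimum number of observations
- Used to control overfitting.
- Higher values prevent the model from learning relations that are highly specific to the particular sample selected for a tree
- Too high a value leads to under-fitting, so it should be tuned using CV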
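
Why a 'sum of weights' differs from a count of observations can be shown with the binary logistic loss, where the weight of an observation is the second derivative of the loss, `p * (1 - p)`. The sketch below assumes that definition and unit instance weights; the function name `child_weight_logistic` is hypothetical.

```python
def child_weight_logistic(probabilities):
    """Sum of second derivatives p * (1 - p) over the observations in a child."""
    return sum(p * (1.0 - p) for p in probabilities)


# Four uncertain observations (p = 0.5) contribute 0.25 each.
uncertain_child = child_weight_logistic([0.5, 0.5, 0.5, 0.5])

# Four confidently predicted observations contribute far less in total.
confident_child = child_weight_logistic([0.99, 0.99, 0.01, 0.01])

min_child_weight = 1.0
uncertain_ok = uncertain_child >= min_child_weight
confident_ok = confident_child >= min_child_weight
```

Both children hold four observations, yet only the uncertain one reaches a weight of 1. For squared-error regression the second derivative is 1 per observation, so there the sum does equal the count.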
### max_depth [default = 6]
- maximum depth of a tree
- Used to control overfitting: a higher depth lets the model learn relations very specific to a particular sample
- Should be tuned using CV
- Typical values: 3-10
### max_leaf_nodes
- Maximum number of terminal nodes (leaves) in a tree
- Can be defined in place of max_depth
### gamma[default = 0]
- Specifies the minimum loss reduction required to make a split
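
How gamma acts as a threshold on a split can be sketched with the gain formula as it is usually written for gradient-boosted trees with second-order statistics. This is an illustration under assumptions: `g_*` and `h_*` stand for the sums of first and second derivatives of the loss in each child, the function names are hypothetical, and the library's internal scaling of the gain may differ from this textbook form.

```python
def leaf_score(grad_sum, hess_sum, reg_lambda=1.0):
    """Score of a leaf: G^2 / (H + lambda)."""
    return grad_sum ** 2 / (hess_sum + reg_lambda)


def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Loss reduction of a split, minus the gamma penalty for the extra leaf."""
    parent = leaf_score(g_left + g_right, h_left + h_right, reg_lambda)
    children = (leaf_score(g_left, h_left, reg_lambda)
                + leaf_score(g_right, h_right, reg_lambda))
    return 0.5 * (children - parent) - gamma


# Same split, evaluated without and with a gamma penalty.
gain_without_penalty = split_gain(-4.0, 2.0, 4.0, 2.0)
gain_with_penalty = split_gain(-4.0, 2.0, 4.0, 2.0, gamma=6.0)

# A split is only worth keeping when its net gain is positive.
keep_without_penalty = gain_without_penalty > 0
keep_with_penalty = gain_with_penalty > 0
```

With gamma = 0 the split is kept; raising gamma above the raw loss reduction turns the same split into one that is not worth making.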
### max_delta_step [default = 0]
- If 0, there is no constraint
- Setting a positive value makes the update step more conservative
### subsample [default=1]
- Fraction of observations randomly sampled for each tree
- Lower values make the algorithm more conservative
- Typical values: 0.5-1
### colsample_bytree [default=1]

### lambda [default=1]
- L2 regularization term on weights
- Rarely used
### alpha [default=0]
- L1 regularization term on weights
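
The effect of the two penalties on a single leaf can be sketched with the usual closed-form leaf weight: L2 (lambda) enlarges the denominator and shrinks the weight, while L1 (alpha) soft-thresholds the gradient sum and can set the weight to exactly zero. The function name `leaf_weight` and its arguments are hypothetical, and the formula is the commonly cited one rather than something stated in these notes.

```python
def leaf_weight(grad_sum, hess_sum, reg_lambda=1.0, reg_alpha=0.0):
    """Optimal leaf weight with L2 (reg_lambda) and L1 (reg_alpha) penalties."""
    # Soft-threshold the gradient sum by alpha (the L1 part).
    if grad_sum > reg_alpha:
        shrunk = grad_sum - reg_alpha
    elif grad_sum < -reg_alpha:
        shrunk = grad_sum + reg_alpha
    else:
        shrunk = 0.0
    # Divide by H + lambda (the L2 part).
    return -shrunk / (hess_sum + reg_lambda)


no_l1 = leaf_weight(-4.0, 3.0, reg_lambda=1.0, reg_alpha=0.0)
with_l1 = leaf_weight(-4.0, 3.0, reg_lambda=1.0, reg_alpha=1.0)
zeroed = leaf_weight(-0.5, 3.0, reg_lambda=1.0, reg_alpha=1.0)
```

The same gradient statistics give a smaller weight once alpha is added, and a leaf with a weak gradient signal is zeroed out entirely.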
### scale_pos_weight [default=1]
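
A commonly suggested starting value for this parameter on imbalanced binary data is the ratio of negative to positive examples. The helper below is a minimal sketch of that heuristic; the function name and the toy labels are hypothetical and not taken from the notebook.

```python
def suggest_scale_pos_weight(labels):
    """Ratio of negative (0) to positive (1) labels."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive examples in labels")
    return negatives / positives


# Toy labels: three negatives for every positive.
toy_labels = [0, 0, 0, 1, 0, 0, 0, 1]
suggested = suggest_scale_pos_weight(toy_labels)
```

A value of 1 leaves the classes unweighted, which is reasonable when the two classes are roughly balanced.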

## Learning Task Parameters : Guide the optimization performed
### objective [default = reg:linear]
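- binary:logistic (binary classification)
- multi:softmax (multiclass classification, returns the predicted class)
- multi:softprob (multiclass classification, returns per-class probabilities)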
### eval_metric[default according to objective]
- rmse
- mae
- logloss
- error
- merror
- mlogloss
- auc
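
Two of the metrics listed above, `error` and `logloss`, are simple enough to compute by hand for a binary problem. The sketch below assumes that `error` counts a prediction as positive when the probability exceeds 0.5; the function names and the sample values are hypothetical.

```python
import math


def classification_error(y_true, y_prob, threshold=0.5):
    """Fraction of wrong predictions after thresholding the probabilities."""
    wrong = sum(1 for y, p in zip(y_true, y_prob)
                if (1 if p > threshold else 0) != y)
    return wrong / len(y_true)


def binary_logloss(y_true, y_prob):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


y_true = [1, 0, 1, 1]
y_prob = [0.9, 0.2, 0.4, 0.6]

err = classification_error(y_true, y_prob)
ll = binary_logloss(y_true, y_prob)
```

`error` only looks at which side of the threshold a prediction falls on, while `logloss` also penalizes how far the probability is from the true label.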

# General Approach for Parameter Tuning
## Choose a relatively high learning rate & optimum number of trees for this learning rate
- Typical learning rates: 0.05-0.3
- The `cv` function runs cross-validation at each boosting iteration and so returns the optimum number of trees

## Tune tree-specific parameters
- max_depth, min_child_weight, gamma, subsample, colsample_bytree 

## Tune regularization parameters
- lambda, alpha

## Lower the learning rate

# Practice using titanic data
- Detailed preprocessing is skipped here

In [1]:
import numpy as np
import pandas as pd
import xgboost as xgb
from xgboost.sklearn import XGBClassifier
from sklearn import metrics
from sklearn.model_selection import GridSearchCV

import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 12,4

In [3]:
import os
os.listdir('./titanic')

['gender_submission.csv', 'train.csv', 'test.csv']

In [5]:
path = './titanic/'

train = pd.read_csv(path + 'train.csv')
test = pd.read_csv(path + 'test.csv')

In [3]:
train.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [4]:
train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Name           891 non-null object
Sex            891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Ticket         891 non-null object
Fare           891 non-null float64
Cabin          204 non-null object
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB


## preprocessing

In [6]:
# Drop columns that are hard to process
del train['Ticket']; del test['Ticket']
del train['Cabin']; del test['Cabin']
del train['Name']; del test['Name']

In [7]:
train.columns

Index(['PassengerId', 'Survived', 'Pclass', 'Sex', 'Age', 'SibSp', 'Parch',
       'Fare', 'Embarked'],
      dtype='object')

In [8]:
# Add a placeholder Survived column (the prediction target) to test, then concatenate with train
test.insert(loc = 1, column = 'Survived', value = 0)
total = pd.concat([train, test], axis = 0)

In [9]:
total.shape

(1309, 9)

In [10]:
# One-hot encoding
sex = pd.get_dummies(total.Sex)
embarked = pd.get_dummies(total.Embarked)

# Drop the original columns
del total['Sex']
del total['Embarked']

# Concatenate the one-hot columns back onto the frame
total = pd.concat([total, sex, embarked], axis = 1)

In [11]:
total["Family"] = total["Parch"] + total["SibSp"]

In [12]:
train = total[:len(train)]
test = total[len(train):]

In [13]:
train.head()

Unnamed: 0,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare,female,male,C,Q,S,Family
0,1,0,3,22.0,1,0,7.25,0,1,0,0,1,1
1,2,1,1,38.0,1,0,71.2833,1,0,1,0,0,1
2,3,1,3,26.0,0,0,7.925,1,0,0,0,1,0
3,4,1,1,35.0,1,0,53.1,1,0,0,0,1,1
4,5,0,3,35.0,0,0,8.05,0,1,0,0,1,0


## Prediction model

In [14]:
target = 'Survived'
IDcol = 'PassengerId'

In [15]:
def modelfit(algo, dtrain, predictors, useTrainCV = True, cv_folds = 5, early_stopping_rounds = 100) :

    # Optionally choose n_estimators by cross-validation with early stopping
    if useTrainCV :
        xgb_param = algo.get_xgb_params()
        xgtrain = xgb.DMatrix(dtrain[predictors].values, label = dtrain[target].values)
        # get_params() is used for n_estimators because get_xgb_params() may not include it
        cvresult = xgb.cv(xgb_param, xgtrain, num_boost_round = algo.get_params()['n_estimators'], nfold = cv_folds,
                    metrics = 'error', early_stopping_rounds = early_stopping_rounds)
        # One row per boosting round kept, so the row count is the chosen number of trees
        algo.set_params(n_estimators = cvresult.shape[0])
        print(algo)

    # Fit on the full training data with the chosen number of trees
    algo.fit(dtrain[predictors], dtrain[target], eval_metric = 'error')

    # Predict on the training set
    dtrain_predictions = algo.predict(dtrain[predictors])
    dtrain_predprob = algo.predict_proba(dtrain[predictors])[:,1] # positive-class probability (not used below)

    print('\nModel Report')
    print('Training Accuracy : %.4g' %metrics.accuracy_score(dtrain[target].values, dtrain_predictions))
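
How early stopping turns a sequence of cross-validated errors into a number of trees can be sketched without any library. This is an illustration under assumptions, not the logic of `xgb.cv` itself: the function name, the `patience` argument (playing the role of `early_stopping_rounds`) and the error values are hypothetical.

```python
def rounds_after_early_stopping(cv_errors, patience):
    """Return how many boosting rounds to keep.

    Training stops once the error has not improved for `patience`
    consecutive rounds; the rounds up to the best one are kept.
    """
    best_round = 0
    best_error = float("inf")
    for i, err in enumerate(cv_errors):
        if err < best_error:
            best_error, best_round = err, i
        elif i - best_round >= patience:
            break
    return best_round + 1


# The error stops improving after the third round.
errors = [0.30, 0.25, 0.20, 0.21, 0.22, 0.23]
n_trees = rounds_after_early_stopping(errors, patience=2)
```

The returned count plays the same role as `cvresult.shape[0]` in `modelfit`: it becomes the new `n_estimators`.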

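## Learning rate, estimator
- max_depth = 5 : usually 4-6
- min_child_weight = 1 : to be tuned later
- gamma = 0 : a small value such as 0.1-0.2 is also a reasonable start; tuned later
- subsample, colsample_bytree = 0.8 : usually 0.5-0.9
- scale_pos_weight = 1 : the default (no reweighting of the positive class); raise it when the classes are highly imbalanced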

In [16]:
predictors = [x for x in train.columns if x not in [target, IDcol]]

xgb1 = XGBClassifier(
    learning_rate=0.1,
    n_estimators=1000,
    max_depth=5,
    min_child_weight=1,
    gamma = 0,
    subsample=0.8,
    colsample_bytree=0.8,
    objective= 'binary:logistic',
    nthread=-1,
    scale_pos_weight=1,
    seed=2019
)

modelfit(xgb1, train, predictors)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=0.8, gamma=0, learning_rate=0.1, max_delta_step=0,
       max_depth=5, min_child_weight=1, missing=None, n_estimators=178,
       n_jobs=1, nthread=-1, objective='binary:logistic', random_state=0,
       reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=2019,
       silent=True, subsample=0.8)

Model Report
Training Accuracy : 0.9405


In [17]:
param_test1 = {
    'max_depth' : range(3,10,3),
    'min_child_weight' : range(1,6,2)
}

gsearch1 = GridSearchCV(estimator = XGBClassifier(learning_rate = 0.1,
                                                  n_estimators=1000,
                                                  max_depth=5,
                                                  min_child_weight = 1,
                                                  gamma = 0,
                                                  subsample = 0.8,
                                                  colsample_bytree = 0.8,
                                                  objective= 'binary:logistic',
                                                  nthread=-1,
                                                  scale_pos_weight=1,
                                                  seed=2019),
        param_grid=param_test1, scoring = 'accuracy', n_jobs = -1, iid = False, cv = 5, verbose = 10)

gsearch1.fit(train[predictors], train[target])
gsearch1.cv_results_, gsearch1.best_params_, gsearch1.best_score_

Fitting 5 folds for each of 9 candidates, totalling 45 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 tasks      | elapsed:    1.8s
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    2.1s
[Parallel(n_jobs=-1)]: Done  16 tasks      | elapsed:    2.7s
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    3.7s
[Parallel(n_jobs=-1)]: Done  35 out of  45 | elapsed:    4.7s remaining:    1.3s
[Parallel(n_jobs=-1)]: Done  40 out of  45 | elapsed:    5.0s remaining:    0.6s
[Parallel(n_jobs=-1)]: Done  45 out of  45 | elapsed:    5.4s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done  45 out of  45 | elapsed:    5.4s finished


({'mean_fit_time': array([0.84293032, 0.74992938, 0.51704259, 0.73263183, 0.68282795,
         0.62738881, 0.82155585, 0.75793433, 0.58821659]),
  'mean_score_time': array([0.00770764, 0.00780239, 0.00731893, 0.01220026, 0.01102514,
         0.00910645, 0.01250825, 0.0103817 , 0.00740929]),
  'mean_test_score': array([0.8182518 , 0.8294879 , 0.84292076, 0.81147856, 0.81820772,
         0.83283993, 0.80361325, 0.81709675, 0.83395717]),
  'mean_train_score': array([0.96577285, 0.94781422, 0.93434368, 0.98344665, 0.97250576,
         0.95314303, 0.98372755, 0.97278627, 0.95370483]),
  'param_max_depth': masked_array(data=[3, 3, 3, 6, 6, 6, 9, 9, 9],
               mask=[False, False, False, False, False, False, False, False,
                     False],
         fill_value='?',
              dtype=object),
  'param_min_child_weight': masked_array(data=[1, 3, 5, 1, 3, 5, 1, 3, 5],
               mask=[False, False, False, False, False, False, False, False,
                     False],
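
The figure '9 candidates, totalling 45 fits' in the log above follows from expanding the parameter grid and multiplying by the number of folds. The sketch below reproduces that arithmetic with the standard library only; `expand_grid` is a hypothetical helper, not a scikit-learn function.

```python
from itertools import product


def expand_grid(param_grid):
    """List every combination of the values in param_grid."""
    names = list(param_grid)
    return [dict(zip(names, values))
            for values in product(*(param_grid[name] for name in names))]


param_test1 = {
    "max_depth": range(3, 10, 3),        # 3, 6, 9
    "min_child_weight": range(1, 6, 2),  # 1, 3, 5
}
cv_folds = 5

candidates = expand_grid(param_test1)
total_fits = len(candidates) * cv_folds
```

Three values for each of two parameters give nine candidates, and each candidate is fitted once per fold.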
    

## Gamma

In [19]:
param_test2 = {
    'gamma' : [i/10 for i in range(0,5)]
}
param_test2['gamma']

[0.0, 0.1, 0.2, 0.3, 0.4]

In [20]:
gsearch2 = GridSearchCV(estimator = XGBClassifier(learning_rate =0.1, 
                                                  n_estimators=1000, 
                                                  max_depth=3,
                                                  min_child_weight=5, 
                                                  gamma=0, 
                                                  subsample=0.8, 
                                                  colsample_bytree=0.8,
                                                  objective= 'binary:logistic', 
                                                  nthread=-1, 
                                                  scale_pos_weight=1,
                                                  seed=2019), 
                        param_grid = param_test2, scoring='accuracy', n_jobs=-1, iid=False, cv=5)
gsearch2.fit(train[predictors],train[target])
gsearch2.cv_results_, gsearch2.best_params_, gsearch2.best_score_



({'mean_fit_time': array([0.61900434, 0.57575178, 0.53889151, 0.53299499, 0.476548  ]),
  'mean_score_time': array([0.00766706, 0.00759559, 0.0075964 , 0.00803833, 0.00711441]),
  'mean_test_score': array([0.84292076, 0.84067357, 0.84068612, 0.8395499 , 0.84178447]),
  'mean_train_score': array([0.93434368, 0.93462497, 0.93097722, 0.93097761, 0.92676846]),
  'param_gamma': masked_array(data=[0.0, 0.1, 0.2, 0.3, 0.4],
               mask=[False, False, False, False, False],
         fill_value='?',
              dtype=object),
  'params': [{'gamma': 0.0},
   {'gamma': 0.1},
   {'gamma': 0.2},
   {'gamma': 0.3},
   {'gamma': 0.4}],
  'rank_test_score': array([1, 4, 3, 5, 2], dtype=int32),
  'split0_test_score': array([0.83240223, 0.83798883, 0.82681564, 0.83240223, 0.84357542]),
  'split0_train_score': array([0.93117978, 0.93258427, 0.92837079, 0.93117978, 0.92275281]),
  'split1_test_score': array([0.82122905, 0.81564246, 0.81564246, 0.81564246, 0.81005587]),
  'split1_train_score': arr

## subsample and colsample_bytree

In [21]:
param_test3 = {
 'subsample':[i/10.0 for i in range(6,10)],
 'colsample_bytree':[i/10.0 for i in range(6,10)]
}
gsearch3 = GridSearchCV(estimator = XGBClassifier(learning_rate =0.1, 
                                                  n_estimators=1000, 
                                                  max_depth=3,
                                                  min_child_weight=5, 
                                                  gamma=0, 
                                                  subsample=0.8, 
                                                  colsample_bytree=0.8,
                                                  objective= 'binary:logistic', 
                                                  nthread=-1, 
                                                  scale_pos_weight=1,
                                                  seed=2019), 
                        param_grid = param_test3, scoring='accuracy', n_jobs=-1, iid=False, cv=5, verbose=10)
gsearch3.fit(train[predictors],train[target])
gsearch3.cv_results_, gsearch3.best_params_, gsearch3.best_score_

Fitting 5 folds for each of 16 candidates, totalling 80 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 tasks      | elapsed:    0.6s
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done  16 tasks      | elapsed:    1.4s
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    2.3s
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    2.9s
[Parallel(n_jobs=-1)]: Done  45 tasks      | elapsed:    3.5s
[Parallel(n_jobs=-1)]: Done  56 tasks      | elapsed:    4.2s
[Parallel(n_jobs=-1)]: Done  74 out of  80 | elapsed:    5.7s remaining:    0.5s
[Parallel(n_jobs=-1)]: Done  80 out of  80 | elapsed:    5.9s finished


({'mean_fit_time': array([0.60735598, 0.585116  , 0.51571569, 0.51966867, 0.53691659,
         0.52016096, 0.51768885, 0.51958051, 0.52128882, 0.53322258,
         0.52088256, 0.50818386, 0.54736056, 0.54200907, 0.51588144,
         0.49803333]),
  'mean_score_time': array([0.00756731, 0.00754328, 0.00763516, 0.00742121, 0.00757761,
         0.007478  , 0.00746498, 0.0073431 , 0.00742784, 0.00732594,
         0.00771322, 0.00727406, 0.00746398, 0.00753517, 0.00751123,
         0.00574565]),
  'mean_test_score': array([0.83951189, 0.84287654, 0.84179709, 0.83841996, 0.84288916,
         0.83730257, 0.84177812, 0.83841996, 0.84514898, 0.84177191,
         0.84292076, 0.83729636, 0.84288274, 0.83726491, 0.83951837,
         0.83168466]),
  'mean_train_score': array([0.9163878 , 0.92059616, 0.92424116, 0.92620824, 0.91947257,
         0.92536397, 0.92873357, 0.93181873, 0.92480296, 0.92873043,
         0.93434368, 0.93434446, 0.92508267, 0.93209964, 0.93518558,
         0.93827232]),
  'pa

## subsample

In [22]:
param_test4 = {
 'subsample':[i/100.0 for i in range(40,80)],
}
gsearch4 = GridSearchCV(estimator = XGBClassifier(learning_rate =0.1, 
                                                  n_estimators=1000, 
                                                  max_depth=3,
                                                  min_child_weight=5, 
                                                  gamma=0, 
                                                  subsample=0.6, 
                                                  colsample_bytree=0.8,
                                                  objective= 'binary:logistic', 
                                                  nthread=-1, 
                                                  scale_pos_weight=1,
                                                  seed=2019), 
                        param_grid = param_test4, scoring='accuracy', n_jobs=-1, iid=False, cv=5, verbose=10)
gsearch4.fit(train[predictors],train[target])
gsearch4.cv_results_, gsearch4.best_params_, gsearch4.best_score_

Fitting 5 folds for each of 40 candidates, totalling 200 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 tasks      | elapsed:    0.6s
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.1s
[Parallel(n_jobs=-1)]: Done  16 tasks      | elapsed:    1.3s
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:    2.3s
[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:    2.8s
[Parallel(n_jobs=-1)]: Done  45 tasks      | elapsed:    3.5s
[Parallel(n_jobs=-1)]: Done  56 tasks      | elapsed:    4.2s
[Parallel(n_jobs=-1)]: Done  69 tasks      | elapsed:    5.2s
[Parallel(n_jobs=-1)]: Done  82 tasks      | elapsed:    6.2s
[Parallel(n_jobs=-1)]: Done  97 tasks      | elapsed:    7.3s
[Parallel(n_jobs=-1)]: Done 112 tasks      | elapsed:    8.3s
[Parallel(n_jobs=-1)]: Done 129 tasks      | elapsed:    9.6s
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed:   10.7s
[Parallel(n_jobs=-1)]: Done 165 tasks      | elapsed:   12.1s
[Parallel(n_jobs=-1)]: Done 184 tasks      | elapsed:   

({'mean_fit_time': array([0.52663989, 0.54592228, 0.53300538, 0.51812186, 0.53084121,
         0.52108588, 0.52377338, 0.52543802, 0.51764407, 0.52498813,
         0.52609825, 0.52710471, 0.52293944, 0.5204845 , 0.5137219 ,
         0.53041739, 0.52431474, 0.52187872, 0.52965169, 0.51892619,
         0.54811273, 0.51327   , 0.52449827, 0.53194833, 0.51064906,
         0.51682243, 0.51874514, 0.50825253, 0.52831097, 0.53119144,
         0.51170964, 0.52105465, 0.51728716, 0.51327381, 0.51864862,
         0.52008214, 0.51348438, 0.51171517, 0.51946468, 0.41826086]),
  'mean_score_time': array([0.00901346, 0.00735865, 0.00731416, 0.00729771, 0.00739918,
         0.00726089, 0.00738935, 0.00744252, 0.00726428, 0.00737419,
         0.00736785, 0.00737462, 0.00728421, 0.00748019, 0.0073184 ,
         0.00734296, 0.00738125, 0.00734944, 0.00741096, 0.00739679,
         0.00738177, 0.00731111, 0.00786686, 0.00735931, 0.00734172,
         0.00732851, 0.00740452, 0.00727415, 0.00736594, 0.007419

## Regularization Parameter

In [23]:
param_test5 = {
 'reg_alpha':[1e-5, 1e-2, 0.1, 1, 100]
}
gsearch5 = GridSearchCV(estimator = XGBClassifier(learning_rate =0.1, 
                                                  n_estimators=1000, 
                                                  max_depth=3,
                                                  min_child_weight=5, 
                                                  gamma=0, 
                                                  subsample=0.67, 
                                                  colsample_bytree=0.8,
                                                  objective= 'binary:logistic', 
                                                  nthread=-1, 
                                                  scale_pos_weight=1,
                                                  seed=2019), 
                        param_grid = param_test5, scoring='accuracy', n_jobs=-1, iid=False, cv=5, verbose=10)
gsearch5.fit(train[predictors],train[target])
gsearch5.cv_results_, gsearch5.best_params_, gsearch5.best_score_

Fitting 5 folds for each of 5 candidates, totalling 25 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done   2 tasks      | elapsed:    0.6s
[Parallel(n_jobs=-1)]: Done   9 tasks      | elapsed:    1.2s
[Parallel(n_jobs=-1)]: Done  13 out of  25 | elapsed:    1.2s remaining:    1.1s
[Parallel(n_jobs=-1)]: Done  16 out of  25 | elapsed:    1.3s remaining:    0.7s
[Parallel(n_jobs=-1)]: Done  19 out of  25 | elapsed:    1.7s remaining:    0.5s
[Parallel(n_jobs=-1)]: Done  22 out of  25 | elapsed:    1.8s remaining:    0.2s
[Parallel(n_jobs=-1)]: Done  25 out of  25 | elapsed:    1.9s remaining:    0.0s
[Parallel(n_jobs=-1)]: Done  25 out of  25 | elapsed:    1.9s finished


({'mean_fit_time': array([0.57420387, 0.5646049 , 0.54375162, 0.56630392, 0.40690165]),
  'mean_score_time': array([0.00761447, 0.00787315, 0.00753694, 0.00670028, 0.00249352]),
  'mean_test_score': array([0.8496246 , 0.848501  , 0.84288288, 0.84400013, 0.61616491]),
  'mean_train_score': array([0.92676453, 0.92676689, 0.92508346, 0.91638504, 0.61616182]),
  'param_reg_alpha': masked_array(data=[1e-05, 0.01, 0.1, 1, 100],
               mask=[False, False, False, False, False],
         fill_value='?',
              dtype=object),
  'params': [{'reg_alpha': 1e-05},
   {'reg_alpha': 0.01},
   {'reg_alpha': 0.1},
   {'reg_alpha': 1},
   {'reg_alpha': 100}],
  'rank_test_score': array([1, 2, 4, 3, 5], dtype=int32),
  'split0_test_score': array([0.84357542, 0.83798883, 0.83798883, 0.83798883, 0.61452514]),
  'split0_train_score': array([0.91994382, 0.91994382, 0.92275281, 0.90730337, 0.61657303]),
  'split1_test_score': array([0.83798883, 0.84357542, 0.83240223, 0.83240223, 0.61452514]),
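
How the ranking and the best parameter follow from `mean_test_score` can be checked by hand with the values printed above. The sketch below copies the `reg_alpha` candidates and their mean test scores from that output; the helper `rank_scores` is hypothetical and ignores ties.

```python
reg_alpha_values = [1e-05, 0.01, 0.1, 1, 100]
mean_test_score = [0.8496246, 0.848501, 0.84288288, 0.84400013, 0.61616491]


def rank_scores(scores):
    """Rank scores so that the highest score gets rank 1 (ties not handled)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


ranks = rank_scores(mean_test_score)
best_index = ranks.index(1)
best_params = {"reg_alpha": reg_alpha_values[best_index]}
```

The computed ranks match the `rank_test_score` array in the output, and the best candidate is the smallest `reg_alpha`.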
 

## Lowering the learning rate

In [24]:
predictors = [x for x in train.columns if x not in [target, IDcol]]
xgb1 = XGBClassifier(
    learning_rate =0.01,
    n_estimators=5000,
    max_depth=3,
    min_child_weight=5,
    gamma=0,
    reg_alpha=1e-05,
    subsample=0.67,
    colsample_bytree=0.8,
    objective= 'binary:logistic',
    nthread=-1,
    scale_pos_weight=1,
    seed=2019
)
modelfit(xgb1, train, predictors)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=0.8, gamma=0, learning_rate=0.01, max_delta_step=0,
       max_depth=3, min_child_weight=5, missing=None, n_estimators=7,
       n_jobs=1, nthread=-1, objective='binary:logistic', random_state=0,
       reg_alpha=1e-05, reg_lambda=1, scale_pos_weight=1, seed=2019,
       silent=True, subsample=0.67)

Model Report
Training Accuracy : 0.8238
