### Gradient Boosted Trees

In [92]:
import pandas as pd 
import numpy as np
import pandas_profiling
import seaborn as sns
import sklearn as sk
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix
from sklearn.metrics import plot_confusion_matrix
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import GradientBoostingClassifier  
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score, f1_score,classification_report, recall_score, balanced_accuracy_score,precision_score
from sklearn.ensemble import RandomForestRegressor
import pickle

### Import train  test

We load the sample we have created in our notebook called "Sample"

In [3]:
X_ada = pd.read_csv('../data/X_ada.csv', engine = 'python')

In [4]:
y_ada = pd.read_csv('../data/y_ada.csv', engine = 'python')

In [5]:
X_test = pd.read_csv('../data/X_test.csv', engine = 'python')

In [6]:
y_test = pd.read_csv('../data/y_test.csv', engine = 'python')

### Standarization of the data


Due to having 58 variables in our dataset, we need to proceed to standarize our data, to ensure uniformity to certain practices within the industry.

In [96]:
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_ada= scaler.fit_transform(X_ada)
X_test= scaler.fit_transform(X_test)

## Model

We use the python function GradientBoostingClassifier, with a number of estimators at 200, a learning rate of 0.1, a random state 0f 40 and a loss of deviance for classification,

In [22]:
import xgboost as xgb
from sklearn import metrics

We proceed to apply the parameters selected into the model.

In [87]:
xgbmodel = xgb.XGBClassifier(max_depth=50, min_child_weight=2, n_estimators= 200, n_jobs=-1, learning_rate= 0.15)

In [97]:
xgbmodel.fit(X_ada, y_ada)

  return f(**kwargs)


XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.15, max_delta_step=0, max_depth=50,
              min_child_weight=2, missing=nan, monotone_constraints='()',
              n_estimators=200, n_jobs=-1, num_parallel_tree=1, random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

We proceed with the prediction base on the model we just built, and to calculate the following indicators:

 - Confusion Matrix 
 - Accuracy score. 
 - Recall Score.
 - Precision.
 - Roc Auc score.
 - F1 score
 


In [98]:
y_xgb = xgbmodel.predict(X_test)

### Confusion Matrix

In [99]:
confusion_matrix(y_test, y_xgb)

array([[18717,   732],
       [ 5369, 57063]], dtype=int64)

According to our confussion Matrix we can interpretate the following:

 - 18.298 True Negatives.
 - 732 False Positives.
 - 5.369 Flase Negatives.
 - 60.943 True Postives.
 
So with our model we have predicted that 732 loans where Charged Off when in reality they were fully paid and 5.369 where Fully Paid and in reality where Charged off

Of a total sample of 81.890 observations our model has predicted wrong 6.101, which is a 7% of the total

### Accuracy Score

In [100]:
accuracy_score(y_test, y_xgb)

0.9254894297822449

With this model we were able to obtain a 92,77% of accuracy, which means if we have 100 observation we are able to predict altmost 97% right. The issue with this score is when our model is imbalanced, meaning this score can deceive us into believing that a bad model is a good model. So to be certain we are going to use the balanced_accuracy.

In [101]:
balanced_accuracy_score(y_test, y_xgb, sample_weight=None, adjusted=False)

0.9381827690751816

We can see that our score has drop down to 93%, but still is a pretty great model.

### Recall Score

In [102]:
recall_score(y_test, y_xgb)

0.914002434648898

The ratio is number of true positives/(true positives + false negatives), it informs us about the quantity that our model can predict being 1 the best value and 0 the worst values, in our case we have obtain an outstanding result

### Precision

We measure the quality of our model, the formula is TruePositive/(TruePositives+FalsePositives)

In [45]:
precision_score(y_test, y_xgb)

0.9814635874641672

The precision is intuitively the ability of the classifier not to label as positive a sample that is negative being best value 1 and worst value 0.

### ROC_AUC

The ROC  is created by  the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The true-positive rate is also known as sensitivity, recall or probability of detection in machine learning. 

In [31]:
roc_auc_score(y_test, y_xgb)

0.9584848153342947

### Classification Report

We uild a text report showing the main classification metrics.

In [34]:
classif = classification_report(y_test, y_xgb)
print(classif)

              precision    recall  f1-score   support

           0       0.92      0.94      0.93     19449
           1       0.98      0.98      0.98     62432

    accuracy                           0.97     81881
   macro avg       0.95      0.96      0.96     81881
weighted avg       0.97      0.97      0.97     81881



Is a breef summary of what we have shown above.

### Hyperparameter

By utilizing weak learners (aka “stumps”), boosting algos like AdaBoost (documentation) and Gradient Boosting (documentation) focus on what the model misclassifies. By overweighting these misclassified data points, the model focuses on what it got wrong in order to learn how to get them right.
Similar to Decision Trees and Random Forests, we will focus on the bias-variance tradeoff usual suspects.

- N_estimators is the maximum number of estimators at which boosting is terminated. If a perfect fit is reached, the algo is stopped. The default here is 50. Bias-Variance Tradeoff: the higher the number of estimators in your model the lower the bias.

Other Important Parameters

- Learning_rate is the rate at which we are adjusting the weights of our model with respect to the loss gradient. In layman’s terms: the lower the learning_rate, the slower we travel along the slope of the loss function. Important note: there is a trade-off between learning_rate and n_estimators as a tiny learning_rate and a large n_estimators will not necessarily improve results relative to the large computational costs.

- Base_estimator (AdaBoost) / Loss (Gradient Boosting) is the base estimator from which the boosted ensemble is built. For AdaBoost the default value is None, which equates to a Decision Tree Classifier with max depth of 1 (a stump). For Gradient Boosting the default value is deviance, which equates to Logistic Regression. If “exponential” is passed, the AdaBoost algorithm is used.



In [57]:
import xgboost as xgb
model=xgb.XGBClassifier()
param = {
    "max_depth": [10,20,40,50],
    "min_child_weight": [1,3,6],
    "n_estimators": [200],
    "learning_rate": [0.001, 0.01, 0.1, 0.2, 0.5]
}

grsearch = GridSearchCV(model, param_grid=param, cv= 4, verbose=10, n_jobs=-1)

In [59]:
grsearch.fit(X_ada, y_ada)

Fitting 4 folds for each of 60 candidates, totalling 240 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 12 concurrent workers.
[Parallel(n_jobs=-1)]: Done   1 tasks      | elapsed: 15.7min
[Parallel(n_jobs=-1)]: Done   8 tasks      | elapsed: 16.1min
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed: 52.8min
[Parallel(n_jobs=-1)]: Done  26 tasks      | elapsed: 92.7min
[Parallel(n_jobs=-1)]: Done  37 tasks      | elapsed: 140.9min
[Parallel(n_jobs=-1)]: Done  48 tasks      | elapsed: 154.0min
[Parallel(n_jobs=-1)]: Done  61 tasks      | elapsed: 201.5min
[Parallel(n_jobs=-1)]: Done  74 tasks      | elapsed: 249.3min
[Parallel(n_jobs=-1)]: Done  89 tasks      | elapsed: 306.2min
[Parallel(n_jobs=-1)]: Done 104 tasks      | elapsed: 328.0min
[Parallel(n_jobs=-1)]: Done 121 tasks      | elapsed: 413.1min
[Parallel(n_jobs=-1)]: Done 138 tasks      | elapsed: 479.3min
[Parallel(n_jobs=-1)]: Done 157 tasks      | elapsed: 523.6min
[Parallel(n_jobs=-1)]: Done 176 tasks      | elapsed: 581.0min
[Parallel(n_jobs=-1)]: Done 197 tasks      | 

GridSearchCV(cv=4,
             estimator=XGBClassifier(base_score=None, booster=None,
                                     colsample_bylevel=None,
                                     colsample_bynode=None,
                                     colsample_bytree=None, gamma=None,
                                     gpu_id=None, importance_type='gain',
                                     interaction_constraints=None,
                                     learning_rate=None, max_delta_step=None,
                                     max_depth=None, min_child_weight=None,
                                     missing=nan, monotone_constraints=None,
                                     n_estimators=100, n_jobs=None,
                                     num_parallel_tree=None, random_state=None,
                                     reg_alpha=None, reg_lambda=None,
                                     scale_pos_weight=None, subsample=None,
                                     tree_method=None,

In [60]:
grsearch.best_estimator_

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.2, max_delta_step=0, max_depth=10,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=200, n_jobs=0, num_parallel_tree=1, random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

In [72]:
grsearch.best_score_

0.9706727305600054

Comparing both models, we can see that the hyperparameter model has improve by 0.01% in all parameters.

So we have built a robust model.

In [104]:
pickle.dump(xgbmodel, open("xgbmodel", "wb"))