# Extreme Gradient Boosting (XGBoost)

* **XGBoost** is an optimized implementation of gradient boosting machines (**GBM**), designed to improve training speed and predictive performance. It is scalable and integrates with many platforms.

* It can be used with R, Python, Scala, and Julia, and it runs on Hadoop.

* It is fast.

* Its prediction accuracy is typically high.

* The model has proven itself in many Kaggle competitions.

## 1-) Model

In [1]:
import numpy as np
import pandas as pd 
from sklearn.model_selection import train_test_split

In [2]:
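diabetes = pd.read_csv("diabetes.csv")
df = diabetes.copy()
df = df.dropna()  # drop rows with missing values
y = df["Outcome"]  # target column
X = df.drop(['Outcome'], axis=1)  # all remaining columns are features
# hold out 30% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.30, 
                                                    random_state=42)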



In [3]:
# !pip install xgboost

In [4]:
from xgboost import XGBClassifier

In [5]:
xgb_model = XGBClassifier().fit(X_train, y_train)
xgb_model

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.300000012, max_delta_step=0, max_depth=6,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=100, n_jobs=0, num_parallel_tree=1, random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

## 2-) Prediction

In [7]:
y_pred = xgb_model.predict(X_test)
y_pred[0:10]

array([1, 0, 0, 0, 0, 1, 0, 1, 1, 0], dtype=int64)

In [8]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

In [9]:
accuracy_score(y_test, y_pred) # before model tuning

0.7359307359307359

In [10]:
confusion_matrix(y_test, y_pred) # before model tuning

array([[116,  35],
       [ 26,  54]], dtype=int64)

In [11]:
print(classification_report(y_test, y_pred)) # before model tuning

              precision    recall  f1-score   support

           0       0.82      0.77      0.79       151
           1       0.61      0.68      0.64        80

    accuracy                           0.74       231
   macro avg       0.71      0.72      0.72       231
weighted avg       0.74      0.74      0.74       231
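
The class 1 figures in the report can be recovered by hand from the confusion matrix. The following is a minimal sketch that hard-codes the matrix printed above; the helper name `binary_metrics` is hypothetical, and it assumes rows are true classes and columns are predicted classes, which is scikit-learn's convention.

```python
def binary_metrics(cm):
    """Return accuracy, precision, recall and F1 for class 1 from a 2x2 confusion matrix."""
    (tn, fp), (fn, tp) = cm  # rows = true class, columns = predicted class
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Confusion matrix of the untuned model, copied from the output above
cm_before = [[116, 35], [26, 54]]
accuracy, precision, recall, f1 = binary_metrics(cm_before)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

After rounding to two decimals, these values agree with the accuracy and the class 1 row of the report.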



## 3-) Model Tuning

* In this section, we try to find the best values of **n_estimators, max_depth, learning_rate, min_samples_split, subsample** with the GridSearchCV method.

* GridSearchCV: grid search with cross-validation. It fits the model for every combination of the candidate values and scores each combination by cross-validation.

* We then build the final model using the best values found.

* These are hyperparameters: values that we choose ourselves rather than values the model learns from the data.

* Instead of relying on intuition to choose them, we let the grid search select the best-scoring combination.

* **max_depth**: the maximum depth of each tree. The default in the model printed above is 6.

* **n_estimators**: the number of boosted trees (boosting rounds).

* **learning_rate**: the shrinkage applied to the contribution of each new tree.

* **subsample**: the fraction of training rows sampled for each tree.

* **min_samples_split**: the minimum number of samples required to split an internal node. This is a scikit-learn tree parameter, not an XGBoost one. XGBoost warns that it might not be used, so it most likely has no effect on the search. The closest XGBoost parameter is **min_child_weight**.


In [13]:
xgb_model  # default parameter values of xgb_model

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.300000012, max_delta_step=0, max_depth=6,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=100, n_jobs=0, num_parallel_tree=1, random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

In [14]:
from sklearn.model_selection import GridSearchCV

In [15]:
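xgb_params = {
        'n_estimators': [100, 500, 1000, 2000],
        'subsample': [0.6, 0.8, 1.0],
        'max_depth': [3, 4, 5, 6],
        'learning_rate': [0.1, 0.01, 0.02, 0.05],
        # min_samples_split is a scikit-learn tree parameter, not an XGBoost one;
        # XGBoost warns that it might not be used (min_child_weight is the closest analogue)
        'min_samples_split': [2, 5, 10]}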
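
The cost of a grid search grows with the product of the list lengths. As a rough sketch, the number of candidates and fits for this grid can be counted before running it; the variable names `n_candidates`, `cv_folds` and `n_fits` are hypothetical, and `cv_folds = 10` mirrors the `cv = 10` setting used in this notebook.

```python
from itertools import product

# The grid from the cell above, copied verbatim
xgb_params = {
    "n_estimators": [100, 500, 1000, 2000],
    "subsample": [0.6, 0.8, 1.0],
    "max_depth": [3, 4, 5, 6],
    "learning_rate": [0.1, 0.01, 0.02, 0.05],
    "min_samples_split": [2, 5, 10],
}

# Every combination of one value per hyperparameter is one candidate
n_candidates = len(list(product(*xgb_params.values())))

cv_folds = 10  # each candidate is fitted once per fold
n_fits = n_candidates * cv_folds

print(n_candidates, n_fits)  # → 576 5760
```

Since `min_samples_split` is probably ignored by XGBoost, its three values triple the number of fits without exploring anything new.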

In [17]:
xgb = XGBClassifier()

xgb_cv_model = GridSearchCV(xgb, 
                            xgb_params, 
                            cv = 10, 
                            n_jobs = -1, verbose = 2)

In [18]:
xgb_cv_model.fit(X_train, y_train)

Fitting 10 folds for each of 576 candidates, totalling 5760 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:    3.2s
[Parallel(n_jobs=-1)]: Done 154 tasks      | elapsed:   25.9s
[Parallel(n_jobs=-1)]: Done 357 tasks      | elapsed:  1.1min
[Parallel(n_jobs=-1)]: Done 640 tasks      | elapsed:  2.0min
[Parallel(n_jobs=-1)]: Done 1005 tasks      | elapsed:  3.3min
[Parallel(n_jobs=-1)]: Done 1450 tasks      | elapsed:  5.0min
[Parallel(n_jobs=-1)]: Done 1977 tasks      | elapsed:  6.7min
[Parallel(n_jobs=-1)]: Done 2584 tasks      | elapsed:  9.3min
[Parallel(n_jobs=-1)]: Done 3273 tasks      | elapsed: 12.2min
[Parallel(n_jobs=-1)]: Done 4042 tasks      | elapsed: 15.3min
[Parallel(n_jobs=-1)]: Done 4893 tasks      | elapsed: 18.6min


Parameters: { min_samples_split } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.




[Parallel(n_jobs=-1)]: Done 5760 out of 5760 | elapsed: 22.1min finished


GridSearchCV(cv=10,
             estimator=XGBClassifier(base_score=None, booster=None,
                                     colsample_bylevel=None,
                                     colsample_bynode=None,
                                     colsample_bytree=None, gamma=None,
                                     gpu_id=None, importance_type='gain',
                                     interaction_constraints=None,
                                     learning_rate=None, max_delta_step=None,
                                     max_depth=None, min_child_weight=None,
                                     missing=nan, monotone_constraints=None,
                                     n_estimators=100, n_job...,
                                     num_parallel_tree=None, random_state=None,
                                     reg_alpha=None, reg_lambda=None,
                                     scale_pos_weight=None, subsample=None,
                                     tree_method=None, v

In [19]:
xgb_cv_model.best_params_

{'learning_rate': 0.02,
 'max_depth': 3,
 'min_samples_split': 2,
 'n_estimators': 100,
 'subsample': 0.6}
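
The `min_samples_split` entry above should be read with caution: XGBoost reported that this parameter might not be used, so its three candidate values probably produced identical scores. A grid that tunes XGBoost's own `min_child_weight` instead might look like the sketch below. The name `xgb_params_alt` and the candidate values for `min_child_weight` are illustrative assumptions, not settings from the original run.

```python
# Hypothetical alternative grid: same search, but with an XGBoost-native parameter
xgb_params_alt = {
    "n_estimators": [100, 500, 1000, 2000],
    "subsample": [0.6, 0.8, 1.0],
    "max_depth": [3, 4, 5, 6],
    "learning_rate": [0.1, 0.01, 0.02, 0.05],
    # minimum sum of instance weights required in a child node;
    # the values below are assumed for illustration only
    "min_child_weight": [1, 5, 10],
}

# Count the combinations by multiplying the list lengths
n_candidates = 1
for values in xgb_params_alt.values():
    n_candidates *= len(values)

print(n_candidates)
```

Passing this dictionary to `GridSearchCV` in place of `xgb_params` would be the only change needed, and the search would be the same size as before.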

### 3.1-) Tuned Model

In [20]:
xgb = XGBClassifier(learning_rate = 0.02, 
                    max_depth = 3,
                    min_samples_split = 2,  # not an XGBoost parameter; triggers the warning below
                    n_estimators = 100,
                    subsample = 0.6)

In [21]:
xgb_tuned = xgb.fit(X_train, y_train)

Parameters: { min_samples_split } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.




In [22]:
y_pred1 = xgb_tuned.predict(X_test)
y_pred1[0:10]

array([0, 0, 0, 0, 0, 1, 0, 1, 1, 1], dtype=int64)

In [23]:
accuracy_score(y_test, y_pred1) # after model tuning

0.7575757575757576

In [24]:
confusion_matrix(y_test, y_pred1) # after model tuning

array([[124,  27],
       [ 29,  51]], dtype=int64)

In [25]:
print(classification_report(y_test, y_pred1)) # after model tuning

              precision    recall  f1-score   support

           0       0.81      0.82      0.82       151
           1       0.65      0.64      0.65        80

    accuracy                           0.76       231
   macro avg       0.73      0.73      0.73       231
weighted avg       0.76      0.76      0.76       231
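
Comparing the two confusion matrices shows where the accuracy gain comes from. This sketch hard-codes both matrices from the outputs above; the helper `error_counts` and the variable names are hypothetical, and rows are again assumed to be true classes.

```python
cm_before = [[116, 35], [26, 54]]  # default model
cm_tuned = [[124, 27], [29, 51]]   # tuned model


def error_counts(cm):
    """Return (false positives, false negatives) for a 2x2 matrix with rows = true class."""
    return cm[0][1], cm[1][0]


fp_before, fn_before = error_counts(cm_before)
fp_tuned, fn_tuned = error_counts(cm_tuned)

delta_fp = fp_tuned - fp_before  # change in false positives
delta_fn = fn_tuned - fn_before  # change in false negatives
extra_correct = -(delta_fp + delta_fn)  # net gain in correct predictions

print(delta_fp, delta_fn, extra_correct)  # → -8 3 5
```

The tuned model makes 8 fewer false positives but 3 more false negatives, so the accuracy gain comes with a drop in class 1 recall from 0.68 to 0.64.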

