# Gradient Boosting Machines (GBM)

* **Adaptive Boosting (AdaBoost)**: An algorithm that combines many weak classifiers into a single strong classifier.


* Boosting methods can generally be viewed as sequentially reducing the residuals (errors) left by the models fitted so far.


* In the **bagging** method, trees are built independently of one another.



* In the **AdaBoost** method, trees are built sequentially, and each tree depends on the errors of the trees before it.

**AdaBoost Algorithm Visualization**
![AdaBoost algorithm visualization](https://miro.medium.com/proxy/1*m2UHkzWWJ0kfQyL5tBFNsQ.png)
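The contrast between independent (bagging) and sequential (AdaBoost) tree building can be tried out with scikit-learn's `BaggingClassifier` and `AdaBoostClassifier`. This is a minimal sketch on synthetic data from `make_classification`, which is an assumption made only to keep the example self-contained; the names `X_demo`, `y_demo`, `bagging` and `adaboost` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: not the diabetes data used later)
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.3, random_state=0)

# Bagging: every tree is fitted on its own bootstrap sample,
# independently of the other trees
bagging = BaggingClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# AdaBoost: learners are fitted one after another; each round reweights
# the training samples according to the errors of the previous round
adaboost = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

bagging_acc = bagging.score(X_te, y_te)
adaboost_acc = adaboost.score(X_te, y_te)
print("bagging accuracy :", bagging_acc)
print("adaboost accuracy:", adaboost_acc)

# AdaBoost keeps one weight per boosting round
print("first round weights:", adaboost.estimator_weights_[:5])
```

The scores depend on the synthetic data and say nothing about which method is better in general.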

* **Gradient Boosting Machines (GBM)**: A generalization of the **AdaBoost algorithm** that adapts readily to both classification and regression problems.



* GBM builds a series of models on the residuals and combines them into a single predictive model.





* Each model in the series is fitted to the residuals of the models that precede it.




* GBM uses the **gradient descent** algorithm, which can optimize any differentiable loss function.




* GBM is therefore a combination of **boosting** and **gradient descent**.
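The idea of fitting each model to the residuals of the previous ones can be sketched in a few lines. The example below is an illustration only, not scikit-learn's internal implementation: it assumes a toy regression problem with squared-error loss, where the negative gradient of the loss is exactly the residual. The names `X_toy`, `y_toy`, `n_rounds` and `mse_history` are made up for this sketch.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (assumption: a noisy sine curve)
rng = np.random.default_rng(0)
X_toy = rng.uniform(-3, 3, size=(200, 1))
y_toy = np.sin(X_toy[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
n_rounds = 50

# Round 0: start from a constant prediction (the mean minimizes squared error)
prediction = np.full_like(y_toy, y_toy.mean())
trees = []
mse_history = [np.mean((y_toy - prediction) ** 2)]

for _ in range(n_rounds):
    # Residuals = negative gradient of 1/2 * (y - F)^2 with respect to F
    residuals = y_toy - prediction
    # Fit a small tree to the residuals of the current ensemble
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(X_toy, residuals)
    # Take a small step in the direction the tree suggests
    prediction += learning_rate * tree.predict(X_toy)
    trees.append(tree)
    mse_history.append(np.mean((y_toy - prediction) ** 2))

print("training MSE before boosting:", mse_history[0])
print("training MSE after boosting :", mse_history[-1])
```

The final prediction is the initial constant plus the sum of all scaled tree outputs, which is the "series of models in the form of a single predictive model" described above.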


## 1-) Model

In [1]:
import numpy as np
import pandas as pd 
from sklearn.model_selection import train_test_split

In [2]:
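diabetes = pd.read_csv("diabetes.csv")
df = diabetes.copy()
df = df.dropna()  # drop rows with missing values
y = df["Outcome"]  # target column (classes 0 and 1)
X = df.drop(["Outcome"], axis=1)  # all remaining columns are features
# hold out 30% of the rows for testing; the fixed seed makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.30,
                                                    random_state=42)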



In [3]:
from sklearn.ensemble import GradientBoostingClassifier

In [4]:
gbm_model = GradientBoostingClassifier()
gbm_model.fit(X_train, y_train)
gbm_model

GradientBoostingClassifier()

## 2-) Prediction

In [7]:
y_pred = gbm_model.predict(X_test)
y_pred[0:10]

array([1, 0, 0, 0, 0, 1, 0, 1, 0, 1], dtype=int64)

In [8]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

In [9]:
accuracy_score(y_test, y_pred) # before model tuning

0.7445887445887446

In [10]:
confusion_matrix(y_test, y_pred) # before model tuning

array([[119,  32],
       [ 27,  53]], dtype=int64)

In [11]:
print(classification_report(y_test, y_pred)) # before model tuning

              precision    recall  f1-score   support

           0       0.82      0.79      0.80       151
           1       0.62      0.66      0.64        80

    accuracy                           0.74       231
   macro avg       0.72      0.73      0.72       231
weighted avg       0.75      0.74      0.75       231
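The figures in the classification report follow directly from the confusion matrix. As a check, the sketch below recomputes accuracy and the class 1 precision, recall and F1 from the matrix printed above; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix reported above (rows = true class, columns = predicted class)
cm = np.array([[119, 32],
               [27, 53]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()        # correct predictions / all predictions
precision_1 = tp / (tp + fp)           # of the predicted 1s, how many are truly 1
recall_1 = tp / (tp + fn)              # of the true 1s, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(f"accuracy={accuracy:.4f} precision={precision_1:.2f} "
      f"recall={recall_1:.2f} f1={f1_1:.2f}")
```

Rounded to two decimals, these values match the class 1 row and the accuracy line of the report.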



## 3-) Model Tuning

* In this section, we search for the best values of **n_estimators, max_depth, learning_rate, min_samples_split** with the GridSearchCV method.

* GridSearchCV: grid search with cross-validation.

* We then build the final model with the best values found for **n_estimators, max_depth, learning_rate, min_samples_split**.

* These four are hyperparameters: we set them ourselves before training, and we want the values that give the best model.

* Instead of relying on intuition to choose them, we let the grid search evaluate the candidate values and pick the best ones.




* **max_depth**: The maximum depth of each individual tree. If None, nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.





* **n_estimators**: The number of boosting stages, i.e. the number of trees fitted one after another (GBM is not a forest of independent trees).



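* **min_samples_split**: The minimum number of samples required to split an internal node.

* **learning_rate**: Shrinks the contribution of each tree; smaller values generally need more trees (a larger n_estimators).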
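How `learning_rate` and `n_estimators` interact can be explored with `staged_predict`, which yields the ensemble's predictions after each additional tree. The following is a hedged sketch on synthetic data (an assumption, used only so the example runs on its own); `staged_scores` and the other names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: not the diabetes data)
X_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.3, random_state=0)

staged_scores = {}
for lr in (0.01, 0.1):
    model = GradientBoostingClassifier(learning_rate=lr, n_estimators=200,
                                       max_depth=3, random_state=0)
    model.fit(X_tr, y_tr)
    # staged_predict yields test predictions after 1, 2, ..., 200 trees
    staged_scores[lr] = [accuracy_score(y_te, pred)
                         for pred in model.staged_predict(X_te)]

for lr, scores in staged_scores.items():
    print(f"learning_rate={lr}: accuracy after 10/50/200 trees = "
          f"{scores[9]:.3f} / {scores[49]:.3f} / {scores[199]:.3f}")
```

Comparing the two curves shows how many trees each learning rate needs on this particular data; the numbers will differ on other datasets.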


In [12]:
gbm_model.learning_rate  # default value of learning_rate

0.1

In [13]:
print(gbm_model.max_depth)  # default value of max_depth

3


In [14]:
gbm_model.min_samples_split  # default value of min_samples_split

2

In [15]:
gbm_model.n_estimators  # default value of n_estimators

100

In [18]:
from sklearn.model_selection import GridSearchCV

In [19]:
gbm_params = {"learning_rate": [0.001, 0.01, 0.1, 0.05],
              # note: 100 is listed twice, so those combinations are fitted twice
              "n_estimators": [100, 500, 100],
              "max_depth": [3, 5, 10],
              "min_samples_split": [2, 5, 10]}

In [20]:
gbm = GradientBoostingClassifier()

# 10-fold cross-validation for every parameter combination, run on all available cores
gbm_cv = GridSearchCV(gbm, gbm_params, cv=10, n_jobs=-1, verbose=2)

In [21]:
gbm_cv.fit(X_train, y_train)

Fitting 10 folds for each of 108 candidates, totalling 1080 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:    6.0s
[Parallel(n_jobs=-1)]: Done 154 tasks      | elapsed:   27.4s
[Parallel(n_jobs=-1)]: Done 357 tasks      | elapsed:  1.5min
[Parallel(n_jobs=-1)]: Done 640 tasks      | elapsed:  2.9min
[Parallel(n_jobs=-1)]: Done 1005 tasks      | elapsed:  4.9min
[Parallel(n_jobs=-1)]: Done 1080 out of 1080 | elapsed:  5.5min finished


GridSearchCV(cv=10, estimator=GradientBoostingClassifier(), n_jobs=-1,
             param_grid={'learning_rate': [0.001, 0.01, 0.1, 0.05],
                         'max_depth': [3, 5, 10],
                         'min_samples_split': [2, 5, 10],
                         'n_estimators': [100, 500, 100]},
             verbose=2)
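The count of 108 candidates and 1080 fits reported in the log can be reproduced with `ParameterGrid`, which enumerates the same combinations that GridSearchCV evaluates. The sketch below copies the grid as written in the notebook under the illustrative name `grid_as_written`; it also shows that the repeated `100` in `n_estimators` inflates the grid.

```python
from sklearn.model_selection import ParameterGrid

# The grid exactly as passed to GridSearchCV above
grid_as_written = {"learning_rate": [0.001, 0.01, 0.1, 0.05],
                   "n_estimators": [100, 500, 100],
                   "max_depth": [3, 5, 10],
                   "min_samples_split": [2, 5, 10]}

n_candidates = len(ParameterGrid(grid_as_written))  # 4 * 3 * 3 * 3
n_fits = n_candidates * 10                          # cv=10 folds per candidate
print(n_candidates, n_fits)

# Dropping repeated values shows how many distinct combinations there are
deduped = {name: sorted(set(values)) for name, values in grid_as_written.items()}
n_unique = len(ParameterGrid(deduped))              # 4 * 2 * 3 * 3
print(n_unique)
```

Of the 108 candidates, only 72 are distinct, so one third of the 5.5 minute search was spent refitting combinations with `n_estimators=100` a second time.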

In [22]:
gbm_cv.best_params_

{'learning_rate': 0.1,
 'max_depth': 3,
 'min_samples_split': 5,
 'n_estimators': 100}
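Besides `best_params_`, a fitted GridSearchCV also exposes `best_score_` and, with the default `refit=True`, a `best_estimator_` that has already been refitted on the full training data. The sketch below illustrates this on synthetic data with a deliberately small grid so that it runs quickly; the data, the grid and the names `search` and `tuned` are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption: not the diabetes data)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.3, random_state=0)

# A small illustrative grid (much smaller than the one used above)
small_grid = {"learning_rate": [0.05, 0.1],
              "n_estimators": [50, 100]}

search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      small_grid, cv=3)
search.fit(X_tr, y_tr)

print(search.best_params_)
print(search.best_score_)  # mean cross-validated accuracy of the best candidate

# With refit=True (the default), best_estimator_ is ready to use
tuned = search.best_estimator_
test_accuracy = tuned.score(X_te, y_te)
print(test_accuracy)
```

Using `best_estimator_` avoids retyping the chosen values by hand when building the tuned model.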

### 3.1-) Tuned Model

In [23]:
# rebuild the model with the best parameters found by the grid search
gbm = GradientBoostingClassifier(learning_rate=0.1,
                                 max_depth=3,
                                 min_samples_split=5,
                                 n_estimators=100)

In [24]:
gbm_tuned = gbm.fit(X_train, y_train)

In [25]:
y_pred1 = gbm_tuned.predict(X_test)
y_pred1[0:10]

array([1, 0, 0, 0, 0, 1, 0, 1, 0, 0], dtype=int64)

In [26]:
accuracy_score(y_test, y_pred1)  # after model tuning

0.7445887445887446

In [27]:
confusion_matrix(y_test, y_pred1)  # after model tuning

array([[118,  33],
       [ 26,  54]], dtype=int64)

In [28]:
print(classification_report(y_test, y_pred1))  # after model tuning

              precision    recall  f1-score   support

           0       0.82      0.78      0.80       151
           1       0.62      0.68      0.65        80

    accuracy                           0.74       231
   macro avg       0.72      0.73      0.72       231
weighted avg       0.75      0.74      0.75       231
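The tuned and untuned models reach the same accuracy on this single test split, and the models above are fitted without a fixed `random_state`, so small differences between runs are possible. One way to compare two configurations more robustly is to cross-validate both with a fixed seed. This is a sketch on synthetic data (an assumption, so that it is self-contained); on the notebook's data, `X_demo` and `y_demo` would be replaced by `X` and `y`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption: not the diabetes data)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

# Default settings versus the one value the grid search changed above
default_model = GradientBoostingClassifier(random_state=0)
tuned_model = GradientBoostingClassifier(min_samples_split=5, random_state=0)

default_scores = cross_val_score(default_model, X_demo, y_demo, cv=5)
tuned_scores = cross_val_score(tuned_model, X_demo, y_demo, cv=5)

print(f"default: {default_scores.mean():.3f} +/- {default_scores.std():.3f}")
print(f"tuned  : {tuned_scores.mean():.3f} +/- {tuned_scores.std():.3f}")
```

If the two means differ by less than the spread across folds, the difference between the configurations is probably not meaningful.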

