# Gradient Boosting Machines (GBM)

* **Adaptive Boosting (AdaBoost)**: An algorithm that combines weak classifiers to form a strong classifier.


* Boosting methods can generally be viewed as an optimization over residuals: each new model is fit to the errors left by the models before it.


* In the **BAGGING method**, trees are created independently from each other.



* In the **ADABOOST method**, trees are created sequentially, each one depending on the trees built before it.

**AdaBoost Algorithm Visualization**
![AdaBoost algorithm visualization](https://miro.medium.com/proxy/1*m2UHkzWWJ0kfQyL5tBFNsQ.png)


* **Gradient Boosting Machines (GBM)**: A generalization of the **AdaBoost algorithm** that can be easily adapted to classification and regression problems.



* GBM builds a series of models that together act as a single predictive model.

* Each model in the series is fit on the residuals left by the previous models in the series.




* GBM uses the **GRADIENT DESCENT** algorithm, which can optimize any differentiable loss function.

* GBM is a combination of **Boosting** and **Gradient Descent**.
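The idea in the bullets above can be sketched in a few lines for the squared-error loss, where the negative gradient is simply the residual. This is a minimal illustration, not scikit-learn's implementation: the helper names `fit_boosted_trees` and `predict_boosted_trees`, the synthetic sine data, and the chosen settings are all assumptions made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosted_trees(X, y, n_estimators=50, learning_rate=0.1, max_depth=2):
    """Fit a squared-loss boosted ensemble; returns (initial prediction, trees)."""
    init = float(np.mean(y))           # start from a constant prediction
    pred = np.full(len(y), init)
    trees = []
    for _ in range(n_estimators):
        residuals = y - pred           # negative gradient of 0.5 * (y - pred)^2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)         # the next model is fit on the residuals
        pred = pred + learning_rate * tree.predict(X)  # small step along the fit
        trees.append(tree)
    return init, trees


def predict_boosted_trees(X, init, trees, learning_rate=0.1):
    """Sum the initial prediction and the shrunken contribution of every tree."""
    pred = np.full(X.shape[0], init)
    for tree in trees:
        pred = pred + learning_rate * tree.predict(X)
    return pred


# Hypothetical toy data: a noisy sine curve
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=200)

init, trees = fit_boosted_trees(X_demo, y_demo)
baseline_mse = np.mean((y_demo - init) ** 2)
boosted_mse = np.mean((y_demo - predict_boosted_trees(X_demo, init, trees)) ** 2)
print(baseline_mse, boosted_mse)
```

The training error of the boosted ensemble should be well below that of the constant baseline, because every tree corrects part of what the earlier trees missed.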


## 1-) Data Preprocessing

In [1]:
import numpy as np
import pandas as pd 
from sklearn.model_selection import train_test_split

In [2]:
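# Load the data and drop rows with missing values
hit = pd.read_csv("Hitters.csv")
df = hit.copy()
df = df.dropna()
# One-hot encode the categorical columns
dms = pd.get_dummies(df[['League', 'Division', 'NewLeague']])
y = df["Salary"]
# Keep the numeric predictors, then add one dummy column per categorical variable
X_ = df.drop(['Salary', 'League', 'Division', 'NewLeague'], axis=1).astype('float64')
X = pd.concat([X_, dms[['League_N', 'Division_W', 'NewLeague_N']]], axis=1)
# Hold out 25% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.25,
                                                    random_state=42)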
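
The preprocessing above keeps only one dummy column for each two-level categorical variable, since the second column carries no extra information. The sketch below shows the same selection on a small made-up frame (the `toy` values are assumptions, not rows from `Hitters.csv`) and compares it with the `drop_first=True` option of `pd.get_dummies`.

```python
import pandas as pd

# Hypothetical toy frame with two binary categorical columns
toy = pd.DataFrame({"League": ["A", "N", "A"], "Division": ["E", "W", "W"]})

# Full one-hot encoding: one column per category level
dummies = pd.get_dummies(toy)

# Manual selection, as in the cell above
kept = dummies[["League_N", "Division_W"]]

# Equivalent shortcut: drop the first level of every variable
shortcut = pd.get_dummies(toy, drop_first=True)

print(list(dummies.columns))
print(list(shortcut.columns))
```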

## 2-) Model

In [3]:
from sklearn.ensemble import GradientBoostingRegressor

In [5]:
gbm_model = GradientBoostingRegressor(random_state = 42)
gbm_model.fit(X_train, y_train)

GradientBoostingRegressor(random_state=42)

* GBM can be built on either linear or tree-based base learners. We will use the tree-based method.

## 3-) Prediction

In [6]:
from sklearn.metrics import mean_squared_error

In [8]:
y_pred = gbm_model.predict(X_test)

In [9]:
test_error_before = np.sqrt(mean_squared_error(y_test, y_pred))
test_error_before  # test RMSE before model tuning

355.2571883779714

## 4-) Model Tuning

* In this section, we will search for the best values of **learning_rate, max_depth, subsample, n_estimators** with the GridSearchCV method.

* GridSearchCV: grid search with cross-validation.

* Then we will build the final model using the best **learning_rate, max_depth, subsample, n_estimators** values.





* **learning_rate, max_depth, subsample, n_estimators** are hyperparameters: they are not learned from the data, so we have to choose them ourselves, and we want the best possible values.

* Instead of relying on intuition to pick these values, we will find them with a grid search.


* **max_features** ==>> maximum number of independent variables considered when looking for the best split (not included in the grid below)

* **n_estimators** ==>> number of trees (boosting stages) to be used

In [10]:
from sklearn.model_selection import GridSearchCV

In [11]:
gbm_params = {
    'learning_rate': [0.001, 0.01, 0.1, 0.2],
    'max_depth': [3, 5, 8, 50, 100],
    'n_estimators': [200, 500, 1000, 2000],
    'subsample': [1, 0.5, 0.75],
}

In [12]:
gbm = GradientBoostingRegressor(random_state = 42)



In [13]:
gbm_cv_model = GridSearchCV(gbm, gbm_params, cv = 10, n_jobs = -1, verbose = 2)

* To find the best model, every combination of the parameter values in the grid is tried and scored with cross-validation.

* This process takes a long time. The **n_jobs=-1** argument reduces this time by running the fits in parallel on all available CPU cores.

* The _**verbose**_ parameter controls how much progress information is printed during the search, such as the number of completed tasks and the elapsed time shown below.
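
The cost of the search can be estimated before running it. As a small sketch, `ParameterGrid` from scikit-learn enumerates the same combinations that `GridSearchCV` will try; the grid is repeated here so the snippet runs on its own, and `n_folds` is just a local name for the `cv = 10` setting.

```python
from sklearn.model_selection import ParameterGrid

gbm_params = {
    'learning_rate': [0.001, 0.01, 0.1, 0.2],
    'max_depth': [3, 5, 8, 50, 100],
    'n_estimators': [200, 500, 1000, 2000],
    'subsample': [1, 0.5, 0.75],
}

n_candidates = len(ParameterGrid(gbm_params))  # 4 * 5 * 4 * 3 combinations
n_folds = 10                                   # matches cv = 10
n_fits = n_candidates * n_folds

print(n_candidates, n_fits)  # 240 2400
```

These numbers match the "240 candidates, totalling 2400 fits" line in the search output.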




In [14]:
gbm_cv_model.fit(X_train, y_train)

Fitting 10 folds for each of 240 candidates, totalling 2400 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:    4.6s
[Parallel(n_jobs=-1)]: Done 154 tasks      | elapsed:   40.4s
[Parallel(n_jobs=-1)]: Done 357 tasks      | elapsed:  2.4min
[Parallel(n_jobs=-1)]: Done 640 tasks      | elapsed:  4.9min
[Parallel(n_jobs=-1)]: Done 1005 tasks      | elapsed:  7.5min
[Parallel(n_jobs=-1)]: Done 1450 tasks      | elapsed: 11.5min
[Parallel(n_jobs=-1)]: Done 1977 tasks      | elapsed: 16.2min
[Parallel(n_jobs=-1)]: Done 2400 out of 2400 | elapsed: 20.6min finished


GridSearchCV(cv=10, estimator=GradientBoostingRegressor(random_state=42),
             n_jobs=-1,
             param_grid={'learning_rate': [0.001, 0.01, 0.1, 0.2],
                         'max_depth': [3, 5, 8, 50, 100],
                         'n_estimators': [200, 500, 1000, 2000],
                         'subsample': [1, 0.5, 0.75]},
             verbose=2)

In [15]:
gbm_cv_model.best_params_

{'learning_rate': 0.01, 'max_depth': 5, 'n_estimators': 2000, 'subsample': 0.5}
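
One way to avoid copying the selected values by hand is to unpack `best_params_` straight into a new estimator. The sketch below does this on a small synthetic problem so that it runs quickly. The data from `make_regression`, the reduced grid `small_grid`, and the names `search` and `tuned` are assumptions for illustration only, and the values it finds have nothing to do with the Hitters results.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Hypothetical small regression problem
X_demo, y_demo = make_regression(n_samples=120, n_features=5,
                                 noise=10.0, random_state=42)

# Deliberately tiny grid so the search finishes in seconds
small_grid = {"learning_rate": [0.01, 0.1], "n_estimators": [20, 50]}
search = GridSearchCV(GradientBoostingRegressor(random_state=42),
                      small_grid, cv=3)
search.fit(X_demo, y_demo)

# Build the tuned model from the selected values instead of retyping them
tuned = GradientBoostingRegressor(random_state=42, **search.best_params_)
tuned.fit(X_demo, y_demo)

# With the default refit=True, best_estimator_ is already refit on all the data
same_predictions = np.allclose(tuned.predict(X_demo),
                               search.best_estimator_.predict(X_demo))
print(search.best_params_, same_predictions)
```

Because `GridSearchCV` refits the best configuration by default, `search.best_estimator_` can also be used directly for prediction.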

### 4.1) Tuned Model

In [16]:
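# Note: learning_rate and n_estimators here differ from best_params_ above (0.01 and 2000),
# and no random_state is set, so with subsample < 1 the score can vary between runs.
gbm_tuned = GradientBoostingRegressor(learning_rate = 0.1,
                                      max_depth = 5,
                                      n_estimators = 200,
                                      subsample = 0.5)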



In [17]:
gbm_tuned = gbm_tuned.fit(X_train,y_train)

In [20]:
y_pred1 = gbm_tuned.predict(X_test)

In [21]:
test_error_after = np.sqrt(mean_squared_error(y_test, y_pred1))
test_error_after  # test RMSE after model tuning

323.0101155611289