## Boosting 

#### Table of Contents

- [Preliminaries](#Preliminaries)
- [Null Model](#Null-Model)
- [AdaBoost](#AdaBoost)
- [Manual Gradient Boosting](#Manual-Gradient-Boosting)
- [Gradient Boosting](#Gradient-Boosting)
- [Extreme Gradient Boosting](#Extreme-Gradient-Boosting)


Today, we shall predict `pct_d_rgdp`.
We will fit 

1. an AdaBoost model
2. a manual gradient boosting model
3. `sklearn`'s gradient boosting model
4. and finally `xgboost`'s EXTREME!!!!!!!!!! gradient boosting model


This is a regression problem.

```
conda install -c conda-forge xgboost
```
****************

```
def r2(yhat, y):
    SSres = ((yhat - y)**2).sum()
    SStot = ((y - y.mean())**2).sum()
    r2 = 1 - SSres/SStot
    return r2
```

We have been using RMSE, MSE, and MAE to evaluate the performance of regression models.
Now we are going to use $R^2$ to take advantage of `sklearn`'s `.score()` method.
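
As a quick sanity check, here is a minimal sketch showing that the `r2()` helper above agrees with `sklearn`'s own $R^2$. The data are synthetic and the `_demo` names are made up for illustration. Watch the argument order: our helper takes `(yhat, y)`, while `r2_score` takes `(y_true, y_pred)`.

```python
import numpy as np
from sklearn.metrics import r2_score

def r2(yhat, y):
    # same helper as above: 1 - SSres / SStot
    SSres = ((yhat - y)**2).sum()
    SStot = ((y - y.mean())**2).sum()
    return 1 - SSres/SStot

# synthetic data, for illustration only
rng = np.random.default_rng(490)
y_demo = rng.normal(size = 50)
yhat_demo = y_demo + rng.normal(scale = 0.5, size = 50)

ours = r2(yhat_demo, y_demo)          # (prediction, truth)
theirs = r2_score(y_demo, yhat_demo)  # (truth, prediction)
print(ours, theirs)
```

The `.score()` method of an `sklearn` regressor computes this same quantity on the data you hand it.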

HOWEVER! Let's grab our metric functions from our helper script so I can make a point!

In [None]:
%run metrics.py

In [None]:
# utilities
import pandas as pd
import numpy as np

# processing
from sklearn.model_selection import GridSearchCV, train_test_split

# algorithms
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor
import xgboost as xgb

# plotting
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
df = pd.read_pickle('C:/Users/johnj/Documents/Data/aml in econ 02 spring 2021/class data/class_data.pkl')
df = df.drop(columns = 'urate_bin').join([
    pd.get_dummies(df['urate_bin'], drop_first = True)
])

In [None]:
y = df['pct_d_rgdp']
x = df.drop(columns = 'pct_d_rgdp')

x_train, x_test, y_train, y_test = train_test_split(x, y,
                                                   train_size = 2/3,
                                                   random_state = 490)

*****
# Null Model
[TOP](#Boosting)

In [None]:
r2_null = r2(np.mean(y_train), y_test)

In [None]:
rmse_null = rmse(sum(y_train)/len(y_train), y_test)
rmse_null
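
The null model can also be written with `sklearn`'s `DummyRegressor`, which predicts the training mean for every observation. The sketch below uses synthetic data, and `rmse_sketch()` is a hypothetical stand-in for the `rmse()` helper in `metrics.py`, which is not shown here. Note that the null model's test $R^2$ can never be above zero: among all constant predictions, the test mean itself minimizes the squared error on the test data.

```python
import numpy as np
from sklearn.dummy import DummyRegressor

def rmse_sketch(yhat, y):
    # hypothetical stand-in for the rmse() helper in metrics.py
    return np.sqrt(np.mean((yhat - y)**2))

# synthetic data, for illustration only
rng = np.random.default_rng(490)
x_demo = rng.normal(size = (90, 3))
y_demo = rng.normal(size = 90)
x_tr, x_te = x_demo[:60], x_demo[60:]
y_tr, y_te = y_demo[:60], y_demo[60:]

# always predicts the mean of the training target
null = DummyRegressor(strategy = 'mean').fit(x_tr, y_tr)
r2_null_demo = null.score(x_te, y_te)
rmse_null_demo = rmse_sketch(null.predict(x_te), y_te)

# the same R2 by hand
SSres = ((y_tr.mean() - y_te)**2).sum()
SStot = ((y_te - y_te.mean())**2).sum()
by_hand = 1 - SSres/SStot
print(r2_null_demo, by_hand, rmse_null_demo)
```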

***************
# AdaBoost
[TOP](#Boosting)

In [None]:
# Note: scikit-learn >= 1.2 renames `base_estimator` to `estimator`
reg_ada = AdaBoostRegressor(base_estimator = DecisionTreeRegressor(max_depth = 1),
                            n_estimators = 200,
                            learning_rate = 0.5)
reg_ada.fit(x_train, y_train)

Let's see how we did!

In [None]:
r2_ada = reg_ada.score(x_test, y_test)
r2_ada

What? A negative $R^2$?!
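
Yes, it can happen. Since $R^2 = 1 - SS_{res}/SS_{tot}$, it goes negative whenever the model's squared error is larger than the squared error of just predicting the mean of `y`. A tiny made-up example, with numbers chosen only to keep the arithmetic easy:

```python
import numpy as np

# made-up numbers, for illustration only
y_demo = np.array([1., 2., 3.])
yhat_demo = np.array([3., 2., 1.])  # predictions that run the wrong way

SSres = ((yhat_demo - y_demo)**2).sum()      # 4 + 0 + 4 = 8
SStot = ((y_demo - y_demo.mean())**2).sum()  # 1 + 0 + 1 = 2
r2_demo = 1 - SSres/SStot                    # 1 - 8/2 = -3
print(r2_demo)
```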

In [None]:
rmse_ada = rmse(reg_ada.predict(x_test), y_test)
rmse_ada

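A negative $R^2$ means the model does worse on the test data than simply predicting the mean, so we have most likely overfit our training data.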
Perhaps we should take a look at some good ol' cross-validation.

In [None]:
%%time
param_grid = {
    'n_estimators': [15, 25, 50, 75],
    'learning_rate': 10.**np.arange(-6, -2)
}

ada_cv = AdaBoostRegressor(base_estimator = DecisionTreeRegressor(max_depth = 1),
                          random_state = 490)

grid_search = GridSearchCV(ada_cv, param_grid,
                          scoring = 'r2',
                          cv = 5,
                          n_jobs = 10).fit(x_train, y_train)
best = grid_search.best_params_
best

In [None]:
reg_ada_best = AdaBoostRegressor(base_estimator = DecisionTreeRegressor(max_depth = 1),
                                 n_estimators = best['n_estimators'],
                                 learning_rate = best['learning_rate'],
                                 random_state = 490) # same seed as the CV model
reg_ada_best.fit(x_train, y_train)
r2_ada_best = reg_ada_best.score(x_test, y_test)
r2_ada_best
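
Refitting by hand is not strictly needed. By default (`refit = True`), `GridSearchCV` refits the best configuration on all of the data passed to `.fit()` and stores it in `best_estimator_`. A minimal sketch on synthetic data; it uses `AdaBoostRegressor`'s default base learner rather than our stumps, and a smaller made-up grid, to keep the example short:

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import GridSearchCV

# synthetic data, for illustration only
rng = np.random.default_rng(490)
x_demo = rng.normal(size = (120, 4))
y_demo = x_demo[:, 0] + rng.normal(scale = 0.5, size = 120)

search = GridSearchCV(AdaBoostRegressor(random_state = 490),
                      {'n_estimators': [5, 10],
                       'learning_rate': [0.01, 0.1]},
                      scoring = 'r2',
                      cv = 3).fit(x_demo, y_demo)

# already refit with the winning parameters
best_model = search.best_estimator_
print(search.best_params_, best_model.score(x_demo, y_demo))
```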

We explained 1% of the variation in the test data... Yikes...

*****
# Manual Gradient Boosting
[TOP](#Boosting)

Our textbook lays out how to manually fit a gradient boosting model.
Since the mid-semester feedback told me most students are not reading the textbook, let's demonstrate how to do it.

In [None]:
reg1 = DecisionTreeRegressor(max_depth = 2).fit(x_train, y_train)
y_train2 = y_train - reg1.predict(x_train)

reg2 = DecisionTreeRegressor(max_depth = 2).fit(x_train, y_train2)
y_train3 = y_train2 - reg2.predict(x_train)

reg3 = DecisionTreeRegressor(max_depth = 2).fit(x_train, y_train3)

yhat = sum(reg.predict(x_test) for reg in (reg1, reg2, reg3))
r2_manual = r2(yhat, y_test)
r2_manual

Better than AdaBoost...
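
The three-tree cell above generalizes to a loop: fit a tree to the current residuals, subtract its predictions, repeat. Here is a sketch on synthetic data. The function names, the `learning_rate` shrinkage argument, and the fixed `random_state` are my additions for illustration; with `n_stages = 3` and `learning_rate = 1.0` it follows the same steps as the manual cell.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_manual_gb(x, y, n_stages = 3, learning_rate = 1.0, max_depth = 2):
    """Fit each tree to the residuals left over by the previous trees."""
    trees = []
    residual = np.asarray(y, dtype = float).copy()
    for _ in range(n_stages):
        tree = DecisionTreeRegressor(max_depth = max_depth,
                                     random_state = 490).fit(x, residual)
        residual = residual - learning_rate * tree.predict(x)
        trees.append(tree)
    return trees

def predict_manual_gb(trees, x, learning_rate = 1.0):
    """The ensemble prediction is the (shrunken) sum of the tree predictions."""
    return learning_rate * sum(tree.predict(x) for tree in trees)

# synthetic data, for illustration only
rng = np.random.default_rng(490)
x_demo = rng.normal(size = (200, 3))
y_demo = np.sin(x_demo[:, 0]) + rng.normal(scale = 0.3, size = 200)

trees = fit_manual_gb(x_demo, y_demo, n_stages = 3)

# training MSE after 1, 2, and 3 trees
mse_by_stage = [np.mean((y_demo - predict_manual_gb(trees[:k], x_demo))**2)
                for k in range(1, 4)]
print(mse_by_stage)
```

Training error cannot go up as trees are added under squared loss, which is exactly why we need test data (or early stopping) to decide when to stop.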

********
# Gradient Boosting
[TOP](#Boosting)

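Fortunately, unlike `AdaBoostRegressor()`, `GradientBoostingRegressor()` supports early stopping: it holds out a fraction of the training data and stops adding trees once the validation score stops improving.

This means we do not need cross-validation to choose the number of trees!! WOOO!!!!!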

In [None]:
reg_gb = GradientBoostingRegressor(n_estimators = 200,
                                   max_depth = 2,
                                   learning_rate = 0.1,
                                   validation_fraction = 1/8,
                                   random_state = 490,
                                   n_iter_no_change = 4,
                                   verbose = 2)
reg_gb.fit(x_train, y_train)
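
To see whether early stopping actually kicked in, check the fitted model's `n_estimators_` attribute, which holds the number of trees that were kept. A minimal sketch on synthetic data, with the same settings as above minus the verbose output:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# synthetic data, for illustration only
rng = np.random.default_rng(490)
x_demo = rng.normal(size = (300, 4))
y_demo = x_demo[:, 0] + rng.normal(scale = 0.5, size = 300)

gb = GradientBoostingRegressor(n_estimators = 200,
                               max_depth = 2,
                               learning_rate = 0.1,
                               validation_fraction = 1/8,
                               n_iter_no_change = 4,
                               random_state = 490).fit(x_demo, y_demo)

# number of trees kept; at most the 200 we allowed
n_used = gb.n_estimators_
print(n_used)
```

If `n_used` comes back as 200, the validation score never stalled for 4 rounds in a row and the model used every tree it was allowed.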

Let's see how we did!

In [None]:
r2_gb = reg_gb.score(x_test, y_test)
r2_gb

Relatively speaking, not too shabby.
Not super exciting either.

*****
# Extreme Gradient Boosting
[TOP](#Boosting)

Now we get to test to see if extreme gradient boosting is all that it is made out to be!

In [None]:
x_train_train, x_train_test, y_train_train, y_train_test = train_test_split(x_train, y_train, 
                                                                           train_size = 4/5,
                                                                           random_state = 490)

In [None]:
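reg_xgb = xgb.XGBRegressor(n_estimators = 200,
                           max_depth = 2,
                           learning_rate = 0.1,
                           verbosity = 1,
                           random_state = 490)
# Note: passing early_stopping_rounds to .fit() is deprecated since xgboost 1.6
# and removed in later releases, where it goes in the XGBRegressor() constructor
reg_xgb.fit(x_train_train, y_train_train,
            eval_set = [(x_train_test, y_train_test)],
            early_stopping_rounds = 4)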

In [None]:
r2_xgb = reg_xgb.score(x_test, y_test)
r2_xgb

Well, there you have it.

Let us sloppily print out these values for a conclusion.

In [None]:
print('R2 Null', r2_null)
print('R2 ada', r2_ada)
print('R2 ada cv', r2_ada_best)
print('R2 manual gb', r2_manual)
print('R2 gb', r2_gb)
print('R2 xgb', r2_xgb)