# Boosting

Boosting is an ensemble method that combines several weak learners into a strong learner. Most boosting methods train predictors sequentially, each one trying to correct the errors of its predecessor.

## Importing Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as pimg
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor, export_graphviz
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingRegressor
import xgboost
# Note: load_boston was removed in scikit-learn 1.2; this import needs an older release.
from sklearn.datasets import load_boston, load_wine
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import train_test_split

from IPython.display import display, Math, Latex, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

## Importing and Understanding Datasets

### Wine Recognition Dataset

**Description:**
- Alcohol
- Malic acid
- Ash
- Alcalinity of ash
- Magnesium
- Total phenols
- Flavanoids
- Nonflavanoid phenols
- Proanthocyanins
- Color intensity
- Hue
- OD280/OD315 of diluted wines
- Proline

This is a classification problem: the 13 features listed above are used to predict one of three wine classes.

In [2]:
ClassificationDataset = load_wine()
Xc = ClassificationDataset['data']
yc = ClassificationDataset['target']
Featuresc = ClassificationDataset['feature_names']
TargetsNames = ClassificationDataset['target_names']

Xc_train, Xc_test, yc_train, yc_test = train_test_split(Xc, yc, test_size=0.33, stratify=yc)

dfc = pd.DataFrame(data=np.append(Xc,np.expand_dims(yc,axis=-1),axis=1), columns=Featuresc + ['Class'])
dfc

| | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127.0 | 2.80 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065.0 | 0.0 |
| 1 | 13.20 | 1.78 | 2.14 | 11.2 | 100.0 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.40 | 1050.0 | 0.0 |
| 2 | 13.16 | 2.36 | 2.67 | 18.6 | 101.0 | 2.80 | 3.24 | 0.30 | 2.81 | 5.68 | 1.03 | 3.17 | 1185.0 | 0.0 |
| 3 | 14.37 | 1.95 | 2.50 | 16.8 | 113.0 | 3.85 | 3.49 | 0.24 | 2.18 | 7.80 | 0.86 | 3.45 | 1480.0 | 0.0 |
| 4 | 13.24 | 2.59 | 2.87 | 21.0 | 118.0 | 2.80 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 173 | 13.71 | 5.65 | 2.45 | 20.5 | 95.0 | 1.68 | 0.61 | 0.52 | 1.06 | 7.70 | 0.64 | 1.74 | 740.0 | 2.0 |
| 174 | 13.40 | 3.91 | 2.48 | 23.0 | 102.0 | 1.80 | 0.75 | 0.43 | 1.41 | 7.30 | 0.70 | 1.56 | 750.0 | 2.0 |
| 175 | 13.27 | 4.28 | 2.26 | 20.0 | 120.0 | 1.59 | 0.69 | 0.43 | 1.35 | 10.20 | 0.59 | 1.56 | 835.0 | 2.0 |
| 176 | 13.17 | 2.59 | 2.37 | 20.0 | 120.0 | 1.65 | 0.68 | 0.53 | 1.46 | 9.30 | 0.60 | 1.62 | 840.0 | 2.0 |


### Boston House Prices Dataset

**Description:**
- CRIM: per capita crime rate by town
- ZN: proportion of residential land zoned for lots over 25,000 sq. ft.
- INDUS: proportion of non-retail business acres per town
- CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX: nitric oxides concentration (parts per 10 million)
- RM: average number of rooms per dwelling
- AGE: proportion of owner-occupied units built prior to 1940
- DIS: weighted distances to five Boston employment centres
- RAD: index of accessibility to radial highways
- TAX: full-value property-tax rate per \$10,000
- PTRATIO: pupil-teacher ratio by town
- B: 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town
- LSTAT: % lower status of the population
- MEDV: median value of owner-occupied homes in \$1000s (the prediction target)

This is a regression problem: the first 13 variables are the features and MEDV is the value to predict.

In [3]:
# Note: load_boston was removed in scikit-learn 1.2, so this cell only runs on older releases.
RegressionDataset = load_boston()
Xr = RegressionDataset['data']
yr = RegressionDataset['target']
Featuresr = RegressionDataset['feature_names'].tolist()


Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, test_size=0.33)

dfr = pd.DataFrame(data=np.append(Xr,np.expand_dims(yr,axis=-1),axis=1), columns= Featuresr + ['MEDV'])
dfr

| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1.0 | 296.0 | 15.3 | 396.90 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.90 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.90 | 5.33 | 36.2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 501 | 0.06263 | 0.0 | 11.93 | 0.0 | 0.573 | 6.593 | 69.1 | 2.4786 | 1.0 | 273.0 | 21.0 | 391.99 | 9.67 | 22.4 |
| 502 | 0.04527 | 0.0 | 11.93 | 0.0 | 0.573 | 6.120 | 76.7 | 2.2875 | 1.0 | 273.0 | 21.0 | 396.90 | 9.08 | 20.6 |
| 503 | 0.06076 | 0.0 | 11.93 | 0.0 | 0.573 | 6.976 | 91.0 | 2.1675 | 1.0 | 273.0 | 21.0 | 396.90 | 5.64 | 23.9 |
| 504 | 0.10959 | 0.0 | 11.93 | 0.0 | 0.573 | 6.794 | 89.3 | 2.3889 | 1.0 | 273.0 | 21.0 | 393.45 | 6.48 | 22.0 |


## AdaBoost
One way for a new predictor to correct its predecessor is to pay more attention to the training instances that the predecessor underfitted. In practice, the relative weight of the misclassified instances is increased before the next predictor is trained, so new predictors focus more and more on the hard cases. This is the technique used by AdaBoost.
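
The reweighting can be written down explicitly. The formulas below follow the common textbook presentation of binary AdaBoost. The notation is introduced here and is not taken from the text above: $w^{(i)}$ is the weight of training instance $i$ (out of $m$), $\hat{y}_j^{(i)}$ is the prediction of the $j$-th predictor for that instance, $r_j$ is its weighted error rate, $\alpha_j$ is its weight in the ensemble, and $\eta$ is the learning rate.

$$
r_j = \frac{\sum_{i:\,\hat{y}_j^{(i)} \neq y^{(i)}} w^{(i)}}{\sum_{i=1}^{m} w^{(i)}},
\qquad
\alpha_j = \eta \, \log\frac{1 - r_j}{r_j}
$$

$$
w^{(i)} \leftarrow
\begin{cases}
w^{(i)} & \text{if } \hat{y}_j^{(i)} = y^{(i)} \\
w^{(i)} \exp(\alpha_j) & \text{if } \hat{y}_j^{(i)} \neq y^{(i)}
\end{cases}
$$

After the update, all weights are divided by $\sum_{i=1}^{m} w^{(i)}$ so that they sum to 1 again. For a problem with $K > 2$ classes, the SAMME variant adds $\eta \log(K - 1)$ to $\alpha_j$.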

**AdaBoost Sequential Training:**

<img src="Images/AdaBoost.png" style="width:750px; height:500px">

This sequential learning technique has some similarities with gradient descent, except that instead of tweaking a single predictor's parameters to minimize a cost function, AdaBoost adds predictors to the ensemble, gradually making it better. Once all predictors are trained, the ensemble makes predictions much like bagging or pasting, except that each predictor's vote is weighted according to its overall accuracy on the weighted training set.
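
To make the sequential training concrete, here is a minimal from-scratch sketch of the multiclass SAMME variant of AdaBoost on the wine data. It is an illustration under stated assumptions, not scikit-learn's implementation: the helper names `fit_adaboost_samme` and `predict_adaboost` are hypothetical, class labels are assumed to be the integers `0..K-1`, and a fixed `random_state` is used only to make the run repeatable.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_adaboost_samme(X, y, n_estimators=50, learning_rate=1.0, max_depth=1):
    """Train shallow trees one after another, reweighting instances each round."""
    n_classes = len(np.unique(y))
    weights = np.full(len(y), 1.0 / len(y))  # uniform instance weights, summing to 1
    trees, alphas = [], []
    for _ in range(n_estimators):
        tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
        tree.fit(X, y, sample_weight=weights)
        missed = tree.predict(X) != y
        error = weights[missed].sum()  # weighted error rate of this tree
        if error >= 1.0 - 1.0 / n_classes:
            break  # no better than random guessing: stop adding trees
        error = max(error, 1e-10)  # avoid division by zero for a perfect tree
        # Predictor weight: larger for more accurate trees (SAMME formula).
        alpha = learning_rate * (np.log((1.0 - error) / error) + np.log(n_classes - 1.0))
        trees.append(tree)
        alphas.append(alpha)
        if not missed.any():
            break  # nothing left to correct
        weights[missed] *= np.exp(alpha)  # boost the misclassified instances
        weights /= weights.sum()  # renormalize so the weights sum to 1
    return trees, np.array(alphas)


def predict_adaboost(trees, alphas, X, n_classes):
    """Weighted vote: each tree adds its alpha to the class it predicts."""
    votes = np.zeros((len(X), n_classes))
    rows = np.arange(len(X))
    for tree, alpha in zip(trees, alphas):
        votes[rows, tree.predict(X)] += alpha
    return votes.argmax(axis=1)


X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=42
)
trees, alphas = fit_adaboost_samme(X_train, y_train)
y_pred = predict_adaboost(trees, alphas, X_test, n_classes=3)
print("trees trained:", len(trees))
print("test accuracy:", np.mean(y_pred == y_test))
```

The sketch uses decision stumps (`max_depth=1`) as weak learners. The accuracy it prints will generally differ from the `AdaBoostClassifier` result, which uses deeper trees, more estimators, and a different split.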


In [4]:
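# Note: algorithm="SAMME.R" is deprecated as of scikit-learn 1.4;
# on newer releases, omit the argument.
Classifier = AdaBoostClassifier(DecisionTreeClassifier(max_depth=2), n_estimators=1000,
                                algorithm="SAMME.R", learning_rate=0.1)
Classifier.fit(Xc_train, yc_train)
accuracy_score(yc_test, Classifier.predict(Xc_test))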

0.9152542372881356

## Gradient Boosting

Just like AdaBoost, Gradient Boosting works by sequentially adding predictors to an ensemble, each one correcting its predecessor. However, instead of tweaking the instance weights at every iteration like AdaBoost does, this method tries to fit the new predictor to the residual errors made by the previous predictor.
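
The residual-fitting idea can be sketched by hand with plain regression trees. The example below is a minimal illustration, not the internals of `GradientBoostingRegressor`: it assumes squared-error loss, uses a small synthetic quadratic dataset (an assumption, chosen so the block runs on its own), and the names `learning_rate`, `mse_history`, and `predict_boosted` are introduced here for illustration only.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression data (assumption: noisy quadratic).
rng = np.random.default_rng(0)
X = rng.uniform(-0.5, 0.5, size=(200, 1))
y = 3 * X[:, 0] ** 2 + 0.05 * rng.normal(size=200)

learning_rate = 0.5
baseline = y.mean()                      # the ensemble starts from a constant
prediction = np.full_like(y, baseline)   # current ensemble prediction on X
trees = []
mse_history = [np.mean((y - prediction) ** 2)]

for _ in range(20):
    residuals = y - prediction           # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)               # the new tree is fit to the residuals
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)
    mse_history.append(np.mean((y - prediction) ** 2))


def predict_boosted(X_new):
    """Ensemble prediction: baseline plus the shrunken sum of all tree outputs."""
    return baseline + learning_rate * sum(tree.predict(X_new) for tree in trees)


print("training MSE before boosting:", mse_history[0])
print("training MSE after 20 trees: ", mse_history[-1])
```

Each tree only has to model what the current ensemble still gets wrong, and the learning rate shrinks its contribution, which is the role `learning_rate` plays in `GradientBoostingRegressor` as well.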

In [5]:
Regressor = GradientBoostingRegressor(max_depth=3, n_estimators=200, learning_rate=0.25)
Regressor.fit(Xr_train, yr_train)
mean_squared_error(yr_test, Regressor.predict(Xr_test))

7.609690966776319

### XGBoost

In [6]:
xgbRegressor = xgboost.XGBRegressor()
xgbRegressor.fit(Xr_train, yr_train)
mean_squared_error(yr_test,xgbRegressor.predict(Xr_test))

8.568328728519727