# Ensemble : Boosting Introduction

<br>

**Idea** : Combine multiple weak learners to form a strong learner to increase the model performance.


<img src="https://media.giphy.com/media/LPYocbQ5kIdXwPLpVv/giphy.gif" width=300 align=right> 

**Bagging** : Models _(high var, low bias)_ + randomization + aggregation

**Boosting** : Models _(low var, high bias)_ + additive combination

<br>

**Note : Bagging is Parallel. Boosting is Sequential.**
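The contrast can be sketched with scikit-learn, assuming fully grown trees for bagging and depth-1 trees for boosting. The synthetic dataset, the tree depths, and the ensemble sizes are illustrative choices, not values from these notes.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor

# Illustrative data (assumed, not from the notes)
X, y = make_regression(n_samples=300, n_features=10, n_informative=6,
                       noise=2.0, random_state=0)

# Bagging: fully grown trees (low bias, high variance), each fit on its own
# bootstrap sample. The trees are independent of each other.
bagging = BaggingRegressor(
    DecisionTreeRegressor(),
    n_estimators=50,
    n_jobs=None,  # set n_jobs=-1 to fit the independent trees in parallel
    random_state=0,
)
bagging.fit(X, y)

# Boosting: depth-1 trees (high bias, low variance) fit one after another,
# each one correcting the errors left by the ensemble built so far.
boosting = GradientBoostingRegressor(max_depth=1, n_estimators=50, random_state=0)
boosting.fit(X, y)

print(len(bagging.estimators_), boosting.estimators_.shape)
```

Because each boosting stage depends on the previous stage's predictions, `GradientBoostingRegressor` has no `n_jobs` parameter for fitting its trees side by side.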

<br>

The most popular boosting algorithms are:
- Gradient Boosting
- Adaptive Boosting

# Boosting Intuition

**Idea :** Boosting reduces the high bias of weak learners while keeping the variance roughly the same.

| Hours Studied   | Bunked Lectures   | Assignment Submitted   | Marks   |
|:---------------:|:-----------------:|:----------------------:|:-------:|
| 7               | 2                 | 9                      | 93      |
| 2               | 5                 | 4                      | 65      |
| 5               | 3                 | 7                      | 77      |
| 6               | 1                 | 8                      | 85      |

# Boosting Example Walkthrough

| Hours Studied   | Bunked Lectures   | Assignment Submitted   | Marks   |
|:---------------:|:-----------------:|:----------------------:|:-------:|
| 7               | 2                 | 9                      | 93      |
| 2               | 5                 | 4                      | 65      |
| 5               | 3                 | 7                      | 77      |
| 6               | 1                 | 8                      | 85      |
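One way to walk through boosting on this table by hand is sketched below. It assumes squared-error loss, depth-1 trees (stumps) as weak learners, and a learning rate of 0.5 with 20 rounds; those settings are illustrative assumptions, not values from these notes.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Rows of the table: hours studied, bunked lectures, assignments submitted
X = np.array([[7, 2, 9], [2, 5, 4], [5, 3, 7], [6, 1, 8]], dtype=float)
y = np.array([93, 65, 77, 85], dtype=float)  # marks

learning_rate = 0.5  # assumed for illustration
n_rounds = 20        # assumed for illustration

# Round 0: predict the mean of the marks for every student
prediction = np.full_like(y, y.mean())
errors = [np.mean((y - prediction) ** 2)]

for _ in range(n_rounds):
    residuals = y - prediction               # what the ensemble still gets wrong
    stump = DecisionTreeRegressor(max_depth=1)
    stump.fit(X, residuals)                  # weak learner models the residuals
    prediction = prediction + learning_rate * stump.predict(X)
    errors.append(np.mean((y - prediction) ** 2))

print(errors[0], errors[-1])
print(prediction)
```

Each round adds a small correction to the running prediction, so the training error never increases from one round to the next.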

# Concept of Pseudo-residuals

<img src="https://media.giphy.com/media/3oKIPlLZEbEbacWqOc/giphy.gif" width=300>
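The standard definition can be written as follows. The symbols are assumptions introduced here: $L$ is the loss, $F_{m-1}$ is the ensemble after $m-1$ stages, and $r_{im}$ is the pseudo-residual of sample $i$ at stage $m$.

$$
r_{im} = -\left[\frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)}\right]_{F = F_{m-1}}
$$

For squared-error loss the pseudo-residual is just the ordinary residual:

$$
L(y, F) = \tfrac{1}{2}(y - F)^2 \quad\Rightarrow\quad r_{im} = y_i - F_{m-1}(x_i)
$$

For the marks table, starting from the mean prediction $F_0 = 80$, the first pseudo-residuals are $93 - 80 = 13$, $65 - 80 = -15$, $77 - 80 = -3$ and $85 - 80 = 5$.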

# Gradient Boosting Algorithm
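A minimal from-scratch sketch of the algorithm for squared-error loss is shown below. The class name `SimpleGradientBoosting` and its defaults are hypothetical, chosen for illustration; this is not how scikit-learn implements it internally.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor


class SimpleGradientBoosting:
    """Hypothetical minimal gradient booster for squared-error loss."""

    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=3):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth

    def fit(self, X, y):
        # Step 1: start from a constant model (the mean minimizes squared error)
        self.init_ = float(np.mean(y))
        self.trees_ = []
        current = np.full(len(y), self.init_)
        for _ in range(self.n_estimators):
            # Step 2: pseudo-residuals = negative gradient of 1/2 * (y - F)^2
            residuals = y - current
            # Step 3: fit a shallow tree to the pseudo-residuals
            tree = DecisionTreeRegressor(max_depth=self.max_depth, random_state=0)
            tree.fit(X, residuals)
            # Step 4: add a shrunken copy of the tree to the ensemble
            current = current + self.learning_rate * tree.predict(X)
            self.trees_.append(tree)
        return self

    def predict(self, X):
        out = np.full(X.shape[0], self.init_)
        for tree in self.trees_:
            out = out + self.learning_rate * tree.predict(X)
        return out


# Illustrative data (assumed, not from the notes)
X, y = make_regression(n_samples=300, n_features=5, noise=2.0, random_state=0)
gb = SimpleGradientBoosting(n_estimators=50).fit(X, y)

mse_start = np.mean((y - y.mean()) ** 2)      # error of the constant model
mse_end = np.mean((y - gb.predict(X)) ** 2)   # error after 50 boosting stages
print(mse_start, mse_end)
```

Other losses only change Step 2: the residual line is replaced by the negative gradient of the chosen loss.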

# Bias Variance Tradeoff
- regularization
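
In scikit-learn, the usual regularization knobs for gradient boosting are shrinkage (`learning_rate`), tree size (`max_depth`), row subsampling (`subsample`), and early stopping (`n_iter_no_change` with `validation_fraction`). The sketch below shows them together; the specific values and the synthetic dataset are assumptions for illustration, not tuned recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Illustrative data (assumed, not from the notes)
X, y = make_regression(n_samples=500, n_features=10, n_informative=6,
                       noise=2.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

regularized = GradientBoostingRegressor(
    learning_rate=0.05,       # shrinkage: smaller steps per tree
    n_estimators=1000,        # upper bound; early stopping may use fewer
    max_depth=2,              # weaker (higher-bias) individual trees
    subsample=0.8,            # each tree sees a random 80% of the rows
    validation_fraction=0.1,  # held-out part of the training set
    n_iter_no_change=10,      # stop when validation loss stalls for 10 rounds
    random_state=0,
)
regularized.fit(X_train, y_train)

n_trees_used = regularized.n_estimators_  # number of trees actually fitted
print(n_trees_used, regularized.score(X_test, y_test))
```

A lower `learning_rate` generally needs more trees to reach the same fit, which is why it is paired with a large `n_estimators` and early stopping here.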

<img src="https://media.giphy.com/media/l0G17EPaALmQmZjjy/giphy.gif" width=300 align=left>

# Gradient Boosting Code

In [1]:
from sklearn.datasets import make_regression
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
np.set_printoptions(legacy='1.25')

In [2]:
X, y = make_regression(n_samples=1000, n_features=10, n_informative=6, noise=2.0)

In [9]:
print(X)

[[ 1.507351   -0.00506205  1.29872142 ...  0.08719195 -0.78098537
   0.80148943]
 [ 0.44405516 -0.73482864  0.9339078  ... -0.97654031 -0.69888858
   0.6193573 ]
 [-1.28235512 -0.81314143 -0.66896698 ... -0.82044585 -0.59066225
   0.81241286]
 ...
 [-1.21430384  0.1142811   0.21154294 ... -0.81475177  1.60753394
  -0.60635252]
 [-0.40569563 -0.49018682  0.74922406 ...  1.62318296 -0.5053878
  -0.00802841]
 [-0.03578656  0.26306256 -1.44409351 ...  0.09394738  1.54759341
   0.45719034]]


In [10]:
X.shape

(1000, 10)

In [11]:
y.shape

(1000,)

In [12]:
from sklearn.model_selection import train_test_split

In [13]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [14]:
from sklearn.ensemble import GradientBoostingRegressor

In [15]:
model = GradientBoostingRegressor()

In [16]:
model.fit(X_train, y_train)

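| Parameter                  | Value             |
|:--------------------------:|:-----------------:|
| loss                       | 'squared_error'   |
| learning_rate              | 0.1               |
| n_estimators               | 100               |
| subsample                  | 1.0               |
| criterion                  | 'friedman_mse'    |
| min_samples_split          | 2                 |
| min_samples_leaf           | 1                 |
| min_weight_fraction_leaf   | 0.0               |
| max_depth                  | 3                 |
| min_impurity_decrease      | 0.0               |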


In [18]:
model.score(X_train, y_train)

0.993513588275537

In [19]:
model.score(X_test, y_test)

0.9664439723140695

In [20]:
M = [10, 50, 100, 200, 500, 1000]

In [21]:
train_scores = []
test_scores = []

In [22]:
for m in M:
    model = GradientBoostingRegressor(n_estimators=m)
    model.fit(X_train, y_train)
    tr_sc = model.score(X_train, y_train)
    te_sc = model.score(X_test, y_test)
    
    train_scores.append(tr_sc)
    test_scores.append(te_sc)

In [23]:
train_scores

[0.6741815776737647,
 0.9749896503820287,
 0.993513588275537,
 0.9968202383138657,
 0.999330411954021,
 0.9999327176508322]

In [24]:
test_scores

[0.645021763665979,
 0.9376374195682717,
 0.9660410942368531,
 0.9725632117848634,
 0.9758374817136123,
 0.9764252023744514]
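
In the scores above, training R² keeps climbing toward 1.0 while test R² levels off near 0.976, so most of the gain comes from the first few hundred trees. The same kind of curve can be obtained from a single fit with `staged_predict`, which yields predictions after each boosting stage. The sketch below regenerates its own data with a fixed `random_state` (an assumption added here), so its numbers will differ from the outputs above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Same shape of data as in the notebook, but seeded for reproducibility
X, y = make_regression(n_samples=1000, n_features=10, n_informative=6,
                       noise=2.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

M = [10, 50, 100, 200, 500]

# Fit once with the largest ensemble size of interest
model = GradientBoostingRegressor(n_estimators=max(M), random_state=0)
model.fit(X_train, y_train)

# staged_predict yields test predictions after 1, 2, ..., max(M) trees
staged_test_scores = [
    r2_score(y_test, pred) for pred in model.staged_predict(X_test)
]

# Pick out the ensemble sizes listed in M (stage m is at index m - 1)
test_scores = [staged_test_scores[m - 1] for m in M]
print(test_scores)
```

This avoids refitting a separate model for every value in `M`, since the first `m` trees of the large model are exactly the ensemble of size `m`.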

# XGBoost

In [None]:
#! pip install xgboost

Collecting xgboost
  Downloading xgboost-3.1.1-py3-none-win_amd64.whl (72.0 MB)
     -------------------------------------- 72.0/72.0 MB 778.6 kB/s eta 0:00:00
Installing collected packages: xgboost
Successfully installed xgboost-3.1.1




# Adaptive Boosting

upenn

# AdaBoost Code
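
A minimal sketch mirroring the gradient boosting cells above, assuming scikit-learn's `AdaBoostRegressor` with its default weak learner. The dataset seed and `n_estimators=100` are illustrative choices, not values from these notes.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import train_test_split

# Same shape of data as in the gradient boosting section, seeded here
X, y = make_regression(n_samples=1000, n_features=10, n_informative=6,
                       noise=2.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# AdaBoost reweights the training samples after every round so that the
# next weak learner focuses on the points with the largest errors.
model = AdaBoostRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)
print(train_r2, test_r2)

# Each fitted weak learner gets its own weight in the final prediction
print(model.estimator_weights_[:5])
```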