# BOOSTING

### What is Boosting?

- Boosting is an ensemble technique in which models are trained sequentially, with each new model focusing more on the mistakes made by the previous ones.

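- Bagging = parallel: the models are trained independently of one another.
- Boosting = sequential: each model depends on the errors of the models before it.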

### Why Boosting?
- Boosting combines many weak learners (models that are only slightly better than guessing, such as shallow decision trees) into a single strong learner.

### How does boosting work?
- Train the first weak model.
- Identify the misclassified or high-error points.
- Give more importance (weight) to those points.
- Train the next model on the reweighted data.
- Repeat for a fixed number of rounds.
- Combine all models into a final prediction.
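
The steps above can be sketched as a small AdaBoost-style loop. This is a minimal illustration, not a library implementation: the helper names (`fit_stump`, `stump_predict`, `adaboost_fit`, `adaboost_predict`) and the toy one-feature dataset are assumptions made for this example, and the weak learner is a one-split decision stump.

```python
import numpy as np

def fit_stump(X, y, w):
    """Find the one-split stump with the lowest weighted error (hypothetical helper)."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = np.where(X[:, j] <= thr, -polarity, polarity)
                err = np.sum(w[pred != y])
                if err < best_err:
                    best, best_err = (j, thr, polarity), err
    return best, best_err

def stump_predict(stump, X):
    j, thr, polarity = stump
    return np.where(X[:, j] <= thr, -polarity, polarity)

def adaboost_fit(X, y, n_rounds=3):
    w = np.full(len(y), 1.0 / len(y))            # every point starts with equal weight
    ensemble = []
    for _ in range(n_rounds):
        stump, err = fit_stump(X, y, w)          # train a weak model on the weighted data
        err = np.clip(err, 1e-10, 1 - 1e-10)     # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)    # more accurate stumps get a bigger vote
        pred = stump_predict(stump, X)
        w *= np.exp(-alpha * y * pred)           # raise the weight of misclassified points
        w /= w.sum()
        ensemble.append((alpha, stump))
    return ensemble

def adaboost_predict(ensemble, X):
    score = sum(alpha * stump_predict(stump, X) for alpha, stump in ensemble)
    return np.where(score >= 0, 1, -1)           # weighted vote of all stumps

# Toy data (assumed for illustration): the positive class sits in the middle of
# the range, so no single threshold can separate the two classes.
X = np.linspace(0, 1, 50).reshape(-1, 1)
y = np.where((X[:, 0] > 0.3) & (X[:, 0] < 0.7), 1, -1)

single, _ = fit_stump(X, y, np.full(len(y), 1.0 / len(y)))
stump_acc = np.mean(stump_predict(single, X) == y)

ensemble = adaboost_fit(X, y, n_rounds=3)
boosted_acc = np.mean(adaboost_predict(ensemble, X) == y)
print(f"single stump accuracy: {stump_acc:.2f}, boosted accuracy: {boosted_acc:.2f}")
```

Labels are encoded as -1 and +1 so that `y * pred` is negative exactly where a stump is wrong, which is what makes the weight update a single line.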

### Types of Boosting

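- AdaBoost - reweights the training points so that each new model focuses mainly on the misclassified ones.
- Gradient Boosting - fits each new model to the residual errors of the current ensemble.
- XGBoost - an optimized implementation of gradient boosting.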
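
To make the "residual error" idea concrete, here is a minimal sketch of gradient boosting for squared error, written by hand with `DecisionTreeRegressor`. The synthetic sine data, the number of rounds, and the variable names are assumptions for illustration; `GradientBoostingRegressor` does considerably more than this.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic one-feature regression problem (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())        # stage 0: predict the mean everywhere
mse_history = [np.mean((y - prediction) ** 2)]
trees = []

for _ in range(50):
    residual = y - prediction                 # what the current ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, residual)
    prediction += learning_rate * tree.predict(X)   # take a small step toward the residual
    trees.append(tree)
    mse_history.append(np.mean((y - prediction) ** 2))

print(f"training MSE: {mse_history[0]:.4f} -> {mse_history[-1]:.4f}")
```

Each tree is trained on the residuals rather than on `y` itself, and the learning rate scales how much of each correction is applied.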

In [23]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier,DecisionTreeRegressor
from sklearn.metrics import classification_report,accuracy_score,mean_squared_error,r2_score,mean_absolute_error
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import AdaBoostClassifier,AdaBoostRegressor,GradientBoostingClassifier,GradientBoostingRegressor
import matplotlib.pyplot as plt

In [24]:
# Load the California housing data as one DataFrame (features plus the MedHouseVal target)
data=fetch_california_housing(as_frame=True).frame

In [18]:
data.head()

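|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | MedHouseVal |
|---|--------|----------|----------|-----------|------------|----------|----------|-----------|-------------|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |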


In [19]:
data.isnull().sum()

MedInc         0
HouseAge       0
AveRooms       0
AveBedrms      0
Population     0
AveOccup       0
Latitude       0
Longitude      0
MedHouseVal    0
dtype: int64

In [25]:
# Separate the features from the target, then hold out 30% of the rows for testing
x=data.drop('MedHouseVal',axis=1)
y=data['MedHouseVal']
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=42)

In [26]:
model = DecisionTreeRegressor(max_depth=4)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Decision Tree Regressor - MSE: {mse}, R2: {r2}")

Decision Tree Regressor - MSE: 0.567419678079617, R2: 0.5676934558406004


In [31]:
# Not used by gradient boosting; the AdaBoost cell below passes `model` as its base estimator
model = DecisionTreeRegressor()
gbr = GradientBoostingRegressor(
    n_estimators=100,     # Number of boosting stages
    learning_rate=0.1,   # Step size
    max_depth=3,         # Depth of each tree
    random_state=42
)
gbr.fit(x_train, y_train)
y_pred = gbr.predict(x_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
print(f"R^2 Score: {r2}")
print(f"Mean Absolute Error: {mae}")

Mean Squared Error: 0.28836337869645623
R^2 Score: 0.7803012822391022
Mean Absolute Error: 0.37144609147335444
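
One way to see the effect of `n_estimators` is to score the model after every boosting stage with `staged_predict`. The sketch below uses synthetic data from `make_friedman1` so that it runs without a download; the dataset, the `gbr_demo` name, and the split sizes are assumptions for illustration, and the same loop should work with the `gbr`, `x_test`, and `y_test` objects from the cell above.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data (assumed for illustration only)
X, y = make_friedman1(n_samples=500, noise=1.0, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

gbr_demo = GradientBoostingRegressor(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42
)
gbr_demo.fit(X_tr, y_tr)

# staged_predict yields the ensemble's test predictions after 1, 2, ..., 100 trees
stage_mse = [mean_squared_error(y_te, pred) for pred in gbr_demo.staged_predict(X_te)]
best_stage = int(np.argmin(stage_mse)) + 1

print(f"test MSE after 1 tree: {stage_mse[0]:.3f}")
print(f"test MSE after {len(stage_mse)} trees: {stage_mse[-1]:.3f}")
print(f"lowest test MSE at stage {best_stage}")
```

If the lowest test error occurs well before the last stage, a smaller `n_estimators` or a lower `learning_rate` may be worth trying.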


In [30]:
ada_model = AdaBoostRegressor(
    estimator=model,
    n_estimators=100,
    learning_rate=0.1,
    random_state=42
)
ada_model.fit(x_train, y_train)
y_pred_ada = ada_model.predict(x_test)
mse_ada = mean_squared_error(y_test, y_pred_ada)
r2_ada = r2_score(y_test, y_pred_ada)
mae_ada = mean_absolute_error(y_test, y_pred_ada)
print(f"AdaBoost Mean Squared Error: {mse_ada}")
print(f"AdaBoost R^2 Score: {r2_ada}")
print(f"AdaBoost Mean Absolute Error: {mae_ada}")

AdaBoost Mean Squared Error: 0.47352303542762847
AdaBoost R^2 Score: 0.6392315689184405
AdaBoost Mean Absolute Error: 0.5471081654720075
