## Ensemble Methods

To improve the accuracy and robustness of our California housing price predictions, we now turn to ensemble methods. Ensemble methods combine the predictions of several models and often outperform any single one of them.

#### Loading and preparing the data

In [1]:
from sklearn.datasets import fetch_california_housing
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor, AdaBoostRegressor, GradientBoostingRegressor

from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

In [None]:
california = fetch_california_housing()
print(california["DESCR"])

In [None]:
df_cali = pd.DataFrame(california["data"], columns = california["feature_names"])
df_cali["median_house_value"] = california["target"]

df_cali.head()

#### Normalization & Feature Selection

As in the Feature Engineering lesson, we normalize our data and keep only a subset of columns as our features. The scaler is fit on the training set only, so no information from the test set leaks into the preprocessing.

#### Train Test Split

In [4]:
features = df_cali.drop(columns = ["median_house_value","AveOccup", "Population", "AveBedrms"])
target = df_cali["median_house_value"]

In [5]:
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size = 0.20, random_state=0)

Create an instance of the normalizer and fit it on the training data

In [None]:
normalizer = MinMaxScaler()

normalizer.fit(X_train)

In [7]:
X_train_norm = normalizer.transform(X_train)

X_test_norm = normalizer.transform(X_test)

In [None]:
X_train_norm = pd.DataFrame(X_train_norm, columns = X_train.columns)
X_train_norm.head()

In [None]:
X_test_norm = pd.DataFrame(X_test_norm, columns = X_test.columns)
X_test_norm.head()
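To make the scaling step concrete, here is a minimal sketch of what `MinMaxScaler` computes. The one-feature arrays `X_train_toy` and `X_test_toy` are made-up values for illustration, not part of the housing data. Because the minimum and maximum come from the training split only, test values can fall outside the [0, 1] range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data (illustrative assumption): one feature, separate train and test splits.
X_train_toy = np.array([[2.0], [4.0], [10.0]])
X_test_toy = np.array([[1.0], [6.0], [12.0]])

# The scaler learns min=2 and max=10 from the training split only.
scaler = MinMaxScaler().fit(X_train_toy)
sklearn_test = scaler.transform(X_test_toy)

# Same computation by hand: (x - train_min) / (train_max - train_min)
train_min = X_train_toy.min(axis=0)
train_max = X_train_toy.max(axis=0)
manual_test = (X_test_toy - train_min) / (train_max - train_min)

# The test values 1, 6 and 12 map to -0.125, 0.5 and 1.25.
print(sklearn_test.ravel())
print(manual_test.ravel())
```

Values below 0 or above 1 in the transformed test set are expected whenever the test split contains points outside the range seen during training.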

## Bagging and Pasting

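Bagging (bootstrap aggregating) trains multiple instances of the same base model on random subsets of the training data drawn with replacement. Pasting does the same, but draws the subsets without replacement. The final prediction is obtained by averaging the individual predictions (regression) or by voting (classification).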
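The difference between the two sampling schemes can be sketched with `BaggingRegressor`, where the `bootstrap` flag switches between drawing with replacement (bagging) and without replacement (pasting). The synthetic data and the names `X_demo`, `y_demo`, `bagging_demo` and `pasting_demo` are assumptions made for this illustration only.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (illustrative assumption): a noisy sine curve.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(300, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.2, size=300)

# bootstrap=True  -> bagging: each tree gets 100 rows drawn WITH replacement
# bootstrap=False -> pasting: each tree gets 100 rows drawn WITHOUT replacement
bagging_demo = BaggingRegressor(DecisionTreeRegressor(), n_estimators=20,
                                max_samples=100, bootstrap=True,
                                random_state=0).fit(X_demo, y_demo)
pasting_demo = BaggingRegressor(DecisionTreeRegressor(), n_estimators=20,
                                max_samples=100, bootstrap=False,
                                random_state=0).fit(X_demo, y_demo)

# For regression, the ensemble prediction is the mean of the tree predictions.
X_new = np.array([[0.5], [2.0]])
manual_avg = np.mean(
    [tree.predict(X_new[:, feats])
     for tree, feats in zip(bagging_demo.estimators_,
                            bagging_demo.estimators_features_)],
    axis=0,
)
print("manual average:  ", manual_avg)
print("ensemble predict:", bagging_demo.predict(X_new))
```

The fitted ensembles expose the row indices each tree saw through `estimators_samples_`, which is a convenient way to check that pasting never repeats a row within one sample.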

As a baseline, our best model so far is a Decision Tree with an R-Squared of 0.70. Let's see how ensembles compare.

In [10]:
bagging_reg = BaggingRegressor(DecisionTreeRegressor(max_depth=20),
                               n_estimators=100,
                               max_samples = 1000)

Train the Bagging model on our normalized data

In [None]:
bagging_reg.fit(X_train_norm, y_train)

Evaluate the model's performance

In [None]:
pred = bagging_reg.predict(X_test_norm)

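# scikit-learn metrics expect (y_true, y_pred) in that order
print("MAE", mean_absolute_error(y_test, pred))
# RMSE via np.sqrt; the squared=False shortcut is deprecated in recent scikit-learn
print("RMSE", np.sqrt(mean_squared_error(y_test, pred)))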
print("R2 score", bagging_reg.score(X_test_norm, y_test))
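As a small worked example, the three metrics printed above can be reproduced by hand with NumPy. The arrays `y_true_toy` and `y_pred_toy` are made-up values for illustration only. One detail worth noting: the scikit-learn metric functions expect the true values first. MAE and RMSE are symmetric in their arguments, but R-Squared is not.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy values (illustrative assumption), not taken from the housing data.
y_true_toy = np.array([3.0, 5.0, 2.0, 7.0])
y_pred_toy = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_true_toy - y_pred_toy                  # [0.5, 0.0, -2.0, -1.0]

mae = np.mean(np.abs(errors))                     # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))              # root mean squared error
ss_res = np.sum(errors ** 2)                      # residual sum of squares
ss_tot = np.sum((y_true_toy - y_true_toy.mean()) ** 2)  # total sum of squares
r2 = 1.0 - ss_res / ss_tot                        # coefficient of determination

print("MAE ", mae, mean_absolute_error(y_true_toy, y_pred_toy))
print("RMSE", rmse, np.sqrt(mean_squared_error(y_true_toy, y_pred_toy)))
print("R2  ", r2, r2_score(y_true_toy, y_pred_toy))

# Swapping the arguments changes R2, because the denominator uses y_true.
r2_swapped = r2_score(y_pred_toy, y_true_toy)
```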

Combining multiple trees (100 in this case) does yield a stronger model: we are now at 0.72 R-Squared!

Let's explore more!

Because a Bagging ensemble wraps many arbitrary base estimators, scikit-learn's `BaggingRegressor` does not expose a `feature_importances_` attribute of its own.
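When the base estimators are decision trees, one possible workaround is to average the importances of the individual trees. The sketch below does this on synthetic data. The names `X_demo`, `y_demo`, `bag_demo` and `avg_importances` are assumptions for this illustration, and the averaging is a common convention rather than something `BaggingRegressor` provides itself.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (illustrative assumption): only feature 0 drives the target,
# features 1 and 2 are pure noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3))
y_demo = 3.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=500)

bag_demo = BaggingRegressor(DecisionTreeRegressor(max_depth=5),
                            n_estimators=25, random_state=0)
bag_demo.fit(X_demo, y_demo)

# Each fitted tree has its own feature_importances_. estimators_features_
# records which columns each tree was trained on, so the importances can be
# mapped back to the original column positions before averaging.
avg_importances = np.zeros(X_demo.shape[1])
for tree, feats in zip(bag_demo.estimators_, bag_demo.estimators_features_):
    avg_importances[feats] += tree.feature_importances_
avg_importances /= len(bag_demo.estimators_)

print("averaged importances:", avg_importances.round(3))
```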

## Random Patches

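In Bagging/Pasting, we randomize the training data that each predictor (estimator) learns from. In the Random Patches method, we go a step further by also **randomizing the features** that each predictor trains with. A Random Forest applies a closely related idea to decision trees: it can draw a random subset of candidate features at every split, which scikit-learn controls through `max_features`. Note that `RandomForestRegressor` defaults to `max_features=1.0` (all features), so feature randomization only takes effect when a smaller value is set.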
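Sampling both rows and columns per estimator can be sketched with `BaggingRegressor` by setting `max_samples` and `max_features` below 1.0. The synthetic data and the names `X_demo`, `y_demo` and `patches_demo` are assumptions for this illustration, and the 0.5 fractions are arbitrary choices.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (illustrative assumption) with four features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 4))
y_demo = X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=400)

patches_demo = BaggingRegressor(
    DecisionTreeRegressor(max_depth=5),
    n_estimators=30,
    max_samples=0.5,    # each tree sees a random half of the rows ...
    max_features=0.5,   # ... and a random half of the columns (2 of 4)
    random_state=0,
).fit(X_demo, y_demo)

# estimators_features_ lists the column indices each tree was trained on.
for i, feats in enumerate(patches_demo.estimators_features_[:3]):
    print(f"tree {i} was trained on feature columns {sorted(feats.tolist())}")
```

Keeping `max_samples=1.0` with `bootstrap=False` while lowering `max_features` samples only the columns, a variant that the scikit-learn documentation calls Random Subspaces.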

- Initialize a Random Forest

In [13]:
forest = RandomForestRegressor(n_estimators=100,
                             max_depth=20)

- Training the model

In [None]:
forest.fit(X_train_norm, y_train)

- Evaluate the model

In [None]:
pred = forest.predict(X_test_norm)

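# scikit-learn metrics expect (y_true, y_pred) in that order
print("MAE", mean_absolute_error(y_test, pred))
# RMSE via np.sqrt; the squared=False shortcut is deprecated in recent scikit-learn
print("RMSE", np.sqrt(mean_squared_error(y_test, pred)))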
print("R2 score", forest.score(X_test_norm, y_test))

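With the Random Forest we obtain an even better model! Keep in mind that with the default `max_features=1.0` every split still considers all features, so part of the gain likely comes from each tree training on a full-size bootstrap sample instead of only 1,000 rows.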

We are now at 0.82 R-Squared.

## AdaBoost

Instead of training our estimators independently (which allows them to be trained in parallel), we now train them sequentially: each estimator learns from its predecessor's errors and focuses on the data points where it failed.
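One way to watch this sequential behaviour is `AdaBoostRegressor.staged_predict`, which yields the ensemble's prediction after each added estimator. The sketch below uses synthetic data and shallow trees. The names `X_demo`, `y_demo`, `ada_demo` and `staged_mse`, as well as the hyperparameter values, are assumptions for this illustration.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (illustrative assumption): a noisy sine curve.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(400, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=400)

ada_demo = AdaBoostRegressor(DecisionTreeRegressor(max_depth=2),
                             n_estimators=30, random_state=0)
ada_demo.fit(X_demo, y_demo)

# Training MSE of the ensemble after 1, 2, 3, ... fitted estimators.
staged_mse = [mean_squared_error(y_demo, stage_pred)
              for stage_pred in ada_demo.staged_predict(X_demo)]

print("estimators actually fitted:", len(ada_demo.estimators_))
print("MSE after first estimator: ", round(staged_mse[0], 4))
print("MSE after last estimator:  ", round(staged_mse[-1], 4))
```

The same loop run on a held-out set would show where adding more estimators stops helping.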

- Initialize an AdaBoost model

In [16]:
ada_reg = AdaBoostRegressor(DecisionTreeRegressor(max_depth=20),
                            n_estimators=100)

- Training the model

In [None]:
ada_reg.fit(X_train_norm, y_train)

- Evaluate the model

In [None]:
pred = ada_reg.predict(X_test_norm)

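# scikit-learn metrics expect (y_true, y_pred) in that order
print("MAE", mean_absolute_error(y_test, pred))
# RMSE via np.sqrt; the squared=False shortcut is deprecated in recent scikit-learn
print("RMSE", np.sqrt(mean_squared_error(y_test, pred)))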
print("R2 score", ada_reg.score(X_test_norm, y_test))

Even better! By focusing each new estimator on the data points where the previous ones failed, we obtained a better model!

## Gradient Boosting

Now, each estimator is trained to predict the residual errors left by its predecessors, and its prediction is added to the ensemble's running total.
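The idea can be sketched by hand for the squared-error case, where the residuals are exactly what each new tree is fit to. This is a simplified illustration under stated assumptions (synthetic data, a fixed `learning_rate` of 0.5, depth-2 trees, hypothetical names such as `prediction` and `mse_per_stage`), not a reproduction of how `GradientBoostingRegressor` is implemented internally.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (illustrative assumption): a noisy sine curve.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(400, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=400)

learning_rate = 0.5                                # shrinks each tree's contribution
prediction = np.full_like(y_demo, y_demo.mean())   # stage 0: predict the mean
trees = []
mse_per_stage = [np.mean((y_demo - prediction) ** 2)]

for _ in range(20):
    residuals = y_demo - prediction                # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X_demo, residuals)                    # the new tree predicts the residuals
    prediction = prediction + learning_rate * tree.predict(X_demo)
    trees.append(tree)
    mse_per_stage.append(np.mean((y_demo - prediction) ** 2))

print("training MSE at stage 0: ", round(mse_per_stage[0], 4))
print("training MSE at stage 20:", round(mse_per_stage[-1], 4))
```

To predict on new data with this sketch, start from `y_demo.mean()` and add `learning_rate * tree.predict(X_new)` for every tree in `trees`.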

- Initialize a Gradient Boosting model

In [19]:
gb_reg = GradientBoostingRegressor(max_depth=20,
                                   n_estimators=100)

- Training the model

In [None]:
gb_reg.fit(X_train_norm, y_train)

- Evaluate the model

In [None]:
pred = gb_reg.predict(X_test_norm)

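# scikit-learn metrics expect (y_true, y_pred) in that order
print("MAE", mean_absolute_error(y_test, pred))
# RMSE via np.sqrt; the squared=False shortcut is deprecated in recent scikit-learn
print("RMSE", np.sqrt(mean_squared_error(y_test, pred)))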
print("R2 score", gb_reg.score(X_test_norm, y_test))

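Compared with AdaBoost, Gradient Boosting does not seem to do a great job here. One likely reason is that `max_depth=20` is very deep for gradient boosting, which typically works with shallow trees (scikit-learn's default is `max_depth=3`).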

**However, note that none of the hyperparameters of the models we've tried were fine-tuned.**
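As a rough sketch of what such tuning could look like, the example below runs a small cross-validated grid search over two Gradient Boosting hyperparameters. It uses synthetic data so that it runs quickly. The grid values and the names `X_demo`, `y_demo`, `param_grid` and `search` are assumptions for this illustration, not recommended settings for the housing data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data (illustrative assumption) with two features.
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(300, 2))
y_demo = (np.sin(X_demo[:, 0]) + 0.5 * X_demo[:, 1]
          + rng.normal(scale=0.1, size=300))

# A deliberately tiny grid: 2 x 2 combinations, each scored with 3-fold CV.
param_grid = {"max_depth": [2, 4], "learning_rate": [0.05, 0.2]}

search = GridSearchCV(
    GradientBoostingRegressor(n_estimators=50, random_state=0),
    param_grid,
    cv=3,
    scoring="r2",
)
search.fit(X_demo, y_demo)

print("best parameters:", search.best_params_)
print("best mean CV R2:", round(search.best_score_, 3))
```

On the real data, the search should be fit on the training split only, keeping the test set untouched for the final evaluation.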

