# Ensemble Techniques in Machine Learning

Ensemble techniques are machine learning methods that combine the predictions of multiple models into a single model that is usually more accurate and more stable than any of its individual members.

## Types of Ensemble Techniques

Two of the most widely used ensemble methods are:
1. Bagging
2. Boosting

## Bagging (Bootstrap Aggregating)

Bagging is an ensemble technique where multiple models are trained independently using different random samples of the dataset, and their predictions are combined to produce the final result.

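The name is short for Bootstrap Aggregating: "bootstrap" refers to the resampling step that creates the training sets, and "aggregating" refers to combining the models' predictions.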

## Why Do We Need Bagging?

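A single decision tree:
1. Has high variance
2. Easily overfits the training data
3. Changes significantly when the training data changes slightly

Bagging addresses these problems by averaging many such trees, which reduces variance.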
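To make the instability described above concrete, here is a minimal sketch. It assumes a synthetic noisy sine dataset (not the data used later in this notebook), and the variable names are illustrative. Two fully grown trees are trained on datasets that differ by only 5% of the rows, and their predictions are compared on a fixed grid.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(seed=0)  # fixed seed, only for reproducibility

# Synthetic data (an assumption for illustration): a noisy sine curve.
X = rng.uniform(0, 6, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)
X_grid = np.linspace(0, 6, 200).reshape(-1, 1)

# Train two fully grown trees on datasets that differ by only 5% of the rows.
keep = rng.permutation(500)[:475]  # drop 25 randomly chosen rows
tree_full = DecisionTreeRegressor(random_state=0).fit(X, y)
tree_drop = DecisionTreeRegressor(random_state=0).fit(X[keep], y[keep])

pred_full = tree_full.predict(X_grid)
pred_drop = tree_drop.predict(X_grid)

# Fraction of grid points where the two trees disagree, and by how much.
changed = np.mean(~np.isclose(pred_full, pred_drop))
mean_abs_diff = np.mean(np.abs(pred_full - pred_drop))
print(f"Predictions changed at {changed:.1%} of grid points")
print(f"Mean absolute difference: {mean_abs_diff:.3f}")
```

Any nonzero disagreement here comes purely from the small change in the training data, since both trees use the same settings.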

## Bootstrap Sampling

Bootstrap sampling works as follows:
- From the original dataset, multiple new datasets are created
- Each dataset is created by random sampling with replacement
- Each bootstrap dataset has the same size as the original dataset
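- Within a bootstrap dataset, some data points appear more than once, while others do not appear at all
- The data points left out of a given bootstrap dataset are that dataset's Out-of-Bag (OOB) samples; for large datasets this is typically a little over a third of the original points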
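The sampling procedure above can be sketched with NumPy alone. This is an illustrative sketch, not part of the notebook's pipeline: the dataset size `n_samples` and the seed are hypothetical values chosen for the example, and only row indices are sampled rather than actual rows.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed, only for reproducibility
n_samples = 10_000                   # hypothetical dataset size

# Draw n_samples row indices with replacement: one bootstrap dataset
# of the same size as the original.
bootstrap_idx = rng.integers(0, n_samples, size=n_samples)

# Rows that were never drawn are the out-of-bag (OOB) rows for this sample.
in_bag_mask = np.zeros(n_samples, dtype=bool)
in_bag_mask[bootstrap_idx] = True
oob_idx = np.flatnonzero(~in_bag_mask)

oob_fraction = oob_idx.size / n_samples
print(f"Unique rows in the bootstrap sample: {in_bag_mask.sum()}")
print(f"OOB fraction: {oob_fraction:.3f}")
```

For large datasets the printed fraction should land near 1/e ≈ 0.368, because each row is missed by a single draw with probability 1 - 1/n and there are n draws.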

### Training Process

1. Train one model per bootstrap dataset
2. Each model (tree) is trained independently
3. Make predictions using all models
4. Aggregate the predictions (majority voting for classification, averaging for regression)
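The four steps above can be sketched by hand before reaching for a library class. The sketch below assumes a synthetic noisy sine dataset and a hypothetical ensemble size `n_models`; it is meant to show the mechanics, not to be an optimized implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(seed=0)  # fixed seed, only for reproducibility

# Synthetic data (an assumption for illustration): a noisy sine curve.
X = rng.uniform(0, 6, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)
X_new = np.linspace(0, 6, 50).reshape(-1, 1)
y_true = np.sin(X_new[:, 0])  # noise-free target, known only because the data is synthetic

n_models = 25  # hypothetical ensemble size
trees = []
for _ in range(n_models):
    # Steps 1-2: draw a bootstrap sample and train one tree on it,
    # independently of the other trees.
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx]))

# Step 3: collect every model's predictions, shape (n_models, n_points).
all_preds = np.stack([tree.predict(X_new) for tree in trees])

# Step 4: aggregate by averaging (regression); classification would use a majority vote.
bagged_pred = all_preds.mean(axis=0)

mean_single_mse = np.mean((all_preds - y_true) ** 2)
bagged_mse = np.mean((bagged_pred - y_true) ** 2)
print(f"Average MSE of the individual trees: {mean_single_mse:.4f}")
print(f"MSE of the averaged prediction:      {bagged_mse:.4f}")
```

Because squared error is convex, the error of the averaged prediction cannot exceed the average error of the individual trees, although one particular tree may still happen to beat the ensemble.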

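The final aggregated prediction is typically more stable and more accurate than that of a single decision tree, because averaging many high-variance models reduces the variance of the result.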
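One way to see where the extra stability comes from, as a sketch under simplifying assumptions: suppose the predictions of the $B$ models at a point $x$ each have variance $\sigma^2$ and every pair of them has correlation $\rho$. The symbols $B$, $\sigma^2$, $\rho$, and $\hat{f}_b$ are notation introduced here for illustration. The averaged prediction and its variance are then

$$
\hat{f}_{\text{bag}}(x) = \frac{1}{B}\sum_{b=1}^{B}\hat{f}_b(x),
\qquad
\operatorname{Var}\big[\hat{f}_{\text{bag}}(x)\big] = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2 .
$$

The second term shrinks as $B$ grows, so adding models reduces variance, while the first term shows that the benefit is limited by how strongly the models' predictions are correlated.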

## When Should We Use Bagging?

Bagging is useful when:
- The dataset is large
- The base model is overfitting (low bias, high variance)
- Decision trees give unstable results
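As a hedged illustration of the overfitting case, the sketch below compares one fully grown tree with scikit-learn's `BaggingRegressor` on a synthetic noisy sine dataset. The data, the ensemble size, and the seeds are assumptions made for the example, and results on real data will differ.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(seed=0)  # fixed seed, only for reproducibility

# Synthetic data (an assumption for illustration): a noisy sine curve.
X = rng.uniform(0, 6, size=(1000, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=1000)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Baseline: one fully grown tree, which tends to fit the noise.
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# Bagging: 50 trees, each fit on a bootstrap sample of the training set.
# The base estimator is passed positionally because its keyword name
# differs between scikit-learn versions.
bag = BaggingRegressor(
    DecisionTreeRegressor(random_state=0),
    n_estimators=50,
    oob_score=True,  # also estimate R^2 from the out-of-bag rows
    random_state=0,
)
bag.fit(X_train, y_train)

tree_r2 = tree.score(X_test, y_test)
bag_r2 = bag.score(X_test, y_test)
print(f"Single tree test R^2: {tree_r2:.3f}")
print(f"Bagging test R^2:     {bag_r2:.3f}")
print(f"Bagging OOB R^2:      {bag.oob_score_:.3f}")
```

The out-of-bag score is computed only from rows each tree did not see during training, so it gives a validation-style estimate without setting aside extra data.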


In [1]:
from sklearn.datasets import fetch_california_housing
import pandas as pd
import numpy as np



In [2]:
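# fetch_california_housing() returns a Bunch with the features in .data and the target in .target
df = fetch_california_housing()
# Build a DataFrame of the eight features, then add the target (median house value) as a column
df1 = pd.DataFrame(df.data, columns=df.feature_names)
df1['MedHouseVal'] = df.target
print(df1.head())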

   MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  \
0  8.3252      41.0  6.984127   1.023810       322.0  2.555556     37.88   
1  8.3014      21.0  6.238137   0.971880      2401.0  2.109842     37.86   
2  7.2574      52.0  8.288136   1.073446       496.0  2.802260     37.85   
3  5.6431      52.0  5.817352   1.073059       558.0  2.547945     37.85   
4  3.8462      52.0  6.281853   1.081081       565.0  2.181467     37.85   

   Longitude  MedHouseVal  
0    -122.23        4.526  
1    -122.22        3.585  
2    -122.24        3.521  
3    -122.25        3.413  
4    -122.25        3.422  


In [3]:
# info() prints its summary itself and returns None, so it is not wrapped in print()
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   MedInc       20640 non-null  float64
 1   HouseAge     20640 non-null  float64
 2   AveRooms     20640 non-null  float64
 3   AveBedrms    20640 non-null  float64
 4   Population   20640 non-null  float64
 5   AveOccup     20640 non-null  float64
 6   Latitude     20640 non-null  float64
 7   Longitude    20640 non-null  float64
 8   MedHouseVal  20640 non-null  float64
dtypes: float64(9)
memory usage: 1.4 MB


In [4]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split



In [None]:
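# Compare a random forest (a bagging ensemble of decision trees) with a single decision tree
from sklearn.metrics import mean_squared_error

# Split features and target, holding out 20% of the rows for testing
X = df1.drop('MedHouseVal', axis=1)
y = df1['MedHouseVal']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Random forest: 100 trees, each trained on a bootstrap sample of the training set
rf = RandomForestRegressor(n_estimators=100, max_depth=10, random_state=42)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

# Single decision tree with the same depth limit, as a baseline
dt = DecisionTreeRegressor(max_depth=10, random_state=42)
dt.fit(X_train, y_train)
y_pred_dt = dt.predict(X_test)

# Test-set mean squared error (lower is better) and R^2 (higher is better)
mse_rf = mean_squared_error(y_test, y_pred)
mse_dt = mean_squared_error(y_test, y_pred_dt)
r2_rf = rf.score(X_test, y_test)
r2_dt = dt.score(X_test, y_test)
print(f"Random Forest MSE: {mse_rf}, R2: {r2_rf}")
print(f"Decision Tree MSE: {mse_dt}, R2: {r2_dt}")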


Random Forest MSE: 0.29649278336294826, R2: 0.7737402686595128
Decision Tree MSE: 0.4154681981618525, R2: 0.6829476865157171

On this train/test split, the random forest lowers the test MSE from about 0.415 to about 0.296 and raises R2 from about 0.683 to about 0.774 compared with a single decision tree of the same maximum depth.


In [None]:
from sklearn.ensemble import RandomForestClassifier
x=