# Exercise 8.2: Voting Ensembles on Energy Efficiency (Regression)

**Objective:** Compare individual regression models vs. a VotingRegressor ensemble on the Energy Efficiency dataset (heating load prediction).

## Experiment Setup
- **Dataset:** ENB2012 Energy Efficiency (8 features, 2 targets)  
- **Target:** Heating load (Y1)  
- **Test Size:** 30% holdout  
- **Metrics:** RMSE & MAE  
- **Individual Models:**
  1. Linear Regression  
  2. Ridge Regression  
  3. Lasso Regression  
  4. Support Vector Regressor (SVR)  
  5. Neural network with one hidden layer (MLPRegressor)  
- **Ensemble:** VotingRegressor (simple and weighted)
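
Both metrics can be checked by hand before trusting the numbers reported later. The sketch below uses made-up values (`y_true_demo` and `y_hat_demo` are illustrative, not taken from the dataset) and compares the manual formulas against the scikit-learn helpers.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Illustrative values only (not from ENB2012)
y_true_demo = np.array([10.0, 20.0, 30.0, 40.0])
y_hat_demo = np.array([12.0, 18.0, 33.0, 40.0])

errors = y_hat_demo - y_true_demo            # [2, -2, 3, 0]
rmse_manual = np.sqrt(np.mean(errors ** 2))  # sqrt(17 / 4), about 2.062
mae_manual = np.mean(np.abs(errors))         # 7 / 4 = 1.75

# Same quantities via scikit-learn
rmse_sklearn = np.sqrt(mean_squared_error(y_true_demo, y_hat_demo))
mae_sklearn = mean_absolute_error(y_true_demo, y_hat_demo)
print(f"RMSE: {rmse_manual:.3f}, MAE: {mae_manual:.3f}")
```

RMSE squares the errors before averaging, so it penalizes large misses more heavily than MAE does.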


## 1️⃣ Imports

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import VotingRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error


## 2️⃣ Data Loading & Preprocessing

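Load `ENB2012_data.csv`. The code below reads it from `../../datasets/ENB2012_data.csv`, relative to the notebook; adjust the path if your layout differs.
- Use the first 8 columns as features and column 9 (Y1) as the target.
- Split 70% train / 30% test, then standardize the features with a scaler fitted on the training split only.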

In [None]:
def load_data(test_size=0.3, random_state=42):
    df = pd.read_csv('../../datasets/ENB2012_data.csv')
    X = df.iloc[:, 0:8].values
    y = df.iloc[:, 8].values   # Y1: Heating Load
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    scaler = StandardScaler().fit(X_tr)
    return scaler.transform(X_tr), scaler.transform(X_te), y_tr, y_te

X_train, X_test, y_train, y_test = load_data()
print(f"Train: {X_train.shape}, Test: {X_test.shape}")
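
The scaler in `load_data` is fitted on the training split and then reused for the test split, so no test-set statistics leak into preprocessing. A minimal sketch of that effect on synthetic data (the `X_demo`/`y_demo` arrays are random stand-ins, assumed here only so the cell runs without the CSV):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 8 feature columns (not the real dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=3.0, size=(200, 8))
y_demo = rng.normal(size=200)

X_tr_demo, X_te_demo, y_tr_demo, y_te_demo = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42)

# Fit on the training split only, then apply to both splits
demo_scaler = StandardScaler().fit(X_tr_demo)
X_tr_scaled = demo_scaler.transform(X_tr_demo)
X_te_scaled = demo_scaler.transform(X_te_demo)

# Training features are exactly centered; test features only approximately
print("train means:", np.round(X_tr_scaled.mean(axis=0), 3))
print("test means: ", np.round(X_te_scaled.mean(axis=0), 3))
```

The test-split means are close to zero but not exactly zero, which is expected: the test data is transformed with the training mean and standard deviation.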

## 3️⃣ Individual Models
Train and evaluate five regressors and record RMSE & MAE.

In [None]:
models = {
    'Linear': LinearRegression(),
    'Ridge': Ridge(alpha=1.0),
    'Lasso': Lasso(alpha=0.1),
    'SVR': SVR(kernel='rbf', C=1.0, epsilon=0.1),
    'MLP': MLPRegressor(
        hidden_layer_sizes=(50,), solver='adam',
        learning_rate_init=0.01, max_iter=1000,
        random_state=42)
}

results = []
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    mae  = mean_absolute_error(y_test, y_pred)
    results.append((name, rmse, mae))

df_indiv = pd.DataFrame(results, columns=['Model','RMSE','MAE'])
print(df_indiv)

## 4️⃣ Voting Regressor (Simple Ensemble)
Combine the five models with equal weights.

In [None]:
estimators = [(name, m) for name, m in models.items()]
voting_simple = VotingRegressor(estimators=estimators)
voting_simple.fit(X_train, y_train)
y_vs = voting_simple.predict(X_test)
rmse_vs = np.sqrt(mean_squared_error(y_test, y_vs))
mae_vs  = mean_absolute_error(y_test, y_vs)
print(f"Voting (simple) | RMSE: {rmse_vs:.3f}, MAE: {mae_vs:.3f}")
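
With no `weights` given, `VotingRegressor` predicts the plain average of its fitted estimators' predictions. The sketch below checks this on synthetic data from `make_regression` with three of the linear models (a reduced, assumed setup so it runs quickly and without the CSV).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Synthetic data, used only to illustrate the averaging
X_syn, y_syn = make_regression(
    n_samples=100, n_features=8, noise=5.0, random_state=0)

demo_estimators = [
    ('Linear', LinearRegression()),
    ('Ridge', Ridge(alpha=1.0)),
    ('Lasso', Lasso(alpha=0.1)),
]
vr_demo = VotingRegressor(estimators=demo_estimators).fit(X_syn, y_syn)

# One column of predictions per fitted estimator
per_model = np.column_stack(
    [est.predict(X_syn) for est in vr_demo.estimators_])
manual_avg = per_model.mean(axis=1)
ensemble_pred = vr_demo.predict(X_syn)

print("matches manual average:", np.allclose(manual_avg, ensemble_pred))
```

Note that `VotingRegressor` fits clones of the estimators passed in, so the fitted models live in `estimators_` rather than in the original objects.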

## 5️⃣ Voting Regressor (Weighted Ensemble)
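Assign higher weights to the models with the lowest individual errors (e.g., MLP & Ridge). Weights chosen from test-set scores make the final test estimate optimistic, so treat the values below as illustrative.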

In [None]:
# Example weights: Linear=1, Ridge=2, Lasso=1, SVR=1, MLP=2
weights = [1, 2, 1, 1, 2]
voting_weighted = VotingRegressor(
    estimators=estimators,
    weights=weights
)
voting_weighted.fit(X_train, y_train)
y_vw = voting_weighted.predict(X_test)
rmse_vw = np.sqrt(mean_squared_error(y_test, y_vw))
mae_vw  = mean_absolute_error(y_test, y_vw)
print(f"Voting (weighted) | RMSE: {rmse_vw:.3f}, MAE: {mae_vw:.3f}")
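
The weights are relative, not absolute: the ensemble prediction is a weighted average, so `[1, 2, 1, 1, 2]` gives Ridge and MLP 2/7 of the vote each and the other models 1/7 each. A small arithmetic sketch for a single sample (the five predictions in `preds_one_sample` are made up for illustration):

```python
import numpy as np

# Hypothetical predictions for one sample, in the order
# Linear, Ridge, Lasso, SVR, MLP (values are illustrative)
preds_one_sample = np.array([10.0, 12.0, 11.0, 15.0, 13.0])
demo_weights = np.array([1, 2, 1, 1, 2])

simple_pred = preds_one_sample.mean()                             # 61 / 5 = 12.2
weighted_pred = np.average(preds_one_sample, weights=demo_weights)  # 86 / 7

# Equivalent form with the weights normalized to sum to 1
normalized = demo_weights / demo_weights.sum()
weighted_pred_alt = np.dot(normalized, preds_one_sample)

print(f"simple: {simple_pred:.4f}, weighted: {weighted_pred:.4f}")
```

Scaling all weights by the same factor (e.g., `[2, 4, 2, 2, 4]`) leaves the prediction unchanged.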

## 6️⃣ Challenges
1. **Optimize weights**: use a small grid search over `weights` in `VotingRegressor` to minimize validation RMSE.  
2. **Add another regressor** (e.g. `DecisionTreeRegressor`, `KNeighborsRegressor`) and compare.  
3. **StackingRegressor**: replace the VotingRegressor with a `StackingRegressor` using Ridge as the final estimator.  
4. **Learning curve**: plot training vs. test RMSE as you vary `max_iter` in the MLPRegressor.  
5. **Different target**: repeat all experiments predicting cooling load (`Y2`, column 10, i.e. `df.iloc[:, 9]`) instead of heating load.
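
One possible starting point for the first challenge is sketched below. It assumes synthetic data, a separate validation split, three linear models, and a candidate grid of `{1, 2, 3}` per model; all of these are illustrative choices, not part of the exercise. Because the weighted vote is just a weighted average of per-model predictions, the models are fitted once and the weights are searched with `np.average` rather than refitting a `VotingRegressor` for every combination.

```python
import itertools
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; swap in the training split from load_data()
X_gs, y_gs = make_regression(
    n_samples=300, n_features=8, noise=10.0, random_state=0)
X_fit, X_val, y_fit, y_val = train_test_split(
    X_gs, y_gs, test_size=0.25, random_state=42)

gs_models = {
    'Linear': LinearRegression(),
    'Ridge': Ridge(alpha=1.0),
    'Lasso': Lasso(alpha=0.1),
}

# Fit each model once and cache its validation predictions
val_preds = np.column_stack(
    [m.fit(X_fit, y_fit).predict(X_val) for m in gs_models.values()])

best_weights, best_rmse = None, np.inf
for w in itertools.product([1, 2, 3], repeat=len(gs_models)):
    blended = np.average(val_preds, axis=1, weights=w)
    rmse_w = np.sqrt(mean_squared_error(y_val, blended))
    if rmse_w < best_rmse:
        best_weights, best_rmse = w, rmse_w

print(f"Best weights: {best_weights}, validation RMSE: {best_rmse:.3f}")
```

The selected weights can then be passed to `VotingRegressor(estimators=..., weights=list(best_weights))` and scored once on the held-out test set.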