# Exercise 8.3: Stacking on Abalone Dataset (Regression)

**Objective:** Compare several base regressors and build a stacking ensemble to predict abalone age (rings).

## Experiment Setup
- **Dataset:** Abalone (4,177 samples; 7 numeric measurements + categorical `Sex`)  
- **Target:** Number of rings (proxy for age)  
- **Test Size:** 30% holdout  
- **Metrics:** RMSE & MAE  
- **Base Models:**  
  1. Decision Tree Regressor  
  2. K-Nearest Neighbors Regressor  
  3. Gradient Boosting Regressor  
  4. MLP Regressor  
- **Ensemble:** `StackingRegressor` with Ridge as meta-learner
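Both metrics are in the target's unit (rings). RMSE squares residuals before averaging, so it penalizes large misses more than MAE does. The sketch below computes them by hand on a few made-up ring counts (illustrative numbers, not Abalone data) and checks them against the scikit-learn functions used in this notebook:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Made-up ring counts and predictions, for illustration only
y_true = np.array([9.0, 7.0, 10.0, 15.0])
y_pred = np.array([10.0, 7.0, 8.0, 14.0])

# RMSE: square root of the mean squared residual
rmse_manual = np.sqrt(np.mean((y_true - y_pred) ** 2))
# MAE: mean absolute residual
mae_manual = np.mean(np.abs(y_true - y_pred))

rmse_sk = np.sqrt(mean_squared_error(y_true, y_pred))
mae_sk = mean_absolute_error(y_true, y_pred)

print(f"RMSE: {rmse_manual:.3f} (sklearn {rmse_sk:.3f})")  # 1.225
print(f"MAE:  {mae_manual:.3f} (sklearn {mae_sk:.3f})")    # 1.000
```

The residuals are -1, 0, 2, 1, so MAE = 4/4 = 1.0 while RMSE = sqrt(6/4) ≈ 1.225; the single 2-ring miss pulls RMSE above MAE.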

## 1️⃣ Imports

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor, VotingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, mean_absolute_error
```


## 2️⃣ Data Loading & Preprocessing

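- Download the Abalone data from the UCI Machine Learning Repository
- Assign column names and one-hot encode `Sex`
- Split 70% train / 30% test
- Standardize all features (fit the scaler on the training split only, then apply it to both splits)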

```python
# load data (the UCI file has no header row, so supply column names)
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data'
cols = [
    'Sex', 'Length', 'Diameter', 'Height',
    'WholeWeight', 'ShuckedWeight', 'VisceraWeight', 'ShellWeight', 'Rings'
]
df = pd.read_csv(url, header=None, names=cols)

# one-hot encode Sex (F/I/M); drop_first=True keeps Sex_I and Sex_M, with F as baseline.
# Recent pandas returns bool dummies by default; dtype=float keeps X purely numeric.
df = pd.get_dummies(df, columns=['Sex'], drop_first=True, dtype=float)

# features & target
X = df.drop(columns='Rings').to_numpy()
y = df['Rings'].to_numpy()

# train/test split (70% / 30%)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# scale features: fit on the training split only to avoid test-set leakage
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

print(f"Train shape: {X_train.shape}, Test shape: {X_test.shape}")
```
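To see what the one-hot step does without downloading the data, here is a sketch on a hypothetical four-row frame (values made up; only `Sex` matters). `get_dummies` sorts the categories (F, I, M), and `drop_first=True` drops `F`, so a female abalone is encoded as `Sex_I = 0, Sex_M = 0`. On the real data this yields 9 feature columns: 7 measurements plus `Sex_I` and `Sex_M`.

```python
import pandas as pd

# Hypothetical mini-sample with a subset of the UCI columns (made-up values)
toy = pd.DataFrame({
    'Sex': ['M', 'F', 'I', 'M'],
    'Length': [0.455, 0.53, 0.33, 0.44],
    'Rings': [15, 9, 7, 10],
})

# Non-encoded columns come first, then the dummy columns for the remaining categories
enc = pd.get_dummies(toy, columns=['Sex'], drop_first=True, dtype=float)
print(enc)
print(list(enc.columns))  # ['Length', 'Rings', 'Sex_I', 'Sex_M']
```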

## 3️⃣ Individual Regressors

Fit each of the four base models on the training split and report its test RMSE and MAE.

```python
models = {
    'DecisionTree': DecisionTreeRegressor(max_depth=5, random_state=42),
    'KNN': KNeighborsRegressor(n_neighbors=5),
    'GBR': GradientBoostingRegressor(n_estimators=100, learning_rate=0.1,
                                     max_depth=3, random_state=42),
    'MLP': MLPRegressor(hidden_layer_sizes=(50,), learning_rate_init=0.01,
                        max_iter=500, random_state=42)
}

# fit each model on the training split and score it on the held-out test split
results = []
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    mae = mean_absolute_error(y_test, y_pred)
    results.append((name, rmse, mae))

df_ind = pd.DataFrame(results, columns=['Model', 'RMSE', 'MAE'])
print(df_ind)
```
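The fit/predict/score pattern repeats in every section below. One way to factor it out is a small helper; `evaluate` is a hypothetical name, and the sketch runs on synthetic `make_regression` data so it is self-contained:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error


def evaluate(name, model, X_tr, y_tr, X_te, y_te):
    """Hypothetical helper: fit `model`, then return its test RMSE and MAE."""
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    return {
        'Model': name,
        'RMSE': float(np.sqrt(mean_squared_error(y_te, pred))),
        'MAE': float(mean_absolute_error(y_te, pred)),
    }


# Synthetic stand-in data (9 features, like the encoded Abalone matrix)
X, y = make_regression(n_samples=300, n_features=9, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

rows = [
    evaluate('Ridge', Ridge(), X_tr, y_tr, X_te, y_te),
    evaluate('DecisionTree', DecisionTreeRegressor(max_depth=5, random_state=42),
             X_tr, y_tr, X_te, y_te),
]
for r in rows:
    print(f"{r['Model']:>12} | RMSE: {r['RMSE']:.3f}, MAE: {r['MAE']:.3f}")
```

In the notebook, something like `pd.DataFrame([evaluate(n, m, X_train, y_train, X_test, y_test) for n, m in models.items()])` would rebuild `df_ind`, and the same helper could score `voter` and `stack`.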

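## 4️⃣ Voting Regressor (Baseline Ensemble)

Average the four base models' predictions with equal weights (the `VotingRegressor` default, `weights=None`). The voter clones and refits each model, so it does not reuse the fitted instances from the previous cell.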

```python
# (name, estimator) pairs; VotingRegressor clones and refits each one
estimators = list(models.items())
voter = VotingRegressor(estimators=estimators)
voter.fit(X_train, y_train)
y_v = voter.predict(X_test)
rmse_v = np.sqrt(mean_squared_error(y_test, y_v))
mae_v = mean_absolute_error(y_test, y_v)
print(f"VotingRegressor | RMSE: {rmse_v:.3f}, MAE: {mae_v:.3f}")
```
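To confirm that "equal weights" means a plain average, the sketch below (synthetic data and two stand-in base models, not the notebook's four) rebuilds the voter's prediction from the fitted clones stored in `estimators_`:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

voter = VotingRegressor([
    ('ridge', Ridge(alpha=1.0)),
    ('tree', DecisionTreeRegressor(max_depth=3, random_state=0)),
]).fit(X, y)

# Fitted clones live in voter.estimators_, in the order they were listed
per_model = np.column_stack([est.predict(X) for est in voter.estimators_])
manual_avg = per_model.mean(axis=1)

print(np.allclose(voter.predict(X), manual_avg))  # True with the default weights=None
```

Passing `weights=[...]` to `VotingRegressor` turns this into a weighted average instead; stacking, in the next section, learns such a combination from data.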

## 5️⃣ Stacking Regressor

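Build a `StackingRegressor` with Ridge as the meta-learner. With `cv=5`, Ridge is trained on 5-fold out-of-fold predictions of the base models, so it never sees a prediction made on data that base model was trained on. The base models themselves are then refit on the full training split to produce the test-time inputs for Ridge.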

```python
stack = StackingRegressor(
    estimators=estimators,
    final_estimator=Ridge(alpha=1.0),
    passthrough=False,  # meta-learner sees only base-model predictions, not raw features
    cv=5                # out-of-fold predictions for training the meta-learner
)
stack.fit(X_train, y_train)
y_s = stack.predict(X_test)
rmse_s = np.sqrt(mean_squared_error(y_test, y_s))
mae_s = mean_absolute_error(y_test, y_s)
print(f"StackingRegressor | RMSE: {rmse_s:.3f}, MAE: {mae_s:.3f}")
```
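Because `passthrough=False`, the fitted Ridge has exactly one coefficient per base model, which hints at how much each model contributes. A minimal sketch on synthetic data with two stand-in base models; in the notebook the same lines applied to `stack` would print weights for DecisionTree, KNN, GBR and MLP. Base-model predictions tend to be highly correlated, so read the weights as indicative rather than as a ranking.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)

stack = StackingRegressor(
    estimators=[
        ('tree', DecisionTreeRegressor(max_depth=5, random_state=42)),
        ('knn', KNeighborsRegressor(n_neighbors=5)),
    ],
    final_estimator=Ridge(alpha=1.0),
    cv=5,
).fit(X, y)

# One Ridge weight per base model, in the order the estimators were listed
for (name, _), w in zip(stack.estimators, stack.final_estimator_.coef_):
    print(f"{name:>5}: {w:+.3f}")
print(f"intercept: {stack.final_estimator_.intercept_:+.3f}")

# transform() returns the meta-features: one prediction column per base model
meta_features = stack.transform(X)
```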

## 6️⃣ Challenges

1. **Different meta-learner:** swap `Ridge` for `Lasso` or `SVR`.  
2. **Manual stacking:** generate out-of-fold predictions from the base models and train the meta-learner on them yourself.
3. **Feature selection:** keep only the top-5 features (ranked by the fitted GBR's `feature_importances_`) and re-run stacking.
4. **Hyperparameter tuning:** use `GridSearchCV` on the `StackingRegressor` to tune the Ridge meta-learner's `alpha` (parameter name `final_estimator__alpha`).
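For challenge 2, one possible starting point is the sketch below. It follows the recipe `StackingRegressor` documents: `cross_val_predict` produces out-of-fold meta-features for the training set, and the base models are refit on the full training set for the test set. It uses synthetic data and two stand-in base models; swap in the notebook's `models`, `X_train`, and `X_test` to apply it to Abalone.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=400, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

base_models = {
    'tree': DecisionTreeRegressor(max_depth=5, random_state=42),
    'knn': KNeighborsRegressor(n_neighbors=5),
}
kf = KFold(n_splits=5)  # unshuffled 5-fold, as cv=5 uses for regressors

# 1) Out-of-fold predictions: each training row is predicted by a model that never saw it
oof = np.column_stack([
    cross_val_predict(clone(m), X_train, y_train, cv=kf) for m in base_models.values()
])

# 2) Train the meta-learner on the out-of-fold predictions only
meta = Ridge(alpha=1.0).fit(oof, y_train)

# 3) Refit each base model on the full training set to build test-time meta-features
test_meta = np.column_stack([
    clone(m).fit(X_train, y_train).predict(X_test) for m in base_models.values()
])
y_manual = meta.predict(test_meta)
print(f"Manual stacking RMSE: {np.sqrt(mean_squared_error(y_test, y_manual)):.3f}")

# Cross-check against the built-in implementation with the same folds
builtin = StackingRegressor(
    estimators=list(base_models.items()), final_estimator=Ridge(alpha=1.0), cv=kf
).fit(X_train, y_train)
y_builtin = builtin.predict(X_test)
print("Matches StackingRegressor:", np.allclose(y_manual, y_builtin))
```

Training the meta-learner on in-sample base predictions instead of `oof` would let it over-trust models that memorize the training set (an unpruned decision tree, for example), which is the leak the out-of-fold step prevents.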