# py_stacks: Model Ensembling via Stacking

This notebook demonstrates **py_stacks**, which implements model stacking (ensembling) using meta-learning with elastic net regularization.

## What is Model Stacking?

Model stacking (or stacked generalization) combines predictions from multiple base models using a meta-learner:

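1. Train multiple diverse base models
2. Collect their predictions on data the models were not trained on (meta-features)
3. Train a meta-learner that learns how to weight and combine those predictions
4. Predict with the weighted combination of the base models

A well-built ensemble often outperforms each of its individual models.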
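The four steps can be sketched in plain NumPy. This is a minimal illustration, not py_stacks' implementation. The two base models are simple least-squares fits, and the meta-learner is ordinary least squares with an intercept and no penalty. Everything is fitted and scored on the same rows, so it only shows the mechanics. A real stack fits step 3 on out-of-fold predictions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
y = 2.0 * x + 3.0 * np.sin(x) + rng.normal(0, 0.5, n)

# Steps 1-2: two different base models and their predictions (the meta-features)
pred_linear = np.polyval(np.polyfit(x, y, 1), x)                 # straight line in x
pred_sine = np.polyval(np.polyfit(np.sin(x), y, 1), np.sin(x))   # straight line in sin(x)
P = np.column_stack([pred_linear, pred_sine])                    # shape (n, 2)

# Step 3: meta-learner = least squares of y on [1, P]
A = np.column_stack([np.ones(n), P])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, weights = coef[0], coef[1:]

# Step 4: the ensemble prediction is the weighted combination
pred_stack = intercept + P @ weights

def rmse(pred):
    return float(np.sqrt(np.mean((y - pred) ** 2)))

print(rmse(pred_linear), rmse(pred_sine), rmse(pred_stack))
```

Each base model is one possible combination (weight 1, others 0). The least-squares blend therefore can never do worse than either model on the rows it was fit on. That is also why in-sample stacking results look optimistic.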

## Key Functions

- **stacks()** - Create empty ensemble
- **add_candidates()** - Add base model predictions
- **blend_predictions()** - Fit meta-learner with elastic net
- **get_model_weights()** - Extract model contributions
- **compare_to_candidates()** - Compare ensemble vs base models

In [None]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

# Import py-tidymodels packages
from py_parsnip import linear_reg, rand_forest, decision_tree
from py_recipes import recipe, step_date, step_lag, step_normalize
from py_workflows import workflow
from py_rsample import initial_time_split
from py_stacks import stacks
from py_visualize import plot_model_comparison

print("✓ All packages imported successfully")

## Setup: Create Sample Data

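We'll create a synthetic daily time series (400 days) with a linear trend, weekly and 30-day seasonal cycles, and Gaussian noise. We'll then split it chronologically into training and test sets.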

In [None]:
# Create time series data
np.random.seed(42)
dates = pd.date_range('2020-01-01', periods=400, freq='D')
time_index = np.arange(len(dates))

# Complex pattern: trend + multiple seasonalities + noise
trend = time_index * 0.3
weekly_season = 8 * np.sin(2 * np.pi * time_index / 7)
monthly_season = 15 * np.sin(2 * np.pi * time_index / 30)
noise = np.random.randn(len(dates)) * 5

y = trend + weekly_season + monthly_season + noise + 100

data = pd.DataFrame({
    'date': dates,
    'value': y
})

# Split data
split = initial_time_split(data, prop=0.75)
train_data = split.training()
test_data = split.testing()

print(f"Total observations: {len(data)}")
print(f"Training: {len(train_data)} observations")
print(f"Testing: {len(test_data)} observations")
print(f"\nTarget statistics:")
print(f"  Mean: {data['value'].mean():.2f}")
print(f"  Std: {data['value'].std():.2f}")
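A time-based split must not shuffle rows. The sketch below assumes `initial_time_split` behaves like its R rsample namesake: the first `prop` share of rows goes to training, the rest to testing, with order preserved. Under that assumption it is equivalent to this pandas version. `time_split` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def time_split(df, prop=0.75):
    """Hypothetical helper: chronological train/test split without shuffling."""
    n_train = int(np.floor(len(df) * prop))
    return df.iloc[:n_train], df.iloc[n_train:]

dates = pd.date_range("2020-01-01", periods=400, freq="D")
df = pd.DataFrame({"date": dates, "value": np.arange(400, dtype=float)})
train, test = time_split(df, prop=0.75)

print(len(train), len(test))
print(train["date"].max() < test["date"].min())   # no test date precedes a training date
```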

## Step 1: Train Multiple Base Models

We'll create diverse base models with different strengths:
- **Linear Regression**: Good for linear trends
- **Ridge Regression**: Regularized linear model
- **Lasso Regression**: Sparse linear model
- **Random Forest**: Captures non-linear patterns
- **Decision Tree**: Simple non-linear model
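The ridge and lasso models below are the same `linear_reg` with different `mixture` values. This sketch assumes py_parsnip follows the glmnet convention used by R's parsnip. Under that convention, `penalty` (λ) scales the total penalty and `mixture` (α) mixes the L1 and L2 parts: λ·(α·‖β‖₁ + (1 − α)/2·‖β‖²₂). The hypothetical function below simply evaluates that expression.

```python
import numpy as np

def elastic_net_penalty(beta, penalty, mixture):
    """glmnet-style penalty: penalty * (mixture * L1 + (1 - mixture) / 2 * squared L2)."""
    beta = np.asarray(beta, dtype=float)
    l1 = np.sum(np.abs(beta))
    l2_sq = np.sum(beta ** 2)
    return penalty * (mixture * l1 + (1.0 - mixture) / 2.0 * l2_sq)

beta = [2.0, -1.0, 0.0]   # L1 norm = 3, squared L2 norm = 5
print(elastic_net_penalty(beta, penalty=0.1, mixture=0.0))   # ridge: 0.1 * 5 / 2
print(elastic_net_penalty(beta, penalty=0.1, mixture=1.0))   # lasso: 0.1 * 3
print(elastic_net_penalty(beta, penalty=0.1, mixture=0.5))   # an even mix of both
```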

In [None]:
# Create feature engineering recipe
rec = (
    recipe("value ~ date", data=train_data)  # formula passed as a string; bare `value ~ date` is not valid Python
    .step_date('date', features=['month', 'week', 'doy', 'dow'])
    .step_lag('value', lags=[1, 7, 14, 30])
    .step_normalize(['value_lag_1', 'value_lag_7', 'value_lag_14', 'value_lag_30'])
)

print("✓ Recipe created with date features and lags")
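For readers new to recipes, the three steps roughly correspond to the pandas operations below. This is an approximation under assumptions. The calendar column names (`date_month`, etc.) are hypothetical, and py_recipes may name or define features differently, for example the definition of `week`. The real steps are also estimated on training data and then applied to new data. Lagging the outcome leaves missing values in the first rows: up to 30 rows for the 30-day lag.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2020-01-01", periods=60, freq="D")
df = pd.DataFrame({"date": dates, "value": np.arange(60, dtype=float)})

# step_date-like calendar features (hypothetical column names)
df["date_month"] = df["date"].dt.month
df["date_week"] = df["date"].dt.isocalendar().week.astype(int)   # ISO week number (assumption)
df["date_doy"] = df["date"].dt.dayofyear
df["date_dow"] = df["date"].dt.dayofweek                         # Monday = 0

# step_lag-like features: the value k days earlier
lag_cols = []
for k in [1, 7, 14, 30]:
    col = f"value_lag_{k}"
    df[col] = df["value"].shift(k)
    lag_cols.append(col)

# step_normalize-like centering and scaling (mean/std skip NaNs; NaNs stay NaN)
means = df[lag_cols].mean()
stds = df[lag_cols].std()
df[lag_cols] = (df[lag_cols] - means) / stds

print(df[lag_cols].isna().sum().to_dict())
```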

In [None]:
# Model 1: Linear Regression
wf_linear = (
    workflow()
    .add_recipe(rec)
    .add_model(linear_reg())
)

fit_linear = wf_linear.fit(train_data)
pred_linear = fit_linear.predict(test_data)

print("✓ Model 1: Linear Regression fitted")

In [None]:
# Model 2: Ridge Regression (L2 regularization)
wf_ridge = (
    workflow()
    .add_recipe(rec)
    .add_model(linear_reg(penalty=0.1, mixture=0.0))  # mixture=0 → Ridge
)

fit_ridge = wf_ridge.fit(train_data)
pred_ridge = fit_ridge.predict(test_data)

print("✓ Model 2: Ridge Regression fitted")

In [None]:
# Model 3: Lasso Regression (L1 regularization)
wf_lasso = (
    workflow()
    .add_recipe(rec)
    .add_model(linear_reg(penalty=0.1, mixture=1.0))  # mixture=1 → Lasso
)

fit_lasso = wf_lasso.fit(train_data)
pred_lasso = fit_lasso.predict(test_data)

print("✓ Model 3: Lasso Regression fitted")

In [None]:
# Model 4: Random Forest
wf_rf = (
    workflow()
    .add_recipe(rec)
    .add_model(rand_forest(trees=100, min_n=5, mode='regression'))
)

fit_rf = wf_rf.fit(train_data)
pred_rf = fit_rf.predict(test_data)

print("✓ Model 4: Random Forest fitted")

In [None]:
# Model 5: Decision Tree
wf_tree = (
    workflow()
    .add_recipe(rec)
    .add_model(decision_tree(min_n=10, mode='regression'))
)

fit_tree = wf_tree.fit(train_data)
pred_tree = fit_tree.predict(test_data)

print("✓ Model 5: Decision Tree fitted")

## Step 2: Compare Individual Model Performance

Before ensembling, let's see how each model performs individually.

In [None]:
# Extract stats from each model
_, _, stats_linear = fit_linear.extract_outputs()
_, _, stats_ridge = fit_ridge.extract_outputs()
_, _, stats_lasso = fit_lasso.extract_outputs()
_, _, stats_rf = fit_rf.extract_outputs()
_, _, stats_tree = fit_tree.extract_outputs()

# Visualize comparison
fig = plot_model_comparison(
    stats_list=[stats_linear, stats_ridge, stats_lasso, stats_rf, stats_tree],
    model_names=["Linear", "Ridge", "Lasso", "Random Forest", "Decision Tree"],
    metrics=["rmse", "mae", "r_squared"],
    split="test",
    plot_type="bar",
    title="Individual Model Performance",
    height=500
)

fig.show()

# Print test RMSE for each model.
# Filter on both metric and split instead of relying on row order
# (assumes the stats tables carry a 'split' column, as split="test" above suggests).
def test_rmse(stats):
    mask = (stats['metric'] == 'rmse') & (stats['split'] == 'test')
    return stats.loc[mask, 'value'].iloc[0]

print("\nTest RMSE by model:")
test_rmses = {
    "Linear": test_rmse(stats_linear),
    "Ridge": test_rmse(stats_ridge),
    "Lasso": test_rmse(stats_lasso),
    "Random Forest": test_rmse(stats_rf),
    "Decision Tree": test_rmse(stats_tree),
}

for model, rmse in sorted(test_rmses.items(), key=lambda x: x[1]):
    print(f"  {model:20s}: {rmse:.4f}")

print(f"\n📊 Best individual model: {min(test_rmses, key=test_rmses.get)}")
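The three metrics compared above have standard definitions, sketched below in NumPy. py_parsnip's `r_squared` may be computed differently. R's yardstick, for instance, distinguishes `rsq` (squared correlation) from `rsq_trad`. So treat `r_squared` here as the traditional 1 − SS_res/SS_tot version. `regression_metrics` is a hypothetical helper.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """RMSE, MAE and traditional R^2 (1 - SS_res / SS_tot)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = np.sqrt(np.mean(resid ** 2))
    mae = np.mean(np.abs(resid))
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {"rmse": float(rmse), "mae": float(mae), "r_squared": float(1 - ss_res / ss_tot)}

# residuals are [1, 0, -1, 0]: RMSE = sqrt(0.5), MAE = 0.5, R^2 = 1 - 2/20
print(regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0]))
```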

## Step 3: Create Ensemble with Stacking

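Now we'll use **py_stacks** to combine these models via meta-learning. One caveat: in this demo the meta-learner is fit on the base models' test-set predictions, and the ensemble is then evaluated on those same rows, so its test metrics are optimistic. In practice, the meta-learner is fit on out-of-fold predictions from resampling the training data. The test set is then kept for the final evaluation.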

In [None]:
# Extract predictions from each model
outputs_linear, _, _ = fit_linear.extract_outputs()
outputs_ridge, _, _ = fit_ridge.extract_outputs()
outputs_lasso, _, _ = fit_lasso.extract_outputs()
outputs_rf, _, _ = fit_rf.extract_outputs()
outputs_tree, _, _ = fit_tree.extract_outputs()

# Keep only the test-set predictions; these are the meta-learner's training data
test_outputs_linear = outputs_linear[outputs_linear['split'] == 'test'].copy()
test_outputs_ridge = outputs_ridge[outputs_ridge['split'] == 'test'].copy()
test_outputs_lasso = outputs_lasso[outputs_lasso['split'] == 'test'].copy()
test_outputs_rf = outputs_rf[outputs_rf['split'] == 'test'].copy()
test_outputs_tree = outputs_tree[outputs_tree['split'] == 'test'].copy()

# The .pred columns are left unchanged: each candidate is identified by the
# name= argument passed to add_candidates() in the next cell.

print("✓ Extracted predictions from all models")
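A leak-free alternative builds the meta-features from out-of-fold predictions. For a time series, you fit each base model on an expanding window and predict the next block. The meta-learner is then trained only on those predictions. The sketch below uses NumPy with two simple base models (a linear trend and day-of-week means) and an unpenalized least-squares meta-learner. `expanding_window_oof`, the base-model functions, and the window sizes are hypothetical choices, not py_stacks or py_rsample functions.

```python
import numpy as np

def fit_predict_trend(t_train, y_train, t_new):
    """Base model A: straight-line trend fitted by least squares."""
    slope, intercept = np.polyfit(t_train, y_train, 1)
    return intercept + slope * t_new

def fit_predict_weekly(t_train, y_train, t_new):
    """Base model B: mean of the training target for each day of the week."""
    dow_train, dow_new = t_train % 7, t_new % 7
    means = np.array([y_train[dow_train == d].mean() for d in range(7)])
    return means[dow_new]

def expanding_window_oof(t, y, models, initial=100, step=50):
    """Out-of-fold predictions: train on rows [0, end), predict rows [end, end + step)."""
    oof = np.full((len(y), len(models)), np.nan)
    for end in range(initial, len(y), step):
        stop = min(end + step, len(y))
        for j, model in enumerate(models):
            oof[end:stop, j] = model(t[:end], y[:end], t[end:stop])
    return oof

rng = np.random.default_rng(1)
t = np.arange(300)
y = 0.3 * t + 8 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 2, len(t))

oof = expanding_window_oof(t, y, [fit_predict_trend, fit_predict_weekly])
rows = ~np.isnan(oof).any(axis=1)          # the first `initial` rows have no OOF prediction
A = np.column_stack([np.ones(rows.sum()), oof[rows]])
coef, *_ = np.linalg.lstsq(A, y[rows], rcond=None)   # [intercept, w_trend, w_weekly]
print(rows.sum(), coef.round(2))
```

R's stacks package (which py_stacks appears to mirror) likewise blends resampled predictions and then refits the retained members on the full training set with `fit_members()`.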

In [None]:
# Create stacks ensemble
ensemble = (
    stacks()
    .add_candidates(test_outputs_linear, name="linear")
    .add_candidates(test_outputs_ridge, name="ridge")
    .add_candidates(test_outputs_lasso, name="lasso")
    .add_candidates(test_outputs_rf, name="random_forest")
    .add_candidates(test_outputs_tree, name="decision_tree")
    .blend_predictions(
        penalty=0.01,        # Small penalty for regularization
        mixture=1.0,         # Lasso (L1) for sparsity
        non_negative=True    # Weights must be >= 0 (interpretability)
    )
)

print("✓ Ensemble created with 5 base models")
print("✓ Meta-learner: lasso (elastic net with mixture=1.0) with a non-negative constraint")
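`blend_predictions(penalty=0.01, mixture=1.0, non_negative=True)` fits a lasso with non-negative coefficients on the candidates' predictions. py_stacks' internals are not shown in this notebook. The sketch below implements that objective directly with cyclic coordinate descent in NumPy. It assumes glmnet-style scaling, (1/(2n))·‖y − b₀ − Pw‖² + λ‖w‖₁ subject to w ≥ 0, with an unpenalized intercept b₀. `nonneg_lasso` is a hypothetical helper.

```python
import numpy as np

def nonneg_lasso(P, y, penalty, n_iter=500, tol=1e-10):
    """Minimise (1/(2n))||y - b0 - P w||^2 + penalty * ||w||_1 subject to w >= 0."""
    P = np.asarray(P, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = P.shape
    p_mean, y_mean = P.mean(axis=0), y.mean()
    Pc, yc = P - p_mean, y - y_mean           # centring handles the unpenalised intercept
    col_sq = (Pc ** 2).sum(axis=0) / n
    w = np.zeros(k)
    resid = yc.copy()                          # residual yc - Pc @ w for w = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(k):
            if col_sq[j] == 0:
                continue                       # a constant column carries no information
            rho = Pc[:, j] @ (resid + Pc[:, j] * w[j]) / n
            new_wj = max(rho - penalty, 0.0) / col_sq[j]   # soft-threshold, clipped at zero
            resid -= Pc[:, j] * (new_wj - w[j])
            max_change = max(max_change, abs(new_wj - w[j]))
            w[j] = new_wj
        if max_change < tol:
            break
    intercept = y_mean - p_mean @ w
    return intercept, w

rng = np.random.default_rng(0)
y = rng.normal(100, 10, 300)
good = y + rng.normal(0, 2, 300)      # accurate candidate
noisy = y + rng.normal(0, 8, 300)     # weaker candidate
anti = -y + rng.normal(0, 2, 300)     # negatively related candidate
P = np.column_stack([good, noisy, anti])

b0, w = nonneg_lasso(P, y, penalty=0.01)
print(round(b0, 2), w.round(3))
```

Without the constraint, the third candidate would get a negative weight. With `non_negative=True` it is set to exactly zero, which is what keeps the weights interpretable as contributions.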

## Step 4: Analyze Model Weights

The meta-learner assigns a weight to each base model. A larger weight means the ensemble's prediction relies more heavily on that model, and a weight of zero drops the model entirely.

In [None]:
# Get model weights
weights = ensemble.get_model_weights()

print("Model Weights and Contributions:")
print("=" * 70)
print(weights.to_string(index=False))

print("\n📊 Interpretation:")
print("  • weight: Meta-learner coefficient for each model")
print("  • contribution_pct: Percentage contribution to ensemble")
print("  • Models with weight=0 are not used by the ensemble")
print("  • Non-negative constraint ensures weights >= 0")
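The notebook does not define how `contribution_pct` is computed. One natural reading, stated here only as an assumption, is each model's share of the total non-negative weight, with the intercept excluded. The pandas sketch below computes that share so it can be compared with the `weights` table. `contribution_share` and the toy weights are hypothetical.

```python
import pandas as pd

def contribution_share(weights_df):
    """Hypothetical: each model's weight as a percentage of the summed model weights."""
    models = weights_df[weights_df["model"] != "(Intercept)"].copy()
    total = models["weight"].sum()
    models["share_pct"] = 100 * models["weight"] / total if total > 0 else 0.0
    return models

example = pd.DataFrame({   # toy values for illustration, not results from this notebook
    "model": ["(Intercept)", "linear", "random_forest", "decision_tree"],
    "weight": [1.5, 0.6, 0.4, 0.0],
})
print(contribution_share(example))
```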

In [None]:
# Visualize weights (excluding intercept)
import plotly.graph_objects as go

model_weights = weights[weights['model'] != '(Intercept)'].copy()

fig = go.Figure()

fig.add_trace(go.Bar(
    x=model_weights['model'],
    y=model_weights['weight'],
    text=model_weights['contribution_pct'].apply(lambda x: f"{x:.1f}%"),
    textposition='auto',
    marker_color='steelblue'
))

fig.update_layout(
    title="Model Weights in Ensemble",
    xaxis_title="Model",
    yaxis_title="Weight",
    height=400
)

fig.show()

print("\n📊 The ensemble learns to emphasize models that complement each other")

## Step 5: Compare Ensemble to Base Models

Does the ensemble outperform individual models?

In [None]:
# Get comparison, sorted by test RMSE (best first) so that row position equals rank
comparison = (
    ensemble.compare_to_candidates()
    .sort_values('rmse')
    .reset_index(drop=True)
)

print("Ensemble vs Base Models (Test Set):")
print("=" * 80)
print(comparison.to_string(index=False))

print("\n📊 Results:")
best_model = comparison.iloc[0]['model']
best_rmse = comparison.iloc[0]['rmse']

if best_model == 'Ensemble':
    runner_up_rmse = comparison.iloc[1]['rmse']
    improvement = runner_up_rmse - best_rmse
    pct_improvement = (improvement / runner_up_rmse) * 100
    print("  ✓ Ensemble is the best model!")
    print(f"  ✓ Improved RMSE by {improvement:.4f} ({pct_improvement:.2f}%) over the best base model")
else:
    print(f"  • Best model: {best_model}")
    ensemble_rank = comparison[comparison['model'] == 'Ensemble'].index[0] + 1
    print(f"  • Ensemble rank: #{ensemble_rank}")
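A ranking table like the one above can be built from raw predictions with a few lines of pandas. This is a sketch of the idea, not py_stacks' code. `compare_rmse` is hypothetical and takes the observed target plus a dict of prediction arrays, including the ensemble's.

```python
import numpy as np
import pandas as pd

def compare_rmse(y_true, preds):
    """Rank models by RMSE (best first) and report how much worse each is than the best, in %."""
    y_true = np.asarray(y_true, dtype=float)
    rows = [
        {"model": name, "rmse": float(np.sqrt(np.mean((y_true - np.asarray(p, dtype=float)) ** 2)))}
        for name, p in preds.items()
    ]
    table = pd.DataFrame(rows).sort_values("rmse").reset_index(drop=True)
    table["rank"] = np.arange(1, len(table) + 1)
    table["pct_worse_than_best"] = 100 * (table["rmse"] / table["rmse"].iloc[0] - 1)
    return table

y = np.array([10.0, 12.0, 14.0, 16.0])
table = compare_rmse(y, {
    "linear": y + 2.0,      # constant error of 2   -> RMSE 2.0
    "tree": y - 1.0,        # constant error of 1   -> RMSE 1.0
    "Ensemble": y + 0.5,    # constant error of 0.5 -> RMSE 0.5
})
print(table)
```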

In [None]:
# Visualize comparison
fig = go.Figure()

# Highlight ensemble
colors = ['red' if model == 'Ensemble' else 'steelblue' 
          for model in comparison['model']]

fig.add_trace(go.Bar(
    x=comparison['model'],
    y=comparison['rmse'],
    marker_color=colors,
    text=comparison['rmse'].apply(lambda x: f"{x:.4f}"),
    textposition='auto'
))

fig.update_layout(
    title="Test RMSE: Ensemble vs Base Models (Lower is Better)",
    xaxis_title="Model",
    yaxis_title="RMSE",
    height=500,
    showlegend=False
)

fig.show()

print("\n📊 Red bar = Ensemble")

## Step 6: Get Ensemble Metrics

View detailed performance metrics for the ensemble.

In [None]:
# Get ensemble metrics
metrics = ensemble.get_metrics()

print("Ensemble Performance Metrics:")
print("=" * 40)
for _, row in metrics.iterrows():
    print(f"{row['metric']:12s}: {row['value']:.6f}")

print("\n📊 Metrics calculated on test set")

## Step 7: Experiment with Different Penalties

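The penalty parameter controls regularization strength. Let's compare different values. These RMSE values come from the same test-set predictions the meta-learner is fit on, so smaller penalties will tend to look best here. Choosing the penalty properly requires resampled (out-of-fold) predictions.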

In [None]:
# Try different penalty values
penalties = [0.001, 0.01, 0.1, 1.0]

results = []

for pen in penalties:
    ensemble_temp = (
        stacks()
        .add_candidates(test_outputs_linear, name="linear")
        .add_candidates(test_outputs_ridge, name="ridge")
        .add_candidates(test_outputs_lasso, name="lasso")
        .add_candidates(test_outputs_rf, name="random_forest")
        .add_candidates(test_outputs_tree, name="decision_tree")
        .blend_predictions(penalty=pen, mixture=1.0, non_negative=True)
    )
    
    metrics_temp = ensemble_temp.get_metrics()
    rmse = metrics_temp[metrics_temp['metric'] == 'rmse']['value'].values[0]
    
    # Count non-zero weights
    weights_temp = ensemble_temp.get_model_weights()
    n_nonzero = (weights_temp[weights_temp['model'] != '(Intercept)']['weight'] > 0.001).sum()
    
    results.append({
        'penalty': pen,
        'rmse': rmse,
        'n_models_used': n_nonzero
    })

results_df = pd.DataFrame(results)

print("Effect of Penalty on Ensemble:")
print("=" * 50)
print(results_df.to_string(index=False))

print("\n📊 What to look for:")
print("  • Lower penalty → typically more models kept (less regularization)")
print("  • Higher penalty → typically fewer models kept (more sparsity)")
print("  • The chosen penalty trades ensemble size against RMSE")
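A simple case shows why a larger penalty switches models off. For the lasso with an orthonormal design, each coefficient is the least-squares estimate shrunk toward zero by the penalty (soft-thresholding). With a non-negative constraint, anything that would go below zero is clipped to zero. Real candidate predictions are highly correlated, so this is only an idealized illustration. The unpenalized weights below are made up.

```python
import numpy as np

def nonneg_soft_threshold(b_ols, penalty):
    """Non-negative lasso solution for an orthonormal design: max(b_ols - penalty, 0)."""
    return np.maximum(np.asarray(b_ols, dtype=float) - penalty, 0.0)

b_ols = np.array([0.60, 0.25, 0.12, 0.04, -0.02])   # illustrative unpenalized weights
for penalty in [0.001, 0.01, 0.1, 1.0]:
    w = nonneg_soft_threshold(b_ols, penalty)
    print(penalty, w.round(3), int((w > 0).sum()))
```

The number of models kept falls from 4 to 4, 3 and finally 0 as the penalty grows. The negative candidate is excluded at every penalty by the non-negativity constraint alone.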

In [None]:
# Visualize penalty effect
fig = go.Figure()

fig.add_trace(go.Scatter(
    x=results_df['penalty'],
    y=results_df['rmse'],
    mode='lines+markers',
    name='RMSE',
    line=dict(color='steelblue', width=2),
    marker=dict(size=10)
))

fig.update_layout(
    title="Effect of Penalty on Ensemble RMSE",
    xaxis_title="Penalty (log scale)",
    xaxis_type="log",
    yaxis_title="RMSE",
    height=400
)

fig.show()

## Summary: Why Use py_stacks?

### Benefits

1. **Improved Performance**: Ensembles often outperform individual models
2. **Automatic Weight Learning**: The meta-learner estimates the combination of candidates that minimizes penalized error on the data it is fit on
3. **Interpretability**: Non-negative weights show each model's contribution
4. **Regularization**: The elastic net penalty reduces overfitting of the meta-learner and can drop redundant models
5. **Flexibility**: Easy to add/remove candidate models

### Best Practices

1. **Diversity**: Use diverse base models (linear, tree-based, etc.)
2. **Quality**: Start with good individual models
3. **Regularization**: Tune penalty parameter to avoid overfitting
4. **Non-negativity**: Use `non_negative=True` for interpretability
5. **Validation**: Fit the meta-learner on out-of-fold predictions, and evaluate the ensemble on a held-out test set it has not seen

### When to Stack

- When you have multiple good models with different strengths
- When individual models make different types of errors
- When prediction accuracy is critical
- When you can afford the extra computational cost
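You can check whether individual models make different types of errors before stacking. Correlate the candidates' residuals on held-out data: a pair with residual correlation near 1 adds little to each other. Below is a minimal pandas sketch with a hypothetical helper `residual_correlation` and synthetic predictions.

```python
import numpy as np
import pandas as pd

def residual_correlation(y_true, preds):
    """Pearson correlation matrix of each candidate's residuals (y_true - prediction)."""
    y_true = np.asarray(y_true, dtype=float)
    resid = pd.DataFrame({name: y_true - np.asarray(p, dtype=float) for name, p in preds.items()})
    return resid.corr()

rng = np.random.default_rng(7)
y = rng.normal(0, 1, 500)
shared = rng.normal(0, 1, 500)                        # error component common to two models
preds = {
    "linear": y + shared + rng.normal(0, 0.1, 500),   # shares most of its error with "ridge"
    "ridge": y + shared + rng.normal(0, 0.1, 500),
    "forest": y + rng.normal(0, 1, 500),              # independent errors
}
print(residual_correlation(y, preds).round(2))
```

Here `linear` and `ridge` are nearly redundant, while `forest` errs in different places. That pattern is what gives a stack room to improve.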

### Method Chaining API

```python
ensemble = (
    stacks()
    .add_candidates(pred1, name="model_1")
    .add_candidates(pred2, name="model_2")
    .add_candidates(pred3, name="model_3")
    .blend_predictions(penalty=0.01, non_negative=True)
)

# Analyze results
weights = ensemble.get_model_weights()
metrics = ensemble.get_metrics()
comparison = ensemble.compare_to_candidates()
```