# Ensemble Forecasting for 2025

This notebook creates ensemble forecasts by combining predictions from multiple models.

## Ensemble Methods
1. **Weighted by Inverse MAPE**: Better models get higher weight
2. **Best Model per Metric**: Use the single best model for each metric
3. **Hybrid ML+Human**: Blend machine learning with traditional forecasts

## Available Model Forecasts
- **CatBoost**: Gradient boosting with categorical features (produces month-to-month variation)
- **Seasonal Naive**: Previous year's value (simple baseline)
- **Prophet**: Time series decomposition (validation metrics only; no 2025 forecast is loaded here)
- **SARIMAX**: Statistical ARIMA model (validation metrics only; no 2025 forecast is loaded here)
- **Human Method**: 2024 average (total ÷ 12)

**Note**: LightGBM and XGBoost are excluded because they produced flat forecasts (insufficient training data).

In [None]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

print("✓ Libraries imported successfully")

## Section 1: Load Model Forecasts and Performance Metrics

In [None]:
# Load 2025 forecasts from each model
data_dir = Path('../data/processed')

# CatBoost (ML champion)
catboost_2025 = pd.read_csv(data_dir / 'catboost_forecast_2025.csv')
catboost_2025['date'] = pd.to_datetime(catboost_2025['date'])

# Load historical data for baselines
df_hist = pd.read_csv(data_dir / 'monthly_aggregated_full_company.csv')
df_hist['date'] = pd.to_datetime(df_hist['date'])

print(f"✓ Loaded CatBoost forecasts: {len(catboost_2025)} months")
print(f"✓ Loaded historical data: {len(df_hist)} months ({df_hist['date'].min()} to {df_hist['date'].max()})")

In [None]:
# Load validation performance metrics
catboost_metrics = pd.read_csv(data_dir / 'catboost_metrics.csv')
baseline_metrics = pd.read_csv(data_dir / 'baseline_metrics.csv')
prophet_metrics = pd.read_csv(data_dir / 'prophet_metrics.csv')
sarimax_metrics = pd.read_csv(data_dir / 'sarimax_metrics.csv')

print("\n✓ Loaded performance metrics for all models")
print(f"  - CatBoost: {len(catboost_metrics)} metrics")
print(f"  - Baseline: {len(baseline_metrics)} metrics")
print(f"  - Prophet: {len(prophet_metrics)} metrics")
print(f"  - SARIMAX: {len(sarimax_metrics)} metrics")

## Section 2: Generate Baseline 2025 Forecasts

Create simple baseline forecasts for comparison.
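
As a minimal sketch of the two baselines on made-up numbers (the toy `hist` frame and its `orders` column are hypothetical, not the project data): Seasonal Naive repeats each 2024 month in 2025, while the Human Method spreads the 2024 total evenly over 12 months.

```python
import pandas as pd

# Hypothetical 2024 history: 12 monthly values for one toy metric
hist = pd.DataFrame({
    'date': pd.date_range('2024-01-01', periods=12, freq='MS'),
    'orders': [100, 110, 120, 130, 140, 150, 160, 150, 140, 130, 120, 110],
})
dates_2025 = pd.date_range('2025-01-01', periods=12, freq='MS')

# Seasonal Naive: each 2025 month repeats the same month of 2024
seasonal_naive = pd.DataFrame({'date': dates_2025, 'orders': hist['orders'].values})

# Human Method: 2024 total divided by 12, constant across the year
human_method = pd.DataFrame({'date': dates_2025, 'orders': hist['orders'].sum() / 12})
```

The Seasonal Naive series keeps the 2024 seasonal shape, whereas the Human Method series is flat by construction.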

In [None]:
# Target metrics (all 10)
target_metrics = [
    'total_orders',
    'total_km_billed',
    'total_km_actual',
    'total_tours',
    'total_drivers',
    'revenue_total',
    'external_drivers',
    'vehicle_km_cost',
    'vehicle_time_cost',
    'total_vehicle_cost'
]

# Create 2025 date range
dates_2025 = pd.date_range(start='2025-01-01', end='2025-12-01', freq='MS')

# Initialize forecast dataframes
seasonal_naive_2025 = pd.DataFrame({'date': dates_2025})
human_method_2025 = pd.DataFrame({'date': dates_2025})

print("Generating baseline forecasts...")
print("="*80)

# Select 2024 rows once, sorted by date so values align with dates_2025 (Jan-Dec)
hist_2024 = df_hist[df_hist['date'].dt.year == 2024].sort_values('date')

for metric in target_metrics:
    # Seasonal Naive: Use 2024 values
    values_2024 = hist_2024[metric].values
    if len(values_2024) == 12:
        seasonal_naive_2025[metric] = values_2024
    else:
        print(f"  ⚠️  Warning: {metric} has {len(values_2024)} months in 2024, expected 12")
        seasonal_naive_2025[metric] = values_2024.mean()  # Fallback to mean
    
    # Human Method: 2024 total ÷ 12
    total_2024 = hist_2024[metric].sum()
    human_method_2025[metric] = total_2024 / 12
    
    print(f"\n{metric}:")
    print(f"  Seasonal Naive: {seasonal_naive_2025[metric].min():.0f} - {seasonal_naive_2025[metric].max():.0f}")
    print(f"  Human Method: {human_method_2025[metric].iloc[0]:.0f} (constant)")

print(f"\n{'='*80}")
print("✓ Baseline forecasts generated successfully!")

## Section 3: Create Model Performance Summary

Identify best model per metric based on validation MAPE.
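
The selection step can be sketched on a toy table as follows. The rows and MAPE values are hypothetical; only the column names `model`, `metric`, and `MAPE` follow the metrics files used in this notebook.

```python
import pandas as pd

# Hypothetical validation results in long format: one row per (model, metric)
metrics = pd.DataFrame({
    'model':  ['CatBoost', 'Seasonal Naive', 'CatBoost', 'Seasonal Naive'],
    'metric': ['total_orders', 'total_orders', 'revenue_total', 'revenue_total'],
    'MAPE':   [4.0, 6.5, 9.0, 7.5],
})

# idxmin returns the row label of the lowest MAPE within each metric group
best = metrics.loc[metrics.groupby('metric')['MAPE'].idxmin()]
winners = dict(zip(best['metric'], best['model']))
```

Because `idxmin` returns row labels, passing its result to `.loc` keeps the full winning row (model name and MAPE) for each metric.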

In [None]:
# Combine all metrics
all_metrics = pd.concat([
    catboost_metrics,
    baseline_metrics[baseline_metrics['model'] == 'Seasonal Naive'],  # Best baseline
    prophet_metrics,
    sarimax_metrics
], ignore_index=True)

# Find best model per metric
best_models = all_metrics.loc[all_metrics.groupby('metric')['MAPE'].idxmin()]

print("Best Model per Metric (by MAPE):")
print("="*80)
for _, row in best_models.iterrows():
    print(f"{row['metric']:25s} → {row['model']:15s} (MAPE: {row['MAPE']:.2f}%)")

# Count wins
model_wins = best_models['model'].value_counts()
print(f"\n{'='*80}")
print("Model Win Count:")
print("="*80)
for model, count in model_wins.items():
    print(f"  {model}: {count}/{len(target_metrics)} metrics")

## Section 4: Ensemble Method 1 - Weighted by Inverse MAPE

Weight each model's forecast by its inverse MAPE (better models get higher weight).
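
A minimal sketch of the weighting, assuming each weight is proportional to 1 / (MAPE + eps) and then normalized over the models that actually contribute a forecast. The helper name `inverse_mape_weights`, the MAPE values, and the toy forecasts are hypothetical.

```python
import numpy as np

def inverse_mape_weights(mape_by_model, eps=0.01):
    """Return weights proportional to 1 / (MAPE + eps), normalized to sum to 1."""
    inv = {m: 1.0 / (v + eps) for m, v in mape_by_model.items()}
    total = sum(inv.values())
    return {m: w / total for m, w in inv.items()}

# Hypothetical MAPEs (in percent) for the models that have a forecast
mapes = {'CatBoost': 5.0, 'Seasonal Naive': 10.0}
weights = inverse_mape_weights(mapes)

# Hypothetical 3-month forecasts for one metric
forecasts = {
    'CatBoost': np.array([100.0, 110.0, 120.0]),
    'Seasonal Naive': np.array([90.0, 100.0, 130.0]),
}

# Weighted average: each month is a convex combination of the model forecasts
ensemble = sum(weights[m] * forecasts[m] for m in weights)
```

Normalizing only over models with a forecast matters: if weights are computed over a larger set and some models are then skipped, the remaining weights sum to less than 1 and the ensemble is biased low.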

In [None]:
# Initialize ensemble forecast
ensemble_weighted_2025 = pd.DataFrame({'date': dates_2025})

print("Creating Weighted Ensemble (Inverse MAPE)...")
print("="*80)

for metric in target_metrics:
    print(f"\n{metric}:")
    
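    # Get forecasts for this metric (only these models have 2025 forecasts available)
    forecasts = {
        'CatBoost': catboost_2025[metric].values,
        'Seasonal Naive': seasonal_naive_2025[metric].values,
        'Human Method': human_method_2025[metric].values
    }
    
    # Keep only models that have both a validation MAPE and a forecast,
    # so the normalized weights sum to 1 over the models actually blended.
    # A model without a row in all_metrics receives no weight.
    metric_perf = all_metrics[
        (all_metrics['metric'] == metric) & (all_metrics['model'].isin(list(forecasts)))
    ].copy()
    
    # Calculate inverse MAPE weights (lower MAPE = higher weight)
    metric_perf['inv_mape'] = 1 / (metric_perf['MAPE'] + 0.01)  # Add small value to avoid division by zero
    metric_perf['weight'] = metric_perf['inv_mape'] / metric_perf['inv_mape'].sum()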
    
    # Calculate weighted average
    weighted_forecast = np.zeros(12)
    
    for _, row in metric_perf.iterrows():
        model_name = row['model']
        weight = row['weight']
        
        if model_name in forecasts:
            weighted_forecast += forecasts[model_name] * weight
            print(f"  {model_name:15s}: weight={weight:.3f}, MAPE={row['MAPE']:.2f}%")
    
    ensemble_weighted_2025[metric] = weighted_forecast
    print(f"  → Ensemble: {weighted_forecast.min():.0f} - {weighted_forecast.max():.0f} (variation: {((weighted_forecast.max()/weighted_forecast.min()-1)*100):.1f}%)")

print(f"\n{'='*80}")
print("✓ Weighted ensemble created successfully!")

## Section 5: Ensemble Method 2 - Best Model per Metric

Use the single best model for each metric (no averaging).
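
One way to sketch the selection with a fallback is a dictionary lookup. The helper `pick_forecast` and the toy forecasts are hypothetical; the fallback to CatBoost mirrors the choice made in this notebook for models whose 2025 forecast is not loaded.

```python
def pick_forecast(best_model, forecasts, fallback='CatBoost'):
    """Return (model_used, forecast), falling back when best_model has no forecast."""
    model_used = best_model if best_model in forecasts else fallback
    return model_used, forecasts[model_used]

# Hypothetical 2-month forecasts for one metric
forecasts = {'CatBoost': [100, 110], 'Seasonal Naive': [95, 105]}

# Prophet has the best validation MAPE in this toy case but no forecast is available
used, values = pick_forecast('Prophet', forecasts)
```

Returning the model actually used alongside the values makes it easy to report when the fallback was applied.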

In [None]:
# Initialize best-model ensemble
ensemble_best_2025 = pd.DataFrame({'date': dates_2025})

print("Creating Best-Model Ensemble...")
print("="*80)

for metric in target_metrics:
    # Get best model for this metric
    best_row = best_models[best_models['metric'] == metric].iloc[0]
    best_model = best_row['model']
    best_mape = best_row['MAPE']
    
    # Select forecast from best model
    used_model = best_model
    if best_model == 'CatBoost':
        ensemble_best_2025[metric] = catboost_2025[metric].values
    elif best_model == 'Seasonal Naive':
        ensemble_best_2025[metric] = seasonal_naive_2025[metric].values
    elif best_model == 'Human Method':
        ensemble_best_2025[metric] = human_method_2025[metric].values
    else:
        # Fallback to CatBoost: no 2025 forecast is loaded for this model
        used_model = 'CatBoost'
        print(f"  ⚠️  {metric}: {best_model} not available, using CatBoost")
        ensemble_best_2025[metric] = catboost_2025[metric].values
    
    # Report the model actually used; the MAPE shown is the best validation MAPE
    print(f"{metric:25s} → {used_model:15s} (best validation MAPE: {best_mape:.2f}% from {best_model})")

print(f"\n{'='*80}")
print("✓ Best-model ensemble created successfully!")

## Section 6: Ensemble Method 3 - Hybrid ML+Human

Blend CatBoost (ML) with Human Method (60% ML / 40% Human).
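
A small worked example of the blend, using hypothetical 3-month forecasts for a single metric:

```python
import numpy as np

ml_weight, human_weight = 0.6, 0.4   # weights from the text; they should sum to 1

# Hypothetical 3-month forecasts for one metric
ml_forecast = np.array([100.0, 120.0, 140.0])   # varies by month
human_forecast = np.full(3, 120.0)              # constant monthly average

# Element-wise convex combination of the two forecasts
hybrid = ml_weight * ml_forecast + human_weight * human_forecast
```

Because the Human Method is constant across months, blending shrinks the monthly swing of the ML forecast: here the range drops from 40 to 24, which is 60% of the original.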

In [None]:
# Initialize hybrid ensemble
ensemble_hybrid_2025 = pd.DataFrame({'date': dates_2025})

ml_weight = 0.6
human_weight = 0.4

print(f"Creating Hybrid Ensemble ({ml_weight*100:.0f}% ML / {human_weight*100:.0f}% Human)...")
print("="*80)

for metric in target_metrics:
    # Blend CatBoost and Human Method
    ensemble_hybrid_2025[metric] = (
        catboost_2025[metric].values * ml_weight +
        human_method_2025[metric].values * human_weight
    )
    
    print(f"{metric:25s}: {ensemble_hybrid_2025[metric].min():.0f} - {ensemble_hybrid_2025[metric].max():.0f}")

print(f"\n{'='*80}")
print("✓ Hybrid ensemble created successfully!")

## Section 7: Save Ensemble Forecasts

In [None]:
# Save all ensemble forecasts
output_dir = Path('../data/processed')

ensemble_weighted_2025.to_csv(output_dir / 'ensemble_weighted_2025.csv', index=False)
ensemble_best_2025.to_csv(output_dir / 'ensemble_best_model_2025.csv', index=False)
ensemble_hybrid_2025.to_csv(output_dir / 'ensemble_hybrid_2025.csv', index=False)

# Also save baselines for reference
seasonal_naive_2025.to_csv(output_dir / 'seasonal_naive_2025.csv', index=False)
human_method_2025.to_csv(output_dir / 'human_method_2025.csv', index=False)

print("Saved ensemble forecasts:")
print("  ✓ ensemble_weighted_2025.csv (inverse MAPE weights)")
print("  ✓ ensemble_best_model_2025.csv (best model per metric)")
print("  ✓ ensemble_hybrid_2025.csv (60% ML / 40% Human)")
print("  ✓ seasonal_naive_2025.csv (baseline)")
print("  ✓ human_method_2025.csv (baseline)")

## Section 8: Visualize Ensemble Comparison

Compare all ensemble methods for key metrics.

In [None]:
# Visualize for key metrics
key_metrics = ['total_orders', 'revenue_total', 'total_drivers', 'total_vehicle_cost']

for metric in key_metrics:
    fig = go.Figure()
    
    # Add each ensemble method
    fig.add_trace(go.Scatter(
        x=ensemble_weighted_2025['date'],
        y=ensemble_weighted_2025[metric],
        mode='lines+markers',
        name='Weighted (Inverse MAPE)',
        line=dict(width=3)
    ))
    
    fig.add_trace(go.Scatter(
        x=ensemble_best_2025['date'],
        y=ensemble_best_2025[metric],
        mode='lines+markers',
        name='Best Model',
        line=dict(width=2, dash='dash')
    ))
    
    fig.add_trace(go.Scatter(
        x=ensemble_hybrid_2025['date'],
        y=ensemble_hybrid_2025[metric],
        mode='lines+markers',
        name='Hybrid (60% ML / 40% Human)',
        line=dict(width=2, dash='dot')
    ))
    
    # Add individual models for reference
    fig.add_trace(go.Scatter(
        x=catboost_2025['date'],
        y=catboost_2025[metric],
        mode='lines',
        name='CatBoost (ML)',
        line=dict(width=1, color='lightgray'),
        opacity=0.5
    ))
    
    fig.add_trace(go.Scatter(
        x=seasonal_naive_2025['date'],
        y=seasonal_naive_2025[metric],
        mode='lines',
        name='Seasonal Naive',
        line=dict(width=1, color='lightblue'),
        opacity=0.5
    ))
    
    fig.update_layout(
        title=f"2025 Ensemble Forecasts - {metric.replace('_', ' ').title()}",
        xaxis_title="Date",
        yaxis_title=metric.replace('_', ' ').title(),
        height=600,
        hovermode='x unified'
    )
    
    fig.show()
    
    # Save
    results_dir = Path('../results')
    results_dir.mkdir(exist_ok=True)
    fig.write_html(results_dir / f'ensemble_comparison_{metric}.html')
    print(f"✓ Saved: results/ensemble_comparison_{metric}.html")

## Section 9: Ensemble Summary Report

In [None]:
# Create summary table
summary_data = []

for metric in target_metrics:
    summary_data.append({
        'Metric': metric,
        'CatBoost Min': catboost_2025[metric].min(),
        'CatBoost Max': catboost_2025[metric].max(),
        'CatBoost Var%': ((catboost_2025[metric].max() / catboost_2025[metric].min() - 1) * 100),
        'Weighted Min': ensemble_weighted_2025[metric].min(),
        'Weighted Max': ensemble_weighted_2025[metric].max(),
        'Weighted Var%': ((ensemble_weighted_2025[metric].max() / ensemble_weighted_2025[metric].min() - 1) * 100),
        'Best Model': best_models[best_models['metric'] == metric]['model'].iloc[0],
        'Best MAPE': best_models[best_models['metric'] == metric]['MAPE'].iloc[0]
    })

summary_df = pd.DataFrame(summary_data)

print("\nEnsemble Forecast Summary:")
print("="*120)
print(summary_df.to_string(index=False))

# Save summary
summary_df.to_csv(output_dir / 'ensemble_summary.csv', index=False)
print("\n✓ Saved: data/processed/ensemble_summary.csv")

print(f"\n{'='*120}")
print("ENSEMBLE FORECASTING COMPLETE!")
print("="*120)
print("\nThree ensemble methods created:")
print("  1. Weighted by Inverse MAPE - Better models get higher weight")
print("  2. Best Model per Metric - Use single best performer")
print("  3. Hybrid ML+Human - Blend CatBoost (60%) with human judgment (40%)")
print("\nNext: Run Notebook 18 to validate all forecasts against actual 2025 data")