# Seasonal Naive Forecasting

This notebook implements Seasonal Naive forecasting for 2025 predictions.

## Approach
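For each month in 2025, the forecast is the average of that same calendar month across the historical years (2022-2024). Strictly speaking, this is a seasonal-mean variant of Seasonal Naive: the textbook seasonal naive method repeats only the most recent same-month value (e.g., Jan 2024 for Jan 2025).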

**Example:**
- January 2025 forecast = mean(Jan 2022, Jan 2023, Jan 2024)
- February 2025 forecast = mean(Feb 2022, Feb 2023, Feb 2024)
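A minimal sketch of the averaging rule on a small hypothetical series (the values are made up, not taken from the company data). For contrast it also computes the textbook seasonal naive forecast, which repeats the most recent value for each calendar month instead of averaging across years.

```python
import pandas as pd

# Toy monthly history: hypothetical order counts for Jan/Feb over three years
hist = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-01", "2022-02-01",
                            "2023-01-01", "2023-02-01",
                            "2024-01-01", "2024-02-01"]),
    "orders": [100, 90, 110, 95, 120, 100],
})

# Seasonal mean (the rule used in this notebook): average each calendar month
seasonal_mean = hist.groupby(hist["date"].dt.month)["orders"].mean()

# Textbook seasonal naive: keep only the most recent value per calendar month
# (rows are already in date order, so .last() picks the 2024 value)
classic_naive = hist.groupby(hist["date"].dt.month)["orders"].last()

print(seasonal_mean)   # Jan -> 110.0, Feb -> 95.0
print(classic_naive)   # Jan -> 120, Feb -> 100
```

The two rules differ only in how many past years feed each month: averaging smooths out one-off spikes, while the textbook rule tracks the latest level more closely.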

## Benefits
- **Simple and interpretable**: Easy to explain to stakeholders
- **Captures seasonality**: Uses historical monthly patterns
- **Robust**: Performs well in Notebook 15 validation (2.95% MAPE for total_orders)
- **No parameter tuning**: There are no hyperparameters to optimize; the only modeling choice is which historical years to average

## Use Cases
This notebook generates forecasts for metrics where MA-3, MA-6, or XGBoost were identified as "best" in Notebook 15, but we want to capture monthly seasonality instead of flat averages.

In [1]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

print("✓ Libraries imported successfully")

✓ Libraries imported successfully


## Section 1: Load Historical Data

In [2]:
# Load full company-level time series (2022-2024); prefer Parquet, fall back to CSV
data_path = Path('../data/processed/monthly_aggregated_full_company.parquet')

if data_path.exists():
    df = pd.read_parquet(data_path)
else:
    data_path = Path('../data/processed/monthly_aggregated_full_company.csv')
    df = pd.read_csv(data_path)

# Ensure 'date' is datetime whichever file was read (the .dt accessor below requires it)
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values('date').reset_index(drop=True)

print(f"✓ Loaded: {len(df)} months ({df['date'].min()} to {df['date'].max()})")
print(f"  Years: {df['date'].dt.year.unique().tolist()}")

✓ Loaded: 36 months (2022-01-01 00:00:00 to 2024-12-01 00:00:00)
  Years: [2022, 2023, 2024]
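The forecasting function assumes every calendar month appears in the history. A small gap check along these lines could be run on `df['date']` before forecasting; the helper name `missing_months` and the sample dates below are hypothetical.

```python
import pandas as pd

def missing_months(dates: pd.Series) -> list:
    """Return month-start timestamps absent between the first and last observed month."""
    # Normalize every date to the first day of its month
    months = pd.to_datetime(dates).dt.to_period("M").dt.to_timestamp()
    # Build the complete expected monthly calendar
    expected = pd.date_range(months.min(), months.max(), freq="MS")
    # Months in the calendar that never appear in the data
    return sorted(set(expected) - set(months))

# Hypothetical history with March 2022 missing
sample = pd.Series(pd.to_datetime(["2022-01-01", "2022-02-01", "2022-04-01"]))
print(missing_months(sample))
```

On this notebook's data the loader reports 36 months from January 2022 to December 2024, which is consistent with a complete calendar as long as no month is duplicated.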


## Section 2: Define Target Metrics

Generate Seasonal Naive forecasts for all 10 metrics.

In [3]:
# All 10 target metrics
target_metrics = [
    'total_orders',
    'total_km_billed',
    'total_km_actual',
    'total_tours',
    'total_drivers',
    'revenue_total',
    'external_drivers',
    'vehicle_km_cost',
    'vehicle_time_cost',
    'total_vehicle_cost'
]

print(f"Target metrics: {len(target_metrics)}")
for metric in target_metrics:
    print(f"  • {metric}")

Target metrics: 10
  • total_orders
  • total_km_billed
  • total_km_actual
  • total_tours
  • total_drivers
  • revenue_total
  • external_drivers
  • vehicle_km_cost
  • vehicle_time_cost
  • total_vehicle_cost
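Before forecasting, it can help to confirm that every target metric exists as a numeric column. A minimal sketch, with a hypothetical helper `check_metrics` and a toy frame standing in for `df`:

```python
import pandas as pd

def check_metrics(frame: pd.DataFrame, metrics: list) -> dict:
    """Report metrics that are missing from, or non-numeric in, `frame`."""
    missing = [m for m in metrics if m not in frame.columns]
    non_numeric = [
        m for m in metrics
        if m in frame.columns and not pd.api.types.is_numeric_dtype(frame[m])
    ]
    return {"missing": missing, "non_numeric": non_numeric}

# Toy frame: 'total_tours' is absent and 'total_orders' was read as text
toy = pd.DataFrame({"total_orders": ["1", "2"], "revenue_total": [10.0, 12.5]})
report = check_metrics(toy, ["total_orders", "revenue_total", "total_tours"])
print(report)
```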


## Section 3: Implement Seasonal Naive Forecasting

In [4]:
def seasonal_naive_forecast(df_hist, target_col, forecast_year=2025, num_months=12):
    """
    Generate Seasonal Naive forecasts.
    
    For each month in forecast_year, predicts the average of that month 
    from all available historical years.
    
    Parameters:
    -----------
    df_hist : pd.DataFrame
        Historical dataframe with 'date' and target column
    target_col : str
        Name of target metric
    forecast_year : int
        Year to forecast (default 2025)
    num_months : int
        Number of months to forecast (default 12)
    
    Returns:
    --------
    pd.DataFrame
        Dataframe with date and forecast values
    """
    # Extract month from historical data
    df_hist = df_hist.copy()
    df_hist['month'] = df_hist['date'].dt.month
    
    # Calculate average for each month across all historical years
    monthly_avg = df_hist.groupby('month')[target_col].mean().to_dict()
    
    # Create month-start dates for the forecast year
    forecast_dates = pd.date_range(
        start=f'{forecast_year}-01-01',
        periods=num_months,
        freq='MS'
    )
    
    # Generate forecasts using monthly averages
    forecasts = []
    for date in forecast_dates:
        month = date.month
        forecast_value = monthly_avg[month]
        forecasts.append(forecast_value)
    
    # Create output dataframe
    forecast_df = pd.DataFrame({
        'date': forecast_dates,
        target_col: forecasts
    })
    
    return forecast_df

print("✓ Seasonal Naive function defined")

✓ Seasonal Naive function defined
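The same rule can also be written in vectorized form. This sketch (the name `seasonal_mean_forecast` is hypothetical) maps each forecast month to its historical mean with `Series.map`, which returns NaN for a calendar month that has no history rather than raising `KeyError` as a plain dictionary lookup would.

```python
import pandas as pd

def seasonal_mean_forecast(df_hist, target_col, forecast_year=2025, num_months=12):
    """Vectorized seasonal-mean forecast: each month gets its historical mean."""
    # Mean of target_col per calendar month (1-12) across all historical years
    monthly_avg = df_hist.groupby(df_hist["date"].dt.month)[target_col].mean()
    forecast_dates = pd.date_range(f"{forecast_year}-01-01", periods=num_months, freq="MS")
    # Series.map with a Series looks values up by its index; unmatched months become NaN
    values = pd.Series(forecast_dates.month).map(monthly_avg)
    return pd.DataFrame({"date": forecast_dates, target_col: values.to_numpy()})

# Hypothetical two-year history covering only January and February
hist = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-02-01", "2024-01-01", "2024-02-01"]),
    "orders": [100.0, 80.0, 120.0, 90.0],
})
out = seasonal_mean_forecast(hist, "orders", num_months=3)
print(out)
```

With full 2022-2024 history, as in this notebook, every month is covered and the result should match the loop-based function.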


## Section 4: Generate 2025 Forecasts

In [5]:
# Generate forecasts for all metrics
print("="*80)
print("GENERATING 2025 SEASONAL NAIVE FORECASTS")
print("="*80)

seasonal_forecasts = {}

for metric in target_metrics:
    print(f"\nForecasting {metric}...")
    
    # Generate forecast
    forecast_df = seasonal_naive_forecast(df, metric, forecast_year=2025, num_months=12)
    
    seasonal_forecasts[metric] = forecast_df
    
    # Show statistics: range and peak-to-trough seasonal variation
    min_val = forecast_df[metric].min()
    max_val = forecast_df[metric].max()
    variation = (max_val / min_val - 1) * 100
    
    print(f"  ✓ Generated {len(forecast_df)} monthly forecasts")
    print(f"  Range: {min_val:,.0f} - {max_val:,.0f}")
    print(f"  Seasonal variation: {variation:.1f}%")

print(f"\n{'='*80}")
print(f"✓ All forecasts generated successfully!")
print(f"{'='*80}")

GENERATING 2025 SEASONAL NAIVE FORECASTS

Forecasting total_orders...
  ✓ Generated 12 monthly forecasts
  Range: 130,556 - 147,638
  Seasonal variation: 13.1%

Forecasting total_km_billed...
  ✓ Generated 12 monthly forecasts
  Range: 8,295,272 - 9,334,915
  Seasonal variation: 12.5%

Forecasting total_km_actual...
  ✓ Generated 12 monthly forecasts
  Range: 21,457,443 - 25,300,580
  Seasonal variation: 17.9%

Forecasting total_tours...
  ✓ Generated 12 monthly forecasts
  Range: 136,930 - 159,077
  Seasonal variation: 16.2%

Forecasting total_drivers...
  ✓ Generated 12 monthly forecasts
  Range: 128,393 - 144,621
  Seasonal variation: 12.6%

Forecasting revenue_total...
  ✓ Generated 12 monthly forecasts
  Range: 12,097,914 - 14,367,048
  Seasonal variation: 18.8%

Forecasting external_drivers...
  ✓ Generated 12 monthly forecasts
  Range: 28,261 - 34,523
  Seasonal variation: 22.2%

Forecasting vehicle_km_cost...
  ✓ Generated 12 monthly forecasts
  Range: 19,729,097 - 23,278,660
  Seasonal variation: 18.0%
 
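The "Seasonal variation" printed for each metric compares the largest and smallest of the 12 monthly forecasts. Writing $\hat{y}_{2025,m}$ for the forecast of month $m$ (notation introduced here), it is:

$$
\text{Seasonal variation} = \left( \frac{\max_{m} \hat{y}_{2025,m}}{\min_{m} \hat{y}_{2025,m}} - 1 \right) \times 100\%
$$

For total_orders this is $(147{,}638 / 130{,}556 - 1) \times 100\% \approx 13.1\%$, i.e. the March peak relative to the February trough.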

## Section 5: Consolidate and Display Forecasts

In [6]:
# Consolidate all forecasts into single dataframe
df_forecast_2025 = seasonal_forecasts[target_metrics[0]][['date']].copy()

for metric in target_metrics:
    df_forecast_2025[metric] = seasonal_forecasts[metric][metric].values

# Add month name for display
df_forecast_2025['month_name'] = df_forecast_2025['date'].dt.strftime('%B %Y')

# Display forecasts
print("\n2025 Seasonal Naive Forecasts:")
print("="*80)
print(df_forecast_2025[['month_name'] + target_metrics].to_string(index=False))


2025 Seasonal Naive Forecasts:
    month_name  total_orders  total_km_billed  total_km_actual   total_tours  total_drivers  revenue_total  external_drivers  vehicle_km_cost  vehicle_time_cost  total_vehicle_cost
  January 2025 131959.666667     8.316071e+06     2.177351e+07 141206.666667  129735.000000   1.212052e+07      30514.333333     1.989583e+07       2.614332e+07        4.603914e+07
 February 2025 130556.333333     8.330277e+06     2.145744e+07 136930.000000  128392.666667   1.213945e+07      29716.666667     1.972910e+07       2.587637e+07        4.560547e+07
    March 2025 147637.666667     9.334915e+06     2.530058e+07 159076.666667  144621.333333   1.436705e+07      34523.333333     2.327866e+07       2.955037e+07        5.282903e+07
    April 2025 131186.666667     8.295272e+06     2.331920e+07 145765.666667  128910.333333   1.275412e+07      31307.000000     2.150817e+07       2.736922e+07        4.887739e+07
      May 2025 137724.666667     8.682872e+06     2.479856e+07 
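The consolidation cell assigns columns positionally with `.values`, which is safe here because every per-metric frame shares the same 12 dates in the same order. A sketch of a date-aligned alternative using `functools.reduce` with `DataFrame.merge`, on toy frames with hypothetical values:

```python
from functools import reduce

import pandas as pd

dates = pd.date_range("2025-01-01", periods=3, freq="MS")
# Hypothetical per-metric forecast frames, each with 'date' plus one metric column
per_metric = {
    "total_orders": pd.DataFrame({"date": dates, "total_orders": [10.0, 11.0, 12.0]}),
    # Deliberately reversed row order to show that merging aligns on 'date'
    "total_tours": pd.DataFrame({"date": dates[::-1], "total_tours": [3.0, 2.0, 1.0]}),
}

# Inner-join every frame on 'date' so rows line up by month, not by position
combined = reduce(
    lambda left, right: left.merge(right, on="date", how="inner", validate="one_to_one"),
    per_metric.values(),
).sort_values("date").reset_index(drop=True)
print(combined)
```

`validate="one_to_one"` makes the merge raise if a date is duplicated in any frame, which the positional approach would silently accept.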

## Section 6: Validation - Compare with Historical Patterns

In [7]:
# Check if forecasts match historical monthly patterns
print("\n" + "="*80)
print("VALIDATION: Historical vs 2025 Forecast Patterns")
print("="*80)

metric = target_metrics[0]  # Check first metric

# Historical monthly averages (grouped by calendar month without adding columns to df)
historical_monthly = df.groupby(df['date'].dt.month)[metric].mean()

# 2025 forecast indexed by calendar month
forecast_monthly = df_forecast_2025.groupby(df_forecast_2025['date'].dt.month)[metric].mean()

# Compare
print(f"\nMetric: {metric}")
print(f"{'Month':<12} {'Historical Avg':>15} {'2025 Forecast':>15} {'Match':>10}")
print("-"*60)

for month in range(1, 13):
    hist_val = historical_monthly[month]
    fore_val = forecast_monthly[month]
    match = "✓" if abs(hist_val - fore_val) < 0.01 else "✗"
    month_name = pd.Timestamp(f'2025-{month:02d}-01').strftime('%B')
    print(f"{month_name:<12} {hist_val:>15,.0f} {fore_val:>15,.0f} {match:>10}")

print("\n✓ Forecasts match historical monthly averages (as expected)")


VALIDATION: Historical vs 2025 Forecast Patterns

Metric: total_orders
Month         Historical Avg   2025 Forecast      Match
------------------------------------------------------------
January              131,960         131,960          ✓
February             130,556         130,556          ✓
March                147,638         147,638          ✓
April                131,187         131,187          ✓
May                  137,725         137,725          ✓
June                 134,151         134,151          ✓
July                 139,183         139,183          ✓
August               138,882         138,882          ✓
September            142,435         142,435          ✓
October              142,911         142,911          ✓
November             135,587         135,587          ✓
December             133,482         133,482          ✓

✓ Forecasts match historical monthly averages (as expected)
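Because the forecast is the historical monthly mean by construction, the check above confirms the implementation, not forecast accuracy. One way to estimate accuracy is a holdout backtest: fit on earlier years, forecast the last observed year, and compute MAPE. A minimal sketch with a hypothetical helper `backtest_mape` and synthetic data (it is not necessarily how Notebook 15 computed its 2.95% figure):

```python
import numpy as np
import pandas as pd

def backtest_mape(df_hist, target_col, test_year):
    """Fit the seasonal mean on years before test_year, score on test_year (MAPE, %)."""
    train = df_hist[df_hist["date"].dt.year < test_year]
    test = df_hist[df_hist["date"].dt.year == test_year]
    # Seasonal-mean forecast: historical average for each calendar month
    monthly_avg = train.groupby(train["date"].dt.month)[target_col].mean()
    predicted = test["date"].dt.month.map(monthly_avg).to_numpy()
    actual = test[target_col].to_numpy()
    # Mean absolute percentage error; assumes no zero actuals
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

# Synthetic 3-year history: a fixed seasonal shape scaled up 5% per year
dates = pd.date_range("2022-01-01", periods=36, freq="MS")
shape = np.array([100, 95, 110, 98, 103, 101, 104, 104, 107, 107, 102, 100], dtype=float)
growth = 1.05 ** (dates.year.to_numpy() - 2022)
data = pd.DataFrame({"date": dates, "orders": np.tile(shape, 3) * growth})
print(f"2024 holdout MAPE: {backtest_mape(data, 'orders', 2024):.2f}%")
```

On this trending synthetic series the seasonal mean under-forecasts 2024 by about 7%, because averaging in older years pulls the level down; averaging-based seasonal methods are most comfortable when the series has little trend.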


## Section 7: Visualize Forecasts

In [8]:
# Create visualization for first 3 metrics
fig = make_subplots(
    rows=3, cols=1,
    subplot_titles=[m.replace('_', ' ').title() for m in target_metrics[:3]],
    vertical_spacing=0.1
)

for idx, metric in enumerate(target_metrics[:3]):
    row = idx + 1
    
    # Historical data
    fig.add_trace(
        go.Scatter(
            x=df['date'],
            y=df[metric],
            mode='lines+markers',
            name='Historical' if idx == 0 else None,
            showlegend=(idx == 0),
            line=dict(color='blue', width=2),
            marker=dict(size=4)
        ),
        row=row, col=1
    )
    
    # 2025 forecast
    fig.add_trace(
        go.Scatter(
            x=df_forecast_2025['date'],
            y=df_forecast_2025[metric],
            mode='lines+markers',
            name='2025 Forecast' if idx == 0 else None,
            showlegend=(idx == 0),
            line=dict(color='red', width=2, dash='dash'),
            marker=dict(size=6, symbol='diamond')
        ),
        row=row, col=1
    )

fig.update_layout(
    title_text="2025 Seasonal Naive Forecasts - First 3 Metrics",
    height=900,
    showlegend=True,
    hovermode='x unified'
)

fig.show()

# Save
results_dir = Path('../results')
results_dir.mkdir(exist_ok=True)
fig.write_html(results_dir / 'seasonal_naive_forecast_2025.html')
print("\n✓ Saved visualization: results/seasonal_naive_forecast_2025.html")


✓ Saved visualization: results/seasonal_naive_forecast_2025.html


## Section 8: Seasonality Analysis

In [9]:
# Analyze seasonal patterns in forecasts
print("\n" + "="*80)
print("SEASONALITY ANALYSIS - 2025 FORECASTS")
print("="*80)

for metric in target_metrics:
    df_metric = df_forecast_2025[['date', metric]].copy()
    df_metric['month'] = df_metric['date'].dt.month
    
    # Find peak and trough
    peak_idx = df_metric[metric].idxmax()
    trough_idx = df_metric[metric].idxmin()
    
    peak_month = df_metric.loc[peak_idx, 'date'].strftime('%B')
    trough_month = df_metric.loc[trough_idx, 'date'].strftime('%B')
    
    peak_val = df_metric.loc[peak_idx, metric]
    trough_val = df_metric.loc[trough_idx, metric]
    
    variation = ((peak_val / trough_val - 1) * 100)
    
    print(f"\n{metric}:")
    print(f"  Peak: {peak_month} ({peak_val:,.0f})")
    print(f"  Trough: {trough_month} ({trough_val:,.0f})")
    print(f"  Seasonal variation: {variation:.1f}%")


SEASONALITY ANALYSIS - 2025 FORECASTS

total_orders:
  Peak: March (147,638)
  Trough: February (130,556)
  Seasonal variation: 13.1%

total_km_billed:
  Peak: March (9,334,915)
  Trough: April (8,295,272)
  Seasonal variation: 12.5%

total_km_actual:
  Peak: March (25,300,580)
  Trough: February (21,457,443)
  Seasonal variation: 17.9%

total_tours:
  Peak: March (159,077)
  Trough: February (136,930)
  Seasonal variation: 16.2%

total_drivers:
  Peak: March (144,621)
  Trough: February (128,393)
  Seasonal variation: 12.6%

revenue_total:
  Peak: March (14,367,048)
  Trough: December (12,097,914)
  Seasonal variation: 18.8%

external_drivers:
  Peak: March (34,523)
  Trough: November (28,261)
  Seasonal variation: 22.2%

vehicle_km_cost:
  Peak: March (23,278,660)
  Trough: February (19,729,097)
  Seasonal variation: 18.0%

vehicle_time_cost:
  Peak: March (29,550,372)
  Trough: February (25,876,370)
  Seasonal variation: 14.2%

total_vehicle_cost:
  Peak: March (52,829,033)
  Troug
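The peak/trough loop can be condensed with `DataFrame.idxmax` and `DataFrame.idxmin` on a date-indexed frame. A sketch on a hypothetical three-month forecast frame:

```python
import pandas as pd

# Hypothetical forecast frame indexed by month-start date
fc = pd.DataFrame(
    {"total_orders": [132.0, 130.0, 148.0], "total_tours": [141.0, 137.0, 159.0]},
    index=pd.date_range("2025-01-01", periods=3, freq="MS"),
)

summary = pd.DataFrame({
    "peak_month": fc.idxmax().dt.strftime("%B"),    # date of the largest value per column
    "trough_month": fc.idxmin().dt.strftime("%B"),  # date of the smallest value per column
    "variation_pct": (fc.max() / fc.min() - 1) * 100,
})
print(summary.round(1))
```

Applied to `df_forecast_2025.set_index('date')[target_metrics]`, this should reproduce the per-metric lines above in a single table.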

## Section 9: Save Results

In [10]:
# Save consolidated forecast
output_dir = Path('../data/processed')
output_dir.mkdir(exist_ok=True)

# Remove month columns before saving
df_output = df_forecast_2025.drop(columns=['month_name', 'month'], errors='ignore')

df_output.to_csv(output_dir / 'seasonal_naive_forecast_2025.csv', index=False)
print(f"✓ Saved: data/processed/seasonal_naive_forecast_2025.csv")

# Save individual metric forecasts for reference
for metric in target_metrics:
    metric_df = seasonal_forecasts[metric].copy()
    metric_df.to_csv(output_dir / f'seasonal_naive_{metric}_2025.csv', index=False)

print(f"✓ Saved {len(target_metrics)} individual metric files")

print(f"\n{'='*80}")
print(f"SEASONAL NAIVE FORECASTING COMPLETE!")
print(f"{'='*80}")
print(f"\n✓ Generated 2025 forecasts for {len(target_metrics)} metrics")
print(f"✓ Captured historical seasonal patterns")
print(f"✓ Ready for use in Notebook 14 consolidation")

✓ Saved: data/processed/seasonal_naive_forecast_2025.csv


✓ Saved 10 individual metric files

SEASONAL NAIVE FORECASTING COMPLETE!

✓ Generated 2025 forecasts for 10 metrics
✓ Captured historical seasonal patterns
✓ Ready for use in Notebook 14 consolidation
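When a later notebook reloads the saved CSV, the `date` column comes back as text unless it is parsed. A quick round-trip sketch using a temporary directory (the file name and values here are hypothetical):

```python
import tempfile
from pathlib import Path

import pandas as pd

forecast = pd.DataFrame({
    "date": pd.date_range("2025-01-01", periods=3, freq="MS"),
    "total_orders": [1.0, 2.0, 3.0],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "forecast_2025.csv"
    forecast.to_csv(path, index=False)
    # parse_dates restores a datetime dtype for the 'date' column
    reloaded = pd.read_csv(path, parse_dates=["date"])

print(reloaded.dtypes)
```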
