# Example 32: Baseline Forecasting Models

**New in v1.0.0**: `null_model()` and `naive_reg()` for baseline forecasting

## Overview

This notebook demonstrates the baseline forecasting models added in py-tidymodels v1.0.0:

### `null_model()` - Statistical Baselines
- **mean**: Average of historical values
- **median**: Median of historical values
- **last**: Last observed value (persistence)

### `naive_reg()` - Time Series Baselines
- **naive**: Last observation (same as null_model last)
- **seasonal_naive**: Value from same period last season
- **drift**: Linear trend from first to last observation
- **window**: Moving average over recent window

## When to Use Baseline Models

**Critical for model evaluation**:
1. Establish performance floor - any advanced model should beat baselines
2. Quick sanity checks - if a complex model loses to the mean baseline, investigate!
3. Production fallbacks - use baselines when advanced models fail
4. Benchmark comparisons - report "% improvement over naive forecast"
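
The "% improvement over naive forecast" figure in point 4 is a relative error reduction. A minimal sketch, assuming a lower-is-better error metric such as RMSE; the helper name `pct_improvement` and the numbers are illustrative, not part of py-tidymodels or results from this notebook:

```python
def pct_improvement(baseline_error: float, model_error: float) -> float:
    """Percent reduction in error relative to a baseline (positive = model is better)."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - model_error) / baseline_error * 100.0


# Illustrative numbers only
print(pct_improvement(baseline_error=200.0, model_error=150.0))  # 25.0
print(pct_improvement(baseline_error=200.0, model_error=250.0))  # -25.0
```

A negative value means the model is worse than the baseline it is compared against.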

**Sometimes baselines are sufficient**:
- Stable time series with no trend/seasonality
- Limited historical data (< 50 observations)
- High forecast uncertainty makes complex models overkill
- Interpretability is paramount

## Dataset

**JODI Crude Oil Production** (Saudi Arabia):
- Monthly production from 2002-2024
- Units: Thousands of barrels per day (KBD)
- Major oil producer with stable production patterns
- Source: Joint Organisations Data Initiative

In [None]:
# Setup
import pandas as pd
import numpy as np

# py-tidymodels imports
from py_parsnip import null_model, naive_reg, linear_reg, prophet_reg
from py_workflows import Workflow
from py_rsample import initial_time_split
from py_yardstick import rmse, mae, mape, metric_set
from py_workflowsets import WorkflowSet

import warnings
warnings.filterwarnings('ignore')

print("✓ Imports complete")

## 1. Load and Prepare Data

In [None]:
# Load JODI crude production data
df = pd.read_csv('../_md/__data/jodi_crude_production_data.csv')
df['date'] = pd.to_datetime(df['date'])

# Filter to Saudi Arabia only
saudi = df[df['country'] == 'Saudi Arabia'].copy()
saudi = saudi[['date', 'value']].rename(columns={'value': 'production'})
saudi = saudi.sort_values('date').reset_index(drop=True)

# Remove zeros (treated as missing months); dropped rows leave gaps in the monthly sequence
saudi = saudi[saudi['production'] > 0].reset_index(drop=True)

print(f"Saudi Arabia crude oil production:")
print(f"  Records: {len(saudi):,}")
print(f"  Date range: {saudi['date'].min()} to {saudi['date'].max()}")
print(f"  Mean production: {saudi['production'].mean():.0f} KBD")
print(f"  Std dev: {saudi['production'].std():.0f} KBD")
print(f"\nFirst few rows:")
print(saudi.head())

In [None]:
# Chronological train/test split: prop=0.90 holds out the last 10% of months as the test set
split = initial_time_split(saudi, date_column='date', prop=0.90)
train = split.training()
test = split.testing()

print(f"Train: {len(train)} months ({train['date'].min()} to {train['date'].max()})")
print(f"Test:  {len(test)} months ({test['date'].min()} to {test['date'].max()})")
print(f"\nTest period: {len(test)} months for evaluation")

## 2. null_model() - Statistical Baselines

These models use simple statistics from training data.
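
As a rough picture of what the three strategies compute, here is a plain-Python sketch. It assumes each strategy produces a flat forecast from one summary of the training values; `null_forecast` is a hypothetical helper for illustration, not the py-tidymodels implementation:

```python
from statistics import mean, median


def null_forecast(train_values, horizon, strategy="mean"):
    """Return a flat forecast of length `horizon` from a summary of the training values."""
    if not train_values:
        raise ValueError("train_values must not be empty")
    if strategy == "mean":
        level = mean(train_values)
    elif strategy == "median":
        level = median(train_values)
    elif strategy == "last":
        level = train_values[-1]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [level] * horizon


history = [10.0, 12.0, 11.0, 50.0, 13.0]  # toy data with one outlier
print(null_forecast(history, 3, "mean"))    # [19.2, 19.2, 19.2]
print(null_forecast(history, 3, "median"))  # [12.0, 12.0, 12.0]
print(null_forecast(history, 3, "last"))    # [13.0, 13.0, 13.0]
```

The outlier pulls the mean well above the typical level, while the median stays put.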

### 2.1 Mean Baseline

Forecast = mean of all training values

In [None]:
# Mean baseline: predicts average of training data
mean_spec = null_model(strategy='mean')
mean_fit = mean_spec.fit(train, 'production ~ date')

# Predict on test
mean_eval = mean_fit.evaluate(test)
outputs, coeffs, stats = mean_eval.extract_outputs()

print("Mean Baseline:")
print(f"  Training mean: {train['production'].mean():.2f} KBD")
print(f"  Test RMSE: {stats[stats['split']=='test']['rmse'].values[0]:.2f} KBD")
print(f"  Test MAE: {stats[stats['split']=='test']['mae'].values[0]:.2f} KBD")
print(f"  Test MAPE: {stats[stats['split']=='test']['mape'].values[0]:.2f}%")

### 2.2 Median Baseline

Forecast = median of all training values (robust to outliers)

In [None]:
# Median baseline: more robust to outliers
median_spec = null_model(strategy='median')
median_fit = median_spec.fit(train, 'production ~ date')
median_eval = median_fit.evaluate(test)
_, _, stats_median = median_eval.extract_outputs()

print("Median Baseline:")
print(f"  Training median: {train['production'].median():.2f} KBD")
print(f"  Test RMSE: {stats_median[stats_median['split']=='test']['rmse'].values[0]:.2f} KBD")
print(f"  Test MAE: {stats_median[stats_median['split']=='test']['mae'].values[0]:.2f} KBD")

### 2.3 Last Value Baseline (Persistence)

Forecast = last observed training value

In [None]:
# Last value: persistence forecast
last_spec = null_model(strategy='last')
last_fit = last_spec.fit(train, 'production ~ date')
last_eval = last_fit.evaluate(test)
_, _, stats_last = last_eval.extract_outputs()

print("Last Value (Persistence) Baseline:")
print(f"  Last training value: {train['production'].iloc[-1]:.2f} KBD")
print(f"  Test RMSE: {stats_last[stats_last['split']=='test']['rmse'].values[0]:.2f} KBD")
print(f"  Test MAE: {stats_last[stats_last['split']=='test']['mae'].values[0]:.2f} KBD")

## 3. naive_reg() - Time Series Baselines

These models exploit time-series structure: persistence, seasonality, and trend.

### 3.1 Naive Forecast

Forecast = last observed training value (same as persistence, i.e. `null_model(strategy='last')`)

In [None]:
# Naive: last observation carried forward
naive_spec = naive_reg(strategy='naive')
naive_fit = naive_spec.fit(train, 'production ~ date')
naive_eval = naive_fit.evaluate(test)
_, _, stats_naive = naive_eval.extract_outputs()

print("Naive Forecast:")
print(f"  Test RMSE: {stats_naive[stats_naive['split']=='test']['rmse'].values[0]:.2f} KBD")
print(f"  Test MAE: {stats_naive[stats_naive['split']=='test']['mae'].values[0]:.2f} KBD")
print(f"\n  (Should match 'last' from null_model)")

### 3.2 Seasonal Naive

Forecast = value from same month last year (12-month seasonality)
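
A minimal sketch of the seasonal naive rule in plain Python, assuming the forecast repeats the last full season of training values by position. The function name is hypothetical and this is not the library's code; a positional lookup like this also assumes regularly spaced observations with no dropped rows:

```python
def seasonal_naive_forecast(train_values, horizon, seasonal_period):
    """Repeat the last `seasonal_period` training values for `horizon` steps."""
    if seasonal_period < 1:
        raise ValueError("seasonal_period must be at least 1")
    if len(train_values) < seasonal_period:
        raise ValueError("need at least one full season of history")
    last_season = train_values[-seasonal_period:]
    return [last_season[h % seasonal_period] for h in range(horizon)]


# Toy quarterly-style data: two seasons of four periods each
history = [1, 2, 3, 4, 10, 20, 30, 40]
print(seasonal_naive_forecast(history, horizon=6, seasonal_period=4))
# [10, 20, 30, 40, 10, 20]
```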

In [None]:
# Seasonal naive: value from same period last year
seasonal_spec = naive_reg(strategy='seasonal_naive', seasonal_period=12)
seasonal_fit = seasonal_spec.fit(train, 'production ~ date')
seasonal_eval = seasonal_fit.evaluate(test)
_, _, stats_seasonal = seasonal_eval.extract_outputs()

print("Seasonal Naive (12-month):")
print(f"  Test RMSE: {stats_seasonal[stats_seasonal['split']=='test']['rmse'].values[0]:.2f} KBD")
print(f"  Test MAE: {stats_seasonal[stats_seasonal['split']=='test']['mae'].values[0]:.2f} KBD")
print(f"\n  Uses value from same month 12 months ago")

### 3.3 Drift Method

Extrapolates linear trend from first to last training observation
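
The drift rule can be sketched in plain Python as the last value plus `h` times the average per-step change between the first and last training observations. `drift_forecast` is an illustrative name, not a py-tidymodels function:

```python
def drift_forecast(train_values, horizon):
    """Extend the line through the first and last training values for `horizon` steps."""
    n = len(train_values)
    if n < 2:
        raise ValueError("need at least two observations to estimate drift")
    first, last = train_values[0], train_values[-1]
    slope = (last - first) / (n - 1)  # average change per step
    return [last + slope * h for h in range(1, horizon + 1)]


history = [100.0, 104.0, 103.0, 109.0]  # toy data
print(drift_forecast(history, horizon=3))  # [112.0, 115.0, 118.0]
```

Only the endpoints matter: the intermediate values 104.0 and 103.0 do not affect the slope.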

In [None]:
# Drift: linear extrapolation from first to last value
drift_spec = naive_reg(strategy='drift')
drift_fit = drift_spec.fit(train, 'production ~ date')
drift_eval = drift_fit.evaluate(test)
_, _, stats_drift = drift_eval.extract_outputs()

print("Drift Method:")
first_val = train['production'].iloc[0]
last_val = train['production'].iloc[-1]
n_train = len(train)
drift_rate = (last_val - first_val) / (n_train - 1)

print(f"  First value: {first_val:.2f} KBD")
print(f"  Last value: {last_val:.2f} KBD")
print(f"  Drift rate: {drift_rate:.2f} KBD/month")
print(f"  Test RMSE: {stats_drift[stats_drift['split']=='test']['rmse'].values[0]:.2f} KBD")
print(f"  Test MAE: {stats_drift[stats_drift['split']=='test']['mae'].values[0]:.2f} KBD")

### 3.4 Window Average (Moving Average)

Forecast = average of last N observations
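
A plain-Python sketch of the window average, assuming the forecast is flat at the mean of the last `window_size` training values (the function name is hypothetical and the library may differ in detail):

```python
def window_forecast(train_values, horizon, window_size):
    """Flat forecast equal to the mean of the last `window_size` training values."""
    if not 1 <= window_size <= len(train_values):
        raise ValueError("window_size must be between 1 and len(train_values)")
    recent = train_values[-window_size:]
    level = sum(recent) / window_size
    return [level] * horizon


history = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]  # toy data
print(window_forecast(history, horizon=2, window_size=3))  # [14.0, 14.0]
```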

In [None]:
# Window average: mean of last 6 months
window_spec = naive_reg(strategy='window', window_size=6)
window_fit = window_spec.fit(train, 'production ~ date')
window_eval = window_fit.evaluate(test)
_, _, stats_window = window_eval.extract_outputs()

print("Window Average (6 months):")
recent_avg = train['production'].iloc[-6:].mean()
print(f"  Last 6 months average: {recent_avg:.2f} KBD")
print(f"  Test RMSE: {stats_window[stats_window['split']=='test']['rmse'].values[0]:.2f} KBD")
print(f"  Test MAE: {stats_window[stats_window['split']=='test']['mae'].values[0]:.2f} KBD")

## 4. Baseline Comparison Framework

Compare all baselines side-by-side using WorkflowSet.
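
The comparison boils down to scoring each forecast against the same test values and sorting by error. A stdlib-only sketch with toy numbers; the `*_score` helpers are illustrative stand-ins named to avoid shadowing the `py_yardstick` imports, and they are not that package's implementation:

```python
import math


def rmse_score(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae_score(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape_score(actual, predicted):
    """Mean absolute percentage error, in percent; undefined if any actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


actual = [100.0, 110.0, 120.0]  # toy test values
forecasts = {
    "mean": [105.0, 105.0, 105.0],
    "last": [110.0, 110.0, 110.0],
}

# Rank models from lowest to highest RMSE
ranking = sorted(forecasts, key=lambda name: rmse_score(actual, forecasts[name]))
print(ranking)  # ['last', 'mean']
```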

In [None]:
# Create all baseline workflows
baseline_models = [
    ('mean', null_model(strategy='mean')),
    ('median', null_model(strategy='median')),
    ('last', null_model(strategy='last')),
    ('naive', naive_reg(strategy='naive')),
    ('seasonal_naive_12m', naive_reg(strategy='seasonal_naive', seasonal_period=12)),
    ('drift', naive_reg(strategy='drift')),
    ('window_6m', naive_reg(strategy='window', window_size=6)),
    ('window_3m', naive_reg(strategy='window', window_size=3)),
]

# Create workflows
baseline_workflows = []
for name, model in baseline_models:
    wf = Workflow().add_formula('production ~ date').add_model(model)
    baseline_workflows.append(wf)

# Create WorkflowSet
wf_set_baselines = WorkflowSet.from_workflows(baseline_workflows)

print(f"Created {len(baseline_workflows)} baseline workflows")
print(f"Workflow IDs: {list(wf_set_baselines.workflows.keys())}")

In [None]:
# Fit all baseline models on train data
baseline_results = []
for wf_id, wf in wf_set_baselines.workflows.items():
    fit = wf.fit(train)
    eval_fit = fit.evaluate(test)
    _, _, stats = eval_fit.extract_outputs()
    
    test_stats = stats[stats['split'] == 'test'].iloc[0]
    baseline_results.append({
        'model': wf_id,
        'rmse': test_stats['rmse'],
        'mae': test_stats['mae'],
        'mape': test_stats['mape']
    })

baseline_comparison = pd.DataFrame(baseline_results)
baseline_comparison = baseline_comparison.sort_values('rmse')

print("\nBaseline Model Comparison (Test Set):")
print("="*70)
print(baseline_comparison.to_string(index=False))
print("="*70)
print(f"\nBest baseline: {baseline_comparison.iloc[0]['model']}")
print(f"  RMSE: {baseline_comparison.iloc[0]['rmse']:.2f} KBD")
print(f"  MAE: {baseline_comparison.iloc[0]['mae']:.2f} KBD")

## 5. Baselines vs Advanced Models

Compare baselines against advanced models (Linear Regression and Prophet).

In [None]:
# Add advanced models
advanced_models = [
    ('linear_reg', linear_reg()),
    ('prophet', prophet_reg())
]

# Create workflows for advanced models
all_workflows = baseline_workflows.copy()
for name, model in advanced_models:
    wf = Workflow().add_formula('production ~ date').add_model(model)
    all_workflows.append(wf)

# Create WorkflowSet with all models
wf_set_all = WorkflowSet.from_workflows(all_workflows)

print(f"Total workflows: {len(all_workflows)} ({len(baseline_workflows)} baselines + {len(advanced_models)} advanced)")

In [None]:
# Fit all models
all_results = []
for wf_id, wf in wf_set_all.workflows.items():
    try:
        fit = wf.fit(train)
        eval_fit = fit.evaluate(test)
        _, _, stats = eval_fit.extract_outputs()
        
        test_stats = stats[stats['split'] == 'test'].iloc[0]
        model_type = 'baseline' if wf_id in wf_set_baselines.workflows else 'advanced'
        
        all_results.append({
            'model': wf_id,
            'type': model_type,
            'rmse': test_stats['rmse'],
            'mae': test_stats['mae'],
            'mape': test_stats['mape']
        })
    except Exception as e:
        print(f"Warning: {wf_id} failed - {str(e)[:50]}")

all_comparison = pd.DataFrame(all_results)
all_comparison = all_comparison.sort_values('rmse')

print("\nAll Models Comparison (Test Set):")
print("="*80)
print(all_comparison.to_string(index=False))
print("="*80)

In [None]:
# Calculate improvement over best baseline
best_baseline_rmse = all_comparison[all_comparison['type']=='baseline']['rmse'].min()
all_comparison['improvement_vs_baseline'] = (
    (best_baseline_rmse - all_comparison['rmse']) / best_baseline_rmse * 100
)

print("\nImprovement vs Best Baseline:")
print("="*80)
print(all_comparison[['model', 'type', 'rmse', 'improvement_vs_baseline']].to_string(index=False))
print("="*80)

# Show advanced model performance
advanced_results = all_comparison[all_comparison['type']=='advanced']
if len(advanced_results) > 0:
    print(f"\nAdvanced Models:")
    for _, row in advanced_results.iterrows():
        if row['improvement_vs_baseline'] > 0:
            print(f"  ✓ {row['model']}: {row['improvement_vs_baseline']:.1f}% better than best baseline")
        else:
            print(f"  ✗ {row['model']}: {abs(row['improvement_vs_baseline']):.1f}% WORSE than best baseline")
            print(f"    → Investigation needed! Advanced model should beat baselines.")

## 6. Key Takeaways

### When Each Baseline Works Best

1. **Mean/Median**: 
   - Stationary series with no trend
   - Long-term averages

2. **Naive/Last Value**:
   - Strong persistence (tomorrow ≈ today)
   - Random walk patterns
   - Short-term forecasts
   - Stable recent patterns

3. **Seasonal Naive**:
   - Strong seasonal patterns
   - Consistent year-over-year behavior
   - Retail sales, energy demand

4. **Drift**:
   - Clear linear trends
   - No seasonality
   - Steady growth/decline

5. **Window Average**:
   - Smoothing noisy data
   - Recent changes more important
   - Balancing stability vs responsiveness

### Best Practices

1. **Always establish baseline performance FIRST**
   - Run all baseline models before complex ones
   - Document baseline RMSE/MAE as reference

2. **Advanced models MUST beat baselines**
   - If linear_reg loses to mean, check for bugs
   - If Prophet loses to seasonal_naive, investigate why
   - Report "% improvement over naive forecast"

3. **Use baselines as production fallbacks**
   - If advanced model fails, fall back to seasonal_naive
   - Baselines rarely fail: there is no iterative fitting, so no convergence issues (seasonal_naive still needs at least one full season of history)

4. **Choose baseline strategy based on data characteristics**
   - Check for trend → use drift
   - Check for seasonality → use seasonal_naive
   - Neither → use mean or window average

### Production Deployment

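```python
import logging

logger = logging.getLogger(__name__)

def predict_with_fallback(advanced_model, baseline_model, new_data):
    """Production pattern: predict with the advanced model, fall back to the baseline on failure."""
    try:
        return advanced_model.predict(new_data)
    except Exception as e:
        logger.warning(f"Advanced model failed: {e}")
        logger.info("Falling back to seasonal_naive baseline")
        return baseline_model.predict(new_data)
```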
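
To see the fallback path fire, the pattern can be exercised with stand-in models. Everything in this sketch is hypothetical (`FailingModel`, `LastValueModel`, `predict_and_report_source`); it assumes only that a model exposes a `predict(new_data)` method:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("forecast_fallback")


class FailingModel:
    """Hypothetical stand-in for an advanced model that breaks at predict time."""

    def predict(self, new_data):
        raise RuntimeError("model artifact could not be loaded")


class LastValueModel:
    """Hypothetical stand-in baseline: repeats a stored last observation."""

    def __init__(self, last_value):
        self.last_value = last_value

    def predict(self, new_data):
        return [self.last_value] * len(new_data)


def predict_and_report_source(primary, fallback, new_data):
    """Return (predictions, source), where source records which model answered."""
    try:
        return primary.predict(new_data), "primary"
    except Exception as exc:
        logger.warning("Primary model failed: %s", exc)
        return fallback.predict(new_data), "fallback"


predictions, source = predict_and_report_source(
    FailingModel(), LastValueModel(9500.0), ["2024-01", "2024-02"]
)
print(source, predictions)  # fallback [9500.0, 9500.0]
```

Returning the source alongside the predictions makes it possible to monitor how often the fallback is used.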

### Common Pitfalls

1. **Seasonal period mismatch**
   - Weekly data: use seasonal_period=52 (not 12)
   - Daily data: use seasonal_period=7 or 365

2. **Window size too small/large**
   - Too small: noisy predictions
   - Too large: slow to react to changes
   - Rule of thumb: 3-12 observations

3. **Ignoring baseline performance**
   - If the complex model is only 2% better → prefer the baseline (simpler)
   - If the complex model is 50% better → the added complexity is clearly worth it
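
The window-size trade-off in pitfall 2 shows up clearly on a toy series with a recent level shift. A small sketch, where `window_mean` is a hypothetical helper and the data are invented for illustration:

```python
def window_mean(values, window_size):
    """Mean of the last `window_size` values (the flat window-average forecast)."""
    if not 1 <= window_size <= len(values):
        raise ValueError("window_size must be between 1 and len(values)")
    recent = values[-window_size:]
    return sum(recent) / window_size


# Toy series whose level shifted from 10 to 20 two periods ago
history = [10.0, 10.0, 10.0, 10.0, 20.0, 20.0]
print(window_mean(history, 2))            # 20.0
print(round(window_mean(history, 6), 2))  # 13.33
```

The short window has already adopted the new level, while the long window is still dominated by the old one. On noisy data without a shift, the ordering of merits reverses.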

## Summary

This notebook demonstrated:

✅ All `null_model()` strategies (mean, median, last)  
✅ All `naive_reg()` strategies (naive, seasonal_naive, drift, window)  
✅ Baseline comparison framework with WorkflowSet  
✅ Baselines vs advanced models (linear_reg, prophet_reg)  
✅ When to use each baseline strategy  
✅ Best practices for production deployment  

**Key Insight**: Baselines establish the performance floor. Any advanced model should beat the best baseline by a meaningful margin (10-20%+), otherwise the added complexity isn't justified.

**Next Steps**:
- Example 33: Recursive multistep forecasting
- Example 34: Gradient boosting engines comparison