# Applied Time Series Analysis and Forecasting

## Complete Learning Course

This notebook consolidates all learning materials for the ATSAF project. It covers the full lifecycle of time series forecasting:

**Part I: Foundations**
1. Introduction & Course Overview
2. Time Series Objects & Contracts

**Part II: Data Pipeline**
3. Data Ingestion & Preparation
4. Data Quality & Preprocessing

**Part III: Analysis & Diagnostics**
5. Transformations & Stationarity
6. ACF, PACF & Residual Diagnostics

**Part IV: Modeling & Evaluation**
7. Backtesting & Cross-Validation
8. Experimentation & Model Training
9. Metrics & Evaluation
10. Probabilistic Forecasting

**Part V: Advanced Topics**
11. Exogenous Regressors
12. Hierarchical & Multi-Series
13. Special Data Types
14. Model Selection & Ensembling

**Part VI: Production**
15. Orchestration & Pipeline DAG
16. Monitoring, Drift Detection & Alerts

---

### Learning Paths

**Quick Path (~2 hours):** Sections 1, 2, 3, 8, 15

**Full Path (~9 hours):** All sections with walkthroughs

**Modification Path (~10+ hours):** Full path + exercises + source code reading

In [None]:
# ========================================
# SETUP: Run this cell first
# ========================================

from pathlib import Path
import sys
import os

# Ensure src is importable
root = Path.cwd()
if (root / 'src').exists():
    sys.path.insert(0, str(root))
elif (root.parent / 'src').exists():
    sys.path.insert(0, str(root.parent))
    root = root.parent

# Standard imports
import pandas as pd
import numpy as np
import json

# Load environment variables
from dotenv import load_dotenv
load_dotenv()

# Verify setup
api_key = os.getenv("EIA_API_KEY")
if api_key:
    print("EIA API key loaded successfully")
else:
    print("WARNING: EIA_API_KEY not found in .env file")

print(f"Working directory: {root}")
print(f"Python version: {sys.version}")

---

# Part I: Foundations

---

## Section 1: Introduction & Course Overview

### Learning Outcomes
- [ ] Understand the repository structure and purpose
- [ ] Know how to navigate between learning materials and source code
- [ ] Set up the environment for running examples

### Overview

This course teaches applied time series forecasting using real EIA (U.S. Energy Information Administration) electricity generation data. You'll learn:

1. **Data handling**: Fetching, validating, and preparing time series data
2. **Modeling**: Training and evaluating forecasting models with cross-validation
3. **Production**: Building pipelines, monitoring drift, and triggering alerts

### Repository Structure

```
atsaf/
├── src/
│   ├── chapter0/    # Time series object helpers
│   ├── chapter1/    # Data ingestion & validation
│   ├── chapter2/    # Backtesting & evaluation
│   ├── chapter3/    # Pipeline orchestration
│   └── chapter4/    # Monitoring & alerts
├── chapters/        # Jupyter notebooks
├── docs/            # Learning documentation
├── data/            # Raw and processed data
└── artifacts/       # Model outputs
```

### Design Principles

1. **Verifiability**: Every example is runnable, not pseudo-code
2. **Fail-Loud**: Code raises clear errors when assumptions are violated
3. **Idempotency**: Tasks can be re-run safely with same results
4. **Metrics Over Promises**: Success is measured, not assumed

## Section 2: Time Series Objects & Contracts

### Learning Outcomes
- [ ] Explain the Python equivalents of R ts / tsibble / timetk
- [ ] Create a forecasting-ready DataFrame with columns unique_id, ds, y
- [ ] Normalize timestamps to UTC and reason about DST edge cases
- [ ] Validate time-series integrity before modeling

### Concepts

| R Concept | Python Equivalent | Description |
|-----------|------------------|-------------|
| `ts` | `pd.Series` with `DatetimeIndex` | Single time series |
| `tsibble` | DataFrame with `unique_id, ds, y` | Tidy time series table |
| `timetk` | pandas `.dt`, `shift`, `rolling` | Time-based feature helpers |

### The Data Contract

StatsForecast and our pipeline expect data in this format:

| Column | Type | Description |
|--------|------|-------------|
| `unique_id` | string | Series identifier (e.g., "NG_US48") |
| `ds` | datetime | Timestamp (UTC, timezone-naive) |
| `y` | float | Numeric value to forecast |

### Invariants (Must Always Hold)
- `ds` is normalized to UTC (timezone-naive in the pipeline)
- No duplicate `(unique_id, ds)` pairs
- Regular hourly frequency with no gaps
- Data sorted by `unique_id, ds`
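All four invariants can be checked with plain pandas. The sketch below is a hypothetical helper (`check_contract` is not part of `src/`, and it is not how `validate_time_index` is implemented); it assumes an hourly frequency and returns a list of readable violations instead of raising, so you can see every problem at once.

```python
import pandas as pd


def check_contract(df: pd.DataFrame, freq: str = "h") -> list[str]:
    """Hypothetical helper: return contract violations (empty list = frame passes)."""
    missing_cols = {"unique_id", "ds", "y"} - set(df.columns)
    if missing_cols:
        return [f"missing columns: {sorted(missing_cols)}"]

    problems = []
    # ds must already be converted to UTC and stripped of its timezone
    if df["ds"].dt.tz is not None:
        problems.append("ds is timezone-aware; convert to UTC, then drop tz")

    # No duplicate (unique_id, ds) pairs
    n_dupes = int(df.duplicated(subset=["unique_id", "ds"]).sum())
    if n_dupes:
        problems.append(f"{n_dupes} duplicate (unique_id, ds) rows")

    # Rows sorted by unique_id, then ds (compare key values, not index labels)
    keys = df[["unique_id", "ds"]].reset_index(drop=True)
    if not keys.equals(keys.sort_values(["unique_id", "ds"]).reset_index(drop=True)):
        problems.append("rows are not sorted by unique_id, ds")

    # Regular frequency with no gaps, checked separately for each series
    for uid, ds in df.groupby("unique_id")["ds"]:
        expected = pd.date_range(ds.min(), ds.max(), freq=freq)
        n_missing = len(expected.difference(pd.DatetimeIndex(ds)))
        if n_missing:
            problems.append(f"{uid}: {n_missing} missing timestamps")
    return problems


# Usage: a clean 6-hour series passes; dropping one hour is reported as a gap
clean = pd.DataFrame({
    "unique_id": "NG_US48",
    "ds": pd.date_range("2024-01-01", periods=6, freq="h"),
    "y": [100.0, 102.0, 98.0, 101.0, 103.0, 99.0],
})
print(check_contract(clean))                   # []
print(check_contract(clean.drop(index=[2])))   # ['NG_US48: 1 missing timestamps']
```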

In [None]:
# Section 2 Walkthrough: Create a single-series "ts" object

import pandas as pd

# Create a UTC-aware time index
idx = pd.date_range("2024-01-01", periods=6, freq="h", tz="UTC")
y = pd.Series([100, 102, 98, 101, 103, 99], index=idx)

print("Type:", type(y))
print("Timezone:", y.index.tz)
print("\nSeries:")
print(y)

In [None]:
# Section 2 Walkthrough: Convert to "tsibble" style table

# Convert series to DataFrame with contract columns
df = y.reset_index()
df.columns = ["ds", "y"]

# Convert to timezone-naive UTC (required by StatsForecast)
df["ds"] = pd.to_datetime(df["ds"], utc=True).dt.tz_localize(None)

# Add series identifier
df["unique_id"] = "NG_US48"

# Reorder columns to match contract
df = df[["unique_id", "ds", "y"]]

print("Columns:", df.columns.tolist())
print("\nDataFrame:")
print(df)

In [None]:
# Section 2 Walkthrough: Validate the time-series contract

from src.chapter1.validate import validate_time_index, print_validation_report

report = validate_time_index(df)
print_validation_report(report)

In [None]:
# Section 2 Walkthrough: Add timetk-style features (leakage-free)

# Shift before rolling so each row only sees values strictly before its own timestamp
df_features = df.assign(
    hour=df["ds"].dt.hour,
    dayofweek=df["ds"].dt.dayofweek,
    y_lag1=df["y"].shift(1),
    y_roll24=df["y"].shift(1).rolling(24, min_periods=1).mean()
)
print(df_features)

### Checkpoint Questions

<details>
<summary>1. What is the Python equivalent of an R `ts` object?</summary>

A `pd.Series` with a `DatetimeIndex`.
</details>

<details>
<summary>2. Why does the pipeline require `unique_id, ds, y` even for a single series?</summary>

StatsForecast is multi-series-first; the contract stays consistent across one or many series.
</details>

<details>
<summary>3. What two DST problems does the integrity check catch?</summary>

Fall-back duplicates and spring-forward missing hours.
</details>

---

# Part II: Data Pipeline

---

## Section 3: Data Ingestion & Preparation

### Learning Outcomes
- [ ] Pull raw EIA electricity data via REST API with pagination
- [ ] Validate time-series data for duplicates, missing hours, DST edge cases
- [ ] Transform raw data into forecasting-ready format (unique_id, ds, y)
- [ ] Explain why UTC normalization and data sorting matter

### Concepts

- **API Pagination**: Retrieving a large result set as a sequence of fixed-size pages
- **Datetime Normalization**: Converting all timestamps to UTC
- **Time-series Integrity**: Detecting duplicate timestamps (e.g., DST fall-back repeats) and missing hours (e.g., DST spring-forward gaps)
- **Monotonicity**: Ensuring timestamps are sorted chronologically
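Pagination boils down to a loop that requests fixed-size pages until a short (or empty) page comes back. The sketch below is not the code in `src/chapter1/ingest.py`: it takes a caller-supplied `fetch_page(offset, length)` function (hypothetical) so it runs without network access. It assumes the API pages by row offset and page length, as the EIA v2 API does through its `offset`/`length` parameters; the real per-request row cap should be taken from the current EIA documentation.

```python
from typing import Callable


def fetch_all(fetch_page: Callable[[int, int], list[dict]], page_size: int = 5000) -> list[dict]:
    """Collect rows from a paginated source until a short page signals the end."""
    rows: list[dict] = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)  # hypothetical callable wrapping the HTTP request
        rows.extend(page)
        if len(page) < page_size:  # last (possibly empty) page
            break
        offset += page_size
    return rows


# Usage with a fake in-memory "API" holding 12,345 rows
fake_table = [{"period": i, "value": float(i)} for i in range(12_345)]


def fake_fetch(offset: int, length: int) -> list[dict]:
    return fake_table[offset:offset + length]


rows = fetch_all(fake_fetch, page_size=5000)
print(len(rows))  # 12345
```

An alternative stopping rule uses the total row count reported in the response metadata, if the API provides one; the short-page rule above needs no extra fields.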

### Architecture

**Inputs:**
- EIA API credentials (`EIA_API_KEY` in `.env`)
- Date range (start_date, end_date)
- Series identifier (respondent, fueltype)

**Outputs:**
- `raw.parquet`: Unmodified API response
- `clean.parquet`: Normalized data with `unique_id, ds, y`
- `metadata.json`: Data snapshot (row count, integrity report)

### Files Touched
- `src/chapter1/eia_data_simple.py` - Main orchestrator
- `src/chapter1/ingest.py` - Paginated API calls
- `src/chapter1/prepare.py` - Datetime parsing
- `src/chapter1/validate.py` - Integrity checks

In [None]:
# Section 3 Walkthrough: Initialize the EIA data fetcher

from src.chapter1.eia_data_simple import EIADataFetcher
import os

api_key = os.getenv("EIA_API_KEY")
if not api_key:
    print("ERROR: Set EIA_API_KEY in .env file")
else:
    fetcher = EIADataFetcher(api_key)
    print("Fetcher initialized successfully")

In [None]:
# Section 3 Walkthrough: Pull and prepare data
# Note: This requires a valid EIA_API_KEY

if api_key:
    # Pull raw data (1 month for demo)
    df_raw = fetcher.pull_data(
        start_date="2023-06-01",
        end_date="2023-06-30",
        respondent="US48",
        fueltype="NG"
    )
    print(f"Raw rows: {len(df_raw)}, Columns: {df_raw.columns.tolist()}")
    
    # Prepare (normalize) data
    df_prepared = fetcher.prepare_data(df_raw)
    print(f"\nPrepared data types:\n{df_prepared.dtypes}")
else:
    print("Skipped: API key not available")

In [None]:
# Section 3 Walkthrough: Validate and format for forecasting

if api_key:
    # Validate integrity
    is_valid = fetcher.validate_data(df_prepared)
    print(f"Basic validation: {is_valid}")
    
    integrity = fetcher.validate_time_series_integrity(df_prepared, unique_id="respondent")
    print(f"Integrity status: {integrity['status']}")
    
    # Format for forecasting
    df_forecast = fetcher.prepare_for_forecasting(df_prepared, unique_id="respondent")
    print(f"\nForecast-ready columns: {df_forecast.columns.tolist()}")
    print(df_forecast.head())
else:
    print("Skipped: API key not available")

### Checkpoint Questions

<details>
<summary>1. Why is UTC normalization required?</summary>

Backtesting assumes monotonic, non-overlapping timestamps. Local time zones that observe DST repeat an hour in the fall and skip an hour in the spring. UTC has no such transitions, so it eliminates the ambiguity.
</details>

<details>
<summary>2. What are the three checks in `validate_time_series_integrity()`?</summary>

Duplicates (the same timestamp twice), missing hours (gaps in the hourly sequence), and DST repeated hours (fall-back transitions, surfaced by the duplicate check).
</details>

## Section 4: Data Quality & Preprocessing

### Learning Outcomes
- [ ] Detect missing timestamps, duplicates, and ordering issues
- [ ] Standardize time zones and maintain clean unique_id/ds/y contract
- [ ] Repair gaps without leaking future information

### Concepts

- **Missing timestamps**: Gaps in the expected frequency
- **Duplicates**: Multiple rows with the same (unique_id, ds)
- **Integrity gate**: A hard check that stops the pipeline when data is invalid
- **Resampling**: Creating a regular time index and aligning data to it
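The repair walkthrough in this section handles a single series. With several series, the reindexing has to happen per `unique_id`; otherwise one series' time range gets imposed on another. A minimal sketch, assuming hourly data and leaving gaps as `NaN` for a later, explicit fill step; `regularize` is a hypothetical helper, not a function in `src/`.

```python
import pandas as pd


def regularize(df: pd.DataFrame, freq: str = "h") -> pd.DataFrame:
    """Hypothetical helper: dedupe, sort, and reindex each series onto its own regular grid."""
    df = df.drop_duplicates(subset=["unique_id", "ds"], keep="first").sort_values(["unique_id", "ds"])
    pieces = []
    for uid, group in df.groupby("unique_id", sort=True):
        grid = pd.date_range(group["ds"].min(), group["ds"].max(), freq=freq)
        aligned = group.set_index("ds")[["y"]].reindex(grid).rename_axis("ds").reset_index()
        aligned.insert(0, "unique_id", uid)  # restore the id on newly inserted rows
        pieces.append(aligned)
    return pd.concat(pieces, ignore_index=True)  # gaps stay NaN for an explicit fill step


# Usage: series "a" is missing 02:00, series "b" has a duplicated 01:00
raw = pd.DataFrame({
    "unique_id": ["a", "a", "a", "b", "b", "b"],
    "ds": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 03:00",
                          "2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 01:00"]),
    "y": [1.0, 2.0, 4.0, 10.0, 11.0, 11.5],
})
print(regularize(raw))
```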

### Failure Modes
- DST creates duplicates or missing hours
- API returns unsorted or partial data
- Local time zone slips into modeling steps
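The DST failure mode is easy to reproduce: take consecutive UTC hours, convert them to a local zone that observes DST, and drop the timezone. The sketch uses `America/Chicago` and the 2024 U.S. transitions (spring forward on 10 March, fall back on 3 November) purely as an example.

```python
import pandas as pd

# Four consecutive UTC hours spanning the 2024-11-03 fall-back in Central Time
utc_fall = pd.date_range("2024-11-03 05:00", periods=4, freq="h", tz="UTC")
local_fall = utc_fall.tz_convert("America/Chicago").tz_localize(None)
print(local_fall.duplicated().any())   # True: 01:00 local occurs twice
print(utc_fall.duplicated().any())     # False: UTC has no repeated hours

# Three consecutive UTC hours spanning the 2024-03-10 spring-forward
utc_spring = pd.date_range("2024-03-10 07:00", periods=3, freq="h", tz="UTC")
local_spring = utc_spring.tz_convert("America/Chicago").tz_localize(None)
print(local_spring.to_series().diff().max())  # 0 days 02:00:00: 02:00 local never exists
```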

In [None]:
# Section 4 Walkthrough: Detect and handle data quality issues

import pandas as pd
import numpy as np
from src.chapter1.validate import validate_time_index

# Create a series with issues
ds = pd.date_range('2024-01-01', periods=72, freq='h')
df = pd.DataFrame({'unique_id': 'series_1', 'ds': ds, 'y': np.arange(len(ds))})

# Introduce a missing hour and a duplicate
df_broken = df.drop(index=[10]).reset_index(drop=True)
df_broken = pd.concat([df_broken, df_broken.iloc[[20]]], ignore_index=True)

# Validate
result = validate_time_index(df_broken)
print("Validation result:")
print(result)

In [None]:
# Section 4: Repair the issues

# Remove duplicates
df_fixed = df_broken.drop_duplicates(subset=["unique_id", "ds"]).sort_values("ds")

# Align to expected frequency
full_index = pd.date_range(df_fixed["ds"].min(), df_fixed["ds"].max(), freq="h")
df_fixed = df_fixed.set_index("ds").reindex(full_index).rename_axis("ds").reset_index()
df_fixed["unique_id"] = df_fixed["unique_id"].ffill()

print(f"Missing values after alignment:\n{df_fixed.isna().sum()}")

# Forward fill short gaps (1 hour max)
df_fixed["y"] = df_fixed["y"].ffill(limit=1)
print(f"\nMissing values after forward fill:\n{df_fixed.isna().sum()}")

### Checkpoint Questions

<details>
<summary>1. Why must ds be timezone-naive UTC before modeling?</summary>

StatsForecast and backtesting assume a clean, monotonic UTC index.
</details>

<details>
<summary>2. What is the difference between missing hours and duplicates?</summary>

Missing hours are gaps in the expected frequency; duplicates are repeated timestamps.
</details>

---

# Part III: Analysis & Diagnostics

---

## Section 5: Transformations & Stationarity

### Learning Outcomes
- [ ] Apply log or Box-Cox transforms safely
- [ ] Difference a series to remove trend
- [ ] Run a stationarity test when needed

### Concepts

- **Transformations**: Log or Box-Cox to stabilize variance
- **Differencing**: Subtract lagged values to remove trend
- **Stationarity**: Stable mean and variance over time
- **Unit root tests**: ADF (null hypothesis: unit root) or KPSS (null hypothesis: stationarity) as a quick check
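Box-Cox generalizes the log transform with a parameter λ: y^(λ) = (y^λ − 1)/λ for λ ≠ 0 and log(y) for λ = 0, defined only for y > 0. The sketch below implements the forward and inverse transforms with NumPy for a fixed λ. Choosing λ (for example by maximum likelihood, which `scipy.stats.boxcox` does when `lmbda` is omitted) is left out here.

```python
import numpy as np


def boxcox(y: np.ndarray, lam: float) -> np.ndarray:
    """Box-Cox transform for strictly positive y."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires y > 0; add an offset first")
    return np.log(y) if lam == 0 else (y**lam - 1.0) / lam


def inv_boxcox(z: np.ndarray, lam: float) -> np.ndarray:
    """Invert boxcox() to return forecasts to the original scale."""
    z = np.asarray(z, dtype=float)
    return np.exp(z) if lam == 0 else (lam * z + 1.0) ** (1.0 / lam)


y = np.array([5.0, 12.0, 30.0, 75.0])
z = boxcox(y, lam=0.5)                          # square-root-like compression of large values
print(np.allclose(inv_boxcox(z, lam=0.5), y))   # True
```

Back-transforming a point forecast this way gives (approximately) the median on the original scale rather than the mean.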

In [None]:
# Section 5 Walkthrough: Transformations and stationarity testing

import pandas as pd
import numpy as np

# Create a trending series
ds = pd.date_range('2024-01-01', periods=200, freq='h')
trend = 5 + 0.05 * np.arange(len(ds))
noise = np.random.normal(scale=0.5, size=len(ds))
y = trend + noise
series = pd.Series(y, index=ds)

# Apply transformations
log_series = np.log1p(series)
diff_series = series.diff().dropna()

print('Original mean:', round(series.mean(), 2))
print('Diff mean:', round(diff_series.mean(), 4))

# Run ADF test (if statsmodels available)
try:
    from statsmodels.tsa.stattools import adfuller
    p_value = adfuller(series.values)[1]
    p_value_diff = adfuller(diff_series.values)[1]
    print(f'\nADF p-value (original): {p_value:.4f}')
    print(f'ADF p-value (differenced): {p_value_diff:.4f}')
    print('\nInterpretation: ADF p < 0.05 rejects the unit-root null, suggesting stationarity')
except ImportError as exc:
    print(f'statsmodels not available: {exc}')

### Checkpoint Questions

<details>
<summary>1. When should you consider differencing a series?</summary>

When a trend or unit root keeps the mean from being stable, e.g. an ADF test fails to reject the unit-root null or a KPSS test rejects stationarity.
</details>

<details>
<summary>2. Why do we avoid log(0) without an offset?</summary>

log(0) is undefined, so we use log1p or an offset.
</details>

## Section 6: ACF, PACF & Residual Diagnostics

### Learning Outcomes
- [ ] Compute ACF/PACF for model clues
- [ ] Run a basic residual diagnostic
- [ ] Interpret lag structure

### Concepts

- **ACF (Autocorrelation Function)**: Correlation between values at different lags
- **PACF (Partial ACF)**: Correlation after removing effects of intermediate lags
- **Residual diagnostics**: Checking if model residuals are white noise
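For residuals to be white noise, their autocorrelations at all lags should be near zero. The Ljung-Box statistic Q = n(n+2) Σₖ ρ̂ₖ² / (n − k), summed over lags k = 1..h, pools the first h autocorrelations; under the white-noise null it is approximately χ² with h degrees of freedom (fewer when the residuals come from a fitted ARMA model). The NumPy-only sketch below compares Q with the 5% critical value for h = 10 (18.307) instead of computing a p-value; `statsmodels.stats.diagnostic.acorr_ljungbox` does the full test.

```python
import numpy as np


def sample_acf(x: np.ndarray, nlags: int) -> np.ndarray:
    """Sample autocorrelation for lags 0..nlags (mean-centered, full-sample denominator)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(x * x)
    return np.array([np.sum(x[k:] * x[:len(x) - k]) / denom for k in range(nlags + 1)])


def ljung_box_q(x: np.ndarray, h: int) -> float:
    """Ljung-Box Q statistic over lags 1..h."""
    n = len(x)
    rho = sample_acf(x, h)[1:]
    lags = np.arange(1, h + 1)
    return float(n * (n + 2) * np.sum(rho**2 / (n - lags)))


CRIT_5PCT_H10 = 18.307  # chi-square 0.95 quantile with 10 degrees of freedom

rng = np.random.default_rng(0)
white = rng.normal(size=500)               # behaves like well-specified residuals
walk = np.cumsum(rng.normal(size=500))     # strongly autocorrelated "residuals"

for name, x in [("white noise", white), ("random walk", walk)]:
    q = ljung_box_q(x, h=10)
    verdict = "reject white noise" if q > CRIT_5PCT_H10 else "consistent with white noise"
    print(f"{name}: Q={q:.1f} -> {verdict}")
```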

In [None]:
# Section 6 Walkthrough: ACF and PACF analysis

import pandas as pd
import numpy as np

# Create a seasonal series
ds = pd.date_range('2024-01-01', periods=200, freq='h')
y = np.sin(np.arange(len(ds)) / 6.0) + np.random.normal(scale=0.3, size=len(ds))
series = pd.Series(y, index=ds)

# Compute ACF/PACF (if statsmodels available)
try:
    from statsmodels.tsa.stattools import acf, pacf
    acf_vals = acf(series.values, nlags=10)
    pacf_vals = pacf(series.values, nlags=10)
    
    print('ACF values (lags 0-10):')
    print([round(v, 3) for v in acf_vals])
    print('\nPACF values (lags 0-10):')
    print([round(v, 3) for v in pacf_vals])
except ImportError as exc:
    print(f'statsmodels not available: {exc}')

---

# Part IV: Modeling & Evaluation

---

## Section 7: Backtesting & Cross-Validation

### Learning Outcomes
- [ ] Build rolling-origin splits without leakage
- [ ] Compute RMSE, MAE, and MASE on holdout windows
- [ ] Compare models fairly using identical splits

### Concepts

- **Rolling-origin CV**: Move the train/test cutoff forward through time
- **Horizon**: Number of steps to predict in each window
- **Leakage**: Using information that would not be available at the forecast origin (e.g., values after the cutoff) to train or predict (always a bug)

### Invariants
- Train end < test start (no overlap)
- Same splits for all models (fair comparison)
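The split logic itself is small. A minimal sketch with NumPy index arrays, assuming a single series already sorted by `ds`; the `expanding` flag switches between an expanding training window and a fixed-length (sliding) one. This illustrates the idea and is not the implementation of `RollingWindowBacktest`.

```python
import numpy as np


def rolling_origin_splits(n_obs, min_train_size, test_size, step_size, expanding=True):
    """Yield (train_idx, test_idx) with train_idx.max() < test_idx.min() for every split."""
    cutoff = min_train_size  # first index of the first test window
    while cutoff + test_size <= n_obs:
        start = 0 if expanding else cutoff - min_train_size
        yield np.arange(start, cutoff), np.arange(cutoff, cutoff + test_size)
        cutoff += step_size


# Same sizes as the walkthrough: 240 hourly points, 120-hour minimum train, 24-hour test
splits_demo = list(rolling_origin_splits(240, 120, 24, 24))
for train_idx, test_idx in splits_demo:
    print(len(train_idx), test_idx.min(), test_idx.max())
```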

In [None]:
# Section 7 Walkthrough: Rolling-origin backtesting

import pandas as pd
import numpy as np
from src.chapter2.backtesting import RollingWindowBacktest
from src.chapter2.evaluation import ForecastMetrics

# Create synthetic data
ds = pd.date_range('2024-01-01', periods=240, freq='h')
y = 10 + 0.1 * np.arange(len(ds)) + np.random.normal(scale=0.5, size=len(ds))
df = pd.DataFrame({'unique_id': 'series_1', 'ds': ds, 'y': y})

# Generate rolling splits
backtest = RollingWindowBacktest(min_train_size=120, test_size=24, step_size=24)
splits = backtest.generate_splits(df, unique_id='series_1')

print(f"Generated {len(splits)} splits")
print(f"First split info: {splits[0].info}")

In [None]:
# Section 7: Score a naive baseline

split = splits[0]
train = df.iloc[split.train_indices]
test = df.iloc[split.test_indices]

# Naive baseline: repeat last training value
yhat = np.repeat(train['y'].iloc[-1], len(test))

rmse = ForecastMetrics.rmse(test['y'].values, yhat)
mae = ForecastMetrics.mae(test['y'].values, yhat)

print(f'Naive Baseline RMSE: {rmse:.3f}')
print(f'Naive Baseline MAE: {mae:.3f}')

### Checkpoint Questions

<details>
<summary>1. Why is rolling-origin CV preferred over random splits?</summary>

It preserves time order and avoids leakage.
</details>

<details>
<summary>2. What does MASE tell you that RMSE does not?</summary>

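MASE divides the forecast's MAE by the in-sample MAE of a seasonal naive forecast, so it is scale-free and comparable across series. MASE < 1 means your errors are smaller than that baseline's typical error.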
</details>

## Section 8: Experimentation & Model Training

### Learning Outcomes
- [ ] Set up and run rolling-origin cross-validation
- [ ] Build a leaderboard comparing multiple models
- [ ] Explain why RMSE is primary and when MAPE fails
- [ ] Interpret prediction interval coverage

### Concepts

- **Expanding window**: The training set grows with each split; the test window keeps a fixed length and moves forward
- **Rolling window**: The training window keeps a fixed length and slides forward together with the test window
- **Coverage**: Percentage of actual values inside prediction intervals
- **MASE**: Mean Absolute Scaled Error (forecast MAE divided by the in-sample MAE of a seasonal naive forecast)

### Files Touched
- `src/chapter1/eia_data_simple.py` - cross_validate(), evaluate_forecast()
- `src/chapter2/backtesting.py` - RollingWindowBacktest, ExpandingWindowBacktest
- `src/chapter2/evaluation.py` - ForecastMetrics
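StatsForecast's `cross_validation` returns one row per (`unique_id`, `ds`, `cutoff`) with the actual `y` and one forecast column per model. Turning that frame into a leaderboard is a column-wise reduction. The sketch below builds a small simulated CV frame (the `ds` column is omitted for brevity, and the model column names mirror the config in the walkthrough but are assumptions about your output), then ranks models by RMSE.

```python
import numpy as np
import pandas as pd

# Simulated CV output: 3 cutoffs x 24 horizon steps, one forecast column per model
rng = np.random.default_rng(0)
n = 3 * 24
cv = pd.DataFrame({
    "unique_id": "NG_US48",
    "cutoff": pd.to_datetime(["2023-06-10", "2023-06-17", "2023-06-24"]).repeat(24),
    "y": 100 + rng.normal(scale=5, size=n),
})
cv["AutoARIMA"] = cv["y"] + rng.normal(scale=3, size=n)
cv["SeasonalNaive"] = cv["y"] + rng.normal(scale=5, size=n)

models = ["AutoARIMA", "SeasonalNaive"]
errors = cv[models].sub(cv["y"], axis=0)   # forecast minus actual, per model
leaderboard = pd.DataFrame({
    "rmse": np.sqrt((errors**2).mean()),
    "mae": errors.abs().mean(),
}).sort_values("rmse")
print(leaderboard)
```

Grouping `errors` by `cutoff` before aggregating gives per-window scores, which is what the drift threshold in Section 16 needs (mean and std of RMSE across windows).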

In [None]:
# Section 8 Walkthrough: Cross-validation with StatsForecast
# Note: This requires a valid EIA_API_KEY and may take a few minutes

if api_key:
    from src.chapter1.eia_data_simple import EIADataFetcher, ExperimentConfig
    
    # Define experiment configuration
    config = ExperimentConfig(
        name="baseline_experiment",
        horizon=24,           # Forecast 24 hours ahead
        n_windows=3,          # 3 train/test splits
        step_size=168,        # Move forward 1 week each time
        confidence_level=95,
        models=["AutoARIMA", "SeasonalNaive"],
        metrics=["rmse", "mape", "mase", "coverage"]
    )
    print(f"Experiment config: {config}")
else:
    print("Skipped: API key not available. See section 3 for data loading.")

## Section 9: Metrics & Evaluation

### Learning Outcomes
- [ ] Compute RMSE, MAPE, MASE, and coverage
- [ ] Understand when MAPE fails and what to use instead
- [ ] Interpret leaderboard rankings

### Metric Reference

| Metric | Formula | Best When |
|--------|---------|-----------|
| RMSE | √(mean(error²)) | Penalize large errors |
| MAE | mean(\|error\|) | Less sensitive to outliers than RMSE |
| MAPE | 100 · mean(\|error / actual\|) | y never at or near zero |
| MASE | MAE / in-sample seasonal naive MAE | Compare to baseline |
| Coverage | % of actuals inside [lo, hi] | Interval calibration |

### MAPE Pitfall

**MAPE = |error| / |actual|** → explodes when actual ≈ 0

Example: Solar generation at night = 0 → MAPE = ∞

**Fix**: Use RMSE/MAE/MASE as primary metrics
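For reference, the table's formulas fit in a few lines of NumPy. This is a sketch, not `ForecastMetrics`: it reports MAPE in percent and scales MASE by the in-sample MAE of a seasonal naive forecast on the *training* series (the usual Hyndman-Koehler definition), which is why it needs `y_train` and `season_length`. Function names end in `_np` so they do not overwrite the notebook's metric variables.

```python
import numpy as np


def rmse_np(y, yhat):
    y, yhat = np.asarray(y, dtype=float), np.asarray(yhat, dtype=float)
    return float(np.sqrt(np.mean((y - yhat) ** 2)))


def mae_np(y, yhat):
    y, yhat = np.asarray(y, dtype=float), np.asarray(yhat, dtype=float)
    return float(np.mean(np.abs(y - yhat)))


def mape_np(y, yhat):
    """MAPE in percent; becomes inf (or nan) as soon as any actual is zero."""
    y, yhat = np.asarray(y, dtype=float), np.asarray(yhat, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return float(100 * np.mean(np.abs((y - yhat) / y)))


def mase_np(y, yhat, y_train, season_length=1):
    """MAE scaled by the in-sample MAE of the seasonal naive forecast y_t = y_(t-m)."""
    y_train = np.asarray(y_train, dtype=float)
    scale = np.mean(np.abs(y_train[season_length:] - y_train[:-season_length]))
    return mae_np(y, yhat) / scale


y_train_demo = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 14.0])
y_test_demo = np.array([13.0, 15.0])
y_pred_demo = np.array([12.0, 14.0])
print(rmse_np(y_test_demo, y_pred_demo), mae_np(y_test_demo, y_pred_demo),
      mase_np(y_test_demo, y_pred_demo, y_train_demo))
```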

In [None]:
# Section 9 Walkthrough: Compute all metrics

import numpy as np
from src.chapter2.evaluation import ForecastMetrics

# Simulated forecast results
rng = np.random.default_rng(42)
y_true = 50 + rng.normal(scale=5, size=100)
y_pred = y_true + rng.normal(scale=3, size=100)

# Compute metrics
rmse = ForecastMetrics.rmse(y_true, y_pred)
mae = ForecastMetrics.mae(y_true, y_pred)
mape = ForecastMetrics.mape(y_true, y_pred)
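# The third argument presumably supplies the in-sample series for the naive scaling
# (normally the training data); y_true is reused here only to keep the demo short
mase = ForecastMetrics.mase(y_true, y_pred, y_true, season_length=24)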

print(f"RMSE: {rmse:.3f}")
print(f"MAE: {mae:.3f}")
print(f"MAPE: {mape:.3f}%")
print(f"MASE: {mase:.3f}")

## Section 10: Probabilistic Forecasting

### Learning Outcomes
- [ ] Compute prediction interval coverage
- [ ] Compare nominal and empirical coverage
- [ ] Spot overconfident or underconfident intervals

### Concepts

- **Prediction interval**: A range that should contain the true value with a stated probability (the nominal level, e.g. 95%)
- **Coverage**: Percent of actuals inside the interval
- **Calibration**: Agreement between nominal and empirical coverage

**Interpretation:**
- Coverage >> nominal: Intervals too wide (underconfident)
- Coverage << nominal: Intervals too tight (overconfident)
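One simple way to build intervals whose nominal level means something: estimate the spread of past forecast errors (for example, backtest residuals), assume they are roughly Gaussian, and set the bounds at ŷ ± z·σ̂ with z ≈ 1.96 for 95%. The sketch below does this on simulated data and then measures empirical coverage; the Gaussian assumption is exactly what the coverage check is meant to test on real residuals.

```python
import numpy as np

rng = np.random.default_rng(1)
# Calibration residuals (e.g. from backtest windows) and a fresh evaluation set
calib_resid = rng.normal(scale=2.0, size=500)
y_true = rng.normal(loc=100, scale=5, size=2000)
yhat = y_true + rng.normal(scale=2.0, size=2000)

sigma_hat = calib_resid.std(ddof=1)
z95 = 1.959964  # standard normal 0.975 quantile
lower, upper = yhat - z95 * sigma_hat, yhat + z95 * sigma_hat

coverage = 100 * np.mean((y_true >= lower) & (y_true <= upper))
print(f"sigma_hat={sigma_hat:.2f}, empirical coverage={coverage:.1f}% (nominal 95%)")
```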

In [None]:
# Section 10 Walkthrough: Prediction interval coverage

import numpy as np
from src.chapter2.evaluation import ForecastMetrics

# Simulate predictions with intervals
rng = np.random.default_rng(7)
y_true = rng.normal(loc=100, scale=5, size=200)
yhat = y_true + rng.normal(scale=2, size=200)

# Symmetric interval of ±4 = ±2σ of the simulated error (σ = 2), roughly a 95% interval
interval_width = 4.0
lower = yhat - interval_width
upper = yhat + interval_width

# Compute coverage
coverage = ForecastMetrics.coverage(y_true, lower, upper)

print(f"Empirical coverage: {coverage:.1f}%")
print(f"Nominal coverage: 95%")
print(f"Gap: {95 - coverage:.1f}%")

if coverage < 90:
    print("\nWarning: Intervals too tight (overconfident)")
elif coverage > 98:
    print("\nWarning: Intervals too wide (underconfident)")
else:
    print("\nIntervals are well-calibrated")

---

# Part V: Advanced Topics

---

## Section 11: Exogenous Regressors

### Learning Outcomes
- [ ] Add weather, holiday, or event features
- [ ] Align features to avoid leakage
- [ ] Measure feature impact

### Concepts

- **Exogenous variables**: External drivers that influence the forecast
- **Feature alignment**: Ensuring features are available at forecast time
- **Leakage prevention**: Only use values known at forecast time for future predictions (e.g., calendar flags, or weather *forecasts* rather than observed weather)
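Alignment is easiest to see with a lag. Suppose the model forecasts 24 hours ahead and only *observed* temperature is available: at forecast origin t, the newest temperature usable for target hour t+24 is the value at hour t, so the feature must be shifted by the horizon. Calendar flags such as holidays are known in advance and need no shift. A pandas sketch; the column names and the holiday date are illustrative assumptions.

```python
import numpy as np
import pandas as pd

horizon = 24
ds_x = pd.date_range("2024-12-23", periods=72, freq="h")
df_x = pd.DataFrame({"ds": ds_x, "temp_obs": 20 + 5 * np.sin(2 * np.pi * np.arange(72) / 24)})

# Observed weather: shift by the horizon so every row uses a value known at the forecast origin
df_x["temp_lag24"] = df_x["temp_obs"].shift(horizon)

# Calendar features are known in advance, so no shift is needed
holidays = pd.to_datetime(["2024-12-25"])  # illustrative holiday list
df_x["is_holiday"] = df_x["ds"].dt.normalize().isin(holidays).astype(int)

print(df_x.iloc[[0, 24, 48]])
```

A lag equal to the full horizon is safe for every step; shorter lags are valid only for the earlier steps of the horizon.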

In [None]:
# Section 11 Walkthrough: Exogenous features

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from src.chapter2.evaluation import ForecastMetrics

# Create data with exogenous drivers
ds = pd.date_range('2024-01-01', periods=240, freq='h')
temp = 20 + 5 * np.sin(2 * np.pi * np.arange(len(ds)) / 24.0)  # Daily temperature cycle
event = (np.arange(len(ds)) % 72 == 0).astype(int)  # Events every 3 days
y = 50 + 0.8 * temp + 10 * event + np.random.normal(scale=2.0, size=len(ds))

df = pd.DataFrame({'ds': ds, 'temp': temp, 'event': event, 'y': y})

# Train/test split
train = df.iloc[:-24]
test = df.iloc[-24:]

# Model with exogenous features (uses observed test-period temperature, i.e. assumes a perfect weather forecast)
model = LinearRegression()
model.fit(train[['temp', 'event']], train['y'])
pred = model.predict(test[['temp', 'event']])

rmse = ForecastMetrics.rmse(test['y'].values, pred)
print(f'RMSE with exogenous features: {rmse:.3f}')
print(f'Feature coefficients: temp={model.coef_[0]:.2f}, event={model.coef_[1]:.2f}')

## Section 12: Hierarchical & Multi-Series

### Learning Outcomes
- [ ] Structure multiple series with unique_id
- [ ] Compare local vs global summaries
- [ ] Understand why reconciliation matters for totals

### Concepts

- **Multi-series**: Multiple time series in one dataset
- **Local models**: Fit separately per series
- **Global models**: Fit once across all series
- **Reconciliation**: Adjusting forecasts so lower-level series add up to the forecasts of their aggregates
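Bottom-up reconciliation is the simplest way to make forecasts coherent: forecast each leaf series, then define every aggregate as the sum of its children. A pandas sketch, assuming a two-level hierarchy (`series_a` + `series_b` = `total`) and illustrative base forecasts; Nixtla's `hierarchicalforecast` package provides this and more elaborate methods that use all levels.

```python
import pandas as pd

ds_fc = pd.date_range("2024-01-03", periods=3, freq="h")
# Base forecasts made independently per level (values are illustrative)
base = pd.DataFrame({
    "unique_id": ["series_a"] * 3 + ["series_b"] * 3 + ["total"] * 3,
    "ds": list(ds_fc) * 3,
    "yhat": [10.0, 11.0, 9.5, 20.0, 21.0, 19.0, 33.0, 30.0, 31.0],
})

wide = base.pivot(index="ds", columns="unique_id", values="yhat")
print("Incoherence (total - a - b):", (wide["total"] - wide["series_a"] - wide["series_b"]).tolist())

# Bottom-up: keep the leaf forecasts, rebuild the aggregate from them
wide["total"] = wide["series_a"] + wide["series_b"]
reconciled = (
    wide.rename_axis(columns=None)
    .reset_index()
    .melt(id_vars="ds", var_name="unique_id", value_name="yhat")
)
print(reconciled.sort_values(["unique_id", "ds"]).to_string(index=False))
```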

In [None]:
# Section 12 Walkthrough: Multi-series handling

import pandas as pd
import numpy as np

# Create multi-series data
ds = pd.date_range('2024-01-01', periods=48, freq='h')
df = pd.DataFrame({
    'unique_id': ['series_a'] * len(ds) + ['series_b'] * len(ds),
    'ds': list(ds) + list(ds),
    'y': np.concatenate([
        10 + np.random.normal(scale=1.0, size=len(ds)),
        20 + np.random.normal(scale=1.5, size=len(ds)),
    ])
})

# Compare local vs global summaries
local_means = df.groupby('unique_id')['y'].mean()
global_mean = df['y'].mean()

print('Local means (per series):')
print(local_means)
print(f'\nGlobal mean (all series): {global_mean:.2f}')

## Section 13: Special Data Types

### Learning Outcomes
- [ ] Handle zero-heavy or count series
- [ ] Know which metrics fail on zeros

### Special Cases

- **Intermittent demand**: Many zeros with occasional spikes
- **Count data**: Non-negative integers (e.g., units sold)
- **Zero-heavy series**: Solar at night, wind on calm days

### Metric Behavior with Zeros

| Metric | With Zeros |
|--------|------------|
| RMSE | Works fine |
| MAE | Works fine |
| MAPE | Undefined (divide by zero) |
| MASE | Works if the in-sample naive MAE is nonzero (fails on a constant or all-zero history) |
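The table is easy to confirm numerically. The sketch below uses NumPy only (not `ForecastMetrics`, whose zero handling may differ): a zero actual paired with a nonzero forecast makes MAPE infinite, while MAE and a MASE scaled by a non-constant training history stay finite.

```python
import numpy as np

y_train = np.array([0, 3, 0, 0, 5, 0, 2, 0], dtype=float)  # intermittent history
y_true = np.array([0, 4, 0, 1], dtype=float)
y_pred = np.array([0.5, 3.0, 0.2, 1.0])

abs_err = np.abs(y_true - y_pred)
with np.errstate(divide="ignore", invalid="ignore"):
    mape = 100 * np.mean(abs_err / np.abs(y_true))   # inf: division by zero actuals

mae = abs_err.mean()
naive_scale = np.mean(np.abs(np.diff(y_train)))       # one-step naive MAE on training data
mase = mae / naive_scale

print(f"MAPE={mape}, MAE={mae:.3f}, MASE={mase:.3f}")
```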

In [None]:
# Section 13 Walkthrough: Handling zero-heavy data

import numpy as np
from src.chapter2.evaluation import ForecastMetrics

# Create intermittent demand series
rng = np.random.default_rng(42)
y_true = rng.poisson(lam=1.0, size=100)
y_true[rng.choice(len(y_true), size=40, replace=False)] = 0  # Add more zeros

yhat = np.maximum(0, y_true + rng.normal(scale=0.5, size=len(y_true)))

# Compute metrics
mae = ForecastMetrics.mae(y_true, yhat)
mape = ForecastMetrics.mape(y_true, yhat)
mase = ForecastMetrics.mase(y_true, yhat, y_true, season_length=1)

print(f'MAE: {mae:.3f}')
print(f'MAPE: {mape:.3f}% (may be unreliable with zeros)')
print(f'MASE: {mase:.3f}')
print(f'\nZero count: {(y_true == 0).sum()} / {len(y_true)}')

## Section 14: Model Selection & Ensembling

### Learning Outcomes
- [ ] Pick a champion from a leaderboard
- [ ] Build a simple ensemble baseline
- [ ] Understand why a naive baseline should never be dropped

### Concepts

- **Champion model**: Best performer on primary metric
- **Ensemble**: Combine multiple model predictions
- **Naive baseline**: Always include to sanity-check complex models
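Champion selection can be made fail-safe by keeping the naive baseline in the leaderboard and refusing to promote a model that does not beat it. A hypothetical helper (`select_champion` is not in `src/`), assuming a leaderboard indexed by model name with an `rmse` column and a baseline named `SeasonalNaive`:

```python
import pandas as pd


def select_champion(leaderboard: pd.DataFrame, baseline: str = "SeasonalNaive",
                    metric: str = "rmse") -> str:
    """Hypothetical helper: best model by `metric`, or the baseline if nothing beats it."""
    if baseline not in leaderboard.index:
        raise ValueError(f"baseline {baseline!r} missing from leaderboard")
    best = leaderboard[metric].idxmin()
    if leaderboard.loc[best, metric] < leaderboard.loc[baseline, metric]:
        return best
    return baseline


board = pd.DataFrame({"rmse": [4.1, 5.3, 3.8]}, index=["AutoARIMA", "SeasonalNaive", "Ensemble"])
print(select_champion(board))  # Ensemble
```

Raising when the baseline is missing follows the fail-loud principle: a leaderboard without a baseline cannot tell you whether the champion is any good.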

In [None]:
# Section 14 Walkthrough: Simple ensemble

import numpy as np
from src.chapter2.evaluation import ForecastMetrics

# Simulate two model predictions
rng = np.random.default_rng(123)
y_true = rng.normal(loc=50, scale=3, size=60)
model_a = y_true + rng.normal(scale=2, size=60)
model_b = y_true + rng.normal(scale=2.5, size=60)

# Simple mean ensemble
ensemble = (model_a + model_b) / 2

# Compare performance
rmse_a = ForecastMetrics.rmse(y_true, model_a)
rmse_b = ForecastMetrics.rmse(y_true, model_b)
rmse_ens = ForecastMetrics.rmse(y_true, ensemble)

print(f'Model A RMSE: {rmse_a:.3f}')
print(f'Model B RMSE: {rmse_b:.3f}')
print(f'Ensemble RMSE: {rmse_ens:.3f}')

if rmse_ens < min(rmse_a, rmse_b):
    print('\nEnsemble outperforms individual models!')
else:
    print('\nBest single model is the winner')

---

# Part VI: Production

---

## Section 15: Orchestration & Pipeline DAG

### Learning Outcomes
- [ ] Run an end-to-end forecasting pipeline from CLI
- [ ] Understand task decomposition and why idempotency matters
- [ ] Deploy the pipeline as an Airflow DAG

### Concepts

- **Task**: Atomic unit of work (pull data, validate, train, forecast)
- **DAG**: Directed Acyclic Graph of task dependencies
- **Idempotency**: Re-running produces same outputs
- **Atomic writes**: All-or-nothing file operations
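Atomic writes are commonly implemented as "write to a temporary file in the same directory, then rename it over the target". `os.replace` performs that rename, and on POSIX systems with both paths on the same filesystem a reader sees either the old file or the new one, never a partial write. The sketch combines this with an idempotent skip-if-exists rule; `write_artifact` is a hypothetical helper, not the code in `src/chapter3/tasks.py`, and it writes JSON to stay dependency-free.

```python
import json
import os
import tempfile
from pathlib import Path


def write_artifact(path: Path, payload: dict, overwrite: bool = False) -> bool:
    """Hypothetical helper: atomically write JSON; return False if skipped because it exists."""
    path = Path(path)
    if path.exists() and not overwrite:
        return False  # idempotent: a completed task is not redone
    path.parent.mkdir(parents=True, exist_ok=True)
    # Temp file in the same directory so os.replace stays on one filesystem
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(payload, fh)
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)  # never leave a half-written temp file behind
        raise
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "artifacts" / "metadata.json"
    print(write_artifact(target, {"rows": 720}))                   # True
    print(write_artifact(target, {"rows": 999}))                   # False (skipped)
    print(write_artifact(target, {"rows": 999}, overwrite=True))   # True
    print(json.loads(target.read_text()))                          # {'rows': 999}
```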

### Pipeline Flow

```
ingest_eia() → prepare_clean() → validate_clean() → train_backtest_select() → register_champion() → forecast_publish()
```
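A DAG is a mapping from each task to its upstream dependencies, and a scheduler runs tasks in a topological order of that graph. Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+) computes such an order locally, which is a cheap way to check that a DAG definition has no cycles. The dependency dict below is written by hand from the flow above; it is not read from `dag_builder.py`.

```python
from graphlib import CycleError, TopologicalSorter

# task -> set of upstream tasks it depends on
dependencies = {
    "prepare_clean": {"ingest_eia"},
    "validate_clean": {"prepare_clean"},
    "train_backtest_select": {"validate_clean"},
    "register_champion": {"train_backtest_select"},
    "forecast_publish": {"register_champion"},
}

order = list(TopologicalSorter(dependencies).static_order())
print(" -> ".join(order))

# A cyclic definition fails loudly instead of silently mis-ordering tasks
try:
    list(TopologicalSorter({"a": {"b"}, "b": {"a"}}).static_order())
except CycleError as err:
    print("cycle detected:", err.args[1])
```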

### Files Touched
- `src/chapter3/tasks.py` - 6 pipeline tasks
- `src/chapter3/dag_builder.py` - Airflow DAG definition
- `src/chapter3/cli.py` - Command-line interface

In [None]:
# Section 15 Walkthrough: View the pipeline DAG

try:
    from src.chapter3.dag_builder import build_dag_dot
    dot_string = build_dag_dot()
    print("Pipeline DAG (DOT format):")
    print(dot_string)
    print("\nVisualize at: https://dreampuf.github.io/GraphvizOnline/")
except Exception as e:
    print(f"Could not build DAG: {e}")

### Running the Pipeline via CLI

```bash
# Run full pipeline
python -m src.chapter3.cli run \
  --start-date 2023-06-01 \
  --end-date 2023-09-30 \
  --horizon 24 \
  --output-dir artifacts/

# Re-run with overwrite
python -m src.chapter3.cli run \
  --start-date 2023-06-01 \
  --end-date 2023-09-30 \
  --horizon 24 \
  --overwrite
```

### Expected Outputs
- `data/raw.parquet` - Raw API response
- `data/clean.parquet` - Normalized data
- `artifacts/cv_results.parquet` - Cross-validation results
- `artifacts/leaderboard.parquet` - Model rankings
- `artifacts/predictions.parquet` - Final forecasts

### Checkpoint Questions

<details>
<summary>1. What is idempotency and why does it matter?</summary>

Idempotency means re-running with same inputs yields same outputs (no hidden state). It matters because tasks can be safely re-run on failure.
</details>

<details>
<summary>2. If validate_clean() is skipped, what could go wrong?</summary>

Bad data (duplicates, missing hours) enters training. Model learns on corrupt series → poor forecasts.
</details>

## Section 16: Monitoring, Drift Detection & Alerts

### Learning Outcomes
- [ ] Persist forecasts and actuals into a queryable database
- [ ] Compute rolling accuracy metrics and detect model drift
- [ ] Set alert thresholds based on backtest performance
- [ ] Run health checks (freshness, completeness, staleness)

### Concepts

- **Forecast persistence**: Store predictions for later scoring
- **Scoring**: Compare predictions vs actuals
- **Drift**: Sustained degradation of forecast accuracy over time, often caused by changes in the underlying data
- **Alert threshold**: Metric value that triggers action

### Monitoring Database Schema

```sql
pipeline_runs     -- Execution log
forecasts         -- Stored predictions
forecast_scores   -- Rolling metrics
alerts            -- Alert events
```

### Alert Types
- `STALE_DATA` - Data too old
- `MISSING_DATA` - Gaps in data
- `STALE_FORECAST` - Forecasts not scored
- `MODEL_DRIFT` - Performance degraded
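Putting the threshold to work: score each forecast once its actuals arrive, compute RMSE over a rolling window of recently scored hours, and raise `MODEL_DRIFT` when it exceeds mean + k·std from backtesting. A pandas sketch on simulated errors; the 168-hour window, the backtest statistics, and the injected error increase are illustrative assumptions, not values from the pipeline.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
ds_scored = pd.date_range("2024-07-01", periods=24 * 28, freq="h")
# Simulated forecast errors: stable for 3 weeks, then noticeably larger
errors = np.concatenate([rng.normal(scale=10, size=24 * 21),
                         rng.normal(scale=16, size=24 * 7)])
scores = pd.DataFrame({"ds": ds_scored, "error": errors})

# Threshold from backtest RMSE statistics (mean + k * std)
backtest_rmse_mean, backtest_rmse_std, k = 10.5, 1.2, 2.0
threshold = backtest_rmse_mean + k * backtest_rmse_std

# Rolling 7-day RMSE over scored hours (NaN until a full window is available)
scores["rmse_7d"] = np.sqrt((scores["error"] ** 2).rolling(168, min_periods=168).mean())
breaches = scores.loc[scores["rmse_7d"] > threshold, "ds"]
if breaches.empty:
    print("No drift detected")
else:
    print(f"MODEL_DRIFT first raised at {breaches.iloc[0]} (threshold {threshold:.2f})")
```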

In [None]:
# Section 16 Walkthrough: Initialize monitoring database

try:
    from src.chapter4.db import init_monitoring_db
    import sqlite3
    
    db_path = root / "monitoring_demo.db"
    init_monitoring_db(str(db_path))
    
    # Verify tables created
    conn = sqlite3.connect(str(db_path))
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
    tables = cursor.fetchall()
    conn.close()
    
    print(f"Monitoring database created at: {db_path}")
    print(f"Tables: {[t[0] for t in tables]}")
except Exception as e:
    print(f"Could not initialize monitoring: {e}")

In [None]:
# Section 16 Walkthrough: Drift threshold calculation

# Drift threshold = mean + k*std (from backtest)
# k = 2 is a good starting point (2 standard deviations)

backtest_rmse_mean = 10.5
backtest_rmse_std = 1.2
k = 2.0

threshold = backtest_rmse_mean + k * backtest_rmse_std

print(f"Backtest RMSE: {backtest_rmse_mean} ± {backtest_rmse_std}")
print(f"Drift threshold (k={k}): {threshold:.2f}")
print(f"\nAlert triggers when RMSE > {threshold:.2f}")

### Checkpoint Questions

<details>
<summary>1. Why can't we score a 24-hour forecast immediately after generating it?</summary>

The actual values aren't observed yet (they lie up to 24 hours in the future). The forecast can be scored once actuals for the full horizon have been ingested.
</details>

<details>
<summary>2. What does k control in the threshold formula?</summary>

k is the std multiplier. Higher k = wider band (fewer alerts). Lower k = tighter band (more alerts). k=2 is a common starting point.
</details>

---

# Appendices

---

## Appendix A: Quick Reference

### Common Imports

```python
# Core
import pandas as pd
import numpy as np
from pathlib import Path

# Data ingestion
from src.chapter1.eia_data_simple import EIADataFetcher
from src.chapter1.validate import validate_time_index

# Backtesting
from src.chapter2.backtesting import RollingWindowBacktest
from src.chapter2.evaluation import ForecastMetrics

# Pipeline
from src.chapter3.cli import run as pipeline_run

# Monitoring
from src.chapter4.db import init_monitoring_db
from src.chapter4.drift import detect_drift
```

### Data Contract

```python
# Required columns for forecasting
df.columns  # ['unique_id', 'ds', 'y']
df['ds']    # datetime64[ns], timezone-naive UTC
df['y']     # float64, the target variable
```

### CLI Commands

```bash
# Run pipeline
python -m src.chapter3.cli run --start-date 2023-06-01 --end-date 2023-09-30 --horizon 24

# Run tests
pytest tests/ -v
```

## Appendix B: All Checkpoint Answers

### Section 2: Time Series Objects
1. Python equivalent of R `ts`: `pd.Series` with `DatetimeIndex`
2. Why `unique_id, ds, y`: StatsForecast is multi-series-first
3. DST problems caught: Fall-back duplicates, spring-forward missing hours

### Section 3: Data Ingestion
1. UTC required: Eliminates DST ambiguity for backtesting
2. Integrity checks: Duplicates, missing hours, DST repeated hours

### Section 4: Data Quality
1. UTC before modeling: StatsForecast assumes clean monotonic UTC index
2. Missing vs duplicates: Gaps in frequency vs repeated timestamps

### Section 5: Transformations
1. When to difference: When trend dominates and residuals are non-stationary
2. Avoid log(0): Undefined, use log1p or offset

### Section 7: Backtesting
1. Rolling-origin preferred: Preserves time order, avoids leakage
2. MASE vs RMSE: MASE scales error relative to baseline

### Section 15: Orchestration
1. Idempotency: Re-run produces same outputs, safe on failure
2. Skip validate: Bad data enters training, poor forecasts

### Section 16: Monitoring
1. Can't score immediately: Actuals not observed for 24 hours
2. k in threshold: Std multiplier, k=2 is a common starting point

---

## Course Complete!

You've covered the full time series forecasting lifecycle:

1. **Foundations**: Time series objects and data contracts
2. **Data Pipeline**: Ingestion, validation, and quality
3. **Analysis**: Transformations and diagnostics
4. **Modeling**: Backtesting, training, and evaluation
5. **Advanced**: Exogenous features, multi-series, ensembling
6. **Production**: Pipeline orchestration and monitoring

### Next Steps

- [ ] Run the full pipeline on your own data
- [ ] Deploy to Airflow for scheduled runs
- [ ] Set up monitoring alerts for production
- [ ] Extend with additional models and features

### Resources

- Source code: `src/chapter[0-4]/`
- Documentation: `docs/chapter*.md`
- Learning guide: `docs/LEARNING_GUIDE.md`

---

*Last updated: January 2026*