# Evaluation and Backtesting

This notebook demonstrates how to evaluate forecasters using backtesting with time series cross-validation.

## What You'll Learn

- Creating forecast tasks
- Running backtests with different splitters
- Summarizing backtest results
- Comparing multiple models
- Understanding evaluation metrics

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from timesmith import (
    ForecastTask,
    SimpleMovingAverageForecaster,
    ExponentialMovingAverageForecaster,
    ExponentialSmoothingForecaster,
    backtest_forecaster,
    summarize_backtest,
    compare_models,
    ExpandingWindowSplit,
    SlidingWindowSplit,
)

np.random.seed(42)
print("Evaluation and backtesting tools loaded!")

## 1. Create Time Series Data

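Let's create a synthetic daily series with a linear trend, a weekly seasonal pattern, and Gaussian noise, so the structure the forecasters must capture is known in advance.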

In [None]:
# Create time series with trend and seasonality
dates = pd.date_range("2020-01-01", periods=200, freq="D")

# Trend component
trend = np.linspace(100, 150, len(dates))

# Seasonal component (weekly pattern)
seasonal = 10 * np.sin(2 * np.pi * np.arange(len(dates)) / 7)

# Noise
noise = np.random.normal(0, 5, len(dates))

# Combine
y = pd.Series(trend + seasonal + noise, index=dates, name="value")

# Plot
plt.figure(figsize=(12, 5))
plt.plot(y.index, y.values, linewidth=1.5, alpha=0.7)
plt.title("Time Series Data for Backtesting", fontsize=14, fontweight="bold")
plt.xlabel("Date")
plt.ylabel("Value")
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print(f"Data shape: {y.shape}")
print(f"Date range: {y.index[0]} to {y.index[-1]}")

# Create forecast task
task = ForecastTask(y=y, fh=10, frequency="D")
print("\nForecast task created:")
print(f"  Forecast horizon: {task.fh}")
print(f"  Frequency: {task.frequency}")

## 2. Run a Simple Backtest

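Backtesting evaluates a forecaster with time series cross-validation: the model is repeatedly fit on data up to a cutoff and scored on the observations that follow, so no fold ever trains on the future.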

In [None]:
# Create the forecaster (it is not fit here; fitting happens during the backtest)
forecaster = SimpleMovingAverageForecaster(window=7)

# Run backtest
result = backtest_forecaster(forecaster, task)

print("Backtest Results:")
print("=" * 60)
print(f"Number of folds: {len(result.results)}")
print("\nFirst few results:")
print(result.results[["fold_id", "cutoff", "mae", "rmse", "mape"]].head())
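
To make the mechanics concrete, here is a minimal sketch of what a backtest loop does, written in plain pandas and NumPy. It is not the actual implementation of `backtest_forecaster`: the function `manual_backtest`, its parameters (`initial_window`, `step`), and the expanding-window scheme are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def manual_backtest(y, window=7, fh=10, initial_window=50, step=10):
    """Expanding-window backtest of a moving-average forecast (illustrative only)."""
    rows = []
    cutoffs = range(initial_window, len(y) - fh + 1, step)
    for fold_id, cutoff in enumerate(cutoffs):
        train = y.iloc[:cutoff]               # everything before the cutoff
        test = y.iloc[cutoff:cutoff + fh]     # the next fh observations
        # Flat forecast: mean of the last `window` training values, repeated fh times
        y_pred = np.full(fh, train.iloc[-window:].mean())
        errors = test.to_numpy() - y_pred
        rows.append({
            "fold_id": fold_id,
            "cutoff": train.index[-1],        # timestamp of the last training point
            "mae": np.mean(np.abs(errors)),
            "rmse": np.sqrt(np.mean(errors ** 2)),
        })
    return pd.DataFrame(rows)


# Synthetic demo series (hypothetical data, separate from the notebook's `y`)
rng = np.random.default_rng(0)
demo_dates = pd.date_range("2020-01-01", periods=200, freq="D")
y_demo = pd.Series(np.linspace(100, 150, 200) + rng.normal(0, 5, 200), index=demo_dates)

folds = manual_backtest(y_demo)
print(folds.head())
```

Each row corresponds to one fold. The training slice always ends before the test slice begins, which is the property that separates backtesting from ordinary shuffled cross-validation.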

## 3. Summarize Backtest Results

Get aggregate metrics across all folds.

In [None]:
# Summarize backtest
summary = summarize_backtest(result)

print("Backtest Summary:")
print("=" * 60)
print("\nAggregate Metrics:")
for key, value in summary["aggregate_metrics"].items():
    print(f"  {key}: {value:.4f}")
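
For reference, the three metrics used in this notebook (MAE, RMSE, and MAPE) have standard definitions, sketched here with NumPy. The helper names are illustrative, and whether the library reports MAPE as a percentage or as a fraction is an assumption to check against its output.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors, in the units of y."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root mean squared error: like MAE, but penalizes large errors more heavily."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Undefined if y_true contains zeros."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))


# Small hand-checkable example (made-up numbers)
actual = [100, 110, 120]
predicted = [98, 113, 120]
print(f"MAE:  {mae(actual, predicted):.4f}")
print(f"RMSE: {rmse(actual, predicted):.4f}")
print(f"MAPE: {mape(actual, predicted):.4f}%")
```

RMSE is never smaller than MAE on the same errors, so a large gap between the two points to a few large misses rather than uniformly mediocre forecasts.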

## 4. Expanding Window Split

An expanding window trains on all data from the start of the series up to each cutoff, so the training set grows from fold to fold.

In [None]:
# Expanding window splitter
splitter = ExpandingWindowSplit(initial_window=50, step=10)

# Get splits
splits = list(splitter.split(y))

print("Expanding Window Splits:")
print(f"  Number of splits: {len(splits)}")
print("\nFirst 3 splits:")
for i, (train_idx, test_idx) in enumerate(splits[:3]):
    print(f"  Split {i}: train={len(train_idx)}, test={len(test_idx)}")
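
The expanding-window idea can be sketched as a small index generator. This is an assumption-laden illustration, not the internals of `ExpandingWindowSplit`: the function name and the explicit `fh` argument for the test size are hypothetical, and the library may size its test sets differently.

```python
import numpy as np


def expanding_window_indices(n_samples, initial_window, step, fh):
    """Yield (train_idx, test_idx) pairs where the training set always starts at 0."""
    cutoff = initial_window
    while cutoff + fh <= n_samples:
        train_idx = np.arange(0, cutoff)             # grows by `step` each fold
        test_idx = np.arange(cutoff, cutoff + fh)    # the fh points after the cutoff
        yield train_idx, test_idx
        cutoff += step


expanding_splits = list(expanding_window_indices(n_samples=200, initial_window=50, step=10, fh=10))
for i, (train_idx, test_idx) in enumerate(expanding_splits[:3]):
    print(f"Split {i}: train={len(train_idx)}, test={len(test_idx)}")
```

In this sketch the training size increases by `step` with every fold while the test size stays at `fh`.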

## 5. Sliding Window Split

A sliding window trains on a fixed-size window that moves forward with each fold, dropping the oldest observations as it advances.

In [None]:
# Sliding window splitter
sliding_splitter = SlidingWindowSplit(window=50, step=10)

# Get splits
sliding_splits = list(sliding_splitter.split(y))

print("Sliding Window Splits:")
print(f"  Number of splits: {len(sliding_splits)}")
print("\nFirst 3 splits:")
for i, (train_idx, test_idx) in enumerate(sliding_splits[:3]):
    print(f"  Split {i}: train={len(train_idx)}, test={len(test_idx)}")
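
A sliding window can be sketched the same way. As before, this is an illustration under stated assumptions rather than the internals of `SlidingWindowSplit`: the function name and the `fh` argument for the test size are hypothetical.

```python
import numpy as np


def sliding_window_indices(n_samples, window, step, fh):
    """Yield (train_idx, test_idx) pairs where the training set has a fixed length."""
    start = 0
    while start + window + fh <= n_samples:
        train_idx = np.arange(start, start + window)                  # always `window` points
        test_idx = np.arange(start + window, start + window + fh)     # the fh points after it
        yield train_idx, test_idx
        start += step


sliding_demo_splits = list(sliding_window_indices(n_samples=200, window=50, step=10, fh=10))
for i, (train_idx, test_idx) in enumerate(sliding_demo_splits[:3]):
    print(f"Split {i}: train={train_idx[0]}..{train_idx[-1]}, test={test_idx[0]}..{test_idx[-1]}")
```

Every training set here has exactly `window` observations, so the model never sees data older than `window` steps before the cutoff.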

## 6. Compare Multiple Models

Compare different forecasters using the same backtest setup.

In [None]:
# Compare multiple forecasters
forecasters = {
    "Simple MA (7)": SimpleMovingAverageForecaster(window=7),
    "Simple MA (12)": SimpleMovingAverageForecaster(window=12),
    "Exponential MA": ExponentialMovingAverageForecaster(alpha=0.3),
    "Exponential Smoothing": ExponentialSmoothingForecaster(),
}

# Run comparison
comparison = compare_models(forecasters, task, metrics=["mae", "rmse", "mape"])

print("Model Comparison Results:")
print("=" * 70)
print(comparison.to_string())
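
A fair comparison scores every model on identical folds. The sketch below shows that idea in plain NumPy and pandas; it does not reproduce how `compare_models` works internally, and the helpers `mean_of_last` and `compare_on_shared_folds` are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd


def mean_of_last(k):
    """Build a forecast function that repeats the mean of the last k training values."""
    def forecast(train, fh):
        return np.full(fh, train[-k:].mean())
    return forecast


def compare_on_shared_folds(y, models, fh=10, initial_window=50, step=10):
    """Score each model on the same expanding-window cutoffs and rank by MAE."""
    cutoffs = range(initial_window, len(y) - fh + 1, step)   # shared by all models
    rows = {}
    for name, forecast in models.items():
        abs_errors = []
        for cutoff in cutoffs:
            y_pred = forecast(y[:cutoff], fh)
            abs_errors.append(np.abs(y[cutoff:cutoff + fh] - y_pred))
        errs = np.concatenate(abs_errors)                    # errors pooled over all folds
        rows[name] = {"mae": errs.mean(), "rmse": np.sqrt((errs ** 2).mean())}
    return pd.DataFrame.from_dict(rows, orient="index").sort_values("mae")


# Synthetic demo data (hypothetical, separate from the notebook's `y`)
rng = np.random.default_rng(1)
y_values = np.linspace(100, 150, 200) + rng.normal(0, 5, 200)

table = compare_on_shared_folds(
    y_values,
    {
        "last value": mean_of_last(1),
        "mean of last 7": mean_of_last(7),
        "mean of last 28": mean_of_last(28),
    },
)
print(table)
```

Because the cutoffs are computed once and reused, differences in the table reflect the models rather than differences in the evaluation data.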

## Summary

You've learned:
- How to create forecast tasks for evaluation
- How to run backtests with time series cross-validation
- How to summarize backtest results with aggregate metrics
- How to use expanding and sliding window splits
- How to compare multiple models side-by-side

**Key Takeaways:**
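- Backtesting respects temporal order: each fold trains on the past and is scored on what follows, which shuffled cross-validation does not guarantee
- Expanding windows train on all data up to each cutoff, mirroring a setting where history accumulates over time
- Sliding windows train on a fixed-size window, so old observations are dropped; this can help when the underlying pattern changes over time
- Compare multiple models on the same backtest setup before choosing one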