# 1. Overview & Objectives  
This notebook implements the statistical forecasting models used in the project:
- Simple Exponential Smoothing (SES)
- Holt Linear Trend
- Holt-Winters Seasonal
- Autoregressive (AR)
- Moving Average (MA)
- ARIMA

Each model produces validation and test forecasts, and all results are evaluated with the shared `evaluate_and_save()` function.
The parameter configurations are saved to `statsmodels_results.csv`, so that `visualizations.ipynb` can load and plot the statistical models through a unified interface.

Outputs
- Forecasts for validation and test sets
- Evaluation metrics (MAE, RMSE, etc.)


# 2. Imports & Setup

In [17]:
# Importing the helper notebooks

## Enable imports from .ipynb files
import import_ipynb  
import sys
sys.path.append("code")

## Importing the helper notebooks as modules
from splitting import split_time_series
from metrics import evaluate_and_save, load_best_models

# Notebook specific imports
import numpy as np
import pandas as pd

from statsmodels.tsa.holtwinters import SimpleExpSmoothing, ExponentialSmoothing
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.tsa.arima.model import ARIMA

# 3. Load Data & Train/Val/Test Split
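The series is split chronologically into training, validation, and test sets with `split_time_series()` from the `splitting` helper notebook.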
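The internals of `split_time_series()` live in the helper notebook and are not shown here. As a rough illustration of what a date-based split does, the sketch below cuts a toy frame at inclusive end dates. The function name `chronological_split` and the toy data are hypothetical and are not the helper's actual implementation.

```python
import pandas as pd

def chronological_split(df, date_col, train_end, val_end, test_end):
    """Hypothetical stand-in for split_time_series(): cut a frame by date."""
    df = df.sort_values(date_col).reset_index(drop=True)
    dates = pd.to_datetime(df[date_col])
    train_end, val_end, test_end = map(pd.Timestamp, (train_end, val_end, test_end))
    return {
        # Each boundary date is included in the earlier split
        "train": df[dates <= train_end],
        "val": df[(dates > train_end) & (dates <= val_end)],
        "test": df[(dates > val_end) & (dates <= test_end)],
    }

# Toy daily series with 10 observations (illustrative data only)
toy = pd.DataFrame({
    "time": pd.date_range("2020-01-01", periods=10, freq="D"),
    "tavg": range(10),
})
toy_splits = chronological_split(toy, "time", "2020-01-05", "2020-01-08", "2020-01-10")
print({name: len(part) for name, part in toy_splits.items()})
```

Splitting by date rather than at random keeps every validation and test observation strictly later than the training data, which is what a forecasting evaluation needs.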

In [18]:
splits = split_time_series(
    csv_path="../data/processed/processed_weather_data_michigan.csv",
    date_col="time",
    y_col="tavg",
    train_end="2015-12-31",
    val_end="2020-12-31",
    test_end="2025-11-30",
    start_date="1980-01-01",
    end_date="2025-11-30"
)

train_df = splits["train"]
val_df   = splits["val"]
test_df  = splits["test"]

h_val  = len(val_df)
h_test = len(test_df)

# 4. Model Definition  
Each model is wrapped in a small training function, and a registry records:
- Model names
- Hyperparameters

In [19]:
def train_ses(train_df):
    model = SimpleExpSmoothing(train_df["tavg"])
    return model.fit(optimized=True)

def train_holt(train_df):
    model = ExponentialSmoothing(train_df["tavg"], trend="add", seasonal=None)
    return model.fit(optimized=True)

def train_holt_winters(train_df, season_length=365):
    model = ExponentialSmoothing(
        train_df["tavg"],
        trend="add",
        seasonal="add",
        seasonal_periods=season_length
    )
    return model.fit(optimized=True)

def train_ar(train_df, lags=7):
    model = AutoReg(train_df["tavg"], lags=lags)
    return model.fit()

def train_ma(train_df, q=7):
    return ARIMA(train_df["tavg"], order=(0,0,q)).fit()

def train_arima(train_df, p=2, d=1, q=2):
    return ARIMA(train_df["tavg"], order=(p,d,q)).fit()

In [20]:
# model registry
stat_models = {
    "ses": {
        "train_fn": train_ses,
        "params": {}
    },
    "holt": {
        "train_fn": train_holt,
        "params": {}
    },
    "holt_winters": {
        "train_fn": train_holt_winters,
        "params": {"season_length": 365}
    },
    "ar": {
        "train_fn": train_ar,
        "params": {"lags": 7}
    },
    "ma": {
        "train_fn": train_ma,
        "params": {"q": 7}
    },
    "arima": {
        "train_fn": train_arima,
        "params": {"p": 2, "d": 1, "q": 2}
    }
}


# 5. Training  
Each model in the registry is fit on the training data. The statistical models need no feature preparation, data loaders, or windowing.

In [21]:
trained_models = {}

for model_name, info in stat_models.items():
    train_fn = info["train_fn"]
    params   = info["params"]

    print(f"Training {model_name}...")
    model_fit = train_fn(train_df, **params)
    trained_models[model_name] = model_fit

Training ses...
Training holt...
Training holt_winters...
Training ar...
Training ma...
Training arima...


# 6. Forecasting  
- Produce forecasts for validation and test horizons
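To make a multi-step forecast concrete for the simplest model, here is a minimal pure-Python sketch of the SES recursion. It assumes a fixed smoothing factor `alpha` and initializes the level at the first observation. The notebook instead lets statsmodels optimize these, so the numbers are illustrative only, and `ses_forecast` is a hypothetical helper.

```python
def ses_forecast(series, alpha, horizon):
    """Simple exponential smoothing with a fixed alpha (illustrative sketch)."""
    level = series[0]  # assumption: initial level = first observation
    for y in series[1:]:
        # New level is a weighted average of the latest value and the old level
        level = alpha * y + (1 - alpha) * level
    # SES has no trend or seasonal term, so every future step equals the last level
    return [level] * horizon

history = [10.0, 12.0, 11.0, 13.0]
print(ses_forecast(history, alpha=0.5, horizon=3))  # [12.0, 12.0, 12.0]
```

Because the SES forecast is flat, calling `forecast(h_val)` on an SES fit over a multi-year horizon returns one constant value repeated, which cannot follow a seasonal temperature cycle.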

In [None]:
forecasts = {}

# Train + validation data used to refit each model before test forecasting.
# Built once, outside the loop, since it is the same for every model.
extended_train = pd.concat([train_df, val_df], ignore_index=True)

for model_name, info in stat_models.items():
    model_fit = trained_models[model_name]
    params    = info["params"]

    # Validation forecast from the model fitted on the training set only
    val_pred = model_fit.forecast(h_val)

    # Refit on train + val, then forecast the test horizon
    model_fit_extended = info["train_fn"](extended_train, **params)
    test_pred = model_fit_extended.forecast(h_test)

    forecasts[model_name] = {
        "val_pred": val_pred,
        "test_pred": test_pred,
        "params": params
    }

# 7. Evaluation (Using Shared Metrics Function)  
- Apply `evaluate_and_save()` to each model  
- Save results as CSV into `data/models/`  
- Display sorted results table  
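The body of `evaluate_and_save()` is defined in the `metrics` helper notebook and is not reproduced here. For reference, the sketch below gives the standard definitions of MAE, RMSE, and R² on toy values. The function names are hypothetical, and the helper may compute or name its metrics differently.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large errors more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination; negative when worse than predicting the mean."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Toy values, not taken from the weather data
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 10.0]
print(mae(actual, predicted), rmse(actual, predicted), r2(actual, predicted))
```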

In [None]:
impl_name = "statistical"
results = []

for model_name, f in forecasts.items():
    val_metrics = evaluate_and_save(
        y_true=val_df["tavg"].values,
        y_pred=f["val_pred"],
        model_name=model_name,
        impl_name=impl_name,
        out_filename="best_param.csv"
    )

    test_metrics = evaluate_and_save(
        y_true=test_df["tavg"].values,
        y_pred=f["test_pred"],
        model_name=model_name,
        impl_name=impl_name,
        out_filename="best_param.csv"
    )

    results.append({
        "Model": model_name,
        **f["params"],
        **val_metrics,
        **{"Test_" + k: v for k, v in test_metrics.items() if k not in ["Model","Impl"]}
    })

results_df = pd.DataFrame(results).sort_values("MAE")
results_df

# 8. Conclusions  
Wrap-up of which model family performed best, the issues and instabilities observed, and notes for integration into the final report.

* **Best Model / Family:** Holt-Winters (seasonal-aware)

  * strongest overall on MAE/RMSE
  * positive R² for validation and test sets
  * captures annual temperature seasonality effectively

* **Weak Models:**

  * SES and Holt: model only level (and trend), so they cannot capture seasonality
  * AR and MA: sensitive to non-stationarity and limited to short-term memory (7 lags here)
  * ARIMA: moderate performance; requires careful hyperparameter tuning

* **Instability / Issues:**

  * AR/MA/ARIMA (non-seasonal as configured) may diverge or underperform with long seasonal cycles
  * SES/Holt forecasts are smooth but miss seasonal peaks and troughs
  * MAPE is unreliable because temperatures near zero inflate the percentage error

* **Metric Notes:**

  * Prioritize MAE, RMSE, R² for comparisons
  * MAPE and OPE can be unstable with temperature data

* **Integration Notes:**

  * Holt-Winters is chosen as the reference statistical baseline
  * ensure seasonality is incorporated in downstream models
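The note on MAPE can be illustrated with a small worked example. The sketch below uses made-up values (not the weather data) and the textbook MAPE definition, which may differ from the one in the shared metrics helper. Both cases have the same absolute error of 1.0, but the percentage error is far larger when the true values are close to zero.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (undefined if any true value is 0)."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Same absolute error (1.0) in both cases; only the scale of the truth differs
warm_true, warm_pred = [20.0, 25.0], [21.0, 24.0]
cold_true, cold_pred = [0.5, -0.5], [1.5, -1.5]

print(mape(warm_true, warm_pred))  # about 4.5
print(mape(cold_true, cold_pred))  # 200.0
```

This is why MAE and RMSE, which stay on the scale of the data, are the safer metrics for comparing these models.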
