# 1. Overview & Objectives  
This notebook implements the baseline forecasting models for the Michigan temperature time series. It generates validation and test forecasts for the following models:
- Naïve
- Historic Average
- Seasonal Naïve
- Random Walk with Drift
- Structural (local level + trend, fitted with Holt's exponential smoothing)

All models are evaluated with the shared evaluate_and_save() function, and the results for each model configuration are written to baseline_results.csv.
The models are implemented so that visualizations.ipynb can load them directly from baseline_results.csv and plot both validation and test forecasts.

Outputs
- Forecasts for validation and test sets
- Evaluation metrics

# 2. Imports & Setup

```python
# Importing the helper notebooks

## Enable imports from .ipynb files
import import_ipynb
import sys
sys.path.append("code")

## Importing the helper notebooks as modules
from splitting import split_time_series
from metrics import evaluate_and_save, load_best_models

# Notebook-specific imports
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing
```

# 3. Load Data & Train/Val/Test Split
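The processed daily Michigan series (`tavg`, 1980-01-01 to 2025-11-30) is split chronologically with the shared `split_time_series()` helper: training up to 2015-12-31, validation up to 2020-12-31, and test up to 2025-11-30. The forecast horizons `h_val` and `h_test` are the lengths of the validation and test sets.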

```python
splits = split_time_series(
    csv_path="../data/processed/processed_weather_data_michigan.csv",
    date_col="time",
    y_col="tavg",
    train_end="2015-12-31",
    val_end="2020-12-31",
    test_end="2025-11-30",
    start_date="1980-01-01",
    end_date="2025-11-30",
)

train_df = splits["train"]
val_df   = splits["val"]
test_df  = splits["test"]

# Forecast horizons = number of rows (days) in each evaluation split
h_val  = len(val_df)
h_test = len(test_df)
```
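The internals of `split_time_series()` are not shown in this notebook. As a rough illustration of what a chronological split does, the hypothetical `chronological_split()` below keeps rows in time order and assigns each one to exactly one contiguous date range. Treating the boundary dates as inclusive upper bounds is an assumption of this sketch, not necessarily what the real helper does.

```python
import pandas as pd

def chronological_split(df, date_col, train_end, val_end, test_end):
    """Hypothetical sketch: split a time series into contiguous train/val/test ranges.

    Upper bounds are treated as inclusive (an assumption, not taken from split_time_series()).
    """
    dates = pd.to_datetime(df[date_col])
    train_end, val_end, test_end = (pd.Timestamp(d) for d in (train_end, val_end, test_end))
    masks = {
        "train": dates <= train_end,
        "val": (dates > train_end) & (dates <= val_end),
        "test": (dates > val_end) & (dates <= test_end),
    }
    # No shuffling: every split keeps the original time order
    return {name: df[mask].reset_index(drop=True) for name, mask in masks.items()}

# Hypothetical toy series: 10 consecutive days
toy = pd.DataFrame({
    "time": pd.date_range("2020-01-01", periods=10, freq="D"),
    "tavg": range(10),
})
toy_splits = chronological_split(toy, "time", "2020-01-05", "2020-01-08", "2020-01-10")
print({name: len(part) for name, part in toy_splits.items()})  # {'train': 5, 'val': 3, 'test': 2}
```

The key property is that validation and test data always lie strictly after the training data, so no future information leaks into the fit.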

```python
train_df
```

|       | time       | tavg   |
|-------|------------|--------|
| 0     | 1980-01-01 | -0.85  |
| 1     | 1980-01-02 | -3.90  |
| 2     | 1980-01-03 | -8.85  |
| 3     | 1980-01-04 | -10.30 |
| 4     | 1980-01-05 | -7.50  |
| ...   | ...        | ...    |
| 13144 | 2015-12-27 | 0.40   |
| 13145 | 2015-12-28 | -5.40  |
| 13146 | 2015-12-29 | -2.20  |
| 13147 | 2015-12-30 | -2.20  |


# 4. Model Definition  
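Five baseline models are defined below. Each is a function `fn(train_df, horizon, **params)` that returns `horizon` point forecasts of `tavg`, and each is registered in `baseline_models` together with its hyperparameters:
- `naive`, `historic_avg`, `drift`: no hyperparameters
- `seasonal_naive`: `season_length=365` (days)
- `structural`: no fixed hyperparameters; the smoothing parameters are estimated by `ExponentialSmoothing.fit(optimized=True)`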
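For reference, the forecasts computed by the functions in the next cell can be written as follows, for observations $y_1, \dots, y_T$, horizon $h \ge 1$ and season length $m$. The structural model is written here as Holt's linear (additive-trend) exponential smoothing, which is what `ExponentialSmoothing(trend="add", seasonal=None)` fits; $\alpha$ and $\beta$ are the smoothing parameters estimated during fitting.

$$
\begin{aligned}
\text{Naïve:}\quad & \hat{y}_{T+h} = y_T \\
\text{Historic average:}\quad & \hat{y}_{T+h} = \frac{1}{T}\sum_{t=1}^{T} y_t \\
\text{Seasonal naïve:}\quad & \hat{y}_{T+h} = y_{T+h-m(k+1)}, \quad k = \left\lfloor \frac{h-1}{m} \right\rfloor,\ m = 365 \\
\text{Drift:}\quad & \hat{y}_{T+h} = y_T + h\,\frac{y_T - y_1}{T-1} \\
\text{Structural (Holt):}\quad & \ell_t = \alpha\, y_t + (1-\alpha)(\ell_{t-1} + b_{t-1}), \\
& b_t = \beta\,(\ell_t - \ell_{t-1}) + (1-\beta)\, b_{t-1}, \\
& \hat{y}_{T+h} = \ell_T + h\, b_T
\end{aligned}
$$

Only Seasonal Naïve varies over the year; all the others produce a constant or linear forecast.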

```python
# Forecasting functions: each takes a training frame with a "tavg" column and a horizon,
# and returns a NumPy array with `horizon` point forecasts.

def naive_forecast(train_df, horizon):
    # Repeat the last observed value
    last_val = train_df["tavg"].iloc[-1]
    return np.repeat(last_val, horizon)

def historic_average_forecast(train_df, horizon):
    # Repeat the mean of the whole training series
    mean_val = train_df["tavg"].mean()
    return np.repeat(mean_val, horizon)

def seasonal_naive_forecast(train_df, horizon, season_length=365):
    # Repeat the last full season (default: last 365 days) as often as needed
    seasonal_tail = train_df["tavg"].iloc[-season_length:].to_numpy()
    repeats = int(np.ceil(horizon / season_length))
    return np.tile(seasonal_tail, repeats)[:horizon]

def drift_forecast(train_df, horizon):
    # Extend the straight line through the first and last observations
    y = train_df["tavg"].to_numpy()
    drift = (y[-1] - y[0]) / (len(y) - 1)
    return y[-1] + drift * np.arange(1, horizon + 1)

def structural_model_forecast(train_df, horizon):
    # Local level + additive trend, no seasonality (Holt's linear method)
    model = ExponentialSmoothing(train_df["tavg"], trend="add", seasonal=None)
    fit = model.fit(optimized=True)
    return np.asarray(fit.forecast(horizon))

# Models and their hyperparameters
baseline_models = {
    "naive":          {"fn": naive_forecast,            "params": {}},
    "historic_avg":   {"fn": historic_average_forecast, "params": {}},
    "seasonal_naive": {"fn": seasonal_naive_forecast,   "params": {"season_length": 365}},
    "drift":          {"fn": drift_forecast,            "params": {}},
    "structural":     {"fn": structural_model_forecast, "params": {}},
}
```
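A quick sanity check on a hypothetical toy series (not the Michigan data) shows how Seasonal Naïve tiles the last season and how Drift extrapolates the first-to-last slope. Equivalent versions of the two functions are repeated here so the snippet runs on its own; `season_length=3` is chosen only to keep the example small.

```python
import numpy as np
import pandas as pd

def seasonal_naive_forecast(train_df, horizon, season_length=365):
    # Tile the last `season_length` observations and cut to the horizon
    seasonal_tail = train_df["tavg"].iloc[-season_length:].to_numpy()
    repeats = int(np.ceil(horizon / season_length))
    return np.tile(seasonal_tail, repeats)[:horizon]

def drift_forecast(train_df, horizon):
    # Slope of the line through the first and last observations
    y = train_df["tavg"].to_numpy()
    drift = (y[-1] - y[0]) / (len(y) - 1)
    return y[-1] + drift * np.arange(1, horizon + 1)

# Hypothetical toy series: 1, 2, ..., 7
toy = pd.DataFrame({"tavg": np.arange(1.0, 8.0)})

seasonal = seasonal_naive_forecast(toy, horizon=5, season_length=3)
drift = drift_forecast(toy, horizon=3)
print(seasonal)  # last season (5, 6, 7) repeated and cut to 5 steps: 5, 6, 7, 5, 6
print(drift)     # slope (7 - 1) / 6 = 1, so 8, 9, 10
```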


# 5. Training  
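The baseline models need no separate training step. Naïve, Historic Average, Seasonal Naïve and Drift compute their forecasts directly from the series they receive, and the structural model estimates its smoothing parameters inside `structural_model_forecast()` each time it is called. Feature preparation, data loaders and windowing apply only to the ML and neural models.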

# 6. Forecasting  
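Validation forecasts are produced from the training data only. Test forecasts are produced after refitting on training + validation data, so the test horizon starts immediately after the last validation day.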

```python
forecasts = {}  # Store forecasts for all models

# Training + validation data, used as history for the test forecasts
extended_train = pd.concat([train_df, val_df], ignore_index=True)

for model_name, model_info in baseline_models.items():
    model_fn = model_info["fn"]
    params   = model_info["params"]

    # Validation forecast: history = train only
    val_pred = model_fn(train_df, h_val, **params)

    # Test forecast: history = train + val
    test_pred = model_fn(extended_train, h_test, **params)

    forecasts[model_name] = {
        "val_pred": val_pred,
        "test_pred": test_pred,
        "params": params,
    }
```
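The effect of refitting on train + validation can be seen on a hypothetical toy example: for the naïve model, the validation forecast starts from the last training value, while the test forecast starts from the last validation value.

```python
import numpy as np
import pandas as pd

def naive_forecast(train_df, horizon):
    # Repeat the last observed value
    return np.repeat(train_df["tavg"].iloc[-1], horizon)

# Hypothetical toy splits
toy_train = pd.DataFrame({"tavg": [1.0, 2.0, 3.0]})
toy_val = pd.DataFrame({"tavg": [4.0, 5.0]})

# Validation forecast: origin is the last training value (3.0)
val_pred = naive_forecast(toy_train, len(toy_val))

# Test forecast: refit on train + val, origin is the last validation value (5.0)
extended = pd.concat([toy_train, toy_val], ignore_index=True)
test_pred = naive_forecast(extended, 2)
print(val_pred, test_pred)
```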


# 7. Evaluation (Using Shared Metrics Function)  
- Apply `evaluate_and_save()` to each model  
- Save results as CSV into `data/models/`  
- Display the results table sorted by validation MAE
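The metric code inside `evaluate_and_save()` is not shown here. The sketch below assumes the standard textbook definitions of MAE, RMSE, MAPE and R² (OPE is left out because its definition is not given in this notebook) and uses hypothetical values to show why MAPE explodes when an actual temperature is close to zero. Under the same R² definition, a constant forecast equal to the mean of the actuals scores exactly 0, which is why a flat forecast near the mean, such as Historic Average, lands near R² = 0.

```python
import numpy as np

# Standard definitions (assumed; evaluate_and_save() may differ in details)
def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    # Divides by y_true, so actuals close to zero dominate the average (in percent)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

def r2(y_true, y_pred):
    # 1 - SS_res / SS_tot: 0 for a constant forecast equal to mean(y_true), negative if worse
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

# Hypothetical temperatures, one of them almost exactly zero
y_true = np.array([10.0, 5.0, 0.001, -3.0])
y_pred = np.array([9.0, 6.0, 1.0, -2.0])

print(mae(y_true, y_pred))   # about 1: every error is roughly 1 degree
print(mape(y_true, y_pred))  # about 25,000 %: the 0.001 actual dominates
print(r2(y_true, y_pred))
```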

```python
import os

# Relative path, consistent with the data path used in section 3 (results go to data/models/)
out_filename = "../data/models/baseline_results.csv"
os.makedirs(os.path.dirname(out_filename), exist_ok=True)
impl_name = "baseline"
results = []

for model_name, fcast in forecasts.items():
    val_pred  = fcast["val_pred"]
    test_pred = fcast["test_pred"]
    params    = fcast["params"]

    # Validation metrics
    val_metrics = evaluate_and_save(
        y_true=val_df["tavg"].values,
        y_pred=val_pred,
        model_name=model_name,
        impl_name=impl_name,
        out_filename=out_filename,
    )

    # Test metrics
    test_metrics = evaluate_and_save(
        y_true=test_df["tavg"].values,
        y_pred=test_pred,
        model_name=model_name,
        impl_name=impl_name,
        out_filename=out_filename,
    )

    # One row per model: hyperparameters, validation metrics, then test metrics with a "Test_" prefix
    results.append({
        "Model": model_name,
        **params,
        **val_metrics,
        **{"Test_" + k: v for k, v in test_metrics.items() if k not in ["Model", "Impl"]},
    })

# Rank models by validation MAE
best_param_df = pd.DataFrame(results).sort_values("MAE", ascending=True)
best_param_df
```

|   | Model | Impl | MAE | RMSE | MAPE | OPE | R2 | Test_MAE | Test_RMSE | Test_MAPE | Test_OPE | Test_R2 | season_length |
|---|-------|------|-----|------|------|-----|----|----------|-----------|-----------|----------|---------|---------------|
| 2 | seasonal_naive | baseline | 5.199891 | 6.754391 | 9140668000.0 | 0.051293 | 0.599095 | 4.801004 | 6.081301 | 3627232000.0 | 0.012741 | 0.648629 | 365.0 |
| 1 | historic_avg | baseline | 9.269835 | 10.682492 | 3788844000.0 | 0.075398 | -0.0028 | 8.985741 | 10.333199 | 11703820000.0 | 0.150078 | -0.014479 | |
| 0 | naive | baseline | 11.777942 | 14.476757 | 1258894000.0 | 1.307211 | -0.84167 | 13.133817 | 15.731179 | 6194196000.0 | 1.449818 | -1.351235 | |
| 4 | structural | baseline | 11.834208 | 14.53909 | 1341333000.0 | 1.319315 | -0.857563 | 13.263359 | 15.864526 | 6432743000.0 | 1.470122 | -1.391265 | |
| 3 | drift | baseline | 11.840556 | 14.546119 | 1350465000.0 | 1.320675 | -0.85936 | 13.266052 | 15.867307 | 6438571000.0 | 1.47056 | -1.392103 | |


# 8. Conclusions  
Short wrap-up: which model family performed best, issues or instability, and notes for integrating the results into the final report.

* **Best Model / Family:** Seasonal Naïve

  * lowest MAE/RMSE on both validation (5.20 / 6.75) and test (4.80 / 6.08)
  * only model with a clearly positive R² (0.60 validation, 0.65 test), since it reproduces the annual temperature cycle
  * season length = 365 days

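* **Weak Models:**

  * Naïve, Drift, Structural, Historic Average
  * much higher MAE/RMSE (validation MAE 9.27 to 11.84)
  * Historic Average has R² ≈ 0 (-0.003 validation, -0.014 test), as expected for a flat forecast near the mean; Naïve, Drift and Structural have negative R² (-0.84 to -1.39), i.e., they do worse than predicting the mean
  * Drift and Structural score almost the same as Naïve, suggesting the fitted trend is negligible
  * none of them captures seasonal patterns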

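* **Instability / Issues:**

  * MAPE is extremely large (around 10⁹ to 10¹⁰) because many temperatures are at or near zero
  * OPE stays in a bounded range (0.013 to 1.47), but MAE/RMSE are more reliable for comparing models
  * the fixed 365-day season ignores leap days, so the repeated pattern shifts by one day for each Feb 29 in the forecast horizon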

* **Metric Notes:**

  * focus on MAE, RMSE, R² for comparisons
  * MAPE unreliable for near-zero temperature data

* **Integration Notes:**

  * Seasonal Naïve is the baseline reference for all later models
  * later models need to capture the annual seasonality to beat it
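One possible way to make the seasonal baseline calendar-aware, sketched here as a hypothetical variant that this notebook does not use: tiling a fixed 365-day season (as `seasonal_naive_forecast` does) shifts the pattern by one day after each Feb 29, whereas a lookup by month and day keeps forecasts aligned with the calendar. The function `seasonal_naive_calendar()` and its Feb 29 fallback rule are assumptions for illustration; it expects a `time` column of parseable dates.

```python
import numpy as np
import pandas as pd

def seasonal_naive_calendar(train_df, forecast_dates, date_col="time", y_col="tavg"):
    """Hypothetical calendar-aligned seasonal naive forecast.

    Each forecast date gets the value observed on the same month and day within
    the last year of the training data; Feb 29 falls back to Feb 28 when that
    year has no leap day. Dates without a match return NaN.
    """
    hist = train_df[[date_col, y_col]].copy()
    hist[date_col] = pd.to_datetime(hist[date_col])
    hist = hist.sort_values(date_col)
    last_date = hist[date_col].max()
    # Keep exactly one year of history ending at the last training date
    recent = hist[hist[date_col] > last_date - pd.DateOffset(years=1)]
    lookup = {(d.month, d.day): v for d, v in zip(recent[date_col], recent[y_col])}

    preds = []
    for d in pd.to_datetime(forecast_dates):
        key = (d.month, d.day)
        if key == (2, 29) and key not in lookup:
            key = (2, 28)  # assumed fallback for leap days
        preds.append(lookup.get(key, np.nan))
    return np.array(preds, dtype=float)

# Hypothetical toy history: one non-leap year whose value equals the day of the year
days = pd.date_range("2019-01-01", "2019-12-31", freq="D")
toy_train = pd.DataFrame({"time": days, "tavg": days.dayofyear.astype(float)})

pred = seasonal_naive_calendar(toy_train, ["2020-02-28", "2020-02-29", "2020-03-01"])
print(pred)  # Feb 28 -> 59, Feb 29 falls back to Feb 28 -> 59, Mar 1 -> 60
```

Because it takes explicit forecast dates rather than a horizon, plugging it into the forecasting loop of section 6 would require passing the validation or test dates (e.g. `val_df["time"]`) instead of `h_val` / `h_test`.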
