# 04_backtest.ipynb

Simple backtesting experiments for univariate time-series forecasting.

**Goal:**
- Compare a baseline ARIMA model and a Random Forest model.
- Use rolling-origin (walk-forward) evaluation.
- Run on one time series from the Corporación Favorita `train.csv` dataset.

This notebook is *exploratory* and complements the main Streamlit app, which
uses a simpler single hold-out split on the Core Project page.

In [1]:
import pandas as pd # data manipulation like DataFrames and Series similar to SQL tables or Excel sheets
import numpy as np # numerical operations like arrays and matrices

from sklearn.ensemble import RandomForestRegressor # machine learning model for regression tasks
from sklearn.metrics import mean_squared_error, mean_absolute_error # for error metrics

from statsmodels.tsa.arima.model import ARIMA # ARIMA model for time series forecasting

## Load one example series

We load the Kaggle `train.csv` file (Corporación Favorita Store Sales) and select
a single `(store_nbr, family)` pair to create a univariate daily time series.

In [None]:
# Adjust the path if your project structure is different
train_path = r"data\train.csv" # PLEASE, AGAIN!!! USE YOUR OWN PATH TO THE DATA FILE
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
train = pd.read_csv(train_path, parse_dates=["date"])

train = train.sort_values("date")

# Example: choose one store & product family
store_id = 1
family_name = "GROCERY I"

subset = train[(train["store_nbr"] == store_id) & (train["family"] == family_name)].copy()
subset = subset.sort_values("date")

# Build a univariate series indexed by date
y = subset.set_index("date")["sales"].astype(float)
y = y.asfreq("D").fillna(0.0)  # ensure daily frequency, fill any gaps with 0

y.head()

date
2013-01-01       0.0
2013-01-02    2652.0
2013-01-03    2121.0
2013-01-04    2056.0
2013-01-05    2216.0
Freq: D, Name: sales, dtype: float64

## Helper functions

- `mape`: Mean Absolute Percentage Error.
- `make_lag_features`: builds a supervised lagged dataset for Random Forest.

In [3]:
def mape(y_true, y_pred):
    y_true = np.array(y_true, dtype=float)
    y_pred = np.array(y_pred, dtype=float)
    mask = y_true != 0
    if not np.any(mask):
        return np.nan
    return np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])) * 100.0


def make_lag_features(series, n_lags=7):
    """Create a DataFrame with y and lag_1..lag_n_lags columns."""
    df = pd.DataFrame({"y": series})
    for lag in range(1, n_lags + 1):
        df[f"lag_{lag}"] = df["y"].shift(lag)
    return df.dropna()

## Rolling-origin backtest

We perform a simple rolling-origin (walk-forward) backtest:

- Use an initial training window (e.g., 365 days).
- At each step, fit ARIMA(1,1,1) on all available data up to that point and
  forecast `horizon` days ahead.
- Fit Random Forest on lagged features and also forecast `horizon` days ahead
  recursively.
- Compute RMSE, MAE, and MAPE for each model and store the errors across
  all splits.

In [4]:
# Backtesting loop with expanding window evaluation
from sklearn.metrics import mean_squared_error, mean_absolute_error # for error metrics which were not imported before

# Backtest configuration
horizon = 7          # forecast 7 days ahead each time
initial_window = 365 # minimum number of days for the first training window
n_lags = 7           # number of lag features for Random Forest

arima_errors = []
rf_errors = []

series = y.copy()
n = len(series)

for split_point in range(initial_window, n - horizon):
    train_series = series.iloc[:split_point]
    test_series = series.iloc[split_point:split_point + horizon]

    # --- ARIMA(1,1,1) ---
    try:
        arima_model = ARIMA(train_series, order=(1, 1, 1))
        arima_fit = arima_model.fit()
        arima_forecast = arima_fit.forecast(steps=horizon)

        arima_errors.append({
            "rmse": np.sqrt(mean_squared_error(test_series, arima_forecast)),
            "mae": mean_absolute_error(test_series, arima_forecast),
            "mape": mape(test_series, arima_forecast),
        })
    except Exception as e:
        print(f"ARIMA failed at split {split_point}: {e}")

    # --- Random Forest with lag features ---
    supervised = make_lag_features(train_series, n_lags=n_lags)
    X_train = supervised.drop(columns=["y"])
    y_train = supervised["y"]

    if len(X_train) == 0:
        continue

    rf = RandomForestRegressor(n_estimators=200, random_state=42)
    rf.fit(X_train, y_train)

    history = list(train_series.values)
    rf_preds = []
    for _ in range(horizon):
        last_vals = history[-n_lags:]
        x = np.array(last_vals).reshape(1, -1)
        y_hat = float(rf.predict(x)[0])
        rf_preds.append(y_hat)
        history.append(y_hat)

    rf_errors.append({
        "rmse": np.sqrt(mean_squared_error(test_series, rf_preds)),
        "mae": mean_absolute_error(test_series, rf_preds),
        "mape": mape(test_series, rf_preds),
    })




## Aggregate results

We aggregate error metrics across all rolling windows to get an overall
comparison between ARIMA(1,1,1) and Random Forest with lag features.

In [5]:
arima_df = pd.DataFrame(arima_errors)
rf_df = pd.DataFrame(rf_errors)

print("ARIMA(1,1,1) Backtest (mean over windows):")
display(arima_df.mean())

print("\nRandom Forest Backtest (mean over windows):")
display(rf_df.mean())

ARIMA(1,1,1) Backtest (mean over windows):


rmse    694.061051
mae     525.834673
mape     31.680736
dtype: float64


Random Forest Backtest (mean over windows):


rmse    744.962851
mae     561.793533
mape     34.367106
dtype: float64

### Backtest Summary

In this notebook, we performed a simple walk-forward backtest using the **`train.csv`** file from the
Kaggle "Store Sales – Time Series Forecasting" competition (Store 1, family = "Grocery I").

- **Series:** daily sales for one (store, family) combination  
- **Horizon:** 7 days ahead per window  
- **Initial training window:** 365 days  
- **Models compared:**  
  > ARIMA(1,1,1) – classical econometric model  
  > Random Forest Regressor – machine learning model with lag features

The metrics printed above (RMSE, MAE, MAPE) are **averaged over all rolling windows**.

In our experiment, **ARIMA(1,1,1) achieved slightly lower error than Random Forest**, which is reasonable
for a noisy univariate retail series without extra regressors (promotions, holidays, prices, etc.).
This backtest is only a **sanity check** and a **small-scale laboratory** to mirror what the Streamlit app
does interactively on other datasets (Airline Passengers, Logan housing, yfinance series, etc.).


### Interpretation

- ARIMA(1,1,1) is a simple classical baseline.
- Random Forest uses 7 lag features and a recursive strategy to forecast `horizon` steps.
- Metrics are averaged across multiple rolling windows.
- This notebook is exploratory; the production Streamlit app exposes a simpler
  train/test split and model configuration on the Core Project page.