# 02 — Rolling Window Backtesting (ARIMA vs LightGBM)

**Goal:** Evaluate forecasting models with a **walk‑forward (rolling window) backtest**, which better simulates live trading than a single train/test split.

**You will learn:**
- Why walk‑forward is crucial in time‑series quant research.
- How to implement a rolling evaluation for **ARIMA** and **LightGBM**.
- How to summarize and visualize out‑of‑sample performance.

> We use `statsmodels` ARIMA for compatibility in Conda environments.


## 0) Setup & Imports

In [None]:
# If needed, install dependencies (uncomment)
# %pip install yfinance pandas numpy matplotlib scikit-learn lightgbm statsmodels

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_error, mean_squared_error
from math import sqrt

import yfinance as yf
from statsmodels.tsa.arima.model import ARIMA
import lightgbm as lgb

plt.rcParams['figure.figsize'] = (12, 5)
plt.rcParams['axes.grid'] = True


## 1) Why rolling backtesting?

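A single train/test split gives one error estimate from one stretch of market history, and it can **overestimate** performance when test‑period information leaks into training or when the series is non‑stationary.
**Walk‑forward** works like this:
1. Start with an initial training window.
2. At time *t*, **train** on data up to *t‑1*, then **predict** *t* (1‑step ahead).
3. Advance *t* by one step and repeat, collecting errors over time. In this notebook the training set **expands** (all data up to *t‑1*); a fixed‑length window would instead drop the oldest observation at each step.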
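
The steps above can be sketched as a small index generator. This is a minimal illustration rather than the notebook's implementation: `walk_forward_splits` is a hypothetical helper, the series is made up, and a naive last-value forecast stands in for ARIMA or LightGBM.

```python
def walk_forward_splits(n_obs, start):
    """Yield (train_indices, test_index) pairs for 1-step-ahead evaluation.

    Hypothetical helper: train on observations 0..t-1, then predict observation t.
    """
    if not 0 < start < n_obs:
        raise ValueError("start must satisfy 0 < start < n_obs")
    for t in range(start, n_obs):
        yield range(0, t), t


series = [10.0, 11.0, 12.0, 11.5, 12.5, 13.0]  # illustrative values, not market data
abs_errors = []
for train_idx, t in walk_forward_splits(len(series), start=3):
    history = [series[i] for i in train_idx]   # data strictly before t
    forecast = history[-1]                     # naive stand-in for a fitted model
    abs_errors.append(abs(series[t] - forecast))

print(abs_errors)
```

Each test point is predicted using only earlier observations, which is the property the rest of the notebook relies on.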


## 2) Download data

In [None]:
TICKER = "XLE"  # Alternatives: "VDE", "FILL"
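df = yf.download(TICKER, start="2015-01-01")
# Newer yfinance releases can return MultiIndex columns (field, ticker) even for one ticker; flatten them
if isinstance(df.columns, pd.MultiIndex):
    df.columns = df.columns.get_level_values(0)
df = df.reset_index()[["Date", "Close"]].dropna().reset_index(drop=True)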
df.head()


## 3) Helpers

In [None]:
def add_time_features(df, ts_col="Date"):
    df = df.copy()
    df[ts_col] = pd.to_datetime(df[ts_col])
    df["year"] = df[ts_col].dt.year
    df["month"] = df[ts_col].dt.month
    df["dayofweek"] = df[ts_col].dt.dayofweek
    return df

def make_lags(df, y_col="Close", lags=(1,5,22)):
    df = df.copy()
    for L in lags:
        df[f"lag_{L}"] = df[y_col].shift(L)
    return df

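def rmse(y_true, y_pred):
    # Root mean squared error, in the same units as the target (price)
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return sqrt(((y_true - y_pred) ** 2).mean())

def mape(y_true, y_pred):
    # Mean absolute percentage error, in percent; undefined where y_true == 0
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0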
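
As a quick sanity check of the two error metrics, here is a toy example with values that can be verified by hand. The arrays are made up for illustration, and the two functions are re-declared so the snippet runs on its own.

```python
import numpy as np
from math import sqrt

def rmse(y_true, y_pred):
    return sqrt(((y_true - y_pred) ** 2).mean())

def mape(y_true, y_pred):
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0

y_true = np.array([100.0, 200.0])   # illustrative values, not market data
y_pred = np.array([110.0, 190.0])

# Errors are -10 and +10: both squared errors are 100, so RMSE = sqrt(100) = 10
toy_rmse = rmse(y_true, y_pred)
# Percentage errors are 10% and 5%, so MAPE = 7.5%
toy_mape = mape(y_true, y_pred)
print(toy_rmse, toy_mape)
```

The same absolute error counts for more in MAPE when the price is lower, which is one reason RMSE and MAPE can rank models differently.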


## 4) Feature engineering

In [None]:
df_feat = add_time_features(df, "Date")
df_feat = make_lags(df_feat, "Close", lags=(1,5,22))
df_feat = df_feat.dropna().reset_index(drop=True)

feature_cols = [c for c in df_feat.columns if c.startswith("lag_")] + ["month","dayofweek","year"]
df_feat.head()


## 5) Rolling split logic

In [None]:
N = len(df_feat)
test_frac = 0.2
start_test = int(N * (1 - test_frac))
print("Total:", N, "| Rolling eval starts at index:", start_test)
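
The cell above only fixes where evaluation starts; the loop later trains on everything before each test index (an expanding window). If a fixed‑length window is preferred, for example to let older regimes drop out of the training data, the training slice can be bounded on the left as well. A minimal sketch, where `training_slice` and `window` are hypothetical names not used elsewhere in this notebook and `N = 1000` is an illustrative sample size:

```python
def training_slice(t, window=None):
    """Return the slice of training indices used to predict index t.

    window=None -> expanding window (all data before t)
    window=k    -> sliding window (the k most recent points before t)
    """
    start = 0 if window is None else max(0, t - window)
    return slice(start, t)


N = 1000                               # illustrative sample size
test_frac = 0.2
start_test = int(N * (1 - test_frac))  # first index that gets a forecast

expanding = training_slice(start_test)             # all rows before start_test
sliding = training_slice(start_test, window=500)   # only the 500 most recent rows
n_steps = N - start_test                           # number of 1-step forecasts
print(start_test, expanding, sliding, n_steps)
```

With such a helper, `y[training_slice(t, window)]` would replace `y[:t]` in a walk‑forward loop; which variant works better is an empirical question for the series at hand.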


## 6) ARIMA order selection (tiny grid)

In [None]:
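# Small candidate grid of (p, d, q) orders.
# Caveat: AIC is not strictly comparable across different d, because differencing
# changes the data the likelihood is computed on; treat the d=0 vs d=1 choice as a heuristic.
CANDIDATE_ORDERS = [(1,0,0), (1,1,0), (2,1,0), (1,1,1), (3,1,1), (5,1,0)]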

def fit_best_arima(y_train):
    best_aic = np.inf
    best_model = None
    best_order = None
    for order in CANDIDATE_ORDERS:
        try:
            res = ARIMA(y_train, order=order).fit()
            if res.aic < best_aic:
                best_aic = res.aic
                best_model = res
                best_order = order
        except Exception:
            # Skip orders that fail to estimate; the caller falls back to a naive forecast if all fail
            continue
    return best_model, best_order, best_aic


## 7) Walk‑forward loop

In [None]:
y = df_feat["Close"].values
dates = df_feat["Date"].values
X_all = df_feat[feature_cols].values

arima_preds, lgbm_preds, actuals, pred_dates = [], [], [], []

for t in range(start_test, len(df_feat)):
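    # Expanding window: use only observations strictly before t (no look-ahead)
    y_train = y[:t]
    X_train = X_all[:t]
    y_true = y[t]
    x_row = X_all[t].reshape(1, -1)  # features at t: lags of earlier closes plus calendar fields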

    pred_dates.append(dates[t])
    actuals.append(y_true)

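    # ARIMA: refit the small order grid on the data available before t
    arima_model, order, aic = fit_best_arima(y_train)
    if arima_model is not None:
        arima_fc = arima_model.forecast(steps=1)[0]
    else:
        arima_fc = y_train[-1]  # naive fallback: carry the last observed close forward
    arima_preds.append(arima_fc)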

    # LightGBM
    lgbm = lgb.LGBMRegressor(
        n_estimators=600, learning_rate=0.02,
        subsample=0.9, subsample_freq=1,  # row subsampling only takes effect when subsample_freq > 0
        colsample_bytree=0.9, random_state=42
    )
    lgbm.fit(X_train, y_train)
    lgb_fc = lgbm.predict(x_row)[0]
    lgbm_preds.append(lgb_fc)

len(actuals), len(arima_preds), len(lgbm_preds)


## 8) Metrics & plots

In [None]:
actuals = np.array(actuals)
arima_preds = np.array(arima_preds)
lgbm_preds = np.array(lgbm_preds)

print(f"ARIMA  | RMSE: {rmse(actuals, arima_preds):.4f}  MAE: {mean_absolute_error(actuals, arima_preds):.4f}  MAPE%: {mape(actuals, arima_preds):.2f}")
print(f"LGBM   | RMSE: {rmse(actuals, lgbm_preds):.4f}  MAE: {mean_absolute_error(actuals, lgbm_preds):.4f}  MAPE%: {mape(actuals, lgbm_preds):.2f}")

plt.figure()
plt.plot(pred_dates, actuals, label="Actual")
plt.plot(pred_dates, arima_preds, label="ARIMA")
plt.plot(pred_dates, lgbm_preds, label="LightGBM")
plt.title("Walk-forward — Actual vs Predictions"); plt.legend(); plt.show()

plt.figure()
plt.plot(pred_dates, np.abs(actuals - arima_preds), label="|Error| ARIMA")
plt.plot(pred_dates, np.abs(actuals - lgbm_preds), label="|Error| LGBM")
plt.title("Absolute Error over time"); plt.legend(); plt.show()
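
When reading the metrics above, it can help to compare each model against a naive last‑value (persistence) forecast, since daily closing prices are strongly autocorrelated. The sketch below is an optional addition and not part of the original evaluation; `skill_vs_naive` is a hypothetical helper and the arrays are made‑up numbers.

```python
import numpy as np

def skill_vs_naive(actuals, preds, prev_values):
    """Return 1 - RMSE(model) / RMSE(naive); positive means the model beats persistence.

    prev_values[i] is the last observed value before actuals[i].
    """
    actuals, preds, prev_values = (
        np.asarray(a, dtype=float) for a in (actuals, preds, prev_values)
    )
    rmse_model = np.sqrt(np.mean((actuals - preds) ** 2))
    rmse_naive = np.sqrt(np.mean((actuals - prev_values) ** 2))
    return 1.0 - rmse_model / rmse_naive


actuals = [101.0, 103.0, 102.0]       # illustrative values, not market data
prev_values = [100.0, 101.0, 103.0]   # naive forecast errors: 1, 2, -1
preds = [100.5, 102.0, 102.5]         # model errors: 0.5, 1, -0.5

skill = skill_vs_naive(actuals, preds, prev_values)
print(round(skill, 3))
```

In this notebook's setup, the naive series for the evaluation span would be `y[start_test - 1 : len(y) - 1]`, i.e. the close observed one step before each predicted date.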
