<a href="https://colab.research.google.com/github/jeanmhuang/Daily-Quant-Notes/blob/main/Daily_Quant_Notes_2025_09_11_TimeSeries_Analysis_Toolkit.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Time‑Series Analysis Toolkit (Equities)
**Daily Quant Notes — 2025-09-11**

This notebook is a **one-stop time‑series toolkit** that runs in Google Colab:
- Pulls market data (default: `SPY`) and computes **log returns**
- Runs **stationarity and dependence diagnostics** (ADF, KPSS, Ljung‑Box, ACF/PACF, ARCH LM)
- Fits **ARIMA** for mean dynamics and **GARCH(1,1)** for conditional volatility
- Performs an **expanding‑window walk‑forward** forecast
- Reports **forecast metrics** and a small **toy trading rule** (for discussion only)

> ⚠️ The toy strategy ignores costs/slippage; use it for diagnostics and interview discussion, not production.


## 0) Setup


In [None]:
# %pip install yfinance pandas numpy matplotlib statsmodels arch --quiet

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
import statsmodels.api as sm
import statsmodels.tsa.api as tsa
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.stats.diagnostic import acorr_ljungbox, het_arch
from arch import arch_model

plt.rcParams['figure.figsize'] = (10, 5)
pd.options.display.float_format = '{:.6f}'.format
SEED = 42
np.random.seed(SEED)

## 1) Parameters

In [None]:
TICKER = "SPY"      # change to any liquid symbol (e.g., 'QQQ', 'AAPL')
START = "2005-01-01"
END = None          # None = through today

print({"TICKER": TICKER, "START": START, "END": END})

## 2) Data Download & Returns

In [None]:
data = yf.download(TICKER, start=START, end=END, auto_adjust=True, progress=False)
assert not data.empty, "No data downloaded; check ticker or internet connection."
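# Some yfinance versions return a one-column DataFrame for data['Close'];
# squeeze() reduces it to a Series and is a no-op if it already is one.
px = data['Close'].squeeze().rename('close')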
log_px = np.log(px)
ret = log_px.diff().dropna().rename('r')
ret.describe()
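
Log returns are used here because they add across time: the sum of daily log returns equals the log of the gross return over the whole period, whereas simple returns compound. A minimal sketch on made-up prices (the values are illustrative, not market data):

```python
import numpy as np

# Hypothetical closing prices, for illustration only
prices = np.array([100.0, 102.0, 101.0, 105.0])

log_ret = np.diff(np.log(prices))          # r_t = ln(P_t / P_{t-1})
simple_ret = prices[1:] / prices[:-1] - 1  # R_t = P_t / P_{t-1} - 1

# Log returns sum to the log of the total gross return
total_log = log_ret.sum()
assert np.isclose(total_log, np.log(prices[-1] / prices[0]))

# Simple returns compound instead of adding
total_simple = np.prod(1 + simple_ret) - 1
assert np.isclose(np.exp(total_log) - 1, total_simple)
```

This additivity is what makes a cumulative sum of log returns a meaningful performance curve.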

## 3) Exploratory Plots (separate charts)

In [None]:
plt.figure(); px.plot(); plt.title(f"{TICKER} Adjusted Close"); plt.xlabel("Date"); plt.ylabel("Price"); plt.show()

In [None]:
plt.figure(); ret.plot(); plt.title(f"{TICKER} Daily Log Returns"); plt.xlabel("Date"); plt.ylabel("Log return"); plt.show()

In [None]:
plt.figure(); ret.rolling(21).std().plot(); plt.title("21‑Day Rolling Volatility"); plt.xlabel("Date"); plt.ylabel("Vol"); plt.show()

## 4) Stationarity & Dependence Diagnostics

In [None]:
def adf_series(x):
    stat, p, lags, nobs, crit, icbest = adfuller(x, autolag='AIC')
    return pd.Series({"ADF stat": stat, "ADF p-value": p, "lags used": lags, "n obs": nobs})

def kpss_series(x, regression='c'):
    stat, p, lags, crit = kpss(x, regression=regression, nlags='auto')
    return pd.Series({"KPSS stat": stat, "KPSS p-value": p, "lags": lags})

tests = pd.concat([adf_series(ret), kpss_series(ret)], axis=1)
tests.columns = ["ADF", "KPSS"]
tests
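
The two tests have opposite nulls: ADF assumes a unit root, KPSS assumes stationarity. Reading them together is therefore more informative than reading either alone. The sketch below shows one rough way to combine the p-values; `joint_verdict` and the `alpha` threshold are hypothetical names chosen for illustration, not part of statsmodels.

```python
def joint_verdict(adf_p, kpss_p, alpha=0.05):
    """Combine ADF and KPSS p-values into a rough verdict (hypothetical helper)."""
    adf_rejects = adf_p < alpha    # ADF null: the series has a unit root
    kpss_rejects = kpss_p < alpha  # KPSS null: the series is stationary
    if adf_rejects and not kpss_rejects:
        return "stationary"
    if kpss_rejects and not adf_rejects:
        return "unit root"
    if adf_rejects and kpss_rejects:
        return "conflict: both nulls rejected"
    return "inconclusive: neither null rejected"

# Example with made-up p-values
print(joint_verdict(adf_p=0.001, kpss_p=0.10))
print(joint_verdict(adf_p=0.40, kpss_p=0.01))
```

Note that statsmodels reports KPSS p-values only within the range of its lookup table (0.01 to 0.10), so a value at either bound means "at most" or "at least" that level.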

In [None]:
# ACF (returns)
fig, ax = plt.subplots()
sm.graphics.tsa.plot_acf(ret, lags=40, ax=ax)  # pass ax so no empty extra figure is created
plt.title("ACF — Returns")
plt.show()

In [None]:
# PACF (returns)
fig, ax = plt.subplots()
sm.graphics.tsa.plot_pacf(ret, lags=40, ax=ax)  # pass ax so no empty extra figure is created
plt.title("PACF — Returns")
plt.show()

In [None]:
# Ljung‑Box on returns (5, 10, 20 lags)
lb = acorr_ljungbox(ret, lags=[5, 10, 20], return_df=True)
lb
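
To make the test less of a black box, the Ljung-Box statistic can be computed by hand as Q = n(n+2) Σ ρ̂_k² / (n − k) for k = 1..h, which is compared against a chi-square distribution with h degrees of freedom under the null of no autocorrelation. The sketch below is a plain NumPy version on simulated data; `ljung_box_q` is a hypothetical helper, not the statsmodels implementation.

```python
import numpy as np

def ljung_box_q(x, h):
    """Ljung-Box Q statistic over lags 1..h (illustrative re-implementation)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    denom = np.sum(xc ** 2)
    q = 0.0
    for k in range(1, h + 1):
        rho_k = np.sum(xc[k:] * xc[:-k]) / denom  # sample autocorrelation at lag k
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

# Simulated data: white noise vs. an AR(1) process with coefficient 0.3
rng = np.random.default_rng(0)
noise = rng.standard_normal(2000)
ar1 = np.empty_like(noise)
ar1[0] = noise[0]
for t in range(1, len(noise)):
    ar1[t] = 0.3 * ar1[t - 1] + noise[t]

q_noise = ljung_box_q(noise, 10)
q_ar1 = ljung_box_q(ar1, 10)
print(q_noise, q_ar1)  # the AR(1) series should give a much larger Q
```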

In [None]:
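# ARCH LM test for conditional heteroskedasticity (10 lags)
# het_arch returns (LM stat, LM p-value, F stat, F p-value)
arch_lm_stat, arch_lm_p, arch_f_stat, arch_f_p = het_arch(ret, nlags=10)
pd.Series({"ARCH LM stat": arch_lm_stat, "LM p-value": arch_lm_p, "F stat": arch_f_stat, "F p-value": arch_f_p, "lags": 10})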
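
The idea behind the ARCH LM test is a regression of squared residuals on their own lags: if past squared shocks help explain today's, the R² is large and the statistic n·R² is large. The sketch below is a NumPy illustration on simulated data; `arch_lm` is a hypothetical helper and the ARCH(1) parameters are made up.

```python
import numpy as np

def arch_lm(resid, q):
    """LM statistic n * R^2 from regressing e_t^2 on a constant and q of its lags."""
    e2 = np.asarray(resid, dtype=float) ** 2
    y = e2[q:]
    # Column k holds e^2 lagged by k periods, aligned with y
    lags = [e2[q - k:-k] for k in range(1, q + 1)]
    X = np.column_stack([np.ones(len(y))] + lags)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ beta) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return len(y) * (1 - ss_res / ss_tot)

# Simulated data: iid noise vs. an ARCH(1) process (illustrative parameters)
rng = np.random.default_rng(1)
n = 3000
z = rng.standard_normal(n)
arch1 = np.empty(n)
arch1[0] = z[0]
for t in range(1, n):
    arch1[t] = z[t] * np.sqrt(0.5 + 0.5 * arch1[t - 1] ** 2)

lm_iid = arch_lm(z, q=5)
lm_arch = arch_lm(arch1, q=5)
print(lm_iid, lm_arch)  # volatility clustering should push the statistic up
```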

## 5) Mean Model (ARIMA) — Small AIC Grid

In [None]:
orders = [(0,0,0), (1,0,0), (2,0,0), (1,0,1), (0,0,1)]
results = []
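for p,d,q in orders:
    try:
        res = tsa.ARIMA(ret, order=(p,d,q)).fit()
        results.append((res.aic, (p,d,q), res))
    except Exception as e:
        # Report failed fits instead of dropping them silently
        print(f"ARIMA{(p,d,q)} failed: {e}")

assert results, "No ARIMA specification could be fitted."
best_aic, best_order, arima_res = min(results, key=lambda x: x[0])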
print("Best ARIMA order by AIC:", best_order, "AIC:", best_aic)
print(arima_res.summary())

In [None]:
resid = arima_res.resid.dropna()

plt.figure(); resid.plot(); plt.title("ARIMA Residuals"); plt.xlabel("Date"); plt.ylabel("Residual"); plt.show()

fig, ax = plt.subplots()
sm.graphics.tsa.plot_acf(resid, lags=40, ax=ax)  # pass ax so no empty extra figure is created
plt.title("ACF — ARIMA Residuals")
plt.show()

## 6) Volatility Model (GARCH(1,1))

In [None]:
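# Note: arch may warn that raw log returns are poorly scaled; fitting on 100 * resid
# (percent returns) is a common workaround, with vol forecasts divided by 100 afterwards.
garch = arch_model(resid, vol='GARCH', p=1, q=1, mean='Zero', dist='normal')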
garch_res = garch.fit(disp='off')
print(garch_res.summary())
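
For intuition, a GARCH(1,1) conditional variance follows the recursion σ²_{t+1} = ω + α ε²_t + β σ²_t, so a large shock raises tomorrow's variance and then decays at rate α + β. The sketch below runs that recursion by hand; the parameter values are hypothetical, not fitted estimates, and `garch11_variance` is an illustrative helper rather than part of the `arch` package.

```python
import numpy as np

def garch11_variance(eps, omega, alpha, beta):
    """Conditional variances sigma2[0..n]; sigma2[n] is the one-step-ahead forecast."""
    eps = np.asarray(eps, dtype=float)
    sigma2 = np.empty(len(eps) + 1)
    sigma2[0] = omega / (1 - alpha - beta)  # start at the unconditional variance
    for t in range(len(eps)):
        sigma2[t + 1] = omega + alpha * eps[t] ** 2 + beta * sigma2[t]
    return sigma2

# Hypothetical parameters and shocks (a quiet day, a -3% shock, two quiet days)
omega, alpha, beta = 2e-6, 0.10, 0.88
eps = np.array([0.001, -0.03, 0.002, 0.001])

sigma2 = garch11_variance(eps, omega, alpha, beta)
next_day_vol = np.sqrt(sigma2[-1])
print(np.sqrt(sigma2))  # daily vol path; it jumps after the -3% shock, then decays
```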

## 7) Expanding‑Window Walk‑Forward Forecast
Forecast **next‑day return mean** and **volatility** and evaluate on unseen data.
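
The key detail in a one-step-ahead expanding window is alignment: a model fitted on observations `[0, t)` produces a forecast for observation `t`, and that is the value it should be scored against. A minimal sketch of the index bookkeeping, using a hypothetical `expanding_splits` helper and made-up values:

```python
def expanding_splits(n_obs, min_train):
    """Yield (train_slice, test_pos) pairs for a one-step-ahead expanding window."""
    for t in range(min_train, n_obs):
        yield slice(0, t), t  # train on [0, t), score on position t

series = [0.1, -0.2, 0.3, 0.0, 0.5, -0.1, 0.2, 0.4]  # made-up values

pairs = []
for train, test in expanding_splits(len(series), min_train=5):
    history = series[train]  # everything strictly before the target
    target = series[test]    # the observation the forecast is scored against
    pairs.append((len(history), target))

print(pairs)
```

Each training window ends one position before its target, so no forecast ever sees the value it is evaluated on.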

In [None]:
min_train = 252  # ~1 year
idx = ret.index
y = ret.copy()

mean_fc = []
vol_fc = []
y_true = []

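# Note: both models are refit at every step, so this loop is slow on long samples.
# best_order was selected on the full sample above, which leaks some information
# into the backtest; re-select the order inside the loop for a strict test.
for t in range(min_train, len(y)):
    # Fit ARIMA on observations [0, t); the one-step forecast targets observation t
    try:
        ar = tsa.ARIMA(y.iloc[:t], order=best_order).fit()
    except Exception:
        ar = tsa.ARIMA(y.iloc[:t], order=(1,0,0)).fit()

    mf = float(np.asarray(ar.forecast(1))[0])  # avoid float() on a one-element Series
    resid_t = ar.resid.dropna()

    # Fit GARCH on residuals to forecast vol
    try:
        g = arch_model(resid_t, vol='GARCH', p=1, q=1, mean='Zero', dist='normal').fit(disp='off')
        vf = float(np.sqrt(g.forecast(horizon=1).variance['h.1'].iloc[-1]))
    except Exception:
        vf = float(resid_t.std())

    # Pair each forecast with the observation it targets: y[t], not y[t+1]
    mean_fc.append(mf); vol_fc.append(vf); y_true.append(y.iloc[t])

oos = pd.DataFrame({"y_true": y_true, "mean_fc": mean_fc, "vol_fc": vol_fc}, index=idx[min_train:])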

# Metrics
mse = ((oos['y_true'] - oos['mean_fc'])**2).mean()
hit = (np.sign(oos['y_true']) == np.sign(oos['mean_fc'])).mean()

# Toy trading rule: long if forecast mean > 0 else short (no costs)
toy_ret = np.sign(oos['mean_fc']) * oos['y_true']
ann_ret = toy_ret.mean() * 252
ann_vol = toy_ret.std(ddof=0) * np.sqrt(252)
toy_sharpe = ann_ret / ann_vol if ann_vol != 0 else np.nan

pd.Series({
    "MSE": mse,
    "Direction Hit-Rate": hit,
    "Toy Ann.Return": ann_ret,
    "Toy Ann.Vol": ann_vol,
    "Toy Sharpe": toy_sharpe
})

In [None]:
plt.figure(); (oos['y_true']).cumsum().plot(); plt.title("Cumulative Sum of True Returns (log)"); plt.xlabel("Date"); plt.ylabel("Cum log returns"); plt.show()

In [None]:
plt.figure(); (oos['mean_fc']).cumsum().plot(); plt.title("Cumulative Sum of Forecasted Mean (log)"); plt.xlabel("Date"); plt.ylabel("Cum forecast"); plt.show()

In [None]:
plt.figure(); oos['vol_fc'].plot(); plt.title("Forecasted Volatility (GARCH)"); plt.xlabel("Date"); plt.ylabel("Vol forecast"); plt.show()

## 8) Notes & Extensions
- **Diagnostics first:** ADF tests the null of a unit root and KPSS the null of stationarity, so the two can disagree; when they do, discuss trend‑stationarity vs. unit root.
- **Autocorrelation:** If Ljung‑Box p-values are low, low-order AR terms may help; else mean ≈ white noise.
- **Volatility clustering:** ARCH LM significance motivates GARCH‑type models.
- **Walk‑forward:** Evaluate only on **unseen** data; avoid tuning on full sample.
- **Toy strategy is illustrative:** add costs/slippage, constraints, and execution modeling for realism.
- **Extensions:** EGARCH/GJR, t‑distribution, regime filters, realized volatility, cross‑asset testing.
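
As a first step toward realism, the toy rule can be charged a proportional cost every time the position changes. The sketch below assumes a flat cost per unit of turnover; `net_toy_returns` and `cost_bps` are hypothetical names, and the input numbers are made up.

```python
import numpy as np

def net_toy_returns(forecast, realized, cost_bps=1.0):
    """Sign-based long/short returns net of a flat cost per unit of turnover."""
    pos = np.sign(np.asarray(forecast, dtype=float))
    realized = np.asarray(realized, dtype=float)
    gross = pos * realized
    # Turnover is the absolute change in position; day one trades from flat,
    # and a long-to-short flip counts as two units.
    turnover = np.abs(np.diff(pos, prepend=0.0))
    return gross - turnover * cost_bps / 1e4

# Made-up forecasts and realized log returns
forecast = np.array([0.001, 0.002, -0.001, -0.002, 0.001])
realized = np.array([0.010, -0.005, -0.004, 0.003, 0.006])

gross_total = (np.sign(forecast) * realized).sum()
net = net_toy_returns(forecast, realized, cost_bps=5.0)
print(gross_total, net.sum())
```

A rule whose forecast sign flips often pays this cost often, which is why a hit rate near 50% rarely survives even small costs.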


## 9) Reproducibility

In [None]:
import sys, platform
print("Python:", sys.version)
print("Platform:", platform.platform())
print("NumPy:", np.__version__)
print("pandas:", pd.__version__)
print("yfinance:", yf.__version__)
import statsmodels, arch
print("statsmodels:", statsmodels.__version__)
print("arch:", arch.__version__)
