
# Detecting a Market Inefficiency: Short‑Horizon Mean Reversion
**Daily Quant Notes — 2025-09-11**

**Hypothesis:** There is short‑horizon mean reversion in equity returns: a negative return today predicts a positive return tomorrow (and vice versa).

**Plan:**
1. Define the inefficiency and formalize the hypothesis.
2. Pull data (SPY by default; optionally a basket of tickers).
3. Build signals (lagged return, rolling z‑score of returns).
4. Test predictive power with OLS and HAC (Newey‑West) standard errors.
5. Run a naïve mean‑reversion strategy and compute summary stats.
6. Do a quick out‑of‑sample (expanding window) check.
7. Summarize results and caveats.


## 1) Setup

In [1]:
%pip install yfinance pandas numpy statsmodels matplotlib pytz --quiet

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
import statsmodels.api as sm
from statsmodels.stats import diagnostic as smd
from statsmodels.stats.stattools import jarque_bera
from datetime import datetime, timedelta, timezone

plt.rcParams['figure.figsize'] = (10, 5)

RISK_FREE_ANNUAL = 0.0   # set to 0 for simplicity or replace with daily T‑bill later
TC_BPS = 5               # round‑trip transaction cost assumption in basis points for the demo
SEED = 42
np.random.seed(SEED)


## 2) Define the inefficiency (formal hypothesis)
Let $r_t$ be the daily close‑to‑close return. **Mean reversion** implies negative autocorrelation at short lags:

- **Null (H0):** $\beta = 0$, i.e. $\mathbb{E}[r_{t+1} | r_t] = \alpha$ (no predictability from $r_t$)
- **Alt (H1):** $\mathbb{E}[r_{t+1} | r_t] = \alpha + \beta\, r_t$ with $\beta < 0$

We test via OLS: $r_{t+1} = \alpha + \beta r_t + \epsilon_{t+1}$ and use **HAC (Newey‑West)** standard errors.
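
As a sanity check on the test itself, the sketch below simulates an AR(1) return series with a known negative $\beta$ and recovers it with a plain least-squares fit. The values of `beta_true`, the noise scale, and the sample size are illustrative assumptions, not estimates from SPY.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs = 20_000                       # assumed sample size for the simulation
beta_true = -0.10                    # assumed mean-reversion coefficient
eps = rng.normal(0.0, 0.01, n_obs)   # i.i.d. shocks with 1% daily volatility

# Simulate r_{t+1} = beta * r_t + eps_{t+1} (alpha = 0)
r = np.empty(n_obs)
r[0] = eps[0]
for t in range(1, n_obs):
    r[t] = beta_true * r[t - 1] + eps[t]

# OLS of r_{t+1} on r_t; np.polyfit returns (slope, intercept) for degree 1
beta_hat, alpha_hat = np.polyfit(r[:-1], r[1:], 1)
lag1_autocorr = np.corrcoef(r[:-1], r[1:])[0, 1]
print(f"beta_hat={beta_hat:.3f}, lag-1 autocorrelation={lag1_autocorr:.3f}")
```

For a stationary AR(1) series the slope and the lag-1 autocorrelation coincide in population, which is why a negative $\beta$ is read as negative autocorrelation at short lags.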


## 3) Data
By default we load **SPY** (SPDR S&P 500 ETF) daily data. You can switch `TICKER` below or expand to multiple tickers.


In [2]:
TICKER = "SPY"
START = "2005-01-01"
END = None  # None = today

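data = yf.download(TICKER, start=START, end=END, auto_adjust=True, progress=False)
assert not data.empty, "No data downloaded. Check ticker or internet access."
# Recent yfinance versions can return MultiIndex columns, so data['Close'] may be a
# one-column DataFrame; DataFrame.rename('close') then raises
# "TypeError: 'str' object is not callable". Reduce it to a Series first.
close = data['Close']
if isinstance(close, pd.DataFrame):
    close = close.iloc[:, 0]
px = close.rename('close')
ret = px.pct_change().dropna().rename('r')
print(px.head())   # first few adjusted closes
ret.head()         # first few daily returns
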

## 4) Signals
We construct two simple predictors:
1. **Lag‑1 return**: $r_{t-1}$
2. **Return z‑score (short window)**: standardized return over a short rolling window (e.g., 5 days)


In [None]:
W = 5  # rolling window for z‑score
df = pd.DataFrame({'r': ret})
df['r_lag1'] = df['r'].shift(1)
roll_mean = df['r'].rolling(W).mean()
roll_std = df['r'].rolling(W).std(ddof=0).replace(0, np.nan)  # NaN instead of dividing by zero in flat windows
df['r_z'] = (df['r'] - roll_mean) / roll_std
df['r_z_lag1'] = df['r_z'].shift(1)
df = df.dropna()
df.tail()

## 5) Predictive tests (OLS + Newey‑West HAC)
We regress **next‑day returns** on the **lagged signal** and compute HAC (Newey‑West) standard errors to account for autocorrelation/heteroskedasticity.
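
To make the HAC adjustment less of a black box, here is a NumPy-only sketch of a Bartlett-kernel (Newey-West) sandwich covariance for a one-regressor OLS. The function name `newey_west_ols` and the synthetic data are assumptions for illustration; the notebook itself relies on statsmodels, and this sketch applies no small-sample correction.

```python
import numpy as np

def newey_west_ols(y, x, lags):
    """OLS of y on [1, x] with Bartlett-kernel HAC standard errors (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(x, dtype=float)])
    xtx_inv = np.linalg.inv(X.T @ X)
    coef = xtx_inv @ X.T @ y
    resid = y - X @ coef
    xu = X * resid[:, None]                 # per-observation score x_t * u_t
    S = xu.T @ xu                           # lag-0 (heteroskedasticity) term
    for lag in range(1, lags + 1):
        w = 1.0 - lag / (lags + 1.0)        # Bartlett weight, declining linearly in the lag
        gamma = xu[lag:].T @ xu[:-lag]      # cross-product of scores `lag` periods apart
        S += w * (gamma + gamma.T)
    cov = xtx_inv @ S @ xtx_inv             # sandwich covariance
    return coef, np.sqrt(np.diag(cov))

# Synthetic example with heteroskedastic noise (made-up parameters)
rng = np.random.default_rng(1)
x = rng.normal(size=2000)
y = 0.5 - 0.2 * x + rng.normal(size=2000) * (1 + np.abs(x))
coef, se = newey_west_ols(y, x, lags=5)
print("coef:", coef.round(3), "HAC se:", se.round(3))
```

The point estimates are the ordinary OLS coefficients; only the standard errors change. With `lags=0` the loop is skipped and the result reduces to White's heteroskedasticity-robust standard errors.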


In [None]:
def ols_hac(y, X, lags=5):
    X = sm.add_constant(X)
    model = sm.OLS(y, X, missing='drop')
    res = model.fit(cov_type='HAC', cov_kwds={'maxlags': lags})
    return res

# (A) Using raw lag‑1 return as predictor
y = df['r']            # r_t: the return being predicted
X = df[['r_lag1']]     # r_{t-1}: previous day's return (same regression as r_{t+1} on r_t, shifted one day)
res_a = ols_hac(y=y, X=X, lags=5)
print(res_a.summary())

# (B) Using lagged z‑score as predictor
Xz = df[['r_z_lag1']]
res_b = ols_hac(y=y, X=Xz, lags=5)
print(res_b.summary())


### Diagnostic: scatter & binned means

In [None]:
# Scatter
plt.figure()
plt.scatter(df['r_lag1'], df['r'], s=6, alpha=0.4)
plt.axhline(0, linewidth=1)
plt.axvline(0, linewidth=1)
plt.title("Scatter: r_t vs r_{t+1}")
plt.xlabel("r_t (lag‑1)")
plt.ylabel("r_{t+1}")
plt.show()

# Binned means to visualize relation
bins = pd.qcut(df['r_lag1'], q=20, duplicates='drop')
binned = df.groupby(bins, observed=True)['r'].mean()  # observed=True avoids the pandas categorical-groupby warning
plt.figure()
binned.plot(kind='bar')
plt.title("Binned mean of r_{t+1} by r_t quantiles")
plt.ylabel("Mean r_{t+1}")
plt.tight_layout()
plt.show()


## 6) Naïve mean‑reversion strategy
**Rule:** If yesterday's return was **negative**, go **long** today; if it was **positive**, go **short** today.
- Position $p_t = -\text{sign}(r_{t-1})$
- Strategy return $s_t = p_t \cdot r_t$ minus transaction costs when position changes.
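
A small worked example of the rule, using five made-up returns and an assumed cost of 5 bps per unit of position change (the variable names here are illustrative and independent of the notebook's `df`):

```python
import numpy as np

tc_bps = 5                                            # assumed cost per unit of position change, in bps
r = np.array([0.010, -0.020, 0.005, 0.000, -0.010])   # made-up daily returns

r_prev = r[:-1]                  # r_{t-1} for the four tradable days
r_today = r[1:]                  # r_t for the same days
position = -np.sign(r_prev)      # short after an up day, long after a down day, flat after zero

# Absolute position change; the first day is treated as already in position (no charge)
pos_change = np.abs(np.diff(position, prepend=position[0]))
cost = tc_bps / 1e4 * pos_change
net = position * r_today - cost

print("position:", position)
print("net returns:", net.round(4))
```

On the second tradable day the position flips from short to long, a change of 2 units, so the cost is 2 × 5 bps = 10 bps and the 0.5% gross return becomes 0.4% net.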


In [None]:
sign = np.sign(df['r_lag1'])  # np.sign returns 0 for a zero return, so those days are flat
position = -sign  # long after down day, short after up day

# Transaction costs: TC_BPS is charged per unit of absolute position change, so a full
# flip (+1 to -1) costs 2 * TC_BPS; entering the first position is not charged
pos_change = position.diff().abs().fillna(0)
tc = (TC_BPS / 1e4) * pos_change  # cost as fraction of capital
strat_ret_gross = position * df['r']
strat_ret_net = strat_ret_gross - tc

equity = (1 + strat_ret_net).cumprod()
buyhold = (1 + df['r']).cumprod()

cum = pd.DataFrame({
    'Strategy': equity,
    f'Buy&Hold({TICKER})': buyhold
})

# Summary stats
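def ann_stats(series, freq=252):
    """Annualized mean return, volatility, Sharpe ratio, and max drawdown of a daily return series."""
    r = series
    mu = r.mean() * freq                      # arithmetic mean return, annualized
    sig = r.std(ddof=0) * np.sqrt(freq)       # annualized volatility
    sharpe = mu / sig if sig != 0 else np.nan # Sharpe ratio without subtracting a risk-free rate
    wealth = (1 + r).cumprod()                # growth of $1
    peak = wealth.cummax()                    # running high-water mark
    mdd = (wealth / peak - 1).min()           # largest peak-to-trough decline (zero or negative)
    return pd.Series({'Ann.Return': mu, 'Ann.Vol': sig, 'Sharpe': sharpe, 'MaxDD': mdd})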

summary = pd.DataFrame({
    'Strategy': ann_stats(strat_ret_net),
    f'Buy&Hold({TICKER})': ann_stats(df['r'])
})

display(summary.round(4))

# DataFrame.plot opens its own figure, so a separate plt.figure() would leave an empty one
cum.plot()
plt.title("Cumulative Growth of $1")
plt.ylabel("Growth of $1")
plt.xlabel("Date")
plt.show()


## 7) Quick out‑of‑sample (expanding window) check
We estimate $\beta$ on an expanding window and trade only if $\beta<0$ (mean‑reversion sign) using the **lag‑1 return** predictor.
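
Refitting OLS at every step is simple but quadratic in the sample length. One alternative, which is not what this notebook does, is to compute the expanding slope in closed form from cumulative sums. The sketch below assumes a single regressor with an intercept; `expanding_beta` and the synthetic inputs are hypothetical names and data used only for illustration.

```python
import numpy as np

def expanding_beta(x, y, min_train):
    """Slope of y on x (with intercept) fitted on observations [0, t) for t = min_train, ..., len(x) - 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = np.arange(1, len(x) + 1)
    sx, sy = np.cumsum(x), np.cumsum(y)
    sxy, sxx = np.cumsum(x * y), np.cumsum(x * x)
    with np.errstate(divide="ignore", invalid="ignore"):
        beta = (sxy - sx * sy / n) / (sxx - sx * sx / n)   # beta[k] uses observations 0..k
    # Drop the last entry so each estimate only uses data strictly before the day it is traded on
    return beta[min_train - 1 : -1]

rng = np.random.default_rng(2)
x = rng.normal(0.0, 0.01, 1000)                 # stand-in for r_{t-1}
y = -0.1 * x + rng.normal(0.0, 0.01, 1000)      # stand-in for r_t with an assumed true slope of -0.1
betas = expanding_beta(x, y, min_train=252)
trade = betas < 0                               # gate: trade only when the estimate is negative
print(len(betas), round(float(trade.mean()), 3))
```

Each entry of `betas` lines up with one out-of-sample day, so the gate can be applied as a vectorized mask instead of inside a loop.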


In [None]:
min_train = 252  # ~1 year
r = df['r'].copy()
rlag = df['r_lag1'].copy()

oos_returns = []
betas = []
idx = df.index

for t in range(min_train, len(df)):
    train_y = r.iloc[:t]
    train_X = sm.add_constant(rlag.iloc[:t])
    res = sm.OLS(train_y, train_X, missing='drop').fit()
    beta = res.params.get('r_lag1', np.nan)
    betas.append(beta)

    # Signal for day t uses r_{t-1}. With beta < 0 the forecast beta * r_{t-1} has the
    # opposite sign to r_{t-1}, so the position is -sign(r_{t-1}).
    # Stay flat when beta >= 0 or the fit did not produce a finite estimate.
    rt_1 = rlag.iloc[t]
    pos = -np.sign(rt_1) if np.isfinite(beta) and beta < 0 else 0.0
    oos_returns.append(pos * r.iloc[t])

oos = pd.Series(oos_returns, index=idx[min_train:])
oos_equity = (1 + oos).cumprod()
oos_stats = ann_stats(oos)

display(oos_stats.round(4))

plt.figure()
oos_equity.plot()
plt.title("Expanding‑Window OOS Strategy (Mean Reversion Gate)")
plt.ylabel("Growth of $1")
plt.xlabel("Date")
plt.show()


## 8) Conclusion & Caveats
- Check if $\beta$ is **significantly negative** (HAC t‑stat) → evidence of short‑horizon mean reversion.
- Inspect out‑of‑sample performance; in practice, add **transaction cost modeling**, **slippage**, and **execution constraints**.
- Avoid **data‑snooping**: don’t tune on the full sample. Prefer *walk‑forward* validation and holdout periods.
- Consider robustness: different windows, volatility filters, and multiple tickers/universe tests.

**Next ideas:**
- Use **Newey‑West lag selection** based on sampling frequency.
- Add a **volatility or spread filter** (trade only when volatility or bid‑ask spreads are favorable).
- Test across **multiple assets** (e.g., large‑cap constituents) and evaluate cross‑sectional signals.
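
For the lag-selection idea, one commonly cited rule of thumb sets the truncation lag from the sample size as $\lfloor 4\,(T/100)^{2/9} \rfloor$. The sketch below is an assumption about how that could be wired in, not something the notebook currently does; `rule_of_thumb_lags` is a hypothetical helper.

```python
import math

def rule_of_thumb_lags(n_obs):
    """Truncation lag floor(4 * (n_obs / 100) ** (2 / 9)) for a Bartlett-kernel HAC estimator."""
    return int(math.floor(4.0 * (n_obs / 100.0) ** (2.0 / 9.0)))

for n_obs in (100, 2500, 5000):
    print(n_obs, rule_of_thumb_lags(n_obs))
```

The result could be passed as the `lags` argument of `ols_hac`, for example `ols_hac(y, X, lags=rule_of_thumb_lags(len(y)))`, in place of the fixed value of 5.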
