# Summary of Statistical Arbitrage in the US Equities Market (2008) by Avellaneda and Lee
This notebook summarizes the paper Statistical Arbitrage in the US Equities Market by Avellaneda and Lee, deriving key equations, clarifying omitted intuitions, and highlighting potential ambiguities.

## A. Introduction
Statistical arbitrage focuses on the idiosyncratic part of the returns in the equation:
$$R_i = \sum_{j=1}^m \beta_{ij} F_j + \tilde R_i$$
where
- $R_i$ is the return of asset $i$,
- $\beta_{ij}$ is the sensitivity of asset $i$ to factor $j$,
- $F_j$ is the return of factor $j$, and
- $\tilde R_i$ is the idiosyncratic return of asset $i$.

A trading portfolio is **factor-j-neutral** if it has no net exposure to the factor $j$, i.e., the sum of the factor $j$ sensitivities is zero:
$$ \bar{\beta}_j = \sum_{i=1}^N \beta_{ij} Q_i = 0$$
where
- $\bar{\beta}_j$ is the portfolio beta to factor $j$,
- $Q_i$ is the dollar amount invested in asset $i$.

A **market-neutral** portfolio is neutral to every factor, i.e. $\bar{\beta}_j = 0$ for all $j = 1, \dots, m$, so its return is uncorrelated with the systematic factors. To see this, expand the portfolio return:
$$\sum_{i=1}^N Q_i R_i = \sum_{i=1}^N Q_i \left[\sum_{j=1}^m \beta_{ij} F_j \right] + \sum_{i=1}^N Q_i \tilde R_i $$
$$ = \sum_{j=1}^m \left[\sum_{i=1}^N Q_i \beta_{ij}\right] F_j + \sum_{i=1}^N Q_i \tilde R_i$$
Since $\sum_{i=1}^N Q_i \beta_{ij} = 0$ for every factor $j$, the first term vanishes and
$$\sum_{i=1}^N Q_i R_i = \sum_{i=1}^N Q_i \tilde R_i$$
Thus the market-neutral portfolio is affected only by the idiosyncratic returns.
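
The cancellation above can be checked numerically. The following is a minimal sketch with one factor and two stocks; the betas, positions, and returns are made-up numbers chosen only so that the portfolio beta is zero.

```python
# Hypothetical two-stock, one-factor example (all numbers are made up).
betas = [1.2, 0.8]   # beta_i1: sensitivity of each stock to the single factor
Q = [1.0, -1.5]      # dollar positions chosen so that sum(beta_i * Q_i) = 0

portfolio_beta = sum(b * q for b, q in zip(betas, Q))

def portfolio_pnl(factor_return, idio_returns):
    """Dollar PnL of the portfolio: sum_i Q_i * (beta_i * F + idiosyncratic_i)."""
    stock_returns = [b * factor_return + e for b, e in zip(betas, idio_returns)]
    return sum(q * r for q, r in zip(Q, stock_returns))

idio = [0.01, -0.004]                   # idiosyncratic returns of the two stocks
pnl_up = portfolio_pnl(+0.05, idio)     # factor rallies 5%
pnl_down = portfolio_pnl(-0.05, idio)   # factor falls 5%
idio_pnl = sum(q * e for q, e in zip(Q, idio))
```

Up to floating-point error, `pnl_up` and `pnl_down` coincide and both equal `idio_pnl`, so the factor move does not affect the PnL.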

## B. Interpretation of Eigenvectors and Eigenportfolios

**1. Definitions**
Suppose we apply PCA to the correlation matrix of the returns and obtain eigenvectors **$v^{(j)}\in \mathbb{R}^N$** and eigenvalues **$\lambda_j$** sorted so that **$N \geq  \lambda_1 \geq \lambda_2\geq \dots\geq \lambda_m\geq \dots\geq \lambda_N \geq  0$**, where **$m$** is the number of factors we keep.

- **Eigenportfolio weights:**
$$
Q_i^{(j)} = \frac{v_i^{(j)}}{\bar{\sigma}_i}
$$
- **Eigenportfolio returns:**
$$
F_{jk} = \sum_{i=1}^N Q_i^{(j)}\,R_{ik}
= \sum_{i=1}^N \frac{v_i^{(j)}}{\bar{\sigma}_i}\,R_{ik}
$$
where **$F_{jk}$** is the return of the eigenportfolio **$j$** at time **$k$**, and **$\bar{\sigma}_i$** is the sample volatility of stock **$i$** over the estimation window.
- **Orthogonality -> zero correlation of factor returns:**
$$
v^{(j)\,T}v^{(j')}=0
\quad\Longrightarrow\quad
\mathrm{Cov}(F_j,F_{j'})=0,\; j\neq j'.
$$
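
A possible way to compute these quantities with NumPy is sketched below. The returns are synthetic (a common factor plus noise), and the names `sigma_bar`, `Q`, and `F` are chosen here to mirror the symbols above; none of the data comes from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 252, 5                                    # hypothetical sample: 252 days, 5 stocks
market = rng.normal(0.0, 0.01, size=(T, 1))
R = market + rng.normal(0.0, 0.01, size=(T, N))  # synthetic returns with a common factor

sigma_bar = R.std(axis=0, ddof=1)                # per-stock sample volatility
Y = (R - R.mean(axis=0)) / sigma_bar             # standardized returns
rho = (Y.T @ Y) / (T - 1)                        # empirical correlation matrix

eigvals, eigvecs = np.linalg.eigh(rho)           # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]                # re-sort so that lambda_1 is the largest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

Q = eigvecs / sigma_bar[:, None]                 # Q[i, j] = v_i^(j) / sigma_bar_i
F = R @ Q                                        # F[k, j] = return of eigenportfolio j on day k

factor_cov = np.cov(F, rowvar=False)             # sample covariance of the factor returns
```

In this sketch `factor_cov` is diagonal up to rounding, with the eigenvalues on its diagonal, which is the in-sample version of the zero-correlation property.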

---

**2. Interpretation**

- **Scaling intuition:**
  By dividing each eigenvector component **$v_i^{(j)}$** by **$\bar\sigma_i$**, the dollar amount invested in each stock becomes inversely proportional to its volatility. This tilts the portfolio toward lower-volatility (often larger-cap) stocks, which is broadly consistent with capitalization weighting.

- **PC 1 (“market”):**
  Since almost all **$v_i^{(1)}>0$**, after $1/\bar\sigma_i$ scaling this becomes a market‐tilt portfolio: long broadly across the universe, cap‐tilted.

- **PC 2 (“sector”):**
  Because **$v^{(2)}$** is orthogonal to **$v^{(1)}$**, it has mixed signs.  If we reorder
  $$
    v^{(2)}=(v_1^{(2)},v_2^{(2)},\dots,v_N^{(2)})
    \;\to\;
    v_{\mathrm{reordered}}^{(2)}=(v_{i_1}^{(2)},\dots,v_{i_N}^{(2)})
  $$
  with **$v_{i_1}^{(2)}\ge v_{i_2}^{(2)}\ge\dots\ge v_{i_N}^{(2)}$**, stocks in the same industry cluster at the top or bottom.  This is called **coherence**, and it means PC2 captures cross‐sectional dispersion—i.e. sector long‐short.

- **Low-rank Approximation:**
  The correlation matrix of stock returns can be approximated using the first **$m$** eigenvectors:
  $$\bar{\rho}_{ij} = \sum_{k=1}^m \lambda_k v_i^{(k)} v_j^{(k)} + \epsilon_{ii}^2 \delta_{ij}$$
  where $\delta_{ij}$ is the Kronecker delta and $\epsilon_{ii}^2$ is the idiosyncratic variance of stock $i$. The sum captures the main systematic components of the returns, and $\epsilon_{ii}^2$ is chosen so that $\bar{\rho}_{ii}=1$ for all $i$. Thus
  $$\epsilon_{ii}^2 = 1 - \sum_{k=1}^m \lambda_k v_i^{(k)} v_i^{(k)}$$
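
The low-rank approximation can be illustrated as follows. This is a sketch on synthetic data; the sizes `T`, `N`, `m` and the two-factor data-generating process are assumptions made only for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, m = 500, 6, 2                                 # hypothetical sizes; m factors kept
common = rng.normal(size=(T, 2)) @ rng.normal(size=(2, N))
R = common + rng.normal(size=(T, N))                # synthetic returns
rho = np.corrcoef(R, rowvar=False)                  # N x N correlation matrix

eigvals, eigvecs = np.linalg.eigh(rho)              # ascending eigenvalues
order = np.argsort(eigvals)[::-1]                   # sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

V_m = eigvecs[:, :m]                                # first m eigenvectors as columns
systematic = (V_m * eigvals[:m]) @ V_m.T            # sum over k of lambda_k v^(k) v^(k)^T
eps_sq = 1.0 - np.diag(systematic)                  # idiosyncratic variances
rho_bar = systematic + np.diag(eps_sq)              # approximated correlation matrix
```

By construction the diagonal of `rho_bar` equals one, and `eps_sq` is non-negative because the discarded eigenvalues of a correlation matrix are non-negative.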

---

**3. Clustering extension**

The top **$k$** eigenvectors form an **$N\times k$** matrix of factor loadings.  Running k-means or hierarchical clustering on its rows groups assets with similar **$k$**-dimensional exposures, yielding sector/style clusters.


## C. Pricing Model
Assume that stock returns satisfy the SDE:
$$ \frac{dS_i(t)}{S_i(t)} = \alpha_i dt + \sum_{j=1}^m \beta_{ij} \frac{dI_j(t)}{I_j(t)} + dX_i(t)$$
where
- $\sum_{j=1}^m \beta_{ij} \frac{dI_j(t)}{I_j(t)}$ is the contribution of the $m$ (systematic) factors to the return of stock $i$,
- $\alpha_i dt+ dX_i(t)$ is the idiosyncratic component,
- $\alpha_i dt$ is the drift of stock $i$, and
- $dX_i(t)$ is the increment of a stationary stochastic process, which models price fluctuations that are not explained by the factors.

The model assumes:
1. A drift which measures systematic deviations from the sector,
2. A price fluctuation that is mean-reverting to the industry level.

With these assumptions, we introduce the **Ornstein–Uhlenbeck (OU) process**:
$$dX_i(t) = \kappa_i (m_i - X_i(t)) dt + \sigma_i dW_i(t)$$
where
- $\kappa_i > 0$ is the speed of mean reversion,
- $m_i$ is the long-term mean,
- $\sigma_i$ is the volatility, and
- $dW_i(t)$ is a Wiener process.

The process $X_i$ is stationary and, when sampled at equally spaced times, follows an AR(1) model, i.e. $$X_{i}(t) = c + \phi X_{i}(t-1) + \eta_i(t) \quad \text{ and }\quad |\phi| < 1$$
The increment has unconditional and conditional means of
$$E[dX_i(t)] = 0, \quad \text{and} \quad E[dX_i(t) | X_i(s), s \leq t ] = \kappa_i (m_i - X_i(t)) dt$$
The model can be fitted for each stock $i$ by estimating the parameters $\kappa_i$, $m_i$, and $\sigma_i$ using historical data of 60 days of residuals, and they are assumed to vary slowly in relation to the Brownian increments $dW_i(t)$.

## D. OU Estimation
Recall, the continuous-time SDE:
$$ \frac{dS_i(t)}{S_i(t)} = \alpha_i dt + \sum_{j=1}^m \beta_{ij} \frac{dI_j(t)}{I_j(t)} + dX_i(t)$$
Given that we have ETF or PCA factor returns, for each stock $i$, we estimate the regression (with discrete-time variables):

---

**One-ETF Case:**\
$$ R_i(t) = \beta_{i0} + \beta_{ij} R_{ETF j}(t) + \epsilon_i(t), \quad t=1, 2, \ldots, 60 $$
- $R_{ETF j}(t)$ is the return of the ETF or factor $j$ at time $t$

---
**Multiple-PCA-Factors Case:**\
$$ R_i(t) = \beta_{i0} + \sum_{j=1}^m \beta_{ij} F_j(t) + \epsilon_i(t), \quad t=1, 2, \ldots, 60 $$
- $F_j(t) = \sum_{q=1}^N Q_q^{(j)} R_q(t)$ are the eigenportfolio returns for $j$-th eigenportfolio.

---
where $R_i(t)$ is the return of stock $i$ at time $t$, and $\epsilon_i(t)$ is the residual return of stock $i$ at time $t$.



Then define the auxiliary process (cumulative residuals):
$$X_{i}(t) = \sum_{k=1}^t \epsilon_i(k), \quad t=1,2,\ldots,60$$
which is the discrete-time counterpart of the OU process $X_i(t)$ in the SDE.

Because the regression includes an intercept, the OLS residuals sum to zero, so $X_{i}(60) = 0$. To estimate the OU parameters, we fit an AR(1) model to the auxiliary process:
$$ X_{i}(t+1) = a_i + b_i X_{i}(t) + \xi_{i}(t+1),\quad t=1,2,\ldots,59$$
where $\xi_{i}(t+1)$ is a white noise process. The parameters $a_i$ and $b_i$ and the variance of the noise term $\xi_{i}(t+1)$ are estimated by OLS. Once these are obtained, we transform them into the OU parameters. In particular:
- $a_i$ is used to derive the long-term mean $m_i$.
- $b_i$ is used to derive the speed of mean reversion $\kappa_i$.
- $\xi_{i}(t+1)$ is the new shock, assumed to be i.i.d. with mean zero and variance $Var(\xi_i)$. This variance is used to derive the instantaneous volatility $\sigma_i$ (and the equilibrium volatility $\sigma_{i, eq}$).
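
The two regressions described above could be sketched as follows. The helper names `cumulative_residual` and `fit_ar1` are hypothetical, the 60-day data are synthetic, and the degrees-of-freedom correction in the noise variance is an assumption of this sketch rather than something the notes specify.

```python
import numpy as np

def cumulative_residual(stock_returns, factor_returns):
    """Regress stock returns on factor returns (with intercept) and return
    (betas, X), where X is the cumulative sum of the OLS residuals."""
    y = np.asarray(stock_returns, dtype=float)
    F = np.asarray(factor_returns, dtype=float).reshape(len(y), -1)
    A = np.column_stack([np.ones(len(y)), F])       # intercept column + factors
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    return coef[1:], np.cumsum(residuals)

def fit_ar1(X):
    """OLS fit of X[t+1] = a + b * X[t] + xi[t+1]; returns (a, b, var_xi)."""
    X = np.asarray(X, dtype=float)
    x_lag, x_next = X[:-1], X[1:]
    A = np.column_stack([np.ones(len(x_lag)), x_lag])
    (a, b), *_ = np.linalg.lstsq(A, x_next, rcond=None)
    xi = x_next - (a + b * x_lag)
    var_xi = xi.var(ddof=2)    # assumption: correct for the 2 fitted parameters
    return a, b, var_xi

# Synthetic 60-day example (made-up data, not from the paper)
rng = np.random.default_rng(42)
etf = rng.normal(0.0, 0.01, size=60)
stock = 0.0002 + 1.1 * etf + rng.normal(0.0, 0.005, size=60)
betas, X = cumulative_residual(stock, etf)
a, b, var_xi = fit_ar1(X)
```

Because the first regression includes an intercept, the last element of `X` is zero up to rounding error.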


---
### Digression: Re-writing the OU SDE under the assumption that the model parameters are constant
Integrating-factor derivation of the OU solution over a finite interval $[t_0, t_0 + \Delta t]$:
1. Start from the OU SDE: $$dX_i(t) = \kappa_i (m_i - X_i(t)) dt + \sigma_i dW_i(t)$$
2. Shift to zero-mean form: Define $Y_i(t) = X_i(t) - m_i$, then the SDE becomes: $$dY_i(t) = dX_i(t) - 0 = \kappa_i(m_i - [Y_i(t) +m_i]) dt + \sigma_i dW_i(t) = -\kappa_i Y_i(t) dt + \sigma_i dW_i(t)$$
3. Multiply by the (deterministic) integrating factor $\mu(t) = e^{\kappa_i t}$: $$d[\mu(t) Y_i(t)] = \mu(t) dY_i(t) + Y_i(t) d\mu(t) = e^{\kappa_i t}(-\kappa_i Y_i(t) dt + \sigma_i dW_i(t)) + Y_i(t) \kappa_i e^{\kappa_i t} dt$$  Thus $$d[\mu(t) Y_i(t)] = d[e^{\kappa_i t} Y_i(t)]  = \sigma_i e^{\kappa_i t} dW_i(t)$$
4. Integrate both sides over the interval $[t_0, t_0 + \Delta t]$: $$ \int_{t_0}^{t_0+\Delta t} d[e^{\kappa_i s} Y_i(s)] = \int_{t_0}^{t_0+\Delta t} \sigma_i e^{\kappa_i s} dW_i(s)$$ $$ \Rightarrow e^{\kappa_i (t_0 + \Delta t)} Y_i(t_0 + \Delta t) - e^{\kappa_i t_0} Y_i(t_0) = \sigma_i \int_{t_0}^{t_0+\Delta t} e^{\kappa_i s} dW_i(s)$$ Re-arranging gives us the solution: $$ Y_i(t_0 + \Delta t) = e^{-\kappa_i \Delta t} Y_i(t_0) + \sigma_i \int_{t_0}^{t_0+\Delta t} e^{-\kappa_i (t_0+\Delta t - s)} dW_i(s)$$
5. Return to $X_i(t)$ and add back the long-term mean $m_i$: $$X_i(t_0 + \Delta t) = e^{-\kappa_i \Delta t} (X_i(t_0) - m_i) + m_i + \sigma_i \int_{t_0}^{t_0+\Delta t} e^{-\kappa_i (t_0+\Delta t - s)} dW_i(s)$$ Hence $$X_i(t_0 + \Delta t) = e^{-\kappa_i \Delta t} X_i(t_0) + (1-e^{-\kappa_i \Delta t}) m_i + \sigma_i \int_{t_0}^{t_0+\Delta t} e^{-\kappa_i (t_0+\Delta t - s)} dW_i(s)$$

---


The exact discrete-time solution of the OU model (derived in the digression above) is:
$$X_i(t_0 + \Delta t) = e^{-\kappa_i \Delta t} X_i(t_0) + (1-e^{-\kappa_i \Delta t}) m_i + \sigma_i \int_{t_0}^{t_0+\Delta t} e^{-\kappa_i (t_0+\Delta t - s)} dW_i(s)$$
Matching it term by term with the AR(1) model, where one time step corresponds to $\Delta t$:
$$ X_{i}(t+1) = a_i + b_i X_{i}(t) + \xi_{i}(t+1),\quad t=1,2,\ldots,59$$
we get:
$$ a_i = (1-e^{-\kappa_i \Delta t})m_i $$
$$ b_i = e^{-\kappa_i \Delta t} $$
$$Var(\xi_i) = Var\left(\sigma_i \int_{t_0}^{t_0+\Delta t} e^{-\kappa_i (t_0+ \Delta t -s)} dW_i(s)\right) = \sigma_i^2 \int_{t_0}^{t_0+\Delta t} \left( e^{-\kappa_i(t_0+\Delta t -s)} \right)^2 ds = \sigma_i^2 \frac{1-e^{-2\kappa_i \Delta t}}{2 \kappa_i} =\sigma_i^2 \frac{1-b_i^2}{2\kappa_i}$$

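Inverting these relations, with $\Delta t = \frac{1}{252}$ (one trading day, in years), gives:
$$\kappa_i = -\frac{\ln(b_i)}{\Delta t} = -252 \ln(b_i)$$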
$$ m_i = \frac{a_i}{1-b_i}$$
$$\sigma_i = \sqrt{\frac{Var(\xi_i) \cdot 2 \kappa_i}{1-b_i^2}} $$
Note that $\sigma_i$ is the instantaneous volatility parameter of the continuous-time OU SDE: it sets the size of the random shocks $\sigma_i dW_i(t)$ over an *infinitesimal* interval $dt$.

We now consider the equilibrium volatility, $\sigma_{i, eq}$, which is the standard deviation of $X_i(t)$ once the OU process has settled into its stationary distribution, i.e. $X_i(t) \sim \mathcal{N}(m_i, \sigma_{i, eq}^2) \text{ as } t\to \infty$. From the stationary variance of the AR(1) process:
$$\sigma_{i, eq} = \sqrt{Var(X_i(\infty))} = \sqrt{\frac{Var(\xi_i)}{1-b_i^2}} = \frac{\sigma_i}{\sqrt{2\kappa_i}}$$
which tells us how far $X_i$ typically wanders from its mean $m_i$ in the long run. It is also the limit of the standard deviation of $X_i(t_0 + \Delta t)$ given $X_i(t_0)$ as $\Delta t \to \infty$. In this equilibrium, the probability distribution of the process $X_i(t)$ is:
$$X_i(t) \sim \mathcal{N}(m_i, \frac{\sigma_i^2}{2\kappa_i})$$
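
The mapping from the AR(1) estimates to the OU parameters can be written as a small function. This is a sketch under the assumption $\Delta t = 1/252$; the function name `ou_params_from_ar1` and the "true" parameter values in the round-trip check are made up for illustration.

```python
import math

def ou_params_from_ar1(a, b, var_xi, dt=1.0 / 252):
    """Map AR(1) estimates (a, b, Var(xi)) to OU parameters.
    Requires 0 < b < 1, otherwise the fitted series is not mean-reverting."""
    if not 0.0 < b < 1.0:
        raise ValueError("need 0 < b < 1 for a mean-reverting OU fit")
    kappa = -math.log(b) / dt                                 # speed of mean reversion
    m = a / (1.0 - b)                                         # long-term mean
    sigma = math.sqrt(var_xi * 2.0 * kappa / (1.0 - b * b))   # instantaneous volatility
    sigma_eq = math.sqrt(var_xi / (1.0 - b * b))              # equilibrium volatility
    return kappa, m, sigma, sigma_eq

# Round trip with made-up "true" parameters: build (a, b, Var(xi)) from the
# exact discretization, then recover the OU parameters from them.
kappa_true, m_true, sigma_true, dt = 12.0, 0.01, 0.3, 1.0 / 252
b = math.exp(-kappa_true * dt)
a = (1.0 - b) * m_true
var_xi = sigma_true**2 * (1.0 - b * b) / (2.0 * kappa_true)
kappa, m, sigma, sigma_eq = ou_params_from_ar1(a, b, var_xi, dt)
```

The guard on `b` reflects that $\ln(b_i)$ is undefined for $b_i \le 0$ and that $b_i \ge 1$ would give a non-positive $\kappa_i$.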


Suppose we use an ETF factor. Also suppose we invest in a market-neutral portfolio in which we long $\$1$ in the stock and short $\$\beta_{ij}$ in the $j$-th ETF. Then the expected 1-day return is:

$$  E\left[\frac{dS_i(t)}{S_i(t)} - \beta_{ij} R_{ETF j}(t)\right] = E\left[\alpha_i dt + \beta_{ij} R_{ETF j}(t) + dX_i(t)- \beta_{ij} R_{ETF j}(t)\right] $$

$$ = E[\alpha_i dt + \kappa_i (m_i - X_i(t)) dt + \sigma_i dW_i(t)] $$
$$ = \alpha_i dt + \kappa_i (m_i - X_i(t)) dt $$
Thus if the cumulative residual $X_i(t)$ is sufficiently far above its mean $m_i$ (and the drift $\alpha_i$ is small), we expect a negative return from the long-short position, and vice versa.

The characteristic time-scale for mean reversion, $\tau_i=\frac{1}{\kappa_i}$, is the time over which the expected deviation of the process from its mean shrinks by a factor of $e$; the corresponding half-life is $\frac{\ln 2}{\kappa_i} = \tau_i \ln 2$. Both can be computed from the fitted parameters of the OU process. If $\kappa_i \gg 1$, the process reverts quickly to its mean, and the effect of drift is negligible. In the strategy, we select stocks with fast mean-reversion, i.e. $\tau_i \ll T_1$.

## E. Signal Generation
For an estimation window of 60 trading days, we set $T_1 = \frac{60}{252}$ (in years). This window incorporates at least one earnings cycle for the stocks. We select stocks whose mean-reversion time $\tau_i$ is less than half the window, i.e. $\tau_i < \frac{T_1}{2}$, which corresponds to 30 trading days:
$$\tau_i = \frac{1}{\kappa_i} < \frac{T_1}{2} = \frac{30}{252}\Rightarrow \kappa_i > \frac{252}{30} = 8.4$$
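
As a small sketch, the filter and the conversion of an annualized $\kappa_i$ into trading days might look like this. The function names are hypothetical; the half-life uses the standard relation $\ln(2)/\kappa_i$ for exponential decay.

```python
import math

TRADING_DAYS = 252
WINDOW_DAYS = 60
T1 = WINDOW_DAYS / TRADING_DAYS            # estimation window in years

def is_fast_mean_reverting(kappa):
    """True if tau = 1/kappa is shorter than half the estimation window."""
    return kappa > 0 and (1.0 / kappa) < T1 / 2

def reversion_times_in_days(kappa):
    """Return (tau, half-life) in trading days for an annualized kappa."""
    tau_days = TRADING_DAYS / kappa
    half_life_days = math.log(2) * TRADING_DAYS / kappa
    return tau_days, half_life_days

kappa_min = 2 / T1                         # threshold implied by tau < T1 / 2
```

For instance, a stock with $\kappa_i = 12$ passes the filter, while one with $\kappa_i = 5$ does not.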

**Case 1: Mean-Reversion without Drift**\
Suppose there is no drift term, i.e. $\alpha_i = 0$. Then the conditional expected residual return over the period $dt$ is:
$$E[dX_i(t) | X_i(s), s \leq t ] =\kappa_i (m_i - X_i(t)) dt $$
The equilibrium standard deviation is:
$$ \sigma_{i, eq} = \frac{\sigma_i}{\sqrt{2\kappa_i}} = \sigma_i \sqrt{\frac{\tau_i}{2}}$$
Define the dimensionless *s-score*:
$$ s_i(t) = \frac{X_i(t) - m_i}{\sigma_{i, eq}} = \frac{X_i(t) - m_i}{\sigma_i \sqrt{\tau_i/2}}$$
It's a measure of how far the process $X_i(t)$ is from its mean, in units of the equilibrium standard deviation. The trading rules are as follows:
- Buy-to-open if $s_i(t) < -\bar{s}_{bo}$
- Sell-to-open if $s_i(t) > +\bar{s}_{so}$
- Close-short position if $s_i(t) < +\bar{s}_{bc}$
- Close-long position if $s_i(t) > -\bar{s}_{sc}$

To keep the trade market-neutral, each position is hedged when it is opened. For example, on a buy-to-open signal, for every $\$1$ long in the stock we short $\$\beta_{i}$ of the sector ETF; with multiple ETFs (or factors), we short $\$\beta_{ij}$ of the $j$-th ETF for each $j = 1, \dots, m$. A sell-to-open signal takes the opposite positions.
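
The trading rules can be expressed as a small state machine. The sketch below is one possible encoding; the function names and the default cutoff values are illustrative assumptions and should be replaced by whatever thresholds are actually used.

```python
def s_score(X, m, sigma_eq):
    """Distance of X from its mean in units of the equilibrium std deviation."""
    return (X - m) / sigma_eq

def next_position(s, position, s_bo=1.25, s_so=1.25, s_bc=0.75, s_sc=0.50):
    """Apply the open/close rules to the current position.

    position: +1 = long stock / short hedge, -1 = short stock / long hedge, 0 = flat.
    The default cutoffs are illustrative placeholders.
    """
    if position == 0:
        if s < -s_bo:
            return +1          # buy to open
        if s > +s_so:
            return -1          # sell to open
        return 0
    if position == +1 and s > -s_sc:
        return 0               # close long position
    if position == -1 and s < +s_bc:
        return 0               # close short position
    return position            # otherwise keep holding
```

Using different cutoffs for opening and closing keeps a position open while the s-score sits between the two, which avoids flipping in and out on small moves.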

**Case 2: Mean-Reversion with Drift**\
Now suppose the drift term is non-zero, i.e. $\alpha_i \neq 0$. The conditional expectation of the residual return $\alpha_i dt + dX_i(t)$ over a period of $dt$ becomes:
$$E[\alpha_i dt + dX_i(t) | X_i(s), s \leq t ] = \alpha_i dt + \kappa_i (m_i - X_i(t)) dt = \kappa_i \left(\frac{\alpha_i}{\kappa_i} + m_i - X_i(t)\right) dt$$
$$= \kappa_i \left(\frac{\alpha_i}{\kappa_i} - \sigma_{i, eq} s_i\right) dt$$
Thus for this case, we use the *modified s-score*:
$$ s_{i, mod}(t) = s_i - \frac{\alpha_i}{\kappa_i \sigma_{i, eq}} = s_i - \frac{\alpha_i \tau_i}{\sigma_{i, eq}}$$
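
A quick numerical check of this definition is sketched below with made-up parameter values: the expected residual return per unit time, $\alpha_i + \kappa_i (m_i - X_i)$, should equal $-\kappa_i \sigma_{i, eq} s_{i, mod}$.

```python
def s_score(X, m, sigma_eq):
    return (X - m) / sigma_eq

def modified_s_score(X, m, sigma_eq, alpha, kappa):
    """s-score shifted by the drift term alpha / (kappa * sigma_eq)."""
    return s_score(X, m, sigma_eq) - alpha / (kappa * sigma_eq)

def expected_residual_drift(X, m, alpha, kappa):
    """alpha + kappa * (m - X): expected residual return per unit time."""
    return alpha + kappa * (m - X)

# Made-up values for illustration only
X, m, sigma_eq, alpha, kappa = 0.03, 0.01, 0.02, 0.05, 10.0
s = s_score(X, m, sigma_eq)
s_mod = modified_s_score(X, m, sigma_eq, alpha, kappa)
drift = expected_residual_drift(X, m, alpha, kappa)
```

A positive drift lowers the modified score relative to the plain s-score, so a sell-to-open signal is harder to trigger for a stock that has been drifting upward.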

## F. PnL and Portfolio Rebalancing
The portfolio equity evolves from time $t$ to $t + \Delta t$ according to:
$$ E_{t+\Delta t} = E_t + E_t r \Delta t + \sum_{i=1}^N Q_{it} R_{it} - \left( \sum_{i=1}^N Q_{it}\right) r \Delta t + \sum_{i=1}^N Q_{it} \frac{D_{it}}{S_{it}} - \sum_{i=1}^N | Q_{i(t+\Delta t)} - Q_{it} | \epsilon$$
where:
- $E_t$ is the portfolio equity at time $t$,
- $r$ is the risk-free rate,
- $Q_{it} = E_{t} \Lambda_t$ is the dollar amount invested in stock $i$ at time $t$ (short positions enter the sums with a negative sign), where $\Lambda_t=\frac{\text{Desired Leverage per side}}{\text{Number of Stocks per side}}$ is the fraction of equity invested in any single stock. For example, if we want long positions worth 2 times the portfolio equity and short positions worth 2 times the equity, and we hold $n$ stocks on each side, then $\Lambda_{t} = \frac{2}{n}$.
- $R_{it}$ is the return of stock $i$ at time $t$,
- $\sum_{i=1}^N Q_{it} R_{it}$ is the PnL from the stocks in the portfolio,
- $\left( \sum_{i=1}^N Q_{it}\right) r \Delta t$ is the interest cost of financing the net dollar position over the period $\Delta t$ (it vanishes for a dollar-neutral portfolio),
- $\frac{D_{it}}{S_{it}}$ is the dividend yield of stock $i$ at time $t$,
- $| Q_{i(t+\Delta t)} - Q_{it} | \epsilon$ is the transaction cost of rebalancing the portfolio, where $\epsilon$ is the transaction cost per side.

Setting $r = 0$ simplifies the equity equation to:
$$ E_{t+\Delta t} = E_t + \sum_{i=1}^N Q_{it} \left(R_{it}+\frac{D_{it}}{S_{it}}\right)  - \sum_{i=1}^N | Q_{i(t+\Delta t)} - Q_{it} | \epsilon$$
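
One step of the equity recursion can be sketched as a plain function. The name `equity_step`, the signed-position convention, and the default cost `eps=0.0005` are assumptions of this example.

```python
def equity_step(E, Q_now, Q_next, returns, dividend_yields,
                r=0.0, dt=1.0 / 252, eps=0.0005):
    """One step of the equity recursion.

    Q_now / Q_next are signed dollar positions at t and t + dt
    (negative for shorts); eps is an assumed cost per dollar traded.
    """
    interest_on_equity = E * r * dt
    trading_pnl = sum(q * R for q, R in zip(Q_now, returns))
    financing = sum(Q_now) * r * dt
    dividends = sum(q * d for q, d in zip(Q_now, dividend_yields))
    costs = eps * sum(abs(qn - q) for qn, q in zip(Q_next, Q_now))
    return E + interest_on_equity + trading_pnl - financing + dividends - costs

# Made-up example: $50 long and $50 short on $100 of equity
E_hold = equity_step(100.0, [50.0, -50.0], [50.0, -50.0], [0.01, -0.02], [0.0, 0.0])
E_close = equity_step(100.0, [50.0, -50.0], [0.0, 0.0], [0.01, -0.02], [0.0, 0.0])
```

`E_hold` keeps both positions and pays no costs, while `E_close` unwinds them and pays `eps` on the 100 dollars traded.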

**Portfolio Rebalancing**\
The hedge is not rebalanced continuously. When an opening signal (buy-to-open or sell-to-open) is triggered at time $t$, $Q_{it} = E_{t} \Lambda_t$ dollars are invested in each stock with a signal, so the position size depends on the equity value at time $t$. The position, including its hedge, is then held unchanged until the closing signal (close-long or close-short) is triggered, at which point it is unwound. Transaction costs are therefore incurred only when a position is opened or closed, not while it is held.

## G. Volume Incorporated Model
**Calendar Time vs. Trading Time**
- Calendar Time is the time that passes at a constant rate. In this context, each trading day is treated as an equal interval.
- Trading Time speeds up when the market is very active and slows down when it's quiet. Trading volume is a common measure of this activity. A day with 10 million shares traded is a "longer" day in trading time than a day with only 1 million shares traded.

**The Relationship with Volume**\
A specific way to implement trading time is to multiply daily returns by a factor that is inversely proportional to the trading volume.
- Low Volume Day: If a stock moves significantly (a "contrarian price signal") on very low volume, the model gives this signal more weight. The inverse of a small volume is a large number, so the return is magnified. The model's logic is, "Few people are trading, yet the price moved a lot. This might be a real inefficiency or an overreaction that is likely to revert. I should pay more attention to this."
- High Volume Day: If a stock moves significantly on very high volume, the model gives this signal less weight. The inverse of a large volume is a small number, so the return is diminished. The model's logic is, "Everyone is trading, and a new price has been established with high conviction (high volume). This price is more likely to be the 'correct' new price. I should be less willing to bet against this strong move."
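
A minimal sketch of such a volume adjustment is shown below. It scales each daily return by the ratio of a trailing average volume to that day's volume; the window length, the choice to include the current day in the average, and the function name are all assumptions of this example.

```python
def volume_adjusted_returns(returns, volumes, window=10):
    """Scale each daily return by (trailing average volume) / (that day's volume)."""
    adjusted = []
    for t, (R, V) in enumerate(zip(returns, volumes)):
        start = max(0, t - window + 1)
        trailing = volumes[start:t + 1]        # includes day t (an assumption)
        avg_volume = sum(trailing) / len(trailing)
        adjusted.append(R * avg_volume / V)    # shrinks high-volume moves, amplifies low-volume ones
    return adjusted

# Made-up data: the same 2% move on a high-volume day and on a low-volume day
high = volume_adjusted_returns([0.02, 0.02, 0.02], [1e6, 1e6, 4e6], window=3)
low = volume_adjusted_returns([0.02, 0.02, 0.02], [2e6, 2e6, 0.5e6], window=3)
```

In this toy data the last return in `high` is scaled down and the last return in `low` is scaled up, matching the two cases described above.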