
## Intro

In this post, I replicate Ledoit & Wolf's 2003 paper _Honey, I Shrunk the Sample Covariance Matrix_ on modern data (S&P stocks from 2005 to 2022), demonstrating how shrinking the covariance matrix in portfolio optimization increases the realized information ratio and decreases tracking error in active portfolio management.
 
**Corey Hoffstein Tweet picture**

To understand the methodology, I looked at selected chapters of Grinold & Kahn's _Active Portfolio Management_ (first published in 1995). In their paper, Ledoit & Wolf generate monthly portfolios of different sizes on US stocks from 1983 onward, using randomized excess return forecasts. They build each portfolio twice, once with a shrunk covariance matrix and once with the sample covariance matrix, then plot the realized information ratio across runs.

**Paper Boxplots**.

The portfolios generated were _active portfolios_. In active management, the manager uses information to forecast excess returns. The goal is to deviate from the benchmark weights far enough to harvest excess returns and beat the benchmark index, but not so far that risk becomes a problem.

## Portfolio Optimization

Ledoit & Wolf conduct the study as follows. At the beginning of each month, they form a value-weighted index of the $N$ largest stocks. They feed the benchmark weights $\textbf{w}_B$, alphas $\hat{\alpha}$, the covariance matrix $\hat{\Sigma}$ estimated from the last $T=60$ monthly returns, a minimum gain $g$, and an upper bound $c$ on any stock's total weight into a quadratic optimizer.

This produces an active weight vector $\textbf{x}$. The realized excess return is computed as $\textbf{x}^T\textbf{y}$, where $\textbf{y}$ is the vector of stock returns over the following month. Over the months, they compute the annualized ex post information ratio. The expected excess returns $\hat{\alpha}$ are random, so they repeat the experiment 50 times for each $N$. They then plot IR statistics. The optimization problem is:

\begin{align*}
\text{Minimize:} \quad & \textbf{x}^T \Sigma \textbf{x} \\
\text{such that:} \quad & \textbf{x}^T \alpha \geq g \\
& \textbf{x}^T \mathbf{1} = 0 \\
& \textbf{x} \geq -\textbf{w}_B \\
& \textbf{x} \leq c\mathbf{1} - \textbf{w}_B
\end{align*}

The main body of the code is as follows:

**opt code**
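For readers who want something self-contained to experiment with, here is a minimal sketch of the program above. It is not the code used for the results in this post: it assumes SciPy's general-purpose SLSQP solver in place of a dedicated quadratic optimizer, and the function name `active_weights` and the toy inputs are made up for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def active_weights(sigma, alpha, w_b, g, c):
    """Minimize x' Sigma x subject to the four constraints in the post."""
    n = len(alpha)
    constraints = [
        # x' alpha >= g  (SLSQP expects fun(x) >= 0 for "ineq")
        {"type": "ineq", "fun": lambda x: x @ alpha - g, "jac": lambda x: alpha},
        # x' 1 = 0
        {"type": "eq", "fun": lambda x: x.sum(), "jac": lambda x: np.ones(n)},
    ]
    # -w_B <= x <= c*1 - w_B, passed as per-asset box bounds
    bounds = list(zip(-w_b, c - w_b))
    res = minimize(
        fun=lambda x: x @ sigma @ x,
        x0=np.zeros(n),
        jac=lambda x: 2.0 * sigma @ x,  # gradient, valid for symmetric sigma
        bounds=bounds,
        constraints=constraints,
        method="SLSQP",
        options={"ftol": 1e-10, "maxiter": 500},
    )
    if not res.success:
        raise RuntimeError(f"optimizer failed: {res.message}")
    return res.x


# Toy inputs (made up, in percent units so the numbers are well scaled)
sigma = 25.0 * np.eye(5) + 5.0  # 5 assets with equal variances and covariances
alpha = np.array([2.0, 1.0, 0.0, -1.0, -2.0])
w_b = np.full(5, 0.2)  # equal-weight benchmark
g, c = 0.2, 0.3

x = active_weights(sigma, alpha, w_b, g, c)
print(np.round(x, 4))
```

In this toy case the active weights come out proportional to the alphas, because every asset has the same variance and the same covariance with every other asset. With an estimated covariance matrix that proportionality breaks, which is the effect studied later in the post.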

In this context, alpha is a cross-sectional vector of expected excess returns. Its formula comes from Grinold & Kahn: $$\alpha = Vol \cdot IC \cdot score$$ Scores are created from raw forecasts: $raw_{t+1}=e_t + \epsilon$. Here, $e_t$ is the realized excess return and $\epsilon$ is random noise drawn from a standard normal. $raw$ is then z-scored cross-sectionally to get the scores. $Vol$ is the rolling historical volatility of excess returns. $IC$ is the information coefficient, which comes from the Fundamental Law: $$IR \approx IC \cdot \sqrt{breadth}$$

Breadth is the annualized number of bets, or $12\cdot N$. The _ex-ante_ information ratio is fixed at 1.5, and we back out the IC, the correlation between alphas and realized returns, as $IC = 1.5/\sqrt{12 \cdot N}$. The better an alpha historically predicts excess returns, the more weight it gets.

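Besides the minimum gain ($\textbf{x}^T \alpha \geq g$), the constraints keep the portfolio long only ($\textbf{x} \geq -\textbf{w}_B$), make the active weights sum to zero ($\textbf{x}^T \mathbf{1} = 0$), and cap each stock's total weight at $c$ ($\textbf{x} \leq c\mathbf{1} - \textbf{w}_B$).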

**alpha code**
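To make the alpha recipe concrete, here is a small sketch of my reading of it for a single month. The names `implied_ic` and `make_alphas` are hypothetical, the inputs are simulated, and the noise scale is taken literally from the description above (a standard normal added to the realized excess return).

```python
import numpy as np


def implied_ic(n_stocks, ir=1.5, periods_per_year=12):
    """Back out IC from IR ~ IC * sqrt(breadth), with breadth = 12 * N."""
    return ir / np.sqrt(periods_per_year * n_stocks)


def make_alphas(realized, vol, ir=1.5, rng=None):
    """alpha = Vol * IC * score for one cross-section of stocks."""
    rng = np.random.default_rng() if rng is None else rng
    n = realized.shape[0]
    raw = realized + rng.standard_normal(n)  # raw forecast = realized + noise
    score = (raw - raw.mean()) / raw.std()   # cross-sectional z-score
    return vol * implied_ic(n, ir) * score


# Simulated inputs for illustration only
rng = np.random.default_rng(0)
realized = rng.normal(0.0, 0.05, size=100)  # one month of excess returns
vol = np.full(100, 0.05)                    # stand-in for rolling volatility
alphas = make_alphas(realized, vol, rng=rng)
```

With the IR fixed at 1.5, the implied IC is about 0.097 for $N=20$, 0.043 for $N=100$, 0.029 for $N=225$, and 0.022 for $N=400$, so each individual forecast carries less information as the universe grows.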

Just like the paper, we run the simulation 50 times for each of $N=20, 100, 225, 400$, once using the sample covariance matrix and once using the Ledoit-Wolf estimator, and plot our IR statistics.

**IR boxplot**
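Each point behind a boxplot is one annualized ex post IR. As a sketch of how I think about that number, assuming monthly data and the usual $\sqrt{12}$ annualization (the helper name `ex_post_ir` and the sample returns are made up):

```python
import numpy as np


def ex_post_ir(active_returns, periods_per_year=12):
    """Annualized IR: mean active return over its standard deviation."""
    r = np.asarray(active_returns, dtype=float)
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)


# Monthly active returns x'y for a handful of months (made-up numbers)
monthly_active = np.array([0.01, -0.005, 0.02, 0.0, 0.015])
ir = ex_post_ir(monthly_active)
print(round(ir, 2))
```

The mean scales with 12 and the standard deviation with $\sqrt{12}$ when annualizing, which is where the single $\sqrt{12}$ factor comes from.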

Our replication was not exact, as I didn't have access to historical market cap data or historical S&P constituents, only historical prices. The minimum gain $g$ is a floor on the portfolio's expected excess return $\textbf{x}^T\alpha$, so the weights must tilt toward the alphas by at least this amount.

**top n code**

To make do, we took 400 S&P 'survivors' with price histories spanning 2005 to 2022, took their current S&P benchmark weights and normalized them to sum to 1, then computed a 'custom' S&P index.
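A rough sketch of that workaround, assuming static weights and a returns array already restricted to the survivors. The function names and the three-stock example are hypothetical, chosen only to show the renormalization.

```python
import numpy as np


def top_n_benchmark(weights, n):
    """Pick the n largest current weights and renormalize them to sum to 1."""
    idx = np.argsort(weights)[::-1][:n]
    w_b = weights[idx] / weights[idx].sum()
    return idx, w_b


def index_returns(returns, idx, w_b):
    """Custom index return per period: weighted sum of constituent returns."""
    return returns[:, idx] @ w_b


# Made-up example: 3 stocks, 2 months, keep the 2 largest
weights = np.array([0.2, 0.5, 0.1])
returns = np.array([[0.02, 0.01, -0.01],
                    [0.00, -0.02, 0.03]])
idx, w_b = top_n_benchmark(weights, n=2)
bench = index_returns(returns, idx, w_b)
```

Because the weights are today's weights applied to the whole history, the custom index is not what an investor could have held in 2005, which is one reason the replication is only approximate.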



## Sample vs Shrunk Covariance

The formula for shrinkage is $$\hat{\Sigma}_{\text{Shrink}} = \delta^* F + (1 - \delta^*) S$$

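$S$ is the sample covariance matrix and $F$ is a structured estimator: the sample constant correlation matrix, which keeps the sample variances but gives every pair of stocks the same (average) correlation. We take a convex combination of $F$ and $S$, weighted by the shrinkage intensity $\delta^*$.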
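Here is a minimal sketch of the target $F$ and the convex combination. It takes the shrinkage intensity as an input rather than estimating $\delta^*$, which is the harder part of the paper and is not shown here; the function names and the 3x3 example matrix are assumptions for illustration.

```python
import numpy as np


def constant_correlation_target(S):
    """F: sample variances on the diagonal, average correlation off it."""
    n = S.shape[0]
    sd = np.sqrt(np.diag(S))
    corr = S / np.outer(sd, sd)
    r_bar = (corr.sum() - n) / (n * (n - 1))  # mean off-diagonal correlation
    F = r_bar * np.outer(sd, sd)
    np.fill_diagonal(F, np.diag(S))           # keep the sample variances
    return F


def shrink(S, delta):
    """delta * F + (1 - delta) * S for a caller-supplied delta in [0, 1]."""
    return delta * constant_correlation_target(S) + (1.0 - delta) * S


# Made-up covariance matrix for three assets
S = np.array([[4.0, 2.0, 0.0],
              [2.0, 9.0, 3.0],
              [0.0, 3.0, 16.0]])
F = constant_correlation_target(S)
sigma_shrunk = shrink(S, delta=0.5)
```

Setting `delta=0` returns the sample matrix and `delta=1` returns the target, so the intensity controls how much structure is imposed on the estimate.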

From our understanding, when $N > T$, that is, with more stocks than observations in the rolling window, the sample covariance matrix $S$ is singular because its rank is at most $T-1$ (proof skipped). This causes problems in the optimizer: the weight vectors $\textbf{x}$ it produces deviate from the alphas. To verify this empirically, we plot the cosine similarity of $\alpha$ with $\textbf{x}$ over time:

**cosine plots**. 
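Both ideas are easy to check numerically. The sketch below uses simulated returns rather than the S&P data, with $T=60$ and $N=100$, to show the rank deficiency, and defines the cosine similarity used for the comparison; the helper name `cosine_similarity` is my own.

```python
import numpy as np

# Simulated returns: T = 60 months, N = 100 stocks (not the post's data)
rng = np.random.default_rng(0)
T, N = 60, 100
returns = rng.standard_normal((T, N)) * 0.05

S = np.cov(returns, rowvar=False)  # N x N sample covariance
rank = np.linalg.matrix_rank(S)    # at most T - 1, so S cannot be inverted
print(S.shape, rank)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, e.g. alphas and weights."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

A cosine similarity near 1 means the active weights point in the same direction as the alphas, and values closer to 0 mean the optimizer is mostly reacting to something else, such as noise in the covariance estimate.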

Shrinkage aligns the weights better with the alphas. In addition, using the sample covariance matrix leads to more portfolio turnover. If we define turnover as $\sum_i |x_{i,t} - x_{i,t-1}|$ and plot it over time, we see the same effect.
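The turnover measure is a one-liner once the monthly weight vectors are stacked into a matrix. A sketch, assuming one row of active weights per month (the function name and numbers are illustrative):

```python
import numpy as np


def turnover(weights):
    """Sum over stocks of |x_t - x_{t-1}| for each pair of consecutive months."""
    w = np.asarray(weights, dtype=float)
    return np.abs(np.diff(w, axis=0)).sum(axis=1)


# Made-up active weights: 3 months, 3 stocks
x_history = np.array([[0.10, -0.10, 0.00],
                      [0.05, -0.05, 0.00],
                      [0.05, 0.05, -0.10]])
to = turnover(x_history)
print(to)
```

The result has one entry fewer than the number of months, since the first month has no previous weights to compare against.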

## Conclusion

In conclusion, this post replicates Ledoit & Wolf's 2003 paper _Honey, I Shrunk the Sample Covariance Matrix_. We code up the portfolio optimization procedure on data from 2005-2022 and replicate the ex-post information ratio boxplots to understand why shrinkage leads to better IRs. We then verify that the weights from the sample covariance matrix deviate further from the alphas and have higher turnover than the weights from the shrunk matrix.

While this paper is focused on risk, my next paper project will cover expected returns and factor investing. For a first paper in equities, I think this one was a good place to start.