# Week 3 Seminar — Portfolio Theory, Factor Models & Risk

The lecture told you that alpha is what remains after you subtract all the risks you knowingly took. It showed you that Markowitz's beautiful optimizer goes insane when fed noisy covariance estimates. And it introduced HRP as the tree-based escape hatch.

Now you're going to stress-test those claims yourself. In three exercises, you'll run Fama-French regressions on real stocks and discover how few of them have genuine alpha, separate signal from noise in a covariance matrix using random matrix theory, and pit three portfolio construction methods against each other in a rolling out-of-sample horse race. The results will be humbling for anyone who believes sophistication always wins.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import yfinance as yf
from scipy import stats
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram
from scipy.spatial.distance import squareform
import statsmodels.api as sm
import warnings
warnings.filterwarnings('ignore', category=FutureWarning)

plt.rcParams['figure.figsize'] = (10, 5)
plt.rcParams['figure.dpi'] = 100
sns.set_style('whitegrid')

def get_close(data):
    """Extract close prices, handling yfinance MultiIndex."""
    if isinstance(data.columns, pd.MultiIndex):
        return data['Close']
    return data[['Close']]

We need data for all three exercises up front: daily prices for a broad set of stocks and Fama-French factor returns. The stock universe spans sectors deliberately: tech, healthcare, financials, energy, utilities, consumer staples, industrials. Exercise 1 uses 20 of these tickers for factor regressions, Exercise 2 uses the full universe of roughly 100 names (95 tickers are listed below) for eigenvalue analysis, and Exercise 3 uses 40 for the portfolio horse race.

For Fama-French factors, we'll pull directly from Kenneth French's data library using `pandas_datareader`. These are the daily returns of long-short portfolios designed to isolate specific risk premia: market (Mkt-RF), size (SMB), value (HML), profitability (RMW), and investment conservatism (CMA).

In [None]:
# --- Data download: ALL data for ALL exercises ---
import pandas_datareader.data as web

# 20 stocks for Exercise 1 (FF regressions), spanning sectors
tickers_20 = [
    'AAPL', 'MSFT', 'NVDA', 'GOOGL', 'META',   # tech
    'JPM', 'BAC', 'GS', 'MS',                    # financials
    'JNJ', 'PFE', 'UNH', 'ABT',                  # healthcare
    'XOM', 'CVX', 'COP',                          # energy
    'DUK', 'SO', 'NEE',                           # utilities
    'PG'                                           # consumer staples
]

Now we'll define the broader universes and download everything in one pass. The 100-stock universe for Exercise 2 extends across sectors to give us a rich correlation structure — we want to see market factors, sector factors, and noise in the eigenvalue spectrum. The 40-stock universe for Exercise 3 sits in between: enough stocks for meaningful portfolio construction, few enough that the rolling optimization runs in reasonable time.

In [None]:
# Additional tickers for Exercise 2 (100 stocks) and Exercise 3 (40 stocks)
tickers_extra = [
    'AMZN', 'TSLA', 'AMD', 'INTC', 'CRM', 'ADBE', 'ORCL', 'CSCO',
    'NFLX', 'PYPL', 'V', 'MA', 'BRK-B', 'C', 'WFC', 'AXP',
    'MRK', 'LLY', 'TMO', 'ABBV', 'MDT', 'BMY', 'AMGN', 'GILD',
    'SLB', 'EOG', 'PSX', 'VLO', 'MPC', 'OXY',
    'D', 'AEP', 'EXC', 'SRE', 'WEC',
    'KO', 'PEP', 'COST', 'WMT', 'MCD', 'CL', 'MDLZ',
    'CAT', 'HON', 'UNP', 'RTX', 'DE', 'GE', 'MMM', 'LMT',
    'DIS', 'CMCSA', 'T', 'VZ', 'TMUS',
    'HD', 'LOW', 'TGT', 'SBUX', 'NKE',
    'BA', 'FDX', 'UPS', 'DAL', 'LUV',
    'SPG', 'AMT', 'PLD', 'O', 'WELL',
    'GD', 'NOC', 'TXT', 'HII', 'SNA'
]
all_tickers = sorted(set(tickers_20 + tickers_extra))  # sorted for a reproducible order

With our ticker lists ready, let's pull price data and Fama-French factors. We use six years of daily data: enough history for meaningful regressions but recent enough that the results reflect the current market regime. The date range 2019-2024 captures the pre-COVID calm, the March 2020 crash, the recovery rally, the 2022 rate-hike drawdown, and the 2023-2024 AI-fueled tech surge. That's a lot of regime variation in six years.

In [None]:
start, end = '2019-01-01', '2024-12-31'

raw = yf.download(all_tickers, start=start, end=end, auto_adjust=True)
prices = get_close(raw).dropna(axis=1, thresh=int(0.9 * len(raw)))
returns = prices.pct_change().dropna()

# Fama-French 5 factors (daily)
ff5 = web.DataReader(
    'F-F_Research_Data_5_Factors_2x3_daily', 'famafrench',
    start=start, end=end
)[0] / 100  # convert from percentage to decimal

ff5.index = ff5.index.to_timestamp() if hasattr(ff5.index, 'to_timestamp') else ff5.index
ff5.index = pd.to_datetime(ff5.index)

Let's verify we have clean, aligned data before proceeding. We need the stock returns and factor returns on the same dates — any mismatch here will silently corrupt every regression that follows.

In [None]:
# Align dates between stock returns and FF factors
common_idx = returns.index.intersection(ff5.index)
returns = returns.loc[common_idx]
ff5 = ff5.loc[common_idx]

available_20 = [t for t in tickers_20 if t in returns.columns]
available_40 = [t for t in (tickers_20 + tickers_extra)[:40]
                if t in returns.columns][:40]
available_100 = [t for t in returns.columns][:100]

pd.DataFrame({
    'Trading days': [len(common_idx)],
    'Stocks (Ex 1)': [len(available_20)],
    'Stocks (Ex 2)': [len(available_100)],
    'Stocks (Ex 3)': [len(available_40)],
    'FF factors': [ff5.shape[1]]
}, index=['Universe'])

Good. We should have roughly 1,500 trading days (six years, 2019 through 2024) and the full factor set. The RF column in the Fama-French data is the daily risk-free rate, which we'll subtract from stock returns to get excess returns for the regressions.

---

## Exercise 1: Fama-French Factor Regressions

**The question:** Do individual stocks have alpha, or are their returns just compensation for known risk factors?

When a hedge fund manager boasts about earning 25% last year, the sophisticated response isn't "congratulations" but "relative to what?" If NVDA returned 230% in 2023, that sounds spectacular. But NVDA has a market beta around 1.8, loads heavily on the growth side of HML, and benefited from an AI narrative that also drove momentum (a factor that sits outside the five-factor model used here). Strip all that out, and the question becomes: was there anything left? That residual, alpha, is what separates genuine skill from risk factor exposure that happened to pay off.

For each of 20 stocks spanning tech, financials, healthcare, energy, utilities, and consumer staples, you'll run the Fama-French 5-factor regression:

$$R_i - R_f = \alpha_i + \beta_i^{MKT}(R_m - R_f) + \beta_i^{SMB} \cdot SMB + \beta_i^{HML} \cdot HML + \beta_i^{RMW} \cdot RMW + \beta_i^{CMA} \cdot CMA + \epsilon_i$$

**Your tasks:**
1. Compute excess returns for each stock ($R_i - R_f$)
2. Run OLS regressions against the five Fama-French factors
3. Collect alpha, its t-statistic, R-squared, and all five factor betas
4. Identify which stocks have statistically significant alpha ($|t| > 2$)
5. Visualize the alpha distribution and factor loadings by sector

In [None]:
# YOUR EXPLORATION HERE


---
### ▶ Solution

In [None]:
# Run FF5 regressions for each of the 20 stocks
ff_factors = ff5[['Mkt-RF', 'SMB', 'HML', 'RMW', 'CMA']]
rf = ff5['RF']

results = []
for ticker in available_20:
    excess_ret = returns[ticker] - rf
    X = sm.add_constant(ff_factors)
    model = sm.OLS(excess_ret, X, missing='drop').fit()
    results.append({
        'Ticker': ticker,
        'Alpha (ann %)': model.params['const'] * 252 * 100,
        'Alpha t-stat': model.tvalues['const'],
        'R²': model.rsquared,
        'β_MKT': model.params['Mkt-RF'],
        'β_SMB': model.params['SMB'],
        'β_HML': model.params['HML'],
        'β_RMW': model.params['RMW'],
        'β_CMA': model.params['CMA']
    })

ff_df = pd.DataFrame(results).set_index('Ticker')
ff_df.round(3)

Look at that alpha column. Most values are modest, a few percent annualized in either direction, and few t-statistics exceed 2.0 in absolute value. A t-stat below 2 means we cannot reject the null hypothesis that alpha is zero. In plain English: for most of these stocks, the average return is statistically indistinguishable from what their exposure to known risk factors would predict. The market went up, they had market beta, they went up. No mystery, no skill, no detectable alpha.
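
It helps to see why the bar for a significant alpha is so high. The sketch below is a back-of-the-envelope approximation, not part of the regression output: it assumes i.i.d. residuals and factor means that are small relative to factor volatility, in which case the standard error of an annualized alpha is roughly the annualized residual volatility divided by the square root of the number of years. The 5% alpha and 25% residual volatility are hypothetical inputs.

```python
import numpy as np

def alpha_t_stat(alpha_ann, resid_vol_ann, years):
    """Approximate t-statistic of an annualized alpha.

    Assumption: SE(alpha) ~ annualized residual vol / sqrt(years),
    which holds for i.i.d. residuals and near-zero factor means.
    """
    se_ann = resid_vol_ann / np.sqrt(years)
    return alpha_ann / se_ann

# Hypothetical stock: 5% annual alpha, 25% annualized residual volatility
t_6y = alpha_t_stat(0.05, 0.25, years=6)

# Alpha that would be needed to reach |t| = 2 over the same six years
alpha_needed = 2 * 0.25 / np.sqrt(6)

print(f"t-stat for a 5% alpha over 6 years: {t_6y:.2f}")
print(f"Alpha needed for |t| = 2: {alpha_needed:.1%}")
```

Under these assumptions a stock with this much idiosyncratic noise would need an alpha of roughly 20% per year to clear the threshold in a six-year sample, which is why modest alphas rarely register as significant.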

Notice the R-squared values. They typically range from 0.3 to 0.7, meaning the five factors explain 30-70% of daily return variation. The remaining 30-70% is idiosyncratic — stock-specific news, earnings surprises, CEO tweets. That idiosyncratic portion is what your ML model will be hunting for in later weeks. But a large chunk of *that* is pure noise. The IC of 0.05 that seemed small? It's 0.05 of whatever signal exists in that leftover slice.

In [None]:
# Visualize: which stocks have significant alpha?
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

colors = ['#e74c3c' if abs(t) > 2 else '#95a5a6'
          for t in ff_df['Alpha t-stat']]
axes[0].barh(ff_df.index, ff_df['Alpha (ann %)'], color=colors)
axes[0].axvline(0, color='black', linewidth=0.8)
axes[0].set_xlabel('Annualized Alpha (%)')
axes[0].set_title('FF5 Alpha by Stock (red = |t| > 2)')

beta_cols = ['β_MKT', 'β_SMB', 'β_HML', 'β_RMW', 'β_CMA']
ff_df[beta_cols].plot(kind='bar', ax=axes[1], width=0.8)
axes[1].set_title('Factor Loadings by Stock')
axes[1].set_ylabel('Beta')
axes[1].legend(fontsize=8)
plt.tight_layout()
plt.show()

The left panel makes the alpha story visual: the red bars (statistically significant at roughly the 5% level) are the rare exceptions, not the rule. The grey bars, the majority, are stocks whose average return is statistically indistinguishable from what their factor exposure alone would deliver. If you'd built a portfolio of these stocks thinking you were earning alpha, you were actually just harvesting factor premia that you could have gotten more cheaply through an ETF.

The right panel is equally revealing. Look at the factor loadings and notice the sector patterns: tech stocks (AAPL, MSFT, NVDA) have high market betas and negative HML loadings (they're growth stocks, not value). Utilities (DUK, SO, NEE) have low market betas and tend to load positively on RMW — they're profitable, defensive businesses. Energy names (XOM, CVX, COP) load heavily on the value side. These loadings aren't random; they reflect the fundamental economic character of each business. When your ML model predicts that NVDA will outperform, the first thing to ask is: "Is this a genuine insight, or did my model just learn that high-beta growth stocks go up in bull markets?"

In [None]:
# Summary statistics
n_sig = (ff_df['Alpha t-stat'].abs() > 2).sum()
avg_r2 = ff_df['R²'].mean()
median_alpha = ff_df['Alpha (ann %)'].median()

summary = pd.DataFrame({
    'Stocks with |t| > 2': [n_sig],
    'Stocks with |t| < 2': [len(ff_df) - n_sig],
    'Avg R²': [f'{avg_r2:.3f}'],
    'Median Alpha (ann %)': [f'{median_alpha:.2f}']
}, index=['FF5 Results'])
summary

The summary table crystallizes the lesson: out of 20 stocks spanning six sectors, very few (likely two or three at most) show statistically significant alpha over this six-year period. The median alpha is close to zero. And the average R-squared tells us the factors explain a substantial portion of return variation, leaving relatively little room for idiosyncratic signal.

Here's what this means for your ML models going forward. If you build a return prediction model that simply learns to buy high-beta stocks in a bull market, the Fama-French regression will expose it instantly: your "alpha" will vanish once you control for the market factor. The only alpha worth having is the residual that survives after all known factors have taken their cut. That's a much harder target — and it's why an IC of 0.05 is considered genuinely valuable rather than pathetically small.
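
The "alpha vanishes once you control for the market" point can be sketched with simulated data. Everything here is hypothetical: the market's mean and volatility, the 1.8 beta, and the noise level are made-up inputs, and the regression uses a single market factor rather than the full five-factor model. The strategy is built with a true alpha of exactly zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 1500

# Simulated daily market excess returns (hypothetical: 10% mean, 16% vol, annualized)
mkt = rng.normal(0.10 / 252, 0.16 / np.sqrt(252), n_days)

# A "strategy" that is just 1.8x market exposure plus noise: true alpha is zero
strat = 1.8 * mkt + rng.normal(0.0, 0.01, n_days)

# Regress the strategy on a constant and the market factor
X = np.column_stack([np.ones(n_days), mkt])
coef, *_ = np.linalg.lstsq(X, strat, rcond=None)
alpha_daily, beta = coef

raw_mean_ann = strat.mean() * 252
alpha_ann = alpha_daily * 252
print(f"Raw mean return (ann): {raw_mean_ann:.1%}")
print(f"Estimated beta: {beta:.2f}")
print(f"Alpha after controlling for the market (ann): {alpha_ann:.1%}")
```

The raw mean return is whatever the market delivered, scaled by beta, plus an estimated alpha that is only sampling noise around the true value of zero. A model that learned this exposure has learned leverage, not skill.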

---

## Exercise 2: Eigenvalue Analysis & the Marchenko-Pastur Boundary

**The question:** How much of your covariance estimate is real market structure versus statistical noise?

The lecture showed that Markowitz's optimizer amplifies estimation error. But how bad is the problem, exactly? Random matrix theory gives us a precise answer. The Marchenko-Pastur distribution describes what the eigenvalue spectrum of a covariance matrix looks like when the data is *purely random* — no correlations, no structure, just noise. Any eigenvalue that falls within the MP bounds is indistinguishable from what you'd get by estimating covariance from random data. Only eigenvalues above the upper MP bound carry genuine signal.

For a matrix of $N$ stocks observed over $T$ days, the theoretical bounds are:

$$\lambda_{\pm} = \sigma^2 \left(1 \pm \sqrt{N/T}\right)^2$$

where $\sigma^2$ is the variance of the noise (typically estimated as the average eigenvalue of the bulk). When $N/T$ is large — many stocks relative to observations — the bounds widen and swallow more eigenvalues into the noise zone.
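
To see what the bounds mean in practice, one sanity check is to feed the formula pure noise. The sketch below simulates uncorrelated Gaussian "returns" (an assumption: real returns are fat-tailed) and compares the eigenvalues of their sample correlation matrix against the bounds. The helper name `mp_bounds` and the sizes 100 and 1,500 are illustrative choices.

```python
import numpy as np

def mp_bounds(n_assets, n_obs, sigma2=1.0):
    """Marchenko-Pastur lower and upper eigenvalue bounds for q = N/T."""
    q = n_assets / n_obs
    return sigma2 * (1 - np.sqrt(q))**2, sigma2 * (1 + np.sqrt(q))**2

rng = np.random.default_rng(42)
n_assets, n_obs = 100, 1500

# Pure noise: independent Gaussian "returns" with no true correlation
noise = rng.standard_normal((n_obs, n_assets))
eigs = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))

lam_minus, lam_plus = mp_bounds(n_assets, n_obs)
inside = np.mean((eigs >= lam_minus) & (eigs <= lam_plus))
print(f"MP bounds: [{lam_minus:.3f}, {lam_plus:.3f}]")
print(f"Sample eigenvalue range: [{eigs.min():.3f}, {eigs.max():.3f}]")
print(f"Share of eigenvalues inside the bounds: {inside:.0%}")
```

Even though the true correlation matrix is the identity (every eigenvalue equal to 1), the sample eigenvalues spread out across the whole MP interval. That spread is pure estimation noise, and it is the baseline against which real data should be judged.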

**Your tasks:**
1. Compute the sample correlation matrix for ~100 stocks
2. Extract eigenvalues and plot their distribution
3. Overlay the Marchenko-Pastur theoretical density
4. Count how many eigenvalues exceed the upper MP bound (these are signal)
5. Vary the N/T ratio: try 30, 50, 75, and the full ~100 stocks with the same time window, and observe how the noise proportion changes

In [None]:
# YOUR EXPLORATION HERE


---
### ▶ Solution

In [None]:
# Compute the sample correlation matrix and its eigenvalues
ret_100 = returns[available_100].dropna(axis=1)
N = ret_100.shape[1]
T = ret_100.shape[0]
q = N / T  # the critical ratio

corr_matrix = ret_100.corr()
eigenvalues = np.linalg.eigvalsh(corr_matrix.values)
eigenvalues = np.sort(eigenvalues)[::-1]  # descending

# Marchenko-Pastur bounds
sigma2 = 1.0  # for a correlation matrix
lambda_plus = sigma2 * (1 + np.sqrt(q))**2
lambda_minus = sigma2 * (1 - np.sqrt(q))**2

n_signal = np.sum(eigenvalues > lambda_plus)
pd.DataFrame({
    'N (stocks)': [N], 'T (days)': [T], 'q = N/T': [f'{q:.3f}'],
    'λ_upper (MP)': [f'{lambda_plus:.3f}'],
    'λ_lower (MP)': [f'{lambda_minus:.3f}'],
    'Signal eigenvalues': [n_signal],
    'Noise eigenvalues': [N - n_signal]
}, index=['Spectrum'])

There's the punchline: out of roughly 100 eigenvalues in the correlation matrix, only a handful exceed the Marchenko-Pastur upper bound. The rest are statistically indistinguishable from what you'd get if the stocks were completely uncorrelated and you just happened to estimate correlations from finite data. A 100-by-100 correlation matrix has 4,950 distinct pairwise correlations, but the genuine structure lives in maybe 5-10 dimensions.

In equity data, the largest eigenvalue almost always corresponds to the market factor: it captures the tendency of all stocks to move together. The next few typically correspond to sector factors (tech vs. energy vs. utilities) and style factors (growth vs. value, large vs. small). Everything below the MP boundary is noise that the Markowitz optimizer will happily treat as diversification opportunity. It isn't.

In [None]:
# Plot eigenvalue histogram with MP density overlay
def mp_density(x, q, sigma2=1.0):
    """Marchenko-Pastur probability density for ratio q = N/T."""
    lp = sigma2 * (1 + np.sqrt(q))**2
    lm = sigma2 * (1 - np.sqrt(q))**2
    density = np.zeros_like(x, dtype=float)
    mask = (x >= lm) & (x <= lp)
    # Use the q argument rather than the globals N and T, so the
    # function stays correct when called with a different ratio
    density[mask] = (1 / (2 * np.pi * sigma2 * q)) * (
        np.sqrt((lp - x[mask]) * (x[mask] - lm)) / x[mask]
    )
    return density

fig, ax = plt.subplots(figsize=(10, 5))
ax.hist(eigenvalues, bins=40, density=True, alpha=0.6,
        color='steelblue', edgecolor='white', label='Sample eigenvalues')
x_mp = np.linspace(0.01, lambda_plus * 1.5, 500)
ax.plot(x_mp, mp_density(x_mp, q), 'r-', lw=2.5,
        label=f'MP density (q={q:.3f})')
ax.axvline(lambda_plus, color='red', ls='--', lw=1.5,
           label=f'λ+ = {lambda_plus:.2f}')
ax.set_xlabel('Eigenvalue')
ax.set_ylabel('Density')
ax.set_title('Eigenvalue Distribution vs. Marchenko-Pastur Bound')
ax.legend()
plt.tight_layout()
plt.show()

The histogram tells the story at a glance. The bulk of eigenvalues clusters within the MP density curve — these are noise. The few eigenvalues to the right of the red dashed line (the upper MP bound) are the signal components. The largest one, far to the right, is the market factor: when the S&P moves, nearly everything moves with it. The next handful are sector and style factors.

Think about what this means for portfolio optimization. The Markowitz optimizer inverts this matrix, which means it places the *most* weight on the *smallest* eigenvalues — the ones that are pure noise. It's fitting to the exact part of the data that contains no information. The signal eigenvalues, which correspond to real economic structure, get the least attention. This is why the optimizer produces wild, concentrated portfolios: it's chasing phantoms in the noise floor.
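
The claim that inversion favors the smallest eigenvalues follows from a basic fact: a matrix and its inverse share eigenvectors, and the eigenvalues of the inverse are the reciprocals. A minimal sketch with a hypothetical 3-asset correlation matrix (the numbers are made up for illustration):

```python
import numpy as np

# Hypothetical 3-asset correlation matrix with one dominant common factor
corr = np.array([
    [1.00, 0.80, 0.70],
    [0.80, 1.00, 0.75],
    [0.70, 0.75, 1.00],
])

eigvals = np.linalg.eigvalsh(corr)                     # ascending order
inv_eigvals = np.linalg.eigvalsh(np.linalg.inv(corr))  # ascending order

# The eigenvalues of the inverse are the reciprocals of the originals
print("eigenvalues of corr:     ", np.round(eigvals, 3))
print("eigenvalues of inv(corr):", np.round(inv_eigvals, 3))

# How much more the inverse weights the smallest direction than the largest
amplification = eigvals.max() / eigvals.min()
print(f"Smallest direction is weighted {amplification:.1f}x more than the largest")
```

In a 100-stock matrix the smallest eigenvalues sit deep in the noise band, so this reciprocal weighting hands the most influence to the least reliable directions.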

In [None]:
# Vary N/T: compare 30, 50, 75, and all N stocks with the same T
results_nt = []
for n_stocks in [30, 50, 75, N]:
    sub = ret_100.iloc[:, :n_stocks]
    n_i, t_i = sub.shape[1], sub.shape[0]
    q_i = n_i / t_i
    lp_i = (1 + np.sqrt(q_i))**2
    eigs_i = np.linalg.eigvalsh(sub.corr().values)
    n_sig_i = np.sum(eigs_i > lp_i)
    results_nt.append({
        'N': n_i, 'T': t_i, 'q = N/T': round(q_i, 4),
        'λ_upper': round(lp_i, 3),
        'Signal eigs': n_sig_i,
        'Noise %': f'{100 * (n_i - n_sig_i) / n_i:.1f}%'
    })

pd.DataFrame(results_nt).set_index('N')

As $N/T$ increases (more stocks, same observation window), the MP upper bound rises, and the noise zone swallows more eigenvalues. With 30 stocks and ~1,500 days ($q \approx 0.02$), the upper bound sits near 1.30 and a larger share of eigenvalues clears it. With roughly 100 stocks ($q \approx 0.065$), the bound rises to about 1.57 and roughly 90% or more of the eigenvalues are noise. This is the fundamental curse of high-dimensional covariance estimation: the more stocks you add, the worse your estimate gets relative to the number of genuine factors.

In practice, institutional portfolios often hold 200-500 stocks. With $N = 500$ and five years of daily data ($T = 1{,}260$), $q \approx 0.4$, and the MP upper bound jumps to about $\lambda_+ \approx 2.66$. Only the very largest eigenvalues, perhaps 10-15, survive that threshold. The other 485+ are noise. This is precisely why Ledoit-Wolf shrinkage (pulling the small eigenvalues toward a structured target) and HRP (avoiding matrix inversion entirely) are not optional refinements. They're survival strategies.
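
A rough sketch of what shrinkage does to the eigenvalue spectrum follows. It is not the Ledoit-Wolf estimator itself: the intensity `delta=0.5` is hand-picked for illustration, whereas Ledoit-Wolf estimates it from the data. The target here is a scaled identity matrix, and the data are simulated noise with a deliberately high $N/T$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_assets, n_obs = 50, 120          # deliberately high q = N/T

# Sample covariance of pure noise; the true covariance is the identity
sample_cov = np.cov(rng.standard_normal((n_obs, n_assets)), rowvar=False)

def shrink_to_identity(cov, delta):
    """Linear shrinkage toward a scaled identity target.

    delta is a hand-picked intensity for illustration only;
    Ledoit-Wolf estimates it from the data instead.
    """
    mu = np.trace(cov) / cov.shape[0]      # average eigenvalue
    return (1 - delta) * cov + delta * mu * np.eye(cov.shape[0])

shrunk_cov = shrink_to_identity(sample_cov, delta=0.5)

cond_sample = np.linalg.cond(sample_cov)
cond_shrunk = np.linalg.cond(shrunk_cov)
print(f"Condition number, sample: {cond_sample:.1f}")
print(f"Condition number, shrunk: {cond_shrunk:.1f}")
```

Shrinkage pulls every eigenvalue toward the average while leaving the total variance (the trace) unchanged, so the smallest eigenvalues rise and the matrix becomes much safer to invert.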

---

## Exercise 3: HRP vs. Markowitz vs. 1/N

**The question:** Does the fancier approach actually win when it counts — out of sample, with real data, after transaction costs?

DeMiguel, Garlappi & Uppal (2009) dropped a bomb on the portfolio optimization community: across the 14 optimization models they tested, none consistently beat naive equal weighting (1/N) out of sample. Lopez de Prado later proposed HRP, a method that uses the covariance matrix for clustering rather than inversion, avoiding the estimation-error amplification that kills Markowitz. But claims in papers aren't the same as claims in your portfolio. Let's run the horse race on real data.

You'll implement a rolling out-of-sample test: estimate the covariance matrix on the trailing 252 days, construct three portfolios (equal-weight, Markowitz minimum-variance with Ledoit-Wolf shrinkage, and HRP), hold for 21 trading days, then re-estimate and rebalance. Track returns, Sharpe ratio, maximum drawdown, and turnover for all three.

**Your tasks:**
1. Implement equal-weight, Markowitz (with Ledoit-Wolf), and HRP portfolio construction
2. Run a rolling 252-day estimation / 21-day hold out-of-sample backtest
3. Compute cumulative returns, Sharpe ratio, max drawdown, and turnover
4. Visualize cumulative performance and the drawdown profiles

In [None]:
# YOUR EXPLORATION HERE


---
### ▶ Solution

We'll build each allocation method as a function, then run them through a shared backtesting loop. The key design choice: Markowitz uses Ledoit-Wolf shrinkage (the minimum fix for covariance noise), not the raw sample covariance. This gives Markowitz its best shot — we're not stacking the deck against it.

In [None]:
from sklearn.covariance import LedoitWolf

ret_40 = returns[available_40].dropna(axis=1)
n_assets = ret_40.shape[1]

def equal_weight(cov, n):
    """1/N allocation."""
    return np.ones(n) / n

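def markowitz_minvar(cov, n):
    """Minimum-variance weights from a (pre-shrunk) covariance matrix.

    Long-only is approximated by clipping negative weights to zero and
    renormalizing; this is a heuristic, not a constrained optimization.
    """
    inv_cov = np.linalg.pinv(cov)
    ones = np.ones(n)
    w = inv_cov @ ones
    w = w / w.sum()
    w = np.clip(w, 0, None)  # drop short positions
    w = w / w.sum() if w.sum() > 0 else ones / n
    return w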

The Markowitz implementation above uses the minimum-variance formulation and approximates a long-only constraint by clipping negative weights to zero and renormalizing (a shortcut, not a true constrained optimization). This is more stable than the tangency portfolio, which requires expected return estimates we don't have. The Ledoit-Wolf shrinkage will be applied to the covariance matrix before it's passed to the optimizer.
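
The unconstrained minimum-variance weights have a closed form, $w \propto \Sigma^{-1}\mathbf{1}$, which is easy to check by hand in the simplest case. The sketch below uses two hypothetical uncorrelated assets, where the formula reduces to weighting each asset by the inverse of its variance. The function name `min_variance_weights` is illustrative and is not the notebook's `markowitz_minvar`.

```python
import numpy as np

def min_variance_weights(cov):
    """Unconstrained minimum-variance weights: inv(cov) @ 1, normalized to sum to 1."""
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)   # solves cov @ raw = 1 without forming the inverse
    return raw / raw.sum()

# Hypothetical: two uncorrelated assets with 20% and 10% volatility
cov = np.diag([0.20**2, 0.10**2])
w = min_variance_weights(cov)
port_vol = np.sqrt(w @ cov @ w)

print("weights:", np.round(w, 3))
print(f"portfolio volatility: {port_vol:.4f}")
```

The less volatile asset gets four times the weight because its variance is one quarter as large, and the combined portfolio is less volatile than either asset alone. With correlated assets the off-diagonal entries enter through the inverse, which is exactly where estimation noise gets amplified.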

Now for HRP — the three-step algorithm that never inverts the covariance matrix. We compute a distance matrix from correlations, run hierarchical clustering, and then recursively bisect the portfolio, allocating risk inversely proportional to each sub-cluster's variance.

In [None]:
def hrp_weights(returns_window):
    """Hierarchical Risk Parity, after Lopez de Prado (2016).

    Uses Ward linkage here; the original paper uses single linkage.
    """
    cov = returns_window.cov().values
    corr = returns_window.corr().values
    n = cov.shape[0]

    def cluster_var(items):
        # Variance of the inverse-variance-weighted portfolio of a cluster,
        # so clusters of different sizes are compared on the same footing
        sub = cov[np.ix_(items, items)]
        ivp = 1.0 / np.diag(sub)
        ivp /= ivp.sum()
        return ivp @ sub @ ivp

    # Step 1: distance matrix from correlations
    # (clip guards against tiny negatives from floating-point error)
    dist = np.sqrt(np.clip(0.5 * (1 - corr), 0, None))
    np.fill_diagonal(dist, 0)
    condensed = squareform(dist, checks=False)
    link = linkage(condensed, method='ward')
    # Step 2: quasi-diagonalization (leaf order)
    sort_ix = dendrogram(link, no_plot=True)['leaves']
    # Step 3: recursive bisection
    w = np.ones(n)
    cluster_items = [sort_ix]
    while cluster_items:
        cluster_items_next = []
        for items in cluster_items:
            if len(items) <= 1:
                continue
            mid = len(items) // 2
            left, right = items[:mid], items[mid:]
            var_l = cluster_var(left)
            var_r = cluster_var(right)
            # The riskier half receives the smaller share of capital
            alpha_lr = var_r / (var_l + var_r)
            w[left] *= alpha_lr
            w[right] *= (1 - alpha_lr)
            cluster_items_next += [left, right]
        cluster_items = cluster_items_next
    return w / w.sum()

Notice what HRP never does: it never calls `np.linalg.inv()` or `np.linalg.pinv()`. It uses the covariance matrix only to compute distances (step 1) and cluster-level variances (step 3). Both of these operations are robust to noise in the off-diagonal entries because they aggregate rather than invert. That's the fundamental insight: inversion amplifies errors, aggregation averages them out.
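
Two small pieces of the algorithm are worth checking by hand: the correlation-to-distance mapping from step 1 and the capital split from step 3. The helper names and the cluster variances below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def corr_distance(rho):
    """Correlation distance used in step 1: sqrt(0.5 * (1 - rho))."""
    return np.sqrt(0.5 * (1.0 - rho))

for rho in (1.0, 0.0, -1.0):
    print(f"rho = {rho:+.1f} -> distance = {corr_distance(rho):.3f}")

def bisection_split(var_left, var_right):
    """Capital shares for the left and right clusters, inverse to their variance."""
    alpha = var_right / (var_left + var_right)
    return alpha, 1.0 - alpha

# Hypothetical cluster variances: the left cluster is four times as risky
w_left, w_right = bisection_split(var_left=0.04, var_right=0.01)
print(f"left share = {w_left:.2f}, right share = {w_right:.2f}")
```

Perfectly correlated assets sit at distance 0 and perfectly anti-correlated assets at distance 1, so the clustering groups assets that move together. The split then gives the riskier cluster the smaller share, and no matrix inverse appears anywhere.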

Now let's wire up the rolling backtest. We estimate on the trailing 252 days (one year), construct all three portfolios, and hold for 21 trading days (one month) before rebalancing.

In [None]:
lookback = 252
hold = 21
dates = ret_40.index

port_returns = {'1/N': [], 'Markowitz': [], 'HRP': []}
port_weights = {'1/N': [], 'Markowitz': [], 'HRP': []}
port_dates = []

for start_idx in range(lookback, len(dates) - hold, hold):
    train = ret_40.iloc[start_idx - lookback:start_idx]
    test = ret_40.iloc[start_idx:start_idx + hold]
    
    # Ledoit-Wolf covariance
    lw = LedoitWolf().fit(train.values)
    cov_lw = lw.covariance_
    
    w_eq = equal_weight(cov_lw, n_assets)
    w_mv = markowitz_minvar(cov_lw, n_assets)
    w_hrp = hrp_weights(train)
    
    for name, w in [('1/N', w_eq), ('Markowitz', w_mv), ('HRP', w_hrp)]:
        port_returns[name].extend((test.values @ w).tolist())
        port_weights[name].append(w)
    port_dates.extend(test.index.tolist())

The backtest loop generates daily portfolio returns for each method over the entire out-of-sample period. Each month, the three methods see the same trailing data and construct their portfolios. The equal-weight method ignores the data entirely — it always allocates 1/N to each stock. Markowitz uses the Ledoit-Wolf shrunk covariance to find minimum-variance weights. HRP uses hierarchical clustering on the correlation structure.
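
The indexing in a rolling backtest is where look-ahead bias usually sneaks in, so it is worth isolating. The sketch below reproduces the window arithmetic with a hypothetical helper, `rolling_windows`, and an assumed sample length of 1,500 days. It yields index positions only and touches no data.

```python
def rolling_windows(n_days, lookback, hold):
    """Yield (train_start, train_end, test_end) index triples.

    Training covers [train_start, train_end) and the hold period covers
    [train_end, test_end), so no test day appears in its own training set.
    """
    for start_idx in range(lookback, n_days - hold, hold):
        yield start_idx - lookback, start_idx, start_idx + hold

# Assumed sample length of 1,500 days, for illustration
windows = list(rolling_windows(n_days=1500, lookback=252, hold=21))
print("first window:", windows[0])
print("last window: ", windows[-1])
print("number of rebalances:", len(windows))
```

Each hold period starts exactly where the previous one ended, and each training window ends exactly where its hold period begins. The first 252 days are used only for estimation, so the out-of-sample record is about one year shorter than the data.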

Let's compute the performance metrics and see who wins.

In [None]:
# Compute performance metrics for each method
perf_df = pd.DataFrame(port_returns, index=port_dates)
cum = (1 + perf_df).cumprod()

def compute_metrics(rets):
    """Compute key portfolio performance metrics."""
    ann_ret = rets.mean() * 252
    ann_vol = rets.std() * np.sqrt(252)
    sharpe = ann_ret / ann_vol if ann_vol > 0 else 0
    cum_r = (1 + rets).cumprod()
    peak = cum_r.cummax()
    dd = (cum_r - peak) / peak
    mdd = dd.min()
    calmar = ann_ret / abs(mdd) if mdd != 0 else 0
    downside = rets[rets < 0].std() * np.sqrt(252)
    sortino = ann_ret / downside if downside > 0 else 0
    return {
        'Ann Return %': ann_ret * 100,
        'Ann Vol %': ann_vol * 100,
        'Sharpe': sharpe, 'Sortino': sortino,
        'Max DD %': mdd * 100, 'Calmar': calmar
    }

metrics = pd.DataFrame({name: compute_metrics(perf_df[name])
                         for name in perf_df.columns}).round(3)
metrics

Study that table carefully. Equal weight is likely competitive with or even superior to Markowitz on risk-adjusted metrics. HRP should sit near the top on Calmar ratio (return per unit of maximum drawdown), because it avoids the concentrated positions that cause Markowitz to blow up during market stress. The Sharpe ratios will be close — probably within 0.1-0.3 of each other — but the drawdown profiles tell very different stories.

Remember: Markowitz is using Ledoit-Wolf shrinkage, a standard and well-regarded fix for covariance estimation error, so this is close to Markowitz's best case. Without shrinkage, the results would likely be worse: the raw sample covariance tends to produce wilder weights and deeper drawdowns.
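
Maximum drawdown drives the Calmar ratio, so it helps to trace the calculation on a handful of numbers. The five returns below are hypothetical. The function follows the same convention as `compute_metrics`: the wealth curve starts after the first return, and drawdown is measured against the running peak.

```python
import numpy as np

def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded wealth curve."""
    wealth = np.cumprod(1 + np.asarray(returns, dtype=float))
    peak = np.maximum.accumulate(wealth)      # running high-water mark
    return ((wealth - peak) / peak).min()

# Hypothetical return path: up 10%, down 20%, up 5%, down 10%, up 30%
rets = [0.10, -0.20, 0.05, -0.10, 0.30]
mdd = max_drawdown(rets)
print(f"Max drawdown: {mdd:.1%}")
```

The trough comes after the fourth return, when wealth is about 0.83 against a peak of 1.10. The final 30% gain does not quite recover the peak, which shows how a path can end close to its high while still carrying a drawdown of roughly 24%.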

In [None]:
# Cumulative returns and drawdown comparison
fig, axes = plt.subplots(2, 1, figsize=(12, 8), sharex=True)

for name in perf_df.columns:
    axes[0].plot(cum.index, cum[name], label=name, lw=1.5)
axes[0].set_ylabel('Cumulative Return ($1 invested)')
axes[0].set_title('Out-of-Sample Cumulative Returns')
axes[0].legend()

for name in perf_df.columns:
    cum_r = (1 + perf_df[name]).cumprod()
    dd = (cum_r - cum_r.cummax()) / cum_r.cummax()
    axes[1].plot(dd.index, dd * 100, label=name, lw=1.2)
axes[1].set_ylabel('Drawdown (%)')
axes[1].set_title('Drawdown Profile')
axes[1].legend()
plt.tight_layout()
plt.show()

The drawdown panel is where the real differences live. During March 2020, the COVID crash, watch how deep each method falls. Markowitz, despite its "optimal" construction, can suffer the steepest drawdown because its concentrated positions leave less room for error when correlations spike (and they tend to spike during crashes, exactly when you need diversification most, as LTCM discovered in 1998).

HRP's drawdown profile tends to be shallower because the hierarchical clustering respects the natural grouping of assets. Instead of one massive bet computed from an inverted covariance matrix, it distributes risk across clusters. When tech crashes, the utilities cluster acts as a buffer — and HRP allocated to both based on their cluster structure, not based on a covariance estimate that was computed during calm markets.

Now let's add the final dimension: turnover and transaction costs.

In [None]:
# Compute turnover for each method
turnover = {}
for name in ['1/N', 'Markowitz', 'HRP']:
    ws = port_weights[name]
    monthly_turnover = [np.sum(np.abs(ws[i+1] - ws[i]))
                        for i in range(len(ws)-1)]
    turnover[name] = np.mean(monthly_turnover)

cost_bps = 10  # 10 bps round-trip
cost = cost_bps / 10_000

turnover_df = pd.DataFrame({
    'Avg Monthly Turnover': turnover,
    'Annual Turnover %': {k: v * 12 * 100 for k, v in turnover.items()},
    'Annual Cost Drag %': {k: v * 12 * cost * 100
                           for k, v in turnover.items()}
}).round(3)
turnover_df

Turnover is the silent killer that separates paper returns from real returns. Equal weight has zero turnover by the measure used here, which compares target weights from one rebalance to the next and ignores the small trades needed to undo price drift within the holding period. Markowitz typically has the highest turnover: each month, the optimizer looks at a slightly different covariance matrix and suggests dramatically different weights, even though the underlying market structure hasn't changed much. That instability isn't just aesthetically ugly; it costs real money on every trade.

At 10 bps round-trip (a reasonable assumption for liquid large-cap US equities), the annual cost drag may seem modest in percentage terms. But relative to the return differences between methods — which are often just 1-3% per year — even a 0.5% annual cost drag can flip the ranking. The "most sophisticated" approach doesn't just sometimes lose to the simplest one; it often loses *because* of its sophistication, since the constant re-optimization generates unnecessary trading.
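
The cost arithmetic is simple enough to verify by hand. The sketch below uses two hypothetical weight vectors and applies the same convention as the turnover table: turnover is the sum of absolute weight changes, charged at 10 bps per unit, with 12 rebalances per year.

```python
import numpy as np

# Hypothetical target weights at two consecutive rebalances
w_prev = np.array([0.25, 0.25, 0.25, 0.25])
w_new = np.array([0.40, 0.10, 0.30, 0.20])

# Turnover: total absolute change in weights (0.15 + 0.15 + 0.05 + 0.05)
turnover = np.abs(w_new - w_prev).sum()

cost_per_unit = 10 / 10_000          # 10 bps, as assumed in the notebook
rebalances_per_year = 252 // 21      # 21-day hold -> 12 rebalances
annual_drag = turnover * cost_per_unit * rebalances_per_year

print(f"Turnover per rebalance: {turnover:.0%}")
print(f"Annual cost drag: {annual_drag:.2%}")
```

A 40% turnover at every rebalance costs a little under half a percent per year at this cost level, which is the same order of magnitude as the return gaps between methods.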

Harry Markowitz himself, when asked how he invested his own retirement money, admitted he used a simple 50/50 split between stocks and bonds. Not the efficient frontier. Not the tangency portfolio. Just 50/50. The inventor of optimal portfolio construction used the dumbest possible approach for his own wealth. He understood something the optimizer doesn't: estimation error is the real enemy, and simplicity is a hedge against it.

In [None]:
# Net-of-cost Sharpe comparison
net_returns = {}
for name in perf_df.columns:
    monthly_cost = turnover.get(name, 0) * cost
    daily_cost = monthly_cost / hold
    net_returns[name] = perf_df[name] - daily_cost

net_df = pd.DataFrame(net_returns)
net_metrics = pd.DataFrame({
    name: compute_metrics(net_df[name]) for name in net_df.columns
}).round(3)

comparison = pd.concat([
    metrics.loc['Sharpe'].rename('Gross Sharpe'),
    net_metrics.loc['Sharpe'].rename('Net Sharpe')
], axis=1)
comparison

The gross-to-net comparison reveals the final truth of this exercise. Whichever method had the highest gross Sharpe may not have the highest net Sharpe once trading costs enter the picture. Equal weight barely changes — its turnover is near zero, so costs are negligible. Markowitz suffers the largest drop. HRP typically sits in the middle, achieving near-best risk-adjusted performance with moderate turnover.

This is the pattern you'll encounter throughout the course: the most intellectually satisfying approach (optimize everything! invert the covariance matrix! find the tangency portfolio!) often loses to simpler methods in practice. Not because the theory is wrong, but because the *inputs* to the theory — expected returns, covariance estimates — are noisy, and the optimization machinery amplifies that noise. HRP represents a middle path: it uses the structure in the data (correlations cluster into economic sectors) without demanding the precision that inversion requires.

---

## Key Takeaways

- **Most stocks have no alpha.** Fama-French factor regressions explained the majority of return variation for our 20-stock panel. The few stocks with significant alpha were the exceptions, not the rule — and even those might be artifacts of the specific time period. If your ML model just learns to buy high-beta growth stocks, the FF regression will expose it instantly.

- **Your covariance matrix is mostly noise.** For roughly 100 stocks and six years of daily data, only about 5-10 eigenvalues carried genuine signal. The remaining 90 or so were indistinguishable from what random data would produce. The Marchenko-Pastur law gives you a precise, theoretically grounded way to draw the line between signal and noise, so use it.

- **Sophistication can lose to simplicity.** In the rolling out-of-sample horse race, equal weight was competitive with (and sometimes beat) Markowitz optimization, especially after transaction costs. HRP offered the best combination of risk-adjusted returns, reasonable drawdowns, and moderate turnover. The DeMiguel et al. (2009) result held up: 1/N is a surprisingly tough benchmark.

- **Turnover is the hidden cost of optimization.** Markowitz portfolios change dramatically each month as the covariance estimate shifts, generating trading costs that erode returns. Simple methods have low turnover by construction. When evaluating any portfolio strategy, always report net-of-cost performance — gross Sharpe ratios are a fiction.

In the homework, you'll scale these ideas to 100 stocks, add Fama-MacBeth cross-sectional regressions to estimate factor risk premia, and generate full QuantStats tear sheets for the three portfolio methods. The patterns you saw here — estimation noise, the turnover tax, the stubborn competitiveness of equal weight — will only become more pronounced at scale.