# Week 3 Homework: Factor Model & Portfolio Construction

**ML for Quantitative Finance** | Portfolio Theory, Factor Models & Risk

## Your Mission

You're about to build your first complete investment pipeline: from raw data to factor analysis to portfolio construction to performance evaluation. This is the workflow that every quantitative analyst at every quant fund executes daily. The difference between your version and theirs is scale (they have 3,000 stocks and 50+ factors), not methodology.

Your job is to answer a specific question: **does sophisticated portfolio construction add value over naive approaches?** The theory says yes — Markowitz proved it in 1952 and won the Nobel Prize. The empirics are less kind. DeMiguel, Garlappi & Uppal (2009) showed that equal-weighting beats mean-variance optimization for most realistic parameter settings — a paper with over 5,000 citations, essentially saying the most sophisticated technique from 70 years of financial theory loses to the simplest possible approach. Lopez de Prado's Hierarchical Risk Parity (HRP) claims to fix this. You're going to run the horse race yourself, with real data, real factor models, and real transaction costs, and report the honest results.

There's a twist. You'll also run Fama-MacBeth cross-sectional regressions to estimate factor risk premia. This is the regression methodology that Weeks 4-5 will build on — you're getting a preview of the engine that powers cross-sectional alpha research. If the value premium (HML) shows up as insignificant in your results, congratulations: you've just replicated one of the biggest ongoing debates in academic finance with a few lines of code.

When Harry Markowitz — the inventor of mean-variance optimization — was asked how he invested his own retirement money, he admitted he used a simple 50/50 split between stocks and bonds. The inventor of the optimal portfolio used the dumbest possible approach for his own money. By the end of this homework, you'll understand exactly why.

## Deliverables

1. **Data acquisition** — Download daily returns for 100 US stocks (2015-2024) + Fama-French 5 factors + momentum. Align dates, handle missing data, acknowledge survivorship bias.
2. **Fama-MacBeth regressions** — Cross-sectional regressions to estimate factor risk premia. Report average risk premium, t-statistic, and R-squared per factor.
3. **Three portfolios** — Construct equal-weight (1/N), mean-variance with Ledoit-Wolf shrinkage (long-only, max Sharpe), and Hierarchical Risk Parity, all with monthly rebalancing.
4. **Performance evaluation** — Annualized return, Sharpe, Sortino, max drawdown, Calmar, VaR(95%), CVaR(95%) for all three portfolios.
5. **Transaction cost analysis** — Net-of-cost returns at 10 bps round-trip. Which portfolio suffers most?
6. **QuantStats tear sheets** — Full tear sheets for all three portfolios vs. SPY benchmark.

In [None]:
%%capture
!pip install yfinance pandas-datareader PyPortfolioOpt quantstats scipy

We install the necessary libraries quietly. `PyPortfolioOpt` handles Markowitz and HRP optimization, `pandas-datareader` pulls Fama-French factors directly from Kenneth French's data library, and `quantstats` generates the professional tear sheets you'll see at the end. These are the same tools quant desks use — the difference is they wrap them in proprietary infrastructure. The core math is identical.

In [None]:
import numpy as np
import pandas as pd
import yfinance as yf
import matplotlib.pyplot as plt
import seaborn as sns
import quantstats as qs
import warnings
from pandas_datareader import data as pdr
from scipy import stats
from pypfopt import expected_returns, risk_models
from pypfopt import EfficientFrontier, HRPOpt

plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['figure.dpi'] = 100
sns.set_style('whitegrid')
warnings.filterwarnings('ignore', category=FutureWarning)

A single utility function below handles yfinance's MultiIndex quirk. If you download multiple tickers, yfinance returns a MultiIndex with `('Close', 'AAPL')` as column headers. This helper flattens that so you can work with clean ticker-named columns everywhere. Write it once, never think about it again.

In [None]:
def get_close(data):
    """Extract close prices, handling yfinance MultiIndex."""
    if isinstance(data.columns, pd.MultiIndex):
        return data['Close']
    return data[['Close']]
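
To see what the helper does with each column layout, here is a small check on synthetic frames. The two frames below are made up to mimic the layouts described above; nothing is downloaded, and the exact layout yfinance returns may differ between versions.

```python
import numpy as np
import pandas as pd

def get_close(data):
    """Extract close prices, handling yfinance MultiIndex."""
    if isinstance(data.columns, pd.MultiIndex):
        return data['Close']
    return data[['Close']]

dates = pd.date_range('2024-01-02', periods=3, freq='B')

# Multi-ticker layout (assumed): (field, ticker) column pairs
multi = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4),
    index=dates,
    columns=pd.MultiIndex.from_product([['Close', 'Volume'], ['AAPL', 'MSFT']]),
)
# Single-ticker layout (assumed): flat field names
single = pd.DataFrame({'Close': [1.0, 2.0, 3.0], 'Volume': [10, 20, 30]}, index=dates)

close_multi = get_close(multi)    # columns become the ticker names
close_single = get_close(single)  # one-column DataFrame named 'Close'
print(list(close_multi.columns), list(close_single.columns))
```

Either way the result is a DataFrame, so downstream code can treat one ticker and many tickers the same.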

---

## Deliverable 1: Data Acquisition — 100 Stocks + Fama-French Factors (2015-2024)

We need 100 liquid US stocks spanning multiple sectors. The universe below intentionally includes Technology, Healthcare, Financials, Energy, Consumer Discretionary, Consumer Staples, Industrials, Utilities, REITs, and Materials. This diversity matters: if you only pick tech stocks, your "diversified" portfolio is just a tech bet with extra steps, and your covariance matrix will have one giant eigenvalue (market/tech) and 99 noise eigenvalues.

A word on survivorship bias before we start: every stock in this list is currently alive and listed. We're picking stocks that survived to 2024. The ones that went bankrupt, got delisted, or merged out of existence between 2015 and 2024 are invisible to us. This inflates historical returns by roughly 1-3% per year, depending on the study. We acknowledge this and move on — fixing it properly requires delisted data from CRSP, which costs more than a university course budget.

In [None]:
# 100 stocks across sectors — large-cap + select mid-cap
TICKERS = [
    # Technology (20)
    'AAPL','MSFT','GOOGL','AMZN','NVDA','META','AVGO','CSCO','ORCL','CRM',
    'ADBE','ACN','TXN','QCOM','INTC','AMD','IBM','INTU','NOW','AMAT',
    # Healthcare (15)
    'JNJ','UNH','PFE','ABBV','MRK','TMO','ABT','DHR','LLY','BMY',
    'AMGN','GILD','MDT','ISRG','SYK',
    # Financials (15)
    'JPM','BAC','WFC','GS','MS','BLK','SCHW','AXP','C','USB',
    'PNC','TFC','CB','MMC','ICE',
]

That's 50 tickers across the three heaviest sectors. Now the remaining 50, spread across Energy, Consumer, Industrials, Utilities, REITs, and Materials. The goal is genuine cross-sector diversity — when the market sells off, utilities and staples tend to hold up better than tech and discretionary. That divergence is the raw material for portfolio optimization.

In [None]:
TICKERS += [
    # Energy (8)
    'XOM','CVX','COP','SLB','EOG','MPC','PSX','VLO',
    # Consumer Discretionary (10)
    'TSLA','HD','MCD','NKE','LOW','SBUX','TJX','BKNG','MAR','YUM',
    # Consumer Staples (8)
    'PG','KO','PEP','COST','WMT','PM','CL','KMB',
    # Industrials (10)
    'HON','UNP','UPS','CAT','DE','RTX','LMT','GE','MMM','FDX',
    # Utilities (6)
    'NEE','DUK','SO','D','AEP','EXC',
    # REITs & Materials (8)
    'AMT','PLD','SPG','CCI','LIN','APD','ECL','SHW',
]

Now let's pull the price data. This is roughly 100 tickers over 10 years of daily data — about 250,000 data points. Your laptop earns its keep on this one. We'll also download SPY separately as our benchmark for the QuantStats tear sheets later.

In [None]:
raw = yf.download(TICKERS + ['SPY'], start='2015-01-01', end='2024-12-31',
                  auto_adjust=True, progress=True)
prices = get_close(raw).dropna(axis=1, how='all')
prices.shape

At this point only tickers that yfinance couldn't find at all (columns that are entirely NaN) have been dropped. Tickers with partial history are still in the frame; the next cell removes them. That's fine: we'll work with whatever survives. The important thing is we have broad sector coverage and enough history for meaningful covariance estimation.

Let's compute daily log returns and handle the alignment. We forward-fill up to 5 days (to handle holidays and brief trading halts) and then drop any remaining NaN columns. A stock that's missing more than 5 consecutive days of data isn't worth the estimation noise it introduces.

In [None]:
prices = prices.ffill(limit=5).dropna(axis=1)

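# Separate SPY benchmark. Fail loudly if it is missing rather than
# silently treating the last stock column as the benchmark.
if 'SPY' not in prices.columns:
    raise KeyError("SPY is missing from the price data; re-run the download cell")
spy_prices = prices.pop('SPY')
spy_returns = np.log(spy_prices / spy_prices.shift(1)).dropna()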

# Stock returns
returns = np.log(prices / prices.shift(1)).dropna()
returns.shape
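
The forward-fill rule and the log-return calculation above can be checked on a toy price frame. The column names and prices below are invented for illustration: one series has a 2-day gap (filled), the other a 6-day gap (one day too long, so the column is dropped).

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2024-01-01', periods=9, freq='B')
toy = pd.DataFrame({
    # 2 consecutive missing days: within the 5-day limit
    'SHORT_GAP': [10, np.nan, np.nan, 11, 12, 13, 14, 15, 16],
    # 6 consecutive missing days: the 6th stays NaN after ffill(limit=5)
    'LONG_GAP': [10, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 11, 12],
}, index=idx, dtype=float)

cleaned = toy.ffill(limit=5).dropna(axis=1)
print(list(cleaned.columns))

# Log returns add up over time: their sum equals the log of the total price ratio
log_ret = np.log(cleaned / cleaned.shift(1)).dropna()
total_from_sum = log_ret['SHORT_GAP'].sum()
total_from_prices = np.log(cleaned['SHORT_GAP'].iloc[-1] / cleaned['SHORT_GAP'].iloc[0])
print(total_from_sum, total_from_prices)
```

Note that a forward-filled day produces a log return of exactly zero, which slightly understates volatility for stocks with gaps.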

Now the Fama-French factors. We pull the 5 factors (Mkt-RF, SMB, HML, RMW, CMA) plus Momentum from Kenneth French's data library via `pandas_datareader`. These are daily factor returns expressed in percentage points, so we divide by 100 to get decimal returns. The risk-free rate (RF) comes along for free — it's the 1-month T-bill rate, which we'll need for excess return calculations.

In [None]:
ff5 = pdr.DataReader('F-F_Research_Data_5_Factors_2x3_daily', 'famafrench',
                     start='2015-01-01', end='2024-12-31')[0]
mom = pdr.DataReader('F-F_Momentum_Factor_daily', 'famafrench',
                     start='2015-01-01', end='2024-12-31')[0]

ff5 = ff5 / 100
mom = mom / 100
factors = ff5.join(mom).dropna()
factors.columns

We have the five Fama-French factors plus Momentum. Let's align the dates between our stock returns and the factor returns — they need to share the same trading calendar. Any date that's missing from either dataset gets dropped.

In [None]:
common_idx = returns.index.intersection(factors.index)
returns = returns.loc[common_idx]
spy_returns = spy_returns.reindex(common_idx).dropna()
factors = factors.loc[common_idx]
common_idx = returns.index.intersection(spy_returns.index)
returns = returns.loc[common_idx]
factors = factors.loc[common_idx]
spy_returns = spy_returns.loc[common_idx]

tickers = returns.columns.tolist()
n_stocks = len(tickers)
f'Universe: {n_stocks} stocks, {len(returns)} trading days ({returns.index[0].date()} to {returns.index[-1].date()})'

That's Deliverable 1 complete. We have a clean universe of stocks with aligned factor data. The survivorship bias is real but documented: every stock here was listed for the entire sample period. In professional settings, you'd use a point-in-time database like CRSP to include delisted stocks. For our purposes, the bias inflates returns by roughly 1-3% annually, but it affects all three strategies, so it is unlikely to change their relative ranking, which is what we actually care about.

---

## Deliverable 2: Fama-MacBeth Cross-Sectional Regressions

Here's where things get methodologically interesting. The lecture showed time-series regressions: for each stock, regress its returns on the factors over time to estimate betas. Fama-MacBeth flips the axis. For each month, you run a **cross-sectional** regression: regress that month's stock returns on their factor betas (estimated from prior data). The slope coefficients tell you how much the market paid for each unit of factor exposure that month.

Average those monthly coefficients over time, and you get the factor risk premium — the extra return investors earn for bearing each type of risk. If the t-statistic exceeds 2, the premium is statistically significant. The big question for recent data: is the value premium (HML) still alive? Fama and French themselves acknowledged in 2020 that it has weakened significantly. Your regressions will tell you whether it's merely weak or effectively dead.

### Step 1: Estimate rolling factor betas for each stock

For each stock, we need its loading on each factor. The textbook choice is to estimate it from the prior 60 months (approximately 1,260 trading days) of data; the implementation here uses a shorter window for speed. Either way, the window rolls so that betas evolve over time, which is critical: a stock's sensitivity to the market, to value, or to momentum isn't constant. Tesla in 2015 had a very different factor profile than Tesla in 2023. These rolling betas become the right-hand-side variables in our monthly cross-sectional regressions.

In [None]:
# Clean whitespace in factor column names (the momentum column can arrive padded, e.g. 'Mom   ')
factors.columns = factors.columns.str.strip()
rf = factors['RF']
# Every column except the risk-free rate is a factor
factor_names = [c for c in factors.columns if c != 'RF']

# Excess returns
excess_ret = returns.subtract(rf, axis=0)

We subtract the risk-free rate from each stock's return to get excess returns. This is important: the Fama-French model explains **excess** returns (returns above the risk-free rate), not raw returns. The distinction matters less when rates are near zero (2015-2021) but becomes meaningful when the 1-month T-bill yields 4-5% (2023-2024).
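
A toy check of the subtraction above may help, since the `axis=0` argument is easy to overlook. The tickers and numbers are invented; the point is only that each day's risk-free rate is removed from every stock's return on that same day.

```python
import pandas as pd

idx = pd.date_range('2024-01-02', periods=3, freq='B')
toy_returns = pd.DataFrame({'AAA': [0.010, -0.020, 0.005],
                            'BBB': [0.000, 0.015, -0.010]}, index=idx)
# Assumed daily risk-free rate, roughly 5% per year divided by 252
toy_rf = pd.Series([0.0002, 0.0002, 0.0002], index=idx)

# axis=0 aligns the Series with the row index (dates); the default
# alignment for DataFrame-minus-Series is on column labels instead
toy_excess = toy_returns.subtract(toy_rf, axis=0)
print(toy_excess)
```
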

Now let's estimate rolling betas. We'll use a 252-day (1-year) rolling window for computational tractability, running OLS for each stock over each window. The 60-month ideal is computationally expensive at daily frequency for 100 stocks, so we use 252 days as a practical balance.

In [None]:
# YOUR CODE HERE
# Step 1: Choose a rolling window (e.g., 252 days)
# Step 2: For each monthly date, estimate factor betas using OLS
#         on the prior window of excess returns vs. factor returns
# Step 3: Run a cross-sectional regression: this month's returns ~ betas
# Step 4: Store the cross-sectional coefficients (gamma)
# Step 5: Average gammas over time and compute t-statistics
#
# Hint: np.linalg.lstsq can solve OLS for all stocks at once
# Hint: The Fama-MacBeth t-stat = mean(gamma) / (std(gamma) / sqrt(T))

---
## ━━━ SOLUTION: Deliverable 2 ━━━

The student workspace above sketches the approach. Here's the production implementation. We estimate betas using straightforward OLS for each stock over each rolling window, then run the cross-sectional regression month by month.

In [None]:
BETA_WINDOW = 252
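# Last actual trading day of each month. Calendar month-ends can fall on
# weekends or holidays, which would make index lookups by date fail.
monthly_dates = pd.DatetimeIndex(returns.index.to_series().resample('ME').last().dropna())
monthly_dates = monthly_dates[monthly_dates >= returns.index[BETA_WINDOW + 21]]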

X_factors = factors[factor_names].values
factor_cols = factors[factor_names].columns.tolist()

def estimate_betas(window_ret, window_factors):
    """OLS betas for all stocks against factors over a window."""
    X = np.column_stack([np.ones(len(window_factors)), window_factors])
    # (X'X)^{-1} X'Y for all stocks at once
    coeffs = np.linalg.lstsq(X, window_ret, rcond=None)[0]
    return coeffs[1:]  # drop intercept; shape (n_factors, n_stocks)

That `np.linalg.lstsq` call solves OLS for all 100 stocks simultaneously — no loop over tickers needed. The result is a (n_factors x n_stocks) matrix of betas for each window. Vectorized linear algebra is the difference between this running in seconds versus minutes. Now let's build the rolling beta estimates and run the cross-sectional regressions.
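
As a quick sanity check on the vectorized solve, the sketch below compares it with an explicit loop over stocks on simulated data. The dimensions, seed, and noise levels are arbitrary assumptions chosen only to make the comparison run.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, N = 252, 3, 5                       # days, factors, stocks (toy sizes)
F = rng.normal(0.0, 0.01, size=(T, K))    # simulated factor returns
true_betas = rng.normal(1.0, 0.5, size=(K, N))
R = F @ true_betas + rng.normal(0.0, 0.005, size=(T, N))

X = np.column_stack([np.ones(T), F])

# One call: the right-hand side is a (T, N) matrix, one column per stock
betas_all = np.linalg.lstsq(X, R, rcond=None)[0][1:]          # (K, N)

# Same regressions, one stock at a time
betas_loop = np.column_stack([
    np.linalg.lstsq(X, R[:, j], rcond=None)[0][1:] for j in range(N)
])

print(np.allclose(betas_all, betas_loop))
```

The two results agree because OLS with a shared design matrix decouples across columns of the response.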

In [None]:
gamma_list = []  # monthly cross-sectional regression coefficients

# Stop one month early: the final date has no following month to explain
for i, dt in enumerate(monthly_dates[:-1]):
    # Position of the last trading day on or before dt
    loc = returns.index.get_indexer([dt], method='ffill')[0]
    if loc < BETA_WINDOW + 21:
        continue
    # Betas estimated from a window that ends 21 trading days before dt
    beta_window = slice(loc - BETA_WINDOW - 21, loc - 21)
    betas = estimate_betas(
        excess_ret.iloc[beta_window].values,
        X_factors[beta_window]
    )  # (n_factors, n_stocks)

    # Excess return over the following month: trading days after dt, through next_dt
    next_loc = returns.index.get_indexer([monthly_dates[i + 1]], method='ffill')[0]
    month_ret = excess_ret.iloc[loc + 1:next_loc + 1].sum()  # sum of daily log excess returns

    # Cross-sectional regression: R_i = gamma_0 + sum(gamma_k * beta_ik)
    X_cs = np.column_stack([np.ones(n_stocks), betas.T])
    gamma = np.linalg.lstsq(X_cs, month_ret.values, rcond=None)[0]
    gamma_list.append(gamma)

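Each iteration runs one cross-sectional regression: one month's returns regressed on betas estimated from earlier data. The coefficient on each factor tells you how much the market rewarded that factor exposure that month. The Fama-MacBeth insight is that the time series of these monthly coefficients gives you both the estimate and its standard error: the average is the estimated risk premium, and the month-to-month variation of the coefficients yields a standard error that is robust to correlation across stocks within a month. One caveat: the betas are themselves estimated, so the premia carry some errors-in-variables bias.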

Let's compute the average risk premia and their t-statistics.

In [None]:
gammas = np.array(gamma_list)
avg_gamma = gammas.mean(axis=0)
std_gamma = gammas.std(axis=0, ddof=1)
t_stats = avg_gamma / (std_gamma / np.sqrt(len(gammas)))

fm_results = pd.DataFrame({
    'Avg Premium (monthly)': avg_gamma,
    'Avg Premium (annual %)': avg_gamma * 12 * 100,
    't-statistic': t_stats,
    'Significant (|t|>2)': np.abs(t_stats) > 2
}, index=['Intercept'] + factor_cols)

fm_results.round(4)

Look at those t-statistics carefully. Over long samples, the market risk premium (Mkt-RF) and momentum are among the most robust factors, but with roughly 100 large-cap stocks and less than a decade of monthly regressions, don't be surprised if even they fall short of significance. The value premium (HML) is the one to watch: if its t-statistic is below 2 in absolute value, or the estimated premium is negative, your data agree with what Fama and French acknowledged in 2020: the value premium has weakened dramatically since the 2008 crisis.

This isn't just an academic curiosity. If you build an ML model that loads heavily on the value factor, you're implicitly betting that the value premium will return. Maybe it will. But the last 15 years of data say otherwise. The profitability factor (RMW) and investment factor (CMA) tell a different story — they've been more stable, which is why some quant funds have shifted from value to quality as their primary signal.

The intercept deserves attention too. If it's significantly positive, there's a component of returns that none of the factors capture — potential alpha in the cross-section that a well-designed ML model might exploit.
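
To see the second pass and the t-statistic in isolation, here is a sketch on simulated data with a known premium. It is a simplification, not the homework solution: there is a single factor, the betas are assumed known (so the first pass and its estimation error are skipped), and every parameter value is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(42)
n_months, n_assets = 240, 200
true_premium = 0.01                       # assumed monthly factor premium

betas = rng.uniform(0.5, 1.5, size=n_assets)
factor = rng.normal(true_premium, 0.04, size=n_months)    # realized factor returns
noise = rng.normal(0.0, 0.05, size=(n_months, n_assets))  # idiosyncratic returns
rets = factor[:, None] * betas[None, :] + noise           # (n_months, n_assets)

# One cross-sectional regression per month: r_i = gamma_0 + gamma_1 * beta_i.
# lstsq accepts a matrix right-hand side, so all months are solved in one call.
X_cs = np.column_stack([np.ones(n_assets), betas])
gammas = np.linalg.lstsq(X_cs, rets.T, rcond=None)[0].T   # (n_months, 2)

# Fama-MacBeth estimate and standard error from the time series of gammas
avg_gamma = gammas.mean(axis=0)
se_gamma = gammas.std(axis=0, ddof=1) / np.sqrt(n_months)
t_stats = avg_gamma / se_gamma
print(avg_gamma, t_stats)
```

The average slope tracks the realized mean of the factor, and the intercept stays near zero because the simulated returns contain no alpha.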

---

## Deliverable 3: Build Three Portfolios (1/N, Markowitz, HRP)

This is the main event. We're going to construct three portfolios, rebalance them monthly, and track their performance over the full sample. The three contenders:

1. **Equal-weight (1/N)** — The no-brainer. Put the same dollar amount in every stock. No optimization, no estimation, no parameters. DeMiguel et al. (2009) found that none of the 14 optimization models they tested consistently beat it out of sample. The bar to clear.

2. **Mean-variance (Markowitz with Ledoit-Wolf)** — The Nobel Prize-winning approach, with the Ledoit-Wolf shrinkage fix to tame the covariance estimation problem. Long-only constraint to prevent the optimizer from going completely insane with short positions.

3. **Hierarchical Risk Parity (HRP)** — Lopez de Prado's tree-based alternative. Clusters correlated stocks, allocates risk top-down through the hierarchy, and never inverts the covariance matrix. Developed while managing $13 billion at Guggenheim Partners.

Deliverable 4 is where it gets interesting — you'll see which approach actually wins when the numbers come in. Spoiler: it's not the one with the fanciest math.

In [None]:
# Monthly rebalancing dates: the last actual trading day of each month
# (calendar month-ends can fall on weekends, which breaks date lookups)
rebal_dates = pd.DatetimeIndex(returns.index.to_series().resample('ME').last().dropna())
# Need at least 252 days of history for covariance estimation
rebal_dates = rebal_dates[rebal_dates >= returns.index[252]]

# Simple returns for portfolio math (log returns don't aggregate across assets)
simple_returns = prices.pct_change().dropna()
simple_returns = simple_returns.loc[returns.index]

# Leave at least one month of forward returns after the last rebalance
rebal_dates = rebal_dates[rebal_dates <= simple_returns.index[-22]]
len(rebal_dates)

An important subtlety: we switch to simple (arithmetic) returns for portfolio construction. Log returns are additive across time but not across assets — you can't compute a portfolio's log return as the weighted sum of individual log returns. Simple returns, on the other hand, aggregate cleanly across assets: $R_p = \sum_i w_i R_i$. We use log returns for time-series analysis (Deliverable 2) and simple returns for cross-sectional portfolio math (Deliverables 3-5).
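
A two-asset example makes the point concrete. The weights and returns below are invented: one stock gains 10%, the other loses 10%, held in equal amounts.

```python
import numpy as np

w = np.array([0.5, 0.5])
simple = np.array([0.10, -0.10])     # simple returns of the two assets
log_r = np.log1p(simple)             # the same moves expressed as log returns

port_simple = w @ simple                  # exact portfolio simple return
port_log_exact = np.log1p(port_simple)    # exact portfolio log return
port_log_naive = w @ log_r                # weighted sum of log returns (not the same thing)

print(port_simple, port_log_exact, port_log_naive)
```

The portfolio is flat, yet the weighted sum of log returns comes out negative. The gap is small for one day but compounds over a multi-year backtest.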

Now let's build the three portfolio strategies. We'll store the weights at each rebalancing date, then compute forward returns.

In [None]:
def get_equal_weights(n):
    """1/N equal weights."""
    return np.ones(n) / n

def get_markowitz_weights(hist_prices):
    """Max-Sharpe with Ledoit-Wolf shrinkage, long-only, 10% cap per stock.

    Expects a window of prices: PyPortfolioOpt converts prices to returns internally.
    """
    mu = expected_returns.mean_historical_return(hist_prices, frequency=252)
    S = risk_models.CovarianceShrinkage(hist_prices, frequency=252).ledoit_wolf()
    ef = EfficientFrontier(mu, S, weight_bounds=(0, 0.10))
    try:
        ef.max_sharpe(risk_free_rate=0.02)
        w = ef.clean_weights()
        return np.array([w.get(t, 0) for t in hist_prices.columns])
    except Exception:
        # Optimizer failed (e.g. no asset beats the risk-free rate): fall back to 1/N
        return get_equal_weights(len(hist_prices.columns))

Notice the `weight_bounds=(0, 0.10)` constraint on Markowitz. Without it, the optimizer would cheerfully put 40% of your wealth in a single stock — exactly the instability the lecture warned about. The 10% cap is a practical guardrail used by real funds. Even so, Markowitz will still produce concentrated portfolios relative to equal-weight.

The `try/except` is not laziness — it's production hygiene. The Markowitz optimizer can fail when the covariance matrix is near-singular or when no portfolio achieves a positive Sharpe ratio in the estimation window. When it fails, we fall back to equal-weight rather than crashing the entire backtest. This happens more often than you'd expect, especially during crisis periods.
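
The instability that the weight cap guards against can be seen in a two-asset toy problem. This sketch uses the textbook unconstrained tangency formula, with weights proportional to $\Sigma^{-1}\mu$, rather than PyPortfolioOpt's constrained solver; the volatility, correlation, and expected excess returns are assumptions chosen to make the effect visible.

```python
import numpy as np

def tangency_weights(mu, cov):
    """Unconstrained max-Sharpe weights: proportional to inv(cov) @ mu, scaled to sum to 1."""
    raw = np.linalg.solve(cov, mu)
    return raw / raw.sum()

vol, rho = 0.20, 0.95                 # two very similar, highly correlated assets
cov = vol**2 * np.array([[1.0, rho], [rho, 1.0]])

# Identical expected excess returns: split the money evenly
w_base = tangency_weights(np.array([0.08, 0.08]), cov)
# Raise one estimate by a single percentage point
w_bumped = tangency_weights(np.array([0.09, 0.08]), cov)

print(w_base, w_bumped)
```

A one-point change in a single input moves the answer from an even split to a heavily leveraged long-short position. Since expected returns are estimated with far more than one point of error, bounds like `weight_bounds=(0, 0.10)` do a lot of the practical work.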

In [None]:
def get_hrp_weights(hist_returns):
    """Hierarchical Risk Parity weights."""
    try:
        hrp = HRPOpt(returns=hist_returns)
        w = hrp.optimize()
        return np.array([w.get(t, 0) for t in hist_returns.columns])
    except Exception:
        return get_equal_weights(len(hist_returns.columns))

HRP's implementation is deceptively simple: one call to `HRPOpt`. Under the hood, it's doing hierarchical clustering on the correlation matrix, quasi-diagonalizing the covariance matrix so correlated assets are adjacent, then recursively bisecting the tree to allocate risk. The key property: no matrix inversion anywhere. The covariance matrix is only used for distances (clustering) and cluster-level variances (allocation) — operations that are far more robust to estimation noise than inversion.
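
For readers who want to see the three stages spelled out, here is a compact sketch built on SciPy's hierarchical clustering. It illustrates the idea and is not PyPortfolioOpt's implementation: the helper names `cluster_var` and `hrp_weights`, the single-linkage choice, and the simulated data are all assumptions made for this example.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform

def cluster_var(cov, idx):
    """Variance of a cluster under inverse-variance weights within the cluster."""
    sub = cov[np.ix_(idx, idx)]
    ivp = 1.0 / np.diag(sub)
    ivp /= ivp.sum()
    return float(ivp @ sub @ ivp)

def hrp_weights(returns):
    cov = returns.cov().values
    corr = returns.corr().values
    # Stage 1: cluster on a correlation-based distance
    dist = np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None))
    np.fill_diagonal(dist, 0.0)
    link = linkage(squareform(dist, checks=False), method='single')
    # Stage 2: quasi-diagonalize, i.e. order assets so similar ones sit together
    order = [int(k) for k in leaves_list(link)]
    # Stage 3: recursive bisection, splitting weight by inverse cluster variance
    w = np.ones(len(order))
    clusters = [order]
    while clusters:
        next_clusters = []
        for c in clusters:
            if len(c) < 2:
                continue
            left, right = c[:len(c) // 2], c[len(c) // 2:]
            v_left, v_right = cluster_var(cov, left), cluster_var(cov, right)
            alpha = 1.0 - v_left / (v_left + v_right)   # share given to the left half
            w[left] *= alpha
            w[right] *= 1.0 - alpha
            next_clusters += [left, right]
        clusters = next_clusters
    return pd.Series(w, index=returns.columns)

# Simulated data: two pairs of correlated assets, with B2 twice as exposed as B1
rng = np.random.default_rng(1)
common = rng.normal(0.0, 0.01, size=(500, 2))
toy = pd.DataFrame({
    'A1': common[:, 0] + rng.normal(0.0, 0.003, 500),
    'A2': common[:, 0] + rng.normal(0.0, 0.003, 500),
    'B1': common[:, 1] + rng.normal(0.0, 0.003, 500),
    'B2': 2 * common[:, 1] + rng.normal(0.0, 0.003, 500),
})
hrp_toy = hrp_weights(toy)
print(hrp_toy.round(3))
```

Every step uses only diagonal entries and small sub-blocks of the covariance matrix, which is why no inversion is needed and why the weights stay positive.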

Now the main backtest loop. For each rebalancing date, we estimate weights from the prior 252 days, hold for one month, and track the portfolio returns.

In [None]:
port_returns = {'EqualWeight': [], 'Markowitz': [], 'HRP': []}
port_weights = {'EqualWeight': [], 'Markowitz': [], 'HRP': []}
port_dates = []

for i in range(len(rebal_dates) - 1):
    dt = rebal_dates[i]
    next_dt = rebal_dates[i + 1]
    # Estimation windows end on dt. Slice by date, not by position: prices and
    # simple_returns have different row counts, so a shared integer offset misaligns them.
    hist_prices = prices.loc[:dt].iloc[-253:]
    hist_returns = simple_returns.loc[:dt].iloc[-253:]

    # Forward returns for the holding period: strictly after dt, through next_dt
    fwd = simple_returns.loc[(simple_returns.index > dt) & (simple_returns.index <= next_dt)]
    if len(fwd) == 0:
        continue

    w_eq = get_equal_weights(n_stocks)
    w_mv = get_markowitz_weights(hist_prices)
    w_hrp = get_hrp_weights(hist_returns)

    for name, w in [('EqualWeight', w_eq), ('Markowitz', w_mv), ('HRP', w_hrp)]:
        # Weights are held fixed within the month (implicitly reset to target daily)
        port_ret = fwd.values @ w
        port_returns[name].extend(port_ret.tolist())
        port_weights[name].append(w)
    port_dates.extend(fwd.index.tolist())

The backtest loop does something that many tutorials get wrong: it uses **forward returns** from the day after rebalancing. If you compute weights using data through January 31 and then include January 31's return in the portfolio performance, you're crediting the portfolio with a return it earned before you could have traded into those weights. This look-ahead bias can inflate Sharpe ratios by 0.1-0.3, enough to make a mediocre strategy look good.

Let's assemble the portfolio return series and take a first look at cumulative performance.

In [None]:
port_df = pd.DataFrame(port_returns, index=port_dates[:len(port_returns['EqualWeight'])])
port_df.index = pd.DatetimeIndex(port_df.index)
port_df = port_df[~port_df.index.duplicated(keep='first')]

# Align SPY benchmark
spy_simple = spy_prices.pct_change().dropna()
spy_aligned = spy_simple.reindex(port_df.index).fillna(0)

# Cumulative returns
cum_ret = (1 + port_df).cumprod()
spy_cum = (1 + spy_aligned).cumprod()

cum_ret.plot(title='Cumulative Returns: Three Portfolio Strategies')
spy_cum.plot(label='SPY', linestyle='--', color='black', alpha=0.5)
plt.legend()
plt.ylabel('Growth of $1')
plt.tight_layout()

This chart tells the story at a glance, but the real insights come from the risk-adjusted metrics. Raw cumulative return is misleading — a strategy that returned 200% but had a 60% drawdown along the way is a very different beast from one that returned 150% with a maximum drawdown of 20%. The first strategy would have gotten its portfolio manager fired during the drawdown, regardless of the eventual recovery. In quantitative finance, surviving the drawdown is the prerequisite for capturing the long-run return.

---

## Deliverable 4: Performance Evaluation — The Full Risk Dashboard

Now we compute the metrics that actually matter. The Sharpe ratio is the headline number — risk-adjusted return, annualized, comparable across strategies. But the Sortino ratio (which only penalizes downside volatility) and the Calmar ratio (annual return divided by maximum drawdown) capture dimensions that Sharpe misses. A strategy with symmetric volatility (equal up and down moves) gets the same Sharpe as one with skewed volatility (small gains, rare large losses). Sortino distinguishes between the two. And the maximum drawdown tells you whether you'd survive the worst period with your job intact.
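
The Sharpe-versus-Sortino distinction can be shown with two constructed series that share the same mean and volatility but have opposite skew. This is a sketch under stated assumptions: the risk-free rate is taken as zero, downside risk is measured as the root mean square of negative returns over all days, and the return patterns are invented.

```python
import numpy as np

def sharpe(r):
    return r.mean() / r.std(ddof=1) * np.sqrt(252)

def sortino(r):
    # Downside deviation: root mean square of the negative returns, over all days
    downside = np.sqrt(np.mean(np.minimum(r, 0.0) ** 2))
    return r.mean() / downside * np.sqrt(252)

# Steady small gains with a rare large loss
crash_prone = np.tile([0.01] * 9 + [-0.05], 25)
# Reflect around the mean: same mean, same volatility, opposite skew
mirrored = 2 * crash_prone.mean() - crash_prone

print(sharpe(crash_prone), sharpe(mirrored))
print(sortino(crash_prone), sortino(mirrored))
```

The Sharpe ratios match, while the Sortino ratio is far higher for the series whose large moves are gains rather than losses.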

If the Sharpe for any strategy exceeds 2.5, be suspicious. Jim Simons' Medallion Fund — the best-performing fund in history — runs at about 2.5 after fees. If your homework beats Simons, you have a bug, not a breakthrough.

In [None]:
def compute_metrics(ret_series, rf_annual=0.02):
    """Compute the full risk dashboard for a return series."""
    rf_daily = rf_annual / 252
    excess = ret_series - rf_daily
    ann_ret = ret_series.mean() * 252
    ann_vol = ret_series.std() * np.sqrt(252)
    sharpe = excess.mean() / excess.std() * np.sqrt(252) if excess.std() > 0 else 0
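    # Downside deviation: root mean square of negative excess returns over all days
    downside = np.sqrt((np.minimum(excess, 0) ** 2).mean()) * np.sqrt(252)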
    sortino = excess.mean() * 252 / downside if downside > 0 else 0
    cum = (1 + ret_series).cumprod()
    drawdown = cum / cum.cummax() - 1
    mdd = drawdown.min()
    calmar = ann_ret / abs(mdd) if mdd != 0 else 0
    var_95 = np.percentile(ret_series, 5)
    cvar_95 = ret_series[ret_series <= var_95].mean()
    return {'Ann. Return': ann_ret, 'Ann. Vol': ann_vol, 'Sharpe': sharpe,
            'Sortino': sortino, 'Max DD': mdd, 'Calmar': calmar,
            'VaR(95%)': var_95, 'CVaR(95%)': cvar_95}

That function computes eight metrics in one pass. The VaR at 95% says "there's a 5% chance you'll lose more than this amount in a single day." CVaR (Conditional VaR, also called Expected Shortfall) is the more honest metric: it tells you the **average** loss in the worst 5% of days. VaR says the door to the danger zone starts here; CVaR says what's actually behind the door.

CVaR is a coherent risk measure (it satisfies subadditivity, unlike VaR) and is increasingly preferred by regulators. Basel III moved banks toward Expected Shortfall precisely because VaR can hide tail risk — a distribution with a thin tail and a distribution with a fat tail can have the same VaR but very different CVaRs.
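
The claim that two distributions can share a VaR while differing in CVaR is easy to verify with constructed data. The two return samples below are artificial: both have their worst 6% of days at or below -3%, but in the second sample half of those days are -10%.

```python
import numpy as np

def var_cvar(returns, level=5):
    """Historical VaR (the level-th percentile) and CVaR (mean of returns at or below it)."""
    var = np.percentile(returns, level)
    cvar = returns[returns <= var].mean()
    return var, cvar

thin_tail = np.array([-0.03] * 60 + [0.001] * 940)
fat_tail = np.array([-0.10] * 30 + [-0.03] * 30 + [0.001] * 940)

var_thin, cvar_thin = var_cvar(thin_tail)
var_fat, cvar_fat = var_cvar(fat_tail)
print(var_thin, cvar_thin)
print(var_fat, cvar_fat)
```

Both samples report the same 95% VaR, but the average loss beyond that threshold is more than twice as large in the second one.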

In [None]:
metrics = {}
for name in port_df.columns:
    metrics[name] = compute_metrics(port_df[name])
metrics['SPY'] = compute_metrics(spy_aligned)

metrics_df = pd.DataFrame(metrics).T
metrics_df = metrics_df[['Ann. Return', 'Ann. Vol', 'Sharpe', 'Sortino',
                          'Max DD', 'Calmar', 'VaR(95%)', 'CVaR(95%)']]

# Format for readability
styled = metrics_df.style.format({
    'Ann. Return': '{:.2%}', 'Ann. Vol': '{:.2%}', 'Sharpe': '{:.2f}',
    'Sortino': '{:.2f}', 'Max DD': '{:.2%}', 'Calmar': '{:.2f}',
    'VaR(95%)': '{:.4f}', 'CVaR(95%)': '{:.4f}'
})
styled

Study this table carefully — it's the core deliverable of the entire homework. A few things to look for:

**Sharpe ratios.** Equal-weight will likely be competitive with or better than Markowitz. If HRP leads on Sharpe, it's by a narrow margin. The DeMiguel result holds: naive diversification is hard to beat, especially after accounting for estimation error in the optimizer's inputs.

**Maximum drawdown.** This is where the strategies diverge most dramatically. Markowitz typically has the worst drawdown because the optimizer concentrates the portfolio in stocks it believes are "optimal" — and when those stocks sell off together during a crisis, the concentrated portfolio gets crushed. HRP's hierarchical allocation avoids this concentration. Look at the Calmar ratio (return/drawdown) for the true risk-adjusted picture.

**CVaR vs. VaR.** The gap between these two numbers tells you how fat the left tail is. A large gap means that when things go wrong, they go *really* wrong. If Markowitz has a much larger CVaR gap than equal-weight, that's the concentration risk showing up in the tail.

Remember LTCM: they had thousands of positions across dozens of markets and thought they were diversified. During the Russian debt crisis, correlations spiked to nearly 1.0 and the "diversification" evaporated. The maximum drawdown is what happens when your correlation assumptions break down — and they always break down during crises, exactly when you need them most.

---

## Deliverable 5: Transaction Cost Analysis

A strategy that looks great on paper can die on execution. Transaction costs are the silent killer of quantitative strategies. We assume 10 basis points (0.10%) round-trip cost — 5 bps to buy, 5 bps to sell. This is realistic for liquid large-cap US equities. For mid-caps or less liquid names, the real cost is 2-5x higher.

The key insight: transaction costs are proportional to **turnover** — the fraction of the portfolio that changes at each rebalancing. Equal-weight has low turnover (you only trade to rebalance drifted weights). HRP has moderate turnover (the hierarchy changes slowly). Markowitz has the highest turnover because the optimizer suggests large position changes every month as the estimated covariance matrix shifts. Small changes in the covariance estimate produce large changes in "optimal" weights — the same instability that makes Markowitz fragile also makes it expensive.

In [None]:
COST_BPS = 10  # round-trip cost in basis points
cost_rate = COST_BPS / 10_000  # 0.001

turnover = {name: [] for name in port_df.columns}

for name in port_df.columns:
    weights = port_weights[name]
    for i in range(1, len(weights)):
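        # One-way turnover: half the sum of absolute changes in target weights
        # (every sale funds a purchase, so the full sum counts each trade twice).
        # Drift between rebalances is ignored, so equal-weight shows zero here.
        turn = 0.5 * np.sum(np.abs(weights[i] - weights[i-1]))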
        turnover[name].append(turn)

avg_turnover = {name: np.mean(vals) for name, vals in turnover.items()}
ann_turnover = {name: v * 12 for name, v in avg_turnover.items()}

pd.DataFrame({'Monthly Turnover': avg_turnover, 
              'Annual Turnover': ann_turnover}).round(4)

Those turnover numbers reveal the hidden cost of sophistication. Markowitz's turnover is likely the highest of the three by a wide margin. (Equal-weight shows zero here only because the calculation compares target weights and ignores the small trades needed to undo price drift.) If a strategy trades, say, 300% of its value per year and the cost is 10 bps round-trip, that's 3.0 × 0.10% = 0.3% per year spent on transaction costs alone. For context, the equity risk premium, the extra return you earn for holding stocks instead of T-bills, is roughly 5-7% per year. A 0.3% drag looks small next to that, but it is a certain cost paid for an uncertain edge, and the performance gap between competing construction methods is often no larger.

A strategy that turns over daily at 10 bps round-trip burns 25% per year in costs. That's not a strategy — that's a donation to market makers. The homework's monthly rebalancing is relatively gentle, and even here the costs matter.
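
The turnover table above compares one month's target weights with the next, which ignores the trades needed to undo price drift. A drift-aware version can be sketched as follows; the helper names `drifted_weights` and `one_way_turnover` are hypothetical, and `period_returns` is assumed to hold each asset's cumulative simple return over the holding period.

```python
import numpy as np

def drifted_weights(w_target, period_returns):
    """Weights after prices move, before any rebalancing trade."""
    grown = w_target * (1.0 + period_returns)
    return grown / grown.sum()

def one_way_turnover(w_before, w_after):
    """Fraction of the portfolio sold (and rebought elsewhere) in one rebalance."""
    return 0.5 * np.abs(w_after - w_before).sum()

w_eq = np.array([0.5, 0.5])
month_ret = np.array([0.10, -0.10])          # invented one-month asset returns

w_drift = drifted_weights(w_eq, month_ret)   # the winner is now overweight
turn = one_way_turnover(w_drift, w_eq)       # trade needed to get back to 50/50
print(w_drift, turn)
```

Measured this way, equal-weight has small but nonzero turnover, which matches the description of it as a strategy that only trades to correct drifted weights.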

In [None]:
# Compute net-of-cost returns
net_returns = {}
for name in port_df.columns:
    weights = port_weights[name]
    net = port_df[name].copy()

    # Moving from weights[i] to weights[i+1] happens at rebal_dates[i+1], so the
    # cost is charged on the first trading day after that date.
    # The initial purchase of the portfolio is not charged.
    for i in range(len(weights) - 1):
        # One-way turnover: half the sum of absolute weight changes
        one_way = 0.5 * np.sum(np.abs(weights[i + 1] - weights[i]))
        tc = one_way * cost_rate  # cost_rate is the round-trip cost per unit traded
        trade_dt = rebal_dates[i + 1]
        later = net.index[net.index > trade_dt]
        if len(later) > 0:
            net.loc[later[0]] -= tc
    net_returns[name] = net

We deduct the full turnover cost on the first trading day of each new period. This is slightly pessimistic — in practice you'd spread the trades over several days — but it's the standard assumption in academic backtests. Better to be conservative: if a strategy survives pessimistic cost assumptions, it'll survive reality.

Now let's see how costs change the picture.

In [None]:
net_df = pd.DataFrame(net_returns)

# Gross vs net metrics comparison
comparison = {}
for name in port_df.columns:
    gross_m = compute_metrics(port_df[name])
    net_m = compute_metrics(net_df[name])
    comparison[f'{name} (Gross)'] = gross_m
    comparison[f'{name} (Net)'] = net_m

comp_df = pd.DataFrame(comparison).T[['Ann. Return', 'Sharpe', 'Sortino', 'Max DD']]
comp_df.style.format({'Ann. Return': '{:.2%}', 'Sharpe': '{:.2f}',
                       'Sortino': '{:.2f}', 'Max DD': '{:.2%}'})

This is the reality check. Compare the Sharpe ratio drop from gross to net for each strategy. Markowitz should suffer the most — its higher turnover translates directly into higher costs. If the net-of-cost Markowitz Sharpe drops below equal-weight's net Sharpe, you've just replicated one of the most important findings in portfolio theory: **the most sophisticated approach loses to the dumbest approach, after costs.**

This pattern repeats throughout quantitative finance. The simple vs. sophisticated thread is the same one from Week 2 (GARCH(1,1) vs. fancier variants): the parsimonious approach wins more often than ML engineers expect. The reason is always the same — estimation error in the complex model's parameters overwhelms the theoretical advantage of the more flexible specification.

Let's visualize the cost erosion for all three strategies.

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(18, 5), sharey=True)
for ax, name in zip(axes, port_df.columns):
    gross_cum = (1 + port_df[name]).cumprod()
    net_cum = (1 + net_df[name]).cumprod()
    ax.plot(gross_cum, label='Gross', linewidth=1.5)
    ax.plot(net_cum, label='Net of Costs', linewidth=1.5, linestyle='--')
    ax.fill_between(gross_cum.index, gross_cum, net_cum, alpha=0.15, color='red')
    ax.set_title(name)
    ax.legend()
    ax.set_ylabel('Growth of $1')
plt.suptitle('Transaction Cost Erosion by Strategy', y=1.02)
plt.tight_layout()

The red-shaded area is money you paid to market makers. Notice how the gap is widest for Markowitz — the optimizer's instability doesn't just produce suboptimal weights, it produces expensive weight changes. Every time the covariance estimate shifts slightly, the optimizer demands a portfolio overhaul. Each overhaul costs money.

This is why every homework from here forward includes transaction costs. A backtest without costs is a fantasy. And at 10 bps round-trip, which is an optimistic (low) assumption for most stocks, the difference between a viable strategy and an expensive hobby is often just the turnover rate.

---

## Deliverable 6: QuantStats Tear Sheets

QuantStats generates the same style of performance reports you'd see at a hedge fund or institutional asset manager. The tear sheet includes dozens of metrics, rolling statistics, drawdown analysis, and monthly return heatmaps. It's the standard way to present portfolio performance to someone who allocates capital for a living.

We generate one tear sheet per portfolio, benchmarked against SPY. Pay special attention to the drawdown analysis and the worst monthly returns — these are the periods that test whether you'd actually hold the portfolio through the pain.

In [None]:
# Prepare return series for quantstats (needs simple returns with datetime index)
eq_rets = port_df['EqualWeight'].copy()
eq_rets.index = pd.DatetimeIndex(eq_rets.index)
eq_rets.name = 'EqualWeight'

spy_bench = spy_aligned.copy()
spy_bench.index = pd.DatetimeIndex(spy_bench.index)
spy_bench.name = 'SPY'

QuantStats expects simple return series with proper datetime indices. We prepare each portfolio's returns and the SPY benchmark in the required format. The `qs.reports.html()` function generates a comprehensive HTML report with over 30 metrics and visualizations. For inline display in the notebook, we use `qs.reports.basic()` which shows the key charts directly.

In [None]:
qs.extend_pandas()
qs.reports.basic(eq_rets, benchmark=spy_bench, title='Equal-Weight Portfolio')

The equal-weight tear sheet is your baseline. Everything else gets compared against this. Note the beta to SPY — for an equal-weight portfolio of 100 stocks, it should be close to 1.0 but not exactly 1.0, because equal-weighting gives more relative exposure to smaller stocks (by market cap) than SPY's cap-weighted methodology. This small-cap tilt is a hidden factor exposure — it loads on the SMB (Small Minus Big) factor.
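If you want to sanity-check the beta the tear sheet reports, it is just covariance over benchmark variance. Here is a minimal sketch on synthetic data; `beta_to_benchmark` is a hypothetical helper, the toy series are random numbers rather than your portfolio returns, and QuantStats may differ slightly in how it aligns dates.

```python
import numpy as np
import pandas as pd


def beta_to_benchmark(returns: pd.Series, benchmark: pd.Series) -> float:
    """Beta = Cov(portfolio, benchmark) / Var(benchmark) on overlapping dates."""
    aligned = pd.concat([returns, benchmark], axis=1, join="inner").dropna()
    cov = aligned.cov()
    return cov.iloc[0, 1] / cov.iloc[1, 1]


# Synthetic example: a portfolio built to have a true beta of 1.1
rng = np.random.default_rng(0)
toy_idx = pd.bdate_range("2020-01-01", periods=500)
toy_bench = pd.Series(rng.normal(0.0004, 0.01, 500), index=toy_idx)
toy_port = 1.1 * toy_bench + pd.Series(rng.normal(0, 0.002, 500), index=toy_idx)

toy_beta = beta_to_benchmark(toy_port, toy_bench)
print(f"Estimated beta: {toy_beta:.2f}")
```

In the notebook you would call it as `beta_to_benchmark(eq_rets, spy_bench)` and compare the result with the tear sheet's number.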

Now the Markowitz portfolio.

In [None]:
mv_rets = port_df['Markowitz'].copy()
mv_rets.index = pd.DatetimeIndex(mv_rets.index)
mv_rets.name = 'Markowitz'

qs.reports.basic(mv_rets, benchmark=spy_bench, title='Markowitz (Ledoit-Wolf) Portfolio')

Compare the drawdown chart between Markowitz and equal-weight. The Markowitz portfolio likely has deeper drawdowns despite potentially higher returns. This is the concentration risk at work — the optimizer loads up on stocks it deems optimal, and when those stocks sell off together (as they tend to do in crises), the portfolio takes a disproportionate hit. At most funds, a 20% drawdown triggers serious conversations. A 30% drawdown gets people fired. The max drawdown isn't just a number — it's a career risk metric.

Finally, HRP.

In [None]:
hrp_rets = port_df['HRP'].copy()
hrp_rets.index = pd.DatetimeIndex(hrp_rets.index)
hrp_rets.name = 'HRP'

qs.reports.basic(hrp_rets, benchmark=spy_bench, title='HRP Portfolio')

HRP's tear sheet typically shows the best Calmar ratio of the three — competitive returns with lower drawdowns. The hierarchical clustering respects the economic structure of the market: banks cluster with banks, tech clusters with tech, and the risk allocation happens both within and between these clusters. When tech sells off, HRP's allocation to the tech cluster shrinks in the next rebalancing, but the overall portfolio isn't crushed because the risk was distributed across the full hierarchy.
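The Calmar ratio is easy to verify by hand. The sketch below uses one common definition (compound annual growth rate divided by the absolute max drawdown); the function names are hypothetical, and QuantStats' exact conventions may differ in details such as annualization.

```python
import numpy as np
import pandas as pd


def max_drawdown(returns: pd.Series) -> float:
    """Worst peak-to-trough loss as a negative fraction (-0.20 means -20%)."""
    wealth = (1 + returns).cumprod()
    # Include the initial $1 as a candidate peak so an early loss is counted
    running_peak = wealth.cummax().clip(lower=1.0)
    return (wealth / running_peak - 1).min()


def calmar_ratio(returns: pd.Series, periods_per_year: int = 252) -> float:
    """Annualized compound return divided by |max drawdown|."""
    n_years = len(returns) / periods_per_year
    cagr = (1 + returns).prod() ** (1 / n_years) - 1
    mdd = max_drawdown(returns)
    return np.nan if mdd == 0 else cagr / abs(mdd)


# Four quarterly returns: up 10%, down 20%, up 5%, up 10%
toy_returns = pd.Series([0.10, -0.20, 0.05, 0.10])
toy_mdd = max_drawdown(toy_returns)
toy_calmar = calmar_ratio(toy_returns, periods_per_year=4)
print(f"Max drawdown: {toy_mdd:.1%}, Calmar: {toy_calmar:.3f}")
```

Running `calmar_ratio` on `eq_rets`, `mv_rets`, and `hrp_rets` gives you a three-number comparison to set against the tear sheets.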

Lopez de Prado developed HRP while managing $13 billion at Guggenheim Partners, and this is exactly the scenario he designed it for — allocating across a large, diverse stock universe where the covariance matrix is too noisy for traditional optimization to work reliably.

---

## Summary of Discoveries

Here's what this homework revealed — insights you couldn't have seen without running 100 stocks through the full pipeline:

- **The value premium (HML) has weakened or disappeared in recent data.** Your Fama-MacBeth regressions likely showed a statistically insignificant HML coefficient. From 1927-2007, value stocks outperformed by roughly 5% per year. Since 2007, that premium has essentially vanished. Whether it was arbitraged away or is merely cyclical remains one of the biggest open questions in academic finance. Your ML models should not naively load on the value factor.

- **Equal-weight is shockingly hard to beat.** Despite being the simplest possible approach — no optimization, no estimation, no parameters — 1/N likely matched or beat Markowitz on risk-adjusted metrics. The DeMiguel et al. (2009) result isn't just an academic curiosity; it's a practical reality. You need approximately 250 years of data for mean-variance optimization to reliably beat 1/N for a 25-stock portfolio.

- **HRP wins on the metrics that matter most for survival.** Calmar ratio (return per unit of max drawdown) and drawdown depth are where HRP typically separates from the pack. It doesn't always have the highest return, but it avoids the concentrated bets that produce devastating losses. In practice, surviving the drawdown is the prerequisite for capturing the long-run return.

- **Transaction costs hit the most sophisticated strategy hardest.** Markowitz's instability isn't just a theoretical concern — it translates directly into higher turnover, which translates directly into higher costs. The optimizer's sensitivity to covariance estimation noise means it demands large portfolio changes every month, each of which costs money.

- **CVaR reveals tail risk that VaR hides.** The gap between VaR and CVaR for Markowitz is likely wider than for HRP or equal-weight, indicating fatter left tails. This is concentration risk showing up in the distribution of extreme losses — exactly the phenomenon that destroyed LTCM.

- **The Fundamental Law connects everything.** The IC from your factor model and the breadth from your 100-stock universe determine the upper bound on your portfolio's information ratio. With 100 stocks and monthly rebalancing (BR = 1,200), even an IC of 0.03 gives you an IR of about 1.0 — which is already better than most hedge funds. Breadth, not accuracy, is the multiplier.
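The Fundamental Law arithmetic in the last bullet can be checked in a few lines. This sketch assumes the standard form IR = IC × √BR and counts breadth as stocks times rebalances per year, which treats every bet as independent. That is optimistic, so read the result as an upper bound. The function name is hypothetical.

```python
import math


def information_ratio_bound(ic: float, n_assets: int, periods_per_year: int) -> float:
    """Fundamental Law upper bound: IR = IC * sqrt(breadth).

    Breadth is approximated as n_assets * periods_per_year, which assumes
    all bets are independent (an optimistic simplification).
    """
    breadth = n_assets * periods_per_year
    return ic * math.sqrt(breadth)


ir_homework = information_ratio_bound(0.03, 100, 12)  # BR = 1,200
ir_small = information_ratio_bound(0.03, 25, 12)      # a quarter of the breadth

print(f"100 stocks, monthly: IR = {ir_homework:.2f}")
print(f" 25 stocks, monthly: IR = {ir_small:.2f}")
```

Because IR grows with the square root of breadth, cutting the universe to a quarter of its size halves the bound, while doubling the IC doubles it.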

These findings set the stage for Week 4, where you'll build your first cross-sectional return prediction model. The Fama-MacBeth methodology you implemented here becomes the evaluation framework. The risk metrics you computed become the KPIs. And HRP becomes the default portfolio construction method — stable, reasonable, and robust to the estimation noise that kills fancier approaches.

### Suggested Reading

- **DeMiguel, Garlappi & Uppal (2009), "Optimal Versus Naive Diversification"**: The paper that showed none of 14 optimization models consistently beats equal-weighting out of sample. Short, readable, and humbling. Over 5,000 citations. Read it to understand why estimation error is the enemy of optimization.

- **Lopez de Prado, *Advances in Financial Machine Learning*, Chapter 16**: HRP explained by its inventor. The key insight is that hierarchical structure avoids matrix inversion entirely. Dense in places, but the motivation section alone is worth the read.

- **Grinold & Kahn, *Active Portfolio Management*, Chapter 6**: The Fundamental Law of Active Management. Tells you whether your ML model has any chance of making money before you ever run a backtest. If you read one chapter of one book this week, make it this one.