# Week 14 — Capstone Solution: End-to-End ML Trading Strategy

**Course:** ML for Quantitative Finance  
**Status:** SOLUTION — do not distribute to students before deadline

---

**Strategy:** Cross-sectional momentum + mean-reversion + volatility model  
**Model:** XGBoost with expanding-window walk-forward  
**Portfolio:** Long-short quintile with 10 bps/side transaction costs

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from scipy import stats
from sklearn.ensemble import RandomForestRegressor
import xgboost as xgb
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 5)

## Part 1: Data Pipeline

In [None]:
TICKERS = [
    'AAPL', 'MSFT', 'AMZN', 'GOOGL', 'META', 'NVDA', 'JPM', 'JNJ', 'V', 'PG',
    'UNH', 'HD', 'MA', 'DIS', 'BAC', 'XOM', 'CSCO', 'PFE', 'COST', 'ABT',
    'PEP', 'AVGO', 'CRM', 'NKE', 'CVX', 'WMT', 'MRK', 'LLY', 'ABBV', 'INTC',
    'T', 'VZ', 'QCOM', 'TXN', 'PM', 'UNP', 'NEE', 'LOW', 'BMY', 'AMGN',
    'MDT', 'HON', 'SBUX', 'GS', 'MS', 'BLK', 'GILD', 'MMC', 'ADP', 'AMT',
    'CME', 'CI', 'LRCX', 'MO', 'MDLZ', 'SO', 'DUK', 'CL', 'ZTS', 'BDX',
    'REGN', 'ITW', 'APD', 'SHW', 'FISV', 'NOC', 'ICE', 'CSX', 'WM', 'FDX',
    'EMR', 'PNC', 'USB', 'NSC', 'CCI', 'D', 'GM', 'F', 'TGT', 'AEP',
]

cache_path = Path('w14_capstone_cache.pkl')
if cache_path.exists():
    raw = pd.read_pickle(cache_path)
else:
    raw = yf.download(TICKERS, start='2010-01-01', end='2024-12-31', progress=True)
    raw.to_pickle(cache_path)

prices = raw['Close'].ffill()
volume = raw['Volume'].ffill()
returns_daily = prices.pct_change()

# Drop tickers with >20% missing
good = prices.isnull().mean() < 0.2
prices = prices.loc[:, good]
volume = volume.loc[:, prices.columns]
returns_daily = returns_daily[prices.columns]

monthly_prices = prices.resample('M').last()
monthly_returns = monthly_prices.pct_change()
monthly_volume = volume.resample('M').mean()

print(f"Universe: {prices.shape[1]} stocks")
print(f"Period: {prices.index[0].date()} to {prices.index[-1].date()}")
print(f"Monthly observations: {len(monthly_prices)}")

## Part 2: Feature Engineering (18 features across 5 categories)

In [None]:
features = {}

# --- Momentum (5 features) ---
features['mom_1m'] = monthly_prices.pct_change(1)
features['mom_3m'] = monthly_prices.pct_change(3)
features['mom_6m'] = monthly_prices.pct_change(6)
features['mom_12m'] = monthly_prices.pct_change(12)
features['mom_12m_skip1'] = monthly_prices.pct_change(12).shift(1)

# --- Reversal (2 features) ---
features['reversal_1m'] = -monthly_prices.pct_change(1)
features['reversal_1w'] = -(prices.pct_change(5)).resample('M').last()

# --- Volatility (4 features) ---
features['vol_20d'] = returns_daily.rolling(20).std().resample('M').last()
features['vol_60d'] = returns_daily.rolling(60).std().resample('M').last()
features['vol_ratio'] = features['vol_20d'] / (features['vol_60d'] + 1e-8)
# Vol-of-vol: rolling std of rolling vol
daily_vol = returns_daily.rolling(20).std()
features['vol_of_vol'] = daily_vol.rolling(60).std().resample('M').last()

# --- Volume (3 features) ---
features['volume_ratio_5_60'] = (volume.rolling(5).mean() / volume.rolling(60).mean()).resample('M').last()
features['dollar_volume'] = (prices * volume).rolling(20).mean().resample('M').last()
features['volume_trend'] = (volume.rolling(5).mean() / volume.rolling(20).mean()).resample('M').last()

# --- Technical (4 features) ---
features['ma_50_ratio'] = (prices / prices.rolling(50).mean()).resample('M').last()
features['ma_200_ratio'] = (prices / prices.rolling(200).mean()).resample('M').last()
# Bollinger band position
bb_mid = prices.rolling(20).mean()
bb_std = prices.rolling(20).std()
features['bb_position'] = ((prices - bb_mid) / (2 * bb_std + 1e-8)).resample('M').last()
# RSI approximation (14-day)
delta = prices.diff()
gain = delta.clip(lower=0).rolling(14).mean()
loss = (-delta.clip(upper=0)).rolling(14).mean()
rs = gain / (loss + 1e-8)
features['rsi_14'] = (100 - 100 / (1 + rs)).resample('M').last()

# Target: next month return
target = monthly_returns.shift(-1)

print(f"Features: {len(features)}")
for name in features:
    print(f"  {name}")

In [None]:
# Build panel dataset
months = sorted(set.intersection(*[set(f.index) for f in features.values()]))
months = [m for m in months if pd.Timestamp('2012-01-01') <= m <= pd.Timestamp('2024-06-30')]

X_all, y_all, dates_all, tickers_all = [], [], [], []
for month in months:
    X_cs = pd.DataFrame({name: feat.loc[month] for name, feat in features.items() if month in feat.index})
    y_cs = target.loc[month] if month in target.index else pd.Series(dtype=float)
    valid = X_cs.dropna().index.intersection(y_cs.dropna().index)
    if len(valid) > 10:
        # Rank-transform features cross-sectionally
        X_ranked = X_cs.loc[valid].rank(pct=True)
        X_all.append(X_ranked)
        y_all.append(y_cs.loc[valid])
        dates_all.extend([month] * len(valid))
        tickers_all.extend(valid.tolist())

X_panel = pd.concat(X_all)
y_panel = pd.concat(y_all)
dates_panel = np.array(dates_all)
tickers_panel = np.array(tickers_all)

print(f"Panel: {len(X_panel)} obs, {X_panel.shape[1]} features, {len(months)} months")

## Part 3: Labeling

Using forward 1-month returns as labels. Justification:
- Monthly rebalancing frequency is standard for cross-sectional equity strategies
- Regression target (continuous returns) rather than classification
- Triple-barrier is better suited for single-stock entry/exit timing, not cross-sectional ranking
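
The forward-return label can be illustrated on a toy price table. This is only a sketch: the tickers `AAA`/`BBB` and their prices are made up, and the month-end dates are typed in directly rather than produced by resampling.

```python
import pandas as pd

# Hypothetical month-end prices for two made-up tickers
toy_prices = pd.DataFrame(
    {"AAA": [100.0, 110.0, 99.0, 108.9], "BBB": [50.0, 45.0, 49.5, 49.5]},
    index=pd.to_datetime(["2020-01-31", "2020-02-29", "2020-03-31", "2020-04-30"]),
)

toy_returns = toy_prices.pct_change()   # return earned during each month
toy_target = toy_returns.shift(-1)      # label at month t = return earned in month t+1

# Cross-sectional rank of the label within each month; a quintile sort
# only depends on this ordering, not on the return magnitudes
toy_target_rank = toy_target.rank(axis=1, pct=True)
```

The last row of `toy_target` is all NaN because the following month's return is not yet known, which is why the panel builder drops rows without a valid target.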

In [None]:
# Label statistics
print("Target (next-month return) statistics:")
print(f"  Mean: {y_panel.mean():.4f}")
print(f"  Std: {y_panel.std():.4f}")
print(f"  Skew: {stats.skew(y_panel.dropna()):.2f}")
print(f"  Kurtosis: {stats.kurtosis(y_panel.dropna(), fisher=False):.2f}")
print(f"  % positive: {(y_panel > 0).mean():.0%}")

## Part 4: Model Training (XGBoost with Expanding Window)

In [None]:
# XGBoost parameters (conservative to avoid overfitting)
xgb_params = {
    'max_depth': 4,
    'learning_rate': 0.05,
    'n_estimators': 200,
    'subsample': 0.7,
    'colsample_bytree': 0.7,
    'reg_alpha': 1.0,
    'reg_lambda': 1.0,
    'verbosity': 0,
}

# Check IC on validation set first
val_start = pd.Timestamp('2016-01-31')
val_end = pd.Timestamp('2018-01-31')

train_mask = dates_panel < val_start
val_mask = (dates_panel >= val_start) & (dates_panel < val_end)

model_val = xgb.XGBRegressor(**xgb_params)
model_val.fit(X_panel.values[train_mask], y_panel.values[train_mask])
pred_val = model_val.predict(X_panel.values[val_mask])

# IC per month on validation
val_ics = []
val_dates = dates_panel[val_mask]
for m in np.unique(val_dates):
    m_mask = val_dates == m
    if m_mask.sum() > 5:
        ic = stats.spearmanr(pred_val[m_mask], y_panel.values[val_mask][m_mask])[0]
        val_ics.append(ic)

print(f"Validation IC (2016-2017):")
print(f"  Mean IC: {np.mean(val_ics):.4f}")
print(f"  IC > 0: {np.mean([x > 0 for x in val_ics]):.0%}")
print(f"  IC t-stat: {np.mean(val_ics)/np.std(val_ics)*np.sqrt(len(val_ics)):.2f}")

## Part 5: Walk-Forward Backtest

In [None]:
pred_start = pd.Timestamp('2018-01-31')
pred_months = [m for m in months if m >= pred_start]

portfolio_returns = []
monthly_ics = []
tc_per_side = 10  # bps
tc_total = tc_per_side * 2 / 10000  # both sides

prev_longs, prev_shorts = set(), set()

for month in pred_months:
    train_mask = (dates_panel < month) & (dates_panel >= pd.Timestamp('2012-01-01'))
    test_mask = dates_panel == month

    if test_mask.sum() < 10 or train_mask.sum() < 500:
        continue

    X_tr, y_tr = X_panel.values[train_mask], y_panel.values[train_mask]
    X_te = X_panel.values[test_mask]
    y_te = y_panel.values[test_mask]
    te_tickers = tickers_panel[test_mask]

    # Train fresh model each month (expanding window)
    model = xgb.XGBRegressor(**xgb_params)
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)

    # IC
    ic = stats.spearmanr(pred, y_te)[0]
    monthly_ics.append({'month': month, 'IC': ic})

    # Long-short quintile portfolio
    pred_series = pd.Series(pred, index=te_tickers)
    n_stocks = len(pred_series) // 5
    if n_stocks < 2:
        continue

    longs = set(pred_series.nlargest(n_stocks).index)
    shorts = set(pred_series.nsmallest(n_stocks).index)

    actual = pd.Series(y_te, index=te_tickers)
    long_ret = actual.loc[list(longs)].mean()
    short_ret = actual.loc[list(shorts)].mean()
    gross_ret = long_ret - short_ret

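    # Turnover-adjusted costs: replacing a fraction x of a leg means selling x and
    # buying x, so each leg pays x * tc_total (tc_total = buy + sell = 20 bps)
    long_turnover = 1 - len(longs & prev_longs) / max(len(longs), 1)
    short_turnover = 1 - len(shorts & prev_shorts) / max(len(shorts), 1)
    avg_turnover = (long_turnover + short_turnover) / 2
    # gross_ret is per unit of capital in each leg, so the two legs' costs add up
    cost = (long_turnover + short_turnover) * tc_total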

    net_ret = gross_ret - cost

    portfolio_returns.append({
        'month': month,
        'gross': gross_ret,
        'net': net_ret,
        'turnover': avg_turnover,
        'long_ret': long_ret,
        'short_ret': short_ret,
    })

    prev_longs, prev_shorts = longs, shorts

results = pd.DataFrame(portfolio_returns).set_index('month')
ic_df = pd.DataFrame(monthly_ics).set_index('month')
print(f"Backtest period: {results.index[0].date()} to {results.index[-1].date()}")
print(f"Total months: {len(results)}")

## Part 6: Evaluation

In [None]:
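def deflated_sharpe_ratio(sharpe_obs, n_trials, T, skew=0, kurtosis=3):
    """Simplified Deflated Sharpe Ratio.

    Returns the normal CDF of (sharpe_obs - sr0) / se_sr, where sr0 approximates
    the expected maximum of n_trials standard-normal draws (i.e. it assumes unit
    dispersion of Sharpe ratios across trials). kurtosis is non-excess
    (normal = 3); T is the number of return observations.
    """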
    euler_mascheroni = 0.5772
    sr0 = np.sqrt(2 * np.log(max(n_trials, 2))) - (np.log(np.pi) + euler_mascheroni) / (2 * np.sqrt(2 * np.log(max(n_trials, 2))))
    se_sr = np.sqrt((1 + 0.5 * sharpe_obs**2 - skew * sharpe_obs +
                     (kurtosis - 3) / 4 * sharpe_obs**2) / T)
    z = (sharpe_obs - sr0) / se_sr
    return stats.norm.cdf(z)


def full_tear_sheet(returns_series, name='Strategy'):
    r = returns_series.dropna()
    cum = (1 + r).cumprod()
    dd = cum / cum.cummax() - 1

    metrics = {
        'Ann. Return': r.mean() * 12,
        'Ann. Volatility': r.std() * np.sqrt(12),
        'Sharpe': r.mean() / r.std() * np.sqrt(12) if r.std() > 0 else np.nan,
        'Sortino': r.mean() / r[r < 0].std() * np.sqrt(12) if (r < 0).sum() > 0 else np.nan,
        'Calmar': (r.mean() * 12) / abs(dd.min()) if dd.min() != 0 else np.nan,
        'Max Drawdown': dd.min(),
        'Hit Rate': (r > 0).mean(),
        'Profit Factor': abs(r[r > 0].sum() / r[r < 0].sum()) if (r < 0).sum() > 0 else np.nan,
        'Skewness': stats.skew(r),
        'Kurtosis': stats.kurtosis(r, fisher=False),
        'VaR 5%': np.percentile(r, 5),
        'CVaR 5%': r[r <= np.percentile(r, 5)].mean(),
        'Tail Ratio': abs(np.percentile(r, 95) / np.percentile(r, 5)) if np.percentile(r, 5) != 0 else np.nan,
    }

    print(f"\n{'='*55}")
    print(f" {name}")
    print(f"{'='*55}")
    for k, v in metrics.items():
        if 'Rate' in k or 'Return' in k or 'Volatility' in k or 'Drawdown' in k or 'VaR' in k or 'CVaR' in k:
            print(f"  {k:<20s}: {v:.1%}")
        else:
            print(f"  {k:<20s}: {v:.3f}")
    return metrics


# Performance
metrics_gross = full_tear_sheet(results['gross'], 'XGBoost L/S (Gross)')
metrics_net = full_tear_sheet(results['net'], 'XGBoost L/S (Net of 10bps/side)')

In [None]:
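# Deflated Sharpe Ratio
# Honest accounting: we tried 1 model type (XGBoost), 1 parameter config,
# but experimented with ~3 feature sets during development = ~3 trials.
# Caveat: `sharpe` below is annualized while T counts monthly observations; the
# standard DSR formulation works with the per-period Sharpe, so read the result
# as a rough screen rather than an exact probability.
n_trials_honest = 3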
r = results['net']
sharpe = r.mean() / r.std() * np.sqrt(12)
dsr = deflated_sharpe_ratio(
    sharpe_obs=sharpe, n_trials=n_trials_honest, T=len(r),
    skew=stats.skew(r), kurtosis=stats.kurtosis(r, fisher=False)
)
print(f"\nDeflated Sharpe Ratio:")
print(f"  Observed Sharpe: {sharpe:.2f}")
print(f"  Trials (honest): {n_trials_honest}")
print(f"  DSR: {dsr:.3f} {'(significant at 95%)' if dsr > 0.95 else '(NOT significant at 95%)'}") 

In [None]:
# IC analysis
ic_vals = ic_df['IC'].values
print(f"\nInformation Coefficient:")
print(f"  Mean IC: {np.mean(ic_vals):.4f}")
print(f"  IC Std: {np.std(ic_vals):.4f}")
print(f"  IC > 0: {np.mean(ic_vals > 0):.0%}")
print(f"  IC t-stat: {np.mean(ic_vals)/np.std(ic_vals)*np.sqrt(len(ic_vals)):.2f}")
print(f"  ICIR (ann.): {np.mean(ic_vals)/np.std(ic_vals)*np.sqrt(12):.2f}")

In [None]:
# Comprehensive plots
fig, axes = plt.subplots(3, 2, figsize=(16, 14))

# 1. Cumulative returns
cum_gross = (1 + results['gross']).cumprod()
cum_net = (1 + results['net']).cumprod()
axes[0, 0].plot(cum_gross, label='Gross', color='steelblue')
axes[0, 0].plot(cum_net, label='Net (10bps/side)', color='salmon')
axes[0, 0].set_title('Cumulative Returns')
axes[0, 0].legend()
axes[0, 0].set_ylabel('Growth of $1')

# 2. Drawdown
dd = cum_net / cum_net.cummax() - 1
axes[0, 1].fill_between(dd.index, dd.values, 0, color='salmon', alpha=0.5)
axes[0, 1].set_title('Drawdown (Net)')
axes[0, 1].set_ylabel('Drawdown')

# 3. Monthly returns
colors = ['steelblue' if x > 0 else 'salmon' for x in results['net']]
axes[1, 0].bar(results.index, results['net'], color=colors, width=20)
axes[1, 0].set_title('Monthly Net Returns')
axes[1, 0].set_ylabel('Return')

# 4. Rolling IC
rolling_ic = ic_df['IC'].rolling(12).mean()
axes[1, 1].plot(rolling_ic, color='steelblue')
axes[1, 1].axhline(0, color='red', linestyle='--', alpha=0.5)
axes[1, 1].set_title('Rolling 12-Month IC')
axes[1, 1].set_ylabel('IC')

# 5. Rolling Sharpe
rolling_sharpe = results['net'].rolling(12).mean() / results['net'].rolling(12).std() * np.sqrt(12)
axes[2, 0].plot(rolling_sharpe, color='steelblue')
axes[2, 0].axhline(0, color='red', linestyle='--', alpha=0.5)
axes[2, 0].set_title('Rolling 12-Month Sharpe')
axes[2, 0].set_ylabel('Sharpe')

# 6. Long vs Short leg (cumulative return of each basket of stocks;
#    the short leg makes money when its line falls)
cum_long = (1 + results['long_ret']).cumprod()
cum_short = (1 + results['short_ret']).cumprod()
axes[2, 1].plot(cum_long, label='Long leg', color='steelblue')
axes[2, 1].plot(cum_short, label='Short leg', color='salmon')
axes[2, 1].set_title('Long vs Short Leg')
axes[2, 1].legend()
axes[2, 1].set_ylabel('Growth of $1')

plt.tight_layout()
plt.show()

In [None]:
# Compare to SPY buy-and-hold
spy_cache = Path('w14_spy_cache.pkl')
if spy_cache.exists():
    spy = pd.read_pickle(spy_cache)
else:
    spy = yf.download('SPY', start='2010-01-01', end='2024-12-31')['Close']
    spy.to_pickle(spy_cache)

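# squeeze(): some yfinance versions return a one-column DataFrame for a single ticker.
# shift(-1): results are indexed by formation month and hold the return earned over
# the following month, so SPY needs the same forward alignment.
spy_monthly = spy.squeeze().resample('M').last().pct_change().shift(-1)
# Align dates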
common_dates = results.index.intersection(spy_monthly.index)
spy_aligned = spy_monthly.loc[common_dates]
strat_aligned = results.loc[common_dates, 'net']

fig, ax = plt.subplots(figsize=(14, 6))
ax.plot((1 + strat_aligned).cumprod(), label='XGBoost L/S (net)', color='steelblue')
ax.plot((1 + spy_aligned).cumprod(), label='SPY Buy & Hold', color='gray', alpha=0.7)
ax.set_title('Strategy vs SPY')
ax.set_ylabel('Growth of $1')
ax.legend()
plt.tight_layout()
plt.show()

spy_sharpe = spy_aligned.mean() / spy_aligned.std() * np.sqrt(12)
print(f"SPY Sharpe: {spy_sharpe:.2f}")
print(f"Strategy Sharpe: {sharpe:.2f}")
print(f"Correlation with SPY: {strat_aligned.corr(spy_aligned):.2f}")

## Part 7: Analysis & Discussion

### Source of Return

This strategy exploits **cross-sectional momentum** (stocks that outperformed recently tend to continue) combined with **mean-reversion at short horizons** and **volatility signals**. The economic mechanism behind momentum is debated but likely involves:
- Investor underreaction to news (behavioral)
- Gradual information diffusion across investor types
- Risk compensation for momentum crashes

### Limitations

- **Regime dependence:** Momentum crashes (e.g., 2009 reversal, 2020 COVID snap-back) can cause severe drawdowns
- **Crowding:** Momentum is a well-known factor — as more capital chases it, alpha decays
- **Survivorship bias:** Our universe (current S&P 500 members) contains survivorship bias. Stocks in our universe that were small in 2012 are there because they succeeded
- **Transaction costs:** We assumed 10 bps/side, which is reasonable for liquid large-caps but may underestimate costs for smaller names or during volatile periods
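
One way to probe the transaction-cost assumption is to recompute the net Sharpe over a grid of cost levels. The sketch below is illustrative only: `net_sharpe_by_cost` is a hypothetical helper, the inputs are synthetic stand-ins for `results['gross']` and `results['turnover']`, and it assumes each unit of turnover pays one round trip (buy plus sell).

```python
import numpy as np
import pandas as pd

def net_sharpe_by_cost(gross, turnover, cost_bps_grid, periods_per_year=12):
    """Annualized Sharpe of (gross - turnover * round-trip cost) per cost level."""
    out = {}
    for bps in cost_bps_grid:
        round_trip = 2 * bps / 10_000           # buy + sell, as a fraction
        net = gross - turnover * round_trip     # assumed cost convention
        out[bps] = net.mean() / net.std() * np.sqrt(periods_per_year)
    return pd.Series(out, name="net_sharpe")

# Synthetic monthly data, NOT the backtest results
rng = np.random.default_rng(0)
toy_gross = pd.Series(rng.normal(0.005, 0.03, size=72))
toy_turnover = pd.Series(rng.uniform(0.4, 0.9, size=72))

sensitivity = net_sharpe_by_cost(toy_gross, toy_turnover, cost_bps_grid=[0, 10, 25, 50])
```

Running the same function on the real `results` columns would show how quickly the strategy's edge erodes if realized costs exceed 10 bps/side.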

### Overfitting Risk

- We tried ~3 feature configurations during development → DSR accounts for this
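- Conservative XGBoost parameters (max_depth=4, subsample=0.7, colsample_bytree=0.7, L1/L2 penalties of 1.0) limit in-sample overfitting
- Walk-forward training (each month's model is fit only on rows dated before the prediction month) guards against look-ahead bias in model fitting; it does not remove the survivorship bias in the universe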
- The rolling Sharpe plot reveals whether performance is stable or deteriorating
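
To see why the trial count matters, the benchmark that `deflated_sharpe_ratio` subtracts from the observed Sharpe can be evaluated for different numbers of trials. This sketch copies the `sr0` expression from Part 6 into a standalone helper (the helper name is hypothetical) and is meant only to show the direction of the effect.

```python
import numpy as np

def expected_max_sharpe_benchmark(n_trials):
    """sr0 as computed inside deflated_sharpe_ratio (extreme-value approximation)."""
    euler_mascheroni = 0.5772
    a = np.sqrt(2 * np.log(max(n_trials, 2)))
    return a - (np.log(np.pi) + euler_mascheroni) / (2 * a)

# Benchmark for a few hypothetical trial counts
benchmarks = {n: round(float(expected_max_sharpe_benchmark(n)), 3) for n in (3, 10, 50, 200)}
```

The benchmark rises with the number of trials, so under-reporting how many configurations were tried makes the DSR look better than it should.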

### Capacity

- Universe of ~80 large-cap stocks → each quintile has ~16 stocks
- With equal weighting, each position is ~6% (1/16) of its leg
- For $100M of capital per leg: ~$6M per position → easily executable for liquid large-caps
- Capacity likely $500M-$1B before market impact becomes significant
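
The position-size arithmetic can be written out as a small calculation. This is a rough sketch: both helper functions are hypothetical, it assumes the $100M is deployed per leg, and the $500M average daily dollar volume (ADV) is a made-up figure rather than data from the notebook.

```python
def position_size(capital_per_leg, n_universe, n_buckets=5):
    """Number of names per quintile and dollars per equal-weighted position."""
    n_positions = n_universe // n_buckets
    return n_positions, capital_per_leg / n_positions

def participation_rate(position_dollars, adv_dollars, days_to_trade=1):
    """Fraction of average daily dollar volume consumed by trading the position."""
    return position_dollars / (adv_dollars * days_to_trade)

n_pos, dollars = position_size(100e6, 80)                 # 80 stocks, quintiles
rate = participation_rate(dollars, adv_dollars=500e6)     # hypothetical ADV
```

A capacity estimate would repeat this with each stock's actual `dollar_volume` feature and look for the AUM at which the participation rate of the least liquid holding becomes uncomfortable.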

### Improvements

- **Larger universe:** 200+ stocks for better diversification and more stable quintiles
- **More features:** Sentiment (FinBERT embeddings from Week 10), fundamental data
- **Ensemble:** Combine XGBoost with Ridge and LightGBM (Week 5 showed ensembles help)
- **Meta-labeling:** Use primary model → meta-model filter (Week 6) to improve precision
- **Dynamic position sizing:** Scale positions by model confidence or volatility regime
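
As one possible reading of the position-sizing idea, weights within a leg could be scaled inversely to each stock's recent volatility instead of being equal. The sketch below is an assumption-laden illustration: `inverse_vol_weights` is a hypothetical helper and the volatilities are invented, not taken from `vol_20d`.

```python
import pandas as pd

def inverse_vol_weights(vol):
    """Weights proportional to 1/vol, normalized to sum to 1 within the leg."""
    inv = 1.0 / vol
    return inv / inv.sum()

# Hypothetical daily volatilities for three made-up tickers
toy_vol = pd.Series({"AAA": 0.01, "BBB": 0.02, "CCC": 0.04})
w = inverse_vol_weights(toy_vol)
```

Lower-volatility names receive larger weights, so each position contributes a more similar amount of risk than under equal weighting.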