# Week 14 — Seminar: Backtesting & Strategy Evaluation

**Course:** ML for Quantitative Finance  
**Type:** Seminar (90 min)

---

## Setup

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from scipy import stats
from sklearn.ensemble import RandomForestRegressor
import xgboost as xgb
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 5)

## Exercise 1: Deflated Sharpe Ratio (20 min)

1. Review the provided deflated Sharpe ratio (DSR) helper
2. Generate 200 random moving-average crossover strategies on real S&P 500 data (SPY)
3. Report the best Sharpe, then compute its DSR. Is it significant at the 95% level?

In [None]:
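def deflated_sharpe_ratio(sharpe_obs, n_trials, T, skew=0, kurtosis=3):
    """Provided helper: compute DSR, the probability that the observed Sharpe beats the null threshold.

    sharpe_obs: observed Sharpe ratio; n_trials: number of strategies tried (must be > 1).
    T: number of return observations; skew, kurtosis: return moments (kurtosis is 3 for a normal).
    """
    euler_mascheroni = 0.5772
    # Null threshold: extreme-value approximation to the expected maximum of n_trials
    # standard-normal draws. No scaling by the dispersion of Sharpe ratios across trials is applied.
    sr0 = np.sqrt(2 * np.log(n_trials)) - (np.log(np.pi) + euler_mascheroni) / (2 * np.sqrt(2 * np.log(n_trials)))
    # Standard error of the Sharpe estimate, adjusted for skewness and fat tails
    se_sr = np.sqrt((1 + 0.5 * sharpe_obs**2 - skew * sharpe_obs +
                     (kurtosis - 3) / 4 * sharpe_obs**2) / T)
    z = (sharpe_obs - sr0) / se_sr
    return stats.norm.cdf(z)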


# TODO: Download S&P 500 data (SPY)
# TODO: Generate 200 random MA crossover strategies with random (fast, slow) params
# TODO: Compute Sharpe for each
# TODO: Report best Sharpe and its DSR
# TODO: How many of the 200 strategies have DSR > 0.95?
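
The TODOs above can be prototyped offline before touching real data. The following is a minimal sketch, not the reference solution: it assumes a long/flat crossover rule, a zero risk-free rate, and a synthetic driftless random walk standing in for SPY. The helper names `ma_crossover_returns` and `annualized_sharpe` and the parameter ranges are illustrative choices, not part of the exercise.

```python
import numpy as np
import pandas as pd

def ma_crossover_returns(prices, fast, slow):
    """Daily returns of a long/flat moving-average crossover strategy."""
    fast_ma = prices.rolling(fast).mean()
    slow_ma = prices.rolling(slow).mean()
    # Long when the fast MA is above the slow MA; lag one day to avoid look-ahead
    position = (fast_ma > slow_ma).astype(float).shift(1)
    return (position * prices.pct_change()).dropna()

def annualized_sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    sd = returns.std()
    return np.sqrt(periods_per_year) * returns.mean() / sd if sd > 0 else 0.0

rng = np.random.default_rng(0)
# Synthetic stand-in for SPY: a driftless geometric random walk, so no strategy has a true edge
dates = pd.bdate_range('2015-01-01', periods=2000)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(dates)))), index=dates)

sharpes = []
for _ in range(200):
    fast = int(rng.integers(5, 50))           # illustrative range for the fast window
    slow = int(rng.integers(fast + 10, 250))  # slow window is always longer than fast
    sharpes.append(annualized_sharpe(ma_crossover_returns(prices, fast, slow)))
sharpes = np.array(sharpes)

print(f"best Sharpe: {sharpes.max():.2f}, median Sharpe: {np.median(sharpes):.2f}")
```

Because the synthetic prices have no drift, any gap between the best and the median Sharpe here comes from selection across trials alone, which is the effect the DSR is meant to correct for.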

## Exercise 2: Walk-Forward Backtest (30 min)

1. Load 40 stocks, compute momentum/volatility features
2. Implement walk-forward backtest with monthly rebalancing
3. Compare: RF vs XGBoost, with and without transaction costs

In [None]:
TICKERS = [
    'AAPL', 'MSFT', 'AMZN', 'GOOGL', 'META', 'NVDA', 'JPM', 'JNJ', 'V', 'PG',
    'UNH', 'HD', 'MA', 'DIS', 'BAC', 'XOM', 'CSCO', 'PFE', 'COST', 'ABT',
    'PEP', 'AVGO', 'CRM', 'NKE', 'CVX', 'WMT', 'MRK', 'LLY', 'ABBV', 'INTC',
    'T', 'VZ', 'QCOM', 'TXN', 'PM', 'UNP', 'NEE', 'LOW', 'BMY', 'AMGN',
]

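# Cache the download so reruns do not hit the network
cache_path = Path('w14_data_cache.pkl')
if cache_path.exists():
    raw = pd.read_pickle(cache_path)
else:
    # auto_adjust=True makes 'Close' split- and dividend-adjusted regardless of yfinance version
    raw = yf.download(TICKERS, start='2010-01-01', end='2024-12-31',
                      auto_adjust=True, progress=True)
    raw.to_pickle(cache_path)

# Forward-fill gaps, then keep tickers with at least 80% non-missing days
prices = raw['Close'].ffill().dropna(axis=1, thresh=int(0.8 * len(raw)))
returns_daily = prices.pct_change()
# Month-end prices and month-over-month returns ('M' = month-end; pandas >= 2.2 prefers 'ME')
monthly_prices = prices.resample('M').last()
monthly_returns = monthly_prices.pct_change()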

# TODO: Compute features (mom_1m, mom_3m, mom_6m, vol_20d, vol_60d)
# TODO: Build panel dataset
# TODO: Walk-forward loop: for each month from 2016 onward,
#   - train on expanding window
#   - predict next month
#   - form long-short quintile portfolio
#   - record return (net of 10 bps/side costs)
# TODO: Compare RF vs XGBoost
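
The walk-forward loop described in the TODOs can be sketched on a synthetic panel. This is a sketch under stated assumptions, not the reference solution: the data are simulated with a weak linear signal, the model is ordinary least squares via `np.linalg.lstsq` as a lightweight stand-in for RF or XGBoost, weights are assumed not to drift between rebalances, and the column names (`t`, `f1`, `f2`, `fwd_ret`) are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_months, n_assets = 96, 40
features = ['f1', 'f2']

# Synthetic panel: one row per (month t, asset); fwd_ret is the return over the month after t
rows = []
for t in range(n_months):
    feats = rng.normal(size=(n_assets, 2))
    # Assumed data-generating process: weak linear signal in f1 plus noise
    fwd = 0.01 * feats[:, 0] + rng.normal(0, 0.05, n_assets)
    rows.append(pd.DataFrame({'t': t, 'asset': np.arange(n_assets),
                              'f1': feats[:, 0], 'f2': feats[:, 1], 'fwd_ret': fwd}))
panel = pd.concat(rows, ignore_index=True)

def design(df):
    """Feature matrix with an intercept column."""
    return np.column_stack([np.ones(len(df)), df[features].to_numpy()])

COST_PER_SIDE = 0.0010          # 10 bps per side, as in the exercise
q = n_assets // 5               # quintile size
prev_w = pd.Series(0.0, index=np.arange(n_assets))
records = []

for t in range(48, n_months):
    # Expanding window: only rows whose forward return is already realized at time t
    train = panel[panel['t'] < t]
    test = panel[panel['t'] == t].set_index('asset')

    beta, *_ = np.linalg.lstsq(design(train), train['fwd_ret'].to_numpy(), rcond=None)
    pred = pd.Series(design(test) @ beta, index=test.index)

    # Long the top quintile, short the bottom quintile, equal-weighted and dollar-neutral
    ranked = pred.sort_values()
    w = pd.Series(0.0, index=test.index)
    w.loc[ranked.index[-q:]] = 1.0 / q
    w.loc[ranked.index[:q]] = -1.0 / q

    gross = float((w * test['fwd_ret']).sum())
    turnover = float((w - prev_w).abs().sum())   # total weight traded at this rebalance
    records.append({'t': t, 'gross': gross, 'net': gross - COST_PER_SIDE * turnover})
    prev_w = w

results = pd.DataFrame(records).set_index('t')
print(results.mean())
```

Swapping in `RandomForestRegressor` or an XGBoost model only changes the two lines that fit and predict; the split on `t`, the portfolio formation, and the cost accounting stay the same.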

## Exercise 3: Full Tear Sheet (20 min)

1. Implement a comprehensive tear sheet function
2. Compute all key metrics: Sharpe, Sortino, Calmar, Max DD, VaR, CVaR, Tail Ratio
3. Plot cumulative returns and drawdown chart

In [None]:
# TODO: Implement full_tear_sheet(returns_series) function
# Should compute and print:
#   - Annualized return and volatility
#   - Sharpe, Sortino, Calmar ratios
#   - Max drawdown
#   - Hit rate and profit factor
#   - VaR 5% and CVaR 5%
#   - Skewness and kurtosis
#   - Tail ratio (|95th pct / 5th pct|)
# TODO: Plot cumulative returns + drawdown
# TODO: Apply to your walk-forward results from Exercise 2
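
A few of the metrics listed above have more than one convention, so it can help to pin them down before writing the full function. The sketch below covers only drawdown, Calmar, VaR, CVaR, and the tail ratio, on simulated monthly returns. It assumes VaR is reported as the 5th percentile of returns (a negative number), CVaR as the mean of returns at or below that level, and Calmar as annualized geometric return over the absolute maximum drawdown. The function names are hypothetical.

```python
import numpy as np
import pandas as pd

def drawdown_series(returns):
    """Drawdown from the running peak of cumulative wealth (starting wealth = 1)."""
    wealth = (1 + returns).cumprod()
    peak = wealth.cummax().clip(lower=1.0)  # include the initial capital in the running peak
    return wealth / peak - 1

def tail_metrics(returns, alpha=0.05):
    """Historical VaR, CVaR, and tail ratio at level alpha."""
    var = returns.quantile(alpha)
    cvar = returns[returns <= var].mean()
    tail_ratio = abs(returns.quantile(1 - alpha) / returns.quantile(alpha))
    return var, cvar, tail_ratio

rng = np.random.default_rng(2)
# Simulated monthly returns standing in for the walk-forward output
rets = pd.Series(rng.normal(0.005, 0.04, 120),
                 index=pd.period_range('2015-01', periods=120, freq='M'))

dd = drawdown_series(rets)
max_dd = dd.min()
ann_ret = (1 + rets).prod() ** (12 / len(rets)) - 1
calmar = ann_ret / abs(max_dd)
var5, cvar5, tail_ratio = tail_metrics(rets)

print(f"max drawdown: {max_dd:.2%}, Calmar: {calmar:.2f}")
print(f"VaR 5%: {var5:.2%}, CVaR 5%: {cvar5:.2%}, tail ratio: {tail_ratio:.2f}")
```

For daily returns, replace the `12` in the annualization with the number of trading periods per year.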

## Discussion (20 min)

1. Your walk-forward strategy has a Sharpe of X. If you tried 3 feature sets, 2 models, and 4 parameter configs (= 24 trials), what's the DSR?
2. A hedge fund shows you a backtest with Sharpe 3.0 but won't tell you how many strategies they tested. What questions do you ask?
3. Momentum strategies have historically been profitable in many markets. Is this correlation or causation? What economic mechanism could explain it?
4. Your strategy has great Sharpe but a Calmar of 0.3. Would you trade it with your own money? Why or why not?
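
For question 1, it helps to see how the null threshold grows with the number of trials. The sketch below reuses the same approximation as the `deflated_sharpe_ratio` helper from Exercise 1, using only the standard library. The function name `expected_max_sharpe` is hypothetical.

```python
import math

def expected_max_sharpe(n_trials, euler_mascheroni=0.5772):
    """Null threshold used by the Exercise 1 helper, as a function of trial count."""
    a = math.sqrt(2 * math.log(n_trials))
    return a - (math.log(math.pi) + euler_mascheroni) / (2 * a)

# Compare a small search (24 trials) with the 200-strategy search from Exercise 1
for n in (24, 200):
    print(f"n_trials={n:>3}: threshold = {expected_max_sharpe(n):.2f}")
```

The threshold rises with the trial count, so the same observed Sharpe yields a lower DSR the more configurations were tried.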