# Week 6 — Homework: Meta-Labeling Pipeline

**Course:** ML for Quantitative Finance  
**Due:** Before Week 7 lecture

---

## Objective

Implement three of López de Prado's core techniques in one assignment:  
triple-barrier labeling, meta-labeling, and purged k-fold cross-validation (CV).

## Deliverable

This notebook, plus reusable `TripleBarrierLabeler` and `PurgedKFold` classes.

---

## Setup

In [None]:
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
import xgboost as xgb
import warnings
warnings.filterwarnings('ignore')

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 5)

## Part 1: Primary Model (10 pts)

Implement a simple SMA crossover (50-day vs. 200-day) as a long/short signal for SPY.
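
As a starting point, here is a minimal sketch of the crossover logic. It runs on a synthetic random walk rather than downloaded SPY data, and the helper name `sma_crossover_signal` and its column names are illustrative assumptions, not part of the assignment's required interface.

```python
import numpy as np
import pandas as pd

def sma_crossover_signal(close: pd.Series, fast: int = 50, slow: int = 200) -> pd.DataFrame:
    """Return fast/slow SMAs and a +1/-1 signal, dropping the warm-up rows."""
    out = pd.DataFrame({"close": close})
    out["sma_fast"] = close.rolling(fast).mean()
    out["sma_slow"] = close.rolling(slow).mean()
    # The slow SMA is undefined for the first (slow - 1) rows, so drop them
    # instead of letting them default to a short signal.
    out = out.dropna()
    out["signal"] = np.where(out["sma_fast"] > out["sma_slow"], 1, -1)
    return out

# Synthetic random-walk prices stand in for SPY closes (illustration only).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=500)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0003, 0.01, 500))), index=dates)

signals = sma_crossover_signal(close)
```

The signal on a given row uses that day's close, so a backtest would normally act on it from the next bar onward to avoid look-ahead.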

In [None]:
# TODO: Download SPY data
# TODO: Compute SMA(50) and SMA(200)
# TODO: Generate signal: +1 when SMA50 > SMA200, -1 otherwise (skip the warm-up rows where SMA200 is undefined)
# TODO: Plot price with SMAs and signal

## Part 2: Triple-Barrier Labeling (25 pts)

Implement a `TripleBarrierLabeler` class:
- Dynamic barriers scaled by daily volatility
- Configurable profit-take and stop-loss multipliers and maximum holding period
- Returns labels (-1, 0, +1) with metadata (which barrier was hit, holding period, return)
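
The requirements above can be sketched as a plain function before wrapping them in the class. This is one possible reading, under stated assumptions: volatility is the rolling standard deviation of daily simple returns, barriers are checked on closing prices only, and only entries with a full `max_holding` window are labeled. The function name and output column names are hypothetical.

```python
import numpy as np
import pandas as pd

def triple_barrier_labels(prices, pt_mult=2.0, sl_mult=2.0, max_holding=10, vol_window=20):
    """Label each entry bar by the first barrier its forward path touches."""
    # Assumption: rolling std of daily returns as the volatility estimate.
    vol = prices.pct_change().rolling(vol_window).std()
    values = prices.to_numpy()
    rows = []
    # Only label entries that have a full max_holding window ahead of them.
    for i in range(len(prices) - max_holding):
        sigma = vol.iloc[i]
        if np.isnan(sigma) or sigma == 0:
            continue  # no usable volatility estimate yet
        upper, lower = pt_mult * sigma, -sl_mult * sigma
        # Cumulative return from entry for each of the next max_holding bars.
        path = values[i + 1 : i + max_holding + 1] / values[i] - 1.0
        label, barrier, exit_step = 0, "vertical", len(path)
        for step, ret in enumerate(path, start=1):
            if ret >= upper:
                label, barrier, exit_step = 1, "upper", step
                break
            if ret <= lower:
                label, barrier, exit_step = -1, "lower", step
                break
        rows.append({
            "t0": prices.index[i],
            "label": label,
            "barrier": barrier,
            "holding": exit_step,
            "ret": path[exit_step - 1],
        })
    return pd.DataFrame(rows).set_index("t0")

# Synthetic prices for illustration only.
rng = np.random.default_rng(1)
idx = pd.bdate_range("2021-01-01", periods=300)
demo_prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 300))), index=idx)
labels = triple_barrier_labels(demo_prices)
```

The volatility at bar `i` uses returns up to and including bar `i`, so the barrier widths do not peek into the labeled window.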

In [None]:
class TripleBarrierLabeler:
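    """Triple-barrier labeling for financial ML.

    pt_mult, sl_mult: profit-take and stop-loss barrier widths, as multiples
        of the rolling daily volatility.
    max_holding: maximum holding period in bars (the vertical barrier).
    vol_window: lookback window, in bars, for the volatility estimate.
    """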

    def __init__(self, pt_mult=2.0, sl_mult=2.0, max_holding=10, vol_window=20):
        self.pt_mult = pt_mult
        self.sl_mult = sl_mult
        self.max_holding = max_holding
        self.vol_window = vol_window

    def label(self, prices):
        """Generate triple-barrier labels."""
        # TODO: implement
        pass

# TODO: Test with different parameter combinations
# TODO: Visualize barrier distribution

## Part 3: Meta-Labeling Model (25 pts)

1. Generate meta-labels: for each primary signal, label 1 if the trade taken in that direction made money, 0 otherwise
2. Build features for the meta-model: volatility, volume, momentum, signal strength
3. Train RF or XGBoost meta-model
4. Use the meta-model to filter trades and size positions: trade when `meta_prob > 0.5`, otherwise skip
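
Steps 1 and 4 can be illustrated on a toy example. The sketch below assumes `trade_ret` holds the raw price return realized at the first barrier touch and that `meta_prob` comes from the meta-model (for example, the positive-class column of `predict_proba`). The numbers are made up for illustration, and the helper names are hypothetical.

```python
import pandas as pd

def make_meta_labels(side: pd.Series, trade_ret: pd.Series) -> pd.Series:
    """1 if the trade in the primary signal's direction was profitable, else 0."""
    aligned = pd.concat({"side": side, "ret": trade_ret}, axis=1).dropna()
    # A short (-1) profits from a negative price return, hence side * ret.
    return (aligned["side"] * aligned["ret"] > 0).astype(int)

def filter_positions(side: pd.Series, meta_prob: pd.Series, threshold: float = 0.5) -> pd.Series:
    """Keep the primary side when meta_prob exceeds the threshold, else stay flat (0)."""
    return side.where(meta_prob > threshold, 0)

# Toy inputs (illustrative values, not real results).
idx = pd.date_range("2024-01-01", periods=4)
side = pd.Series([1, -1, 1, -1], index=idx)
trade_ret = pd.Series([0.02, 0.01, -0.03, -0.04], index=idx)
meta_prob = pd.Series([0.70, 0.40, 0.55, 0.50], index=idx)

meta = make_meta_labels(side, trade_ret)
positions = filter_positions(side, meta_prob)
```

A natural extension of the binary filter is to scale the position by the probability itself rather than only switching it on or off.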

In [None]:
# TODO: Generate meta-labels from primary signal + triple-barrier
# TODO: Build feature matrix for meta-model
# TODO: Train meta-model
# TODO: Implement position sizing based on meta-model probability

## Part 4: Purged K-Fold CV (20 pts)

Implement a `PurgedKFold` class:
- At least 5 folds
- Embargo period ≥ max label duration
- Show the information leakage by comparing standard vs. purged CV scores
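
The splitting logic can be sketched with integer positions before adapting it to dates. This version assumes every label spans a fixed number of bars (`label_horizon`), uses contiguous test folds, and applies the embargo after the purged region following each test fold. The function and parameter names are hypothetical, and the class interface in the assignment may differ.

```python
import numpy as np

def purged_kfold_indices(n_samples, n_splits=5, label_horizon=10, embargo=10):
    """Yield (train_idx, test_idx) for contiguous test folds with purging and embargo."""
    indices = np.arange(n_samples)
    for test_idx in np.array_split(indices, n_splits):
        start, stop = test_idx[0], test_idx[-1]
        # Purge before the fold: a training label starting at i spans
        # [i, i + label_horizon], so keep it only if it ends before the fold starts.
        before = indices[indices + label_horizon < start]
        # After the fold: test labels extend to stop + label_horizon; skip that
        # overlap, then skip a further `embargo` bars.
        after = indices[indices > stop + label_horizon + embargo]
        yield np.concatenate([before, after]), test_idx

splits = list(purged_kfold_indices(100, n_splits=5, label_horizon=10, embargo=10))
```

With 100 samples and these defaults, a middle fold trains on noticeably fewer rows than standard k-fold would use, which is the cost of removing the overlapping labels.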

In [None]:
class PurgedKFold:
    """Purged K-Fold CV for financial data."""

    def __init__(self, n_splits=5, embargo_days=10):
        # TODO: implement
        pass

    def split(self, dates):
        # TODO: implement
        pass

# TODO: Compare standard vs purged CV
# TODO: Quantify the information leakage

## Part 5: Full Pipeline Backtest (20 pts)

1. Primary signal → meta-label filter → position sizing → backtest
2. Compare: primary alone vs. primary + meta-label
3. Report the Sharpe ratio, hit rate (% of profitable trades), and profit factor (gross profit / gross loss)
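
The three metrics can be computed along these lines. This is a sketch that assumes a zero risk-free rate and 252 trading periods per year for the Sharpe ratio. The function names are illustrative, and the sample returns are made up.

```python
import numpy as np

def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    r = np.asarray(returns, dtype=float)
    sd = r.std(ddof=1)
    return np.nan if sd == 0 else np.sqrt(periods_per_year) * r.mean() / sd

def hit_rate(trade_returns):
    """Fraction of trades with a strictly positive return."""
    r = np.asarray(trade_returns, dtype=float)
    return np.mean(r > 0)

def profit_factor(trade_returns):
    """Gross profit divided by gross loss (infinite if there are no losing trades)."""
    r = np.asarray(trade_returns, dtype=float)
    gross_loss = -r[r < 0].sum()
    return np.inf if gross_loss == 0 else r[r > 0].sum() / gross_loss

# Made-up returns for illustration only.
sample = [0.02, -0.01, 0.03, -0.02]
metrics = {
    "sharpe": sharpe_ratio(sample),
    "hit_rate": hit_rate(sample),
    "profit_factor": profit_factor(sample),
}
```

Computing the same dictionary for the primary-only and the meta-filtered strategies gives a direct side-by-side comparison.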

In [None]:
# TODO: Implement full pipeline
# TODO: Backtest both versions
# TODO: Report comparison metrics