# Week 8 Homework: Volatility Forecasting Showdown

**Quantitative Finance ML Course**

**Total: 100 points**

---

## Overview

In this homework, you will build a comprehensive volatility forecasting pipeline:
1. Compute realized volatility for 30 stocks
2. Implement GARCH and HAR baselines
3. Build an LSTM vol forecaster with proper temporal splitting
4. Compare all models
5. Extend with attention

### Grading

| Part | Points | Topic |
|------|--------|-------|
| 1 | 15 | Compute RV for 30 stocks |
| 2 | 20 | GARCH + HAR baselines |
| 3 | 30 | LSTM vol forecaster |
| 4 | 20 | Model comparison (QLIKE + MSE) |
| 5 | 15 | TFT or attention extension |

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from sklearn.linear_model import LinearRegression
import warnings
warnings.filterwarnings('ignore')

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 5)

np.random.seed(42)
torch.manual_seed(42)

In [None]:
# --- Generate synthetic data for 30 stocks ---

np.random.seed(42)
n_days = 2520  # ~10 years
n_stocks = 30
tickers = [f'STOCK_{i:02d}' for i in range(n_stocks)]

all_data = {}
for ticker in tickers:
    omega = np.random.uniform(5e-7, 5e-6)
    alpha = np.random.uniform(0.04, 0.12)
    beta = np.random.uniform(0.82, 0.94)
    # Ensure stationarity
    if alpha + beta >= 0.999:
        beta = 0.999 - alpha

    returns = np.zeros(n_days)
    sigma2 = np.zeros(n_days)
    sigma2[0] = omega / (1 - alpha - beta)

    for t in range(1, n_days):
        sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1]
        returns[t] = np.sqrt(sigma2[t]) * np.random.randn()

    close = 100 * np.exp(np.cumsum(returns))
    dv = np.sqrt(sigma2)
    high = close * np.exp(np.abs(np.random.randn(n_days)) * dv * 0.5)
    low = close * np.exp(-np.abs(np.random.randn(n_days)) * dv * 0.5)
    open_p = close * np.exp(np.random.randn(n_days) * dv * 0.2)
    high = np.maximum(high, np.maximum(close, open_p)) * 1.001
    low = np.minimum(low, np.minimum(close, open_p)) * 0.999

    all_data[ticker] = pd.DataFrame({
        'open': open_p, 'high': high, 'low': low, 'close': close,
        'return': returns,
        'volume': np.random.lognormal(18, 0.5, n_days)
    }, index=pd.bdate_range('2015-01-02', periods=n_days))

print(f'Generated data for {n_stocks} stocks, {n_days} days each')
print(f'Tickers: {tickers[:5]} ... {tickers[-1]}')

---

## Part 1: Compute RV for 30 Stocks (15 pts)

For each of the 30 stocks:
1. Compute close-to-close RV (5-day window)
2. Compute Parkinson RV (5-day window)
3. Compute Garman-Klass RV (5-day window)
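4. Annualize all estimators (scale daily variance by the number of trading days per year, typically 252, before taking the square root)
5. Create the target `rv_target`: next-day close-to-close RV, i.e. `rv_cc` shifted back by one day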

Show summary statistics and a plot for 3 representative stocks.

**Grading**: Correct estimators (5), annualization (5), plots/stats (5)
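
As a starting point, here is a minimal sketch of two of the three estimators. It assumes 252 trading days per year and a zero-mean close-to-close estimator (rolling mean of squared returns rather than a sample standard deviation). The names `cc_vol`, `parkinson_vol`, and `TRADING_DAYS` are illustrative and not part of the template; Garman-Klass is left for you.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumed annualization factor

def cc_vol(returns: pd.Series, window: int = 5) -> pd.Series:
    """Annualized close-to-close vol: sqrt(252 * rolling mean of r^2)."""
    daily_var = returns.pow(2).rolling(window).mean()
    return np.sqrt(TRADING_DAYS * daily_var)

def parkinson_vol(high: pd.Series, low: pd.Series, window: int = 5) -> pd.Series:
    """Annualized Parkinson vol from the daily high/low range."""
    log_hl_sq = np.log(high / low) ** 2
    daily_var = log_hl_sq.rolling(window).mean() / (4.0 * np.log(2.0))
    return np.sqrt(TRADING_DAYS * daily_var)

# Tiny hand-made example (not the homework data)
idx = pd.bdate_range("2020-01-01", periods=6)
r = pd.Series([0.01, -0.01, 0.01, -0.01, 0.01, -0.01], index=idx)
rv = cc_vol(r)
target = rv.shift(-1)  # row t holds the RV observed on day t+1
```

The first `window - 1` rows of each estimator and the last row of `target` are NaN, so drop them before fitting any model.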

In [None]:
def rv_close_to_close(returns, window=5):
    """TODO: Close-to-close realized volatility."""
    pass

def rv_parkinson(high, low, window=5):
    """TODO: Parkinson range-based RV."""
    pass

def rv_garman_klass(open_p, high, low, close, window=5):
    """TODO: Garman-Klass RV."""
    pass


# TODO: Apply to all 30 stocks, add rv_cc, rv_park, rv_gk, rv_target columns
# for ticker, stock_df in all_data.items():
#     ...

print('Part 1: implement above')

In [None]:
# TODO: Summary statistics table and plots for 3 stocks

print('Stats and plots: implement above')

---

## Part 2: GARCH + HAR Baselines (20 pts)

For each stock, implement:

### GARCH(1,1)
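- Fit on the training returns (first 60% of days)
- Produce one-step-ahead forecasts over the test period (last 20% of days); the middle 20% is the validation set used in Part 3
- Convert each forecast conditional variance to annualized volatility so it is in the same units as the RV target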
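
The one-step-ahead recursion can be sketched as follows, assuming the parameters have already been estimated on the training set (for example with the `arch` package mentioned in the template) and are then held fixed. The function name `garch_one_step_vol`, the 252-day annualization, and the parameter values below are placeholders for illustration, not fitted results.

```python
import numpy as np

def garch_one_step_vol(returns, omega, alpha, beta, trading_days=252):
    """Annualized one-step-ahead vol forecasts from fixed GARCH(1,1) parameters.

    forecast[t] is the forecast for day t+1 made with information through day t.
    """
    returns = np.asarray(returns, dtype=float)
    sigma2 = np.empty(len(returns))
    sigma2[0] = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    for t in range(1, len(returns)):
        sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
    # Variance forecast for the next day, using today's return and variance
    next_var = omega + alpha * returns ** 2 + beta * sigma2
    return np.sqrt(trading_days * next_var)

# Placeholder parameters and returns, not fitted values
rng = np.random.default_rng(1)
r = 0.01 * rng.standard_normal(250)
vol_forecast = garch_one_step_vol(r, omega=2e-6, alpha=0.08, beta=0.90)
```

Because each forecast uses only returns up to day t, running the filter through the test period does not leak future information even though the parameters stay fixed.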

### HAR Model
- Features: daily RV (`rv_d`), 5-day mean RV (`rv_w`), 22-day mean RV (`rv_m`), all computed from data up to day $t$
- Fit by OLS on the training data, then predict next-day RV on the test data
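
A minimal sketch of the HAR regression with a chronological split is shown below. It uses `np.linalg.lstsq` for the OLS fit to stay dependency-light; `LinearRegression` from the template imports should give the same coefficients. The helper name `har_features` and the synthetic `rv` series are assumptions made only so the example runs.

```python
import numpy as np
import pandas as pd

def har_features(rv: pd.Series) -> pd.DataFrame:
    """Daily, weekly (5-day) and monthly (22-day) RV averages known at day t."""
    feats = pd.DataFrame({
        "rv_d": rv,
        "rv_w": rv.rolling(5).mean(),
        "rv_m": rv.rolling(22).mean(),
    })
    feats["target"] = rv.shift(-1)  # next-day RV
    return feats.dropna()

# Synthetic placeholder series, only to make the sketch runnable
rng = np.random.default_rng(0)
rv = pd.Series(0.2 + 0.05 * np.abs(rng.standard_normal(500)),
               index=pd.bdate_range("2020-01-01", periods=500))

feats = har_features(rv)
n = len(feats)
train = feats.iloc[: int(0.6 * n)]   # first 60%
test = feats.iloc[int(0.8 * n):]     # last 20% (middle 20% kept for validation)

cols = ["rv_d", "rv_w", "rv_m"]
X_train = np.column_stack([np.ones(len(train)), train[cols].to_numpy()])
coef, *_ = np.linalg.lstsq(X_train, train["target"].to_numpy(), rcond=None)

X_test = np.column_stack([np.ones(len(test)), test[cols].to_numpy()])
har_pred = X_test @ coef  # intercept + b_d * rv_d + b_w * rv_w + b_m * rv_m
```

The split is by position, never shuffled, so every training row precedes every test row.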

**Grading**: GARCH implementation (8), HAR implementation (7), proper splitting (5)

In [None]:
# TODO: GARCH(1,1) for all 30 stocks
# from arch import arch_model
#
# For each stock:
#   1. Fit GARCH on training returns
#   2. Forecast on test set
#   3. Store predictions

print('GARCH baselines: implement above')

In [None]:
# TODO: HAR model for all 30 stocks
# For each stock:
#   1. Create HAR features (rv_d, rv_w, rv_m)
#   2. Fit on training data
#   3. Predict on test data
#   4. Store predictions and coefficients

print('HAR baselines: implement above')

---

## Part 3: LSTM Vol Forecaster (30 pts)

Build an LSTM that forecasts next-day RV for all 30 stocks.

**Architecture**:
- Input features: [rv_cc, return, log_volume] (3 features), where `log_volume = log(volume)` is a column you need to create
- Sequence length: 20 days
- LSTM: 2 layers, hidden_dim=32
- Head: Linear(32, 16) -> ReLU -> Linear(16, 1)

**Training**:
- Temporal split: train (60%), val (20%), test (20%)
- MSE loss, Adam optimizer (lr=1e-3)
- Gradient clipping (max_norm=1.0)
- Early stopping (patience=15)

You can train one model per stock or one model across all stocks (your choice, but justify it).

**Grading**: Correct architecture (10), proper temporal split (8), training loop (7), predictions (5)
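
The temporal split and the 20-day input windows can be sketched as below. This is one conservative layout, assuming windows are built inside each split so that no window straddles a boundary; `make_windows`, `temporal_split`, and the placeholder arrays are illustrative names, not part of the template.

```python
import numpy as np

def make_windows(features: np.ndarray, target: np.ndarray, seq_len: int = 20):
    """Stack overlapping windows: X[i] = features[i : i + seq_len],
    y[i] = target at the last day of that window."""
    n = len(features) - seq_len + 1
    X = np.stack([features[i : i + seq_len] for i in range(n)])
    y = target[seq_len - 1 :]
    return X.astype(np.float32), y.astype(np.float32)

def temporal_split(n_rows: int, train_frac: float = 0.6, val_frac: float = 0.2):
    """Chronological index ranges; nothing is shuffled across boundaries."""
    train_end = int(train_frac * n_rows)
    val_end = train_end + int(val_frac * n_rows)
    return slice(0, train_end), slice(train_end, val_end), slice(val_end, n_rows)

# Placeholder arrays: 100 days, 3 features
feats = np.arange(300, dtype=float).reshape(100, 3)
tgt = np.arange(100, dtype=float)  # assumed aligned so tgt[t] is the day-(t+1) RV

tr, va, te = temporal_split(len(feats))
X_tr, y_tr = make_windows(feats[tr], tgt[tr])
X_te, y_te = make_windows(feats[te], tgt[te])
```

Building windows inside each split costs the first `seq_len - 1` targets of the validation and test sets. Letting those windows look back into earlier inputs is also leak-free, since inputs always precede the target; whichever you choose, state it in your write-up.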

In [None]:
class VolSequenceDataset(Dataset):
    """TODO: Implement sequence dataset for vol forecasting."""
    def __init__(self, data, feature_cols, target_col, seq_len=20):
        pass

    def __len__(self):
        pass

    def __getitem__(self, idx):
        pass


class LSTMVolForecaster(nn.Module):
    """TODO: Implement LSTM for vol forecasting."""
    def __init__(self, input_dim, hidden_dim=32, n_layers=2, dropout=0.2):
        super().__init__()
        pass

    def forward(self, x):
        pass

In [None]:
# TODO: Training loop with early stopping
# For each stock (or pooled):
#   1. Create datasets and loaders
#   2. Train with gradient clipping
#   3. Early stopping on validation loss
#   4. Predict on test set

print('LSTM training: implement above')

---

## Part 4: Compare All Models (20 pts)

Compare GARCH, HAR, and LSTM on the test set for all 30 stocks.

**Metrics**:
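1. **MSE**: $\frac{1}{N}\sum_{i=1}^{N}(\sigma_i - \hat{\sigma}_i)^2$
2. **QLIKE**: $\frac{1}{N}\sum_{i=1}^{N}\left(\frac{\sigma_i^2}{\hat{\sigma}_i^2} - \ln\frac{\sigma_i^2}{\hat{\sigma}_i^2} - 1\right)$
3. **Correlation**: between predicted and actual RV

Here $\sigma_i$ is the actual RV and $\hat{\sigma}_i$ the forecast for test observation $i$. QLIKE is evaluated on variances, so square both volatilities before taking the ratio.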

**Required outputs**:
- Table: average metrics across all 30 stocks
- Plot: model comparison bar chart
- Per-stock analysis: for which stocks does the LSTM win?

**Grading**: Correct QLIKE (5), comprehensive table (5), plots (5), per-stock analysis (5)
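
To see how the QLIKE formula above behaves, here is a small sketch that takes volatilities as input and squares them internally. The function name `qlike` and the `eps` guard against division by zero are assumptions of this example, not requirements of the assignment.

```python
import numpy as np

def qlike(vol_true, vol_pred, eps: float = 1e-12) -> float:
    """QLIKE on variances: mean(ratio - ln(ratio) - 1), ratio = true^2 / pred^2."""
    var_true = np.asarray(vol_true, dtype=float) ** 2
    var_pred = np.maximum(np.asarray(vol_pred, dtype=float) ** 2, eps)  # avoid /0
    ratio = np.maximum(var_true / var_pred, eps)                        # avoid log(0)
    return float(np.mean(ratio - np.log(ratio) - 1.0))

actual = np.array([0.20, 0.25, 0.30])
perfect = qlike(actual, actual)       # ratio is 1 everywhere
under = qlike(actual, 0.5 * actual)   # forecast too low  -> ratio = 4
over = qlike(actual, 2.0 * actual)    # forecast too high -> ratio = 0.25
```

A perfect forecast scores zero, and halving the forecast ($3 - \ln 4 \approx 1.61$) is penalized more heavily than doubling it ($\ln 4 - 0.75 \approx 0.64$). This asymmetry is worth keeping in mind when MSE and QLIKE rank your models differently.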

In [None]:
def compute_qlike(y_true, y_pred):
    """TODO: Compute QLIKE loss."""
    pass


# TODO: Compute MSE, QLIKE, Correlation for each stock x model
# Aggregate across stocks

print('Model comparison: implement above')

In [None]:
# TODO: Comparison table and bar chart

print('Comparison plots: implement above')

In [None]:
# TODO: Per-stock analysis
# For how many stocks does LSTM beat GARCH? Beat HAR?
# Are there patterns? (e.g., LSTM wins for high-vol stocks?)

print('Per-stock analysis: implement above')

---

## Part 5: Attention Extension (15 pts)

Extend the LSTM with a simple attention mechanism, or replace it with a small Transformer encoder in the spirit of the Temporal Fusion Transformer (TFT).

**Option A: LSTM + Attention**
- Add a self-attention layer over LSTM hidden states
- Instead of using only the last hidden state, compute an attention-weighted average of all hidden states

**Option B: Simple Temporal Transformer**
- Replace LSTM with a TransformerEncoder
- Use positional encoding for time steps

Compare against the vanilla LSTM from Part 3.

**Grading**: Correct implementation (8), comparison with vanilla LSTM (4), discussion (3)
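
For Option A, the pooling step on its own might look like the sketch below. It assumes additive attention without bias terms and an LSTM created with `batch_first=True`; the class name `AttentionPool` is hypothetical, and the prediction head and training loop are left out.

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Additive attention over time: score_t = v^T tanh(W h_t)."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.W = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.v = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, h):
        # h: (batch, seq_len, hidden_dim), e.g. the output of nn.LSTM(batch_first=True)
        scores = self.v(torch.tanh(self.W(h))).squeeze(-1)  # (batch, seq_len)
        alpha = torch.softmax(scores, dim=1)                # weights sum to 1 over time
        context = (alpha.unsqueeze(-1) * h).sum(dim=1)      # (batch, hidden_dim)
        return context, alpha

torch.manual_seed(0)
lstm = nn.LSTM(input_size=3, hidden_size=32, num_layers=2, batch_first=True)
pool = AttentionPool(32)

x = torch.randn(4, 20, 3)   # 4 sequences, 20 days, 3 features
h, _ = lstm(x)              # hidden states for every time step: (4, 20, 32)
context, alpha = pool(h)
```

Returning `alpha` alongside `context` makes it straightforward to plot the weights per time step later.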

In [None]:
class LSTMWithAttention(nn.Module):
    """
    TODO: LSTM with attention over hidden states.
    Instead of using only the last hidden state h_T, compute over all time steps t:
        alpha = softmax_t(v^T tanh(W h_t))   (weights sum to 1 over t)
        context = sum_t(alpha_t * h_t)
    Then feed context to the prediction head.
    """
    def __init__(self, input_dim, hidden_dim=32, n_layers=2, dropout=0.2):
        super().__init__()
        # TODO
        pass

    def forward(self, x):
        # TODO
        pass

In [None]:
# TODO: Train the attention model and compare with vanilla LSTM
# - Use same data, same training procedure
# - Report MSE, QLIKE, Correlation
# - Visualize attention weights for a few sample sequences

print('Attention model: implement above')

### Discussion

*Write 1-2 paragraphs on:*

1. Does attention help? Why or why not?
2. What do the attention weights focus on? (recent days? high-vol days?)
3. Is the improvement worth the added complexity?

---

## Submission Checklist

- [ ] Part 1: RV computed for all 30 stocks with 3 estimators
- [ ] Part 2: GARCH and HAR baselines produce predictions
- [ ] Part 3: LSTM trained with proper temporal splitting
- [ ] Part 4: Comparison table with MSE, QLIKE, Correlation
- [ ] Part 5: Attention model implemented and compared
- [ ] All cells run without errors
- [ ] Notebook is clean and well-organized