# Week 8 Seminar: LSTM/GRU for Volatility Forecasting

**Quantitative Finance ML Course**

---

## Today's Plan (90 min)

| Time | Activity |
|------|----------|
| 25 min | Exercise 1: Compute RV using 3 estimators |
| 25 min | Exercise 2: Implement GARCH + HAR baselines |
| 20 min | Exercise 3: Build LSTM vol forecaster |
| 20 min | Discussion: When does classical beat DL? |

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from sklearn.linear_model import LinearRegression
import warnings
warnings.filterwarnings('ignore')

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 5)

np.random.seed(42)
torch.manual_seed(42)
```

```python
# --- Generate synthetic daily OHLCV data ---
# (stored as `spy` for convenience; these are simulated prices, not SPY data)

np.random.seed(42)
n_days = 2520  # ~10 years of business days

# GARCH(1,1) DGP
omega, alpha, beta = 1e-6, 0.08, 0.90
returns = np.zeros(n_days)
sigma2 = np.zeros(n_days)
sigma2[0] = omega / (1 - alpha - beta)  # start at the unconditional variance

for t in range(1, n_days):
    sigma2[t] = omega + alpha * returns[t-1]**2 + beta * sigma2[t-1]
    returns[t] = 0.0003 + np.sqrt(sigma2[t]) * np.random.randn()

close = 100 * np.exp(np.cumsum(returns))  # `returns` are daily log returns
daily_vol = np.sqrt(sigma2)
# Intraday high/low and open, scaled by each day's true conditional vol
high = close * np.exp(np.abs(np.random.randn(n_days)) * daily_vol * 0.5)
low = close * np.exp(-np.abs(np.random.randn(n_days)) * daily_vol * 0.5)
open_p = close * np.exp(np.random.randn(n_days) * daily_vol * 0.2)
# Enforce low < min(open, close) <= max(open, close) < high (with a 0.1% buffer)
high = np.maximum(high, np.maximum(close, open_p)) * 1.001
low = np.minimum(low, np.minimum(close, open_p)) * 0.999

spy = pd.DataFrame({
    'open': open_p, 'high': high, 'low': low, 'close': close,
    'return': returns,
    'volume': np.random.lognormal(18, 0.5, n_days)
}, index=pd.bdate_range('2015-01-02', periods=n_days))

print(f'Data: {len(spy)} days')
spy.head()
```
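As a sanity check for the Exercise 1 estimators, the DGP parameters above imply this unconditional daily variance and annualized volatility:

$$\bar{\sigma}^2 = \frac{\omega}{1 - \alpha - \beta} = \frac{10^{-6}}{0.02} = 5 \times 10^{-5}, \qquad \bar{\sigma}_{\text{ann}} = \sqrt{252 \cdot 5 \times 10^{-5}} \approx 0.112$$

Close-to-close RV should therefore be of the order of 11% annualized. The persistence $\alpha + \beta = 0.98$ produces long volatility clusters. The synthetic high/low series are not generated from a continuous intraday price path, so the range-based estimators are not guaranteed to center on this level.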

---

## Exercise 1: Compute RV Using 3 Estimators (25 min)

Implement three realized volatility estimators:

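1. **Close-to-Close**: $\hat{\sigma}_{CC} = \sqrt{\sum_{i=1}^{n} r_i^2}$ (square root of the sum of squared daily log returns over the $n$-day window)
2. **Parkinson**: $\hat{\sigma}_P = \sqrt{\frac{1}{4 n \ln 2} \sum_{i=1}^{n} (\ln H_i - \ln L_i)^2}$
3. **Garman-Klass**: $\hat{\sigma}_{GK} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left[ 0.5 (\ln H_i - \ln L_i)^2 - (2\ln 2 - 1)(\ln C_i - \ln O_i)^2 \right]}$

Use a 5-day rolling window ($n = 5$) and annualize. $\hat{\sigma}_{CC}$ is a 5-day volatility (it sums over the window), so multiply it by $\sqrt{252/5}$. $\hat{\sigma}_P$ and $\hat{\sigma}_{GK}$ are per-day volatilities (they average over the window), so multiply them by $\sqrt{252}$.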

```python
def rv_close_to_close(returns, window=5):
    """
    TODO: Compute close-to-close realized volatility.
    - Rolling sum of squared returns over `window` days
    - Take square root
    - Annualize: multiply by sqrt(252/window)
    """
    pass


def rv_parkinson(high, low, window=5):
    """
    TODO: Compute Parkinson range-based RV.
    - log_hl_sq = (ln(H) - ln(L))^2
    - Rolling mean over window
    - Divide by (4 * ln(2))
    - Take square root, annualize: multiply by sqrt(252)
    """
    pass


def rv_garman_klass(open_p, high, low, close, window=5):
    """
    TODO: Compute Garman-Klass RV.
    - gk = 0.5 * (ln(H) - ln(L))^2 - (2*ln(2) - 1) * (ln(C) - ln(O))^2
    - Rolling mean, sqrt, annualize: multiply by sqrt(252)
    """
    pass
```
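Below is one possible pandas implementation of the stubs above, useful for checking your own. It assumes every input is a `pd.Series` on the same date index and that `return` holds daily log returns. The `TRADING_DAYS` constant is an illustrative addition. Note the different annualization: close-to-close sums over the window, while the two range estimators average over it.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # illustrative constant for annualization


def rv_close_to_close(returns: pd.Series, window: int = 5) -> pd.Series:
    # Rolling sum of squared daily log returns = realized variance over the window
    rv_sum = (returns ** 2).rolling(window).sum()
    # sqrt(sum) is a window-horizon vol; scale by sqrt(252 / window) to annualize
    return np.sqrt(rv_sum * TRADING_DAYS / window)


def rv_parkinson(high: pd.Series, low: pd.Series, window: int = 5) -> pd.Series:
    log_hl_sq = np.log(high / low) ** 2
    # Average daily range-based variance over the window
    daily_var = log_hl_sq.rolling(window).mean() / (4 * np.log(2))
    return np.sqrt(daily_var * TRADING_DAYS)


def rv_garman_klass(open_p: pd.Series, high: pd.Series, low: pd.Series,
                    close: pd.Series, window: int = 5) -> pd.Series:
    # Per-day GK variance; non-negative whenever low <= open, close <= high
    gk = (0.5 * np.log(high / low) ** 2
          - (2 * np.log(2) - 1) * np.log(close / open_p) ** 2)
    daily_var = gk.rolling(window).mean()
    return np.sqrt(daily_var * TRADING_DAYS)
```

With these definitions, the commented lines in the next cell (`spy['rv_cc'] = rv_close_to_close(spy['return'])`, and so on) run as written. The first `window - 1` rows are NaN.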

```python
# TODO: Compute all three estimators
# spy['rv_cc'] = rv_close_to_close(spy['return'])
# spy['rv_park'] = rv_parkinson(spy['high'], spy['low'])
# spy['rv_gk'] = rv_garman_klass(spy['open'], spy['high'], spy['low'], spy['close'])

print('RV estimators: implement above')
```

```python
# TODO: Plot all three RV estimators on the same chart
# Also compute summary statistics: mean, std, min, max, lag-1 autocorrelation

print('RV plots and stats: implement above')
```
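The summary table requested above can be sketched with `DataFrame.agg` and `Series.autocorr`. The helper name `rv_summary` is hypothetical. `autocorr` pairs each value with its lagged value and drops pairs containing NaN, so the rolling-window warm-up rows are ignored.

```python
import numpy as np
import pandas as pd


def rv_summary(df: pd.DataFrame, cols) -> pd.DataFrame:
    """Mean, std, min, max and lag-1 autocorrelation for each RV column."""
    # agg skips NaN by default; transpose so each estimator is a row
    stats = df[cols].agg(['mean', 'std', 'min', 'max']).T
    stats['acf1'] = [df[c].autocorr(lag=1) for c in cols]
    return stats
```

For example, `rv_summary(spy, ['rv_cc', 'rv_park', 'rv_gk'])` gives the table, and `spy[['rv_cc', 'rv_park', 'rv_gk']].plot()` draws all three series on one chart.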

**Questions to answer:**
1. Which estimator is smoothest? Why?
2. What is the autocorrelation of each RV estimator at lag 1? What does this tell you?
3. Why is the Parkinson estimator typically less noisy (lower estimation variance) than Close-to-Close?

---

## Exercise 2: Implement GARCH + HAR Baselines (25 min)

### Part A: GARCH(1,1)

Fit a GARCH(1,1) on training returns and produce one-step-ahead volatility forecasts.

### Part B: HAR Model

Implement the HAR model:
$$RV_{t+1} = \beta_0 + \beta_D RV_t + \beta_W \overline{RV}_{t,5} + \beta_M \overline{RV}_{t,22} + \epsilon_{t+1}, \qquad \overline{RV}_{t,k} = \frac{1}{k} \sum_{j=0}^{k-1} RV_{t-j}$$

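```python
# --- Part A: GARCH(1,1) ---
# from arch import arch_model

# TODO:
# 1. Split data chronologically into train (first 60%) and test (last 40%)
# 2. Fit GARCH(1,1) on train returns * 100 (percent returns; `arch` optimizes
#    more reliably on this scale and warns when data are poorly scaled)
# 3. Produce one-step-ahead forecasts over the test set, keeping the
#    parameters fixed at their training estimates
# 4. Convert forecasted variance to annualized vol: sqrt(var) / 100 * sqrt(252)

print('GARCH baseline: implement above')
```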
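The cell above suggests the `arch` package. If it isn't installed, a hand-rolled Gaussian quasi-maximum-likelihood fit with SciPy is a workable stand-in. This is a swapped-in technique, not the `arch` API. The sketch assumes zero-mean (demeaned) percent returns, starts the variance recursion at the training-sample variance, and uses Nelder-Mead with a penalty for invalid or non-stationary parameters. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def garch11_filter(r, omega, alpha, beta, sigma2_0):
    """sigma2[t] is the variance of r[t] given info up to t-1;
    sigma2[len(r)] is the one-step-ahead forecast after the last return."""
    sigma2 = np.empty(len(r) + 1)
    sigma2[0] = sigma2_0
    for t in range(len(r)):
        sigma2[t + 1] = omega + alpha * r[t] ** 2 + beta * sigma2[t]
    return sigma2


def garch11_nll(params, r, sigma2_0):
    omega, alpha, beta = params
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return 1e10  # penalize invalid / non-stationary parameters
    s2 = garch11_filter(r, omega, alpha, beta, sigma2_0)[:-1]
    # Gaussian negative log-likelihood
    return 0.5 * np.sum(np.log(2 * np.pi) + np.log(s2) + r ** 2 / s2)


def fit_garch11(r_train):
    r_train = np.asarray(r_train, dtype=float)
    var0 = r_train.var()
    # Start with persistence 0.95 and omega matching the sample variance
    x0 = np.array([0.05 * var0, 0.05, 0.90])
    res = minimize(garch11_nll, x0, args=(r_train, var0), method='Nelder-Mead',
                   options={'maxiter': 5000, 'xatol': 1e-8, 'fatol': 1e-8})
    return res.x, var0


def garch11_forecast(r_all, n_train, params, var0):
    """One-step-ahead variance forecasts for days n_train .. len(r_all)-1,
    with parameters fixed at their training estimates."""
    s2 = garch11_filter(np.asarray(r_all, dtype=float), *params, var0)
    return s2[n_train:len(r_all)]
```

On the seminar data, one option is the following:

1. Set `n_train = int(0.6 * len(spy))`.
2. Build `r = (spy['return'] - spy['return'].iloc[:n_train].mean()).to_numpy() * 100`.
3. Fit with `params, var0 = fit_garch11(r[:n_train])` and forecast with `s2 = garch11_forecast(r, n_train, params, var0)`.

Then `np.sqrt(s2) / 100 * np.sqrt(252)` is annualized vol for the dates `spy.index[n_train:]`. Each forecast uses returns only up to the previous day.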

```python
# --- Part B: HAR Model ---

# TODO:
# 1. Create HAR features: rv_d (daily), rv_w (5-day mean), rv_m (22-day mean)
# 2. Create target: next-day RV (shift by -1)
# 3. Split train/test temporally (same split date as GARCH)
# 4. Fit LinearRegression on train
# 5. Predict on test
# 6. Report coefficients and R^2

print('HAR baseline: implement above')
```
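Here is a sketch of the six HAR steps, splitting on a date so the test period matches the GARCH baseline. The helper names are hypothetical, and `rv` is whichever Exercise 1 series you choose.

One caveat: if `rv` is a 5-day rolling estimator, then `rv_d` already averages five days. Next-day RV also shares four of those five days with today's value, which inflates the fit. A one-day estimator (for example, Parkinson with `window=1`) is closer in spirit to the daily HAR term.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

HAR_COLS = ['rv_d', 'rv_w', 'rv_m']


def har_features(rv: pd.Series) -> pd.DataFrame:
    """HAR regressors known at day t, plus the target RV_{t+1}."""
    df = pd.DataFrame({
        'rv_d': rv,
        'rv_w': rv.rolling(5).mean(),
        'rv_m': rv.rolling(22).mean(),
        'target': rv.shift(-1),  # next-day RV
    })
    return df.dropna()


def fit_har(rv: pd.Series, split_date):
    df = har_features(rv)
    train = df[df.index < split_date]
    test = df[df.index >= split_date]
    model = LinearRegression().fit(train[HAR_COLS], train['target'])
    # Indexed by forecast origin t; each value predicts RV_{t+1}
    pred = pd.Series(model.predict(test[HAR_COLS]), index=test.index)
    r2_test = model.score(test[HAR_COLS], test['target'])  # out-of-sample R^2
    return model, pred, r2_test
```

For example, set `split_date = spy.index[int(0.6 * len(spy))]`, then run `model, har_pred, r2 = fit_har(spy['rv_park'], split_date)`. After that, `model.intercept_` and `dict(zip(HAR_COLS, model.coef_))` give $\beta_0$ and $(\beta_D, \beta_W, \beta_M)$.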

```python
# TODO: Compare GARCH vs HAR
# - Compute MSE and correlation for both on the same test dates
# - Plot actual vs predicted for both

print('GARCH vs HAR comparison: implement above')
```
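The main pitfall in this comparison is alignment. Both forecasts must be scored against the same realized values, on the same dates and in the same units (for example, annualized vol). This sketch assumes each forecast `Series` is already indexed by the date whose RV it predicts; shift an origin-indexed forecast forward one row first. An inner join then keeps only dates every series covers. The function name is hypothetical.

```python
import numpy as np
import pandas as pd


def score_forecasts(actual: pd.Series, forecasts: dict) -> pd.DataFrame:
    """MSE and correlation of each forecast vs actual, on the dates all series share."""
    # Column-wise concat aligns on the index; dropna keeps common dates only
    df = pd.concat({'actual': actual, **forecasts}, axis=1).dropna()
    rows = {}
    for name in forecasts:
        err = df[name] - df['actual']
        rows[name] = {'mse': float(np.mean(err ** 2)),
                      'corr': float(df[name].corr(df['actual'])),
                      'n_obs': len(df)}
    return pd.DataFrame(rows).T
```

Reporting `n_obs` makes it visible when one model's warm-up period, such as HAR's 22-day mean, silently shortens the common test window.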

---

## Exercise 3: Build LSTM Vol Forecaster (20 min)

Build a simple LSTM that takes sequences of (RV, returns, log_volume) and predicts next-day RV.

Architecture:
- LSTM: 2 layers, hidden_dim=32
- Sequence length: 20 days
- Head: Linear(32, 16) -> ReLU -> Linear(16, 1)
- Train with MSE loss and early stopping

```python
class VolSequenceDataset(Dataset):
    """
    TODO: Create sequences for vol forecasting.
    Each sample: (seq_len, n_features) -> scalar target (next-day RV).
    Normalize features using training-set statistics only, and reuse those
    statistics for val/test so later periods don't leak into the scaling.
    """
    def __init__(self, data, feature_cols, target_col, seq_len=20):
        # TODO: Store features and targets, normalize features
        pass

    def __len__(self):
        # TODO
        pass

    def __getitem__(self, idx):
        # TODO: Return (X, y) where X has shape (seq_len, n_features)
        pass
```
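Below is a sketch of the dataset. It assumes `target_col` already holds next-day RV (e.g. `rv.shift(-1)`) and that NaN rows have been dropped. The optional `mean`/`std` arguments go beyond the stub's signature. Fit them on the training split, then pass them to the val/test datasets to avoid look-ahead in the normalization.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset


class VolSequenceDataset(Dataset):
    def __init__(self, data, feature_cols, target_col, seq_len=20, mean=None, std=None):
        feats = data[feature_cols].to_numpy(dtype=np.float32)
        # Compute normalization stats here only when none are supplied (training split)
        self.mean = feats.mean(axis=0) if mean is None else np.asarray(mean, dtype=np.float32)
        self.std = (feats.std(axis=0) + 1e-8) if std is None else np.asarray(std, dtype=np.float32)
        self.X = torch.tensor(((feats - self.mean) / self.std).astype(np.float32))
        self.y = torch.tensor(data[target_col].to_numpy(dtype=np.float32))
        self.seq_len = seq_len

    def __len__(self):
        # Number of complete windows of length seq_len
        return len(self.X) - self.seq_len + 1

    def __getitem__(self, idx):
        end = idx + self.seq_len
        # Features for rows idx..end-1; target stored on the window's last row
        return self.X[idx:end], self.y[end - 1]
```

For example: `train_ds = VolSequenceDataset(train_df, cols, 'target')`, then `val_ds = VolSequenceDataset(val_df, cols, 'target', mean=train_ds.mean, std=train_ds.std)`.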

```python
class LSTMVolForecaster(nn.Module):
    """
    TODO: LSTM for vol forecasting.
    - LSTM: input_dim -> hidden_dim, n_layers, batch_first=True
      (nn.LSTM's dropout acts between stacked layers, so it needs n_layers > 1)
    - Head: hidden_dim -> 16 -> ReLU -> 1
    - Use the last layer's hidden state at the final time step
    """
    def __init__(self, input_dim, hidden_dim=32, n_layers=2, dropout=0.2):
        super().__init__()
        # TODO
        pass

    def forward(self, x):
        # TODO
        pass
```
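Here is a minimal sketch of the architecture listed for Exercise 3: a 2-layer LSTM with `hidden_dim=32` and a `Linear(32, 16) -> ReLU -> Linear(16, 1)` head reading the top layer's final hidden state. Squeezing the output to shape `(batch,)` is a choice made here so it lines up with a scalar target under `nn.MSELoss`.

```python
import torch
import torch.nn as nn


class LSTMVolForecaster(nn.Module):
    def __init__(self, input_dim, hidden_dim=32, n_layers=2, dropout=0.2):
        super().__init__()
        # Dropout is applied between stacked LSTM layers only; pass 0 for one layer
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers=n_layers,
                            batch_first=True,
                            dropout=dropout if n_layers > 1 else 0.0)
        self.head = nn.Sequential(nn.Linear(hidden_dim, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, x):
        # x: (batch, seq_len, input_dim); h_n: (n_layers, batch, hidden_dim)
        _, (h_n, _) = self.lstm(x)
        # h_n[-1] is the top layer's hidden state after the last time step
        return self.head(h_n[-1]).squeeze(-1)  # shape (batch,)
```

The final layer is linear, so predictions are not constrained to be positive. Keep this in mind when computing QLIKE on the outputs.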

```python
# TODO: Train the LSTM
# 1. Create VolSequenceDataset for train/val/test (chronological split)
# 2. Use DataLoader with shuffle=True for train only
# 3. Train with MSE loss, Adam optimizer, gradient clipping
# 4. Use early stopping on validation loss
# 5. Evaluate on test set

print('LSTM training: implement above')
```
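Steps 2 to 4 can be sketched as one training function: Adam, MSE loss, gradient-norm clipping, and early stopping that restores the weights with the best validation loss. The function name and default hyperparameters (`lr`, `batch_size`, `patience`, `clip`) are illustrative assumptions, not values prescribed by the course. The function accepts any map-style datasets that yield `(X, y)` pairs.

```python
import copy
import torch
import torch.nn as nn
from torch.utils.data import DataLoader


def train_with_early_stopping(model, train_ds, val_ds, epochs=100, lr=1e-3,
                              batch_size=64, patience=10, clip=1.0):
    train_dl = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
    val_dl = DataLoader(val_ds, batch_size=batch_size, shuffle=False)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    best_val, best_state, bad_epochs = float('inf'), None, 0
    for epoch in range(epochs):
        model.train()
        for X, y in train_dl:
            opt.zero_grad()
            loss = loss_fn(model(X), y)
            loss.backward()
            # Rescale gradients so their global norm is at most `clip`
            nn.utils.clip_grad_norm_(model.parameters(), clip)
            opt.step()
        model.eval()
        with torch.no_grad():
            # Sample-weighted mean validation loss over all batches
            val_loss = sum(loss_fn(model(X), y).item() * len(y)
                           for X, y in val_dl) / len(val_ds)
        if val_loss < best_val:
            best_val, best_state, bad_epochs = val_loss, copy.deepcopy(model.state_dict()), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    model.load_state_dict(best_state)  # restore best-validation weights
    return model, best_val
```

For example, `model, best_val = train_with_early_stopping(LSTMVolForecaster(input_dim=3), train_ds, val_ds)` trains the model. Evaluate on the test set only after training finishes.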

```python
# TODO: Final comparison table
# Compare GARCH, HAR, LSTM on test set
# Metrics: MSE, QLIKE, Correlation

print('Final comparison: implement above')
```
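QLIKE has several variants in the literature. This sketch assumes the normalized form on variances, $\sigma^2/h - \ln(\sigma^2/h) - 1$, where $\sigma^2$ is the realized-variance proxy and $h$ the forecast. It equals zero when $h = \sigma^2$. Inputs are taken to be annualized vols, so they are squared first. Forecasts are floored at a small `eps` (an assumption), because QLIKE is undefined for non-positive $h$.

```python
import numpy as np


def qlike(actual_vol, pred_vol, eps=1e-12):
    """Mean QLIKE on variances: s2/h - log(s2/h) - 1 (zero when h == s2)."""
    s2 = np.asarray(actual_vol, dtype=float) ** 2
    h = np.maximum(np.asarray(pred_vol, dtype=float) ** 2, eps)
    ratio = s2 / h
    return float(np.mean(ratio - np.log(ratio) - 1))
```

Once the three forecasts are aligned with the realized series on common test dates, the table can be built as `{name: {'MSE': ..., 'QLIKE': qlike(actual, pred), 'Corr': ...}}` and passed to `pd.DataFrame(...).T`.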

---

## Discussion: When Does Classical Beat Deep Learning? (20 min)

### Question 1
GARCH(1,1) has only 3 variance parameters ($\omega$, $\alpha$, $\beta$). The LSTM has thousands. Under what conditions would you expect the LSTM to overfit and underperform?

**Think about**: sample size, signal-to-noise ratio, structural breaks.

### Question 2
The HAR model is just a linear regression with 3 features. Why is it so hard to beat?

**Think about**: what the HAR features capture, the smoothness of the vol process.

### Question 3
For which assets or conditions would you expect LSTM to have the biggest advantage?

**Think about**: crypto, FX, individual stocks, exogenous events, high-frequency data.

### Question 4
How would you combine GARCH and LSTM predictions? Is simple averaging optimal?

**Think about**: complementary strengths, regime-dependent weighting, stacking.

### Question 5
The QLIKE loss penalizes underestimation more than overestimation. Is this always desirable? When might MSE be preferred?

**Think about**: risk management vs trading, long vs short vol positions.
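To see the asymmetry concretely, take the normalized form $L(\sigma^2, h) = \sigma^2/h - \ln(\sigma^2/h) - 1$. This is an assumption about which QLIKE variant is meant; the form $\ln h + \sigma^2/h$ differs from it only by terms that do not depend on $h$. With true variance $\sigma^2 = 1$, compare a forecast 0.5 too low with one 0.5 too high:

$$L(1, 0.5) = 2 - \ln 2 - 1 \approx 0.307, \qquad L(1, 1.5) = \tfrac{2}{3} + \ln 1.5 - 1 \approx 0.072$$

Both forecasts have the same squared error, $0.25$, yet QLIKE charges the underestimate roughly four times as much.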

### Discussion Notes

*Write your group's key takeaways here:*

- Q1: ...
- Q2: ...
- Q3: ...
- Q4: ...
- Q5: ...

---

## Summary

Today you practiced:
1. Computing realized volatility with 3 different estimators
2. Implementing GARCH(1,1) and HAR baselines
3. Building an LSTM vol forecaster in PyTorch

**For the homework**: you'll extend this to 30 stocks, add the QLIKE loss, and experiment with attention mechanisms.