# Week 12 — Derivatives Pricing with Neural Networks

**Key ideas:**
- Neural networks can learn the pricing function from data — replacing or accelerating Monte Carlo
- PyTorch autograd gives us Greeks (sensitivities) for free
- Deep Hedging learns end-to-end hedging strategies
- NNs can learn pricing for models where no closed-form solution exists

**Outline:**
1. Options crash course: calls/puts, Black-Scholes, Greeks
2. Why neural networks for pricing
3. NN pricing architecture and no-arbitrage constraints
4. Deep Hedging (Buehler et al. 2019)
5. Implied volatility surface learning
6. Demo: Black-Scholes in PyTorch, autograd Greeks, train NN pricer

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from scipy.stats import norm

plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 5)

torch.manual_seed(42)
np.random.seed(42)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

---
## 1. Options Crash Course

### What is an option?
- **Call option**: right (not obligation) to BUY the asset at strike $K$ at expiry $T$
- **Put option**: right to SELL at strike $K$ at expiry $T$

Payoff at expiry:
$$\text{Call payoff} = \max(S_T - K, 0), \qquad \text{Put payoff} = \max(K - S_T, 0)$$
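
The payoff formulas above translate directly into NumPy. This is a minimal sketch; the helper names `call_payoff` and `put_payoff` are illustrative and are not used elsewhere in the notebook.

```python
import numpy as np

def call_payoff(S_T, K):
    """Payoff of a call at expiry: max(S_T - K, 0)."""
    return np.maximum(S_T - K, 0.0)

def put_payoff(S_T, K):
    """Payoff of a put at expiry: max(K - S_T, 0)."""
    return np.maximum(K - S_T, 0.0)

# Terminal prices below, at, and above a strike of 100
S_T = np.array([80.0, 100.0, 120.0])
calls = call_payoff(S_T, 100.0)   # pays only when S_T > K
puts = put_payoff(S_T, 100.0)     # pays only when S_T < K
print(calls, puts)
```

Note that `calls - puts` equals `S_T - K` for every terminal price, which is the payoff of a forward contract struck at $K$.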

### Black-Scholes formula

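Under the Black-Scholes model (geometric Brownian motion, constant volatility $\sigma$, constant risk-free rate $r$, no dividends), the price of a European call with time to expiry $T$ is: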

$$C(S, K, T, \sigma, r) = S \cdot N(d_1) - K e^{-rT} \cdot N(d_2)$$

where $N(\cdot)$ is the standard normal CDF and:
$$d_1 = \frac{\ln(S/K) + (r + \sigma^2/2)T}{\sigma\sqrt{T}}, \qquad d_2 = d_1 - \sigma\sqrt{T}$$

### The Greeks

| Greek | Symbol | Definition | Interpretation |
|-------|--------|------------|----------------|
| Delta | $\Delta$ | $\frac{\partial C}{\partial S}$ | Price sensitivity to spot |
| Gamma | $\Gamma$ | $\frac{\partial^2 C}{\partial S^2}$ | Delta's sensitivity to spot |
| Theta | $\Theta$ | $-\frac{\partial C}{\partial T}$ | Time decay ($T$ is time to expiry) |
| Vega | $\mathcal{V}$ | $\frac{\partial C}{\partial \sigma}$ | Sensitivity to volatility |
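
To make the definitions in the table concrete, each Greek can be approximated by bumping one input and differencing the price. The sketch below uses only the standard library; the scalar helper `bs_call` and the bump sizes are illustrative choices, not part of the notebook's reference implementation.

```python
import math

def bs_call(S, K, T, sigma, r):
    """Black-Scholes European call price for scalar inputs."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # standard normal CDF
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

S, K, T, sigma, r = 100.0, 100.0, 1.0, 0.2, 0.05
h = 1e-2    # bump in spot
e = 1e-4    # bump in sigma and T

# Central differences: first derivative in S, second derivative in S, first in sigma
delta_fd = (bs_call(S + h, K, T, sigma, r) - bs_call(S - h, K, T, sigma, r)) / (2 * h)
gamma_fd = (bs_call(S + h, K, T, sigma, r) - 2 * bs_call(S, K, T, sigma, r)
            + bs_call(S - h, K, T, sigma, r)) / h**2
vega_fd = (bs_call(S, K, T, sigma + e, r) - bs_call(S, K, T, sigma - e, r)) / (2 * e)

# dC/dT is positive for a call on a non-dividend asset; Theta (time decay) is its negative
dC_dT_fd = (bs_call(S, K, T + e, sigma, r) - bs_call(S, K, T - e, sigma, r)) / (2 * e)
theta_fd = -dC_dT_fd

print(f"delta={delta_fd:.4f} gamma={gamma_fd:.6f} vega={vega_fd:.4f} theta={theta_fd:.4f}")
```

Finite differences need two or three pricer calls per Greek and a bump size chosen by hand, which is part of the motivation for the autograd approach used later.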

In [None]:
# Black-Scholes in NumPy (reference implementation)
def bs_call_numpy(S, K, T, sigma, r):
    """Black-Scholes European call price."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

def bs_delta_numpy(S, K, T, sigma, r):
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    return norm.cdf(d1)

def bs_gamma_numpy(S, K, T, sigma, r):
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    return norm.pdf(d1) / (S * sigma * np.sqrt(T))

def bs_vega_numpy(S, K, T, sigma, r):
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    return S * norm.pdf(d1) * np.sqrt(T)

# Example
S, K, T, sigma, r = 100, 100, 1.0, 0.2, 0.05
price = bs_call_numpy(S, K, T, sigma, r)
print(f"BS Call Price: ${price:.4f}")
print(f"Delta: {bs_delta_numpy(S, K, T, sigma, r):.4f}")
print(f"Gamma: {bs_gamma_numpy(S, K, T, sigma, r):.6f}")
print(f"Vega:  {bs_vega_numpy(S, K, T, sigma, r):.4f}")

In [None]:
# Visualize option price surface
S_range = np.linspace(60, 140, 100)
T_range = np.array([0.1, 0.25, 0.5, 1.0, 2.0])

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Price vs spot for different maturities
for t in T_range:
    prices = bs_call_numpy(S_range, 100, t, 0.2, 0.05)
    axes[0].plot(S_range, prices, label=f'T={t:.2f}')
axes[0].plot(S_range, np.maximum(S_range - 100, 0), 'k--', alpha=0.3, label='Payoff')
axes[0].set_title('Call Price vs Spot')
axes[0].set_xlabel('Spot Price S')
axes[0].set_ylabel('Call Price')
axes[0].legend()

# Delta vs spot
for t in T_range:
    deltas = bs_delta_numpy(S_range, 100, t, 0.2, 0.05)
    axes[1].plot(S_range, deltas, label=f'T={t:.2f}')
axes[1].set_title('Delta vs Spot')
axes[1].set_xlabel('Spot Price S')
axes[1].set_ylabel('Delta')
axes[1].legend()

plt.tight_layout()
plt.show()

---
## 2. Why Neural Networks for Pricing?

### Problem: Black-Scholes is wrong
BS assumes constant volatility — in reality:
- Volatility changes over time (stochastic vol)
- Implied vol depends on strike and maturity (vol smile/skew)
- Exotic options have no closed-form solution

### Traditional alternatives
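- **Monte Carlo**: accurate and general but slow (roughly 10-60 seconds per price for path-dependent or multi-factor models, depending on the number of paths)
- **Finite difference PDE**: fast for 1-2 factors, impractical for 5+ because the grid size grows exponentially with the number of factors
- **Analytical approximations**: fast, but accuracy degrades outside the regime they were derived for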
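
As a point of reference for the Monte Carlo alternative, here is a minimal sketch of a Monte Carlo pricer for a European call under geometric Brownian motion. The function name `mc_call_price`, the path count, and the seed are assumptions made for illustration. For this simple payoff the terminal price can be sampled in one step; path-dependent payoffs or stochastic-volatility models need many time steps per path, which is where the cost comes from.

```python
import math
import numpy as np

def mc_call_price(S0, K, T, sigma, r, n_paths=200_000, seed=0):
    """Monte Carlo price of a European call under risk-neutral GBM.

    Returns the discounted mean payoff and its standard error.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    # Exact one-step sample of S_T under the risk-neutral measure
    S_T = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * z)
    payoff = np.maximum(S_T - K, 0.0)
    disc = math.exp(-r * T)
    price = disc * payoff.mean()
    stderr = disc * payoff.std(ddof=1) / math.sqrt(n_paths)
    return price, stderr

mc_price, mc_stderr = mc_call_price(100.0, 100.0, 1.0, 0.2, 0.05)
print(f"MC price: {mc_price:.4f} +/- {mc_stderr:.4f}")
```

The standard error shrinks like $1/\sqrt{n}$, so each extra digit of accuracy costs roughly 100x more paths.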

### Neural network approach

Train a neural network to approximate the pricing function:

$$f_\theta(S, K, T, \sigma, r) \approx C(S, K, T, \sigma, r)$$

**Advantages:**
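1. **Speed**: A forward pass takes microseconds, often orders of magnitude faster than Monte Carlo (the training cost is paid once, offline)
2. **Autograd Greeks**: PyTorch gives us $\partial C / \partial S$, $\partial^2 C / \partial S^2$, etc. by automatic differentiation, with no hand-derived formulas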
3. **Flexibility**: Same architecture works for any model (BS, Heston, SABR, ...)
4. **Data-driven**: Can learn from market prices directly, no model assumption needed

**Workflow:**
1. Generate training data: $(S, K, T, \sigma, r) \to C$ using Monte Carlo or analytical formula
2. Train neural network on this data
3. Use the trained network for fast pricing and Greek computation

---
## 3. NN Pricing Architecture

### Basic architecture
```
Input (S, K, T, sigma, r)  -->  [FC 128] --> ReLU --> [FC 128] --> ReLU --> [FC 64] --> ReLU --> [FC 1] --> Softplus --> Price
```

### Key design choices

**Input normalization:** Use log-moneyness $\ln(S/K)$ instead of raw $S$ and $K$, and predict $C/K$ instead of $C$. The call price is homogeneous of degree 1 in $(S, K)$, so this drops one input dimension and helps generalization.

**Output activation:** Use `softplus` ($\log(1 + e^x)$) to ensure positive prices.
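
The log-moneyness normalization rests on a scaling property of the call price: multiplying both $S$ and $K$ by the same factor multiplies $C$ by that factor. The short check below illustrates this for Black-Scholes; the scalar helper `bs_call` is redefined here only to keep the snippet self-contained.

```python
import math

def bs_call(S, K, T, sigma, r):
    """Black-Scholes European call price for scalar inputs."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

# Same moneyness S/K, different price scale
base = bs_call(100.0, 90.0, 1.0, 0.2, 0.05)
scaled = bs_call(200.0, 180.0, 1.0, 0.2, 0.05)

# Doubling S and K doubles the price, so C/K is a function of S/K, T, sigma, r only
print(scaled, 2 * base)
print(scaled / 180.0, base / 90.0)
```

A network trained on $(\ln(S/K), T, \sigma, r) \to C/K$ therefore covers every price level with a single four-dimensional input.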

### No-arbitrage constraints

A valid call price must satisfy:
- $C \geq 0$ (non-negativity) — enforced by softplus
- $C \leq S$ (bounded above by spot)
- $0 \leq \Delta \leq 1$ (monotonicity)
- $\Gamma \geq 0$ (convexity)

These can be enforced by:
1. **Soft constraints**: Add penalty terms to loss
2. **Hard constraints**: Use architectures that are inherently monotone/convex, such as input-convex neural networks (ICNN)
3. **Training data**: Ensure training data satisfies these properties
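
One possible form of the soft-constraint option is sketched below. It assumes a pricer that maps `[log(S/K), T, sigma, r]` to the normalized price $C/K$; the function name `arbitrage_penalty`, the small demo network, and the random inputs are all placeholders. With $m = \ln(S/K)$ and $c = C/K$, the chain rule gives $\Delta = c'(m)\,e^{-m}$ and $K\,\Gamma = (c''(m) - c'(m))\,e^{-2m}$, which is what the code penalizes.

```python
import torch
import torch.nn as nn

def arbitrage_penalty(model, x):
    """Mean hinge penalty for violating 0 <= Delta <= 1, Gamma >= 0 and C <= S.

    x has columns [log(S/K), T, sigma, r]; model(x) returns C/K.
    """
    x = x.detach().clone().requires_grad_(True)
    m = x[:, 0]
    c = model(x).squeeze(-1)
    # First and second derivative of c with respect to log-moneyness
    dc_dm = torch.autograd.grad(c.sum(), x, create_graph=True)[0][:, 0]
    d2c_dm2 = torch.autograd.grad(dc_dm.sum(), x, create_graph=True)[0][:, 0]
    delta = dc_dm * torch.exp(-m)                         # dC/dS
    gamma_scaled = (d2c_dm2 - dc_dm) * torch.exp(-2 * m)  # K * d2C/dS2
    violations = (
        torch.relu(-delta) + torch.relu(delta - 1.0)      # Delta outside [0, 1]
        + torch.relu(-gamma_scaled)                       # negative Gamma
        + torch.relu(c - torch.exp(m))                    # C > S, i.e. C/K > S/K
    )
    return violations.mean()

torch.manual_seed(0)
# Placeholder network with smooth activations so second derivatives are informative
demo_model = nn.Sequential(nn.Linear(4, 32), nn.Softplus(), nn.Linear(32, 1), nn.Softplus())
x_demo = torch.rand(256, 4) - torch.tensor([0.5, 0.0, 0.0, 0.0])  # log-moneyness in [-0.5, 0.5]
penalty = arbitrage_penalty(demo_model, x_demo)
print(f"penalty on an untrained network: {penalty.item():.4f}")
```

In training, this term would be added to the fit loss with a weight chosen by validation, for example `loss = mse + lam * penalty`, where `lam` is a hypothetical hyperparameter.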

---
## 4. Deep Hedging (Buehler et al. 2019)

Instead of pricing then hedging, **learn the hedging strategy end-to-end**.

### Setup
- Sell an option at time 0, receive premium $C_0$
- At each time step $t$, choose hedge ratio $\delta_t = f_\theta(S_t, t, \ldots)$
- Hold $\delta_t$ shares of the underlying
- At expiry: P&L = premium + hedging gains - option payoff

### Loss function
$$\mathcal{L}(\theta) = \rho\left( -C_0 - \sum_{t=0}^{T-1} \delta_t (S_{t+1} - S_t) + \text{payoff}(S_T) \right)$$

where $\rho$ is a risk measure (e.g., CVaR, variance) applied to the negative P&L, and $t$ indexes the discrete rebalancing dates up to expiry.
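
The loss above can be sketched in a few lines of PyTorch. This is a toy illustration under stated assumptions, not the implementation from Buehler et al.: paths come from a GBM simulator with zero interest rate, there are no transaction costs, the premium is set to zero, the policy sees only log-moneyness and time left, and $\rho$ is an empirical 95% CVaR. All names (`simulate_gbm`, `HedgePolicy`, `hedged_loss_per_path`, `cvar`) are hypothetical.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

def simulate_gbm(n_paths, n_steps, S0=100.0, mu=0.0, sigma=0.2, T=1.0):
    """Simulate GBM paths; returns a tensor of shape (n_paths, n_steps + 1)."""
    dt = T / n_steps
    z = torch.randn(n_paths, n_steps)
    log_ret = (mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z
    log_S = math.log(S0) + torch.cumsum(log_ret, dim=1)
    return torch.cat([torch.full((n_paths, 1), S0), torch.exp(log_S)], dim=1)

class HedgePolicy(nn.Module):
    """Maps (log-moneyness, time left) to a hedge ratio in [0, 1]."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, log_moneyness, time_left):
        return self.net(torch.stack([log_moneyness, time_left], dim=-1)).squeeze(-1)

def hedged_loss_per_path(policy, paths, K=100.0, T=1.0, premium=0.0):
    """Negative P&L per path: -premium - hedging gains + payoff."""
    n_steps = paths.shape[1] - 1
    gains = torch.zeros(paths.shape[0])
    for t in range(n_steps):
        S_t = paths[:, t]
        time_left = torch.full_like(S_t, T * (1 - t / n_steps))
        delta_t = policy(torch.log(S_t / K), time_left)
        gains = gains + delta_t * (paths[:, t + 1] - S_t)
    payoff = torch.clamp(paths[:, -1] - K, min=0.0)
    return -premium - gains + payoff

def cvar(losses, alpha=0.95):
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of losses."""
    k = max(1, int((1 - alpha) * losses.numel()))
    return torch.topk(losses, k).values.mean()

policy = HedgePolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
for step in range(150):
    paths = simulate_gbm(2048, 30)          # fresh paths each step
    loss = cvar(hedged_loss_per_path(policy, paths))
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    test_paths = simulate_gbm(20_000, 30)
    hedged_cvar = cvar(hedged_loss_per_path(policy, test_paths)).item()
    unhedged_cvar = cvar(torch.clamp(test_paths[:, -1] - 100.0, min=0.0)).item()
print(f"CVaR hedged: {hedged_cvar:.2f} | unhedged: {unhedged_cvar:.2f}")
```

Swapping the simulator for a different model, or subtracting a cost term proportional to the change in `delta_t` inside the loop, changes nothing else in the training code, which is the main appeal of the approach.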

### Why this is powerful
- Works with any underlying dynamics you can simulate or sample paths from (no closed-form pricing model needed)
- Handles transaction costs naturally
- Handles discrete hedging (realistic!)
- The "price" is the minimum premium $C_0$ that makes the hedge P&L acceptable

This approach is gaining traction at sell-side banks for exotic options.

---
## 5. Implied Volatility Surface Learning

### What is the implied vol surface?
Given market prices of options at different $(K, T)$, invert BS to find $\sigma_{\text{imp}}(K, T)$.

The IV surface captures:
- **Smile/skew** in strike dimension (e.g., in equity markets OTM puts trade at higher implied vol)
- **Term structure** in maturity dimension
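
Inverting Black-Scholes has no closed form, but the call price is strictly increasing in $\sigma$, so a one-dimensional root search is enough. Below is a minimal bisection sketch; the function name `implied_vol`, the search bracket, and the tolerance are illustrative choices, and production code would typically use a faster solver.

```python
import math

def bs_call(S, K, T, sigma, r):
    """Black-Scholes European call price for scalar inputs."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

def implied_vol(price, S, K, T, r, lo=1e-4, hi=5.0, tol=1e-8, max_iter=200):
    """Find sigma such that bs_call(S, K, T, sigma, r) == price, by bisection."""
    if not (bs_call(S, K, T, lo, r) <= price <= bs_call(S, K, T, hi, r)):
        raise ValueError("price is not attainable for sigma in [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, T, mid, r) < price:
            lo = mid   # model price too low: volatility must be higher
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Round trip: price an option at sigma = 0.25, then recover sigma from the price
target_price = bs_call(100.0, 110.0, 0.5, 0.25, 0.02)
iv = implied_vol(target_price, 100.0, 110.0, 0.5, 0.02)
print(f"recovered implied vol: {iv:.6f}")
```

Applying this to every quoted $(K, T)$ pair yields the scattered points that the surface model is then fitted to.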

### NN approach
Train $f_\theta(\text{moneyness}, T) \to \sigma_{\text{imp}}$ to:
- Interpolate between observed strikes/maturities
- Extrapolate to unobserved regions
- Ensure smoothness and absence of static arbitrage (no butterfly or calendar-spread arbitrage)

This can replace traditional parametric models (SVI, SSVI) with a more flexible learned surface, at the cost of having to enforce the no-arbitrage conditions explicitly.
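
Absence of butterfly arbitrage means call prices are convex in strike, which is easy to test on a grid. The sketch below converts a slice of implied vols into call prices and counts negative second differences. The function name `butterfly_violations` and both example slices are made up for illustration; in practice the vols would come from the fitted $f_\theta$.

```python
import math
import numpy as np

def bs_call(S, K, T, sigma, r):
    """Black-Scholes European call price for scalar inputs."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

def butterfly_violations(strikes, ivs, S, T, r, tol=1e-10):
    """Count interior strikes where call prices fail convexity in K.

    Assumes equally spaced strikes, so the second difference
    C(K - h) - 2 C(K) + C(K + h) is the price of a butterfly spread.
    """
    prices = np.array([bs_call(S, k, T, v, r) for k, v in zip(strikes, ivs)])
    second_diff = prices[:-2] - 2 * prices[1:-1] + prices[2:]
    return int((second_diff < -tol).sum())

strikes = np.linspace(60.0, 140.0, 41)

flat_ivs = np.full_like(strikes, 0.2)   # flat surface: plain Black-Scholes
spiky_ivs = flat_ivs.copy()
spiky_ivs[20] = 0.35                    # artificial spike at K = 100

n_flat = butterfly_violations(strikes, flat_ivs, 100.0, 1.0, 0.05)
n_spiky = butterfly_violations(strikes, spiky_ivs, 100.0, 1.0, 0.05)
print(n_flat, n_spiky)
```

A check like this can be run on the learned surface after training, or turned into a penalty term during training.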

---
## 6. Demo: Black-Scholes in PyTorch + Autograd Greeks + NN Pricer

In [None]:
# Black-Scholes in PyTorch (differentiable)
def normal_cdf(x):
    """Standard normal CDF using PyTorch."""
    return 0.5 * (1 + torch.erf(x / np.sqrt(2)))

def bs_call_torch(S, K, T, sigma, r):
    """Black-Scholes call price in PyTorch (fully differentiable)."""
    d1 = (torch.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * torch.sqrt(T))
    d2 = d1 - sigma * torch.sqrt(T)
    return S * normal_cdf(d1) - K * torch.exp(-r * T) * normal_cdf(d2)

In [None]:
# Compute Greeks using PyTorch autograd
S = torch.tensor(100.0, requires_grad=True)
K = torch.tensor(100.0)
T = torch.tensor(1.0, requires_grad=True)
sigma = torch.tensor(0.2, requires_grad=True)
r = torch.tensor(0.05)

# Forward pass
price = bs_call_torch(S, K, T, sigma, r)

# Compute first-order Greeks
price.backward(retain_graph=True)

delta_autograd = S.grad.item()
theta_autograd = T.grad.item()  # Note: this is dC/dT, not -dC/dT
vega_autograd = sigma.grad.item()

# Compute Gamma (second derivative)
S2 = torch.tensor(100.0, requires_grad=True)
price2 = bs_call_torch(S2, K, T.detach(), sigma.detach(), r)
delta_fn = torch.autograd.grad(price2, S2, create_graph=True)[0]
gamma_autograd = torch.autograd.grad(delta_fn, S2)[0].item()

print(f"{'Greek':<10} {'Autograd':>12} {'Analytical':>12} {'Error':>12}")
print('-' * 48)
print(f"{'Price':<10} {price.item():>12.4f} {bs_call_numpy(100, 100, 1, 0.2, 0.05):>12.4f}")
print(f"{'Delta':<10} {delta_autograd:>12.4f} {bs_delta_numpy(100, 100, 1, 0.2, 0.05):>12.4f} {abs(delta_autograd - bs_delta_numpy(100, 100, 1, 0.2, 0.05)):>12.6f}")
print(f"{'Gamma':<10} {gamma_autograd:>12.6f} {bs_gamma_numpy(100, 100, 1, 0.2, 0.05):>12.6f} {abs(gamma_autograd - bs_gamma_numpy(100, 100, 1, 0.2, 0.05)):>12.8f}")
print(f"{'Vega':<10} {vega_autograd:>12.4f} {bs_vega_numpy(100, 100, 1, 0.2, 0.05):>12.4f} {abs(vega_autograd - bs_vega_numpy(100, 100, 1, 0.2, 0.05)):>12.6f}")

Autograd matches the analytical Greeks to within floating-point rounding error (the tensors here are float32). This is the core insight: any differentiable pricing function gives us Greeks without deriving them by hand.

In [None]:
# Generate training data for NN pricer
N = 100_000

# Sample inputs uniformly from realistic ranges
S_train = np.random.uniform(50, 150, N)
K_train = np.random.uniform(50, 150, N)
T_train = np.random.uniform(0.05, 3.0, N)
sigma_train = np.random.uniform(0.05, 0.6, N)
r_train = np.random.uniform(0.0, 0.1, N)

# Compute BS prices (labels)
C_train = bs_call_numpy(S_train, K_train, T_train, sigma_train, r_train)

# Use log-moneyness as feature
log_m = np.log(S_train / K_train)

# Stack features: [log(S/K), T, sigma, r]
X = np.column_stack([log_m, T_train, sigma_train, r_train]).astype(np.float32)
# Normalize price by strike
y = (C_train / K_train).astype(np.float32)

print(f"X shape: {X.shape}")
print(f"y shape: {y.shape}")
print(f"y range: [{y.min():.4f}, {y.max():.4f}]")

In [None]:
# Define NN pricer
class OptionPricer(nn.Module):
    def __init__(self, input_dim=4, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
            nn.Softplus(),  # ensures positive output
        )
    
    def forward(self, x):
        return self.net(x).squeeze(-1)

model = OptionPricer().to(device)
print(f"Parameters: {sum(p.numel() for p in model.parameters()):,}")

In [None]:
# Train the NN pricer
X_tensor = torch.tensor(X).to(device)
y_tensor = torch.tensor(y).to(device)

# Train/val split
n_train = int(0.9 * N)
X_tr, X_val = X_tensor[:n_train], X_tensor[n_train:]
y_tr, y_val = y_tensor[:n_train], y_tensor[n_train:]

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5, factor=0.5)

batch_size = 4096
n_epochs = 50
train_losses, val_losses = [], []

for epoch in range(n_epochs):
    # Training
    model.train()
    perm = torch.randperm(n_train, device=device)  # shuffle indices on the same device as the data
    epoch_loss = 0
    n_batches = 0
    
    for i in range(0, n_train, batch_size):
        idx = perm[i:i+batch_size]
        pred = model(X_tr[idx])
        loss = nn.MSELoss()(pred, y_tr[idx])
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        epoch_loss += loss.item()
        n_batches += 1
    
    train_losses.append(epoch_loss / n_batches)
    
    # Validation
    model.eval()
    with torch.no_grad():
        val_pred = model(X_val)
        val_loss = nn.MSELoss()(val_pred, y_val).item()
    val_losses.append(val_loss)
    scheduler.step(val_loss)
    
    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1:3d} | Train MSE: {train_losses[-1]:.6f} | Val MSE: {val_loss:.6f}")

plt.plot(train_losses, label='Train')
plt.plot(val_losses, label='Validation')
plt.title('NN Pricer Training')
plt.xlabel('Epoch')
plt.ylabel('MSE Loss')
plt.yscale('log')
plt.legend()
plt.show()

In [None]:
# Compare NN vs BS prices
model.eval()

# Test on new data
S_test = np.linspace(60, 140, 200)
K_test = 100.0
T_test = 1.0
sigma_test = 0.2
r_test = 0.05

# BS prices
bs_prices = bs_call_numpy(S_test, K_test, T_test, sigma_test, r_test)

# NN prices
log_m_test = np.log(S_test / K_test)
X_test_nn = np.column_stack([
    log_m_test,
    np.full_like(S_test, T_test),
    np.full_like(S_test, sigma_test),
    np.full_like(S_test, r_test),
]).astype(np.float32)

with torch.no_grad():
    nn_prices = model(torch.tensor(X_test_nn).to(device)).cpu().numpy() * K_test

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

axes[0].plot(S_test, bs_prices, label='Black-Scholes', linewidth=2)
axes[0].plot(S_test, nn_prices, '--', label='Neural Network', linewidth=2)
axes[0].set_title('Call Price: BS vs NN')
axes[0].set_xlabel('Spot Price S')
axes[0].set_ylabel('Call Price')
axes[0].legend()

axes[1].plot(S_test, bs_prices - nn_prices)
axes[1].set_title('Pricing Error (BS - NN)')
axes[1].set_xlabel('Spot Price S')
axes[1].set_ylabel('Error ($)')
axes[1].axhline(0, color='gray', linestyle='--', linewidth=0.5)

plt.tight_layout()
plt.show()

print(f"Mean absolute error: ${np.abs(bs_prices - nn_prices).mean():.4f}")
print(f"Max absolute error:  ${np.abs(bs_prices - nn_prices).max():.4f}")

In [None]:
# Neural Greeks via autograd
def nn_greeks(model, S_vals, K, T, sigma, r):
    """Compute Greeks from the NN pricer using autograd."""
    # Create the tensor directly on the target device so S_t stays a leaf tensor
    S_t = torch.tensor(S_vals, dtype=torch.float32, device=device, requires_grad=True)
    log_m = torch.log(S_t / K)
    T_t = torch.full_like(S_t, T)
    sigma_t = torch.full_like(S_t, sigma)
    r_t = torch.full_like(S_t, r)
    
    x = torch.stack([log_m, T_t, sigma_t, r_t], dim=1)
    price = model(x) * K  # un-normalize
    
    # Delta = dC/dS
    delta = torch.autograd.grad(price.sum(), S_t, create_graph=True)[0]
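    # Gamma = d2C/dS2
    # Caveat: the ReLU hidden layers are piecewise linear, so this second derivative
    # only sees curvature from the softplus output and the log transform; expect a noisy Gamma.
    gamma = torch.autograd.grad(delta.sum(), S_t)[0]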
    
    return (
        price.detach().cpu().numpy(),
        delta.detach().cpu().numpy(),
        gamma.detach().cpu().numpy(),
    )

nn_price, nn_delta, nn_gamma = nn_greeks(model, S_test, 100.0, 1.0, 0.2, 0.05)

# Compare
fig, axes = plt.subplots(1, 3, figsize=(16, 4))

axes[0].plot(S_test, bs_prices, label='BS')
axes[0].plot(S_test, nn_price, '--', label='NN')
axes[0].set_title('Price')
axes[0].legend()

bs_deltas = bs_delta_numpy(S_test, 100, 1.0, 0.2, 0.05)
axes[1].plot(S_test, bs_deltas, label='BS')
axes[1].plot(S_test, nn_delta, '--', label='NN')
axes[1].set_title('Delta')
axes[1].legend()

bs_gammas = bs_gamma_numpy(S_test, 100, 1.0, 0.2, 0.05)
axes[2].plot(S_test, bs_gammas, label='BS')
axes[2].plot(S_test, nn_gamma, '--', label='NN')
axes[2].set_title('Gamma')
axes[2].legend()

for ax in axes:
    ax.set_xlabel('Spot Price S')

plt.tight_layout()
plt.show()

---
## Key Takeaways

1. **Black-Scholes is a starting point**: real markets have stochastic vol, jumps, skew
2. **NNs can approximate a wide range of pricing functions**: trained on BS, Heston, or raw market data
3. **Autograd Greeks come almost for free**: just differentiate the pricing network, but prefer smooth activations when you need Gamma, since ReLU layers are piecewise linear
4. **Speed advantage is large**: a forward pass takes microseconds, versus seconds or more for Monte Carlo
5. **No-arbitrage constraints** are important: use softplus, add penalty terms
6. **Deep Hedging** is the frontier: learn hedging end-to-end, without a separate pricing step

### Next: Seminar
Hands-on with PyTorch BS, NN pricing, and Greek computation.