# Kalman Filter Pairs Trading - Introduction

This notebook introduces the Kalman Filter approach to pairs trading.

## 1. What is Pairs Trading?

Pairs trading is a market-neutral strategy that:
- Identifies two **cointegrated** assets
- Trades the **spread** between them
- Profits from **mean reversion**

### Key Concepts

**Cointegration**: Two assets are cointegrated if their individual prices are non-stationary but some linear combination of them (here, the spread) is stationary.

**Spread**: $S(t) = P_A(t) - \beta \cdot P_B(t)$

**Hedge Ratio** ($\beta$): The number of units of asset B held against each unit of asset A, chosen so that the resulting spread is stationary.
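To make these definitions concrete, here is a minimal sketch on synthetic data (not the GLD/GDX data used later). The series `p_a` and `p_b`, the value `true_beta = 1.5`, and the noise levels are illustrative assumptions; the pair is cointegrated by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
true_beta = 1.5  # hypothetical hedge ratio used to generate the data

# Asset B follows a random walk, so its price is non-stationary.
p_b = 50 + np.cumsum(rng.normal(0, 0.2, n))

# Asset A tracks beta * B plus stationary noise.
p_a = true_beta * p_b + rng.normal(0, 1, n)

# Estimate a static hedge ratio by OLS; np.polyfit returns [slope, intercept].
beta_hat, intercept = np.polyfit(p_b, p_a, 1)

# Spread as defined above: S(t) = P_A(t) - beta * P_B(t)
spread = p_a - beta_hat * p_b

print(f"Estimated beta: {beta_hat:.4f}")
print(f"Std of P_A:     {p_a.std():.4f}")
print(f"Std of spread:  {spread.std():.4f}")
```

The spread should vary far less than either price series, which is what makes it a candidate for mean-reversion trading.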

## 2. Why Kalman Filter?

Traditional pairs trading uses **static** hedge ratios from linear regression.

The Kalman Filter instead provides:
- **Dynamic** hedge ratio estimation
- Adaptation to changing market conditions
- Online updating, one observation at a time, without refitting
- A probabilistic framework that tracks the uncertainty of the estimate

## 3. Mathematical Framework

### State Space Model

**State Equation** (how beta evolves):
$$\beta(t) = \beta(t-1) + w(t)$$

**Observation Equation** (how prices relate):
$$P_A(t) = \beta(t) \cdot P_B(t) + \alpha(t) + v(t)$$

Where:
- $\alpha(t)$ is the intercept, estimated alongside $\beta(t)$
- $w(t) \sim N(0, Q)$ is process noise
- $v(t) \sim N(0, R)$ is observation noise
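The recursion implied by this model can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not necessarily how `KalmanFilterRegression` is implemented: the function name `kalman_hedge_ratio`, the parameter `obs_var` (playing the role of $R$), the choice $Q = \frac{\delta}{1-\delta} I$, and treating $\alpha(t)$ as a random walk like $\beta(t)$ are all assumptions made for this example.

```python
import numpy as np

def kalman_hedge_ratio(x, y, delta=1e-4, obs_var=1.0):
    """Filter a time-varying [beta, alpha] from y_t = beta_t * x_t + alpha_t + v_t."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    theta = np.zeros(2)                      # state estimate [beta, alpha]
    P = np.eye(2)                            # state covariance
    Q = delta / (1.0 - delta) * np.eye(2)    # process noise (assumed parameterisation)

    states = np.empty((len(x), 2))
    for t in range(len(x)):
        H = np.array([x[t], 1.0])            # observation vector [P_B(t), 1]

        # Predict: a random walk leaves theta unchanged and inflates P by Q.
        P = P + Q

        # Update: correct the state using the prediction error.
        innovation = y[t] - H @ theta
        S = H @ P @ H + obs_var              # innovation variance
        K = P @ H / S                        # Kalman gain
        theta = theta + K * innovation
        P = P - np.outer(K, H) @ P

        states[t] = theta
    return states

# Synthetic check: the true beta drifts from 1.0 to 2.0 (values are illustrative).
rng = np.random.default_rng(1)
n = 1000
x = 50 + np.cumsum(rng.normal(0, 0.2, n))
true_beta = np.linspace(1.0, 2.0, n)
y = true_beta * x + rng.normal(0, 0.5, n)

states = kalman_hedge_ratio(x, y)
print(f"Final true beta:      {true_beta[-1]:.4f}")
print(f"Final estimated beta: {states[-1, 0]:.4f}")
```

A larger `delta` lets the estimate react faster to changes in the relationship at the cost of a noisier hedge ratio; a smaller one gives a smoother but slower estimate.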

In [None]:
# Setup
import sys
import os
sys.path.append('..')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from src.kalman_filter import KalmanFilterRegression
from src.data_manager import download_data

# Plotting style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')

%matplotlib inline

## 4. Example: GLD vs GDX

Let's demonstrate with GLD (Gold ETF) and GDX (Gold Miners ETF).

In [None]:
# Download data
df = download_data('GLD', 'GDX', '2020-01-01', '2024-01-01')
print(f"Downloaded {len(df)} observations")

# Display
df.head()

In [None]:
# Plot prices
fig, axes = plt.subplots(2, 1, figsize=(14, 8))

# Prices
axes[0].plot(df.index, df['asset1'], label='GLD', linewidth=2)
axes[0].set_title('GLD Price', fontsize=14, fontweight='bold')
axes[0].set_ylabel('Price ($)')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

axes[1].plot(df.index, df['asset2'], label='GDX', color='orange', linewidth=2)
axes[1].set_title('GDX Price', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Date')
axes[1].set_ylabel('Price ($)')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 5. Static vs Dynamic Hedge Ratio

In [None]:
from sklearn.linear_model import LinearRegression

# Static hedge ratio (OLS)
X = df['asset2'].values.reshape(-1, 1)
y = df['asset1'].values

ols_model = LinearRegression()
ols_model.fit(X, y)
static_beta = ols_model.coef_[0]

print(f"Static Hedge Ratio (OLS): {static_beta:.4f}")

# Dynamic hedge ratio (Kalman Filter)
kf = KalmanFilterRegression(delta=1e-4)

dynamic_betas = []
for i in range(len(df)):
    beta, alpha, spread = kf.update(df['asset2'].iloc[i], df['asset1'].iloc[i])
    dynamic_betas.append(beta)

df['dynamic_beta'] = dynamic_betas

print(f"Dynamic Hedge Ratio (mean): {np.mean(dynamic_betas):.4f}")
print(f"Dynamic Hedge Ratio (std):  {np.std(dynamic_betas):.4f}")

In [None]:
# Plot comparison
fig, ax = plt.subplots(figsize=(14, 6))

ax.plot(df.index, df['dynamic_beta'], label='Dynamic (Kalman)', linewidth=2)
ax.axhline(y=static_beta, color='red', linestyle='--', linewidth=2, label='Static (OLS)')

ax.set_title('Static vs Dynamic Hedge Ratio', fontsize=14, fontweight='bold')
ax.set_xlabel('Date')
ax.set_ylabel('Beta')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 6. Spread Analysis

In [None]:
# Calculate spreads
static_spread = df['asset1'] - static_beta * df['asset2']
dynamic_spread = df['asset1'] - df['dynamic_beta'] * df['asset2']

# Plot
fig, axes = plt.subplots(2, 1, figsize=(14, 8))

axes[0].plot(df.index, static_spread, linewidth=1)
axes[0].set_title('Static Spread', fontsize=12, fontweight='bold')
axes[0].axhline(y=static_spread.mean(), color='red', linestyle='--', alpha=0.5)
axes[0].grid(True, alpha=0.3)

axes[1].plot(df.index, dynamic_spread, color='green', linewidth=1)
axes[1].set_title('Dynamic Spread', fontsize=12, fontweight='bold')
axes[1].axhline(y=dynamic_spread.mean(), color='red', linestyle='--', alpha=0.5)
axes[1].set_xlabel('Date')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Statistics
print("\nSpread Statistics:")
print(f"Static Spread  - Mean: {static_spread.mean():.4f}, Std: {static_spread.std():.4f}")
print(f"Dynamic Spread - Mean: {dynamic_spread.mean():.4f}, Std: {dynamic_spread.std():.4f}")

## 7. Next Steps

In the next notebooks, we'll cover:
- **Notebook 2**: Finding tradeable pairs
- **Notebook 3**: Building and backtesting strategies
- **Notebook 4**: Multi-pair portfolio optimization

## Summary

✅ Kalman Filter provides **adaptive** hedge ratios

✅ Better captures **time-varying** relationships

✅ More **robust** to regime changes

✅ **Online learning** without retraining