# Kalman Filters and Cointegration: Dynamic Spread Strategies in Pairs Trading

18/09/25

Single Stock Trading:

- Traditional valuation methods (PER, DCF, etc.) are long-term tools and do not work well for short-term trading.

- We are exposed to market-wide movements (beta). Even if our fundamental analysis is correct, such events can trigger sell-offs.

- Price movements are commonly modeled as a random walk: price changes are unpredictable and show no clear mean-reverting behavior.

- In a random walk, the variance grows linearly over time (for the geometric Brownian motion below, $Var(\ln S_t) = \sigma^2 t$). Small forecasting errors compound over time, making it difficult to time entries and exits.

    - $dS_t=\mu S_t dt + \sigma S_t dW_t$

    - $dW_t \sim N(0,dt)$

    - $E[dW_t]=0$

    - $Var(dW_t)=E[(dW_t)^2]=dt$

- Many ML models assume stationarity; however, stock prices follow a non-stationary process, so historical patterns may not hold in the future.

- Unlike traditional ML problems, there's no clear "ground truth" for whether a stock will go up or down. Market conditions change, making training data quickly outdated (data drift).
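To make the linear-variance claim about the random-walk model concrete, here is a minimal Monte Carlo sketch. It assumes NumPy and illustrative parameters ($S_0 = 100$, $\mu = 5\%$, $\sigma = 20\%$) that are not from the source. It simulates the log-price of the SDE above via Itô's lemma, $d\ln S_t = (\mu - \sigma^2/2)\,dt + \sigma\,dW_t$, and checks that $Var(\ln S_t) \approx \sigma^2 t$.

```python
import numpy as np

def simulate_gbm_log_prices(s0, mu, sigma, T, n_steps, n_paths, seed=0):
    """Simulate ln(S_t) for dS = mu*S dt + sigma*S dW on a uniform time grid."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # Brownian increments: dW ~ N(0, dt)
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    # Ito's lemma: d ln S = (mu - sigma^2 / 2) dt + sigma dW (exact in log space)
    log_increments = (mu - 0.5 * sigma**2) * dt + sigma * dW
    return np.log(s0) + np.cumsum(log_increments, axis=1)

# Illustrative parameters (assumptions, not from the source)
sigma = 0.2
log_s = simulate_gbm_log_prices(s0=100.0, mu=0.05, sigma=sigma, T=2.0,
                                n_steps=200, n_paths=10_000)
var_by_time = log_s.var(axis=0)  # cross-sectional variance at each time step

# Column k corresponds to t = (k + 1) * dt, with dt = 0.01
print(f"Var at t=1: {var_by_time[99]:.4f} (theory {sigma**2 * 1.0:.4f})")
print(f"Var at t=2: {var_by_time[199]:.4f} (theory {sigma**2 * 2.0:.4f})")
```

Doubling the horizon roughly doubles the variance of the log-price, which is why the uncertainty around a directional forecast widens with the holding period.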

Possible Workaround. Pairs Trading:

- Instead of trying to predict absolute price movements, we can focus on the spread between two cointegrated assets.

- If the spread follows a mean-reverting process, we can trade based on statistical arbitrage instead of directional bets.

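    - $dZ_t=\theta (\mu - Z_t)dt + \sigma dW_t$, an Ornstein-Uhlenbeck process for the spread $Z_t$, where $\theta > 0$ sets the speed of mean reversion towards the long-run mean $\mu$.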
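The mean-reverting SDE above (an Ornstein-Uhlenbeck process) can be simulated exactly, because over a step $\Delta t$ the spread moves to $\mu + e^{-\theta \Delta t}(Z_t - \mu)$ plus Gaussian noise. The minimal sketch below assumes NumPy and made-up parameters. It simulates a spread and then recovers $\theta$ from an AR(1) regression, a common way to estimate the half-life $\ln 2 / \theta$ of a spread.

```python
import numpy as np

def simulate_ou(z0, theta, mu, sigma, dt, n_steps, seed=0):
    """Exact discretisation of dZ = theta * (mu - Z) dt + sigma * dW."""
    rng = np.random.default_rng(seed)
    a = np.exp(-theta * dt)                                   # one-step decay factor
    step_sd = sigma * np.sqrt((1.0 - a**2) / (2.0 * theta))   # conditional std dev
    shocks = rng.standard_normal(n_steps)
    z = np.empty(n_steps + 1)
    z[0] = z0
    for t in range(n_steps):
        # Pull a fraction (1 - a) of the gap back towards mu, then add noise
        z[t + 1] = mu + a * (z[t] - mu) + step_sd * shocks[t]
    return z

# Illustrative parameters (assumptions, not from the source)
theta, mu, sigma, dt = 2.0, 0.0, 1.0, 0.01
z = simulate_ou(z0=5.0, theta=theta, mu=mu, sigma=sigma, dt=dt, n_steps=100_000)

# Regress z[t+1] on z[t]: the slope b estimates exp(-theta * dt)
b, _ = np.polyfit(z[:-1], z[1:], 1)
theta_hat = -np.log(b) / dt
half_life = np.log(2) / theta_hat     # time for an expected deviation to halve

burned = z[1_000:]                    # drop the transient caused by z0 = 5
print(f"theta_hat={theta_hat:.2f}, half-life={half_life:.2f}")
print(f"mean={burned.mean():.3f}, var={burned.var():.3f} "
      f"(theory {sigma**2 / (2 * theta):.3f})")
```

Unlike the random walk, the long-run variance settles at $\sigma^2 / (2\theta)$ instead of growing with time, which is what makes band-based entries and exits on a spread meaningful.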

What is "Pairs Trading"?:

- Pairs Trading is a market-neutral trading strategy that involves taking a long position in one asset and a short position in another, typically related, asset.

- The idea is to exploit the relative price movements between the two assets rather than their absolute price movements.

- The assets should have a stable historical relationship, ideally cointegration rather than mere correlation, so that after any divergence their prices are expected (though not guaranteed) to revert to that relationship.

Required Conditions:

- Economic relation: Cointegration often arises when two assets are influenced by common economic factors or have an inherent economic relationship.

- Non-stationarity: Both series must be individually non-stationary and integrated of the same order (typically $I(1)$).

- Stationary linear combination: Some linear combination of the two series is stationary. The series may deviate from each other, but these deviations are stable and mean-reverting.

Advantages:

- Market Neutrality: As mentioned, the strategy is designed to be insensitive to overall market movements, which can provide protection in volatile or bearish markets.

- Diversification: By focusing on relative price movements, pairs trading can offer diversification benefits, especially when added to a portfolio of traditional directional trades.

Risks:

- Model Risk: The strategy relies heavily on the assumption that the relationship between the two assets will hold in the future as it did in the past.

- Execution Risk: Slippage, transaction costs, and other market frictions can erode profitability, especially in high-frequency implementations.

- Liquidity Risk: If one or both assets in the pair become illiquid, it may be difficult to exit positions without significant loss.

# Introduction to Cointegration

Cointegration vs Correlation:

- Correlation measures the strength and direction of a linear relationship between two variables over a given period. It is typically measured on short-term returns and does not imply any long-term relationship between price levels.

- Cointegration captures a long-term equilibrium relationship between two series: even if the series diverge in the short term, they tend to converge again in the long run.
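A quick simulation shows that the two notions can disagree. This sketch assumes NumPy and purely synthetic data (no real tickers). Pair A has returns correlated at 0.9 by construction but no cointegration, so its spread is itself a random walk. Pair B has only moderately correlated returns (about $2/\sqrt{18} \approx 0.47$ by construction) yet a stationary spread, because both prices load on one shared trend.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20_000

# Pair A: highly correlated returns, but each price is its own random walk,
# so the spread a1 - a2 is also a random walk (NOT cointegrated).
cov = [[1.0, 0.9], [0.9, 1.0]]
ret_a = rng.multivariate_normal([0.0, 0.0], cov, size=n)
a1, a2 = np.cumsum(ret_a, axis=0).T

# Pair B: both prices share one random-walk trend plus independent noise,
# so y - 2x cancels the trend (cointegrated with beta = 2).
trend = np.cumsum(rng.normal(size=n))
x = trend + rng.normal(size=n)
y = 2.0 * trend + rng.normal(size=n)

corr_a = np.corrcoef(np.diff(a1), np.diff(a2))[0, 1]
corr_b = np.corrcoef(np.diff(x), np.diff(y))[0, 1]
spread_a = a1 - a2        # variance keeps growing with time
spread_b = y - 2.0 * x    # = noise_y - 2 * noise_x, stationary with variance 1 + 4 = 5

print(f"return corr  A={corr_a:.2f}  B={corr_b:.2f}")
print(f"spread std   A={spread_a.std():.2f}  B={spread_b.std():.2f}")
```

High return correlation (pair A) does not keep prices together, while cointegration (pair B) does, even with weaker day-to-day co-movement.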

Formal Introduction - Cointegration P1:

- Let $X_t$ and $Y_t$ be two time series. Cointegration is defined in terms of integrated processes and stationarity.

- A time series $X_t$ is said to be integrated of order $d$, written $X_t \sim I(d)$, if it becomes stationary after differencing $d$ times (and $d$ is the smallest such number).
  
    - If $X_t \sim I(1)$, $X_t$ is non-stationary, but the first difference $\Delta X_t = X_t - X_{t-1}$ is stationary.

- Most stock prices are assumed to be $I(1)$ processes.
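To see the $I(1)$ versus $I(0)$ distinction numerically, the sketch below (NumPy, synthetic data, all parameters assumed) simulates a log-price as a random walk and computes a simple Dickey-Fuller statistic for the levels and for the first differences. This is the plain DF regression without lagged differences, not a full ADF test; `statsmodels.tsa.stattools.adfuller` provides the full test with p-values.

```python
import numpy as np

def dickey_fuller_tstat(series):
    """t-statistic on rho in: diff(y)_t = c + rho * y_{t-1} + e_t (no lagged diffs)."""
    y = np.asarray(series, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])   # intercept + lagged level
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - X.shape[1])   # residual variance
    se_rho = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1] / se_rho

rng = np.random.default_rng(1)
log_price = np.cumsum(rng.normal(0.0, 0.01, size=2_000))  # I(1): a random walk
log_return = np.diff(log_price)                           # Delta X_t: I(0)

# With a constant, the asymptotic 5% Dickey-Fuller critical value is about -2.86;
# only statistics below it reject the unit-root (non-stationary) null.
print(f"levels:            {dickey_fuller_tstat(log_price):.2f}")
print(f"first differences: {dickey_fuller_tstat(log_return):.2f}")
```

The levels usually fail to reject the unit root, while the first differences reject it overwhelmingly, matching $X_t \sim I(1)$ and $\Delta X_t \sim I(0)$.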

Formal Introduction - Cointegration P2:

- Two time series $X_t$ and $Y_t$ are cointegrated if:
    
    - Each series is individually $I(1)$.
    
    - There is a linear combination that is $I(0)$.
        
        - $Z_t = Y_t - \beta X_t$
    
    - This means that the spread $Z_t$ does not drift without bound and instead reverts to a mean.
    
    - $\beta$ is our secret weapon here: if we can estimate it accurately, we obtain a spread that mean-reverts and can be traded.

Cointegration Tests. Engle-Granger:

- The Engle-Granger two-step method is one of the simplest and most widely used techniques for testing cointegration between two time series. This process happens in two steps:

    - Estimate $\beta$ using ordinary least squares:
        
        - $Y_t = \alpha + \beta X_t + \epsilon_t$
        
    - Test if the residuals are stationary (Augmented Dickey-Fuller test):

        - P-Value: $0.00908$

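    - If the ADF test rejects its unit-root null (e.g. p < 0.05), we treat the residuals as stationary and the assets as cointegrated. Because the hedge ratio is estimated first, the residual test should use Engle-Granger (MacKinnon) critical values, which are stricter than the standard ADF ones. Checking residuals this way is similar in spirit to analyzing the residuals of a Machine Learning model.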
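The two Engle-Granger steps can be sketched in plain NumPy. This is a minimal illustration on a synthetic pair (the random-walk asset `m`, the true coefficients $\alpha = 5$, $\beta = 1.5$, and the AR(1) residual are all assumptions, not the source's data). It uses a simple DF regression on the residuals and compares it with an approximate 5% Engle-Granger critical value of -3.34 for two series with a constant. In practice, `statsmodels.tsa.stattools.coint` runs this test and reports p-values.

```python
import numpy as np

def df_tstat(series):
    """Dickey-Fuller t-statistic: regress diff(z)_t on [1, z_{t-1}] (no lagged diffs)."""
    z = np.asarray(series, dtype=float)
    dz = np.diff(z)
    X = np.column_stack([np.ones(len(dz)), z[:-1]])
    coef, *_ = np.linalg.lstsq(X, dz, rcond=None)
    resid = dz - X @ coef
    sigma2 = resid @ resid / (len(dz) - X.shape[1])
    return coef[1] / np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

def engle_granger(y, x, crit_5pct=-3.34):
    """Step 1: OLS of y on x with intercept. Step 2: DF test on the OLS residuals.

    crit_5pct approximates the asymptotic 5% Engle-Granger (MacKinnon) critical
    value for two series with a constant; it is stricter than the plain DF -2.86.
    """
    beta_hat, alpha_hat = np.polyfit(x, y, 1)   # polyfit returns [slope, intercept]
    resid = y - (alpha_hat + beta_hat * x)
    t_stat = df_tstat(resid)
    return alpha_hat, beta_hat, t_stat, t_stat < crit_5pct

# Synthetic pair (assumption): m is a random walk, g = 5 + 1.5 * m + AR(1) noise
rng = np.random.default_rng(3)
n = 2_000
m = 100.0 + np.cumsum(rng.normal(size=n))
eps = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.8 * u[t - 1] + eps[t]        # stationary, mean-reverting residual
g = 5.0 + 1.5 * m + u

alpha_hat, beta_hat, t_stat, cointegrated = engle_granger(g, m)
print(f"alpha={alpha_hat:.2f} beta={beta_hat:.3f} t={t_stat:.2f} cointegrated={cointegrated}")
```

The estimated slope is the static hedge ratio used to build the spread; its weakness is that it stays fixed even if the true relationship drifts.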

Spread: $Z_t=G_t - \beta M_t$

How can we adjust the parameters dynamically?:

- First, we need to make them time-varying.
   
    - From:
        
        - $G_t = \alpha + \beta M_t + \epsilon_t$

    - To:

        - $G_t = \alpha_t + \beta_t M_t + \epsilon_t$

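- Then, we'll need a model that can update itself in real time:

    - We could refit the linear regression on the whole history every time new data arrives (reactive, but every observation gets equal weight, so regime changes are absorbed slowly), or use a rolling-window regression (short windows are noisy, long windows are slow).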

    - A better idea: The Kalman Filter!
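The rolling-regression alternative is easy to sketch, and it makes the noisy-versus-slow trade-off visible. The example below assumes NumPy and a synthetic pair whose true hedge ratio jumps from 1.0 to 2.0 halfway through; the window lengths and all other parameters are made up for illustration.

```python
import numpy as np

def rolling_beta(y, x, window):
    """OLS slope of y on x (with intercept) over a trailing window; NaN until full."""
    betas = np.full(len(y), np.nan)
    for t in range(window - 1, len(y)):
        xs = x[t - window + 1 : t + 1]
        ys = y[t - window + 1 : t + 1]
        betas[t] = np.polyfit(xs, ys, 1)[0]   # slope of the windowed fit
    return betas

# Synthetic pair (assumption): the true hedge ratio jumps from 1.0 to 2.0 at t = 1000
rng = np.random.default_rng(5)
n = 2_000
m = 200.0 + np.cumsum(rng.normal(0.0, 1.0, size=n))
true_beta = np.where(np.arange(n) < 1_000, 1.0, 2.0)
g = 10.0 + true_beta * m + rng.normal(0.0, 0.5, size=n)

fast = rolling_beta(g, m, window=60)    # adapts quickly, but noisier
slow = rolling_beta(g, m, window=500)   # smoother, but lags after the break
for t in (999, 1_050, 1_999):
    print(t, round(fast[t], 3), round(slow[t], 3))
```

Shortly after the break, the long window still mixes both regimes while the short one has already moved; in calmer periods the short window pays for that speed with extra estimation noise.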

# Kalman Filter

What is a Kalman Filter?:

- It is a recursive Bayesian estimation algorithm that dynamically adjusts parameters as new data comes in.

- Unlike OLS, which gives a single fixed estimate, the Kalman Filter will:

    - Predict the next hedge ratio based on past values.

    - Update the estimate when new price data arrives.

    - Refine the uncertainty around its estimate over time.

- This model is just one example of an online learning algorithm!
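To apply the filter, the time-varying regression is written in state-space form. One common choice, assumed here rather than taken from the source, treats $\theta_t = (\alpha_t, \beta_t)^\top$ as a hidden state that follows a random walk, with each new price $G_t$ as a noisy measurement of it:

$$
\theta_t = \theta_{t-1} + w_t, \qquad w_t \sim N(0, Q) \qquad \text{(state transition)}
$$

$$
G_t = \begin{bmatrix} 1 & M_t \end{bmatrix} \theta_t + \epsilon_t, \qquad \epsilon_t \sim N(0, R) \qquad \text{(observation)}
$$

Here $Q$ controls how quickly the hedge ratio is allowed to drift and $R$ is the measurement-noise variance; both are tuning choices. With $Q = 0$ and a diffuse prior, the filter reduces to recursive least squares, i.e. the static OLS fit updated one observation at a time.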

How does it work?:

- The filter operates on a cycle of predicting and updating.

- Predict:

    - Estimate the next $\alpha$ and $\beta$ (and their uncertainty) using the system's known dynamics (state transition).

- Update:

    - When a new measurement arrives, the filter corrects the predicted estimate, balancing prior belief against the new market data (the weighting is given by the Kalman gain).
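The predict/update cycle can be sketched for the time-varying regression $G_t = \alpha_t + \beta_t M_t + \epsilon_t$. This is a minimal NumPy sketch under assumptions not stated in the source: the state $[\alpha_t, \beta_t]$ follows a random walk, the process noise is $Q = \frac{\delta}{1-\delta} I$ with a hypothetical tuning parameter `delta`, the observation-noise variance `obs_var` is known, and the data are synthetic with a hedge ratio that jumps from 1.0 to 2.0.

```python
import numpy as np

def kalman_hedge_ratio(y, x, delta=1e-4, obs_var=1.0):
    """Track theta_t = [alpha_t, beta_t] in y_t = alpha_t + beta_t * x_t + eps_t.

    Hypothetical tuning: random-walk state with Q = delta / (1 - delta) * I and
    known observation-noise variance obs_var.
    """
    theta = np.zeros(2)                     # initial state guess [alpha, beta]
    P = 1e3 * np.eye(2)                     # large initial uncertainty (diffuse prior)
    Q = delta / (1.0 - delta) * np.eye(2)   # process noise: how fast theta may drift
    states = np.empty((len(y), 2))
    innovations = np.empty(len(y))
    for t in range(len(y)):
        # Predict: identity transition keeps theta; its uncertainty grows by Q
        P = P + Q
        # Update with y_t, whose observation row is H_t = [1, x_t]
        H = np.array([1.0, x[t]])
        innovation = y[t] - H @ theta       # one-step-ahead prediction error
        S = H @ P @ H + obs_var             # innovation variance (scalar)
        K = P @ H / S                       # Kalman gain: prior vs new data weight
        theta = theta + K * innovation
        P = P - np.outer(K, H @ P)          # (I - K H) P
        P = 0.5 * (P + P.T)                 # keep P numerically symmetric
        states[t] = theta
        innovations[t] = innovation
    return states, innovations

# Synthetic pair (assumption): beta jumps from 1.0 to 2.0 at t = 1000
rng = np.random.default_rng(11)
n = 2_000
m = 100.0 + np.cumsum(rng.normal(0.0, 0.5, size=n))
true_beta = np.where(np.arange(n) < 1_000, 1.0, 2.0)
g = 10.0 + true_beta * m + rng.normal(size=n)

states, innovations = kalman_hedge_ratio(g, m)
beta_kf = states[:, 1]
print(beta_kf[[999, 1_010, 1_999]].round(3))
```

A larger `delta` lets $\beta_t$ adapt faster at the cost of a noisier hedge ratio: the same speed/noise trade-off as a rolling window, but controlled by one continuous parameter and updated in a single pass. The innovations, computed before the filter sees $G_t$, can serve as a dynamic spread.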