# Kalman Filter Mean Reversion Strategy: EWA vs. EWC

<u>Goal:</u> Build a mean-reversion strategy on two assets that may not be cointegrated, but that can reasonably be expected, on fundamental grounds, to move together over short periods of time.

Model dynamics:


$$
\begin{cases}
y_t = x_t \times \beta_t + \epsilon_t \\
\beta_t = \beta_{t-1} + \omega_t
\end{cases}
$$
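To make the two equations concrete, here is a minimal simulation sketch of the model: the hidden $\beta_t$ follows a random walk and $y_t$ is a noisy linear function of $x_t$. The function name `simulate_dynamic_regression`, the noise levels `vw` and `ve`, and the starting value `beta0` are illustrative assumptions, not values taken from the data.

```python
import numpy as np

def simulate_dynamic_regression(x, vw=1e-4, ve=1e-3, beta0=(1.0, 0.0), seed=0):
    """Simulate y_t = x_t @ beta_t + eps_t with beta_t a random walk.

    x     : (N, 2) array whose rows are [price, 1.0]
    vw    : variance of each component of omega_t (assumed value)
    ve    : variance of eps_t (assumed value)
    beta0 : starting [slope, intercept] (assumed value)
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    beta = np.empty((n, 2))
    beta[0] = np.asarray(beta0, dtype=float)
    # State equation: beta_t = beta_{t-1} + omega_t
    steps = rng.normal(0.0, np.sqrt(vw), size=(n - 1, 2))
    beta[1:] = beta[0] + np.cumsum(steps, axis=0)
    # Measurement equation: y_t = x_t . beta_t + eps_t
    eps = rng.normal(0.0, np.sqrt(ve), size=n)
    y = np.einsum("ij,ij->i", x, beta) + eps
    return y, beta

# Usage with a synthetic price path and a column of ones
x_demo = np.column_stack([np.linspace(10.0, 20.0, 50), np.ones(50)])
y_demo, beta_demo = simulate_dynamic_regression(x_demo)
```

Setting `vw=0` freezes $\beta$, which is the ordinary fixed-coefficient regression case.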

- This differs in two ways from a strategy whose hedge ratio comes solely from a linear regression and whose signal is a moving Z-score of how far the price has drifted. First, $\beta$ is less volatile from one day to the next, because we control the variance of that variable. Second, the regression is based on all previously observed information. We also use a <i>modified Z-score</i>: the moving average is replaced by the intercept of the linear regression (read the intercept as the mean of the spread), and the moving standard deviation is replaced by the square root of the variance of the expected error between the predicted $y$ and the observed price. The score then tells us how many standard deviations the observation lies from what we would expect.
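The modified Z-score described above can be written out explicitly. This is a sketch based on the standard Kalman-filter innovation and its variance; the symbols $e_t$, $Q_t$ and $z_t$ are introduced here as assumptions, and $R_{t|t-1}$ is the forecast error covariance of $\beta$ that appears in the update formulae below.

$$
e_t = y_t - x_t \hat{\beta}_{t|t-1}, \qquad
Q_t = x_t R_{t|t-1} x_t^{\top} + V_{\epsilon}, \qquad
z_t = \frac{e_t}{\sqrt{Q_t}}
$$

Because $x_t$ includes a column of ones, the intercept is already subtracted inside $e_t$, so no separate moving average is needed.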


## Update formulae

Once the price at time $t$ is observed, the estimate of $\beta$ is updated:

$$ \hat{\beta}_{t|t} = \hat{\beta}_{t|t-1} + K_t (y_t - x_t \hat{\beta}_{t|t-1})$$

Here $K_t$ is the Kalman gain. From this we obtain the covariance of the estimation error between $\hat{\beta}_{t|t}$ and $\beta_t$:

$$R_{t|t} = R_{t|t-1} - K_t x_t R_{t|t-1}$$

And then, __the forecast error covariance for time $t+1$ given the information up to $t$__
$$R_{t+1|t} = R_{t|t} + V_{\omega}$$
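The three formulae above fit into a single predict/update cycle, sketched below. The Kalman gain is not defined in this section, so the sketch assumes the standard form $K_t = R_{t|t-1} x_t^{\top} / Q_t$ with $Q_t = x_t R_{t|t-1} x_t^{\top} + V_{\epsilon}$; the function name `kalman_step` and its argument names are hypothetical.

```python
import numpy as np

def kalman_step(x_t, y_t, beta_prev, P_prev, Vw, Ve):
    """One predict/update cycle for y_t = x_t @ beta_t + eps_t.

    x_t       : (2,) row [price, 1.0]
    beta_prev : (2,) estimate beta_{t-1|t-1}
    P_prev    : (2, 2) covariance R_{t-1|t-1}
    Vw, Ve    : state noise covariance (2, 2) and measurement noise variance
    """
    # Predict: random-walk state, so the mean carries over and
    # the covariance grows by Vw (R_{t|t-1} = R_{t-1|t-1} + Vw)
    beta_pred = beta_prev
    R = P_prev + Vw

    # Measurement prediction, its error and its variance
    yhat = x_t @ beta_pred
    e = y_t - yhat
    Q = x_t @ R @ x_t + Ve

    # Kalman gain (standard form, assumed here)
    K = R @ x_t / Q

    # Update: beta_{t|t} and R_{t|t} = R_{t|t-1} - K_t x_t R_{t|t-1}
    beta_new = beta_pred + K * e
    P = R - np.outer(K, x_t) @ R
    return beta_new, P, e, Q

# Usage with made-up numbers
beta1, P1, e1, Q1 = kalman_step(
    x_t=np.array([2.0, 1.0]), y_t=5.0,
    beta_prev=np.zeros(2), P_prev=np.zeros((2, 2)),
    Vw=np.eye(2), Ve=1e-9,
)
```

Returning `e` and `Q` alongside the state makes it easy to form a standardized error `e / np.sqrt(Q)` at each step.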


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# import statsmodels.formula.api as sm
import statsmodels.tsa.stattools as ts
import os

## Step 1: Retrieving data

We retrieve data for asset 1 and asset 2. Here asset 1 is EWA and asset 2 is EWC.

__Daily prices for EWA and EWC__

In [4]:
df = pd.read_csv(r'/Users/educontreras/PycharmProjects/Quantitative-Finance/Algorithmic Trading - E. Chan/data/inputData_EWA_EWC.csv')
df['Date'] = pd.to_datetime(df['Date'], format='%Y%m%d').dt.date  # remove HH:MM:SS
df.set_index('Date', inplace=True)

x = df['EWA']
y = df['EWC']

## Step 2: Manipulate data

We add a constant column to asset 1 to accommodate a possible offset (intercept) in the regression.

__Now x has two columns: EWA price and 1s__

In [6]:
# Stack the EWA prices with a column of ones: columns are [EWA, 1]
x = np.column_stack([x.to_numpy(), np.ones(len(x))])

array([[16.1 ,  1.  ],
       [15.98,  1.  ],
       [16.1 ,  1.  ],
       ...,
       [22.98,  1.  ],
       [23.12,  1.  ],
       [22.93,  1.  ]])

Next we define $\delta$, which sets the variance of the noise $\omega_t$ that drives the betas.

- $\delta$ close to $1$ allows the fastest changes in $\beta$
- $\delta = 0$ allows no change, which reduces to a traditional linear regression
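To see how $\delta$ translates into noise variance, the sketch below uses the same mapping as the code later in this section, $V_{\omega} = \frac{\delta}{1-\delta} I$. The helper name `state_noise_cov` and the range check are assumptions added for illustration.

```python
import numpy as np

def state_noise_cov(delta, dim=2):
    """Return V_omega = delta / (1 - delta) * I for 0 <= delta < 1."""
    if not 0.0 <= delta < 1.0:
        # delta = 1 would divide by zero (unbounded variance)
        raise ValueError("delta must lie in [0, 1)")
    return delta / (1.0 - delta) * np.eye(dim)

# Print the per-component variance for a few values of delta
for d in (0.0, 1e-4, 0.5, 0.99):
    print(d, state_noise_cov(d)[0, 0])
```

The variance is zero at $\delta = 0$ and grows without bound as $\delta$ approaches $1$, which is why small values such as the one used here keep $\beta$ slow-moving.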

In [7]:
delta = 0.0001


Next we allocate the vector that will hold the measurement prediction:

$$\hat{y}_t = x_t \times \hat{\beta}_{t|t-1}$$

In [8]:
yhat = np.full(y.shape[0], np.nan)

We define the remaining variables:
- R: covariance of the forecast error of $\beta$ from $t$ to $t+1$ ($R_{t+1|t}$ in the formulae above). $2 \times 2$ matrix
- P: covariance of the estimation error at $t$ once the measurement at $t$ is known ($R_{t|t}$ in the formulae above). $2 \times 2$ matrix
- $\beta$: dimension $2 \times N$, with $N$ the number of observations.
- $V_{\omega}$: covariance of the hidden-variable noise in the time-update step. $2 \times 2$ matrix
- $V_{\epsilon}$: variance of the measurement error in the linear regression. Real number

In [10]:
# Covariance of the forecast error from t to t+1
R = np.zeros((2, 2))

# Covariance of the estimation error at t after updating beta (R reduced by the Kalman gain)
P = R.copy()

# beta[:, t] holds the forecast made at t for t+1; under the random-walk model this equals beta_{t|t}
beta = np.full((2, x.shape[0]), np.nan)
beta[:, 0] = 0  # initial guess for [slope, intercept]

# Covariance of the state noise in the time-update step
Vw = delta / (1 - delta) * np.eye(2)

# Variance of the measurement (white) noise in the linear regression
Ve = 0.001
