# Ornstein-Uhlenbeck Process

## 1. Methodology

In this notebook, we fit an Ornstein-Uhlenbeck process to the given dataset. The fit uses Maximum Likelihood Estimation (MLE): given a parametrized family of densities, MLE selects the member of the family under which the observed data are most likely. It is therefore a parametric method and is sensitive to misspecified assumptions. A major benefit, however, is that the result is explicit: if the data do follow the assumed process (which can be checked with hypothesis tests), the fitted process/density gives an analytical description of the dataset. Further, for i.i.d. data and under standard regularity conditions, the MLE is consistent, asymptotically efficient, and invariant under reparametrization.

The main "routine" for MLE can be described as follows: 

- Choose a distribution family, such as the Gaussian distribution, described by 2 parameters $\theta = (\mu, \sigma)$, with density $f(x; \theta) = \frac{1}{\sigma \sqrt{2\pi}} \exp \left( - \frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right)$
- Given i.i.d. data points $x_1, \ldots, x_n$, consider the joint density as a function of the parameters, $\mathcal{L}_n(\theta) = \mathcal{L}_n(\theta, \textbf{x}) = f_n(\textbf{x}; \theta) = \prod_{i = 1}^{n} f(x_i; \theta)$
- Solve the optimization problem $\theta^{*} = \operatorname*{arg\,max}_{\theta} \mathcal{L}_n(\theta, \textbf{x})$

The fitted distribution is then the member of the family with parameters $\theta^{*} = (\mu^{*}, \sigma^{*})$, i.e. $f(x; \theta^{*}) = \frac{1}{\sigma^{*} \sqrt{2\pi}} \exp \left( - \frac{1}{2} \left( \frac{x - \mu^{*}}{\sigma^{*}} \right)^2 \right)$
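
The three steps above can be sketched for the Gaussian case as follows. This is a minimal illustration and not the code in the `ing` package: the function names `gaussian_neg_log_likelihood` and `fit_gaussian_mle` are hypothetical, the data are synthetic, and the optimizer (`L-BFGS-B` via `scipy.optimize.minimize`) is chosen only for the example. In practice one maximizes the log-likelihood (here by minimizing its negative), which has the same maximizer as the likelihood itself.

```python
import numpy as np
from scipy.optimize import minimize


def gaussian_neg_log_likelihood(theta, x):
    """Negative log-likelihood of i.i.d. Gaussian data for theta = (mu, sigma)."""
    mu, sigma = theta
    n = x.size
    return (
        n * np.log(sigma)
        + 0.5 * n * np.log(2.0 * np.pi)
        + 0.5 * np.sum(((x - mu) / sigma) ** 2)
    )


def fit_gaussian_mle(x, guess=(0.0, 1.0)):
    """Maximize the likelihood by minimizing the negative log-likelihood."""
    result = minimize(
        gaussian_neg_log_likelihood,
        x0=np.asarray(guess, dtype=float),
        args=(x,),
        method="L-BFGS-B",
        bounds=[(None, None), (1e-8, None)],  # sigma must stay positive
    )
    return result.x


# Synthetic data, used only to exercise the routine.
rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=0.5, size=5_000)
mu_hat, sigma_hat = fit_gaussian_mle(sample)
```

For the Gaussian family the maximizer is known in closed form (the sample mean and the uncorrected sample standard deviation), so the numerical result can be compared against `sample.mean()` and `sample.std()`.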

## 2. Implementation

In this notebook, and in the subsequent ones, the MLE routine is used as above. The implementation is described in more detail in the README.md; as a brief overview, the code is split into
- A folder fit, which describes the statistical aspect, i.e. the density, the likelihood and maximum likelihood estimation
- A folder models, which describes the stochastic processes such as Ornstein-Uhlenbeck and CIR
- A folder optm, which deals with the optimization aspect of the MLE estimation, using scipy
- A folder sim to simulate each step of the obtained process 

## 3. Code

We import all necessary libraries. The workflow is fairly simple:
- Read the data and strip it to prepare it for the fit
- Initialize the parameters with a guess to prepare the optimization
- Initialize the Ornstein-Uhlenbeck process
- Initialize the MLE method to run the optimization

In [7]:
import numpy as np

In [6]:
from ing.models.ou import OrnsteinUhlenbeck
from ing.fit.transition_density import ExactDensity
from ing.fit.mle_estimator import MLE

from ing.utils.data_utils import read_excel_to_series, strip_data

In [2]:
FILE_PATH = "../data/spread.xlsx"
COLUMN = "Spread"

In [3]:
df = read_excel_to_series(file_path=FILE_PATH)
df

Unnamed: 0.1,Unnamed: 0,Spread
0,0,20.380000
1,1,20.400000
2,2,20.412500
3,3,20.418125
4,4,20.428125
...,...,...
29177,29177,22.083125
29178,29178,22.073125
29179,29179,22.087500
29180,29180,22.133125


In [33]:
spread = strip_data(df=df, column=COLUMN)
spread

array([20.38    , 20.4     , 20.4125  , ..., 22.0875  , 22.133125,
       22.103125])

Below, we set the initial guess and the parameter bounds for the optimization.

In [70]:
param_bounds = [(0.001, 50), (0.001, 50), (0.001, 50)]
guess = np.array([0.001, 0.001, 0.002])

We set the time discretization to account for the 5-minute data, expressed in years (using the fact that there are 252 trading days per year, 8 hours in a trading day and 12 5-minute intervals per hour).

In [71]:
dt = 1.0 / (252 * 12 * 8)
model = OrnsteinUhlenbeck()
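
The arithmetic behind the step size can be spelled out as below. The constant names are illustrative and not part of the notebook; the conventions (252 days, 8 hours, 5-minute bars) are the ones stated above.

```python
TRADING_DAYS_PER_YEAR = 252
HOURS_PER_TRADING_DAY = 8
INTERVALS_PER_HOUR = 60 // 5  # twelve 5-minute bars per hour

# Number of 5-minute observations in one trading year.
steps_per_year = TRADING_DAYS_PER_YEAR * HOURS_PER_TRADING_DAY * INTERVALS_PER_HOUR

# Length of one 5-minute step, measured in years.
dt_years = 1.0 / steps_per_year

print(steps_per_year)  # 24192
```

Because the step is measured in years, the fitted speed of mean reversion and volatility are annualized quantities.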

Below, the MLE is prepared, taking all of the above as inputs.

In [72]:
exact_est = MLE(
    sample=spread, param_bounds=param_bounds, dt=dt, density=ExactDensity(model=model)
).estimate_params(guess)

Initial Params: [0.001 0.001 0.002]
Initial Likelihood: -1897499.1568547834
`xtol` termination condition is satisfied.
Number of iterations: 123, function evaluations: 448, CG iterations: 238, optimality: 2.35e-04, constraint violation: 0.00e+00, execution time: 0.38 s.
Final Params: [23.21632412 21.82021766  8.07299293]
Final Likelihood: 44653.51559948583
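
The notebook does not show what `ExactDensity` evaluates internally. For an Ornstein-Uhlenbeck process the exact transition density is Gaussian, with conditional mean $\kappa + (S_t - \kappa)e^{-\alpha \Delta t}$ and conditional variance $\sigma^2 (1 - e^{-2\alpha \Delta t}) / (2\alpha)$. The sketch below assumes that this is the density being used, and sums its logarithm over consecutive observations. The function name `ou_exact_log_likelihood` is hypothetical and not part of the `ing` package, and the test path is synthetic.

```python
import numpy as np


def ou_exact_log_likelihood(params, x, dt):
    """Log-likelihood of a path under the exact Gaussian OU transition density.

    params = (alpha, kappa, sigma), in the ordering used in this notebook:
    alpha is the mean-reversion speed, kappa the long-run level.
    """
    alpha, kappa, sigma = params
    x_prev, x_next = x[:-1], x[1:]
    decay = np.exp(-alpha * dt)
    mean = kappa + (x_prev - kappa) * decay
    var = sigma**2 * (1.0 - decay**2) / (2.0 * alpha)
    resid = x_next - mean
    return np.sum(-0.5 * np.log(2.0 * np.pi * var) - 0.5 * resid**2 / var)


# Synthetic path drawn from the exact transition law, for illustration only.
rng = np.random.default_rng(1)
a_true, k_true, s_true = 5.0, 1.0, 0.5
dt_demo = 1.0 / 252
step_sd = s_true * np.sqrt((1.0 - np.exp(-2.0 * a_true * dt_demo)) / (2.0 * a_true))
path = np.empty(2_000)
path[0] = k_true
for i in range(1, path.size):
    cond_mean = k_true + (path[i - 1] - k_true) * np.exp(-a_true * dt_demo)
    path[i] = cond_mean + step_sd * rng.standard_normal()

ll_true = ou_exact_log_likelihood((a_true, k_true, s_true), path, dt_demo)
ll_wrong = ou_exact_log_likelihood((a_true, k_true, 4.0 * s_true), path, dt_demo)
```

On the synthetic path, the log-likelihood at the generating parameters is higher than at a badly misspecified volatility, which is the property the optimizer exploits.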


In [73]:
params = exact_est.params
alpha, kappa, sigma = params[0], params[1], params[2]
alpha, kappa, sigma

(23.21632411912182, 21.82021765656413, 8.072992925731628)

Based on the result of the MLE, we model the spread as an Ornstein-Uhlenbeck process with mean-reversion speed $\alpha$, long-run level $\kappa$ and volatility $\sigma$:
$$ dS_t = \alpha(\kappa - S_t)dt + \sigma dW_t $$
$$ dS_t = 23.22(21.82 - S_t)dt + 8.07dW_t $$ 
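
To get a feel for the magnitudes, two standard quantities of an Ornstein-Uhlenbeck process can be computed from the rounded estimates: the half-life of a deviation from the long-run level, $\ln 2 / \alpha$, and the stationary standard deviation, $\sigma / \sqrt{2\alpha}$. The variable names below are illustrative, and the conversion to days assumes the same 252-trading-day year used for the time step.

```python
import numpy as np

# Rounded estimates taken from the fitted equation above.
alpha_hat, kappa_hat, sigma_hat = 23.22, 21.82, 8.07

# Time (in years) for an expected deviation from kappa to halve.
half_life_years = np.log(2.0) / alpha_hat
half_life_days = half_life_years * 252

# Standard deviation of the stationary distribution around kappa.
stationary_sd = sigma_hat / np.sqrt(2.0 * alpha_hat)

print(f"half-life: {half_life_days:.2f} trading days")
print(f"stationary sd: {stationary_sd:.2f}")
```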

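We can simulate this process using the Euler-Maruyama scheme, the most elementary discretization scheme. Writing the Brownian increment over one step as $\Delta W_n = \sqrt{\Delta t} \, Z_n$ with $Z_n \sim \mathcal{N}(0, 1)$, the scheme reads
$$ S_{t_{n + 1}} = S_{t_{n}} + \alpha (\kappa - S_{t_{n}}) \Delta t + \sigma \sqrt{\Delta t} \, Z_n $$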

The discretization scheme is provided in scheme.py, with accompanying simulation in simulation.py 
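
The contents of scheme.py and simulation.py are not shown here, so the following is only a standalone sketch of an Euler-Maruyama step for this model. The function name `simulate_ou_euler` and its signature are hypothetical. Each step adds the drift $\alpha(\kappa - S_{t_n})\Delta t$ and a shock $\sigma\sqrt{\Delta t}\,Z_n$ with $Z_n$ standard normal.

```python
import numpy as np


def simulate_ou_euler(alpha, kappa, sigma, s0, dt, n_steps, seed=None):
    """Simulate one OU path with the Euler-Maruyama scheme."""
    rng = np.random.default_rng(seed)
    path = np.empty(n_steps + 1)
    path[0] = s0
    shocks = rng.standard_normal(n_steps)  # Z_n ~ N(0, 1)
    for n in range(n_steps):
        drift = alpha * (kappa - path[n]) * dt
        diffusion = sigma * np.sqrt(dt) * shocks[n]
        path[n + 1] = path[n] + drift + diffusion
    return path


# Rounded fitted parameters, the first observed spread, and the 5-minute step.
demo_path = simulate_ou_euler(
    alpha=23.22,
    kappa=21.82,
    sigma=8.07,
    s0=20.38,
    dt=1.0 / (252 * 12 * 8),
    n_steps=1_000,
    seed=0,
)
```

Since the exact transition density of the process is known, an exact simulation is also possible; the Euler-Maruyama scheme is used here only because it is the scheme described in the text.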

In [74]:
model_fit = OrnsteinUhlenbeck()
model_fit.params = params

<ing.models.ou.OrnsteinUhlenbeck at 0x1666d2310>