## Calibrating the Ornstein-Uhlenbeck (Vasicek) model


### The OU Equation

The stochastic differential equation (SDE) for the Ornstein-Uhlenbeck process is given by:

$$ {dS}_{t}= \lambda(\mu - {S}_{t}){dt} + \sigma{dW}_{t}$$

where $\lambda$ is the mean reversion rate, $\mu$ is the long-term mean, and $\sigma$ is the volatility.
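
To get a feel for the dynamics before calibrating, the SDE can be simulated. The following is a minimal sketch using the Euler-Maruyama scheme, not code from the source; the function name `simulate_ou_euler` and all parameter values are illustrative assumptions.

```python
import numpy as np

def simulate_ou_euler(lam, mu, sigma, s0, dt, n_steps, seed=None):
    """Simulate one OU path with the Euler-Maruyama scheme.

    Each step applies S[i+1] = S[i] + lam * (mu - S[i]) * dt + sigma * dW,
    where dW ~ Normal(0, dt).
    """
    rng = np.random.default_rng(seed)
    path = np.empty(n_steps + 1)
    path[0] = s0
    # Brownian increments over each step of length dt
    dw = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    for i in range(n_steps):
        path[i + 1] = path[i] + lam * (mu - path[i]) * dt + sigma * dw[i]
    return path

# Illustrative values only (assumed, not taken from the data set used below)
path = simulate_ou_euler(lam=3.0, mu=1.0, sigma=0.5, s0=0.0,
                         dt=1 / 252, n_steps=2520, seed=0)
```

Euler-Maruyama is only an approximation for finite `dt`; the regression approach in the next section relies on the exact one-step relationship instead.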

### Calibration using least squares regression

The relationship between consecutive observations ${S}_{i}, {S}_{i+1}$ is linear, with an i.i.d. normal random term $\epsilon$:

$$ {S}_{i+1} = {aS}_{i} + {b} + {\epsilon}$$
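
One way to see where this linear form comes from (a standard result, added here as a reasoning step rather than taken from the source) is to integrate the SDE over one time step of length $\delta$:

$$ {S}_{t+\delta} = {S}_{t}{e}^{-\lambda\delta} + \mu(1 - {e}^{-\lambda\delta}) + \sigma\int_{t}^{t+\delta} {e}^{-\lambda(t+\delta-u)}{dW}_{u} $$

The stochastic integral is normally distributed with mean zero and, by the Itô isometry, variance

$$ \sigma^{2}\int_{t}^{t+\delta} {e}^{-2\lambda(t+\delta-u)}{du} = \sigma^{2}\,\frac{1 - {e}^{-2\lambda\delta}}{2\lambda} $$

Integrals over non-overlapping steps are independent, so this term plays the role of $\epsilon$.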

With $\delta$ denoting the time step between consecutive observations, the relationship between the linear fit and the model parameters is given by:

$$ {a} = {e}^{{-\lambda} {\delta}} $$

$$ {b} = {\mu}(1- {e}^{{-\lambda} {\delta}}) $$

$$ {sd}({\epsilon}) = {\sigma} \sqrt{\frac{1- {e}^{{-2\lambda} {\delta}}}{2\lambda}} $$

Solving these equations for the model parameters gives:

$$ \lambda = - \frac{\ln{a}}{\delta} $$

$$ \mu = \frac{b}{1-a}$$

$$ \sigma = {sd}(\epsilon) \sqrt{\frac{-2\ln{a}}{\delta(1-a^2)}} $$

[source](https://www.statisticshowto.com/wp-content/uploads/2016/01/Calibrating-the-Ornstein.pdf) 
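
The three inversion formulas above can be sketched as a small helper. The function name `ou_from_ar1` and the numerical values are hypothetical; the round trip at the end simply checks that the inverse formulas undo the forward ones.

```python
import math

def ou_from_ar1(a, b, sd_eps, delta):
    """Map linear-fit quantities (a, b, sd(eps)) to OU parameters (lam, mu, sigma)."""
    if not 0.0 < a < 1.0:
        # ln(a) must exist and be negative for a mean-reverting process
        raise ValueError("slope a must lie strictly between 0 and 1")
    lam = -math.log(a) / delta
    mu = b / (1.0 - a)
    sigma = sd_eps * math.sqrt(-2.0 * math.log(a) / (delta * (1.0 - a ** 2)))
    return lam, mu, sigma

# Round trip with assumed parameters: forward formulas, then the inverse
lam, mu, sigma, delta = 3.0, 1.0, 0.5, 1 / 252
a = math.exp(-lam * delta)
b = mu * (1.0 - a)
sd_eps = sigma * math.sqrt((1.0 - a ** 2) / (2.0 * lam))
recovered = ou_from_ar1(a, b, sd_eps, delta)
```

A fitted slope outside the interval (0, 1) means the formulas do not apply, which is why the sketch raises an error instead of returning a value.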

In [3]:
import pandas as pd 

df = pd.read_csv('sample_set.csv')

In [4]:
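import statsmodels.api as sm

# Fit orig = b + a * t1 + eps by ordinary least squares
X = df['t1']    # regressor: assumed to be 'orig' lagged by one step
y = df['orig']  # response
X = sm.add_constant(X)  # adds the intercept column 'const'
results = sm.OLS(y, X).fit()
print(results.summary())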

                            OLS Regression Results                            
Dep. Variable:                   orig   R-squared:                       0.905
Model:                            OLS   Adj. R-squared:                  0.905
Method:                 Least Squares   F-statistic:                 1.042e+05
Date:                Mon, 25 Jan 2021   Prob (F-statistic):               0.00
Time:                        09:59:57   Log-Likelihood:                 48802.
No. Observations:               10934   AIC:                        -9.760e+04
Df Residuals:                   10932   BIC:                        -9.759e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0003   3.41e-05    -10.231      0.000      -0.000      -0.000
t1             0.9515      0.003    322.855      0.000       0.946       0.957

In [9]:
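import numpy as np
import statsmodels.api as sm

def OU_calibration(df, spread):
    """Regress the log price ratio of two columns on its one-step lag.

    Returns (intercept b, slope a, standard error of a, column order used).
    Note: adds/overwrites the 'spread' and 'shift' columns of df.
    """
    A, B = spread

    def fit(num, den):
        df['spread'] = np.log(df[num] / df[den])
        df['shift'] = df['spread'].shift()
        # The first row has no predecessor; its lag is set to 0
        df.loc[df.index[0], 'shift'] = 0
        X = sm.add_constant(df['shift'])
        return sm.OLS(df['spread'], X).fit()

    results, order = fit(A, B), (A, B)

    # If the intercept is negative, refit on the inverted ratio,
    # which flips the sign of the spread
    if results.params['const'] < 0:
        results, order = fit(B, A), (B, A)

    return results.params['const'], results.params['shift'], results.bse['shift'], order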

In [10]:
OU_calibration(df, ('CO1', 'CO2'))


(0.00035012565107297875,
 0.9513944476006523,
 0.0029479606003722,
 ('CO2', 'CO1'))
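
The tuple above holds the intercept $b$, the slope $a$, and the standard error of the slope. Applying the formulas for $\lambda$, $\mu$ and $\sigma$ also requires the residual standard deviation ${sd}(\epsilon)$ and the sampling interval $\delta$, neither of which is returned by the function or stated here. The following is a sketch of the full chain on synthetic data, using plain NumPy instead of statsmodels; the name `fit_ou_least_squares` and the "true" parameter values are assumptions for illustration.

```python
import numpy as np

def fit_ou_least_squares(series, delta):
    """Estimate (lam, mu, sigma) by regressing S[i+1] on S[i].

    Assumes equally spaced observations and a fitted slope in (0, 1).
    """
    s = np.asarray(series, dtype=float)
    x, y = s[:-1], s[1:]
    a, b = np.polyfit(x, y, 1)  # returns slope first, then intercept
    resid = y - (a * x + b)
    # Residual standard deviation with 2 fitted coefficients
    sd_eps = np.sqrt(np.sum(resid ** 2) / (len(x) - 2))
    lam = -np.log(a) / delta
    mu = b / (1.0 - a)
    sigma = sd_eps * np.sqrt(-2.0 * np.log(a) / (delta * (1.0 - a ** 2)))
    return lam, mu, sigma

# Synthetic path generated from the exact one-step relationship
rng = np.random.default_rng(42)
true_lam, true_mu, true_sigma, delta = 3.0, 1.0, 0.5, 0.1
a_true = np.exp(-true_lam * delta)
sd_true = true_sigma * np.sqrt((1.0 - a_true ** 2) / (2.0 * true_lam))
s = np.empty(20001)
s[0] = true_mu
for i in range(20000):
    s[i + 1] = a_true * s[i] + true_mu * (1.0 - a_true) + rng.normal(0.0, sd_true)

lam_hat, mu_hat, sigma_hat = fit_ou_least_squares(s, delta)
print(lam_hat, mu_hat, sigma_hat)
```

With a statsmodels fit, the residual standard deviation could be taken from `np.sqrt(results.mse_resid)` in place of the manual calculation.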

In [11]:
results.summary()

                            OLS Regression Results                            
Dep. Variable:                   orig   R-squared:                       0.905
Model:                            OLS   Adj. R-squared:                  0.905
Method:                 Least Squares   F-statistic:                 1.042e+05
Date:                Mon, 25 Jan 2021   Prob (F-statistic):               0.00
Time:                        14:35:03   Log-Likelihood:                 48802.
No. Observations:               10934   AIC:                        -9.760e+04
Df Residuals:                   10932   BIC:                        -9.759e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0003   3.41e-05    -10.231      0.000      -0.000      -0.000
t1             0.9515      0.003    322.855      0.000       0.946       0.957
------------------------------------------------------------------------------
Omnibus:                     1407.574   Durbin-Watson:                   1.973
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            17141.238
Skew:                          -0.062   Prob(JB):                         0.00
Kurtosis:                       9.133   Cond. No.                         111.


In [13]:
results.params['t1']  # slope a of the fit, selected by label

0.9514566855487407