In [1]:
import numpy as np
import statsmodels.api as sm
import pandas as pd
from scipy.linalg import toeplitz

In [2]:
df = pd.read_csv('market_price_autoregression_df.csv')

## Overview

This notebook fits a linear model of the market price parameters by Ordinary Least Squares (OLS). The OLS fit provides a basis for assessing the model's goodness of fit, testing hypotheses about the estimated coefficient of each parameter, and testing the residuals for autocorrelation.
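
As a rough illustration of what an OLS fit computes, the sketch below solves a least-squares problem with NumPy and derives R-squared from the residuals. The helper name `ols_fit` and the synthetic data are assumptions made for illustration only; the notebook itself relies on `sm.OLS`.

```python
import numpy as np

def ols_fit(X, y):
    """Return (coefficients, R-squared) for a least-squares fit of y on X.

    X is expected to already contain a column of ones for the intercept.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    ss_res = np.sum(residuals ** 2)          # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)     # total variation around the mean
    return beta, 1.0 - ss_res / ss_tot

# Synthetic example (not the notebook's data): y = 0.5 + 2*x plus small noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 + 2.0 * x + rng.normal(scale=0.1, size=200)
X = np.column_stack([np.ones_like(x), x])    # intercept column plus one regressor

beta, r_squared = ols_fit(X, y)
print("coefficients:", beta)
print("R-squared:", r_squared)
```

The intercept column plays the same role as the constant that `sm.add_constant` adds to the inputs.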

## Hypotheses

The main hypotheses concern the estimated coefficient on each parameter. The null hypothesis for each is that the parameter has no effect on `y`, i.e. its coefficient is 0. The alternative hypothesis is that the coefficient is nonzero, whether positive or negative.
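
A minimal sketch of the two-sided test behind each coefficient's p-value is shown below. It assumes the standard normal as a stand-in for the t distribution, which is a close approximation only when the residual degrees of freedom are large; the function name `two_sided_p_value` and the input values are hypothetical.

```python
import math

def two_sided_p_value(coef, std_err):
    """Approximate two-sided p-value for H0: coefficient == 0.

    Uses the standard normal in place of the t distribution
    (an approximation that is reasonable for large samples).
    """
    t_stat = coef / std_err
    # P(|Z| > |t|) for a standard normal Z.
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return t_stat, p_value

# Hypothetical inputs: an estimated coefficient and its standard error.
t_stat, p_value = two_sided_p_value(coef=0.5, std_err=0.25)
reject_null = p_value < 0.05  # 5% significance level
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

Because the test is two-sided, a large negative t-value counts as evidence against the null just as a large positive one does.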

Secondarily, we want to check for autocorrelation, i.e. correlation between the residual/error terms of the model. Autocorrelation biases the estimated standard errors, which distorts the t-values and throws off our hypothesis testing. Fortunately, `statsmodels` reports the Durbin-Watson statistic for first-order autocorrelation. The value is always between 0 and 4: a value near 2 indicates no autocorrelation, values toward 0 indicate positive autocorrelation, and values toward 4 indicate negative autocorrelation.
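
The Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals. The NumPy sketch below computes it directly; the helper name `durbin_watson_stat` and the toy residual series are assumptions for illustration (in practice `statsmodels.stats.stattools.durbin_watson` provides the same statistic).

```python
import numpy as np

def durbin_watson_stat(residuals):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    residuals = np.asarray(residuals, dtype=float)
    diffs = np.diff(residuals)  # e_t - e_{t-1}
    return np.sum(diffs ** 2) / np.sum(residuals ** 2)

# Toy residual series (not the notebook's data).
alternating = [1.0, -1.0, 1.0, -1.0]   # strong negative autocorrelation
constant = [1.0, 1.0, 1.0, 1.0]        # strong positive autocorrelation

dw_alternating = durbin_watson_stat(alternating)
dw_constant = durbin_watson_stat(constant)
print(dw_alternating)  # 3.0, toward the upper end of the 0-4 range
print(dw_constant)     # 0.0, the lower end of the range
```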

## Method

We fit the linear model with the Ordinary Least Squares (OLS) estimator from the `statsmodels` library in Python. The input and output variables are listed below.

In [6]:
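# Regressors (model inputs) and the response variable.
input_vars = [
    'p',
    'e_hat', 'e_star',
    'cumsum_e_hat', 'cumsum_e_star',
    'delta_e_hat', 'delta_e_star',
]
output_var = ['y']

input_data = df[input_vars]
# sm.OLS does not include an intercept by default, so add a constant column.
input_data = sm.add_constant(input_data)
output_data = df[output_var]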


### Summary Statistics

In [7]:
df.describe()

| | p | p_star | p_hat | e_hat | e_star | cumsum_e_hat | cumsum_e_star | delta_e_hat | delta_e_star | y |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 651.0 | 651.0 | 651.0 | 651.0 | 651.0 | 651.0 | 651.0 | 651.0 | 651.0 | 651.0 |
| mean | 1.000402 | 1.0 | 1.000311 | -9.1e-05 | -0.000402 | 0.048712 | -0.575856 | -3e-06 | -2.1e-05 | 1.000386 |
| std | 0.01353 | 0.0 | 0.009929 | 0.010085 | 0.01353 | 0.05848 | 0.500354 | 0.010654 | 0.010567 | 0.013516 |
| min | 0.942696 | 1.0 | 0.961951 | -0.034673 | -0.047816 | -0.13404 | -1.275038 | -0.040274 | -0.038972 | 0.942696 |
| 25% | 0.993778 | 1.0 | 0.995523 | -0.005684 | -0.007697 | 0.012256 | -0.899639 | -0.006203 | -0.006203 | 0.993778 |
| 50% | 1.000268 | 1.0 | 1.001078 | 0.000537 | -0.000268 | 0.045755 | -0.795639 | 0.000211 | 7.9e-05 | 1.000268 |
| 75% | 1.007697 | 1.0 | 1.006522 | 0.005245 | 0.006222 | 0.082637 | -0.115707 | 0.005274 | 0.005233 | 1.007649 |
| max | 1.047816 | 1.0 | 1.026005 | 0.057322 | 0.057304 | 0.261073 | 0.402621 | 0.058177 | 0.058372 | 1.047816 |


## OLS Model Results

In [8]:
ols_model = sm.OLS(output_data, input_data)
results = ols_model.fit()
results.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | y | R-squared: | 0.551 |
| Model: | OLS | Adj. R-squared: | 0.547 |
| Method: | Least Squares | F-statistic: | 131.6 |
| Date: | Mon, 21 Sep 2020 | Prob (F-statistic): | 2e-108 |
| Time: | 13:15:00 | Log-Likelihood: | 2139.1 |
| No. Observations: | 651 | AIC: | -4264.0 |
| Df Residuals: | 644 | BIC: | -4233.0 |
| Df Model: | 6 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.2055 | 0.042 | 4.931 | 0.000 | 0.124 | 0.287 |
| p | 0.7922 | 0.041 | 19.370 | 0.000 | 0.712 | 0.872 |
| e_hat | 0.6923 | 0.103 | 6.693 | 0.000 | 0.489 | 0.895 |
| e_star | -0.5866 | 0.083 | -7.104 | 0.000 | -0.749 | -0.424 |
| cumsum_e_hat | 0.0841 | 0.021 | 3.973 | 0.000 | 0.043 | 0.126 |
| cumsum_e_star | 0.0032 | 0.001 | 3.879 | 0.000 | 0.002 | 0.005 |
| delta_e_hat | -0.3155 | 0.234 | -1.349 | 0.178 | -0.775 | 0.144 |
| delta_e_star | 0.4858 | 0.245 | 1.981 | 0.048 | 0.004 | 0.967 |

| | | | |
|---|---|---|---|
| Omnibus: | 81.285 | Durbin-Watson: | 2.029 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 459.796 |
| Skew: | -0.373 | Prob(JB): | 1.43e-100 |
| Kurtosis: | 7.049 | Cond. No. | 4.1e+16 |
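
The results report a condition number of 4.1e+16 and `Df Model` of 6 for seven inputs, which may point to collinearity among the regressors. The summary statistics show `p_star` constant at 1.0 and `e_star` with the same standard deviation as `p`, which would be consistent with `e_star = p_star - p`, although the notebook does not state that definition. Under that assumption, the sketch below shows how such a dependency appears as a rank-deficient design matrix; the data are synthetic, not the notebook's.

```python
import numpy as np

# Synthetic stand-in for the price series (not the notebook's data).
rng = np.random.default_rng(0)
p = 1.0 + rng.normal(scale=0.01, size=100)

# ASSUMPTION: e_star is defined as p_star - p with p_star fixed at 1.0.
e_star = 1.0 - p

# Design matrix with a constant, p, and e_star as columns.
X = np.column_stack([np.ones_like(p), p, e_star])

rank = np.linalg.matrix_rank(X)   # number of linearly independent columns
cond = np.linalg.cond(X)          # ratio of largest to smallest singular value

print("columns:", X.shape[1], "rank:", rank)
print("condition number:", cond)
```

If the same check on `input_data` shows a rank below the number of columns, the individual coefficients on the dependent columns are not separately identified, and dropping one of them would be worth considering.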


## Summary

- The OLS fit has an R-squared of 0.551 (adjusted: 0.547), which suggests the model inputs explain about 55% of the variance in `y`.

- The estimated coefficients are statistically significant at the 5% level for all parameters except `delta_e_hat` (p = 0.178), which suggests we can reject the null hypothesis for those parameters. Note that `delta_e_star` is only marginally significant (p = 0.048).

- Finally, the Durbin-Watson statistic for the residuals is 2.029, which is close to 2 and suggests there is no autocorrelation that needs correcting.
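
As a quick consistency check, the adjusted R-squared and the F-statistic can be recomputed from the reported R-squared, number of observations, and degrees of freedom. The sketch below uses the standard formulas with values copied from the results table; because R-squared is only available to three decimals, small rounding differences from the reported 0.547 and 131.6 are to be expected.

```python
# Values copied from the OLS results table.
n_obs = 651
df_model = 6
df_resid = 644
r_squared = 0.551

# Adjusted R-squared penalizes R-squared for the number of regressors.
adj_r_squared = 1.0 - (1.0 - r_squared) * (n_obs - 1) / df_resid

# F-statistic: explained variance per model degree of freedom over
# unexplained variance per residual degree of freedom.
f_stat = (r_squared / df_model) / ((1.0 - r_squared) / df_resid)

print(f"adjusted R-squared: {adj_r_squared:.3f}")
print(f"F-statistic: {f_stat:.1f}")
```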