In [1]:
import numpy as np
import statsmodels.api as sm
import pandas as pd
from scipy.linalg import toeplitz

In [2]:
df = pd.read_csv('market_price_autoregression_df.csv')

## Overview

This notebook provides an Ordinary Least Squares (OLS) estimate of a linear model for the market price parameters. The OLS fit provides a basis for assessing the model's goodness of fit, testing hypotheses about the estimated coefficient on each parameter, and testing for autocorrelation.

## Hypotheses

The main hypotheses involve the estimated coefficients on each parameter. For each coefficient, the null hypothesis is that the parameter has no effect on `y` (the coefficient equals 0). The alternative hypothesis is that the coefficient differs from 0, either positive or negative, so each is a two-sided test.

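Secondarily, we want to look at autocorrelation, the correlation between successive residuals (error terms) in the model. Autocorrelation makes the usual OLS standard errors unreliable, which distorts the t-values and throws off our hypothesis tests. Fortunately, `statsmodels` reports the Durbin-Watson statistic, which tests for first-order autocorrelation in the residuals. Its value always lies between 0 and 4: a value of 2 indicates no autocorrelation, values toward 0 indicate positive autocorrelation, and values toward 4 indicate negative autocorrelation.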
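The Durbin-Watson statistic is $d = \sum_{t=2}^{n} (e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$, where $e_t$ are the residuals in time order. It is approximately $2(1 - \hat\rho)$, with $\hat\rho$ the lag-1 autocorrelation of the residuals, which is why 2 marks no autocorrelation. A short sketch on simulated series (white noise, and an AR(1) series with an assumed $\rho = 0.8$; neither comes from this notebook) comparing a hand computation with `statsmodels.stats.stattools.durbin_watson`:

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson


def durbin_watson_by_hand(resid):
    """d = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2, residuals in time order."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)


rng = np.random.default_rng(1)
white = rng.normal(size=500)  # independent errors: d should be near 2

# AR(1) errors with rho = 0.8: d should be near 2 * (1 - 0.8) = 0.4
ar1 = np.empty(500)
ar1[0] = white[0]
for t in range(1, 500):
    ar1[t] = 0.8 * ar1[t - 1] + white[t]

print(durbin_watson_by_hand(white), durbin_watson(white))
print(durbin_watson_by_hand(ar1), durbin_watson(ar1))
```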

## Method

We estimate the linear model by Ordinary Least Squares (OLS) using the `statsmodels` library in Python. The input (regressor) and output (response) variables are defined below.

In [3]:
input_vars = ['p', 
                'e_hat', 'e_star', 
                'cumsum_e_hat', 'cumsum_e_star', 
                'delta_e_hat', 'delta_e_star']
output_var = ['y']

input_data = df[input_vars]
output_data = df[output_var]
input_data = sm.add_constant(input_data)  # the intercept column belongs to the regressors (exog), not to y

### Summary Statistics

In [4]:
df.describe()

| | p | p_star | p_hat | e_hat | e_star | cumsum_e_hat | cumsum_e_star | delta_e_hat | delta_e_star | y |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 651.0 | 651.0 | 651.0 | 651.0 | 651.0 | 651.0 | 651.0 | 651.0 | 651.0 | 651.0 |
| mean | 1.000402 | 1.0 | 1.000311 | -9.1e-05 | -0.000402 | 0.048712 | -0.575856 | -3e-06 | -2.1e-05 | 1.000386 |
| std | 0.01353 | 0.0 | 0.009929 | 0.010085 | 0.01353 | 0.05848 | 0.500354 | 0.010654 | 0.010567 | 0.013516 |
| min | 0.942696 | 1.0 | 0.961951 | -0.034673 | -0.047816 | -0.13404 | -1.275038 | -0.040274 | -0.038972 | 0.942696 |
| 25% | 0.993778 | 1.0 | 0.995523 | -0.005684 | -0.007697 | 0.012256 | -0.899639 | -0.006203 | -0.006203 | 0.993778 |
| 50% | 1.000268 | 1.0 | 1.001078 | 0.000537 | -0.000268 | 0.045755 | -0.795639 | 0.000211 | 7.9e-05 | 1.000268 |
| 75% | 1.007697 | 1.0 | 1.006522 | 0.005245 | 0.006222 | 0.082637 | -0.115707 | 0.005274 | 0.005233 | 1.007649 |
| max | 1.047816 | 1.0 | 1.026005 | 0.057322 | 0.057304 | 0.261073 | 0.402621 | 0.058177 | 0.058372 | 1.047816 |

## OLS Model Results

In [5]:
# sm.OLS(endog, exog): the response y first, the regressors (with the constant) second
ols_model = sm.OLS(output_data, input_data)
results = ols_model.fit()
results.summary()

## Summary

- The OLS estimate shows an R-squared of 0.551, which suggests the model inputs explain about 55% of the variance in `y`.

- The estimated coefficients are statistically significant at the 5% level for all but one of the parameters, so we can reject the null hypothesis for those parameters.

- Finally, the Durbin-Watson statistic for the residuals is close to 2, which suggests the autocorrelation is low enough that no correction is needed.