# Chapter 11

## Assumptions for this chapter
1. {$x_{t1}, x_{t2}, ..., x_{tk}, y_t$, $t = 1, 2, ..., n$} is stationary, weakly dependent and follows the linear model $y_t = \beta_0 + \beta_1 x_{t1} + \beta_2 x_{t2} + ... + \beta_k x_{tk} + u_t$ (where $u_t$ is the sequence of errors).
2. No perfect collinearity.
3. The explanatory variables are contemporaneously exogenous ($E(u_t | x_{t1}, ..., x_{tk}) = 0$). This differs from the previous chapter, where exogeneity was required across all time periods (strict exogeneity).
4. The errors are contemporaneously homoskedastic ($Var(u_t | (x_{t1}, x_{t2}, ..., x_{tk})) = \sigma^2$). Again, the condition is contemporaneous rather than over all time periods.
5. No serial correlation (For all $t \neq s, E(u_t u_s | x_t, x_s) = 0$).

Normality is no longer assumed in this chapter. Weak dependence (the correlation between $x_t$ and $x_{t+h}$ goes to $0$ "sufficiently quickly" as $h$ increases without bound) is what we need for this to work.
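
To make weak dependence concrete, the following is a minimal sketch using a simulated stationary AR(1) process, whose theoretical autocorrelation at lag $h$ is $\rho^h$. The process, the value `rho = 0.5`, the sample size, and the helper `autocorr` are illustrative assumptions and are not part of the exercises.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.5, 200_000  # assumed values, chosen only for illustration

# Simulate a stationary AR(1): x_t = rho * x_{t-1} + e_t
e = rng.standard_normal(n)
x = np.empty(n)
x[0] = e[0] / np.sqrt(1 - rho**2)  # start from the stationary distribution
for t in range(1, n):
    x[t] = rho * x[t - 1] + e[t]

def autocorr(series, h):
    """Sample correlation between series_t and series_{t+h}."""
    return np.corrcoef(series[:-h], series[h:])[0, 1]

# Compare sample autocorrelations with the theoretical value rho**h
sample_acf = {h: autocorr(x, h) for h in (1, 2, 5, 10)}
for h, r in sample_acf.items():
    print(f"h={h:2d}  sample={r: .4f}  theory={rho**h: .4f}")
```

The sample correlations should shrink geometrically toward zero as `h` grows, which is the behaviour the weak dependence assumption asks for.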

In [1]:
import pandas as pd
import statsmodels.api as sm
import numpy as np

In [2]:
# Exercise 1
hseinv = pd.read_stata("stata/HSEINV.DTA")
hseinv.linvpc.corr(hseinv.linvpc.shift(1))

0.639124587909112

In [3]:
# We know from chapter 10 that we can use the residuals from a regression on the time trend to detrend
resids = sm.OLS(hseinv.linvpc, sm.add_constant(hseinv.t)).fit().resid
resids.corr(resids.shift(1))

0.4847401433757272

In [4]:
hseinv.lprice.corr(hseinv.lprice.shift(1))

0.9491585505000759

In [5]:
resids = sm.OLS(hseinv.lprice, sm.add_constant(hseinv.t)).fit().resid
resids.corr(resids.shift(1))

0.8215255620225328

In [6]:
X = sm.add_constant(hseinv[["gprice", "t"]])
sm.OLS(hseinv.linvpc, X, missing = "drop").fit().summary()

Dep. Variable:,linvpc,R-squared:,0.51
Model:,OLS,Adj. R-squared:,0.484
Method:,Least Squares,F-statistic:,19.77
Date:,"Fri, 12 Mar 2021",Prob (F-statistic):,1.31e-06
Time:,01:17:53,Log-Likelihood:,30.09
No. Observations:,41,AIC:,-54.18
Df Residuals:,38,BIC:,-49.04
Df Model:,2,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,-0.8533,0.040,-21.177,0.000,-0.935,-0.772
gprice,3.8786,0.958,4.049,0.000,1.939,5.818
t,0.0080,0.002,5.038,0.000,0.005,0.011

Omnibus:,2.131,Durbin-Watson:,0.93
Prob(Omnibus):,0.345,Jarque-Bera (JB):,1.137
Skew:,-0.3,Prob(JB):,0.566
Kurtosis:,3.553,Cond. No.,1270.0


In [7]:
hseinv["linvpc_dt"] = sm.OLS(hseinv.linvpc, sm.add_constant(hseinv.t)).fit().resid
sm.OLS(hseinv.linvpc_dt, X, missing = "drop").fit().summary()

Dep. Variable:,linvpc_dt,R-squared:,0.303
Model:,OLS,Adj. R-squared:,0.266
Method:,Least Squares,F-statistic:,8.242
Date:,"Fri, 12 Mar 2021",Prob (F-statistic):,0.00106
Time:,01:17:53,Log-Likelihood:,30.09
No. Observations:,41,AIC:,-54.18
Df Residuals:,38,BIC:,-49.04
Df Model:,2,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,-0.0120,0.040,-0.297,0.768,-0.094,0.070
gprice,3.8786,0.958,4.049,0.000,1.939,5.818
t,-0.0001,0.002,-0.068,0.946,-0.003,0.003

Omnibus:,2.131,Durbin-Watson:,0.93
Prob(Omnibus):,0.345,Jarque-Bera (JB):,1.137
Skew:,-0.3,Prob(JB):,0.566
Kurtosis:,3.553,Cond. No.,1270.0


In [8]:
X = sm.add_constant(hseinv[["gprice", "t"]])
sm.OLS(hseinv.ginvpc, X, missing = "drop").fit().summary()

Dep. Variable:,ginvpc,R-squared:,0.047
Model:,OLS,Adj. R-squared:,-0.003
Method:,Least Squares,F-statistic:,0.9473
Date:,"Fri, 12 Mar 2021",Prob (F-statistic):,0.397
Time:,01:17:53,Log-Likelihood:,22.986
No. Observations:,41,AIC:,-39.97
Df Residuals:,38,BIC:,-34.83
Df Model:,2,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,0.0059,0.048,0.124,0.902,-0.091,0.103
gprice,1.5665,1.139,1.375,0.177,-0.740,3.873
t,3.698e-05,0.002,0.019,0.985,-0.004,0.004

Omnibus:,1.185,Durbin-Watson:,1.765
Prob(Omnibus):,0.553,Jarque-Bera (JB):,0.463
Skew:,0.197,Prob(JB):,0.793
Kurtosis:,3.34,Cond. No.,1270.0


C1.i The first-order autocorrelation for $log(invpc)$ is 0.64, and after detrending it is 0.48. The autocorrelation for $log(price)$ is 0.95, and 0.82 after detrending. By the more conservative rule of thumb ($\hat{\rho} > 0.8$), both versions of $log(price)$ suggest a unit root. By the $\hat{\rho} > 0.9$ rule, only the series without detrending suggests a unit root (though we should still be cautious around $log(price)$).
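
As a small sketch of how the two cutoffs sort the four series, the snippet below applies them to the autocorrelations reported in the cells above (rounded to three decimals). The dictionary names and the helper `exceeds` are hypothetical, and the cutoffs are rules of thumb rather than formal unit root tests.

```python
# First-order autocorrelations copied (rounded) from the notebook output above
reported = {
    "log(invpc)": 0.639,
    "log(invpc), detrended": 0.485,
    "log(price)": 0.949,
    "log(price), detrended": 0.822,
}

def exceeds(rho_hat, cutoff):
    """Rule of thumb only: True when rho_hat is above the chosen cutoff."""
    return rho_hat > cutoff

# Flag each series under the 0.8 and 0.9 cutoffs
flags_08 = {name: exceeds(rho, 0.8) for name, rho in reported.items()}
flags_09 = {name: exceeds(rho, 0.9) for name, rho in reported.items()}

for name in reported:
    print(f"{name:24s} >0.8: {flags_08[name]!s:5s} >0.9: {flags_09[name]}")
```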

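C1.ii Estimates are reported above. The dependent variable is $log(invpc)$ and $gprice$ is the change in $log(price)$, so the coefficient can be read as an elasticity with respect to price growth. Specifically, a one percentage point increase in the growth of housing prices ($gprice$ rising by 0.01) is associated with roughly a 3.88% increase in housing investment per capita, holding the time trend fixed.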
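
The arithmetic behind the C1.ii interpretation can be sketched as follows. The coefficient is copied from the regression output above; the variable names and the choice of a 0.01 change in $gprice$ are assumptions made for illustration. The second figure uses the exact conversion $100 \cdot (e^{\beta \Delta x} - 1)$, which is slightly larger than the usual log approximation.

```python
import math

beta_gprice = 3.8786   # coefficient on gprice from the summary above
delta_gprice = 0.01    # assumed change: one percentage point of price growth

# Log approximation: 100 * beta * delta
approx_pct = 100 * beta_gprice * delta_gprice

# Exact percentage change implied by a change in log(invpc)
exact_pct = 100 * (math.exp(beta_gprice * delta_gprice) - 1)

print(f"approximate: {approx_pct:.2f}%  exact: {exact_pct:.2f}%")
```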

C1.iii $R^2$ falls from 0.510 to 0.303.
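
The pattern in cells 6 and 7 (identical slope and standard error for $gprice$, identical log-likelihood, lower $R^2$) follows from the fact that detrending the dependent variable leaves the residuals unchanged when $t$ is also a regressor, while shrinking the total sum of squares. Below is a minimal sketch on simulated data; the data-generating values and the helper `ols` are hypothetical and only mimic the shape of the exercise.

```python
import numpy as np

def ols(y, X):
    """Return OLS coefficients and R-squared for y on X (X includes a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, r2

rng = np.random.default_rng(2)
n = 41
t = np.arange(1, n + 1, dtype=float)
x = 0.05 * rng.standard_normal(n)                              # hypothetical regressor
y = -0.9 + 4.0 * x + 0.008 * t + 0.1 * rng.standard_normal(n)  # hypothetical trending outcome

X = np.column_stack([np.ones(n), x, t])
trend_only = np.column_stack([np.ones(n), t])

# Detrend y: residuals from regressing y on a constant and t
gamma, _ = ols(y, trend_only)
y_dt = y - trend_only @ gamma

beta_raw, r2_raw = ols(y, X)
beta_dt, r2_dt = ols(y_dt, X)

print("slope on x:", beta_raw[1], beta_dt[1])
print("R-squared :", r2_raw, r2_dt)
```

The slope on `x` agrees across the two regressions, and only the $R^2$ changes, because the detrended regression no longer credits the model for explaining the trend.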

C1.iv All of the significance of the model is lost and $R^2$ now falls to 0.047. The estimate for $\Delta log(price)$ has fallen substantially, from 3.88 to 1.57, and is no longer statistically significant ($p = 0.177$). The time trend is no longer significant either, though this should not be surprising since differencing should eliminate the time trend.

In [9]:
# Exercise 2
earns = pd.read_stata("stata/EARNS.DTA")
X = sm.add_constant(earns[["goutphr", "goutph_1"]])
sm.OLS(earns.ghrwage, X, missing = "drop").fit().summary()

Dep. Variable:,ghrwage,R-squared:,0.493
Model:,OLS,Adj. R-squared:,0.465
Method:,Least Squares,F-statistic:,17.51
Date:,"Fri, 12 Mar 2021",Prob (F-statistic):,4.88e-06
Time:,01:17:53,Log-Likelihood:,108.63
No. Observations:,39,AIC:,-211.3
Df Residuals:,36,BIC:,-206.3
Df Model:,2,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,-0.0104,0.005,-2.294,0.028,-0.020,-0.001
goutphr,0.7284,0.167,4.356,0.000,0.389,1.068
goutph_1,0.4576,0.166,2.763,0.009,0.122,0.794

Omnibus:,1.236,Durbin-Watson:,1.187
Prob(Omnibus):,0.539,Jarque-Bera (JB):,0.639
Skew:,0.303,Prob(JB):,0.726
Kurtosis:,3.162,Cond. No.,75.0


In [10]:
earns["hint"] = earns.goutph_1 - earns.goutphr
X = sm.add_constant(earns[["goutphr", "hint"]])
model = sm.OLS(earns.ghrwage, X, missing = "drop").fit()
model.summary()

Dep. Variable:,ghrwage,R-squared:,0.493
Model:,OLS,Adj. R-squared:,0.465
Method:,Least Squares,F-statistic:,17.51
Date:,"Fri, 12 Mar 2021",Prob (F-statistic):,4.88e-06
Time:,01:17:53,Log-Likelihood:,108.63
No. Observations:,39,AIC:,-211.3
Df Residuals:,36,BIC:,-206.3
Df Model:,2,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,-0.0104,0.005,-2.294,0.028,-0.020,-0.001
goutphr,1.1860,0.203,5.838,0.000,0.774,1.598
hint,0.4576,0.166,2.763,0.009,0.122,0.794

Omnibus:,1.236,Durbin-Watson:,1.187
Prob(Omnibus):,0.539,Jarque-Bera (JB):,0.639
Skew:,0.303,Prob(JB):,0.726
Kurtosis:,3.162,Cond. No.,95.0


In [11]:
print("t-statistic for null of theta = 1:", (model.params["goutphr"] - 1) / model.bse["goutphr"])

t-statistic for null of theta = 1: 0.9156135280692587


In [12]:
X = sm.add_constant(earns[["goutphr", "goutph_1", "goutph_2"]])
sm.OLS(earns.ghrwage, X, missing = "drop").fit().summary()

Dep. Variable:,ghrwage,R-squared:,0.515
Model:,OLS,Adj. R-squared:,0.472
Method:,Least Squares,F-statistic:,12.04
Date:,"Fri, 12 Mar 2021",Prob (F-statistic):,1.58e-05
Time:,01:17:53,Log-Likelihood:,107.97
No. Observations:,38,AIC:,-207.9
Df Residuals:,34,BIC:,-201.4
Df Model:,3,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,-0.0113,0.005,-2.331,0.026,-0.021,-0.001
goutphr,0.7464,0.162,4.622,0.000,0.418,1.075
goutph_1,0.3740,0.167,2.246,0.031,0.036,0.712
goutph_2,0.0653,0.160,0.409,0.685,-0.259,0.390

Omnibus:,1.512,Durbin-Watson:,1.142
Prob(Omnibus):,0.47,Jarque-Bera (JB):,0.768
Skew:,0.324,Prob(JB):,0.681
Kurtosis:,3.256,Cond. No.,77.1


C2.i The lagged value is statistically significant at the 1% level ($t = 2.76$, $p = 0.009$).

C2.ii Using the hint, we estimate $\theta = \beta_1 + \beta_2$ directly as the coefficient on $goutphr$ (1.186). The t-statistic for $H_0: \theta = 1$ is 0.92, so we fail to reject the null hypothesis that $\beta_1 + \beta_2 = 1$.
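
The reparameterization used in C2.ii works because $\beta_1 x_t + \beta_2 x_{t-1} = (\beta_1 + \beta_2) x_t + \beta_2 (x_{t-1} - x_t)$, so the coefficient on $x_t$ in the rewritten regression is $\theta$ and its standard error comes straight from the output. The sketch below checks this on simulated data against the covariance-matrix formula $Var(\hat\beta_1 + \hat\beta_2) = V_{11} + V_{22} + 2V_{12}$. The simulated series, parameter values, and the helper `ols_with_cov` are hypothetical.

```python
import numpy as np

def ols_with_cov(y, X):
    """OLS coefficients and the usual (homoskedastic) covariance matrix."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    return beta, sigma2 * XtX_inv

rng = np.random.default_rng(3)
n = 39
x = 0.02 * rng.standard_normal(n + 1)   # hypothetical growth series
x_t, x_lag = x[1:], x[:-1]
y = -0.01 + 0.7 * x_t + 0.45 * x_lag + 0.015 * rng.standard_normal(n)

# Original parameterization: y on x_t and x_{t-1}
b, V = ols_with_cov(y, np.column_stack([np.ones(n), x_t, x_lag]))
theta_direct = b[1] + b[2]
se_direct = np.sqrt(V[1, 1] + V[2, 2] + 2 * V[1, 2])

# Reparameterized: y on x_t and (x_{t-1} - x_t); the x_t coefficient is theta
b_r, V_r = ols_with_cov(y, np.column_stack([np.ones(n), x_t, x_lag - x_t]))
theta_reparam, se_reparam = b_r[1], np.sqrt(V_r[1, 1])

t_stat = (theta_reparam - 1) / se_reparam  # t-statistic for H0: theta = 1
print(theta_direct, theta_reparam, se_direct, se_reparam, t_stat)
```

Both routes give the same estimate and standard error; the reparameterized regression simply saves the covariance arithmetic.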

C2.iii Estimating the model with the second lag produces a small coefficient and p-value of 0.685. There does not appear to be any reason to include the additional lag.

In [13]:
# Exercise 3
nyse = pd.read_stata("stata/NYSE.DTA")
nyse["return_1_sq"] = nyse.return_1 ** 2

X = sm.add_constant(nyse[["return_1", "return_1_sq"]])
model = sm.OLS(nyse["return"], X, missing = "drop").fit()
model.summary()

Dep. Variable:,return,R-squared:,0.006
Model:,OLS,Adj. R-squared:,0.003
Method:,Least Squares,F-statistic:,2.16
Date:,"Fri, 12 Mar 2021",Prob (F-statistic):,0.116
Time:,01:17:53,Log-Likelihood:,-1490.3
No. Observations:,689,AIC:,2987.0
Df Residuals:,686,BIC:,3000.0
Df Model:,2,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,0.2255,0.087,2.586,0.010,0.054,0.397
return_1,0.0486,0.039,1.254,0.210,-0.027,0.125
return_1_sq,-0.0097,0.007,-1.385,0.167,-0.024,0.004

Omnibus:,100.674,Durbin-Watson:,1.988
Prob(Omnibus):,0.0,Jarque-Bera (JB):,577.296
Skew:,-0.493,Prob(JB):,4.38e-126
Kurtosis:,7.375,Cond. No.,13.6


In [14]:
model.f_test("return_1 = return_1_sq = 0") # This is also reported in the summary

<class 'statsmodels.stats.contrast.ContrastResults'>
<F test: F=array([[2.16026656]]), p=0.11607808808035032, df_denom=686, df_num=2>

In [15]:
# return_1 shifted by one more period is the return from two weeks back (return_{t-2})
nyse["return_1Xreturn_2"] = nyse.return_1 * nyse.return_1.shift(1)

X = sm.add_constant(nyse[["return_1", "return_1Xreturn_2"]])
model = sm.OLS(nyse["return"], X, missing = "drop").fit()
model.summary()

Dep. Variable:,return,R-squared:,0.005
Model:,OLS,Adj. R-squared:,0.002
Method:,Least Squares,F-statistic:,1.802
Date:,"Fri, 12 Mar 2021",Prob (F-statistic):,0.166
Time:,01:17:53,Log-Likelihood:,-1488.9
No. Observations:,688,AIC:,2984.0
Df Residuals:,685,BIC:,2997.0
Df Model:,2,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,0.1732,0.081,2.139,0.033,0.014,0.332
return_1,0.0687,0.039,1.751,0.080,-0.008,0.146
return_1Xreturn_2,0.0113,0.010,1.132,0.258,-0.008,0.031

Omnibus:,119.326,Durbin-Watson:,1.996
Prob(Omnibus):,0.0,Jarque-Bera (JB):,672.91
Skew:,-0.637,Prob(JB):,7.58e-147
Kurtosis:,7.675,Cond. No.,8.36


C3.i Results are reported above.

C3.ii The F-test is reported in the summary and also carried out directly in the following cell. It is not significant at the 10% level ($F = 2.16$, $p = 0.116$), so we have no evidence that the expected value of $return_t$ depends on $return_{t-1}$. Specifically, if both $\beta_1$ and $\beta_2$ are equal to zero, then $E(return_t | return_{t-1})$ does not depend on the lagged term. One caveat: the null hypothesis itself is that the coefficients are zero, so failing to reject it means we maintain, rather than demonstrate, that $E(return_t | return_{t-1})$ does not depend on $return_{t-1}$.
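
For reference, the joint F-statistic can be computed by hand from restricted and unrestricted sums of squared residuals, $F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)}$, or equivalently from $R^2$ when the restricted model contains only a constant. The sketch below does both on simulated i.i.d. "returns"; the data, the scale factor, and the helper `ssr` are hypothetical and are not the NYSE series.

```python
import numpy as np

def ssr(y, X):
    """Sum of squared residuals from an OLS regression of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

rng = np.random.default_rng(4)
n = 689
r = 2.0 * rng.standard_normal(n + 1)   # hypothetical i.i.d. weekly returns
ret, ret_1 = r[1:], r[:-1]

X_ur = np.column_stack([np.ones(n), ret_1, ret_1 ** 2])  # unrestricted model
X_r = np.ones((n, 1))                                    # restricted: constant only

q = 2                                # number of restrictions
df_denom = n - X_ur.shape[1]         # n - k - 1
ssr_ur, ssr_r = ssr(ret, X_ur), ssr(ret, X_r)

F_ssr = ((ssr_r - ssr_ur) / q) / (ssr_ur / df_denom)

# With a constant-only restricted model, SSR_r equals the total sum of squares
r2 = 1 - ssr_ur / ssr_r
F_r2 = (r2 / q) / ((1 - r2) / df_denom)

print(F_ssr, F_r2, df_denom)
```

The two expressions agree up to floating point error, which is why the overall F-statistic in the regression summary matches the explicit joint test.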

C3.iii The test is similar to the previous one. Neither individual coefficient is significant at the 5% level (the coefficient on $return_{t-1}$ is only marginal, with $p = 0.080$), and the F-test is not significant even at the 10% level ($F = 1.80$, $p = 0.166$). Again, we fail to reject the null that both coefficients are zero, and so past returns do not appear to affect present returns.

C3.iv From these models we find no evidence that weekly stock returns can be predicted based on past stock returns. The $R^2$ is small, and the F-tests did not provide us any reason to think previous returns had any effect on present returns.