# Chapter 10

## Assumptions for this chapter

1. Linear in parameters (same as for cross-sections)
2. No perfect collinearity (again as with cross-sections)
3. Zero conditional mean. $E(u_t|X)=0$ for $t = 1, 2, \dots, n$

These three assumptions are sufficient for unbiasedness. Note that Assumption 3 is strong: in the cross-sectional case we only needed the error to be uncorrelated with the regressors for the same observation, but here the error in each period must be uncorrelated with the independent variables in all time periods, past and future included. A policy that responds to observations made today would violate this assumption, and because the response occurs in the future it cannot be controlled for.

Statements about variation require two additional assumptions:

4. Homoskedasticity
5. No serial correlation $Corr(u_t, u_s | X) = 0$ for $t \neq s$

Assumption 5 was not required in the cross-sectional case since we assumed a random sample. If assumptions 1-5 are satisfied, OLS has the same finite-sample properties as in the cross-sectional case. Inference additionally requires a normality assumption (the errors $u_t$ are independent and identically distributed as Normal with mean 0 and variance $\sigma^2$).
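To make Assumption 5 more concrete, the sketch below simulates one series of i.i.d. errors and one series of AR(1) errors and compares their lag-1 sample correlations. It is only an illustration: the AR(1) coefficient `rho = 0.8`, the sample size, and the helper `lag1_corr` are assumptions made here, and the simulation ignores the conditioning on $X$ that the formal assumption includes.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 5000
rho = 0.8  # assumed AR(1) coefficient, chosen only for illustration

# i.i.d. errors: consistent with "no serial correlation"
e = rng.normal(size=n)

# AR(1) errors u_t = rho * u_{t-1} + e_t: serially correlated by construction
u = np.zeros(n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + e[t]


def lag1_corr(x):
    """Sample correlation between x_t and x_{t-1}."""
    return np.corrcoef(x[1:], x[:-1])[0, 1]


iid_corr = lag1_corr(e)
ar1_corr = lag1_corr(u)
print("lag-1 correlation, i.i.d. errors:", round(iid_corr, 3))
print("lag-1 correlation, AR(1) errors: ", round(ar1_corr, 3))
```

The first correlation should be close to zero and the second close to `rho`; errors of the second kind would invalidate the usual OLS standard errors even when the coefficient estimates remain unbiased.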

In [1]:
import pandas as pd
import statsmodels.api as sm
import numpy as np

In [2]:
# Exercise 1
intdef = pd.read_stata("stata/intdef.dta")
intdef["aft_1979"] = (intdef.year > 1979).astype("int32")

y = intdef.i3
X = sm.add_constant(intdef[["inf", "def", "aft_1979"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)

                            OLS Regression Results                            
Dep. Variable:                     i3   R-squared:                       0.664
Model:                            OLS   Adj. R-squared:                  0.644
Method:                 Least Squares   F-statistic:                     34.18
Date:                Mon, 31 Aug 2020   Prob (F-statistic):           2.41e-12
Time:                        01:59:58   Log-Likelihood:                -107.46
No. Observations:                  56   AIC:                             222.9
Df Residuals:                      52   BIC:                             231.0
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.2962      0.425      3.047      0.0

C1. Adding a dummy variable for observations after 1979 produces a large and statistically significant estimate. It implies that after 1979 the interest rate on three-month T-bills was roughly $1 \frac{1}{2}$ percentage points higher, even after accounting for inflation and the deficit. It is also worth noting that the coefficient on inflation is essentially unchanged, while the estimated effect of the deficit is muted once the dummy is included.

In [3]:
# Exercise 2
barium = pd.read_stata("stata/BARIUM.DTA")

y = barium.lchnimp
X = sm.add_constant(barium[["lchempi", "lgas", "lrtwex", "befile6", "affile6", "afdec6", "t"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)

                            OLS Regression Results                            
Dep. Variable:                lchnimp   R-squared:                       0.362
Model:                            OLS   Adj. R-squared:                  0.325
Method:                 Least Squares   F-statistic:                     9.951
Date:                Mon, 31 Aug 2020   Prob (F-statistic):           8.36e-10
Time:                        01:59:58   Log-Likelihood:                -109.21
No. Observations:                 131   AIC:                             234.4
Df Residuals:                     123   BIC:                             257.4
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -2.3675     20.782     -0.114      0.9

In [4]:
model.f_test("lchempi = lgas = lrtwex = befile6 = affile6 = afdec6 = 0")

<class 'statsmodels.stats.contrast.ContrastResults'>
<F test: F=array([[0.54024983]]), p=0.7767277194928033, df_denom=123, df_num=6>

In [5]:
X = sm.add_constant(barium[["lchempi", "lgas", "lrtwex", "befile6", "affile6", "afdec6", "t", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)
print(model.f_test("lchempi = lgas = lrtwex = befile6 = affile6 = afdec6 = 0"))
print(model.f_test("feb = mar = apr = may = jun = jul = aug = sep = oct = nov = dec = 0"))

                            OLS Regression Results                            
Dep. Variable:                lchnimp   R-squared:                       0.411
Model:                            OLS   Adj. R-squared:                  0.316
Method:                 Least Squares   F-statistic:                     4.334
Date:                Mon, 31 Aug 2020   Prob (F-statistic):           6.19e-07
Time:                        01:59:59   Log-Likelihood:                -103.98
No. Observations:                 131   AIC:                             246.0
Df Residuals:                     112   BIC:                             300.6
Df Model:                          18                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         27.3001     31.397      0.870      0.3

C2.i Including the time trend ($t$) removes the statistical significance of all of the other variables.

C2.ii The F-test produces an F-statistic of about 0.54 with a p-value of about 0.78. There is no evidence to suggest these variables are jointly significant.

C2.iii The addition of monthly variables does not produce any statistical significance (except for the linear time trend which was already significant). The coefficients have changed, notably sign changes for gas and the exchange rate, though we do not have evidence that these coefficients are any different from zero. A test for joint significance produces similar results as before with a p-value of about 0.79. The month dummies are also not jointly significant (F-statistic 0.85, p-value 0.59).
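As a rough cross-check on the month-dummy test, the F statistic can be recovered from the two $R^2$ values printed in the summaries, since both regressions share the same dependent variable. The helper name `f_stat_from_r2` is introduced here for illustration, and the inputs are the rounded figures from the output above (0.362 without the dummies, 0.411 with them, 11 restrictions, 112 residual degrees of freedom), so the result only matches to about two decimals.

```python
def f_stat_from_r2(r2_unrestricted, r2_restricted, q, df_resid):
    """R-squared form of the F statistic for q exclusion restrictions.

    Valid only when both models have the same dependent variable.
    """
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1.0 - r2_unrestricted) / df_resid
    return numerator / denominator


# Rounded values read from the two barium regression summaries.
f_months = f_stat_from_r2(0.411, 0.362, q=11, df_resid=112)
print(round(f_months, 2))  # 0.85
```

This agrees with the F-statistic of about 0.85 reported by `model.f_test` for the month dummies.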

In [6]:
# Exercise 3
prminwage = pd.read_stata("stata/PRMINWGE.DTA")

y = prminwage.lprepop
X = sm.add_constant(prminwage[["lmincov", "lusgnp", "t", "lprgnp"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)

                            OLS Regression Results                            
Dep. Variable:                lprepop   R-squared:                       0.889
Model:                            OLS   Adj. R-squared:                  0.876
Method:                 Least Squares   F-statistic:                     66.23
Date:                Mon, 31 Aug 2020   Prob (F-statistic):           2.68e-15
Time:                        01:59:59   Log-Likelihood:                 78.659
No. Observations:                  38   AIC:                            -147.3
Df Residuals:                      33   BIC:                            -139.1
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -6.6634      1.258     -5.298      0.0

C3. The coefficient on the log of Puerto Rico's GNP is large and statistically significant when added to the equation. Because both the dependent variable and this regressor are in logs, the coefficient is an elasticity: the elasticity of the employment/population ratio with respect to Puerto Rico's GNP is 0.29. The magnitude of the minimum wage effect has increased in absolute terms, moving from -0.169 to -0.212.

In [7]:
# Exercise 4
fertil3 = pd.read_stata("stata/FERTIL3.DTA")
# .copy() avoids pandas' SettingWithCopyWarning when new columns are added below
fertil3_clean = fertil3[["gfr", "pe", "pe_1", "pe_2", "ww2", "pill"]].dropna().copy()
fertil3_clean["pe1-pe"] = fertil3_clean.pe_1 - fertil3_clean.pe
fertil3_clean["pe2-pe"] = fertil3_clean.pe_2 - fertil3_clean.pe

y = fertil3_clean.gfr
X = sm.add_constant(fertil3_clean[["pe", "pe1-pe", "pe2-pe", "ww2", "pill"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)

                            OLS Regression Results                            
Dep. Variable:                    gfr   R-squared:                       0.499
Model:                            OLS   Adj. R-squared:                  0.459
Method:                 Least Squares   F-statistic:                     12.73
Date:                Mon, 31 Aug 2020   Prob (F-statistic):           1.35e-08
Time:                        01:59:59   Log-Likelihood:                -282.26
No. Observations:                  70   AIC:                             576.5
Df Residuals:                      64   BIC:                             590.0
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         95.8705      3.282     29.211      0.0

C4. Using the method described in example 10.4 we build the regression above. The coefficient on $pe$ is 0.101, the same as written in the example. The standard error of $pe$ is 0.030, the result we needed to find.
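The reparameterization used here works because regressing on $pe$, $(pe_{-1} - pe)$ and $(pe_{-2} - pe)$ makes the coefficient on $pe$ equal to the sum of the three original lag coefficients, so its standard error is the standard error of the LRP. A minimal sketch on simulated data (the series `x0`, `x1`, `x2` are stand-ins generated here, not the FERTIL3 variables, and the `ols` helper is an assumption of this example):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Simulated stand-ins for pe, pe_1, pe_2 (not the FERTIL3 data)
x0, x1, x2 = rng.normal(size=(3, n))
y = 1.0 + 0.5 * x0 - 0.2 * x1 + 0.3 * x2 + rng.normal(size=n)


def ols(y, *columns):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Original parameterization: the LRP is the sum of the three slopes
b = ols(y, x0, x1, x2)
lrp_from_sum = b[1] + b[2] + b[3]

# Reparameterized model: the LRP is the coefficient on x0 itself
b_reparam = ols(y, x0, x1 - x0, x2 - x0)
lrp_direct = b_reparam[1]

print(lrp_from_sum, lrp_direct)
```

The two printed numbers agree up to floating-point error, which is why the standard error reported for $pe$ in the transformed regression can be read as the standard error of the LRP.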

In [8]:
# Exercise 5
ezanders = pd.read_stata("stata/EZANDERS.DTA")
ezanders_clean = ezanders[["luclms", "ez", "year", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]].dropna()

y = ezanders_clean.luclms
X = sm.add_constant(ezanders_clean[["year", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)
print(model.f_test("feb = mar = apr = may = jun = jul = aug = sep = oct = nov = dec = 0"))

                            OLS Regression Results                            
Dep. Variable:                 luclms   R-squared:                       0.647
Model:                            OLS   Adj. R-squared:                  0.602
Method:                 Least Squares   F-statistic:                     14.36
Date:                Mon, 31 Aug 2020   Prob (F-statistic):           1.73e-16
Time:                        01:59:59   Log-Likelihood:                -45.808
No. Observations:                 107   AIC:                             117.6
Df Residuals:                      94   BIC:                             152.4
Df Model:                          12                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        339.4264     29.662     11.443      0.0

In [9]:
X = sm.add_constant(ezanders_clean[["ez", "year", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)
print("The estimated (percentage) fall after ez designation is: ", 100 * (1 - np.exp(model.params.ez)))

                            OLS Regression Results                            
Dep. Variable:                 luclms   R-squared:                       0.688
Model:                            OLS   Adj. R-squared:                  0.644
Method:                 Least Squares   F-statistic:                     15.76
Date:                Mon, 31 Aug 2020   Prob (F-statistic):           3.08e-18
Time:                        01:59:59   Log-Likelihood:                -39.232
No. Observations:                 107   AIC:                             106.5
Df Residuals:                      93   BIC:                             143.9
Df Model:                          13                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        170.2854     56.022      3.040      0.0

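C5.i The coefficient on the linear time trend implies that unemployment claims fell by about 17% per period. Because the trend variable in this regression is `year` rather than a monthly counter, that is a fall of roughly 17% a year, not 17% a month. The estimate is statistically significant, though it is still a very large number. The F-test on the joint significance of the monthly dummies implies seasonality.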

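C5.ii The coefficient on $ez$ is -0.508. Using the exact formula for a dummy variable in a log-level model, $100 \cdot (1 - e^{\hat{\beta}})$, the estimated decrease in claims is 39.8%.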
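The gap between the log approximation and the exact percentage change is worth seeing for a coefficient this large. The sketch below uses only the value -0.508 quoted in C5.ii; the function name `exact_pct_change` is introduced here for illustration.

```python
import math


def exact_pct_change(log_coef):
    """Exact percentage change in y implied by a dummy's coefficient
    when the dependent variable is log(y)."""
    return 100.0 * (math.exp(log_coef) - 1.0)


ez_coef = -0.508  # coefficient on ez quoted in C5.ii

approx = 100.0 * ez_coef           # naive reading of the log coefficient
exact = exact_pct_change(ez_coef)  # exact change

print(round(approx, 1), round(exact, 1))  # -50.8 -39.8
```

Reading the coefficient directly as a percentage would overstate the fall by about eleven percentage points.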

C5.iii If we want to attribute the fall to the creation of an enterprise zone we would have to assume that time trends and seasonality are the only other factors affecting unemployment.

In [10]:
# Exercise 6
fertil3 = pd.read_stata("stata/FERTIL3.DTA")
# .copy() avoids pandas' SettingWithCopyWarning when the residual column is added below
fertil3_clean = fertil3[["gfr", "t", "tsq", "tcu", "pe", "ww2", "pill"]].dropna().copy()

y = fertil3_clean.gfr
X = sm.add_constant(fertil3_clean[["t", "tsq"]])
model = sm.OLS(y, X).fit()
fertil3_clean["resid"] = model.resid

In [11]:
y = fertil3_clean.resid
X = sm.add_constant(fertil3_clean[["pe", "ww2", "pill", "t", "tsq"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)

                            OLS Regression Results                            
Dep. Variable:                  resid   R-squared:                       0.602
Model:                            OLS   Adj. R-squared:                  0.571
Method:                 Least Squares   F-statistic:                     19.92
Date:                Mon, 31 Aug 2020   Prob (F-statistic):           4.72e-12
Time:                        01:59:59   Log-Likelihood:                -269.95
No. Observations:                  72   AIC:                             551.9
Df Residuals:                      66   BIC:                             565.6
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         17.0357      4.361      3.907      0.0

In [12]:
y = fertil3_clean.gfr
X = sm.add_constant(fertil3_clean[["pe", "ww2", "pill", "t", "tsq", "tcu"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)

                            OLS Regression Results                            
Dep. Variable:                    gfr   R-squared:                       0.840
Model:                            OLS   Adj. R-squared:                  0.826
Method:                 Least Squares   F-statistic:                     57.07
Date:                Mon, 31 Aug 2020   Prob (F-statistic):           5.11e-24
Time:                        01:59:59   Log-Likelihood:                -250.57
No. Observations:                  72   AIC:                             515.1
Df Residuals:                      65   BIC:                             531.1
Df Model:                           6                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        142.7955      4.338     32.919      0.0

C6.ii The $R^2$ of this regression is 0.602 compared to the 0.727 from the original regression. The detrended regression explains quite a bit of the variation, but less than the original.
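The detrending exercise rests on a general property: regressing the detrended dependent variable on the regressors plus the trend terms gives the same slope estimates as the original regression, while the $R^2$ can only fall because the trend no longer contributes to the explained variation. A sketch on simulated data, where every series and coefficient is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 72
t = np.arange(1, n + 1, dtype=float)
x = 0.05 * t + rng.normal(size=n)  # simulated trending regressor
y = 2.0 + 0.7 * x + 0.1 * t - 0.002 * t**2 + rng.normal(size=n)


def ols(y, X):
    """OLS coefficients via least squares (X already holds a constant)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def r_squared(y, X):
    """Centered R-squared of the regression of y on X."""
    resid = y - X @ ols(y, X)
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


trend = np.column_stack([np.ones(n), t, t**2])
full = np.column_stack([np.ones(n), x, t, t**2])

# Step 1: remove the quadratic trend from y
y_detrended = y - trend @ ols(y, trend)

# Step 2: compare the slope on x and the R-squared across the two regressions
slope_original = ols(y, full)[1]
slope_detrended = ols(y_detrended, full)[1]
r2_original = r_squared(y, full)
r2_detrended = r_squared(y_detrended, full)

print(slope_original, slope_detrended)
print(r2_original, r2_detrended)
```

The slopes coincide and the detrended $R^2$ is the smaller of the two, which mirrors the 0.602 versus 0.727 comparison above.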

C6.iii The coefficient for the cubed term is small (negative) but statistically significant.

In [13]:
# Exercise 7
consump = pd.read_stata("stata/consump.dta")
consump_clean = consump[["gc", "gy"]].dropna()

y = consump_clean.gc
X = sm.add_constant(consump_clean[["gy"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)

                            OLS Regression Results                            
Dep. Variable:                     gc   R-squared:                       0.679
Model:                            OLS   Adj. R-squared:                  0.669
Method:                 Least Squares   F-statistic:                     71.81
Date:                Mon, 31 Aug 2020   Prob (F-statistic):           6.75e-10
Time:                        02:00:00   Log-Likelihood:                 127.22
No. Observations:                  36   AIC:                            -250.4
Df Residuals:                      34   BIC:                            -247.3
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0081      0.002      4.254      0.0

In [14]:
consump_clean = consump[["gc", "gy", "gy_1", "r3"]].dropna()

y = consump_clean.gc
X = sm.add_constant(consump_clean[["gy", "gy_1"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)

                            OLS Regression Results                            
Dep. Variable:                     gc   R-squared:                       0.695
Model:                            OLS   Adj. R-squared:                  0.676
Method:                 Least Squares   F-statistic:                     36.51
Date:                Mon, 31 Aug 2020   Prob (F-statistic):           5.52e-09
Time:                        02:00:00   Log-Likelihood:                 124.26
No. Observations:                  35   AIC:                            -242.5
Df Residuals:                      32   BIC:                            -237.9
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0064      0.002      2.811      0.0

In [15]:
y = consump_clean.gc
X = sm.add_constant(consump_clean[["gy", "gy_1", "r3"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)

                            OLS Regression Results                            
Dep. Variable:                     gc   R-squared:                       0.696
Model:                            OLS   Adj. R-squared:                  0.666
Method:                 Least Squares   F-statistic:                     23.62
Date:                Mon, 31 Aug 2020   Prob (F-statistic):           3.77e-08
Time:                        02:00:00   Log-Likelihood:                 124.28
No. Observations:                  35   AIC:                            -240.6
Df Residuals:                      31   BIC:                            -234.3
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0065      0.002      2.740      0.0

C7.i The simple regression implies that a one percentage point increase in income growth is associated with a 0.57 percentage point increase in consumption growth. The estimate is statistically significant.

C7.ii Adding the lagged term does not produce an estimate statistically different from 0. We don't have evidence that past changes in income affect present day consumption.

C7.iii As in (ii), the added variable is not statistically significant, and its coefficient is also very small in practical terms. We have little evidence that past changes in income affect present-day consumption, and even less that the real interest rate does.

In [16]:
# Exercise 8
fertil3 = pd.read_stata("stata/FERTIL3.DTA")
# .copy() avoids pandas' SettingWithCopyWarning when new columns are added in later cells
fertil3_clean = fertil3[["gfr", "pe", "pe_1", "pe_2", "pe_3", "pe_4", "ww2", "pill"]].dropna().copy()

y = fertil3_clean.gfr
X = sm.add_constant(fertil3_clean[["pe", "pe_1", "pe_2", "pe_3", "pe_4", "ww2", "pill"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)

                            OLS Regression Results                            
Dep. Variable:                    gfr   R-squared:                       0.537
Model:                            OLS   Adj. R-squared:                  0.483
Method:                 Least Squares   F-statistic:                     9.934
Date:                Mon, 31 Aug 2020   Prob (F-statistic):           3.63e-08
Time:                        02:00:00   Log-Likelihood:                -270.07
No. Observations:                  68   AIC:                             556.1
Df Residuals:                      60   BIC:                             573.9
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         92.5016      3.325     27.816      0.0

In [17]:
model.f_test("pe_3 = pe_4 = 0")

<class 'statsmodels.stats.contrast.ContrastResults'>
<F test: F=array([[0.06221168]]), p=0.9397444862854532, df_denom=60, df_num=2>

In [18]:
fertil3_clean["pe1-pe"] = fertil3_clean.pe_1 - fertil3_clean.pe
fertil3_clean["pe2-pe"] = fertil3_clean.pe_2 - fertil3_clean.pe
fertil3_clean["pe3-pe"] = fertil3_clean.pe_3 - fertil3_clean.pe
fertil3_clean["pe4-pe"] = fertil3_clean.pe_4 - fertil3_clean.pe

X = sm.add_constant(fertil3_clean[["pe", "pe1-pe", "pe2-pe", "pe3-pe", "pe4-pe", "ww2", "pill"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)

                            OLS Regression Results                            
Dep. Variable:                    gfr   R-squared:                       0.537
Model:                            OLS   Adj. R-squared:                  0.483
Method:                 Least Squares   F-statistic:                     9.934
Date:                Mon, 31 Aug 2020   Prob (F-statistic):           3.63e-08
Time:                        02:00:00   Log-Likelihood:                -270.07
No. Observations:                  68   AIC:                             556.1
Df Residuals:                      60   BIC:                             573.9
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         92.5016      3.325     27.816      0.0

In [19]:
fertil3_clean["z0"] = fertil3_clean.pe + fertil3_clean.pe_1 + fertil3_clean.pe_2 + fertil3_clean.pe_3 + fertil3_clean.pe_4
fertil3_clean["z1"] = fertil3_clean.pe_1 + (2 * fertil3_clean.pe_2) + (3 * fertil3_clean.pe_3) + (4 * fertil3_clean.pe_4)
fertil3_clean["z2"] = fertil3_clean.pe_1 + (4 * fertil3_clean.pe_2) + (9 * fertil3_clean.pe_3) + (16 * fertil3_clean.pe_4)

X = sm.add_constant(fertil3_clean[["z0", "z1", "z2", "ww2", "pill"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)

                            OLS Regression Results                            
Dep. Variable:                    gfr   R-squared:                       0.536
Model:                            OLS   Adj. R-squared:                  0.499
Method:                 Least Squares   F-statistic:                     14.35
Date:                Mon, 31 Aug 2020   Prob (F-statistic):           2.52e-09
Time:                        02:00:00   Log-Likelihood:                -270.10
No. Observations:                  68   AIC:                             552.2
Df Residuals:                      62   BIC:                             565.5
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         92.5606      3.264     28.360      0.0

In [20]:
d0 = model.params["z0"]
d1 = model.params["z0"] + model.params["z1"] + model.params["z2"]
d2 = model.params["z0"] + (2 * model.params["z1"]) + (4 * model.params["z2"])
d3 = model.params["z0"] + (3 * model.params["z1"]) + (9 * model.params["z2"])
d4 = model.params["z0"] + (4 * model.params["z1"]) + (16 * model.params["z2"])

LRP = sum([d0, d1, d2, d3, d4])
print("Estimated LRP is", LRP)

Estimated LRP is 0.12369571208970734


C8.i The F-test has a p-value of about 0.94, so we cannot reject the null hypothesis that the two additional lags are jointly zero.

C8.ii From the regression we get an estimate of 0.1242 with a standard error of 0.03. This is higher than 0.101 from equation 10.19.

C8.iii The estimated LRP from the polynomial distributed lag model is 0.1237, which is only 0.
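The polynomial distributed lag restricts each lag coefficient to lie on a quadratic in the lag index, $\delta_j = \gamma_0 + \gamma_1 j + \gamma_2 j^2$, which is what the `d0` through `d4` lines in the cell above compute. With four lags the LRP therefore collapses to $5\gamma_0 + 10\gamma_1 + 30\gamma_2$. A small sketch of that mapping; the function names and the gamma values are hypothetical and are not the estimates from the regression:

```python
def pdl_lag_coefficients(g0, g1, g2, n_lags=4):
    """Implied lag coefficients delta_j = g0 + g1*j + g2*j**2, j = 0..n_lags."""
    return [g0 + g1 * j + g2 * j**2 for j in range(n_lags + 1)]


def pdl_lrp(g0, g1, g2, n_lags=4):
    """Long-run propensity: the sum of the implied lag coefficients."""
    return sum(pdl_lag_coefficients(g0, g1, g2, n_lags))


# Hypothetical gammas chosen for illustration only
g0, g1, g2 = 0.07, -0.06, 0.015

deltas = pdl_lag_coefficients(g0, g1, g2)
lrp = pdl_lrp(g0, g1, g2)

# Closed form for four lags: sum of j is 10, sum of j**2 is 30
lrp_closed_form = 5 * g0 + 10 * g1 + 30 * g2

print(deltas)
print(lrp, lrp_closed_form)
```

Passing the fitted `model.params["z0"]`, `["z1"]` and `["z2"]` in place of the hypothetical gammas would reproduce the LRP printed by the notebook.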

In [21]:
# Exercise 9
volat = pd.read_stata("stata/VOLAT.DTA")
volat_clean = volat[["rsp500", "pcip", "i3"]].dropna()

y = volat_clean.rsp500
X = sm.add_constant(volat_clean[["pcip", "i3"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)

                            OLS Regression Results                            
Dep. Variable:                 rsp500   R-squared:                       0.012
Model:                            OLS   Adj. R-squared:                  0.008
Method:                 Least Squares   F-statistic:                     3.334
Date:                Mon, 31 Aug 2020   Prob (F-statistic):             0.0364
Time:                        02:00:00   Log-Likelihood:                -2845.4
No. Observations:                 557   AIC:                             5697.
Df Residuals:                     554   BIC:                             5710.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         18.8431      3.275      5.754      0.0

C9.i We'd expect $\beta_1$ to be greater than 0 since more production should mean greater returns (profits, dividends), while $\beta_2$ should be negative since alternative investments like bonds become relatively more attractive.

C9.ii The coefficient on $pcip$ is small: it implies that each percentage point increase in industrial production growth raises stock market returns by 0.036 percentage points. The coefficient on $i3$ says that a one percentage point increase in the interest rate lowers returns by 1.36 percentage points.

C9.iii Only $i3$ is statistically significant.

C9.iv This doesn't really imply the S&P 500 is predictable. The variables are contemporaneous and so predicting the S&P 500 requires us to predict interest rates, which isn't any easier.

In [22]:
# Exercise 10
intdef = pd.read_stata("stata/intdef.dta")

intdef["inf"].corr(intdef["def"])

0.0974705908843133

In [23]:
intdef_clean = intdef[["i3", "inf", "def", "inf_1", "def_1"]].dropna()

y = intdef_clean.i3
X = sm.add_constant(intdef_clean[["inf", "def", "inf_1", "def_1"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)
print("The estimated LRP is:", model.params["inf"] + model.params["inf_1"])
print(model.f_test("inf_1 = def_1 = 0"))

                            OLS Regression Results                            
Dep. Variable:                     i3   R-squared:                       0.685
Model:                            OLS   Adj. R-squared:                  0.660
Method:                 Least Squares   F-statistic:                     27.18
Date:                Mon, 31 Aug 2020   Prob (F-statistic):           5.19e-12
Time:                        02:00:00   Log-Likelihood:                -103.28
No. Observations:                  55   AIC:                             216.6
Df Residuals:                      50   BIC:                             226.6
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.6115      0.401      4.021      0.0

C10.i The correlation between $inf$ and $def$ is 0.098, suggesting a weak -- if any -- correlation between inflation and the deficit rate (surprising).

C10.ii The results are reported above.

C10.iii The estimated LRP is 0.725 compared to the 0.606 from 10.15. Higher, but not remarkably so.

C10.iv The two lag terms are jointly significant at the 1% level (let alone the 5% level).

In [24]:
# Exercise 11
traffic2 = pd.read_stata("stata/TRAFFIC2.DTA")
traffic2.loc[traffic2.beltlaw == 1, ["year", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]].head()

Unnamed: 0,year,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec
60,1986,0,0,0,0,0,0,0,0,0,0,0
61,1986,1,0,0,0,0,0,0,0,0,0,0
62,1986,0,1,0,0,0,0,0,0,0,0,0
63,1986,0,0,1,0,0,0,0,0,0,0,0
64,1986,0,0,0,1,0,0,0,0,0,0,0


In [25]:
traffic2.loc[traffic2.spdlaw == 1, ["year", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]].head()

Unnamed: 0,year,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec
76,1987,0,0,0,1,0,0,0,0,0,0,0
77,1987,0,0,0,0,1,0,0,0,0,0,0
78,1987,0,0,0,0,0,1,0,0,0,0,0
79,1987,0,0,0,0,0,0,1,0,0,0,0
80,1987,0,0,0,0,0,0,0,1,0,0,0


In [26]:
traffic2_clean = traffic2[["ltotacc", "t", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]].dropna()

y = traffic2_clean.ltotacc
X = sm.add_constant(traffic2_clean[["t", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)
print(model.f_test("feb = mar = apr = may = jun = jul = aug = sep = oct = nov = dec = 0"))

                            OLS Regression Results                            
Dep. Variable:                ltotacc   R-squared:                       0.810
Model:                            OLS   Adj. R-squared:                  0.783
Method:                 Least Squares   F-statistic:                     30.83
Date:                Mon, 31 Aug 2020   Prob (F-statistic):           2.99e-26
Time:                        02:00:01   Log-Likelihood:                 163.53
No. Observations:                 100   AIC:                            -301.1
Df Residuals:                      87   BIC:                            -267.2
Df Model:                          12                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         10.4555      0.019    552.549      0.0

In [27]:
traffic2_clean = traffic2[["ltotacc", "wkends", "unem", "spdlaw", "beltlaw", "t", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]].dropna()


y = traffic2_clean.ltotacc
X = sm.add_constant(traffic2_clean[["wkends", "unem", "spdlaw", "beltlaw", "t", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)

                            OLS Regression Results                            
Dep. Variable:                ltotacc   R-squared:                       0.917
Model:                            OLS   Adj. R-squared:                  0.901
Method:                 Least Squares   F-statistic:                     57.26
Date:                Mon, 31 Aug 2020   Prob (F-statistic):           6.30e-38
Time:                        02:00:01   Log-Likelihood:                 205.00
No. Observations:                 100   AIC:                            -376.0
Df Residuals:                      83   BIC:                            -331.7
Df Model:                          16                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         10.6186      0.065    164.351      0.0

In [28]:
traffic2.prcfat.mean()

0.8856360912322998

In [29]:
traffic2_clean = traffic2[["prcfat", "wkends", "unem", "spdlaw", "beltlaw", "t", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]].dropna()


y = traffic2_clean.prcfat
X = sm.add_constant(traffic2_clean[["wkends", "unem", "spdlaw", "beltlaw", "t", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)

                            OLS Regression Results                            
Dep. Variable:                 prcfat   R-squared:                       0.732
Model:                            OLS   Adj. R-squared:                  0.680
Method:                 Least Squares   F-statistic:                     14.14
Date:                Mon, 31 Aug 2020   Prob (F-statistic):           1.88e-17
Time:                        02:00:01   Log-Likelihood:                 153.25
No. Observations:                 100   AIC:                            -272.5
Df Residuals:                      83   BIC:                            -228.2
Df Model:                          16                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.0477      0.108      9.665      0.0

C11.i California's seat belt law took effect Jan 1986. The highway speed limit changed May 1987.

C11.ii The coefficient on t implies that, holding the month fixed, total accidents grow by about 0.3% each month. We also have evidence of seasonality: the month dummies are jointly significant at the 1% level.
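Because the dependent variable is in logs, the coefficient on t is only approximately a percentage growth rate. A minimal sketch of the exact conversion, assuming a trend coefficient of 0.003 (an illustrative value chosen to match the "about 0.3%" above, not the fitted estimate):

```python
import math

# Illustrative trend coefficient from a log-linear model; an assumed value,
# not the estimate from the regression above.
beta_t = 0.003

# Exact percentage change in the level of y for a one-month step in t.
monthly_pct = 100 * (math.exp(beta_t) - 1)

# Compounding over twelve months gives the implied annual growth rate.
annual_pct = 100 * (math.exp(12 * beta_t) - 1)

print(f"monthly growth: {monthly_pct:.3f}%")
print(f"annual growth:  {annual_pct:.3f}%")
```

For a coefficient this small the approximation and the exact figure are nearly identical; the gap only matters for larger coefficients or longer horizons.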

C11.iii The coefficient on unemployment is negative and statistically significant. This doesn't seem out of place, since higher unemployment is likely to mean less driving.

C11.iv Both coefficients have the opposite sign from what we'd expect: the higher speed limit is associated with fewer accidents, while the seat belt law is associated with more. If this can't be attributed to some other concurrent change, one possible explanation is that mandatory safety measures lead people to drive more carelessly, while higher speeds force drivers to pay closer attention (though this isn't the most convincing account).

C11.v The average of $prcfat$ is 0.89, so on average fewer than 1% of traffic accidents involve a fatality. I don't have a strong prior on what this should be, but given that traffic accidents cover a wide range of scenarios, a low number makes sense.

C11.vi The coefficients on the two laws make more sense now (the higher speed limit raises the fatality rate and the seat belt law lowers it), though the seat belt law is not statistically significant.

In [30]:
# Exercise 12
phillips = pd.read_stata("stata/phillips.dta")

y = phillips.inf
X = sm.add_constant(phillips[["unem"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)

                            OLS Regression Results                            
Dep. Variable:                    inf   R-squared:                       0.062
Model:                            OLS   Adj. R-squared:                  0.045
Method:                 Least Squares   F-statistic:                     3.579
Date:                Mon, 31 Aug 2020   Prob (F-statistic):             0.0639
Time:                        02:00:01   Log-Likelihood:                -139.43
No. Observations:                  56   AIC:                             282.9
Df Residuals:                      54   BIC:                             286.9
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.0536      1.548      0.681      0.4

In [31]:
y = phillips[phillips.year > 1996].inf
X = sm.add_constant(phillips.loc[phillips.year > 1996, ["unem"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)

part_1 = model.params["unem"]

                            OLS Regression Results                            
Dep. Variable:                    inf   R-squared:                       0.204
Model:                            OLS   Adj. R-squared:                  0.044
Method:                 Least Squares   F-statistic:                     1.278
Date:                Mon, 31 Aug 2020   Prob (F-statistic):              0.310
Time:                        02:00:01   Log-Likelihood:                -5.4596
No. Observations:                   7   AIC:                             14.92
Df Residuals:                       5   BIC:                             14.81
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          4.1609      1.651      2.521      0.0

  warn("omni_normtest is not valid with less than 8 observations; %i "


In [32]:
y = phillips[phillips.year < 1997].inf
X = sm.add_constant(phillips.loc[phillips.year < 1997, ["unem"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)

part_2 = model.params["unem"]
print("The weighted average is: ", (7 / 56) * part_1 + (49 / 56) * part_2)

                            OLS Regression Results                            
Dep. Variable:                    inf   R-squared:                       0.053
Model:                            OLS   Adj. R-squared:                  0.033
Method:                 Least Squares   F-statistic:                     2.616
Date:                Mon, 31 Aug 2020   Prob (F-statistic):              0.112
Time:                        02:00:01   Log-Likelihood:                -124.43
No. Observations:                  49   AIC:                             252.9
Df Residuals:                      47   BIC:                             256.6
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.4236      1.719      0.828      0.4

C12.i There are 56 years of observations in the data set.

C12.ii The coefficient is larger. The extra years don't seem to give us any more evidence for a relationship between unemployment and inflation.

C12.iii Using only the years after 1996 (7 observations), we see a tradeoff between unemployment and inflation, but the estimate is imprecise (standard error of 0.334, p-value of 0.310).

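C12.iv Based on the results above, we have no reason to believe that the full-sample estimate equals a weighted average of the subsample estimates. In general, the pooled OLS slope also depends on how much unem varies within each subsample and on the difference in means between subsamples, not just on the sample sizes.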
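To see why a sample-size-weighted average of subsample slopes need not reproduce the full-sample slope, here is a minimal sketch on made-up data (the numbers and the helper `ols_slope` are hypothetical, not taken from `phillips`):

```python
def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Made-up data: two subsamples with different levels of x and opposite slopes.
x_a, y_a = [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]              # within-sample slope +1
x_b, y_b = [6.0, 7.0, 8.0, 9.0], [2.0, 1.0, 0.0, -1.0]   # within-sample slope -1

slope_a = ols_slope(x_a, y_a)
slope_b = ols_slope(x_b, y_b)
slope_full = ols_slope(x_a + x_b, y_a + y_b)

# Weight each subsample slope by its share of the observations.
n_a, n_b = len(x_a), len(x_b)
weighted = (n_a * slope_a + n_b * slope_b) / (n_a + n_b)

print(f"subsample slopes: {slope_a:.3f}, {slope_b:.3f}")
print(f"weighted average: {weighted:.3f}")
print(f"full-sample slope: {slope_full:.3f}")
```

The pooled slope picks up the gap between the two subsample means as well as the within-sample variation, so it differs from the sample-size weighting of the two slopes.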

In [33]:
# Exercise 13
minwage = pd.read_stata("stata/minwage.dta")
minwage_clean = minwage[["gwage232", "gmwage", "gcpi"]].dropna()

y = minwage_clean.gwage232
X = sm.add_constant(minwage_clean[["gmwage", "gcpi"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)

                            OLS Regression Results                            
Dep. Variable:               gwage232   R-squared:                       0.293
Model:                            OLS   Adj. R-squared:                  0.290
Method:                 Least Squares   F-statistic:                     125.8
Date:                Mon, 31 Aug 2020   Prob (F-statistic):           1.91e-46
Time:                        02:00:01   Log-Likelihood:                 2091.6
No. Observations:                 611   AIC:                            -4177.
Df Residuals:                     608   BIC:                            -4164.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0022      0.000      5.185      0.0

In [34]:
minwage_clean = minwage[["gwage232", "gmwage", "gmwage_1", "gmwage_2", "gmwage_3", "gmwage_4", "gmwage_5", "gmwage_6", "gmwage_7", "gmwage_8", "gmwage_9", "gmwage_10", "gmwage_11", "gmwage_12", "gcpi"]].dropna()

y = minwage_clean.gwage232
X = sm.add_constant(minwage_clean[["gmwage", "gmwage_1", "gmwage_2", "gmwage_3", "gmwage_4", "gmwage_5", "gmwage_6", "gmwage_7", "gmwage_8", "gmwage_9", "gmwage_10", "gmwage_11", "gmwage_12", "gcpi"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)
print(model.f_test("gmwage_1 = gmwage_2 = gmwage_3 = gmwage_4 = gmwage_5 = gmwage_6 = gmwage_7 = gmwage_8 = gmwage_9 = gmwage_10 = gmwage_11 = gmwage_12 = 0"))
print("The sum of all gmwage coefficients is: ", model.params.iloc[1:-1].sum())

                            OLS Regression Results                            
Dep. Variable:               gwage232   R-squared:                       0.326
Model:                            OLS   Adj. R-squared:                  0.310
Method:                 Least Squares   F-statistic:                     20.18
Date:                Mon, 31 Aug 2020   Prob (F-statistic):           1.04e-41
Time:                        02:00:01   Log-Likelihood:                 2069.8
No. Observations:                 599   AIC:                            -4110.
Df Residuals:                     584   BIC:                            -4044.
Df Model:                          14                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0021      0.000      4.901      0.0

In [35]:
minwage_clean = minwage[["gemp232", "gmwage", "gcpi"]].dropna()

y = minwage_clean.gemp232
X = sm.add_constant(minwage_clean[["gmwage", "gcpi"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)

                            OLS Regression Results                            
Dep. Variable:                gemp232   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                 -0.003
Method:                 Least Squares   F-statistic:                  0.003774
Date:                Mon, 31 Aug 2020   Prob (F-statistic):              0.996
Time:                        02:00:01   Log-Likelihood:                 1567.0
No. Observations:                 611   AIC:                            -3128.
Df Residuals:                     608   BIC:                            -3115.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0004      0.001     -0.446      0.6

In [36]:
minwage_clean = minwage[["gemp232", "gmwage", "gmwage_1", "gmwage_2", "gmwage_3", "gmwage_4", "gmwage_5", "gmwage_6", "gmwage_7", "gmwage_8", "gmwage_9", "gmwage_10", "gmwage_11", "gmwage_12", "gcpi"]].dropna()

y = minwage_clean.gemp232
X = sm.add_constant(minwage_clean[["gmwage", "gmwage_1", "gmwage_2", "gmwage_3", "gmwage_4", "gmwage_5", "gmwage_6", "gmwage_7", "gmwage_8", "gmwage_9", "gmwage_10", "gmwage_11", "gmwage_12", "gcpi"]])
model = sm.OLS(y, X).fit()
model_summary = model.summary()
print(model_summary)
print(model.f_test("gmwage = gmwage_1 = gmwage_2 = gmwage_3 = gmwage_4 = gmwage_5 = gmwage_6 = gmwage_7 = gmwage_8 = gmwage_9 = gmwage_10 = gmwage_11 = gmwage_12 = 0"))
print("The sum of all gmwage coefficients is: ", model.params.iloc[1:-1].sum())

                            OLS Regression Results                            
Dep. Variable:                gemp232   R-squared:                       0.022
Model:                            OLS   Adj. R-squared:                 -0.001
Method:                 Least Squares   F-statistic:                    0.9448
Date:                Mon, 31 Aug 2020   Prob (F-statistic):              0.510
Time:                        02:00:01   Log-Likelihood:                 1545.2
No. Observations:                 599   AIC:                            -3060.
Df Residuals:                     584   BIC:                            -2995.
Df Model:                          14                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0002      0.001     -0.196      0.8

C13.i The coefficient on gmwage implies that a one percentage point increase in minimum wage growth is associated with a 0.151 percentage point increase in monthly wage growth in sector 232.

C13.ii The individual significance of the lagged terms is inconsistent at best, and their magnitudes are small. A test of their joint significance fails to reject the null at the 5% level (but is significant at the 10% level). The total effect (the long-run propensity, i.e. the sum of the coefficients on gmwage and its lags) is about 0.04 higher than the contemporaneous effect and marginally significant.
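Summing the coefficients gives a point estimate of the long-run propensity but no standard error. A minimal sketch of how both can be obtained from the coefficient covariance matrix, using simulated data with two lags (the series, the "true" coefficients, and all variable names here are assumptions for illustration, not the `minwage` data):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_lags = 400, 2

# Simulated regressor; column j of `lags` holds x lagged j periods.
x = rng.normal(size=n + n_lags)
lags = np.column_stack([x[n_lags - j : n_lags - j + n] for j in range(n_lags + 1)])

# Arbitrary "true" distributed-lag coefficients, summing to 0.20.
true_delta = np.array([0.15, 0.03, 0.02])
y = 0.5 + lags @ true_delta + rng.normal(scale=0.5, size=n)

# OLS on [const, x_t, x_{t-1}, x_{t-2}] with the usual (nonrobust) covariance.
X = np.column_stack([np.ones(n), lags])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)

# The LRP is r'beta, where r selects every lag coefficient but not the constant.
r = np.array([0.0, 1.0, 1.0, 1.0])
lrp = r @ beta
lrp_se = np.sqrt(r @ cov @ r)
print(f"LRP = {lrp:.3f}, se = {lrp_se:.3f}, t = {lrp / lrp_se:.2f}")
```

With a fitted statsmodels result, passing a constraint that sums the same coefficients to `model.t_test` should report the same estimate and standard error.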

C13.iii The coefficients and t-statistics are small and the $R^2$ rounds to 0. We have no evidence to suggest minimum wage growth has a contemporaneous effect on $gemp232$.

C13.iv The F-test for joint significance has a p-value of 0.44. The lag coefficients are larger than the coefficient on $gmwage$ itself, but only one is individually significant. We can't say there is a statistically significant relationship between the minimum wage and employment growth in the short or long run for sector 232.