# Chapter 15

In [1]:
import pandas as pd
import statsmodels.api as sm
import numpy as np
from linearmodels.iv import IV2SLS

In [2]:
# Exercise 1
wage2 = pd.read_stata("./stata/WAGE2.DTA")
X = sm.add_constant(wage2[["sibs"]])
model = sm.OLS(wage2.lwage, X, missing="drop").fit()
model.summary()


0,1,2,3
Dep. Variable:,lwage,R-squared:,0.023
Model:,OLS,Adj. R-squared:,0.022
Method:,Least Squares,F-statistic:,22.31
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,2.68e-06
Time:,05:22:14,Log-Likelihood:,-506.59
No. Observations:,935,AIC:,1017.0
Df Residuals:,933,BIC:,1027.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,6.8611,0.022,310.771,0.000,6.818,6.904
sibs,-0.0279,0.006,-4.723,0.000,-0.039,-0.016

0,1,2,3
Omnibus:,22.562,Durbin-Watson:,1.726
Prob(Omnibus):,0.0,Jarque-Bera (JB):,26.529
Skew:,-0.306,Prob(JB):,1.74e-06
Kurtosis:,3.553,Cond. No.,6.33


In [3]:
X = sm.add_constant(wage2[["brthord"]])
model = sm.OLS(wage2.educ, X, missing="drop").fit()
model.summary()


0,1,2,3
Dep. Variable:,educ,R-squared:,0.042
Model:,OLS,Adj. R-squared:,0.041
Method:,Least Squares,F-statistic:,37.29
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,1.55e-09
Time:,05:22:14,Log-Likelihood:,-1861.9
No. Observations:,852,AIC:,3728.0
Df Residuals:,850,BIC:,3737.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,14.1494,0.129,109.962,0.000,13.897,14.402
brthord,-0.2826,0.046,-6.106,0.000,-0.373,-0.192

0,1,2,3
Omnibus:,77.09,Durbin-Watson:,1.784
Prob(Omnibus):,0.0,Jarque-Bera (JB):,53.992
Skew:,0.506,Prob(JB):,1.89e-12
Kurtosis:,2.294,Cond. No.,5.28


In [4]:
IV2SLS(wage2.lwage, np.ones((wage2.shape[0], 1)), wage2.educ, wage2.brthord).fit(cov_type="unadjusted")

Inputs contain missing values. Dropping rows with missing observations.


0,1,2,3
Dep. Variable:,lwage,R-squared:,-0.0286
Estimator:,IV-2SLS,Adj. R-squared:,-0.0298
No. Observations:,852,F-statistic:,16.667
Date:,"Sun, Mar 27 2022",P-value (F-stat),0.0000
Time:,05:22:14,Distribution:,chi2(1)
Cov. Estimator:,unadjusted,,
,,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
exog,5.0304,0.4324,11.633,0.0000,4.1828,5.8780
educ,0.1306,0.0320,4.0825,0.0000,0.0679,0.1934


In [5]:
X = sm.add_constant(wage2[["sibs", "brthord"]])
model = sm.OLS(wage2.educ, X, missing="drop").fit()
# We need the fitted values for part vi (first stage results)
educ_hat = model.fittedvalues
model.summary()


0,1,2,3
Dep. Variable:,educ,R-squared:,0.058
Model:,OLS,Adj. R-squared:,0.056
Method:,Least Squares,F-statistic:,26.29
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,8.33e-12
Time:,05:22:14,Log-Likelihood:,-1854.6
No. Observations:,852,AIC:,3715.0
Df Residuals:,849,BIC:,3729.0
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,14.2965,0.133,107.260,0.000,14.035,14.558
sibs,-0.1529,0.040,-3.834,0.000,-0.231,-0.075
brthord,-0.1527,0.057,-2.675,0.008,-0.265,-0.041

0,1,2,3
Omnibus:,69.406,Durbin-Watson:,1.795
Prob(Omnibus):,0.0,Jarque-Bera (JB):,51.104
Skew:,0.497,Prob(JB):,8e-12
Kurtosis:,2.329,Cond. No.,8.46


In [6]:
X = sm.add_constant(wage2[["sibs"]])
IV2SLS(wage2.lwage, X, wage2.educ, wage2.brthord).fit(cov_type="unadjusted")

Inputs contain missing values. Dropping rows with missing observations.


0,1,2,3
Dep. Variable:,lwage,R-squared:,-0.0543
Estimator:,IV-2SLS,Adj. R-squared:,-0.0568
No. Observations:,852,F-statistic:,21.871
Date:,"Sun, Mar 27 2022",P-value (F-stat),0.0000
Time:,05:22:14,Distribution:,chi2(2)
Cov. Estimator:,unadjusted,,
,,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
const,4.9385,1.0538,4.6863,0.0000,2.8731,7.0040
sibs,0.0021,0.0173,0.1217,0.9031,-0.0319,0.0361
educ,0.1370,0.0745,1.8376,0.0661,-0.0091,0.2831


In [7]:
educ_hat.corr(wage2.sibs)

-0.929481763991843

C1.i When sibs is plugged in directly in place of educ, its coefficient is negative (-0.0279), which is enough to show that this is not the same as the IV estimate (0.122, from the text). Specifically, this regression says that, without controlling for other factors, an additional sibling is associated with wages that are a little under 3% lower. We shouldn't place too much emphasis on this, as sibs is almost certainly correlated with other factors.

C1.ii One intuitive reason why education might be negatively associated with birth order is simply the matter of resource constraints (older children may have money set aside for education while younger children may not). The regression shows a statistically significant negative relationship between education and birth order, with each new position in the rank corresponding to a little over a quarter of a year less in education.

C1.iii Results reported above. When using birth order as an instrument for education, the coefficient on education is 0.1306 (standard error 0.0320), larger than the 0.122 IV estimate from the text.

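C1.iv We are assuming that $\pi_2 \neq 0$ (that is, the coefficient on brthord in the reduced form for educ is not zero). The regression of educ on sibs and brthord gives t = -2.675 (p = 0.008) for brthord, which allows us to reject the null hypothesis that this coefficient is zero, and so the identification assumption holds.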

C1.v Results above. The standard error on educ is considerably higher than in the previous IV estimate (0.0745 vs. 0.0320), enough for the coefficient to lose significance at the 5% level (p = 0.066). The coefficient on sibs has a small standard error but is itself tiny (0.0021), so we have no evidence that it differs from zero.

C1.vi There is a strong negative correlation between the fitted values of educ and sibs (-0.93). Recalling from section 3 that multicollinearity is a problem for 2SLS, we have an explanation for our large standard errors.
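The link between that correlation and the larger standard error can be sketched with the usual variance inflation factor. This is only a back-of-the-envelope illustration, not part of the original solution: it assumes the -0.93 correlation is the only source of collinearity and ignores any change in the residual variance between the two IV regressions.

```python
import math

# Correlation between the first-stage fitted values and sibs, from the cell above.
corr_educhat_sibs = -0.929481763991843

# With a single other regressor, the R-squared from regressing educ_hat on sibs
# is just the squared correlation.
r_squared = corr_educhat_sibs ** 2

# Variance inflation factor and the implied inflation of the standard error.
vif = 1.0 / (1.0 - r_squared)
se_inflation = math.sqrt(vif)

print(f"VIF: {vif:.2f}, standard error inflated by a factor of about {se_inflation:.2f}")
```

The observed ratio of standard errors, 0.0745 / 0.0320 (about 2.3), is of the same order as the factor this sketch produces.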

In [8]:
# Exercise 2
fertil2 = pd.read_stata("./stata/FERTIL2.DTA")
X = sm.add_constant(fertil2[["educ", "age", "agesq"]])
model = sm.OLS(fertil2.children, X, missing="drop").fit()
model.summary()

0,1,2,3
Dep. Variable:,children,R-squared:,0.569
Model:,OLS,Adj. R-squared:,0.568
Method:,Least Squares,F-statistic:,1915.0
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,0.0
Time:,05:22:15,Log-Likelihood:,-7835.6
No. Observations:,4361,AIC:,15680.0
Df Residuals:,4357,BIC:,15700.0
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,-4.1383,0.241,-17.200,0.000,-4.610,-3.667
educ,-0.0906,0.006,-15.298,0.000,-0.102,-0.079
age,0.3324,0.017,20.088,0.000,0.300,0.365
agesq,-0.0026,0.000,-9.651,0.000,-0.003,-0.002

0,1,2,3
Omnibus:,203.406,Durbin-Watson:,1.868
Prob(Omnibus):,0.0,Jarque-Bera (JB):,715.951
Skew:,0.017,Prob(JB):,3.4100000000000003e-156
Kurtosis:,4.985,Cond. No.,10700.0


In [9]:
X = sm.add_constant(fertil2[["frsthalf", "age", "agesq"]])
model = sm.OLS(fertil2.educ, X, missing="drop").fit()
model.summary()


0,1,2,3
Dep. Variable:,educ,R-squared:,0.108
Model:,OLS,Adj. R-squared:,0.107
Method:,Least Squares,F-statistic:,175.2
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,3.01e-107
Time:,05:22:15,Log-Likelihood:,-11905.0
No. Observations:,4361,AIC:,23820.0
Df Residuals:,4357,BIC:,23840.0
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,9.6929,0.598,16.207,0.000,8.520,10.865
frsthalf,-0.8523,0.113,-7.554,0.000,-1.073,-0.631
age,-0.1080,0.042,-2.568,0.010,-0.190,-0.026
agesq,-0.0005,0.001,-0.730,0.466,-0.002,0.001

0,1,2,3
Omnibus:,51.991,Durbin-Watson:,1.352
Prob(Omnibus):,0.0,Jarque-Bera (JB):,56.555
Skew:,0.233,Prob(JB):,5.24e-13
Kurtosis:,3.306,Cond. No.,10500.0


In [10]:
X = sm.add_constant(fertil2[["age", "agesq"]])
IV2SLS(fertil2.children, X, fertil2.educ, fertil2.frsthalf).fit(cov_type="unadjusted")


0,1,2,3
Dep. Variable:,children,R-squared:,0.5502
Estimator:,IV-2SLS,Adj. R-squared:,0.5499
No. Observations:,4361,F-statistic:,5300.2
Date:,"Sun, Mar 27 2022",P-value (F-stat),0.0000
Time:,05:22:15,Distribution:,chi2(3)
Cov. Estimator:,unadjusted,,
,,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
const,-3.3878,0.5479,-6.1833,0.0000,-4.4617,-2.3139
age,0.3236,0.0179,18.128,0.0000,0.2886,0.3586
agesq,-0.0027,0.0003,-9.5589,0.0000,-0.0032,-0.0021
educ,-0.1715,0.0532,-3.2264,0.0013,-0.2757,-0.0673


In [11]:
X = sm.add_constant(fertil2[["educ", "age", "agesq", "electric", "tv", "bicycle"]])
model = sm.OLS(fertil2.children, X, missing="drop").fit()
model.summary()

0,1,2,3
Dep. Variable:,children,R-squared:,0.576
Model:,OLS,Adj. R-squared:,0.575
Method:,Least Squares,F-statistic:,984.9
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,0.0
Time:,05:22:15,Log-Likelihood:,-7789.3
No. Observations:,4356,AIC:,15590.0
Df Residuals:,4349,BIC:,15640.0
Df Model:,6,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,-4.3898,0.240,-18.267,0.000,-4.861,-3.919
educ,-0.0767,0.006,-12.075,0.000,-0.089,-0.064
age,0.3402,0.016,20.692,0.000,0.308,0.372
agesq,-0.0027,0.000,-10.010,0.000,-0.003,-0.002
electric,-0.3027,0.076,-3.974,0.000,-0.452,-0.153
tv,-0.2531,0.091,-2.768,0.006,-0.432,-0.074
bicycle,0.3179,0.049,6.440,0.000,0.221,0.415

0,1,2,3
Omnibus:,196.639,Durbin-Watson:,1.891
Prob(Omnibus):,0.0,Jarque-Bera (JB):,675.502
Skew:,0.009,Prob(JB):,2.07e-147
Kurtosis:,4.929,Cond. No.,10800.0


In [12]:
X = sm.add_constant(fertil2[["age", "agesq", "electric", "tv", "bicycle"]])
IV2SLS(fertil2.children, X, fertil2.educ, fertil2.frsthalf).fit(cov_type="unadjusted")

Inputs contain missing values. Dropping rows with missing observations.


0,1,2,3
Dep. Variable:,children,R-squared:,0.5577
Estimator:,IV-2SLS,Adj. R-squared:,0.5571
No. Observations:,4356,F-statistic:,5539.2
Date:,"Sun, Mar 27 2022",P-value (F-stat),0.0000
Time:,05:22:16,Distribution:,chi2(6)
Cov. Estimator:,unadjusted,,
,,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
const,-3.5913,0.6446,-5.5717,0.0000,-4.8547,-2.3280
age,0.3281,0.0190,17.231,0.0000,0.2908,0.3655
agesq,-0.0027,0.0003,-9.8509,0.0000,-0.0033,-0.0022
electric,-0.1065,0.1658,-0.6424,0.5206,-0.4316,0.2185
tv,-0.0026,0.2091,-0.0122,0.9902,-0.4123,0.4072
bicycle,0.3321,0.0515,6.4499,0.0000,0.2312,0.4330
educ,-0.1640,0.0655,-2.5045,0.0123,-0.2923,-0.0357


C2.i Results above. Holding age fixed, each additional year of education is associated with about 0.09 fewer children. For a group of 100 women who each get another year of education, we would expect about 9 fewer children in total.

C2.ii Estimating the reduced form for educ lets us reject the null hypothesis that the coefficient on frsthalf is zero (t = -7.554). Since we have assumed frsthalf is uncorrelated with the error, this suggests it is a reasonable IV.

C2.iii Results above. The IV estimate of the effect of education (-0.1715) is larger in magnitude than the OLS estimate (-0.0906) and remains significant, though its confidence interval is wider.

C2.iv Results above. The magnitude of the education coefficient falls once these factors are accounted for (though it is still large). TV ownership is associated with fewer children (0.25 fewer under OLS, 0.0026 fewer under 2SLS, where it is no longer significant), likely because it is a proxy for higher income, which is associated with fewer children.

In [13]:
# Exercise 3
card = pd.read_stata("./stata/CARD.DTA")
X = sm.add_constant(card[["nearc4"]])
model = sm.OLS(card.IQ, X, missing="drop").fit()
model.summary()

0,1,2,3
Dep. Variable:,IQ,R-squared:,0.006
Model:,OLS,Adj. R-squared:,0.005
Method:,Least Squares,F-statistic:,12.13
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,0.000507
Time:,05:22:16,Log-Likelihood:,-8556.6
No. Observations:,2061,AIC:,17120.0
Df Residuals:,2059,BIC:,17130.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,100.6106,0.627,160.347,0.000,99.380,101.841
nearc4,2.5962,0.745,3.483,0.001,1.134,4.058

0,1,2,3
Omnibus:,32.638,Durbin-Watson:,1.737
Prob(Omnibus):,0.0,Jarque-Bera (JB):,33.8
Skew:,-0.311,Prob(JB):,4.57e-08
Kurtosis:,3.083,Cond. No.,3.47


In [14]:
X = sm.add_constant(card[["nearc4", "smsa66", "reg662", "reg663", "reg664",
                          "reg665", "reg666", "reg667", "reg668", "reg669"]])
model = sm.OLS(card.IQ, X, missing="drop").fit()
model.summary()


0,1,2,3
Dep. Variable:,IQ,R-squared:,0.063
Model:,OLS,Adj. R-squared:,0.058
Method:,Least Squares,F-statistic:,13.7
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,1.2099999999999999e-23
Time:,05:22:16,Log-Likelihood:,-8496.0
No. Observations:,2061,AIC:,17010.0
Df Residuals:,2050,BIC:,17080.0
Df Model:,10,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,104.7735,1.625,64.477,0.000,101.587,107.960
nearc4,0.3479,0.814,0.427,0.669,-1.249,1.945
smsa66,1.0892,0.809,1.347,0.178,-0.497,2.675
reg662,1.0993,1.650,0.666,0.505,-2.136,4.335
reg663,-1.5593,1.623,-0.961,0.337,-4.742,1.624
reg664,-0.5425,1.916,-0.283,0.777,-4.301,3.216
reg665,-8.4755,1.666,-5.089,0.000,-11.742,-5.209
reg666,-7.4212,1.974,-3.760,0.000,-11.292,-3.550
reg667,-8.3944,1.830,-4.588,0.000,-11.983,-4.806

0,1,2,3
Omnibus:,19.4,Durbin-Watson:,1.838
Prob(Omnibus):,0.0,Jarque-Bera (JB):,19.682
Skew:,-0.236,Prob(JB):,5.32e-05
Kurtosis:,3.085,Cond. No.,20.8


C3.i The original regression was reported to include regional dummies, and it seems plausible that 'talented' people are concentrated in regions near four-year colleges. While this is something of a hand-waving argument, I would allow that nearc4 may be correlated with unobserved factors.

C3.ii Regressing IQ on nearc4 produces a positive and significant coefficient (2.5962); that is, growing up near a four-year college is positively correlated with IQ.

C3.iii The coefficient on nearc4 becomes smaller (0.3479) and statistically insignificant once smsa66 and the regional dummies are included. Essentially, the relationship between growing up near a four-year college and IQ vanishes once other factors are accounted for.

C3.iv We should control for smsa66 and the regional dummies. We've previously used IQ as a proxy for unobserved ability, and so given that these factors are correlated with something we would expect to affect wages, failing to account for them would violate the assumptions we rely on.

In [15]:
# Exercise 4
intdef = pd.read_stata("./stata/intdef.dta")
intdef_1 = intdef[1:]
X = sm.add_constant(intdef_1[["inf"]])
model = sm.OLS(intdef_1.i3, X, missing="drop").fit()
model.summary()


0,1,2,3
Dep. Variable:,i3,R-squared:,0.546
Model:,OLS,Adj. R-squared:,0.538
Method:,Least Squares,F-statistic:,63.86
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,1.16e-10
Time:,05:22:16,Log-Likelihood:,-113.31
No. Observations:,55,AIC:,230.6
Df Residuals:,53,BIC:,234.6
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,2.3208,0.423,5.491,0.000,1.473,3.169
inf,0.6981,0.087,7.991,0.000,0.523,0.873

0,1,2,3
Omnibus:,4.544,Durbin-Watson:,0.577
Prob(Omnibus):,0.103,Jarque-Bera (JB):,5.45
Skew:,0.006,Prob(JB):,0.0656
Kurtosis:,4.542,Cond. No.,8.05


In [16]:
IV2SLS(intdef.i3, np.ones((intdef.shape[0], 1)), intdef.inf, intdef.inf_1).fit(cov_type="unadjusted")

Inputs contain missing values. Dropping rows with missing observations.


0,1,2,3
Dep. Variable:,i3,R-squared:,0.4996
Estimator:,IV-2SLS,Adj. R-squared:,0.4902
No. Observations:,55,F-statistic:,46.993
Date:,"Sun, Mar 27 2022",P-value (F-stat),0.0000
Time:,05:22:16,Distribution:,chi2(1)
Cov. Estimator:,unadjusted,,
,,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
exog,1.5426,0.5688,2.7119,0.0067,0.4277,2.6575
inf,0.9025,0.1316,6.8551,0.0000,0.6444,1.1605


In [17]:
X = sm.add_constant(intdef[["cinf"]])
model = sm.OLS(intdef.ci3, X, missing="drop").fit()
model.summary()


0,1,2,3
Dep. Variable:,ci3,R-squared:,0.152
Model:,OLS,Adj. R-squared:,0.136
Method:,Least Squares,F-statistic:,9.533
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,0.00321
Time:,05:22:16,Log-Likelihood:,-90.202
No. Observations:,55,AIC:,184.4
Df Residuals:,53,BIC:,188.4
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,0.0230,0.172,0.134,0.894,-0.321,0.367
cinf,0.2212,0.072,3.088,0.003,0.077,0.365

0,1,2,3
Omnibus:,0.441,Durbin-Watson:,1.792
Prob(Omnibus):,0.802,Jarque-Bera (JB):,0.595
Skew:,0.163,Prob(JB):,0.743
Kurtosis:,2.608,Cond. No.,2.4


In [18]:
X = sm.add_constant(intdef.cinf.shift(1))
model = sm.OLS(intdef.cinf, X, missing="drop").fit()
model.summary()


0,1,2,3
Dep. Variable:,cinf,R-squared:,0.0
Model:,OLS,Adj. R-squared:,-0.019
Method:,Least Squares,F-statistic:,0.007553
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,0.931
Time:,05:22:17,Log-Likelihood:,-115.58
No. Observations:,54,AIC:,235.2
Df Residuals:,52,BIC:,239.1
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,0.0636,0.286,0.223,0.825,-0.510,0.637
cinf,-0.0103,0.118,-0.087,0.931,-0.248,0.227

0,1,2,3
Omnibus:,5.888,Durbin-Watson:,1.795
Prob(Omnibus):,0.053,Jarque-Bera (JB):,8.171
Skew:,0.175,Prob(JB):,0.0168
Kurtosis:,4.873,Cond. No.,2.42


C4.i Results reported above

C4.ii The IV estimate is considerably larger than the OLS estimate (0.9025 vs. 0.6981), although in this case we're likely more interested in how close it is to 1, and it is very close.

C4.iii The first-differenced estimate (0.2212) is the smallest we've seen so far, by a large margin.

C4.iv The regression of cinf on its own lag shows no evidence of correlation (coefficient -0.0103, p = 0.931), so lagged cinf is unsuitable as an IV for cinf.
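The "how close is it to 1" question in C4.ii can be made concrete with a t statistic against 1 rather than 0. The sketch below just reuses the coefficients and standard errors printed above; the helper name `t_stat_against` is mine, and 1.96 is the normal critical value used as an approximation for this small sample.

```python
# Estimates and standard errors for the coefficient on inf, copied from the output above.
estimates = {
    "OLS": (0.6981, 0.087),
    "IV": (0.9025, 0.1316),
}


def t_stat_against(value, coef, se):
    """t statistic for H0: coefficient == value."""
    return (coef - value) / se


t_stats = {name: t_stat_against(1.0, coef, se) for name, (coef, se) in estimates.items()}

for name, t in t_stats.items():
    verdict = "reject" if abs(t) > 1.96 else "fail to reject"
    print(f"{name}: t = {t:.2f}, {verdict} H0: beta = 1 at the 5% level")
```

Under these assumptions the OLS estimate is statistically different from 1 while the IV estimate is not.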

In [19]:
# Exercise 5
X = sm.add_constant(card[["nearc4", "exper", "expersq", "black", "smsa", 
                          "south", "smsa66", "reg662", "reg663", "reg664", 
                          "reg665", "reg666", "reg667", "reg668", "reg669"]])
card["v_2"] = sm.OLS(card.educ, X, missing="drop").fit().resid

X = sm.add_constant(card[["v_2", "educ", "exper", "expersq", "black", "smsa", 
                          "south", "smsa66", "reg662", "reg663", "reg664", 
                          "reg665", "reg666", "reg667", "reg668", "reg669"]])
model = sm.OLS(card.lwage, X, missing="drop").fit()
model.summary()


0,1,2,3
Dep. Variable:,lwage,R-squared:,0.3
Model:,OLS,Adj. R-squared:,0.296
Method:,Least Squares,F-statistic:,80.21
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,9.21e-218
Time:,05:22:17,Log-Likelihood:,-1288.2
No. Observations:,3010,AIC:,2610.0
Df Residuals:,2993,BIC:,2713.0
Df Model:,16,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,3.6662,0.887,4.135,0.000,1.928,5.405
v_2,-0.0571,0.053,-1.081,0.280,-0.161,0.046
educ,0.1315,0.053,2.496,0.013,0.028,0.235
exper,0.1083,0.023,4.774,0.000,0.064,0.153
expersq,-0.0023,0.000,-7.303,0.000,-0.003,-0.002
black,-0.1468,0.052,-2.841,0.005,-0.248,-0.045
smsa,0.1118,0.030,3.684,0.000,0.052,0.171
south,-0.1447,0.026,-5.531,0.000,-0.196,-0.093
smsa66,0.0185,0.021,0.895,0.371,-0.022,0.059

0,1,2,3
Omnibus:,60.106,Durbin-Watson:,1.881
Prob(Omnibus):,0.0,Jarque-Bera (JB):,71.758
Skew:,-0.283,Prob(JB):,2.62e-16
Kurtosis:,3.503,Cond. No.,16900.0


In [20]:
X = sm.add_constant(card[["exper", "expersq", "black", "smsa", 
                          "south", "smsa66", "reg662", "reg663", "reg664", 
                          "reg665", "reg666", "reg667", "reg668", "reg669"]])
model = IV2SLS(card.lwage, X, card.educ, card[["nearc2", "nearc4"]]).fit(cov_type="unadjusted")
model


0,1,2,3
Dep. Variable:,lwage,R-squared:,0.1702
Estimator:,IV-2SLS,Adj. R-squared:,0.1660
No. Observations:,3010,F-statistic:,709.89
Date:,"Sun, Mar 27 2022",P-value (F-stat),0.0000
Time:,05:22:17,Distribution:,chi2(15)
Cov. Estimator:,unadjusted,,
,,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
const,3.2367,0.8826,3.6674,0.0002,1.5069,4.9665
exper,0.1188,0.0227,5.2237,0.0000,0.0742,0.1634
expersq,-0.0024,0.0003,-6.7990,0.0000,-0.0030,-0.0017
black,-0.1233,0.0520,-2.3702,0.0178,-0.2252,-0.0213
smsa,0.1008,0.0314,3.2051,0.0014,0.0391,0.1624
south,-0.1432,0.0284,-5.0476,0.0000,-0.1988,-0.0876
smsa66,0.0151,0.0223,0.6762,0.4989,-0.0286,0.0587
reg662,0.1027,0.0392,2.6220,0.0087,0.0259,0.1796
reg663,0.1499,0.0383,3.9157,0.0001,0.0749,0.2250


In [21]:
model.wooldridge_overid

Wooldridge's score test of overidentification
H0: Model is not overidentified.
Statistic: 1.2689
P-value: 0.2600
Distributed: chi2(1)
WaldTestStatistic, id: 0x7f054202dca0

C5.i The coefficient on v_2 (-0.0571) is large, as is the difference between the two regressions, but it is not statistically significant (p = 0.280).

C5.ii The coefficient on education gets even larger.

C5.iii This test is built into linearmodels (`wooldridge_overid`). With a p-value of 0.26 we fail to reject the null, and so do not have evidence that either instrument is endogenous.
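For intuition about what an overidentification test does, here is a minimal sketch of the textbook $nR^2$ version: regress the 2SLS residuals on all exogenous variables and compare $nR^2$ to a $\chi^2$ with as many degrees of freedom as there are extra instruments. It uses synthetic data (not CARD.DTA) and plain numpy, and it is the simple homoskedastic variant. As far as I understand, the linearmodels statistic above is a different score-based variant, so the two numbers need not coincide.

```python
import math

import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic data (hypothetical): two valid instruments, an unobserved
# confounder a, and one endogenous regressor x.
z = rng.normal(size=(n, 2))
a = rng.normal(size=n)
x = z @ np.array([0.5, 0.5]) + a + rng.normal(size=n)
y = 1.0 + 0.1 * x + a + rng.normal(size=n)

const = np.ones((n, 1))
Z = np.hstack([const, z])         # all exogenous variables (constant + instruments)
X = np.column_stack([const, x])   # structural regressors


def ols(design, target):
    """Least-squares coefficients of target on design."""
    return np.linalg.lstsq(design, target, rcond=None)[0]


# 2SLS: estimate with first-stage fitted values, then form residuals
# using the original x (not the fitted values).
x_hat = Z @ ols(Z, x)
beta = ols(np.column_stack([const, x_hat]), y)
resid = y - X @ beta

# Auxiliary regression of the 2SLS residuals on all exogenous variables.
fitted = Z @ ols(Z, resid)
r_squared = 1.0 - np.sum((resid - fitted) ** 2) / np.sum((resid - resid.mean()) ** 2)

# Two instruments, one endogenous regressor: one overidentifying restriction.
stat = max(n * r_squared, 0.0)
p_value = math.erfc(math.sqrt(stat / 2.0))  # chi2(1) survival function
print(f"n*R^2 = {stat:.3f}, p-value = {p_value:.3f}")
```

The `erfc` shortcut for the p-value only works for one degree of freedom; with more overidentifying restrictions a general chi-squared survival function would be needed.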

In [22]:
# Exercise 6
murder = pd.read_stata("./stata/MURDER.DTA")
murder[(murder.year == 93) & (murder.exec > 0)].sort_values("exec", ascending=False)

Unnamed: 0,id,state,year,mrdrte,exec,unem,d90,d93,cmrdrte,cexec,cunem,cexec_1,cunem_1
131,44,TX,93,11.9,34,7.0,0,1,-2.200001,23.0,0.8,-11.0,-2.2
140,47,VA,93,8.3,11,5.0,0,1,-0.5,8.0,0.7,-1.0,0.1
29,10,FL,93,8.9,7,7.0,0,1,-1.8,-1.0,1.1,1.0,0.6
77,26,MO,93,11.3,6,6.4,0,1,2.5,1.0,0.7,5.0,-0.6
8,3,AZ,93,8.6,3,6.2,0,1,0.900001,3.0,0.9,0.0,-0.9
32,11,GA,93,11.4,3,5.8,0,1,-0.400001,1.0,0.4,-7.0,-0.1
2,1,AL,93,11.6,2,7.5,0,1,0.0,-3.0,0.7,3.0,-1.0
11,4,AR,93,10.2,2,6.2,0,1,-0.1,0.0,-0.7,2.0,-1.2
14,5,CA,93,13.1,2,9.2,0,1,1.200001,2.0,3.6,0.0,-0.2
56,19,LA,93,20.299999,2,7.4,0,1,3.099998,-2.0,1.2,-5.0,-5.8


In [23]:
murder_pooled = murder[murder.year.isin([90, 93])]
X = sm.add_constant(murder_pooled[["d93", "exec", "unem"]])
model = sm.OLS(murder_pooled.mrdrte, X, missing="drop").fit()
model.summary()


0,1,2,3
Dep. Variable:,mrdrte,R-squared:,0.102
Model:,OLS,Adj. R-squared:,0.074
Method:,Least Squares,F-statistic:,3.695
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,0.0144
Time:,05:22:18,Log-Likelihood:,-379.81
No. Observations:,102,AIC:,767.6
Df Residuals:,98,BIC:,778.1
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,-5.2780,4.428,-1.192,0.236,-14.065,3.509
d93,-2.0674,2.145,-0.964,0.337,-6.323,2.189
exec,0.1277,0.263,0.485,0.629,-0.395,0.650
unem,2.5289,0.782,3.235,0.002,0.978,4.080

0,1,2,3
Omnibus:,148.824,Durbin-Watson:,1.062
Prob(Omnibus):,0.0,Jarque-Bera (JB):,5022.506
Skew:,5.38,Prob(JB):,0.0
Kurtosis:,35.649,Cond. No.,28.0


In [24]:
murder_change = murder[murder.year == 93]
X = sm.add_constant(murder_change[["cexec", "cunem"]])
model = sm.OLS(murder_change.cmrdrte, X, missing="drop").fit()
model.summary()


0,1,2,3
Dep. Variable:,cmrdrte,R-squared:,0.11
Model:,OLS,Adj. R-squared:,0.073
Method:,Least Squares,F-statistic:,2.959
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,0.0614
Time:,05:22:18,Log-Likelihood:,-74.693
No. Observations:,51,AIC:,155.4
Df Residuals:,48,BIC:,161.2
Df Model:,2,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,0.4133,0.209,1.974,0.054,-0.008,0.834
cexec,-0.1038,0.043,-2.392,0.021,-0.191,-0.017
cunem,-0.0666,0.159,-0.420,0.677,-0.386,0.252

0,1,2,3
Omnibus:,0.134,Durbin-Watson:,2.223
Prob(Omnibus):,0.935,Jarque-Bera (JB):,0.05
Skew:,0.067,Prob(JB):,0.976
Kurtosis:,2.927,Cond. No.,5.71


In [25]:
X = sm.add_constant(murder[["cexec_1"]])
sm.OLS(murder.cexec, X, missing="drop").fit().summary()


0,1,2,3
Dep. Variable:,cexec,R-squared:,0.456
Model:,OLS,Adj. R-squared:,0.444
Method:,Least Squares,F-statistic:,41.0
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,5.59e-08
Time:,05:22:18,Log-Likelihood:,-120.46
No. Observations:,51,AIC:,244.9
Df Residuals:,49,BIC:,248.8
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,0.3499,0.370,0.946,0.349,-0.393,1.093
cexec_1,-1.0824,0.169,-6.403,0.000,-1.422,-0.743

0,1,2,3
Omnibus:,27.088,Durbin-Watson:,2.011
Prob(Omnibus):,0.0,Jarque-Bera (JB):,120.223
Skew:,1.109,Prob(JB):,7.83e-27
Kurtosis:,10.187,Cond. No.,2.21


In [26]:
X = sm.add_constant(murder_change[["cunem"]])
IV2SLS(murder_change.cmrdrte, X, murder_change.cexec, murder_change.cexec_1).fit(cov_type="unadjusted")


0,1,2,3
Dep. Variable:,cmrdrte,R-squared:,0.1096
Estimator:,IV-2SLS,Adj. R-squared:,0.0725
No. Observations:,51,F-statistic:,2.7816
Date:,"Sun, Mar 27 2022",P-value (F-stat),0.2489
Time:,05:22:18,Distribution:,chi2(2)
Cov. Estimator:,unadjusted,,
,,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
const,0.4110,0.2051,2.0036,0.0451,0.0090,0.8130
cunem,-0.0667,0.1540,-0.4334,0.6647,-0.3685,0.2350
cexec,-0.1001,0.0624,-1.6040,0.1087,-0.2224,0.0222


C6.i 16 states executed at least one prisoner between 1991-1993. Texas had the most executions.

C6.ii Results above. The coefficient is positive, which has the counterintuitive interpretation that executions encourage murder, but the result is not significant (p = 0.629), so the more accurate interpretation is that there is no evidence of a deterrent effect.

C6.iii Results above. In the differenced regression the coefficient for executions is negative and significant, suggesting there is a deterrent effect (which is about .1, so 10 executions would deter 1 murder per 100,000). Unemployment loses its significance.

C6.iv The differenced execution variable and its lag are negatively correlated, and the coefficient is significant. The interpretation would be that an increase in executions in one period is associated with a roughly equal decrease in the next period.

C6.v The coefficient on executions is quite similar but the standard error is larger, losing the significance.
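The sign flip between the pooled regression in C6.ii and the differenced regression in C6.iii is what one would expect if states have fixed characteristics that drive both executions and murder rates. A minimal sketch on synthetic data (not MURDER.DTA), assuming a two-period panel with a unit fixed effect that is positively correlated with the regressor and a hypothetical true effect of -0.1:

```python
import numpy as np

rng = np.random.default_rng(1)
n_units = 5000
beta = -0.1  # hypothetical true effect

# Unit-specific fixed effect that also drives the regressor in both periods.
a = rng.normal(size=n_units)
x1 = a + rng.normal(size=n_units)
x2 = a + rng.normal(size=n_units)
y1 = a + beta * x1 + rng.normal(size=n_units)
y2 = a + beta * x2 + rng.normal(size=n_units)


def slope(x, y):
    """OLS slope from a regression of y on a constant and x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


# Pooled OLS ignores the fixed effect; first differencing removes it.
pooled = slope(np.concatenate([x1, x2]), np.concatenate([y1, y2]))
first_diff = slope(x2 - x1, y2 - y1)
print(f"pooled OLS: {pooled:.3f}, first difference: {first_diff:.3f}, truth: {beta}")
```

In this setup the pooled slope is pushed upward by the fixed effect (here enough to turn it positive), while the differenced slope stays near the true value.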

In [27]:
# Exercise 7
phillips = pd.read_stata("./stata/phillips.dta")
X = sm.add_constant(phillips[["unem_1"]])
model = sm.OLS(phillips.unem, X, missing="drop").fit()
model.summary()

0,1,2,3
Dep. Variable:,unem,R-squared:,0.566
Model:,OLS,Adj. R-squared:,0.558
Method:,Least Squares,F-statistic:,69.12
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,3.54e-11
Time:,05:22:18,Log-Likelihood:,-76.946
No. Observations:,55,AIC:,157.9
Df Residuals:,53,BIC:,161.9
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,1.4897,0.520,2.864,0.006,0.446,2.533
unem_1,0.7424,0.089,8.314,0.000,0.563,0.921

0,1,2,3
Omnibus:,9.49,Durbin-Watson:,1.666
Prob(Omnibus):,0.009,Jarque-Bera (JB):,8.954
Skew:,0.922,Prob(JB):,0.0114
Kurtosis:,3.714,Cond. No.,23.1


In [28]:
IV2SLS(phillips.cinf, np.ones((phillips.shape[0], 1)), phillips.unem, phillips.unem_1).fit(cov_type="unadjusted")

Inputs contain missing values. Dropping rows with missing observations.


0,1,2,3
Dep. Variable:,cinf,R-squared:,0.0457
Estimator:,IV-2SLS,Adj. R-squared:,0.0277
No. Observations:,55,F-statistic:,0.2148
Date:,"Sun, Mar 27 2022",P-value (F-stat),0.6430
Time:,05:22:18,Distribution:,chi2(1)
Cov. Estimator:,unadjusted,,
,,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
exog,0.6338,1.6259,0.3898,0.6967,-2.5529,3.8205
unem,-0.1304,0.2815,-0.4635,0.6430,-0.6821,0.4212


C7.i If the supply shock is correlated with unemployment then our estimate is biased and inconsistent.

C7.ii If the shocks are unpredictable given past information, past unemployment is a good IV for current unemployment as they will be correlated (while it will be uncorrelated with the shock).

C7.iii Unemployment and past unemployment are significantly correlated.

C7.iv Results above. The coefficient in the example (11.19) is -0.543 and so the result from the IV estimation is considerably smaller in magnitude and is not statistically different from 0.
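With one instrument and no other controls, as in C7.iv, the IV estimate is just a ratio of sample covariances, $\widehat{\beta}_{IV} = \widehat{\mathrm{Cov}}(z, y) / \widehat{\mathrm{Cov}}(z, x)$, and it matches the slope from running the two stages by hand. The sketch below checks this on synthetic data (not phillips.dta); all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Synthetic data (hypothetical): z is the instrument, x the endogenous regressor,
# and shock enters both x and y, which is what makes x endogenous.
z = rng.normal(size=n)
shock = rng.normal(size=n)
x = 0.7 * z + shock + rng.normal(size=n)
y = 0.5 - 0.5 * x + shock + rng.normal(size=n)

# Simple IV estimator: ratio of sample covariances.
iv_ratio = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

# The same number via two explicit stages.
Z = np.column_stack([np.ones(n), z])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
X_hat = np.column_stack([np.ones(n), x_hat])
two_stage = np.linalg.lstsq(X_hat, y, rcond=None)[0][1]

print(f"covariance ratio: {iv_ratio:.4f}, two-stage slope: {two_stage:.4f}")
```

The point estimates agree, but the standard errors from a hand-run second stage are not valid, which is one reason to use `IV2SLS` as in the cells above.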

In [29]:
# Exercise 8
subs = pd.read_stata("./stata/401ksubs.dta")
X = sm.add_constant(subs[["p401k", "inc", "incsq", "age", "agesq"]])
model = sm.OLS(subs.pira, X, missing="drop").fit()
model.summary()


0,1,2,3
Dep. Variable:,pira,R-squared:,0.18
Model:,OLS,Adj. R-squared:,0.18
Method:,Least Squares,F-statistic:,406.9
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,0.0
Time:,05:22:18,Log-Likelihood:,-4530.3
No. Observations:,9275,AIC:,9073.0
Df Residuals:,9269,BIC:,9115.0
Df Model:,5,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,-0.1977,0.069,-2.880,0.004,-0.332,-0.063
p401k,0.0537,0.010,5.606,0.000,0.035,0.072
inc,0.0087,0.001,16.983,0.000,0.008,0.010
incsq,-2.28e-05,4.03e-06,-5.653,0.000,-3.07e-05,-1.49e-05
age,-0.0016,0.003,-0.479,0.632,-0.008,0.005
agesq,0.0001,3.82e-05,3.068,0.002,4.24e-05,0.000

0,1,2,3
Omnibus:,940.477,Durbin-Watson:,1.966
Prob(Omnibus):,0.0,Jarque-Bera (JB):,1208.938
Skew:,0.871,Prob(JB):,3.0399999999999998e-263
Kurtosis:,2.69,Cond. No.,65000.0


In [30]:
X = sm.add_constant(subs[["e401k", "inc", "incsq", "age", "agesq"]])
model = sm.OLS(subs.p401k, X, missing="drop").fit(cov_type="HC3")
subs["v"] = model.resid # Used in part vi
model.summary()


0,1,2,3
Dep. Variable:,p401k,R-squared:,0.596
Model:,OLS,Adj. R-squared:,0.596
Method:,Least Squares,F-statistic:,1912.0
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,0.0
Time:,05:22:19,Log-Likelihood:,-1488.4
No. Observations:,9275,AIC:,2989.0
Df Residuals:,9269,BIC:,3032.0
Df Model:,5,,
Covariance Type:,HC3,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
const,0.0591,0.046,1.280,0.201,-0.031,0.150
e401k,0.6888,0.008,86.179,0.000,0.673,0.705
inc,0.0011,0.000,3.223,0.001,0.000,0.002
incsq,1.841e-06,2.69e-06,0.683,0.494,-3.44e-06,7.12e-06
age,-0.0047,0.002,-2.103,0.035,-0.009,-0.000
agesq,5.204e-05,2.57e-05,2.022,0.043,1.61e-06,0.000

0,1,2,3
Omnibus:,2096.661,Durbin-Watson:,2.009
Prob(Omnibus):,0.0,Jarque-Bera (JB):,3985.187
Skew:,-1.403,Prob(JB):,0.0
Kurtosis:,4.561,Cond. No.,65000.0


In [31]:
X = sm.add_constant(subs[["inc", "incsq", "age", "agesq"]])
IV2SLS(subs.pira, X, subs.p401k, subs.e401k).fit(cov_type="robust")


0,1,2,3
Dep. Variable:,pira,R-squared:,0.1789
Estimator:,IV-2SLS,Adj. R-squared:,0.1785
No. Observations:,9275,F-statistic:,2063.8
Date:,"Sun, Mar 27 2022",P-value (F-stat),0.0000
Time:,05:22:19,Distribution:,chi2(5)
Cov. Estimator:,robust,,
,,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
const,-0.2073,0.0654,-3.1719,0.0015,-0.3354,-0.0792
inc,0.0090,0.0005,18.321,0.0000,0.0080,0.0100
incsq,-2.414e-05,3.881e-06,-6.2192,0.0000,-3.174e-05,-1.653e-05
age,-0.0011,0.0032,-0.3530,0.7241,-0.0075,0.0052
agesq,0.0001,3.832e-05,2.9243,0.0035,3.696e-05,0.0002
p401k,0.0207,0.0132,1.5650,0.1176,-0.0052,0.0466


In [32]:
X = sm.add_constant(subs[["v", "p401k", "inc", "incsq", "age", "agesq"]])
model = sm.OLS(subs.pira, X, missing="drop").fit(cov_type="HC3")
model.summary()

0,1,2,3
Dep. Variable:,pira,R-squared:,0.181
Model:,OLS,Adj. R-squared:,0.181
Method:,Least Squares,F-statistic:,354.2
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,0.0
Time:,05:22:19,Log-Likelihood:,-4522.8
No. Observations:,9275,AIC:,9060.0
Df Residuals:,9268,BIC:,9110.0
Df Model:,6,,
Covariance Type:,HC3,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
const,-0.2073,0.065,-3.175,0.001,-0.335,-0.079
v,0.0748,0.019,3.914,0.000,0.037,0.112
p401k,0.0207,0.013,1.565,0.118,-0.005,0.047
inc,0.0090,0.000,18.255,0.000,0.008,0.010
incsq,-2.414e-05,3.9e-06,-6.187,0.000,-3.18e-05,-1.65e-05
age,-0.0011,0.003,-0.354,0.724,-0.008,0.005
agesq,0.0001,3.83e-05,2.929,0.003,3.71e-05,0.000

0,1,2,3
Omnibus:,937.141,Durbin-Watson:,1.965
Prob(Omnibus):,0.0,Jarque-Bera (JB):,1206.034
Skew:,0.87,Prob(JB):,1.3e-262
Kurtosis:,2.696,Cond. No.,65000.0


C8.i Results above. There appears to be a positive and significant relationship between participation in a 401k and participation in an IRA.

C8.ii OLS assumes that the error term is uncorrelated with p401k. While we do include age and income, participation in one of these savings accounts likely indicates an overall propensity to save, which is not presently accounted for. That is, if I want to save I am likely to seek out both a 401k and an IRA, rather than participation in a 401k making me realize what a good idea an IRA is.

C8.iii That eligibility for a 401k is correlated with participation in a 401k is without question. To be a valid IV, however, it also needs to be exogenous. This is murkier, especially because I'm not entirely sure what eligibility entails. If eligibility depends on the choice of employer, it is reasonable to believe that savers will select employers that offer a plan, which would make eligibility endogenous. If eligibility is unrelated to the propensity to save, then it is a valid instrument.

C8.iv Results above. There is a positive and statistically significant correlation between the two variables, even when correcting for heteroskedasticity.

C8.v Results above. The coefficient for p401k drops considerably (less than half of the OLS estimate), and is no longer significant.

C8.vi Using the residuals from the reduced form equation we find a positive and significant coefficient on the residuals, rejecting the null hypothesis that p401k is exogenous (using heteroskedasticity robust standard errors).
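The mechanics of the test in C8.vi can be sketched on simulated data. Everything below is an illustrative assumption rather than the notebook's actual data: the variable names, the coefficients, a continuous stand-in for p401k, and an HC0 robust covariance computed by hand (the cell above uses statsmodels with HC3). The sketch regresses the endogenous variable on the instrument, adds the first-stage residuals to the structural equation, and looks at the robust t statistic on those residuals.

```python
import numpy as np

def ols(y, X):
    """Return OLS coefficients and residuals for y = X b + u."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

rng = np.random.default_rng(0)
n = 5000
z = rng.binomial(1, 0.4, n).astype(float)   # instrument (hypothetical stand-in for e401k)
a = rng.normal(size=n)                      # unobserved taste for saving
x = 0.7 * z + 0.5 * a + rng.normal(size=n)  # endogenous regressor (stand-in for p401k)
y = 0.2 * x + 0.8 * a + rng.normal(size=n)  # outcome (stand-in for pira)

const = np.ones(n)

# Step 1: reduced form for the endogenous regressor; keep the residuals.
_, v_hat = ols(x, np.column_stack([const, z]))

# Step 2: structural equation augmented with the reduced-form residuals.
X_aug = np.column_stack([const, x, v_hat])
beta, resid = ols(y, X_aug)

# Heteroskedasticity-robust (HC0) covariance: (X'X)^-1 X' diag(u^2) X (X'X)^-1.
XtX_inv = np.linalg.inv(X_aug.T @ X_aug)
meat = (X_aug * resid[:, None] ** 2).T @ X_aug
cov = XtX_inv @ meat @ XtX_inv

t_v = beta[2] / np.sqrt(cov[2, 2])          # t statistic on v_hat
iv_direct = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print("coefficient on x:", beta[1])
print("simple IV estimate:", iv_direct)
print("robust t on v_hat:", t_v)
```

A large absolute t statistic on `v_hat` is evidence against exogeneity of the regressor. As a by-product, the coefficient on `x` in the augmented regression coincides with the simple IV estimate, which is a handy check that the two steps were set up correctly.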

In [33]:
# Exercise 9
wage2 = pd.read_stata("./stata/WAGE2.DTA")
X = sm.add_constant(wage2[["exper", "tenure", "black"]])
IV2SLS(wage2.lwage, X, wage2.educ, wage2.sibs).fit()


0,1,2,3
Dep. Variable:,lwage,R-squared:,0.1685
Estimator:,IV-2SLS,Adj. R-squared:,0.1650
No. Observations:,935,F-statistic:,101.47
Date:,"Sun, Mar 27 2022",P-value (F-stat),0.0000
Time:,05:22:19,Distribution:,chi2(4)
Cov. Estimator:,robust,,
,,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
const,5.2160,0.5174,10.081,0.0000,4.2019,6.2301
exper,0.0209,0.0083,2.5215,0.0117,0.0047,0.0372
tenure,0.0115,0.0028,4.1894,0.0000,0.0061,0.0170
black,-0.1833,0.0501,-3.6606,0.0003,-0.2815,-0.0852
educ,0.0936,0.0318,2.9414,0.0033,0.0312,0.1560


In [34]:
X = sm.add_constant(wage2[["sibs", "exper", "tenure", "black"]])
model = sm.OLS(wage2.educ, X).fit()
wage2["educ_hat"] = model.fittedvalues

X = sm.add_constant(wage2[["educ_hat", "exper", "tenure", "black"]])
model = sm.OLS(wage2.lwage, X).fit()
model.summary()

0,1,2,3
Dep. Variable:,lwage,R-squared:,0.089
Model:,OLS,Adj. R-squared:,0.085
Method:,Least Squares,F-statistic:,22.75
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,5.99e-18
Time:,05:22:19,Log-Likelihood:,-474.0
No. Observations:,935,AIC:,958.0
Df Residuals:,930,BIC:,982.2
Df Model:,4,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,5.2160,0.569,9.170,0.000,4.100,6.332
educ_hat,0.0936,0.035,2.653,0.008,0.024,0.163
exper,0.0209,0.009,2.383,0.017,0.004,0.038
tenure,0.0115,0.003,4.027,0.000,0.006,0.017
black,-0.1833,0.052,-3.494,0.000,-0.286,-0.080

0,1,2,3
Omnibus:,19.516,Durbin-Watson:,1.717
Prob(Omnibus):,0.0,Jarque-Bera (JB):,29.824
Skew:,-0.182,Prob(JB):,3.34e-07
Kurtosis:,3.795,Cond. No.,844.0


In [35]:
X = sm.add_constant(wage2[["sibs"]])
model = sm.OLS(wage2.educ, X).fit()
wage2["educ_tilde"] = model.fittedvalues

X = sm.add_constant(wage2[["educ_tilde", "exper", "tenure", "black"]])
model = sm.OLS(wage2.lwage, X).fit()
model.summary()


0,1,2,3
Dep. Variable:,lwage,R-squared:,0.089
Model:,OLS,Adj. R-squared:,0.085
Method:,Least Squares,F-statistic:,22.75
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,5.99e-18
Time:,05:22:19,Log-Likelihood:,-474.0
No. Observations:,935,AIC:,958.0
Df Residuals:,930,BIC:,982.2
Df Model:,4,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,5.7710,0.360,16.014,0.000,5.064,6.478
educ_tilde,0.0700,0.026,2.653,0.008,0.018,0.122
exper,-0.0004,0.003,-0.126,0.900,-0.007,0.006
tenure,0.0140,0.003,5.193,0.000,0.009,0.019
black,-0.2416,0.042,-5.819,0.000,-0.323,-0.160

0,1,2,3
Omnibus:,19.516,Durbin-Watson:,1.717
Prob(Omnibus):,0.0,Jarque-Bera (JB):,29.824
Skew:,-0.182,Prob(JB):,3.34e-07
Kurtosis:,3.795,Cond. No.,536.0


C9.i Results above

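C9.ii Results above. The coefficients are identical, but the standard errors are (slightly) larger in the manual procedure, e.g. 0.035 versus 0.032 for education. The two sets are not directly comparable: the manual second stage computes nonrobust standard errors from residuals based on educ_hat rather than educ, while IV2SLS reports robust standard errors by default.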
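To illustrate why a hand-rolled second stage reports different standard errors, here is a minimal sketch on simulated data (all names and coefficients are made up for illustration, and only nonrobust standard errors are computed). The point estimates come from regressing on the fitted values, but the residual variance should be computed with the actual endogenous regressor, not its fitted values.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
w = rng.normal(size=n)                       # exogenous control
z = rng.normal(size=n)                       # instrument
a = rng.normal(size=n)                       # unobserved confounder
x = 0.5 * z + 0.3 * w + a + rng.normal(size=n)        # endogenous regressor
y = 1.0 + 0.1 * x + 0.2 * w + a + rng.normal(size=n)  # outcome

const = np.ones(n)
Z = np.column_stack([const, w, z])           # all exogenous variables
X = np.column_stack([const, w, x])           # structural regressors

# First stage: fitted values of x from all exogenous variables.
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
X_hat = np.column_stack([const, w, x_hat])

# Second stage: these coefficients are the 2SLS point estimates.
beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]

k = X.shape[1]
XhXh_inv = np.linalg.inv(X_hat.T @ X_hat)

# Naive residuals (what an OLS routine run on x_hat would use).
u_naive = y - X_hat @ beta
# 2SLS residuals (use the actual x with the 2SLS coefficients).
u_2sls = y - X @ beta

se_naive = np.sqrt(u_naive @ u_naive / (n - k) * np.diag(XhXh_inv))
se_2sls = np.sqrt(u_2sls @ u_2sls / (n - k) * np.diag(XhXh_inv))

print("naive SEs:", se_naive)
print("2SLS SEs: ", se_2sls)
```

The two sets of standard errors differ only through the residual variance estimate, so every naive standard error is off by the same factor. The factor can be above or below one depending on the data.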

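C9.iii Results above. The coefficient for education falls noticeably, from 0.0936 to 0.0700, and the coefficients on exper and black change as well. Because the first stage omits the exogenous regressors, this procedure does not reproduce the 2SLS estimates and is generally inconsistent.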
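The effect of leaving the exogenous regressors out of the first stage can be sketched with simulated data. The setup below is hypothetical (one control `w`, one instrument `z` that is correlated with `w`, and a true coefficient of 0.1 on `x`); it is meant only to show that the two procedures converge to different numbers.

```python
import numpy as np

def ols_coef(y, X):
    """OLS coefficient vector for y = X b + u."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(2)
n = 200_000
w = rng.normal(size=n)                  # exogenous control
z = 0.6 * w + rng.normal(size=n)        # instrument, correlated with the control
a = rng.normal(size=n)                  # unobserved confounder
x = 0.5 * z + 0.5 * w + a + rng.normal(size=n)   # endogenous regressor
y = 0.1 * x + 0.3 * w + a + rng.normal(size=n)   # true coefficient on x is 0.1
const = np.ones(n)

# Correct first stage: instrument plus every exogenous regressor.
Z_full = np.column_stack([const, z, w])
x_hat = Z_full @ ols_coef(x, Z_full)
b_correct = ols_coef(y, np.column_stack([const, x_hat, w]))[1]

# Incorrect first stage: instrument only, control left out.
Z_short = np.column_stack([const, z])
x_tilde = Z_short @ ols_coef(x, Z_short)
b_wrong = ols_coef(y, np.column_stack([const, x_tilde, w]))[1]

print("first stage with controls:   ", b_correct)
print("first stage without controls:", b_wrong)
```

In this design the short first stage attributes part of the control's effect on `x` to the instrument, so the second-stage coefficient is scaled down relative to the correct 2SLS estimate.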

In [36]:
# Exercise 10
htv = pd.read_stata("./stata/HTV.DTA")
X = sm.add_constant(htv[["educ"]])
model = sm.OLS(htv.lwage, X).fit()
model.summary()


0,1,2,3
Dep. Variable:,lwage,R-squared:,0.162
Model:,OLS,Adj. R-squared:,0.161
Method:,Least Squares,F-statistic:,236.6
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,5.8e-49
Time:,05:22:20,Log-Likelihood:,-995.16
No. Observations:,1230,AIC:,1994.0
Df Residuals:,1228,BIC:,2005.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,1.0923,0.087,12.513,0.000,0.921,1.264
educ,0.1014,0.007,15.383,0.000,0.088,0.114

0,1,2,3
Omnibus:,57.083,Durbin-Watson:,1.884
Prob(Omnibus):,0.0,Jarque-Bera (JB):,109.206
Skew:,-0.324,Prob(JB):,1.93e-24
Kurtosis:,4.308,Cond. No.,75.0


In [37]:
X = sm.add_constant(htv[["ctuit"]])
model = sm.OLS(htv.educ, X).fit()
model.summary()


0,1,2,3
Dep. Variable:,educ,R-squared:,0.0
Model:,OLS,Adj. R-squared:,-0.001
Method:,Least Squares,F-statistic:,0.3505
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,0.554
Time:,05:22:20,Log-Likelihood:,-2797.8
No. Observations:,1230,AIC:,5600.0
Df Residuals:,1228,BIC:,5610.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,13.0384,0.067,194.117,0.000,12.907,13.170
ctuit,-0.0494,0.084,-0.592,0.554,-0.213,0.114

0,1,2,3
Omnibus:,43.5,Durbin-Watson:,1.538
Prob(Omnibus):,0.0,Jarque-Bera (JB):,49.199
Skew:,0.428,Prob(JB):,2.07e-11
Kurtosis:,3.475,Cond. No.,1.25


In [38]:
X = sm.add_constant(htv[["educ", "exper", "expersq", "ne", "nc", "west",
                         "ne18", "nc18", "west18", "urban", "urban18"]])
model = sm.OLS(htv.lwage, X).fit()
model.summary()


0,1,2,3
Dep. Variable:,lwage,R-squared:,0.219
Model:,OLS,Adj. R-squared:,0.212
Method:,Least Squares,F-statistic:,30.98
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,4.15e-58
Time:,05:22:20,Log-Likelihood:,-951.82
No. Observations:,1230,AIC:,1928.0
Df Residuals:,1218,BIC:,1989.0
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,-0.5075,0.241,-2.104,0.036,-0.981,-0.034
educ,0.1371,0.009,15.096,0.000,0.119,0.155
exper,0.1124,0.027,4.217,0.000,0.060,0.165
expersq,-0.0030,0.001,-2.548,0.011,-0.005,-0.001
ne,-0.0168,0.086,-0.195,0.845,-0.186,0.152
nc,-0.0174,0.071,-0.245,0.807,-0.157,0.122
west,0.0175,0.081,0.217,0.828,-0.141,0.176
ne18,0.1564,0.087,1.805,0.071,-0.014,0.326
nc18,0.0114,0.073,0.156,0.876,-0.131,0.154

0,1,2,3
Omnibus:,63.263,Durbin-Watson:,1.929
Prob(Omnibus):,0.0,Jarque-Bera (JB):,114.802
Skew:,-0.375,Prob(JB):,1.18e-25
Kurtosis:,4.295,Cond. No.,2310.0


In [39]:
X = sm.add_constant(htv[["ctuit", "exper", "expersq", "ne", "nc", "west",
                         "ne18", "nc18", "west18", "urban", "urban18"]])
model = sm.OLS(htv.educ, X).fit()
model.summary()


0,1,2,3
Dep. Variable:,educ,R-squared:,0.509
Model:,OLS,Adj. R-squared:,0.504
Method:,Least Squares,F-statistic:,114.7
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,2.96e-179
Time:,05:22:20,Log-Likelihood:,-2360.8
No. Observations:,1230,AIC:,4746.0
Df Residuals:,1218,BIC:,4807.0
Df Model:,11,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,21.2425,0.459,46.230,0.000,20.341,22.144
ctuit,-0.1652,0.060,-2.771,0.006,-0.282,-0.048
exper,-0.8738,0.080,-10.889,0.000,-1.031,-0.716
expersq,0.0157,0.004,4.196,0.000,0.008,0.023
ne,-0.3746,0.270,-1.386,0.166,-0.905,0.156
nc,-0.1415,0.224,-0.631,0.528,-0.582,0.299
west,0.6220,0.253,2.456,0.014,0.125,1.119
ne18,0.6533,0.272,2.399,0.017,0.119,1.188
nc18,0.2322,0.229,1.014,0.311,-0.217,0.682

0,1,2,3
Omnibus:,13.232,Durbin-Watson:,1.668
Prob(Omnibus):,0.001,Jarque-Bera (JB):,13.427
Skew:,0.255,Prob(JB):,0.00121
Kurtosis:,3.031,Cond. No.,1440.0


In [40]:
X = sm.add_constant(htv[["exper", "expersq", "ne", "nc", "west", "ne18",
                         "nc18", "west18", "urban", "urban18"]])
IV2SLS(htv.lwage, X, htv.educ, htv.ctuit).fit()


0,1,2,3
Dep. Variable:,lwage,R-squared:,0.1196
Estimator:,IV-2SLS,Adj. R-squared:,0.1117
No. Observations:,1230,F-statistic:,106.34
Date:,"Sun, Mar 27 2022",P-value (F-stat),0.0000
Time:,05:22:20,Distribution:,chi2(11)
Cov. Estimator:,robust,,
,,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
const,-2.8942,2.4682,-1.1726,0.2409,-7.7317,1.9433
exper,0.2094,0.1024,2.0450,0.0409,0.0087,0.4101
expersq,-0.0048,0.0021,-2.2559,0.0241,-0.0089,-0.0006
ne,0.0289,0.1183,0.2442,0.8070,-0.2030,0.2608
nc,0.0029,0.0874,0.0331,0.9736,-0.1684,0.1742
west,-0.0543,0.1229,-0.4419,0.6585,-0.2953,0.1866
ne18,0.0761,0.1389,0.5475,0.5841,-0.1962,0.3483
nc18,-0.0209,0.0948,-0.2205,0.8255,-0.2067,0.1649
west18,0.0235,0.1178,0.1991,0.8422,-0.2075,0.2544


C10.i Results above. The 95% confidence interval suggests a return of about 8.8% to 11.4%.
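Reading the coefficient on educ as a percentage return relies on the approximation that a change of b log points is roughly 100·b percent. A short sketch of the exact conversion, using the point estimate and interval bounds printed in the table above (the helper name `exact_pct` is just for illustration):

```python
import math

b_educ = 0.1014                   # OLS coefficient on educ from the table above
ci_low, ci_high = 0.088, 0.114    # 95% confidence bounds from the table above

def exact_pct(b):
    """Exact percentage change in wage implied by a change of b log points."""
    return 100 * (math.exp(b) - 1)

approx_return = 100 * b_educ      # the usual "100 times the coefficient" reading
exact_return = exact_pct(b_educ)
exact_ci = (exact_pct(ci_low), exact_pct(ci_high))

print(f"approximate return: {approx_return:.2f}%")
print(f"exact return:       {exact_return:.2f}%")
print(f"exact 95% CI:       {exact_ci[0]:.2f}% to {exact_ci[1]:.2f}%")
```

For coefficients of this size the exact figure (about 10.7%) is only slightly above the approximation, so the simpler reading is adequate here.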

C10.ii ctuit appears uncorrelated with education. This means it is inappropriate as an IV (even if it is exogenous).

C10.iii The coefficient is higher, suggesting a return of a little under 14% (about 13.7%).

C10.iv Results above. ctuit is statistically significant in the reduced form.

C10.v Results above. The confidence interval is now about 2.1% to 47.9%. While statistically significant, this cannot hope to provide any insight on the returns to education.
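The width of this interval is what the usual large-sample variance of the IV estimator predicts when the instrument is weak. As a rough sketch, written for the simple one-regressor case under homoskedasticity (an assumption; with controls, $R^2_{x,z}$ becomes the partial $R^2$ of the instrument after netting out the other regressors):

$$\widehat{\operatorname{Var}}\left(\hat\beta_{1,IV}\right) = \frac{\hat\sigma^2}{SST_x \cdot R^2_{x,z}}$$

Here $SST_x$ is the total sum of squares of educ and $R^2_{x,z}$ measures how much of educ the instrument explains. Using the reduced-form t statistic on ctuit from the table above ($t = -2.771$ with 1218 residual degrees of freedom), the partial $R^2$ is approximately

$$R^2_{x,z} \approx \frac{t^2}{t^2 + df} = \frac{7.68}{7.68 + 1218} \approx 0.006$$

so the denominator shrinks by a factor of roughly 160 and the standard error grows by roughly $\sqrt{160} \approx 13$ relative to an OLS-type standard error. This is only an order-of-magnitude argument, but it is in line with the jump from an OLS standard error of 0.009 to an interval nearly 46 percentage points wide.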

C10.vi The IV procedure from part v is not convincing, due to the use of an unsatisfactory IV and a conclusion of no practical use.

In [41]:
# Exercise 11
voucher = pd.read_stata("./stata/voucher.dta")
print("No vouchers:", voucher[voucher.selectyrs == 0].shape[0])
print("4 Years:", voucher[voucher.selectyrs == 4].shape[0])
print("4 Attendance:", voucher[voucher.choiceyrs == 4].shape[0])

No vouchers: 468
4 Years: 108
4 Attendance: 56


In [42]:
X = sm.add_constant(voucher[["selectyrs"]])
model = sm.OLS(voucher.choiceyrs, X).fit()
model.summary()

0,1,2,3
Dep. Variable:,choiceyrs,R-squared:,0.79
Model:,OLS,Adj. R-squared:,0.79
Method:,Least Squares,F-statistic:,3713.0
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,0.0
Time:,05:22:20,Log-Likelihood:,-857.61
No. Observations:,990,AIC:,1719.0
Df Residuals:,988,BIC:,1729.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,0.0199,0.025,0.809,0.419,-0.028,0.068
selectyrs,0.7668,0.013,60.931,0.000,0.742,0.792

0,1,2,3
Omnibus:,456.732,Durbin-Watson:,1.452
Prob(Omnibus):,0.0,Jarque-Bera (JB):,3219.518
Skew:,-1.986,Prob(JB):,0.0
Kurtosis:,10.891,Cond. No.,2.98


In [43]:
X = sm.add_constant(voucher[["choiceyrs"]])
model = sm.OLS(voucher.mnce, X).fit()
model.summary()


0,1,2,3
Dep. Variable:,mnce,R-squared:,0.012
Model:,OLS,Adj. R-squared:,0.011
Method:,Least Squares,F-statistic:,12.22
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,0.000494
Time:,05:22:20,Log-Likelihood:,-4406.2
No. Observations:,990,AIC:,8816.0
Df Residuals:,988,BIC:,8826.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,46.2344,0.851,54.348,0.000,44.565,47.904
choiceyrs,-1.8370,0.526,-3.495,0.000,-2.868,-0.806

0,1,2,3
Omnibus:,4.388,Durbin-Watson:,1.899
Prob(Omnibus):,0.111,Jarque-Bera (JB):,4.356
Skew:,0.162,Prob(JB):,0.113
Kurtosis:,3.006,Cond. No.,2.48


In [44]:
X = sm.add_constant(voucher[["mnce", "black", "hispanic", "female"]])
model = sm.OLS(voucher.choiceyrs, X).fit()
model.summary()


0,1,2,3
Dep. Variable:,choiceyrs,R-squared:,0.092
Model:,OLS,Adj. R-squared:,0.088
Method:,Least Squares,F-statistic:,24.87
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,1.22e-19
Time:,05:22:21,Log-Likelihood:,-1582.1
No. Observations:,990,AIC:,3174.0
Df Residuals:,985,BIC:,3199.0
Df Model:,4,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,0.2824,0.147,1.916,0.056,-0.007,0.572
mnce,-0.0020,0.002,-1.065,0.287,-0.006,0.002
black,0.8909,0.108,8.228,0.000,0.678,1.103
hispanic,1.1214,0.137,8.200,0.000,0.853,1.390
female,0.1046,0.077,1.367,0.172,-0.046,0.255

0,1,2,3
Omnibus:,111.906,Durbin-Watson:,0.495
Prob(Omnibus):,0.0,Jarque-Bera (JB):,150.702
Skew:,0.954,Prob(JB):,1.89e-33
Kurtosis:,3.112,Cond. No.,256.0


In [45]:
X = sm.add_constant(voucher[["black", "hispanic", "female"]])
IV2SLS(voucher.mnce, X, voucher.choiceyrs, voucher.selectyrs).fit()


0,1,2,3
Dep. Variable:,mnce,R-squared:,0.0864
Estimator:,IV-2SLS,Adj. R-squared:,0.0827
No. Observations:,990,F-statistic:,80.276
Date:,"Sun, Mar 27 2022",P-value (F-stat),0.0000
Time:,05:22:21,Distribution:,chi2(4)
Cov. Estimator:,robust,,
,,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
const,57.068,1.8772,30.400,0.0000,53.389,60.747
black,-16.317,1.9476,-8.3779,0.0000,-20.134,-12.500
hispanic,-13.775,2.4128,-5.7093,0.0000,-18.504,-9.0464
female,1.3197,1.2779,1.0327,0.3017,-1.1850,3.8244
choiceyrs,-0.2413,0.5918,-0.4078,0.6834,-1.4012,0.9186


In [46]:
X = sm.add_constant(voucher[["choiceyrs", "black", "hispanic", "female", "mnce90"]])
model = sm.OLS(voucher.mnce, X, missing="drop").fit()
model.summary()

0,1,2,3
Dep. Variable:,mnce,R-squared:,0.424
Model:,OLS,Adj. R-squared:,0.415
Method:,Least Squares,F-statistic:,47.34
Date:,"Sun, 27 Mar 2022",Prob (F-statistic):,1.28e-36
Time:,05:22:21,Log-Likelihood:,-1372.4
No. Observations:,328,AIC:,2757.0
Df Residuals:,322,BIC:,2780.0
Df Model:,5,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,22.1529,3.620,6.119,0.000,15.030,29.275
choiceyrs,0.4106,0.736,0.558,0.577,-1.037,1.858
black,-8.3052,2.546,-3.262,0.001,-13.314,-3.296
hispanic,-4.1050,3.362,-1.221,0.223,-10.720,2.510
female,-0.8828,1.776,-0.497,0.619,-4.377,2.611
mnce90,0.6204,0.048,12.817,0.000,0.525,0.716

0,1,2,3
Omnibus:,2.283,Durbin-Watson:,2.279
Prob(Omnibus):,0.319,Jarque-Bera (JB):,1.993
Skew:,-0.162,Prob(JB):,0.369
Kurtosis:,3.201,Cond. No.,257.0


In [47]:
X = sm.add_constant(voucher[["black", "hispanic", "female", "mnce90"]])
IV2SLS(voucher.mnce, X, voucher.choiceyrs, voucher.selectyrs).fit()

Inputs contain missing values. Dropping rows with missing observations.


0,1,2,3
Dep. Variable:,mnce,R-squared:,0.4173
Estimator:,IV-2SLS,Adj. R-squared:,0.4082
No. Observations:,328,F-statistic:,258.25
Date:,"Sun, Mar 27 2022",P-value (F-stat),0.0000
Time:,05:22:21,Distribution:,chi2(5)
Cov. Estimator:,robust,,
,,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
const,21.539,3.5860,6.0064,0.0000,14.511,28.567
black,-9.0671,2.5561,-3.5473,0.0004,-14.077,-4.0573
hispanic,-5.0037,3.4392,-1.4549,0.1457,-11.745,1.7371
female,-1.0205,1.7732,-0.5755,0.5650,-4.4960,2.4550
mnce90,0.6288,0.0469,13.418,0.0000,0.5370,0.7207
choiceyrs,1.7994,0.9379,1.9186,0.0550,-0.0388,3.6376


In [48]:
X = sm.add_constant(voucher[["black", "hispanic", "female"]])
endog = voucher[["choiceyrs1", "choiceyrs2", "choiceyrs3", "choiceyrs4"]]
iv = voucher[["selectyrs1", "selectyrs2", "selectyrs3", "selectyrs4"]]
IV2SLS(voucher.mnce, X, endog, iv).fit()

0,1,2,3
Dep. Variable:,mnce,R-squared:,0.0850
Estimator:,IV-2SLS,Adj. R-squared:,0.0785
No. Observations:,990,F-statistic:,83.853
Date:,"Sun, Mar 27 2022",P-value (F-stat),0.0000
Time:,05:22:21,Distribution:,chi2(7)
Cov. Estimator:,robust,,
,,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
const,56.886,1.8951,30.017,0.0000,53.171,60.600
black,-16.297,2.0161,-8.0834,0.0000,-20.249,-12.346
hispanic,-13.366,2.5683,-5.2041,0.0000,-18.400,-8.3321
female,1.3664,1.2790,1.0683,0.2854,-1.1404,3.8732
choiceyrs1,0.3900,2.4611,0.1585,0.8741,-4.4337,5.2136
choiceyrs2,0.7737,3.9191,0.1974,0.8435,-6.9077,8.4550
choiceyrs3,-4.2848,3.5596,-1.2037,0.2287,-11.261,2.6919
choiceyrs4,2.4071,4.0715,0.5912,0.5544,-5.5728,10.387


In [49]:
X = sm.add_constant(voucher[["black", "hispanic", "female", "mnce90"]])
endog = voucher[["choiceyrs1", "choiceyrs2", "choiceyrs3", "choiceyrs4"]]
iv = voucher[["selectyrs1", "selectyrs2", "selectyrs3", "selectyrs4"]]
IV2SLS(voucher.mnce, X, endog, iv).fit()

0,1,2,3
Dep. Variable:,mnce,R-squared:,0.4067
Estimator:,IV-2SLS,Adj. R-squared:,0.3918
No. Observations:,328,F-statistic:,253.82
Date:,"Sun, Mar 27 2022",P-value (F-stat),0.0000
Time:,05:22:21,Distribution:,chi2(8)
Cov. Estimator:,robust,,
,,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
const,20.911,3.7536,5.5710,0.0000,13.554,28.268
black,-8.3469,2.6815,-3.1128,0.0019,-13.602,-3.0913
hispanic,-3.5895,3.7491,-0.9574,0.3383,-10.938,3.7585
female,-1.0311,1.8038,-0.5716,0.5676,-4.5664,2.5042
mnce90,0.6357,0.0492,12.926,0.0000,0.5393,0.7321
choiceyrs1,-2.1590,4.1552,-0.5196,0.6033,-10.303,5.9851
choiceyrs2,1.4931,3.0414,0.4909,0.6235,-4.4679,7.4541
choiceyrs3,1.0816,6.1742,0.1752,0.8609,-11.020,13.183
choiceyrs4,13.926,7.3728,1.8888,0.0589,-0.5246,28.376


C11.i 468 students were never awarded a voucher, 108 had a voucher available for 4 years, 56 attended a choice school for 4 years.

C11.ii Results above. selectyrs is positively and significantly correlated with choiceyrs. This makes sense, since being selected for a voucher should mean a student can attend a choice school. If selectyrs is exogenous, then it is a suitable IV. Given that selection was done by lottery, this should be the case (acknowledging the caveats in the problem's introduction).

C11.iii Results above. There is a negative and significant correlation between mnce and choiceyrs. This is curious, since we would not expect math scores to fall with attendance at a school that families choose. After controlling for the demographic dummies (the regression above has choiceyrs as the dependent variable and mnce as a regressor), the partial relationship between the two remains negative but is practically small and not statistically significant.

C11.iv Endogeneity means that choiceyrs is correlated with some factor in the error term (that is, with unobserved factors that affect math scores). The concept of a 'choice school' is somewhat ambiguous (i.e. is a choice school only attainable through a voucher, or is it simply a desirable school?). There is one observation with a positive value for choiceyrs and 0 selectyrs, and 2 with choiceyrs greater than selectyrs, so selection may not be a necessary condition for attending a choice school. It is also possible that math scores affect the ability to get into a choice school (parents lobbying for selection), which would make choiceyrs endogenous.

C11.v The IV produces a negative coefficient for choiceyrs, and so does not produce a positive effect of attending a choice school (though it is not statistically significant and so it is better to say we find no evidence of an effect). black and hispanic have large and significant effects, while female does not.

C11.vi Results above. $\beta_1$ is considerably higher with IV than with OLS (1.80 versus 0.41). The OLS estimate is not statistically significant (p = 0.577), while the IV estimate is only marginally significant (p = 0.055, so significant at the 10% level). The IV estimate implies that each year in a choice school raises the math score by about 1.8 points (percentage points, assuming the score is measured in percent).

C11.vii Including mnce90 drops the number of observations from 990 to 328, about 1/3 of the original sample.

C11.viii Results above. The results show no evidence of an effect of school choice on math scores; the largest estimate is a curious negative coefficient for the 3-year category, which is still not statistically significant. Adding mnce90 back in produces a positive and practically large effect of 4-year (i.e. complete) participation, but it is significant only at the 10% level and is not what the question asked for (though still an interesting result). The results from the first IV make sense if we believe school choice doesn't affect outcomes, but the coefficient for the 3rd dummy is suspicious. The results from the regression that includes mnce90 are more consistent with what I would expect.