# Chapter 14

In [1]:
import pandas as pd
import statsmodels.api as sm
import numpy as np
import matplotlib.pyplot as plt
from linearmodels import FirstDifferenceOLS, PanelOLS, RandomEffects
from scipy import stats

In [2]:
# Exercise 1
# Copying from ch13.5 and updating to use linearmodels
rental = pd.read_stata("./stata/RENTAL.DTA")
rental_panel = rental.set_index(["city", "year"])
X = sm.add_constant(rental_panel[["y90", "lpop", "lavginc", "pctstu"]])
model = sm.OLS(rental_panel.lrent, X, missing="drop").fit()
model.summary()

  x = pd.concat(x[::order], 1)


0,1,2,3
Dep. Variable:,lrent,R-squared:,0.861
Model:,OLS,Adj. R-squared:,0.857
Method:,Least Squares,F-statistic:,190.9
Date:,"Tue, 17 Aug 2021",Prob (F-statistic):,9.41e-52
Time:,16:06:14,Log-Likelihood:,86.161
No. Observations:,128,AIC:,-162.3
Df Residuals:,123,BIC:,-148.1
Df Model:,4,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,-0.5688,0.535,-1.063,0.290,-1.628,0.490
y90,0.2622,0.035,7.543,0.000,0.193,0.331
lpop,0.0407,0.023,1.807,0.073,-0.004,0.085
lavginc,0.5714,0.053,10.762,0.000,0.466,0.677
pctstu,0.0050,0.001,4.949,0.000,0.003,0.007

0,1,2,3
Omnibus:,34.539,Durbin-Watson:,1.236
Prob(Omnibus):,0.0,Jarque-Bera (JB):,58.256
Skew:,1.255,Prob(JB):,2.24e-13
Kurtosis:,5.15,Cond. No.,1620.0


In [3]:
X = rental_panel[["y90", "lpop", "lavginc", "pctstu"]]
model = FirstDifferenceOLS(rental_panel.lrent, X).fit()
model

  df.index = df.index.set_levels(final_levels, [0, 1])


0,1,2,3
Dep. Variable:,lrent,R-squared:,0.9765
Estimator:,FirstDifferenceOLS,R-squared (Between):,0.9391
No. Observations:,64,R-squared (Within):,0.9765
Date:,"Tue, Aug 17 2021",R-squared (Overall):,0.9392
Time:,16:06:14,Log-likelihood,65.272
Cov. Estimator:,Unadjusted,,
,,F-statistic:,624.15
Entities:,64,P-value,0.0000
Avg Obs:,2.0000,Distribution:,"F(4,60)"
Min Obs:,2.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
y90,0.3855,0.0368,10.469,0.0000,0.3119,0.4592
lpop,0.0722,0.0883,0.8178,0.4167,-0.1045,0.2490
lavginc,0.3100,0.0665,4.6627,0.0000,0.1770,0.4429
pctstu,0.0112,0.0041,2.7114,0.0087,0.0029,0.0195


In [4]:
model = PanelOLS(rental_panel.lrent, X, entity_effects=True).fit()
model

0,1,2,3
Dep. Variable:,lrent,R-squared:,0.9765
Estimator:,PanelOLS,R-squared (Between):,0.9391
No. Observations:,128,R-squared (Within):,0.9765
Date:,"Tue, Aug 17 2021",R-squared (Overall):,0.9392
Time:,16:06:14,Log-likelihood,219.27
Cov. Estimator:,Unadjusted,,
,,F-statistic:,624.15
Entities:,64,P-value,0.0000
Avg Obs:,2.0000,Distribution:,"F(4,60)"
Min Obs:,2.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
y90,0.3855,0.0368,10.469,0.0000,0.3119,0.4592
lpop,0.0722,0.0883,0.8178,0.4167,-0.1045,0.2490
lavginc,0.3100,0.0665,4.6627,0.0000,0.1770,0.4429
pctstu,0.0112,0.0041,2.7114,0.0087,0.0029,0.0195


C1.i Results reported above. The $y90$ dummy is 0.262 and significant at the 1% level. This would mean that rents have increased over 10 years.

C1.ii Given that we started with an unobserved effects model it would suggest that we should not trust the standard errors. This is because the pooled OLS does not account for the unobserved effects which would mean there are endogeneity issues.

C1.iii the coefficient for $pctstu$ increases from 0.005 to 0.0112 and is still significant (though with a smaller t-statistic than the previous one). We may conclude that housing prices increase with the size of the student population, provided that there are not time varying factors that might be affecting $pctstu$ that we have not observed.

C1.iv Results above. The coefficients could be explicitly checked in the program but there are only 4 estimates to check and it is clear they are all identical.

In [5]:
# Exercise 2
crime4 = pd.read_stata("./stata/CRIME4.DTA").set_index(["county", "year"])
X = crime4[["d82", "d83", "d84", "d85", "d86", "d87", "lprbarr", "lprbconv", "lprbpris", "lavgsen", "lpolpc"]]
model = PanelOLS(crime4.lcrmrte, X, entity_effects=True).fit()
restricted_ssr = model.model_ss
model

0,1,2,3
Dep. Variable:,lcrmrte,R-squared:,0.4342
Estimator:,PanelOLS,R-squared (Between):,0.7929
No. Observations:,630,R-squared (Within):,0.4342
Date:,"Tue, Aug 17 2021",R-squared (Overall):,0.7921
Time:,16:06:14,Log-likelihood,405.58
Cov. Estimator:,Unadjusted,,
,,F-statistic:,36.911
Entities:,90,P-value,0.0000
Avg Obs:,7.0000,Distribution:,"F(11,529)"
Min Obs:,7.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
d82,0.0126,0.0215,0.5840,0.5595,-0.0297,0.0549
d83,-0.0793,0.0213,-3.7152,0.0002,-0.1212,-0.0374
d84,-0.1177,0.0216,-5.4467,0.0000,-0.1602,-0.0753
d85,-0.1120,0.0218,-5.1248,0.0000,-0.1549,-0.0690
d86,-0.0818,0.0214,-3.8189,0.0001,-0.1239,-0.0397
d87,-0.0405,0.0210,-1.9236,0.0549,-0.0818,0.0009
lprbarr,-0.3598,0.0324,-11.098,0.0000,-0.4235,-0.2961
lprbconv,-0.2859,0.0212,-13.474,0.0000,-0.3276,-0.2442
lprbpris,-0.1828,0.0325,-5.6308,0.0000,-0.2465,-0.1190


In [6]:
formula = "lcrmrte~d82 + d83 + d84 + d85 + d86 + d87 + lprbarr + lprbconv + lprbpris + lavgsen + lpolpc + lprbarr + lprbconv + lwcon + lwtuc + lwtrd + lwfir + lwser + lwmfg + lwfed + lwsta + lwloc + EntityEffects"
unrestricted_model = PanelOLS.from_formula(formula, data=crime4).fit()
unrestricted_ssr = unrestricted_model.model_ss
unrestricted_model

0,1,2,3
Dep. Variable:,lcrmrte,R-squared:,0.4575
Estimator:,PanelOLS,R-squared (Between):,0.9232
No. Observations:,630,R-squared (Within):,0.4575
Date:,"Tue, Aug 17 2021",R-squared (Overall):,0.9222
Time:,16:06:14,Log-likelihood,418.79
Cov. Estimator:,Unadjusted,,
,,F-statistic:,21.923
Entities:,90,P-value,0.0000
Avg Obs:,7.0000,Distribution:,"F(20,520)"
Min Obs:,7.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
d82,0.0189,0.0251,0.7519,0.4524,-0.0305,0.0682
d83,-0.0553,0.0330,-1.6739,0.0948,-0.1202,0.0096
d84,-0.0615,0.0411,-1.4975,0.1349,-0.1422,0.0192
d85,-0.0397,0.0562,-0.7071,0.4798,-0.1500,0.0706
d86,-0.0001,0.0680,-0.0017,0.9987,-0.1337,0.1335
d87,0.0537,0.0799,0.6722,0.5018,-0.1033,0.2107
lprbarr,-0.3564,0.0322,-11.081,0.0000,-0.4195,-0.2932
lprbconv,-0.2860,0.0211,-13.584,0.0000,-0.3273,-0.2446
lprbpris,-0.1751,0.0323,-5.4154,0.0000,-0.2387,-0.1116


In [7]:
# Frustratingly linearmodels does not seem to have an F-test so we make our own
f_stat = ((model.resid_ss - unrestricted_model.resid_ss) / unrestricted_model.resid_ss) * (unrestricted_model.df_resid / 9)
p_val = 1 - stats.f.cdf(f_stat, 9, unrestricted_model.df_resid)
p_val

0.009047103465324335

C2.i There doesn't seem to be much difference between the differenced and fixed effects models (excepting the dummies but we should have expected this before running the program). The fixed effects estimates are generally larger in magnitude (excepting log(avgsen) which was not significant), but it is likely safer to say the two approaches reach similar conclusions.

C2.ii The coefficients for the criminal justice variables don't seem to change very much (some are almost identical) with the inclusion of the income variables.

C2.iii The signs don't match what we expected given that some of the wages have positive coefficients (improved economic circumstances inexplicably leading to increases in crime). A test for join significance gave a p-value of less than 0.01 meaning that the economic variables are jointly significant.

In [8]:
# Exercise 3
jtrain = pd.read_stata("./stata/JTRAIN.DTA")
jtrain.set_index(["fcode", "year"], inplace=True)
model = PanelOLS(jtrain.hrsemp, jtrain[["d88", "d89", "grant", "grant_1", "lemploy"]], entity_effects=True).fit()
model

Inputs contain missing values. Dropping rows with missing observations.


0,1,2,3
Dep. Variable:,hrsemp,R-squared:,0.4909
Estimator:,PanelOLS,R-squared (Between):,0.2473
No. Observations:,390,R-squared (Within):,0.4909
Date:,"Tue, Aug 17 2021",R-squared (Overall):,0.3195
Time:,16:06:14,Log-likelihood,-1503.7
Cov. Estimator:,Unadjusted,,
,,F-statistic:,48.206
Entities:,135,P-value,0.0000
Avg Obs:,2.8889,Distribution:,"F(5,250)"
Min Obs:,1.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
d88,-1.0987,1.9832,-0.5540,0.5801,-5.0045,2.8071
d89,4.0900,2.4811,1.6485,0.1005,-0.7965,8.9766
grant,34.228,2.8584,11.974,0.0000,28.598,39.858
grant_1,0.5041,4.1273,0.1221,0.9029,-7.6247,8.6328
lemploy,-0.1763,4.2879,-0.0411,0.9672,-8.6213,8.2688


In [9]:
model.entity_info[-1] * 3

405.0

C3.i Results above. 135 firms are used in the estimation and so if all years were available it would mean 405 observations (compared to the 390 actually used).

C3.ii Grant is large, positive and significant. It means that a grant resulted in about 34 extra hours of training for each worker over the year.

C3.iii It's not surprising that the lag for grant isn't significant since training decisions are likely going to be made based on this year's grant.

C3.iv We could check the coefficient for employees to see if large firms provide more training. Here the coefficient is small and negative (though insignificant) so we do not have any evidence that larger firms provide more training.

In [10]:
# Exercise 4
ezunem = pd.read_excel("./excel/ezunem.xls", header=None, 
                       names=["year", "uclms", "ez", "d81", "d82", "d83", "d84", 
                              "d85", "d86", "d87", "d88", "c1", "c2", "c3",
                              "c4", "c5", "c6", "c7", "c8", "c9", "c10", "c11",
                              "c12", "c13", "c14", "c15", "c16", "c17", "c18",
                              "c19", "c20", "c21", "c22", "luclms", "guclms",
                              "cez", "city"], index_col=[-1,0])
ezunem.loc[ezunem.guclms == ".", "guclms"] = np.nan
ezunem.loc[ezunem.cez == ".", "cez"] = np.nan
ezunem["guclms"] = ezunem.guclms.astype("float64")
ezunem["cez"] = ezunem.cez.astype("float64")

In [11]:
X = sm.add_constant(ezunem[["cez"]])
PanelOLS(ezunem.guclms, X, entity_effects=True).fit()

  x = pd.concat(x[::order], 1)


0,1,2,3
Dep. Variable:,guclms,R-squared:,0.0273
Estimator:,PanelOLS,R-squared (Between):,-0.0347
No. Observations:,176,R-squared (Within):,0.0273
Date:,"Tue, Aug 17 2021",R-squared (Overall):,0.0259
Time:,16:06:15,Log-likelihood,-56.885
Cov. Estimator:,Unadjusted,,
,,F-statistic:,4.2939
Entities:,22,P-value,0.0399
Avg Obs:,8.0000,Distribution:,"F(1,153)"
Min Obs:,8.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
const,-0.1451,0.0279,-5.2032,0.0000,-0.2002,-0.0900
cez,-0.2512,0.1212,-2.0722,0.0399,-0.4906,-0.0117


In [12]:
PanelOLS(ezunem.guclms, X, entity_effects=True, time_effects=True).fit()

0,1,2,3
Dep. Variable:,guclms,R-squared:,0.0338
Estimator:,PanelOLS,R-squared (Between):,-0.0096
No. Observations:,176,R-squared (Within):,0.0258
Date:,"Tue, Aug 17 2021",R-squared (Overall):,0.0250
Time:,16:06:15,Log-likelihood,29.923
Cov. Estimator:,Unadjusted,,
,,F-statistic:,5.1002
Entities:,22,P-value,0.0254
Avg Obs:,8.0000,Distribution:,"F(1,146)"
Min Obs:,8.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
const,-0.1485,0.0176,-8.4502,0.0000,-0.1832,-0.1138
cez,-0.1919,0.0850,-2.2584,0.0254,-0.3599,-0.0240


C4.i The real trick concerns $c_i$. Because $t$ increases one unit, we have $c_i * 1$, which is a constant.

C4.ii The effect of the enterprise zone appears to be to reduce unemployment. 

C4.iii The magnitude of the effect is slightly smaller but more precise (smaller standard error). Overall the effect seems to be the same.

In [13]:
# Exercise 5
wagepan = pd.read_stata("./stata/wagepan.dta").set_index(["nr", "year"])
wagepan

Unnamed: 0_level_0,Unnamed: 1_level_0,agric,black,bus,construc,ent,exper,fin,hisp,poorhlth,hours,...,union,lwage,d81,d82,d83,d84,d85,d86,d87,expersq
nr,year,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
13,1980,0,0,1,0,0,1,0,0,0,2672,...,0,1.197540,0,0,0,0,0,0,0,1
13,1981,0,0,0,0,0,2,0,0,0,2320,...,1,1.853060,1,0,0,0,0,0,0,4
13,1982,0,0,1,0,0,3,0,0,0,2940,...,0,1.344462,0,1,0,0,0,0,0,9
13,1983,0,0,1,0,0,4,0,0,0,2960,...,0,1.433213,0,0,1,0,0,0,0,16
13,1984,0,0,0,0,0,5,0,0,0,3071,...,0,1.568125,0,0,0,1,0,0,0,25
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12548,1983,0,0,0,1,0,8,0,0,0,2080,...,0,1.591879,0,0,1,0,0,0,0,64
12548,1984,0,0,0,1,0,9,0,0,0,2080,...,1,1.212543,0,0,0,1,0,0,0,81
12548,1985,0,0,0,1,0,10,0,0,0,2080,...,0,1.765962,0,0,0,0,1,0,0,100
12548,1986,0,0,0,0,0,11,0,0,0,2080,...,1,1.745894,0,0,0,0,0,1,0,121


In [14]:
wagepan.columns

Index(['agric', 'black', 'bus', 'construc', 'ent', 'exper', 'fin', 'hisp',
       'poorhlth', 'hours', 'manuf', 'married', 'min', 'nrthcen', 'nrtheast',
       'occ1', 'occ2', 'occ3', 'occ4', 'occ5', 'occ6', 'occ7', 'occ8', 'occ9',
       'per', 'pro', 'pub', 'rur', 'south', 'educ', 'tra', 'trad', 'union',
       'lwage', 'd81', 'd82', 'd83', 'd84', 'd85', 'd86', 'd87', 'expersq'],
      dtype='object')

In [15]:
fe_est = PanelOLS.from_formula("lwage~exper + expersq + married + union + occ2 + occ3 + occ4 + occ5 + occ6 + occ7 + occ8 + occ9 + EntityEffects", data=wagepan).fit()
fe_est

0,1,2,3
Dep. Variable:,lwage,R-squared:,0.1802
Estimator:,PanelOLS,R-squared (Between):,0.5190
No. Observations:,4360,R-squared (Within):,0.1802
Date:,"Tue, Aug 17 2021",R-squared (Overall):,0.5042
Time:,16:06:15,Log-likelihood,-1325.9
Cov. Estimator:,Unadjusted,,
,,F-statistic:,69.657
Entities:,545,P-value,0.0000
Avg Obs:,8.0000,Distribution:,"F(12,3803)"
Min Obs:,8.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
exper,0.1155,0.0085,13.555,0.0000,0.0988,0.1322
expersq,-0.0043,0.0006,-6.9785,0.0000,-0.0054,-0.0031
married,0.0446,0.0183,2.4335,0.0150,0.0087,0.0806
union,0.0826,0.0194,4.2641,0.0000,0.0446,0.1206
occ2,-0.0120,0.0323,-0.3726,0.7094,-0.0754,0.0513
occ3,-0.0616,0.0378,-1.6317,0.1028,-0.1356,0.0124
occ4,-0.0798,0.0307,-2.5964,0.0095,-0.1400,-0.0195
occ5,-0.0294,0.0304,-0.9675,0.3334,-0.0889,0.0302
occ6,-0.0283,0.0306,-0.9237,0.3557,-0.0884,0.0318


C5.i Different occupations are likely to respond to unionization differently (different rates of unionization, different wages).

C5.ii If everyone stayed in the same occupation over the periods being used we would not need the dummies since these would be absorbed by the fixed effects.

C5.iii The coefficient for union remains significant and doesn't change very much (about 0.0026).

In [16]:
# Exercise 6
wagepan["t"] = wagepan.index.get_level_values(1) - 1979
wagepan["union_t"] = wagepan.union * wagepan.t

In [17]:
PanelOLS.from_formula("lwage~exper + expersq + married + union + union_t + EntityEffects", data=wagepan).fit()

0,1,2,3
Dep. Variable:,lwage,R-squared:,0.1797
Estimator:,PanelOLS,R-squared (Between):,0.5653
No. Observations:,4360,R-squared (Within):,0.1797
Date:,"Tue, Aug 17 2021",R-squared (Overall):,0.5484
Time:,16:06:15,Log-likelihood,-1327.3
Cov. Estimator:,Unadjusted,,
,,F-statistic:,166.89
Entities:,545,P-value,0.0000
Avg Obs:,8.0000,Distribution:,"F(5,3810)"
Min Obs:,8.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
exper,0.1198,0.0085,14.125,0.0000,0.1032,0.1364
expersq,-0.0042,0.0006,-6.9843,0.0000,-0.0054,-0.0030
married,0.0449,0.0183,2.4546,0.0141,0.0090,0.0808
union,0.1501,0.0314,4.7779,0.0000,0.0885,0.2117
union_t,-0.0157,0.0057,-2.7416,0.0061,-0.0269,-0.0045


In [18]:
RandomEffects.from_formula("lwage ~ educ + black + hisp + exper + expersq + married + union + union_t", data=wagepan).fit()

0,1,2,3
Dep. Variable:,lwage,R-squared:,0.7503
Estimator:,RandomEffects,R-squared (Between):,0.9565
No. Observations:,4360,R-squared (Within):,0.1789
Date:,"Tue, Aug 17 2021",R-squared (Overall):,0.9225
Time:,16:06:15,Log-likelihood,-1627.8
Cov. Estimator:,Unadjusted,,
,,F-statistic:,1634.2
Entities:,545,P-value,0.0000
Avg Obs:,8.0000,Distribution:,"F(8,4352)"
Min Obs:,8.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
educ,0.0916,0.0026,35.754,0.0000,0.0866,0.0967
black,-0.1512,0.0475,-3.1851,0.0015,-0.2443,-0.0581
hisp,0.0054,0.0413,0.1319,0.8950,-0.0755,0.0864
exper,0.1131,0.0082,13.778,0.0000,0.0970,0.1292
expersq,-0.0040,0.0006,-6.7331,0.0000,-0.0051,-0.0028
married,0.0639,0.0167,3.8159,0.0001,0.0311,0.0967
union,0.1631,0.0300,5.4274,0.0000,0.1042,0.2220
union_t,-0.0130,0.0056,-2.3349,0.0196,-0.0239,-0.0021


C6. For both the fixed effect and random effect estimators the union effect increases and the interaction with time decreases. The interaction is negative and statistically significant. Both stories are broadly the same

In [19]:
# Exercise 7
murder = pd.read_stata("./stata/MURDER.DTA").set_index(["state", "year"])
murder_90_93 = murder[murder.index.get_level_values(1).isin([90, 93])]

In [20]:
X = sm.add_constant(murder_90_93[["d93", "exec", "unem"]])
pooled_ols = sm.OLS(murder_90_93.mrdrte, X, missing="drop").fit()
pooled_ols.summary()

  x = pd.concat(x[::order], 1)


0,1,2,3
Dep. Variable:,mrdrte,R-squared:,0.102
Model:,OLS,Adj. R-squared:,0.074
Method:,Least Squares,F-statistic:,3.695
Date:,"Tue, 17 Aug 2021",Prob (F-statistic):,0.0144
Time:,16:06:15,Log-Likelihood:,-379.81
No. Observations:,102,AIC:,767.6
Df Residuals:,98,BIC:,778.1
Df Model:,3,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,-5.2780,4.428,-1.192,0.236,-14.065,3.509
d93,-2.0674,2.145,-0.964,0.337,-6.323,2.189
exec,0.1277,0.263,0.485,0.629,-0.395,0.650
unem,2.5289,0.782,3.235,0.002,0.978,4.080

0,1,2,3
Omnibus:,148.824,Durbin-Watson:,1.062
Prob(Omnibus):,0.0,Jarque-Bera (JB):,5022.506
Skew:,5.38,Prob(JB):,0.0
Kurtosis:,35.649,Cond. No.,28.0


In [21]:
PanelOLS.from_formula("mrdrte~exec+unem+EntityEffects+TimeEffects", data=murder_90_93).fit()

0,1,2,3
Dep. Variable:,mrdrte,R-squared:,0.1097
Estimator:,PanelOLS,R-squared (Between):,-0.0571
No. Observations:,102,R-squared (Within):,0.0352
Date:,"Tue, Aug 17 2021",R-squared (Overall):,-0.0569
Time:,16:06:15,Log-likelihood,-78.684
Cov. Estimator:,Unadjusted,,
,,F-statistic:,2.9587
Entities:,51,P-value,0.0614
Avg Obs:,2.0000,Distribution:,"F(2,48)"
Min Obs:,2.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
exec,-0.1038,0.0434,-2.3918,0.0207,-0.1911,-0.0165
unem,-0.0666,0.1587,-0.4196,0.6766,-0.3857,0.2525


In [22]:
PanelOLS.from_formula("mrdrte~exec+unem+EntityEffects+TimeEffects", data=murder_90_93).fit(cov_type="robust")

0,1,2,3
Dep. Variable:,mrdrte,R-squared:,0.1097
Estimator:,PanelOLS,R-squared (Between):,-0.0571
No. Observations:,102,R-squared (Within):,0.0352
Date:,"Tue, Aug 17 2021",R-squared (Overall):,-0.0569
Time:,16:06:16,Log-likelihood,-78.684
Cov. Estimator:,Robust,,
,,F-statistic:,2.9587
Entities:,51,P-value,0.0614
Avg Obs:,2.0000,Distribution:,"F(2,48)"
Min Obs:,2.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
exec,-0.1038,0.0170,-6.1084,0.0000,-0.1380,-0.0697
unem,-0.0666,0.1469,-0.4532,0.6524,-0.3620,0.2288


In [23]:
murder[murder.index.get_level_values(1) == 93].sort_values("exec", ascending=False)

Unnamed: 0_level_0,Unnamed: 1_level_0,id,mrdrte,exec,unem,d90,d93,cmrdrte,cexec,cunem,cexec_1,cunem_1
state,year,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
TX,93,44,11.9,34,7.0,0,1,-2.200001,23.0,0.8,-11.0,-2.2
VA,93,47,8.3,11,5.0,0,1,-0.5,8.0,0.7,-1.0,0.1
FL,93,10,8.9,7,7.0,0,1,-1.8,-1.0,1.1,1.0,0.6
MO,93,26,11.3,6,6.4,0,1,2.5,1.0,0.7,5.0,-0.6
AZ,93,3,8.6,3,6.2,0,1,0.900001,3.0,0.9,0.0,-0.9
GA,93,11,11.4,3,5.8,0,1,-0.400001,1.0,0.4,-7.0,-0.1
OK,93,37,8.4,2,6.0,0,1,0.4,1.0,0.4,1.0,-1.8
NC,93,34,11.3,2,4.9,0,1,0.6,2.0,0.8,-1.0,-0.4
LA,93,19,20.299999,2,7.4,0,1,3.099998,-2.0,1.2,-5.0,-5.8
AL,93,1,11.6,2,7.5,0,1,0.0,-3.0,0.7,3.0,-1.0


In [24]:
murder_no_texas = murder_90_93[murder_90_93.index.get_level_values(0) != "TX"]
PanelOLS.from_formula("mrdrte~exec+unem+EntityEffects+TimeEffects", data=murder_no_texas).fit()

0,1,2,3
Dep. Variable:,mrdrte,R-squared:,0.0134
Estimator:,PanelOLS,R-squared (Between):,-0.0480
No. Observations:,100,R-squared (Within):,-0.0340
Date:,"Tue, Aug 17 2021",R-squared (Overall):,-0.0480
Time:,16:06:16,Log-likelihood,-77.977
Cov. Estimator:,Unadjusted,,
,,F-statistic:,0.3186
Entities:,51,P-value,0.7287
Avg Obs:,1.9608,Distribution:,"F(2,47)"
Min Obs:,0.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
exec,-0.0675,0.1049,-0.6431,0.5233,-0.2785,0.1436
unem,-0.0700,0.1604,-0.4367,0.6643,-0.3927,0.2526


In [25]:
PanelOLS.from_formula("mrdrte~exec+unem+EntityEffects+TimeEffects", data=murder_no_texas).fit(cov_type="robust")

0,1,2,3
Dep. Variable:,mrdrte,R-squared:,0.0134
Estimator:,PanelOLS,R-squared (Between):,-0.0480
No. Observations:,100,R-squared (Within):,-0.0340
Date:,"Tue, Aug 17 2021",R-squared (Overall):,-0.0480
Time:,16:06:16,Log-likelihood,-77.977
Cov. Estimator:,Robust,,
,,F-statistic:,0.3186
Entities:,51,P-value,0.7287
Avg Obs:,1.9608,Distribution:,"F(2,47)"
Min Obs:,0.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
exec,-0.0675,0.0791,-0.8530,0.3980,-0.2266,0.0917
unem,-0.0700,0.1462,-0.4790,0.6342,-0.3642,0.2241


In [26]:
PanelOLS.from_formula("mrdrte~exec+unem+EntityEffects+TimeEffects", data=murder).fit()

0,1,2,3
Dep. Variable:,mrdrte,R-squared:,0.0109
Estimator:,PanelOLS,R-squared (Between):,0.1249
No. Observations:,153,R-squared (Within):,0.0026
Date:,"Tue, Aug 17 2021",R-squared (Overall):,0.1178
Time:,16:06:16,Log-likelihood,-375.63
Cov. Estimator:,Unadjusted,,
,,F-statistic:,0.5391
Entities:,51,P-value,0.5850
Avg Obs:,3.0000,Distribution:,"F(2,98)"
Min Obs:,3.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
exec,-0.1383,0.1770,-0.7815,0.4364,-0.4896,0.2129
unem,0.2213,0.2964,0.7467,0.4570,-0.3668,0.8095


C7.i $\beta_1$ should have a negative coefficient if executions are a deterrent. $\beta_2$ presumably should be positive since the expectation would be for a negative relationship between income and crime (unemployment lowering income and so increasing muder). The crime relationship may not hold for violent crime.

C7.ii $\beta_1 > 0$ and so if anything there is the opposite of a deterrent effect but result is not significant and so it is better to say that there is no evidence of a deterrent effect.

C7.iii The coefficient is now negative and significant, suggesting a deterrent effect. It's a bit tough to say how practical the effect is (1/10th of a saved life isn't meaningful). 10 executions deterring 1 murder isn't the easiest effect to wrap my head around, so at the best I can say the magnitude is enough that we can't ignore it.

C7.iv Results above, the absolute value of the t-statistic grows substantially.

C7.v TX has the highest value for execution, over three times larger than the next largest (11 in VA).

C7.vi Panel/First Difference should be the same because we only have two periods. The deterrent effect is considerably smaller and no longer significant for both sets of errors. This likely means that Texas is driving the result.

C7.vii Considering all periods in the fixed effects estimation, the coefficient has the largest absolute value (though overall it is a small increase) but the t-statistic is small (-0.78). In this case it appears that adding additional data has muddied the waters.

In [27]:
# Exercise 8
mathpnl = pd.read_stata("./stata/mathpnl.dta").set_index(["distid", "year"])
X = sm.add_constant(mathpnl[["y94", "y95", "y96", "y97", "y98", "lrexpp", "lrexpp_1", "lenrol", "lunch"]])
pooled_ols = sm.OLS(mathpnl.math4, X, missing="drop").fit()
pooled_ols.summary()

  x = pd.concat(x[::order], 1)


0,1,2,3
Dep. Variable:,math4,R-squared:,0.505
Model:,OLS,Adj. R-squared:,0.504
Method:,Least Squares,F-statistic:,373.3
Date:,"Tue, 17 Aug 2021",Prob (F-statistic):,0.0
Time:,16:06:16,Log-Likelihood:,-12889.0
No. Observations:,3300,AIC:,25800.0
Df Residuals:,3290,BIC:,25860.0
Df Model:,9,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,-31.6616,10.301,-3.074,0.002,-51.859,-11.464
y94,6.3774,0.736,8.662,0.000,4.934,7.821
y95,18.6502,0.786,23.720,0.000,17.109,20.192
y96,18.0334,0.767,23.504,0.000,16.529,19.538
y97,15.3401,0.777,19.747,0.000,13.817,16.863
y98,30.3979,0.783,38.813,0.000,28.862,31.933
lrexpp,0.5339,2.428,0.220,0.826,-4.227,5.295
lrexpp_1,9.0492,2.305,3.925,0.000,4.529,13.569
lenrol,0.5927,0.205,2.890,0.004,0.191,0.995

0,1,2,3
Omnibus:,68.942,Durbin-Watson:,1.121
Prob(Omnibus):,0.0,Jarque-Bera (JB):,147.898
Skew:,-0.019,Prob(JB):,7.66e-33
Kurtosis:,4.036,Cond. No.,1690.0


In [28]:
sm.OLS(pooled_ols.resid, pooled_ols.resid.groupby("distid").shift(1), missing="drop").fit().summary()

0,1,2,3
Dep. Variable:,y,R-squared (uncentered):,0.245
Model:,OLS,Adj. R-squared (uncentered):,0.245
Method:,Least Squares,F-statistic:,893.1
Date:,"Tue, 17 Aug 2021",Prob (F-statistic):,3.53e-170
Time:,16:06:16,Log-Likelihood:,-10407.0
No. Observations:,2750,AIC:,20820.0
Df Residuals:,2749,BIC:,20820.0
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
x1,0.5043,0.017,29.886,0.000,0.471,0.537

0,1,2,3
Omnibus:,164.961,Durbin-Watson:,2.184
Prob(Omnibus):,0.0,Jarque-Bera (JB):,706.198
Skew:,-0.027,Prob(JB):,4.48e-154
Kurtosis:,5.482,Cond. No.,1.0


In [29]:
PanelOLS.from_formula("math4~lrexpp + lrexpp_1 + lenrol + lunch + EntityEffects + TimeEffects", data=mathpnl).fit()

Inputs contain missing values. Dropping rows with missing observations.


0,1,2,3
Dep. Variable:,math4,R-squared:,0.0042
Estimator:,PanelOLS,R-squared (Between):,0.9622
No. Observations:,3300,R-squared (Within):,0.0694
Date:,"Tue, Aug 17 2021",R-squared (Overall):,0.9215
Time:,16:06:16,Log-likelihood,-1.163e+04
Cov. Estimator:,Unadjusted,,
,,F-statistic:,2.9056
Entities:,550,P-value,0.0206
Avg Obs:,6.0000,Distribution:,"F(4,2741)"
Min Obs:,6.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
lrexpp,-0.4112,2.4577,-0.1673,0.8671,-5.2302,4.4079
lrexpp_1,7.0030,2.3692,2.9559,0.0031,2.3574,11.649
lenrol,0.2451,1.1004,0.2227,0.8238,-1.9126,2.4027
lunch,0.0615,0.0515,1.1955,0.2320,-0.0394,0.1624


In [30]:
mathpnl["z"] = mathpnl.lrexpp_1 - mathpnl.lrexpp
PanelOLS.from_formula("math4~lrexpp + z + lenrol + lunch + EntityEffects + TimeEffects", data=mathpnl).fit()

0,1,2,3
Dep. Variable:,math4,R-squared:,0.0042
Estimator:,PanelOLS,R-squared (Between):,0.9622
No. Observations:,3300,R-squared (Within):,0.0694
Date:,"Tue, Aug 17 2021",R-squared (Overall):,0.9215
Time:,16:06:16,Log-likelihood,-1.163e+04
Cov. Estimator:,Unadjusted,,
,,F-statistic:,2.9056
Entities:,550,P-value,0.0206
Avg Obs:,6.0000,Distribution:,"F(4,2741)"
Min Obs:,6.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
lrexpp,6.5918,2.6379,2.4989,0.0125,1.4193,11.764
z,7.0030,2.3692,2.9559,0.0031,2.3574,11.649
lenrol,0.2451,1.1004,0.2227,0.8238,-1.9126,2.4027
lunch,0.0615,0.0515,1.1955,0.2320,-0.0394,0.1624


C8.i Results above. The lagged expenditure variable is large and significant, but the contemporaneous one is not.

C8.ii Lunch is something of a measure of poverty and so a negative coefficient makes sense if you think that educational outcomes are associated with income.

C8.iii Regressing on the lagged residuals there is fairly strong evidence for serial correlation.

C8.iv Using fixed effects estimation the lagged expenditure variable is still positive and significant (though somewhat smaller).

C8.v They may be absorbed by the fixed effects. Lunch may not be perfectly absorbed by fixed effects, but it doesn't seem like something that should substantially change over the periods we are examining (similar arguent for lenrol).

C8.vi Theta is in the last regression above (coefficient for lrexpp). The standard error is about 2.5.

In [31]:
# Exercise 9
pension = pd.read_stata("./stata/PENSION.DTA")
X = sm.add_constant(pension[["choice", "prftshr", "female", "age", "educ", "finc25", "finc35", "finc50", "finc75", "finc100", "finc101", "wealth89", "stckin89", "irain89"]])
ols = sm.OLS(pension.pctstck, X, missing="drop").fit()
ols.summary()

  x = pd.concat(x[::order], 1)


0,1,2,3
Dep. Variable:,pctstck,R-squared:,0.108
Model:,OLS,Adj. R-squared:,0.038
Method:,Least Squares,F-statistic:,1.542
Date:,"Tue, 17 Aug 2021",Prob (F-statistic):,0.1
Time:,16:06:17,Log-Likelihood:,-978.02
No. Observations:,194,AIC:,1986.0
Df Residuals:,179,BIC:,2035.0
Df Model:,14,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,128.5442,55.170,2.330,0.021,19.677,237.411
choice,11.7443,6.232,1.884,0.061,-0.553,24.042
prftshr,14.3361,7.231,1.982,0.049,0.066,28.606
female,1.4522,6.766,0.215,0.830,-11.898,14.803
age,-1.5006,0.777,-1.932,0.055,-3.033,0.032
educ,0.7036,1.197,0.588,0.557,-1.658,3.065
finc25,-15.2887,14.229,-1.074,0.284,-43.368,12.790
finc35,0.1880,14.693,0.013,0.990,-28.806,29.182
finc50,-3.8617,14.551,-0.265,0.791,-32.576,24.852

0,1,2,3
Omnibus:,52.568,Durbin-Watson:,1.856
Prob(Omnibus):,0.0,Jarque-Bera (JB):,9.876
Skew:,0.055,Prob(JB):,0.00717
Kurtosis:,1.9,Cond. No.,6740.0


In [32]:
ols.f_test("finc25 = finc35 = finc50 = finc75 = finc100 = finc101 = wealth89 = stckin89 = irain89 = 0")

<class 'statsmodels.stats.contrast.ContrastResults'>
<F test: F=array([[1.03087375]]), p=0.41716067612393964, df_denom=179, df_num=9>

In [33]:
pension.id.unique().size

171

In [34]:
ols_cluster = sm.OLS(pension.pctstck, X, missing="drop").fit(cov_type="cluster", cov_kwds={"groups":pension["id"]})
ols_cluster.summary()

0,1,2,3
Dep. Variable:,pctstck,R-squared:,0.108
Model:,OLS,Adj. R-squared:,0.038
Method:,Least Squares,F-statistic:,2.248
Date:,"Tue, 17 Aug 2021",Prob (F-statistic):,0.00796
Time:,16:06:17,Log-Likelihood:,-978.02
No. Observations:,194,AIC:,1986.0
Df Residuals:,179,BIC:,2035.0
Df Model:,14,,
Covariance Type:,cluster,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
const,128.5442,56.961,2.257,0.024,16.903,240.185
choice,11.7443,6.198,1.895,0.058,-0.403,23.892
prftshr,14.3361,8.208,1.747,0.081,-1.751,30.423
female,1.4522,6.643,0.219,0.827,-11.567,14.472
age,-1.5006,0.810,-1.853,0.064,-3.088,0.087
educ,0.7036,1.177,0.598,0.550,-1.603,3.010
finc25,-15.2887,16.441,-0.930,0.352,-47.512,16.935
finc35,0.1880,16.313,0.012,0.991,-31.784,32.160
finc50,-3.8617,15.963,-0.242,0.809,-35.148,27.425

0,1,2,3
Omnibus:,52.568,Durbin-Watson:,1.856
Prob(Omnibus):,0.0,Jarque-Bera (JB):,9.876
Skew:,0.055,Prob(JB):,0.00717
Kurtosis:,1.9,Cond. No.,6740.0


In [35]:
spouse_ids = pension.id[pension.id.duplicated()]
pension_spouses = pension[pension.id.isin(spouse_ids)]
spouses_diff = pension_spouses[pension_spouses.id.duplicated()].set_index("id") - pension_spouses[~pension_spouses.id.duplicated()].set_index("id")

X = sm.add_constant(spouses_diff[["choice", "prftshr", "female", "age", "educ"]])
ols_spouse_diff = sm.OLS(spouses_diff.pctstck, X, missing="drop").fit()
ols_spouse_diff.summary()

  x = pd.concat(x[::order], 1)


0,1,2,3
Dep. Variable:,pctstck,R-squared:,0.206
Model:,OLS,Adj. R-squared:,-0.028
Method:,Least Squares,F-statistic:,0.8821
Date:,"Tue, 17 Aug 2021",Prob (F-statistic):,0.514
Time:,16:06:17,Log-Likelihood:,-110.38
No. Observations:,23,AIC:,232.8
Df Residuals:,17,BIC:,239.6
Df Model:,5,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,15.9267,10.938,1.456,0.164,-7.149,39.003
choice,2.2757,15.000,0.152,0.881,-29.372,33.923
prftshr,-9.2671,16.924,-0.548,0.591,-44.973,26.439
female,21.5510,21.485,1.003,0.330,-23.779,66.881
age,-3.5727,8.999,-0.397,0.696,-22.559,15.414
educ,-1.2203,3.429,-0.356,0.726,-8.455,6.014

0,1,2,3
Omnibus:,6.661,Durbin-Watson:,1.658
Prob(Omnibus):,0.036,Jarque-Bera (JB):,7.755
Skew:,-0.07,Prob(JB):,0.0207
Kurtosis:,5.841,Cond. No.,9.61


C9.i Results above. Choice is only significant at the 10% level. The effect is that choice translates to an 11.7 percentage point increase in stocks.

C9.ii The variables are not individually or jointly significant. F-test above.

C9.iii There are 171 families in the data set.

C9.iv The clustered errors are quite similar. This isn't too surprising given that we have 171 families and the earlier regression had 194 observations and so most people are being considered individually.

C9.v Most of the controls concern family values and so would be differenced out (the family income will be the same for both spouses) when differencing within families.

C9.vi None of the variables remain significant when differencing between spouses. We've reduced our data down to 23 observations and have eliminated a lot of variation (essentially we're reduced to within family differences) and so this is not particularly surprising.

In [36]:
# Exercise 10
airfare = pd.read_stata("./stata/airfare.dta").set_index(["id", "year"])
X = sm.add_constant(airfare[["concen", "ldist", "ldistsq", "y98", "y99", "y00"]])
pooled_ols = sm.OLS(airfare.lfare, X, missing="drop").fit()
pooled_ols.summary()

  x = pd.concat(x[::order], 1)


0,1,2,3
Dep. Variable:,lfare,R-squared:,0.406
Model:,OLS,Adj. R-squared:,0.405
Method:,Least Squares,F-statistic:,523.2
Date:,"Tue, 17 Aug 2021",Prob (F-statistic):,0.0
Time:,16:06:17,Log-Likelihood:,-1512.3
No. Observations:,4596,AIC:,3039.0
Df Residuals:,4589,BIC:,3084.0
Df Model:,6,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,6.2093,0.421,14.762,0.000,5.385,7.034
concen,0.3601,0.030,11.976,0.000,0.301,0.419
ldist,-0.9016,0.128,-7.029,0.000,-1.153,-0.650
ldistsq,0.1030,0.010,10.593,0.000,0.084,0.122
y98,0.0211,0.014,1.504,0.133,-0.006,0.049
y99,0.0378,0.014,2.696,0.007,0.010,0.065
y00,0.0999,0.014,7.112,0.000,0.072,0.127

0,1,2,3
Omnibus:,185.979,Durbin-Watson:,0.501
Prob(Omnibus):,0.0,Jarque-Bera (JB):,100.886
Skew:,0.194,Prob(JB):,1.24e-22
Kurtosis:,2.387,Cond. No.,4130.0


In [37]:
sm.OLS(airfare.lfare, X, missing="drop").fit(cov_type="HAC", cov_kwds={"maxlags": 4}).summary()

0,1,2,3
Dep. Variable:,lfare,R-squared:,0.406
Model:,OLS,Adj. R-squared:,0.405
Method:,Least Squares,F-statistic:,195.6
Date:,"Tue, 17 Aug 2021",Prob (F-statistic):,1.17e-222
Time:,16:06:17,Log-Likelihood:,-1512.3
No. Observations:,4596,AIC:,3039.0
Df Residuals:,4589,BIC:,3084.0
Df Model:,6,,
Covariance Type:,HAC,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
const,6.2093,0.817,7.601,0.000,4.608,7.810
concen,0.3601,0.053,6.769,0.000,0.256,0.464
ldist,-0.9016,0.244,-3.696,0.000,-1.380,-0.424
ldistsq,0.1030,0.018,5.693,0.000,0.068,0.138
y98,0.0211,0.008,2.805,0.005,0.006,0.036
y99,0.0378,0.010,3.954,0.000,0.019,0.057
y00,0.0999,0.011,9.105,0.000,0.078,0.121

0,1,2,3
Omnibus:,185.979,Durbin-Watson:,0.501
Prob(Omnibus):,0.0,Jarque-Bera (JB):,100.886
Skew:,0.194,Prob(JB):,1.24e-22
Kurtosis:,2.387,Cond. No.,4130.0


In [38]:
np.exp(abs(pooled_ols.params["ldist"]) / (2 * pooled_ols.params["ldistsq"]))

79.5087919446782

In [39]:
np.exp(airfare.ldist.min())

94.99999859689882

In [40]:
RandomEffects.from_formula("lfare~concen+ldist+ldistsq+y98+y99+y00", data=airfare).fit()

0,1,2,3
Dep. Variable:,lfare,R-squared:,0.9840
Estimator:,RandomEffects,R-squared (Between):,0.9958
No. Observations:,4596,R-squared (Within):,0.1348
Date:,"Tue, Aug 17 2021",R-squared (Overall):,0.9954
Time:,16:06:17,Log-likelihood,3741.8
Cov. Estimator:,Unadjusted,,
,,F-statistic:,4.692e+04
Entities:,1149,P-value,0.0000
Avg Obs:,4.0000,Distribution:,"F(6,4590)"
Min Obs:,4.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
concen,0.2095,0.0267,7.8481,0.0000,0.1572,0.2618
ldist,1.0363,0.0181,57.142,0.0000,1.0007,1.0719
ldistsq,-0.0444,0.0025,-17.894,0.0000,-0.0493,-0.0396
y98,0.0226,0.0045,5.0347,0.0000,0.0138,0.0313
y99,0.0368,0.0045,8.2116,0.0000,0.0280,0.0456
y00,0.0983,0.0045,21.921,0.0000,0.0895,0.1071


In [41]:
PanelOLS.from_formula("lfare~concen+EntityEffects+TimeEffects", data=airfare).fit()

0,1,2,3
Dep. Variable:,lfare,R-squared:,0.0095
Estimator:,PanelOLS,R-squared (Between):,0.0395
No. Observations:,4596,R-squared (Within):,0.0019
Date:,"Tue, Aug 17 2021",R-squared (Overall):,0.0394
Time:,16:06:17,Log-likelihood,4435.1
Cov. Estimator:,Unadjusted,,
,,F-statistic:,32.965
Entities:,1149,P-value,0.0000
Avg Obs:,4.0000,Distribution:,"F(1,3443)"
Min Obs:,4.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
concen,0.1689,0.0294,5.7415,0.0000,0.1112,0.2265


C10.i Results above. If $\Delta concen = .1$ then $\Delta lfare = 0.036$ or about a 3.6% increase.

C10.ii The 95% confidence interval on $\beta_1$ is 0.301 to 0.419. This would only be reliable if there was no serial correlation which doesn't seem all that likely. Using the HAC option we get a wider confidence interval of 0.256 to 0.464. Given that we specified a model with a time invariant component, the robust errors seem more appropriate here.

C10.iii The quadratic relationship accounts for a U-shaped relationship between distance and fare. The turning point is at about 79.5 miles. The shortest distance in the data set is about 95 miles and so the turning point is outside the range of the data.

C10.iv $\beta_1$ is considerably smaller at about 0.210

C10.v The FE estimate is a little smaller at 0.169. The results are close because the quasi-demeaned results are very close to fully demeaned results (about 10% difference).

c10.vi Population and occupation in the cities at the origin/destination are two factors that could be captured in the fixed effects (more people producing more travel, certain jobs having more travel than others).

C10.vii Given that our focus is on concentration, the fixed effects estimator is the most convincing and produces a positive, significant relationship between concentration and airfare. This seems to make intuitive sense in addition to the credibility of the estimation (though omitted time varying factors would potentially be a concern if they could be identified).

In [42]:
# Exercise 11
# There seems to be an error in the textbook. Table 14.1 does not have educ, married, and union but 14.2 does.
wagepan = pd.read_stata("./stata/wagepan.dta").set_index(["nr", "year"])
X = sm.add_constant(wagepan[["educ", "black", "hisp", "exper", "expersq", "married", "union"]])
pooled_ols = sm.OLS(wagepan.lwage, X, missing="drop").fit(cov_type="HAC", cov_kwds={"maxlags": 7})
pooled_ols.summary()

  x = pd.concat(x[::order], 1)


0,1,2,3
Dep. Variable:,lwage,R-squared:,0.187
Model:,OLS,Adj. R-squared:,0.185
Method:,Least Squares,F-statistic:,64.53
Date:,"Tue, 17 Aug 2021",Prob (F-statistic):,8.61e-89
Time:,16:06:17,Log-Likelihood:,-2989.2
No. Observations:,4360,AIC:,5994.0
Df Residuals:,4352,BIC:,6045.0
Df Model:,7,,
Covariance Type:,HAC,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
const,-0.0347,0.107,-0.326,0.745,-0.244,0.174
educ,0.0994,0.008,12.521,0.000,0.084,0.115
black,-0.1438,0.044,-3.267,0.001,-0.230,-0.058
hisp,0.0157,0.035,0.453,0.650,-0.052,0.084
exper,0.0892,0.013,6.894,0.000,0.064,0.115
expersq,-0.0028,0.001,-3.197,0.001,-0.005,-0.001
married,0.1077,0.024,4.505,0.000,0.061,0.155
union,0.1801,0.025,7.317,0.000,0.132,0.228

0,1,2,3
Omnibus:,1264.436,Durbin-Watson:,0.998
Prob(Omnibus):,0.0,Jarque-Bera (JB):,10588.544
Skew:,-1.143,Prob(JB):,0.0
Kurtosis:,10.284,Cond. No.,588.0


In [43]:
# Not sure about these HAC estimators compared to Stata
PanelOLS.from_formula("lwage~expersq+married+union+EntityEffects+TimeEffects", data=wagepan).fit(cov_type="kernel")

0,1,2,3
Dep. Variable:,lwage,R-squared:,0.0216
Estimator:,PanelOLS,R-squared (Between):,-0.2717
No. Observations:,4360,R-squared (Within):,-0.4809
Date:,"Tue, Aug 17 2021",R-squared (Overall):,-0.2808
Time:,16:06:17,Log-likelihood,-1324.8
Cov. Estimator:,Driscoll-Kraay,,
,,F-statistic:,27.959
Entities:,545,P-value,0.0000
Avg Obs:,8.0000,Distribution:,"F(3,3805)"
Min Obs:,8.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
expersq,-0.0052,0.0004,-11.754,0.0000,-0.0061,-0.0043
married,0.0467,0.0099,4.7363,0.0000,0.0274,0.0660
union,0.0800,0.0101,7.9475,0.0000,0.0603,0.0997


C11.i All the standard errors are higher using the fully robust standard errors.

C11.ii The errors appear smaller using the robust errors in linearmodels. This seems somewhat doubtful and I'm unfamilliar with the "Driscoll-Kraay HAC estimator" which is what the module uses.

C11.iii Both sets of errors are incorrect in the presence of serial correlation but the shift seems more severe in the pooled OLS case (and intuitively it makes sense since the fixed effects are in the error). If forced, I would say pooled OLS.

In [44]:
# Exercise 12
elem94_95 = pd.read_stata("./stata/elem94_95.dta")
elem94_95.distid.value_counts()

82010.0    162
41010.0     46
33020.0     32
25010.0     32
50210.0     25
          ... 
36025.0      1
36015.0      1
35040.0      1
35030.0      1
83070.0      1
Name: distid, Length: 537, dtype: int64

In [45]:
X = sm.add_constant(elem94_95[["bs", "lenrol", "lstaff", "lunch"]])
pooled_ols = sm.OLS(elem94_95.lavgsal, X, missing="drop").fit()
pooled_ols.summary()

  x = pd.concat(x[::order], 1)


0,1,2,3
Dep. Variable:,lavgsal,R-squared:,0.488
Model:,OLS,Adj. R-squared:,0.487
Method:,Least Squares,F-statistic:,439.4
Date:,"Tue, 17 Aug 2021",Prob (F-statistic):,4.22e-266
Time:,16:06:17,Log-Likelihood:,689.98
No. Observations:,1848,AIC:,-1370.0
Df Residuals:,1843,BIC:,-1342.0
Df Model:,4,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,13.8315,0.110,126.055,0.000,13.616,14.047
bs,-0.5161,0.110,-4.702,0.000,-0.731,-0.301
lenrol,-0.0284,0.008,-3.360,0.001,-0.045,-0.012
lstaff,-0.6906,0.018,-37.615,0.000,-0.727,-0.655
lunch,-0.0008,0.000,-4.695,0.000,-0.001,-0.000

0,1,2,3
Omnibus:,46.324,Durbin-Watson:,0.921
Prob(Omnibus):,0.0,Jarque-Bera (JB):,105.674
Skew:,0.009,Prob(JB):,1.13e-23
Kurtosis:,4.171,Cond. No.,1460.0


In [46]:
pooled_ols_robust = sm.OLS(elem94_95.lavgsal, X, missing="drop").fit(cov_type="cluster", cov_kwds={"groups": elem94_95.distid})
pooled_ols_robust.summary()

0,1,2,3
Dep. Variable:,lavgsal,R-squared:,0.488
Model:,OLS,Adj. R-squared:,0.487
Method:,Least Squares,F-statistic:,129.8
Date:,"Tue, 17 Aug 2021",Prob (F-statistic):,1.99e-77
Time:,16:06:17,Log-Likelihood:,689.98
No. Observations:,1848,AIC:,-1370.0
Df Residuals:,1843,BIC:,-1342.0
Df Model:,4,,
Covariance Type:,cluster,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
const,13.8315,0.313,44.148,0.000,13.217,14.446
bs,-0.5161,0.253,-2.040,0.041,-1.012,-0.020
lenrol,-0.0284,0.025,-1.119,0.263,-0.078,0.021
lstaff,-0.6906,0.036,-19.404,0.000,-0.760,-0.621
lunch,-0.0008,0.001,-1.443,0.149,-0.002,0.000

0,1,2,3
Omnibus:,46.324,Durbin-Watson:,0.921
Prob(Omnibus):,0.0,Jarque-Bera (JB):,105.674
Skew:,0.009,Prob(JB):,1.13e-23
Kurtosis:,4.171,Cond. No.,1460.0


In [47]:
elem94_95_reduced = elem94_95[elem94_95.bs <= 0.5]
X = sm.add_constant(elem94_95_reduced[["bs", "lenrol", "lstaff", "lunch"]])
pooled_ols_reduced_robust = sm.OLS(elem94_95_reduced.lavgsal, X, missing="drop").fit(cov_type="cluster", cov_kwds={"groups": elem94_95_reduced.distid})
pooled_ols_reduced_robust.summary()

  x = pd.concat(x[::order], 1)


0,1,2,3
Dep. Variable:,lavgsal,R-squared:,0.489
Model:,OLS,Adj. R-squared:,0.488
Method:,Least Squares,F-statistic:,133.4
Date:,"Tue, 17 Aug 2021",Prob (F-statistic):,5.0400000000000003e-79
Time:,16:06:17,Log-Likelihood:,707.02
No. Observations:,1844,AIC:,-1404.0
Df Residuals:,1839,BIC:,-1376.0
Df Model:,4,,
Covariance Type:,cluster,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
const,13.7101,0.253,54.144,0.000,13.214,14.206
bs,-0.1865,0.273,-0.683,0.495,-0.722,0.349
lenrol,-0.0270,0.024,-1.120,0.263,-0.074,0.020
lstaff,-0.6901,0.035,-19.540,0.000,-0.759,-0.621
lunch,-0.0008,0.001,-1.450,0.147,-0.002,0.000

0,1,2,3
Omnibus:,30.79,Durbin-Watson:,0.922
Prob(Omnibus):,0.0,Jarque-Bera (JB):,53.026
Skew:,0.105,Prob(JB):,3.06e-12
Kurtosis:,3.804,Cond. No.,1640.0


In [48]:
elem94_95_panel = elem94_95_reduced[elem94_95_reduced.distid.isin((elem94_95_reduced.distid.value_counts() > 1).index[elem94_95_reduced.distid.value_counts() > 1])]

In [49]:
elem94_95_panel_means = elem94_95_panel.groupby("distid").mean()
elem94_95_panel_dists = elem94_95_panel.set_index("distid")[["lavgsal", "bs", "lenrol", "lstaff", "lunch"]] - elem94_95_panel_means[["lavgsal", "bs", "lenrol", "lstaff", "lunch"]]
X = elem94_95_panel_dists[["bs", "lenrol", "lstaff", "lunch"]]
fe_ols = sm.OLS(elem94_95_panel_dists.lavgsal, X).fit(cov_type="cluster", cov_kwds={"groups": elem94_95_panel_dists.index})
fe_ols.summary()

0,1,2,3
Dep. Variable:,lavgsal,R-squared (uncentered):,0.576
Model:,OLS,Adj. R-squared (uncentered):,0.575
Method:,Least Squares,F-statistic:,62.51
Date:,"Tue, 17 Aug 2021",Prob (F-statistic):,3.770000000000001e-37
Time:,16:06:17,Log-Likelihood:,1619.8
No. Observations:,1573,AIC:,-3232.0
Df Residuals:,1569,BIC:,-3210.0
Df Model:,4,,
Covariance Type:,cluster,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
bs,-0.5234,0.229,-2.290,0.022,-0.971,-0.075
lenrol,-0.0481,0.013,-3.645,0.000,-0.074,-0.022
lstaff,-0.6235,0.043,-14.500,0.000,-0.708,-0.539
lunch,0.0005,0.000,2.490,0.013,0.000,0.001

0,1,2,3
Omnibus:,134.425,Durbin-Watson:,2.231
Prob(Omnibus):,0.0,Jarque-Bera (JB):,688.721
Skew:,-0.202,Prob(JB):,2.79e-150
Kurtosis:,6.216,Cond. No.,777.0


C12.i The largest number of schools in a district is 162. The smallest would be a single school in a district (there are many cases of these).

C12.ii Results above. $\beta_{bs} = -0.5161$ and the standard error is 0.110

C12.iii The t-statistic is cut in half once using clustered errors.

C12.iv Once dropping the observations where $bs > 0.5$ any tradeoff vanishes as the coefficient for bs is not statistically significant with the clustered errors.

C12.v We are trying to reject $\beta_{bs} = -1$ which appears to be the case in the fixed effects approach above (acknowledging the possibility of some errors). 

C12.vi It appears that we were able to reject the null with both approaches but it seems to make sense to allow teacher compensation to vary across districts. Districts are presumably going to be able to pay different compensation packages and the composition of teaching staff is likely to be different between districts.

In [50]:
# Exercise 13
driving = pd.read_stata("./stata/driving.dta")
driving.groupby("year").mean()["totfatrte"]

year
1980    25.494583
1981    23.670208
1982    20.942499
1983    20.152916
1984    20.267500
1985    19.851458
1986    20.800417
1987    20.774792
1988    20.891666
1989    19.772291
1990    19.505209
1991    18.094791
1992    17.157917
1993    17.127708
1994    17.155209
1995    17.668541
1996    17.369375
1997    17.610624
1998    17.265417
1999    17.250416
2000    16.825624
2001    16.792707
2002    17.029583
2003    16.763542
2004    16.728958
Name: totfatrte, dtype: float32

In [51]:
year_dummies = ["d"+str(yr) for yr in range(80,100)] + ["d0"+str(yr) for yr in range(5)]
sm.OLS(driving.totfatrte, driving[year_dummies]).fit().summary()

0,1,2,3
Dep. Variable:,totfatrte,R-squared:,0.128
Model:,OLS,Adj. R-squared:,0.11
Method:,Least Squares,F-statistic:,7.164
Date:,"Tue, 17 Aug 2021",Prob (F-statistic):,1.94e-22
Time:,16:06:18,Log-Likelihood:,-3841.7
No. Observations:,1200,AIC:,7733.0
Df Residuals:,1175,BIC:,7861.0
Df Model:,24,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
d80,25.4946,0.867,29.401,0.000,23.793,27.196
d81,23.6702,0.867,27.298,0.000,21.969,25.371
d82,20.9425,0.867,24.152,0.000,19.241,22.644
d83,20.1529,0.867,23.241,0.000,18.452,21.854
d84,20.2675,0.867,23.373,0.000,18.566,21.969
d85,19.8515,0.867,22.894,0.000,18.150,21.553
d86,20.8004,0.867,23.988,0.000,19.099,22.502
d87,20.7748,0.867,23.958,0.000,19.074,22.476
d88,20.8917,0.867,24.093,0.000,19.190,22.593

0,1,2,3
Omnibus:,110.665,Durbin-Watson:,0.2
Prob(Omnibus):,0.0,Jarque-Bera (JB):,151.315
Skew:,0.73,Prob(JB):,1.39e-33
Kurtosis:,3.946,Cond. No.,1.0


In [52]:
year_dummies = ["d"+str(yr) for yr in range(81,100)] + ["d0"+str(yr) for yr in range(5)]
X = sm.add_constant(driving[["bac08", "bac10", "perse", "sbprim", "sbsecon", "sl70plus", "gdl", "perc14_24", "unem", "vehicmilespc"] + year_dummies])
sm.OLS(driving.totfatrte, X).fit().summary()

  x = pd.concat(x[::order], 1)


0,1,2,3
Dep. Variable:,totfatrte,R-squared:,0.608
Model:,OLS,Adj. R-squared:,0.596
Method:,Least Squares,F-statistic:,53.1
Date:,"Tue, 17 Aug 2021",Prob (F-statistic):,6.380000000000001e-210
Time:,16:06:18,Log-Likelihood:,-3362.1
No. Observations:,1200,AIC:,6794.0
Df Residuals:,1165,BIC:,6972.0
Df Model:,34,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,-2.7161,2.476,-1.097,0.273,-7.574,2.141
bac08,-2.4985,0.538,-4.648,0.000,-3.553,-1.444
bac10,-1.4176,0.396,-3.577,0.000,-2.195,-0.640
perse,-0.6201,0.298,-2.079,0.038,-1.205,-0.035
sbprim,-0.0753,0.491,-0.153,0.878,-1.038,0.888
sbsecon,0.0673,0.429,0.157,0.875,-0.775,0.910
sl70plus,3.3479,0.445,7.521,0.000,2.474,4.221
gdl,-0.4269,0.527,-0.810,0.418,-1.461,0.607
perc14_24,0.1416,0.123,1.154,0.249,-0.099,0.382

0,1,2,3
Omnibus:,115.715,Durbin-Watson:,0.399
Prob(Omnibus):,0.0,Jarque-Bera (JB):,228.133
Skew:,0.613,Prob(JB):,2.89e-50
Kurtosis:,4.75,Cond. No.,389000.0


In [53]:
driving.set_index(["state", "year"], inplace=True)
driving_fe = PanelOLS.from_formula("totfatrte~bac08 + bac10 + perse + sbprim + sbsecon + sl70plus + gdl + perc14_24 + unem + vehicmilespc + EntityEffects + TimeEffects", data=driving).fit()
driving_fe

0,1,2,3
Dep. Variable:,totfatrte,R-squared:,0.2351
Estimator:,PanelOLS,R-squared (Between):,0.5052
No. Observations:,1200,R-squared (Within):,-0.1324
Date:,"Tue, Aug 17 2021",R-squared (Overall):,0.4890
Time:,16:06:18,Log-likelihood,-2500.5
Cov. Estimator:,Unadjusted,,
,,F-statistic:,34.354
Entities:,48,P-value,0.0000
Avg Obs:,25.000,Distribution:,"F(10,1118)"
Min Obs:,25.000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
bac08,-1.4372,0.3942,-3.6458,0.0003,-2.2107,-0.6637
bac10,-1.0627,0.2688,-3.9528,0.0001,-1.5902,-0.5352
perse,-1.1516,0.2340,-4.9217,0.0000,-1.6107,-0.6925
sbprim,-1.2274,0.3427,-3.5814,0.0004,-1.8998,-0.5550
sbsecon,-0.3497,0.2522,-1.3868,0.1658,-0.8445,0.1451
sl70plus,-0.0625,0.2693,-0.2322,0.8164,-0.5909,0.4659
gdl,-0.4118,0.2926,-1.4074,0.1596,-0.9858,0.1623
perc14_24,0.1871,0.0951,1.9676,0.0494,0.0005,0.3737
unem,-0.5718,0.0606,-9.4397,0.0000,-0.6907,-0.4530


In [54]:
(driving_fe.params[-1] * 1000)

0.9400519053926331

In [55]:
PanelOLS.from_formula("totfatrte~bac08 + bac10 + perse + sbprim + sbsecon + sl70plus + gdl + perc14_24 + unem + vehicmilespc + EntityEffects + TimeEffects", data=driving).fit(cov_type="clustered", cluster_entity=True)

0,1,2,3
Dep. Variable:,totfatrte,R-squared:,0.2351
Estimator:,PanelOLS,R-squared (Between):,0.5052
No. Observations:,1200,R-squared (Within):,-0.1324
Date:,"Tue, Aug 17 2021",R-squared (Overall):,0.4890
Time:,16:06:18,Log-likelihood,-2500.5
Cov. Estimator:,Clustered,,
,,F-statistic:,34.354
Entities:,48,P-value,0.0000
Avg Obs:,25.000,Distribution:,"F(10,1118)"
Min Obs:,25.000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
bac08,-1.4372,0.8316,-1.7283,0.0842,-3.0689,0.1944
bac10,-1.0627,0.4968,-2.1389,0.0327,-2.0375,-0.0878
perse,-1.1516,0.4487,-2.5665,0.0104,-2.0320,-0.2712
sbprim,-1.2274,0.5646,-2.1738,0.0299,-2.3353,-0.1195
sbsecon,-0.3497,0.3748,-0.9329,0.3511,-1.0852,0.3858
sl70plus,-0.0625,0.5727,-0.1092,0.9131,-1.1862,1.0612
gdl,-0.4118,0.3873,-1.0631,0.2880,-1.1718,0.3482
perc14_24,0.1871,0.1758,1.0646,0.2873,-0.1578,0.5320
unem,-0.5718,0.1217,-4.6996,0.0000,-0.8106,-0.3331


C13.i totfatrte is the total fatalities per 100,000 population. The coefficients are the same as the means but the regression was for the exercise. Fatalities appear to have fallen over time.

C13.ii The coefficients on bac08 and bac10 indicate that the fatality rate falls in the presence of these limits (the coefficient for bac08 is larger in magnitude than bac10, meaning more restrictive limits save more lives). Per se laws appear to have an effect on fatalities given that there is a negative coefficient that is statistically significant. The primary seat belt law does not show an effect since the negative coefficient is not statistically significant (t-statistic 0.1)

C13.iii Some of the effects are smaller but primary seatbelt becomes significant. State effects seem important, though over 1980-2004 the list of time invariant factors is smaller than what we might imagine. That said, the conclusions drawn from the estimates and the idea of controlling for state specific effects seems far more credible.

C13.iv A 1,000 unit increase in miles driven per capita translates to a 0.94 increase in total fatalities per 100,000. Broadly we'd see an additional fatality per 100,000 for each increase of 1000 miles (which may not still be the most intuitive explanation).

C13.v Some of the significance is lost such as in the case of bac08, but generally the results survive clustered errors.

In [56]:
# Exercise 14
airfare = pd.read_stata("./stata/airfare.dta")
concen_bar = airfare.groupby("id").mean().rename(columns={"concen": "concenbar"})["concenbar"]
airfare = pd.merge(airfare, concen_bar, how="left", left_on="id", right_index=True).set_index(["id", "year"])

In [57]:
print("Min:", concen_bar.min())
print("Max:", concen_bar.max())
print("IDs:", airfare.index.get_level_values(0).unique().size)

Min: 0.18619999289512634
Max: 0.9997000098228455
IDs: 1149


In [58]:
RandomEffects.from_formula("lfare~1+y98+y99+y00+concen+ldist+ldistsq+concenbar", data=airfare).fit()

0,1,2,3
Dep. Variable:,lfare,R-squared:,0.2302
Estimator:,RandomEffects,R-squared (Between):,0.4216
No. Observations:,4596,R-squared (Within):,0.1352
Date:,"Tue, Aug 17 2021",R-squared (Overall):,0.4068
Time:,16:06:18,Log-likelihood,3776.2
Cov. Estimator:,Unadjusted,,
,,F-statistic:,196.03
Entities:,1149,P-value,0.0000
Avg Obs:,4.0000,Distribution:,"F(7,4588)"
Min Obs:,4.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
Intercept,6.2079,0.8097,7.6672,0.0000,4.6205,7.7952
y98,0.0228,0.0045,5.1302,0.0000,0.0141,0.0316
y99,0.0364,0.0044,8.1782,0.0000,0.0277,0.0451
y00,0.0978,0.0045,21.948,0.0000,0.0890,0.1065
concen,0.1689,0.0294,5.7427,0.0000,0.1112,0.2265
ldist,-0.9089,0.2471,-3.6791,0.0002,-1.3933,-0.4246
ldistsq,0.1038,0.0187,5.5416,0.0000,0.0671,0.1406
concenbar,0.2136,0.0679,3.1471,0.0017,0.0805,0.3467


In [59]:
RandomEffects.from_formula("lfare~1+y98+y99+y00+concen+concenbar", data=airfare).fit()

0,1,2,3
Dep. Variable:,lfare,R-squared:,0.1171
Estimator:,RandomEffects,R-squared (Between):,0.0576
No. Observations:,4596,R-squared (Within):,0.1352
Date:,"Tue, Aug 17 2021",R-squared (Overall):,0.0616
Time:,16:06:18,Log-likelihood,3775.8
Cov. Estimator:,Unadjusted,,
,,F-statistic:,121.76
Entities:,1149,P-value,0.0000
Avg Obs:,4.0000,Distribution:,"F(5,4590)"
Min Obs:,4.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
Intercept,5.3859,0.0413,130.34,0.0000,5.3049,5.4669
y98,0.0228,0.0045,5.1309,0.0000,0.0141,0.0316
y99,0.0364,0.0044,8.1793,0.0000,0.0277,0.0451
y00,0.0978,0.0045,21.951,0.0000,0.0890,0.1065
concen,0.1689,0.0294,5.7434,0.0000,0.1112,0.2265
concenbar,-0.7090,0.0709,-9.9940,0.0000,-0.8480,-0.5699


In [60]:
RandomEffects.from_formula("lfare~1+y98+y99+y00+concen+ldist+ldistsq+concenbar", data=airfare).fit(cov_type="kernel")

0,1,2,3
Dep. Variable:,lfare,R-squared:,0.2302
Estimator:,RandomEffects,R-squared (Between):,0.4216
No. Observations:,4596,R-squared (Within):,0.1352
Date:,"Tue, Aug 17 2021",R-squared (Overall):,0.4068
Time:,16:06:18,Log-likelihood,3776.2
Cov. Estimator:,Driscoll-Kraay,,
,,F-statistic:,196.03
Entities:,1149,P-value,0.0000
Avg Obs:,4.0000,Distribution:,"F(7,4588)"
Min Obs:,4.0000,,

0,1,2,3,4,5,6
,Parameter,Std. Err.,T-stat,P-value,Lower CI,Upper CI
Intercept,6.2079,0.6703,9.2614,0.0000,4.8938,7.5220
y98,0.0228,0.0012,19.730,0.0000,0.0206,0.0251
y99,0.0364,0.0010,36.592,0.0000,0.0344,0.0383
y00,0.0978,0.0014,68.784,0.0000,0.0950,0.1006
concen,0.1689,0.1296,1.3033,0.1925,-0.0851,0.4229
ldist,-0.9089,0.1828,-4.9712,0.0000,-1.2674,-0.5505
ldistsq,0.1038,0.0112,9.2494,0.0000,0.0818,0.1259
concenbar,0.2136,0.3028,0.7055,0.4806,-0.3801,0.8073


C14.i There are 1149 different ids and so the same number of averages. The minimum value is 0.186 and the maximum is about 1

C14.ii The exercise is in the same notebook and the estimates are identical.

C14.iii $\beta_1$ stays the same but $\gamma_1$ changes substantially (0.214 to -0.709)

C14.iv The test is essentially done in our answer to part (ii). The p-value is 0.0017 meaning we reject the null hypothesis that $\gamma_1 = 0$ which practically means we should reject random effects in favour of fixed effects.

C14.v The error on $\gamma_1$ substantially increases and the resulting p-value is now 0.4806, meaning we fail to reject the null hypothesis and so potentially allow for random effects again.