# Chapter 9

In [1]:
import pandas as pd
import statsmodels.api as sm
import numpy as np

In [2]:
from statsmodels.stats.diagnostic import linear_reset

# Exercise 1
ceosal1 = pd.read_stata("stata/CEOSAL1.DTA")
ceosal1["rosneg"] = (ceosal1.ros < 0).astype('int32')

y = ceosal1.lsalary
X = sm.add_constant(ceosal1[["lsales", "roe", "rosneg"]])
model = sm.OLS(y, X).fit()
print(linear_reset(model))

<Wald test (chi2): statistic=[[2.66707016]], p-value=0.26354396347186887, df_denom=2>


In [3]:
linear_reset(model, cov_type = "HC3")

<class 'statsmodels.stats.contrast.ContrastResults'>
<Wald test (chi2): statistic=[[2.99859337]], p-value=0.22328714637257752, df_denom=2>

C1.i The p-value for the RESET test is about 0.26, so there is no evidence of functional form misspecification at conventional significance levels.

C1.ii With the heteroskedasticity-robust RESET the p-value falls to about 0.22, which is still far from any value we would consider evidence of misspecification.
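
To make the mechanics of RESET concrete, here is a minimal sketch of the textbook F version: fit the model, add the squared and cubed fitted values, and test the added terms jointly. It uses synthetic data and only numpy. The helper names (`ols_ssr`, `reset_f_stat`) are made up for illustration and are not part of statsmodels. The output above is reported in chi-square form, so its numbers will not match an F statistic one for one.

```python
import numpy as np

def ols_ssr(y, X):
    """Return fitted values and the sum of squared residuals from OLS of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    return fitted, float(resid @ resid)

def reset_f_stat(y, X, powers=(2, 3)):
    """RESET F statistic: add powers of the fitted values and test them jointly.

    X must already contain the constant column.
    """
    n, k = X.shape
    fitted, ssr_r = ols_ssr(y, X)                    # restricted model
    extra = np.column_stack([fitted ** p for p in powers])
    _, ssr_ur = ols_ssr(y, np.column_stack([X, extra]))  # augmented model
    q = len(powers)
    df_resid = n - k - q
    f_stat = ((ssr_r - ssr_ur) / q) / (ssr_ur / df_resid)
    return f_stat, q, df_resid

# Synthetic data (hypothetical): one correctly specified model, one that omits x**2.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y_linear = 1.0 + 0.5 * x + rng.normal(scale=0.5, size=n)
y_quad = y_linear + 0.8 * x ** 2

f_lin, q, df_resid = reset_f_stat(y_linear, X)
f_quad, _, _ = reset_f_stat(y_quad, X)
print("F (linear truth):   ", round(f_lin, 3))
print("F (quadratic truth):", round(f_quad, 3))
```

The statistic for the misspecified model should be far larger than for the correctly specified one. In statsmodels, passing `use_f=True` to `linear_reset` should give the F form directly.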

In [4]:
# Exercise 2
wage2 = pd.read_stata("stata/WAGE2.DTA")
wage2["educ_iq"] = wage2.educ * wage2.IQ
wage2["educ_kww"] = wage2.educ * wage2.KWW

y = wage2.lwage
X = sm.add_constant(wage2[["educ", "exper", "tenure", "married", "south", "urban", "black", "KWW", "educ_kww"]])
model = sm.OLS(y, X).fit()
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:                  lwage   R-squared:                       0.259
Model:                            OLS   Adj. R-squared:                  0.252
Method:                 Least Squares   F-statistic:                     35.97
Date:                Sat, 30 May 2020   Prob (F-statistic):           8.91e-55
Time:                        19:28:33   Log-Likelihood:                -377.34
No. Observations:                 935   AIC:                             774.7
Df Residuals:                     925   BIC:                             823.1
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          5.3557      0.114     47.113      0.0

In [5]:
wage2["combined"] = wage2.IQ * wage2.KWW
wage2["educ_combined"] = wage2.combined * wage2.educ
X = sm.add_constant(wage2[["educ", "exper", "tenure", "married", "south", "urban", "black", "IQ", "KWW", "educ_combined"]])
model = sm.OLS(y, X).fit()
print(model.summary())
print(model.f_test("(IQ = 0), (KWW = 0)"))

                            OLS Regression Results                            
Dep. Variable:                  lwage   R-squared:                       0.266
Model:                            OLS   Adj. R-squared:                  0.258
Method:                 Least Squares   F-statistic:                     33.53
Date:                Sat, 30 May 2020   Prob (F-statistic):           7.48e-56
Time:                        19:28:33   Log-Likelihood:                -372.88
No. Observations:                 935   AIC:                             767.8
Df Residuals:                     924   BIC:                             821.0
Df Model:                          10                                         
Covariance Type:            nonrobust                                         
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
const             5.1756      0.128     40.486

C2.i The estimated return to education when using KWW as a proxy for ability is about 5.8%.

C2.ii The return to education falls to about 4.9% when both IQ and KWW are included.

C2.iii IQ and KWW are each individually significant at the 5% level (in fact, IQ is significant at the 1% level). The F-test provides strong evidence that they are jointly significant.
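
For reference, the joint test in part (iii) can be written in the usual restricted versus unrestricted form. This is the standard textbook expression, not something printed by the code above. Here $q = 2$ is the number of restrictions and $n - k - 1 = 924$ is the residual degrees of freedom of the unrestricted model reported in the summary.

$$
F = \frac{(SSR_r - SSR_{ur}) / q}{SSR_{ur} / (n - k - 1)}
$$

$SSR_r$ comes from the model that drops IQ and KWW, and $SSR_{ur}$ from the model that includes them.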

In [6]:
# Exercise 3
jtrain = pd.read_stata("stata/JTRAIN.DTA")
jtrain_1988 = jtrain.loc[jtrain["year"] == 1988, ["lscrap", "grant"]].dropna()
y = jtrain_1988.lscrap
X = sm.add_constant(jtrain_1988.grant)

model = sm.OLS(y, X).fit()
print(model.summary())
print(model.get_robustcov_results(cov_type = "HC3").summary())

                            OLS Regression Results                            
Dep. Variable:                 lscrap   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                 -0.019
Method:                 Least Squares   F-statistic:                   0.01948
Date:                Sat, 30 May 2020   Prob (F-statistic):              0.890
Time:                        19:28:34   Log-Likelihood:                -94.660
No. Observations:                  54   AIC:                             193.3
Df Residuals:                      52   BIC:                             197.3
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.4085      0.241      1.698      0.0

In [7]:
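# Lag lscrap within each firm so the lag never picks up another firm's value.
# Assumes the firm identifier column is named fcode, as in Wooldridge's JTRAIN data.
jtrain = jtrain.sort_values(["fcode", "year"])
jtrain["lscrap_lag"] = jtrain.groupby("fcode")["lscrap"].shift(1)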
jtrain_1988 = jtrain.loc[jtrain["year"] == 1988, ["lscrap", "lscrap_lag", "grant"]].dropna()
y = jtrain_1988.lscrap
X = sm.add_constant(jtrain_1988[["grant", "lscrap_lag"]])

model = sm.OLS(y, X).fit()
model_hc3 = model.get_robustcov_results(cov_type = "HC3")
print(model.summary())
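# Halving the two-sided p-value is valid here only because the grant coefficient is negative.
print("One-sided p-value for grant is: ", (model.pvalues["grant"] / 2))
print(model_hc3.summary())
# The robust results expose pvalues as a plain array, so index by position (const, grant, lscrap_lag).
print("One-sided p-value for grant (HC3) is: ", (model_hc3.pvalues[1] / 2))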

                            OLS Regression Results                            
Dep. Variable:                 lscrap   R-squared:                       0.873
Model:                            OLS   Adj. R-squared:                  0.868
Method:                 Least Squares   F-statistic:                     174.9
Date:                Sat, 30 May 2020   Prob (F-statistic):           1.47e-23
Time:                        19:28:34   Log-Likelihood:                -39.000
No. Observations:                  54   AIC:                             84.00
Df Residuals:                      51   BIC:                             89.97
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0212      0.089      0.238      0.8

In [8]:
print(model.t_test("lscrap_lag = 1"))
print(model_hc3.t_test("lscrap_lag = 1"))

                             Test for Constraints                             
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
c0             0.8312      0.044     -3.799      0.000       0.742       0.920
                             Test for Constraints                             
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
c0             0.8312      0.088     -1.914      0.061       0.654       1.008


C3.i Much depends on how grants are assigned. Grants could go to over-performing or under-performing firms, depending on who administers the program. In either case the factors that lead to higher or lower productivity are unobserved but would be correlated with grant, so the simple regression estimate of the grant effect would be biased.
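
The concern in part (i) can be illustrated with a small simulation. This is only a sketch on made-up data: `quality` stands in for the unobserved firm characteristics, grants are assumed to go more often to low-quality firms, and the true grant effect is set to -0.25. None of these numbers come from JTRAIN.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# Hypothetical unobserved firm quality; higher quality means a lower scrap rate.
quality = rng.normal(size=n)

# Assumed assignment rule: low-quality firms are more likely to receive a grant.
grant = (quality + rng.normal(size=n) < 0).astype(float)

true_effect = -0.25
lscrap = 0.5 - 1.0 * quality + true_effect * grant + rng.normal(scale=0.5, size=n)

def ols(y, *columns):
    """OLS coefficients of y on a constant plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b_simple = ols(lscrap, grant)[1]             # omits quality
b_control = ols(lscrap, grant, quality)[1]   # controls for quality

print("grant coefficient, simple regression: ", round(b_simple, 3))
print("grant coefficient, quality controlled:", round(b_control, 3))
```

Under this assignment rule the simple regression coefficient comes out positive even though the true effect is negative, while controlling for the omitted factor recovers a value near -0.25.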

C3.ii The simple regression does not provide evidence that receiving a job training grant significantly lowers a firm's scrap rate. With a positive coefficient, a significant result would mean the grant increased the scrap rate!

C3.iii Adding the lag turns the coefficient on grant negative. It is statistically significant at the 5% level, with a p-value of 0.045 against the one-sided alternative $\beta_{grant} < 0$.
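
The one-sided p-value is obtained by halving the two-sided one, which is only valid when the estimated sign agrees with the alternative. A minimal sketch of that rule follows. The function name and the input values are hypothetical and chosen only to mirror the magnitudes discussed here.

```python
def one_sided_p_value(two_sided_p, coef, alternative="less"):
    """Convert a two-sided t-test p-value into a one-sided p-value.

    alternative="less" tests H1: beta < 0, alternative="greater" tests H1: beta > 0.
    """
    if alternative not in ("less", "greater"):
        raise ValueError("alternative must be 'less' or 'greater'")
    sign_matches = coef < 0 if alternative == "less" else coef > 0
    # If the estimate points the wrong way, the one-sided p-value is large.
    return two_sided_p / 2 if sign_matches else 1 - two_sided_p / 2

# Hypothetical inputs: a two-sided p-value of 0.09 with a negative and a positive estimate.
p_negative_coef = one_sided_p_value(0.09, -0.25, "less")
p_positive_coef = one_sided_p_value(0.09, 0.25, "less")
print(p_negative_coef, p_positive_coef)
```

This is why the simple regression in part (ii), with its positive coefficient, cannot support the claim that grants lower the scrap rate.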

C3.iv The t-test of $H_0: \beta_{lscrap\_lag} = 1$ is listed above (t = -3.799). The p-value rounds to 0.000, so the null hypothesis is strongly rejected.
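
The statistic for a null value other than zero is just the estimate minus the null value, divided by the standard error. The sketch below recomputes it from the rounded coefficient and standard errors shown in the two constraint tables. Because those inputs are rounded, the results are close to, but not exactly equal to, the reported t values. The function name is made up for illustration.

```python
def t_stat_against(coef, std_err, null_value):
    """t statistic for H0: beta = null_value."""
    return (coef - null_value) / std_err

# Rounded values copied from the "Test for Constraints" tables above.
t_classical = t_stat_against(0.8312, 0.044, 1.0)
t_robust = t_stat_against(0.8312, 0.088, 1.0)
print(round(t_classical, 3), round(t_robust, 3))
```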

C3.v Robust results are reported below the standard results. The results from part (ii) do not substantively change. While the question did not ask to check part (iii), grant is now significant only at the 10% level against the one-sided alternative. For the test in part (iv), the robust standard error is about twice as large (0.088 versus 0.044), and the null is now rejected only at the 10% level (p-value 0.061).
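
To show where a gap between classical and robust standard errors can come from, here is a rough numpy sketch of the HC3 sandwich estimator as it is usually defined: each squared residual is divided by $(1 - h_i)^2$, where $h_i$ is the leverage of observation $i$. The data are synthetic and the function names are hypothetical. This is not the statsmodels implementation.

```python
import numpy as np

def classical_se(y, X):
    """Usual OLS standard errors under homoskedasticity."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (xtx_inv @ X.T @ y)
    sigma2 = resid @ resid / (n - k)
    return np.sqrt(np.diag(sigma2 * xtx_inv))

def hc3_se(y, X):
    """HC3 heteroskedasticity-robust standard errors."""
    xtx_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (xtx_inv @ X.T @ y)
    # Leverage h_i = x_i' (X'X)^{-1} x_i for each row.
    leverage = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    weights = (resid / (1.0 - leverage)) ** 2
    meat = X.T @ (X * weights[:, None])
    return np.sqrt(np.diag(xtx_inv @ meat @ xtx_inv))

# Synthetic data (hypothetical) where the error variance grows with x**2.
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + x * rng.normal(size=n)

se_classical = classical_se(y, X)
se_hc3 = hc3_se(y, X)
print("classical:", np.round(se_classical, 4))
print("HC3:      ", np.round(se_hc3, 4))
```

With this kind of heteroskedasticity the robust standard error on the slope is noticeably larger than the classical one, which is the same pattern seen for lscrap_lag above.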