# Chapter 15. Instrumental Variables Estimation and TSLS  

In [1]:
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col

from linearmodels.iv import IV2SLS

from wooldridge import *

### Example 15.1. Estimating the Return to Education for Married Women

In [2]:
df = dataWoo('mroz')
print(smf.ols('lwage ~ 1 + educ', data=df).fit().summary())

                            OLS Regression Results                            
Dep. Variable:                  lwage   R-squared:                       0.118
Model:                            OLS   Adj. R-squared:                  0.116
Method:                 Least Squares   F-statistic:                     56.93
Date:                Tue, 02 Jul 2024   Prob (F-statistic):           2.76e-13
Time:                        18:15:33   Log-Likelihood:                -441.26
No. Observations:                 428   AIC:                             886.5
Df Residuals:                     426   BIC:                             894.6
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.1852      0.185     -1.000      0.3

In [3]:
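# statsmodels drops only rows missing educ or fatheduc, so this first stage uses all 753 women;
# the IV regression below is estimated on the 428 women with an observed wage
print(smf.ols('educ ~ 1 + fatheduc', data=df).fit().summary())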

                            OLS Regression Results                            
Dep. Variable:                   educ   R-squared:                       0.196
Model:                            OLS   Adj. R-squared:                  0.195
Method:                 Least Squares   F-statistic:                     182.8
Date:                Tue, 02 Jul 2024   Prob (F-statistic):           1.93e-37
Time:                        18:15:33   Log-Likelihood:                -1606.6
No. Observations:                 753   AIC:                             3217.
Df Residuals:                     751   BIC:                             3226.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      9.7990      0.199     49.356      0.0

In [4]:
df = df.dropna()
print(IV2SLS.from_formula('lwage ~ 1 + [educ ~ fatheduc]', data=df).fit())

                          IV-2SLS Estimation Summary                          
Dep. Variable:                  lwage   R-squared:                      0.0934
Estimator:                    IV-2SLS   Adj. R-squared:                 0.0913
No. Observations:                 428   F-statistic:                    2.5656
Date:                Tue, Jul 02 2024   P-value (F-stat)                0.1092
Time:                        18:15:33   Distribution:                  chi2(1)
Cov. Estimator:                robust                                         
                                                                              
                             Parameter Estimates                              
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept      0.4411     0.4643     0.9501     0.3421     -0.4689      1.3511
educ           0.0592     0.0369     1.6017     0.10
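With a single instrument z, the IV slope is the ratio of sample covariances, b_IV = Cov(z, y)/Cov(z, x), and the intercept is ybar - b_IV * xbar. The pure-Python sketch below uses simulated data, not mroz; the data-generating process, seed and sample size are assumptions chosen for illustration. It shows OLS drifting away from the true slope of 0.5 while IV stays close to it:

```python
import random

random.seed(0)
n = 20_000
beta0, beta1 = 1.0, 0.5  # true structural parameters (assumed for the simulation)

z, x, y = [], [], []
for _ in range(n):
    u = random.gauss(0, 1)                 # structural error
    zi = random.gauss(0, 1)                # instrument: independent of u
    xi = zi + random.gauss(0, 1) + u       # regressor correlated with both z and u
    z.append(zi)
    x.append(xi)
    y.append(beta0 + beta1 * xi + u)


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


b_ols = cov(x, y) / cov(x, x)      # inconsistent because Cov(x, u) != 0
b_iv = cov(z, y) / cov(z, x)       # simple IV estimator
b0_iv = mean(y) - b_iv * mean(x)   # IV intercept

print(f"OLS slope: {b_ols:.3f}, IV slope: {b_iv:.3f}, IV intercept: {b0_iv:.3f}")
```

In this design Cov(x, u) = 1 and Var(x) = 3, so OLS converges to 0.5 + 1/3 rather than 0.5.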

### Example 15.2. Estimating the Return to Education for Men

In [5]:
df = dataWoo("wage2")
print(smf.ols('educ ~ 1 + sibs', data=df).fit().summary())

                            OLS Regression Results                            
Dep. Variable:                   educ   R-squared:                       0.057
Model:                            OLS   Adj. R-squared:                  0.056
Method:                 Least Squares   F-statistic:                     56.67
Date:                Tue, 02 Jul 2024   Prob (F-statistic):           1.22e-13
Time:                        18:15:33   Log-Likelihood:                -2034.4
No. Observations:                 935   AIC:                             4073.
Df Residuals:                     933   BIC:                             4083.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     14.1388      0.113    124.969      0.0

In [6]:
print(IV2SLS.from_formula('lwage ~ 1 + [educ ~ sibs]', data=df).fit())

                          IV-2SLS Estimation Summary                          
Dep. Variable:                  lwage   R-squared:                     -0.0092
Estimator:                    IV-2SLS   Adj. R-squared:                -0.0103
No. Observations:                 935   F-statistic:                    24.850
Date:                Tue, Jul 02 2024   P-value (F-stat)                0.0000
Time:                        18:15:33   Distribution:                  chi2(1)
Cov. Estimator:                robust                                         
                                                                              
                             Parameter Estimates                              
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept      5.1300     0.3304     15.528     0.0000      4.4825      5.7776
educ           0.1224     0.0246     4.9850     0.00
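With one instrument, the IV estimate also equals the reduced-form slope (y on z) divided by the first-stage slope (x on z). This is sometimes called indirect least squares. The sketch below checks the identity numerically on simulated data. The coefficients, the seed and the helper `slope()` are illustrative assumptions, not quantities taken from wage2:

```python
import random


def slope(a, b):
    """OLS slope from regressing b on a with an intercept."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    den = sum((ai - ma) ** 2 for ai in a)
    return num / den


random.seed(1)
n = 5_000
z, x, y = [], [], []
for _ in range(n):
    u = random.gauss(0, 1)                               # structural error
    zi = random.gauss(0, 1)                              # instrument
    xi = 12 - 0.2 * zi + random.gauss(0, 1) + 0.5 * u    # endogenous regressor
    z.append(zi)
    x.append(xi)
    y.append(5 + 0.1 * xi + u)

pi_x = slope(z, x)          # first stage: x on z
pi_y = slope(z, y)          # reduced form: y on z
b_iv_ratio = pi_y / pi_x    # indirect least squares

# Direct IV formula: sum (z - zbar)(y - ybar) / sum (z - zbar)(x - xbar)
mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
b_iv_direct = (sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
               / sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x)))
print(f"{b_iv_ratio:.6f} {b_iv_direct:.6f}")
```

The identity holds because both slopes share the denominator sum (z - zbar)^2, which cancels in the ratio.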

### Example 15.3. Estimating the Effect of Smoking on Birth Weight

In [7]:
df = dataWoo("bwght")
print(smf.ols('packs ~ 1 + cigprice', data=df).fit().summary())

                            OLS Regression Results                            
Dep. Variable:                  packs   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                 -0.001
Method:                 Least Squares   F-statistic:                    0.1305
Date:                Tue, 02 Jul 2024   Prob (F-statistic):              0.718
Time:                        18:15:33   Log-Likelihood:                -291.47
No. Observations:                1388   AIC:                             586.9
Df Residuals:                    1386   BIC:                             597.4
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0674      0.103      0.658      0.5

In [8]:
print(IV2SLS.from_formula('lbwght ~ 1 + [packs ~ cigprice]', data=df).fit())

                          IV-2SLS Estimation Summary                          
Dep. Variable:                 lbwght   R-squared:                     -23.230
Estimator:                    IV-2SLS   Adj. R-squared:                -23.248
No. Observations:                1388   F-statistic:                    0.1107
Date:                Tue, Jul 02 2024   P-value (F-stat)                0.7394
Time:                        18:15:33   Distribution:                  chi2(1)
Cov. Estimator:                robust                                         
                                                                              
                             Parameter Estimates                              
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept      4.4481     0.9387     4.7388     0.0000      2.6084      6.2879
packs          2.9887     8.9832     0.3327     0.73
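The probability limit of the simple IV estimator shows why a weak instrument is dangerous:

plim b_IV = beta_1 + [Corr(z, u)/Corr(z, x)] * (sigma_u/sigma_x)

For OLS, the corresponding expression is:

plim b_OLS = beta_1 + Corr(x, u) * (sigma_u/sigma_x)

When Corr(z, x) is small, as with cigprice and packs above, even a tiny correlation between z and u is magnified. The sketch below simply evaluates both formulas. The correlations are hypothetical numbers, not estimates from bwght:

```python
def iv_asymptotic_bias(corr_zu, corr_zx, sd_u, sd_x):
    """plim(b_IV) - beta_1 for the simple IV estimator."""
    return (corr_zu / corr_zx) * (sd_u / sd_x)


def ols_asymptotic_bias(corr_xu, sd_u, sd_x):
    """plim(b_OLS) - beta_1 for simple OLS."""
    return corr_xu * (sd_u / sd_x)


# Hypothetical values: the instrument is almost irrelevant (Corr(z, x) = 0.01)
# and only slightly correlated with u (Corr(z, u) = 0.02), while x itself
# is moderately endogenous (Corr(x, u) = 0.2).
iv_bias = iv_asymptotic_bias(corr_zu=0.02, corr_zx=0.01, sd_u=1.0, sd_x=1.0)
ols_bias = ols_asymptotic_bias(corr_xu=0.2, sd_u=1.0, sd_x=1.0)
print(f"IV bias: {iv_bias:.2f}, OLS bias: {ols_bias:.2f}")  # IV bias: 2.00, OLS bias: 0.20
```

Here the inconsistency of IV is ten times that of OLS, even though z is far less correlated with u than x is. With a strong first stage, the same Corr(z, u) would do little harm.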

### Example 15.4. Using College Proximity as an IV for Education

In [9]:
df = dataWoo("card")
print(smf.ols(
    'educ ~ nearc4 + exper + expersq + black + smsa + south + smsa66 + reg661 + reg662 + reg663 + reg664 + reg665 + reg666 + reg667 + reg668', 
    data=df).fit().summary())

                            OLS Regression Results                            
Dep. Variable:                   educ   R-squared:                       0.477
Model:                            OLS   Adj. R-squared:                  0.474
Method:                 Least Squares   F-statistic:                     182.1
Date:                Tue, 02 Jul 2024   Prob (F-statistic):               0.00
Time:                        18:15:33   Log-Likelihood:                -6258.5
No. Observations:                3010   AIC:                         1.255e+04
Df Residuals:                    2994   BIC:                         1.265e+04
Df Model:                          15                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     16.8485      0.211     79.805      0.0

In [10]:
print(smf.ols(
    'lwage ~ educ + exper + expersq + black + smsa + south + smsa66 + reg661 + reg662 + reg663 + reg664 + reg665 + reg666 + reg667 + reg668', 
    data=df).fit().summary())

                            OLS Regression Results                            
Dep. Variable:                  lwage   R-squared:                       0.300
Model:                            OLS   Adj. R-squared:                  0.296
Method:                 Least Squares   F-statistic:                     85.48
Date:                Tue, 02 Jul 2024   Prob (F-statistic):          1.74e-218
Time:                        18:15:34   Log-Likelihood:                -1288.8
No. Observations:                3010   AIC:                             2610.
Df Residuals:                    2994   BIC:                             2706.
Df Model:                          15                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      4.7394      0.072     66.259      0.0

In [11]:
print(IV2SLS.from_formula(
    'lwage ~ 1 + exper + expersq + black + smsa + south + smsa66 + reg661 + reg662 + reg663 + reg664 + reg665 + reg666 + reg667 + reg668 + [educ ~ nearc4]', 
    data=df).fit())

                          IV-2SLS Estimation Summary                          
Dep. Variable:                  lwage   R-squared:                      0.2382
Estimator:                    IV-2SLS   Adj. R-squared:                 0.2343
No. Observations:                3010   F-statistic:                    840.83
Date:                Tue, Jul 02 2024   P-value (F-stat)                0.0000
Time:                        18:15:34   Distribution:                 chi2(15)
Cov. Estimator:                robust                                         
                                                                              
                             Parameter Estimates                              
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept      3.7740     0.9174     4.1137     0.0000      1.9759      5.5720
exper          0.1083     0.0233     4.6376     0.00
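2SLS can be computed either from the closed-form matrix expression or by literally running two OLS regressions, and the point estimates coincide. The numpy sketch below illustrates this on simulated data. The control w, the binary instrument z and all coefficients are invented here; they only mimic the structure of the card regression, where an excluded binary instrument enters alongside included exogenous controls:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 3_000
w = rng.normal(size=n)                       # included exogenous control (hypothetical)
z = rng.binomial(1, 0.7, size=n)             # binary excluded instrument (hypothetical)
u = rng.normal(size=n)                       # structural error
x = 12 + 1.5 * z + 0.3 * w + rng.normal(size=n) + 0.8 * u   # endogenous regressor
y = 1 + 0.1 * x + 0.2 * w + u

ones = np.ones(n)
X = np.column_stack([ones, x, w])            # structural-equation regressors
Z = np.column_stack([ones, z, w])            # all exogenous variables

# (1) Closed form: b = [X'Z (Z'Z)^-1 Z'X]^-1 X'Z (Z'Z)^-1 Z'y
A = X.T @ Z @ np.linalg.inv(Z.T @ Z)
b_formula = np.linalg.solve(A @ Z.T @ X, A @ Z.T @ y)

# (2) Two explicit OLS stages
pi_hat, *_ = np.linalg.lstsq(Z, x, rcond=None)   # first stage: x on every exogenous variable
x_hat = Z @ pi_hat
b_two_step, *_ = np.linalg.lstsq(np.column_stack([ones, x_hat, w]), y, rcond=None)

print(b_formula, b_two_step)
```

Only the point estimates carry over from the manual route. The second-stage OLS standard errors are built from residuals y - X_hat b rather than y - X b, so they are not valid 2SLS standard errors. That is one reason to let a dedicated routine such as `IV2SLS` do the work.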

### Example 15.5. Return to Education for Working Women

In [12]:
df = dataWoo("mroz")
df = df.dropna()
mreg1 = smf.ols('educ ~ exper + expersq + fatheduc + motheduc', data=df).fit()
hypotheses = '(fatheduc = motheduc = 0)'
f_test = mreg1.f_test(hypotheses)
print(f_test)

<F test: F=55.400300427777154, p=4.268908724631624e-22, df_denom=423, df_num=2>
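The F statistic above tests whether the excluded instruments are jointly significant in the first stage. The same number comes from comparing the sums of squared residuals of the unrestricted and restricted first-stage regressions:

F = [(SSR_r - SSR_ur)/q] / [SSR_ur/(n - k - 1)]

The numpy sketch below uses simulated stand-ins for educ, fatheduc and motheduc; their coefficients are invented. Only n = 428 and the five first-stage parameters mirror the cell above, so the denominator degrees of freedom come out as 423:

```python
import numpy as np


def ssr(y, X):
    """Sum of squared residuals from the OLS regression of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


rng = np.random.default_rng(0)
n = 428
exper = rng.uniform(0, 30, size=n)
z1 = rng.normal(size=n)        # stand-in for fatheduc
z2 = rng.normal(size=n)        # stand-in for motheduc
x = 10 + 0.05 * exper + 0.5 * z1 + 0.5 * z2 + rng.normal(size=n)   # stand-in for educ

ones = np.ones(n)
X_ur = np.column_stack([ones, exper, exper**2, z1, z2])   # unrestricted first stage
X_r = np.column_stack([ones, exper, exper**2])            # instruments excluded
q = 2                                                     # restrictions: coefficients on z1, z2 are 0
df_denom = n - X_ur.shape[1]                              # n - k - 1

F = ((ssr(x, X_r) - ssr(x, X_ur)) / q) / (ssr(x, X_ur) / df_denom)
print(f"F = {F:.2f} with ({q}, {df_denom}) degrees of freedom")
```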


In [13]:
print(IV2SLS.from_formula('lwage ~ 1 + [educ~fatheduc + motheduc] + exper + expersq', data=df).fit())

                          IV-2SLS Estimation Summary                          
Dep. Variable:                  lwage   R-squared:                      0.1357
Estimator:                    IV-2SLS   Adj. R-squared:                 0.1296
No. Observations:                 428   F-statistic:                    18.611
Date:                Tue, Jul 02 2024   P-value (F-stat)                0.0003
Time:                        18:15:34   Distribution:                  chi2(3)
Cov. Estimator:                robust                                         
                                                                              
                             Parameter Estimates                              
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept      0.0481     0.4278     0.1124     0.9105     -0.7903      0.8865
exper          0.0442     0.0155     2.8546     0.00

### Example 15.6. Using Two Test Scores as Indicators of Ability

In [14]:
print(IV2SLS.from_formula('lwage ~ 1 + educ + exper + tenure + married + south + urban + black + [IQ ~ KWW]', data=dataWoo("wage2")).fit())

                          IV-2SLS Estimation Summary                          
Dep. Variable:                  lwage   R-squared:                      0.1900
Estimator:                    IV-2SLS   Adj. R-squared:                 0.1830
No. Observations:                 935   F-statistic:                    356.33
Date:                Tue, Jul 02 2024   P-value (F-stat)                0.0000
Time:                        18:15:34   Distribution:                  chi2(8)
Cov. Estimator:                robust                                         
                                                                              
                             Parameter Estimates                              
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept      4.5925     0.3501     13.117     0.0000      3.9063      5.2786
educ           0.0250     0.0187     1.3410     0.17
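The regression above treats IQ as an error-ridden measure of ability and uses a second indicator, KWW, as its instrument. The logic can be checked on simulated data. If two indicators share the same unobserved factor but have independent measurement errors, OLS on one indicator is attenuated, while IV using the other recovers the coefficient. The data-generating process below is an assumption for illustration: ability and the errors have unit variance, and the true coefficient is 0.5. educ and the other controls are left out for brevity:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000
ability = rng.normal(size=n)                 # unobserved factor
q1 = ability + rng.normal(size=n)            # first indicator (plays the role of IQ)
q2 = ability + rng.normal(size=n)            # second indicator with independent error (KWW)
y = 1 + 0.5 * ability + rng.normal(size=n)   # outcome; true coefficient on ability is 0.5


def cov(a, b):
    """Sample covariance of two 1-D arrays."""
    return np.cov(a, b)[0, 1]


b_ols = cov(q1, y) / cov(q1, q1)   # OLS of y on q1: attenuated toward 0.25
b_iv = cov(q2, y) / cov(q2, q1)    # IV of y on q1 using q2: close to 0.5
print(f"OLS: {b_ols:.3f}, IV: {b_iv:.3f}")
```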

### Example 15.7. Return to Education for Working Women

In [15]:
df = dataWoo("mroz")
df = df[(df['inlf']==1)]
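# Reduced-form residuals for educ, using every exogenous variable
v2 = smf.ols('educ ~ exper + expersq + fatheduc + motheduc', data=df).fit().resid
# educ equals its reduced-form fit plus v2, so once v2 is included the 2SLS first
# stage fits educ exactly and this reproduces OLS of lwage on educ, exper, expersq, v2;
# the t statistic on v2 is the regression-based test for endogeneity of educ
print(IV2SLS.from_formula('lwage ~ 1 + [educ ~ fatheduc + motheduc] + exper + expersq + v2', data=df).fit())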

                          IV-2SLS Estimation Summary                          
Dep. Variable:                  lwage   R-squared:                      0.1624
Estimator:                    IV-2SLS   Adj. R-squared:                 0.1544
No. Observations:                 428   F-statistic:                    87.093
Date:                Tue, Jul 02 2024   P-value (F-stat)                0.0000
Time:                        18:15:34   Distribution:                  chi2(4)
Cov. Estimator:                robust                                         
                                                                              
                             Parameter Estimates                              
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept      0.0481     0.4196     0.1146     0.9087     -0.7744      0.8706
exper          0.0442     0.0150     2.9382     0.00

In [16]:
print("The OLS estimate is ")
smf.ols('lwage ~ educ + exper + expersq', data=df).fit().params

The OLS estimate is 


Intercept   -0.522041
educ         0.107490
exper        0.041567
expersq     -0.000811
dtype: float64
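In the regression-based Hausman test, the reduced-form residual is added to the structural equation, which is then estimated by OLS. The coefficients on the original regressors equal the 2SLS estimates, and the t statistic on the residual tests the null that the regressor is exogenous. The numpy sketch below checks both claims on simulated data. The variables w, z1 and z2, the coefficients and the helper `ols()` are illustrative assumptions:

```python
import numpy as np


def ols(y, X):
    """OLS coefficients, residuals and conventional (non-robust) standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta, resid, np.sqrt(np.diag(sigma2 * XtX_inv))


rng = np.random.default_rng(3)
n = 2_000
w = rng.normal(size=n)                            # included exogenous variable
z1, z2 = rng.normal(size=n), rng.normal(size=n)   # excluded instruments
u = rng.normal(size=n)                            # structural error
x = 0.6 * z1 + 0.6 * z2 + 0.3 * w + rng.normal(size=n) + 0.5 * u   # endogenous regressor
y = 1 + 0.1 * x + 0.2 * w + u

ones = np.ones(n)
Z = np.column_stack([ones, w, z1, z2])            # every exogenous variable
_, v_hat, _ = ols(x, Z)                           # reduced-form residuals

# Control-function regression: y on 1, x, w and v_hat, by OLS
beta_cf, _, se_cf = ols(y, np.column_stack([ones, x, w, v_hat]))
t_v = beta_cf[3] / se_cf[3]                       # t statistic for H0: x is exogenous

# 2SLS for comparison: b = (X_hat'X)^-1 X_hat'y with X_hat = Z (Z'Z)^-1 Z'X
X = np.column_stack([ones, x, w])
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
b_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

print(beta_cf[:3], b_2sls, t_v)
```

A significant t statistic on v_hat is evidence that x is endogenous, as it is here by construction. When the statistic is insignificant, OLS and 2SLS should give similar estimates.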

### Example 15.8. Return to Education for Working Women

In [17]:
df = dataWoo("mroz")
df = df.dropna()
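# Overidentification test: regress the 2SLS residuals on all exogenous variables;
# LM = n * R-squared is chi-square with (2 instruments - 1 endogenous) = 1 df under H0
u1 = (IV2SLS.from_formula('lwage ~ 1 + [educ ~ fatheduc + motheduc] + exper + expersq', data=df).fit()).resids
wreg = smf.ols('u1 ~ exper + expersq + fatheduc + motheduc', data=df).fit()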
print(wreg.summary())

                            OLS Regression Results                            
Dep. Variable:                     u1   R-squared:                       0.001
Model:                            OLS   Adj. R-squared:                 -0.009
Method:                 Least Squares   F-statistic:                   0.09350
Date:                Tue, 02 Jul 2024   Prob (F-statistic):              0.984
Time:                        18:15:34   Log-Likelihood:                -436.70
No. Observations:                 428   AIC:                             883.4
Df Residuals:                     423   BIC:                             903.7
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.0110      0.141      0.078      0.9

In [18]:
LM1 = wreg.nobs * wreg.rsquared
LM1

0.3780713419638242

In [19]:
# Add huseduc as a third instrument: the LM statistic now has 3 - 1 = 2 df
u2 = (IV2SLS.from_formula('lwage ~ 1 + [educ ~ fatheduc + motheduc + huseduc] + exper + expersq', data=df).fit()).resids
wreg2 = smf.ols('u2 ~ exper + expersq + fatheduc + motheduc + huseduc', data=df).fit()
LM2 = wreg2.nobs * wreg2.rsquared
LM2

1.1150430012569967
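Each LM statistic is compared with a chi-square distribution whose degrees of freedom equal the number of overidentifying restrictions: 1 for LM1 and 2 for LM2. The p-values can be sketched with the standard library only, using the closed-form chi-square tail probabilities for 1 and 2 degrees of freedom. `scipy.stats.chi2.sf` would give the same numbers for any df:

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability P(X > x) for df = 1 or df = 2 (closed forms)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("closed form implemented only for df = 1 or 2")


LM1 = 0.3780713419638242   # fatheduc, motheduc: 1 overidentifying restriction
LM2 = 1.1150430012569967   # adding huseduc: 2 overidentifying restrictions
p1 = chi2_sf(LM1, df=1)
p2 = chi2_sf(LM2, df=2)
print(f"p1 = {p1:.3f}, p2 = {p2:.3f}")  # p1 = 0.539, p2 = 0.573
```

Both p-values are far above conventional significance levels, so the overidentifying restrictions are not rejected in either specification.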

In [20]:
(IV2SLS.from_formula('lwage ~ 1 + [educ ~ fatheduc + motheduc + huseduc] + exper + expersq', data=df).fit()).params

Intercept   -0.186857
exper        0.043097
expersq     -0.000863
educ         0.080392
Name: parameter, dtype: float64

In [21]:
(IV2SLS.from_formula('lwage ~ 1 + [educ ~ fatheduc + motheduc] + exper + expersq', data=df).fit()).params

Intercept    0.048100
exper        0.044170
expersq     -0.000899
educ         0.061397
Name: parameter, dtype: float64

### Example 15.9. Effect of Education on Fertility

In [22]:
df = dataWoo("fertil1")
print(IV2SLS.from_formula('kids ~ 1 + [educ ~ meduc + feduc] + age + agesq + black + east + northcen + west + farm + othrural + town + smcity + y74 + y76 + y78 + y80 + y82 + y84', data=df).fit())
print(smf.ols('kids ~ educ + age + agesq + black + east + northcen + west + farm + othrural + town + smcity + y74 + y76 + y78 + y80 + y82 + y84', data=df).fit().summary())


                          IV-2SLS Estimation Summary                          
Dep. Variable:                   kids   R-squared:                      0.1281
Estimator:                    IV-2SLS   Adj. R-squared:                 0.1148
No. Observations:                1129   F-statistic:                    150.13
Date:                Tue, Jul 02 2024   P-value (F-stat)                0.0000
Time:                        18:15:35   Distribution:                 chi2(17)
Cov. Estimator:                robust                                         
                                                                              
                             Parameter Estimates                              
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept     -7.2412     3.1890    -2.2707     0.0232     -13.491     -0.9910
age            0.5236     0.1395     3.7540     0.00

In [23]:
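# Test for endogeneity of educ (regression-based Hausman test).
# Note: the standard test uses residuals from a reduced form that contains all
# exogenous variables (age, agesq, black, ..., y84) together with meduc and feduc;
# this reduced form uses only meduc and feduc, so its v2 is not that residual and the
# educ coefficient below need not match the 2SLS estimate above
v2 = smf.ols('educ ~ meduc + feduc', data=df).fit().resid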
print(IV2SLS.from_formula(
    'kids ~ 1 + [educ ~ meduc + feduc] + age + agesq + black + east + northcen + west + farm + othrural + town + smcity + y74 + y76 + y78 + y80 + y82 + y84 + v2', 
    data=df).fit())

                          IV-2SLS Estimation Summary                          
Dep. Variable:                   kids   R-squared:                      0.1299
Estimator:                    IV-2SLS   Adj. R-squared:                 0.1158
No. Observations:                1129   F-statistic:                    177.40
Date:                Tue, Jul 02 2024   P-value (F-stat)                0.0000
Time:                        18:15:35   Distribution:                 chi2(18)
Cov. Estimator:                robust                                         
                                                                              
                             Parameter Estimates                              
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept     -7.4075     3.1199    -2.3743     0.0176     -13.522     -1.2926
age            0.5305     0.1380     3.8437     0.00

### Example 15.10. Job Training and Worker Productivity

In [24]:
df = dataWoo('jtrain')
df = df[(df['year']==1988)]
print(smf.ols(formula='chrsemp ~ cgrant + 1', data=df).fit().summary())

                            OLS Regression Results                            
Dep. Variable:                chrsemp   R-squared:                       0.392
Model:                            OLS   Adj. R-squared:                  0.387
Method:                 Least Squares   F-statistic:                     79.37
Date:                Tue, 02 Jul 2024   Prob (F-statistic):           5.69e-15
Time:                        18:15:35   Log-Likelihood:                -515.77
No. Observations:                 125   AIC:                             1036.
Df Residuals:                     123   BIC:                             1041.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.5093      1.558      0.327      0.7

In [25]:
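# dropna() with no arguments drops an observation if *any* column is missing, which
# leaves only 22 observations below; df.dropna(subset=['clscrap', 'chrsemp', 'cgrant'])
# would keep every firm that has the three variables this model uses
df2 = df.dropna()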
print(IV2SLS.from_formula('clscrap ~ 1 + [chrsemp ~ cgrant]', data=df2).fit())

                          IV-2SLS Estimation Summary                          
Dep. Variable:                clscrap   R-squared:                      0.0396
Estimator:                    IV-2SLS   Adj. R-squared:                -0.0084
No. Observations:                  22   F-statistic:                    1.7860
Date:                Tue, Jul 02 2024   P-value (F-stat)                0.1814
Time:                        18:15:35   Distribution:                  chi2(1)
Cov. Estimator:                robust                                         
                                                                              
                             Parameter Estimates                              
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept     -0.0635     0.0875    -0.7258     0.4679     -0.2349      0.1079
chrsemp       -0.0129     0.0096    -1.3364     0.18
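`DataFrame.dropna()` with no arguments discards a row if any column is missing, so it can shrink the estimation sample far more than the model requires. The toy pandas sketch below contrasts it with the `subset` argument, which checks only the model's variables. The column `extra` and all values are made up:

```python
import numpy as np
import pandas as pd

# Toy data: 'extra' is a column the model never uses, but it has gaps
toy = pd.DataFrame({
    "clscrap": [0.1, -0.2, 0.05, np.nan],
    "chrsemp": [10.0, 0.0, 5.0, 2.0],
    "cgrant": [1, 0, 1, 0],
    "extra": [np.nan, 3.0, np.nan, 4.0],
})

all_cols = toy.dropna()                                            # NaN in any column -> row dropped
model_cols = toy.dropna(subset=["clscrap", "chrsemp", "cgrant"])   # only the model's variables are checked
print(len(all_cols), len(model_cols))  # 1 3
```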

In [26]:
# With no [endogenous ~ instruments] block, IV2SLS estimates the equation by OLS
print(IV2SLS.from_formula(formula='clscrap ~ 1 + chrsemp', data=df2).fit())

                            OLS Estimation Summary                            
Dep. Variable:                clscrap   R-squared:                      0.1011
Estimator:                        OLS   Adj. R-squared:                 0.0561
No. Observations:                  22   F-statistic:                    3.4283
Date:                Tue, Jul 02 2024   P-value (F-stat)                0.0641
Time:                        18:15:35   Distribution:                  chi2(1)
Cov. Estimator:                robust                                         
                                                                              
                             Parameter Estimates                              
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Intercept     -0.1138     0.0850    -1.3395     0.1804     -0.2804      0.0527
chrsemp       -0.0072     0.0039    -1.8516     0.06