# Instrumental Variables 

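The DAG below contains all the information needed to understand the
instrumental variables strategy. It shows a chain of causal effects running
from the instrument $Z$ through the treatment $D$ to the outcome, with an
unobserved confounder $U$.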

![dag-instrument](https://mixtape.scunning.com/07-Instrumental_Variables_files/figure-html/iv_dag1-1.png)

The key point is that we need to find an instrumental variable $Z$ that satisfies two conditions:

- $Z$ is independent of $U$ (exogeneity)
- $Z$ is correlated with $D$ (relevance)
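
To see why these two conditions are enough, here is a sketch for a single treatment under a linear, constant-effect model. The model and the symbols $\alpha$ and $\beta$ are assumptions made for illustration; they are not part of the DAG itself.

$$
\begin{aligned}
Y &= \alpha + \beta D + U \\
\operatorname{Cov}(Z, Y) &= \beta \operatorname{Cov}(Z, D) + \operatorname{Cov}(Z, U) \\
\beta &= \frac{\operatorname{Cov}(Z, Y)}{\operatorname{Cov}(Z, D)}
\quad \text{if } \operatorname{Cov}(Z, U) = 0 \text{ and } \operatorname{Cov}(Z, D) \neq 0
\end{aligned}
$$

The first condition removes the confounding term $\operatorname{Cov}(Z, U)$, and the second keeps the denominator away from zero.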

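When a relevant variable is omitted, the coefficients estimated by OLS
are biased. Here is a simple derivation. To avoid a clash with the
instrument $Z$, write the omitted variable as $W$, so that the true model is
$Y = X\beta + W\gamma + \epsilon$ with $E[\epsilon \mid X, W] = 0$, while we
regress $Y$ on $X$ alone:

$$
\begin{aligned}
\hat{\beta} & = (X'X)^{-1}X'(X\beta + W \gamma + \epsilon ) \\
            & = \beta + (X'X)^{-1}X'W \gamma + (X'X)^{-1}X' \epsilon \\
E(\hat{\beta} \mid X) &= \beta +  (X'X)^{-1} X' E[W \mid X] \gamma \\
               & = \beta + \text{bias}
\end{aligned}
$$

The bias is zero only when the omitted variable does not matter
($\gamma = 0$) or is unrelated to the included regressors.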
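
The bias term in the derivation above can be made concrete with a small simulation. This is a minimal sketch on simulated data: the variable names (`u`, `z`, `d`, `y`), the seed, the sample size, and all coefficients are assumptions chosen for illustration. With these values, the large-sample bias of the short regression is $\operatorname{Cov}(d, u)/\operatorname{Var}(d) = 1/2.25 \approx 0.44$, so the coefficient on `d` should come out close to 2.44 rather than the true value of 2.

```stata
* Simulated data: u is the omitted confounder, z is the instrument
clear
set seed 12345
set obs 10000
generate u = rnormal()
generate z = rnormal()
* the treatment depends on both the instrument and the confounder
generate d = 0.5*z + u + rnormal()
* the true effect of d on y is 2
generate y = 1 + 2*d + u + rnormal()

* short regression: u is omitted, so the coefficient on d is biased upward
regress y d
* long regression: infeasible in practice because u is unobserved
regress y d u
* IV: uses only the variation in d that comes from z
ivregress 2sls y (d = z)
```

The long regression and the IV regression should both give a coefficient near 2, but only the IV regression is available when `u` is unobserved.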

To estimate coefficients with instrumental variables, we can either
compute the IV estimator directly or use two-stage least squares (2SLS).
The most common IV specification, with exactly as many instruments as
regressors (the just-identified case), uses the following estimator:

$$\hat{\beta}_{IV} = (Z'X)^{-1} Z'Y$$

With two-stage least squares, the procedure is as follows.

Stage 1: Regress each column of $X$ on $Z$:

$$\hat{\delta} = (Z'Z)^{-1} Z' X $$

Then save the predicted values:

$$\hat{X} = Z \hat{\delta} $$

Stage 2: Estimate the regression of interest as usual, except that each endogenous covariate is replaced with its predicted values from the first stage:

$$Y = \hat{X} \beta + \epsilon $$
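
The two routes give the same point estimate in the just-identified case. A short sketch of the algebra, where the projection matrix $P_Z = Z(Z'Z)^{-1}Z'$ is notation introduced here (so that $\hat{X} = P_Z X$) and $Z'X$ is assumed to be square and invertible:

$$
\begin{aligned}
\hat{\beta}_{2SLS} &= (\hat{X}'\hat{X})^{-1} \hat{X}'Y \\
                   &= (X' P_Z X)^{-1} X' P_Z Y \\
                   &= \left( X'Z (Z'Z)^{-1} Z'X \right)^{-1} X'Z (Z'Z)^{-1} Z'Y \\
                   &= (Z'X)^{-1} (Z'Z) (X'Z)^{-1} X'Z (Z'Z)^{-1} Z'Y \\
                   &= (Z'X)^{-1} Z'Y = \hat{\beta}_{IV}
\end{aligned}
$$

The second line uses the fact that $P_Z$ is symmetric and idempotent. When there are more instruments than regressors, $Z'X$ is not square and only the second line applies.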

## Weak Instrument Problem

To understand the weak instrument problem, read Bound, Jaeger, and Baker (1995), _Problems with Instrumental Variables Estimation When the Correlation Between the Instruments and the Endogenous Explanatory Variable Is Weak_.

A more recent survey of how to address this issue is Andrews, Stock, and Sun (2019), _Weak Instruments in IV Regression: Theory and Practice_.

NBER has some mini courses related to this topic: [NBER Methods Lectures](https://www.nber.org/research/lectures?endDate=&facet=lectureType%3AMethods%20Lecture&page=1&perPage=10&q=&startDate=)

Simple rule of thumb: use the F-statistic to test the joint significance of the excluded instruments in the first stage. A first-stage F-statistic smaller than 10 suggests that the instruments are weak.
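
To see how the rule of thumb behaves, the following sketch compares a strong and a weak instrument on simulated data. All variable names, the seed, and the coefficients (0.5 for the strong first stage, 0.02 for the weak one) are assumptions for illustration. `estat firststage` is the `ivregress` postestimation command that reports the first-stage F-statistic.

```stata
clear
set seed 2024
set obs 1000
generate u = rnormal()
generate z = rnormal()
* strong first stage: z moves d_strong a lot
generate d_strong = 0.5*z + u + rnormal()
* weak first stage: z barely moves d_weak
generate d_weak = 0.02*z + u + rnormal()
* the true effect is 2 in both outcome equations
generate y_strong = 1 + 2*d_strong + u + rnormal()
generate y_weak = 1 + 2*d_weak + u + rnormal()

ivregress 2sls y_strong (d_strong = z)
estat firststage
ivregress 2sls y_weak (d_weak = z)
estat firststage
```

In the strong case the first-stage F-statistic should typically be far above 10 and the estimate close to 2. In the weak case the F-statistic should typically fall well below 10, and the IV estimate and its standard error become unreliable.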

## Homogeneous vs. Heterogeneous Treatment Effects 

The model, the estimation, and the interpretation all have to be adapted
depending on whether the treatment effects are assumed to be homogeneous or
heterogeneous.
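
As a sketch of why the distinction matters, consider a binary instrument and a binary treatment, and assume the instrument never pushes anyone away from treatment (monotonicity). None of these assumptions are stated above; $\delta_i$ denotes the effect for unit $i$ and $D_i(z)$ the treatment that unit $i$ takes when the instrument equals $z$.

$$
\begin{aligned}
\text{Homogeneous } (\delta_i = \delta): \quad
& \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[D \mid Z = 1] - E[D \mid Z = 0]} = \delta \\
\text{Heterogeneous}: \quad
& \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[D \mid Z = 1] - E[D \mid Z = 0]} = E[\delta_i \mid D_i(1) > D_i(0)]
\end{aligned}
$$

In the heterogeneous case the IV estimand is a local average treatment effect: the average effect for the units whose treatment status is changed by the instrument, not for the whole population.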



In [1]:
use https://github.com/scunning1975/mixtape/raw/master/card.dta, clear

In [3]:
describe lwage educ exper black south married smsa


              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------
lwage           float   %9.0g                 
educ            float   %9.0g                 
exper           float   %9.0g                 
black           float   %9.0g                 
south           float   %9.0g                 
married         float   %9.0g                 
smsa            float   %9.0g                 


In [4]:
reg lwage educ exper black south married smsa 


      Source |       SS           df       MS      Number of obs   =     3,003
-------------+----------------------------------   F(6, 2996)      =    219.15
       Model |  180.255137         6  30.0425229   Prob > F        =    0.0000
    Residual |  410.705979     2,996  .137084773   R-squared       =    0.3050
-------------+----------------------------------   Adj R-squared   =    0.3036
       Total |  590.961117     3,002  .196855802   Root MSE        =    .37025

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0711729   .0034824    20.44   0.000     .0643447     .078001
       exper |   .0341518   .0022144    15.42   0.000     .0298098    .0384938
       black |  -.1660274   .0176137    -9.43   0.000    -.2005636   -.1314913
       south |  -.1315518   .0149691    -8.79   0.

In [7]:
ivregress 2sls lwage (educ=nearc4) exper black south married smsa


Instrumental variables (2SLS) regression          Number of obs   =      3,003
                                                  Wald chi2(6)    =     840.98
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.2513
                                                  Root MSE        =     .38384

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1241642   .0498975     2.49   0.013     .0263668    .2219616
       exper |   .0555882   .0202624     2.74   0.006     .0158746    .0953019
       black |  -.1156855   .0506823    -2.28   0.022    -.2150211     -.01635
       south |  -.1131647   .0232168    -4.87   0.000    -.1586687   -.0676607
     married |  -.0319754    .005081    -6.29   0.

In [9]:
reg educ nearc4 exper black south married smsa


      Source |       SS           df       MS      Number of obs   =     3,003
-------------+----------------------------------   F(6, 2996)      =    456.14
       Model |  10272.0963         6  1712.01605   Prob > F        =    0.0000
    Residual |  11244.7835     2,996  3.75326552   R-squared       =    0.4774
-------------+----------------------------------   Adj R-squared   =    0.4764
       Total |  21516.8798     3,002  7.16751492   Root MSE        =    1.9373

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      nearc4 |   .3272826   .0824239     3.97   0.000     .1656695    .4888957
       exper |   -.404434   .0089402   -45.24   0.000    -.4219636   -.3869044
       black |  -.9475281   .0905256   -10.47   0.000    -1.125027   -.7700295
       south |  -.2973528   .0790643    -3.76   0.

In [10]:
test nearc4


 ( 1)  nearc4 = 0

       F(  1,  2996) =   15.77
            Prob > F =    0.0001
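
With a single excluded instrument, the F-statistic reported by `test nearc4` is the square of the first-stage t-statistic on `nearc4`. A quick check using the coefficient and standard error from the first-stage regression above:

$$
F = t^2 = \left( \frac{0.3272826}{0.0824239} \right)^2 \approx 3.97^2 \approx 15.77
$$

Since 15.77 is larger than 10, `nearc4` passes the simple rule of thumb for instrument strength.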


In [11]:
ivregress 2sls lwage (educ=nearc4) exper black south married smsa, first


First-stage regressions
-----------------------

                                                Number of obs     =      3,003
                                                F(   6,   2996)   =     456.14
                                                Prob > F          =     0.0000
                                                R-squared         =     0.4774
                                                Adj R-squared     =     0.4764
                                                Root MSE          =     1.9373

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |   -.404434   .0089402   -45.24   0.000    -.4219636   -.3869044
       black |  -.9475281   .0905256   -10.47   0.000    -1.125027   -.7700295
       south |  -.2973528   .0790643    -3.76   0.000    -.4523787   -.1423269
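
The two stages can also be run by hand on the same data, which makes the mechanics of 2SLS explicit. This is a sketch only; `educ_hat` is a hypothetical variable name introduced here.

```stata
use https://github.com/scunning1975/mixtape/raw/master/card.dta, clear
* Stage 1: regress the endogenous variable on the instrument and the exogenous controls
regress educ nearc4 exper black south married smsa
* save the fitted values of educ
predict educ_hat, xb
* Stage 2: replace educ with its fitted values
regress lwage educ_hat exper black south married smsa
```

The coefficient on `educ_hat` should reproduce the `educ` coefficient from `ivregress 2sls`, since both regressions use the same 3,003 observations. The standard errors from the manual second stage are not valid, because they are computed from residuals based on `educ_hat` rather than `educ`, so `ivregress` should be used for inference.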
 

In [12]:
* another example 
use http://fmwww.bc.edu/ec-p/data/wooldridge/mroz, clear

In [14]:
reg lwage educ 


      Source |       SS           df       MS      Number of obs   =       428
-------------+----------------------------------   F(1, 426)       =     56.93
       Model |  26.3264237         1  26.3264237   Prob > F        =    0.0000
    Residual |  197.001028       426  .462443727   R-squared       =    0.1179
-------------+----------------------------------   Adj R-squared   =    0.1158
       Total |  223.327451       427  .523015108   Root MSE        =    .68003

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1086487   .0143998     7.55   0.000     .0803451    .1369523
       _cons |  -.1851969   .1852259    -1.00   0.318    -.5492674    .1788735
------------------------------------------------------------------------------


In [15]:
reg lwage educ exper 


      Source |       SS           df       MS      Number of obs   =       428
-------------+----------------------------------   F(2, 425)       =     37.02
       Model |   33.132464         2   16.566232   Prob > F        =    0.0000
    Residual |  190.194987       425  .447517617   R-squared       =    0.1484
-------------+----------------------------------   Adj R-squared   =    0.1444
       Total |  223.327451       427  .523015108   Root MSE        =    .66897

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1094888   .0141672     7.73   0.000     .0816423    .1373353
       exper |   .0156736   .0040191     3.90   0.000     .0077738    .0235733
       _cons |  -.4001745   .1903682    -2.10   0.036     -.774355    -.025994
--------------------------------------------------

In [16]:
reg lwage educ exper expersq 


      Source |       SS           df       MS      Number of obs   =       428
-------------+----------------------------------   F(3, 424)       =     26.29
       Model |  35.0223023         3  11.6741008   Prob > F        =    0.0000
    Residual |  188.305149       424  .444115917   R-squared       =    0.1568
-------------+----------------------------------   Adj R-squared   =    0.1509
       Total |  223.327451       427  .523015108   Root MSE        =    .66642

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1074896   .0141465     7.60   0.000     .0796837    .1352956
       exper |   .0415665   .0131752     3.15   0.002     .0156697    .0674633
     expersq |  -.0008112   .0003932    -2.06   0.040    -.0015841   -.0000382
       _cons |  -.5220407   .1986321    -2.63   0.

In [17]:
// iv with father's education
ivreg lwage (educ = fatheduc) exper expersq


Instrumental variables (2SLS) regression

      Source |       SS           df       MS      Number of obs   =       428
-------------+----------------------------------   F(3, 424)       =      8.31
       Model |  31.9407914         3  10.6469305   Prob > F        =    0.0000
    Residual |   191.38666       424  .451383632   R-squared       =    0.1430
-------------+----------------------------------   Adj R-squared   =    0.1370
       Total |  223.327451       427  .523015108   Root MSE        =    .67185

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0702263   .0344427     2.04   0.042     .0025266     .137926
       exper |   .0436716   .0134001     3.26   0.001     .0173326    .0700105
     expersq |  -.0008822   .0004009    -2.20   0.028    -.0016702   -.0000941
       _

In [18]:
// first stage F test
reg educ fatheduc exper expersq 


      Source |       SS           df       MS      Number of obs   =       753
-------------+----------------------------------   F(3, 749)       =     67.20
       Model |  829.228297         3  276.409432   Prob > F        =    0.0000
    Residual |  3080.81154       749  4.11323304   R-squared       =    0.2121
-------------+----------------------------------   Adj R-squared   =    0.2089
       Total |  3910.03984       752  5.19952106   Root MSE        =    2.0281

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    fatheduc |   .2840103   .0208203    13.64   0.000     .2431373    .3248833
       exper |   .0879065   .0263852     3.33   0.001     .0361087    .1397043
     expersq |  -.0020435   .0008544    -2.39   0.017    -.0037208   -.0003662
       _cons |   9.214374   .2467006    37.35   0.

In [19]:
test fatheduc


 ( 1)  fatheduc = 0

       F(  1,   749) =  186.08
            Prob > F =    0.0000
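
Note that the first-stage regression above uses 753 observations, while the IV regression uses only 428, presumably because `lwage` is missing for the remaining observations. A sketch of the same F test restricted to the estimation sample, assuming missing `lwage` is the only source of the difference:

```stata
use http://fmwww.bc.edu/ec-p/data/wooldridge/mroz, clear
* keep only observations that enter the wage equation
regress educ fatheduc exper expersq if !missing(lwage)
test fatheduc
```

The F-statistic from this restricted regression is the one that corresponds to the IV estimate reported earlier.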


Now we try to replicate _Estimating the Electoral Effects of Voter Turnout_
with the following model:

$$
\text{DemoShare} = \beta_0 + \beta_1 \text{Turnout} + u
$$


In [24]:
use "./data/HansfordGomez_Data.dta", clear

In [31]:
reg demvoteshare2 turnout yr52 yr56 yr60 yr64 yr68 yr72 yr76 yr80 yr84 yr88 yr92 yr96 yr2000


      Source |       SS           df       MS      Number of obs   =    27,401
-------------+----------------------------------   F(14, 27386)    =    763.15
       Model |  1185257.54        14  84661.2526   Prob > F        =    0.0000
    Residual |  3038098.02    27,386  110.936172   R-squared       =    0.2806
-------------+----------------------------------   Adj R-squared   =    0.2803
       Total |  4223355.55    27,400  154.137064   Root MSE        =    10.533

------------------------------------------------------------------------------
demvotesha~2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     turnout |  -.1571625   .0068735   -22.87   0.000    -.1706349   -.1436901
        yr52 |  -10.21493    .344541   -29.65   0.000    -10.89025   -9.539616
        yr56 |  -8.756203   .3426223   -25.56   0.000     -9.42776   -8.084646
        yr60 |  -3.861622   .3497573   -11.04   0.

In [34]:
ivreg demvoteshare2 (turnout=dnormprcp_krig) yr52 yr56 yr60 yr64 yr68 yr72 yr76 yr80 yr84 yr88 yr92 yr96 yr2000


Instrumental variables (2SLS) regression

      Source |       SS           df       MS      Number of obs   =    27,401
-------------+----------------------------------   F(14, 27386)    =    600.55
       Model |  549701.891        14  39264.4208   Prob > F        =    0.0000
    Residual |  3673653.66    27,386  134.143492   R-squared       =    0.1302
-------------+----------------------------------   Adj R-squared   =    0.1297
       Total |  4223355.55    27,400  154.137064   Root MSE        =    11.582

------------------------------------------------------------------------------
demvotesha~2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     turnout |   .3630942   .1752713     2.07   0.038     .0195536    .7066348
        yr52 |   -15.8325   1.928347    -8.21   0.000    -19.61216   -12.05284
        yr56 |  -13.65554   1.691514    -8.07   0.000    -16.97099   -10.34009
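
As with the earlier examples, the strength of the rainfall instrument can be checked with a first-stage F test. This is a sketch that reuses only the variable names and the file path from the cells above. The restriction to non-missing `demvoteshare2` is an assumption, added so that the first stage uses the same sample as the IV regression.

```stata
use "./data/HansfordGomez_Data.dta", clear
* first stage: turnout on the instrument and the year dummies
regress turnout dnormprcp_krig yr52 yr56 yr60 yr64 yr68 yr72 yr76 yr80 yr84 yr88 yr92 yr96 yr2000 if !missing(demvoteshare2)
test dnormprcp_krig
```

An F-statistic well above 10 would support treating `dnormprcp_krig` as a strong instrument for `turnout`.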
        