# AQA-II Lab 10 Propensity Score Matching
### Eric G. Zhou, NYU Wagner School

## Treatment effects

Assume we have a binary treatment $T_i \in \{0, 1\}$ and two potential outcomes, $Y_{1,i}$ and $Y_{0,i}$, denoting the treated and untreated outcome of unit $i$, respectively. For each unit we observe only one of the two. In an observational study, we usually estimate

$$ \underbrace{\hat{\beta}}_{\text{naive treatment effect estimator}} = E[ (Y_{1,i}|T_i = 1)]  - E[(Y_{0,i}|T_i = 0)],$$

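Now imagine a counterfactual world in which we could observe both potential outcomes for every unit. The treated outcome would then be available both for the units that were actually treated, $(Y_{1,i}|T_i = 1)$, and for those that were not, $(Y_{1,i}|T_i = 0)$; the latter is counterfactual and never observed. The same holds for the untreated outcome: $(Y_{0,i}|T_i = 0)$ is observed, while $(Y_{0,i}|T_i = 1)$ is counterfactual.

**Now, can you see what might be wrong with the equation above?** Add and subtract the counterfactual term $E[(Y_{0,i}|T_i = 1)]$: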

\begin{align}
\underbrace{\hat{\beta}}_{\text{naive treatment effect estimator}} & = E[(Y_{1,i}|T_i = 1)] - E[(Y_{0,i}|T_i = 1)] + E[(Y_{0,i}|T_i = 1)] - E[(Y_{0,i}|T_i = 0)] \\
& = \underbrace{E[(Y_{1,i} - Y_{0,i}|T_i = 1)]}_{\text{ATET}} + \underbrace{E[(Y_{0,i}|T_i = 1)] - E[(Y_{0,i}|T_i = 0)]}_{\text{selection bias}}
\end{align}
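
The decomposition can be checked numerically in a simulation, where both potential outcomes are known. The sketch below is a hypothetical example, not the Lalonde data: the variable names (`y0`, `y1`, `t`, `y`), the effect size of 3, and the selection rule are all assumptions chosen for illustration. Note that it starts with `clear`, so run it before loading any data you want to keep.

```stata
* Hypothetical simulation: units with a high untreated outcome select into treatment
clear
set seed 12345
set obs 10000
* untreated potential outcome
generate y0 = rnormal(10, 2)
* treated potential outcome: constant treatment effect of 3
generate y1 = y0 + 3
* selection into treatment depends on y0, so the CIA fails without controls
generate t = (y0 + rnormal() > 10)
* observed outcome
generate y = cond(t, y1, y0)

quietly summarize y if t == 1
scalar m1 = r(mean)
quietly summarize y if t == 0
scalar m0 = r(mean)
* counterfactual mean E[Y0 | T = 1], visible only because this is a simulation
quietly summarize y0 if t == 1
scalar m01 = r(mean)

display "naive difference = " m1 - m0
display "ATET             = " m1 - m01
display "selection bias   = " m01 - m0
```

By construction the ATET equals 3 up to rounding, and the naive difference is the sum of the other two lines, so any gap between the naive difference and 3 is selection bias.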


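However, if the **conditional independence assumption (CIA)** holds, i.e., $(Y_{0,i}, Y_{1,i}) \perp T_i \mid x_i$, the naive estimator identifies a different treatment effect from the one above. Why?

Because in that case the potential outcomes are independent of treatment status once we condition on $x$, so the treated and untreated groups have the same expected potential outcomes: $E[(Y_{1,i}|T_i = 1)] = E[(Y_{1,i}|T_i = 0)]$ and $E[(Y_{0,i}|T_i = 1)] = E[(Y_{0,i}|T_i = 0)]$, where the conditioning on $x$ is suppressed to keep the notation light.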


So the first equation becomes 

\begin{align}
\underbrace{\hat{\beta}}_{\text{naive treatment effect estimator}} & = E[ (Y_{1,i}|T_i = 1)]  - E[(Y_{0,i}|T_i = 0)] \\
& = \underbrace{E[ (Y_{1,i}|T_i = 1)]  - E[(Y_{0,i}|T_i = 1)]}_{\text{ATET}} \\
& = \underbrace{E[ (Y_{1,i}|T_i = 0)]  - E[(Y_{0,i}|T_i = 0)]}_{\text{ATC}} \\
& = \underbrace{E[ Y_{1,i}] - E[Y_{0,i}]}_{\text{ATE}}
\end{align}

This is what a **Randomized Controlled Trial (RCT)** achieves by design.
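
With observational data the equalities only hold within cells of $x$, so the conditioning has to be made explicit. A sketch of the missing step, assuming in addition overlap ($0 < \text{Pr}[T_i = 1|x] < 1$) and writing $Y_i = T_i Y_{1,i} + (1 - T_i) Y_{0,i}$ for the observed outcome and $E_x$ for the expectation over the distribution of $x$ (notation introduced here, not used elsewhere in the lab):

\begin{align}
E[Y_{1,i}] - E[Y_{0,i}] & = E_x\big[ E[Y_{1,i}|x] - E[Y_{0,i}|x] \big] \\
& = E_x\big[ E[Y_{1,i}|T_i = 1, x] - E[Y_{0,i}|T_i = 0, x] \big] \\
& = E_x\big[ E[Y_i|T_i = 1, x] - E[Y_i|T_i = 0, x] \big]
\end{align}

The first line is the law of iterated expectations, the second uses the CIA, and the third contains only observable quantities. Matching and weighting estimators are different ways of averaging these within-$x$ comparisons.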

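In the propensity score matching framework, although we assume the CIA to obtain the ATE, we can still adjust the propensity weights to calculate the ATET and ATC. Writing $D$ for the treatment indicator ($T$ above), $y$ for the observed outcome, and $p(x) = \text{Pr}[D = 1|x]$ for the propensity score, we have, for example,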

$$ \text{ATE} = E[y_1 - y_0] = E\big[ \frac{(D - p(x))y}{p(x)(1-p(x))}  \big]$$

$$ \text{ATET} = E[y_1 - y_0| D =1 ] = E\big[ \frac{(D - p(x))y}{\text{Pr}[D=1](1-p(x))}  \big]$$

Question for you all: how would you estimate $\text{Pr}[D=1]$?
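
To see why the weighting formula for the ATE recovers a difference in mean potential outcomes, it helps to split the weight into a treated part and an untreated part. The following is a sketch under the CIA and overlap ($0 < p(x) < 1$), using $Dy = Dy_1$ and $(1 - D)y = (1 - D)y_0$:

\begin{align}
\frac{D - p(x)}{p(x)(1 - p(x))} & = \frac{D(1 - p(x)) - (1 - D)p(x)}{p(x)(1 - p(x))} = \frac{D}{p(x)} - \frac{1 - D}{1 - p(x)} \\
E\big[ \frac{D y}{p(x)} \big] & = E\big[ \frac{E[D|x] \, E[y_1|x]}{p(x)} \big] = E[y_1] \\
E\big[ \frac{(1 - D) y}{1 - p(x)} \big] & = E\big[ \frac{E[1 - D|x] \, E[y_0|x]}{1 - p(x)} \big] = E[y_0]
\end{align}

Subtracting the third line from the second gives $E[y_1] - E[y_0]$: treated units are weighted by $1/p(x)$ and untreated units by $1/(1 - p(x))$.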

## Various matching methods

In [11]:
import delim lalonde.csv, clear

(10 vars, 614 obs)


In [13]:
%head

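| v1 | treat | age | educ | race | married | nodegree | re74 | re75 | re78 |
|----|-------|-----|------|------|---------|----------|------|------|------|
| 1 | 1 | 37 | 11 | black | 1 | 1 | 0 | 0 | 9930.0459 |
| 2 | 1 | 22 | 9 | hispan | 0 | 1 | 0 | 0 | 3595.894 |
| 3 | 1 | 30 | 12 | black | 0 | 0 | 0 | 0 | 24909.449 |
| 4 | 1 | 27 | 11 | black | 0 | 1 | 0 | 0 | 7506.146 |
| 5 | 1 | 33 | 8 | black | 0 | 1 | 0 | 0 | 289.78989 |
| 6 | 1 | 22 | 9 | black | 0 | 1 | 0 | 0 | 4056.4939 |
| 7 | 1 | 23 | 12 | black | 0 | 0 | 0 | 0 | 0.0 |
| 8 | 1 | 32 | 11 | black | 0 | 1 | 0 | 0 | 8472.1582 |
| 9 | 1 | 22 | 16 | black | 0 | 0 | 0 | 0 | 2164.022 |
| 10 | 1 | 33 | 12 | white | 1 | 0 | 0 | 0 | 12418.07 |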


- The variable `re78` is the outcome and `treat` is the treatment indicator; the remaining variables, apart from the row identifier `v1`, serve as controls.

- A naive model 

In [20]:
reg re78 treat


      Source |       SS           df       MS      Number of obs   =       614
-------------+----------------------------------   F(1, 612)       =      0.93
       Model |  52124752.5         1  52124752.5   Prob > F        =    0.3342
    Residual |  3.4161e+10       612  55817843.2   R-squared       =    0.0015
-------------+----------------------------------   Adj R-squared   =   -0.0001
       Total |  3.4213e+10       613  55811818.5   Root MSE        =    7471.1

------------------------------------------------------------------------------
        re78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |  -635.0263   657.1374    -0.97   0.334    -1925.544    655.4916
       _cons |    6984.17   360.7097    19.36   0.000     6275.791    7692.549
------------------------------------------------------------------------------


In [56]:
ttest re78, by(treat)


Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |     429     6984.17    352.1655    7294.162    6291.981    7676.359
       1 |     185    6349.144    578.4229    7867.402    5207.949    7490.338
---------+--------------------------------------------------------------------
combined |     614    6792.834    301.4942    7470.731    6200.748    7384.921
---------+--------------------------------------------------------------------
    diff |            635.0263    657.1374               -655.4916    1925.544
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =   0.9664
Ho: diff = 0                                     degrees of freedom =      612

    Ha: dif

In [21]:
matrix naive = e(b) 

- A naive model with controls

In [22]:
reg re78 treat age educ i.race married nodegree re74 re75


      Source |       SS           df       MS      Number of obs   =       614
-------------+----------------------------------   F(9, 604)       =     11.64
       Model |  5.0554e+09         9   561713775   Prob > F        =    0.0000
    Residual |  2.9157e+10       604  48273544.4   R-squared       =    0.1478
-------------+----------------------------------   Adj R-squared   =    0.1351
       Total |  3.4213e+10       613  55811818.5   Root MSE        =    6947.9

------------------------------------------------------------------------------
        re78 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |   1548.244   781.2793     1.98   0.048     13.88986    3082.598
         age |   12.97763   32.48891     0.40   0.690    -50.82731    76.78258
        educ |   403.9412   158.9062     2.54   0.011     91.86538    716.0171
             |
        race |
     hispan  |   173

In [23]:
matrix naive_wc = e(b) 

- Plain PS matching

In [42]:
psmatch2 treat age educ i.race married nodegree re74 re75, out(re78) common 


Probit regression                               Number of obs     =        614
                                                LR chi2(8)        =     265.36
                                                Prob > chi2       =     0.0000
Log likelihood = -243.06433                     Pseudo R2         =     0.3531

------------------------------------------------------------------------------
       treat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |    .008647     .00771     1.12   0.262    -.0064642    .0237583
        educ |   .0921263   .0379202     2.43   0.015     .0178041    .1664485
             |
        race |
     hispan  |   -1.24772   .2068295    -6.03   0.000    -1.653098   -.8423414
      white  |  -1.775345   .1502617   -11.82   0.000    -2.069852   -1.480837
             |
     married |  -.4752956   .1638448    -2.90   0.004    -.7964255   -.1541657
    n
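
Before reading off the treatment effect, it is worth checking what the match did. The sketch below assumes the `psmatch2` call above has just been run, so that the variables it generates (`_pscore`, `_treated`, `_support`) are in memory, and it uses `pstest` and `psgraph`, which ship with the user-written `psmatch2` package (`ssc install psmatch2`). The covariate list passed to `pstest` is a choice made for this illustration.

```stata
* distribution of the estimated propensity score by treatment status
summarize _pscore if _treated == 1
summarize _pscore if _treated == 0
* how many treated and control units fall on the common support
tabulate _treated _support
* covariate balance before and after matching
pstest age educ married nodegree re74 re75, both
* histogram of propensity scores by treatment status
psgraph
```

If balance is still poor after matching, or the score distributions barely overlap, the matching estimates deserve less trust than the point estimates alone suggest.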

- Ask Stata to calculate the ATE for you by adding the `ate` option

In [43]:
psmatch2 treat age educ i.race married nodegree re74 re75, out(re78) ate common 


Probit regression                               Number of obs     =        614
                                                LR chi2(8)        =     265.36
                                                Prob > chi2       =     0.0000
Log likelihood = -243.06433                     Pseudo R2         =     0.3531

------------------------------------------------------------------------------
       treat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |    .008647     .00771     1.12   0.262    -.0064642    .0237583
        educ |   .0921263   .0379202     2.43   0.015     .0178041    .1664485
             |
        race |
     hispan  |   -1.24772   .2068295    -6.03   0.000    -1.653098   -.8423414
      white  |  -1.775345   .1502617   -11.82   0.000    -2.069852   -1.480837
             |
     married |  -.4752956   .1638448    -2.90   0.004    -.7964255   -.1541657
    n

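- However, no standard error is reported for the ATE. That's bad, but no worries: the `ai()` option of `psmatch2` requests analytical standard errors, with the argument setting the number of neighbors used to estimate the variance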

In [45]:
psmatch2 treat age educ i.race married nodegree re74 re75, out(re78) ate ai(5) common 
* store the point estimates that psmatch2 leaves in r() as 1x1 matrices
matrix ps_ate = r(ate)
matrix ps_att = r(att)
matrix ps_atc = r(atu)



Probit regression                               Number of obs     =        614
                                                LR chi2(8)        =     265.36
                                                Prob > chi2       =     0.0000
Log likelihood = -243.06433                     Pseudo R2         =     0.3531

------------------------------------------------------------------------------
       treat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |    .008647     .00771     1.12   0.262    -.0064642    .0237583
        educ |   .0921263   .0379202     2.43   0.015     .0178041    .1664485
             |
        race |
     hispan  |   -1.24772   .2068295    -6.03   0.000    -1.653098   -.8423414
      white  |  -1.775345   .1502617   -11.82   0.000    -2.069852   -1.480837
             |
     married |  -.4752956   .1638448    -2.90   0.004    -.7964255   -.1541657
    

In [47]:
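* stack the matching estimates; the rows of comb are ATE, ATT, ATC in that order
matrix comb =  ps_ate \ ps_att \ ps_atc 
matrix list comb
* compare with the coefficient on treat in the two regressions
matrix list naive
matrix list naive_wc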




comb[3,1]
           c1
r1  1240.5136
r1  1221.5907
r1  1249.5905


naive[1,2]
         treat       _cons
y1  -635.02625   6984.1698


naive_wc[1,11]
                                                1b.          2.          3.
         treat         age        educ        race        race        race
y1   1548.2438   12.977634   403.94124           0   1739.5409   1240.6441

                                                               
       married    nodegree        re74        re75       _cons
y1    406.6208   259.81735   .29637744   .23152588  -1174.1296
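
As a cross-check on the matching estimates, the weighting formula for the ATE from the first section can be computed by hand. This is a minimal sketch, assuming the Lalonde data are in memory with `race` numeric, as in the regressions above; the variable names `ps_hat` and `ipw_term` are hypothetical and are dropped at the end. It does not impose common support, so the result need not coincide with the `psmatch2` output.

```stata
* Step 1: propensity score from the same probit specification psmatch2 uses
quietly probit treat age educ i.race married nodegree re74 re75
predict double ps_hat, pr
* Step 2: sample analogue of E[(D - p(x)) y / (p(x)(1 - p(x)))]
generate double ipw_term = (treat - ps_hat) * re78 / (ps_hat * (1 - ps_hat))
quietly summarize ipw_term
display "IPW estimate of the ATE = " r(mean)
* Step 3: fitted scores close to 0 or 1 produce very large weights
summarize ps_hat, detail
drop ps_hat ipw_term
```

The last `summarize` is there because a handful of observations with extreme scores can dominate a weighting estimate, which is one reason to compare it with matching rather than rely on it alone.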


- Matching with a kernel smoother (normal kernel, bandwidth 0.1)

In [39]:
psmatch2 treat age educ i.race married nodegree re74 re75, kernel k(normal) bw(.1) out(re78) ate common


Probit regression                               Number of obs     =        614
                                                LR chi2(8)        =     265.36
                                                Prob > chi2       =     0.0000
Log likelihood = -243.06433                     Pseudo R2         =     0.3531

------------------------------------------------------------------------------
       treat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |    .008647     .00771     1.12   0.262    -.0064642    .0237583
        educ |   .0921263   .0379202     2.43   0.015     .0178041    .1664485
             |
        race |
     hispan  |   -1.24772   .2068295    -6.03   0.000    -1.653098   -.8423414
      white  |  -1.775345   .1502617   -11.82   0.000    -2.069852   -1.480837
             |
     married |  -.4752956   .1638448    -2.90   0.004    -.7964255   -.1541657
    n
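
Kernel matching depends on the bandwidth, so it is sensible to see how much the estimates move when `bw()` changes. The loop below is a sketch: the three bandwidth values are arbitrary choices for illustration, and it relies on `psmatch2` returning `r(ate)` and `r(att)`, as used elsewhere in this lab.

```stata
* re-run the kernel match for a few bandwidths and print the point estimates
foreach b in 0.05 0.1 0.2 {
    quietly psmatch2 treat age educ i.race married nodegree re74 re75, kernel k(normal) bw(`b') outcome(re78) ate common
    display "bw = `b'   ATE = " r(ate) "   ATT = " r(att)
}
```

Estimates that swing widely across bandwidths are a sign that the result rests on the smoothing choice more than on the data.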

In [40]:
psmatch2 treat age educ i.race married nodegree re74 re75, kernel k(normal) bw(.1) out(re78) ate common ai(2)

Option -ai- only allowed when doing nearest neighbor matching on the covariates 
> (Mahalanobis)


r(198);





- No standard errors here either, since `ai()` is rejected for kernel matching, as the error above shows. What to do? **Keep calm and bootstrap**

In [41]:
bs r(ate) : psmatch2 treat age educ i.race married nodegree re74 re75, kernel k(normal) bw(.1) outcome(re78) ate common 

(running psmatch2 on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
..................................................    50

Bootstrap results                               Number of obs     =        614
                                                Replications      =         50

      command:  psmatch2 treat age educ i.race married nodegree re74 re75,
                    kernel k(normal) bw(.1) outcome(re78) ate common
        _bs_1:  r(ate)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   1337.268   857.8912     1.56   0.119    -344.1678    3018.704
------------------------------------------------------------------------------
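
The run above uses the default of 50 replications and no seed, so the standard error will change from run to run. A sketch of a more reproducible version, where the number of replications and the seed are arbitrary values chosen for illustration:

```stata
* same estimator, more replications, fixed seed for reproducibility
bootstrap r(ate), reps(500) seed(12345): psmatch2 treat age educ i.race married nodegree re74 re75, kernel k(normal) bw(.1) outcome(re78) ate common
* percentile confidence interval as an alternative to the normal-based one
estat bootstrap, percentile
```

More replications mainly stabilize the standard error and the percentile interval; the observed coefficient is the same full-sample estimate either way.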


- Matching with K nearest neighbors (here K = 4, with a logit propensity score model)

In [49]:
psmatch2 treat age educ i.race married nodegree re74 re75, outcome(re78) ate n(4) logit common ai(4)



Logistic regression                             Number of obs     =        614
                                                LR chi2(8)        =     263.65
                                                Prob > chi2       =     0.0000
Log likelihood = -243.92197                     Pseudo R2         =     0.3508

------------------------------------------------------------------------------
       treat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0157771   .0135771     1.16   0.245    -.0108335    .0423876
        educ |   .1613069   .0651264     2.48   0.013     .0336614    .2889524
             |
        race |
     hispan  |  -2.081734   .3672147    -5.67   0.000    -2.801462   -1.362007
      white  |  -3.065368   .2865262   -10.70   0.000    -3.626949   -2.503787
             |
     married |  -.8321133   .2903292    -2.87   0.004    -1.401148   -.2630786
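
Stata's built-in `teffects psmatch` (Stata 13 or later) offers a comparable nearest-neighbor estimator and reports standard errors without a separate option. The sketch below mirrors the specification above as closely as the syntax allows. It is not a drop-in replacement: `teffects psmatch` does not apply the `common` trimming of `psmatch2`, so the estimates can differ, and the command may stop with an error if overlap is poor.

```stata
* ATE with 4 nearest neighbors on a logit propensity score
teffects psmatch (re78) (treat age educ i.race married nodegree re74 re75, logit), ate nneighbor(4)
* ATET from the same model
teffects psmatch (re78) (treat age educ i.race married nodegree re74 re75, logit), atet nneighbor(4)
```

Comparing the two commands is a cheap robustness check on both the point estimates and the standard errors.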
    

- Matching with a local linear regression

In [54]:
psmatch2 treat age educ i.race married nodegree re74 re75, llr  bw(.1) outcome(re78) ate common


Probit regression                               Number of obs     =        614
                                                LR chi2(8)        =     265.36
                                                Prob > chi2       =     0.0000
Log likelihood = -243.06433                     Pseudo R2         =     0.3531

------------------------------------------------------------------------------
       treat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |    .008647     .00771     1.12   0.262    -.0064642    .0237583
        educ |   .0921263   .0379202     2.43   0.015     .0178041    .1664485
             |
        race |
     hispan  |   -1.24772   .2068295    -6.03   0.000    -1.653098   -.8423414
      white  |  -1.775345   .1502617   -11.82   0.000    -2.069852   -1.480837
             |
     married |  -.4752956   .1638448    -2.90   0.004    -.7964255   -.1541657
    n

In [57]:
bs r(att): psmatch2 treat age educ i.race married nodegree re74 re75, llr  bw(.1) outcome(re78) ate common

(running psmatch2 on estimation sample)

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
..................................................    50

Bootstrap results                               Number of obs     =        614
                                                Replications      =         50

      command:  psmatch2 treat age educ i.race married nodegree re74 re75, llr
                    bw(.1) outcome(re78) ate common
        _bs_1:  r(att)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |    843.134   770.1087     1.09   0.274    -666.2513    2352.519
------------------------------------------------------------------------------
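
The bootstrap above covers only the ATT. Since `psmatch2` with the `ate` option also returns `r(ate)` and `r(atu)`, all three effects can be bootstrapped in a single pass. This is a sketch in which the labels `att`, `ate`, `atu`, the number of replications, and the seed are choices made for illustration.

```stata
* bootstrap ATT, ATE and ATC (returned as r(atu)) from the same resamples
bootstrap att=r(att) ate=r(ate) atu=r(atu), reps(200) seed(12345): psmatch2 treat age educ i.race married nodegree re74 re75, llr bw(.1) outcome(re78) ate common
```

Because the three statistics come from the same resamples, their standard errors are directly comparable.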
