Standard errors from did2s #13

Closed
friosavila opened this issue Mar 21, 2023 · 3 comments

@friosavila

Hi Kyle,
I'm not sure whether you are still maintaining the program. While writing some notes on the methodology from scratch, I constructed a small dataset to try a few commands (jwdid, csdid, did_imputation, and did2s).
The problem I see right now is that did2s does not seem to produce the correct standard errors. They are, in fact, far larger than those from the other estimators.
So, thinking about it again, I constructed the GMM version of the estimator, which produces the right numbers.
Would you mind taking a look and telling me what you think?

data:

clear
set seed 10101
set obs 100  // <- 100 units
gen id = _n
gen ai = rchi2(2)
// determines when units receive treatment
gen     g = runiformint(2,10)
replace g = 999 if g>9   // never treated       
expand 10   // <-T=10
bysort id:gen t=_n 
gen event = max(0,t-g)
gen aux = runiform()*2
bysort t:gen at = aux[1] // Determines Time fixed effect
gen te = 0*rnormal()+(1-t/10)+(1-event/10)  // Treatment effect vanishes with time
gen eit= rnormal()
gen y = ai + at + te * (t>=g) + eit
replace g = 0 if g==999 
gen teff = te if g>0 & t>=g
sum teff

gen trt = t>=g
replace trt=0 if g==0

gen evnt=(t-g)*(g>0)+-1*(g==0)
gen evnt2 = evnt+9

SE using did2s

did2s y, first_stage(i.g i.t) treatment(trt) second_stage(ttrt) cluster(i)

. did2s y, first_stage(i.g i.t) treatment(trt) second_stage(ttrt) cluster(i)
(455 missing values generated)
                                     (Std. err. adjusted for clustering on id)
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        ttrt |   .8484402   .4099742     2.07   0.038     .0449056    1.651975
------------------------------------------------------------------------------

and using gmm:

. gmm ((y-{a0}-{a_i:i.g}-{a_t:i.t})*(trt==0)) ///
>         ( (y-{a0}-{a_i:}-{a_t:}-{att})*trt) , ///
>         winit(identity)  instruments(1:i.g i.t) onestep ///
>         quickderivatives        vce(cluster i)  

Step 1
Iteration 0:   GMM criterion Q(b) =  8.3872615  
Iteration 1:   GMM criterion Q(b) =  6.636e-25  
Iteration 2:   GMM criterion Q(b) =  9.708e-33  

note: model is exactly identified.

GMM estimation 

Number of parameters =  19
Number of moments    =  19
Initial weight matrix: Identity                   Number of obs   =      1,000

                                   (Std. err. adjusted for 100 clusters in id)
------------------------------------------------------------------------------
             |               Robust
             | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
a0           |
       _cons |    2.64532   .6996949     3.78   0.000     1.273943    4.016696
-------------+----------------------------------------------------------------
a_i          |
           g |
          2  |  -.0212671   .9900923    -0.02   0.983    -1.961812    1.919278
          3  |   -.140772   .9407414    -0.15   0.881    -1.984591    1.703047
          4  |  -.0025464   .7686242    -0.00   0.997    -1.509022    1.503929
          5  |  -.9571431   .7780565    -1.23   0.219    -2.482106    .5678197
          6  |   .8305891   1.006579     0.83   0.409    -1.142269    2.803448
          7  |  -.6738226   1.213618    -0.56   0.579     -3.05247    1.704825
          8  |   .6638362    1.18747     0.56   0.576    -1.663563    2.991235
          9  |  -.3110159   .9653441    -0.32   0.747    -2.203056    1.581024
-------------+----------------------------------------------------------------
a_t          |
           t |
          2  |   1.442357   .1514685     9.52   0.000     1.145484    1.739229
          3  |   1.832901    .184535     9.93   0.000     1.471219    2.194583
          4  |  -.0766473   .1718854    -0.45   0.656    -.4135364    .2602419
          5  |   .0549625    .182864     0.30   0.764    -.3034443    .4133694
          6  |   1.678028   .2372397     7.07   0.000     1.213046    2.143009
          7  |   1.160794   .2104858     5.51   0.000     .7482496    1.573339
          8  |   1.498006   .2312972     6.48   0.000     1.044672     1.95134
          9  |  -.2187513   .3737996    -0.59   0.558    -.9513852    .5138825
         10  |   1.966702   .2988338     6.58   0.000     1.380999    2.552406
-------------+----------------------------------------------------------------
        /att |   .8484402   .1588131     5.34   0.000     .5371722    1.159708
------------------------------------------------------------------------------

These are very close to the did_imputation results:


.  did_imputation y i t g2, autosample

                                                         Number of obs = 1,000
------------------------------------------------------------------------------
           y | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         tau |   .8484403   .1579797     5.37   0.000     .5388057    1.158075
------------------------------------------------------------------------------
@kylebutts
Owner

Actually, ignore my previous comment (I deleted it)! The difference between did_imputation and did2s is due to using i.g in the first_stage of the did2s call. Replacing that with i.i (which is what did_imputation uses) produces approximately the same standard error.
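
A minimal sketch of that change, assuming the simulated data from the original post is in memory. The variable `ttrt` is reused from the original call; its construction is not shown in the thread.

```stata
* First stage with unit fixed effects (i.id) instead of cohort fixed effects (i.g),
* so the specification matches the unit-level one used by did_imputation.
* `i` in the original call is an abbreviation of `id`; it is spelled out here.
did2s y, first_stage(i.id i.t) treatment(trt) second_stage(ttrt) cluster(id)
```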

I'm not sure why gmm is producing smaller standard errors with i.g, but I'm pretty confident in the code. I just double checked all the theory and the results match between Stata and R.

@friosavila
Author

friosavila commented Mar 21, 2023 via email

@kylebutts
Owner

Ahh, nope, the discrepancy is that you are clustering on i instead of g. Clustering both gmm and did2s on g makes the standard errors the same.
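
As a sketch, the two calls from the original post with only the clustering variable changed to g would look like this (again assuming the simulated data and `ttrt` from the original session):

```stata
* did2s with cohort fixed effects in the first stage, clustered on cohort (g)
did2s y, first_stage(i.g i.t) treatment(trt) second_stage(ttrt) cluster(g)

* Same GMM system as in the original post, with vce(cluster g) instead of vce(cluster i)
gmm ((y-{a0}-{a_i:i.g}-{a_t:i.t})*(trt==0)) ///
    ((y-{a0}-{a_i:}-{a_t:}-{att})*trt), ///
    winit(identity) instruments(1:i.g i.t) onestep ///
    quickderivatives vce(cluster g)
```

Note that this simulation has only nine cohorts (g = 0 and g = 2 through 9), so clustering on g relies on very few clusters.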

When you use group instead of individual fixed effects, the point estimate is the same (basically because the average of the unit FEs in a group equals the group FE), but you're absorbing less variation, so the standard errors are bigger. That's what you're seeing in the comparison of did2s vs. did_imputation. I don't think anything is wrong; you're just not absorbing as much variation.
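
One way to see this, as a sketch that assumes the untreated outcome follows the data-generating process in the original post, $y_{it}(0) = \alpha_i + \lambda_t + \varepsilon_{it}$. The symbols $\alpha_i$, $\lambda_t$, $\gamma_g$, and $u_{it}$ are notation introduced here for illustration, not taken from the thread.

$$
\gamma_g = E[\alpha_i \mid g_i = g], \qquad
y_{it}(0) = \gamma_{g_i} + \lambda_t + u_{it}, \qquad
u_{it} = (\alpha_i - \gamma_{g_i}) + \varepsilon_{it}
$$

The cohort-level first stage targets the same average level for each cohort, but the unit-specific deviation $\alpha_i - \gamma_{g_i}$ stays in the residual. When $\alpha_i$ and $\varepsilon_{it}$ are independent, as in the simulation, $\operatorname{Var}(u_{it}) = \operatorname{Var}(\alpha_i - \gamma_{g_i}) + \operatorname{Var}(\varepsilon_{it})$, which is larger than $\operatorname{Var}(\varepsilon_{it})$ alone.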
