Common Problems #12

kylebutts · 2022-01-24T16:27:59Z

A quick summary of common problems that I can point people to.

kylebutts · 2022-05-20T17:06:34Z

Triple Differences

The following is the standard triple-difference estimator (e.g., Angrist and Pischke, 2008, p. 181):

$$ Y_{i g \ell t} = \gamma_{\ell t} + \lambda_{g t} + \theta_{g \ell} + \tau_{g \ell t} D_{g \ell t} + \varepsilon_{i g \ell t}, $$

where $i$ is the individual observation, $\ell$ indexes regions, $t$ indexes time, and $g$ indexes within-region groups (e.g. male/female, age groups, affected/unaffected by treatment). The fixed effects are region-specific time fixed effects $\gamma_{\ell t}$ (common across groups), group-specific time fixed effects $\lambda_{g t}$ (common across regions), and group-region fixed effects $\theta_{g \ell}$ (common across time).
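As a worked special case (my own simplification, not taken from the reference above): assume two groups $g \in \{A, B\}$, two regions $\ell \in \{0, 1\}$, two periods $t \in \{0, 1\}$, a constant effect $\tau$, and treatment switched on only for group $B$ in region $1$ at $t = 1$. Writing $\bar{Y}_{g \ell t}$ for the expected outcome in a cell, the three sets of fixed effects cancel and the coefficient reduces to the familiar difference of two difference-in-differences:

$$ \tau = \left[ (\bar{Y}_{B 1 1} - \bar{Y}_{B 1 0}) - (\bar{Y}_{A 1 1} - \bar{Y}_{A 1 0}) \right] - \left[ (\bar{Y}_{B 0 1} - \bar{Y}_{B 0 0}) - (\bar{Y}_{A 0 1} - \bar{Y}_{A 0 0}) \right] $$

The first bracket removes the region-time effects $\gamma_{\ell t}$ and the group-region effects $\theta_{g \ell}$ within region $1$, leaving $\tau$ plus the differential group trend $(\lambda_{B 1} - \lambda_{B 0}) - (\lambda_{A 1} - \lambda_{A 0})$; the second bracket equals that same differential trend measured in the untreated region, so subtracting it leaves $\tau$.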

To implement this in did2s, you can specify the correct first_stage formula:

first_stage = ~ 0 | region^time + group^time + group^region

Then, for example, a second_stage formula containing the treatment dummy ($D_{g \ell t}$) will estimate the average treatment effect.
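Here is a minimal sketch of the full call on simulated data. Everything about the data is an assumption made for illustration: the data frame `df`, its columns (`y`, `region`, `group`, `time`, `treat`), the design (group "B" in regions 1 to 10 is treated from period 6 onward), and the true effect of 2. Only the `did2s()` arguments and the first_stage formula come from the text above.

```r
library(did2s)
library(fixest)  # provides i() and the `^` syntax for interacted fixed effects

set.seed(1)

# Hypothetical panel: 5 individuals per (region, group, time) cell
df <- expand.grid(id = 1:5, region = 1:20, group = c("A", "B"), time = 1:10)

# Assumed design: group "B" in regions 1-10 is treated from period 6 onward
df$treat <- df$group == "B" & df$region <= 10 & df$time >= 6

# Outcome = region-time effect + group-time effect + group-region effect
#           + treatment effect of 2 + noise
df$y <- sin(df$region * df$time) +
  (df$group == "B") * 0.3 * df$time +
  (df$group == "B") * 0.1 * df$region +
  2 * df$treat +
  rnorm(nrow(df))

est <- did2s(
  df,
  yname = "y",
  first_stage = ~ 0 | region^time + group^time + group^region,
  second_stage = ~ i(treat, ref = FALSE),
  treatment = "treat",
  cluster_var = "region"
)

fixest::etable(est)
```

In this setup every fixed effect is still observed among untreated observations (each treated region-period also contains untreated group "A", and so on), which the first stage needs in order to predict untreated outcomes for the treated cells.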

kylebutts · 2022-05-20T17:08:37Z

Big Data / Matrix Problems

The main pain point in the code is calculating analytic standard errors. The formula for them requires me to store, in memory, the full (sparse) matrix including fixed effects.

Two sources of problems:

1. I had to write a bespoke function that builds this matrix from a fixest estimate (fixest makes estimation super fast). This can be buggy. I have a new version, written with the help of @lrberge, that is more robust. You can try it by installing the GitHub version of the package: remotes::install_github("kylebutts/did2s")
2. I store the matrix of fixed effects as a sparse matrix (since most values of a unit/time fixed effect are zeros). This saves a lot of memory, but sometimes even a sparse matrix is too large to hold in memory, either because there are many fixed effects or because there are many observations.

In either case, if your code fails, you can add the function parameter bootstrap = TRUE. With this option, the standard errors are block bootstrapped at the cluster level instead of being calculated analytically. An example:

library(did2s)

did2s(df_het,
    yname = "dep_var", first_stage = ~ 0 | state + year,
    second_stage = ~ i(treat, ref = FALSE), treatment = "treat",
    cluster_var = "state", bootstrap = TRUE, n_bootstraps = 250
)
#> Running Two-stage Difference-in-Differences
#> • first stage formula `~ 0 | state + year`
#> • second stage formula `~ i(treat, ref = FALSE)`
#> • The indicator variable that denotes when treatment is on is `treat`
#> • Standard errors will be block bootstrapped with cluster `state`
#> • Starting 250 bootstraps at cluster level: state
#> OLS estimation, Dep. Var.: dep_var
#> Observations: 46,500 
#> Standard-errors: Custom 
#>             Estimate Std. Error t value  Pr(>|t|)    
#> treat::TRUE  2.15221   0.046291 46.4928 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 1.41487   Adj. R2: 0.337905

Created on 2022-05-20 by the reprex package (v2.0.1)
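To give a sense of why the fixed-effects matrix is stored sparsely (the second problem above), here is a rough sketch using the Matrix package. It is only an illustration, not the code did2s uses internally; the panel dimensions and the object names (`panel`, `fe_sparse`) are made up.

```r
library(Matrix)

# Hypothetical balanced panel: 2,000 units observed over 20 periods
n_units <- 2000
n_periods <- 20
panel <- expand.grid(unit = seq_len(n_units), time = seq_len(n_periods))
n_obs <- nrow(panel)

# One row per observation, one column per unit or time fixed effect.
# Each row has exactly two non-zero entries: its unit dummy and its time dummy.
fe_sparse <- sparseMatrix(
  i = rep(seq_len(n_obs), 2),
  j = c(panel$unit, n_units + panel$time),
  x = 1,
  dims = c(n_obs, n_units + n_periods)
)

# Memory actually used by the sparse matrix vs. what a dense double matrix
# of the same dimensions would need (8 bytes per entry; never allocated here)
sparse_mb <- as.numeric(object.size(fe_sparse)) / 1024^2
dense_mb <- 8 * n_obs * ncol(fe_sparse) / 1024^2

cat(sprintf("sparse: %.1f MB, dense: %.0f MB\n", sparse_mb, dense_mb))
```

The sparse footprint grows with the number of non-zero entries (observations times the number of fixed-effect dimensions), so with enough observations or enough fixed effects it can still exceed available memory, which is when bootstrap = TRUE is the practical fallback.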

kylebutts closed this as completed Jan 24, 2022

kylebutts pinned this issue Mar 29, 2022

kylebutts mentioned this issue May 31, 2022

Enable large Armadillo matrices #17

Merged

kylebutts reopened this Oct 4, 2022

kazuyanagimoto mentioned this issue Apr 7, 2023

Stata ↔ R issue kazuyanagimoto/staggered_did_tutorial#8

Closed

1 task

andrewbaxter439 mentioned this issue Jun 10, 2024

Recover triple-difference and subgroup effects from same model #32

Closed
