Common Problems #12

Open
kylebutts opened this issue Jan 24, 2022 · 2 comments

kylebutts commented Jan 24, 2022

A quick summary of common problems that I can link people to.

@kylebutts kylebutts pinned this issue Mar 29, 2022

Triple Differences

The following is the standard triple-difference estimator (e.g. Angrist and Pischke, 2008, p.181):

$$ Y_{i g \ell t} = \gamma_{\ell t} + \lambda_{g t} + \theta_{g \ell} + \tau_{g \ell t} D_{g \ell t} + \varepsilon_{i g \ell t}, $$

where $i$ is the individual observation, $\ell$ indexes regions, $t$ indexes time, and $g$ indexes within-region groups (e.g. male/female, age groups, affected/unaffected by treatment). The fixed effects are region-specific time fixed effects (common across groups), group-specific time fixed effects (common across regions), and group-region fixed effects (common across time).

To implement this in did2s, specify all three sets of fixed effects in the first_stage formula:

first_stage = ~ 0 | region^time + group^time + group^region

Then, for example, a second_stage formula containing only the treatment dummy ($D_{g \ell t}$) will estimate the average treatment effect.
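As a rough sketch of how the pieces fit together, the snippet below simulates a small triple-difference panel and passes the first_stage formula above to did2s. The data frame `df`, its column names (`region`, `group`, `time`, `D`, `y`), and the true effect of 2 are all made up for illustration; only the did2s arguments are taken from this thread.

```r
library(did2s)

set.seed(1)

# Hypothetical panel: 20 individuals in each region x group x time cell
df <- expand.grid(
  id     = 1:20,
  region = 1:20,
  group  = c("affected", "unaffected"),
  time   = 1:10,
  stringsAsFactors = FALSE
)

# Assumed design: treatment turns on for the affected group
# in regions 1-10 from period 6 onward
df$D <- df$region <= 10 & df$group == "affected" & df$time >= 6

# Simulated outcome with a true treatment effect of 2
df$y <- 0.1 * df$region + 0.2 * df$time + (df$group == "affected") +
  2 * df$D + rnorm(nrow(df))

est <- did2s(
  df,
  yname = "y",
  # The three sets of fixed effects from the triple-difference model
  first_stage = ~ 0 | region^time + group^time + group^region,
  # Treatment dummy only: a single average effect
  second_stage = ~ i(D, ref = FALSE),
  treatment = "D",
  cluster_var = "region"
)

est
```

Each set of fixed effects is identified from untreated observations here: the unaffected group covers region-by-time, the untreated regions cover group-by-time, and the pre-period covers group-by-region. If your own data lacks untreated observations for some cell, the first stage cannot impute a counterfactual for it.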


kylebutts commented May 20, 2022

Big Data / Matrix Problems

The main pain point in the code is calculating analytic standard errors. The formula for them requires storing, in memory, the full (sparse) design matrix, including the fixed effects.

Two sources of problems:

  1. I had to write a bespoke function that builds this matrix from a fixest estimate (fixest makes estimation super fast). This function can be buggy. I have a new version, written with the help of @lrberge, that is more robust. You can try it by installing the GitHub version of the package: remotes::install_github("kylebutts/did2s")

  2. I store the matrix of fixed effects as a sparse matrix (since most entries of a unit/time fixed-effect matrix are zero). This saves a lot of memory, but sometimes even a sparse matrix is too large to hold in memory. This happens when there are either many fixed effects or many observations.

In either case, if your code fails, you can set the function parameter bootstrap = TRUE. In that case, did2s won't calculate the standard errors analytically; it block bootstraps them at the cluster level instead. An example:

library(did2s)

did2s(df_het,
    yname = "dep_var", first_stage = ~ 0 | state + year,
    second_stage = ~ i(treat, ref = FALSE), treatment = "treat",
    cluster_var = "state", bootstrap = TRUE, n_bootstraps = 250
)
#> Running Two-stage Difference-in-Differences
#> • first stage formula `~ 0 | state + year`
#> • second stage formula `~ i(treat, ref = FALSE)`
#> • The indicator variable that denotes when treatment is on is `treat`
#> • Standard errors will be block bootstrapped with cluster `state`
#> • Starting 250 bootstraps at cluster level: state
#> OLS estimation, Dep. Var.: dep_var
#> Observations: 46,500 
#> Standard-errors: Custom 
#>             Estimate Std. Error t value  Pr(>|t|)    
#> treat::TRUE  2.15221   0.046291 46.4928 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 1.41487   Adj. R2: 0.337905

Created on 2022-05-20 by the reprex package (v2.0.1)
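To get a feel for the memory point above, here is a small sketch comparing a sparse fixed-effect indicator matrix with the size the same matrix would take if stored densely. It uses the Matrix package directly and is not how did2s builds its matrix internally; the panel dimensions and object names are arbitrary assumptions.

```r
library(Matrix)

# Hypothetical panel dimensions, chosen only for illustration
n_units <- 2000
n_years <- 10
panel <- expand.grid(unit = seq_len(n_units), year = seq_len(n_years))
n <- nrow(panel)

# One indicator column per unit and per year, so every row has
# exactly two non-zero entries
fe_sparse <- sparseMatrix(
  i = rep(seq_len(n), 2),
  j = c(panel$unit, n_units + panel$year),
  x = 1,
  dims = c(n, n_units + n_years)
)

# Memory actually used by the sparse matrix
sparse_bytes <- as.numeric(object.size(fe_sparse))

# Memory a dense double matrix of the same shape would need (8 bytes per cell),
# computed arithmetically rather than by allocating it
dense_bytes <- prod(dim(fe_sparse)) * 8

cat(sprintf("sparse: %.1f MB, dense: %.1f MB\n",
            sparse_bytes / 1e6, dense_bytes / 1e6))
```

The sparse footprint grows with the number of non-zero entries (observations times the number of fixed-effect dimensions), so it still scales with the number of observations. That is why very large panels can exhaust memory even in sparse form.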
