Skip to content

frederickluser/ecic

Repository files navigation

Extended Changes-in-Changes (ECIC)

R-CMD-check CRAN version CRAN checks CRAN downloads Dependencies

ecic estimates a changes-in-changes model with multiple periods and cohorts as suggested in Athey and Imbens (2006). Changes-in-changes is a generalization of the difference-in-differences approach, estimating a treatment effect for the entire distribution instead of averages.

Athey and Imbens (2006) show how to extend the model to multiple periods and cohorts, analogously to a two-way fixed-effects model for averages. This package implements this, calculating standard errors via bootstrap and plotting results, aggregated or in an event-study-style fashion.

Installation

ecic is available on CRAN using:

install.packages("ecic")

You can install the newest version from GitHub:

# install.packages("remotes")
remotes::install_github("frederickluser/ecic")

Basic Example

Estimation

Let's look at a short example how to use the package. First, load some simulated sample data.

library(ecic)
data(dat, package = "ecic")

head(dat)
#>  countyreal  first.treat   year time_to_treat   lemp
#>       <int>        <int>  <int>         <int>  <dbl>
#>           3        1980    1980             0   2.21
#>           3        1980    1981             1   3.33
#>           3        1980    1982             2   3.67
#>           5        1980    1980             0   2.77
#>           5        1980    1981             1   3.88
#>           5        1980    1982             2   3.80

Then, the function ecic estimates the changes-changes-model:

# Estimate the model
mod =
  ecic(
    yvar  = lemp,         # dependent variable
    gvar  = first.treat,  # group indicator
    tvar  = year,         # time indicator
    ivar  = countyreal,   # unit ID
    dat   = dat,          # dataset
    boot  = "weighted",   # bootstrap proceduce ("no", "normal", or "weighted")
    nReps = 10            # number of bootstrap runs
    )

The input gvar denotes the period in which this individual receives the treatment. mod contains for every bootstrap run the point-estimates. The function summary then combines all bootstrap runs to a quantile treatment effect and adds standard errors:

mod_res = summary(mod) 

#> perc    coefs         se
#>  0.1 1.215531 0.02670761
#>  0.2 1.324130 0.02310521
#>  0.3 1.458270 0.02105119
#>  0.4 1.590848 0.02128534
#>  0.5 1.747296 0.02098057
#>  0.6 1.921818 0.02135982
#>  0.7 2.124138 0.01802972
#>  0.8 2.372483 0.01799869
#>  0.9 2.787395 0.02241811

Plotting

Finally, results can be plotted using ecic_plot.

ecic_plot(mod_res)

Event-Study Example

The package also allows to report event-study-style results of the effect. To do so, simply add the es = T argument to the estimation and summary will report effects for every event period.

# Estimate the model
mod =
  ecic(
    yvar  = lemp,         # dependent variable
    gvar  = first.treat,  # group indicator
    tvar  = year,         # time indicator
    ivar  = countyreal,   # unit ID
    dat   = dat,          # dataset
    es    = T,            # report an event study
    boot  = "weighted",   # bootstrap proceduce ("no", "normal", or "weighted")
    nReps = 10            # number of bootstrap runs
    )

# report results for every event period
mod_res = summary(mod) 

#> [[1]]
#> perc es     coefs         se
#>  0.1  0 0.9175263 0.02924326
#>  0.2  0 0.9675225 0.02508082
#>  0.3  0 0.9959150 0.02116782
#>  0.4  0 1.0388312 0.02373263
#>  0.5  0 1.0992322 0.02558309
#>  0.6  0 1.1496203 0.03078493
#>  0.7  0 1.2049797 0.03654320
#>  0.8  0 1.2519476 0.03291178
#>  0.9  0 1.3616626 0.01765538

#> [[2]]
#> perc es    coefs          se
#>  0.1  1 2.393816 0.022273736
#>  0.2  1 2.386941 0.020039276
#>  0.3  1 2.423415 0.017145110
#>  0.4  1 2.452259 0.017982620
#>  0.5  1 2.484616 0.009979006
#>  0.6  1 2.525388 0.012816760
#>  0.7  1 2.575615 0.015196499
#>  0.8  1 2.630959 0.019570320
#>  0.9  1 2.730742 0.024796025

#> [...]

Plotting

In addition to a standard plot showing everything at once, event-study results can be plotted for every period individually with the option es_type = "for_periods".

ecic_plot(
    mod_res, 
    periods_plot = c(0, 2),   # which periods you want to show
    es_type = "for_periods",  # plots by period
    ylim = c(.5, 4)           # same y-axis
    )

Alternatively, es_type = "for_quantiles" generates one plot for every quantile of interest.

ecic_plot(
    mod_res, 
    periods_plot = c(.1, .5, .9), # which quantiles you want to show
    es_type = "for_quantiles",    # plots by period
    ylim = c(.5, 5)               # same y-axis
    )

Under the hood

Estimation

For every treated cohort, we observe the distribution of the potential outcome $Y(1)$. In the case of two groups / cohorts and two periods, Athey and Imbens (2006) show how to construct the counterfactual $Y(0)$. This extends to the case with multiple cohorts and periods, where every not-yet-treated cohort is a valid comparison group. Hence, every combination of treated and not-yet-treated cohorts with a common pre-treatment period estimates in theory the quantile treatment effect. Then, I simply want to average them.

Yet, since it is not allowed to simply average quantile treatment effects, we must first store the empirical CDF of $Y(1)$ and $Y(0)$ for every two-by-two case. Next, I aggregate all estimated CDFs to get the plug-in estimates of $Y(1)$ and $Y(0)$, weighting for the cohort sizes. Finally, I invert them to get the quantile function and compute the quantile treatment effect as in the standard case.

Note that it is impossible to estimate a quantile treatment effect for units treated in the first (no pre-treatment period) and last period (no comparison cohort). In addition, the default value of nMin skips small cohorts (default nMin = 40) as we need more observations to estimate quantile treatment effects compared to an average effect.

Speed Improvements

Technically, ecic generates a grid over the dependent variable and imputes all empirical CDFs for every unique value of yvar. You can (cautiously) speed up the imputation by rounding the dependent variable to n_digits.

Bootstrap

I calculate standard errors by bootstrap. I resample with replacement the entire dataset and estimate $Y(1)$ and $Y(0)$ nRep times (default nReps = 1). Bootstrap can either be computed through replacement over the entire dataset (with boot = "normal") or you can weight by cohort sizes (with boot = "weighted") if you worry, for example, about small cohorts. This part can be parallelized by setting nCores > 1, speeding up the computation at the cost of additional overhead to load the cores.

progress_bar prints the progress of the bootstrapping by default. Alternatively, the option progress_bar = "cli" also shows estimated running time, but requires the cli package to be installed.

Next Steps for the package

  • Add covariates