---
title: "Theory"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Theory}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
header-includes:
- \usepackage{amsmath}
- \usepackage{amssymb}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
## 1. Conceptual Framework
### 1.1 Notation and Key Concepts
- $i$: Index for individual unit.
- $t$: Time period.
- $D_{i,t}$: Binary indicator for treatment. We assume throughout that treatment is received permanently once it has been received for the first time. In other words, $D_{i,t}=1 \implies D_{i,t+1}=1$.
- $G_i$: Treatment cohort, i.e., the time at which treatment is first received by $i$. That is, $G_i = g \implies D_{i,t} = 1\{t \geq g\}$ for all $t$. Note: If treatment is never received, $G_i = \infty$.
- $Y_{i,t}$: Observed outcome of interest.
- $Y_{i,t}(g)$: Potential outcome at time $t$ if the treatment cohort were $G_i=g$. The observed outcome is $Y_{i,t} = Y_{i,t}(G_i)$.
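
To make the notation concrete, the following is a minimal base R sketch that builds $D_{i,t}$ and the event time $t - G_i$ from $G_i$ in a toy panel. The unit names, periods, and column names are hypothetical and chosen only for illustration.

```r
# Hypothetical toy panel: 4 units observed over periods 1..5.
# G is the treatment cohort; Inf marks a never-treated unit.
cohort <- c(unit1 = 3, unit2 = 4, unit3 = Inf, unit4 = 3)
panel <- expand.grid(i = names(cohort), t = 1:5, stringsAsFactors = FALSE)
panel$G <- unname(cohort[panel$i])

# Treatment is absorbing: D_{i,t} = 1 exactly when t >= G_i.
panel$D <- as.integer(panel$t >= panel$G)

# Event time e = t - G_i (equals -Inf for never-treated units).
panel$e <- panel$t - panel$G

head(panel[order(panel$i, panel$t), ])
```

Because `Inf` compares as larger than every finite period, never-treated units get `D = 0` in all periods without special handling.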
### 1.2 Goal
Our goal is to identify the average treatment effect on the treated (ATT), for cohort $g$ at event time $e \equiv t-g$, which is defined by:
$$
\text{ATT}_{g,e} \equiv \mathbb{E}[Y_{i,g+e}(g) - Y_{i,g+e}(\infty) | G_i = g]
$$
We may also be interested in the average ATT across treated cohorts for a given event time:
$$
\text{ATT}_{e} \equiv \sum_g \omega_{g,e} \text{ATT}_{g,e}, \quad \omega_{g,e} \equiv \frac{\sum_i 1\{G_i=g\}}{\sum_i 1\{G_i < \infty\}}
$$
Lastly, we may be interested in the average across certain event times of the average ATT across cohorts:
$$
\text{ATT}_{E} \equiv \frac{1}{|E|} \sum_{e \in E} \text{ATT}_{e}
$$
where $E$ is a set of event times, e.g., $E = \{1,2,3\}$.
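
As a worked illustration of the two aggregation formulas, the sketch below uses made-up cohort sizes and made-up values of $\text{ATT}_{g,e}$ (none of these numbers come from real data) and computes $\omega_{g,e}$, $\text{ATT}_e$, and $\text{ATT}_E$ for $E = \{1,2,3\}$.

```r
# Hypothetical inputs: number of units in each treated cohort, and
# cohort-by-event-time ATTs (rows: cohort g, columns: event time e).
n_g <- c("2003" = 50, "2004" = 30, "2005" = 20)
att_ge <- rbind(
  "2003" = c(e1 = 1, e2 = 2, e3 = 3),
  "2004" = c(e1 = 2, e2 = 2, e3 = 2),
  "2005" = c(e1 = 0, e2 = 1, e3 = 5)
)

# omega_g: each cohort's share of all treated units (0.5, 0.3, 0.2).
omega_g <- n_g / sum(n_g)

# ATT_e: cohort-size-weighted average of ATT_{g,e} for each event time.
att_e <- drop(omega_g %*% att_ge)   # 1.1, 1.8, 3.1

# ATT_E: simple average of ATT_e over the event times in E = {1, 2, 3}.
att_E <- mean(att_e)                # 2
```

Larger cohorts receive more weight within an event time, while the event times in $E$ are weighted equally.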
### 1.3 Difference-in-differences
**Control group:** For the treated cohort $G_i = g$, let $C_{g,e}$ denote the corresponding set of units $i$ that belong to a control group.
- At a minimum, the control group must satisfy $i \in C_{g,e} \implies G_i > \max\{g, g+e\}$. This says that the control group must belong to a later cohort than the treated group of interest, and the control group must not have been treated yet by the event time of interest.
**Base event time:** We consider a reference event time $b$ from before treatment, so that $b<0$.
**Difference-in-differences:** The difference-in-differences estimand is defined by:
$$
\text{DiD}_{g,e} \equiv \mathbb{E}[Y_{i,g+e} - Y_{i,g+b} | G_i = g] - \mathbb{E}[Y_{i,g+e} - Y_{i,g+b} | i \in C_{g,e}]
$$
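
The sample analog of this estimand can be sketched in a few lines of base R. The helper `did_ge_sketch` and the long-format column names (`i`, `t`, `G`, `Y`) are hypothetical; this is an illustration of the formula, not the package's implementation.

```r
# Sample analog of DiD_{g,e}. `did_ge_sketch` is a hypothetical helper,
# not a function from DiDforBigData.
did_ge_sketch <- function(data, g, e, b = -1) {
  # Assumes long format with columns i (unit), t (period), G (cohort), Y (outcome).
  post <- data[data$t == g + e, c("i", "G", "Y")]
  base <- data[data$t == g + b, c("i", "Y")]
  wide <- merge(post, base, by = "i", suffixes = c("_post", "_base"))
  wide$dY <- wide$Y_post - wide$Y_base      # within-unit change
  treated <- wide$G == g
  control <- wide$G > max(g, g + e)         # largest valid control group
  mean(wide$dY[treated]) - mean(wide$dY[control])
}

# Toy data: units 1-2 are treated at t = 3, units 3-4 are never treated.
# Outcomes have unit-specific levels, a common trend, and an effect of 2.
toy <- expand.grid(i = 1:4, t = 1:4)
toy$G <- c(3, 3, Inf, Inf)[toy$i]
toy$Y <- toy$i + 0.5 * toy$t + 2 * (toy$t >= toy$G)

did_ge_sketch(toy, g = 3, e = 1)   # 2
```

In this toy example the unit-specific levels and the common trend difference out, leaving the treatment effect of 2.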
## 2. Identification
Throughout this section, our goal is to identify $\text{ATT}_{g,e}$ for some treated cohort $g$ and some event time $e$. We take the base event time $b<0$ as given.
### 2.1 Identifying Assumptions
**Parallel Trends:**
$$
\mathbb{E}[Y_{i,g+e}(\infty) - Y_{i,g+b}(\infty) | G_i = g] = \mathbb{E}[Y_{i,g+e}(\infty) - Y_{i,g+b}(\infty) | i \in C_{g,e}]
$$
This says that, in the absence of treatment, the treatment and control groups would have experienced the same average change in their outcomes between event time $b$ and event time $e$.
**No Anticipation:**
$$
\mathbb{E}[ Y_{i,g+b}(g) | G_i = g] = \mathbb{E}[ Y_{i,g+b}(\infty) | G_i = g]
$$
This says that, at base event time $b$, the average outcome for the treated cohort is the same as it would have been if the cohort had instead been assigned to never receive treatment.
### 2.2 Proof of Identification by DiD
We prove that $\text{DiD}_{g,e}$ identifies $\text{ATT}_{g,e}$ in three steps:
**Step 1:** Add and subtract $Y_{i,g+b}(\infty)$ from the ATT definition:
$$
\text{ATT}_{g,e} \equiv \mathbb{E}[Y_{i,g+e}(g) - Y_{i,g+e}(\infty) | G_i = g]
$$
$$
= \mathbb{E}[Y_{i,g+e}(g) - Y_{i,g+b}(\infty) | G_i = g] - \mathbb{E}[Y_{i,g+e}(\infty) - Y_{i,g+b}(\infty) | G_i = g]
$$
**Step 2:** Assume that Parallel Trends holds. Then, we can replace the conditioning set $G_i=g$ with the conditioning set $i \in C_{g,e}$ in the second term:
$$
\text{ATT}_{g,e} = \mathbb{E}[Y_{i,g+e}(g) - Y_{i,g+b}(\infty) | G_i = g] - \mathbb{E}[Y_{i,g+e}(\infty) - Y_{i,g+b}(\infty) | G_i = g]
$$
$$
= \mathbb{E}[Y_{i,g+e}(g) - Y_{i,g+b}(\infty) | G_i = g] - \mathbb{E}[Y_{i,g+e}(\infty) - Y_{i,g+b}(\infty) | i \in C_{g,e}]
$$
**Step 3:** Assume that No Anticipation holds. Then, we can replace $Y_{i,g+b}(\infty)$ with $Y_{i,g+b}(g)$ if the conditioning set is $G_i = g$:
$$
\text{ATT}_{g,e} = \mathbb{E}[Y_{i,g+e}(g) - Y_{i,g+b}(\infty) | G_i = g] - \mathbb{E}[Y_{i,g+e}(\infty) - Y_{i,g+b}(\infty) | i \in C_{g,e}]
$$
$$
= \mathbb{E}[Y_{i,g+e}(g) - Y_{i,g+b}(g) | G_i = g] - \mathbb{E}[Y_{i,g+e}(\infty) - Y_{i,g+b}(\infty) | i \in C_{g,e}]
$$
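where the final expression is $\text{DiD}_{g,e}$. This holds because the observed outcome satisfies $Y_{i,t} = Y_{i,t}(g)$ for units with $G_i = g$, and $Y_{i,t} = Y_{i,t}(\infty)$ for control units, which are not yet treated at either $g+b$ or $g+e$ (for controls that are treated later, this also requires that they do not anticipate their own treatment).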
Thus, we have shown that $\text{DiD}_{g,e} = \text{ATT}_{g,e}$ if Parallel Trends and No Anticipation hold.
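
The identification result can be checked numerically. The sketch below simulates potential outcomes in which Parallel Trends and No Anticipation hold by construction; all parameter values are arbitrary assumptions. It then compares the ATT computed from the (normally unobservable) potential outcomes with the DiD computed from observed outcomes only.

```r
set.seed(1)
n <- 1000
treated <- rep(c(TRUE, FALSE), each = n / 2)   # G_i = g versus i in C_{g,e}

# Untreated potential outcomes at the base period (g+b) and target period (g+e).
level   <- rnorm(n) + ifelse(treated, 3, 0)    # level differences are allowed
y0_base <- level
y0_post <- level + 1.5                         # common change: Parallel Trends holds

# Treated potential outcome at g+e, with a heterogeneous effect.
effect  <- ifelse(treated, 2 + rnorm(n), 0)
y1_post <- y0_post + effect

# Observed outcomes. No Anticipation: the base-period outcome is untreated.
y_base <- y0_base
y_post <- ifelse(treated, y1_post, y0_post)

att <- mean((y1_post - y0_post)[treated])
did <- mean((y_post - y_base)[treated]) - mean((y_post - y_base)[!treated])
all.equal(att, did)
```

The treated and control groups differ in levels, but because their untreated changes are identical, the two quantities coincide.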
## 3. The `DiDge(...)` Command
$\text{DiD}_{g,e}$ is estimated in `DiDforBigData` by the `DiDge(...)` command, which is documented [here](https://setzler.github.io/DiDforBigData/reference/DiDge.html).
### 3.1 Automatic Control Group Selection
**All:** The largest valid control group is $C_{g,e} \equiv \{ i : G_i > \max\{g, g+e\}\}$. To use this control group, specify `control_group = "all"` in the `DiDge(...)` command. This option is selected by default.
Two alternatives can be specified.
**Never-treated:** The never-treated control group is defined by $C_{g,e} \equiv \{ i : G_i = \infty \}$. To use this control group, specify `control_group = "never-treated"` in the `DiDge(...)` command.
**Future-treated:** The future-treated control group is defined by $C_{g,e} \equiv \{ i : G_i > \max\{g, g+e\} \text{ and } G_i < \infty\}$. To use this control group, specify `control_group = "future-treated"` in the `DiDge(...)` command.
**Base event time:** The base event time can be specified using the `base_event` argument in `DiDge(...)`, where `base_event = -1` by default.
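
The three control group definitions translate directly into logical conditions on $G_i$. The sketch below uses a hypothetical helper, `select_controls`, that mirrors the definitions above; it is not the package's internal code, and only the option names are taken from the text.

```r
# `select_controls` is a hypothetical helper that mirrors the three
# definitions above; it is not part of DiDforBigData.
select_controls <- function(G, g, e,
                            control_group = c("all", "never-treated", "future-treated")) {
  control_group <- match.arg(control_group)
  not_yet_treated <- G > max(g, g + e)
  switch(control_group,
    "all"            = not_yet_treated,
    "never-treated"  = is.infinite(G),
    "future-treated" = not_yet_treated & is.finite(G)
  )
}

# Hypothetical cohorts for five units; Inf marks a never-treated unit.
G <- c(2003, 2004, 2006, 2008, Inf)

select_controls(G, g = 2003, e = 2, control_group = "all")
select_controls(G, g = 2003, e = 2, control_group = "never-treated")
select_controls(G, g = 2003, e = 2, control_group = "future-treated")
```

With $g = 2003$ and $e = 2$, the cutoff is $\max\{g, g+e\} = 2005$, so the unit treated in 2004 is excluded from every control group.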
### 3.2 DiD Estimation for a Single $(g,e)$ Combination
The `DiDge(...)` command performs the following sequence of steps:
**Step 1.** Define the $(g,e)$-specific sample of treated and control units, $S_{g,e} \equiv \{G_i=g\} \cup \{i \in C_{g,e}\}$. Drop any observations that do not satisfy $i \in S_{g,e}$.
**Step 2.** Construct the within-$i$ differences $\Delta Y_{i,g+e} \equiv Y_{i,g+e} - Y_{i,g+b}$ for each $i \in S_{g,e}$.
**Step 3.** Estimate the simple linear regression $\Delta Y_{i,g+e} = \alpha_{g,e} + \beta_{g,e} 1\{G_i =g\} + \epsilon_{i,g+e}$ by OLS for $i \in S_{g,e}$.
The OLS estimate of $\beta_{g,e}$ is the sample analog of $\text{DiD}_{g,e}$, i.e., the difference between the treated and control group means of $\Delta Y_{i,g+e}$. The standard error provided by OLS for $\beta_{g,e}$ is equivalent to the standard error from a two-sample test of equal means for the null hypothesis $$\mathbb{E}[\Delta Y_{i,g+e} | G_i = g] = \mathbb{E}[\Delta Y_{i,g+e} | i \in C_{g,e}] $$ which, under Parallel Trends and No Anticipation, is equivalent to testing that $\text{ATT}_{g,e}=0$.
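
The equivalence between the regression in Step 3 and a two-sample comparison of means can be verified on simulated data. The sketch below uses made-up values of $\Delta Y_{i,g+e}$ and assumes the default (homoskedastic) OLS standard error, which corresponds to the pooled-variance version of the two-sample test.

```r
set.seed(42)
# Hypothetical within-unit differences for 40 treated and 60 control units.
treated <- rep(c(1, 0), times = c(40, 60))
dY <- 0.5 + 1.2 * treated + rnorm(100)

# Step 3: regress the difference on a treatment-cohort indicator.
fit <- lm(dY ~ treated)
beta_hat <- unname(coef(fit)["treated"])
se_ols <- summary(fit)$coefficients["treated", "Std. Error"]

# Two-sample comparison of means with a pooled variance.
n1 <- sum(treated == 1)
n0 <- sum(treated == 0)
diff_means <- mean(dY[treated == 1]) - mean(dY[treated == 0])
pooled_var <- ((n1 - 1) * var(dY[treated == 1]) +
               (n0 - 1) * var(dY[treated == 0])) / (n1 + n0 - 2)
se_pooled <- sqrt(pooled_var * (1 / n1 + 1 / n0))

all.equal(beta_hat, diff_means)
all.equal(se_ols, se_pooled)
```

Both comparisons hold up to floating-point tolerance, since a regression on a single binary indicator is a reparameterization of the two group means.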
## 4. The `DiD(...)` Command
`DiDforBigData` uses the `DiD(...)` command to estimate $\text{DiD}_{g,e}$ for all available cohorts $g$ across a range of possible event times $e$; `DiD(...)` is documented [here](https://setzler.github.io/DiDforBigData/reference/DiD.html).
### 4.1 DiD Estimation for All Possible $(g,e)$ Combinations
`DiD(...)` uses the `control_group` and `base_event` arguments the same way as `DiDge(...)`.
`DiD(...)` also uses the `min_event` and `max_event` arguments to choose the minimum and maximum event times $e$ of interest. If these arguments are not specified, it assumes all possible event times are of interest.
In practice, `DiD(...)` completes the following steps:
**Step 1.** Determine all possible combinations of $(g,e)$ available in the data. The `min_event` and `max_event` arguments allow the user to restrict the minimum and maximum event times $e$ of interest.
**Step 2.** In parallel, for each $(g,e)$ combination, construct the corresponding control group $C_{g,e}$ the same way as `DiDge(...)`. Drop any $(g,e)$ combination for which the control group is empty.
**Step 3.** Within each $(g,e)$-specific process, define the $(g,e)$-specific sample of treated and control units, $S_{g,e} \equiv \{G_i=g\} \cup \{i \in C_{g,e}\}$. Drop any observations that do not satisfy $i \in S_{g,e}$.
**Step 4.** Within each $(g,e)$-specific process, construct the within-$i$ differences $\Delta Y_{i,g+e} \equiv Y_{i,g+e} - Y_{i,g+b}$ for each $i$ that remains in the sample.
**Step 5.** Within each $(g,e)$-specific process, estimate $\Delta Y_{i,g+e} = \alpha_{g,e} + \beta_{g,e} 1\{G_i =g\} + \epsilon_{i,g+e}$ by OLS.
The OLS estimate of $\beta_{g,e}$ is the sample analog of $\text{DiD}_{g,e}$. The standard error provided by OLS for $\beta_{g,e}$ is equivalent to the standard error from a two-sample test of equal means for the null hypothesis $$\mathbb{E}[\Delta Y_{i,g+e} | G_i = g] = \mathbb{E}[\Delta Y_{i,g+e} | i \in C_{g,e}] $$ which, under Parallel Trends and No Anticipation, is equivalent to testing that $\text{ATT}_{g,e}=0$. Note that $\text{ATT}_{g,e}=0$ is tested as a single hypothesis for each $(g,e)$ combination; no adjustment for multiple hypothesis testing is applied.
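
Steps 1 and 2 amount to building a grid of $(g,e)$ pairs and discarding those that cannot be estimated. The following base R sketch illustrates the idea with hypothetical cohorts and periods, using the `control_group = "all"` rule; the variable names are assumptions and the package's own bookkeeping may differ.

```r
# Hypothetical data: each unit's cohort, and the observed calendar periods.
G_units <- c(2003, 2003, 2004, 2006)
periods <- 2001:2007
min_event <- -2
max_event <- 3
b <- -1

# Step 1: all (g, e) pairs whose target and base periods are both observed.
cohorts <- sort(unique(G_units[is.finite(G_units)]))
grid <- expand.grid(g = cohorts, e = min_event:max_event)
grid <- grid[(grid$g + grid$e) %in% periods & (grid$g + b) %in% periods, ]

# Step 2: count control units under the "all" rule, then drop any
# (g, e) pair with an empty control group.
grid$n_control <- mapply(function(g, e) sum(G_units > max(g, g + e)),
                         grid$g, grid$e)
grid <- grid[grid$n_control > 0, ]

grid[order(grid$g, grid$e), ]
```

In this example there are no never-treated units, so the last-treated cohort (2006) has no valid controls and drops out entirely, and later event times drop out for the earlier cohorts.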
### 4.2 Estimate the Average DiD across Cohorts and Event Times
Aside from estimating each $\text{DiD}_{g,e}$, `DiD(...)` also estimates $\text{DiD}_{e}$ for each $e$ included in the event times of interest.
To do so, `DiD(...)` completes the following steps:
**Step 1.** After the parallel $(g,e)$-specific estimation described above, it returns the $(g,e)$-specific samples of the form $S_{g,e} \equiv \{G_i=g\} \cup \{i \in C_{g,e}\}$.
**Step 2.** It defines an indicator corresponding to each cohort $g$, then stacks all of the samples $S_{g,e}$ that have the same $e$. Note that the same $i$ can appear multiple times due to membership in both $S_{g_1,e}$ and $S_{g_2,e}$, so the distinct observations are distinguished by the indicators for $g$.
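**Step 3.** It estimates $\Delta Y_{i,g+e} = \sum_{g'} \alpha_{g',e} 1\{g = g'\} + \sum_{g'} \beta_{g',e} 1\{g = g'\} 1\{G_i = g'\} + \epsilon_{i,g+e}$ by OLS for the stacked sample across $g$, where $1\{g = g'\}$ is the indicator from Step 2 that the row belongs to the sample $S_{g',e}$.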
**Step 4.** It constructs $\text{DiD}_e = \sum_g \omega_{g,e} \beta_{g,e}$, where $\omega_{g,e} \equiv \frac{\sum_i 1\{G_i=g\}}{\sum_i 1\{G_i < \infty\}}$. Since each $\beta_{g,e}$ is an estimate of the corresponding $\text{ATT}_{g,e}$, it follows that $\text{DiD}_e$ is an estimate of the weighted average $\text{ATT}_{e} \equiv \sum_g \omega_{g,e} \text{ATT}_{g,e}$.
**Step 5.** To test the null hypothesis that $\text{ATT}_{e} = 0$, it defines $\bar\beta_e = (\beta_{g,e})_g$ and $\bar\omega_e = (\omega_{g,e})_g$, so that $\text{DiD}_e = \bar\omega_e' \bar\beta_e$. To get the standard error for $\text{DiD}_e$, it uses that $\text{Var}(\text{DiD}_e) = \bar\omega_e' \text{Var}(\bar\beta_e) \bar\omega_e$, where $\text{Var}(\bar\beta_e)$ is the usual (heteroskedasticity-robust) variance-covariance matrix of the OLS coefficients. Since the same unit $i$ appears on multiple rows of the stacked sample, we must cluster on $i$ when estimating $\text{Var}(\bar\beta_e)$. Finally, the standard error used to test the null hypothesis of $\text{ATT}_{e} = 0$ is $\sqrt{\text{Var}(\text{DiD}_e)}$.
A similar approach is used to estimate $\text{DiD}_{E}$, the average $\text{DiD}_{e}$ across a set of event times $E$. Because this average can also be represented as a linear combination of the OLS coefficients $\beta_{g,e}$ with appropriate weights, the same variance formula yields the standard error used to test $\text{ATT}_{E} = 0$.
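
The stacked procedure for a single event time can be sketched in base R as follows. This is an illustration on simulated data with two treated cohorts and a never-treated group; the data-generating values and variable names are assumptions, and the cluster-robust variance is computed by hand without any small-sample correction, so it need not match the package's output exactly.

```r
set.seed(7)
# Hypothetical data: cohort G_i and outcomes Y[i, t] for t = 1..6.
n <- 300
G <- sample(c(3, 4, Inf), n, replace = TRUE)
Y <- sapply(1:6, function(t) rnorm(n) + 0.3 * t + 2 * (t >= G))
e <- 1
b <- -1

# Steps 1-2: build S_{g,e} for each cohort and stack, tagging each row
# with the cohort g of the sample it belongs to.
stack <- do.call(rbind, lapply(c(3, 4), function(g) {
  keep <- G == g | G > max(g, g + e)
  data.frame(i = which(keep), stack_g = g,
             treat = as.integer(G[keep] == g),
             dY = Y[keep, g + e] - Y[keep, g + b])
}))

# Step 3: cohort-specific intercepts (a3, a4) and treatment effects (b3, b4).
X <- cbind(a3 = stack$stack_g == 3, a4 = stack$stack_g == 4,
           b3 = stack$stack_g == 3 & stack$treat == 1,
           b4 = stack$stack_g == 4 & stack$treat == 1) * 1
fit <- lm.fit(X, stack$dY)
beta <- fit$coefficients[c("b3", "b4")]

# Step 4: weight each cohort by its share of treated units.
omega <- c(sum(G == 3), sum(G == 4)) / sum(is.finite(G))
did_e <- sum(omega * beta)

# Step 5: cluster-robust variance, clustering on unit i.
bread <- solve(crossprod(X))
score <- rowsum(X * fit$residuals, group = stack$i)  # per-unit sums of x * residual
V <- bread %*% crossprod(score) %*% bread
w <- c(0, 0, omega)                                   # zero weight on the intercepts
se_e <- sqrt(drop(t(w) %*% V %*% w))

c(estimate = did_e, std_error = se_e)
```

Here the never-treated units appear once in each stacked sample, which is why the rows are grouped by `i` before forming the middle term of the variance.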