<h1>The Difference-In-Difference Estimator</h1>

We consider two groups before and after a policy change: the __treatment group__, which is affected by the policy, and the __control group__, which is not. Before the policy change we observe the treatment group value $y = B$, and after the policy is implemented the treatment group value is $y = C$. We can isolate the effect of the treatment by using a control group that is not affected by the policy change. Before the policy change we observe the control group value $y = A$, and after the policy change the control group value is $y = E$.

To estimate the treatment effect from the four pieces of information contained in the points $A$, $B$, $C$, and $E$, we make the strong assumption that the two groups experience a __common trend__. In the figure below, the dashed line $BD$ represents what we imagine the treatment group growth would have been in the absence of the policy change. The growth described by $BD$ is unobservable; it is obtained by assuming that the growth in the treatment group that is unrelated to the policy change equals the growth in the control group.

The treatment effect $\delta = CD$ is the difference between the treatment and control values of $y$ in the "after" period, $CE$, minus $DE$, which is what the difference between the two groups would have been in the absence of the policy. Under the common trend assumption, the difference $DE$ equals the initial difference $AB$. Using the four observable points $A$, $B$, $C$, and $E$ depicted in the figure below, the treatment effect is estimated from the sample averages for the two groups in the two periods.

$\hat{\delta} = ( \hat{C} - \hat{E} ) - ( \hat{B} - \hat{A} ) = ( \bar{y}_{Treatment,After} - \bar{y}_{Control,After}) - ( \bar{y}_{Treatment,Before} - \bar{y}_{Control,Before})$

$\bar{y}_{Control,Before} = \hat{A} = $  sample mean of $y$ for the control group before policy implementation.  

$\bar{y}_{Treatment,Before} = \hat{B} = $  sample mean of $y$ for the treatment group before policy implementation.  

$\bar{y}_{Control,After} = \hat{E} = $  sample mean of $y$ for the control group after policy implementation.  

$\bar{y}_{Treatment,After} = \hat{C} = $  sample mean of $y$ for the treatment group after policy implementation.
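
As a rough illustration of the formula above, the sketch below computes the four cell means and their double difference by hand. The data frame `toy` and the helper `cell_mean` are hypothetical names, and the numbers are invented purely for illustration; they are not taken from any real data set.

```r
# Toy data, invented for illustration only.
toy <- data.frame(
  treat = c(0, 0, 0, 0, 1, 1, 1, 1),   # 1 = treatment group, 0 = control group
  after = c(0, 0, 1, 1, 0, 0, 1, 1),   # 1 = after the policy, 0 = before
  y     = c(10, 12, 14, 16, 20, 22, 30, 32)
)

# Hypothetical helper: sample mean of y in one group-by-period cell.
cell_mean <- function(g, p) mean(toy$y[toy$treat == g & toy$after == p])

A_hat <- cell_mean(0, 0)  # control, before
B_hat <- cell_mean(1, 0)  # treatment, before
E_hat <- cell_mean(0, 1)  # control, after
C_hat <- cell_mean(1, 1)  # treatment, after

# Difference in the "after" period minus difference in the "before" period.
delta_hat <- (C_hat - E_hat) - (B_hat - A_hat)
print(delta_hat)
```

With these made-up numbers the two groups differ by 10 before the policy and by 16 after it, so the estimated treatment effect is 6.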



![title](./DID_image.png)

The estimator $\hat{\delta}$ is called a __differences-in-differences (DID)__ estimator of the treatment effect.
The estimate $\hat{\delta}$ can be conveniently calculated using a simple regression. Define $y_{it}$ to be the observed outcome for individual $i$ in period $t$. Let $AFTER_t$ be an indicator variable that equals one in the period after the policy change $(t = 2)$ and zero in the period before the policy change $(t = 1)$. Let $TREAT_i$ be an indicator variable that equals one if individual $i$ is in the treatment group and zero if the individual is in the control (non-treatment) group.
Consider the regression model:

$ y_{it} \, = \, \beta_1 \, + \, \beta_2 \, TREAT_i \, + \, \beta_3 \, AFTER_t \, + \, \delta \; (TREAT_i \times AFTER_t) \, + \, e_{it} $

The regression function is:

$
\begin{equation*}
 E(y_{it}) =
    \begin{cases}
        \beta_1                              & TREAT = 0 & AFTER = 0 & \text{ Control before = A}\\
        \beta_1 + \beta_2                    & TREAT = 1 & AFTER = 0 & \text{ Treatment before = B}\\
        \beta_1 + \beta_3                    & TREAT = 0 & AFTER = 1 & \text{ Control after = E}\\
        \beta_1 + \beta_2 + \beta_3 + \delta & TREAT = 1 & AFTER = 1 & \text{ Treatment after = C}
    \end{cases}
\end{equation*}
$
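
To see why the coefficient on the interaction term is the treatment effect, it may help to difference the four conditional means directly. Here $A$ and $E$ denote the control group means before and after ($TREAT = 0$), and $B$ and $C$ the treatment group means before and after ($TREAT = 1$), as defined earlier:

$$
\begin{aligned}
C - E &= (\beta_1 + \beta_2 + \beta_3 + \delta) - (\beta_1 + \beta_3) = \beta_2 + \delta \\
B - A &= (\beta_1 + \beta_2) - \beta_1 = \beta_2 \\
(C - E) - (B - A) &= (\beta_2 + \delta) - \beta_2 = \delta
\end{aligned}
$$

In the same way, $\beta_2 = B - A$ is the gap between the groups before the policy, and $\beta_3 = E - A$ is the change over time in the control group, which the common trend assumption attributes to both groups.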

In [1]:
# The foreign package provides read.dta() for reading Stata .dta files
library(foreign)

In [2]:
mydata = read.dta("http://dss.princeton.edu/training/Panel101.dta")

In [3]:
# AFTER indicator: 1 for years from 1994 onward, 0 for earlier years
mydata$after = ifelse(mydata$year >= 1994, 1, 0)

In [4]:
# TREAT indicator: 1 for countries E, F and G, 0 for all other countries
mydata$treat = ifelse(mydata$country %in% c("E", "F", "G"), 1, 0)

In [5]:
# Interaction term TREAT x AFTER; its coefficient is the DID estimate
mydata$did = mydata$after * mydata$treat

In [6]:
# OLS fit of y on treat, after and their interaction
didreg = lm(y ~ treat + after + did, data = mydata)

In [7]:
summary(didreg)


Call:
lm(formula = y ~ treat + after + did, data = mydata)

Residuals:
       Min         1Q     Median         3Q        Max 
-9.768e+09 -1.623e+09  1.167e+08  1.393e+09  6.807e+09 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept)  3.581e+08  7.382e+08   0.485   0.6292  
treat        1.776e+09  1.128e+09   1.575   0.1200  
after        2.289e+09  9.530e+08   2.402   0.0191 *
did         -2.520e+09  1.456e+09  -1.731   0.0882 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.953e+09 on 66 degrees of freedom
Multiple R-squared:  0.08273,	Adjusted R-squared:  0.04104 
F-statistic: 1.984 on 3 and 66 DF,  p-value: 0.1249
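
As a check on how this output maps back to the four points, the fitted cell means implied by the reported coefficients can be worked out by hand. The values below use the coefficients as printed, in units of $10^{9}$, so they are accurate only up to rounding:

$$
\begin{aligned}
\hat{A} &= \hat{\beta}_1 = 0.3581 \\
\hat{B} &= \hat{\beta}_1 + \hat{\beta}_2 = 0.3581 + 1.776 = 2.1341 \\
\hat{E} &= \hat{\beta}_1 + \hat{\beta}_3 = 0.3581 + 2.289 = 2.6471 \\
\hat{C} &= \hat{\beta}_1 + \hat{\beta}_2 + \hat{\beta}_3 + \hat{\delta} = 0.3581 + 1.776 + 2.289 - 2.520 = 1.9031 \\
(\hat{C} - \hat{E}) - (\hat{B} - \hat{A}) &= (-0.7440) - (1.7760) = -2.520
\end{aligned}
$$

The double difference reproduces the coefficient on `did`, $\hat{\delta} \approx -2.520 \times 10^{9}$. With a p-value of 0.0882, this estimate is statistically significant at the 10% level but not at the 5% level.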
