In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import statsmodels.formula.api as smf
from numpy import random

from plotnine import *

# Differences in differences

So far we have thought of events as treatments that are applied to all sampled units at the same time. But what happens when we want the specific effect of a policy that is introduced at one point in time and then stays in place?

Consider, for example, a change in tax law that increases rates for the wealthiest people in a country. We would be very interested to see whether that actually increases the country's tax revenue significantly. Furthermore, we would like to see whether this effect persists.

But in this case we have only one treated unit (a single country). Perhaps we could look at the trend of tax revenue before and after the policy and use that difference as our treatment effect. However, this might not be the effect we are looking for. Why?

In general we would be looking to see if the trend changes, but trends can change for more reasons than the treatment alone. In our case, for example, a boom in a very profitable technology industry could create lots of new wealthy individuals and thus shift the trend. This would make it difficult to isolate the effect of the treatment.

What can we do?

Well, maybe we can grab another country that is very similar to the one that gets the policy but that did not get it. The problem is that, unlike in experiments, we cannot generate comparability by having a number of units large enough to compare distributions. But if the two are comparable enough, we can do something else.

One of the things we have become good at is generating predictions. We can actually generate a "what if" for the treated unit had the treatment never happened. We call this a counterfactual. If we find a control unit whose trend is parallel to that of the treated unit, we can use the pre-treatment period to extrapolate the counterfactual into the post-treatment period, and use the difference between the actual observations on the treated unit and that counterfactual as our treatment effect.


Let's show a plot to make it less confusing:

![Parallel](https://miro.medium.com/max/606/1*5mHmHpDaqYoWn5BqQ0a77w.png)


So, given parallel units where one has not been treated, we can take the fit from the pre-treatment stage and extend it with the control unit's later observations to build our counterfactual. Then the difference between the treated unit's post-treatment outcomes and this counterfactual gives us our effect.
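As a minimal sketch of this idea, assume two hypothetical units with made-up yearly outcomes, an assumed treatment year of 2005, and an assumed true effect of 2. None of these numbers come from real data; they only illustrate how the counterfactual is built from the control path.

```python
import numpy as np

# Hypothetical yearly outcomes (made-up numbers, not real data).
years = np.arange(2000, 2010)
treat_year = 2005                        # assumed first treated year
post = years >= treat_year

control = 10.0 + 0.5 * (years - 2000)    # control unit follows a linear trend
treated = control + 3.0                  # parallel trend, shifted up by 3
treated = treated + 2.0 * post           # assumed true effect of 2 after treatment

# Pre-treatment gap between the units (constant under parallel trends).
gap = np.mean(treated[~post] - control[~post])

# Counterfactual for the treated unit: the control path shifted by that gap.
counterfactual = control + gap

# Effect estimate: observed treated outcome minus counterfactual, post periods only.
effect = np.mean(treated[post] - counterfactual[post])
print(gap, effect)
```

Because the two made-up series are exactly parallel before 2005, the sketch recovers the assumed effect of 2; with real data the recovery is only as good as the parallel trends assumption.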

Using our potential outcomes notation to define our treatment effect:


* $Y^1_{Pre}$ is the outcome of the treated unit before the treatment

* $Y^0_{Pre}$ is the outcome of the control unit before the treatment

* $Y^1_{Post}$ is the outcome of the treated unit after the treatment

* $Y^0_{Post}$ is the outcome of the control unit after the treatment


So we get that:

$$\hat{\delta}=\mathrm{mean}(Y^1_{Post}-Y^0_{Post})-\mathrm{mean}(Y^1_{Pre}-Y^0_{Pre})$$

Hence the name! 
Recall that:

$$T=\begin{cases}
        1, \mbox{if Treated}\\ 
        0, \mbox{Otherwise} \\
        \end{cases}$$
        
        
And now let's define:

$$A=\begin{cases}
        1, \mbox{if in post-treatment time}\\ 
        0, \mbox{if in pre-treatment time} \\
        \end{cases}$$


So now we can estimate $\delta$ by using the following regression for individuals $i$ at time $t$:

$$Y_{it}=\alpha+\gamma T+\lambda A+\delta (TA)+\varepsilon_{it}$$

Wait, how?

This is easy to see once we set each indicator to 1 or 0 and take the expected outcome in each of the four cases (so the error term drops out):

* $Y^0_{Pre}= \alpha$

* $Y^0_{Post}= \alpha+ \lambda$

* $Y^1_{Pre}= \alpha+ \gamma$

* $Y^1_{Post}= \alpha+ \gamma+\lambda+\delta$


Now note:

$$Y^1_{Pre}-Y^0_{Pre}=\alpha+\gamma-\alpha= \gamma$$

And:

$$Y^1_{Post}-Y^0_{Post}=\alpha+ \gamma+\lambda+\delta-(\alpha+\lambda)= \gamma+\delta$$

Thus:

$$(Y^1_{Post}-Y^0_{Post})-(Y^1_{Pre}-Y^0_{Pre})=\gamma+\delta-\gamma=\delta$$
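The algebra above can be checked numerically. The sketch below simulates data from the regression equation with assumed parameter values (the values of `alpha`, `gamma`, `lam`, and `delta` are arbitrary choices for illustration), fits it by least squares with plain NumPy rather than a regression package, and compares the interaction coefficient with the difference in differences of the four group means.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

T = rng.integers(0, 2, size=n)   # 1 if the unit is in the treated group
A = rng.integers(0, 2, size=n)   # 1 if the observation is post-treatment

# Assumed "true" parameters, chosen only for this illustration.
alpha, gamma, lam, delta = 5.0, 1.0, -0.5, 2.0
Y = alpha + gamma * T + lam * A + delta * T * A + rng.normal(0.0, 1.0, size=n)

# Least squares fit of Y on a constant, T, A and the interaction T*A.
X = np.column_stack([np.ones(n), T, A, T * A])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)

def cell_mean(t, a):
    """Mean outcome for group t (0/1) in period a (0/1)."""
    return Y[(T == t) & (A == a)].mean()

# Difference in differences computed directly from the four group means.
did = (cell_mean(1, 1) - cell_mean(0, 1)) - (cell_mean(1, 0) - cell_mean(0, 0))

print("interaction coefficient:", coef[3])
print("difference in differences of means:", did)
```

The two numbers agree up to floating point error because the model has one parameter per group-period combination, so the fitted values are exactly the four group means.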




## Minimum wage, the Nobel Prize and estimation


One of the papers that earned David Card his share of the 2021 Nobel Prize in Economics tried to determine the effect of the minimum wage on employment. For this, he and Alan Krueger used data on fast food employment in New Jersey and Pennsylvania. Why? Let's see the story:

* In April 1992 the minimum wage in NJ went from 4.25 dollars an hour to 5.05

* Pennsylvania is a neighboring state, similar in many respects, where the minimum wage did not change

* Using multiple restaurants, they ran a survey in February, before the change in policy, and repeated it in November


The plot they got would look something like this in the best of cases (Taken from Causal Inference: The Mixtape by Scott Cunningham):

![Fig1](https://mixtape.scunning.com/causal_inference_mixtape_files/figure-html/dd-diagram-1.png)


But you might notice a problem: the trends are not parallel.

Why is that a problem? Let's look at another plot from Scott's book to see what is going on:

![Fig2](https://mixtape.scunning.com/causal_inference_mixtape_files/figure-html/dd-diagram2-1.png)

$\delta_{ATT}$ is the true parameter we want to estimate. However, the linear regression gives us the estimate implied by the light gray line instead. This is why we need the parallel trends assumption to hold.
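To make the size of this problem concrete, here is a small worked example with made-up numbers (the levels, trends, and the assumed true effect of 2 are all hypothetical). It computes the difference in differences once when the untreated trends are parallel and once when the treated unit would have grown faster anyway.

```python
def did(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Two-period difference in differences from four group means."""
    return (treat_post - ctrl_post) - (treat_pre - ctrl_pre)

true_effect = 2.0                 # assumed true treatment effect
ctrl_pre, ctrl_post = 10.0, 11.0  # control grows by 1 between periods
treat_pre = 13.0

# Case 1: without treatment the treated unit would also have grown by 1.
treat_post_parallel = treat_pre + 1.0 + true_effect

# Case 2: without treatment the treated unit would have grown by 2.5.
treat_post_nonparallel = treat_pre + 2.5 + true_effect

est_parallel = did(treat_pre, treat_post_parallel, ctrl_pre, ctrl_post)
est_nonparallel = did(treat_pre, treat_post_nonparallel, ctrl_pre, ctrl_post)

print(est_parallel)                   # 2.0
print(est_nonparallel)                # 3.5
print(est_nonparallel - true_effect)  # 1.5
```

In the second case the estimate is off by 1.5, which is exactly the gap between the two untreated trends (2.5 versus 1). The estimator cannot tell that gap apart from the treatment effect.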

An important lesson for us:

* We need more than one period before and after the treatment

* We need to check the parallel trends assumption
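With several pre-treatment periods, one informal way to probe parallel trends is to look at how the gap between the two groups evolves before the treatment, and to run a "placebo" difference in differences that uses only pre-treatment data. The sketch below uses made-up numbers and an assumed treatment start in period 4; it is an illustration of the idea, not a formal test.

```python
import numpy as np

# Hypothetical outcomes for six periods (made-up numbers, not the paper's data).
periods = np.arange(6)
treat_start = 4  # assumed: treatment begins in period 4
control = np.array([20.0, 20.5, 21.0, 21.5, 22.0, 22.5])
treated = np.array([23.0, 23.5, 24.0, 24.5, 27.0, 27.5])

pre = periods < treat_start
gap = treated - control  # treated minus control, period by period

# Slope of the gap over the pre-treatment periods.
# A slope near zero is consistent with parallel pre-trends.
pre_slope = np.polyfit(periods[pre], gap[pre], 1)[0]

# Placebo: pretend the treatment started in period 2 and use pre-periods only.
# An estimate far from zero would be a warning sign.
placebo = gap[2:4].mean() - gap[0:2].mean()

# Actual estimate: mean post-treatment gap minus mean pre-treatment gap.
effect = gap[~pre].mean() - gap[pre].mean()

print(pre_slope, placebo, effect)
```

In these made-up series the pre-treatment gap is constant, so the slope and the placebo estimate are both essentially zero, and the estimate of 2 is not contaminated by diverging trends. A flat pre-trend does not prove the trends would have stayed parallel afterwards; it only makes the assumption more plausible.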

For now we can use the data for the paper and replicate the regression anyway. 


In [2]:
#Reading data

df = pd.read_csv('njmin3.csv')

In [3]:
df.head()

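| Unnamed: 0 | CO_OWNED | SOUTHJ | CENTRALJ | PA1 | PA2 | DEMP | nj | bk | kfc | roys | wendys | d | d_nj | fte |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 0 | 0 | 12.0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 15.0 |
| 1 | 0 | 0 | 1 | 0 | 0 | 6.5 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 15.0 |
| 2 | 0 | 0 | 1 | 0 | 0 | -1.0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 24.0 |
| 3 | 1 | 0 | 0 | 0 | 0 | 2.25 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 19.25 |
| 4 | 0 | 0 | 0 | 0 | 0 | 13.0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 21.5 |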


Let's examine what we have for variables:


* nj=1 if New Jersey
* d=1 if after minimum wage increase
* d_nj=1 if both nj=1 and d=1 (the nj and d interaction)
* fte: full-time-equivalent employees
* bk=1 if Burger King
* kfc=1 if KFC
* roys=1 if Roy Rogers
* wendys=1 if Wendy's
* CO_OWNED=1 if company owned
* CENTRALJ=1 if Central NJ
* SOUTHJ=1 if Southern NJ
* PA1=1 if in PA, northeast suburbs of Philadelphia
* PA2=1 if PA, Easton, etc

Let's check with our regression:

In [4]:
reg1 = smf.ols('fte ~ nj+d+d_nj', df).fit()

reg1.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | fte | R-squared: | 0.007 |
| Model: | OLS | Adj. R-squared: | 0.004 |
| Method: | Least Squares | F-statistic: | 1.964 |
| Date: | Thu, 18 Nov 2021 | Prob (F-statistic): | 0.118 |
| Time: | 16:11:07 | Log-Likelihood: | -2904.2 |
| No. Observations: | 794 | AIC: | 5816.0 |
| Df Residuals: | 790 | BIC: | 5835.0 |
| Df Model: | 3 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 23.3312 | 1.072 | 21.767 | 0.000 | 21.227 | 25.435 |
| nj | -2.8918 | 1.194 | -2.423 | 0.016 | -5.235 | -0.549 |
| d | -2.1656 | 1.516 | -1.429 | 0.154 | -5.141 | 0.810 |
| d_nj | 2.7536 | 1.688 | 1.631 | 0.103 | -0.561 | 6.068 |

| | | | |
|---|---|---|---|
| Omnibus: | 218.742 | Durbin-Watson: | 1.842 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 804.488 |
| Skew: | 1.268 | Prob(JB): | 2.03e-175 |
| Kurtosis: | 7.229 | Cond. No. | 11.3 |


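This tells us that the estimated effect on employment is 2.7536 full-time-equivalent employees per restaurant, but that it is not statistically significant at the 5% level (p = 0.103).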

If the assumptions are met, we would be able to say that we find no statistically significant effect of the minimum wage increase on employment!
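As a quick sanity check, the four group means implied by the regression can be rebuilt from the reported coefficients, and the difference in differences computed by hand. The numbers below are copied from the coefficient table above; the variable names are only for this illustration.

```python
# Coefficients copied from the regression output above.
intercept, b_nj, b_d, b_d_nj = 23.3312, -2.8918, -2.1656, 2.7536
se_d_nj = 1.688  # reported standard error of d_nj

# Implied mean fte for each state and survey wave.
pa_pre = intercept                        # PA, before the increase
pa_post = intercept + b_d                 # PA, after the increase
nj_pre = intercept + b_nj                 # NJ, before the increase
nj_post = intercept + b_nj + b_d + b_d_nj # NJ, after the increase

# Difference in differences from the implied means.
did = (nj_post - pa_post) - (nj_pre - pa_pre)

print(round(pa_pre, 4), round(pa_post, 4))   # 23.3312 21.1656
print(round(nj_pre, 4), round(nj_post, 4))   # 20.4394 21.0274
print(round(did, 4))                         # 2.7536
print(round(b_d_nj / se_d_nj, 3))            # 1.631
```

The hand computation returns the d_nj coefficient, and dividing it by its standard error reproduces the reported t statistic. Note that the implied means show employment in Pennsylvania falling by about 2.17 while employment in New Jersey rose by about 0.59, which is where the positive estimate comes from.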