# Difference in Differences

## Study Case 1 - New Jersey vs Pennsylvania

In [1]:
# import packages

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.impute import SimpleImputer
import statsmodels.api as smf
%matplotlib inline



In [2]:
#import data

df = pd.read_csv('njmin3.csv')

df.head()

,NJ,POST_APRIL92,NJ_POST_APRIL92,fte,bk,kfc,roys,wendys,co_owned,centralj,southj,pa1,pa2,demp
0,1,0,0,15.0,1,0,0,0,0,1,0,0,0,12.0
1,1,0,0,15.0,1,0,0,0,0,1,0,0,0,6.5
2,1,0,0,24.0,0,0,1,0,0,1,0,0,0,-1.0
3,1,0,0,19.25,0,0,1,0,1,0,0,0,0,2.25
4,1,0,0,21.5,1,0,0,0,0,0,0,0,0,13.0


### Cleaning & Wrangling

In [3]:
# null values
df.isna().sum()

NJ                  0
POST_APRIL92        0
NJ_POST_APRIL92     0
fte                26
bk                  0
kfc                 0
roys                0
wendys              0
co_owned            0
centralj            0
southj              0
pa1                 0
pa2                 0
demp               52
dtype: int64

In [4]:
df.dtypes

NJ                   int64
POST_APRIL92         int64
NJ_POST_APRIL92      int64
fte                float64
bk                   int64
kfc                  int64
roys                 int64
wendys               int64
co_owned             int64
centralj             int64
southj               int64
pa1                  int64
pa2                  int64
demp               float64
dtype: object

In [5]:
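# fit a mean imputer on the two columns that contain NaNs
missingvalues = SimpleImputer(missing_values = np.nan,
                              strategy = 'mean').fit(df[['fte', 'demp']])

In [6]:
# replace the missing fte and demp values with the column means
df[['fte', 'demp']] = missingvalues.transform(df[['fte', 'demp']])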

In [7]:
df.isna().sum()

NJ                 0
POST_APRIL92       0
NJ_POST_APRIL92    0
fte                0
bk                 0
kfc                0
roys               0
wendys             0
co_owned           0
centralj           0
southj             0
pa1                0
pa2                0
demp               0
dtype: int64

### Model
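
As a sketch of the specification estimated in this section, the regression can be written as below. The symbols $\beta_0, \dots, \beta_3$ and $\varepsilon_i$ are notation introduced here, not taken from the notebook; $\text{NJ}_i$ and $\text{POST}_i$ stand for the `NJ` and `POST_APRIL92` dummies, and the interaction term is assumed to correspond to the `NJ_POST_APRIL92` column.

$$
\text{FTE}_i = \beta_0 + \beta_1\,\text{NJ}_i + \beta_2\,\text{POST}_i + \beta_3\,(\text{NJ}_i \times \text{POST}_i) + \varepsilon_i
$$

Under that assumption, $\beta_3$ is the difference-in-differences estimate: the change in New Jersey after April 1992 minus the change in Pennsylvania over the same period.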

In [8]:
# isolate the regressors X (NJ, POST_APRIL92, NJ_POST_APRIL92) and the outcome y (fte)
X = df.iloc[:,0:3].values
y = df.iloc[:, 3].values

In [9]:
X = smf.add_constant(X)
model1 = smf.OLS(y,X).fit()
model1.summary(yname = 'FTE',
               xname = ('intercept',
                        'New Jersey',
                        'After April 1992',
                        'New Jersey and after April 1992')           
              )

Dep. Variable:,FTE,R-squared:,0.007
Model:,OLS,Adj. R-squared:,0.004
Method:,Least Squares,F-statistic:,1.974
Date:,"Sat, 04 Mar 2023",Prob (F-statistic):,0.116
Time:,08:19:57,Log-Likelihood:,-2986.2
No. Observations:,820,AIC:,5980.0
Df Residuals:,816,BIC:,5999.0
Df Model:,3,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
intercept,23.2728,1.041,22.349,0.000,21.229,25.317
New Jersey,-2.8157,1.159,-2.430,0.015,-5.091,-0.541
After April 1992,-2.1108,1.473,-1.433,0.152,-5.001,0.780
New Jersey and after April 1992,2.6810,1.639,1.636,0.102,-0.536,5.898

Omnibus:,232.659,Durbin-Watson:,1.847
Prob(Omnibus):,0.0,Jarque-Bera (JB):,908.337
Skew:,1.289,Prob(JB):,5.7200000000000005e-198
Kurtosis:,7.465,Cond. No.,11.4


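The ***New Jersey and after April 1992*** coefficient is the difference-in-differences estimate: after the minimum wage increase, FTE employment in New Jersey rose by about 2.68 relative to Pennsylvania. The estimate is positive, but in this model it is not statistically significant at conventional levels (p = 0.102, 95% CI from -0.536 to 5.898).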
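
To see where the 2.68 comes from, the four group means implied by the model 1 coefficients can be rebuilt by hand. This is a minimal sketch that uses only the reported estimates; the variable names are illustrative, and it assumes `NJ_POST_APRIL92` is the product of the `NJ` and `POST_APRIL92` dummies.

```python
# Coefficients copied from the model 1 summary
intercept = 23.2728   # Pennsylvania, before April 1992
b_nj = -2.8157        # New Jersey minus Pennsylvania, before April 1992
b_post = -2.1108      # change in Pennsylvania after April 1992
b_did = 2.6810        # interaction term (the DiD estimate)

# Mean FTE implied for each state/period cell
pa_pre = intercept
pa_post = intercept + b_post
nj_pre = intercept + b_nj
nj_post = intercept + b_nj + b_post + b_did

# Difference in differences: change in NJ minus change in PA
did = (nj_post - nj_pre) - (pa_post - pa_pre)

print(f"PA: {pa_pre:.4f} -> {pa_post:.4f}")
print(f"NJ: {nj_pre:.4f} -> {nj_post:.4f}")
print(f"DiD: {did:.4f}")
```

The implied means show employment falling in Pennsylvania while rising slightly in New Jersey, and the gap between those two changes is the interaction coefficient.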

In [18]:
###model 2 with more x variables

#isolate X and y
X = df.loc[:, ['NJ', 'POST_APRIL92', 'NJ_POST_APRIL92', 'bk', 'kfc','wendys']]

#regression
X = smf.add_constant(X)
model2 = smf.OLS(y,X).fit()
model2.summary(yname = 'FTE',
               xname = ('intercept',
                        'New Jersey',
                        'After April 1992',
                        'New Jersey and after April 1992',
                        'Burger King',
                        'KFC',
                        'Wendys'))

Dep. Variable:,FTE,R-squared:,0.191
Model:,OLS,Adj. R-squared:,0.185
Method:,Least Squares,F-statistic:,31.95
Date:,"Sat, 04 Mar 2023",Prob (F-statistic):,1.3e-34
Time:,08:47:09,Log-Likelihood:,-2902.4
No. Observations:,820,AIC:,5819.0
Df Residuals:,813,BIC:,5852.0
Df Model:,6,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
intercept,23.4055,1.085,21.575,0.000,21.276,25.535
New Jersey,-2.2349,1.050,-2.129,0.034,-4.296,-0.174
After April 1992,-2.1108,1.332,-1.585,0.113,-4.725,0.504
New Jersey and after April 1992,2.6810,1.482,1.809,0.071,-0.229,5.591
Burger King,2.1620,0.748,2.891,0.004,0.694,3.630
KFC,-8.4912,0.890,-9.540,0.000,-10.238,-6.744
Wendys,1.0496,0.970,1.082,0.280,-0.855,2.954

Omnibus:,300.626,Durbin-Watson:,1.965
Prob(Omnibus):,0.0,Jarque-Bera (JB):,1848.909
Skew:,1.53,Prob(JB):,0.0
Kurtosis:,9.69,Cond. No.,12.0


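***The `roys` dummy is left out to avoid the dummy variable trap, so that chain becomes the baseline category absorbed into the intercept.***

With the chain dummies added, the NJ post-April coefficient is unchanged (2.6810), but its standard error falls from 1.639 to 1.482, so the p-value drops from 0.102 to 0.071.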
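
The dummy variable trap can be checked directly on the five rows shown by `df.head()`. This is a minimal sketch, assuming that each restaurant belongs to exactly one of the four chains; the five rows are consistent with that assumption but do not prove it for the full dataset.

```python
# (bk, kfc, roys, wendys) for the five rows shown by df.head()
rows = [
    (1, 0, 0, 0),
    (1, 0, 0, 0),
    (0, 0, 1, 0),
    (0, 0, 1, 0),
    (1, 0, 0, 0),
]

# The regression constant is a column of ones
const = [1] * len(rows)

# If the four dummies always sum to 1, they reproduce the constant exactly,
# which makes the design matrix perfectly collinear
dummy_sum = [sum(r) for r in rows]
print(dummy_sum == const)

# Dropping roys breaks the exact dependence: the sum is 0 for roys rows
sum_without_roys = [bk + kfc + wendys for bk, kfc, roys, wendys in rows]
print(sum_without_roys)
```

With `roys` omitted, each remaining chain coefficient is read as a difference from the `roys` baseline.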

In [19]:
###model 3 - more variables

#isolate X and y
X = df.loc[:, ['NJ', 'POST_APRIL92', 'NJ_POST_APRIL92',
               'bk', 'kfc','wendys',
               'co_owned', 'centralj', 'southj']]

#regression
X = smf.add_constant(X)
model3 = smf.OLS(y,X).fit()
model3.summary(yname = 'FTE',
               xname = ('intercept',
                        'New Jersey',
                        'After April 1992',
                        'New Jersey and after April 1992',
                        'Burger King',
                        'KFC',
                        'Wendys',
                        'Co-owned',
                        'Central J',
                        'South J'))

Dep. Variable:,FTE,R-squared:,0.217
Model:,OLS,Adj. R-squared:,0.208
Method:,Least Squares,F-statistic:,24.89
Date:,"Sat, 04 Mar 2023",Prob (F-statistic):,6.45e-38
Time:,08:50:05,Log-Likelihood:,-2889.1
No. Observations:,820,AIC:,5798.0
Df Residuals:,810,BIC:,5845.0
Df Model:,9,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
intercept,23.9321,1.184,20.204,0.000,21.607,26.257
New Jersey,-1.3009,1.078,-1.207,0.228,-3.416,0.815
After April 1992,-2.1108,1.313,-1.608,0.108,-4.688,0.466
New Jersey and after April 1992,2.6810,1.461,1.835,0.067,-0.187,5.549
Burger King,1.6653,0.832,2.002,0.046,0.033,3.298
KFC,-8.2346,0.899,-9.161,0.000,-9.999,-6.470
Wendys,0.6218,1.017,0.612,0.541,-1.374,2.617
Co-owned,-0.7456,0.699,-1.066,0.287,-2.118,0.627
Central J,0.0030,0.867,0.003,0.997,-1.699,1.705

Omnibus:,309.762,Durbin-Watson:,2.047
Prob(Omnibus):,0.0,Jarque-Bera (JB):,1987.511
Skew:,1.57,Prob(JB):,0.0
Kurtosis:,9.951,Cond. No.,12.6


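We estimated three specifications to check how robust the effect is. The NJ post-April coefficient is identical (2.6810) in all three, and its p-value falls from 0.102 to 0.071 to 0.067 as controls are added.
The estimated effect of the minimum wage increase on employment is therefore positive in this case, although it is significant only at the 10% level, and only in models 2 and 3.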
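
Because the coefficient is the same in every model, any change in its significance comes from the standard error. A small sketch using the values reported in the three summaries (the dictionary layout is illustrative):

```python
# DiD coefficient and its standard error, copied from the three summaries
coef = 2.6810
std_err = {"model1": 1.639, "model2": 1.482, "model3": 1.461}

# t-statistic = coefficient / standard error
t_stats = {name: coef / se for name, se in std_err.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.3f}")
```

This reproduces the t values in the summaries (1.636, 1.809, 1.835): the added controls absorb residual variance, which shrinks the standard error while leaving the point estimate untouched.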