# Impact of minimum wage increase on employment rate of fast food restaurants

- In April 1992, New Jersey raised its minimum wage from $4.25 to $5.05 per hour
- Simply comparing employment before and after the increase would not be accurate, because the estimate would suffer from omitted variable bias: anything else that changed over the same period would be attributed to the policy
- In one of the best-known DiD studies (Card and Krueger, 1994), the researchers compared New Jersey to Pennsylvania. This mitigates the omitted variable bias mentioned above, because the two neighboring states share very similar characteristics, so Pennsylvania can serve as a control group
- Standard economic theory suggests that an increase in the minimum wage results in decreased employment
- The study focused on the fast food restaurant segment
- Does that prediction hold?

**How do we model this? We need to define...**

- Which fast food restaurants are in New Jersey and which are in Pennsylvania?
    - We will use a dummy variable to flag whether a restaurant is in NJ or PENN
- Whether the observation was recorded before or after April 1992
    - We will use a dummy variable to flag "After April 92"
- The wage impact on employment
    - We multiply the NJ variable by the "After April 92" variable, so the product is 1 only for NJ observations recorded after the increase
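
The three variables described above can be sketched on a tiny made-up frame. The `state` and `period` columns are hypothetical raw inputs used only for illustration; the dataset loaded below already ships with the dummies precomputed.

```python
import pandas as pd

# Toy data: `state` and `period` are hypothetical columns, not part of njmin3.csv
toy = pd.DataFrame({
    "state": ["NJ", "NJ", "PA", "PA"],
    "period": ["before", "after", "before", "after"],
})

# Group dummy: 1 for New Jersey (treated), 0 for Pennsylvania (control)
toy["NJ"] = (toy["state"] == "NJ").astype(int)
# Time dummy: 1 for observations recorded after April 1992
toy["POST_APRIL92"] = (toy["period"] == "after").astype(int)
# Interaction: 1 only for New Jersey observations after the increase
toy["NJ_POST_APRIL92"] = toy["NJ"] * toy["POST_APRIL92"]

print(toy)
```

Only the second row (New Jersey, after) gets `NJ_POST_APRIL92 = 1`, which is the cell whose extra change the DiD estimate isolates.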

In [None]:
import numpy as np
import pandas as pd

In [None]:
# pull the data
dataset = pd.read_csv("datasets/njmin3.csv")

In [None]:
dataset.head()

Unnamed: 0,NJ,POST_APRIL92,NJ_POST_APRIL92,fte,bk,kfc,roys,wendys,co_owned,centralj,southj,pa1,pa2,demp
0,1,0,0,15.0,1,0,0,0,0,1,0,0,0,12.0
1,1,0,0,15.0,1,0,0,0,0,1,0,0,0,6.5
2,1,0,0,24.0,0,0,1,0,0,1,0,0,0,-1.0
3,1,0,0,19.25,0,0,1,0,1,0,0,0,0,2.25
4,1,0,0,21.5,1,0,0,0,0,0,0,0,0,13.0


- `NJ`: whether the fast food restaurant is located in New Jersey (1) or Pennsylvania (0)
- `POST_APRIL92`: whether the observation was recorded after (1) or before (0) April 1992
- `NJ_POST_APRIL92`: the product of `NJ` and `POST_APRIL92`
- `fte`: full-time equivalent employment at the restaurant


Each row of the dataframe represents one observation of `fte` for a fast food restaurant.

In [None]:
dataset.shape

(820, 14)

In [None]:
dataset.describe()

Unnamed: 0,NJ,POST_APRIL92,NJ_POST_APRIL92,fte,bk,kfc,roys,wendys,co_owned,centralj,southj,pa1,pa2,demp
count,820.0,820.0,820.0,820.0,820.0,820.0,820.0,820.0,820.0,820.0,820.0,820.0,820.0,820.0
mean,0.807317,0.5,0.403659,21.026511,0.417073,0.195122,0.241463,0.146341,0.343902,0.153659,0.226829,0.087805,0.104878,-0.070443
std,0.394647,0.500305,0.49093,9.271972,0.493376,0.396536,0.428232,0.353664,0.475299,0.360841,0.419037,0.283184,0.306583,8.725511
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-41.5
25%,1.0,0.0,0.0,15.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-3.5
50%,1.0,0.5,0.0,20.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,1.0,1.0,1.0,25.5,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,4.0
max,1.0,1.0,1.0,85.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,34.0


In [None]:
dataset.isnull().sum()

NJ                  0
POST_APRIL92        0
NJ_POST_APRIL92     0
fte                26
bk                  0
kfc                 0
roys                0
wendys              0
co_owned            0
centralj            0
southj              0
pa1                 0
pa2                 0
demp               52
dtype: int64

In [None]:
# replace missing values in fte and demp with the mean of each column
from sklearn.impute import SimpleImputer

missingvalues_imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
# fit_transform learns the column means and fills the gaps in one step
dataset[['fte', 'demp']] = missingvalues_imputer.fit_transform(dataset[['fte', 'demp']])
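
To make the imputation step concrete, here is a minimal sketch of mean imputation on a made-up frame, using only pandas. The values are invented for illustration. For numeric columns this should give the same result as the mean strategy above.

```python
import numpy as np
import pandas as pd

# Made-up values, not taken from njmin3.csv
toy = pd.DataFrame({
    "fte": [10.0, np.nan, 20.0],
    "demp": [1.0, 3.0, np.nan],
})

# toy.mean() skips NaN by default, so each gap is filled with the
# mean of the observed values in its own column
filled = toy.fillna(toy.mean())

# Alternative: keep only rows without any missing value
dropped = toy.dropna()

print(filled)
print(len(dropped))
```

Because `fte` is the outcome variable here, filling it with the overall mean can shift the group averages slightly; dropping incomplete rows is a common alternative worth comparing.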

# DiD with Aggregated Metrics

In [None]:
dataset.groupby(['NJ', 'POST_APRIL92'])['fte'].mean().reset_index()

Unnamed: 0,NJ,POST_APRIL92,fte
0,0,0,23.272823
1,0,1,21.162064
2,1,0,20.457145
3,1,1,21.027396


- (NJ fte after treatment) - (NJ fte before treatment) = 21.03 - 20.46 = 0.57
- (PENN fte after treatment) - (PENN fte before treatment) = 21.16 - 23.27 = -2.11
- DiD = 0.57 - (-2.11) = 0.57 + 2.11 = 2.68
- DiD = 2.68
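
The same arithmetic can be reproduced in code. This sketch re-enters the four group means from the table above by hand (rather than recomputing them from `dataset`) and pivots them into a 2x2 table; the names `means`, `table`, `nj_change`, `pa_change` and `did` are chosen here for illustration.

```python
import pandas as pd

# Group means copied from the groupby output above
means = pd.DataFrame({
    "NJ": [0, 0, 1, 1],
    "POST_APRIL92": [0, 1, 0, 1],
    "fte": [23.272823, 21.162064, 20.457145, 21.027396],
})

# Rows: NJ (0 = PENN, 1 = NJ); columns: POST_APRIL92 (0 = before, 1 = after)
table = means.pivot(index="NJ", columns="POST_APRIL92", values="fte")

nj_change = table.loc[1, 1] - table.loc[1, 0]  # change in New Jersey
pa_change = table.loc[0, 1] - table.loc[0, 0]  # change in Pennsylvania
did = nj_change - pa_change                    # difference of the two changes

print(round(nj_change, 2), round(pa_change, 2), round(did, 2))
# prints: 0.57 -2.11 2.68
```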

Full-time equivalent employment (fte) in New Jersey restaurants **increased by about 2.68 relative to Pennsylvania after the minimum wage increase**.

In other words, the DiD estimate indicates that increasing the minimum wage had a positive impact on employment in New Jersey fast food restaurants.

# DiD with Linear Regression

Let G denote `NJ` and T denote `POST_APRIL92`. The linear regression model is then:

$$fte(G,T) = \beta_0 + \beta_1 G + \beta_2 T + \beta_3 T G$$

$$DiD = [fte(1,1) - fte(1,0)] - [fte(0,1) - fte(0,0)]$$

$$DiD = [\beta_0 + \beta_1 + \beta_2 + \beta_3 - \beta_0 - \beta_1] - [\beta_0 + \beta_2 - \beta_0]$$

$$DiD = \beta_2 + \beta_3 - \beta_2 = \beta_3$$

$$DiD = \beta_3$$
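
The derivation can be checked numerically. The sketch below uses made-up outcomes (two observations per cell, not the restaurant data), fits the four-coefficient model with NumPy's least-squares solver, and compares the interaction coefficient with the DiD computed directly from cell means.

```python
import numpy as np

# Made-up outcomes for the four (G, T) cells, two observations each
G = np.array([0, 0, 0, 0, 1, 1, 1, 1])
T = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y = np.array([23.0, 24.0, 21.0, 22.0, 20.0, 21.0, 21.0, 22.0])

# Design matrix with columns: intercept, G, T, G*T
X = np.column_stack([np.ones_like(y), G, T, G * T])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

def cell_mean(g, t):
    """Mean outcome for the observations with G == g and T == t."""
    return y[(G == g) & (T == t)].mean()

# DiD from the four cell means, following the formula above
did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

print(beta[3], did)
```

With one coefficient per cell the model reproduces the cell means exactly, which is why the interaction coefficient and the aggregated DiD coincide up to floating-point error.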

In [None]:
X = dataset[['NJ', 'POST_APRIL92', 'NJ_POST_APRIL92']]
y = dataset['fte'].values

In [None]:
import statsmodels.api as sm
X = sm.add_constant(X)
model1 = sm.OLS(y, X).fit()

In [None]:
print(model1.summary(yname="FTE",
                     xname=("intercept", "New Jersey", "After April 1992", "New Jersey and after April 1992"),
                     title="Model 1: FTE ~ NJ + POST_APRIL92 + NJ_POST_APRIL92"))

              Model 1: FTE ~ NJ + POST_APRIL92 + NJ_POST_APRIL92              
Dep. Variable:                    FTE   R-squared:                       0.007
Model:                            OLS   Adj. R-squared:                  0.004
Method:                 Least Squares   F-statistic:                     1.974
Date:                Wed, 28 Dec 2022   Prob (F-statistic):              0.116
Time:                        20:11:03   Log-Likelihood:                -2986.2
No. Observations:                 820   AIC:                             5980.
Df Residuals:                     816   BIC:                             5999.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                                      coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------------------
intercept 

The coefficient of the variable `NJ_POST_APRIL92` (labeled "New Jersey and after April 1992" in the summary) is 2.68, which equals the value found by the aggregation method for DiD.
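
The interaction coefficient can also be read programmatically instead of from the printed summary. As a hedged sketch on made-up data (not the restaurant dataset), the statsmodels formula interface expands `NJ * POST_APRIL92` into both main effects plus their interaction, so the product column does not need to be precomputed; the names `toy`, `model`, `did`, `ci_low` and `ci_high` are illustrative.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up data: two observations per (NJ, POST_APRIL92) cell
toy = pd.DataFrame({
    "NJ":           [0, 0, 0, 0, 1, 1, 1, 1],
    "POST_APRIL92": [0, 0, 1, 1, 0, 0, 1, 1],
    "fte":          [23.0, 24.0, 21.0, 22.0, 20.0, 21.0, 21.0, 22.0],
})

# "NJ * POST_APRIL92" is shorthand for NJ + POST_APRIL92 + NJ:POST_APRIL92
model = smf.ols("fte ~ NJ * POST_APRIL92", data=toy).fit()

# The DiD estimate is the coefficient on the interaction term
did = model.params["NJ:POST_APRIL92"]
# 95% confidence interval for that coefficient
ci_low, ci_high = model.conf_int().loc["NJ:POST_APRIL92"]

print(did, ci_low, ci_high)
```

Looking at the confidence interval alongside the point estimate helps judge how precisely the effect is estimated.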

# References

- [Kaggle Notebook: Difference-in-Differences in Python](https://www.kaggle.com/code/harrywang/difference-in-differences-in-python/notebook)
- [Diff-in-Diff intuition on the Estatsite blog (in Portuguese)](https://estatsite.com.br/2017/02/03/diferencas-em-diferencas-diff-in-diff/)
- [Doug McKee: An intuitive introduction to Difference-in-Differences](https://www.youtube.com/watch?v=J7q2H8aB8bQ&t=340s)
- [Public Policy Evaluation course by Prof. Felipe Nunes at UFMG (in Portuguese)](https://www.youtube.com/playlist?list=PL7Xpx-hrPv-GUMXGpajmfLqdCZf3VMN4u)
    - [Course syllabus and materials](https://www.felipenunescp.com/avaliaccedilatildeob.html)
- [Mastering Econometrics with Joshua Angrist (MIT)](https://www.youtube.com/playlist?list=PL-uRhZ_p-BM5ovNRg-G6hDib27OCvcyW8)