In [1]:
import pandas as pd
import numpy as np

- Consider the data for 8 fast food restaurants that were part of a study of the effect of raising the minimum wage in NJ. The treatment group is the 2 restaurants in NJ and the control group is a set of 6 restaurants in PA (where the minimum wage was not raised). The outcome $Y_i^{\text{obs}}$ is the number of people employed (including part-time employees) at the end of the year. There are two covariates: $X_{i1}$, the identity of the fast food chain (Burger King or Kentucky Fried Chicken), and $X_{i2}$, employment at the end of the year prior to the increase in the minimum wage.

In [7]:
# Create a list of lists
data = [["NJ", "BK", 22.5, 30], 
        ["NJ", "KFC", 14, 12.5], 
        ["PA", "KFC", 13.8, 17],
        ["PA", "BK", 26.5, 18.5],
        ["PA", "BK", 20, 19.5], 
        ["PA", "BK", 13.5, 21],
        ["PA", "BK", 32.5, 26.5], 
        ["PA", "KFC", 21, 23]]

# Create the dataframe
fast_food = pd.DataFrame(data, columns = ["State", "Rest_Chain", "Init_Empl", "Final_Empl"])

In [8]:
fast_food

,State,Rest_Chain,Init_Empl,Final_Empl
0,NJ,BK,22.5,30.0
1,NJ,KFC,14.0,12.5
2,PA,KFC,13.8,17.0
3,PA,BK,26.5,18.5
4,PA,BK,20.0,19.5
5,PA,BK,13.5,21.0
6,PA,BK,32.5,26.5
7,PA,KFC,21.0,23.0


In [9]:
# Split treatment (NJ) and control (PA) groups by state rather than by row position
treatment = fast_food[fast_food["State"] == "NJ"]
control = fast_food[fast_food["State"] == "PA"]

In [10]:
treatment

,State,Rest_Chain,Init_Empl,Final_Empl
0,NJ,BK,22.5,30.0
1,NJ,KFC,14.0,12.5


In [11]:
control

,State,Rest_Chain,Init_Empl,Final_Empl
2,PA,KFC,13.8,17.0
3,PA,BK,26.5,18.5
4,PA,BK,20.0,19.5
5,PA,BK,13.5,21.0
6,PA,BK,32.5,26.5
7,PA,KFC,21.0,23.0


1. We want to use matching to estimate the effect of raising the minimum wage, assuming that unconfoundedness holds. We will match a single control unit to each treatment unit (without replacement). Our distance measure is $D(i, j) = 100 \cdot I(X_{i1} \neq X_{j1}) + |X_{i2} - X_{j2}|$, where the indicator $I$ is 1 if the two units belong to different chains and 0 if they belong to the same chain. Identify the matches for the 2 treatment units.
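The distance measure can also be evaluated for every treatment-control pair at once. The following is a minimal sketch, assuming the covariates are typed in by hand from the table above; the array names (`treat_chain`, `ctrl_init`, `D`, and so on) are illustrative and not part of the original notebook.

```python
import numpy as np

# Covariates copied from the table above: chain and initial employment
treat_chain = np.array(["BK", "KFC"])
treat_init = np.array([22.5, 14.0])
ctrl_chain = np.array(["KFC", "BK", "BK", "BK", "BK", "KFC"])
ctrl_init = np.array([13.8, 26.5, 20.0, 13.5, 32.5, 21.0])

# Broadcasting builds a 2 x 6 matrix: rows are treated stores, columns are controls
chain_penalty = 100 * (treat_chain[:, None] != ctrl_chain[None, :])
employment_gap = np.abs(treat_init[:, None] - ctrl_init[None, :])
D = np.round(chain_penalty + employment_gap, 1)

print(D)
```

Each row of `D` holds the distances from one treatment store to the six control stores, so the smallest entry in a row identifies that store's nearest control.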

In [24]:
# Define a function to calculate the distance D(i, j) between two stores
def distance_calc(treatment, control):
    # Absolute difference in initial employment
    dist = abs(treatment["Init_Empl"] - control["Init_Empl"])
    # Add a penalty of 100 when the two stores belong to different chains
    if treatment["Rest_Chain"] != control["Rest_Chain"]:
        dist += 100
    return round(dist, 1)

In [40]:
# Loop through treatment and control groups (greedy matching without replacement)
matched = set()  # positions of control stores that are already used as a match
for t_store_index in range(len(treatment)):
    min_dist = float("inf")
    min_store_index = 0
    for c_store_index in range(len(control)):
        dist = distance_calc(treatment.iloc[t_store_index], control.iloc[c_store_index])
        # Only controls not yet matched to an earlier treatment store are eligible
        if c_store_index not in matched and dist < min_dist:
            min_dist = dist
            min_store_index = c_store_index + 1
        print(f"Distance Between Treatment {t_store_index + 1} & Control {c_store_index + 1} is ", dist)
    matched.add(min_store_index - 1)
    print(f"The closest control store is {min_store_index}. The distance is {min_dist}")
    print("-----------------")
print("The selected store is the counterfactual observations of the treatment store.")

Distance Between Treatment 1 & Control 1 is  108.7
Distance Between Treatment 1 & Control 2 is  4.0
Distance Between Treatment 1 & Control 3 is  2.5
Distance Between Treatment 1 & Control 4 is  9.0
Distance Between Treatment 1 & Control 5 is  10.0
Distance Between Treatment 1 & Control 6 is  101.5
The closest control store is 3. The distance is 2.5
-----------------
Distance Between Treatment 2 & Control 1 is  0.2
Distance Between Treatment 2 & Control 2 is  112.5
Distance Between Treatment 2 & Control 3 is  106.0
Distance Between Treatment 2 & Control 4 is  100.5
Distance Between Treatment 2 & Control 5 is  118.5
Distance Between Treatment 2 & Control 6 is  7.0
The closest control store is 1. The distance is 0.2
-----------------
The selected store is the counterfactual observations of the treatment store.
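Picking the nearest control for each treatment store in turn is a greedy procedure, and with matching without replacement the result can depend on the order in which treatment stores are processed. As a cross-check, the sketch below enumerates every assignment of distinct controls and keeps the one with the smallest total distance. It assumes the distances are copied by hand from the output above; `D`, `best_pairing`, and `best_total` are illustrative names. Brute force is only practical for a problem this small.

```python
from itertools import permutations

# Distances printed above: rows are treatment stores 1-2, columns are control stores 1-6
D = [
    [108.7, 4.0, 2.5, 9.0, 10.0, 101.5],
    [0.2, 112.5, 106.0, 100.5, 118.5, 7.0],
]

# Each permutation assigns a distinct control to each treatment store,
# so no control can be reused (matching without replacement)
n_treat, n_ctrl = len(D), len(D[0])
best_pairing = min(
    permutations(range(n_ctrl), n_treat),
    key=lambda controls: sum(D[t][c] for t, c in enumerate(controls)),
)
best_total = sum(D[t][c] for t, c in enumerate(best_pairing))

for t, c in enumerate(best_pairing):
    print(f"Treatment {t + 1} -> Control {c + 1} (distance {D[t][c]})")
print(f"Total distance: {round(best_total, 1)}")
```

For these data the exhaustive search selects the same pairs as the loop above (Treatment 1 with Control 3, Treatment 2 with Control 1), because the two treatment stores have different nearest controls.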


2. What is the estimate of the average treatment effect on the treated (ATT)?

In [44]:
# Average the treated-minus-matched-control differences over the 2 treated stores
ATT = np.mean([treatment.iloc[0]["Final_Empl"] - control.iloc[2]["Final_Empl"],
               treatment.iloc[1]["Final_Empl"] - control.iloc[0]["Final_Empl"]])
print(f'The Average Treatment Effect on the Treated with matching is {ATT}.')

The Average Treatment Effect on the Treated with matching is 3.0.
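The matching estimator of the ATT averages, over the treated stores, the difference between each store's final employment and that of its matched control, so the sum of the pair differences is divided by the number of treated stores (here 2). The arithmetic can be sketched as follows, assuming the final employment values are typed in by hand from the table and the pairs are those found in question 1; `matched_pairs`, `pair_effects`, and `att_estimate` are illustrative names.

```python
# Final employment for each matched pair: (treated store, matched control store)
matched_pairs = [
    (30.0, 19.5),  # Treatment 1 (NJ, BK) vs Control 3 (PA, BK)
    (12.5, 17.0),  # Treatment 2 (NJ, KFC) vs Control 1 (PA, KFC)
]

# Per-pair effect, then the average over the treated stores
pair_effects = [treated - matched for treated, matched in matched_pairs]
att_estimate = sum(pair_effects) / len(pair_effects)

print(pair_effects)
print(att_estimate)
```

The two pair differences are 10.5 and -4.5, which average to 3.0.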


3. What is the estimate of the average treatment effect (ATE)?

In [46]:
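# Difference-in-differences on all 8 stores (no matching is used here):
# mean employment change in NJ minus mean employment change in PA
ATE = round(np.mean(treatment["Final_Empl"] - treatment["Init_Empl"]) - 
            np.mean(control["Final_Empl"] - control["Init_Empl"]), 1)

print(f'The difference-in-differences estimate of the ATE is {ATE}.')

The difference-in-differences estimate of the ATE is 3.3.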
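The cell above compares mean employment changes across the two states and does not use the matches. One alternative, shown here only as a sketch, is to impute the missing potential outcome for every store by nearest-neighbour matching and average the resulting effects over all 8 stores. This assumes matching with replacement, since 6 control stores must share 2 treated stores, and reuses the distance measure from question 1. The helpers `distance` and `nearest` and the hand-typed tuples are hypothetical and not part of the original notebook.

```python
# (chain, initial employment, final employment), copied from the table above
treated = [("BK", 22.5, 30.0), ("KFC", 14.0, 12.5)]
controls = [("KFC", 13.8, 17.0), ("BK", 26.5, 18.5), ("BK", 20.0, 19.5),
            ("BK", 13.5, 21.0), ("BK", 32.5, 26.5), ("KFC", 21.0, 23.0)]

def distance(a, b):
    """D(i, j): 100 if the chains differ, plus the gap in initial employment."""
    return 100 * (a[0] != b[0]) + abs(a[1] - b[1])

def nearest(unit, pool):
    """Closest unit in the opposite group; reuse is allowed (with replacement)."""
    return min(pool, key=lambda other: distance(unit, other))

# Treated stores: observed Y(1) minus the matched control's outcome as Y(0)
effects = [t[2] - nearest(t, controls)[2] for t in treated]
# Control stores: the matched treated store's outcome as Y(1) minus observed Y(0)
effects += [nearest(c, treated)[2] - c[2] for c in controls]

ate_matching = sum(effects) / len(effects)
print(round(ate_matching, 2))
```

This is a different estimator from the difference in mean employment changes, so the two numbers need not agree.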
