# Matching Exercise

In this exercise, we’ll be evaluating how getting a college degree impacts earnings in the US using matching.

# Data Setup

In [10]:
import pandas as pd

df = pd.read_stata(
    "./data/cps_for_matching.dta"
)

df.head()

Unnamed: 0,index,annual_earnings,female,simplified_race,has_college,age,county,class94
0,151404,,1,3.0,1,30,0-WV,"Private, For Profit"
1,123453,,0,0.0,0,21,251-TX,"Private, For Profit"
2,187982,,0,0.0,0,40,5-MA,"Self-Employed, Unincorporated"
3,122356,,1,0.0,1,27,0-TN,"Private, Nonprofit"
4,210750,42900.0,1,0.0,0,52,0-IA,"Private, For Profit"


# Getting To Know Your Data
Before you start matching, it is important to examine your data to ensure that matching is feasible (there is some overlap in the features of people in the treated and untreated groups), and also that there is a reason to match: either you’re unsure about some of the functional forms at play, or you have some imbalance between the two groups.
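
As a rough illustration of what checking balance and overlap can look like, here is a minimal sketch. It runs on a hypothetical toy DataFrame (`toy`) that only borrows the column names `has_college`, `female`, and `age` from this exercise; the values are randomly generated and are not the CPS data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (NOT the CPS file); only the column names match.
rng = np.random.default_rng(0)
n = 1_000
toy = pd.DataFrame({
    "has_college": rng.integers(0, 2, size=n),
    "female": rng.integers(0, 2, size=n),
    "age": rng.integers(18, 70, size=n),
})

# Balance: compare covariate means across treated and untreated groups.
balance = toy.groupby("has_college")[["female", "age"]].mean()
print(balance)

# Overlap: count observations per age in each group.
# An age observed in only one group has no potential match in the other.
overlap = pd.crosstab(toy["age"], toy["has_college"])
no_overlap = overlap[(overlap == 0).any(axis=1)]
print(f"Ages without overlap: {len(no_overlap)}")
```

Large gaps between the rows of `balance` suggest imbalance, while rows in `no_overlap` flag covariate values where one group has no counterpart in the other.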

## Exercise 1
Show the raw difference of annual_earnings between those with and without a college degree (has_college). Is the difference statistically significant?

In [11]:
df[df['annual_earnings'].notnull()].head()

Unnamed: 0,index,annual_earnings,female,simplified_race,has_college,age,county,class94
4,210750,42900.0,1,0.0,0,52,0-IA,"Private, For Profit"
5,14438,31200.0,0,2.0,0,34,0-NV,"Private, For Profit"
7,108334,20020.0,0,0.0,1,68,0-GA,"Private, For Profit"
8,249989,22859.2,0,0.0,0,46,0-VT,Government - Local
9,64886,73860.8,0,0.0,1,38,0-MT,"Private, For Profit"


In [15]:
from scipy.stats import ttest_ind

# Keep only rows with observed earnings
dfn = df[df['annual_earnings'].notnull()]

# Earnings for each group
earn_college = dfn.loc[dfn['has_college'] == 1, 'annual_earnings']
earn_nocollege = dfn.loc[dfn['has_college'] == 0, 'annual_earnings']

# Compare group means
college = earn_college.mean()
nocollege = earn_nocollege.mean()

# Two-sample t-test (scipy's default assumes equal variances)
t_test = ttest_ind(earn_college, earn_nocollege)

print("Annual earnings with college degree: {:.2f}".format(college))
print("Annual earnings without college degree: {:.2f}".format(nocollege))
print("Difference between college and no college : {:.2f}".format(college-nocollege))
print("P-value : {:.4f}".format(t_test[1]))

Annual earnings with college degree: 53024.16
Annual earnings without college degree: 38865.67
Difference between college and no college : 14158.50
P-value : 0.0000


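**Answer:** Average annual earnings are about 53,024 dollars for people with a college degree and about 38,866 dollars for those without, a raw difference of roughly 14,158 dollars. The t-test p-value prints as 0.0000 at four decimal places, so the difference is statistically significant at conventional levels. Note that this is a raw comparison: it does not adjust for other differences between the two groups.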
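
One caveat on the test itself: `ttest_ind` assumes equal variances in the two groups by default. If that assumption is in doubt, passing `equal_var=False` runs Welch's t-test instead. The sketch below shows both variants on hypothetical, randomly generated earnings samples (the means and spreads are made up for illustration and are not estimates from the CPS data).

```python
import numpy as np
from scipy.stats import ttest_ind

# Hypothetical earnings samples with different spreads (NOT the CPS data).
rng = np.random.default_rng(0)
college_sample = rng.normal(loc=53_000, scale=30_000, size=500)
nocollege_sample = rng.normal(loc=39_000, scale=20_000, size=800)

# Default: pooled-variance (Student's) t-test.
pooled = ttest_ind(college_sample, nocollege_sample)

# Welch's t-test: does not assume equal variances.
welch = ttest_ind(college_sample, nocollege_sample, equal_var=False)

print(f"Pooled p-value: {pooled.pvalue:.4g}")
print(f"Welch p-value:  {welch.pvalue:.4g}")
```

If the two p-values lead to the same conclusion, the significance result is not sensitive to the equal-variance assumption.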