# Hypothesis Testing of Racial and Gender Discrimination in the US Job Market

### Background
This analysis uses hypothesis testing to examine the claim that discrimination based on race and/or gender exists in the US job market.

### Data
Each row in the data represents a resume. The 'race' column has two unique values, 'b' and 'w', indicating the race of the candidate. Similarly, the 'sex' column has two unique values, 'f' and 'm', indicating the gender. Finally, the 'call' column consists of 1s and 0s indicating whether the candidate received a call (1) or not (0) from an employer.

### Importing libraries and data

In [2]:
import pandas as pd
import numpy as np
import scipy as sp
from scipy import stats

In [3]:
df = pd.read_stata('./lakisha_aer.dta')
df.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


In [4]:
df.columns

Index(['id', 'ad', 'education', 'ofjobs', 'yearsexp', 'honors', 'volunteer',
       'military', 'empholes', 'occupspecific', 'occupbroad', 'workinschool',
       'email', 'computerskills', 'specialskills', 'firstname', 'sex', 'race',
       'h', 'l', 'call', 'city', 'kind', 'adid', 'fracblack', 'fracwhite',
       'lmedhhinc', 'fracdropout', 'fraccolp', 'linc', 'col', 'expminreq',
       'schoolreq', 'eoe', 'parent_sales', 'parent_emp', 'branch_sales',
       'branch_emp', 'fed', 'fracblack_empzip', 'fracwhite_empzip',
       'lmedhhinc_empzip', 'fracdropout_empzip', 'fraccolp_empzip',
       'linc_empzip', 'manager', 'supervisor', 'secretary', 'offsupport',
       'salesrep', 'retailsales', 'req', 'expreq', 'comreq', 'educreq',
       'compreq', 'orgreq', 'manuf', 'transcom', 'bankreal', 'trade',
       'busservice', 'othservice', 'missind', 'ownership'],
      dtype='object')

In [5]:
df.race.unique()

array(['w', 'b'], dtype=object)

In [6]:
# number of calls for blacks
sum(df[df.race=='b'].call)

157.0

In [7]:
#number of calls for whites
sum(df[df.race=='w'].call)

235.0

In [8]:
df.shape

(4870, 65)

In [9]:
df_white = df[df.race=='w']
df_black = df[df.race=='b']

In [10]:
df_white.call.mean(), df_black.call.mean()

(0.09650924, 0.064476386)

In [11]:
df_white.call.var(), df_black.call.var()

(0.08723103, 0.060343966)

In [12]:
delta_mean = df_white.call.mean() - df_black.call.mean()
delta_mean

0.032032855

In [13]:
len(df_white.call), len(df_black.call)

(2435, 2435)

### Which test is applicable for this data?
For this data, the population variance is unknown, so a two-sample t-test is used. With samples this large (well over 30 in each group), the t-test gives essentially the same result as a z-test.

Each resume has only two outcomes, a call or no call, so the number of calls in each group follows a binomial distribution. For large samples, a binomial distribution is well approximated by a normal distribution. This dataset has 4870 resumes (2435 per group), so the Central Limit Theorem applies and the sample mean call rates can be treated as approximately normally distributed.
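Because the outcome is binary, the same comparison can also be framed as a two-proportion z-test, which relies directly on the normal approximation described above. The following is a minimal sketch, not part of the original analysis: the function name `two_proportion_ztest` is illustrative, and the inputs are the call counts reported earlier (235 of 2435 and 157 of 2435).

```python
import numpy as np
from scipy import stats

def two_proportion_ztest(calls_a, n_a, calls_b, n_b):
    """Two-sided z-test for equal proportions, using the pooled standard error."""
    p_a, p_b = calls_a / n_a, calls_b / n_b
    # Under the null hypothesis both groups share one call rate.
    p_pool = (calls_a + calls_b) / (n_a + n_b)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * stats.norm.sf(abs(z))  # two-sided tail probability
    return z, p_value

# Call counts taken from the notebook outputs above.
z_stat, p_val = two_proportion_ztest(235, 2435, 157, 2435)
print(z_stat, p_val)
```

With samples this large, the z statistic should land very close to the t statistic from `ttest_ind`.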

By default, the scipy ttest_ind function assumes equal variances for both samples. A common rule of thumb treats the variances as equal if their ratio is less than 4:1. For this data, the ratio of sample variances is 0.087:0.060 (about 1.4:1), so the equal-variance assumption is reasonable.
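If the equal-variance assumption were in doubt, Welch's t-test (`equal_var=False` in `scipy.stats.ttest_ind`) drops it. The sketch below rebuilds 0/1 call arrays from the counts reported above instead of loading the data file, so the helper `make_calls` and the array construction are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

def make_calls(n_calls, n_total):
    """Build a 0/1 array with n_calls ones out of n_total entries (illustrative helper)."""
    arr = np.zeros(n_total)
    arr[:n_calls] = 1.0
    return arr

white = make_calls(235, 2435)
black = make_calls(157, 2435)

# Ratio of sample variances (ddof=1 matches the pandas .var() default).
var_ratio = white.var(ddof=1) / black.var(ddof=1)

pooled = stats.ttest_ind(white, black)                  # equal variances assumed
welch = stats.ttest_ind(white, black, equal_var=False)  # Welch's t-test
print(var_ratio, pooled.statistic, welch.statistic)
```

When both groups have the same size, as here, the pooled and Welch statistics coincide and only the degrees of freedom differ.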

### Null and Alternate hypotheses
- Null hypothesis $H_0$: White and black resumes get the same responses, i.e. the population call rates are equal: $\mathbf{\mu_w-\mu_b = 0}$
- Alternate hypothesis $H_a$: White and black resumes do not get the same responses, i.e. $\mathbf{\mu_w-\mu_b \ne 0}$

### Confidence interval, margin of error and p-value
**Degrees of freedom** (df) = $(n_1-1)+(n_2-1)$

Here, df = 4868

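For large df, the t-distribution is nearly identical to the standard normal distribution, so the two-sided critical value at the 0.05 significance level is close to the z-value of 1.96. The value used here is: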

$t_{crit} = 1.962$
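The critical value can also be computed rather than looked up in a table. A short sketch using `scipy.stats.t.ppf` and `scipy.stats.norm.ppf`; the variable names are illustrative and not taken from the notebook.

```python
from scipy import stats

alpha = 0.05                      # two-sided significance level
dof = (2435 - 1) + (2435 - 1)     # degrees of freedom for the two groups

# Upper alpha/2 quantiles of the t and standard normal distributions.
t_crit_exact = stats.t.ppf(1 - alpha / 2, dof)
z_crit = stats.norm.ppf(1 - alpha / 2)
print(dof, t_crit_exact, z_crit)
```

Both values come out close to 1.96, slightly below the tabulated 1.962 used in this notebook, so the margin of error computed with 1.962 is marginally conservative.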

**Margin of error** (moe) = $t_{crit} \times \sigma$

Here, $\sigma$ is the standard error of the difference in means, calculated as $\sigma = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$,
where $s_i^2$ is the sample variance and $n_i$ is the sample size of group $i$.
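The formulas above can be wrapped in a small helper so the same steps apply to any pair of groups. This is a sketch rather than the notebook's own code: `diff_confidence_interval` is a hypothetical name, and the 0/1 arrays are rebuilt from the reported call counts instead of being read from the data file.

```python
import numpy as np

def diff_confidence_interval(a, b, t_crit=1.962):
    """Return (difference in means, standard error, confidence interval)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    delta = a.mean() - b.mean()
    # Unpooled standard error of the difference in means (sample variances, ddof=1).
    std_error = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    margin = t_crit * std_error
    return delta, std_error, (delta - margin, delta + margin)

# 0/1 call indicators rebuilt from the counts: 235 and 157 calls out of 2435 each.
white = np.r_[np.ones(235), np.zeros(2435 - 235)]
black = np.r_[np.ones(157), np.zeros(2435 - 157)]

delta, se, ci = diff_confidence_interval(white, black)
print(delta, se, ci)
```

Returning all three quantities from one call keeps the margin of error tied to the groups it was computed from, which avoids reusing a stale value when cells are run out of order.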



In [14]:
std_error = np.sqrt(df_white.call.var()/len(df_white.call) + df_black.call.var()/len(df_black.call))
std_error

0.007784969418393478

In [15]:
t_crit = 1.962
margin_of_error = t_crit*std_error
margin_of_error

0.015274109998888003

In [31]:
confidence_interval = (delta_mean-margin_of_error, delta_mean+margin_of_error)
confidence_interval

(0.0167587, 0.0473070)

In [17]:
stats.ttest_ind(df_white.call, df_black.call)

Ttest_indResult(statistic=4.114705290861751, pvalue=3.940802103128885e-05)

The difference in means of the sample data is 0.032, and the 95% confidence interval (0.032 ± 0.015) places the true difference in means between 0.017 and 0.047. The interval excludes 0, so the difference is statistically significant at the 5% level. The **p-value** of 3.94e-05 is much less than the significance level of 0.05.

### Conclusion

Assuming the null hypothesis is true (white and black resumes get the same responses), there is only a 5% chance that the absolute difference between the two sample means (white calls and black calls) will be greater than 0.015. However, the observed difference is 0.032, roughly twice that, and the p-value of 3.94e-05 is far below 0.05. The observed difference is therefore statistically significant, and the null hypothesis is rejected.

In [18]:
df.sex.unique()

array(['f', 'm'], dtype=object)

In [19]:
df_male = df[df.sex=='m']
df_female = df[df.sex=='f']

In [20]:
sum(df_male.call), sum(df_female.call)

(83.0, 309.0)

In [21]:
len(df_male.call), len(df_female.call)

(1124, 3746)

In [22]:
df_male.call.mean(), df_female.call.mean()

(0.07384342, 0.082487985)

In [32]:
delta_mean = df_female.call.mean() - df_male.call.mean()
delta_mean

0.008644566

In [23]:
df_male.call.var(), df_female.call.var()

(0.068451464, 0.07570393)

Here the ratio of sample variances, 0.076:0.068 (about 1.1:1), is much less than 4:1, hence the assumption of equal variances is reasonable.
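Because the two groups differ in size (1124 male and 3746 female resumes), the pooled and Welch versions of the t-test no longer coincide exactly, so it may be worth checking that the conclusion does not hinge on the equal-variance assumption. A minimal sketch, assuming 0/1 call arrays rebuilt from the counts reported above (83 and 309 calls) rather than the original data file:

```python
import numpy as np
from scipy import stats

# 0/1 call indicators rebuilt from the reported counts (an assumption for illustration).
male = np.r_[np.ones(83), np.zeros(1124 - 83)]
female = np.r_[np.ones(309), np.zeros(3746 - 309)]

pooled = stats.ttest_ind(male, female)                  # equal variances assumed
welch = stats.ttest_ind(male, female, equal_var=False)  # no equal-variance assumption

print(pooled.statistic, pooled.pvalue)
print(welch.statistic, welch.pvalue)
```

If both p-values lead to the same decision, the equal-variance assumption is not driving the result.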

### Null and Alternate hypotheses
- Null hypothesis $H_0$: Male and female resumes get the same responses, i.e. the population call rates are equal: $\mathbf{\mu_f-\mu_m = 0}$
- Alternate hypothesis $H_a$: Male and female resumes do not get the same responses, i.e. $\mathbf{\mu_f-\mu_m \ne 0}$

In [24]:
std_error = np.sqrt(df_male.call.var()/len(df_male.call) + df_female.call.var()/len(df_female.call))
std_error

0.009006061815119953

In [25]:
t_crit = 1.962
margin_of_error = t_crit*std_error
margin_of_error

0.017669893281265347

In [33]:
confidence_interval = (delta_mean-margin_of_error, delta_mean+margin_of_error)
confidence_interval

(-0.009025327341362088, 0.026314459221168606)

In [26]:
stats.ttest_ind(df_male.call, df_female.call)

Ttest_indResult(statistic=-0.9341989341332145, pvalue=0.3502476207298205)

The difference in means of the sample data is 0.009, and the 95% confidence interval places the true difference in means between -0.009 and 0.026. The interval contains 0, so a true difference of zero is consistent with the data. The **p-value** of 0.35 is much greater than the significance level of 0.05.

### Conclusion

Assuming the null hypothesis is true (male and female resumes get the same responses), 95% of the time the absolute difference in sample means will be less than 0.018, and the observed difference is 0.009. The p-value of 0.35 means there is about a 35% chance of observing a difference at least this large if the null hypothesis is true, well above the 5% threshold. The difference is not statistically significant, and hence the null hypothesis cannot be rejected.