# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

In [10]:
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

In [4]:
data = pd.io.stata.read_stata('data/us_job_market_discrimination.dta')

In [5]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

In [6]:
data.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit




## Answer 1: 
### An appropriate test here is a two-sample bootstrap hypothesis test for the difference of means, since we are dealing with two separate groups of resumes whose names were randomly assigned.

## Answer 2: 
### The null hypothesis is that there is no difference between the two groups in terms of callback rates. The alternative hypothesis is that there is a statistically significant difference between the two groups.
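In symbols, with $p_w$ and $p_b$ (notation introduced here, not taken from the original notebook) denoting the true callback rates for white-sounding and black-sounding names, the two-sided hypotheses can be written as:

```latex
H_0:\; p_w = p_b
\qquad \text{versus} \qquad
H_1:\; p_w \neq p_b
```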


In [7]:
w = data[data.race=='w']
b = data[data.race=='b']

In [28]:
print('The number of call back for white sounding names', sum(w.call), 'from', len(w.call), 'resumes')
print('The sample proportion for white sounding names is', sum(w.call)/len(w.call))
p1 = sum(w.call)/len(w.call)

The number of call back for white sounding names 235.0 from 2435 resumes
The sample proportion for white sounding names is 0.09650924024640657


In [26]:
print('The number of call back for black sounding names', sum(b.call), 'from', len(b.call), 'resumes')
print('The sample proportion for black sounding names is', sum(b.call)/len(b.call))
p2 = sum(b.call)/len(b.call)

The number of call back for black sounding names 157.0 from 2435 resumes
The sample proportion for black sounding names is 0.06447638603696099


In [29]:
difference_sample_prop = p1-p2
print('The difference in the sample proportions is', difference_sample_prop)

The difference in the sample proportions is 0.032032854209445585


In [30]:
z = 1.96 # z-value for a 95% confidence interval with 1000+ samples

In [34]:
# Margin of error = z * standard error; z multiplies the square root, it does not go inside it
margin_error = z*np.sqrt(p1*(1-p1)/len(w.call) + p2*(1-p2)/len(b.call))
print('The margin of error for a 95% CI is', round(margin_error, 4))

The margin of error for a 95% CI is 0.0153


In [35]:
print('The 95% confidence interval is', round(difference_sample_prop-margin_error, 4), 'to', round(difference_sample_prop+margin_error, 4))

The 95% confidence interval is 0.0168 to 0.0473
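
The interval for a difference of two proportions is the observed difference plus or minus the critical z-value times the unpooled standard error. The sketch below recomputes it from the callback counts reported earlier (235 and 157 out of 2435 each), so it does not need the data file. The function name `diff_proportion_ci` is hypothetical and introduced only for illustration.

```python
import math

def diff_proportion_ci(successes_1, n_1, successes_2, n_2, z=1.96):
    """Normal-approximation CI for the difference of two proportions."""
    p_1 = successes_1 / n_1
    p_2 = successes_2 / n_2
    # Unpooled standard error of the difference in proportions
    std_error = math.sqrt(p_1 * (1 - p_1) / n_1 + p_2 * (1 - p_2) / n_2)
    # The critical value scales the standard error
    margin = z * std_error
    diff = p_1 - p_2
    return diff - margin, diff + margin, margin

# Counts taken from the notebook output: white-sounding vs. black-sounding names
low, high, margin = diff_proportion_ci(235, 2435, 157, 2435)
print(f"margin of error: {margin:.4f}")
print(f"95% CI: {low:.4f} to {high:.4f}")
```

Because the interval excludes zero, it points the same way as the hypothesis test: the callback rates of the two groups differ.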


In [39]:
p_total = sum(data.call)/len(data.call)
z_score= (difference_sample_prop/np.sqrt(p_total*(1-p_total)/len(w.call)+p_total*(1-p_total)/len(b.call)))
print('Z-Value is', z_score)

Z-Value is 4.108412152434346


In [43]:
p_value = stats.norm.sf(abs(z_score))*2
print('the P-Value is', p_value)

the P-Value is 3.983886837585077e-05
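
The z-score and p-value steps can be bundled into one helper. This is a minimal sketch, assuming the pooled-proportion standard error used in the cells above; the function name `two_proportion_z_test` is hypothetical, and `math.erfc` stands in for `stats.norm.sf` so that the block runs with the standard library alone.

```python
import math

def two_proportion_z_test(successes_1, n_1, successes_2, n_2):
    """Two-sided z-test for equality of two proportions (pooled standard error)."""
    p_1 = successes_1 / n_1
    p_2 = successes_2 / n_2
    # Under the null hypothesis both groups share a single callback rate
    p_pooled = (successes_1 + successes_2) / (n_1 + n_2)
    std_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / std_error
    # Two-sided p-value: 2 * P(Z > |z|) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Counts taken from the notebook output
z_stat, p_val = two_proportion_z_test(235, 2435, 157, 2435)
print(f"z = {z_stat:.3f}, p-value = {p_val:.2e}")
```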


In [83]:
w_samples = np.random.binomial(len(w.call), p1, 10000)/len(w.call)
b_samples = np.random.binomial(len(b.call), p2, 10000)/len(b.call)

In [53]:
def diff_of_means(data_1, data_2):
    """Difference in means of two arrays."""

    # The difference of means of data_1, data_2: diff
    diff = np.mean(data_1) - np.mean(data_2)

    return diff

In [116]:
# Difference between the mean simulated callback proportions: empirical_diff_means
empirical_diff_means = diff_of_means(w_samples, b_samples)

# Initialize bootstrap replicates: bs_replicates
bs_replicates = np.empty(10000)

for i in range(10000):
    # Generate bootstrap sample: 0/1 callbacks drawn at each group's observed rate
    bs_sample_w = np.random.choice([0, 1], size=len(w.call), p=[1-p1, p1])
    bs_sample_b = np.random.choice([0, 1], size=len(b.call), p=[1-p2, p2])
    
    # Compute replicate
    bs_replicates[i] = diff_of_means(bs_sample_w, bs_sample_b)

print(bs_replicates)

# Compute and print p-value (fraction of replicates with a zero or negative difference): p
p = np.sum(bs_replicates <= 0) / len(bs_replicates)
print('p-value =', p)

[0.0386037  0.03162218 0.03080082 ... 0.03655031 0.0386037  0.03080082]
p-value = 0.0001
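
The simulation above draws resamples at the observed callback rates, so its replicates are centered on the observed difference rather than on zero. A common alternative that simulates the null hypothesis directly is a permutation test: pool all callbacks, shuffle the group labels, and count how often a shuffled difference is at least as extreme as the observed one. The sketch below is that alternative technique, not the notebook's method. The function name `permutation_test_diff` and the fixed seed are assumptions made for illustration, and the 0/1 arrays are rebuilt from the counts reported earlier.

```python
import numpy as np

def permutation_test_diff(calls_1, calls_2, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means of two 0/1 arrays."""
    rng = np.random.default_rng(seed)
    observed = np.mean(calls_1) - np.mean(calls_2)
    pooled = np.concatenate([calls_1, calls_2])
    n_1 = len(calls_1)
    replicates = np.empty(n_permutations)
    for i in range(n_permutations):
        # Shuffling the pooled outcomes breaks any link between group and callback
        shuffled = rng.permutation(pooled)
        replicates[i] = shuffled[:n_1].mean() - shuffled[n_1:].mean()
    # Share of shuffled differences at least as extreme as the observed one
    p_value = np.mean(np.abs(replicates) >= abs(observed))
    return observed, p_value

# Rebuild the callback arrays from the reported counts (235 and 157 out of 2435)
w_calls = np.repeat([1, 0], [235, 2435 - 235])
b_calls = np.repeat([1, 0], [157, 2435 - 157])
observed_diff, perm_p = permutation_test_diff(w_calls, b_calls)
print('observed difference =', observed_diff)
print('permutation p-value =', perm_p)
```

With 10,000 permutations the smallest nonzero p-value this procedure can report is 0.0001, so a result of 0 means only that no shuffled difference reached the observed one.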


## Answer 4: 
### The analysis above shows a statistically significant difference between the callback proportions of the two applicant groups. In the bootstrap version, we estimated the chance of seeing a zero or negative difference across many resampled iterations of the proportions observed in the original data. That chance was extremely low, almost zero, so the difference we observed is very unlikely to be a result of chance.

## Answer 5: 
### The analysis shows that race can be an important factor in callback decisions; however, it is difficult to say whether it is the MOST important factor. To try to answer that, we would need to keep all other elements of the application the same (these could include the companies that applicants applied to, the resume information, and more). Similarly, multiple A/B tests in a similar format could help establish the importance of race compared to other criteria.
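
As a first descriptive step toward that comparison, one could tabulate the gap in callback rate for other binary resume attributes. The sketch below assumes 0/1 columns such as `honors` and `military` (both names appear in the `data.head()` output) and runs on a toy frame with made-up values purely to show the call pattern. The function name `callback_rate_gaps` is hypothetical. Unless an attribute was also randomly assigned, its gap is descriptive only and cannot be read as a causal effect.

```python
import pandas as pd

def callback_rate_gaps(df, attributes, outcome="call"):
    """Callback-rate gap (attribute == 1 minus attribute == 0) for each attribute."""
    gaps = {}
    for col in attributes:
        # Mean of a 0/1 outcome within each attribute level is the callback rate
        rates = df.groupby(col)[outcome].mean()
        gaps[col] = rates.get(1, float("nan")) - rates.get(0, float("nan"))
    # Largest absolute gap first
    return pd.Series(gaps).sort_values(key=abs, ascending=False)

# Toy data with made-up values, NOT taken from the study
toy = pd.DataFrame({
    "honors":   [1, 1, 0, 0, 1, 0],
    "military": [0, 1, 0, 1, 0, 1],
    "call":     [1, 1, 0, 0, 1, 0],
})
gaps = callback_rate_gaps(toy, ["honors", "military"])
print(gaps)
```

On the real data the call would be `callback_rate_gaps(data, [...])` with whichever binary columns are of interest.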