# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

In [20]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats


In [2]:
data = pd.io.stata.read_stata('data/us_job_market_discrimination.dta')

In [3]:
# number of callbacks for black-sounding names
sum(data[data.race=='b'].call)

157.0

In [4]:
data.head()

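| | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |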


In [5]:
data.shape

(4870, 65)

In [11]:
data1 = data[['race','call']]

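#### Q1. The CLT applies here. Each callback is a Bernoulli trial (1 or 0), and with 2,435 resumes in each group and far more than 10 callbacks and 10 non-callbacks per group, the sample callback rates are approximately normally distributed. A two-sample t-test (or, equivalently at this sample size, a two-proportion z-test) is appropriate for this problem.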
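Whether the normal approximation is reasonable for a proportion can be checked with the usual rule of thumb that each group should contain at least 10 successes and 10 failures. The sketch below applies that rule to the counts reported in this notebook. The function name `normal_approx_ok` and the threshold of 10 are illustrative choices, and the count of 235 callbacks for white-sounding names is an assumption derived from the reported mean (0.0965 × 2435) rather than a figure printed in the notebook.

```python
# Counts from the notebook: 2435 resumes per group and 157 callbacks for
# black-sounding names. The 235 callbacks for white-sounding names are
# derived from the reported mean (0.0965 * 2435), which is an assumption.
N_PER_GROUP = 2435
CALLBACKS = {"b": 157, "w": 235}


def normal_approx_ok(successes, n, threshold=10):
    """Rule of thumb: at least `threshold` successes and failures."""
    failures = n - successes
    return successes >= threshold and failures >= threshold


for race, k in CALLBACKS.items():
    print(race, "callbacks:", k, "no callback:", N_PER_GROUP - k,
          "normal approximation ok:", normal_approx_ok(k, N_PER_GROUP))
```

Both groups clear the threshold by a wide margin, which is why a t-test or z-test on the callback rates is a reasonable choice here.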

In [12]:
data1.race.value_counts()

w    2435
b    2435
Name: race, dtype: int64

In [16]:
data1[data1.race == 'b'].call.value_counts()

0.0    2278
1.0     157
Name: call, dtype: int64

In [26]:
w = data1[data1.race == 'w']
w = w.call

In [27]:
b = data1[data1.race == 'b']
b = b.call

#### Q2. Null hypothesis -> race has no impact on call back rates (the rates for black-sounding and white-sounding names are equal)
####       Alternative hypothesis -> race impacts call back rates (the two rates differ)

#### Q3. P value and CI

In [37]:
np.mean(w)

0.09650924056768417

In [38]:
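# 95% confidence interval for the white-sounding callback rate.
# The scale is the standard error of the mean, not the standard deviation.
np.round(stats.norm.interval(0.95, np.mean(w), np.std(w) / np.sqrt(len(w))), 4)

array([0.0848, 0.1082])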

In [39]:
np.mean(b)

0.0644763857126236

In [40]:
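# 95% confidence interval for the black-sounding callback rate.
# Uses b (not w) and the standard error of the mean as the scale.
np.round(stats.norm.interval(0.95, np.mean(b), np.std(b) / np.sqrt(len(b))), 4)

array([0.0547, 0.0742])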

In [42]:
pvalue = stats.ttest_ind(w,b) #p-value
pvalue

Ttest_indResult(statistic=4.1147052908617514, pvalue=3.9408021031288859e-05)
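Because the outcome is binary, the same comparison can also be framed as a two-proportion z-test, which additionally gives a confidence interval for the difference in callback rates. The following is a minimal sketch under stated assumptions, not the notebook's original method: it works from summary counts instead of the raw DataFrame, and the 235 callbacks for white-sounding names are derived from the reported mean of 0.0965 over 2435 resumes.

```python
import math

from scipy import stats

# Summary counts (235 is derived from the reported mean, an assumption).
n_w, n_b = 2435, 2435
k_w, k_b = 235, 157

p_w, p_b = k_w / n_w, k_b / n_b
diff = p_w - p_b

# Test statistic: pooled standard error under H0 (equal callback rates).
p_pool = (k_w + k_b) / (n_w + n_b)
se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_w + 1 / n_b))
z = diff / se_pooled
p_value = 2 * stats.norm.sf(abs(z))  # two-sided p-value

# Confidence interval: unpooled standard error of the difference.
se_unpooled = math.sqrt(p_w * (1 - p_w) / n_w + p_b * (1 - p_b) / n_b)
z_crit = stats.norm.ppf(0.975)
ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

print("difference in callback rates:", diff)
print("z statistic:", z, "p-value:", p_value)
print("95% CI for the difference:", ci)
```

With samples this large the z statistic should land very close to the t statistic reported above, and a confidence interval that excludes zero leads to the same conclusion as the p-value.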

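#### Q4. The p-value from the t-test is about 3.9e-05 (roughly 0.00004), far below the 0.05 significance level. Hence, the null hypothesis of no racial difference in call back rates is rejected: resumes with white-sounding names received call backs at a significantly higher rate (about 9.7%) than identical resumes with black-sounding names (about 6.4%).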

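#### Q5. The analysis shows that race, as signalled by the name on the resume, has a statistically significant effect on call back rates. It does not show that race is the most important factor in call back success, because no other variable was examined. Other factors such as experience, education, etc. must be evaluated alongside race, for example in a regression, to compare how strongly each of them influences the rate of callbacks.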
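Statistical significance alone does not say how large the gap is, so it helps to express the two callback rates as an absolute difference and as a ratio. The sketch below does this from the summary counts. As before, the 235 callbacks for white-sounding names are an assumption derived from the reported mean, and the variable names are illustrative.

```python
# Summary counts (235 is derived from the reported mean, an assumption).
n_per_group = 2435
k_w, k_b = 235, 157

rate_w = k_w / n_per_group
rate_b = k_b / n_per_group

abs_gap = rate_w - rate_b   # difference in callback rates
ratio = rate_w / rate_b     # relative callback rate, white vs. black

print(f"white-sounding callback rate: {rate_w:.1%}")
print(f"black-sounding callback rate: {rate_b:.1%}")
print(f"absolute gap: {abs_gap * 100:.1f} percentage points")
print(f"ratio of callback rates: {ratio:.2f}")
```

A ratio of about 1.5 means that resumes with white-sounding names received roughly 50% more callbacks than otherwise identical resumes with black-sounding names. Comparing this effect with that of experience or education would require modelling those variables as well.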