# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

In [4]:
data = pd.read_stata('us_job_market_discrimination.dta')

In [5]:
# number of callbacks for black-sounding names
sum(data[data.race=='b'].call)

157.0

In [6]:
data.head()

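| | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |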


## 1. What test is appropriate for this problem? Does the CLT apply?

### Let us split the data based on Race.

In [8]:
# keep only the two columns needed for the test
race_data = data[['race','call']]
race_data_white = race_data[race_data.race=='w']
race_data_black = race_data[race_data.race=='b']

len(race_data_black)

2435

In [9]:
len(race_data_white)

2435

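### Each resume either receives a callback or does not, so the number of callbacks in each group follows a binomial distribution. Both groups are large (2435 resumes each) and, assuming the resumes are independent, the CLT applies, so a two-sample z-test for the difference in proportions is appropriate.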
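The usual conditions for the normal approximation can be checked directly from the counts. The sketch below is only illustrative: the threshold of 10 expected successes and failures is a common rule of thumb rather than something stated in this notebook, `normal_approx_ok` is a hypothetical helper, and the 235 white-sounding callbacks are derived from the callback rate (0.0965 × 2435) computed in this notebook.

```python
# Callback counts per group as (callbacks, resumes).
# 157 is reported in the notebook; 235 is derived from the white callback rate.
groups = {"b": (157, 2435), "w": (235, 2435)}


def normal_approx_ok(successes, n, threshold=10):
    """Rule-of-thumb check: both successes and failures reach the threshold."""
    return successes >= threshold and (n - successes) >= threshold


# One boolean per group: True means the normal approximation looks reasonable.
clt_check = {race: normal_approx_ok(x, n) for race, (x, n) in groups.items()}
print(clt_check)
```

Both groups pass this check by a wide margin, which supports using a z-test here.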

## 2. Let $\hat{p}_w$ be the sample probability of success (being called back) for a white person and let $\hat{p}_b$ be the corresponding measure for a black person.

In [18]:
nB=len(data[data.race=='b'])
nW=len(data[data.race=='w'])

In [19]:
callBackRatioBlack=sum(data[data.race=='b'].call)/nB
callBackRatioWhite=sum(data[data.race=='w'].call)/nW
print ('callBackRatioBlack: ', callBackRatioBlack)
print ('callBackRatioWhite: ', callBackRatioWhite)

('callBackRatioBlack: ', 0.064476386036960986)
('callBackRatioWhite: ', 0.096509240246406572)


### Hypothesis testing
### H0: callBackRatioBlack = callBackRatioWhite, H1: callBackRatioBlack < callBackRatioWhite

In [25]:
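# pooled callback rate, used because H0 assumes both groups share one rate
prop=sum(data.call)/len(data)

# standard error of the difference in proportions under H0
SE=((prop*(1-prop)/nB)+(prop*(1-prop)/nW))**0.5
zScore=(callBackRatioWhite-callBackRatioBlack)/SE
p_value = stats.norm.sf(abs(zScore))  # using one-sided test
p_value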

1.9919434187925383e-05

We can reject H0 at the 5% significance level, since the p-value of about 2e-05 is far below 0.05. Resumes with black-sounding names generate less interest among recruiters.
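The same test can be reproduced from the raw counts alone, without loading the dataset. This is a minimal sketch under a few assumptions: `two_proportion_ztest` is a hypothetical helper name, the 235 white-sounding callbacks are derived from the callback rate above, and the standard library's `math.erfc` stands in for `stats.norm.sf` so the block has no dependencies.

```python
import math


def two_proportion_ztest(x_w, n_w, x_b, n_b):
    """One-sided pooled z-test for H1: p_w > p_b. Returns (z, p_value)."""
    p_w, p_b = x_w / n_w, x_b / n_b
    # Under H0 both groups share one callback rate, so the counts are pooled.
    pooled = (x_w + x_b) / (n_w + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_w + 1 / n_b))
    z = (p_w - p_b) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


z, p_value = two_proportion_ztest(235, 2435, 157, 2435)
print(z, p_value)
```

The resulting p-value should agree with the value of roughly 2e-05 computed in the cell above.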

## 3. Compute margin of error, confidence interval, and p-value

### Confidence Intervals

In [28]:
critical_value=stats.norm.ppf(1-0.05/2)# two sided
SE_c = ( (callBackRatioWhite*(1-callBackRatioWhite) / nW) + (callBackRatioBlack*(1-callBackRatioBlack) / nB) ) ** 0.5
Margin_of_Error=critical_value*SE_c
Margin_of_Error

0.015255126028214831

In [29]:
diff = callBackRatioWhite - callBackRatioBlack
Confidence_Interval = (diff - Margin_of_Error, diff + Margin_of_Error)
Confidence_Interval

(0.016777728181230755, 0.047287980237660412)

### At a confidence level of 95%, the callback rate is 1.68 to 4.73 percentage points higher for resumes with white-sounding names than for resumes with black-sounding names.
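For reference, the interval can also be computed from the counts in a single function. This is a sketch rather than the notebook's own code: `diff_confidence_interval` is a hypothetical helper, the 235 white-sounding callbacks are derived from the callback rate above, and `statistics.NormalDist` from the standard library replaces `stats.norm.ppf`. Unlike the hypothesis test, the standard error here is not pooled, because no common rate is assumed.

```python
import math
from statistics import NormalDist


def diff_confidence_interval(x_w, n_w, x_b, n_b, level=0.95):
    """Normal-approximation interval for p_w - p_b with an unpooled SE."""
    p_w, p_b = x_w / n_w, x_b / n_b
    se = math.sqrt(p_w * (1 - p_w) / n_w + p_b * (1 - p_b) / n_b)
    # Two-sided critical value, about 1.96 for a 95% level.
    critical = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = critical * se
    diff = p_w - p_b
    return diff - margin, diff + margin


low, high = diff_confidence_interval(235, 2435, 157, 2435)
print(round(low, 4), round(high, 4))
```

The bounds should match the interval reported above, about 0.0168 to 0.0473.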

## 4. Write a story describing the statistical significance in the context of the original problem.

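### The one-sided test gives a p-value of about 2e-05: if callback rates were truly the same for both groups, a gap at least as large as the one observed would show up in roughly 2 out of 100,000 such experiments. Because the names were assigned to the resumes at random, this is statistically significant evidence that the name itself affects callbacks.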

### We can state with a reasonable degree of confidence (95% to be precise) and a 100% sense of disappointment that the callback rate for resumes with white-sounding names is 1.68 to 4.73 percentage points higher than for resumes with black-sounding names.



## 5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

### No. Race does seem to matter, but this analysis says nothing about how its effect compares with that of other factors such as years of experience or education. Random assignment of names should balance those factors across the two groups on average, but the analysis above neither checks that balance in this sample nor measures how strongly those factors affect callbacks. Hence inferring that race is the most important factor by considering only one variable ('race') would be incorrect. The analysis could be amended by taking into account the other variables available in the data.
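One possible first step toward such an amended analysis is to compare the other resume attributes across the two groups. The sketch below assumes a hypothetical helper, `covariate_balance`, and uses the column names `education` and `yearsexp` shown in the `data.head()` output; it only checks balance and does not estimate how much each factor matters.

```python
import pandas as pd


def covariate_balance(df, group_col, covariates):
    """Mean of each covariate within each group (hypothetical helper)."""
    return df.groupby(group_col)[covariates].mean()


# Usage with the notebook's DataFrame (not executed here):
# covariate_balance(data, "race", ["education", "yearsexp"])
```

If the group means are close, differences in experience or education are unlikely to explain the callback gap, and a model that includes those variables could then be used to compare their effects with that of race.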