# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
****

In [2]:
import pandas as pd
import numpy as np
from scipy import stats

In [4]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [5]:
# number of callbacks for black-sounding names
print(sum(data[data.race=='b'].call))

# number of callbacks for white-sounding names
print(sum(data[data.race=='w'].call))

# Data description
data.info()
data.head()
data.describe()


157.0
235.0
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4870 entries, 0 to 4869
Data columns (total 65 columns):
id                    4870 non-null object
ad                    4870 non-null object
education             4870 non-null int8
ofjobs                4870 non-null int8
yearsexp              4870 non-null int8
honors                4870 non-null int8
volunteer             4870 non-null int8
military              4870 non-null int8
empholes              4870 non-null int8
occupspecific         4870 non-null int16
occupbroad            4870 non-null int8
workinschool          4870 non-null int8
email                 4870 non-null int8
computerskills        4870 non-null int8
specialskills         4870 non-null int8
firstname             4870 non-null object
sex                   4870 non-null object
race                  4870 non-null object
h                     4870 non-null float32
l                     4870 non-null float32
call                  4870 non-null float32


Unnamed: 0,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,occupbroad,workinschool,...,educreq,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind
count,4870.0,4870.0,4870.0,4870.0,4870.0,4870.0,4870.0,4870.0,4870.0,4870.0,...,4870.0,4870.0,4870.0,4870.0,4870.0,4870.0,4870.0,4870.0,4870.0,4870.0
mean,3.61848,3.661396,7.842916,0.052772,0.411499,0.097125,0.448049,215.637782,3.48152,0.559548,...,0.106776,0.437166,0.07269,0.082957,0.03039,0.08501,0.213963,0.267762,0.154825,0.165092
std,0.714997,1.219126,5.044612,0.223601,0.492156,0.296159,0.497345,148.127551,2.038036,0.496492,...,0.308866,0.496083,0.259649,0.275854,0.171677,0.278932,0.410141,0.442847,0.361773,0.371308
min,0.0,1.0,1.0,0.0,0.0,0.0,0.0,7.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,3.0,3.0,5.0,0.0,0.0,0.0,0.0,27.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,4.0,4.0,6.0,0.0,0.0,0.0,0.0,267.0,4.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,4.0,4.0,9.0,0.0,1.0,0.0,1.0,313.0,6.0,1.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
max,4.0,7.0,44.0,1.0,1.0,1.0,1.0,903.0,6.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [6]:
# 1. What test is appropriate for this problem? Does CLT apply?

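# A two-sample z-test for the difference between two proportions is appropriate, because the outcome
# (callback or no callback) is binary and the names were randomly assigned to the resumes.
# With 4870 resumes (2435 per group), the central limit theorem applies: the sampling distribution of the
# difference between the two sample proportions is approximately normal.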

# Call back count by race
df = data.loc[:,["race","call"]]
callback_by_race = df.pivot_table(index="race", columns="call", aggfunc=len, margins=True)
print(callback_by_race)

# Sample sizes, callback counts, and callback rates for each group:
n_b = len(data[data.race=='b'])
n_w = len(data[data.race=='w'])
c_b = sum(data[data.race=='b'].call)
c_w = sum(data[data.race=='w'].call)
p_b = c_b / n_b
p_w = c_w / n_w

call     0.0    1.0     All
race                       
b     2278.0  157.0  2435.0
w     2200.0  235.0  2435.0
All   4478.0  392.0  4870.0
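
The CLT claim above can be checked with a small simulation. The sketch below is an illustration, not part of the original analysis: it assumes both groups share the pooled callback rate (392/4870), draws many pairs of samples of 2435 resumes each, and compares the spread of the simulated differences with the normal-theory standard error. The seed, the number of simulations, and all variable names are arbitrary choices made for this example.

```python
import numpy as np

# Counts taken from the callback table above.
n_per_group = 2435
pooled_rate = 392 / 4870  # overall callback rate, used as the common rate under H0

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
n_sims = 100_000                # arbitrary number of simulated experiments

# Simulate callback rates for two groups that share the same true rate.
sim_b = rng.binomial(n_per_group, pooled_rate, size=n_sims) / n_per_group
sim_w = rng.binomial(n_per_group, pooled_rate, size=n_sims) / n_per_group
sim_diff = sim_w - sim_b

# Normal-theory standard error of the difference under H0 (equal group sizes).
theoretical_se = np.sqrt(2 * pooled_rate * (1 - pooled_rate) / n_per_group)

# Share of simulated differences within 1.96 standard errors of zero;
# a normal distribution would put roughly 95% of them there.
within_196 = np.mean(np.abs(sim_diff) <= 1.96 * theoretical_se)

print("simulated mean of difference:", sim_diff.mean())
print("simulated SD of difference:  ", sim_diff.std())
print("theoretical standard error:  ", theoretical_se)
print("share within 1.96 SE:        ", within_196)
```

If the simulated standard deviation and the theoretical standard error agree closely, the normal approximation is reasonable for samples of this size.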


In [12]:
# 2. What are the null and alternate hypotheses?

# Let p_b = callback rate (proportion of resumes receiving a call) for black-sounding names
# Let p_w = callback rate (proportion of resumes receiving a call) for white-sounding names

# Thus, our hypotheses are as follows:
# H0: p_w - p_b = 0
# HA: p_w - p_b > 0

# Under the null hypothesis, both groups share one callback rate, estimated by pooling the two samples:
pooled_proportion = (c_b + c_w) / (n_b + n_w)

# For the normal model to apply, each group needs more than 10 expected successes and failures:
print( pooled_proportion * n_w > 10)
print( (1-pooled_proportion) * n_w > 10)
print( pooled_proportion * n_b > 10)
print( (1-pooled_proportion) * n_b > 10)
# All four checks print True, so the success-failure condition is satisfied.

print('Callback rates for blacks & whites are: {:1.3f}, {:1.3f}'.format(p_b,p_w))

# Point estimate for difference in call back rate:
w_b_diff = p_w - p_b
print("Point estimate:",w_b_diff)

True
True
True
True
Callback rates for blacks & whites are: 0.064, 0.097
Point estimate: 0.0320328542094


In [19]:
# 3. Compute margin of error, confidence interval, and p-value.
import scipy.special


# Compute the standard error for the difference between callback rates
se = np.sqrt((p_b*(1-p_b)/n_b)+(p_w*(1-p_w)/n_w))
print('Standard error for the difference between callback rates: {:1.4f}'.format(se))

# Compute margin of error and CIs:
print("Confidence Intervals for difference in callback rates for white and black sounding names:")
# Find 95% confidence intervals
margin = 1.96*se
ci_95 = [w_b_diff - margin, w_b_diff + margin]
print("95% confidence interval: ({:2.3f}, {:2.3f})".format(ci_95[0],ci_95[1]))
# 99% confidence interval 
margin = 2.58*se
ci_99 = [w_b_diff - margin, w_b_diff + margin]
print("99% confidence interval: ({:2.3f}, {:2.3f})".format(ci_99[0],ci_99[1]))

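# P-value of the one-tailed z test (H0: p_w - p_b = 0).
# Note: this uses the unpooled standard error computed above. A pooled standard error is the conventional
# choice under H0 and gives a very similar z score.
z_stat = (w_b_diff - 0) / se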
p_value = scipy.special.ndtr(-z_stat)
print('The z score is: {:1.5f}'.format(z_stat))
print('The p-value for the test of equality of callback rates is: {:1.5f}'.format(p_value))




Standard error for the difference between callback rates: 0.0078
Confidence Intervals for difference in callback rates for white and black sounding names:
95% confidence interval: (0.017, 0.047)
99% confidence interval: (0.012, 0.052)
The z score is: 4.11555
The p-value for the test of equality of callback rates is: 0.00002
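
As a cross-check on the cell above, here is a minimal sketch of the same calculation with two changes: the z test uses the pooled standard error (the usual choice when H0 says both rates are equal), and the critical values come from `scipy.stats.norm.ppf` rather than the rounded constants 1.96 and 2.58. The counts are copied from the callback table; the variable names are chosen for this example only.

```python
import numpy as np
from scipy import stats

# Callback counts and sample sizes copied from the table above.
c_b, c_w = 157, 235
n_b, n_w = 2435, 2435
p_b, p_w = c_b / n_b, c_w / n_w
diff = p_w - p_b

# Hypothesis test: pooled standard error under H0 (p_w == p_b).
p_pool = (c_b + c_w) / (n_b + n_w)
se_pooled = np.sqrt(p_pool * (1 - p_pool) * (1 / n_b + 1 / n_w))
z_pooled = diff / se_pooled
p_one_sided = stats.norm.sf(z_pooled)  # upper-tail probability, P(Z > z)

# Confidence intervals: unpooled standard error, exact normal quantiles.
se_unpooled = np.sqrt(p_b * (1 - p_b) / n_b + p_w * (1 - p_w) / n_w)
z_crit_95 = stats.norm.ppf(0.975)
z_crit_99 = stats.norm.ppf(0.995)
ci_95 = (diff - z_crit_95 * se_unpooled, diff + z_crit_95 * se_unpooled)
ci_99 = (diff - z_crit_99 * se_unpooled, diff + z_crit_99 * se_unpooled)

print("pooled z score: {:.5f}".format(z_pooled))
print("one-sided p-value: {:.6f}".format(p_one_sided))
print("95% margin of error: {:.4f}".format(z_crit_95 * se_unpooled))
print("95% confidence interval: ({:.3f}, {:.3f})".format(*ci_95))
print("99% confidence interval: ({:.3f}, {:.3f})".format(*ci_99))
```

The pooled and unpooled standard errors are nearly identical here because the two groups are the same size and both callback rates are small, so the conclusion does not depend on which one is used.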


In [20]:
# 4. Write a story describing the statistical significance in the context of the original problem.

# The observed difference in callback rates (3.2 percentage points) is statistically significant
# (z = 4.12, one-sided p-value = 0.00002), so we reject the null hypothesis of equal callback rates.
# Moreover, the 99% confidence interval suggests that the true difference in callback rates lies between
# 1.2 and 5.2 percentage points in favor of white-sounding names. Because the names were randomly assigned
# to the resumes, this gap points to racial discrimination, which remains a major challenge in the labor market.
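
To put the absolute difference in context, the short sketch below expresses the same counts as a relative difference and as the number of resumes sent per callback. It is only arithmetic on the counts from the callback table; the variable names are chosen for this example.

```python
# Callback counts and sample sizes copied from the table above.
c_b, c_w = 157, 235
n_b, n_w = 2435, 2435

p_b = c_b / n_b
p_w = c_w / n_w

abs_diff_points = (p_w - p_b) * 100  # difference in percentage points
relative_ratio = p_w / p_b           # callback rate ratio, white vs. black
resumes_per_call_b = 1 / p_b         # resumes sent per callback, black-sounding names
resumes_per_call_w = 1 / p_w         # resumes sent per callback, white-sounding names

print("Absolute difference: {:.1f} percentage points".format(abs_diff_points))
print("Callback rate ratio (white / black): {:.2f}".format(relative_ratio))
print("Resumes per callback (black-sounding): {:.1f}".format(resumes_per_call_b))
print("Resumes per callback (white-sounding): {:.1f}".format(resumes_per_call_w))
```

A gap of a few percentage points looks small in absolute terms, but relative to the low baseline callback rates it is a large proportional difference.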



In [None]:
# 5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

# No. The analysis shows that the race associated with a name has a statistically significant effect on
# callback rates, but it does not show that race is the most important factor. The data contain other
# factors, such as education, years of experience, and computer skills, that can also affect callback success.
# A logistic regression of call on race and these factors is needed to examine which of them are
# significant predictors and how large their effects are.
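
The logistic regression suggested above could look roughly like the sketch below. To stay self-contained it runs on synthetic data, NOT on the study's data: the predictors `white` and `yearsexp`, the "true" coefficients, the sample size, and the seed are all assumptions made up for this illustration. The fit uses a hand-written Newton-Raphson loop in NumPy; in practice a statistics library would normally be used instead.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
n = 20_000                      # arbitrary synthetic sample size

# Synthetic predictors (hypothetical, not the study's data).
white = rng.integers(0, 2, size=n)      # 1 = white-sounding name, 0 = black-sounding name
yearsexp = rng.integers(1, 21, size=n)  # years of experience, 1 to 20

# Assumed coefficients, used only to generate the synthetic callbacks.
true_beta = np.array([-3.0, 0.45, 0.04])  # intercept, white, yearsexp
X = np.column_stack([np.ones(n), white, yearsexp]).astype(float)
call = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

# Fit the logistic regression by Newton-Raphson on the log-likelihood.
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))            # predicted callback probabilities
    gradient = X.T @ (call - p)                    # score vector
    information = X.T @ (X * (p * (1 - p))[:, None])  # observed information matrix
    step = np.linalg.solve(information, gradient)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break

# Approximate standard errors from the inverse information matrix
# (evaluated at the last iterate before the final, negligible step).
std_err = np.sqrt(np.diag(np.linalg.inv(information)))
odds_ratios = np.exp(beta)

for name, b, se, ratio in zip(["intercept", "white", "yearsexp"], beta, std_err, odds_ratios):
    print("{:<10s} coef = {: .3f}  se = {:.3f}  odds ratio = {:.3f}".format(name, b, se, ratio))
```

On the real dataset, the same idea would use `call` as the outcome and `race` together with columns such as `education`, `yearsexp`, and `computerskills` as predictors. Comparing the estimated odds ratios and their standard errors then indicates which factors matter and by how much.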