# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your GitHub account**.

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

In [2]:
data = pd.io.stata.read_stata('data/us_job_market_discrimination.dta')

In [3]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

In [4]:
data.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


In [5]:
#extract the entries associated with white-sounding names
w = data[data.race=='w']
#extract the entries associated with black-sounding names
b = data[data.race=='b']

# number of resumes with black-sounding names
print(len(b))
# number of callbacks for black-sounding names
print(sum(b.call))
# number of no callbacks for black-sounding names
print(len(b)-sum(b.call))

# number of resumes with white-sounding names
print(len(w))
# number of callbacks for white-sounding names
print(sum(w.call))
# number of no callbacks for white-sounding names
print(len(w)-sum(w.call))

2435
157.0
2278.0
2435
235.0
2200.0


## Q1. What test is appropriate for this problem? Does CLT apply?
<p>To understand whether the callback rate depends on race, we compare the callback rate for résumés with black-sounding names *vs.* the callback rate for résumés with white-sounding names. Because the outcome is binary (callback or no callback), a **two-sample test for a difference in proportions** is appropriate in this case.</p>

<p>Check conditions for CLT:</p>
<p>1) random:</p>
<p>As mentioned in the background, black-sounding or white-sounding names were randomly assigned to the résumés for the experiment. This random assignment guards against systematic bias between the two groups.</p>
<p>2) normal:</p>
<p>To ensure the sampling distribution is approximately normal, the sample size should be large enough that np >= 10 and n(1-p) >= 10. As shown above, both groups meet these conditions: 157 callbacks and 2,278 non-callbacks for black-sounding names, and 235 callbacks and 2,200 non-callbacks for white-sounding names.</p>
<p>3) Independent: </p>
<p>The researchers in this study used a randomized field experiment to evaluate the level of racial discrimination in the labor market, and nearly 5,000 résumés were sent in response to over 1,300 newspaper ads for sales, administrative, and clerical jobs in Boston and Chicago. The population in this case is all the people looking for sales, administrative, and clerical jobs in Boston and Chicago, which should be larger than 10 times 5,000. Thus, the sample size is less than 10% of the population, and the observations in the sample can be treated as independent.</p>

<p>As discussed, the three conditions of CLT are met. Therefore, **CLT applies in this case.**</p>
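The success/failure counts behind the normality condition can be checked directly. The following is a minimal sketch that hard-codes the counts printed earlier in the notebook; the helper name `clt_counts_ok` and the `threshold` parameter are illustrative choices, not part of the original analysis.

```python
# Counts copied from the notebook output above (2435 resumes per group).
counts = {
    "black-sounding": {"n": 2435, "callbacks": 157},
    "white-sounding": {"n": 2435, "callbacks": 235},
}

def clt_counts_ok(n, callbacks, threshold=10):
    """Return True when both n*p_hat and n*(1 - p_hat) reach the threshold."""
    successes = callbacks      # equals n * p_hat
    failures = n - callbacks   # equals n * (1 - p_hat)
    return successes >= threshold and failures >= threshold

results = {}
for group, c in counts.items():
    results[group] = clt_counts_ok(c["n"], c["callbacks"])
    print(group, "successes:", c["callbacks"],
          "failures:", c["n"] - c["callbacks"], "ok:", results[group])
```

Both groups clear the threshold by a wide margin, so the normal approximation is reasonable for each sample proportion.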

## Q2. What are the null and alternate hypotheses?
<p>Null hypothesis: race has no effect on the rate of callbacks for resumes, i.e. the callback rates for black-sounding and white-sounding names are equal.</p>
<p>Alternative hypothesis: race has an effect on the rate of callbacks for resumes, i.e. the two callback rates differ.</p>
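In symbols, writing $p_b$ and $p_w$ for the population callback rates of black-sounding and white-sounding names (notation introduced here for convenience), the two-sided hypotheses and the test statistic used in the frequentist part of Q3 can be sketched as:

$$
H_0:\; p_b - p_w = 0
\qquad\text{vs.}\qquad
H_A:\; p_b - p_w \neq 0
$$

$$
z = \frac{(\hat{p}_b - \hat{p}_w) - 0}{\sqrt{\dfrac{\hat{p}_b(1-\hat{p}_b)}{n_b} + \dfrac{\hat{p}_w(1-\hat{p}_w)}{n_w}}}
$$

Here $\hat{p}_b$ and $\hat{p}_w$ are the sample callback proportions and $n_b$, $n_w$ are the group sizes.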

## Q3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.

### 1) Bootstrap approach
<p>Test statistics: sample proportion (ratio)</p>
<p>Method: two sample test, resampling the pooled callbacks with replacement under the null hypothesis</p>
<p>Significance level: $\alpha$=0.05</p>

In [6]:
#compute the ratio of callbacks in black-sounding names
r_b = sum(b.call)/len(b)
#compute the ratio of callbacks in white-sounding names
r_w = sum(w.call)/len(w)
#compute the difference in their ratios
r_diff = r_b - r_w

#concatenate the 'call' columns from the two race groups together assuming the null hypothesis is true and there's no difference on their callbacks rates
r = np.concatenate((b.call, w.call))
#initialize an empty array for storing all bootstrap replicates
bs_reps = np.empty(10000)

#use for loop to compute a desired number of bootstrap replicates
for i in range(10000):
    #generate bootstrap sample from the concatenated array
    bs = np.random.choice(r, size=len(r))
    #assign the first half to the black-sounding group and the second half to the white-sounding group
    bs_b = bs[:len(b)]
    bs_w = bs[len(b):]
    #compute the difference in bootstrap sample proportions
    bs_reps[i] = sum(bs_b)/len(bs_b) - sum(bs_w)/len(bs_w)

#compute margin of error based on 95% confidence interval (z=1.96)
moe = 1.96*np.std(bs_reps)
print("Margin-of-error is: ", moe)

#compute the 95% interval of the replicates; they are simulated under the null hypothesis, so the interval is centered near zero
con_int = np.percentile(bs_reps, [2.5, 97.5])
print("95% confidence interval is:", con_int)

#compute the p-value based on the number of replicates which are more extreme than what is observed in the sample
p = np.sum(bs_reps <= r_diff)/len(bs_reps)
#multiply p by 2 for two-tailed test
p = 2*p
print("p-value is: ", p)

Margin-of-error is:  0.015245716635875482
95% confidence interval is: [-0.01519507  0.01519507]
p-value is:  0.0004


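Given that p < $\alpha$ (0.0004 < 0.05), the null hypothesis is rejected. The observed difference of about -0.032 also lies well outside the 95% interval expected under the null hypothesis. Therefore, race has a statistically significant impact on the rate of callbacks for resumes.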
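The cell above draws from the pooled callbacks with replacement. A closely related alternative is a permutation test, which shuffles the pooled outcomes without replacement and reassigns them to the two groups. The sketch below is one way to do that; it rebuilds the 0/1 callback arrays from the counts printed earlier instead of reading the data file, and the seed and variable names are arbitrary choices made for illustration.

```python
import numpy as np

# Counts from the notebook output: 157 of 2435 and 235 of 2435 callbacks.
n_b, calls_b = 2435, 157
n_w, calls_w = 2435, 235

# Rebuild the 0/1 callback arrays from the counts.
call_b = np.concatenate((np.ones(calls_b), np.zeros(n_b - calls_b)))
call_w = np.concatenate((np.ones(calls_w), np.zeros(n_w - calls_w)))
observed_diff = call_b.mean() - call_w.mean()

rng = np.random.default_rng(42)  # arbitrary seed, only for reproducibility
pooled = np.concatenate((call_b, call_w))
n_perm = 10000
perm_diffs = np.empty(n_perm)

for i in range(n_perm):
    # Shuffle without replacement, then split into two groups of the original sizes.
    shuffled = rng.permutation(pooled)
    perm_diffs[i] = shuffled[:n_b].mean() - shuffled[n_b:].mean()

# Two-sided p-value: share of permuted differences at least as extreme as observed.
p_value = np.mean(np.abs(perm_diffs) >= abs(observed_diff))
print("observed difference:", observed_diff)
print("two-sided permutation p-value:", p_value)
```

With 10,000 permutations the smallest nonzero p-value that can be reported is 0.0001, so a result of 0 should be read as "below 1 in 10,000" rather than exactly zero.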

### 2) Frequentist statistical approach
<p>Null hypothesis: race doesn't have a significant impact on the rate of callbacks for resumes. i.e. r(black) - r(white) = 0</p>
<p>Test statistics: sample proportion (ratio)</p>
<p>Method: two sample test</p>
<p>Significance level: $\alpha$=0.05</p>

In [7]:
#compute the proportion receiving callbacks in black-sounding names
r_b = sum(b.call)/len(b)
#compute the proportion receiving callbacks in white-sounding names
r_w = sum(w.call)/len(w)
#compute the difference in their proportions
r_diff = r_b - r_w

#according to the null hypothesis, the hypothesized difference in proportions is zero
r_diff_hypo = 0

#compute the variance of sampling distribution of sample proportion for black-sounding names, here sample proportion is used as an estimate of the population proportion
var_b = r_b*(1-r_b)/len(b)
#compute the variance of sampling distribution of sample proportion for white-sounding names, here sample proportion is used as an estimate of the population proportion
var_w = r_w*(1-r_w)/len(w)
#compute the standard deviation of sampling distribution of the difference in sample proportions
std_r_diff = np.sqrt(var_b + var_w)

#compute margin-of-error based on 95% confidence interval (z=1.96)
moe = 1.96*std_r_diff
print("Margin-of-error is: ", moe)

#compute the 95% interval around the hypothesized difference (z=1.96); an observed difference outside it is significant at the 5% level
con_int = r_diff_hypo + np.array([-1,1])*moe
print("95% confidence interval is: ", con_int)

#compute z statistic and look up the p-value from a $z$ table
z = (r_diff - r_diff_hypo)/std_r_diff
print("z score is: ", z)

Margin-of-error is:  0.015255406349886438
95% confidence interval is:  [-0.01525541  0.01525541]
z score is:  -4.11555043573


<p>Use a $z$ table to look up the p-value for a two-tailed test:</p>
<p>P($z$ <= -4.116) < 0.00005, so the two-tailed p-value is below 0.0001.</p>

In [8]:
#use scipy.stats module to compute the p-value
p = stats.norm.cdf(r_diff, r_diff_hypo, std_r_diff)
#multiply p by 2 for two-tailed test
p = 2*p
print("p-value is ", p)

p-value is  3.862565207522622e-05


<p>The p-value calculated using scipy.stats module is in agreement with the p-value found using a $z$ table.</p>
<p>Given that p < $\alpha$ and the p-value is a very small number, the null hypothesis is rejected. Therefore, race has a statistically significant impact on the rate of callbacks for resumes.</p>
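The interval in cell 7 is centered on the hypothesized difference of zero. It can also be useful to report a confidence interval centered on the observed difference, and to check the result with a pooled standard error, which is a common choice when the null hypothesis states that the two proportions are equal. The following is a minimal sketch using only the Python standard library and the counts printed earlier; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Counts from the notebook output.
n_b, calls_b = 2435, 157
n_w, calls_w = 2435, 235
p_b, p_w = calls_b / n_b, calls_w / n_w
diff = p_b - p_w

# Unpooled standard error, matching the calculation in cell 7.
se_unpooled = sqrt(p_b * (1 - p_b) / n_b + p_w * (1 - p_w) / n_w)
z_crit = NormalDist().inv_cdf(0.975)  # roughly 1.96

# 95% confidence interval centered on the observed difference.
ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

# Pooled standard error: a single proportion estimated from both groups combined.
p_pool = (calls_b + calls_w) / (n_b + n_w)
se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n_b + 1 / n_w))
z_pooled = diff / se_pooled
p_pooled = 2 * NormalDist().cdf(-abs(z_pooled))  # two-tailed p-value

print("95% CI for the difference:", ci)
print("pooled z score:", z_pooled)
print("pooled two-tailed p-value:", p_pooled)
```

The interval lies entirely below zero, and the pooled z score is close to the unpooled one, so both variants lead to the same conclusion.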

In [9]:
print(r_b, r_w, r_diff)

0.06447638603696099 0.09650924024640657 -0.032032854209445585


## Q4. Write a story describing the statistical significance in the context of the original problem.

<p>Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.</p>
<p>The results of this study show that 9.65% of the resumes with white-sounding names received callbacks, whereas only 6.45% of the resumes with black-sounding names did. The difference in callback rates, 3.20 percentage points, is statistically significant, with a z score of -4.116 and a two-tailed p-value of about 0.00004. This suggests that, all other things being equal, race still has a statistically significant impact on the rate of callbacks the candidates receive.</p>

## Q5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

<p>The above analysis shows that race/name has a statistically significant effect on callback success, but it does not show whether it is **the most important** factor.</p> 
<p>Résumé quality also varies by summer employment experience, school-year employment, volunteering experience, extra computer skills, special honors and military experience, etc.</p>
<p>To understand if race/name is the most important factor, the correlation between callback rate and all these potential factors should be evaluated as well, using the information available from the dataset. Their relative importance can then be ranked to determine whether race/name is the most important factor among them.</p>
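One simple way to start such a comparison is to compute, for each binary factor, the gap in callback rate between résumés with and without that factor, together with a z score, and then sort by the size of the z score. The sketch below illustrates the idea on a small made-up dataset: `toy_factor_a`, `toy_factor_b`, and all values in the arrays are hypothetical and are not taken from the study. The helper names `callback_gap` and `rank_factors` are also assumptions introduced for illustration.

```python
import numpy as np

def callback_gap(call, flag):
    """Callback-rate gap (flag == 1 minus flag == 0) and its unpooled z score."""
    call = np.asarray(call, dtype=float)
    flag = np.asarray(flag).astype(bool)
    p1, p0 = call[flag].mean(), call[~flag].mean()
    n1, n0 = flag.sum(), (~flag).sum()
    se = np.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    return p1 - p0, (p1 - p0) / se

def rank_factors(call, factors):
    """Return (name, gap, z) tuples sorted by |z|, largest first."""
    rows = [(name, *callback_gap(call, flag)) for name, flag in factors.items()]
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)

# Hypothetical toy data (NOT from the study): 200 rows, 40 callbacks.
call = np.zeros(200)
call[:30] = 1       # 30 callbacks among the first 100 rows
call[100:110] = 1   # 10 callbacks among the last 100 rows
factors = {
    "toy_factor_a": np.arange(200) < 100,      # first 100 rows have the factor
    "toy_factor_b": np.arange(200) % 2 == 0,   # every other row has the factor
}

ranked = rank_factors(call, factors)
for name, gap, z in ranked:
    print(name, "gap:", gap, "z:", z)
```

With the real data, `call` could be `data.call` and the factors could be binary columns such as `data.honors`, `data.volunteer`, or `data.race == 'b'`. This ranks factors one at a time only; assessing their joint effect would need a model that includes all of them together, such as a logistic regression.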