# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your GitHub account**.

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution

In [46]:
import pandas as pd
import numpy as np
from scipy import stats
np.random.seed(42)

In [47]:
data = pd.io.stata.read_stata('data/us_job_market_discrimination.dta')

In [48]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

In [49]:
data.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


In [50]:
print(data.shape)
print(data.loc[data.race == 'w'].shape)
print(data.loc[data.race == 'b'].shape)


(4870, 65)
(2435, 65)
(2435, 65)


<div class="span5 alert alert-success">
<p>Your answers to Q1 and Q2 here</p>
</div>
Q1. An appropriate test is an A/B test, in which the two groups are the resumes with black-sounding and white-sounding names. The number of callbacks in each group can be compared to see whether there is a significant difference. The CLT applies here because of the large sample size. We can also assume that each observation in the data set is independent, which is a requirement for the CLT to hold.

Q2. The null hypothesis is that there is no difference in callbacks between the two groups: both groups have the same distribution of callbacks. The alternative hypothesis is that the callback rates of the two groups differ.
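
The large-sample claim in Q1 can be checked with the common rule of thumb that each group should contain at least 10 successes and 10 failures. The sketch below is illustrative only: the helper `clt_ok` and the threshold of 10 are assumptions, not part of the original analysis, and the count of 157 callbacks for the black-sounding group is inferred from the rate printed later in the notebook (0.064476386 × 2435).

```python
# Counts from the notebook: 2435 resumes per group and 235 callbacks for
# white-sounding names. The value 157 is inferred from the printed callback
# rate of the black-sounding group (0.064476386 * 2435), not read from the data.
n_per_group = 2435
callbacks = {"w": 235, "b": 157}


def clt_ok(successes, n, threshold=10):
    """Rule of thumb (assumed): at least `threshold` successes and failures."""
    return successes >= threshold and (n - successes) >= threshold


checks = {race: clt_ok(k, n_per_group) for race, k in callbacks.items()}
print(checks)
```

Both groups clear the threshold by a wide margin, which supports using a normal approximation for the callback rates.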

In [51]:
w = data[data.race=='w']
b = data[data.race=='b']

In [52]:
def bootstrap_perms(concatenate_data, func, size_group_A):
    """Permute the pooled data, split it into two groups, and return func(group_A) - func(group_B)."""
    perm = np.random.permutation(concatenate_data)
    group_A = perm[:size_group_A]
    group_B = perm[size_group_A:]
    return func(group_A) - func(group_B)

In [53]:
# Your solution to Q3 here
group_size = 2435
data_size =  4870
percent_callbacks_w = np.mean(w.call.values)
percent_callbacks_b = np.mean(b.call.values)
obs_diff = percent_callbacks_w - percent_callbacks_b
print('% callbacks white group = ', percent_callbacks_w)
print('% callbacks black group = ', percent_callbacks_b)
print('observed difference = '    , obs_diff)

concatenate_data = np.concatenate((w.call.values, b.call.values))

# take 100,000 permutations of the data
num_of_perms = 100000
bs_perms = np.empty(num_of_perms)
for i in range(num_of_perms):
    bs_perms[i] = bootstrap_perms(concatenate_data, np.mean, group_size)
print('\n')
print('*** bootstrap approach ***')    
# calculate 95% interval of the permuted differences (distribution under the null)
conf_int = np.percentile(bs_perms, [2.5, 97.5])
# calculate mean of the permuted differences
mean_diff = np.mean(bs_perms)
# calculate standard deviation of the permuted differences
std_diff = np.std(bs_perms)
    
print('95% confidence interval = '   , conf_int)
print('mean diff = '                 , mean_diff)
print('standard deviation of diff = ', std_diff)
    
p_value = np.sum(bs_perms >= obs_diff)/len(bs_perms)
print('p value = ', p_value)

print('\n')
print('*** frequentist approach ***')
#assume identical distribution for white and black groups
#calculate percent of call backs across all data
percent_callback = np.mean(data.call.values)
print('sample % callbacks = ', percent_callback)
#calculate standard deviation across all data
std_callback     = np.std(data.call.values)
print('sample callback standard deviation = ', std_callback)
#calculate 95% conf interval assuming distribution is given by CLT
scale = std_callback/np.sqrt(data_size)
print('std CLT = ', scale)
conf_int = [stats.norm.ppf(0.025  , loc=percent_callback, scale=scale), \
            stats.norm.ppf(0.975 , loc=percent_callback, scale=scale)]
print('95% confidence interval CLT = ', conf_int)
p_w = stats.norm.sf(percent_callbacks_w, loc=percent_callback, scale=scale)
p_b = stats.norm.sf(percent_callbacks_b, loc=percent_callback, scale=scale)

print('p value white group = ', p_w)
print('p value black group = ', p_b)

print('probability of observation = ', p_w*p_b)


% callbacks white group =  0.09650924
% callbacks black group =  0.064476386
observed difference =  0.032032855


*** bootstrap approach ***
95% confidence interval =  [-0.01560576  0.01560576]
mean diff =  -6.090349778532982e-05
standard deviation of diff =  0.007808701158907378
p value =  1e-05


*** frequentist approach ***
sample % callbacks =  0.08049282
sample callback standard deviation =  0.27205464
std CLT =  0.0038984472366298296
95% confidence interval CLT =  [0.07285200068602002, 0.08813363304486835]
p value white group =  1.9919529310671513e-05
p value black group =  0.999980080635494
probability of observation =  1.9919132526306386e-05
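
For comparison, a more conventional frequentist treatment of two callback rates is the two-proportion z-test. The sketch below is not the method used in the cell above. It assumes the counts implied by the printed output (235 and 157 callbacks out of 2435 resumes each; 157 is inferred from the printed rate), a pooled standard error for the test, and an unpooled standard error for the margin of error and confidence interval.

```python
import numpy as np
from scipy import stats

# Counts implied by the notebook output (157 is inferred from 0.064476386 * 2435).
n_w = n_b = 2435
k_w, k_b = 235, 157

p_w, p_b = k_w / n_w, k_b / n_b
diff = p_w - p_b

# Test statistic: pooled standard error, valid under H0 (equal callback rates).
p_pool = (k_w + k_b) / (n_w + n_b)
se_pool = np.sqrt(p_pool * (1 - p_pool) * (1 / n_w + 1 / n_b))
z = diff / se_pool
p_two_sided = 2 * stats.norm.sf(abs(z))

# Interval estimate: unpooled standard error, since H0 is not assumed here.
se_unpooled = np.sqrt(p_w * (1 - p_w) / n_w + p_b * (1 - p_b) / n_b)
margin = stats.norm.ppf(0.975) * se_unpooled
ci = (diff - margin, diff + margin)

print('z statistic = ', z)
print('two-sided p value = ', p_two_sided)
print('margin of error = ', margin)
print('95% confidence interval for the difference = ', ci)
```

Unlike the interval printed above, which is centered on the pooled callback rate, this interval is centered on the observed difference, so it can be read directly as a range of plausible gaps between the two groups.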


<div class="span5 alert alert-success">
<p> Your answers to Q4 and Q5 here </p>
</div>
Q4. In the bootstrap approach, the pooled data are permuted 100,000 times, and the difference in callback rate between the two groups is calculated for each permutation. These permutations form a distribution of differences with the 95% interval, mean, and standard deviation shown above. The p-value returned is 1 part in 100,000.

In the frequentist approach, I assume that callbacks in both groups come from a single distribution whose mean and standard deviation are estimated from the entire data sample. By the CLT, the callback rate is then approximately normally distributed, with mean equal to the sample mean and standard deviation equal to the sample standard deviation divided by the square root of the sample size. Given this distribution, I calculate a tail probability for the observed callback rate of each group, and I take the product of the two as the probability of the observation. The value returned is 2 parts in 100,000.

Both approaches returned a p-value much less than 0.01. Therefore, the observed difference in callbacks between the white and black groups is statistically significant. On average, the white group has a higher callback rate than the black group.
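
The permutation interval in Q3 describes differences expected under the null hypothesis. A confidence interval for the observed difference itself can be estimated by resampling each group with replacement. The following is a minimal sketch under stated assumptions: the 0/1 arrays are rebuilt from the counts implied by the output (235 and 157 callbacks out of 2435; 157 is inferred from the printed rate), and the number of replicates and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen arbitrarily for reproducibility

# Rebuild 0/1 callback arrays from the counts implied by the notebook output.
n = 2435
w_call = np.r_[np.ones(235), np.zeros(n - 235)]
b_call = np.r_[np.ones(157), np.zeros(n - 157)]

n_reps = 10_000  # assumed number of bootstrap replicates
boot_diffs = np.empty(n_reps)
for i in range(n_reps):
    # Resample each group separately, with replacement, keeping group sizes fixed.
    w_rep = rng.choice(w_call, size=n, replace=True)
    b_rep = rng.choice(b_call, size=n, replace=True)
    boot_diffs[i] = w_rep.mean() - b_rep.mean()

ci_low, ci_high = np.percentile(boot_diffs, [2.5, 97.5])
print('bootstrap 95% confidence interval for the difference = ', [ci_low, ci_high])
```

If this interval excludes zero, it agrees with the conclusion drawn from the p-values above.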

Q5. Other resume factors, such as job experience and education level, would need to be analyzed to determine whether race is the most important factor.
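
One simple way to start that comparison is to compute the same callback-rate gap for other binary resume attributes and set it beside the gap for race. The helper `callback_gap` below is hypothetical, and the small DataFrame is toy data made up purely to show the call; with the real dataset, it would be applied to `data` and to columns such as `honors` or `volunteer`.

```python
import pandas as pd


def callback_gap(df, column, outcome="call"):
    """Mean of `outcome` where `column` == 1 minus the mean where `column` == 0."""
    rates = df.groupby(column)[outcome].mean()
    return rates.get(1, float("nan")) - rates.get(0, float("nan"))


# Toy data for illustration only; these values are NOT from the study.
toy = pd.DataFrame({
    "call":   [1, 0, 1, 1, 0, 0],
    "honors": [1, 1, 1, 0, 0, 0],
})
print('toy callback gap for honors = ', callback_gap(toy, "honors"))
```

Raw gaps like this ignore correlations between attributes. A fuller analysis would fit a model, such as a logistic regression, on all factors at once.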
