# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your GitHub account**.

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution

In [2]:
import pandas as pd
import numpy as np
from scipy import stats
from statsmodels.stats.proportion import proportions_ztest

In [3]:
# Load the Stata file into a DataFrame
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [5]:
data.head()

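| | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |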
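As a quick orientation to the two columns used in the analysis, the callback rate per group can be read off with a pandas `groupby`. The sketch below uses a tiny hypothetical DataFrame (`toy`, eight made-up rows) rather than the real dataset, so the numbers it produces are illustrative only.

```python
import pandas as pd

# Hypothetical toy data with the same two columns used in the analysis;
# these eight rows are made up and are NOT taken from the real dataset.
toy = pd.DataFrame({
    'race': ['b', 'b', 'b', 'b', 'w', 'w', 'w', 'w'],
    'call': [0, 0, 1, 0, 1, 0, 1, 0],
})

# Because 'call' is coded 0/1, its mean within each group is the callback rate.
rates = toy.groupby('race')['call'].mean()

# Number of callbacks and number of resumes in each group.
counts = toy.groupby('race')['call'].agg(['sum', 'count'])

print(rates)
print(counts)
```

Applied to the real data, the same pattern would be `data.groupby('race')['call'].mean()`.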


### 1. What test is appropriate for this problem? Does CLT apply? ###

A z-test for a difference of proportions is appropriate for this problem. The data form two independent samples (applicants with 'black-sounding' versus 'white-sounding' names), and the variable of interest, 'call', is a Bernoulli random variable: 1 denotes a callback and 0 denotes no callback. Because the variable takes only the values 0 and 1, we can count the successes in each sample and compute the proportion of callbacks.

The CLT applies because (1) the names are randomly assigned to resumes, which creates two independent samples, and (2) in each group the number of successes (sample size times the proportion of successes) and the number of failures (sample size times the proportion of failures) are both at least 10. Under these conditions each sample proportion, and therefore the difference in proportions, is approximately normally distributed. The calculations to find the proportions in each sample are shown below.

In [22]:
#Split data into applicants with white-sounding names and applicants with black-sounding names and find the size of each sample

b = data[data.race=='b']
w = data[data.race=='w']

num_b = len(b)
num_w = len(w)

#Find number of successes in each sample

b_num_success = len(b[b['call'] == 1])
w_num_success = len(w[w['call'] == 1])

#Find proportion of successes in each sample

prop_w_success = float(w_num_success) / num_w
prop_b_success = float(b_num_success) / num_b
print('Proportion of applicants with black-sounding names who were called back: ' + str(prop_b_success))
print('Proportion of applicants with white-sounding names who were called back: ' + str(prop_w_success) + '\n')

#Check conditions for CLT, if all are true, necessary conditions are met

print('Check conditions needed to use normal approximation and CLT:')
print('Is the number of successes in the black-sounding names sample at least 10? ' + str(b_num_success >= 10))
print('Is the number of successes in the white-sounding names sample at least 10? ' + str(w_num_success >= 10))
print('Is the number of failures in the black-sounding names sample at least 10? ' + str((num_b - b_num_success) >= 10))
print('Is the number of failures in the white-sounding names sample at least 10? ' + str((num_w - w_num_success) >= 10))


Proportion of applicants with black-sounding names who were called back: 0.06447638603696099
Proportion of applicants with white-sounding names who were called back: 0.09650924024640657

Check conditions needed to use normal approximation and CLT:
Is the number of successes in the black-sounding names sample at least 10? True
Is the number of successes in the white-sounding names sample at least 10? True
Is the number of failures in the black-sounding names sample at least 10? True
Is the number of failures in the white-sounding names sample at least 10? True
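The success/failure condition checked above is what justifies treating a sample proportion as approximately normal. A small simulation can illustrate this. It is a sketch under assumed values: the true callback probability `p_true = 0.08` and sample size `n = 2435` are chosen for illustration and are not read from the dataset.

```python
import numpy as np

# Assumed values for illustration only (not taken from the notebook's data):
# a true callback probability of 0.08 and 2,435 resumes per sample.
p_true = 0.08
n = 2435
n_sims = 20000

rng = np.random.default_rng(0)

# Each simulated sample proportion is (number of callbacks) / n.
sample_props = rng.binomial(n, p_true, size=n_sims) / n

# Standard deviation predicted by the normal approximation.
theoretical_sd = np.sqrt(p_true * (1 - p_true) / n)
simulated_sd = sample_props.std()

# Share of simulated proportions within 1.96 predicted SDs of p_true;
# under a normal approximation this should be close to 0.95.
within = np.mean(np.abs(sample_props - p_true) <= 1.96 * theoretical_sd)

print('Predicted SD:', theoretical_sd)
print('Simulated SD:', simulated_sd)
print('Share within 1.96 SD:', within)
```

If the normal approximation is reasonable, the simulated and predicted standard deviations should nearly agree and the share should be close to 0.95.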


### 2. What are the null and alternate hypotheses? ###

The null hypothesis is that the proportion of applicants with black-sounding names who were called back is equal to the proportion of applicants with white-sounding names who were called back: 

p_b = p_w

The alternate hypothesis is that the proportion of applicants with black-sounding names who were called back is less than the proportion of applicants with white-sounding names who were called back:

p_b < p_w
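Under the null hypothesis both groups share one callback probability, so the test statistic pools the two samples when estimating the standard error. A minimal hand-rolled sketch of that calculation follows. The function name `pooled_z_test` is hypothetical, and the counts (157 of 2,435 and 235 of 2,435) are assumptions chosen to be consistent with the proportions printed earlier; the notebook itself does not print them. The standard library's `statistics.NormalDist` stands in for `scipy.stats.norm` to keep the example dependency-free.

```python
from math import sqrt
from statistics import NormalDist

def pooled_z_test(x1, n1, x2, n2):
    """One-sided z-test of H0: p1 = p2 against H1: p1 < p2."""
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 the best estimate of the common proportion pools both samples.
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Lower-tail probability, because the alternative is p1 < p2.
    p_value = NormalDist().cdf(z)
    return z, p_value

# Assumed counts (see the note above): callbacks and resumes per group.
z, p_value = pooled_z_test(157, 2435, 235, 2435)
print('z statistic:', z)
print('one-sided p-value:', p_value)
```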

### 3. Compute margin of error, confidence interval, and p-value ###

In [56]:
# Frequentist stats approach:
# Find standard error, margin of error, and confidence interval

std_error = ((prop_b_success*(1-prop_b_success) / num_b) + (prop_w_success*(1-prop_w_success) / num_w))**(0.5)

#Find margin of error, assuming a confidence level of 95%
#A two-sided 95% interval leaves 2.5% in each tail, so the critical value is the 0.975 quantile
z_value = stats.norm.ppf(.975)
m_error = z_value * std_error
print('The margin of error is', round(m_error, 4))

#Find 95% confidence interval
diff_prop = prop_b_success - prop_w_success
conf_int = (round(diff_prop - m_error, 4), round(diff_prop + m_error, 4))
print('The confidence interval is', conf_int)

#Conduct hypothesis test and find p-value
p_value = proportions_ztest([b_num_success, w_num_success], [num_b, num_w], alternative='smaller')[1]
print('The p-value is', p_value)

The margin of error is 0.0153
The confidence interval is (-0.0473, -0.0168)
The p-value is 1.9919434187925383e-05
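The link between confidence level and critical value is worth making explicit: a two-sided interval at confidence level c uses the (1 + c) / 2 quantile of the standard normal, so 0.975 for a 95% interval, while the 0.95 quantile corresponds to a one-sided 95% bound or a two-sided 90% interval. The sketch below wraps the unpooled (Wald) interval in a hypothetical helper, `wald_ci_diff`. The counts passed in are assumptions consistent with the printed proportions, not values printed by the notebook.

```python
from math import sqrt
from statistics import NormalDist

def wald_ci_diff(x1, n1, x2, n2, conf=0.95):
    """Two-sided Wald confidence interval for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    # Unpooled standard error: each proportion contributes its own variance.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Two-sided interval: split the leftover probability between both tails.
    z_crit = NormalDist().inv_cdf((1 + conf) / 2)
    margin = z_crit * se
    diff = p1 - p2
    return diff - margin, diff + margin, margin

# Assumed counts: 157 of 2,435 and 235 of 2,435 callbacks.
lower, upper, margin = wald_ci_diff(157, 2435, 235, 2435)
print('Margin of error:', margin)
print('95% confidence interval:', (lower, upper))
```

Passing `conf=0.90` reproduces the interval obtained from the 0.95 quantile.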


In [57]:
#Bootstrapping approach to find confidence interval:

#Helper function to calculate the difference in proportions of callbacks between two samples of data
def find_diff_prop(data_1, data_2):
    return float(np.sum(data_1)) / len(data_1) - float(np.sum(data_2)) / len(data_2)

#Resample from each sample of data separately, then calculate bootstrap replicate difference in proportions of callbacks; repeat a large number of times. The 95% confidence interval includes the middle 95% of all the bootstrap replicates
def bootstrap_conf_int(data_1, data_2, size=10000):
    bootstrap_replicates = np.empty(size)
    for i in range(size):
        data_1_resample = np.random.choice(data_1, len(data_1))
        data_2_resample = np.random.choice(data_2, len(data_2))
        bootstrap_replicates[i] = find_diff_prop(data_1_resample, data_2_resample)
    return np.percentile(bootstrap_replicates, [2.5, 97.5])

conf_int = bootstrap_conf_int(b['call'], w['call'])
print('The bootstrap confidence interval is', conf_int)

#Permutation test to find the p-value:
#Function to generate a permutation sample; first shuffle the data, and then draw a sample of the same size as one of the original samples. Use the remaining values for the second sample.
def perm_sample(vals, len_data_1):
    callbacks_perm = np.random.permutation(vals)
    perm_sample_1 = callbacks_perm[:len_data_1]
    perm_sample_2 = callbacks_perm[len_data_1:]
    return perm_sample_1, perm_sample_2

#Function to calculate a large number of permutation replicates for the difference in proportions of callbacks    
def draw_perm_reps(size=10000):
    pt_replicates = np.empty(size)
    for i in range(size):
        data_1, data_2 = perm_sample(data.call.values, num_b)
        pt_replicates[i] = find_diff_prop(data_1, data_2)
    return pt_replicates

perm_replicates = draw_perm_reps()

#Calculate p-value as the percentage of permutation replicates that are less than or equal to the observed difference in proportions of callbacks in the original data
p = float(np.sum(perm_replicates <= (prop_b_success - prop_w_success))) / len(perm_replicates)
print('The p-value is', p)

The bootstrap confidence interval is [-0.04722793 -0.01683778]
The p-value is 0.0001
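Because 'call' takes only the values 0 and 1, both resampling schemes above have count-based equivalents that avoid the Python loops. Resampling n values with replacement from a 0/1 sample gives a binomial number of ones, and shuffling the pooled callbacks into two groups gives a hypergeometric number of ones in the first group. The following is a sketch of that alternative, not the notebook's method. The counts are assumptions consistent with the printed proportions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed counts (not printed in the notebook): callbacks and resumes per group.
x_b, n_b = 157, 2435
x_w, n_w = 235, 2435
n_reps = 10000

# Bootstrap: the number of callbacks in a resample of size n drawn with
# replacement from a 0/1 sample is Binomial(n, observed proportion).
boot_b = rng.binomial(n_b, x_b / n_b, size=n_reps) / n_b
boot_w = rng.binomial(n_w, x_w / n_w, size=n_reps) / n_w
boot_ci = np.percentile(boot_b - boot_w, [2.5, 97.5])

# Permutation: after shuffling the pooled data, the number of callbacks that
# land in the first group (size n_b) follows a hypergeometric distribution.
total_calls = x_b + x_w
total_no_calls = (n_b - x_b) + (n_w - x_w)
perm_x_b = rng.hypergeometric(total_calls, total_no_calls, n_b, size=n_reps)
perm_x_w = total_calls - perm_x_b
perm_diffs = perm_x_b / n_b - perm_x_w / n_w

# One-sided p-value: share of permuted differences at or below the observed one.
observed_diff = x_b / n_b - x_w / n_w
perm_p = np.mean(perm_diffs <= observed_diff)

print('Bootstrap 95% confidence interval:', boot_ci)
print('Permutation p-value:', perm_p)
```

With only 10,000 replicates, a permutation p-value this small is resolved only to the nearest 0.0001, so it is best read as "at most about 1 in 10,000".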


### 4. Write a story describing the statistical significance in the context of the original problem ###

Both the frequentist approach and the resampling approach (bootstrap interval and permutation test) produced a very small p-value. The p-value is the probability of obtaining a sample statistic at least as extreme as the one observed, assuming the null hypothesis is true. Because the p-values from both approaches are very small, it is very unlikely that random chance alone would produce a difference in callback proportions between applicants with black-sounding names and those with white-sounding names that is as negative as the one observed in the data. The data are therefore strongly inconsistent with the null hypothesis. We have sufficient evidence to reject the null hypothesis and conclude that the interview callback rate for job applicants with black-sounding names is lower than the callback rate for applicants with white-sounding names.

The frequentist 95% confidence interval was calculated using a method such that, over repeated samples, 95% of the intervals it produces capture the true (population) difference in the proportions of callbacks. The value 0 is not included in the interval we calculated. Since we can be 95% confident that this interval captures the true difference in proportions of callbacks, and 0 lies outside it, there is sufficient evidence that the true difference in callback rates by race/name is not zero. The bootstrap interval is interpreted in a similar way and also excludes 0. Furthermore, every plausible value of prop_b_success - prop_w_success in both intervals (frequentist and bootstrap) is negative. This suggests that the true rate of callbacks for applicants with black-sounding names is less than the true rate of callbacks for applicants with white-sounding names.
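The "95% of all intervals" reading can be checked by simulation: fix hypothetical true callback probabilities, repeat the experiment many times, and count how often the interval contains the true difference. The values `p_b_true = 0.065` and `p_w_true = 0.095` and the sample size below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical "true" callback probabilities and per-group sample size,
# chosen for illustration; they are not estimates reported by the notebook.
p_b_true, p_w_true = 0.065, 0.095
n = 2435
n_experiments = 5000
z_crit = 1.959964  # 0.975 quantile of the standard normal

# Simulate the number of callbacks in each group for every experiment.
x_b = rng.binomial(n, p_b_true, size=n_experiments)
x_w = rng.binomial(n, p_w_true, size=n_experiments)
p_b_hat, p_w_hat = x_b / n, x_w / n

# Wald interval for the difference in proportions in each experiment.
diff_hat = p_b_hat - p_w_hat
se = np.sqrt(p_b_hat * (1 - p_b_hat) / n + p_w_hat * (1 - p_w_hat) / n)
lower = diff_hat - z_crit * se
upper = diff_hat + z_crit * se

# Fraction of simulated intervals that contain the true difference.
true_diff = p_b_true - p_w_true
coverage = np.mean((lower <= true_diff) & (true_diff <= upper))
print('Empirical coverage:', coverage)
```

The coverage should land near 0.95. The 95% describes the procedure across repeated experiments, not the probability that any single computed interval is correct.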

### 5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis? ###

This analysis does not mean that race/name is the most important factor in interview callback success. It shows only that race/name is one factor with a statistically significant effect on callback rates. Other variables could also affect callback success; separate hypothesis tests would need to be conducted on each variable of interest to determine whether it is significantly associated with callbacks. In addition, some of these variables may depend on each other. Other tools, such as graphs, correlations, or chi-squared tests for association, can help determine whether there is a dependence between race/name and other variables.
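As one example of the association checks mentioned above, a chi-squared test on a 2x2 table of race against another binary variable could look like the sketch below. The counts are hypothetical (made up for illustration, not taken from the dataset), and the statistic is computed by hand without a continuity correction. `scipy.stats.chi2_contingency` with `correction=False` provides an equivalent library routine.

```python
from math import erfc, sqrt
import numpy as np

# Hypothetical 2x2 table (NOT from the dataset):
# rows = race ('b', 'w'); columns = a binary resume attribute (0, 1).
observed = np.array([[2307, 128],
                     [2305, 130]])

# Expected counts if race and the attribute were independent.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Pearson chi-squared statistic (no continuity correction).
chi2_stat = ((observed - expected) ** 2 / expected).sum()

# A 2x2 table has 1 degree of freedom; for 1 df the upper-tail
# probability of the chi-squared distribution is erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi2_stat / 2))

print('Chi-squared statistic:', chi2_stat)
print('p-value:', p_value)
```

Because names were assigned to resumes at random, a large p-value here would be the expected outcome: randomization should leave race/name unrelated to the other resume attributes.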