# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution

In [1]:
import pandas as pd 
import numpy as np
from scipy import stats


In [2]:
data = pd.io.stata.read_stata('data/us_job_market_discrimination.dta')


In [4]:
# number of callbacks for black-sounding names
sum(data[data.race=='b'].call)


157.0

In [5]:
data.head()


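| | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |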


## 1. What test is appropriate for this problem? Does the CLT apply? 
We are comparing the callback rates of two groups. Because names were randomly assigned to resumes, the two groups are independent. The outcome ('call') is binary, so the natural choice is a two-sample z-test for the difference of two proportions. With samples this large, a two-sample t-test for independent samples gives nearly the same result.

For the CLT to apply, a few conditions need to be met. First, the observations must be independent of one another. Names are randomly assigned to resumes, so we can assume that this condition is met. Second, when sampling without replacement, the sample must not be more than 10% of the total population. Since there are millions of possible names and recruiters see hundreds of resumes, we can assume that this condition is also met. Third, the sample must be large enough, which is usually taken as n > 30. For a binary outcome, a stricter rule of thumb is at least 10 callbacks and 10 non-callbacks in each group. The black-sounding group alone has 157 callbacks, so this condition is also met and the CLT applies.
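The sample-size condition can be checked directly from the callback counts. The sketch below is illustrative only: the helper `clt_ok` and its threshold of 10 are assumptions introduced here, and the group size of 2,435 is inferred from the proportions printed later in this notebook rather than stated in the exercise.

```python
# Callback counts come from the notebook output; the group size of 2,435
# is an assumption inferred from the printed callback proportions.
n_b, calls_b = 2435, 157
n_w, calls_w = 2435, 235

def clt_ok(n, successes, threshold=10):
    """Rule-of-thumb check: at least `threshold` successes and failures."""
    failures = n - successes
    return successes >= threshold and failures >= threshold

print('black-sounding names:', clt_ok(n_b, calls_b))
print('white-sounding names:', clt_ok(n_w, calls_w))
```

Both groups pass comfortably, which supports using a normal approximation for the difference in callback rates.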

## 2. What are the null and alternative hypotheses? 

*Null hypothesis*: There is no difference in the callback rates of white and black-sounding names. 

*Alternative hypothesis*: There is a difference in the callback rates of white and black-sounding names (race has an effect on callback rate). 
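Written in terms of the population callback proportions, the two hypotheses take the following form. The symbols $p_b$ and $p_w$ (callback proportions for black-sounding and white-sounding names) are notation introduced here for illustration.

$$
H_0:\; p_b - p_w = 0 \qquad\qquad H_A:\; p_b - p_w \neq 0
$$

This is a two-sided test, matching the `alternative='two-sided'` setting used in the z-test later in the notebook.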

## 3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.

In [21]:
w = data[data.race=='w']
b = data[data.race=='b'] 
b = b["call"] 
w = w["call"] 

def draw_bootstrap(data, func, size=1):

    # Initialize array of replicates: bs_replicates
    bs_replicates = np.empty(size)

    # Generate replicates
    for i in range(size):
        bs_replicates[i] = func(np.random.choice(data, size=len(data)))

    return bs_replicates

b_replicates = draw_bootstrap(b, sum, 1000) 
w_replicates = draw_bootstrap(w, sum, 1000) 

# Convert replicate sums into callback proportions for each group
b_props = b_replicates / len(b)
w_props = w_replicates / len(w)
diff_replicates = b_props - w_props
obs_diff = np.mean(b) - np.mean(w)

# Margin of error: 1.96 bootstrap standard errors of the difference
moe_bs = 1.96 * np.std(diff_replicates)
print('The margin of error is', moe_bs)

# 95% percentile confidence interval for the difference in proportions
ci_low_bs, ci_high_bs = np.percentile(diff_replicates, [2.5, 97.5])
print('95% confidence interval is: [', ci_low_bs, ',', ci_high_bs, ']')

# Two-sided p-value: re-center the replicates at zero to mimic the null
null_replicates = diff_replicates - np.mean(diff_replicates)
p = np.mean(np.abs(null_replicates) >= abs(obs_diff))
print('p =', p)


Using the bootstrap method, the p-value is below the 0.05 significance level. We therefore reject the null hypothesis that there is no difference in callback rates: the difference between the callback rates of black-sounding and white-sounding names is statistically significant. 
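As a cross-check that simulates the null hypothesis directly, a permutation test shuffles the pooled outcomes between the two groups and counts how often a difference at least as large as the observed one appears. This is a hedged sketch, not the method used in the cells of this notebook. It uses only the standard library so it runs without the dataset, the function name `permutation_p_value` is hypothetical, and the group size of 2,435 is an assumption inferred from the printed callback proportions.

```python
import random

# Callback counts from the notebook output; group sizes are an assumption.
n_b, calls_b = 2435, 157
n_w, calls_w = 2435, 235

def permutation_p_value(n_b, calls_b, n_w, calls_w, n_perm=1000, seed=0):
    """Two-sided permutation p-value for a difference in callback rates."""
    rng = random.Random(seed)
    total_calls = calls_b + calls_w
    pooled = [1] * total_calls + [0] * (n_b + n_w - total_calls)
    observed = calls_b / n_b - calls_w / n_w
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)              # reassign outcomes at random
        perm_b = sum(pooled[:n_b])       # callbacks landing in group b
        perm_w = total_calls - perm_b    # the rest land in group w
        if abs(perm_b / n_b - perm_w / n_w) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_perm + 1)

p_perm = permutation_p_value(n_b, calls_b, n_w, calls_w)
print('permutation p-value:', p_perm)
```

With 1,000 permutations the smallest reportable p-value is roughly 0.001, so a result at that floor only says the true p-value is small, not what it is exactly.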

In [33]:
w = data[data.race=='w'] 
w = w["call"]
b = data[data.race=='b']
b = b["call"] 

b_mean = np.mean(b)
print('The proportion of callbacks for black-sounding names is', b_mean) 

w_mean = np.mean(w) 
print('The proportion of callbacks for white-sounding names is', w_mean)  

n_b = len(b)  
n_w = len(w) 

pooled_stdev = np.sqrt(((len(b) - 1) * (np.std(b)**2) + (len(w) - 1) * (np.std(w)**2))/(len(b) + len(w) - 2))

moe_t = 1.96 * (pooled_stdev)*(np.sqrt((1/len(b)) + (1/len(w))))
print('The margin of error is', moe_t) 

ci_low = (b_mean - w_mean) - moe_t
ci_high = (b_mean - w_mean) + moe_t 
print('95% confidence interval is: [', ci_low, ',', ci_high, ']') 

from statsmodels.stats.proportion import proportions_ztest 

calls_b = sum(data[data.race=='b'].call)
calls_w = sum(data[data.race=='w'].call)
print('The two sample z test of proportions for independent samples is:')
proportions_ztest([calls_b, calls_w], [len(b), len(w)], value=None, alternative='two-sided', prop_var=False)


The proportion of callbacks for black-sounding names is 0.0644763857126236
The proportion of callbacks for white-sounding names is 0.09650924056768417
The margin of error is 0.015255284385449893
95% confidence interval is: [ -0.04728813924051047 , -0.016777570469610682 ]
The two sample z test of proportions for independent samples is:


(-4.108412152434346, 3.983886837585077e-05)

The 95% confidence interval does not contain zero. Therefore, there is evidence to suggest that there is indeed a significant difference between the proportion of black-sounding names that get callbacks and the proportion of white-sounding names that get callbacks. 
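The interval can also be reproduced by hand from the counts. The sketch below assumes group sizes of 2,435 (inferred from the printed proportions, not stated in the exercise) and uses the unpooled (Wald) standard error. Because the two groups are the same size, this should agree closely with the pooled calculation in the cell above.

```python
import math

# Callback counts from the notebook output; group sizes are an assumption.
n_b, calls_b = 2435, 157
n_w, calls_w = 2435, 235

p_b = calls_b / n_b
p_w = calls_w / n_w
diff = p_b - p_w

# Unpooled standard error of the difference between two proportions
se_diff = math.sqrt(p_b * (1 - p_b) / n_b + p_w * (1 - p_w) / n_w)

# 1.96 is the two-sided 95% critical value of the standard normal
moe = 1.96 * se_diff
ci_low, ci_high = diff - moe, diff + moe

print('margin of error:', moe)
print('95% confidence interval: [', ci_low, ',', ci_high, ']')
```

Both endpoints are negative, so the interval excludes zero on the side of a lower callback rate for black-sounding names.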

Using the two-sample z-test for equal proportions in independent samples, the p-value is about 4.0e-05, far below 0.05. This means that we reject the null hypothesis. There is enough evidence to suggest that race has a significant effect on callback success. 
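To make the z-test less of a black box, the statistic can be computed by hand with the pooled proportion under the null hypothesis. This is a minimal sketch that assumes group sizes of 2,435 (inferred from the printed proportions) and uses `math.erfc` for the normal tail probability so that it needs no third-party packages.

```python
import math

# Callback counts from the notebook output; group sizes are an assumption.
n_b, calls_b = 2435, 157
n_w, calls_w = 2435, 235

p_b = calls_b / n_b
p_w = calls_w / n_w

# Under the null both groups share one callback rate, estimated by pooling
p_pool = (calls_b + calls_w) / (n_b + n_w)
se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_b + 1 / n_w))

z = (p_b - p_w) / se_pool

# Two-sided p-value from the standard normal: 2 * P(Z >= |z|)
p_value = math.erfc(abs(z) / math.sqrt(2))

print('z =', z)
print('p-value =', p_value)
```

The result should match the `proportions_ztest` output above to several decimal places, since that function also pools the proportions when `prop_var=False`.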

Just to be sure, we can check the results with a two-sample t-test for independent samples. 

In [34]:
print('The two sample t-test for independent samples result is:')
stats.ttest_ind(w, b, axis=0, equal_var=True) 


The two sample t-test for independent samples result is:


Ttest_indResult(statistic=4.114705290861751, pvalue=3.940802103128886e-05)

Using the two-sample t-test for independent samples, the p-value is about 3.9e-05, again far below 0.05. This means that we reject the null hypothesis. There is enough evidence to suggest that race has a significant effect on callback success. 

## 4. Write a story describing the statistical significance in the context of the original problem.
This research experiment was intended to investigate the idea that recruiters have stereotypes about race, which can be drawn from the name on a person's resume. In this experiment, researchers randomly assigned either a white or black-sounding name to identical resumes. They then measured if the resume got a callback, denoted as the binary variable "call" which was assigned a 1 if the resume received a callback and 0 otherwise. 

After analyzing the callback rates of the two groups, we can see that the perceived race of the applicant does have a significant effect on callback success. Resumes with black-sounding names received callbacks about 6.4% of the time, compared with about 9.7% for resumes with white-sounding names, even though the content was identical. 

## 5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis? 
This analysis does not mean that race/name is the most important factor in callback success. It only means that it is **an** important factor in callback success. To determine the most important factor, we would need to include the other factors that may impact callback success (such as education or years of experience, both available in the dataset) and conduct a regression analysis. 