# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution

In [64]:
#Import packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
import scipy.stats
#from statsmodels.stats.weightstats import ztest
#allow all columns to be viewed:
pd.set_option('display.max_columns', None)

In [2]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [24]:
# number of callbacks for black-sounding names
black_callback = sum(data[data.race=='b'].call)
#Total black-sounding names
black_names = data[data.race=='b']['id'].count()

In [25]:
# number of callbacks for white-sounding names
white_callback = sum(data[data.race=='w'].call)
#Total white-sounding names
white_names = data[data.race=='w']['id'].count()

In [11]:
#Total callbacks
all_callback = sum(data.call)

In [26]:
print(str(black_callback) + " people with black-sounding names received a callback out of " + str(black_names) + " people with black-sounding names, which is " + str(black_callback / black_names * 100) + "%.")

157.0 people with black-sounding names received a callback out of 2435 people with black-sounding names, which is 6.447638603696099%.


In [27]:
print(str(white_callback) + " people with white-sounding names received a callback out of " + str(white_names) + " people with white-sounding names, which is " + str(white_callback / white_names * 100) + "%.")

235.0 people with white-sounding names received a callback out of 2435 people with white-sounding names, which is 9.650924024640657%.


In [29]:
print("Overall, " + str(black_callback + white_callback) + " individuals received a callback out of " + str(white_names + black_names) + " which is " + str(round((white_callback + black_callback) / (white_names + black_names) * 100, 2)) + "%.")

Overall, 392.0 individuals received a callback out of 4870 which is 8.05%.


In [6]:
data.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,occupbroad,workinschool,email,computerskills,specialskills,firstname,sex,race,h,l,call,city,kind,adid,fracblack,fracwhite,lmedhhinc,fracdropout,fraccolp,linc,col,expminreq,schoolreq,eoe,parent_sales,parent_emp,branch_sales,branch_emp,fed,fracblack_empzip,fracwhite_empzip,lmedhhinc_empzip,fracdropout_empzip,fraccolp_empzip,linc_empzip,manager,supervisor,secretary,offsupport,salesrep,retailsales,req,expreq,comreq,educreq,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,1,0,0,1,0,Allison,f,w,0.0,1.0,0.0,c,a,384.0,0.98936,0.0055,9.527484,0.274151,0.037662,8.706325,1.0,5,,1.0,,,,,,,,,,,,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,6,1,1,1,0,Kristen,f,w,1.0,0.0,0.0,c,a,384.0,0.080736,0.888374,10.408828,0.233687,0.087285,9.532859,0.0,5,,1.0,,,,,,,,,,,,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,1,1,0,1,0,Lakisha,f,b,0.0,1.0,0.0,c,a,384.0,0.104301,0.83737,10.466754,0.101335,0.591695,10.540329,1.0,5,,1.0,,,,,,,,,,,,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,5,0,1,1,1,Latonya,f,b,1.0,0.0,0.0,c,a,384.0,0.336165,0.63737,10.431908,0.108848,0.406576,10.412141,0.0,5,,1.0,,,,,,,,,,,,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,5,1,1,1,0,Carrie,f,w,1.0,0.0,0.0,c,a,385.0,0.397595,0.180196,9.876219,0.312873,0.030847,8.728264,0.0,some,,1.0,9.4,143.0,9.4,143.0,0.0,0.204764,0.727046,10.619399,0.070493,0.369903,10.007352,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


<div class="span5 alert alert-success">
<p>Your answers to Q1 and Q2 here</p>
</div>

## Question 1 - What test is appropriate for this problem? Does CLT apply?

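I believe a two-sample, two-tailed test applies in this situation, because I am checking whether there is a significant difference in the call-back rates of two groups: the resumes with white-sounding names and the resumes with black-sounding names. Because the outcome is binary, this is a comparison of two proportions, for which a two-proportion z-test is the standard choice. With 2,435 resumes in each group, a two-sample t-test gives nearly the same result, and that is what I use below.

Yes, the Central Limit Theorem applies in this case. Each callback is a Bernoulli (0/1) outcome, so the data themselves are not normal, but the names were assigned at random and each group is large: 235 callbacks and 2,200 non-callbacks for white-sounding names, and 157 callbacks and 2,278 non-callbacks for black-sounding names. With well over 10 successes and 10 failures in each group, the sampling distribution of each call-back rate is approximately normal.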
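
As a quick check of the usual rule of thumb for the normal approximation (at least 10 successes and 10 failures per group), here is a minimal sketch. The helper name `clt_counts_ok` and the threshold of 10 are my own choices for illustration; the counts are the ones printed earlier in this notebook.

```python
# Counts taken from the callback summaries printed earlier in this notebook.
n_w, calls_w = 2435, 235   # resumes with white-sounding names
n_b, calls_b = 2435, 157   # resumes with black-sounding names

def clt_counts_ok(successes, n, threshold=10):
    """Rule-of-thumb check: at least `threshold` successes and failures."""
    return successes >= threshold and (n - successes) >= threshold

print(clt_counts_ok(calls_w, n_w))
print(clt_counts_ok(calls_b, n_b))
```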

## Question 2 - What are the null and alternate hypotheses?

The null hypothesis is that there is no difference in the call-back rates between the two groups of names. The alternate hypothesis is that white-sounding names have a different call-back rate than black-sounding names.
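
In symbols, writing $p_w$ and $p_b$ for the true call-back rates of resumes with white-sounding and black-sounding names (this notation is my own shorthand, not taken from the study), the two hypotheses can be stated as:

$$
H_0: p_w - p_b = 0 \qquad \text{versus} \qquad H_A: p_w - p_b \neq 0
$$

The alternate hypothesis is two-sided, which matches the two-tailed test chosen in Question 1.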

## Question 3 - Compute margin of error, confidence interval, and p-value, using bootstrapping and frequentist approaches

In [38]:
#Create new datasets with each subset group
w = data[data.race=='w']['call']
b = data[data.race=='b']['call']

### Bootstrap methods

In [49]:
#Draw one resample (with replacement) of w and b, each ten times the original group size
w_samp = np.random.choice(w, len(w)*10)
b_samp = np.random.choice(b, len(b)*10)

In [50]:
#Calculate white callback rate in the resample
w_samp_mean = np.mean(w_samp)
#Calculate white callback standard deviation in the resample
w_samp_std = np.std(w_samp)

print(w_samp_mean)
print(w_samp_std)

0.09396304
0.2917773


In [51]:
#Calculate black callback rate in the resample
b_samp_mean = np.mean(b_samp)
#Calculate black callback standard deviation in the resample
b_samp_std = np.std(b_samp)

print(b_samp_mean)
print(b_samp_std)

0.06488706
0.24632649


In [53]:
#Calculate margin of error at the 95% confidence level using bootstrap sample data
me_w_bs = 1.96*(w_samp_std/np.sqrt(len(w_samp)))

me_b_bs = 1.96*(b_samp_std/np.sqrt(len(b_samp)))

print(me_w_bs)
print(me_b_bs)

0.003664866073887781
0.0030939814761724305


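The margin of error at the 95% confidence level calculated from the bootstrap samples is 0.0037 for white-sounding names and 0.0031 for black-sounding names. Because each resample is ten times the size of the original group, these margins are about √10 ≈ 3.2 times smaller than the margins at the actual sample size of 2,435 resumes per group.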
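
A more conventional bootstrap keeps every resample at the original group size and repeats the draw many times; the spread of the resampled means then estimates the standard error. The sketch below is one way to do that, not a replacement for the cells above. It assumes the 0/1 callback arrays can be rebuilt from the counts printed earlier (235 and 157 callbacks out of 2,435 each); the seed, the helper name `bootstrap_means`, and the 2,000 repetitions are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, only for reproducibility

# Rebuild the 0/1 callback arrays from the counts printed earlier.
w_calls = np.concatenate([np.ones(235), np.zeros(2435 - 235)])
b_calls = np.concatenate([np.ones(157), np.zeros(2435 - 157)])

def bootstrap_means(sample, n_reps=2000):
    """Return the means of n_reps resamples, each the size of `sample`."""
    n = len(sample)
    return np.array([
        rng.choice(sample, size=n, replace=True).mean()
        for _ in range(n_reps)
    ])

w_reps = bootstrap_means(w_calls)
b_reps = bootstrap_means(b_calls)

# Margin of error at the 95% level: 1.96 bootstrap standard errors.
me_w = 1.96 * w_reps.std()
me_b = 1.96 * b_reps.std()
print(me_w, me_b)
```

Under these assumptions the two margins should land close to the frequentist values computed later in the notebook, since both describe the same sample size.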

In [54]:
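#Calculate Confidence Intervals
def bootstrap_conf_int(data, alpha):
    #Draw one resample with replacement, the same size as the input
    bs_sample = np.random.choice(data, len(data))
    
    loc = np.mean(bs_sample)
    #Note: scale is the standard deviation of the individual 0/1 outcomes,
    #not the standard error of the mean, so the interval describes the spread
    #of single outcomes rather than the uncertainty in the callback rate
    scale = np.std(bs_sample)
    
    #Only scipy.stats is imported above, so use the full module path; the
    #confidence level is passed positionally because its keyword name
    #differs across SciPy versions
    conf_int = scipy.stats.norm.interval(alpha, loc=loc, scale=scale)
    
    return conf_int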


In [56]:
#Calculate 68% intervals using the one-draw bootstrap function above
w_bs_conf_int = bootstrap_conf_int(w_samp, alpha=.68)
b_bs_conf_int = bootstrap_conf_int(b_samp, alpha=.68)

print(w_bs_conf_int)
print(b_bs_conf_int)

(-0.19462184588081988, 0.3747450558341281)
(-0.18126499853209846, 0.31424240968303085)
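
The intervals above extend below zero because their width comes from the standard deviation of the individual 0/1 outcomes rather than from the variability of the call-back rate itself. A percentile bootstrap interval for each rate could be sketched as follows. This is an illustration under assumptions of mine: the helper name `percentile_ci`, the seed, the 95% level, and the 10,000 repetitions are not from the original analysis, and the counts are the ones printed earlier.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

def percentile_ci(successes, n, confidence=0.95, n_reps=10_000):
    """Percentile bootstrap CI for a proportion observed as successes / n."""
    # Resampling n binary outcomes with replacement is the same as one
    # binomial draw with the observed rate, so each replicate is drawn directly.
    reps = rng.binomial(n, successes / n, size=n_reps) / n
    tail = (1 - confidence) / 2 * 100
    lower, upper = np.percentile(reps, [tail, 100 - tail])
    return lower, upper

w_ci = percentile_ci(235, 2435)  # white-sounding names
b_ci = percentile_ci(157, 2435)  # black-sounding names
print(w_ci)
print(b_ci)
```

Because the replicates are proportions, both bounds stay between 0 and 1 by construction.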


In [66]:
#Calculate p-value comparing the two groups
bs_test_stat = scipy.stats.ttest_ind(w_samp, b_samp)
bs_test_stat

Ttest_indResult(statistic=11.881743781552869, pvalue=1.6333145992643703e-32)

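The p-value is very small, so I can reject the null hypothesis that there is no difference between the call-back rates of the two groups. Race does appear to be a factor based on these methods. Because each resample is ten times the size of the original group, the test statistic here is larger, and the p-value smaller, than a test at the actual sample size would give.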
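
A resampling test that works at the actual sample size is a permutation test: pool all callbacks, shuffle the race labels, and count how often the shuffled difference is at least as large as the observed one. The sketch below is a different technique from the t-test above, shown only for comparison; the arrays are rebuilt from the counts printed earlier, and the seed and 10,000 shuffles are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, only for reproducibility

# Rebuild the 0/1 callback arrays from the counts printed earlier.
w_calls = np.concatenate([np.ones(235), np.zeros(2435 - 235)])
b_calls = np.concatenate([np.ones(157), np.zeros(2435 - 157)])

observed_diff = w_calls.mean() - b_calls.mean()
pooled = np.concatenate([w_calls, b_calls])
n_w = len(w_calls)

n_reps = 10_000
perm_diffs = np.empty(n_reps)
for i in range(n_reps):
    # Shuffling the pooled outcomes simulates the null hypothesis that
    # the race label has no effect on callbacks.
    shuffled = rng.permutation(pooled)
    perm_diffs[i] = shuffled[:n_w].mean() - shuffled[n_w:].mean()

# Two-sided p-value: share of shuffles at least as extreme as observed.
p_value = np.mean(np.abs(perm_diffs) >= abs(observed_diff))
print(observed_diff, p_value)
```

With only 10,000 shuffles the smallest non-zero p-value this can report is 0.0001, so a result of 0 should be read as "below 0.0001" rather than as exactly zero.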

### Frequentist Methods

In [36]:
#Calculate mean white callback rate (w already holds the 'call' column)
w_mean = np.mean(w)
#Calculate white callback standard deviation
w_std = np.std(w)

print(w_mean)
print(w_std)

0.09650924056768417
0.29528486728668213


In [37]:
#Calculate mean black callback rate (b already holds the 'call' column)
b_mean = np.mean(b)
#Calculate black callback standard deviation
b_std = np.std(b)

print(b_mean)
print(b_std)

0.0644763857126236
0.24559901654720306


In [67]:
#Calculate margin of error at the 95% confidence level with frequentist methods
w_me = 1.96*(w_std/np.sqrt(len(w)))
b_me = 1.96*(b_std/np.sqrt(len(b)))

print(w_me)
print(b_me)

0.011728643328433408
0.00975513338480545


The margin of error for the response rate is 0.0117 for white-sounding names and 0.0098 for black-sounding names.

In [68]:
#Calculate 95% confidence intervals
w_upper_bound = w_mean + w_me
w_lower_bound = w_mean - w_me
print("CI for White Names: " + str(w_lower_bound) + " - " + str(w_upper_bound))

CI for White Names: 0.08478059723925077 - 0.10823788389611758


In [69]:
#Calculate 95% confidence intervals
b_upper_bound = b_mean + b_me
b_lower_bound = b_mean - b_me
print("CI for Black Names: " + str(b_lower_bound) + " - " + str(b_upper_bound))

CI for Black Names: 0.054721252327818146 - 0.07423151909742905


Interestingly, the upper bound of the CI for black-sounding names (0.0742) lies below the lower bound of the CI for white-sounding names (0.0848), so the two intervals do not overlap. This is an indication that there is a significant difference between the two call-back rates.
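
A more direct check than comparing two separate intervals is a single confidence interval for the difference in call-back rates; if it excludes zero, the difference is significant at that level. A minimal sketch, assuming the counts printed earlier and the usual unpooled standard error for a difference of two proportions (the variable names are mine):

```python
import math

# Counts printed earlier in this notebook.
n_w, calls_w = 2435, 235   # white-sounding names
n_b, calls_b = 2435, 157   # black-sounding names
p_w, p_b = calls_w / n_w, calls_b / n_b

diff = p_w - p_b
# Unpooled standard error of the difference between two proportions.
se_diff = math.sqrt(p_w * (1 - p_w) / n_w + p_b * (1 - p_b) / n_b)
margin = 1.96 * se_diff          # 95% confidence level
ci = (diff - margin, diff + margin)
print(diff, margin, ci)
```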

In [70]:
#Calculate p-value using t-test
test_stat = scipy.stats.ttest_ind(w, b)
test_stat

Ttest_indResult(statistic=4.114705290861751, pvalue=3.940802103128886e-05)

The p-value for the t-test statistic is 0.0000394. This is very low and indicates that we can reject the null hypothesis, which was that there is no difference between the two groups' call-back rates.
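
Since the outcome is binary, the same comparison can also be written as a two-proportion z-test with a pooled call-back rate under the null hypothesis. The sketch below uses only the counts printed earlier and the standard library; `math.erfc` stands in for the normal tail probability, which is an implementation choice of mine rather than part of the original analysis.

```python
import math

# Counts printed earlier in this notebook.
n_w, calls_w = 2435, 235   # white-sounding names
n_b, calls_b = 2435, 157   # black-sounding names
p_w, p_b = calls_w / n_w, calls_b / n_b

# Under the null hypothesis both groups share one call-back rate.
p_pooled = (calls_w + calls_b) / (n_w + n_b)
se_pooled = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_w + 1 / n_b))

z = (p_w - p_b) / se_pooled
# Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
p_value = math.erfc(abs(z) / math.sqrt(2))
print(z, p_value)
```

With groups this large, the z statistic should come out close to the t statistic reported above.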

<div class="span5 alert alert-success">
<p> Your answers to Q4 and Q5 here </p>
</div>

## Question 4 - Story describing statistical significance in the context of the original problem

The original problem statement asked whether race has a significant impact on the rate of callbacks for resumes. After comparing the rates in the two groups, I can say that there is a statistically significant difference between the call-back rates for black-sounding and white-sounding names. If the true call-back rates of the two groups were identical, there would be only a 0.0000394 probability of observing a difference at least as large as the one in this sample, which is a very small probability.

## Question 5 - Does this mean race is the most important factor? Why or why not? If not, how would you amend your analysis?

No, this does not indicate that race is the most important factor, only that race is a statistically significant factor. We have tested only one factor and have not analyzed any of the others, and a low p-value alone cannot tell how important a result is.

I would improve this analysis by fitting a logistic regression that includes the other resume characteristics and checking the coefficients to see which variable has the greatest impact on callbacks.
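
To illustrate what such a regression involves, here is a minimal sketch that fits a logistic regression by Newton-Raphson using only NumPy. Everything in it is an assumption for illustration: the data are synthetic (not the study data), the predictors `is_white` and `years_exp` and their effect sizes in `true_beta` are made up, and `fit_logistic` is a hypothetical helper rather than a library function.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed, only for reproducibility

# Synthetic stand-in data (NOT the study data), with made-up effects.
n = 20_000
is_white = rng.integers(0, 2, size=n)      # 0/1 indicator
years_exp = rng.integers(1, 21, size=n)    # 1 to 20 years
true_beta = np.array([-2.8, 0.4, 0.03])    # intercept, is_white, years_exp

X = np.column_stack([np.ones(n), is_white, years_exp])
prob = 1 / (1 + np.exp(-X @ true_beta))
call = rng.binomial(1, prob)               # simulated 0/1 callbacks

def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression coefficients by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

beta_hat = fit_logistic(X, call)
print(dict(zip(["intercept", "is_white", "years_exp"], beta_hat.round(3))))
```

Coefficients are on the log-odds scale and depend on each predictor's units, so comparing their sizes is only meaningful after putting the predictors on comparable scales, for example by standardizing the numeric ones.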