# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating whether the resume carried a black-sounding or a white-sounding name. The 'call' column has two values, 1 and 0, indicating whether or not the resume received a callback from an employer.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution

In [62]:
import pandas as pd
import numpy as np
from scipy import stats
from statsmodels.stats.proportion import proportions_ztest

In [63]:
data = pd.io.stata.read_stata('data/us_job_market_discrimination.dta')

In [64]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

In [65]:
data.head()

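| | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |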


<div class="span5 alert alert-success">
<p>Your answers to Q1 and Q2 here</p>
</div>

Q1. What test is appropriate for this problem? Does CLT apply?

A two-sample z-test for a difference in proportions is most appropriate. The Central Limit Theorem (CLT) applies here: although an individual callback is a dichotomous (Bernoulli) variable and so is not normally distributed, the sampling distribution of the difference in sample proportions is approximately normal when the samples are random and sufficiently large.
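
One common rule of thumb for "sufficiently large" is the success-failure condition: each group should contain at least about 10 callbacks and 10 non-callbacks. A minimal sketch of that check follows. The function name `clt_conditions_met`, the threshold of 10, and the example counts are illustrative assumptions, not values taken from the dataset.

```python
def clt_conditions_met(successes, n, threshold=10):
    """Return True if both successes and failures reach the threshold.

    The threshold of 10 is a common rule of thumb, not a hard requirement.
    """
    failures = n - successes
    return successes >= threshold and failures >= threshold


# Hypothetical counts for illustration only (not read from the dataset)
example_groups = {"group_a": (40, 500), "group_b": (3, 500)}

for name, (successes, n) in example_groups.items():
    print(name, clt_conditions_met(successes, n))
```

In the notebook, the same check could be run on the callback counts and resume totals for each race group.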

Q2. What are the null and alternate hypotheses?

Ho: The difference in the proportion of callbacks between resumes with black-sounding and white-sounding names equals zero.

Ha: The difference in the proportion of callbacks between resumes with black-sounding and white-sounding names does not equal zero.
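
Written in symbols, the hypotheses and the usual pooled two-proportion test statistic look roughly as follows. The notation is introduced here for convenience and is not part of the original exercise: $p_w$ and $p_b$ are the true callback proportions for white-sounding and black-sounding names, $x$ is the number of callbacks, and $n$ is the number of resumes in each group.

```latex
H_0: p_w - p_b = 0 \qquad H_a: p_w - p_b \neq 0

z = \frac{\hat{p}_w - \hat{p}_b}
         {\sqrt{\hat{p}\,(1-\hat{p})\left(\frac{1}{n_w} + \frac{1}{n_b}\right)}},
\qquad
\hat{p} = \frac{x_w + x_b}{n_w + n_b},
\quad \hat{p}_w = \frac{x_w}{n_w},
\quad \hat{p}_b = \frac{x_b}{n_b}
```

The pooled proportion $\hat{p}$ is used in the denominator because, under the null hypothesis, both groups share a single callback rate.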

In [66]:
w = data[data.race=='w']
b = data[data.race=='b']

In [67]:
# Your solution to Q3 here
#Q3. Compute margin of error, confidence interval, and p-value. 
#Try using both the bootstrapping and the frequentist statistical approaches.

In [68]:
#Frequentist approach
#Get number of callbacks
wcallback=np.sum(w['call'])
bcallback=np.sum(b['call'])

#Get number of resumes
wtotal=len(w['call'])
btotal=len(b['call'])

#Create input for statsmodels' proportions_ztest
count = [wcallback, bcallback]
nobs = [wtotal, btotal]

#Generate and print z and p values

z,p=proportions_ztest(count,nobs, value=None, alternative='two-sided')
print(z,p)
#Reject Ho

#Define functions to compute the confidence interval
def compute_standard_error_prop_two_samples(x1, n1, x2, n2):
    #Unpooled standard error of the difference between two sample proportions
    p1 = x1/n1
    p2 = x2/n2
    variance = p1*(1-p1)/n1 + p2*(1-p2)/n2
    return np.sqrt(variance)

def zconf_interval_two_samples(x1, n1, x2, n2, alpha=0.05):
    #Confidence interval for p2 - p1 (group 2 minus group 1)
    p1 = x1/n1
    p2 = x2/n2
    se = compute_standard_error_prop_two_samples(x1, n1, x2, n2)
    z_critical = stats.norm.ppf(1-0.5*alpha)
    return p2-p1-z_critical*se, p2-p1+z_critical*se

#Compute and print confidence interval
ci = zconf_interval_two_samples(wcallback, wtotal, bcallback, btotal)

#Compute and print margin of error
moe = (ci[1]-ci[0])/2
print(ci, moe)


4.108412152434346 3.983886837585077e-05
(-0.04728798023766041, -0.016777728181230755) 0.015255126028214829
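
For reference, the interval computed by `zconf_interval_two_samples` corresponds to the following formula. The symbols are assumed notation rather than something defined in the exercise: $\hat{p}_w$ and $\hat{p}_b$ are the sample callback proportions, $n_w$ and $n_b$ the group sizes, and the difference is taken as black minus white to match the argument order used in the call.

```latex
(\hat{p}_b - \hat{p}_w) \;\pm\; z_{1-\alpha/2}
\sqrt{\frac{\hat{p}_w(1-\hat{p}_w)}{n_w} + \frac{\hat{p}_b(1-\hat{p}_b)}{n_b}}
```

The term after the $\pm$ sign is the margin of error, which is why the code recovers it as half the width of the interval. Unlike the test statistic, the interval uses the unpooled standard error, since it does not assume the two proportions are equal.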


In [84]:
#Estimate the p-value for the observed difference using a permutation test

permutation_replicates = np.empty(10000)

white = w["call"]
black = b["call"]
delta_mean = np.mean(white) - np.mean(black)

#Seed once, outside the loop, so that each iteration draws a different permutation
np.random.seed(12345)

for i in range(len(permutation_replicates)):
    permutation_samples = np.random.permutation(np.concatenate((white, black)))

    #Split the shuffled values into two groups of the original sizes
    white_perm = permutation_samples[:len(white)]
    black_perm = permutation_samples[len(white):]

    permutation_replicates[i] = np.abs(np.mean(white_perm) - np.mean(black_perm))

#Two-sided p-value: share of permuted differences at least as large as the observed one
p = np.sum(permutation_replicates >= np.abs(delta_mean)) / len(permutation_replicates)
print('p =', p)


p = 0.0
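
The exercise also asks for a bootstrap approach, which yields a confidence interval and margin of error rather than only a p-value. A minimal sketch of a percentile bootstrap interval for the difference in callback rates follows. The helper name `bootstrap_diff_ci`, the number of replicates, the seeds, and the synthetic 0/1 arrays are all assumptions for illustration; in the notebook, `w['call']` and `b['call']` would be passed instead.

```python
import numpy as np


def bootstrap_diff_ci(group1, group2, n_reps=10000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(group2) - mean(group1)."""
    rng = np.random.default_rng(seed)
    group1 = np.asarray(group1, dtype=float)
    group2 = np.asarray(group2, dtype=float)
    diffs = np.empty(n_reps)
    for i in range(n_reps):
        # Resample each group with replacement, keeping group sizes fixed
        sample1 = rng.choice(group1, size=len(group1), replace=True)
        sample2 = rng.choice(group2, size=len(group2), replace=True)
        diffs[i] = sample2.mean() - sample1.mean()
    lower, upper = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper


# Synthetic 0/1 outcomes for illustration only (not the study data)
rng = np.random.default_rng(1)
group_a = rng.binomial(1, 0.10, size=2000)
group_b = rng.binomial(1, 0.06, size=2000)

lower, upper = bootstrap_diff_ci(group_a, group_b, n_reps=2000)
margin_of_error = (upper - lower) / 2
print((lower, upper), margin_of_error)
```

Resampling within each group, rather than pooling the groups as the permutation test does, estimates the sampling variability of the observed difference without assuming the null hypothesis is true.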


<div class="span5 alert alert-success">
<p> Your answers to Q4 and Q5 here </p>
</div>

Q4. Write a story describing the statistical significance in the context of the original problem.

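The p-values from both the frequentist and permutation methods are below .05, so we reject the null hypothesis that the callback rates are equal. The 95% confidence interval for the difference in callback rates (black-sounding minus white-sounding) runs from about -4.7 to -1.7 percentage points and excludes zero. In other words, resumes with black-sounding names received significantly fewer callbacks than those with white-sounding names. Because the names were assigned to the resumes at random, this gap can be attributed to the name itself rather than to other resume characteristics.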

Q5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

The analysis does not allow us to compare the importance of different factors in callback success, because it only accounts for the impact of a person's perceived race (based on their name). There are several ways we could examine the relationship between other factors and callback rates and compare them to the impact of race. For instance, we could fit a logistic regression model that includes other variables from the dataset as predictors. The results would show the relationship between each variable and the probability of receiving a callback while holding the other predictors constant. These estimates could then be compared to the estimate for race using tests for differences in coefficients.
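
A minimal sketch of such a logistic regression is shown below, using the formula interface in `statsmodels`. It runs on synthetic data so that it is self-contained: the column names `race`, `call`, `yearsexp`, and `honors` mirror the dataset, but the choice of predictors, the sample size, and the coefficients used to generate the data are hypothetical and say nothing about the real study.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data for illustration only (not the study data)
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "race": rng.choice(["b", "w"], size=n),
    "yearsexp": rng.integers(1, 20, size=n),
    "honors": rng.integers(0, 2, size=n),
})

# Hypothetical data-generating process for the callback indicator
log_odds = (-2.5 + 0.4 * (df["race"] == "w")
            + 0.03 * df["yearsexp"] + 0.5 * df["honors"])
df["call"] = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

# Callback modeled on race plus other resume characteristics
model = smf.logit("call ~ C(race) + yearsexp + honors", data=df).fit(disp=False)

print(model.params)   # log-odds coefficients
print(model.pvalues)  # p-values for each coefficient
```

Coefficients for predictors measured on different scales (for example, years of experience versus a 0/1 indicator) are not directly comparable, so any ranking of "importance" would need standardized predictors or a comparison of predicted probabilities.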