# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating a black-sounding or a white-sounding name. The column 'call' has two values, 1 and 0, indicating whether or not the resume received a callback from an employer.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your GitHub account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
****

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from statsmodels.stats import proportion as prop

def ecdf(array):
    """Return x (sorted values) and y (cumulative fraction) for an empirical CDF."""
    x = np.sort(array)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y
```

```python
data = pd.read_stata('data/us_job_market_discrimination.dta')
```

```python
# number of callbacks for white-sounding names
data.loc[data.race == 'w', 'call'].sum()
```

```
235.0
```

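```python
data.head()
```

| | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |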


### 1) What test is appropriate for this problem? Does CLT apply?

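We will conduct a two-sided binomial test: we treat each resume with a black-sounding name as a trial and ask whether the observed number of callbacks is consistent with the callback rate measured for white-sounding names. The central limit theorem (CLT) applies here because each resume's outcome (callback or no callback) is a Bernoulli trial, and the callback rate is the mean of those trials; for a large number of independent trials, that mean is approximately normally distributed. With 2,435 resumes per group and 157 and 235 callbacks respectively, both successes and failures are plentiful in each group, so the normal approximation is reasonable, provided we accept the assumption that outcomes are independent across resumes. The normal approximation justifies the confidence intervals computed below; the binomial test itself is exact and does not depend on it. The law of large numbers, the CLT's simpler antecedent, guarantees that the sample mean converges to the true callback probability as the number of Bernoulli trials increases.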
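
The normal approximation to the binomial is usually considered adequate when each group has at least about 10 successes and 10 failures (a common rule of thumb; some texts use 5). The sketch below is a quick check of that condition. It hard-codes the counts printed later in this notebook, and the helper name `normal_approx_ok` and the default threshold are illustrative assumptions, not part of any library.

```python
# Counts as reported in this notebook's output for each group.
groups = {
    "black-sounding": {"n": 2435, "calls": 157},
    "white-sounding": {"n": 2435, "calls": 235},
}

def normal_approx_ok(n, successes, threshold=10):
    """Rule-of-thumb check (hypothetical helper): both the number of
    successes and the number of failures should reach `threshold`."""
    failures = n - successes
    return successes >= threshold and failures >= threshold

for name, g in groups.items():
    failures = g["n"] - g["calls"]
    print(name, "successes:", g["calls"], "failures:", failures,
          "normal approx OK:", normal_approx_ok(g["n"], g["calls"]))
```

Both groups pass by a wide margin, which is why the CLT-based intervals are a reasonable choice here.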

### 2) What are the null and alternative hypotheses?

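Our null hypothesis is that the true callback rate (the proportion of applications that receive a call) for black applicants equals the callback rate measured for white applicants. The alternative hypothesis is that the two rates differ, in either direction (a two-sided alternative).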
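
In symbols, let $p_b$ be the true callback probability for resumes with black-sounding names and $\hat{p}_w$ the observed callback rate for white-sounding names, which the binomial test in this notebook treats as a fixed constant:

$$H_0: p_b = \hat{p}_w \qquad H_A: p_b \neq \hat{p}_w, \qquad \hat{p}_w = \frac{235}{2435} \approx 0.0965$$

A two-sample formulation would instead compare two unknown rates, $H_0: p_b = p_w$ against $H_A: p_b \neq p_w$, so that sampling error in the white rate is also taken into account.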

### 3) Compute margin of error, confidence interval, and p-value.

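The code below computes a 95% confidence interval and margin of error for each group's callback rate, along with a p-value from the binomial test.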

```python
# Compute the number of trials, number of successes, and proportion of successes for white and black applicants.
b_sample_size = len(data[data['race'] == 'b'])
w_sample_size = len(data[data['race'] == 'w'])

b_calls = data.loc[data.race == 'b', 'call'].sum()
w_calls = data.loc[data.race == 'w', 'call'].sum()

b_call_frac = b_calls / b_sample_size
w_call_frac = w_calls / w_sample_size

# Compute 95% confidence intervals (normal approximation) for the measured proportions of black and white calls per application.
b_ci = prop.proportion_confint(b_calls, b_sample_size, alpha=0.05, method='normal')
w_ci = prop.proportion_confint(w_calls, w_sample_size, alpha=0.05, method='normal')

# The normal interval is symmetric, so the margin of error is the distance from the estimate to either bound.
b_moe = b_call_frac - b_ci[0]
w_moe = w_call_frac - w_ci[0]

# Conduct the two-sided binomial test described above and get a p-value.
# stats.binom_test was deprecated in SciPy 1.10 and removed in later releases; binomtest replaces it
# and requires an integer count of successes.
p = stats.binomtest(int(b_calls), n=b_sample_size, p=w_call_frac, alternative='two-sided').pvalue

# Report the results.
print("Sample size for black applicants: " + str(b_sample_size))
print("Calls for black applicants: " + str(b_calls))
print("Calls per application for black applicants: " + str(b_call_frac))
print("95% confidence interval for black calls per application: " + str(b_ci))
print("Margin of error for black calls per application: " + str(b_moe))
print('\n')
print("Sample size for white applicants: " + str(w_sample_size))
print("Calls for white applicants: " + str(w_calls))
print("Calls per application for white applicants: " + str(w_call_frac))
print("95% confidence interval for white calls per application: " + str(w_ci))
print("Margin of error for white calls per application: " + str(w_moe))

print('\n')
print("We reject the null hypothesis at a p value of: " + str(p))
```

```
Sample size for black applicants: 2435
Calls for black applicants: 157.0
Calls per application for black applicants: 0.064476386037
95% confidence interval for black calls per application: (0.054721407262367537, 0.074231364811554429)
Margin of error for black calls per application: 0.00975497877459


Sample size for white applicants: 2435
Calls for white applicants: 235.0
Calls per application for white applicants: 0.0965092402464
95% confidence interval for white calls per application: (0.084780674296387401, 0.10823780619642574)
Margin of error for white calls per application: 0.01172856595


We reject the null hypothesis at a p value of: 2.04875126546e-08
```
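
The binomial test above treats the observed white callback rate as if it were the true rate, which ignores that rate's own sampling error. A common alternative for comparing two independent proportions is the two-proportion z-test, together with a normal-approximation confidence interval for the difference. The sketch below swaps in that technique using only the Python standard library; the function name is hypothetical, it assumes the counts printed above (235 and 157 callbacks out of 2,435 resumes each), and it assumes outcomes are independent across resumes. statsmodels also provides `proportions_ztest` in `statsmodels.stats.proportion` for the test statistic.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(x1, n1, x2, n2, alpha=0.05):
    """Two-sided z-test of p1 == p2, plus a normal-approximation CI for p1 - p2.

    Uses the pooled proportion for the test statistic (its value under H0)
    and the unpooled standard error for the confidence interval, a common convention.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2

    # Pooled proportion under H0: both groups share one callback rate.
    p_pool = (x1 + x2) / (n1 + n2)
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the interval on the difference.
    se_diff = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_crit * se_diff
    return {"diff": diff, "z": z, "p_value": p_value,
            "margin_of_error": margin, "ci": (diff - margin, diff + margin)}

# White minus black callback rate, using the counts printed above.
result = two_proportion_ztest(235, 2435, 157, 2435)
print(result)
```

With these counts the sketch gives z ≈ 4.11, a two-sided p-value of roughly 4 × 10⁻⁵, and a 95% interval for the white-minus-black gap of about 1.7 to 4.7 percentage points. The p-value is larger than the binomial test's because the white rate is no longer treated as known, but it is still far below 0.05.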


### 4) Write a story describing the statistical significance in the context of the original problem.

On average, resumes with black-sounding names were less likely to receive a callback for an interview than otherwise identical resumes with white-sounding names: about 6.4% versus 9.7% in this sample. A p-value of roughly 2 × 10⁻⁸ means a gap at least this large would be very unlikely if black-sounding names had the same callback rate as the one observed for white-sounding names, so we can state this with great confidence. Of course, we're left with many follow-up questions that could refine the specificity of our claim. Is the problem localized to particular industries, regions, or employers? We can expect all of these variables to have had some impact on the average behavior.

Our initial study has provided a clear answer to the question "Is there racial bias in the hiring process?". If our goal is to understand and eliminate this bias, then our natural next step might be to begin dissecting those variables which contribute to the average behavior we've successfully measured.
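
As a first step toward that dissection, one could compare callback rates by race within subgroups, for example the industry indicator columns visible in `data.head()` above, such as `manuf`. The sketch below is a minimal pandas version; the function name and the `gap_w_minus_b` column are hypothetical, and it assumes the notebook's `race` ('b'/'w') and `call` (0/1) columns. The toy frame is made-up data used only to show the shape of the output.

```python
import pandas as pd

def callback_rates_by(df, by):
    """Callback rate for each race within each level of `by` (hypothetical helper).

    Assumes 'race' holds 'b'/'w' and 'call' holds 0/1, as in this notebook's data.
    """
    rates = df.groupby([by, "race"])["call"].mean().unstack("race")
    # Positive values mean white-sounding names were called back more often.
    rates["gap_w_minus_b"] = rates["w"] - rates["b"]
    return rates

# Made-up toy data, only to illustrate the output; not from the study.
toy = pd.DataFrame({
    "manuf": [0, 0, 0, 0, 1, 1, 1, 1],
    "race": ["b", "w", "b", "w", "b", "w", "b", "w"],
    "call": [0, 1, 0, 0, 0, 1, 0, 1],
})
print(callback_rates_by(toy, "manuf"))
```

On the real data this would be called as `callback_rates_by(data, "manuf")`. Subgroups with few resumes give noisy rates, so any apparent differences between subgroups would need their own significance tests.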

### 5) Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

No. Our analysis sought to establish that race is an impactful factor in callback success; because names were randomly assigned to otherwise identical resumes, other factors are controlled for by design. It says nothing about how important race is relative to those other factors. To compare the impact of racial bias with other variables, such as years of experience, we could run similarly controlled tests on those individual variables. Our current data might also yield useful findings if we model the binary callback outcome on several variables at once, using a technique like logistic regression. We would then need to account carefully for correlation among the measured variables and with lurking variables. Principal component analysis might offer one way forward: decomposing correlated success factors into an orthonormal basis against which the contribution of race could be compared.
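
A minimal sketch of the logistic regression mentioned above, using the statsmodels formula API (statsmodels is already a dependency of this notebook). The function name and the default covariate list are hypothetical choices; the covariates are columns visible in `data.head()` and are assumed to be numeric. With the levels 'b' and 'w', the race term is reported as `C(race)[T.w]`, the log-odds difference for white-sounding names relative to black-sounding names. Coefficients on covariates that were not randomly assigned should be read as associations rather than causal effects.

```python
import statsmodels.formula.api as smf

def fit_callback_logit(df, covariates=("yearsexp", "education", "honors")):
    """Logistic regression of callback on race plus other resume features.

    Hypothetical helper: assumes 'call' is 0/1, 'race' is 'b'/'w', and the
    listed covariates are numeric columns of `df`.
    """
    formula = "call ~ C(race) + " + " + ".join(covariates)
    # disp=0 suppresses the optimizer's convergence messages.
    return smf.logit(formula, data=df).fit(disp=0)
```

On this notebook's data, `fit_callback_logit(data).summary()` would print the coefficient table, with standard errors and p-values for each term.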

