# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your GitHub account**.

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution

In [9]:
import pandas as pd
import numpy as np
from scipy import stats

In [10]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [11]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

In [12]:
# number of callbacks for black-sounding names
sum(data[data.race=='b'].call)

157.0

In [13]:
b = data[data.race=='b'].call
w = data[data.race=='w'].call

<div class="span5 alert alert-success">
<p> This analysis uses a **two-proportion z-test** to compare the callback rates of applicants with white-sounding and black-sounding names. The CLT applies here: names were assigned to resumes at random, so the two samples are independent, and each group contains well over 10 callbacks and 10 non-callbacks, so the sampling distribution of each sample proportion is approximately normal. Let p1 be the proportion of resumes with black-sounding names that received a callback (call = 1) and p2 the proportion of resumes with white-sounding names that received a callback. <br><br>

**Null hypothesis:** p1=p2 <br> <br>
**Alternative Hypothesis:** p1 != p2 </p>
</div>

The entire scope of a two-proportion z-test procedure can be found at https://stattrek.com/hypothesis-test/difference-in-proportions.aspx
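The normal approximation behind the z-test is usually checked with the success/failure rule of thumb: each group should contain at least about 10 callbacks and 10 non-callbacks. The sketch below is illustrative only. The helper name `clt_condition_met` and the threshold of 10 are assumptions, and the group size of 2,435 is inferred from the callback counts and proportions reported in this notebook rather than printed by it.

```python
def clt_condition_met(successes, n, threshold=10):
    """Return True if both successes and failures reach the rule-of-thumb threshold."""
    failures = n - successes
    return successes >= threshold and failures >= threshold

# Callback counts come from the notebook; the group size is an inferred assumption.
n_per_group = 2435
callbacks = {"b": 157, "w": 235}

conditions = {race: clt_condition_met(k, n_per_group) for race, k in callbacks.items()}
print(conditions)
```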

## Frequentist Approach

In [14]:
n2= w.count()
p2= w.sum()/n2

In [15]:
n1=b.count()
p1= b.sum()/n1

In [16]:
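# pooled callback proportion under the null hypothesis p1 = p2
p= (p1*n1+p2*n2)/(n1+n2)
# standard error of the difference in proportions under the null
SE= np.sqrt(p*(1-p)*((1/n1)+(1/n2)))
z= (p1-p2)/SE
# one-tailed p-value from the standard normal, doubled for the two-tailed test
p_value= stats.norm.sf(abs(z))
p_value_2tail= p_value*2
print("P-value: ", p_value_2tail)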

P-value:  3.983886837585077e-05


In [17]:
#compute margin of error for both sets of data 
SE_b= np.sqrt((p1*(1-p1))/n1)
SE_w= np.sqrt((p2*(1-p2))/n2)
MoE_b=1.96*SE_b
MoE_w= 1.96*SE_w
print("Margin of Error (at 95% confidence) for callbacks of black-sounding named applicants:", MoE_b)
print("95% Confidence Interval (black-sounding names):", np.array([p1-MoE_b,p1+MoE_b]))
print("Margin of Error (at 95% confidence) for callbacks of white-sounding named applicants:", MoE_w)
print("95% Confidence Interval (white-sounding names):", np.array([p2-MoE_w,p2+MoE_w]))

Margin of Error (at 95% confidence) for callbacks of black-sounding named applicants: 0.009755158027911414
95% Confidence Interval (black-sounding names): [0.05472123 0.07423154]
Margin of Error (at 95% confidence) for callbacks of white-sounding named applicants: 0.011728781469131009
95% Confidence Interval (white-sounding names): [0.08478046 0.10823802]


<div class="span5 alert alert-success">
<h4> Frequentist Test summary of results </h4>
<p> 
p-value: 3.984 x 10^-5 <br>
Margin of Error (at 95% confidence) for callbacks of black-sounding named applicants: 0.00976 <br> 
95% Confidence Interval (black-sounding names): 0.0547 - 0.0742 <br>
Margin of Error (at 95% confidence) for callbacks of white-sounding named applicants: 0.01173 <br> 
95% Confidence Interval (white-sounding names): 0.0848 - 0.108 <br>
</p>
</div>
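The intervals above describe each group separately. The exercise can also be read as asking for a margin of error and confidence interval for the difference p2 - p1 itself. The following is a minimal sketch of that calculation, not part of the original analysis: the function name `diff_proportion_ci` is hypothetical, the group size of 2,435 is inferred from the reported proportions, and the unpooled standard error is used because the interval does not assume the null hypothesis.

```python
import math

def diff_proportion_ci(k1, n1, k2, n2, z_crit=1.96):
    """Return (difference, margin of error, (low, high)) for p2 - p1."""
    p1, p2 = k1 / n1, k2 / n2
    # Unpooled standard error: the interval does not assume p1 == p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    moe = z_crit * se
    diff = p2 - p1
    return diff, moe, (diff - moe, diff + moe)

# 157 and 235 callbacks are from the notebook; 2435 per group is an inferred assumption
diff_hat, moe_diff, ci_diff = diff_proportion_ci(157, 2435, 235, 2435)
print("Difference in callback rates (w - b):", diff_hat)
print("Margin of error (95%):", moe_diff)
print("95% confidence interval:", ci_diff)
```

If the resulting interval excludes zero, it agrees with the z-test's rejection of the null hypothesis at the 5% level.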

## Bootstrap Approach

First, we construct functions that will bootstrap our data and return a 95% confidence interval.

In [18]:
def bootstrap_replicate_1d(data, func):
    """Resample data with replacement and apply func to the resample."""
    return func(np.random.choice(data, size=len(data)))

def draw_bs_reps(data, func, size=1):
    """Draw bootstrap replicates."""

    # Initialize array of replicates: bs_replicates
    bs_replicates = np.empty(size)

    # Generate replicates
    for i in range(size):
        bs_replicates[i] = bootstrap_replicate_1d(data,func)

    return bs_replicates

In [19]:
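def ci_95(data):
    # NOTE: this first draws 10,000 observations (with replacement) from data,
    # so the interval reflects a sample of 10,000, not the original group size.
    ran= np.random.choice(data,10000)
    bs= draw_bs_reps(ran, np.sum, 10000)
    # len(bs) is the number of replicates; it equals len(ran) only because both are 10,000
    bs_prop= bs/len(bs)
    return np.percentile(bs_prop, [2.5,97.5])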
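The `ci_95` helper resamples 10,000 observations before bootstrapping, which makes its interval narrower than one based on the actual group size. As a hedged alternative, the sketch below bootstraps the sample at its own size and takes the mean of each resample directly. The function name `bootstrap_ci_proportion`, the fixed seed, and the synthetic example array (157 callbacks out of an assumed 2,435 resumes) are illustrative assumptions, not the notebook's method.

```python
import numpy as np

def bootstrap_ci_proportion(data, n_reps=10000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a proportion, resampling at the original sample size."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    reps = np.empty(n_reps)
    for i in range(n_reps):
        # The mean of a 0/1 resample is its callback proportion
        reps[i] = rng.choice(data, size=n, replace=True).mean()
    return np.percentile(reps, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Synthetic stand-in for the black-sounding-name group (assumed size 2,435)
example = np.array([1] * 157 + [0] * (2435 - 157))
ci_low, ci_high = bootstrap_ci_proportion(example)
print("95% bootstrap CI:", ci_low, ci_high)
```

An interval built this way should be close in width to the frequentist interval for the same group, since both reflect the same sample size.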
    

Next, we run a two-sample bootstrap hypothesis test for the difference of proportions. We first calculate the empirical difference of proportions between the samples to use as a reference point in the bootstrap analysis.

Under the null hypothesis, we assume that **p1=p2**, so we shift our two samples (applicants with black-sounding and white-sounding names) to have the same proportion of callbacks. We then bootstrap the shifted samples, calculate the difference between their proportions for each pair of replicates, and compare each difference to the empirical difference (previously calculated). The proportion of these differences that are more extreme than the empirical difference gives us our p-value.

In [20]:
diff=p2-p1

In [21]:
avg= (p1+p2)/2

In [22]:
b_shift= int(round(avg*n1))
b_shifted= np.array([1]*b_shift+[0]*(n1-b_shift))
w_shift= int(round(avg*n2))
w_shifted= np.array([1]*w_shift+[0]*(n2-w_shift))

In [23]:
bs_b= draw_bs_reps(b_shifted, np.sum,10000)
bs_w= draw_bs_reps(w_shifted, np.sum,10000)

In [24]:
# convert bootstrap callback counts to proportions by dividing by each group's sample size
bs= (bs_b/len(b_shifted))-(bs_w/len(w_shifted))
bs_p_value1= np.sum(bs > diff)/len(bs)
print("p-value (diff > .032): ", bs_p_value1)
bs_p_value2= np.sum(bs < (-diff))/len(bs)
print("p-value (diff < -.032): ", bs_p_value2)

p-value (diff > .032):  0.0
p-value (diff < -.032):  0.0
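A bootstrap p-value of exactly 0.0 only says that none of the 10,000 replicates was as extreme as the observed difference; the true p-value is small but not zero. Because the names were assigned to resumes at random, a permutation test is a natural cross-check. The sketch below is a swapped-in technique rather than the method used above: the function name `permutation_p_value`, the fixed seed, and the synthetic arrays (built from the reported callback counts and an assumed group size of 2,435) are all assumptions.

```python
import numpy as np

def permutation_p_value(x, y, n_perm=10000, seed=0):
    """Two-sided permutation p-value for a difference in means (proportions for 0/1 data)."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x), np.asarray(y)
    observed = abs(y.mean() - x.mean())
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        # Shuffle the pooled outcomes and re-split them into two groups
        perm = rng.permutation(pooled)
        diff = abs(perm[len(x):].mean() - perm[:len(x)].mean())
        if diff >= observed:
            count += 1
    # Add-one correction keeps the estimate strictly above zero
    return (count + 1) / (n_perm + 1)

# Synthetic 0/1 arrays rebuilt from the reported callback counts
b_example = np.array([1] * 157 + [0] * (2435 - 157))
w_example = np.array([1] * 235 + [0] * (2435 - 235))
perm_p = permutation_p_value(b_example, w_example)
print("Permutation p-value:", perm_p)
```

The add-one correction reports the smallest p-value the simulation can resolve instead of zero, which is easier to compare with the frequentist result.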


In [25]:
print("95% Confidence Interval (black-sounding names): ", ci_95(b))
print("95% Confidence Interval (white-sounding names): ", ci_95(w))

95% Confidence Interval (black-sounding names):  [0.0524 0.0616]
95% Confidence Interval (white-sounding names):  [0.0857975 0.097    ]


<div class="span5 alert alert-success">
<h4> Bootstrap Approach summary of results </h4>
<p> 
p-value: 0.0 (none of the 10,000 bootstrap replicates was as extreme as the observed difference) <br>
95% Confidence Interval (black-sounding names): 0.0524 - 0.0616 <br>
95% Confidence Interval (white-sounding names): 0.0858 - 0.0970 <br>
</p>
</div>

<div class="span5 alert alert-success">
<p> The two statistical tests performed above both yielded extremely low p-values. This leads us to reject the null hypothesis that applicants with black-sounding and white-sounding names receive callbacks at equal rates. However, upwards of 60 variables are recorded for each applicant, so we cannot say for certain that race/name is *the most* important factor in applicant callbacks. To determine this, we would need more in-depth exploratory data analysis. Because the data is categorical, we could use chi-square tests to identify which variables are most strongly associated with callbacks. </p>
</div>
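As a starting point for the chi-square analysis mentioned above, the test can be run on the race-by-callback table itself. For a 2×2 table without continuity correction, the chi-square test of independence is equivalent to the two-sided two-proportion z-test, so it should reproduce the frequentist p-value. This is a minimal sketch: the table is rebuilt from the reported callback counts and an assumed group size of 2,435, and applying the same call to other variables would require their actual column names, which are not shown here.

```python
import numpy as np
from scipy import stats

# Rows: black-sounding, white-sounding names; columns: callback, no callback
# (group size of 2435 is inferred from the reported proportions)
table = np.array([[157, 2435 - 157],
                  [235, 2435 - 235]])

# correction=False disables the Yates continuity correction
chi2, chi2_p, dof, expected = stats.chi2_contingency(table, correction=False)
print("chi-square statistic:", chi2)
print("p-value:", chi2_p)
```

Repeating this for each categorical variable against `call` gives one test per variable, so the resulting p-values would need a multiple-comparison adjustment before ranking variables by strength of association.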