# Examining Racial Discrimination in the US Job Market

Sarah Robinson
9.11.2018

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding names. The column 'call' has two values, 1 and 0, indicating whether or not the resume received a call from an employer.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.
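Since `call` is a 0/1 variable, the callback rate of each group is simply the mean of `call` within that group. A minimal sketch with pandas `groupby`, run on a hypothetical toy DataFrame that reproduces the group sizes and callback counts found later in this notebook (2,435 résumés per group; 157 callbacks for black-sounding and 235 for white-sounding names) instead of on the real `.dta` file:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: same group sizes and callback counts as the real
# dataset (2,435 resumes per group; 157 black / 235 white callbacks),
# but NOT the actual rows of us_job_market_discrimination.dta.
n = 2435
call_b = np.r_[np.ones(157), np.zeros(n - 157)]
call_w = np.r_[np.ones(235), np.zeros(n - 235)]
toy = pd.DataFrame({
    "race": ["b"] * n + ["w"] * n,
    "call": np.concatenate([call_b, call_w]),
})

# Callback rate = mean of the 0/1 'call' column within each race group
summary = toy.groupby("race")["call"].agg(["count", "sum", "mean"])
print(summary)
```

On the real data, the same summary would come from `data.groupby('race')['call'].agg(['count', 'sum', 'mean'])`.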

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution
</div>
****

```python
import pandas as pd
import numpy as np
from scipy import stats
```

```python
# Load the Stata file; each row is one resume
data = pd.read_stata('data/us_job_market_discrimination.dta')
```

```python
# number of callbacks for white-sounding names
data[data.race == 'w'].call.sum()
```

```
235.0
```

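```python
data.head()
```

| | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |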


<div class="span5 alert alert-success">
<p>Your answers to Q1 and Q2 here</p>
</div>

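Q1: Each callback is a 0/1 (Bernoulli) outcome, so the problem is a comparison of two proportions: the callback rate for black-sounding names versus white-sounding names. The central limit theorem applies because both samples are large (2,435 résumés each, with well over 10 callbacks and non-callbacks in each group), so the sampling distribution of each proportion is approximately normal. A two-sample t-test is therefore appropriate: with samples this large, the t and z statistics are practically identical, so it is equivalent to the standard two-sample z-test for a difference in proportions.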
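A common rule of thumb for applying the CLT to a proportion is that both n·p̂ (expected successes) and n·(1 − p̂) (expected failures) be at least 10. A minimal check, assuming the callback counts implied by this notebook's results (235 callbacks for white-sounding names, and 6.45% of 2,435, i.e. 157, for black-sounding names):

```python
# Success-failure check for the normal approximation to each proportion.
# Counts are taken from the results of this notebook (assumed inputs).
n = 2435
callbacks = {"b": 157, "w": 235}

checks = {}
for race, k in callbacks.items():
    p_hat = k / n
    successes, failures = n * p_hat, n * (1 - p_hat)
    checks[race] = successes >= 10 and failures >= 10
    print(f"{race}: n*p = {successes:.0f}, n*(1-p) = {failures:.0f}")

print(checks)  # {'b': True, 'w': True}
```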

Q2: The null hypothesis is that the two groups have the same mean, i.e. the probability of a callback is the same for black-sounding and white-sounding names ($H_0: p_b = p_w$). The alternative hypothesis is that the two probabilities differ ($H_1: p_b \neq p_w$).
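Written out, the two-proportion comparison described in Q1 uses the standard pooled z-statistic below. Here $\hat p_b$ and $\hat p_w$ are the sample callback rates, $n_b$ and $n_w$ the group sizes, and $\hat p$ the pooled callback rate; this notation is introduced for illustration.

$$
\hat p = \frac{n_b\,\hat p_b + n_w\,\hat p_w}{n_b + n_w}, \qquad
z = \frac{\hat p_w - \hat p_b}{\sqrt{\hat p\,(1 - \hat p)\left(\dfrac{1}{n_b} + \dfrac{1}{n_w}\right)}}
$$

Under the null hypothesis of equal callback rates, and with the CLT conditions met, $z$ is approximately standard normal.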

```python
# Your solution to Q3 here

# Compute margin of error, confidence interval, and p-value.
# Try using both the bootstrapping and the frequentist statistical approaches.

# Split the data by race and extract the 0/1 callback arrays
w = data[data.race == 'w']
b = data[data.race == 'b']
w_call = w.call.values
b_call = b.call.values
```

### Q3: Bootstrap approach

```python
# Bootstrapping approach

def draw_bs_reps(data, func, size=1):
    """Draw bootstrap replicates of func(data)."""
    bs_replicates = np.empty(size)

    for i in range(size):
        # Resample len(data) values with replacement, then apply the statistic
        bs_replicates[i] = func(np.random.choice(data, size=len(data)))

    return bs_replicates
```
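The loop above draws one resample per iteration. As an alternative sketch (not the helper used in the rest of this notebook), all bootstrap replicates of the mean can be drawn in a single vectorized call. The helper name `draw_bs_means` is hypothetical, and the sketch assumes NumPy's `Generator` API (`np.random.default_rng`) rather than the legacy `np.random.choice` used above. It trades memory (a `size × n` array) for speed, and it only works for statistics computed row-wise, such as the mean.

```python
import numpy as np

def draw_bs_means(data, size=1, seed=None):
    """Bootstrap replicates of the mean, drawn in one vectorized call (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    # Each row is one resample of len(data) values drawn with replacement
    samples = rng.choice(data, size=(size, len(data)), replace=True)
    return samples.mean(axis=1)

# Usage on a hypothetical toy 0/1 array with 157 callbacks out of 2,435
toy_calls = np.r_[np.ones(157), np.zeros(2435 - 157)]
reps = draw_bs_means(toy_calls, size=2000, seed=42)
print(reps.mean(), 1.96 * reps.std())
```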

```python
# Margin of error, using the fraction of callbacks as the test statistic

# MOE for black names
bs_replicates_b = draw_bs_reps(b_call, np.mean, 10000)
bs_std_b = np.std(bs_replicates_b)
mean_b = np.mean(b_call)
bs_moe_b = 1.96 * bs_std_b  # 1.96 = critical z value for 95% confidence
print(f"The mean fraction of callbacks for black names is {mean_b:.4f}")
print(f"The margin of error for the fraction of callbacks for black names is {bs_moe_b:.4f}")
```

```
The mean fraction of callbacks for black names is 0.0645
The margin of error for the fraction of callbacks for black names is 0.0098
```


```python
# MOE for white names
bs_replicates_w = draw_bs_reps(w_call, np.mean, 10000)
bs_std_w = np.std(bs_replicates_w)
mean_w = np.mean(w_call)
bs_moe_w = 1.96 * bs_std_w  # 1.96 = critical z value for 95% confidence
print(f"The mean fraction of callbacks for white names is {mean_w:.4f}")
print(f"The margin of error for the fraction of callbacks for white names is {bs_moe_w:.4f}")
```

```
The mean fraction of callbacks for white names is 0.0965
The margin of error for the fraction of callbacks for white names is 0.0117
```
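The exercise also asks for a frequentist margin of error. A minimal sketch of the normal-approximation (Wald) formula $1.96\sqrt{\hat p(1-\hat p)/n}$, assuming the callback counts implied by the means above (157 and 235 out of 2,435 per group); the helper `wald_moe` is hypothetical. Its results should be close to the bootstrap margins, since the bootstrap standard deviation of a sample mean estimates the same standard error.

```python
import numpy as np

def wald_moe(k, n, z=1.96):
    """Normal-approximation (Wald) margin of error for a proportion k/n (hypothetical helper)."""
    p_hat = k / n
    return z * np.sqrt(p_hat * (1 - p_hat) / n)

n = 2435
moe_b = wald_moe(157, n)  # black-sounding names
moe_w = wald_moe(235, n)  # white-sounding names
print(f"Wald MOE, black names: {moe_b:.4f}")
print(f"Wald MOE, white names: {moe_w:.4f}")
```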


```python
# Confidence interval for black names
ci_lower_b = mean_b - bs_moe_b
ci_upper_b = mean_b + bs_moe_b
print("A 95% confidence interval gives a callback percentage for black names between " + str(ci_lower_b*100) + "% and " + str(ci_upper_b*100) + "%")

# Confidence interval for white names
ci_lower_w = mean_w - bs_moe_w
ci_upper_w = mean_w + bs_moe_w
print("A 95% confidence interval gives a callback percentage for white names between " + str(ci_lower_w*100) + "% and " + str(ci_upper_w*100) + "%")
```

```
A 95% confidence interval gives a callback percentage for black names between 5.466939456335392% and 7.4283376861893275%
A 95% confidence interval gives a callback percentage for white names between 8.481241521599497% and 10.820606591937338%
```
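The intervals above are normal-approximation intervals built from the bootstrap standard deviation. An alternative is the percentile bootstrap interval, which reads the 2.5th and 97.5th percentiles directly off the bootstrap replicates. A minimal sketch, run on a hypothetical toy array with the same callback count as the black-name sample (157 of 2,435) rather than on `b_call` itself, and using NumPy's `Generator` API as an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy 0/1 array with the black-name callback count (157 of 2,435)
b_toy = np.r_[np.ones(157), np.zeros(2435 - 157)]

# 10,000 bootstrap replicates of the callback fraction
reps = np.array([rng.choice(b_toy, size=len(b_toy)).mean() for _ in range(10000)])

# Percentile interval: the middle 95% of the bootstrap distribution
ci_low, ci_high = np.percentile(reps, [2.5, 97.5])
print(f"95% percentile CI: {ci_low:.4f} to {ci_high:.4f}")
```

On the real data, `np.percentile(bs_replicates_b, [2.5, 97.5])` would give the same kind of interval from the replicates already computed above.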


*The confidence intervals do not overlap, which is evidence that the callback rates of the two groups likely differ. A permutation test will quantify this with a p-value.*
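Non-overlapping intervals are a conservative check; a more direct approach is a bootstrap confidence interval for the difference $p_w - p_b$ itself, resampling each group independently. If that interval excludes 0, the data support a difference in callback rates. A minimal sketch, again on hypothetical toy arrays that reproduce the reported callback counts (157 and 235 of 2,435 per group) and assuming NumPy's `Generator` API:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2435

# Hypothetical toy arrays reproducing the reported callback counts
b_toy = np.r_[np.ones(157), np.zeros(n - 157)]
w_toy = np.r_[np.ones(235), np.zeros(n - 235)]

size = 10000
diffs = np.empty(size)
for i in range(size):
    # Resample each group independently, then take the difference in rates
    diffs[i] = rng.choice(w_toy, n).mean() - rng.choice(b_toy, n).mean()

ci_low, ci_high = np.percentile(diffs, [2.5, 97.5])
print(f"95% CI for p_w - p_b: {ci_low:.4f} to {ci_high:.4f}")
```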

```python
# p-value using a permutation test

def permutation_sample(data1, data2):
    """Generate a permutation sample from two data sets."""

    # Concatenate the data sets: pooled
    pooled = np.concatenate((data1, data2))

    # Permute the concatenated array: permuted_data
    permuted_data = np.random.permutation(pooled)

    # Split the permuted array into two: perm_sample_1, perm_sample_2
    perm_sample_1 = permuted_data[:len(data1)]
    perm_sample_2 = permuted_data[len(data1):]

    return perm_sample_1, perm_sample_2
```

```python
# Define permutation replicates
def draw_perm_reps(data_1, data_2, func, size=1):
    """Generate multiple permutation replicates."""

    # Initialize array of replicates: perm_replicates
    perm_replicates = np.empty(size)

    for i in range(size):
        # Generate permutation sample
        perm_sample_1, perm_sample_2 = permutation_sample(data_1, data_2)

        # Compute the test statistic
        perm_replicates[i] = func(perm_sample_1, perm_sample_2)

    return perm_replicates
```

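```python
# Test statistic: fraction of callbacks in the first (black-name) sample.
# The second argument is unused but required by draw_perm_reps,
# which calls func(perm_sample_1, perm_sample_2).
def frac_call_b(b, w):
    return np.sum(b) / len(b)

# Observed callback fraction for black names
b_frac = np.mean(b_call)
```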

```python
# Draw 100,000 permutation replicates of the black-name callback fraction
perm_replicates = draw_perm_reps(b_call, w_call, frac_call_b, size=100000)

# One-sided p-value: share of replicates at or below the observed fraction
p = np.sum(perm_replicates <= b_frac) / len(perm_replicates)

print('p-value =' + str(p))
```

```
p-value =0
```


*Across 100,000 permutations, no permuted sample had a callback fraction as low as the observed fraction for black names. The estimated one-sided p-value is therefore below 1/100,000 (0.00001), rather than literally zero.*
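The test above is one-sided: it only counts permutations in which the black-name callback fraction is at or below the observed value, while the alternative hypothesis in Q2 is two-sided. A minimal sketch of a two-sided permutation test on the difference in callback rates, using hypothetical toy arrays with the reported counts and NumPy's `Generator` API (an assumption; the notebook uses the legacy `np.random` functions):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2435

# Hypothetical toy arrays reproducing the reported callback counts
b_toy = np.r_[np.ones(157), np.zeros(n - 157)]
w_toy = np.r_[np.ones(235), np.zeros(n - 235)]

observed = w_toy.mean() - b_toy.mean()
pooled = np.concatenate([b_toy, w_toy])

size = 10000
perm_diffs = np.empty(size)
for i in range(size):
    # Shuffle the race labels, then recompute the difference in rates
    shuffled = rng.permutation(pooled)
    perm_diffs[i] = shuffled[n:].mean() - shuffled[:n].mean()

# Two-sided: count differences at least as extreme in either direction
# (small tolerance guards against floating-point ties)
p_two_sided = np.mean(np.abs(perm_diffs) >= abs(observed) - 1e-12)
print(observed, p_two_sided)
```

With 10,000 permutations, a result of 0 only means the p-value is below 1/10,000.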

### Q3: Frequentist Approach

```python
# Use scipy's built-in two-sample t-test (two-sided) to calculate the p-value
tstat, pvalue = stats.ttest_ind(b_call, w_call)
print("p = " + str(pvalue))
```

```
p = 3.940802103128885e-05
```
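Because `call` is binary, the t-test above is a large-sample stand-in for the two-proportion z-test. A minimal sketch of the pooled z-test computed by hand, assuming the callback counts implied by the means above (157 and 235 of 2,435 per group); `scipy.stats.norm.sf` gives the upper-tail probability. The resulting p-value should be close to, but not identical to, the t-test's.

```python
import numpy as np
from scipy import stats

# Counts implied by the results above (assumed inputs for the sketch)
n_b, k_b = 2435, 157
n_w, k_w = 2435, 235

p_b, p_w = k_b / n_b, k_w / n_w
p_pool = (k_b + k_w) / (n_b + n_w)

# Standard error of the difference under H0 (pooled proportion)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_b + 1 / n_w))
z = (p_w - p_b) / se
p_value = 2 * stats.norm.sf(abs(z))  # two-sided
print(f"z = {z:.3f}, p = {p_value:.2e}")
```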


*Both the permutation test and the frequentist t-test give p-values well below 0.05, so we reject the null hypothesis that black and white callback rates are equal. The difference is highly significant.*

<div class="span5 alert alert-success">
<p> Your answers to Q4 and Q5 here </p>
</div>

### Q4: Write a story describing the statistical significance in the context of the original problem.

Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers. Of the 2,435 résumés with black-sounding names, 6.45% (157) received a callback. The same number of résumés sent with white-sounding names received a 9.65% callback rate (235 callbacks).

After performing statistical analysis on the samples, we can conclude that this difference is very unlikely to be due to chance alone. The 95% confidence intervals estimate the true callback rate at between 5.47% and 7.43% for black-sounding names and between 8.48% and 10.82% for white-sounding names. The intervals do not overlap, and résumés with white-sounding names received about 1.5 times as many callbacks.

The difference is highly significant, with a p-value of about 0.000039 from the t-test. This means that if black-sounding and white-sounding names truly had the same callback rate, a difference at least as large as the one observed would occur by chance only about 0.004% of the time.

### Q5: Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

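The analysis does not show that race/name is the most important factor, only that it IS an important factor: we compared callback rates across race alone and did not measure the effect of any other résumé feature (education, years of experience, honors, and so on). To determine the influence of other factors on callback success, I would test each of the other variables for an association with callbacks, likely with chi-square tests.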
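For the one factor analyzed here, a chi-square test can be sketched with `scipy.stats.chi2_contingency` on a 2×2 table of race against callback, using the counts implied by the results above (157 and 235 callbacks out of 2,435 per group). The same call could be repeated with a table of `call` against another categorical column such as `honors` or `military`. Note that such one-at-a-time tests show whether each factor is associated with callbacks, not how the factors rank against each other.

```python
import numpy as np
from scipy import stats

# 2x2 contingency table from the callback counts above (assumed inputs):
# rows = race (b, w), columns = (callback, no callback)
table = np.array([
    [157, 2435 - 157],
    [235, 2435 - 235],
])

# correction=False gives the plain Pearson chi-square, which for a 2x2
# table equals the square of the pooled two-proportion z statistic
chi2, p, dof, expected = stats.chi2_contingency(table, correction=False)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.2e}")
```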