# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution

In [1]:
import pandas as pd
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.graphics.gofplots import qqplot
import random

In [2]:
data = pd.io.stata.read_stata('data/us_job_market_discrimination.dta')

In [17]:
# number of callbacks for white-sounding and black-sounding names
callbacks_w = sum(data[data.race=='w'].call)
callbacks_b = sum(data[data.race=='b'].call)
print('The number of resumes with white-sounding names that received callbacks is: ' + str(callbacks_w))
print('The number of resumes with black-sounding names that received callbacks is: ' + str(callbacks_b))

The number of resumes with white-sounding names that received callbacks is: 235.0
The number of resumes with black-sounding names that received callbacks is: 157.0


In [4]:
data.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


In [29]:
# shape of the full dataset and number of resumes in each group
total_size = np.shape(data)
size_b = len(data[data.race == 'b'])
size_w = len(data[data.race == 'w'])
print('The shape of this dataset is: ' + str(total_size))
print('This dataset contains ' + str(size_b) + ' resumes with black-sounding names and ' + str(size_w) + ' resumes with white-sounding names')

The shape of this dataset is: (4870, 65)
This dataset contains 2435 resumes with black-sounding names and 2435 resumes with white-sounding names


<div class="span5 alert alert-success">
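<p>Q1: The sample contains 4870 resumes, 2435 in each group, and each group has well over ten callbacks and ten non-callbacks (235 and 157 callbacks for the white-sounding and black-sounding groups), so the central limit theorem applies to the callback rates. Because we want to compare two independent samples, we use a two-sample t-test. Since the outcome is binary, a two-proportion z-test gives essentially the same result at this sample size.</p>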
</div>
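As a quick check of the CLT conditions, the sketch below uses the callback counts printed earlier (235 and 157 out of 2435 per group). The threshold of 10 successes and 10 failures per group is a common rule of thumb and an assumption here, as are the variable names.

```python
# Counts taken from the notebook output above
n_w, n_b = 2435, 2435
calls_w, calls_b = 235, 157

# Rule of thumb (an assumption): at least 10 successes and 10 failures per group
min_count = 10
checks = {
    'white-sounding': (calls_w, n_w - calls_w),
    'black-sounding': (calls_b, n_b - calls_b),
}

for group, (successes, failures) in checks.items():
    ok = successes >= min_count and failures >= min_count
    print(group, successes, failures, 'CLT condition met:', ok)

# True only if every group passes the rule of thumb
clt_ok = all(s >= min_count and f >= min_count for s, f in checks.values())
```

Both groups clear the threshold by a wide margin, which supports treating the sample callback rates as approximately normal.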

<div class="span5 alert alert-success">
    Q2: We set a significance level of 0.05.
<p>
    Null hypothesis: there is no difference between the mean callback rates for white-sounding and black-sounding names.
        </p>
    Alternate hypothesis: the mean callback rates for white-sounding and black-sounding names are different.
</div>

<div class="span5 alert alert-success">
<p>Q3: The computations for both methods are shown below: </p>
</div>

In [33]:
# split the data by race and compute the callback rate (mean of the 0/1 'call' column)
b_data = data[data.race == 'b']
w_data = data[data.race == 'w']
b_call_mean = np.mean(b_data.call)
w_call_mean = np.mean(w_data.call)
# np.var uses ddof=0 by default (population variance)
b_var = np.var(b_data.call)
w_var = np.var(w_data.call)
print('The mean callback rate for black-sounding names is: ' + str(b_call_mean))
print('The mean callback rate for white-sounding names is: ' + str(w_call_mean))
print('The variance of callbacks for black-sounding names is: ' + str(b_var))
print('The variance of callbacks for white-sounding names is: ' + str(w_var))

The mean callback rate for black-sounding names is: 0.064476386
The mean callback rate for white-sounding names is: 0.09650924
The variance of callbacks for black-sounding names is: 0.06031918
The variance of callbacks for white-sounding names is: 0.08719521


In [37]:
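# two-sample t statistic using the unpooled standard error of the difference
t_score = (w_call_mean - b_call_mean)/np.sqrt((w_var/size_w) + (b_var/size_b))
# two-sided p-value with size_b + size_w - 2 degrees of freedom
p_value = st.t.sf(np.abs(t_score), size_b+size_w-2)*2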
print('Obtained t-score is: ' + str(t_score))
print('Obtained p-value is: ' + str(p_value))

Obtained t-score is: 4.1155504738096065
Obtained p-value is: 3.926438474676631e-05


The p-value is smaller than our significance level of 0.05, so we reject H0: the rate of interview callbacks differs between resumes with white-sounding and black-sounding names.
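Because `call` is binary, the same comparison can be cross-checked with a two-proportion z-test. The sketch below is an alternative to the t-test above, not the method used in this notebook. It rebuilds the rates from the printed counts and uses the pooled standard error under the null hypothesis. The variable names are illustrative, and the two-sided p-value is computed with `math.erfc`, which is equivalent to `2 * st.norm.sf(abs(z))`.

```python
import math

# Counts taken from the notebook output above
n_w, n_b = 2435, 2435
calls_w, calls_b = 235, 157

p_w = calls_w / n_w
p_b = calls_b / n_b

# Under H0 both groups share one callback rate, so pool the counts
p_pool = (calls_w + calls_b) / (n_w + n_b)
se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_w + 1 / n_b))

z_stat = (p_w - p_b) / se_pool
# Two-sided p-value for a standard normal statistic
p_value_z = math.erfc(abs(z_stat) / math.sqrt(2))

print('z statistic:', z_stat)
print('two-sided p-value:', p_value_z)
```

The z statistic should land close to the t-score above, since the two tests differ only in how the standard error is estimated.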

In [39]:
diff_mean = w_call_mean - b_call_mean
z_value = 1.96  # critical value for a 95% confidence interval
print('The difference of the mean is: ' + str(diff_mean))

The difference of the mean is: 0.032032855


According to the z-table, the z-value for a 95% confidence interval is 1.96.

In [45]:
m_error = z_value * np.sqrt(w_var/size_w + b_var/size_b)
print('The margin of error is: ' + str(round(m_error, 4)))
print('The confidence interval is: ' + str((round(diff_mean-m_error, 4), round(diff_mean+m_error, 4))))

The margin of error is: 0.0153
The confidence interval is: (0.0168, 0.0473)
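The margin of error can also be reproduced from the counts alone, because the variance of a 0/1 variable is p(1 - p). The sketch below is a minimal check under that assumption, using the critical value 1.96 for a 95% interval. The variable names are illustrative.

```python
import math

# Counts taken from the notebook output above
n_w, n_b = 2435, 2435
calls_w, calls_b = 235, 157

p_w = calls_w / n_w
p_b = calls_b / n_b
diff = p_w - p_b

# Unpooled standard error of the difference in proportions;
# p * (1 - p) is the variance of a Bernoulli (0/1) variable
se = math.sqrt(p_w * (1 - p_w) / n_w + p_b * (1 - p_b) / n_b)

z_crit = 1.96  # 95% two-sided critical value
margin = z_crit * se
ci = (diff - margin, diff + margin)

print('margin of error:', margin)
print('95% confidence interval:', ci)
```

The interval excludes zero, which agrees with rejecting the null hypothesis at the 0.05 level.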


In [46]:
def bootstrap_replicate_1d(data, func):
    """Generate bootstrap replicate of 1D data."""
    bs_sample = np.random.choice(data, len(data))
    return func(bs_sample)

In [47]:
def draw_bs_reps(data, func, size=1):
    """Draw bootstrap replicates."""
    # Initialize array of replicates: bs_replicates
    bs_replicates = np.empty(size)
    # Generate replicates
    for i in range(size):
        bs_replicates[i] = bootstrap_replicate_1d(data, func)
    return bs_replicates

In [50]:
def diff_of_means(data_1, data_2):
    """Difference in means of two arrays."""

    # The difference of means of data_1, data_2: diff
    diff = np.mean(data_1) - np.mean(data_2)

    return diff

In [52]:
# observed difference in callback rates (white minus black)
empirical_diff = diff_of_means(w_data.call, b_data.call)
# shift both groups to the overall mean so they satisfy the null hypothesis
data_mean = np.mean(data.call)
w_shifted = w_data.call - w_call_mean + data_mean
b_shifted = b_data.call - b_call_mean + data_mean
bs_relipcates_w = draw_bs_reps(w_shifted, np.mean, size=10000)
bs_relipcates_b = draw_bs_reps(b_shifted, np.mean, size=10000)
bs_relipcates = bs_relipcates_w - bs_relipcates_b
# one-sided p-value: fraction of null replicates exceeding the observed difference
p_value_boot = np.sum(bs_relipcates > empirical_diff) / len(bs_relipcates)
print('P-value is: ' + str(p_value_boot))

P-value is: 0.0


In [54]:
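# bs_relipcates come from the shifted (null) samples, so this is the range of
# differences expected under H0, not a confidence interval for the observed
# difference; the observed difference (about 0.032) lies outside it
conf = np.percentile(bs_relipcates, [2.5, 97.5])
print('95% interval of the difference under the null hypothesis is: ' + str(conf))

95% interval of the difference under the null hypothesis is: [-0.01560577  0.01519505]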
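The interval above is centered on zero because it is built from the mean-shifted samples. A bootstrap confidence interval for the difference itself resamples the unshifted data instead. The sketch below is a minimal version under stated assumptions: it rebuilds the two 0/1 `call` arrays from the printed counts rather than loading the dataset, and the seed, `n_reps`, and variable names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed, chosen arbitrarily for reproducibility

# Rebuild the 0/1 callback arrays from the counts printed above
n_w, n_b = 2435, 2435
calls_w, calls_b = 235, 157
w_calls = np.concatenate([np.ones(calls_w), np.zeros(n_w - calls_w)])
b_calls = np.concatenate([np.ones(calls_b), np.zeros(n_b - calls_b)])

n_reps = 10000
bs_diffs = np.empty(n_reps)
for i in range(n_reps):
    # Resample each group with replacement, without shifting the means
    w_sample = rng.choice(w_calls, size=n_w, replace=True)
    b_sample = rng.choice(b_calls, size=n_b, replace=True)
    bs_diffs[i] = w_sample.mean() - b_sample.mean()

# Percentile bootstrap 95% confidence interval for the difference
ci_low, ci_high = np.percentile(bs_diffs, [2.5, 97.5])
print('Bootstrap 95% confidence interval:', (ci_low, ci_high))
```

This interval should be centered near the observed difference and be comparable to the frequentist interval computed earlier.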


<div class="span5 alert alert-success">
<p> Q4: Both the frequentist test and the bootstrap test give p-values below the 0.05 significance level, so we reject the null hypothesis: the callback rate differs between the two groups. Resumes with white-sounding names received callbacks about 9.7% of the time, compared with about 6.4% for resumes with black-sounding names. Because the names were randomly assigned to the resumes, this difference is evidence of racial discrimination against job seekers with similar backgrounds. However, this analysis considers only race and callbacks, so the sound of a name might not be the only factor contributing to the difference in callbacks.</p>
</div>

<div class="span5 alert alert-success">
<p> Q5: Not necessarily. The analysis above indicates that the sound of a name (race) has some impact on callbacks. However, it does not tell us whether other variables also affect the result, or whether race is the most important factor in callback success. To understand the relationship between callbacks and the other variables, we could fit a regression model that includes them and compare their estimated effects. </p>
</div>
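One way to start on that regression is a linear probability model fitted with least squares. The sketch below is a simplification under stated assumptions, not the analysis this notebook performs. It rebuilds only the `call` outcome and a race indicator from the printed counts, so it has a single predictor. With the real data, other columns shown in `data.head()` (for example `yearsexp` or `education`) could be appended as extra columns of `X`. For a binary outcome, logistic regression is the more usual choice.

```python
import numpy as np

# Rebuild outcome and race indicator from the counts printed above
n_w, n_b = 2435, 2435
calls_w, calls_b = 235, 157
call = np.concatenate([
    np.ones(calls_w), np.zeros(n_w - calls_w),   # white-sounding names
    np.ones(calls_b), np.zeros(n_b - calls_b),   # black-sounding names
])
is_white = np.concatenate([np.ones(n_w), np.zeros(n_b)])

# Design matrix: intercept column plus the race indicator
X = np.column_stack([np.ones_like(is_white), is_white])

# Ordinary least squares fit of call ~ intercept + is_white
coef, *_ = np.linalg.lstsq(X, call, rcond=None)
intercept, race_effect = coef

print('baseline callback rate (black-sounding):', intercept)
print('estimated effect of a white-sounding name:', race_effect)
```

With only the race indicator, the fitted coefficient equals the difference in callback rates. Comparing it with the coefficients of added covariates is what would address the question of relative importance.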