# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution
</div>
****

In [62]:
import pandas as pd
import numpy as np
from scipy import stats

In [63]:
df = pd.read_stata('data/us_job_market_discrimination.dta')

In [102]:
df.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


In [125]:
#Descriptive stats of call by race
df['call'].groupby(df['race']).describe()

race,count,mean,std,min,25%,50%,75%,max
b,2435.0,0.064476,0.245649,0.0,0.0,0.0,0.0,1.0
w,2435.0,0.096509,0.295346,0.0,0.0,0.0,0.0,1.0


<div class="span5 alert alert-success">
<p>Questions 1 and 2</p>
</div>

The observed callback rate for resumes with white-sounding names (9.65%) is about 50% higher than the callback rate for resumes with black-sounding names (6.45%), but is that difference statistically significant?

**Hypotheses:** The null hypothesis is that the callback rate is the same for white-sounding and black-sounding names. The alternative hypothesis is that the two callback rates differ.
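Written symbolically, the hypotheses can be stated as follows. The symbols $p_w$ and $p_b$ are notation introduced here for illustration: they denote the true callback rates for white-sounding and black-sounding names.

```latex
H_0:\; p_w - p_b = 0
\qquad \text{versus} \qquad
H_A:\; p_w - p_b \neq 0
```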

Applicability of the Central Limit Theorem:
"The general rule of thumb is that the sample size n is 'sufficiently large' if: np ≥ 5 and n(1 − p) ≥ 5" ([PSU](https://onlinecourses.science.psu.edu/stat414/node/179)).

In [82]:
#Generating separate arrays for white-sounding and black-sounding callbacks
w = df.call[df.race=='w']
b = df.call[df.race=='b']


#Checking if black-sounding sample size is large enough
n_b = b.count()
p_b = b.mean()
np_b = n_b * p_b
nq_b = n_b * (1-p_b)
print("np_b = ",round(np_b))
print("nq_b = ",round(nq_b))

#Checking if white-sounding sample size is large enough
n_w = w.count()
p_w = w.mean()
np_w = n_w * p_w
nq_w = n_w * (1-p_w)
print("np_w = ",round(np_w))
print("nq_w = ",round(nq_w))

np_b =  157.0
nq_b =  2278.0
np_w =  235.0
nq_w =  2200.0


All np and nq values are far greater than 5, so each sample is large enough for the normal approximation to the sampling distribution of a proportion. Also, the independent variable (race) was randomly assigned. Therefore, **the Central Limit Theorem applies** and a two-sample (binomial) **proportions test** is appropriate.

<div class="span5 alert alert-success">
<p>Question 3</p>
</div>

**Permutation Test**

The functions below come from DataCamp's ["Statistical Thinking in Python (Part 2)"](https://campus.datacamp.com/courses/statistical-thinking-in-python-part-2/hypothesis-test-examples?ex=2) course.

In [69]:
def permutation_sample(data1, data2):
    """Generate a permutation sample from two data sets."""

    # Concatenate the data sets
    data = np.concatenate((data1, data2))

    # Permute the concatenated array
    permuted_data = np.random.permutation(data)

    # Split the permuted array into two
    perm_sample_1 = permuted_data[:len(data1)]
    perm_sample_2 = permuted_data[len(data1):]

    return perm_sample_1, perm_sample_2


def draw_perm_reps(data_1, data_2, func, size):
    """Generate multiple permutation replicates."""
    # Initialize array of replicates
    perm_replicates = np.empty(size)
    
    for i in range(size):
        # Generate permutation sample
        perm_sample_1, perm_sample_2 = permutation_sample(data_1, data_2)

        # Compute the test statistic
        perm_replicates[i] = func(perm_sample_1, perm_sample_2)

    return perm_replicates


def rate_diff(data_1, data_2):
    """Difference in means of two arrays."""

    # The difference of means of data_1, data_2: diff
    diff = np.mean(data_1)-np.mean(data_2)

    return diff

In [104]:
#Compute the observed difference in callback rates between white- and black-sounding names
observed_diff = rate_diff(w,b)
print('Observed Difference:',observed_diff)

#Draw permutation replicates.
#This simulates the null hypothesis: each observation is randomly reassigned to one of the
#two groups (black-sounding or white-sounding), and the difference in callback rates is
#computed for each pair of permutation samples.
perm_replicates = draw_perm_reps(w,b,rate_diff, size = 100000)

#Compute the one-sided p-value: the fraction of permutation replicates at least as large as the observed difference
p_value = np.sum(perm_replicates >= observed_diff) / len(perm_replicates)
print('P-value =', p_value)

#Compute the central 95% of the permutation replicates, i.e. the range of differences expected under the null hypothesis
conf_int = np.percentile(perm_replicates, [2.5, 97.5])
print('95% interval of permutation replicates:',conf_int)

Observed Difference: 0.03203285485506058
P-value = 3e-05
95% interval of permutation replicates: [-0.01560576  0.01560576]


The observed difference (about 0.032) lies far outside the central 95% of the permutation replicates (about -0.016 to 0.016), and the p-value of 3e-05 means a difference this large would be expected in only about 0.003% of experiments if the callback rates for the two groups were truly the same. **The null hypothesis is rejected.** There is a statistically significant difference between the two groups of resumes, which should be the same apart from the race/name randomly assigned to them.
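The permutation replicates describe what the null hypothesis would produce, so they do not give a confidence interval for the difference itself. One way to estimate that interval is a bootstrap, sketched below. This is a minimal sketch rather than part of the original analysis: it rebuilds the 0/1 callback arrays from the counts in the descriptive table (157 and 235 callbacks out of 2,435 resumes each), and the names `b_calls`, `w_calls`, `draw_bs_diff_reps`, the seed, and the number of replicates are all choices made for illustration.

```python
import numpy as np

# Rebuild the 0/1 callback arrays from the counts in the descriptive table
# (157 of 2,435 black-sounding and 235 of 2,435 white-sounding resumes).
b_calls = np.concatenate((np.ones(157), np.zeros(2435 - 157)))
w_calls = np.concatenate((np.ones(235), np.zeros(2435 - 235)))


def draw_bs_diff_reps(data_1, data_2, size, rng):
    """Bootstrap replicates of mean(data_1) - mean(data_2)."""
    reps = np.empty(size)
    for i in range(size):
        # Resample each group with replacement, keeping its original size
        sample_1 = rng.choice(data_1, size=len(data_1), replace=True)
        sample_2 = rng.choice(data_2, size=len(data_2), replace=True)
        reps[i] = np.mean(sample_1) - np.mean(sample_2)
    return reps


rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only
bs_reps = draw_bs_diff_reps(w_calls, b_calls, size=5000, rng=rng)

# Percentile bootstrap 95% confidence interval for the difference in rates
bs_conf_int = np.percentile(bs_reps, [2.5, 97.5])
# Half the interval width serves as an approximate margin of error
bs_margin_of_error = (bs_conf_int[1] - bs_conf_int[0]) / 2

print('Bootstrap 95% CI for the difference:', bs_conf_int)
print('Approximate margin of error:', bs_margin_of_error)
```

If the interval excludes zero, it agrees with the permutation test's rejection of the null hypothesis.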

**Frequentist Method**

In [124]:
#Chi-square test on a 2x2 contingency table: one row per race, columns are callbacks and non-callbacks
contingency_table = np.array([[np.sum(b), n_b - np.sum(b)], [np.sum(w), n_w - np.sum(w)]])

chi2, p_value, dof, expected = stats.chi2_contingency(contingency_table)

print('P-value = {:.2e}'.format(p_value))

#Standard error formula: SEp1-p2 = sqrt [ p1(1-p1) / n1 + p2(1-p2) / n2 ]
standard_error = np.sqrt((p_w*(1-p_w)/n_w) + (p_b*(1-p_b)/n_b))

#For a 95% confidence interval, the z-score is 1.96
margin_of_error = standard_error*1.96

#95% confidence interval: observed difference plus or minus the margin of error
conf_int = (observed_diff - margin_of_error, observed_diff + margin_of_error)
print('95% Confidence Interval: ({:.4f}, {:.4f})'.format(*conf_int))

P-value = 5.00e-05
95% Confidence Interval: (0.0168, 0.0473)


Again, the p-value is far below 0.05, so the null hypothesis is rejected.
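The two-sample proportions test named under Questions 1 and 2 can also be run directly as a z-test. The sketch below is an illustration, not part of the original analysis: it uses the counts from the descriptive table, a pooled standard error (appropriate under the null hypothesis of equal rates), and variable names (`calls_black`, `size_black`, and so on) chosen here so they do not overwrite the notebook's own variables.

```python
import numpy as np
from scipy import stats

# Counts taken from the descriptive table (callbacks, resumes sent)
calls_black, size_black = 157, 2435
calls_white, size_white = 235, 2435

rate_black = calls_black / size_black
rate_white = calls_white / size_white

# Under the null hypothesis both groups share one callback rate,
# so the standard error is based on the pooled proportion.
rate_pooled = (calls_black + calls_white) / (size_black + size_white)
se_pooled = np.sqrt(rate_pooled * (1 - rate_pooled) * (1 / size_black + 1 / size_white))

z_score = (rate_white - rate_black) / se_pooled

# Two-sided p-value from the standard normal survival function
p_value_z = 2 * stats.norm.sf(abs(z_score))

print('z =', z_score)
print('Two-sided p-value =', p_value_z)
```

For a 2x2 table, the square of this z-score equals the chi-square statistic computed without a continuity correction, so the two tests should lead to the same conclusion.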

<div class="span5 alert alert-success">
<p> Questions 4 and 5 </p>
</div>

In [119]:
perc_diff = (w.mean() - b.mean()) / b.mean()
print("Otherwise comparable resumes with white-sounding names were",round(perc_diff*100),"% more likely to get a callback than their counterparts with black-sounding names.")

Otherwise comparable resumes with white-sounding names were 50 % more likely to get a callback than their counterparts with black-sounding names.


Race/name is a statistically significant factor in callback success. In this dataset, otherwise comparable resumes with white-sounding names were about 50% more likely to get a callback than their counterparts with black-sounding names. This illustrates how the race suggested by a person's name can affect their overall job market success.

Although race/name is an important factor in callback success, this analysis does not show that it is the most important one, because it tests only a single variable. Other factors, such as gender or education, could have a larger effect. It would be interesting to revisit the data and compare the effect of race/name with the effects of the other resume characteristics.
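One simple way to start that comparison is to compute the callback rate for each level of several factors and look at the size of the gap. The sketch below is only an illustration under stated assumptions: `callback_rate_gap` is a hypothetical helper, and the `toy` DataFrame is made-up data that merely borrows the column names `race`, `honors`, and `call` from the dataset. Applied to the real `df`, it would give a first impression only. A regression that includes all factors at once would be needed to compare their effects properly.

```python
import pandas as pd


def callback_rate_gap(frame, factor, outcome="call"):
    """Return callback rates per level of `factor` and the max-min gap."""
    rates = frame.groupby(factor)[outcome].mean()
    return rates, rates.max() - rates.min()


# Toy data for illustration only; these are NOT the study's observations.
toy = pd.DataFrame({
    "race":   ["b", "b", "b", "b", "w", "w", "w", "w"],
    "honors": [0, 0, 1, 1, 0, 0, 1, 1],
    "call":   [0, 0, 0, 1, 1, 1, 1, 0],
})

gaps = {}
for factor in ["race", "honors"]:
    rates, gap = callback_rate_gap(toy, factor)
    gaps[factor] = gap
    print(factor, rates.to_dict(), "gap =", gap)
```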