# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution
</div>
****

In [18]:
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

In [3]:
data = pd.io.stata.read_stata('data/us_job_market_discrimination.dta')

In [4]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

In [5]:
data.head()

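| | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |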


In [6]:
data.id.count()

4870

## 1. What test is appropriate for this problem? Does CLT apply?

The appropriate test for this problem is the Chi-Squared test of independence, because both variables (race and callback) are categorical and we want to test whether they are related. A crosstab (contingency table) summarizes the counts, and the $X^2$ test on that table assesses whether the association is significant.

The Central Limit Theorem does apply here. Each callback is a Bernoulli outcome ('0' or '1'), so a group's callback rate is the mean of many independent 0/1 values. Because the sample is large (n = 4870, i.e. 2435 resumes per group, with at least 157 callbacks in each group), the sampling distribution of each callback rate, and of the difference between them, is approximately normal. This is what justifies the $Z$ test and the normal-based confidence interval used in Q3, alongside bootstrapping.
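As a rough check of whether the normal approximation is reasonable, the sketch below applies a common rule of thumb: each group should have at least 10 callbacks and at least 10 non-callbacks. The counts are hard-coded from the contingency table computed in Q2; the threshold of 10 and the helper name `normal_approx_ok` are assumptions made for illustration, not part of the exercise.

```python
# Callback counts per group, taken from the notebook's contingency table.
groups = {"black": (157, 2435), "white": (235, 2435)}  # (callbacks, resumes)


def normal_approx_ok(successes, n, threshold=10):
    """Rule of thumb: both successes and failures should reach the threshold."""
    return successes >= threshold and (n - successes) >= threshold


for name, (calls, n) in groups.items():
    rate = calls / n
    ok = normal_approx_ok(calls, n)
    print(f"{name}: callback rate = {rate:.4f}, normal approximation ok = {ok}")
```

Both groups clear the threshold by a wide margin, which is consistent with treating the callback rates as approximately normal.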

## 2. What are the null and alternate hypotheses?

Null: Race and call backs are not related (independent of one another).

Alternate: Race and call backs are related (dependent on one another).

Let's first apply the crosstab and $X^2$ test.

In [7]:
# Crosstab using race and call backs
cont_table = pd.crosstab(data.race, data.call)
cont_table.columns = ['No','Yes']
cont_table.index = ['Black','White']
cont_table

| | No | Yes |
|---|---|---|
| Black | 2278 | 157 |
| White | 2200 | 235 |


In [9]:
# Compute Chi Square and get p value
chi2, p, dof, ex = stats.chi2_contingency(cont_table)

print("chi2 =", chi2)
print("p =",p)

chi2 = 16.4490285842
p = 4.99757838996e-05


With $X^2$ = 16.45 and p ≈ 5e-05, far below 0.05, we can reject the null hypothesis: there is an association between race and call backs. The crosstab shows that resumes with black-sounding names were called back less often (157 vs. 235 callbacks out of 2435 resumes each), but the $X^2$ test does not quantify the size of that difference. A $Z$ test on the difference in callback rates does, and we explore it in the next questions.
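To make the statistic less of a black box, here is a minimal sketch that recomputes it by hand from the crosstab counts using only the standard library. It assumes, as SciPy documents, that `stats.chi2_contingency` applies Yates' continuity correction by default to 2x2 tables, so both the corrected and uncorrected values are shown. The helper names `chi_square` and `p_value_df1` are hypothetical.

```python
import math

# Observed counts from the crosstab: rows = (Black, White), columns = (No, Yes).
observed = [[2278, 157], [2200, 235]]


def chi_square(table, yates=False):
    """Pearson chi-squared statistic for a 2x2 table, optionally Yates-corrected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            diff = abs(obs - expected)
            if yates:
                # Shrink each deviation by 0.5, but never below zero.
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected
    return stat


def p_value_df1(stat):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


chi2_plain = chi_square(observed)
chi2_yates = chi_square(observed, yates=True)
print("uncorrected:     chi2 =", chi2_plain, "p =", p_value_df1(chi2_plain))
print("Yates-corrected: chi2 =", chi2_yates, "p =", p_value_df1(chi2_yates))
```

The Yates-corrected value is the one that corresponds to the notebook's output; the uncorrected statistic is slightly larger, and either way the conclusion is the same.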

## 3a. Compute margin of error, confidence interval, and p-value. 

The $X^2$ test is usually reported with a critical value rather than a confidence interval, because the test statistic follows a chi-squared distribution rather than a normal one. The p-value was already calculated above (p ≈ 5e-05).

The margin of error, confidence interval, and p-values for the difference in callback rates are calculated in section 3b-4 below, using both the bootstrapping and the frequentist statistical approaches.

In [11]:
# calculate the chi-squared critical value at the 95% level (alpha = 0.05)
chi2_crit = stats.chi2.ppf(q = 0.95, df = dof)
print("chi2_crit =", chi2_crit)

chi2_crit = 3.84145882069


As chi2 > chi2_crit (16.45 > 3.84), the statistic falls in the rejection region, which further supports rejecting the null hypothesis.
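The critical value can also be cross-checked without SciPy. With 1 degree of freedom, a chi-squared variable is the square of a standard normal variable, so the 95% critical value should equal the square of the two-sided $z$ critical value (about 1.96). The sketch below relies on that identity and on `statistics.NormalDist` from the standard library (Python 3.8+).

```python
from statistics import NormalDist

alpha = 0.05

# Two-sided z critical value, then squared to get the chi-squared
# critical value for 1 degree of freedom.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
chi2_crit_from_z = z_crit ** 2

print("z_crit =", z_crit)
print("chi2_crit =", chi2_crit_from_z)
```

This shortcut only holds for 1 degree of freedom, which is the case for a 2x2 table.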

## 3b-4. Try using both the bootstrapping and the frequentist statistical approaches. Write a story describing the statistical significance in the context of the original problem.

Because the sample size is large, the CLT lets us treat the difference in callback rates as approximately normally distributed, so both bootstrapping and a $Z$ test are suitable approaches to this problem.

Null Hypothesis: There is __no difference__ in the mean callback rates between black-sounding and white-sounding names.

Alternative Hypothesis: There __is a difference__ in the mean callback rates between black-sounding and white-sounding names.

In [34]:
w = data[data.race=='w']
b = data[data.race=='b']

In [38]:
# First, we will try Bootstrapping
# define a bootstrap function
def bootstrap_replicate(data, func): 
    return func(np.random.choice(data, len(data)))

# then draw replicates
def draw_bs_reps(data, func, size = 1):
    rep = np.empty(size)
    
    for i in range(size):
        rep[i] = bootstrap_replicate(data, func)
        
    return rep

In [39]:
# Compute mean of pooled data: mean_count
mean_count = np.mean(np.concatenate((w.call, b.call)))

# Generate shifted data sets so that both groups share the pooled mean,
# which simulates the null hypothesis of equal callback rates
w_shifted = w.call - np.mean(w.call) + mean_count
b_shifted = b.call - np.mean(b.call) + mean_count

# Take 10,000 bootstrap replicates of the mean: bs_replicates
bs_replicates_w = draw_bs_reps(w_shifted, np.mean, size=10000)
bs_replicates_b = draw_bs_reps(b_shifted, np.mean, size=10000)

bs_replicates = bs_replicates_w - bs_replicates_b
print('bs diff of means =', bs_replicates)

empirical_diff_means = np.mean(w.call) - np.mean(b.call)
print('empirical diff of means =', empirical_diff_means)

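# Compute and print the one-sided p-value: the share of null replicates
# that are at least as large as the observed difference
p = np.sum(bs_replicates >= empirical_diff_means) / len(bs_replicates)
print('p =', p)

# Compute the central 95% interval of the differences under the null hypothesis
# (this is not a confidence interval for the observed difference): conf_int
conf_int = np.percentile(bs_replicates, [2.5, 97.5])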
print('CI =', conf_int)

bs diff of means = [ 0.0123203  -0.01806984 -0.00041071 ..., -0.00246409  0.00041066
 -0.00616019]
empirical diff of means = 0.03203285485506058
p = 0.0
CI = [-0.0151951   0.01560573]


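The null hypothesis is rejected. The observed difference in mean call backs between white-sounding and black-sounding names (0.032) lies well outside the central 95% of the bootstrap differences generated under the null hypothesis, [-0.015, 0.016]. The reported p = 0.0 means that none of the 10,000 replicates was as large as the observed difference, so p < 0.0001, far below 0.05. This supports the initial findings of the crosstab and $X^2$ test.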
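The interval printed above describes the null distribution, so it is centered near zero. A confidence interval for the observed difference itself can be obtained by resampling each group without shifting it. The following is a minimal sketch of that idea, not the notebook's own method: it rebuilds the 0/1 outcomes from the crosstab counts, uses the standard-library `random` module instead of NumPy, and the seed and the 2,000 replicates are arbitrary assumptions chosen to keep it fast.

```python
import random

rng = random.Random(0)  # fixed seed (an arbitrary choice) for reproducibility

# Rebuild the 0/1 callback outcomes from the crosstab counts.
w_calls = [1] * 235 + [0] * 2200
b_calls = [1] * 157 + [0] * 2278


def bootstrap_mean(values):
    """Mean of one resample drawn with replacement, same size as the input."""
    resample = rng.choices(values, k=len(values))
    return sum(resample) / len(resample)


n_reps = 2000  # fewer than the notebook's 10,000 to keep pure Python quick
diffs = sorted(
    bootstrap_mean(w_calls) - bootstrap_mean(b_calls) for _ in range(n_reps)
)

# Percentile 95% confidence interval for the difference in callback rates.
ci_low = diffs[n_reps * 25 // 1000]
ci_high = diffs[n_reps * 975 // 1000 - 1]
print("95% bootstrap CI for the difference =", (ci_low, ci_high))
```

If this interval excludes zero, it points to the same conclusion as the hypothesis test while also indicating the plausible size of the gap.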

Next, a $Z$ test is used as the frequentist approach.

In [42]:
# First, determine the standard error of the difference in callback rates
sem = np.sqrt(np.std(b.call)**2/len(b.call) + np.std(w.call)**2/len(w.call))
print('sem =', sem)

# Find the margin of error
z_value = 1.96 # this is the two-sided value of z for 95% confidence

moe = z_value * sem
print('moe =', moe)

# Find the confidence interval for the difference
f_CI = np.array([empirical_diff_means - moe, empirical_diff_means + moe])
print('95% Confidence Interval', f_CI)

# Calculate the observed z statistic (not a critical value)
z = empirical_diff_means / sem
print('z-score = ', z)

# Calculate and save the one-sided p-value (double it for a two-sided test):
p = stats.norm.sf(abs(z))
print('p =', p)

sem = 0.00778330835992
moe = 0.0152552843854
95% Confidence Interval [ 0.01677757  0.04728814]
z-score =  4.11558342208
p = 1.93100640099e-05


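The observed $z$ score of 4.12 gives a one-sided p-value of about 1.9e-05 (roughly 3.9e-05 for the two-sided alternative stated above), well below 0.05, so the $Z$ test also rejects the null hypothesis. The 95% confidence interval for the difference in callback rates, [0.017, 0.047], excludes zero. This supports the initial conclusion that the mean call back rates differ between white-sounding and black-sounding names.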
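For comparison, the sketch below recomputes the two-proportion $Z$ test from the crosstab counts with the standard library only. It shows two standard errors: the unpooled one, which is the formula used in the cell above, and the pooled one, which assumes a common callback rate under the null hypothesis and is a frequent textbook choice for the test itself. The variable names are illustrative; small differences from the notebook's printed values in the later decimals are expected from floating-point precision.

```python
import math

# Callbacks and resume counts per group, from the crosstab.
calls_w, n_w = 235, 2435
calls_b, n_b = 157, 2435

p_w, p_b = calls_w / n_w, calls_b / n_b
diff = p_w - p_b

# Unpooled standard error, typically used for the confidence interval.
se_unpooled = math.sqrt(p_w * (1 - p_w) / n_w + p_b * (1 - p_b) / n_b)

# Pooled standard error, which assumes one common rate under the null hypothesis.
p_pool = (calls_w + calls_b) / (n_w + n_b)
se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_w + 1 / n_b))


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


z_unpooled = diff / se_unpooled
z_pooled = diff / se_pooled
print("diff =", diff)
print("unpooled: z =", z_unpooled, "two-sided p =", two_sided_p(z_unpooled))
print("pooled:   z =", z_pooled, "two-sided p =", two_sided_p(z_pooled))
```

The square of the pooled $z$ equals the uncorrected $X^2$ statistic for the same 2x2 table, which is why the two tests lead to the same decision.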

## 5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

This analysis by no means shows that race/name is the most important factor in callback success. We found an association and showed that the difference is statistically significant: white-sounding names received about 3.2 percentage points more callbacks than black-sounding names (roughly 9.7% vs. 6.4%), with all other resume characteristics balanced by the random assignment of names. The callback rate is low for both groups, which suggests that other factors, such as education or years of experience, may be more important.

To amend the analysis, a linear regression or, since callback is a binary outcome, a logistic regression could be used to measure the effect size of the different factors in callback success.
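As a first step in that direction, the race effect can be expressed on the scale a logistic regression would report. For a logistic regression with race as the only predictor, the fitted coefficient equals the log odds ratio of the 2x2 table, and its Wald standard error is the square root of the summed reciprocal cell counts. The sketch below computes both from the crosstab; comparing race against education, experience, and the other columns would require fitting a full model with a statistics library, which is not shown here.

```python
import math

# Counts from the crosstab: (no callback, callback) per group.
black_no, black_yes = 2278, 157
white_no, white_yes = 2200, 235

# Odds of a callback in each group and their ratio (white relative to black).
odds_white = white_yes / white_no
odds_black = black_yes / black_no
odds_ratio = odds_white / odds_black

# Log odds ratio and its Wald standard error; in a race-only logistic
# regression these correspond to the coefficient and its standard error.
log_or = math.log(odds_ratio)
se_log_or = math.sqrt(1 / black_no + 1 / black_yes + 1 / white_no + 1 / white_yes)

# 95% confidence interval, computed on the log scale and transformed back.
ci_low = math.exp(log_or - 1.96 * se_log_or)
ci_high = math.exp(log_or + 1.96 * se_log_or)
print("odds ratio =", odds_ratio)
print("95% CI =", (ci_low, ci_high))
```

An odds ratio above 1 with an interval that excludes 1 tells the same story as the tests above, and it gives a baseline to compare against the coefficients of other factors once they are added to the model.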