# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical resumes to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your GitHub account**.

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution

In [27]:
import pandas as pd
import numpy as np
from scipy import stats
from statsmodels.stats import weightstats as ws # to apply z-test

In [2]:
data = pd.io.stata.read_stata('data/us_job_market_discrimination.dta')

In [3]:
data.head()

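| | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |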


### 1) What test is appropriate for this problem? Does Central Limit Theorem (CLT) apply?

In [62]:
w = data[data.race=='w']['call']
b = data[data.race=='b']['call']

In [66]:
print('Number of resumes with black-sounding names: {}'.format(len(b)) )
print('Number of resumes with white-sounding names: {}'.format(len(w)) )

Number of resumes with black-sounding names: 2435
Number of resumes with white-sounding names: 2435


In [73]:
print('Ratio of callbacks for black-sounding names: {:.4f}'.format(sum(b)/len(b)) )
print('Ratio of callbacks for white-sounding names: {:.4f}'.format(sum(w)/len(w)) )

Ratio of callbacks for black-sounding names: 0.0645
Ratio of callbacks for white-sounding names: 0.0965


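Both groups contain 2,435 resumes, far more than the usual n > 30 rule of thumb, and each group has well over 10 callbacks and 10 non-callbacks (roughly 157 callbacks for black-sounding names and 235 for white-sounding names). If we also assume that the callback for one resume does not affect the callback for another, the observations are independent and the Central Limit Theorem applies to the sample proportions. Because the outcome is binary, we are comparing two proportions. Let's use a permutation test to see whether a black-sounding or white-sounding name makes a difference in whether the applicant gets a callback.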
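As a quick check of the large-sample condition, the sketch below applies a common rule of thumb: each group should have at least 10 expected successes and 10 expected failures. The helper name `clt_ok` and the threshold of 10 are illustrative assumptions, and the rates are the rounded values printed above.

```python
def clt_ok(n, p_hat, min_count=10):
    """Rule-of-thumb check: at least min_count expected successes and failures."""
    return n * p_hat >= min_count and n * (1 - p_hat) >= min_count

# Sample sizes and rounded callback rates taken from the output above
n_b, p_b = 2435, 0.0645
n_w, p_w = 2435, 0.0965

print(clt_ok(n_b, p_b), clt_ok(n_w, p_w))
```

A very small or very lopsided sample, for example 20 resumes with a 5% callback rate, would fail this check.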

### 2) What are the null and alternate hypotheses?

The null hypothesis is that the name on the resume (black-sounding or white-sounding) has no effect on the callback rate. The alternate hypothesis is that the callback rate differs between the two groups.
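In symbols, writing $p_w$ and $p_b$ for the callback probabilities of resumes with white-sounding and black-sounding names (notation introduced here for illustration), the two hypotheses can be stated as:

\begin{align}
H_0 &: p_w - p_b = 0 \\
H_A &: p_w - p_b \neq 0
\end{align}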

### 3) Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.

#### Bootstrap approach:
First, let's try the resampling approach, using a permutation test to calculate the p-value and the percentiles of the permutation replicates to get a 95% confidence interval and margin of error.

In [95]:
def permutation_sample(data1, data2):
    """Generate a permutation sample from two data sets."""

    # Concatenate the data sets: data
    data = np.concatenate((data1, data2))

    # Permute the concatenated array: permuted_data
    permuted_data = np.random.permutation(data)

    # Split the permuted array into two: perm_sample_1, perm_sample_2
    perm_sample_1 = permuted_data[:len(data1)]
    perm_sample_2 = permuted_data[len(data1):]

    return perm_sample_1, perm_sample_2

def draw_perm_reps(data_1, data_2, func, size=1):
    """Generate multiple permutation replicates."""

    # Initialize array of replicates: perm_replicates
    perm_replicates = np.empty(size)

    for i in range(size):
        # Generate permutation sample
        perm_sample_1, perm_sample_2 = permutation_sample(data_1, data_2)

        # Compute the test statistic
        perm_replicates[i] = func(perm_sample_1, perm_sample_2)

    return perm_replicates

In [98]:
def frac_black_cbs(w, b):
    """Compute fraction of black-sounding names that received a callback."""
    frac = np.sum(b) / len(b)
    return frac


# Acquire permutation replicates: perm_reps
perm_reps = draw_perm_reps(w, b, frac_black_cbs, 10000)

# Compute and print p-value: fraction of replicates at or below the observed rate
bs_p = np.sum(perm_reps <= sum(b)/len(b)) / len(perm_reps)
print('permutation test p-value = {}'.format(bs_p))

# Compute and print 95% confidence interval and margin of error
# (percentiles of the permutation replicates, i.e. of the distribution under the null)
bs_conf_int = np.percentile(perm_reps, [2.5, 97.5])
bs_moe = (bs_conf_int[1] - bs_conf_int[0])/2
print('95% CI (black-sounding names): [{:.4f}, {:.4f}]'.format(bs_conf_int[0], bs_conf_int[1]))
print('Margin of Error: {:.4f}'.format(bs_moe))

permutation test p-value = 0.0001
95% CI (black-sounding names): [0.0731, 0.0883]
Margin of Error: 0.0078


Let's also calculate the confidence interval for the callback ratio of resumes with white-sounding names.

In [100]:
def frac_white_cbs(w, b):
    """Compute fraction of white-sounding names that received a callback."""
    frac = np.sum(w) / len(w)
    return frac

perm_reps_w = draw_perm_reps(w, b, frac_white_cbs, 10000)


bs_conf_int_w = np.percentile(perm_reps_w, [2.5, 97.5])
bs_moe_w = (bs_conf_int_w[1] - bs_conf_int_w[0])/2


print('95% CI (white-sounding names): [{:.4f}, {:.4f}]'.format(bs_conf_int_w[0], bs_conf_int_w[1]))
print('Margin of Error: {:.4f}'.format(bs_moe_w))

95% CI (white-sounding names): [0.0727, 0.0883]
Margin of Error: 0.0078
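The percentile intervals above come from permutation replicates, so they describe the callback rate expected when names have no effect. For comparison, a bootstrap interval for the observed difference in callback rates could be sketched as below. This is a minimal sketch, not the method used in this notebook: the callback counts (235 and 157) are reconstructed from the rounded rates printed earlier, and the helper name `bootstrap_diff_ci`, the seed, and the number of replicates are arbitrary assumptions.

```python
import numpy as np

def bootstrap_diff_ci(x, y, n_reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(x) - mean(y), resampling each group with replacement."""
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_reps)
    for i in range(n_reps):
        # Resample each group independently, keeping the original group sizes
        x_star = rng.choice(x, size=len(x), replace=True)
        y_star = rng.choice(y, size=len(y), replace=True)
        diffs[i] = x_star.mean() - y_star.mean()
    lo, hi = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Synthetic 0/1 arrays standing in for the 'call' column
# (assumed counts: 235 and 157 callbacks out of 2435 resumes each)
w_calls = np.array([1] * 235 + [0] * (2435 - 235))
b_calls = np.array([1] * 157 + [0] * (2435 - 157))

lo, hi = bootstrap_diff_ci(w_calls, b_calls)
print('bootstrap 95% CI for difference: [{:.4f}, {:.4f}]'.format(lo, hi))
```

Unlike the permutation replicates, these bootstrap replicates are centered on the observed difference, so the interval estimates the size of the effect instead of the spread under the null hypothesis.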


#### Frequentist approach:

First, we'll try a two-sample t-test.

In [74]:
# Two-sample t-test on the 0/1 callback indicators; element [1] is the p-value
twosample_results = stats.ttest_ind(w, b)
tt_p = twosample_results[1]
print('t-test p-value: {}'.format(tt_p))

t-test p-value: 3.940802103128886e-05


Second, we'll try a two-sample z-test.

In [78]:
zt_p = ws.ztest(x1=w, x2=b)[1]
print('z-test p-value: {}'.format(zt_p))

z-test p-value: 3.8767429116085706e-05
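Because the outcome is binary, the z-test can also be written directly in terms of proportions. The sketch below uses the pooled standard error under the null hypothesis and only the standard library. The callback counts (235 and 157) are reconstructed from the rounded rates printed earlier, and the function name `two_proportion_ztest` is an illustrative assumption. Its p-value should be close to, but not identical to, the `ws.ztest` result, since the two approaches estimate the variance differently.

```python
import math

def two_proportion_ztest(successes1, n1, successes2, n2):
    """Two-sided z-test for equality of two proportions using the pooled standard error."""
    p1, p2 = successes1 / n1, successes2 / n2
    # Pooled proportion: the common callback rate assumed under the null hypothesis
    p_pool = (successes1 + successes2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Assumed counts: 235 white-sounding and 157 black-sounding callbacks out of 2435 each
z, p_value = two_proportion_ztest(235, 2435, 157, 2435)
print('z = {:.3f}, p-value = {:.2e}'.format(z, p_value))
```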


Now let's calculate the 95% confidence interval and margin of error for the difference in the proportions of resumes with white-sounding names that get a callback and resumes with black-sounding names that get a callback.

For large random samples, an approximate $100(1-\alpha)\%$ confidence interval for $p_1 - p_2$, the difference in two population proportions, is:


\begin{align}
(\hat{p}_1-\hat{p}_2)\pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}
\end{align}

After plugging in the sample proportions $\hat{p}_1$ and $\hat{p}_2$ with their corresponding sample sizes $n_1$ and $n_2$, and the z-score for an alpha level of 0.05 ($z_{\alpha/2} \approx 1.96$), we have:

In [81]:
p1 = sum(w)/len(w) # proportion of resumes with white-sounding names that get a callback
p2 = sum(b)/len(b) # proportion of resumes with black-sounding names that get a callback

# Standard error of the difference in proportions (unpooled, as in the formula above)
se = np.sqrt(p1*(1-p1)/len(w) + p2*(1-p2)/len(b))
z_crit = 1.96 # z-score for a two-sided 95% interval

f_conf_int = (p1 - p2 - z_crit*se, p1 - p2 + z_crit*se)
print('95% CI: [{:.4f}, {:.4f}]'.format(f_conf_int[0],f_conf_int[1]))
f_moe = (f_conf_int[1] - f_conf_int[0])/2
print('Margin of Error: {:.4f}'.format(f_moe))

95% CI: [0.0168, 0.0473]
Margin of Error: 0.0153


### 4) Write a story describing the statistical significance in the context of the original problem.

All three tests (permutation, t-test, and z-test) give a p-value far below 0.05, so we can reject the null hypothesis and conclude that the callback rate is affected by whether the name on the resume is white-sounding or black-sounding.

The 95% interval from the permutation test, 7.3% to 8.8%, is the range of callback rates expected for a group of this size if the name had no effect. The observed callback rate for resumes with black-sounding names, 6.45%, falls below this range, which is consistent with the small permutation p-value.

In addition, the frequentist confidence interval shows that we can say with 95% confidence that the callback rate for resumes with white-sounding names is between 1.7 and 4.7 percentage points higher than the rate for resumes with black-sounding names.

### 5) Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

Although this analysis has shown that race/name is an important factor in callback success, it does not show that it is the most important factor. No other columns/features have been analyzed here, so to find the most important factor in the data given, we would want to run the same analysis on all features/columns in the data set and compare the results.
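One way to extend the analysis, sketched under assumptions, is to repeat the callback-rate comparison for every binary feature and compare the resulting gaps. The example below uses a small synthetic data set, not the real resumes: the feature names are borrowed from the columns shown earlier, but the values and the helper `callback_gap` are made up for illustration.

```python
import numpy as np

def callback_gap(feature, call):
    """Callback rate where a binary feature is 1 minus the rate where it is 0."""
    feature = np.asarray(feature)
    call = np.asarray(call)
    return call[feature == 1].mean() - call[feature == 0].mean()

# Synthetic stand-in data (NOT the real resumes): two binary features and a callback flag
rng = np.random.default_rng(0)
n = 5000
features = {
    'honors': rng.integers(0, 2, size=n),
    'volunteer': rng.integers(0, 2, size=n),
}
# Callbacks are generated so that only 'honors' raises the callback probability
call = rng.random(n) < (0.05 + 0.05 * features['honors'])

# Rank features by the absolute size of their callback-rate gap
gaps = {name: callback_gap(values, call) for name, values in features.items()}
for name, gap in sorted(gaps.items(), key=lambda item: -abs(item[1])):
    print('{:<10s} gap in callback rate: {:+.4f}'.format(name, gap))
```

Ranking gaps this way treats each feature in isolation. Features that are correlated with one another would need a joint model to separate their effects.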