# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution
</div>
****

In [1]:
#Importing necessary packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

In [2]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

Let's examine the data to understand if there's a difference between the number of callbacks for different races:

In [22]:
#Counting the number of callbacks per race
[sum(data[data.race=='w'].call), sum(data[data.race=='b'].call)]

[235.0, 157.0]

There is, indeed, a substantial difference in number of callbacks for different races. Let's answer the questions from the top of the page.

One suitable test for this problem is the chi-square test of independence, which quantifies how far the observed counts differ from the counts we would expect if race and callbacks were unrelated. Deviations larger than sampling variation (chance) alone would produce are evidence for the alternative hypothesis. This is distinct from the chi-square goodness-of-fit test, which compares a single categorical variable against a hypothesized distribution; here we compare two categorical variables, race and callback.

The chi-square test can be used only if the following conditions are met:

- Sampled observations must be independent
- Random sampling/assignment should be used
- If sampling without replacement, the sample should make up less than 10% of the population
- Each case should contribute to only one cell in the contingency table
- Each particular scenario (i.e. cell) should have at least 5 expected cases

According to the description of the data, the race category was assigned randomly to the resumes, so the random-assignment and independence conditions are satisfied.

The null and alternative hypotheses, where $p_B$ and $p_W$ are the callback rates for black-sounding and white-sounding names:

$$H_0: p_B = p_W$$
$$H_A: p_B \ne p_W$$

The CLT applies because the observations are independent (race was assigned to each resume at random) and each group is large, with far more than 10 callbacks and 10 non-callbacks, so the sampling distribution of each sample proportion is approximately normal.
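One common rule of thumb for the normal approximation is the success-failure condition: every group should contain at least 10 callbacks and at least 10 non-callbacks. The sketch below checks it with the counts reported in this notebook's contingency table; the threshold of 10 is a convention assumed here, not something stated in the exercise.

```python
# Callback and non-callback counts per group, as reported in this notebook.
counts = {
    "w": {"call": 235, "no_call": 2200},
    "b": {"call": 157, "no_call": 2278},
}

MIN_COUNT = 10  # conventional threshold (an assumption, not from the source)

# The condition holds only if every cell reaches the threshold.
success_failure_ok = all(
    cell >= MIN_COUNT
    for group in counts.values()
    for cell in group.values()
)
print("Success-failure condition met:", success_failure_ok)
```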

The contingency table for the test looks as follows:

In [25]:
cont_table = pd.crosstab(index=data.call, columns=data.race)
cont_table

| call \ race | b | w |
|---|---|---|
| 0.0 | 2278 | 2200 |
| 1.0 | 157 | 235 |
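The expected-count condition can be checked directly, because `stats.chi2_contingency` also returns the counts expected under independence. This is a minimal sketch that re-enters the observed counts from the table above by hand so that it runs on its own; with the notebook's data loaded, `cont_table` could be passed instead.

```python
import numpy as np
from scipy import stats

# Observed counts (rows: call = 0, 1; columns: race = b, w).
observed = np.array([[2278, 2200],
                     [157, 235]])

# The fourth return value holds the expected counts under independence.
_, _, dof, expected = stats.chi2_contingency(observed)
all_cells_ok = bool((expected >= 5).all())

print(expected)
print("All expected counts >= 5:", all_cells_ok)
```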


We can see that the conditions for the Chi-square test stated above are met. Let's do the Chi-square test:

In [26]:
chi2, pval, _, _ = stats.chi2_contingency(cont_table)
print("Chi-squared test statistic: {}".format(chi2))
print("p-value: {}".format(pval))

Chi-squared test statistic: 16.44902858418937
p-value: 4.997578389963255e-05


Since the p-value is very small, we reject the null hypothesis that race and callback rate are independent.

### Two-sample z-test

We can also use a two-proportion z-test, since both categorical variables have two categories. The chi-square test can determine whether race and callbacks are associated, but it does not quantify the size of the difference between the two callback rates. The z-test framework does, because it also yields a confidence interval for that difference.

The formula for the two-sample $z$-test:

$$z = \frac{\left( \hat{p}_W - \hat{p}_B \right) - 0}{\sqrt{\hat{p} (1 - \hat{p}) \left( \frac{1}{n_W} + \frac{1}{n_B}\right)}}$$

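where $\hat{p}$ is the pooled callback proportion, with $y$ the number of callbacks and $n$ the number of resumes in each group: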

$$\hat{p} = \frac{y_W + y_B}{n_W + n_B}$$

Let's do the two-sample z-test:

In [23]:
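# Split the data by the race assigned to each resume
w = data[data.race=='w']
b = data[data.race=='b']

n_w = len(w)
n_b = len(b)

# Sample callback proportions for each group
prop_w = np.sum(w.call) / n_w
prop_b = np.sum(b.call) / n_b

prop_diff = prop_w - prop_b
# Pooled proportion under the null hypothesis
phat = (np.sum(w.call) + np.sum(b.call)) / (n_w + n_b)

z = prop_diff / np.sqrt(phat * (1 - phat) * ((1 / n_w) + (1 / n_b)))
# Two-sided p-value; abs() keeps it valid when z is negative
pval = stats.norm.cdf(-abs(z)) * 2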
print("Z score: {}".format(z))
print("P-value: {}".format(pval))

Z score: 4.108412152434346
P-value: 3.983886837585077e-05
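The z score squared (about 16.88) is slightly larger than the chi-square statistic reported earlier (about 16.45), and the two p-values differ as well. The reason is that `stats.chi2_contingency` applies Yates' continuity correction to 2x2 tables by default. The sketch below, using the counts from the contingency table, shows that with `correction=False` the chi-square statistic matches $z^2$ and the two tests give the same p-value.

```python
import numpy as np
from scipy import stats

# Observed counts (rows: call = 0, 1; columns: race = b, w).
observed = np.array([[2278, 2200],
                     [157, 235]])

# Default behaviour: Yates' continuity correction is applied for 2x2 tables.
chi2_corrected, _, _, _ = stats.chi2_contingency(observed)
# Plain Pearson chi-square, without the correction.
chi2_plain, pval_plain, _, _ = stats.chi2_contingency(observed, correction=False)

# Pooled two-proportion z statistic computed from the same counts.
n_b, n_w = observed.sum(axis=0)
y_b, y_w = observed[1]
p_pool = (y_b + y_w) / (n_b + n_w)
z = (y_w / n_w - y_b / n_b) / np.sqrt(p_pool * (1 - p_pool) * (1 / n_w + 1 / n_b))
pval_z = 2 * stats.norm.sf(abs(z))

print("Chi-square with correction:   ", chi2_corrected)
print("Chi-square without correction:", chi2_plain)
print("z squared:                    ", z ** 2)
```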


In [27]:
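# 1.96 is the critical z value for a 95% confidence level;
# the standard error used here is the pooled one from the z-test above
moe = 1.96 * np.sqrt(phat * (1 - phat) * ((1 / n_w) + (1 / n_b)))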
ci = prop_diff + np.array([-1, 1]) * moe
print("Margin of Error: {}".format(moe))
print("Confidence interval: {}".format(ci))

Margin of Error: 0.015281912310894095
Confidence interval: [ 0.01675094  0.04731477]
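The interval above reuses the pooled standard error from the hypothesis test. A confidence interval for a difference in proportions is more conventionally built from the unpooled standard error, because the interval does not assume $p_B = p_W$. The following sketch computes that variant from the counts in the contingency table; the variable names with the `_unpooled` suffix are introduced here for illustration. With groups this large the two versions should be very close.

```python
import numpy as np
from scipy import stats

# Callbacks and group sizes, taken from this notebook's contingency table.
y_w, n_w = 235, 2435
y_b, n_b = 157, 2435

p_w, p_b = y_w / n_w, y_b / n_b
diff = p_w - p_b

# Unpooled standard error: each group keeps its own variance estimate.
se_unpooled = np.sqrt(p_w * (1 - p_w) / n_w + p_b * (1 - p_b) / n_b)

z_crit = stats.norm.ppf(0.975)  # critical value for a 95% interval
moe_unpooled = z_crit * se_unpooled
ci_unpooled = diff + np.array([-1, 1]) * moe_unpooled

print("Margin of error:", moe_unpooled)
print("95% confidence interval:", ci_unpooled)
```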


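Again, the p-value is very small, so we reject the null hypothesis that white-sounding and black-sounding names have the same callback rate. The 95% confidence interval for the difference, roughly 1.7 to 4.7 percentage points, does not contain 0, which leads to the same conclusion. In context: resumes with white-sounding names received callbacks about 9.7% of the time (235 of 2435), compared with about 6.4% (157 of 2435) for otherwise identical resumes with black-sounding names, and a gap this large is very unlikely to arise by chance alone.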
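The exercise also asks for a resampling approach. The sketch below is one possible version, not the only one: a permutation test for the p-value and a percentile bootstrap for the confidence interval. It rebuilds the 0/1 outcomes from the counts in the contingency table so that it runs on its own; the seed and the number of resamples are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the sketch is reproducible

# Rebuild the 0/1 callback outcomes from the contingency-table counts.
calls_w = np.concatenate([np.ones(235), np.zeros(2200)])
calls_b = np.concatenate([np.ones(157), np.zeros(2278)])
observed_diff = calls_w.mean() - calls_b.mean()

n_reps = 10_000  # number of resamples (an arbitrary choice)

# Permutation test: shuffling the pooled outcomes simulates the null
# hypothesis that the race label has no effect on callbacks.
pooled = np.concatenate([calls_w, calls_b])
perm_diffs = np.empty(n_reps)
for i in range(n_reps):
    shuffled = rng.permutation(pooled)
    perm_diffs[i] = shuffled[:len(calls_w)].mean() - shuffled[len(calls_w):].mean()
p_value = np.mean(np.abs(perm_diffs) >= abs(observed_diff))

# Bootstrap: resample each group with replacement, then take percentiles.
boot_diffs = np.empty(n_reps)
for i in range(n_reps):
    boot_w = rng.choice(calls_w, size=len(calls_w), replace=True)
    boot_b = rng.choice(calls_b, size=len(calls_b), replace=True)
    boot_diffs[i] = boot_w.mean() - boot_b.mean()
ci_low, ci_high = np.percentile(boot_diffs, [2.5, 97.5])

print("Observed difference:", observed_diff)
print("Permutation p-value:", p_value)
print("Bootstrap 95% CI:", (ci_low, ci_high))
```

A permutation p-value of 0 should be read as "smaller than 1 / n_reps", not as exactly zero.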

### Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

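The tests above show that callback rates differ by race, but they do not show that race/name is the most important factor in callback success. Because race was assigned to resumes at random, the difference is unlikely to be explained by other resume characteristics. However, the analysis looked at race alone and says nothing about how its effect compares with the effects of the many other variables in the dataset. To compare them, we would fit a regression model that includes those variables alongside race (for example, logistic regression, since the outcome is binary) and compare the estimated effects.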
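As a rough illustration of that idea, the sketch below fits a linear probability model (ordinary least squares on a 0/1 outcome, a simple stand-in for logistic regression) with race and one additional covariate. It runs on SYNTHETIC data: the covariate `years_exp` and all effect sizes are hypothetical and are not taken from the experiment's dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# SYNTHETIC data for illustration only; `years_exp` is a hypothetical covariate.
white = rng.integers(0, 2, size=n)       # 1 = white-sounding name
years_exp = rng.integers(0, 11, size=n)  # hypothetical years of experience, 0 to 10
p_call = 0.03 + 0.03 * white + 0.01 * years_exp  # made-up callback probabilities
call = rng.binomial(1, p_call)

# Linear probability model: call ~ intercept + white + years_exp.
X = np.column_stack([np.ones(n), white, years_exp])
coef, *_ = np.linalg.lstsq(X, call, rcond=None)

for name, value in zip(["intercept", "white", "years_exp"], coef):
    print(f"{name}: {value:.4f}")
```

Each coefficient estimates the change in callback probability associated with that variable while holding the other fixed, which is what allows the size of the race effect to be compared with the effects of other resume characteristics.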