# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a résumé. The 'race' column has two values, 'b' and 'w', indicating a black-sounding or white-sounding name, respectively. The 'call' column has two values, 1 and 0, indicating whether or not the résumé received a callback from an employer.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

In [18]:
import pandas as pd
import numpy as np
from scipy import stats
from beakerx import *

In [19]:
# load the Stata file through the public pandas reader
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [20]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

In [21]:
# number of callbacks for black-sounding names
sum(data[data.race=='b'].call)

157.0

In [22]:
data.head()

In [23]:
len(data[data.race=='b']) + len(data[data.race=='w'])

4870

## 1. What test is appropriate for this problem? Does CLT apply?


### ANSWER:


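A __two-sample z-test__ for proportions is appropriate because the goal is to determine whether two population proportions (the callback rates) are equal. Since `call` is binary, each group's mean is simply its callback rate.


The __CLT applies__ because the sample is __large (n = 4870)__, each group has well over 10 callbacks and 10 non-callbacks (235 and 157 callbacks, respectively), and observations are __independent__, since names were assigned to résumés at random.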
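
The large-sample condition can be checked directly. The sketch below hard-codes the callback counts from the notebook output; the group sizes of 2435 each are an inference from the 4870 total and the callback rates computed in section 3, and the helper name `normal_approx_ok` and the threshold of 10 are illustrative choices, not part of the original analysis.

```python
# Callback counts come from the notebook output (235 and 157).
# Group sizes of 2435 each are inferred, not read from the data (assumption).
groups = {
    "w": {"n": 2435, "callbacks": 235},
    "b": {"n": 2435, "callbacks": 157},
}

def normal_approx_ok(n, successes, threshold=10):
    """Return True if both successes and failures reach the threshold."""
    failures = n - successes
    return successes >= threshold and failures >= threshold

# Apply the rule of thumb to each group separately.
checks = {race: normal_approx_ok(g["n"], g["callbacks"]) for race, g in groups.items()}
print(checks)
```

Both groups pass this rule of thumb comfortably, which supports using the normal approximation.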

## 2. What are the null and alternate hypotheses?

### ANSWER:

__Null Hypothesis (__$H_0$__):__ The callback rate for black-sounding names $p_b$ equals the callback rate for white-sounding names $p_w$ ($p_w = p_b$)

__Alternative Hypothesis (__$H_A$__):__ The callback rates for black-sounding and white-sounding names differ ($p_w \neq p_b$)
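
Under $H_0$ both groups share a single callback rate, so the test statistic pools the two samples. Written out, with $x_w$ and $x_b$ the callback counts, $n_w$ and $n_b$ the group sizes, and $\hat{p}$ the pooled rate (this notation is chosen here for illustration):

$$
\hat{p} = \frac{x_w + x_b}{n_w + n_b}, \qquad
z = \frac{\hat{p}_w - \hat{p}_b}{\sqrt{\hat{p}\,(1-\hat{p})\left(\frac{1}{n_w} + \frac{1}{n_b}\right)}}
$$

A large $|z|$ indicates that the observed gap would be unlikely if the two rates were truly equal.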

## 3. Compute margin of error, confidence interval, and p-value.

In [34]:
from scipy import stats

w = data[data.race=='w']
b = data[data.race=='b']

n_w = len(w)
n_b = len(b)

prob_w = np.sum(w.call) / len(w)
prob_b = np.sum(b.call) / len(b)

print('probability of receiving calls for whites : ', prob_w)
print('probability of receiving calls for blacks : ', prob_b)

probability of receiving calls for whites :  0.09650924024640657
probability of receiving calls for blacks :  0.06447638603696099


In [35]:
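prob_diff = prob_w - prob_b
# pooled callback rate under the null hypothesis
phat = (np.sum(w.call) + np.sum(b.call)) / (n_w + n_b)

# z statistic using the pooled standard error
z = prob_diff / np.sqrt(phat * (1 - phat) * ((1 / n_w) + (1 / n_b)))
# two-sided p-value; abs() keeps it valid if z is negative
pval = stats.norm.cdf(-abs(z)) * 2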
print("Z score: {}".format(z))
print("P-value: {}".format(pval))

Z score: 4.108412152434346
P-value: 3.983886837585077e-05


In [36]:
# 95% margin of error; note that this reuses the pooled standard error from the test
moe = 1.96 * np.sqrt(phat * (1 - phat) * ((1 / n_w) + (1 / n_b)))
ci = prob_diff + np.array([-1, 1]) * moe
print("Margin of Error: {}".format(moe))
print("Confidence interval: {}".format(ci))

Margin of Error: 0.015281912310894095
Confidence interval: [0.01675094 0.04731477]
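
The interval above is built from the pooled standard error, which assumes the two rates are equal. A confidence interval for the difference is more commonly built from the unpooled standard error, where each group keeps its own variance estimate. The sketch below shows that variant as a cross-check, not as the notebook's method; it hard-codes the reported callback counts, and the group sizes of 2435 each are inferred from the total and the printed rates.

```python
import math

# Callback counts from the notebook output; group sizes are inferred (assumption).
n_w, n_b = 2435, 2435
x_w, x_b = 235, 157

p_w = x_w / n_w
p_b = x_b / n_b
diff = p_w - p_b

# Unpooled standard error: each group contributes its own variance term.
se_unpooled = math.sqrt(p_w * (1 - p_w) / n_w + p_b * (1 - p_b) / n_b)

z_crit = 1.96  # approximate two-sided 95% critical value
moe_unpooled = z_crit * se_unpooled
ci_unpooled = (diff - moe_unpooled, diff + moe_unpooled)

print("Margin of error (unpooled):", moe_unpooled)
print("95% CI (unpooled):", ci_unpooled)
```

With samples this large and rates this close together, the unpooled interval should land very near the pooled one and still exclude zero.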


The p-value (about $4 \times 10^{-5}$) is far below 0.05. __So the null hypothesis can be rejected.__ The two populations do not appear to have the same callback rate, and the 95% confidence interval for the difference, roughly [0.017, 0.047], excludes zero.
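
Because names were assigned to résumés at random, a permutation test offers a check that does not rely on the normal approximation. This is a different technique from the z-test used here, sketched under assumptions: the `call` column is rebuilt from the reported counts, the group sizes of 2435 each are inferred, and the seed and number of permutations are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Rebuild the call column from the reported counts (group sizes are inferred).
n_w, n_b = 2435, 2435
x_w, x_b = 235, 157
calls = np.concatenate([np.ones(x_w + x_b), np.zeros(n_w + n_b - x_w - x_b)])
observed = x_w / n_w - x_b / n_b

n_perm = 2000
perm_diffs = np.empty(n_perm)
for i in range(n_perm):
    # Shuffle the outcomes, then split them into two pseudo-groups.
    shuffled = rng.permutation(calls)
    perm_diffs[i] = shuffled[:n_w].mean() - shuffled[n_w:].mean()

# Two-sided p-value: share of shuffles at least as extreme as the observed gap.
p_perm = np.mean(np.abs(perm_diffs) >= abs(observed))
print("Observed difference:", observed)
print("Permutation p-value:", p_perm)
```

A result of 0 here only means that no shuffle matched the observed gap, so the p-value is below roughly 1/`n_perm`; more permutations would be needed to resolve it further.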


## 4. Write a story describing the statistical significance in the context of the original problem.


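Résumés with white-sounding names received callbacks about 9.7% of the time, compared with about 6.4% for résumés with black-sounding names. A gap this large would be very unlikely if employers treated the two groups the same, so the data provide strong evidence of racial discrimination in callbacks. Our analysis does not necessarily mean, however, that race (as signaled by the name) is the most important factor in callback success.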

In [37]:
cont_table = pd.crosstab(index=data.call, columns=data.race)
chi2, pval, _, _ = stats.chi2_contingency(cont_table)
print("Chi-squared test statistic: {}".format(chi2))
print("p-value: {}".format(pval))

Chi-squared test statistic: 16.44902858418937
p-value: 4.997578389963255e-05


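The __chi-square__ test leads to the same conclusion. Its p-value differs slightly from the z-test's because `chi2_contingency` applies a continuity correction to 2x2 tables by default.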
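
The link between the two tests can be made explicit: for a 2x2 table, the chi-square statistic without continuity correction equals the square of the pooled z-score. The sketch below rebuilds the contingency table from the reported counts (the group sizes of 2435 each are inferred) and compares the corrected and uncorrected statistics with the z-score printed in section 3.

```python
import numpy as np
from scipy import stats

# 2x2 table rebuilt from the reported counts (group sizes are inferred).
# Rows: call = 0, 1; columns: race = b, w.
table = np.array([[2435 - 157, 2435 - 235],
                  [157, 235]])

# Default behaviour: Yates' continuity correction is applied to 2x2 tables.
chi2_corrected, p_corrected, _, _ = stats.chi2_contingency(table)
# Same test with the correction switched off.
chi2_plain, p_plain, _, _ = stats.chi2_contingency(table, correction=False)

z = 4.108412152434346  # z-score printed in section 3
print("With continuity correction:", chi2_corrected, p_corrected)
print("Without correction:", chi2_plain, p_plain)
print("z squared:", z ** 2)
```

Without the correction, the chi-square statistic and p-value line up with the z-test, which shows that the two tests are equivalent for this problem.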

## 5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?




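Although the test shows a statistically significant difference in callback rates, it looks at race alone and says nothing about how large this effect is relative to __other factors/variables__, such as education and experience, that may also influence callbacks. One way to amend the analysis would be to model callbacks jointly on race and the other résumé attributes, for example with logistic regression, and compare the sizes of the effects.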