# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution



<div class="span5 alert alert-success">
<p>$H_{0}: p_{W} - p_{B} = 0$

$H_{A}: p_{W} - p_{B} > 0$


$H_{0}$: The proportion of callbacks for candidates with white-sounding names is equal to the proportion of callbacks for candidates with black-sounding names.

$H_{A}$: The proportion of callbacks for candidates with white-sounding names is greater than the proportion of callbacks for candidates with black-sounding names.

Significance level: $\alpha = 0.05$.

</p>
</div>
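
A two-sample z-test for proportions relies on the CLT, and one common rule of thumb is that each group should contain at least 10 successes and 10 failures. The sketch below illustrates that check; the helper name `normal_approx_ok`, the threshold, and the counts are assumptions for illustration and should be replaced by the values computed from `data`.

```python
def normal_approx_ok(successes, n, threshold=10):
    """Rule-of-thumb check: at least `threshold` successes and failures."""
    failures = n - successes
    return successes >= threshold and failures >= threshold


# Assumed counts (callbacks, resumes) per group, for illustration only;
# replace with the values computed from `data`.
groups = {"w": (235, 2435), "b": (157, 2435)}
checks = {race: normal_approx_ok(calls, n) for race, (calls, n) in groups.items()}
print(checks)
```

If both groups pass, treating the difference in sample proportions as approximately normal is reasonable.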



In [80]:
import pandas as pd
import numpy as np
from scipy import stats
import math

In [81]:
data = pd.io.stata.read_stata('data/us_job_market_discrimination.dta')

In [82]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

In [83]:
#2 sample proportion z test 

In [84]:
#black 
data_b = data[data.race=='b'].call
n_b = len(data[data.race=='b'])
n_b_call = sum(data[data.race=='b'].call)
p_b = n_b_call/n_b
sigma_b = p_b * (1-p_b)  # Bernoulli variance p(1-p); the standard deviation is its square root

In [85]:
#white
data_w = data[data.race=='w'].call
n_w = len(data[data.race=='w'])
n_w_call = sum(data[data.race=='w'].call)
p_w = n_w_call/n_w
sigma_w = p_w * (1-p_w)  # Bernoulli variance p(1-p); the standard deviation is its square root

In [86]:
sigma =  math.sqrt((p_b * (1-p_b))/n_b +(p_w * (1-p_w))/n_w)
sigma

0.0077833705866767544

In [94]:
import scipy.stats as st
# margin of error using the one-sided 95% critical value (z is about 1.645)
sigma*st.norm.ppf(.95)

0.012802505339402668
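
The margin above uses the 0.95 quantile, which corresponds to a one-sided bound. A conventional two-sided 95% confidence interval for $p_{W} - p_{B}$ would use the 0.975 quantile instead. The following is a minimal sketch of that calculation, not the notebook's own code: `diff_proportion_ci` is a hypothetical helper, `statistics.NormalDist().inv_cdf` is used as a standard-library stand-in for `st.norm.ppf`, and the counts are assumed values to be replaced by `n_w_call`, `n_w`, `n_b_call`, and `n_b`.

```python
import math
from statistics import NormalDist


def diff_proportion_ci(x_w, n_w, x_b, n_b, confidence=0.95):
    """Two-sided CI for p_w - p_b using the unpooled standard error."""
    p_w, p_b = x_w / n_w, x_b / n_b
    se = math.sqrt(p_w * (1 - p_w) / n_w + p_b * (1 - p_b) / n_b)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * se
    diff = p_w - p_b
    return margin, (diff - margin, diff + margin)


# Assumed counts for illustration; replace with values computed from `data`.
margin, (low, high) = diff_proportion_ci(235, 2435, 157, 2435)
print(margin, low, high)
```

An interval that excludes zero is consistent with rejecting $H_{0}$ at the matching significance level.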

In [88]:
p = (p_b * n_b + p_w * n_w) / (n_w + n_b)

In [89]:
SE = math.sqrt( p * ( 1 - p ) * ( (1/n_w) + (1/n_b) ) )

In [90]:
z = (p_w - p_b) / SE
z

4.108412152434346
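
For reference, the pooled proportion, standard error, and test statistic computed in cells 88 to 90 can be written as follows. Here $x_{W}$ and $x_{B}$ are notation introduced for the callback counts (`n_w_call`, `n_b_call`), and $\hat{p}_{W}$, $\hat{p}_{B}$ are the sample proportions (`p_w`, `p_b`):

$$\hat{p} = \frac{x_{W} + x_{B}}{n_{W} + n_{B}}, \qquad SE = \sqrt{\hat{p}\,(1-\hat{p})\left(\frac{1}{n_{W}} + \frac{1}{n_{B}}\right)}, \qquad z = \frac{\hat{p}_{W} - \hat{p}_{B}}{SE}$$

The pooled estimate $\hat{p}$ is used because, under $H_{0}$, both groups share a single callback probability.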

In [91]:
2*(1-st.norm.cdf(z))

3.983886837577444e-05
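
The cell above doubles the upper-tail probability, which gives a two-sided p-value, whereas $H_{A}$ as stated is one-sided. The short sketch below shows both values side by side; it reuses the z statistic printed above and uses `statistics.NormalDist` as a standard-library stand-in for `st.norm`.

```python
from statistics import NormalDist

z = 4.108412152434346  # z statistic reported in the notebook output

upper_tail = 1 - NormalDist().cdf(z)
p_one_sided = upper_tail      # matches H_A: p_W - p_B > 0
p_two_sided = 2 * upper_tail  # matches the two-sided value computed above
print(p_one_sided, p_two_sided)
```

Either way, the p-value is far below 0.05, so the choice does not change the conclusion here.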

Since the p-value (about 4e-05 two-sided as computed above; the one-sided value for $H_{A}$ is half of that) is less than the significance level (0.05), the null hypothesis is rejected.
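
The exercise also asks for a resampling approach. One option, sketched here under stated assumptions, is a permutation (label-shuffling) test, which is a close relative of the bootstrap rather than a bootstrap itself: under $H_{0}$ the race labels are exchangeable, so they can be reassigned at random and the difference in callback rates recomputed. The helper `permutation_p_value` is hypothetical, it uses only the standard library `random` module, and the counts are assumed values to be replaced by those computed from `data`.

```python
import random


def permutation_p_value(x_w, n_w, x_b, n_b, n_reps=500, seed=0):
    """One-sided permutation p-value for the difference p_w - p_b."""
    rng = random.Random(seed)
    total, total_calls = n_w + n_b, x_w + x_b
    observed = x_w / n_w - x_b / n_b
    extreme = 0
    for _ in range(n_reps):
        # Indices below total_calls stand for resumes that got a callback;
        # drawing n_w indices at random relabels those resumes as "white".
        white = rng.sample(range(total), n_w)
        calls_w = sum(1 for i in white if i < total_calls)
        calls_b = total_calls - calls_w
        if calls_w / n_w - calls_b / n_b >= observed:
            extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (extreme + 1) / (n_reps + 1)


# Assumed counts for illustration; replace with values computed from `data`.
p_perm = permutation_p_value(235, 2435, 157, 2435)
print(p_perm)
```

With only a few hundred replicates the estimate cannot resolve very small p-values, so increasing `n_reps` gives a finer answer at the cost of run time.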


<div class="span5 alert alert-success">
<p>The callback rates in this sample strongly suggest that the race implied by an applicant's name is an important factor in the callback rate. However, we cannot conclude that it is THE MOST IMPORTANT factor. Many other factors in this dataset (and in general) could influence a hiring manager's decision to call back, and this analysis does not compare their effects with the effect of race. It is also possible that another independent variable (for example, experience or education) is related to the applicant's race and in turn affects the callback rate.</p>
</div>