
### Examining racial discrimination in the US job market

#### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

#### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes.

#### Exercise
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your GitHub account**.

   1. What test is appropriate for this problem? Does the CLT apply?
   2. What are the null and alternative hypotheses?
   3. Compute the margin of error, confidence interval, and p-value.
   4. Discuss statistical significance.

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

****

In [3]:
import pandas as pd
import numpy as np
from scipy import stats

In [4]:
data = pd.read_stata('data/us_job_market_discrimination.dta')
len(data)

4870

In [5]:
# number of callbacks for black-sounding names
B_callback = sum(data[data.race=='b'].call)
B_total = len(data[data.race=='b'])
print('callback {}'.format(B_callback))
print('resumes sent {}'.format(B_total))

callback 157.0
resumes sent 2435


In [6]:
# number of callbacks for white-sounding names
W_callback = sum(data[data.race=='w'].call)
W_total = len(data[data.race=='w'])
print('callback {}'.format(W_callback))
print(W_total)

callback 235.0
2435
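
Before choosing a test, it helps to check the usual large-sample conditions for the CLT. The sketch below hard-codes the counts printed above and checks that each group has at least 10 callbacks and 10 non-callbacks. The threshold of 10 is a common rule of thumb rather than anything specified by the exercise, and the helper name `clt_conditions_met` is hypothetical.

```python
# Counts copied from the cell outputs above.
b_callback, b_total = 157, 2435
w_callback, w_total = 235, 2435


def clt_conditions_met(successes, total, threshold=10):
    """Rule-of-thumb check: at least `threshold` successes and failures."""
    failures = total - successes
    return successes >= threshold and failures >= threshold


# Sample callback rates for each group (float() keeps division exact in Python 2).
b_rate = b_callback / float(b_total)
w_rate = w_callback / float(w_total)

print('black-sounding callback rate: {:.4f}'.format(b_rate))
print('white-sounding callback rate: {:.4f}'.format(w_rate))
print('CLT conditions met (b): {}'.format(clt_conditions_met(b_callback, b_total)))
print('CLT conditions met (w): {}'.format(clt_conditions_met(w_callback, w_total)))
```

Because the names were assigned to résumés at random, the independence requirement is also plausible here.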


In [28]:
# p_b = black-sounding names, p_w = white-sounding names
p_b = float(B_callback) / B_total
p_w = float(W_callback) / W_total
alpha = 0.05
diff = p_w - p_b  # observed difference in callback rates

# Two-proportion z-test: under the null both groups share one pooled callback rate
p_pool = float(W_callback + B_callback) / len(data)
se_pool = np.sqrt(p_pool * (1 - p_pool) * (1.0 / B_total + 1.0 / W_total))
z_score = diff / se_pool
p_value = 2 * stats.norm.sf(abs(z_score))  # two-sided p-value

# The confidence interval uses the unpooled standard error
se = np.sqrt(p_b * (1 - p_b) / B_total + p_w * (1 - p_w) / W_total)
critical_z = stats.norm.ppf(1 - alpha / 2)  # about 1.96 for a 95% interval
m_error = critical_z * se
ci = (diff - m_error, diff + m_error)

print('Difference in callback rates: {:.4f}'.format(diff))
print('z-score: {:.2f}'.format(z_score))
print('Margin of error: {:.4f}'.format(m_error))
print('95% confidence interval: ({:.4f}, {:.4f})'.format(*ci))
print('p-value: {:.2e}'.format(p_value))

Difference in callback rates: 0.0320
z-score: 4.11
Margin of error: 0.0153
95% confidence interval: (0.0168, 0.0473)
p-value: 3.98e-05
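
The quantities asked for in question 3 can be reproduced from the counts alone. The sketch below is a minimal pure-Python version of a pooled two-proportion z-test and an unpooled 95% confidence interval. It hard-codes the counts printed earlier and assumes 1.96 as the critical value; the function names are hypothetical and not part of the notebook.

```python
import math

# Counts copied from the cell outputs earlier in the notebook.
b_callback, b_total = 157, 2435
w_callback, w_total = 235, 2435


def two_proportion_ztest(x1, n1, x2, n2):
    """Return (difference, z, two-sided p-value) using the pooled standard error."""
    p1 = x1 / float(n1)
    p2 = x2 / float(n2)
    pooled = (x1 + x2) / float(n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p2 - p1) / se
    # Two-sided normal tail area: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p2 - p1, z, p_value


def diff_confidence_interval(x1, n1, x2, n2, critical_z=1.96):
    """Return (margin of error, lower, upper) using the unpooled standard error."""
    p1 = x1 / float(n1)
    p2 = x2 / float(n2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = critical_z * se
    diff = p2 - p1
    return margin, diff - margin, diff + margin


diff, z, p_value = two_proportion_ztest(b_callback, b_total, w_callback, w_total)
margin, lower, upper = diff_confidence_interval(b_callback, b_total, w_callback, w_total)

print('difference: {:.4f}'.format(diff))
print('z-score: {:.2f}'.format(z))
print('p-value: {:.2e}'.format(p_value))
print('margin of error: {:.4f}'.format(margin))
print('95% CI: ({:.4f}, {:.4f})'.format(lower, upper))
```

The pooled standard error is used for the test because the null hypothesis assumes a single shared callback rate, while the interval uses the unpooled version because it estimates the difference without assuming the null.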


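* 1) A two-proportion z-test is appropriate: the outcome is binary (callback or no callback) and we are comparing the callback rates of two independent groups. The CLT applies because the names were assigned to résumés at random and each group is large (2,435 résumés, with 157 and 235 callbacks, so callbacks and non-callbacks both far exceed 10 in each group).
* 2) The null hypothesis is that the callback rate is the same for black-sounding and white-sounding names (the difference in rates is 0). The alternative hypothesis is that the two callback rates differ.
* 3) From the counts above, the observed difference in callback rates is 235/2435 - 157/2435, or about 0.032.
    Margin of error: about 0.0153 (1.96 times the unpooled standard error of about 0.0078).
    Confidence interval: about 0.0168 to 0.0473 at the 95% level.
    P-Value: about 4.0e-05 (two-sided, from a z-score of about 4.11 computed with the pooled standard error).
* 4) Statistical significance: we fail to reject the null hypothesis only when the p-value is greater than or equal to alpha. Here the p-value (about 4.0e-05) is far below alpha = 0.05; equivalently, the z-score of about 4.11 exceeds the critical value of 1.96, and the 95% confidence interval excludes 0. We therefore reject the null hypothesis: résumés with white-sounding names received significantly more callbacks (by about 3.2 percentage points), so the name on the résumé does matter in this experiment.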
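
The significance question can also be checked without relying on the normal approximation. The sketch below is a simple permutation test, offered as an alternative check rather than the method the exercise asks for: it pools all callbacks, reshuffles them between two groups of the original sizes, and counts how often the shuffled difference is at least as large as the observed one. The function name, iteration count, and seed are hypothetical choices.

```python
import random


def permutation_p_value(x1, n1, x2, n2, n_iter=500, seed=0):
    """Estimate a two-sided p-value for a difference in proportions by permutation."""
    rng = random.Random(seed)
    total_success = x1 + x2
    # One entry per resume: 1 = callback, 0 = no callback.
    outcomes = [1] * total_success + [0] * (n1 + n2 - total_success)
    observed = abs(x2 / float(n2) - x1 / float(n1))
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(outcomes)
        g1 = sum(outcomes[:n1])      # callbacks landing in the first group
        g2 = total_success - g1      # the remainder go to the second group
        if abs(g2 / float(n2) - g1 / float(n1)) >= observed:
            extreme += 1
    return extreme / float(n_iter)


# Counts copied from the cell outputs earlier in the notebook.
p_perm = permutation_p_value(157, 2435, 235, 2435)
print('permutation p-value estimate: {}'.format(p_perm))
```

An estimate well below alpha = 0.05 points toward rejecting the null hypothesis. With only a few hundred shuffles the estimate is coarse, so it cannot resolve very small p-values precisely.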