# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating a black-sounding or white-sounding name, respectively. The 'call' column has two values, 1 and 0, indicating whether or not the resume received a callback from an employer.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
****

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

In [9]:
data = pd.read_stata("C:/Project_Files/racial_disc/us_job_market_discrimination.dta")

In [82]:
# number of callbacks for black-sounding names
print("total number of black-sounding resumes",sum(data['race']=='b'))
print("number of the callbacks for black-sounding names",sum(data['call'][data.race=='b'])) 

# number of callbacks for white-sounding names
print("total number of white-sounding resumes",sum(data['race']=='w'))
print("number of the callbacks for white-sounding names",sum(data['call'][data.race=='w']))


total number of black-sounding resumes 2435
number of the callbacks for black-sounding names 157.0
total number of white-sounding resumes 2435
number of the callbacks for white-sounding names 235.0


### 1. What test is appropriate for this problem? Does CLT apply?
The goal of this analysis is to determine whether race, as signaled by a black-sounding or white-sounding name, affects the callback rate. The outcome is binary (call or no call) and we are comparing two independent groups, so a two-sample z-test for the difference in proportions is appropriate. The CLT applies: the names were randomly assigned, each group is large (2,435 resumes), and each group has well over 10 callbacks and 10 non-callbacks (157 and 235 callbacks). The sampling distribution of the difference in callback proportions is therefore approximately normal.
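As a quick sanity check on the CLT conditions, the sketch below applies the common rule of thumb that each group needs at least 10 successes and 10 failures (some texts use 5). It hard-codes the counts printed earlier in the notebook. The helper name `success_failure_ok` is illustrative and does not come from the original analysis.

```python
# Check the large-sample (success/failure) condition for a two-proportion z-test,
# using the callback counts printed earlier in the notebook.
n_b, calls_b = 2435, 157   # black-sounding names: resumes, callbacks
n_w, calls_w = 2435, 235   # white-sounding names: resumes, callbacks

def success_failure_ok(n, successes, threshold=10):
    """Return True if both successes and failures reach the rule-of-thumb threshold."""
    failures = n - successes
    return successes >= threshold and failures >= threshold

for label, n, calls in [("b", n_b, calls_b), ("w", n_w, calls_w)]:
    # prints the group label, callbacks, non-callbacks, and whether the condition holds
    print(label, calls, n - calls, success_failure_ok(n, calls))
```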

### 2. What are the null and alternate hypotheses?
Null hypothesis (H0): there is no difference in the proportion of callbacks between black-sounding and white-sounding names (p_b - p_w = 0).

Alternative hypothesis (H1): the callback proportion for black-sounding names is lower than for white-sounding names (p_b - p_w < 0), which makes this a one-sided test.
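Written out, the hypotheses and the test statistic used in section 3 are shown below. Here p_b and p_w are the callback proportions for black-sounding and white-sounding names, n_b = n_w = 2435, and Φ is the standard normal CDF. The standard error is unpooled, as in the section 3 code. Many textbooks instead pool the two proportions under H0 when computing the test statistic, which gives a slightly different z.

```latex
H_0:\ p_b - p_w = 0, \qquad H_1:\ p_b - p_w < 0

\hat{p}_b = \frac{157}{2435}, \qquad \hat{p}_w = \frac{235}{2435}

SE = \sqrt{\frac{\hat{p}_b(1 - \hat{p}_b)}{n_b} + \frac{\hat{p}_w(1 - \hat{p}_w)}{n_w}}

z = \frac{\hat{p}_b - \hat{p}_w}{SE}, \qquad \text{p-value} = \Phi(z)
```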
       

### 3. Compute margin of error, confidence interval, and p-value.
    

In [97]:
# compute the standard error of the difference in callback proportions
black_resumes = sum(data['race']=='b')
black_call_backs = sum(data['call'][data.race=='b'])
black_call_backs_proportion = black_call_backs/black_resumes
black_nocall_proportion = (1 - black_call_backs_proportion)


white_resumes = sum(data['race']=='w')
white_call_backs = sum(data['call'][data.race=='w'])
white_call_backs_proportion = white_call_backs/white_resumes
white_nocall_proportion = (1 - white_call_backs_proportion)

# unpooled standard error of (black proportion - white proportion)
se = np.sqrt(((black_call_backs_proportion * black_nocall_proportion)/black_resumes) + 
             ((white_call_backs_proportion * white_nocall_proportion)/white_resumes))

print("standard error",se)

# margin of error at the 95% confidence level: critical z value (about 1.96) times the standard error
z_critical = stats.norm.isf(0.025)
margin_of_error = z_critical * se
print("margin of error: {:.3f}".format(margin_of_error))

diff = black_call_backs_proportion - white_call_backs_proportion
confidence_interval = [diff - margin_of_error, diff + margin_of_error]
print("95% confidence interval: ({:2.3f}, {:2.3f})".format(confidence_interval[0],confidence_interval[1]))

# one-sided p-value for H1: black proportion < white proportion
z_stat = diff / se
p_value = stats.norm.cdf(z_stat)
print('p-value for the difference in callback proportions between black- and white-sounding names: {:1.5f}'.format(p_value))

standard error 0.00778337058668
margin of error: 0.015
95% confidence interval: (-0.047, -0.017)
p-value for the difference in callback proportions between black- and white-sounding names: 0.00002
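The same margin of error, confidence interval, and p-value can be reproduced from the raw counts alone. This minimal sketch uses Python's standard library (`statistics.NormalDist`, Python 3.8+) in place of `scipy.stats.norm`. The function name `two_proportion_ztest` is hypothetical. Like the notebook code, it uses the unpooled standard error and a one-sided p-value for H1: p_b < p_w.

```python
import math
from statistics import NormalDist

def two_proportion_ztest(x1, n1, x2, n2, conf=0.95):
    """Unpooled two-sample z-test for p1 - p2, with a one-sided p-value for H1: p1 < p2."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96 for 95%
    margin = z_crit * se                                # margin of error
    z = diff / se
    p_one_sided = NormalDist().cdf(z)                   # lower-tail probability
    return {
        "diff": diff,
        "se": se,
        "margin_of_error": margin,
        "ci": (diff - margin, diff + margin),
        "z": z,
        "p_value": p_one_sided,
    }

# counts from the notebook: black-sounding first, white-sounding second
res = two_proportion_ztest(157, 2435, 235, 2435)
print("margin of error: {:.3f}".format(res["margin_of_error"]))   # 0.015
print("95% CI: ({:.3f}, {:.3f})".format(*res["ci"]))              # (-0.047, -0.017)
print("p-value: {:.5f}".format(res["p_value"]))                   # 0.00002
```

For a two-sided alternative, the p-value would instead be `2 * NormalDist().cdf(-abs(z))`.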


### 4. Write a story describing the statistical significance in the context of the original problem.
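At a significance level of 0.05, we reject the null hypothesis: the one-sided p-value (about 0.00002) is far below 0.05, so resumes with black-sounding names receive significantly fewer callbacks than otherwise identical resumes with white-sounding names. Resumes with white-sounding names received 235 callbacks versus 157 for black-sounding names, roughly 50% more. The 95% confidence interval for the difference in callback proportions (black minus white), (-0.047, -0.017), lies entirely below zero, so white-sounding names have the higher callback proportion at both the lower and upper bounds. Because names were assigned to resumes at random, this gap can be attributed to the name rather than to differences in the resumes themselves.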
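The "roughly 50% more" figure follows directly from the callback counts. This small calculation uses the counts printed earlier in the notebook and also shows the two callback rates.

```python
# callback counts and group size taken from the notebook output
calls_b, calls_w, n = 157, 235, 2435

rate_b = calls_b / n          # callback rate for black-sounding names
rate_w = calls_w / n          # callback rate for white-sounding names
ratio = calls_w / calls_b     # how many times as many callbacks white-sounding names got

print("callback rate, black-sounding: {:.2%}".format(rate_b))   # 6.45%
print("callback rate, white-sounding: {:.2%}".format(rate_w))   # 9.65%
print("white/black callback ratio: {:.2f}".format(ratio))       # 1.50
```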

### 5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?
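No. The analysis compares only resumes with white-sounding and black-sounding names. It shows that the name (and the race it signals) has a statistically significant effect on callbacks, but it says nothing about how that effect compares with other factors that can influence callbacks for a job ad. To judge whether race is the most important factor, we would need to examine other resume characteristics alongside race. For example, we could compare callback rates within groups of resumes that share those characteristics, or fit a regression model with race and the other factors as predictors.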
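One simple way to start this amendment is to check whether the callback gap persists within groups of resumes that share another characteristic. The sketch below uses a few made-up records with a hypothetical `stratum` field standing in for any such characteristic. The rows are illustrative only, not data from the study. With several factors at once, a regression model would be the natural next step.

```python
from collections import defaultdict

# Hypothetical toy records: (race, call, stratum). "stratum" stands in for any other
# resume attribute; these rows are made up for illustration, not data from the study.
records = [
    ("b", 0, "high"), ("b", 1, "high"), ("w", 1, "high"), ("w", 1, "high"),
    ("b", 0, "low"),  ("b", 0, "low"),  ("w", 0, "low"),  ("w", 1, "low"),
]

def callback_rates_by_stratum(rows):
    """Return {stratum: {race: callback_rate}} computed from (race, call, stratum) rows."""
    counts = defaultdict(lambda: [0, 0])   # (stratum, race) -> [callbacks, resumes]
    for race, call, stratum in rows:
        counts[(stratum, race)][0] += call
        counts[(stratum, race)][1] += 1
    rates = defaultdict(dict)
    for (stratum, race), (calls, total) in counts.items():
        rates[stratum][race] = calls / total
    return dict(rates)

# print the white-minus-black callback gap within each stratum
for stratum, by_race in sorted(callback_rates_by_stratum(records).items()):
    gap = by_race["w"] - by_race["b"]
    print(stratum, by_race, "gap (w - b): {:.2f}".format(gap))
```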