# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

In [2]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [3]:
# number of callbacks for white-sounding names
data[data.race=='w'].call.sum()

235.0

<div class="span5 alert alert-success">
<p>Your answers to Q1 and Q2 here</p>
    Since the sample size large is fairly large, we can apply Z test. CLT is applied here because if we take enough
    samples from the same population, the proportions of call and race would be normally distributed.
    <br>
    The null hypothesis would be there is no difference in the call. The alternative hypothesis is the white sounding
    component recieved more calls. This is a one-tailed alternative hypothesis.
</div>
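The CLT argument above can be checked with the usual success-failure rule of thumb: each group should have at least about 10 callbacks and 10 non-callbacks (some texts use 5). The counts below are the ones reported in the race/call table in this notebook; the threshold of 10 and the helper name `normal_approx_ok` are illustrative assumptions.

```python
# Minimal success-failure check for the normal approximation to each group's
# callback proportion. Counts come from the race/call table in this notebook;
# the threshold of 10 is a common rule of thumb (an assumption, not a law).
N_PER_GROUP = 2435
CALLBACKS = {"b": 157, "w": 235}


def normal_approx_ok(successes: int, n: int, threshold: int = 10) -> bool:
    """Return True if both successes and failures reach the threshold."""
    failures = n - successes
    return successes >= threshold and failures >= threshold


checks = {race: normal_approx_ok(k, N_PER_GROUP) for race, k in CALLBACKS.items()}
print(checks)  # {'b': True, 'w': True}
```

Independence of the résumés is the other requirement and is assumed here rather than checked.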

In [9]:
data.shape
data['race'].value_counts()

b    2435
w    2435
Name: race, dtype: int64

In [10]:
w = data[data.race=='w']
b = data[data.race=='b']

In [11]:
# Your solution to Q3 here

<div class="span5 alert alert-success">
<p> Your answers to Q4 and Q5 here </p>
</div>

In [15]:
# Count resumes by race and callback outcome
final_data=data.groupby(['race','call'])['call'].count()
final_data


race  call
b     0.0     2278
      1.0      157
w     0.0     2200
      1.0      235
Name: call, dtype: int64
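Because the table above is a 2×2 contingency table, an exact one-sided p-value is also available without any normal approximation. Under the null hypothesis that race has no effect, the number of callbacks landing in the white-sounding group follows a hypergeometric distribution; this is the one-sided Fisher exact test. The sketch below uses only the standard library; `scipy.stats.fisher_exact([[235, 2200], [157, 2278]], alternative='greater')` should give a comparable result. The helper name is hypothetical.

```python
from math import comb

# Counts from the race/call table above
n_w, n_b = 2435, 2435          # resumes per group
x_w, x_b = 235, 157            # callbacks per group
total_calls = x_w + x_b        # 392 callbacks overall
total = n_w + n_b              # 4870 resumes overall


def hypergeom_upper_tail(k: int, total: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(total, successes, draws), from exact integer counts."""
    upper = min(successes, draws)
    numer = sum(comb(successes, j) * comb(total - successes, draws - j)
                for j in range(k, upper + 1))
    # Python ints have arbitrary precision; int / int is rounded only once, at the end
    return numer / comb(total, draws)


# Probability that a random split of the 392 callbacks puts 235 or more in the 'w' group
p_exact = hypergeom_upper_tail(x_w, total, total_calls, n_w)
print(f"one-sided exact p value: {p_exact:.2e}")
```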

In [20]:
#Calculate basic statistics required for the test of hypothesis
import math
b_prop=157/2435
w_prop=235/2435

# Pooled callback rate: the estimate of the common proportion under the null hypothesis
pop_prop=(157+235)/(len(data.index))
standard_error=math.sqrt(pop_prop*(1-pop_prop))*math.sqrt(1/2435+1/2435)

print('black_prop ', b_prop,'\nwhite_prop ', w_prop, '\npop_prop ',pop_prop, '\nstandard error ',standard_error)

black_prop  0.06447638603696099 
white_prop  0.09650924024640657 
pop_prop  0.08049281314168377 
standard error  0.007796894036170457
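The cell above is the pooled two-proportion z test. Written out, with $x$ for callback counts and $n$ for group sizes (notation introduced here), the quantities it computes are:

$$
\hat{p} = \frac{x_b + x_w}{n_b + n_w} = \frac{157 + 235}{4870} \approx 0.0805, \qquad
SE = \sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_b} + \frac{1}{n_w}\right)} \approx 0.0078
$$

$$
z = \frac{\hat{p}_w - \hat{p}_b}{SE} = \frac{235/2435 - 157/2435}{0.007797} \approx \frac{0.03203}{0.007797} \approx 4.11, \qquad
p = 1 - \Phi(z) \approx 2 \times 10^{-5}
$$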


In [22]:
# use z test: one tail test: Ho: w_prop=b_prop, H1: w_prop>b_prop
from scipy.stats import norm
z=(w_prop-b_prop)/standard_error
# one-sided p value: upper-tail area beyond the computed z (sf = 1 - cdf)
p_value=norm.sf(z)
print ('z value ',z, "\np value ",format(p_value, '.2e'))
# we reject the null hypothesis that race has no impact on calls

z value  4.108412152434346 
p value  1.99e-05
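The same pooled test can be packaged as a small reusable function. This sketch swaps the standard library's `statistics.NormalDist` in for `scipy.stats.norm` so that it is self-contained; the function name `two_proportion_ztest` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z test of H0: p1 == p2 against H1: p1 > p2.

    Returns the z statistic and the one-sided p value.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                 # common rate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_one_sided = NormalDist().cdf(-z)             # upper-tail area beyond z
    return z, p_one_sided


z, p = two_proportion_ztest(235, 2435, 157, 2435)  # white vs black callbacks
print(f"z = {z:.3f}, one-sided p = {p:.2e}")
```

Evaluating the tail area at the computed `z`, rather than at a rounded constant, keeps the p value tied to the statistic it belongs to.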


In [42]:
#Bootstrap Method: resample each group's calls with replacement, 1000 times
import math
w_pop=data[data['race']=='w']
b_pop=data[data['race']=='b']
hold_w_prop=[]
for i in range(1000):
    sample=np.random.choice(w_pop['call'],len(w_pop['call'])).mean()
    hold_w_prop.append(sample)
w_prop=np.array(hold_w_prop).mean()
hold_b_prop=[]
for i in range(1000):
    sample=np.random.choice(b_pop['call'],len(b_pop['call'])).mean()
    hold_b_prop.append(sample)
b_prop=np.array(hold_b_prop).mean()
# The standard error is still the analytic pooled one from above, so the test below
# is a z test applied to the bootstrapped mean proportions
standard_error=math.sqrt(pop_prop*(1-pop_prop))*math.sqrt(1/2435+1/2435)

print('black_prop ', b_prop,'\nwhite_prop ', w_prop, '\npop_prop ',pop_prop, '\nstandard error ',standard_error)

black_prop  0.06449363 
white_prop  0.096349075 
pop_prop  0.08049281314168377 
standard error  0.007796894036170457


In [43]:
# use z test: one tail test: Ho: w_prop=b_prop, H1: w_prop>b_prop
from scipy.stats import norm
z=(w_prop-b_prop)/standard_error
p_value=norm.sf(z)  # one-sided p value for the computed z
print ('z value ',z, "\np value ",format(p_value, '.2e'))
# we reject the null hypothesis that race has no impact on calls

z value  4.085658844751138 
p value  2.20e-05
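The bootstrap cells above resample each group separately but then reuse the analytic pooled standard error, so the resulting test is still essentially a z test. A resampling test that avoids the normal approximation is a permutation test: shuffle the race labels (which, under the null hypothesis, should not matter) and count how often the shuffled difference in callback rates is at least as large as the observed one. This sketch rebuilds the 0/1 call vectors from the counts in the race/call table; in the notebook, `w['call'].values` and `b['call'].values` could be passed instead. The function name, the 10,000 permutations, and the seed are illustrative choices.

```python
import numpy as np


def permutation_test_diff(x: np.ndarray, y: np.ndarray, n_perm: int = 10_000,
                          seed: int = 0) -> tuple[float, float]:
    """One-sided permutation test of H1: mean(x) > mean(y).

    Returns the observed difference and a p value computed as
    (count + 1) / (n_perm + 1), so it is never exactly zero.
    """
    rng = np.random.default_rng(seed)
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    n_x = len(x)
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)       # random relabelling under H0
        diff = shuffled[:n_x].mean() - shuffled[n_x:].mean()
        if diff >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)


# 0/1 callback vectors rebuilt from the counts in this notebook
white = np.r_[np.ones(235), np.zeros(2435 - 235)]
black = np.r_[np.ones(157), np.zeros(2435 - 157)]
obs, p = permutation_test_diff(white, black)
print(f"observed difference = {obs:.4f}, permutation p value = {p:.1e}")
```

With 10,000 permutations the smallest reportable p value is about 1e-4, so this can confirm that the p value is tiny but cannot resolve a value near 2e-05.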


In [25]:
# Calculations of margin of error and confidence interval by frequentist approach
import math
from scipy.stats import norm
# Observed proportions, recomputed here so the cell does not depend on execution order
w_prop=235/2435
b_prop=157/2435
# Unpooled standard error: a confidence interval does not assume the null hypothesis
unpooled_standard_error=math.sqrt(w_prop*(1-w_prop)/2435+b_prop*(1-b_prop)/2435)
# 95% confidence interval
margin_of_error=unpooled_standard_error*(norm.ppf(.975))
bounds=[w_prop-b_prop-margin_of_error, w_prop-b_prop+margin_of_error]
print ('margin of error ', margin_of_error, '\nlower bound and upper bound of CI:',bounds)
# Since the interval does not contain zero, there is strong evidence that the
# callback rates of the two groups differ

margin of error  0.01525512602821483 
lower bound and upper bound of CI: [0.016777728181230755, 0.04728798023766041]
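The interval above uses each group's own variance (an unpooled standard error) rather than the pooled one from the test, because a confidence interval is built without assuming the null hypothesis. The same Wald interval as a reusable function, again using `statistics.NormalDist` in place of `scipy.stats.norm` for self-containment; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def diff_in_proportions_ci(x1, n1, x2, n2, level=0.95):
    """Unpooled (Wald) confidence interval for p1 - p2.

    Returns (lower, upper, margin_of_error).
    """
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    margin = z_crit * se
    diff = p1 - p2
    return diff - margin, diff + margin, margin


lo, hi, moe = diff_in_proportions_ci(235, 2435, 157, 2435)  # white minus black
print(f"margin of error {moe:.5f}, 95% CI [{lo:.5f}, {hi:.5f}]")
```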


In [26]:
# Calculations of margin of error and  confidence interval by bootstrap approach
#Bootstrap Method
w_pop=data[data['race']=='w']
b_pop=data[data['race']=='b']
hold_w_prop=[]
for i in range(1000):
    sample=np.random.choice(w_pop['call'],len(w_pop['call'])).mean()
    hold_w_prop.append(sample)
w_prop=np.array(hold_w_prop).mean()
hold_b_prop=[]
for i in range(1000):
    sample=np.random.choice(b_pop['call'],len(b_pop['call'])).mean()
    hold_b_prop.append(sample)
b_prop=np.array(hold_b_prop).mean()
print ('white proportion ', w_prop, ' black proportion ', b_prop)

white proportion  0.09637535  black proportion  0.064320326


In [30]:
# Calculation of margin of error and confidence interval using the bootstrapped mean proportions
import math
unpooled_standard_error=math.sqrt(w_prop*(1-w_prop)/2435+b_prop*(1-b_prop)/2435)
# 95% confidence interval
margin_of_error=unpooled_standard_error*(norm.ppf(.975))
bounds=[w_prop-b_prop-margin_of_error, w_prop-b_prop+margin_of_error]
print ('margin of error ',margin_of_error,' lower and upper CI: ',bounds )
# Since the interval does not contain zero, the mean callback rates differ significantly.
# This holds for both the frequentist and the bootstrap approach

margin of error  0.01524250309491544  lower and upper CI:  [0.01681252468800158, 0.047297530877832464]
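The cell above combines bootstrapped means with the analytic standard error, so its interval is nearly identical to the frequentist one by construction. A percentile bootstrap interval instead reads the 2.5th and 97.5th percentiles directly off the distribution of bootstrapped differences. In this sketch the 0/1 vectors are rebuilt from the notebook's counts (in the notebook, `w_pop['call'].values` and `b_pop['call'].values` would serve); the function name, 10,000 resamples, and seed are illustrative choices.

```python
import numpy as np


def bootstrap_diff_ci(x, y, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap CI for mean(x) - mean(y), resampling each group separately."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each group with replacement, keeping the original group sizes
        diffs[i] = rng.choice(x, size=x.size).mean() - rng.choice(y, size=y.size).mean()
    alpha = 1 - level
    lower, upper = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper


white = np.r_[np.ones(235), np.zeros(2435 - 235)]
black = np.r_[np.ones(157), np.zeros(2435 - 157)]
lower, upper = bootstrap_diff_ci(white, black)
print(f"95% percentile bootstrap CI: [{lower:.4f}, {upper:.4f}]")
```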


In [31]:
# Let's check whether another variable influences callbacks; experience is a plausible candidate.
# We treat 7 or more years as high experience (coded 1) and less than 7 years as low experience
# (coded 0), and investigate its influence on callbacks.
data['experience']=data['yearsexp'].apply(lambda x : 1 if x>6 else 0)
final_data=data.groupby(['experience','call'])['call'].count()
final_data

experience  call
0           0.0     2289
            1.0      163
1           0.0     2189
            1.0      229
Name: call, dtype: int64

In [35]:
# Basic statistics calculations
import math
# Each group's callback rate uses its own group size as the denominator
n_low=2289+163    # 2452 resumes with less than 7 years of experience
n_high=2189+229   # 2418 resumes with 7 or more years of experience
high_experience_prop=229/n_high
low_experience_prop=163/n_low
# Pooled callback rate: the estimate of the common proportion under the null hypothesis
pop_prop=(229+163)/(len(data.index))
standard_error=math.sqrt(pop_prop*(1-pop_prop))*math.sqrt(1/n_low+1/n_high)
print('high experience prop', format(high_experience_prop, '.4f'), '\nlow experience prop', format(low_experience_prop, '.4f'),
     '\npop prop', format(pop_prop, '.4f'), "\nstandard error", format(standard_error, '.6f'))

high experience prop 0.0947 
low experience prop 0.0665 
pop prop 0.0805 
standard error 0.007797


In [38]:
# test information
# use z test: one tail test: Ho: no impact of experience, H1: high experience generates more calls
from scipy.stats import norm
z=(high_experience_prop-low_experience_prop)/standard_error
p_value=norm.sf(z)  # one-sided p value for the computed z
print ('test statistic', format(z, '.3f'), ' p value', format(p_value, '.2e'))
# we reject the null hypothesis
# The p value for race (about 2e-05, calculated earlier) is smaller than the p value for experience.
# A smaller p value reflects stronger evidence against "no effect", not a larger effect, so this
# comparison alone does not show that race is the most influential variable

test statistic 3.621  p value 1.47e-04
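Because the two experience groups differ in size (2289 + 163 = 2452 résumés with less than 7 years, 2189 + 229 = 2418 with 7 or more), each callback rate needs its own group total as its denominator. A small helper that derives sizes and rates from a count table like the one above keeps those denominators from being typed by hand; the dict layout and function name are illustrative assumptions, not a pandas structure.

```python
# Counts from the experience/call table above: {group: {call_value: count}}
counts = {
    "low (<7 yrs)": {0: 2289, 1: 163},
    "high (>=7 yrs)": {0: 2189, 1: 229},
}


def callback_rates(table):
    """Return {group: (group size, callback rate)} using each group's own total."""
    out = {}
    for group, by_call in table.items():
        n = sum(by_call.values())
        out[group] = (n, by_call.get(1, 0) / n)
    return out


rates = callback_rates(counts)
for group, (n, rate) in rates.items():
    print(f"{group}: n = {n}, callback rate = {rate:.4f}")
```

With the DataFrame itself, `data.groupby('experience')['call'].agg(['count', 'mean'])` should give the same sizes and rates directly.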


Conclusion: 
In this analysis, we first tested whether race has any influence on callbacks, using both the frequentist and
the bootstrap approach. Both approaches clearly indicate that race has a significant influence on callbacks.

We then calculated confidence intervals for the difference in callback rates under both approaches. Neither
interval contains zero, which further indicates that race influences callbacks.

Other variables, such as experience, may also influence the callback rate. Because the race-sounding names were
assigned to résumés at random, such variables are not expected to confound the estimated effect of race; they
add their own effects on top of it.
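To address Q5 more directly, race and experience (and other résumé attributes) can be entered together in one model, so that each effect is estimated while holding the others fixed. A common choice is logistic regression of `call` on indicator variables; statsmodels provides a `Logit` model for this, but the sketch below fits it with Newton (IRLS) steps in NumPy to stay self-contained. It is checked on *synthetic, hypothetical* data with known coefficients, not on the study data; in the notebook one would build `X` from an intercept plus columns such as `(data.race == 'w')` and `data.experience`.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25, tol=1e-10):
    """Logistic regression by Newton-Raphson (IRLS). X must include an intercept column.

    Returns the coefficient estimates and their standard errors.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))       # predicted callback probabilities
        w = p * (1 - p)                            # Bernoulli variances
        grad = X.T @ (y - p)                       # score vector
        hess = X.T @ (X * w[:, None])              # Fisher information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the inverse information matrix at the final estimate
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))


# Synthetic check (hypothetical data): two 0/1 predictors with known log-odds effects
rng = np.random.default_rng(0)
n = 50_000
white = rng.integers(0, 2, n)
high_exp = rng.integers(0, 2, n)
true_beta = np.array([-2.7, 0.4, 0.35])            # intercept, white, high_exp
X = np.column_stack([np.ones(n), white, high_exp])
y = rng.random(n) < 1 / (1 + np.exp(-X @ true_beta))
beta_hat, se = fit_logistic(X, y.astype(float))
print("estimates:", np.round(beta_hat, 2), "std errors:", np.round(se, 3))
```

Each fitted coefficient is a log odds ratio adjusted for the other predictors; comparing these adjusted effects and their confidence intervals is a sounder basis for judging which factor matters most than comparing separate p values.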

