# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your GitHub account**.

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution

In [77]:
import math
import pandas as pd
import numpy as np
from scipy import stats
from statistics import variance
from statsmodels.stats.proportion import proportions_ztest

In [78]:
data = pd.io.stata.read_stata('data/us_job_market_discrimination.dta')

In [79]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

In [80]:
data.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


1. What test is appropriate for this problem? Does CLT apply?

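    - A two-sample t-test is appropriate because we are comparing the mean callback rate of two independent groups. Since `call` is binary, this is in effect a comparison of two proportions, so a two-sample z-test for proportions gives nearly the same result at this sample size.
    - The CLT applies because:
        - The sampled observations are independent.
        - Names were randomly assigned to the resumes.
        - Each group is large (well above 30 resumes), with many callbacks and many non-callbacks in both groups.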
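
For a binary outcome, the large-sample condition is usually stated as at least 10 successes and 10 failures per group, which is stricter than n > 30. The check below is a minimal sketch, assuming the counts implied by this notebook's outputs: 2435 resumes per group, 235 callbacks for white-sounding names, and 157 for black-sounding names (the printed rate of about 0.0645 times 2435). The helper `clt_ok` and its threshold of 10 are illustrative, not part of the original analysis.

```python
# Counts taken from this notebook's outputs (hard-coded here as an assumption).
n_w, n_b = 2435, 2435          # resumes per group
calls_w, calls_b = 235, 157    # callbacks per group


def clt_ok(successes, n, threshold=10):
    """Rule of thumb: at least `threshold` successes and `threshold` failures."""
    failures = n - successes
    return successes >= threshold and failures >= threshold


for label, k, n in [("white-sounding", calls_w, n_w),
                    ("black-sounding", calls_b, n_b)]:
    print(label, "successes:", k, "failures:", n - k, "CLT ok:", clt_ok(k, n))
```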

In [100]:
w = data[data.race=='w'].call
b = data[data.race=='b'].call

n = len(w) + len(b)  # total number of resumes in both groups

#Mean
bl_sample_mean = np.mean(b)
w_sample_mean = np.mean(w)

#Variance
print("Variance of white sounding names is % s " %(variance(w))) 
print("Variance of black sounding names is % s " %(variance(b))) 

var_b = (variance(b))
var_w = (variance(w))

#Length
bl_n = len(b)
w_n = len(w)

print('Total number of resumes with white sounding names', w_n)
print('Total number of resumes with black sounding names', bl_n)

Variance of white sounding names is 0.08723103062534693 
Variance of black sounding names is 0.06034396359580818 
Total number of resumes with white sounding names 2435
Total number of resumes with black sounding names 2435


In [98]:
# Proportion of success
success_b = sum(data[data.race=='b'].call)
success_w = sum(data[data.race=='w'].call)

freq_b = success_b/bl_n
freq_w = success_w/w_n

print('Callback rate for white sounding names', freq_w)
print('Callback rate for black sounding names', freq_b)

Callback rate for white sounding names 0.09650924024640657
Callback rate for black sounding names 0.06447638603696099
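
Because `call` only takes the values 0 and 1, its sample variance is fully determined by the callback rate: it equals p(1 - p) scaled by n / (n - 1). The sketch below checks that relationship against the variances printed earlier, using the rates and the group size of 2435 as hard-coded inputs. The helper name `bernoulli_sample_variance` is hypothetical.

```python
n = 2435
p_w = 0.09650924024640657   # callback rate printed above, white-sounding names
p_b = 0.06447638603696099   # callback rate printed above, black-sounding names


def bernoulli_sample_variance(p_hat, n):
    """Unbiased sample variance of n binary outcomes whose mean is p_hat."""
    return p_hat * (1 - p_hat) * n / (n - 1)


var_w = bernoulli_sample_variance(p_w, n)
var_b = bernoulli_sample_variance(p_b, n)
print("white-sounding:", var_w)
print("black-sounding:", var_b)
```

Both values agree with the output of `variance(w)` and `variance(b)` to about six decimal places, which is a quick sanity check that the data contain only 0s and 1s.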


2. What are the null and alternate hypotheses?

    - Null hypothesis H0: there is no difference in the callback rate between black-sounding and white-sounding names: bl_sample_mean - w_sample_mean = 0
    - Alternative hypothesis Ha: there is a difference in the callback rate between black-sounding and white-sounding names: bl_sample_mean - w_sample_mean != 0

    - We use a 5% significance level (95% confidence), so we reject the null hypothesis if the absolute value of the test statistic exceeds 1.96, which is equivalent to a p-value below 0.05.
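
Since the outcome is binary, these hypotheses can also be tested with a two-proportion z-test that pools the callback rate under H0. The following is a minimal sketch using only the standard library, with the counts from this notebook (235 and 157 callbacks out of 2435 resumes each) hard-coded as an assumption. `two_proportion_ztest` is a hypothetical helper written for illustration; statsmodels provides `proportions_ztest`, which the notebook imports, for the same purpose.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(k1, n1, k2, n2):
    """Two-sided z-test for H0: p1 == p2, using the pooled proportion."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# white-sounding names first, black-sounding names second
z, p_value = two_proportion_ztest(235, 2435, 157, 2435)
print("z = %.3f, p-value = %.2e" % (z, p_value))
```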

In [97]:
# Method 1: Welch's two-sample t-test from scipy (equal_var=False)
race_ttest, race_pval = stats.ttest_ind(a=w, b=b, equal_var=False)
print('The T-Statistic is %.3f with a p-value of %s using t test' %(abs(race_ttest), race_pval))

# Method 2: the same unpaired two-sample t-test computed by hand
diff_mean =  w_sample_mean - bl_sample_mean
print('Difference in mean is %s' %diff_mean)
x = (variance(w)/w_n) +  (variance(b)/bl_n)
sqrt = math.sqrt(x)  # standard error of the difference in means

T = diff_mean/sqrt
print('Test Statistics is: %s' %T)

# two-sided p-value
p = stats.t.sf(abs(T),(bl_n+w_n-2))*2
print('The pvalue is:  %s' %p)

The T-Statistic is 4.115 with a p-value of 3.942941513645935e-05 using t test
Difference in mean is 0.03203285485506058
Test Statistics is: 4.114705349654057
The pvalue is:  3.940801102238752e-05


#### Both methods produce approximately the same results
As seen above, the t-statistic is about 4.11 and the p-value is very small (about 3.9e-05), so the null hypothesis is rejected.
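
The exercise also asks for a resampling approach. One option is a permutation test, which is a resampling method rather than the bootstrap proper: shuffle the race labels many times and count how often the shuffled difference is at least as large as the observed one. The sketch below rebuilds the 0/1 callback vectors from the counts reported in this notebook, which is an assumption made so the example runs on its own. The function name, seed, and number of permutations are illustrative choices.

```python
import numpy as np

# Rebuild the 0/1 callback vectors from the reported counts (assumption).
w = np.r_[np.ones(235), np.zeros(2435 - 235)]   # white-sounding names
b = np.r_[np.ones(157), np.zeros(2435 - 157)]   # black-sounding names
observed = w.mean() - b.mean()


def permutation_pvalue(x, y, n_perm=10_000, seed=42):
    """Two-sided permutation p-value for a difference in means."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    obs = abs(x.mean() - y.mean())
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        diff = shuffled[:len(x)].mean() - shuffled[len(x):].mean()
        if abs(diff) >= obs:
            count += 1
    # add-one correction so the estimate is never exactly zero
    return (count + 1) / (n_perm + 1)


p_value = permutation_pvalue(w, b)
print("observed difference:", observed)
print("permutation p-value:", p_value)
```

With 10,000 permutations the smallest p-value this can report is about 1e-4, so it can confirm that the difference is significant but cannot resolve a p-value as small as the one from the t-test.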

In [92]:
# 95% confidence interval for the difference in callback rates (white minus black)
low_conf_int = diff_mean - 1.96*sqrt
high_conf_int = diff_mean + 1.96*sqrt
print('Confidence Intervals: ', low_conf_int, high_conf_int)

Confidence Intervals:  0.016774315013028555 0.0472913946970926


In [103]:
#standard error: t-test, two samples
std_err = np.sqrt(b.var()/bl_n + w.var()/w_n)
print("Standard Error:", std_err.round(4))

#degrees of freedom: t-test, two samples

DF = int((var_b/bl_n + var_w/w_n)**2 / (((var_b/bl_n)**2 / (bl_n-1)) + ((var_w/w_n)**2 / (w_n-1))))
print("Degrees of freedom:", DF)

#critical value: t-test, two samples
t_crit = stats.t.ppf(0.975,df=DF)

#margin of error:
margin = t_crit * std_err
print("Margin of error:",margin)

Standard Error: 0.0078
Degrees of freedom: 4711
Margin of error: 0.015262058161738945
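
For reference, the quantities computed in the cell above correspond to the usual Welch formulas, written here with $s^2$ for the sample variances (`var_b`, `var_w`) and $n$ for the group sizes (`bl_n`, `w_n`):

$$
SE = \sqrt{\frac{s_b^2}{n_b} + \frac{s_w^2}{n_w}}, \qquad
\nu = \frac{\left(\frac{s_b^2}{n_b} + \frac{s_w^2}{n_w}\right)^2}{\frac{(s_b^2/n_b)^2}{n_b - 1} + \frac{(s_w^2/n_w)^2}{n_w - 1}}, \qquad
ME = t_{0.975,\,\nu} \cdot SE
$$

With the printed values (a standard error of about 0.0078 and 4711 degrees of freedom), the critical value is about 1.96, which gives the margin of error of about 0.0153 shown above.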


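4. Write a story describing the statistical significance in the context of the original problem.
    - The hypothesis test rejects the null hypothesis, which means there is a statistically significant difference between the callback rates for white-sounding and black-sounding names. Resumes with white-sounding names received callbacks about 9.7% of the time, versus about 6.4% for black-sounding names, a gap of roughly 3.2 percentage points (95% confidence interval of about 1.7 to 4.7 points). Because the names were randomly assigned to the resumes, a gap this large is very unlikely to be due to chance alone.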

5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?
    - No. The analysis shows that race (white- versus black-sounding names) has a significant effect on callbacks, but it does not show that race is the most important factor, because no other variable was examined. A multivariate analysis is needed to compare factors; for example, education and years of experience should be included in further analysis.
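
One way to start such a multivariate analysis is a logistic regression of `call` on race plus other resume attributes. The sketch below is a small Newton-Raphson fit written with NumPy only; `fit_logit` is a hypothetical helper, not a library function, and a package such as statsmodels would normally be used instead. It is demonstrated on a race-only model rebuilt from the counts reported in this notebook, which is an assumption made so the example runs without the data file.

```python
import numpy as np


def fit_logit(X, y, n_iter=25, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson.

    X: predictors, shape (n,) or (n, k), without an intercept column.
    y: 0/1 outcomes, shape (n,).
    Returns (coefficients, standard errors), intercept first.
    """
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))           # predicted probabilities
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        step = np.linalg.solve(info, X.T @ (y - p))   # Newton step
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))


# Race-only model rebuilt from the reported counts (assumption):
# 235 of 2435 callbacks for white-sounding, 157 of 2435 for black-sounding.
is_black = np.r_[np.zeros(2435), np.ones(2435)]
call = np.r_[np.ones(235), np.zeros(2200), np.ones(157), np.zeros(2278)]
beta, se = fit_logit(is_black, call)
print("log-odds coefficient for a black-sounding name:", beta[1], "+/-", se[1])
```

To compare factors on the real data, the predictor matrix could be extended with other columns shown by `data.head()`, for example `np.column_stack([data.race == 'b', data.education, data.yearsexp])`. Which columns to include is a modeling choice that the original analysis does not specify.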


