# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your GitHub account**.

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution

In [2]:
import pandas as pd
import numpy as np
from scipy import stats

In [3]:
data = pd.io.stata.read_stata('data/us_job_market_discrimination.dta')

In [4]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

In [5]:
data.head()

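|   | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|----|----|-----------|--------|----------|--------|-----------|----------|----------|---------------|-----|---------|--------|-------|----------|----------|-------|------------|------------|---------|-----------|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |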


1) What test is appropriate for this problem? Does CLT apply?

<div class="span5 alert alert-success">
<p>Your answers to Q1 and Q2 here</p>
</div>

The central limit theorem states that the sampling distribution of the mean of independent, identically distributed random variables with finite variance will be approximately normal if the sample size is large enough.
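The statement above can be illustrated with a small simulation. This is only a sketch under assumed values: the callback probability `p = 0.08` is a round number close to the pooled rate in this data, and the seed and number of simulated samples are arbitrary. It draws many samples of 2435 Bernoulli outcomes and compares the spread of the sample proportions with the standard error predicted by the normal approximation.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

p = 0.08           # assumed callback probability (illustrative value)
n = 2435           # resumes per group in this dataset
n_samples = 20000  # number of simulated samples (arbitrary)

# Each simulated sample proportion is (callbacks among n resumes) / n.
sample_props = rng.binomial(n, p, size=n_samples) / n

# Standard error of a proportion predicted by the normal approximation.
theoretical_se = np.sqrt(p * (1 - p) / n)
print("simulated mean:", sample_props.mean())
print("simulated std: ", sample_props.std())
print("theoretical SE:", theoretical_se)

# If the sampling distribution is close to normal, roughly 95% of the
# sample proportions should fall within 1.96 standard errors of p.
within = np.mean(np.abs(sample_props - p) <= 1.96 * theoretical_se)
print("share within 1.96 SE:", within)
```

If the simulated standard deviation is close to the theoretical standard error and the share within 1.96 SE is near 95%, the normal approximation is reasonable for samples of this size.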

Central Limit Theorem assumes the following conditions:

i) Sample size should be "large enough". 

Since the sample is large (n = 4870, well above 30), this condition is met.

In [6]:
print(len(data))

4870


ii) np >= 10 and nq >= 10. The number of successes and the number of failures in each sample should be greater than or equal to 10.

Success: There are 235 callbacks out of 2435 resumes with white-sounding names, and 157 callbacks out of 2435 resumes with black-sounding names. Both counts are greater than 10.

Failure: There are 2200 (2435 - 235) resumes with white-sounding names and 2278 (2435 - 157) resumes with black-sounding names that did not receive a callback. Both counts are greater than 10.

Therefore, this condition is met.
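The success/failure check can also be written as a few lines of code. The sketch below is illustrative: the helper `success_failure_ok` and the `groups` dictionary are hypothetical names, and the counts are typed in from the figures quoted above rather than read from the dataset.

```python
def success_failure_ok(successes, n, threshold=10):
    """Return True if both successes and failures reach the threshold."""
    failures = n - successes
    return successes >= threshold and failures >= threshold

# Counts quoted in the notebook: (callbacks, resumes) per group.
groups = {"white-sounding": (235, 2435), "black-sounding": (157, 2435)}

for name, (calls, total) in groups.items():
    print(name,
          "| successes:", calls,
          "| failures:", total - calls,
          "| condition met:", success_failure_ok(calls, total))
```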

In [11]:
w = data[data.race=='w']
b = data[data.race=='b']

In [12]:
w_len = len(data[data.race=='w'])
b_len = len(data[data.race=='b'])
print("Number of white-sounding names: " + str(w_len))
print("Number of black-sounding names: " + str(b_len))

Number of white-sounding names: 2435
Number of black-sounding names: 2435


In [15]:
w_calls = w[w.call==1]
b_calls = b[b.call==1]
w_calls_len = len(w_calls)
b_calls_len = len(b_calls)
print("Number of callbacks for white-sounding names: " + str(w_calls_len))
print("Number of callbacks for black-sounding names: " + str(b_calls_len))

Number of callbacks for white-sounding names: 235
Number of callbacks for black-sounding names: 157


iii) Independence. The sample size should be less than or equal to 10% of the population size. The 4870 resumes in this sample are assumed to be far fewer than 10% of all resumes submitted in the US labor market, hence this condition is also met.


iv) Randomization. The problem statement clearly states that the races are assigned randomly to the resumes when presented to the employer.

Hence CLT is applicable in this scenario and it is appropriate to use a significance test.

2) What are the null and alternate hypotheses?

Null Hypothesis: Proportion of white callbacks is equal to proportion of black callbacks.

Alternate Hypothesis: Proportion of white callbacks is not equal to proportion of black callbacks.
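In symbols, the two hypotheses can be written as follows. The notation $p_w$ and $p_b$ for the true callback rates of white-sounding and black-sounding names is introduced here for illustration only.

```latex
H_0:\; p_w - p_b = 0
\qquad \text{versus} \qquad
H_A:\; p_w - p_b \neq 0
```

Because the alternate hypothesis is two-sided, differences in either direction count as evidence against the null hypothesis.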

3) Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.


Let's try a bootstrap two-sample hypothesis test:

In [65]:
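alpha = 0.05
# observed difference in callback rates (white minus black)
empirical_diff = np.mean(w.call) - np.mean(b.call)

def bootstrap_sample(data, func):
    """Resample data with replacement and apply func to the resample."""
    return func(np.random.choice(data, size=len(data)))


def bootstrap_replicate(data, func, size=1):
    """Draw `size` bootstrap replicates of func(data)."""
    bs_replicate = np.empty(size)

    for i in range(size):
        bs_replicate[i] = bootstrap_sample(data, func)

    return bs_replicate

# shift both groups to the pooled callback rate so that the null hypothesis holds
shifted_white = w.call - np.mean(w.call) + np.mean(data.call)
shifted_black = b.call - np.mean(b.call) + np.mean(data.call)


w_bs_replicate = bootstrap_replicate(shifted_white, np.mean, 10000)
b_bs_replicate = bootstrap_replicate(shifted_black, np.mean, 10000)

bs_replicate_diff = w_bs_replicate - b_bs_replicate

# one-sided p-value: fraction of replicates at least as large as the observed difference
p = np.sum(bs_replicate_diff >= empirical_diff)/len(bs_replicate_diff)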

if p <= alpha:
    print("We can reject the null hypothesis with p-value:" + str(p))
else:
    print("We cannot reject the null hypothesis with p-value: " + str(p))

We can reject the null hypothesis with p-value:0.0
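Since the alternate hypothesis is two-sided, a two-sided bootstrap p-value is a natural cross-check. The following is a sketch, not the notebook's own method: it rebuilds the test from the callback counts reported above instead of the dataset, and it swaps the explicit resampling loop for a binomial draw, which is equivalent for 0/1 data. The seed and variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen arbitrarily for reproducibility

# Counts reported in the notebook
n_w, calls_w = 2435, 235
n_b, calls_b = 2435, 157

p_w, p_b = calls_w / n_w, calls_b / n_b
pooled = (calls_w + calls_b) / (n_w + n_b)
empirical_diff = p_w - p_b

n_reps = 10000
# Resampling a 0/1 array of length n with replacement and taking its mean
# is the same as drawing Binomial(n, p_hat) / n, so no explicit loop is needed.
# Subtracting p_hat and adding the pooled rate mirrors the mean shift used
# in the cell above, which makes the null hypothesis true for both groups.
w_reps = rng.binomial(n_w, p_w, size=n_reps) / n_w - p_w + pooled
b_reps = rng.binomial(n_b, p_b, size=n_reps) / n_b - p_b + pooled
diff_reps = w_reps - b_reps

# Two-sided p-value: replicates at least as extreme in either direction.
p_two_sided = np.mean(np.abs(diff_reps) >= abs(empirical_diff))
print("two-sided bootstrap p-value:", p_two_sided)
```

With only 10,000 replicates, a very small p-value may be reported as exactly 0; that should be read as "smaller than about 1 in 10,000" rather than as a true zero.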


In [73]:
#confidence interval for white sounding names callbacks
w_bs = bootstrap_replicate(w.call,np.sum,10000)
w_conf_int = np.percentile(w_bs,[2.5,97.5])
print("The number of white sounding names callbacks lies between " + str(w_conf_int[0]) + " and " + str(w_conf_int[1]) + " with 95% confidence.")

The number of white sounding names callbacks lies between 207.0 and 263.0 with 95% confidence.


In [70]:
b_bs = bootstrap_replicate(b.call,np.sum,10000)
b_conf_int = np.percentile(b_bs,[2.5,97.5])
print("The number of black sounding names callbacks lies between " + str(b_conf_int[0]) + " and " + str(b_conf_int[1]) + " with 95% confidence.")

The number of black sounding names callbacks lies between 134.0 and 181.0 with 95% confidence.
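The intervals above describe each group's callback count separately. To address the hypothesis directly, a confidence interval for the difference in callback rates may be more informative. The sketch below assumes the counts reported in the notebook and uses a binomial draw in place of resampling the 0/1 `call` column (the two are equivalent for binary data); the seed and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed for reproducibility

# Counts reported in the notebook
n_w, calls_w = 2435, 235
n_b, calls_b = 2435, 157
n_reps = 10000

# Bootstrap replicates of each group's callback rate.
w_rate_reps = rng.binomial(n_w, calls_w / n_w, size=n_reps) / n_w
b_rate_reps = rng.binomial(n_b, calls_b / n_b, size=n_reps) / n_b

# Percentile interval for the difference (white minus black).
diff_reps = w_rate_reps - b_rate_reps
ci_low, ci_high = np.percentile(diff_reps, [2.5, 97.5])
print("95% bootstrap CI for the difference in callback rates:", ci_low, ci_high)
```

If the resulting interval excludes 0, that is consistent with rejecting the null hypothesis of equal callback rates.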


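The margin of error for each group's callback rate is calculated as $\text{margin of error} = z^{*} \cdot \sigma / \sqrt{n}$, where $z^{*}$ is the critical z-value for the chosen confidence level, $\sigma$ is the standard deviation of the group's `call` values, and $n$ is the group size.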

In [40]:
w_z = stats.norm.ppf(1-(alpha/2))
w_mer = w_z * np.std(w.call)/np.sqrt(len(w.call))
print(w_mer)

0.011728427811859931


In [41]:
b_z = stats.norm.ppf(1-(alpha/2))
b_mer = b_z * np.std(b.call)/np.sqrt(len(b.call))
print(b_mer)

0.009754954131940304
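The two margins above apply to each group's callback rate on its own. A margin of error and confidence interval for the difference between the two rates can be sketched as follows, assuming the counts reported in the notebook and the usual unpooled standard error for a difference of proportions. `statistics.NormalDist` from the standard library is used here in place of `stats.norm.ppf` only to keep the example dependency-free.

```python
import math
from statistics import NormalDist

# Counts reported in the notebook
n_w, calls_w = 2435, 235
n_b, calls_b = 2435, 157
alpha = 0.05

p_w, p_b = calls_w / n_w, calls_b / n_b
diff = p_w - p_b

# Critical z-value for a two-sided 95% interval (about 1.96).
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Unpooled standard error of the difference between two proportions.
se_diff = math.sqrt(p_w * (1 - p_w) / n_w + p_b * (1 - p_b) / n_b)

margin = z_crit * se_diff
ci = (diff - margin, diff + margin)
print("difference in callback rates: %.4f" % diff)
print("margin of error: %.4f" % margin)
print("95%% confidence interval: (%.4f, %.4f)" % ci)
```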


Let's try the frequentist approach now.

In [46]:
w_call_prop = (w_calls_len/w_len)
b_call_prop = (b_calls_len/b_len)
print("Proportion of white callbacks(P1): %.2f" % w_call_prop)
print("Proportion of black callbacks(P2): %.2f" % b_call_prop)

Proportion of white callbacks(P1): 0.10
Proportion of black callbacks(P2): 0.06


Since the inference conditions are met and the sample size is large enough (n > 30), a two-sample z-test for proportions will be performed.

The z statistic for two samples is calculated as

$$z = \frac{P_1 - P_2}{\sigma}, \qquad \sigma = \sqrt{P_c (1 - P_c) \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where $P_c$ is the combined (pooled) proportion, and $n_1$ and $n_2$ are the sample sizes of the two groups.

In [48]:
Pc = (w_calls_len + b_calls_len)/(w_len + b_len)
print("Combined sample proportion is " + str(Pc))

Combined sample proportion is 0.08049281314168377


In [49]:
StandardError = np.sqrt(Pc * (1- Pc) * (1/w_len + 1/b_len))
print("Standard Error : " + str(StandardError))

Standard Error : 0.007796894036170457


In [56]:
z_statistic = (w_call_prop - b_call_prop)/StandardError

print("Z-Statistic is: %.3f" % z_statistic)

Z-Statistic is: 4.108


In [60]:
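# one-sided tail probability; for the two-sided alternate hypothesis in Q2,
# double this value (it remains far below alpha)
p = stats.norm.sf(abs(z_statistic))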

print("p-value is : %.5f" % p )

if p <= alpha:
    print("We can reject the null hypothesis.")
else:
    print("We cannot reject the null hypothesis.")

p-value is : 0.00002
We can reject the null hypothesis.
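The steps of the z-test can be collected into one small function. This is a sketch rather than a library API: `two_proportion_z_test` is a hypothetical helper, the counts are the ones reported in the notebook, and the p-value is computed two-sided to match the "not equal" alternate hypothesis. It uses `statistics.NormalDist` from the standard library instead of `scipy.stats` so that it runs without extra dependencies.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # Standard error under the null hypothesis of equal proportions.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: probability mass in both tails beyond |z|.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Callbacks and resumes for white-sounding and black-sounding names.
z, p_value = two_proportion_z_test(235, 2435, 157, 2435)
print("z = %.3f, two-sided p-value = %.6f" % (z, p_value))
```

The two-sided p-value is twice the one-sided tail probability printed above, and it is still far below the significance level of 0.05.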


4) Write a story describing the statistical significance in the context of the original problem.

<div class="span5 alert alert-success">
<p> Your answers to Q4 and Q5 here </p>
</div>

The proportion of resumes with white-sounding names that received callbacks (about 9.7%) is higher than the proportion for resumes with black-sounding names (about 6.4%). Both the bootstrap test with 10,000 replicates and the two-sample z-test give a p-value far below 0.05, so a difference this large would be very unlikely if the callback rates of the two groups were truly equal. We therefore reject the null hypothesis: there is evidence of racial discrimination in the callback stage of the hiring process.

5) Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

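No. Because the 'b' and 'w' values were assigned randomly to the resumes, the analysis shows that race/name has a statistically significant effect on callbacks, but it does not show that race/name is the most important factor. Other columns in the dataset, such as honors, years of experience, and education, were not examined here and could matter as much or more. To amend the analysis, the effect of race/name could be compared with the effects of these other factors, for example by comparing callback rates across them as well, or by fitting a model of callbacks that includes them alongside race.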