# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution
</div>
****

In [38]:
import pandas as pd
import numpy as np
from scipy import stats

In [39]:
data = pd.io.stata.read_stata('data/us_job_market_discrimination.dta')

**Examine the data**

In [40]:
data.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


In [41]:
#Dataframe of variables relevant for analysis: race and call

df = data[['race', 'call']]
df.head()

Unnamed: 0,race,call
0,w,0.0
1,w,0.0
2,b,0.0
3,b,0.0
4,w,0.0


In [42]:
#Separate dataframes by race
df_white = df[df.race=='w']
print(df_white)

df_black = df[df.race=='b']
print(df_black)

#2435 entries for each race

     race  call
0       w   0.0
1       w   0.0
4       w   0.0
5       w   0.0
6       w   0.0
11      w   0.0
13      w   0.0
15      w   0.0
16      w   0.0
18      w   0.0
21      w   0.0
22      w   0.0
23      w   0.0
24      w   0.0
27      w   0.0
30      w   0.0
32      w   0.0
34      w   0.0
35      w   0.0
39      w   0.0
40      w   0.0
42      w   0.0
43      w   0.0
46      w   0.0
48      w   0.0
51      w   0.0
52      w   0.0
54      w   0.0
56      w   0.0
58      w   0.0
...   ...   ...
4811    w   0.0
4813    w   0.0
4814    w   0.0
4816    w   0.0
4817    w   0.0
4819    w   0.0
4822    w   0.0
4824    w   0.0
4826    w   1.0
4829    w   1.0
4830    w   0.0
4831    w   0.0
4834    w   0.0
4836    w   0.0
4838    w   0.0
4839    w   0.0
4843    w   1.0
4845    w   0.0
4846    w   1.0
4847    w   1.0
4851    w   0.0
4852    w   0.0
4854    w   0.0
4855    w   0.0
4860    w   0.0
4861    w   1.0
4862    w   0.0
4863    w   0.0
4867    w   0.0
4869    w   0.0

[2435 r

In [43]:
#number of callbacks for black-sounding names
blk_callback = sum(data[data.race=='b'].call)

# number of callbacks for white-sounding names
white_callback = sum(data[data.race=='w'].call)

print('Proportion of callbacks among blacks', blk_callback/2435)
print('Proportion of callbacks among whites', white_callback/2435)


Proportion of callbacks among blacks 0.064476386037
Proportion of callbacks among whites 0.0965092402464


## 1. What test is appropriate for this problem? Does the CLT apply?

Each resume callback is a Bernoulli random variable that takes one of two values, 1 (received a call) or 0 (no call). The number of callbacks among n independent resumes, each randomly assigned a black-sounding or white-sounding name, follows a binomial distribution.

The Central Limit Theorem applies even to binomial populations provided that the minimum of np and n(1-p) is at least 5, where n refers to the sample size and p is the probability of "success" on any given trial. 

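The condition should be checked within each group rather than on the pooled sample (n = 4870), because the test compares two sample proportions and each one needs to be approximately normal. With 2435 resumes per group, np is about 2435 × 0.064 ≈ 157 for black-sounding names and 2435 × 0.097 ≈ 235 for white-sounding names, so both groups clear the threshold of 5 by a wide margin and the CLT applies.

A two-proportion z-test is appropriate for this problem because we are comparing the difference between two proportions and both samples (n = 2435 each) are large enough for the normal approximation to hold.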
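The CLT condition can be checked numerically for each group. The sketch below is a minimal illustration that reuses the group size and callback proportions printed earlier in this notebook; the helper name `clt_counts` is introduced here for illustration only.

```python
# Group size and callback proportions as printed earlier in this notebook.
n_per_group = 2435
callback_rate = {
    "black-sounding": 0.064476386037,
    "white-sounding": 0.0965092402464,
}


def clt_counts(n, p):
    """Return the expected successes and failures, n*p and n*(1-p)."""
    return n * p, n * (1 - p)


clt_ok = {}
for group, p in callback_rate.items():
    successes, failures = clt_counts(n_per_group, p)
    # The rule of thumb quoted above: min(np, n(1-p)) must be at least 5.
    clt_ok[group] = min(successes, failures) >= 5
    print(f"{group}: np = {successes:.0f}, n(1-p) = {failures:.0f}, "
          f"condition met: {clt_ok[group]}")
```

Both groups have well over 5 expected callbacks and non-callbacks, so the normal approximation is reasonable for each sample proportion.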


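**Question: z-test or t-test?** A z-test is the usual choice for proportions. Under the null hypothesis the standard error is fully determined by the pooled proportion, so no separate variance estimate is needed, and with samples this large the t and normal distributions are nearly identical anyway.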

## 2. What are the null and alternate hypotheses?

**Null hypothesis**: The percentage of callbacks is the same for both black-sounding names and white-sounding names.

**Alternate hypothesis**: The percentage of callbacks for black-sounding names and white-sounding names differs.
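
The same hypotheses can be written symbolically. The notation is introduced here for convenience: $p_w$ and $p_b$ are the true callback rates for white-sounding and black-sounding names, $\hat{p}_w$ and $\hat{p}_b$ are the sample proportions, $n_w$ and $n_b$ are the group sizes, and $\hat{p}$ is the pooled sample proportion across both groups.

$$
H_0: p_w - p_b = 0 \qquad\qquad H_A: p_w - p_b \neq 0
$$

Under $H_0$ both groups share one callback rate, so the test statistic for the two-proportion z-test uses the pooled proportion in its standard error:

$$
z = \frac{\hat{p}_w - \hat{p}_b}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_w} + \dfrac{1}{n_b}\right)}}
$$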

## 3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.

**Bootstrap approach** (permutation test for the p-value)

In [44]:
def permutation_sample(data_1, data_2):
    data = np.concatenate((data_1, data_2))
    
    #Permute the pooled data
    permuted_data = np.random.permutation(data)
    
    perm_sample_1 = permuted_data[:len(data_1)]
    perm_sample_2 = permuted_data[len(data_1):]
    
    return perm_sample_1, perm_sample_2


def draw_perm_reps(data_1, data_2, func, size = 1):
    #Generate multiple permutation replicates.
    
    #Initialize array of replicates: perm_replicates
    perm_replicates = np.empty(size)

    for i in range(size):
        #Generate permutation sample.
        perm_sample_1, perm_sample_2 = permutation_sample(data_1, data_2)
        
        #Compute test statistic(func)
        perm_replicates[i] = func(perm_sample_1, perm_sample_2)

    return perm_replicates


def diff_of_proportion(data_1, data_2):
    proportion_1 = np.sum(data_1) / len(data_1)
    proportion_2 = np.sum(data_2) / len(data_2)
    
    return proportion_1 - proportion_2


In [45]:
#Calculate empirical difference of proportions
empirical_diff_proportion = diff_of_proportion(df_white['call'], df_black['call'])

#Create permutations of data
perm_replicates = draw_perm_reps(df_white['call'], df_black['call'], diff_of_proportion, size = 1000)

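#Calculate two-sided p-value (the alternate hypothesis is that the proportions differ in either direction)
p = np.sum(np.abs(perm_replicates) >= abs(empirical_diff_proportion)) / len(perm_replicates)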

print('Bootstrap approach p-value:', p)

Bootstrap approach p-value: 0.0


In [46]:
conf_interval = np.percentile(perm_replicates, [2.5, 97.5])
print('Bootstrap approach confidence interval:', conf_interval)

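#Note: the permutation replicates are generated under the null hypothesis, so this
#interval is the range of differences expected by chance if race had no effect.
#It is centered on 0 and is not a confidence interval for the observed difference.
#The observed difference (about 0.032) falls well outside it.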

Bootstrap approach confidence interval: [-0.01478439  0.01478439]
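
A confidence interval for the observed difference needs bootstrap replicates drawn from each group separately, not permutations of the pooled data. The sketch below is one way to do that, under stated assumptions: the 0/1 call arrays are rebuilt from callback counts of 157 and 235 out of 2435 (back-calculated from the proportions printed earlier), the random seed is arbitrary, and the function name `bootstrap_diff_of_proportion` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility only

# Assumption: rebuild the 0/1 call arrays from counts back-calculated from the
# printed proportions (157 and 235 callbacks out of 2435 resumes per group).
n = 2435
calls_black = np.concatenate([np.ones(157), np.zeros(n - 157)])
calls_white = np.concatenate([np.ones(235), np.zeros(n - 235)])


def bootstrap_diff_of_proportion(data_1, data_2, size=5000):
    """Resample each group with replacement; return differences in proportion."""
    reps = np.empty(size)
    for i in range(size):
        sample_1 = rng.choice(data_1, size=len(data_1), replace=True)
        sample_2 = rng.choice(data_2, size=len(data_2), replace=True)
        reps[i] = sample_1.mean() - sample_2.mean()
    return reps


bs_replicates = bootstrap_diff_of_proportion(calls_white, calls_black)

# Percentile interval: the middle 95% of the bootstrap differences.
bs_conf_int = np.percentile(bs_replicates, [2.5, 97.5])
bs_margin_of_err = (bs_conf_int[1] - bs_conf_int[0]) / 2

print('Bootstrap 95% confidence interval:', bs_conf_int)
print('Bootstrap margin of error:', bs_margin_of_err)
```

Unlike the permutation interval, this one is centered near the observed difference, so it can be compared directly with the frequentist confidence interval.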


**Frequentist Test** - two proportion z-test

In [51]:
# Calculate margin of error, confidence interval, and p-value

#Z value for 95% confidence interval
z_value = 1.96



#Proportions of callbacks for black and white-sounding names
prop_blk = sum(df[df.race=='b'].call)/len(df[df.race=='b'].call)

prop_white = sum(df[df.race=='w'].call)/len(df[df.race=='w'].call)



#Standard error difference between proportions
#formula : SEp1-p2 = sqrt [ p1(1-p1) / n1 + p2(1-p2) / n2 ]

std_err_diff= np.sqrt((prop_blk * (1- prop_blk)/ (len(df_black)))
                          + (prop_white * (1- prop_white)/ (len(df_white))))                        


#Margin of error
margin_of_err = z_value * std_err_diff

print('Margin of Error:', margin_of_err)




#Confidence interval
p_diff = prop_white - prop_blk

conf_int = (p_diff - margin_of_err, p_diff + margin_of_err)

print('Confidence interval:', conf_int)

Margin of Error: 0.0152554063499
Confidence interval: (0.016777447859559147, 0.047288260559332024)
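
The numbers above can be traced through the formulas by hand. The symbols are notation introduced here: $\hat{p}_w \approx 0.0965$ and $\hat{p}_b \approx 0.0645$ are the sample callback proportions, and $n_w = n_b = 2435$. The confidence interval uses the unpooled standard error, because it does not assume the two rates are equal:

$$
SE_{\hat{p}_w - \hat{p}_b} = \sqrt{\frac{\hat{p}_w (1 - \hat{p}_w)}{n_w} + \frac{\hat{p}_b (1 - \hat{p}_b)}{n_b}}
= \sqrt{\frac{0.0965 \times 0.9035}{2435} + \frac{0.0645 \times 0.9355}{2435}} \approx 0.00778
$$

$$
\text{Margin of error} = 1.96 \times SE_{\hat{p}_w - \hat{p}_b} \approx 0.0153
$$

$$
\text{95\% CI} = (\hat{p}_w - \hat{p}_b) \pm \text{Margin of error} \approx 0.0320 \pm 0.0153 = (0.0168,\ 0.0473)
$$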


In [71]:
# Calculate Z score

#Overall sample proportion of successful callbacks
prop_success = sum(df.call) / len(df.call)
print(prop_success)

#Formula for z-score: https://www.socscistatistics.com/tests/ztest/

z_score = ((prop_white - prop_blk) - 0) / np.sqrt((prop_success * (1 - prop_success)) * ((1/ len(df_black)) + (1/len(df_white))))
        
print('Z-score:', z_score)

#Calculate p-value

p_value = stats.norm.sf(abs(z_score))*2
print('P-value:', p_value)

# As the absolute value of the Z score is larger than 1.96,  reject the null hypothesis.

0.0804928131417
Z-score: 4.10841215243
P-value: 3.98388683759e-05


Statsmodels also provides a function for the two-proportion z-test, `proportions_ztest`.

In [70]:
import statsmodels.api as sm

z_score, p_value = sm.stats.proportions_ztest([sum(df_white.call), sum(df_black.call)], [len(df_white), len(df_black)])
print('Z-score:', z_score)
print('p-value:', p_value)


Z-score: 4.10841215243
p-value: 3.98388683759e-05


<div class="span5 alert alert-success">
<p> Your answers to Q4 and Q5 here </p>
</div>

## 4. Write a story describing the statistical significance in the context of the original problem.

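Since the p-value (about 4e-05) is far below 0.05, we reject the null hypothesis.
Resumes with white-sounding names received callbacks about 9.7% of the time, compared with about 6.4% for resumes with black-sounding names.
The difference of about 3.2 percentage points is statistically significant, with a 95% confidence interval of roughly 1.7 to 4.7 percentage points.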
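
Statistical significance says the gap is unlikely to be chance, but not how large it is in practical terms. The sketch below is a minimal illustration of two common effect-size summaries, assuming callback counts of 235 and 157 out of 2435 resumes per group (back-calculated from the proportions printed earlier); the variable names are introduced here.

```python
# Assumption: counts back-calculated from the proportions printed earlier.
n_per_group = 2435
callbacks_white = 235
callbacks_black = 157

rate_white = callbacks_white / n_per_group
rate_black = callbacks_black / n_per_group

# Absolute gap in callback rates, in proportion units.
risk_difference = rate_white - rate_black

# Relative gap: how many times more often white-sounding names get a callback.
risk_ratio = rate_white / rate_black

# Average number of resumes sent per callback received, for each group.
resumes_per_callback_white = 1 / rate_white
resumes_per_callback_black = 1 / rate_black

print(f"Risk difference: {risk_difference:.4f}")
print(f"Risk ratio: {risk_ratio:.2f}")
print(f"Resumes per callback (white-sounding): {resumes_per_callback_white:.1f}")
print(f"Resumes per callback (black-sounding): {resumes_per_callback_black:.1f}")
```

On these counts, resumes with white-sounding names received roughly 50% more callbacks than resumes with black-sounding names.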

## 5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

This analysis shows that race/name is an important factor in callback success. Because names were assigned to resumes at random, the difference can be attributed to the name itself. However, it does not show that race/name is the most important factor in whether an employer calls an applicant. The analysis examines only race/name, and other features may have a stronger relationship with callback success. A few features that come to mind are gender, location of the employer, and type of industry. For instance, males may be more successful than females in the hiring process for certain jobs, and vice versa, regardless of race. To determine whether race/name is the most important factor, further analysis should model callbacks on all of these features together, for example with a logistic regression, and compare the size of each effect.