# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
</div>
****

In [4]:
import pandas as pd
import numpy as np
from scipy import stats

In [5]:
data = pd.io.stata.read_stata('data/us_job_market_discrimination.dta')

In [6]:
# number of callbacks for black-sounding names
sum(data[data.race=='b'].call)

157.0

In [7]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

In [8]:
len(data[data.race=='b']), len(data[data.race=='w'])

(2435, 2435)
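The raw counts are easier to compare once converted to callback rates. The sketch below copies the counts from the cells above into plain dictionaries (the names `callbacks`, `n_resumes`, `rates` and `rate_ratio` are introduced here for illustration only) and computes the rate for each group and their ratio.

```python
# Callback counts and group sizes copied from the cells above
callbacks = {'b': 157, 'w': 235}
n_resumes = {'b': 2435, 'w': 2435}

# Callback rate per group
rates = {race: callbacks[race] / n_resumes[race] for race in callbacks}

# Ratio of the white-sounding rate to the black-sounding rate
rate_ratio = rates['w'] / rates['b']

print(rates)
print(round(rate_ratio, 2))
```

Under these counts, resumes with white-sounding names receive roughly one and a half times as many callbacks as resumes with black-sounding names.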

## Summary Statistics of Callbacks with respect to Race

A better way to visualise the data above is through a pivot table. The results are shown below.

**Note:**

In this case, since we're dealing with only 2 categories for each type of data (Failure/Success and Black/White), it makes sense to label these data manually. However, if the number of categories is large, the Total row and column would have to be constructed programmatically, for example with for loops.

In [114]:
aggfunc = {
    'id': 'count',
}

callback_df = data.pivot_table('id', ['call'], 'race', aggfunc=aggfunc)
callback_df.index = ['Failure', 'Success']
callback_df.columns = ['Black', 'White']

callback_df.loc['Total'] = [ callback_df['Black'].sum(), callback_df['White'].sum()]
callback_df['Total'] = callback_df['Black'] + callback_df['White']

callback_df

Unnamed: 0,Black,White,Total
Failure,2278,2200,4478
Success,157,235,392
Total,2435,2435,4870
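As an alternative to adding the Total row and column by hand, `pd.crosstab` can append the margins itself. The sketch below is only illustrative: it rebuilds one row per resume from the counts in the table above (the `resumes` DataFrame is a stand-in, not the original `data`), whereas in the notebook one would presumably pass `data.call` and `data.race` directly.

```python
import pandas as pd

# Rebuild one row per resume from the counts in the table above
# (a stand-in for the real `data` DataFrame).
rows = (
    [('b', 1)] * 157 + [('b', 0)] * 2278 +
    [('w', 1)] * 235 + [('w', 0)] * 2200
)
resumes = pd.DataFrame(rows, columns=['race', 'call'])

# margins=True appends the row and column totals automatically
table = pd.crosstab(resumes['call'], resumes['race'],
                    margins=True, margins_name='Total')

# Relabel for readability: call 0/1 -> Failure/Success, race b/w -> Black/White
table.index = ['Failure', 'Success', 'Total']
table.columns = ['Black', 'White', 'Total']
print(table)
```

This scales to any number of categories without explicit loops.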


In [115]:
data.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


## Introduction

We want to determine whether there is a relationship between the race suggested by a name and the likelihood of getting a callback. We have 2435 resumes each for white-sounding and black-sounding names.

We first calculate the number of callbacks for each group. At first glance, resumes with white-sounding names receive noticeably more callbacks (235) than resumes with black-sounding names (157).

The aim of this analysis is to find out if the difference is indeed significant or if the names and callbacks are statistically independent.

## Type of Test

First, the Central Limit Theorem applies to this problem: the sample sizes for both groups (2435 resumes each) are much greater than 30, and the observations are independent, since the outcome for one resume does not affect the outcome for any other.
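For proportions, a common rule of thumb (an addition here, not part of the original exercise) is that each group should also contain at least 10 successes and 10 failures before the normal approximation is trusted. A minimal check, using the counts from the cells above and illustrative variable names:

```python
# Counts taken from the cells above
n = 2435
callbacks = {'black': 157, 'white': 235}

for group, successes in callbacks.items():
    failures = n - successes
    # Rule of thumb: at least 10 successes and 10 failures per group
    print(group, successes, failures, min(successes, failures) >= 10)

# True only if every group satisfies the rule of thumb
conditions_met = all(min(s, n - s) >= 10 for s in callbacks.values())
print(conditions_met)
```

Both groups clear the threshold by a wide margin.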

The names are qualitative in nature; they are either black or white. Similarly, the callbacks are also qualitative in nature; they are either a success or a failure. Therefore, we are trying to check for correlation between two qualitative features.

Since we're trying to find out whether there is a relationship between two qualitative variables, names and callbacks, we will apply the **Chi-Square Test for Independence.**

A lookup table can be constructed from the callback_df above to visualise the data better.

In [116]:
callback_df

Unnamed: 0,Black,White,Total
Failure,2278,2200,4478
Success,157,235,392
Total,2435,2435,4870


## Null and Alternate Hypothesis

To do hypothesis testing, we define the following:

* **Null Hypothesis:** There is no relation between names and callbacks
* **Alternate Hypothesis:** There is a relationship between names and callbacks

We assume that the null hypothesis is true and set the significance level $\alpha$ to 10%, or 0.1.

Let p0 denote the probability of a failed callback and p1 denote the probability of a successful callback. The first step of our testing is to calculate these quantities.

In [124]:
total = callback_df.loc['Total', 'Total']
group_total = callback_df.loc['Total', 'Black']

In [125]:
#Likelihood of getting and not getting a callback for the entire sample
p1 = callback_df.loc['Success', 'Total']/total
p0 = callback_df.loc['Failure', 'Total']/total

p0, p1

(0.91950718685831623, 0.080492813141683772)

In [126]:
# Expected counts under the null hypothesis: overall rate times group size
estimate_array = np.array([[p1, p1], [p0, p0]])
group_totals = np.array([[group_total], [group_total]])  # same for both races in this case

estimate_array = estimate_array * group_totals

estimate_array

array([[  196.,   196.],
       [ 2239.,  2239.]])

Excluding the Total row and column, there are 2 rows and 2 columns in our callback DataFrame. The number of degrees of freedom is (rows - 1) × (columns - 1) = (2 - 1) × (2 - 1), so the **degrees of freedom equal 1.**
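For reference, the statistic computed in the following cells is the standard Pearson chi-square. The symbols $O_{ij}$ (observed count), $E_{ij}$ (expected count under the null hypothesis) and $N$ (total number of resumes) are introduced here for illustration:

$$
\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{N}
$$

For example, the expected number of callbacks for black-sounding names is $392 \times 2435 / 4870 = 196$, which matches the first entry of the expected-count array above.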

In [127]:
# Observed counts, arranged to match estimate_array (Success row first)
success_array = np.array(callback_df.loc['Success', ['Black', 'White']])
failure_array = np.array(callback_df.loc['Failure', ['Black', 'White']])

actual_array = np.array([success_array, failure_array])
actual_array

array([[ 157,  235],
       [2278, 2200]])

In [128]:
chi2 = (estimate_array - actual_array) ** 2 / estimate_array
chi2 = np.sum(chi2)

chi2

16.879050414270225

In [129]:
df = 1
p = 1 - stats.chi2.cdf(chi2, df)
p

3.9838868375885461e-05

The p-value obtained is far smaller than the threshold $\alpha$ of 0.1, so we reject the null hypothesis. In other words, **there is a statistically significant association between the race suggested by a candidate's name and success in getting a callback.**

Since the significance level $\alpha$ is 10%, the corresponding confidence level is 90%.
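The manual calculation can be cross-checked with `scipy.stats.chi2_contingency`. This is a sketch rather than the notebook's own method: the observed counts are typed in from the table above, and `correction=False` is passed because the function applies Yates' continuity correction to 2×2 tables by default, which the manual statistic does not include.

```python
import numpy as np
from scipy import stats

# Observed counts from the table above: rows are Success/Failure,
# columns are Black/White
observed = np.array([[157, 235],
                     [2278, 2200]])

# correction=False disables Yates' continuity correction so the result
# is comparable to the manual Pearson statistic
chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed, correction=False)

print(chi2_stat, p_value, dof)
print(expected)
```

The statistic, p-value, degrees of freedom and expected counts should agree with the values computed by hand above.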

## Significance of the Difference (Z-Test)

Based on the chi-square test performed above, there is strong evidence of an association between race and successful callbacks.

We can also test whether the difference in callback rates between the two groups is statistically significant. If it is, this strengthens the evidence that racial discrimination is taking place in the industry.

In [134]:
white_mean = callback_df.loc['Success', 'White']/callback_df.loc['Total', 'White']
white_std = callback_df.loc['Success', 'White'] * ((1 - white_mean) ** 2) + callback_df.loc['Failure', 'White'] * ((0 - white_mean) ** 2)
white_std = np.sqrt(white_std/group_total)

white_mean, white_std

(0.096509240246406572, 0.29528834517039093)

In [135]:
black_mean = callback_df.loc['Success', 'Black']/callback_df.loc['Total', 'Black']
black_std = callback_df.loc['Success', 'Black'] * ((1 - black_mean) ** 2) + callback_df.loc['Failure', 'Black'] * ((0 - black_mean) ** 2)
black_std = np.sqrt(black_std/group_total)

black_mean, black_std

(0.064476386036960986, 0.24559963697158382)
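Because each observation is either 0 or 1, the standard deviations above reduce to the closed form $\sqrt{p(1-p)}$, where $p$ is the group's callback rate. The short check below recomputes both values from the counts; the names mirror the cells above but are redefined here so the snippet runs on its own.

```python
import numpy as np

# Callback rates from the cells above
white_mean = 235 / 2435
black_mean = 157 / 2435

# For 0/1 data, the population standard deviation is sqrt(p * (1 - p))
white_std = np.sqrt(white_mean * (1 - white_mean))
black_std = np.sqrt(black_mean * (1 - black_mean))

print(white_std, black_std)
```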

### Hypothesis Testing

The following hypotheses are defined:

* **Null Hypothesis**: There is no difference in the mean callback rate for blacks and whites.
* **Alternate Hypothesis**: There is a difference in the mean callback rate for blacks and whites.

As before, we set $\alpha$ to 0.1 and assume the null hypothesis is true.

In [136]:
h0_mean = 0

mean_diff = white_mean - black_mean
sigma_diff = np.sqrt((white_std**2)/group_total  + (black_std**2)/group_total)
mean_diff, sigma_diff

(0.032032854209445585, 0.0077833705866767544)

In [137]:
z = (mean_diff - h0_mean) / sigma_diff
z

4.1155504357300003

In [138]:
p = (1-stats.norm.cdf(z))*2
p

3.862565207524149e-05

In [139]:
# Margin of error at the 90% confidence level: critical z-value times the standard error
error = stats.norm.ppf(1 - 0.1 / 2) * sigma_diff
round(error, 4)

0.0128

The p-value obtained is much lower than the significance level and almost identical to the p-value obtained in the chi-square test. Therefore, we can safely reject the null hypothesis. This test further strengthens our initial claim of an association between race and callbacks.

The margin of error is about 1.3 percentage points, so the 90% confidence interval for the difference in callback rates is roughly 1.9 to 4.5 percentage points.
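The two p-values are close but not identical because the z-test above uses each group's own variance, while the chi-square test implicitly pools the two groups. As a sketch of the pooled variant (a different standard error from the one used above, with illustrative variable names), the squared z-statistic reproduces the chi-square statistic:

```python
import numpy as np
from scipy import stats

n = 2435                      # resumes per group
white_rate = 235 / n
black_rate = 157 / n

# Pooled callback rate under the null hypothesis of equal rates
pooled = (235 + 157) / (2 * n)
# Standard error of the difference using the pooled rate
se_pooled = np.sqrt(pooled * (1 - pooled) * (1 / n + 1 / n))

z_pooled = (white_rate - black_rate) / se_pooled
p_pooled = 2 * stats.norm.sf(z_pooled)   # two-sided p-value

print(z_pooled, z_pooled ** 2, p_pooled)
```

For a 2×2 table, the pooled two-proportion z-test and the chi-square test without continuity correction are the same test, which is why `z_pooled ** 2` matches the chi-square value computed earlier.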

## Conclusion and Final Remarks

1. There is a statistically significant association between race and the callback success of a particular person. This could imply that there is active discrimination taking place in the industry based on race.
2. However, we cannot conclude that name is the most important factor for callback success. While a correlation has been established, this does not directly imply causation. Other parameters such as education and work experience may also play a role, and the relationships among all these variables have not been examined, so a definitive conclusion is not yet possible.

A possible amendment to this analysis is to check whether names and callback success both correlate with some third variable (such as work experience, education or age). If that correlation is strong, we can offer an alternative hypothesis: both variables are influenced by the third variable and are therefore correlated but not causally related.