# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution
</div>
****

In [2]:
import pandas as pd
import numpy as np
from scipy import stats

In [3]:
df = pd.read_stata('us_job_market_discrimination.dta')

In [5]:
# number of callbacks for black-sounding names
sum(df[df.race=='b'].call)

157.0

In [9]:
pd.set_option('display.max_columns', None)
df.head()

Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,occupbroad,workinschool,email,computerskills,specialskills,firstname,sex,race,h,l,call,city,kind,adid,fracblack,fracwhite,lmedhhinc,fracdropout,fraccolp,linc,col,expminreq,schoolreq,eoe,parent_sales,parent_emp,branch_sales,branch_emp,fed,fracblack_empzip,fracwhite_empzip,lmedhhinc_empzip,fracdropout_empzip,fraccolp_empzip,linc_empzip,manager,supervisor,secretary,offsupport,salesrep,retailsales,req,expreq,comreq,educreq,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,1,0,0,1,0,Allison,f,w,0.0,1.0,0.0,c,a,384.0,0.98936,0.0055,9.527484,0.274151,0.037662,8.706325,1.0,5,,1.0,,,,,,,,,,,,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,6,1,1,1,0,Kristen,f,w,1.0,0.0,0.0,c,a,384.0,0.080736,0.888374,10.408828,0.233687,0.087285,9.532859,0.0,5,,1.0,,,,,,,,,,,,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,1,1,0,1,0,Lakisha,f,b,0.0,1.0,0.0,c,a,384.0,0.104301,0.83737,10.466754,0.101335,0.591695,10.540329,1.0,5,,1.0,,,,,,,,,,,,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,5,0,1,1,1,Latonya,f,b,1.0,0.0,0.0,c,a,384.0,0.336165,0.63737,10.431908,0.108848,0.406576,10.412141,0.0,5,,1.0,,,,,,,,,,,,0.0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,5,1,1,1,0,Carrie,f,w,1.0,0.0,0.0,c,a,385.0,0.397595,0.180196,9.876219,0.312873,0.030847,8.728264,0.0,some,,1.0,9.4,143.0,9.4,143.0,0.0,0.204764,0.727046,10.619399,0.070493,0.369903,10.007352,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


### 1.) What test is appropriate for this problem? Does the CLT apply?

First, the field `call` can be modeled as a Bernoulli random variable that takes the value 1 with probability p and 0 with probability 1 - p. The number of callbacks within each group is then binomially distributed, because a binomial variable is the sum of n independent, identically distributed Bernoulli variables with parameter p.

The central limit theorem applies here. It does not require the underlying variables to be normally distributed: the normalized sum of many independent 0/1 outcomes tends towards a normal distribution. Concretely, if X ~ B(n, p) with n large and p not too close to 0 or 1 (a common rule of thumb is that np and n(1 - p) are both at least 10), then X is approximately N(np, npq), where q = 1 - p. The observations can be assumed independent because the names were assigned to the resumes at random, and the counts are large: the black-sounding group alone received 157 callbacks.

Because the callback counts are binomial and approximately normal, a "two-proportion z-test" is appropriate. It uses the sample proportion x/n of each group as the estimator of that group's p and compares the two estimates through a common test statistic.

The null hypothesis (H0) is that the callback proportions of the two groups are equal.
The alternative hypothesis (H1) is that the callback proportions of the two groups are not equal.
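
The rule of thumb for the normal approximation can be checked directly from the counts. The sketch below is a minimal illustration: the function name `normal_approx_ok`, the threshold of 10, and the example counts are assumptions made for this example and are not part of the original analysis.

```python
def normal_approx_ok(successes, n, threshold=10):
    """Return True when both successes and failures reach `threshold`."""
    failures = n - successes
    return successes >= threshold and failures >= threshold


# Hypothetical counts for illustration only (not taken from the dataset).
print(normal_approx_ok(successes=40, n=500))  # plenty of successes and failures
print(normal_approx_ok(successes=3, n=500))   # too few successes
```

In this notebook the same check could be run on a real group, for example with `normal_approx_ok(sum(df[df.race=='b'].call), sum(df.race=='b'))`.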







### 2.) What are the null and alternative hypotheses?

The null hypothesis is that the proportion of callbacks for black-sounding names is equal to the proportion of callbacks for white-sounding names.

The alternative hypothesis is that the proportion of callbacks for black-sounding names is not equal to the proportion of callbacks for white-sounding names.

### 3.) Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.

#### Bootstrap Method

A bootstrap hypothesis test approximates the sampling distribution of a statistic by repeatedly resampling the observed data with replacement.

The null hypothesis is that the proportion of callbacks for black-sounding names is equal to the proportion of callbacks for white-sounding names. To simulate this null hypothesis, the black callback data are shifted so that their mean is zero, by subtracting the mean black callback rate from every black callback value. The white callback data are shifted to mean zero in the same way, so the two shifted groups share the same mean. 10,000 bootstrap samples were then drawn with replacement from each shifted group, each sample the same size as its group. For each pair of white and black samples, the difference in means (white minus black) is compared with the empirical difference in means observed in the data. The p-value is the proportion of the 10,000 replicate differences that are greater than or equal to the empirical difference, which is a one-sided comparison in the direction of the observed difference. A small p-value lets us reject the null hypothesis.

The p-value was 0.0: none of the 10,000 replicates was as large as the empirical difference, so the p-value is too small to resolve with this number of replicates.

In [65]:
df_w = df[df.race == 'w']['call']
df_b = df[df.race == 'b']['call']


def diff_of_means(data_1, data_2):
    """Difference in means of two arrays."""

    # The difference of means of data_1, data_2: diff
    diff = np.mean(data_1)-np.mean(data_2)

    return diff

def bootstrap_replicate_1d(data, func):
    """Apply func to one resample of data, drawn with replacement."""
    return func(np.random.choice(data, size=len(data)))

def draw_bs_reps(data, func, size=10000):
    """Draw bootstrap replicates."""

    # Initialize array of replicates: bs_replicates
    bs_replicates = np.empty(size)

    # Generate replicates
    for i in range(size):
        bs_replicates[i] = bootstrap_replicate_1d(data, func)

    return bs_replicates

empirical_diff_means=diff_of_means(df_w, df_b)
# Generate shifted arrays
df_w_shifted = df_w - np.mean(df_w) 
df_b_shifted = df_b - np.mean(df_b) 
# Compute 10,000 bootstrap replicates from shifted arrays
bs_replicates_w = draw_bs_reps(df_w_shifted, np.mean, size=10000)
bs_replicates_b = draw_bs_reps(df_b_shifted, np.mean, size=10000)
bs_replicates = bs_replicates_w-bs_replicates_b
p = np.sum(bs_replicates>=empirical_diff_means) / len(bs_replicates)
print('p-value =', p)

p-value = 0.0


A small p-value means that the null hypothesis can be rejected.
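
Because the names were assigned to resumes at random, a permutation test is a natural cross-check on the shifted bootstrap. It is a different technique from the one used in this notebook and is sketched here only for comparison. The function name `permutation_p_value`, the seeds, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def permutation_p_value(calls_a, calls_b, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for a difference in callback rates."""
    rng = np.random.default_rng(seed)
    calls_a = np.asarray(calls_a, dtype=float)
    calls_b = np.asarray(calls_b, dtype=float)
    observed = calls_a.mean() - calls_b.mean()
    pooled = np.concatenate([calls_a, calls_b])
    n_a = len(calls_a)
    count = 0
    for _ in range(n_perm):
        # Shuffle the group labels by permuting the pooled outcomes
        shuffled = rng.permutation(pooled)
        diff = shuffled[:n_a].mean() - shuffled[n_a:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    # Add one to numerator and denominator so the estimate is never exactly zero
    return (count + 1) / (n_perm + 1)


# Synthetic 0/1 outcomes for illustration only; these are not the study's values.
rng = np.random.default_rng(1)
group_a = rng.binomial(1, 0.10, size=2000)
group_b = rng.binomial(1, 0.06, size=2000)
p_val = permutation_p_value(group_a, group_b, n_perm=2000)
print('permutation p-value =', p_val)
```

With the notebook's data, the call would be `permutation_p_value(df_w, df_b)`. Unlike the raw proportion used above, this estimate cannot be exactly zero, which avoids reporting an impossible p-value of 0.0.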

In [59]:
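# bs_replicates come from the mean-shifted arrays, so this is the central 95% of the
# null distribution (centered on zero); its half-width is the bootstrap margin of error.
conf_int = np.percentile(bs_replicates, [2.5, 97.5])
print('Bootstrapping Approach confidence interval:', conf_int)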

Bootstrapping Approach confidence interval: [-0.01437372  0.01519507]
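
The interval above is centered near zero because `bs_replicates` was built from the mean-shifted arrays. A confidence interval for the difference in callback rates itself could be obtained by resampling the unshifted data instead. The sketch below shows one way to do that under stated assumptions: the function name `bootstrap_diff_ci`, the seeds, and the synthetic data are hypothetical and not part of the original notebook.

```python
import numpy as np


def bootstrap_diff_ci(calls_a, calls_b, n_boot=10_000, level=95, seed=0):
    """Percentile bootstrap interval for mean(calls_a) - mean(calls_b)."""
    rng = np.random.default_rng(seed)
    calls_a = np.asarray(calls_a, dtype=float)
    calls_b = np.asarray(calls_b, dtype=float)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each group with replacement, keeping its original size
        resample_a = rng.choice(calls_a, size=len(calls_a), replace=True)
        resample_b = rng.choice(calls_b, size=len(calls_b), replace=True)
        diffs[i] = resample_a.mean() - resample_b.mean()
    tail = (100 - level) / 2
    lower, upper = np.percentile(diffs, [tail, 100 - tail])
    return lower, upper


# Synthetic 0/1 outcomes for illustration only; these are not the study's values.
rng = np.random.default_rng(1)
group_a = rng.binomial(1, 0.10, size=2000)
group_b = rng.binomial(1, 0.06, size=2000)
lower, upper = bootstrap_diff_ci(group_a, group_b, n_boot=2000)
margin = (upper - lower) / 2
print('95% CI for the difference:', (lower, upper))
print('margin of error:', margin)
```

With the notebook's data, the call would be `bootstrap_diff_ci(df_w, df_b)`; an interval that excludes zero is consistent with rejecting the null hypothesis.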


####  Frequentist Method

A frequentist test compares a test statistic computed from the sample with its theoretical distribution under the null hypothesis, at a chosen significance level. Here the hypothesized difference between the two callback proportions is zero. A single test statistic, the z-value, is used instead of the many resamples used in the bootstrap method. A proportion z-test is appropriate because the mean callback rate of each group is a proportion, the number of observations in each group is sufficiently large (n >= 30), and the observations are independent.

A two-sample z-test should be used because there are two groups and we are checking whether the difference in their means is statistically significant.
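
For reference, the quantities behind the two-proportion z-test can be written out as follows. The notation is introduced here for illustration: $x_w$ and $x_b$ are the callback counts, $n_w$ and $n_b$ the group sizes, $\hat{p}_w = x_w / n_w$ and $\hat{p}_b = x_b / n_b$ the sample callback rates, $\hat{p}$ the pooled callback rate, and $z^{*}$ the critical value (1.96 for a 95% confidence level).

$$
\hat{p} = \frac{x_w + x_b}{n_w + n_b}, \qquad
z = \frac{\hat{p}_w - \hat{p}_b}{\sqrt{\hat{p}\,(1-\hat{p})\left(\dfrac{1}{n_w} + \dfrac{1}{n_b}\right)}}
$$

$$
\text{margin of error} = z^{*}\sqrt{\frac{\hat{p}_w(1-\hat{p}_w)}{n_w} + \frac{\hat{p}_b(1-\hat{p}_b)}{n_b}}, \qquad
\text{CI} = (\hat{p}_w - \hat{p}_b) \pm \text{margin of error}
$$

The test statistic uses the pooled rate because the null hypothesis assumes a single common callback rate, while the confidence interval uses each group's own rate.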

In [84]:
#Critical z-value for a two-sided 95% confidence level
z_value=1.96

#White-sounding names' callback proportion and its standard error
m_w=sum(df_w)/len(df_w)
w_std=np.sqrt((m_w * (1 - m_w) / len(df_w)))

#Black-sounding names' callback proportion and its standard error
m_b=sum(df_b)/len(df_b)
b_std=np.sqrt((m_b * (1 - m_b) / len(df_b)))

#Calculate the margin of error from the standard error of the difference
std_err = np.sqrt((w_std** 2 + b_std ** 2))
margin_err_diff = z_value * std_err
print('Frequentist Statistical Test Margin of Error:', margin_err_diff)

#Confidence interval for the difference in callback proportions (white minus black)
diff_of_mean=m_w-m_b
conf_int = (diff_of_mean - margin_err_diff, diff_of_mean + margin_err_diff)
print('Frequentist method Confidence Interval:', conf_int)

#To calculate a Z-score, we need the pooled sample proportion of callbacks
p_overall = (sum(df_w) + sum(df_b)) / (len(df_w) + len(df_b))


Z_score = (m_w - m_b - 0) / (np.sqrt( (p_overall * (1 - p_overall)) * ( 1 / len(df_w) + 1 / len(df_b))))

print('Z-score:', Z_score)



Frequentist Statistical Test Margin of Error: 0.015255406349886438
Frequentist method Confidence Interval: (0.016777447859559147, 0.047288260559332024)
Z-score: 4.108412152434346


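The confidence level is 95%, which corresponds to a critical z-value of 1.96. Since the Z-score from the two-sample proportion z-test (about 4.11) is greater than 1.96, the null hypothesis can be rejected at the 5% significance level. The 95% confidence interval for the difference in callback rates, roughly 0.017 to 0.047, also excludes zero.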
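
The Z-score can also be converted into a p-value directly. The short sketch below uses `scipy.stats.norm.sf` (the survival function of the standard normal distribution) and assumes a two-sided test, matching the hypotheses stated earlier; the variable names are chosen for illustration.

```python
from scipy import stats

# Z-score reported by the two-proportion z-test above
z_score = 4.108412152434346

# Two-sided p-value: probability of |Z| being at least this large under the null
p_value = 2 * stats.norm.sf(abs(z_score))
print('p-value =', p_value)
```

A p-value far below 0.05 leads to the same conclusion as comparing the Z-score with 1.96.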

### 4.) Write a story describing the statistical significance in the context of the original problem.

The original question was whether race has a significant impact on the rate of callbacks for a resume. To answer it, a hypothesis test was set up to check whether the difference between the mean callback rate for resumes with white-sounding names and the mean callback rate for resumes with black-sounding names was statistically significant. Two types of hypothesis tests were used: the bootstrap method and the frequentist method. The p-value for the bootstrap method was 0.0. This means that none of the 10,000 differences simulated under the null hypothesis was as large as the empirical difference. The null hypothesis was therefore rejected, and it was found that race does impact the rate of callbacks for a resume. In addition, the test statistic found with the frequentist method has a corresponding p-value less than 0.05, which means the null hypothesis can also be rejected using this test.

It can be concluded that, since the difference in mean callback rates between white-sounding and black-sounding names is statistically significant, race impacts the resume callback rate.

The analysis supports the story that race plays a factor in the United States job market.

### 5.)  Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?


Even though the analysis shows that race impacts callback success, it does not tell us whether race is the most important factor. It is possible that other factors not tested are more important than race in determining callback success. To test whether other factors are more important, the callback rate can be calculated for each value of a field, and each of those rates can be subtracted from the maximum callback rate observed for any value of that field. The larger the difference, the more important the field may be for callback success. The difference in callback rate between white-sounding and black-sounding names was only about 3 percentage points. There may be other fields whose values produce greater differences in callback rates.
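
The comparison described above can be sketched with a pandas `groupby`. This is a minimal illustration rather than a definitive method: the function name `callback_rate_spread` and the tiny table are made up for the example, although `race`, `honors`, and `call` are column names that appear in the dataset.

```python
import pandas as pd


def callback_rate_spread(data, fields, outcome='call'):
    """Largest minus smallest callback rate across the values of each field."""
    spreads = {}
    for field in fields:
        # Mean of the 0/1 outcome per value of the field is the callback rate
        rates = data.groupby(field)[outcome].mean()
        spreads[field] = rates.max() - rates.min()
    return pd.Series(spreads).sort_values(ascending=False)


# Tiny made-up table for illustration only; it is not the study's data.
toy = pd.DataFrame({
    'race':   ['w', 'w', 'w', 'w', 'b', 'b', 'b', 'b'],
    'honors': [1, 0, 1, 0, 1, 0, 1, 0],
    'call':   [1, 0, 1, 0, 1, 0, 0, 0],
})
spread = callback_rate_spread(toy, ['race', 'honors'])
print(spread)
```

On the real data the call would be along the lines of `callback_rate_spread(df, ['race', 'honors', 'education'])`. A raw spread ignores sample sizes and interactions between fields, so a model that considers several factors jointly, such as a logistic regression, would likely be a sounder basis for ranking their importance.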

