# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your GitHub account**.

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution

In [1]:
from __future__ import division  # __future__ imports must come before all others
import pandas as pd
import numpy as np
from scipy import stats

In [2]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [3]:
# number of callbacks for black-sounding names
print 'Number of callbacks for black-sounding names:', sum(data[data.race=='b'].call)

Number of callbacks for black-sounding names: 157.0


In [4]:
# number of callbacks for white-sounding names
print 'Number of callbacks for white-sounding names:', sum(data[data.race=='w'].call)

Number of callbacks for white-sounding names: 235.0


In [5]:
data.head()

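|   | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |  |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |  |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |  |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |  |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |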


In [6]:
data.shape

(4870, 65)

<div class="span5 alert alert-success">
<p>Your answers to Q1 and Q2 here</p>
</div>

## 1) What test is appropriate for this problem? Does CLT apply?

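**The CLT applies here. Each group contains 2,435 resumes, the race labels were assigned at random so the observations can be treated as independent, and each group has well over 10 callbacks and 10 non-callbacks, so the sampling distribution of each callback rate (and of their difference) is approximately normal. Because `call` is a 0/1 variable, its mean is the callback rate, so the problem is a comparison of two proportions between groups defined by perceived race. A bootstrap hypothesis test is appropriate, and a two-sample frequentist test (a z-test or, at this sample size, an essentially equivalent t-test) can be used alongside it.**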
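
A common rule of thumb for applying the CLT to a proportion is that each group should contain at least 10 successes and 10 failures. The sketch below checks that condition using the callback counts and group sizes printed elsewhere in this notebook (235 and 157 callbacks out of 2,435 resumes each). The helper name `clt_ok` and the threshold of 10 are assumptions made for illustration, not part of the exercise.

```python
# Counts taken from the notebook output: (callbacks, resumes) per group.
counts = {'w': (235, 2435), 'b': (157, 2435)}

def clt_ok(successes, n, threshold=10):
    """Rule-of-thumb check: at least `threshold` successes and failures."""
    failures = n - successes
    return successes >= threshold and failures >= threshold

results = {race: clt_ok(k, n) for race, (k, n) in counts.items()}
for race in sorted(results):
    print('%s: normal approximation reasonable? %s' % (race, results[race]))
```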

## 2) What are the null and alternate hypotheses?

**Null hypothesis: The callback rates of applicants with white-sounding names and black-sounding names are equal; any difference observed in the sample is due to chance.**

**Alternative hypothesis: The callback rates of applicants with white-sounding names and black-sounding names are different.**
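
Stated in symbols, with $p_w$ and $p_b$ denoting the true callback rates for white-sounding and black-sounding names (notation introduced here for illustration, not taken from the exercise), the two-sided test is:

$$
H_0:\; p_w - p_b = 0 \qquad \text{versus} \qquad H_A:\; p_w - p_b \neq 0
$$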


In [7]:
w = data[data.race=='w']
b = data[data.race=='b']

In [8]:
w.shape

(2435, 65)

In [9]:
b.shape

(2435, 65)

In [10]:
sum(w.call)

235.0

In [11]:
# Copy first so the assignment does not raise SettingWithCopyWarning
w, b = w.copy(), b.copy()
b['call'] = b['call'].astype(int)
w['call'] = w['call'].astype(int)


In [12]:
w_call = w.call
b_call = b.call

In [13]:
print 'Callback rate for applicants with white-sounding names:', 100*(sum(w_call)/len(w)),'%'
print 'Callback rate for applicants with black-sounding names:', 100*(sum(b_call)/len(b)),'%'

Callback rate for applicants with white-sounding names: 9.65092402464 %
Callback rate for applicants with black-sounding names: 6.4476386037 %
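
The same rates can also be obtained in one step by grouping on `race` and averaging `call`. This is a minimal sketch that assumes a stand-in DataFrame (`demo`, a hypothetical name) rebuilt from the counts printed above rather than the real `.dta` file:

```python
import numpy as np
import pandas as pd

# Stand-in data: 2,435 resumes per group with 235 (w) and 157 (b) callbacks.
demo = pd.DataFrame({
    'race': ['w'] * 2435 + ['b'] * 2435,
    'call': np.concatenate([
        np.repeat([1, 0], [235, 2435 - 235]),
        np.repeat([1, 0], [157, 2435 - 157]),
    ]),
})

# The mean of a 0/1 column is the fraction of ones, i.e. the callback rate.
rates = demo.groupby('race')['call'].mean()
print(rates)
```

On the real data, the equivalent call would be `data.groupby('race')['call'].mean()`.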


## 3) Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.

In [14]:
# Your solution to Q3 here

In [15]:
# Define bootstrap replicate functions
def bootstrap_replicate_1d(data, func):
    """Apply func to one resample of data drawn with replacement."""
    return func(np.random.choice(data, size=len(data)))

def draw_bs_reps(data, func, size=1):
    """Draw bootstrap replicates."""

    # Initialize array of replicates: bs_replicates
    bs_replicates = np.empty(shape=size)

    # Generate replicates
    for i in range(size):
        bs_replicates[i] = bootstrap_replicate_1d(data, func)

    return bs_replicates

In [16]:
#Define function to calculate callback rate of a dataset
def callback(data):
    frac = np.sum(data) / len(data)
    return frac

In [17]:
#Calculate callback rate for the bootstrap replicates of white-sounding applicants
bs_replicates = draw_bs_reps(w_call, callback, size = 10000)

In [18]:
# One-sided p-value: fraction of white-group replicates at or below the observed black-group callback rate
p = np.sum(bs_replicates <= sum(b_call)/len(b)) / len(bs_replicates)
print 'p-value of bootstrap test =', p

p-value of bootstrap test = 0.0


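**None of the 10,000 bootstrap replicates of the white-sounding group had a callback rate at or below the observed rate for the black-sounding group (6.45%), so the estimated one-sided p-value is 0. With a finite number of replicates, this should be read as p < 1/10,000 rather than as exactly zero. Note that this check resamples only the white-sounding group and treats the black-sounding rate as fixed. Even so, the result strongly suggests a statistically significant difference between the callback rates of applicants with white-sounding and black-sounding names.**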
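
The check above resamples only the white-sounding group. A more conventional way to simulate the null hypothesis is a permutation test, which pools both groups, reshuffles the labels, and records the difference in callback rates each time. The sketch below is an alternative to the notebook's approach, not a reproduction of it. It rebuilds the 0/1 outcomes from the counts printed earlier; the names `w_demo`, `b_demo`, `perm_diffs`, `n_perm`, and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.RandomState(42)  # fixed seed so the sketch is repeatable

# Rebuild 0/1 callback arrays from the printed counts (2,435 resumes each).
w_demo = np.repeat([1, 0], [235, 2435 - 235])
b_demo = np.repeat([1, 0], [157, 2435 - 157])

observed_diff = w_demo.mean() - b_demo.mean()
pooled = np.concatenate([w_demo, b_demo])

n_perm = 10000
perm_diffs = np.empty(n_perm)
for i in range(n_perm):
    # Shuffle the pooled outcomes and split them back into two groups.
    shuffled = rng.permutation(pooled)
    perm_diffs[i] = shuffled[:len(w_demo)].mean() - shuffled[len(w_demo):].mean()

# Two-sided p-value: share of shuffles at least as extreme as the observed gap.
p_perm = np.mean(np.abs(perm_diffs) >= abs(observed_diff))
print('Observed difference: %.4f' % observed_diff)
print('Permutation p-value: %.4f' % p_perm)
```

Because only `n_perm` shuffles are drawn, a result of 0 should be read as "below 1/`n_perm`", not as an exact zero.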

In [19]:
bootstrap_conf_int = np.percentile(bs_replicates, [2.5, 97.5])
print 'Bootstrap 95% confidence interval =', 100*bootstrap_conf_int, '% callback rate'
print 'Margin of error = +/-', 100*(bootstrap_conf_int[1] - bootstrap_conf_int[0])/2, '%'

Bootstrap 95% confidence interval = [  8.50102669  10.84188912] % callback rate
Margin of error = +/- 1.1704312115 %


**The bootstrap 95% confidence interval for the callback rate of white-sounding applicants runs from 8.50% to 10.84%, so the true white-sounding callback rate plausibly lies in that range. The observed callback rate for black-sounding applicants (6.45%) falls well below the interval. The margin of error, which is half the width of the confidence interval, is +/- 1.17 percentage points.**
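
The interval above describes the white-sounding callback rate alone. To put a margin of error on the gap between the two groups, one option is to resample each group independently and take percentiles of the difference in rates. This is a sketch, assuming 0/1 arrays rebuilt from the printed counts (235 and 157 callbacks out of 2,435 each); the names `w_demo`, `b_demo`, `diff_reps`, and the seed are illustrative.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed so the sketch is repeatable

w_demo = np.repeat([1, 0], [235, 2435 - 235])
b_demo = np.repeat([1, 0], [157, 2435 - 157])

n_reps = 10000
diff_reps = np.empty(n_reps)
for i in range(n_reps):
    # Resample each group with replacement, keeping its original size.
    w_sample = rng.choice(w_demo, size=len(w_demo))
    b_sample = rng.choice(b_demo, size=len(b_demo))
    diff_reps[i] = w_sample.mean() - b_sample.mean()

ci_low, ci_high = np.percentile(diff_reps, [2.5, 97.5])
margin = (ci_high - ci_low) / 2
print('95%% CI for the difference: [%.4f, %.4f]' % (ci_low, ci_high))
print('Margin of error: +/- %.4f' % margin)
```

If the resulting interval excludes zero, that is consistent with rejecting the null hypothesis of equal callback rates at the 5% level.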

In [20]:
#Frequentist

In [21]:
stat, p = stats.ttest_ind(b_call, w_call)
print 'Two-sample t-test result:' 
print 'Test statistic = %.3f, p-value = %.3f' % (stat, p) 
print ''
alpha = 0.05
if p > alpha:
    print('No statistically significant difference between the callback rates can be found')
    print('The null hypothesis should not be rejected.')
else:
    print('A statistically significant difference between the callback rates appears to exist.')
    print('Therefore, the null hypothesis can be rejected.')

Two-sample t-test result:
Test statistic = -4.115, p-value = 0.000

A statistically significant difference between the callback rates appears to exist.
Therefore, the null hypothesis can be rejected.


In [22]:
# Pass the confidence level positionally: SciPy renamed `alpha` to `confidence`
freq_conf_int = list(stats.t.interval(0.95, df=len(w_call) - 1, loc=100*np.mean(w_call),
                                      scale=100*(np.std(w_call)/np.sqrt(len(w_call)))))
print 'Frequentist: 95% confidence interval =', freq_conf_int, '% callback rate'
print 'Margin of error = +/-', (freq_conf_int[1] - freq_conf_int[0])/2, '%'

Frequentist: 95% confidence interval = [8.4774839134489408, 10.824364135832374] % callback rate
Margin of error = +/- 1.17344011119 %


**The two-sample t-test gave a test statistic of -4.115 and a p-value that prints as 0.000 at three decimal places (that is, below 0.0005), which supports rejecting the null hypothesis at the 0.05 level. The 95% frequentist confidence interval for the white-sounding callback rate, 8.48% to 10.82%, is similar to the bootstrap interval, and its margin of error is also +/- 1.17 percentage points.**
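
Because `call` is binary, the frequentist comparison can also be written as a two-proportion z-test, which gives a margin of error and confidence interval for the difference in rates rather than for one group. The sketch below uses only the counts printed earlier in the notebook. It is an alternative formulation, not the notebook's own method, and the variable names are illustrative.

```python
import numpy as np
from scipy import stats

n_w, n_b = 2435, 2435
p_w, p_b = 235 / float(n_w), 157 / float(n_b)
diff = p_w - p_b

# Test statistic: pooled standard error, because H0 assumes equal rates.
p_pool = (235 + 157) / float(n_w + n_b)
se_pool = np.sqrt(p_pool * (1 - p_pool) * (1.0 / n_w + 1.0 / n_b))
z = diff / se_pool
p_value = 2 * stats.norm.sf(abs(z))  # two-sided

# Interval: unpooled standard error, since the rates are not assumed equal.
se_diff = np.sqrt(p_w * (1 - p_w) / n_w + p_b * (1 - p_b) / n_b)
z_crit = stats.norm.ppf(0.975)
margin = z_crit * se_diff
ci = (diff - margin, diff + margin)

print('z = %.3f, p-value = %.2e' % (z, p_value))
print('Margin of error: +/- %.4f' % margin)
print('95%% CI for the difference: [%.4f, %.4f]' % ci)
```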

<div class="span5 alert alert-success">
<p> Your answers to Q4 and Q5 here </p>
</div>

## 4) Write a story describing the statistical significance in the context of the original problem.

**Racial discrimination has been a major problem in American society for hundreds of years and continues to this day. In this study, 4,870 fictitious resumes, split evenly between white-sounding and black-sounding names, were sent to employers to compare callback rates by perceived race. Because the names were assigned to the resumes at random, the two groups were comparable in every respect other than the name.**

**Applicants with white-sounding names had a higher callback rate (9.65%) than applicants with black-sounding names (6.45%). The key question is whether this difference is statistically significant. The null hypothesis assumed that callback rates did not differ by the perceived race of the applicant. Both the bootstrap and the frequentist tests indicated that a gap this large would be very unlikely if the null hypothesis were true, so the null hypothesis was rejected. The results support the alternative hypothesis and are consistent with racial discrimination in the job market that was studied.**

## 5) Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

**No. This analysis evaluated only race/name, so it supports no conclusion about how important race is relative to other variables. To assess relative importance, the other resume attributes in the dataset (for example, education, years of experience, or honors) could be tested in the same way, and a regression analysis (such as a logistic regression, since the outcome is binary) could estimate which variables are the strongest predictors of callback success.**
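
One way to carry out the suggested regression is a logistic model with `call` as the outcome and several resume attributes as predictors. The sketch below is illustrative only: it fits the model by Newton's method on synthetic data with made-up effect sizes, not on the study's resumes. The function `fit_logistic`, the two predictors, and the coefficients in `true_beta` are all assumptions introduced here.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton's method; returns [intercept, coefs...]."""
    X = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X.dot(beta)))
        gradient = X.T.dot(y - p)
        hessian = (X * (p * (1 - p))[:, None]).T.dot(X)
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

# Synthetic stand-in data (NOT the study's data): two binary predictors.
rng = np.random.RandomState(1)
n = 20000
white_name = rng.randint(0, 2, size=n)
honors = rng.randint(0, 2, size=n)
true_beta = np.array([-2.7, 0.4, 0.6])  # made-up intercept and effects
logits = true_beta[0] + true_beta[1] * white_name + true_beta[2] * honors
call = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

beta_hat = fit_logistic(np.column_stack([white_name, honors]), call)
print('Estimated [intercept, white_name, honors]: %s' % (np.round(beta_hat, 2),))
```

On the real dataset, the predictors would be columns such as `education`, `yearsexp`, and `honors`. Coefficients are only comparable across predictors when the predictors are on comparable scales.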