# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a résumé. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding names, respectively. The 'call' column has two values, 1 and 0, indicating whether or not the résumé received a callback from the employer.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your GitHub account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
</div>
****

## Data processing

In [1]:
from __future__ import print_function, division

In [2]:
import pandas as pd
import numpy as np
from scipy import stats

In [3]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [4]:
# number of callbacks for black-sounding names
data.loc[data.race == 'b', 'call'].sum()

157.0

In [5]:
data.head()

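<table>
<tr><th> </th><th>id</th><th>ad</th><th>education</th><th>ofjobs</th><th>yearsexp</th><th>honors</th><th>volunteer</th><th>military</th><th>empholes</th><th>occupspecific</th><th>...</th><th>compreq</th><th>orgreq</th><th>manuf</th><th>transcom</th><th>bankreal</th><th>trade</th><th>busservice</th><th>othservice</th><th>missind</th><th>ownership</th></tr>
<tr><th>0</th><td>b</td><td>1</td><td>4</td><td>2</td><td>6</td><td>0</td><td>0</td><td>0</td><td>1</td><td>17</td><td>...</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td></td></tr>
<tr><th>1</th><td>b</td><td>1</td><td>3</td><td>3</td><td>6</td><td>0</td><td>1</td><td>1</td><td>0</td><td>316</td><td>...</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td></td></tr>
<tr><th>2</th><td>b</td><td>1</td><td>4</td><td>1</td><td>6</td><td>0</td><td>0</td><td>0</td><td>0</td><td>19</td><td>...</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td></td></tr>
<tr><th>3</th><td>b</td><td>1</td><td>3</td><td>4</td><td>6</td><td>0</td><td>1</td><td>0</td><td>1</td><td>313</td><td>...</td><td>1.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td></td></tr>
<tr><th>4</th><td>b</td><td>1</td><td>3</td><td>3</td><td>22</td><td>0</td><td>0</td><td>0</td><td>0</td><td>313</td><td>...</td><td>1.0</td><td>1.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>0.0</td><td>Nonprofit</td></tr>
</table>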


In [6]:
data.columns

Index(['id', 'ad', 'education', 'ofjobs', 'yearsexp', 'honors', 'volunteer',
       'military', 'empholes', 'occupspecific', 'occupbroad', 'workinschool',
       'email', 'computerskills', 'specialskills', 'firstname', 'sex', 'race',
       'h', 'l', 'call', 'city', 'kind', 'adid', 'fracblack', 'fracwhite',
       'lmedhhinc', 'fracdropout', 'fraccolp', 'linc', 'col', 'expminreq',
       'schoolreq', 'eoe', 'parent_sales', 'parent_emp', 'branch_sales',
       'branch_emp', 'fed', 'fracblack_empzip', 'fracwhite_empzip',
       'lmedhhinc_empzip', 'fracdropout_empzip', 'fraccolp_empzip',
       'linc_empzip', 'manager', 'supervisor', 'secretary', 'offsupport',
       'salesrep', 'retailsales', 'req', 'expreq', 'comreq', 'educreq',
       'compreq', 'orgreq', 'manuf', 'transcom', 'bankreal', 'trade',
       'busservice', 'othservice', 'missind', 'ownership'],
      dtype='object')

### 1. What test is appropriate for this problem? Does CLT apply?

The outcome variable is 'call', which records whether a résumé received a callback from the employer, so it has two possible states. We want to compare the proportion of résumés that received a callback between the two name groups, so we will use a chi-squared test of equality of proportions (for two groups this is equivalent to a two-proportion z-test). The CLT applies: the names were randomly assigned in a randomized experiment, so the observations can be treated as independent, and each group is large enough that both the number of callbacks and the number of non-callbacks (n·p and n·(1−p)) are well above the usual rule-of-thumb threshold of about 10; the smallest of these counts is 157.
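The success/failure condition can be checked directly from the callback counts reported later in this notebook (157 and 235 callbacks out of 2,435 résumés per group). This is a minimal sketch: `clt_conditions` is a hypothetical helper, not a library function, and the threshold of 10 is a common rule of thumb (textbooks use values from about 5 to 15).

```python
# Success/failure condition for the normal approximation to a proportion:
# each group should have n*p and n*(1-p) at or above a rule-of-thumb threshold.
counts = {"b": (157, 2435), "w": (235, 2435)}  # (callbacks, resumes) per group


def clt_conditions(callbacks, n, threshold=10):
    """Return (n*p, n*(1-p), condition_met) for one group (hypothetical helper)."""
    p = callbacks / n
    successes, failures = n * p, n * (1 - p)
    return successes, failures, successes >= threshold and failures >= threshold


for race, (callbacks, n) in counts.items():
    s, f, ok = clt_conditions(callbacks, n)
    print(f"{race}: n*p = {s:.0f}, n*(1-p) = {f:.0f}, condition met: {ok}")
```

Both groups pass by a wide margin, which is why the normal approximation behind the chi-squared and z tests is reasonable here.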

### 2. What are the null and alternate hypotheses?

The null hypothesis is that the callback rate for black-sounding names equals the callback rate for white-sounding names ($H_0: p_b = p_w$). The alternative hypothesis is that the two callback rates differ ($H_A: p_b \neq p_w$). We will test at a significance level of 0.05.

So, how many callbacks did résumés with black-sounding and white-sounding names receive?

In [8]:
data.groupby(['race','call']).size()

race  call
b     0.0     2278
      1.0      157
w     0.0     2200
      1.0      235
dtype: int64

<table>
<caption>Observed Frequencies</caption>
<tr><th> </th><th>Black sounding</th><th>White sounding</th><th>Total</th></tr>
<tr><th>no callback</th><td>2278</td><td>2200</td><td>4478</td></tr>
<tr><th>a callback</th><td>157</td><td>235</td><td>392</td></tr>
<tr><th>Total</th><td>2435</td><td>2435</td><td>4870</td></tr>
</table>
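The same observed-frequency table, including totals, can also be produced in one call with `pd.crosstab`. In this sketch, `df` is a hypothetical stand-in for the notebook's `data`, rebuilt from the counts above so it runs without the `.dta` file; with the real data the call would be `pd.crosstab(data.call, data.race, margins=True, margins_name="Total")`.

```python
import pandas as pd

# Hypothetical stand-in for `data`: one row per resume, rebuilt from the
# observed counts in the table above so this runs without the .dta file.
observed_counts = {("b", 0.0): 2278, ("b", 1.0): 157,
                   ("w", 0.0): 2200, ("w", 1.0): 235}
rows = [key for key, n in observed_counts.items() for _ in range(n)]
df = pd.DataFrame(rows, columns=["race", "call"])

# Rows: callback outcome (0.0 / 1.0); columns: name group; margins add totals.
table = pd.crosstab(df["call"], df["race"], margins=True, margins_name="Total")
print(table)
```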

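Under the null hypothesis, the expected count in each cell is its row total times its column total, divided by the grand total. Because both groups contain 2435 résumés, each expected count is half of the corresponding row total.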
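As a sketch of that rule, the whole expected table is the outer product of the row and column totals divided by the grand total. This uses NumPy only; `scipy.stats.contingency.expected_freq(observed)` should return the same array.

```python
import numpy as np

# Observed counts: rows = (no callback, callback), columns = (black, white).
observed = np.array([[2278, 2200],
                     [157, 235]])

row_totals = observed.sum(axis=1)   # [4478, 392]
col_totals = observed.sum(axis=0)   # [2435, 2435]
grand_total = observed.sum()        # 4870

# Under independence: E[i, j] = row_totals[i] * col_totals[j] / grand_total
expected = np.outer(row_totals, col_totals) / grand_total
print(expected)
```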


In [9]:
# pooled proportions of no callback and of a callback across both groups
nocallratio = (2278+2200)/4870
acallratio = (157+235)/4870
print('proportion with no callback: {:.3f}'.format(nocallratio))
print('proportion with a callback: {:.3f}'.format(acallratio))
# expected counts per group (each group has 2435 résumés)
print('expected number of no callbacks per group: {:.3f}'.format(nocallratio*2435))
print('expected number of callbacks per group: {:.3f}'.format(acallratio*2435))


proportion with no callback: 0.920
proportion with a callback: 0.080
expected number of no callbacks per group: 2239.000
expected number of callbacks per group: 196.000


<table>
<caption>Expected Frequencies</caption>
<tr><th> </th><th>Black sounding</th><th>White sounding</th></tr>
<tr><th>no callback</th><td>2239</td><td>2239</td></tr>
<tr><th>a callback</th><td>196</td><td>196</td></tr>
</table>


In [10]:
# manual Pearson chi-squared: sum over cells of (observed - expected)**2 / expected
chisq_man = (2278-2239)**2/2239+(2200-2239)**2/2239+(157-196)**2/196+(235-196)**2/196
print('Manually calculated chisq: {:.3f}'.format(chisq_man))
# same test with statsmodels (no continuity correction is applied)
from statsmodels.stats.proportion import proportions_chisquare
chisq,pvalue,table = proportions_chisquare([157,235], [2435,2435])
print('chisq: {:.3f}'.format(chisq))
print('pvalue: {:.3e}'.format(pvalue))

Manually calculated chisq: 16.879
chisq: 16.879
pvalue: 3.984e-05


The p-value ($3.98 \times 10^{-5}$) is far below the significance level of 0.05, so we reject the null hypothesis: the data provide strong evidence that the callback rates for black-sounding and white-sounding names are not equal.
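As an independent cross-check, SciPy's `chi2_contingency` should reproduce the statistic when its continuity correction is turned off (`correction=False`; SciPy applies Yates' correction to 2x2 tables by default, which would give a somewhat smaller value). The sketch also computes the equivalent two-proportion z-test with a pooled standard error, whose square should equal the chi-squared statistic.

```python
import numpy as np
from scipy import stats

observed = np.array([[2278, 2200],
                     [157, 235]])

# Pearson chi-squared without Yates' continuity correction.
chi2, p, dof, expected = stats.chi2_contingency(observed, correction=False)

# Equivalent two-proportion z-test using the pooled proportion.
n_b = n_w = 2435
p_b, p_w = 157 / n_b, 235 / n_w
p_pool = (157 + 235) / (n_b + n_w)
se_pool = np.sqrt(p_pool * (1 - p_pool) * (1 / n_b + 1 / n_w))
z = (p_b - p_w) / se_pool
p_z = 2 * stats.norm.sf(abs(z))  # two-sided p-value

print(f"chi2 = {chi2:.3f}, p = {p:.3e}, dof = {dof}")
print(f"z = {z:.3f}, z**2 = {z**2:.3f}, p = {p_z:.3e}")
```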

### 3. Compute margin of error, confidence interval, and p-value.

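The margin of error is the critical z-value for a two-sided 95% confidence level (1.96, matching the 0.05 significance level) times the standard error. Here the standard error is that of the pooled callback proportion, and the confidence interval is that pooled proportion plus or minus the margin of error.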
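As a worked check of the numbers in the next cell, the standard error of a single proportion is applied to the pooled callback proportion $\hat p = 392/4870$ with $N = 4870$ (values rounded):

$$
\begin{aligned}
SE &= \sqrt{\frac{\hat p\,(1-\hat p)}{N}} = \sqrt{\frac{0.0805 \times 0.9195}{4870}} \approx 3.90 \times 10^{-3} \\
ME &= 1.96 \times SE \approx 7.64 \times 10^{-3} \\
CI &= \hat p \pm ME \approx (0.073,\ 0.088)
\end{aligned}
$$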

In [11]:
z95 = 1.96
N = 4870
standard_error = np.sqrt(nocallratio*acallratio/N)
margin_error = z95*standard_error
print('Standard error: {:.3e}'.format(standard_error))
print('Margin of error: {:.3e}'.format(margin_error))

Standard error: 3.898e-03
Margin of error: 7.641e-03


In [12]:
print('Confidence interval: ({0:.3f}, {1:.3f})'.format(acallratio-margin_error,acallratio+margin_error))

Confidence interval: (0.073, 0.088)


In [13]:
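# callback proportion for black-sounding names: 157 callbacks out of 2435 résumés
print('Observed callback ratio for black sounding names: {:.3f}'.format(157/2435))
ztest = (157/2435-acallratio)/standard_error
print('ztest: {:.3f}'.format(ztest))

Observed callback ratio for black sounding names: 0.064
ztest: -4.108


The z statistic of -4.11 lies far beyond the critical value of -1.96, so it falls in the rejection region. Because the two groups are the same size, this statistic equals the two-proportion z statistic, and its square matches the chi-squared statistic computed above ((-4.108)² ≈ 16.88). Similarly, the observed callback proportion for black-sounding names, 0.064, lies outside the 95% confidence interval for the pooled callback proportion, (0.073, 0.088).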
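The interval above is for the pooled callback rate. A more direct way to report the size of the effect, sketched here as an alternative rather than the method used in this notebook, is a 95% confidence interval for the difference in callback rates, using an unpooled standard error since an interval estimate does not assume the rates are equal.

```python
import numpy as np
from scipy import stats

n_b = n_w = 2435
p_b, p_w = 157 / n_b, 235 / n_w

# Unpooled standard error of the difference in proportions.
se_diff = np.sqrt(p_b * (1 - p_b) / n_b + p_w * (1 - p_w) / n_w)
z_crit = stats.norm.ppf(0.975)  # about 1.96 for a two-sided 95% interval
margin = z_crit * se_diff
diff = p_w - p_b

print(f"difference (w - b): {diff:.4f}")
print(f"margin of error:    {margin:.4f}")
print(f"95% CI:             ({diff - margin:.4f}, {diff + margin:.4f})")
```

The resulting interval, roughly 1.7 to 4.7 percentage points, excludes zero, which agrees with the test results above.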

### 4. Write a story describing the statistical significance in the context of the original problem.

The researchers set out to test whether employers respond differently to job applications based only on the racial associations of the applicant's name. Two higher-quality and two lower-quality résumés were sent to every employer, and black-sounding or white-sounding names were assigned at random, so if employers treated the names alike, both groups should have the same callback rate up to chance variation. Instead, black-sounding names received 157 callbacks and white-sounding names received 235, out of 2435 résumés each: callback rates of about 6.4% and 9.7%. The standard error shrinks with the square root of the sample size, so 4870 résumés are enough to detect this 3.2-percentage-point gap reliably. The gap is about four standard errors (z ≈ -4.1); if the true callback rates were equal, a difference at least this large would arise by chance with probability of only about $4 \times 10^{-5}$.
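The headline numbers in this story can be reproduced with a few lines of arithmetic on the callback counts; this is only a sketch of that arithmetic, with the group labels chosen for readability.

```python
callbacks = {"black-sounding": 157, "white-sounding": 235}
n_per_group = 2435  # resumes per group

# Callback rate per group.
rates = {name: k / n_per_group for name, k in callbacks.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2%} callback rate")

# Absolute gap (percentage points) and relative ratio between the groups.
gap = rates["white-sounding"] - rates["black-sounding"]
ratio = rates["white-sounding"] / rates["black-sounding"]
print(f"gap: {gap * 100:.1f} percentage points; ratio: {ratio:.2f}")
```

The ratio of about 1.5 means white-sounding names received roughly 50% more callbacks than otherwise identical résumés with black-sounding names.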

### 5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

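Not necessarily. Race is simply the factor the researchers chose to manipulate, and this analysis only shows that it has a statistically significant effect on callbacks; a small p-value says nothing about how large that effect is relative to other factors. Because names were randomly assigned, the race effect is not confounded by the other résumé attributes, but the experiment does not rank race against them. To find out which factors matter most, I would model callbacks on race together with the other columns in the dataset (education, years of experience, honors, and so on), for example with a logistic regression or a machine-learning model with feature-importance measures, and compare effect sizes rather than only p-values.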
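One way to amend the analysis is sketched below: a logistic regression of `call` on race plus other résumé columns, using the statsmodels formula API. This is a swapped-in technique for illustration, not the study's own analysis; `fit_callback_model` is a hypothetical helper, and the covariate list in the usage comment is only an example of columns from `data.columns`.

```python
import statsmodels.formula.api as smf


def fit_callback_model(df, covariates):
    """Logistic regression of `call` on race plus other columns (hypothetical helper).

    `df` needs a binary `call` column (0/1), a `race` column ('b'/'w'), and every
    column named in `covariates`. Which covariates to include is an analysis
    choice, not something fixed by the study.
    """
    terms = ["C(race)"] + list(covariates)
    formula = "call ~ " + " + ".join(terms)
    return smf.logit(formula, data=df).fit(disp=0)


# Hypothetical usage with the notebook's `data` frame:
# result = fit_callback_model(data, ["education", "yearsexp", "honors",
#                                    "volunteer", "military", "computerskills"])
# print(result.summary())
```

A positive coefficient on `C(race)[T.w]` would indicate higher callback odds for white-sounding names with the listed covariates held fixed. Only race was randomized, so coefficients on the other columns describe associations rather than causal effects.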