# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding names. The column 'call' has two values, 1 and 0, indicating whether or not the resume received a call from employers.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your GitHub account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
</div>
****

```python
import pandas as pd
import numpy as np
from scipy import stats
import math
```

```python
data = pd.read_stata('data/us_job_market_discrimination.dta')
```

```python
# number of callbacks by race
sum(data[data.race == 'b'].call), sum(data[data.race == 'w'].call)
```

```
(157.0, 235.0)
```

```python
# confirm that both groups contain the same number of résumés
len(data[data.race == 'b']), len(data[data.race == 'w'])
```

```
(2435, 2435)
```
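The two cells above count callbacks and group sizes separately. A minimal sketch of the same bookkeeping in a single `groupby` call, which also gives each group's callback rate, is below. The `.dta` file is not bundled here, so the sketch rebuilds a hypothetical stand-in DataFrame `df` from the counts reported above (157 and 235 callbacks out of 2,435 résumés each), assuming only the `race` and `call` columns matter for this summary.

```python
import pandas as pd

# Hypothetical stand-in data rebuilt from the counts reported above
# (assumption: one row per résumé, 'race' in {'b', 'w'}, 'call' in {0, 1}).
n_per_group = 2435
callbacks = {'b': 157, 'w': 235}

rows = []
for race, n_calls in callbacks.items():
    rows += [(race, 1.0)] * n_calls + [(race, 0.0)] * (n_per_group - n_calls)
df = pd.DataFrame(rows, columns=['race', 'call'])

# One groupby yields callbacks (sum), group size (count), and callback rate (mean)
summary = df.groupby('race')['call'].agg(['sum', 'count', 'mean'])
print(summary)
```

On the real data, `data.groupby('race')['call'].agg(['sum', 'count', 'mean'])` should give the same table; the `mean` column is the callback rate, about 6.4% for black-sounding names versus 9.7% for white-sounding names.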

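For this dataset, I used a chi-squared goodness-of-fit test on the callback counts, because both race and callback outcome are categorical variables. It shows whether the difference in outcomes is plausibly the result of random chance, or whether something systematic, such as structural racism, is at work. The CLT does apply: the chi-squared distribution of the test statistic is a large-sample approximation, and it is reasonable here because every expected count (196) is far above the usual rule-of-thumb minimum of 5.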
* Null hypothesis: individuals with African-American-sounding names and individuals with white-sounding names receive callbacks at the same rate.
* Alternate hypothesis: individuals with African-American-sounding names and individuals with white-sounding names do not receive callbacks at the same rate.
* Expected values: because both groups contain 2,435 résumés, the null hypothesis implies that the 392 total callbacks should split evenly, 196 per group. I passed the observed and expected counts to `stats.chisquare`. 
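The expected counts and the statistic described above can also be computed by hand. This sketch derives the expected counts from the group sizes, evaluates Pearson's statistic `sum((O - E)**2 / E)`, and takes the upper-tail probability of a chi-squared distribution with 1 degree of freedom; it should agree with `stats.chisquare`. The variable names are illustrative, not taken from the original notebook.

```python
import numpy as np
from scipy import stats

observed = np.array([157, 235])       # callbacks: black-sounding, white-sounding
group_sizes = np.array([2435, 2435])  # résumés in each group

# Under H0 both groups share one callback rate, so total callbacks are split
# in proportion to group size (50/50 here, since the groups are equal).
expected = observed.sum() * group_sizes / group_sizes.sum()

# Pearson statistic: sum over categories of (observed - expected)^2 / expected
chi2_stat = ((observed - expected) ** 2 / expected).sum()
dof = len(observed) - 1                  # 2 categories -> 1 degree of freedom
p_value = stats.chi2.sf(chi2_stat, dof)  # upper-tail probability

print(expected, chi2_stat, p_value)
```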

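```python
actual = np.array([157, 235])    # observed callbacks: black-sounding, white-sounding
expected = np.array([196, 196])  # 392 callbacks split evenly under the null hypothesis
stats.chisquare(actual, expected)
```

```
Power_divergenceResult(statistic=15.520408163265307, pvalue=8.1619303597043846e-05)
```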

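With two categories, the test has 1 degree of freedom, and the chi-squared statistic (15.52) is highly significant: the p-value, about 0.00008, is far below the conventional 0.05 threshold (and below 0.001). We therefore reject the null hypothesis that both groups receive callbacks at the same rate.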
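The goodness-of-fit test above conditions on the 392 callbacks and ignores the résumés that got no call. A common alternative, swapped in here for comparison rather than taken from the original analysis, uses the full 2x2 table of race by callback outcome, either as a chi-squared test of independence (`scipy.stats.chi2_contingency`) or as the equivalent two-proportion z-test. A minimal sketch using the counts reported above:

```python
import numpy as np
from scipy import stats

n_b, n_w = 2435, 2435        # résumés per group
calls_b, calls_w = 157, 235  # callbacks per group

# 2x2 table: rows = race (b, w), columns = (callback, no callback)
table = np.array([[calls_b, n_b - calls_b],
                  [calls_w, n_w - calls_w]])

# Pearson chi-squared test of independence, without Yates' continuity correction
chi2_stat, p_chi2, dof, expected = stats.chi2_contingency(table, correction=False)

# Two-proportion z-test with a pooled standard error
p_b, p_w = calls_b / n_b, calls_w / n_w
p_pool = (calls_b + calls_w) / (n_b + n_w)
se_pool = np.sqrt(p_pool * (1 - p_pool) * (1 / n_b + 1 / n_w))
z = (p_w - p_b) / se_pool
p_z = 2 * stats.norm.sf(abs(z))  # two-sided p-value

print(chi2_stat, p_chi2, z, p_z)
```

Without the continuity correction, the squared z statistic equals the chi-squared statistic, so the two tests give the same p-value, and both point the same way as the goodness-of-fit test. Passing `correction=True` (SciPy's default) applies Yates' correction to a 2x2 table and gives a slightly more conservative result.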

I next calculated a 95% confidence interval and the margin of error for the proportion of all 392 callbacks that went to résumés with black-sounding names (157/392), using the normal approximation.
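Written out, this is the normal-approximation (Wald) interval for a single proportion, with $\hat p = 157/392$ and $n = 392$. The margin of error is the half-width of the interval, $1.96\,\mathrm{SE}$, so the interval runs from $\hat p - \mathrm{MOE}$ to $\hat p + \mathrm{MOE}$:

$$
\hat p = \frac{157}{392} \approx 0.4005, \qquad
\mathrm{SE} = \sqrt{\frac{\hat p\,(1-\hat p)}{n}} \approx 0.0247, \qquad
\mathrm{MOE} = 1.96\,\mathrm{SE} \approx 0.0485
$$

$$
\hat p \pm \mathrm{MOE} \approx (0.3520,\ 0.4490)
$$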

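```python
n_calls = 392                                  # total callbacks (157 + 235)
afamp = 157 / n_calls                          # share of callbacks that went to black-sounding names
se = math.sqrt(afamp * (1 - afamp) / n_calls)  # standard error of that share
moe = 1.96 * se                                # margin of error at 95% confidence (half-width)
a = afamp + moe                                # upper bound
b = afamp - moe                                # lower bound
print(f"The Confidence Interval = {b:.4f}, {a:.4f}")
print(f"The Margin of Error = {moe:.4f}")
```

```
The Confidence Interval = 0.3520, 0.4490
The Margin of Error = 0.0485
```

Because 0.5 lies outside this interval, the data are inconsistent with callbacks being split evenly between the two groups, which agrees with the chi-squared test.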
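The interval above is for the share of callbacks, which answers the question only indirectly. A more direct quantity, swapped in here as a complement rather than the original author's method, is the difference in callback rates between the two groups, with an unpooled standard error. A minimal sketch using the counts reported earlier:

```python
import math

n_b, n_w = 2435, 2435        # résumés per group
calls_b, calls_w = 157, 235  # callbacks per group

p_b, p_w = calls_b / n_b, calls_w / n_w
diff = p_w - p_b             # gap in callback rates (white minus black)

# Unpooled standard error for a difference of two independent proportions
se_diff = math.sqrt(p_b * (1 - p_b) / n_b + p_w * (1 - p_w) / n_w)
moe_diff = 1.96 * se_diff    # 95% margin of error (half-width)
ci_diff = (diff - moe_diff, diff + moe_diff)

print(f"Difference in callback rates = {diff:.4f}")
print(f"95% CI = ({ci_diff[0]:.4f}, {ci_diff[1]:.4f}), margin of error = {moe_diff:.4f}")
```

With these counts the interval lies entirely above zero, consistent with the hypothesis tests: white-sounding names received callbacks at a rate about 3.2 percentage points higher (roughly 9.7% versus 6.4%).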


# Conclusions

* This exercise suggests that whether a name sounds African American or white can influence which applicants are called back. This functions as a cognitive bias that employers exercise, consciously or unconsciously. One potential fix is to screen résumés with the names anonymized, removing this source of bias until later in the hiring process.
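* This analysis compares only callback rates by race. It does not measure the effects of class, educational status, qualifications, previous salary, regional racism, etc., so it cannot show that race/name is the most important factor in callback success, and it should not be used uncritically. Because names were randomly assigned to résumés, these characteristics should be balanced across the two groups and are unlikely to explain the gap itself; a fuller analysis would model callbacks on race together with the other résumé characteristics (for example, with a logistic regression) to compare their relative influence and ask whether, all other factors being equal, applicants with African-American-sounding names are less likely to be called back. Of course, systemic racism makes it more difficult to control for these factors in real hiring, as they are prestige goods that are monopolized by white Americans. 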
* Further, callbacks were rare overall: only 392 of the 4,870 résumés (about 8%) received a callback, so it is hard to know how far these results generalize. Perhaps the positions were highly selective? 
