# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
****

In [49]:
from IPython.core.interactiveshell import InteractiveShell
import pandas as pd
import numpy as np
from scipy import stats

InteractiveShell.ast_node_interactivity = "all" # show all outputs

data = pd.read_stata('data/us_job_market_discrimination.dta')

# number of callbacks for black-sounding names
print("Number of callbacks for black-sounding names: {0}\n".format(sum(data[data.race=='b'].call)))

data.head()

Number of callbacks for black-sounding names: 157.0



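| | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |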


__Q1.__ What test is appropriate for this problem? Does CLT apply?  
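__A1.__ Both _race_ and _call_ are categorical with two levels, so if we are only interested in these two columns, a chi-squared test of independence on the 2x2 table is appropriate. For a 2x2 table this is equivalent (without the continuity correction) to a two-sided z-test for the difference between two proportions. The Central Limit Theorem applies: the names were assigned to resumes at random, each group contains 2,435 resumes, and both groups have far more than 10 callbacks and 10 non-callbacks, so the sampling distribution of the difference in callback rates is approximately normal.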
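
A quick check of those sample-size conditions can be sketched as follows, using the group sizes and callback counts from the data. The threshold of 10 per outcome is a common rule of thumb assumed here, not something the exercise specifies.

```python
# Group sizes and callback counts for each race label, taken from the dataset.
groups = {"b": (2435, 157), "w": (2435, 235)}

MIN_COUNT = 10  # rule-of-thumb threshold (an assumption, not from the exercise)

conditions_met = {}
for race, (n, callbacks) in groups.items():
    no_callbacks = n - callbacks
    # The normal approximation needs enough successes AND enough failures.
    conditions_met[race] = min(callbacks, no_callbacks) >= MIN_COUNT
    print(f"{race}: n={n}, callbacks={callbacks}, "
          f"no callbacks={no_callbacks}, OK={conditions_met[race]}")
```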

----

__Q2.__ What are the null and alternate hypotheses?  
__A2.__ The hypotheses are:
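<ul><li>Ho: Race and callback are independent of each other (the callback rate is the same for black-sounding and white-sounding names)</li>
<li>Ha: Race and callback are dependent on each other (the callback rates differ)</li></ul>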
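
The same hypotheses can be written in terms of callback probabilities. The symbols $p_b$ and $p_w$ (the callback probability for black-sounding and white-sounding names) are notation introduced here for illustration:

$$
H_0:\; p_b = p_w \qquad\qquad H_a:\; p_b \neq p_w
$$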

----
__Q3.__ Compute margin of error, confidence interval, and p-value. 

In [50]:
from scipy.stats import chi2_contingency

# Observed counts: the bare 2x2 table for the test, and a copy with margins for display
counts = pd.crosstab(data['race'], data['call'])
crosstab_2x2 = pd.crosstab(data['race'], data['call'], margins=True)

# Test the bare 2x2 table. Passing the table with margins would make scipy treat it
# as a 3x3 table and compute the p-value with 4 degrees of freedom instead of 1.
chi2_stat, p_value, dof, expected_counts = chi2_contingency(counts)

print("2x2 table of observed data:")
crosstab_2x2

print("")
print("2x2 table of expected data (under Ho):")
expected = pd.DataFrame(expected_counts, index=counts.index, columns=counts.columns)
expected.round().astype(int)

print("")
print("2x2 table of standardized residuals. With 95% confidence, the absolute value should be < 2 if Ho holds in that cell; +/- indicate over/under representation in a cell:")
# Adjusted standardized residual:
# (observed - expected) / sqrt(expected * (1 - row share) * (1 - column share))
n = counts.values.sum()
row_share = counts.sum(axis=1) / n
col_share = counts.sum(axis=0) / n
SE = np.sqrt(expected.mul(1 - row_share, axis=0).mul(1 - col_share, axis=1))
std_resid = (counts - expected) / SE
std_resid

print("")
print("p-value for chi-squared test of independence ({0} df): {1:.1e}".format(dof, p_value))

2x2 table of observed data:


| race | call = 0.0 | call = 1.0 | All |
|---|---|---|---|
| b | 2278 | 157 | 2435 |
| w | 2200 | 235 | 2435 |
| All | 4478 | 392 | 4870 |



2x2 table of expected data (under Ho):


| race | call = 0.0 | call = 1.0 |
|---|---|---|
| b | 2239 | 196 |
| w | 2239 | 196 |



2x2 table of standardized residuals. With 95% confidence, the absolute value should be < 2 if Ho holds in that cell; +/- indicate over/under representation in a cell:


| race | call = 0.0 | call = 1.0 |
|---|---|---|
| b | 4.108412 | -4.108412 |
| w | -4.108412 | 4.108412 |



p-value for chi-squared test of independence (1 df): 5.0e-05


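__A3.__ The chi-squared test of independence itself yields only a test statistic and a p-value (here far below 0.05), not a margin of error or a confidence interval. Those quantities belong to the equivalent comparison of two proportions: the callback rate is 235/2435 ≈ 9.65% for white-sounding names and 157/2435 ≈ 6.45% for black-sounding names, a difference of about 3.2 percentage points, with a 95% margin of error of about 1.5 points and a confidence interval of roughly 1.7 to 4.7 points. In addition, I have computed a table of standardized residuals, which show whether a cell is over- or under-represented in the observed data compared with what you would expect if the variables were independent.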
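
The margin of error and confidence interval for the difference in callback rates can be sketched from the four counts in the observed table. This is one standard approach, assumed here and not necessarily the one the exercise intends: an unpooled standard error for the interval and a pooled standard error for the z-test. The variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Counts from the observed 2x2 table
n_b, calls_b = 2435, 157   # black-sounding names
n_w, calls_w = 2435, 235   # white-sounding names

p_b = calls_b / n_b
p_w = calls_w / n_w
diff = p_w - p_b

# 95% confidence interval: unpooled standard error of the difference
se_unpooled = np.sqrt(p_b * (1 - p_b) / n_b + p_w * (1 - p_w) / n_w)
z_crit = stats.norm.ppf(0.975)
margin_of_error = z_crit * se_unpooled
ci = (diff - margin_of_error, diff + margin_of_error)

# Two-sided z-test under Ho (equal rates): pooled standard error
p_pool = (calls_b + calls_w) / (n_b + n_w)
se_pooled = np.sqrt(p_pool * (1 - p_pool) * (1 / n_b + 1 / n_w))
z = diff / se_pooled
p_value = 2 * stats.norm.sf(abs(z))

print("difference in callback rates: {0:.4f}".format(diff))
print("margin of error (95%): {0:.4f}".format(margin_of_error))
print("95% confidence interval: ({0:.4f}, {1:.4f})".format(*ci))
print("z = {0:.3f}, two-sided p-value = {1:.2e}".format(z, p_value))
```

The z statistic matches the magnitude of the standardized residuals, since for a 2x2 table the uncorrected chi-squared statistic is the square of this z.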

---

__Q4.__ Write a story describing the statistical significance in the context of the original problem.  
__A4.__ What's in a name? Researchers sent out resumes under "black-sounding" and "white-sounding" names to see how employers' callback rates differed. According to data from their 4,870-resume study, the callback rates _do_ indeed differ: resumes with white-sounding names were called back about 9.7% of the time (235 of 2,435), versus about 6.4% (157 of 2,435) for black-sounding names. White-sounding names received significantly more callbacks, and black-sounding names significantly fewer, than you would expect if names and the chance of getting called back were unrelated.

---

__Q5.__ Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?  
__A5.__ No. The test for independence only shows that race/name and callbacks are dependent on each other; because we looked at just these two columns, it says nothing about how race/name compares with the other factors. To identify the "most important factor" for callback success in this dataset, we could fit a logistic regression of _call_ on the other resume characteristics (for example with backward variable selection) and compare effect sizes, not just p-values. Even then, we would only have this dataset to work with, so any generalization would need to be made carefully.
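
A minimal sketch of such a logistic regression follows, assuming only the four counts from the observed 2x2 table, since the full dataset is not loaded here. The `is_white` indicator and the hand-written Newton-Raphson fit are illustrative choices, not the exercise's method. With a single predictor the fitted coefficient is simply the log odds ratio. Other resume characteristics would enter as additional columns of `X`.

```python
import numpy as np

# Rebuild the race and call columns from the 2x2 counts:
# 2435 black-sounding resumes (157 callbacks), 2435 white-sounding (235 callbacks).
is_white = np.concatenate([np.zeros(2435), np.ones(2435)])
call = np.concatenate([np.ones(157), np.zeros(2278),   # race == 'b'
                       np.ones(235), np.zeros(2200)])  # race == 'w'

# Design matrix: intercept plus the race indicator.
X = np.column_stack([np.ones_like(is_white), is_white])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Newton-Raphson iterations on the logistic log-likelihood.
beta = np.zeros(X.shape[1])
for _ in range(25):
    p = sigmoid(X @ beta)
    gradient = X.T @ (call - p)
    hessian = (X * (p * (1 - p))[:, None]).T @ X
    step = np.linalg.solve(hessian, gradient)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-10:
        break

intercept, race_coef = beta
odds_ratio = np.exp(race_coef)
print("intercept: {0:.4f}".format(intercept))
print("race coefficient (log odds ratio): {0:.4f}".format(race_coef))
print("odds ratio, white- vs black-sounding: {0:.3f}".format(odds_ratio))
```

In a model with more predictors, comparing coefficients on a common scale (or the change in predicted callback probability) would indicate which factors matter most.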