# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
****

In [1]:
import pandas as pd
import numpy as np
from scipy import stats
from scipy.stats import binom

In [2]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

### Methodology

It is not clear from the instructions or the resources what each column in the data frame precisely represents.  The website says two high-quality and two low-quality resumes were sent to each job posting, but does not say whether the resumes of equal quality were identical in all but name.  This point is further complicated by the statement that names were randomly assigned, which would seem to imply that, for example, the two high-quality resumes for a single job are split between a white-sounding and a black-sounding name only about half the time.

Due to these issues, for this analysis all factors except for `race` and `call` will be ignored/assumed equal.

In [3]:
# Reduce to data of interest:
df=data[['race','call']]

# Split by race:
dfb=df[df.race=='b']
dfw=df[df.race=='w']

### Initial Statistics
First look at the summary statistics for each race.

In [4]:
# Compute black summary statistics:
b_total=len(dfb)
b_sum=np.sum(dfb.call)
b_mean=np.mean(dfb.call)
b_std=np.std(dfb.call)

# Print black summary statistics:
print('black\n' + 'total: ' + str(b_total) + '\n' + 
      'calls: ' + str(b_sum) + '\n' + 
      'rate: ' + str(b_mean) + '\n' +
      'std: ' + str(b_std) + '\n')

# Compute white summary statistics:
w_total=len(dfw)
w_sum=np.sum(dfw.call)
w_mean=np.mean(dfw.call)
w_std=np.std(dfw.call)

# Print white summary statistics:
print('white\n' + 'total: ' + str(w_total) + '\n' + 
      'calls: ' + str(w_sum) + '\n' + 
      'rate: ' + str(w_mean) + '\n' +
      'std: ' + str(w_std))

black
total: 2435
calls: 157.0
rate: 0.0644763857126236
std: 0.24559901654720306

white
total: 2435
calls: 235.0
rate: 0.09650924056768417
std: 0.29528486728668213


At first glance, it appears that resumes with white names do fare better than resumes with black names.

### Bernoulli Trials

As a side note, this data can be understood as two binomial distributions in which each resume is a single Bernoulli trial.  For instance, the estimated probability that a resume with a white-sounding name receives a callback is `p=.097`.

Suppose that the chance of a black resume receiving a call is actually equal to the sample rate of a white resume receiving a call, namely `p=.097`.  We can calculate the probability of at most 157 out of 2435 resumes receiving a call using the cumulative distribution function for the binomial distribution.

In [5]:
binom.cdf(b_sum,b_total,w_mean)

9.788967433838575e-09

Similarly, suppose the chance of a white resume receiving a call is actually equal to the sample rate of a black resume receiving a call, namely `p=.064`.  We can calculate the probability of at least 235 out of 2435 resumes receiving a call.

In [6]:
1-binom.cdf(w_sum-1,w_total,b_mean)

1.0036038666783043e-09

Both of these chances are extremely close to 0.
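
The two tail probabilities can also be reproduced from the printed counts alone. The following is a minimal sketch, assuming the counts 157, 235, and 2435 from the summary statistics; the variable names are illustrative. It uses `binom.sf` for the upper tail, since `sf(k - 1)` equals `P(X >= k)` and avoids the subtraction `1 - cdf` when the tail is tiny.

```python
from scipy.stats import binom

# Counts taken from the summary statistics printed earlier.
n = 2435        # resumes per group
b_calls = 157   # callbacks for black-sounding names
w_calls = 235   # callbacks for white-sounding names

# P(X <= 157) if the true rate were the white sample rate.
p_low = binom.cdf(b_calls, n, w_calls / n)

# P(X >= 235) if the true rate were the black sample rate.
# sf(k) = P(X > k), so sf(k - 1) = P(X >= k).
p_high = binom.sf(w_calls - 1, n, b_calls / n)

print(p_low, p_high)
```

Because the rates here are computed from the counts in double precision, the results may differ from the cell outputs above in the last few digits.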

### Exercises

**1) What test is appropriate for this problem?  Does CLT apply?**

A two-sample *t* test is most appropriate for determining whether the mean callback rates for black and white names are different.  Using the *t* test requires the sample means to be approximately normally distributed.  For large `n`, `np`, and `nq`, the binomial distribution is well approximated by a normal distribution.  These conditions are all satisfied in this case (the smallest count is 157 callbacks out of `n=2435` resumes), so it is okay to use the *t* test.
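
The size conditions can be checked directly. This is a small sketch, assuming the common rule of thumb that both `np` and `nq` should be at least 10; the threshold and the helper name `normal_approx_ok` are assumptions, not part of the exercise.

```python
def normal_approx_ok(calls, n, threshold=10):
    """Check the rule-of-thumb conditions n*p >= threshold and n*q >= threshold."""
    p_hat = calls / n
    expected_calls = n * p_hat           # n*p, the number of callbacks
    expected_misses = n * (1 - p_hat)    # n*q, the number of non-callbacks
    return expected_calls >= threshold and expected_misses >= threshold

n = 2435
for label, calls in (("black", 157), ("white", 235)):
    print(label, calls, n - calls, normal_approx_ok(calls, n))
```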

The CLT says that the means computed in the summary statistics are approximately draws from a normal distribution whose mean equals the population mean and whose variance equals the population variance divided by `n=2435`, which is a plenty large sample size.  The assumption that each sample comes from a single population distribution falls under our assumption that all other factors are equal.  We do not know the population variance, but it can be safely assumed to be finite.  It can further be estimated by the variance of the sample.

In [7]:
# Black CLT std:
print(b_std/np.sqrt(2435))

# White CLT std:
print(w_std/np.sqrt(2435))

0.004977108869798699
0.0059840016981803105


This results in the black sample mean being drawn from a distribution with a standard deviation of about .005 and the white sample mean being drawn from a distribution with a standard deviation of about .006.  These small standard deviations give an initial indication that the difference of .032 between the two samples' means is likely significant.
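
One way to make this indication concrete is to express the difference in units of its own standard error. The sketch below assumes the two samples are independent, so the variances of the two sample means add; the names `se_diff` and `z` are illustrative.

```python
import numpy as np

n = 2435
p_b = 157 / n   # black sample callback rate
p_w = 235 / n   # white sample callback rate

# Standard error of each sample mean for a 0/1 outcome.
se_b = np.sqrt(p_b * (1 - p_b) / n)
se_w = np.sqrt(p_w * (1 - p_w) / n)

# For independent samples, the variance of the difference is the sum of the variances.
se_diff = np.sqrt(se_b**2 + se_w**2)
z = (p_w - p_b) / se_diff

print(se_b, se_w, se_diff, z)
```

Under these assumptions the difference of .032 is about 4 standard errors away from zero.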

**2) What are the null and alternate hypotheses?**

The null hypothesis for the two-sample *t* test is that the two populations have an equal callback rate.  The alternate hypothesis is that the black population has a lower callback rate than the white population.
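
Written symbolically, with $p_b$ and $p_w$ denoting the true callback rates for the black and white populations (these symbols are introduced here for illustration), the hypotheses read:

$$
H_0:\; p_b = p_w
\qquad\text{versus}\qquad
H_A:\; p_b < p_w
$$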

**3) Compute the margin of error, confidence interval, and *p* value.**

We will take $\alpha$=.05 in our computations.

In [8]:
# Compute the confidence intervals of the mean callback rates.

b_confidence=stats.t.interval(.95, b_total-1, loc=b_mean, scale=b_std)
print('black confidence interval: ' + str(b_confidence))

w_confidence=stats.t.interval(.95, w_total-1, loc=w_mean, scale=w_std)
print('white confidence interval: ' + str(w_confidence))

black confidence interval: (-0.4171283287811107, 0.5460811002063579)
white confidence interval: (-0.48252640136287106, 0.6755448824982394)


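These intervals are not very illuminating: they extend below 0 even though a callback rate must lie between 0 and 1.  The cause is that `scale` was set to the sample standard deviation rather than the standard error of the mean (`b_std/np.sqrt(b_total)`), so the intervals reflect the spread of individual 0/1 outcomes rather than the uncertainty in the mean callback rate.  The margin of error as computed is clearly large, but the exact number is meaningless.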
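
For comparison, an interval for a mean is normally built from the standard error of the mean rather than the standard deviation. The following is a hedged sketch of that computation from the printed counts; the helper `mean_ci` is hypothetical and is not the computation used in the cell above.

```python
import numpy as np
from scipy import stats

def mean_ci(calls, n, level=0.95):
    """Return (margin of error, lower bound, upper bound) for a callback rate."""
    rate = calls / n
    std = np.sqrt(rate * (1 - rate))              # std of a 0/1 variable (ddof=0)
    sem = std / np.sqrt(n)                        # standard error of the mean
    t_crit = stats.t.ppf(0.5 + level / 2, n - 1)  # two-sided critical value
    margin = t_crit * sem
    return margin, rate - margin, rate + margin

n = 2435
b_margin, b_low, b_high = mean_ci(157, n)
w_margin, w_low, w_high = mean_ci(235, n)

print('black: margin', b_margin, 'interval', (b_low, b_high))
print('white: margin', w_margin, 'interval', (w_low, w_high))
```

Computed this way, both intervals stay inside the range from 0 to 1, each margin of error is roughly one percentage point, and the two intervals do not overlap.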

In [9]:
# Run a two sample t test:
print(stats.ttest_ind(dfb.call,dfw.call))

Ttest_indResult(statistic=-4.114705290861751, pvalue=3.940802103128886e-05)


On the other hand, the results of the *t* test are very clear: with `p=4e-05` and a negative test statistic, there is strong evidence that the true callback rate for the black population is lower than the true callback rate for the white population.
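
By default `stats.ttest_ind` reports a two-sided *p* value, while the alternate hypothesis stated in question 2 is one-sided. A minimal sketch of the one-sided version follows; it assumes SciPy 1.6 or later for the `alternative` argument and rebuilds the 0/1 outcomes from the printed counts instead of reading the data file.

```python
import numpy as np
from scipy import stats

n = 2435
# Rebuild the 0/1 callback outcomes from the counts printed earlier.
b_outcomes = np.repeat([1.0, 0.0], [157, n - 157])
w_outcomes = np.repeat([1.0, 0.0], [235, n - 235])

# Two-sided test (the default), as in the cell above.
two_sided = stats.ttest_ind(b_outcomes, w_outcomes)

# One-sided test: the alternative is that the black mean is less than the white mean.
one_sided = stats.ttest_ind(b_outcomes, w_outcomes, alternative='less')

print(two_sided.statistic, two_sided.pvalue, one_sided.pvalue)
```

Since the observed statistic is negative, the one-sided *p* value is half of the two-sided one.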

**4) Write a story describing the statistical significance in the context of the original problem.**

The confidence intervals computed above for both ethnicities are too wide and include nonsensical values, so they cannot be used to compare average callback rates between ethnicities.

However, we still have some confidence regarding what the mean callback rate for each population might be because of the central limit theorem: the sample means came from normal distributions with standard deviations of roughly .005 to .006.  Because about 99.7% of the data lies within 3 standard deviations of the mean for a normal distribution and the difference between the two sample means is .032, it seems highly unlikely that these two sample means were drawn from the same normal distribution.
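
The share of a normal distribution lying within a given number of standard deviations of its mean can be checked with `scipy.stats.norm`. This is a small illustrative sketch; the dictionary name `coverage` is an assumption.

```python
from scipy.stats import norm

# Probability mass of a normal distribution within k standard deviations of the mean.
coverage = {k: norm.cdf(k) - norm.cdf(-k) for k in (1, 2, 3)}

for k, share in coverage.items():
    print(k, round(share, 4))
```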

Indeed, a two-sample *t* test yielded `p=4e-05`, a very small value that strongly indicates that the white callback rate is higher than the black callback rate.

**5) Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?**

This analysis solely considered the variables race and callback.  While race certainly plays a part, this in no way implies that race is the most important factor.  Quite the opposite, this analysis assumed that all other factors were negligible.

Further study could include analysis of educational background, pertinent skillsets, and prior work experience.  These factors, in my unsubstantiated prediction, are likely more important in determining callback rate.  Regardless, racism is at play when, of two resumes that are identical in all but name, the one with the white name is consistently called back more often than the one with the black name.

Other studies with different data sets could analyze whether the black population is disadvantaged in its skillset.  If so, there would be fewer qualified black resumes and thus a lower callback rate.  This would be an implicit, systemic form of racism rather than an explicit, individualistic one, but one that needs addressing no less.