
### Examining racial discrimination in the US job market

#### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning black-sounding or white-sounding names to otherwise identical résumés and observing the impact on requests for interviews from employers.

#### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes.

#### Exercise
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your GitHub account**.

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value.
   4. Discuss statistical significance.

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

****

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

In [2]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [3]:
# number of callbacks for black-sounding names
sum(data[data.race=='b'].call)

157.0

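### 1. What test is appropriate for this problem? Does CLT apply?
The typical test for this situation is the two-proportion *z*-test, which relies on the normal approximation to the binomial distribution. The CLT applies here: names were assigned to resumes at random, and the approximation holds as long as each group contains enough callbacks and non-callbacks, which is the case in this dataset (the black-sounding group alone has 157 callbacks).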
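
A rule of thumb often used for the normal approximation is at least 10 successes and 10 failures in each group. The sketch below checks that condition from raw counts. The function name and the threshold are illustrative choices; the count of 157 comes from the output above, while the other counts are assumed for illustration and should be recomputed from `data`.

```python
def normal_approx_ok(successes, n, minimum=10):
    """Return True when both successes and failures reach `minimum`."""
    failures = n - successes
    return successes >= minimum and failures >= minimum

# 157 callbacks is the count shown above; the remaining numbers are
# assumed for illustration and should be recomputed from `data`.
callbacks_b, n_b = 157, 2435
callbacks_w, n_w = 235, 2435

print(normal_approx_ok(callbacks_b, n_b), normal_approx_ok(callbacks_w, n_w))
```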

### 2. What are the null and alternate hypotheses?


**Null hypothesis:** Race has no impact on the rate of callbacks for resumes. In other words, the callback rates for black-sounding names and for white-sounding names are the same.

**Alternate hypothesis:** Race has an impact on the rate of callbacks for resumes. In other words, the callback rates for black-sounding names and for white-sounding names are different.

### 3. Compute margin of error, confidence interval, and p-value.
   

In [4]:
# Sample sizes:
nb = len(data[data.race=='b'])
nw = len(data[data.race=='w'])
# Sample proportions:
pb = sum(data[data.race=='b'].call)/nb
pw = sum(data[data.race=='w'].call)/nw
# Standard error of the difference in proportions (unpooled):
sig = np.sqrt((pb*(1-pb))/nb + (pw*(1-pw))/nw)
# Test statistic:
z = (pw - pb) / sig
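
The cell above uses the unpooled standard error. Because the null hypothesis says both groups share one callback rate, many textbooks instead pool the two samples when computing the test statistic. The sketch below shows both variants side by side; `z_statistic` is a hypothetical helper, and the counts passed to it are assumed for illustration rather than read from `data`.

```python
import numpy as np

def z_statistic(successes_a, n_a, successes_b, n_b, pooled=True):
    """z statistic for the difference in proportions p_a - p_b."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    if pooled:
        # Under the null both groups share one rate, estimated from all rows.
        p = (successes_a + successes_b) / (n_a + n_b)
        se = np.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    else:
        # Unpooled standard error, as in the cell above.
        se = np.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return (p_a - p_b) / se

# Assumed counts for illustration (white-sounding first, then black-sounding).
print(z_statistic(235, 2435, 157, 2435, pooled=True))
print(z_statistic(235, 2435, 157, 2435, pooled=False))
```

With groups of this size the two variants should give very similar statistics, so the choice is unlikely to change the conclusion.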

Margin of error (95% confidence):

In [5]:
E = stats.norm.ppf(.975) * sig
E

0.015255126028214831

95% confidence interval for difference in callback rates:

In [6]:
(pw - pb - E, pw - pb + E)

(0.016777728181230755, 0.047287980237660412)
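
The interval above is tied to the 95% level through the hard-coded `.975` quantile. As a sketch, the same calculation can be wrapped in a function that takes the confidence level as a parameter. `diff_confidence_interval` is a hypothetical name, and the counts are assumed for illustration; in the notebook they would be computed from `data`.

```python
import numpy as np
from scipy import stats

def diff_confidence_interval(successes_a, n_a, successes_b, n_b, confidence=0.95):
    """Confidence interval for p_a - p_b using the unpooled standard error."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    se = np.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # Critical value: e.g. the 0.975 quantile for a 95% interval.
    z_crit = stats.norm.ppf(1 - (1 - confidence) / 2)
    margin = z_crit * se
    diff = p_a - p_b
    return diff - margin, diff + margin

# Assumed counts for illustration (white-sounding first, then black-sounding).
for level in (0.90, 0.95, 0.99):
    print(level, diff_confidence_interval(235, 2435, 157, 2435, confidence=level))
```

Raising the confidence level widens the interval, since a larger critical value multiplies the same standard error.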

*p*-value:

In [7]:
2*(1 - stats.norm.cdf(z))

3.862565207524149e-05
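
The expression `2*(1 - stats.norm.cdf(z))` gives the two-sided *p*-value only when `z` is positive, which holds here because the difference is computed as `pw - pb`. A slightly more general sketch uses the survival function `stats.norm.sf` on `abs(z)`, so the sign of the difference no longer matters. The helper name `two_sided_p` and the value 4.1 are illustrative.

```python
from scipy import stats

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    # sf is the upper-tail probability 1 - cdf, computed without the
    # loss of precision that 1 - cdf(z) suffers for large |z|.
    return 2 * stats.norm.sf(abs(z))

# The result is the same whichever group is subtracted from the other.
print(two_sided_p(4.1), two_sided_p(-4.1))
```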

### 4. Discuss statistical significance.


The *p*-value is very small (about 3.9e-05, well below the conventional 0.05 threshold) and the 95% confidence interval does not contain zero, indicating a statistically significant difference between the callback rates for black-sounding names and for white-sounding names.

### Further Exploration:
Building upon Allen Downey's ``HypothesisTest`` class, we can create a test for a difference in proportions "by permutation":

In [8]:
class HypothesisTest(object):
    """Represents a hypothesis test."""

    def __init__(self, data):
        """Initializes.

        data: data in whatever form is relevant
        """
        self.data = data
        self.MakeModel()
        self.actual = self.TestStatistic(data)
        self.test_stats = None

    def PValue(self, iters=1000):
        """Computes the distribution of the test statistic and p-value.

        iters: number of iterations

        returns: float p-value
        """
        self.test_stats = np.array([self.TestStatistic(self.RunModel()) 
                                       for _ in range(iters)])

        count = sum(self.test_stats >= self.actual)
        return count / iters

    def MaxTestStat(self):
        """Returns the largest test statistic seen during simulations.
        """
        return max(self.test_stats)

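    def PlotHist(self, label=None):
        """Draws a histogram of the simulated test statistics, with a
        vertical line at the observed test statistic.
        """
        # Imported here because the import cell above does not load matplotlib.
        from matplotlib import pyplot
        ys, xs, patches = pyplot.hist(self.test_stats, label=label)
        pyplot.vlines(self.actual, 0, max(ys), linewidth=3, color='0.8')
        pyplot.xlabel('test statistic')
        pyplot.ylabel('count')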

    def TestStatistic(self, data):
        """Computes the test statistic.

        data: data in whatever form is relevant        
        """
        raise NotImplementedError()

    def MakeModel(self):
        """Build a model of the null hypothesis.
        """
        pass

    def RunModel(self):
        """Run the model of the null hypothesis.

        returns: simulated data
        """
        raise NotImplementedError()


In [9]:
class DiffPropsPermute(HypothesisTest):
    """Tests a difference in proportions by permutation."""

    def TestStatistic(self, data):
        """Computes the test statistic.

        data: data in whatever form is relevant        
        """
        group1, group2 = data
        test_stat = abs(sum(group1)/len(group1) - sum(group2)/len(group2))
        return test_stat

    def MakeModel(self):
        """Build a model of the null hypothesis.
        """
        group1, group2 = self.data
        self.n, self.m = len(group1), len(group2)
        self.pool = np.hstack((group1, group2))

    def RunModel(self):
        """Run the model of the null hypothesis.

        returns: simulated data
        """
        np.random.shuffle(self.pool)
        data = self.pool[:self.n], self.pool[self.n:]
        return data

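Now we calculate the *p*-value. In the run shown, about 3 in 100,000 shuffles produce a difference at least as large as the observed one, so the result is of the same order of magnitude as the *p*-value calculated earlier and again indicates a statistically significant difference.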

In [10]:
ht_data = (data[data.race=='b'].call, data[data.race=='w'].call)
ht = DiffPropsPermute(ht_data)
p_value = ht.PValue(iters=100000)
p_value

3.0000000000000001e-05
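
For comparison, the same permutation idea can be written as a single function without the class hierarchy. This is a minimal sketch, not a replacement for the class above: `permutation_p_value` is a hypothetical name, it uses a seeded NumPy `Generator` so that runs are reproducible, and the default of 2000 iterations is an arbitrary choice.

```python
import numpy as np

def permutation_p_value(group1, group2, iters=2000, seed=0):
    """Two-sided permutation p-value for a difference in proportions."""
    rng = np.random.default_rng(seed)
    group1 = np.asarray(group1, dtype=float)
    group2 = np.asarray(group2, dtype=float)
    n = len(group1)
    pool = np.concatenate([group1, group2])
    observed = abs(group1.mean() - group2.mean())

    count = 0
    for _ in range(iters):
        # Shuffle the pooled outcomes and split them back into two groups.
        shuffled = rng.permutation(pool)
        stat = abs(shuffled[:n].mean() - shuffled[n:].mean())
        count += int(stat >= observed)
    return count / iters

# Toy 0/1 outcomes, made up purely to exercise the function.
print(permutation_p_value([1, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 1]))
```

A permutation *p*-value can only take values that are multiples of `1/iters`, so resolving very small *p*-values requires a correspondingly large number of iterations.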