# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your GitHub account**.

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value. Try using both the bootstrapping and the frequentist statistical approaches.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
+ Formulas for the Bernoulli distribution: https://en.wikipedia.org/wiki/Bernoulli_distribution

In [2]:
import pandas as pd
import numpy as np
from scipy import stats

In [3]:
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [4]:
# number of callbacks for black-sounding names
sum(data[data.race=='b'].call)

157.0

In [5]:
# number of callbacks for white-sounding names
sum(data[data.race=='w'].call)

235.0

In [6]:
data.head()

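|   | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|----|----|-----------|--------|----------|--------|-----------|----------|----------|---------------|-----|---------|--------|-------|----------|----------|-------|------------|------------|---------|-----------|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |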


<div class="span5 alert alert-success">
<p>Your answers to Q1 and Q2 here</p>
</div>

### 1. What test is appropriate for this problem? Does CLT apply?

A two-proportion z-test is appropriate: it compares two proportions and evaluates the null hypothesis that there is no difference between them. The conditions for the test hold: each group is far larger than 30; the race labels are assigned to resumes at random; each resume has two possible outcomes (callback or no callback); and each group includes at least 10 successes and 10 failures. We can therefore determine whether the proportion of successes differs between the groups. The Central Limit Theorem does apply: although the data are categorical, the sampling distribution of each sample proportion is approximately normal at these sample sizes.
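The success/failure condition can be checked directly from the callback counts. The sketch below hard-codes the 157 and 235 callbacks printed by the notebook; the group size of 2,435 resumes per group is an assumption inferred from the printed proportions (157 / 2435 ≈ 0.0645), and `clt_conditions_met` is a hypothetical helper name, not part of the original analysis.

```python
# Callback counts come from the notebook output. n_per_group is an assumption
# inferred from the printed proportions (157 / 2435 is about 0.0645).
n_per_group = 2435
callbacks = {"b": 157, "w": 235}


def clt_conditions_met(successes, n, minimum=10):
    """Return True if there are at least `minimum` successes and failures."""
    failures = n - successes
    return successes >= minimum and failures >= minimum


for race, k in callbacks.items():
    print(race, "successes:", k, "failures:", n_per_group - k,
          "conditions met:", clt_conditions_met(k, n_per_group))
```

Both groups clear the threshold of 10 by a wide margin, which supports using the normal approximation.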

### 2. What are the null and alternate hypotheses?

The null hypothesis is that a white-sounding name and a black-sounding name have the same probability of getting a callback: p(b) = p(w). The alternate hypothesis is that the two probabilities are not the same: p(b) ≠ p(w).
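Written out, the hypotheses and the corresponding test statistic might look as follows. The symbols are notation introduced here for illustration: $\hat{p}_w$ and $\hat{p}_b$ are the observed callback proportions, $x_w$ and $x_b$ the callback counts, $n_w$ and $n_b$ the group sizes, and $\hat{p}$ the pooled proportion used under the null hypothesis.

$$
H_0: p_w - p_b = 0 \qquad H_1: p_w - p_b \neq 0
$$

$$
z = \frac{\hat{p}_w - \hat{p}_b}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_w} + \frac{1}{n_b}\right)}}, \qquad \hat{p} = \frac{x_w + x_b}{n_w + n_b}
$$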


In [7]:
w = data[data.race=='w']
b = data[data.race=='b']

### 3. Compute margin of error, confidence interval, and p-value.

In [8]:
# proportion of black-sounding names receiving a callback:
p_b = sum(b.call) / len(b.call)
print('black sounding name getting call back ', p_b)
# proportion of white-sounding names receiving a callback:
p_w = sum(w.call) / len(w.call)
print('white sounding name getting call back ', p_w)
# unpooled standard error of the difference in proportions, used for the
# confidence interval (the hypothesis test uses a pooled estimate instead)
se = np.sqrt(p_b * (1 - p_b) / len(b) + p_w * (1 - p_w) / len(w))

print('The standard error for proportion of receiving call back is: {:.5}'.format(se))

black sounding name getting call back  0.06447638603696099
white sounding name getting call back  0.09650924024640657
The standard error for proportion of receiving call back is: 0.0077834


In [9]:
# the observed difference in callback proportions (white minus black):
diff = p_w - p_b
print('The difference in call back proportions is: {:.5}'.format(diff))

The difference in call back proportions is: 0.032033


In [10]:
# calculate the confidence interval
z_critical = stats.norm.ppf(0.975)  # two-tailed critical value at 95% confidence
print('The Z Critical at 2 tail 95% confidence: {:.5}'.format(z_critical))
margin = abs(z_critical * se)
print('The margin of error is +/- {:.5} around the point estimate {:.5}'.format(margin, diff))

CI = [diff - margin, diff + margin]
print('The difference in call backs between white and black sounding names ranges from {:.3} to {:.3}'.format(CI[0], CI[1]))

The Z Critical at 2 tail 95% confidence: 1.96
The margin of error is +/- 0.015255 around the point estimate 0.032033
The difference in call backs between white and black sounding names ranges from 0.0168 to 0.0473
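As a cross-check that does not need the data file, the margin of error and confidence interval can be recomputed from the callback counts alone. This is a minimal sketch using only the standard library; `diff_proportion_ci` is a hypothetical helper, and the group size of 2,435 is an assumption inferred from the printed proportions rather than read from the dataset.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportion_ci(k1, n1, k2, n2, level=0.95):
    """Normal-approximation CI for p1 - p2, using the unpooled standard error."""
    p1, p2 = k1 / n1, k2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Two-tailed critical value, e.g. about 1.96 for a 95% level.
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_crit * se
    diff = p1 - p2
    return diff - margin, diff + margin, margin


# 235 and 157 callbacks are from the notebook output; 2435 per group is an
# assumption inferred from the printed proportions.
low, high, margin = diff_proportion_ci(235, 2435, 157, 2435)
print("margin of error: {:.4f}, 95% CI: [{:.4f}, {:.4f}]".format(margin, low, high))
```

Because the interval lies entirely above zero, it is consistent with a higher callback rate for white-sounding names.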


In [11]:
# calculate p value:

# under the null hypothesis, the difference in callback proportions is:
null = 0

# pooled callback proportion across both groups (the best estimate under the null):
ave = sum(data.call) / len(data.call)

# standard error of the difference under the null, using the pooled proportion:
std = np.sqrt(ave * (1 - ave) / len(b) + ave * (1 - ave) / len(w))
z = (diff - null) / std

# two-sided p-value:
p_values = stats.norm.sf(abs(z)) * 2
print("Z-score is equal to : %6.3F  p-value equal to: %6.7F" % (z,p_values))

Z-score is equal to :  4.108  p-value equal to: 0.0000398


Since the two-sided p-value of 0.0000398 is less than 0.05, we reject the null hypothesis: there is a significant difference in callback rates between black-sounding and white-sounding names.
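The same test can be sketched as a small function that works from counts alone. It uses only the standard library; `two_proportion_ztest` is a hypothetical helper name, and the group size of 2,435 is an assumption inferred from the printed proportions.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(k1, n1, k2, n2):
    """Two-sided z-test of H0: p1 == p2, using the pooled proportion."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    # Standard error of the difference if both groups share one proportion.
    se_null = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# 235 and 157 callbacks are from the notebook output; 2435 per group is an
# assumption inferred from the printed proportions.
z_stat, p_value = two_proportion_ztest(235, 2435, 157, 2435)
print("z = {:.3f}, two-sided p-value = {:.7f}".format(z_stat, p_value))
```

The pooled proportion is used here because the null hypothesis assumes a single common callback rate, whereas the confidence interval uses the unpooled standard error.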

<div class="span5 alert alert-success">
<p> Your answers to Q4 and Q5 here </p>
</div>

In [12]:
# resampling method: permutation test

# Define permutation functions:

def permutation_sample(data1, data2):
    """Return a permutation sample from two data sets."""

    # Concatenate the data sets:
    data = np.concatenate((data1, data2))

    # Permute the concatenated array:
    permuted_data = np.random.permutation(data)

    # Split the permuted array into two samples:
    permutation_sample_1 = permuted_data[:len(data1)]
    permutation_sample_2 = permuted_data[len(data1):]

    return permutation_sample_1, permutation_sample_2


def draw_permutation_replicates(data_1, data_2, func, size=1):
    """Draw multiple permutation replicates."""

    # Initialize array of permutation replicates:
    permutation_replicates = np.empty(size)

    for i in range(size):
        # Generate permutation samples:
        permutation_sample_1, permutation_sample_2 = permutation_sample(data_1, data_2)

        # Compute the test statistic:
        permutation_replicates[i] = func(permutation_sample_1, permutation_sample_2)

    return permutation_replicates

def diff_of_call_proportions(data_1, data_2):
    """Return the difference in means of two arrays."""

    # The difference of means of data_1, data_2:
    return np.sum(data_1)/len(data_1)-np.sum(data_2)/len(data_2)


In [13]:
perm_reps = draw_permutation_replicates(w.call, b.call, diff_of_call_proportions, 10000)
# 95% range of the difference under the null hypothesis (this is not a
# confidence interval for the observed difference)
null_range = np.percentile(perm_reps, [2.5, 97.5])
print('Under the null hypothesis, 95% of the permuted differences in call back probability '
      + 'between white and black sounding names fall within:', null_range)

Under the null hypothesis, 95% of the permuted differences in call back probability between white and black sounding names fall within: [-0.01560575  0.01478439]


In [14]:
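# standard deviation of the permutation replicates (standard error under the null)
std_err = np.std(perm_reps)
# one-sided p-value: fraction of replicates at least as large as the observed difference
p = np.sum(perm_reps >= diff) / len(perm_reps)

print('The standard error for call back probability is', std_err, 'and the p-value is', p)

The standard error for call back probability is 0.007802283545743536 and the p-value is 0.0001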
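The permutation replicates describe the difference expected under the null hypothesis. A bootstrap confidence interval for the observed difference instead resamples each group separately, with replacement. The sketch below is one way to do that with only the standard library; `bootstrap_diff_ci` is a hypothetical helper, the seed and replicate count are arbitrary choices, and the group size of 2,435 is an assumption inferred from the printed proportions.

```python
import random


def bootstrap_diff_ci(k1, n1, k2, n2, reps=2000, level=0.95, seed=42):
    """Percentile bootstrap CI for the difference in proportions p1 - p2."""
    rng = random.Random(seed)
    # Rebuild each group's 0/1 outcomes from its callback count.
    group1 = [1] * k1 + [0] * (n1 - k1)
    group2 = [1] * k2 + [0] * (n2 - k2)
    diffs = []
    for _ in range(reps):
        # Resample each group with replacement, keeping its original size.
        prop1 = sum(rng.choices(group1, k=n1)) / n1
        prop2 = sum(rng.choices(group2, k=n2)) / n2
        diffs.append(prop1 - prop2)
    diffs.sort()
    alpha = 1 - level
    lo_idx = round(reps * alpha / 2)
    hi_idx = round(reps * (1 - alpha / 2)) - 1
    return diffs[lo_idx], diffs[hi_idx]


# 235 and 157 callbacks are from the notebook output; 2435 per group is an
# assumption inferred from the printed proportions.
boot_low, boot_high = bootstrap_diff_ci(235, 2435, 157, 2435)
print("bootstrap 95% CI for the difference: [{:.4f}, {:.4f}]".format(boot_low, boot_high))
```

The resulting interval should land close to the normal-approximation interval of 0.0168 to 0.0473, with small differences from resampling noise.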


### 4. Write a story describing the statistical significance in the context of the original problem.

The probability of a white-sounding name getting a callback is 0.097, and the probability of a black-sounding name getting a callback is 0.064. The difference of 0.032 is statistically significant, with white-sounding names having the higher chance of a callback. Because the resumes were otherwise identical and the names were assigned at random, the gap can be attributed to the perceived race of the name.

### 5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

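No, the analysis does not show which factor is the most important in callback success. It examines only the binary race factor, not the others. To assess which factor is the most important, we would need to evaluate all of the factors in the dataset (for example education, years of experience, and honors) against callback success, for instance by fitting a regression of `call` on all of them and comparing their effects.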