# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating a black-sounding or a white-sounding name, respectively. The 'call' column has two values, 1 and 0, indicating whether or not the resume received a callback from an employer.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

<div class="span5 alert alert-info">
### Exercises
You will perform a statistical analysis to establish whether race has a significant impact on the rate of callbacks for resumes.

Answer the following questions **in this notebook below and submit to your Github account**. 

   1. What test is appropriate for this problem? Does CLT apply?
   2. What are the null and alternate hypotheses?
   3. Compute margin of error, confidence interval, and p-value.
   4. Write a story describing the statistical significance in the context of the original problem.
   5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

You can include written notes in notebook cells using Markdown: 
   - In the control panel at the top, choose Cell > Cell Type > Markdown
   - Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet


#### Resources
+ Experiment information and data source: http://www.povertyactionlab.org/evaluation/discrimination-job-market-united-states
+ Scipy statistical methods: http://docs.scipy.org/doc/scipy/reference/stats.html 
+ Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet
</div>
****

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

In [2]:
# read the Stata file; the path is relative to the notebook folder, so adjust it to where the .dta file lives
data = pd.read_stata('data/us_job_market_discrimination.dta')

In [3]:
# number of callbacks for black-sounding names
data.loc[data.race == 'b', 'call'].sum()

157.0

In [4]:
data.head()

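| | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | b | 1 | 4 | 2 | 6 | 0 | 0 | 0 | 1 | 17 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 1 | b | 1 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 2 | b | 1 | 4 | 1 | 6 | 0 | 0 | 0 | 0 | 19 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 3 | b | 1 | 3 | 4 | 6 | 0 | 1 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 4 | b | 1 | 3 | 3 | 22 | 0 | 0 | 0 | 0 | 313 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Nonprofit |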


In [5]:
# count sample sizes
num_app = data[['race', 'call']].groupby('race').count()
n_b = num_app.loc['b', 'call']
n_w = num_app.loc['w', 'call']
print(n_b, n_w)

2435 2435


In [6]:
# compute sample means (the callback rate of each group)
call_rate = data[['race', 'call']].groupby('race').mean()
m_b = call_rate.loc['b', 'call']
m_w = call_rate.loc['w', 'call']
print(m_b, m_w)

0.0644764 0.0965092


In [7]:
# compute sample standard deviations
std_d = data[['race', 'call']].groupby('race').std()
s_b = std_d.loc['b', 'call']
s_w = std_d.loc['w', 'call']
print(s_b, s_w)

0.24565 0.295349


### 1. What test is appropriate for this problem? Does CLT apply?

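A two-sample test for a difference in means is appropriate. Because `call` is binary, the mean of each group is its callback rate (a proportion), and with 2,435 resumes per group the test can use the normal distribution (a z-test; a two-sample t-test gives practically the same answer at this sample size). Yes, the CLT applies: the names were randomly assigned, both groups are large, and each group has well over 10 callbacks and 10 non-callbacks (157 callbacks for black-sounding names and 235 for white-sounding names), so the sampling distribution of the difference in callback rates is approximately normal.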
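The large-sample condition can be checked in code with the usual success/failure rule of thumb for proportions: at least 10 callbacks and 10 non-callbacks per group (the threshold of 10 is a convention; some texts use 5). A minimal sketch; the helper name `normal_approx_ok` is hypothetical, and the counts are 157 (cell 3) and 235 (the white-sounding callback rate 0.0965 × 2435):

```python
def normal_approx_ok(successes, n, threshold=10):
    """Rule-of-thumb check that a sample proportion is close enough to normal:
    both the number of successes and the number of failures reach `threshold`."""
    failures = n - successes
    return successes >= threshold and failures >= threshold


# callbacks out of 2435 resumes per group: 157 (black-sounding), 235 (white-sounding)
for race, calls in (('b', 157), ('w', 235)):
    print(race, round(calls / 2435, 4), normal_approx_ok(calls, 2435))
```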

### 2. What are the null and alternate hypotheses?

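Null hypothesis H0: there is no difference in callback rates between the two groups, i.e. mu_b = mu_w, where mu_b and mu_w are the population callback rates for black-sounding and white-sounding names.

Alternative hypothesis H1: there is a difference in callback rates, i.e. mu_b != mu_w (a two-sided test).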

### 3. Compute margin of error, confidence interval, p-value

In [8]:
# let's first find the pooled estimate of the common std-dev

s_p = (((n_b-1)*s_b**2 + (n_w-1)*s_w**2) / (n_w + n_b - 2) )**0.5
s_p

0.27163854301890733

In [9]:
# standard error

std_err = s_p * (1.0/n_b + 1.0/n_w)**0.5
std_err

0.0077849693568973945
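The quantities computed in cells 8, 9 and 11 are the standard equal-variance two-sample formulas. Written out with the values from the notebook (n_b = n_w = 2435):

$$
\begin{aligned}
s_p &= \sqrt{\frac{(n_b - 1)\,s_b^2 + (n_w - 1)\,s_w^2}{n_b + n_w - 2}} \approx 0.2716 \\
\mathrm{SE} &= s_p\sqrt{\frac{1}{n_b} + \frac{1}{n_w}} \approx 0.2716 \times \sqrt{\frac{2}{2435}} \approx 0.00778 \\
z &= \frac{m_w - m_b}{\mathrm{SE}} \approx \frac{0.0320}{0.00778} \approx 4.11
\end{aligned}
$$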

In [10]:
# margin of error at the 95% level; the samples are large, so we use the
# normal critical value z = 1.96
margin = 1.96 * std_err

# under the null hypothesis the difference between means is 0, so 95% of
# sample differences should fall within 0 +/- margin
ci_low = 0 - margin
ci_hi = 0 + margin

print('95% interval under H0:', ci_low, ci_hi)
print('observed difference in means:', m_w - m_b)

95% interval under H0: -0.0152585399395 0.0152585399395
observed difference in means: 0.0320329
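The interval in cell 10 is centered on 0: it is the range in which the difference would fall 95% of the time if H0 were true. A 95% confidence interval for the true difference in callback rates is instead centered on the observed difference. A minimal sketch using the standard library's `statistics.NormalDist`, with the standard error and observed difference typed in from the cells above (in the notebook you would use `std_err` and `m_w - m_b` directly):

```python
from statistics import NormalDist

std_err = 0.0077849693568973945  # standard error from cell 9
diff = 0.0320329                 # observed m_w - m_b from cell 10

z_crit = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval
margin = z_crit * std_err             # margin of error
ci_low, ci_hi = diff - margin, diff + margin
print('margin of error:', round(margin, 4))
print('95% CI for mu_w - mu_b:', round(ci_low, 4), round(ci_hi, 4))
```

With these inputs the interval runs from roughly 0.017 to 0.047. It excludes 0, which agrees with the hypothesis test.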


In [11]:
z_obs = (m_w - m_b) / std_err 
z_obs

4.1147053233651887

With z = 4.11, the two-sided p-value is far below 0.001 (about 4e-5). We therefore reject the null hypothesis in favor of the alternative hypothesis.
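The p-value is not computed in the cells above. For a two-sided test with a standard normal reference distribution it is P(|Z| >= |z|), which equals the complementary error function evaluated at |z|/sqrt(2). A minimal sketch with Python's `math` module, using z_obs from cell 11 (in the notebook, `2 * stats.norm.sf(abs(z_obs))` from the already-imported `scipy.stats` gives the same number):

```python
import math

z_obs = 4.1147053233651887  # from cell 11

# two-sided p-value: P(|Z| >= |z_obs|) = erfc(|z_obs| / sqrt(2)) for Z ~ N(0, 1)
p_val = math.erfc(abs(z_obs) / math.sqrt(2))
print(p_val)
```

This comes out at roughly 4e-5, comfortably below 0.001.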

### 4. Write a story describing the statistical significance in the context of the original problem

Researchers conducted a field experiment to test whether employers discriminate on the basis of race when deciding whom to interview. They sent out 4,870 résumés, 2,435 with black-sounding names and 2,435 with white-sounding names, randomly assigning the names to otherwise identical résumés to isolate the effect of race.

In the sample, 6.45% of résumés with black-sounding names received a callback, compared with 9.65% of those with white-sounding names: a gap of 3.2 percentage points in favor of white-sounding names.

Next, we performed a statistical analysis to find out whether this difference could be due to chance or reflects real discrimination. We first assumed that there is no discrimination on the basis of name/race (the null hypothesis), then computed the standard error, the 95% interval for the difference under that assumption, and the p-value.

The observed difference in callback rates lies well outside the 95% interval expected under the null hypothesis, and the p-value is below 0.0001. The difference is therefore highly statistically significant: it is very unlikely to have arisen by chance, which is strong evidence of discrimination on the basis of race.

### 5. Does your analysis mean that race/name is the most important factor in callback success? Why or why not? If not, how would you amend your analysis?

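A statistically significant difference shows that race, as signaled by the name, affects callbacks; it does not show that race is the *most* important factor, because we have not compared its effect with that of the other resume attributes. Doing so would require modelling the attributes jointly, for example with a regression of `call` on race and the other resume characteristics. A first step is to check whether the two groups were otherwise equally qualified. For this purpose we compute the average of every other attribute for both groups and compare them side by side.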

In [12]:
# average of each numeric attribute per group (non-numeric columns such as id are skipped)
av_data = data.groupby('race').mean(numeric_only=True)

In [13]:
av_comp = av_data.T.reset_index()

In [14]:
av_comp.head()

| | index | b | w |
|---|---|---|---|
| 0 | education | 3.616016 | 3.620945 |
| 1 | ofjobs | 3.658316 | 3.664476 |
| 2 | yearsexp | 7.829569 | 7.856263 |
| 3 | honors | 0.051335 | 0.054209 |
| 4 | volunteer | 0.414374 | 0.408624 |


In [15]:
# standard deviation of each numeric attribute per group
st_data = data.groupby('race').std(numeric_only=True)
st_comp = st_data.T.reset_index()

In [16]:
race_comp = st_comp.merge(av_comp, on='index')
race_comp.head(2)

| | index | b_x | w_x | b_y | w_y |
|---|---|---|---|---|---|
| 0 | education | 0.73306 | 0.696609 | 3.616016 | 3.620945 |
| 1 | ofjobs | 1.21915 | 1.219345 | 3.658316 | 3.664476 |


In [17]:
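# note: after the merge, b_x/w_x hold the std devs (from st_comp) and b_y/w_y the means
# (from av_comp), so these labels put the std devs under m_b/m_w and the means under s_b/s_w
race_comp.columns = ['attrib', 'm_b', 'm_w', 's_b', 's_w']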
race_comp.head()

| | attrib | m_b | m_w | s_b | s_w |
|---|---|---|---|---|---|
| 0 | education | 0.73306 | 0.696609 | 3.616016 | 3.620945 |
| 1 | ofjobs | 1.21915 | 1.219345 | 3.658316 | 3.664476 |
| 2 | yearsexp | 5.010764 | 5.079228 | 7.829569 | 7.856263 |
| 3 | honors | 0.220725 | 0.226477 | 0.051335 | 0.054209 |
| 4 | volunteer | 0.492715 | 0.491681 | 0.414374 | 0.408624 |


In [18]:
# see if the difference between all other attributes is statistically significant
rc = race_comp
n_b = n_w = 2435
rc['std_err'] = (((n_b-1)*rc.s_b**2 + (n_w-1)*rc.s_w**2) / (n_w + n_b - 2) )**0.5 * (1.0/n_b + 1.0/n_w)**0.5
rc['err_mgn'] = 1.96 * rc.std_err
rc['signif'] = abs(rc.m_w - rc.m_b) > abs(rc.err_mgn)
rc

| | attrib | m_b | m_w | s_b | s_w | std_err | err_mgn | signif |
|---|---|---|---|---|---|---|---|---|
| 0 | education | 0.73306 | 0.696609 | 3.616016 | 3.620945 | 0.103703 | 0.203258 | False |
| 1 | ofjobs | 1.21915 | 1.219345 | 3.658316 | 3.664476 | 0.104933 | 0.205669 | False |
| 2 | yearsexp | 5.010764 | 5.079228 | 7.829569 | 7.856263 | 0.224773 | 0.440555 | False |
| 3 | honors | 0.220725 | 0.226477 | 0.051335 | 0.054209 | 0.001513 | 0.002965 | True |
| 4 | volunteer | 0.492715 | 0.491681 | 0.414374 | 0.408624 | 0.011794 | 0.023115 | False |
| 5 | military | 0.302511 | 0.289653 | 0.101848 | 0.092402 | 0.002787 | 0.005462 | True |
| 6 | empholes | 0.497177 | 0.497606 | 0.445996 | 0.450103 | 0.012841 | 0.025168 | False |
| 7 | occupspecific | 148.021857 | 148.255302 | 216.744969 | 214.530595 | 6.180108 | 12.113013 | False |
| 8 | occupbroad | 2.043125 | 2.033334 | 3.487885 | 3.475154 | 0.099778 | 0.195565 | False |
| 9 | workinschool | 0.496369 | 0.496714 | 0.560986 | 0.558111 | 0.016036 | 0.031431 | False |


In [19]:
rc_sig = rc[rc['signif']].reset_index()
rc_sig

| | index | attrib | m_b | m_w | s_b | s_w | std_err | err_mgn | signif |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | honors | 0.220725 | 0.226477 | 0.051335 | 0.054209 | 0.001513 | 0.002965 | True |
| 1 | 5 | military | 0.302511 | 0.289653 | 0.101848 | 0.092402 | 0.002787 | 0.005462 | True |
| 2 | 15 | call | 0.24565 | 0.295349 | 0.064476 | 0.096509 | 0.002352 | 0.00461 | True |
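In `race_comp`, pandas adds the `_x` suffix to the overlapping columns of the left frame in the merge (`st_comp`, the standard deviations) and `_y` to those of the right frame (`av_comp`, the means), so it is worth confirming these flags with a check that keeps each group's mean paired with its own standard deviation. A sketch of such a check, computed directly from the raw DataFrame with the same pooled-SD standard error and 1.96 cut-off as cell 18; the function name `balance_check` and its parameters are illustrative, not part of the original analysis:

```python
import numpy as np
import pandas as pd


def balance_check(df, group_col='race', groups=('b', 'w'), z_crit=1.96):
    """Compare each numeric column's mean between two groups.

    Uses the pooled standard deviation and flags a column when the absolute
    difference in means exceeds z_crit standard errors.
    """
    g_a, g_b = groups
    numeric = df.select_dtypes(include='number')
    grouped = numeric.groupby(df[group_col])
    means, stds, counts = grouped.mean(), grouped.std(), grouped.count()

    n_a, n_b = counts.loc[g_a], counts.loc[g_b]  # non-missing values per column
    s_p = np.sqrt(((n_a - 1) * stds.loc[g_a] ** 2 + (n_b - 1) * stds.loc[g_b] ** 2)
                  / (n_a + n_b - 2))
    out = pd.DataFrame({
        'm_' + g_a: means.loc[g_a],
        'm_' + g_b: means.loc[g_b],
        's_' + g_a: stds.loc[g_a],
        's_' + g_b: stds.loc[g_b],
    })
    out['std_err'] = s_p * np.sqrt(1.0 / n_a + 1.0 / n_b)
    out['err_mgn'] = z_crit * out['std_err']
    out['signif'] = (out['m_' + g_b] - out['m_' + g_a]).abs() > out['err_mgn']
    return out


# usage in the notebook: balance_check(data)[lambda t: t.signif]
```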


### Other possible causes of difference in call rate
The table above flags two other differences between the groups: resumes with white-sounding names list honors more often, and resumes with black-sounding names list military service more often. These differences might have contributed to the observed gap in callback rates. One remedy is to repeat the experiment, assigning names so that honors and military service are also balanced, but this is costly. A cheaper alternative is to drop just enough rows to equalize these attributes, i.e. remove some resumes with white-sounding names that list honors and some with black-sounding names that list military service from the `data` DataFrame, and then re-run the computation.

In [20]:
data[['race', 'honors', 'military']].groupby('race').sum()

| race | honors | military |
|---|---|---|
| b | 125.0 | 248.0 |
| w | 132.0 | 225.0 |


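Resumes with white-sounding names list honors 7 more times (132 vs 125), and resumes with black-sounding names list military service 23 more times (248 vs 225). We therefore first remove 7 resumes with white-sounding names that list honors but no military service, then 23 resumes with black-sounding names that list military service but no honors.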
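The sort-and-drop steps in the next cells can also be written with a boolean mask, which makes the selection explicit, keeps the original index labels (so no `reset_index`/`set_index('index')` round trip is needed) and does not depend on how `sort_values` orders tied rows. A sketch; the helper name `drop_first_matching` is hypothetical, and the rows it selects may differ from those dropped by the sort-based cells even though the resulting counts are the same:

```python
import pandas as pd


def drop_first_matching(df, mask, k):
    """Return a copy of df without the first k rows (in index order) where mask is True."""
    to_drop = df[mask].index[:k]  # labels of at most k matching rows
    return df.drop(index=to_drop)


# hypothetical usage mirroring the two balancing steps:
# data_eq_hon = drop_first_matching(
#     data, (data.race == 'w') & (data.honors == 1) & (data.military == 0), 7)
# d = data_eq_hon
# data_eq_hon_mil = drop_first_matching(
#     d, (d.race == 'b') & (d.honors == 0) & (d.military == 1), 23)
```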

In [21]:
data.sort_values(by=['race', 'honors', 'military'], ascending=[False, False, True])[['race', 'honors', 'military']].reset_index().head(8)

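| | index | race | honors | military |
|---|---|---|---|---|
| 0 | 5 | w | 1 | 0 |
| 1 | 18 | w | 1 | 0 |
| 2 | 40 | w | 1 | 0 |
| 3 | 105 | w | 1 | 0 |
| 4 | 110 | w | 1 | 0 |
| 5 | 169 | w | 1 | 0 |
| 6 | 176 | w | 1 | 0 |
| 7 | 206 | w | 1 | 0 |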


In [22]:
temp_data = data.sort_values(by=['race', 'honors', 'military'], ascending=[False, False, True]).reset_index().drop(range(7), axis=0)

In [23]:
data_eq_hon = temp_data.set_index('index')
data_eq_hon.head()

| index | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 206 | b | 10 | 4 | 3 | 5 | 1 | 0 | 0 | 1 | 323 | ... | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | |
| 210 | b | 10 | 4 | 3 | 4 | 1 | 0 | 0 | 1 | 323 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | Private |
| 241 | b | 10 | 4 | 3 | 4 | 1 | 0 | 0 | 1 | 323 | ... | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | Private |
| 292 | b | 10 | 4 | 2 | 6 | 1 | 0 | 0 | 0 | 266 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | Private |
| 310 | b | 10 | 4 | 2 | 5 | 1 | 0 | 0 | 1 | 313 | ... | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | Private |


In [24]:
data_eq_hon[['race', 'honors', 'military']].groupby('race').sum()

| race | honors | military |
|---|---|---|
| b | 125.0 | 248.0 |
| w | 125.0 | 225.0 |


In [25]:
temp_data1 = data_eq_hon.sort_values(by=['race', 'honors', 'military', ], ascending=[True, True, False]).reset_index().drop(range(23), axis=0)

In [26]:
data_eq_hon_mil = temp_data1.set_index('index')
data_eq_hon_mil.head()

| index | id | ad | education | ofjobs | yearsexp | honors | volunteer | military | empholes | occupspecific | ... | compreq | orgreq | manuf | transcom | bankreal | trade | busservice | othservice | missind | ownership |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 607 | b | 12 | 3 | 3 | 6 | 0 | 1 | 1 | 0 | 316 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | Private |
| 618 | b | 12 | 3 | 2 | 12 | 0 | 1 | 1 | 0 | 188 | ... | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | |
| 637 | b | 12 | 4 | 5 | 4 | 0 | 0 | 1 | 1 | 34 | ... | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | Private |
| 675 | b | 12 | 3 | 3 | 3 | 0 | 1 | 1 | 0 | 316 | ... | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | Private |
| 715 | 265 | 12 | 4 | 4 | 2 | 0 | 1 | 1 | 0 | 267 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | |


In [27]:
data_eq_hon_mil[['race', 'honors', 'military']].groupby('race').sum()

| race | honors | military |
|---|---|---|
| b | 125.0 | 225.0 |
| w | 125.0 | 225.0 |


In [28]:
# average of each numeric attribute per group in the rebalanced data
av_data = data_eq_hon_mil.groupby('race').mean(numeric_only=True)

In [29]:
av_comp = av_data.T.reset_index()

In [30]:
av_comp.head()

| | index | b | w |
|---|---|---|---|
| 0 | education | 3.618159 | 3.620675 |
| 1 | ofjobs | 3.65796 | 3.668451 |
| 2 | yearsexp | 7.858209 | 7.861203 |
| 3 | honors | 0.051824 | 0.051483 |
| 4 | volunteer | 0.410033 | 0.408567 |


In [31]:
st_data = data_eq_hon_mil.groupby('race').std(numeric_only=True)
st_comp = st_data.T.reset_index()

In [32]:
race_comp = st_comp.merge(av_comp, on='index')
race_comp.head(2)

| | index | b_x | w_x | b_y | w_y |
|---|---|---|---|---|---|
| 0 | education | 0.731278 | 0.696581 | 3.618159 | 3.620675 |
| 1 | ofjobs | 1.223104 | 1.218607 | 3.65796 | 3.668451 |


In [33]:
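# note: after the merge, b_x/w_x hold the std devs (from st_comp) and b_y/w_y the means
# (from av_comp), so these labels put the std devs under m_b/m_w and the means under s_b/s_w
race_comp.columns = ['attrib', 'm_b', 'm_w', 's_b', 's_w']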
race_comp.head()

| | attrib | m_b | m_w | s_b | s_w |
|---|---|---|---|---|---|
| 0 | education | 0.731278 | 0.696581 | 3.618159 | 3.620675 |
| 1 | ofjobs | 1.223104 | 1.218607 | 3.65796 | 3.668451 |
| 2 | yearsexp | 5.000685 | 5.085516 | 7.858209 | 7.861203 |
| 3 | honors | 0.221718 | 0.221026 | 0.051824 | 0.051483 |
| 4 | volunteer | 0.491941 | 0.49167 | 0.410033 | 0.408567 |


In [34]:
# see if the difference between all other attributes is statistically significant
rc = race_comp
n_b = n_w = 2435
rc['std_err'] = (((n_b-1)*rc.s_b**2 + (n_w-1)*rc.s_w**2) / (n_w + n_b - 2) )**0.5 * (1.0/n_b + 1.0/n_w)**0.5
rc['err_mgn'] = 1.96 * rc.std_err
rc['signif'] = abs(rc.m_w - rc.m_b) > abs(rc.err_mgn)

In [35]:
rc_sig = rc[rc['signif']].reset_index()
rc_sig

| | index | attrib | m_b | m_w | s_b | s_w | std_err | err_mgn | signif |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 15 | call | 0.246738 | 0.295166 | 0.065091 | 0.096376 | 0.002357 | 0.004619 | True |


### Conclusion
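After equalizing honors and military service, the call rate is the only attribute whose difference between the two groups is flagged as significant. Because the names were randomly assigned and the resumes are otherwise comparable, this supports the conclusion that employers discriminate on the basis of race, as signaled by the applicant's name.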
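The callback-rate test from section 3 can also be re-run directly on the rebalanced data. A self-contained sketch of that test (pooled-SD standard error and a two-sided p-value from the standard normal, as in cells 8 to 11), written with the standard library so it accepts any sequences of 0/1 outcomes; the function name `callback_ztest` is illustrative:

```python
import math
import statistics


def callback_ztest(calls_b, calls_w):
    """Two-sample z-test for a difference in mean callback rates.

    Pooled standard deviation, standard error s_p * sqrt(1/n_b + 1/n_w),
    and a two-sided p-value from N(0, 1).
    """
    calls_b = [float(x) for x in calls_b]  # plain floats keep `statistics` happy
    calls_w = [float(x) for x in calls_w]
    n_b, n_w = len(calls_b), len(calls_w)
    m_b, m_w = statistics.mean(calls_b), statistics.mean(calls_w)
    s_b, s_w = statistics.stdev(calls_b), statistics.stdev(calls_w)
    s_p = math.sqrt(((n_b - 1) * s_b ** 2 + (n_w - 1) * s_w ** 2) / (n_b + n_w - 2))
    std_err = s_p * math.sqrt(1.0 / n_b + 1.0 / n_w)
    z = (m_w - m_b) / std_err
    p_val = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return {'diff': m_w - m_b, 'std_err': std_err, 'z': z, 'p_val': p_val}


# usage with the rebalanced notebook data:
# d = data_eq_hon_mil
# callback_ztest(d.loc[d.race == 'b', 'call'], d.loc[d.race == 'w', 'call'])
```

Applied to the original counts (157 and 235 callbacks out of 2,435 each), this reproduces the z of about 4.11 from cell 11.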