# Examining Racial Discrimination in the US Job Market

### Background
Racial discrimination continues to be pervasive in cultures throughout the world. Researchers examined the level of racial discrimination in the United States labor market by randomly assigning identical résumés to black-sounding or white-sounding names and observing the impact on requests for interviews from employers.

### Data
In the dataset provided, each row represents a resume. The 'race' column has two values, 'b' and 'w', indicating black-sounding and white-sounding. The column 'call' has two values, 1 and 0, indicating whether the resume received a call from employers or not.

Note that the 'b' and 'w' values in race are assigned randomly to the resumes when presented to the employer.

In [6]:
%matplotlib inline

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
import random
from scipy.stats import t, norm



In [2]:
df = pd.read_stata('../data/us_job_market_discrimination.dta')

In [3]:
# number of callbacks for white-sounding names
sum(df[df.race=='w'].call)

235.0

In [4]:
df.head()


Unnamed: 0,id,ad,education,ofjobs,yearsexp,honors,volunteer,military,empholes,occupspecific,...,compreq,orgreq,manuf,transcom,bankreal,trade,busservice,othservice,missind,ownership
0,b,1,4,2,6,0,0,0,1,17,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
1,b,1,3,3,6,0,1,1,0,316,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
2,b,1,4,1,6,0,0,0,0,19,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
3,b,1,3,4,6,0,1,0,1,313,...,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,
4,b,1,3,3,22,0,0,0,0,313,...,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,Nonprofit


In [5]:
rows, cols = df.shape  # the DataFrame loaded above is named df, not data
print("There are a total of {} observations in the data set".format(rows))

In [None]:
# Count the resumes in each race group
sns.set(style="white")
cnt_srs = df.race.value_counts()

plt.figure(figsize=(12, 8))
with sns.color_palette("husl"):
    # x and y are passed as keywords; recent seaborn releases reject positional data arguments
    sns.barplot(x=cnt_srs.index, y=cnt_srs.values, alpha=0.8)
plt.ylabel('Number of Observations', fontsize=12)
plt.xlabel('Race', fontsize=12)
plt.title('Number of resumes per race group', fontsize=15)
plt.xticks(rotation='vertical')
sns.despine(offset=10, left=True)

plt.margins(0.6)

plt.show()

__Fig. 1__: (above) There are 4870 observations, split equally between resumes with black-sounding names ('b') and white-sounding names ('w').

 

#### There are 4870 observations in the data set. N > 30.
Each row represents a resume, and the 'call' column records whether it received a call from employers (1, success) or not (0, failure). Because the outcome is binary, we compare the proportions of callbacks between the two groups. The Central Limit Theorem applies: the sampling distribution of a proportion is very nearly normal when each group contains enough successes and failures (a common rule of thumb is at least 10 of each). This is a stricter requirement than N >= 30, and it is comfortably met here. The observations can also be treated as independent because race was assigned to the resumes at random. Therefore a two-sample z-test for proportions is appropriate.
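The success-failure condition mentioned above can be checked directly. The following is a minimal sketch using only the standard library. The helper name `success_failure_ok` and the threshold of 10 are illustrative choices, and the counts are assumptions derived from figures reported in this notebook: 4870 resumes split equally, 235 callbacks for white-sounding names, and roughly 157 callbacks for black-sounding names as implied by the 6.45% rate.

```python
def success_failure_ok(successes, n, threshold=10):
    """Return True when both successes and failures reach the threshold."""
    failures = n - successes
    return successes >= threshold and failures >= threshold

# Assumed counts, derived from figures reported in this notebook.
n_per_group = 4870 // 2            # equal split between 'b' and 'w'
callbacks = {"w": 235, "b": 157}   # 'b' count inferred from the 6.45% rate

for race, count in callbacks.items():
    print(race, success_failure_ok(count, n_per_group))
```

With the real data, the same check could be run on `df.groupby('race').call.sum()` instead of the hard-coded counts.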

Here are the hypotheses:

#### H<sub>0</sub>: Race does not have any impact on resume callbacks (the claim under test)
#### H<sub>a</sub>: Race does have a significant impact on resume callbacks
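Stated in symbols, these are two-sided hypotheses about a difference in proportions. The notation $p_w$ and $p_b$ (the callback probabilities for resumes with white-sounding and black-sounding names) is introduced here for illustration and does not appear in the original analysis:

$$H_0: p_w - p_b = 0 \qquad \text{versus} \qquad H_a: p_w - p_b \neq 0$$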

In [None]:
# let's separate the data frames by race
df_w = df[df.race=='w']
df_b = df[df.race=='b']

Next we calculate the proportions 

In [None]:
#Calculating the proportions per racial group 
prop_b = ((sum(df_b.call))/(len(df_b.call)))
prop_w = ((sum(df_w.call))/(len(df_w.call)))
prop_b, prop_w # p-hats for both black and white

Resumes with black-sounding names have a lower proportion of callbacks: __6.45% vs. 9.65%__ for resumes with white-sounding names.

### Calculations of test statistic

We choose the test statistic to be the difference between the callback proportions for resumes with white-sounding and black-sounding names.
Under the null hypothesis, this difference is 0.

Let's begin by calculating the pooled proportion of callbacks across both groups together. Under the null hypothesis, this is the best estimate of the common callback rate.
 
 


In [None]:
p_both = sum(df.call)/len(df.call)
print("The proportion of the successful callbacks is {} ".format(p_both))

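#### Standard Error

Next, let us calculate the standard error of the test statistic (the difference in proportions). Here $p$ is the pooled proportion, $q = 1 - p$, and $N_b$ and $N_w$ are the group sizes:

$$SE = \sqrt{\frac{pq}{N_b} + \frac{pq}{N_w}}$$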

In [None]:
se_both = np.sqrt((p_both*(1 - p_both)/(len(df_b))) + (p_both*(1 - p_both) /(len(df_w))))
print("The standard error is {} ".format(se_both))
 

#### Margin of Error 

The margin of error at the 95% confidence level (0.05 significance level) is 1.96 * se_both.


In [None]:
ME = 1.96 * se_both
print("The Margin of error is {} ".format(ME))
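The multiplier 1.96 is the two-sided critical value of the standard normal distribution at the 95% confidence level. As a small sketch, it can be recovered from the inverse CDF in the standard library; the helper name `critical_z` is hypothetical and not part of the original notebook.

```python
from statistics import NormalDist

def critical_z(confidence=0.95):
    """Two-sided critical value of the standard normal distribution."""
    alpha = 1 - confidence
    # Half of alpha sits in each tail, so look up the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)

print(round(critical_z(0.95), 2))  # 1.96
```

Other confidence levels follow from the same call, for example `critical_z(0.99)`.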

#### Confidence Interval

In [None]:
# Point estimate of the difference in proportions (white minus black, as in the z statistic)
test_replicate = prop_w - prop_b
print("The confidence interval is (p_w - p_b) +/- ME")
print("The confidence interval is {} +/- {}".format(test_replicate, ME))
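The margin of error above is built from the pooled standard error, which assumes the null hypothesis is true. A confidence interval for the difference is often computed with the unpooled standard error instead, since it does not assume equal proportions. The sketch below shows that alternative and is not the method used in this notebook. The function name `diff_proportion_ci` is hypothetical, and the counts are assumptions derived from figures reported here (2435 resumes per group, 235 and roughly 157 callbacks).

```python
import math
from statistics import NormalDist

def diff_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    # Each group contributes its own variance estimate.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se

# Assumed counts: white-sounding first, black-sounding second.
low, high = diff_proportion_ci(235, 2435, 157, 2435)
print("95% CI for p_w - p_b: ({:.4f}, {:.4f})".format(low, high))
```

If the resulting interval excludes 0, it agrees with rejecting the null hypothesis at the matching significance level.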

#### z - statistic

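The z statistic is then calculated as follows, where $\hat{P}$ is the observed difference in proportions, $u$ is the difference expected under the null hypothesis, and $SE$ is the standard error:

$$z = \frac{\hat{P} - u}{SE}$$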

In [None]:
P = prop_w - prop_b
u = 0 ## Under null hypothesis the difference is 0
SE = se_both

z_stat = (P - u) / (SE)

print("The z-stat  is {} ".format(z_stat))

#### p-value
Finally, let's compute the p-value 

In [None]:
p  = stats.norm.sf(abs(z_stat))*2 #twoside
print("The p-value  is {} ".format(p))

With a p-value of 0.000098388, well below the 0.05 significance level (two-tailed), we __reject the null hypothesis__ and conclude that there is a statistically significant difference in callback rates between resumes with black-sounding and white-sounding names.
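The steps above (pooled proportion, standard error, z statistic, two-sided p-value) can be collected into one function as a cross-check. This is a minimal sketch using only the standard library rather than the notebook's NumPy and SciPy calls. The function name `two_proportion_ztest` is hypothetical, and the counts passed to it are assumptions derived from figures reported in this notebook.

```python
import math
from statistics import NormalDist

def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled two-sample z-test for H0: p1 - p2 = 0 (two-sided)."""
    p1, p2 = x1 / n1, x2 / n2
    # Pooled proportion: best estimate of the common rate under H0.
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Assumed counts: 235 of 2435 white-sounding, 157 of 2435 black-sounding.
z, p_value = two_proportion_ztest(235, 2435, 157, 2435)
print("z = {:.3f}, p-value = {:.3g}".format(z, p_value))
```

Working from counts rather than the DataFrame keeps the function reusable for other two-group comparisons.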

## CONCLUSION
<div class="span5 alert alert-success">
<p> SUMMARY </p>
With a p-value of 0.000098388, well below the 0.05 significance level (two-tailed), we reject the null hypothesis and conclude that there is a statistically significant difference in callback rates between resumes with black-sounding and white-sounding names. We can claim that, based on race, there is a bias in callbacks favoring white-sounding names over black-sounding names.
 
</div>

