# Investigating Racial Discrimination in the Labor Market

## Introduction

This project aims to explore whether there is racial discrimination against African-Americans in the labor market. We use two datasets: an observational survey from Chicago and Boston in 2001, and a randomized experiment conducted by [Bertrand and Mullainathan (2004)](./BertrandMullainathan2004.pdf). The survey dataset includes variables such as race, years of experience, education level, and employment status. The experimental dataset includes variables such as race, gender, education, computer skills, number of jobs listed, years of experience, and callback for interviews.

## Data Description

### Observational Survey Data
- **black**: Dummy variable for race (0 = White, 1 = Black)
- **yearsexp**: Years of work experience
- **somecol_more**: 1 if the respondent attended at least some college (including those who did not finish) or more, 0 otherwise
- **employed**: 1 if employed, 0 if unemployed

### Randomized Experiment Data
- **id**: Unique identifier for each observation
- **male**: Dummy variable for gender (0 = Female, 1 = Male)
- **black**: Dummy variable for race (0 = White-sounding name, 1 = Black-sounding name)
- **education**: Education level (0 to 4)
- **computerskills**: 1 if resume mentions computer skills, 0 otherwise
- **ofjobs**: Number of jobs listed on the resume
- **yearsexp**: Years of work experience
- **call**: 1 if applicant was called back, 0 otherwise
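Before running any tests, it can help to confirm that the dummy variables are coded as 0/1, as the descriptions above state. The sketch below is an illustration only: `check_binary` is a hypothetical helper, and the small DataFrame is made-up data standing in for `assets/random_exper.csv`.

```python
import pandas as pd


def check_binary(df: pd.DataFrame, columns: list[str]) -> dict[str, bool]:
    """Return, for each column, whether it contains only the values 0 and 1.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    return {col: bool(df[col].isin([0, 1]).all()) for col in columns}


# Toy data standing in for the experimental dataset (NOT real study data).
toy = pd.DataFrame(
    {
        "male": [0, 1, 1, 0],
        "black": [1, 0, 1, 0],
        "computerskills": [1, 1, 0, 2],  # the 2 is a deliberate coding error
        "call": [0, 1, 0, 0],
    }
)

print(check_binary(toy, ["male", "black", "computerskills", "call"]))
# {'male': True, 'black': True, 'computerskills': False, 'call': True}
```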

## Methodology

To investigate potential racial discrimination, we performed the following steps:
1. **Covariate Balance**: Checked whether key covariates (somecol_more, yearsexp) are balanced across races in the survey data using t-tests.
2. **Employment Status**: Tested whether employment rates differ between races in the survey data.
3. **Callback Rates**: Tested whether callback rates differ between resumes with Black-sounding and White-sounding names in the experimental data.
4. **Gender Analysis**: Repeated the callback comparison separately for female and male applicants.

We used two-sided, two-sample t-tests (`scipy.stats.ttest_ind` with its default settings, which pool the variances of the two groups) to determine whether group means differ significantly.
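With default arguments, `scipy.stats.ttest_ind` computes Student's two-sample t-statistic using a pooled variance estimate and a two-sided p-value. The sketch below recomputes that statistic by hand and checks it against SciPy; the helper `pooled_t_test` and the toy arrays are illustrative assumptions, not data from this project.

```python
import numpy as np
from scipy import stats


def pooled_t_test(a: np.ndarray, b: np.ndarray) -> tuple[float, float]:
    """Two-sided Student's t-test with pooled variance (SciPy's equal_var=True)."""
    n1, n2 = len(a), len(b)
    # Unbiased sample variances (ddof=1) feed the pooled estimate
    s1, s2 = np.var(a, ddof=1), np.var(b, ddof=1)
    pooled_var = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    se = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = (np.mean(a) - np.mean(b)) / se
    df = n1 + n2 - 2
    # Two-sided p-value from the t distribution's survival function
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    return float(t_stat), float(p_value)


# Made-up binary outcomes for two groups (illustrative only).
group_a = np.array([0, 0, 1, 0, 1, 0, 0, 0])
group_b = np.array([1, 0, 1, 1, 0, 1, 0, 1])

t_manual, p_manual = pooled_t_test(group_a, group_b)
reference = stats.ttest_ind(group_a, group_b)
print(t_manual, reference.statistic)
print(p_manual, reference.pvalue)
```

A negative statistic means the first group's mean is lower, which is why the tests below, all written as `ttest_ind(black, white)`, report negative values when African-Americans fare worse.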

## Analysis and Results

### Covariate Balance

```python
import pandas as pd
import numpy as np
from scipy import stats

# Load the survey data
data_survey = pd.read_csv('assets/survey.csv')

# T-test for somecol_more
black_col = data_survey[data_survey["black"] == 1]["somecol_more"]
white_col = data_survey[data_survey["black"] == 0]["somecol_more"]
ttest1_1 = stats.ttest_ind(black_col, white_col)
print("T-test for somecol_more:", ttest1_1)
```

```
T-test for somecol_more: Ttest_indResult(statistic=-11.24095598344323, pvalue=3.7514090837794205e-29)
```


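The p-value is effectively zero (about 3.8e-29), so the share of respondents with at least some college differs significantly between African-Americans and Whites; the negative t-statistic indicates that the share is lower among African-Americans. The covariates are therefore not balanced in the survey data, which means that raw differences in outcomes between the two groups cannot be attributed to race alone.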
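The methodology lists both `somecol_more` and `yearsexp` as covariates to check, but only `somecol_more` is tested above. A small helper can run the same comparison for several columns at once. This is a sketch: `balance_table` is a hypothetical name, and the toy DataFrame stands in for `assets/survey.csv` (no real results are reproduced here).

```python
import pandas as pd
from scipy import stats


def balance_table(df: pd.DataFrame, group: str, covariates: list[str]) -> pd.DataFrame:
    """Compare covariate means between group == 1 and group == 0 with t-tests."""
    rows = []
    for cov in covariates:
        treated = df.loc[df[group] == 1, cov]
        control = df.loc[df[group] == 0, cov]
        result = stats.ttest_ind(treated, control)  # pooled-variance, two-sided
        rows.append(
            {
                "covariate": cov,
                "mean_group1": treated.mean(),
                "mean_group0": control.mean(),
                "t_stat": result.statistic,
                "p_value": result.pvalue,
            }
        )
    return pd.DataFrame(rows).set_index("covariate")


# Toy survey-like data (NOT the real survey).
toy_survey = pd.DataFrame(
    {
        "black": [1, 1, 1, 1, 0, 0, 0, 0],
        "somecol_more": [0, 1, 0, 0, 1, 1, 0, 1],
        "yearsexp": [5, 8, 3, 10, 7, 6, 12, 4],
    }
)

print(balance_table(toy_survey, "black", ["somecol_more", "yearsexp"]))
```

On the real data, `balance_table(data_survey, "black", ["somecol_more", "yearsexp"])` would produce the same kind of table, with one row per covariate.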

### Employment Status

```python
# T-test for employment status
black_employed = data_survey[data_survey["black"] == 1]["employed"]
white_employed = data_survey[data_survey["black"] == 0]["employed"]
ttest1_4 = stats.ttest_ind(black_employed, white_employed)
print("T-test for employment status:", ttest1_4)
```

```
T-test for employment status: Ttest_indResult(statistic=-7.1420030594715485, pvalue=9.802071342176244e-13)
```


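The p-value is close to 0 (about 9.8e-13), so employment rates differ significantly between races, and the negative t-statistic shows that the rate is lower among African-Americans. Because education is not balanced between the groups, this observational difference alone does not establish discrimination.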
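A p-value alone does not say how large a gap is. For a 0/1 outcome, each group's mean is its rate (here, the employment rate), so the size of the gap can be reported as a difference in means. The helper name `rate_gap` and the toy data below are illustrative assumptions.

```python
import pandas as pd


def rate_gap(df: pd.DataFrame, group: str, outcome: str) -> dict[str, float]:
    """Mean of a 0/1 outcome by group, plus the difference (group 1 minus group 0)."""
    rates = df.groupby(group)[outcome].mean()
    return {
        "rate_group1": float(rates.loc[1]),
        "rate_group0": float(rates.loc[0]),
        "difference": float(rates.loc[1] - rates.loc[0]),
    }


# Toy data (NOT the real survey).
toy = pd.DataFrame(
    {
        "black": [1, 1, 1, 1, 0, 0, 0, 0],
        "employed": [1, 0, 1, 0, 1, 1, 1, 0],
    }
)
print(rate_gap(toy, "black", "employed"))
# {'rate_group1': 0.5, 'rate_group0': 0.75, 'difference': -0.25}
```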

### Callback Rates
```python
# Load the experimental data
data_random = pd.read_csv('assets/random_exper.csv')

# T-test for callback rates
data_black_call = data_random[data_random["black"] == 1]["call"]
data_white_call = data_random[data_random["black"] == 0]["call"]
ttest2_4_1 = stats.ttest_ind(data_black_call, data_white_call)
print("T-test for callback rates:", ttest2_4_1)
```

```
T-test for callback rates: Ttest_indResult(statistic=-4.114705266723095, pvalue=3.9408025140695284e-05)
```


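The p-value is about 3.9e-5, so callback rates differ significantly by race, and the negative t-statistic shows that resumes with Black-sounding names received fewer callbacks. Because the name, and hence the race it signals, was randomly assigned to resumes, this difference can be attributed to race, which is evidence of discrimination in callbacks.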
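Because `call` is binary, a permutation test is a useful cross-check that does not rely on the t-distribution approximation. This is a different technique from the t-test used in this analysis: the sketch below shuffles the race labels, recomputes the difference in callback rates, and reports the share of shuffles at least as extreme as the observed gap. The function name, number of permutations, seed, and toy data are all assumptions.

```python
import numpy as np


def permutation_test(outcome: np.ndarray, group: np.ndarray,
                     n_perm: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """Two-sided permutation test for a difference in means (group 1 minus group 0)."""
    rng = np.random.default_rng(seed)
    observed = outcome[group == 1].mean() - outcome[group == 0].mean()
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(group)  # reassign the labels at random
        diff = outcome[shuffled == 1].mean() - outcome[shuffled == 0].mean()
        # Small tolerance so ties with the observed gap are not lost to rounding
        if abs(diff) >= abs(observed) - 1e-12:
            count += 1
    # Add-one correction keeps the p-value strictly positive
    p_value = (count + 1) / (n_perm + 1)
    return float(observed), p_value


# Toy data (NOT the real experiment): 2/20 vs 8/20 callbacks.
calls = np.array([0] * 18 + [1] * 2 + [0] * 12 + [1] * 8)
race = np.array([1] * 20 + [0] * 20)
diff, p = permutation_test(calls, race)
print(diff, p)
```

On the real data one would pass `data_random["call"].to_numpy()` and `data_random["black"].to_numpy()`. The p-value depends on the random shuffles, so it should agree with the t-test in broad terms rather than digit for digit.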

### Gender Analysis
```python
# T-test for callback rates by gender
data_female_b_call = data_random[(data_random["male"] == 0) & (data_random["black"] == 1)]["call"]
data_female_w_call = data_random[(data_random["male"] == 0) & (data_random["black"] == 0)]["call"]
ttest2_5_1 = stats.ttest_ind(data_female_b_call, data_female_w_call)
print("T-test for female callback rates:", ttest2_5_1)

data_male_b_call = data_random[(data_random["male"] == 1) & (data_random["black"] == 1)]["call"]
data_male_w_call = data_random[(data_random["male"] == 1) & (data_random["black"] == 0)]["call"]
ttest2_5_2 = stats.ttest_ind(data_male_b_call, data_male_w_call)
print("T-test for male callback rates:", ttest2_5_2)
```

```
T-test for female callback rates: Ttest_indResult(statistic=-3.6369213964305627, pvalue=0.0002796319942029361)
T-test for male callback rates: Ttest_indResult(statistic=-1.9501711134984252, pvalue=0.05140448724722174)
```


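For female applicants, the p-value (about 0.0003) indicates a racial gap in callback rates that is significant even at the 1% level. For male applicants, the p-value is about 0.051, just above the conventional 5% threshold: the gap is significant only at the 10% level, so the evidence for men is weaker.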
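The two gender-specific tests repeat the same code with different filters. A loop over subgroups keeps the comparisons consistent and makes the significance level explicit. In this sketch the helper `subgroup_tests`, the `alpha` default of 0.05, and the toy data are assumptions for illustration.

```python
import pandas as pd
from scipy import stats


def subgroup_tests(df: pd.DataFrame, by: str, group: str, outcome: str,
                   alpha: float = 0.05) -> pd.DataFrame:
    """Pooled two-sample t-test of `outcome` (group 1 vs 0) within each level of `by`."""
    rows = []
    for level, sub in df.groupby(by):
        res = stats.ttest_ind(sub.loc[sub[group] == 1, outcome],
                              sub.loc[sub[group] == 0, outcome])
        rows.append({by: level, "t_stat": res.statistic, "p_value": res.pvalue,
                     "significant": res.pvalue < alpha})
    return pd.DataFrame(rows).set_index(by)


# Toy data (NOT the real experiment).
toy_calls = pd.DataFrame(
    {
        "male": [0] * 8 + [1] * 8,
        "black": [1, 1, 1, 1, 0, 0, 0, 0] * 2,
        "call": [0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0],
    }
)
print(subgroup_tests(toy_calls, by="male", group="black", outcome="call"))
```

On the real data, `subgroup_tests(data_random, by="male", group="black", outcome="call")` would run the same two comparisons as the cell above in a single table.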


## Conclusion

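Our analysis reveals significant differences in employment status and callback rates between African-Americans and Whites. In the observational survey, the employment gap is accompanied by an imbalance in education, so it cannot by itself be attributed to discrimination. The randomized experiment provides cleaner evidence: resumes with Black-sounding names received significantly fewer callbacks, with the gap clearly significant for female applicants and only marginally significant (p ≈ 0.051) for male applicants. Taken together, these results suggest the presence of racial discrimination in the labor market and highlight the need for policies and practices to address racial biases in hiring.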

## References

- Bertrand, M., & Mullainathan, S. (2004). Are Emily and Greg More Employable than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination. *American Economic Review*, 94(4), 991–1013.