edjzhang/zipbiaschecker

zipbiaschecker

One challenge in assessing algorithmic racial bias is that race data are often missing (for example, not collected on sign-up forms) or unavailable for privacy reasons. In these cases, zip codes can serve as an indirect proxy: Census data report racial demographics by zip code, so an algorithm's output can be compared against the demographic makeup of each zip code. This package runs that indirect check by computing the correlation between the algorithmic output and the percentage of Black, Hispanic, and Indigenous people in each zip code.
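To make the idea concrete, here is a minimal sketch of the kind of correlation the check reports, using pandas directly. The toy values are invented for illustration, and this is not the package's own implementation; it assumes the output and the demographic shares already sit in one table and that a Pearson correlation is the quantity of interest.

```python
import pandas as pd

# Hypothetical toy data: one row per zip code, already joined to demographics.
# Column names mirror the labels in the package's output; the values are made up.
toy = pd.DataFrame({
    "score":            [0.10, 0.20, 0.30, 0.40, 0.50],
    "percent_black":    [0.05, 0.10, 0.20, 0.25, 0.40],
    "percent_hispanic": [0.30, 0.10, 0.25, 0.05, 0.20],
})

demographic_cols = ["percent_black", "percent_hispanic"]

# Pearson correlation of each demographic share with the algorithmic output.
correlations = toy[demographic_cols].corrwith(toy["score"])
print(correlations)
```

A value near +1 or -1 means the output moves closely with that group's share of the population across zip codes, while a value near 0 means little linear relationship. A correlation at the zip-code level is only an indirect signal and does not by itself show how individuals are treated.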

Installation

This package can be installed using the command below:

pip install zipbiaschecker

Example

This example uses COVID statistics from the Illinois Department of Public Health as of 7/15/20. We examine the correlation between each zip code's positive test rate and its demographics to check for a disparate impact of COVID on racial minorities.

import pandas as pd
from zipbiaschecker import zipbiaschecker as zbc

df = pd.read_csv('zipbiaschecker/data/example/2020_07_15_illinois_covid_data.csv')
df['positive_rate'] = df['Positive Cases'] / df['Tested']
print(df.shape)
df.head()

(646, 4)
     Zip  Tested  Positive Cases  positive_rate
0  60002    1925             130       0.067532
1  60004    9441             406       0.043004
2  60005    4771             255       0.053448
3  60007    4191             383       0.091386
4  60008    4672             380       0.081336

In the output below, the positive rate has a correlation of about 0.278 with the proportion of Black people in the zip code, 0.585 with the proportion of Hispanic people, and 0.108 with the proportion of Indigenous people.

zip_bias_checker = zbc.ZipBiasChecker()
zip_bias_checker.check_bias(df, zip_col_name='Zip', target_col_name='positive_rate')

1 row(s) could not be matched out of 646

percent_black         0.277773
percent_hispanic      0.585238
percent_indigenous    0.107945
Name: positive_rate, dtype: float64
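The "could not be matched" line suggests that input rows are joined to a reference table by zip code. The sketch below shows one way such a match could work in plain pandas. It is an assumption for illustration, not the package's actual code: the `normalize_zip` helper, the `zip_key` column, and both toy tables are hypothetical.

```python
import pandas as pd

# Hypothetical user data and reference table; all values are made up.
user_df = pd.DataFrame({
    "Zip": [60002, 60004, 99999],  # 99999 has no row in the reference table
    "positive_rate": [0.07, 0.04, 0.10],
})
reference_df = pd.DataFrame({
    "zip": ["60002", "60004", "60005"],
    "percent_black": [0.02, 0.03, 0.04],
})


def normalize_zip(series: pd.Series) -> pd.Series:
    """Render zip codes as 5-character, zero-padded strings."""
    return series.astype(str).str.strip().str.zfill(5)


user_df["zip_key"] = normalize_zip(user_df["Zip"])
reference_df["zip_key"] = normalize_zip(reference_df["zip"])

# A left merge keeps every user row; indicator=True records which rows matched.
merged = user_df.merge(
    reference_df.drop(columns="zip"), on="zip_key", how="left", indicator=True
)
unmatched = int((merged["_merge"] == "left_only").sum())
print(f"{unmatched} row(s) could not be matched out of {len(user_df)}")

# Only matched rows carry demographic values to correlate against.
matched = merged[merged["_merge"] == "both"]
```

Normalizing both sides to zero-padded strings avoids silent mismatches when one table stores zip codes as integers, which drops leading zeros for zip codes in some states.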

Notebook documenting how the reference data is generated

In the notebooks folder, a Jupyter notebook documents the process used to map zip codes to demographic data. To run the notebook, clone this repository to obtain the data it uses.

