zipbiaschecker

One challenge in assessing algorithmic racial bias is that race data are often missing (not collected as part of sign-up forms, for example) or unavailable for privacy reasons. In these cases, zip-code-level bias can serve as an indirect measure, because Census data report racial demographics by zip code. This package helps run this indirect check by computing the correlation between the algorithmic output and the percentage of Black, Hispanic, and Indigenous people in each zip code.
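The idea behind the check can be sketched with plain pandas. This is a minimal illustration under stated assumptions, not the package's actual implementation: the two tables, their column names (`zip`, `score`), and all numbers are made up for the example.

```python
import pandas as pd

# Hypothetical algorithmic output per zip code (made-up numbers).
scores = pd.DataFrame({
    "zip": ["00001", "00002", "00003", "00004"],
    "score": [0.10, 0.20, 0.30, 0.40],
})

# Hypothetical reference table of demographic shares by zip code.
demographics = pd.DataFrame({
    "zip": ["00001", "00002", "00003", "00004"],
    "percent_black": [0.05, 0.10, 0.15, 0.20],
    "percent_hispanic": [0.40, 0.30, 0.20, 0.10],
})

# Join the output to the demographics on zip code, then take the
# Pearson correlation of the output with each demographic share.
merged = scores.merge(demographics, on="zip", how="inner")
correlations = merged.drop(columns="zip").corr()["score"].drop("score")
print(correlations)
```

A correlation far from zero suggests that the output varies with the racial makeup of a zip code. It is an indirect signal, not a measurement of bias against individuals.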

Installation

This package can be installed using the command below:

pip install zipbiaschecker

Example

In this example, the data are taken from the Illinois Department of Public Health COVID statistics as of 7/15/20. We will examine the correlation between the test positivity rate in each zip code and the demographics of that zip code to check the disparate impact of COVID on racial minorities.

import pandas as pd
from zipbiaschecker import zipbiaschecker as zbc

df = pd.read_csv('zipbiaschecker/data/example/2020_07_15_illinois_covid_data.csv')
df['positive_rate'] = df['Positive Cases'] / df['Tested']
print(df.shape)
df.head()

(646, 4)

	Zip	Tested	Positive Cases	positive_rate
0	60002	1925	130	0.067532
1	60004	9441	406	0.043004
2	60005	4771	255	0.053448
3	60007	4191	383	0.091386
4	60008	4672	380	0.081336

The output of the cell below shows that the rate of positive cases has a positive correlation of about .278 with the proportion of Black people in the zip code, .585 with the proportion of Hispanic people in the zip code, and .108 with the proportion of Indigenous people in the zip code.

zip_bias_checker = zbc.ZipBiasChecker()
zip_bias_checker.check_bias(df, zip_col_name='Zip', target_col_name='positive_rate')

1 row(s) could not be matched out of 646

percent_black         0.277773
percent_hispanic      0.585238
percent_indigenous    0.107945
Name: positive_rate, dtype: float64
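The first line of the output reports rows whose zip code could not be matched to the reference data. The README does not show how the package does this matching, so the following is only a sketch of one way to list unmatched zip codes yourself with pandas. The `reference` table and the zip code 99999 are hypothetical.

```python
import pandas as pd

# Hypothetical input data; 99999 stands in for a zip code with no reference entry.
data = pd.DataFrame({
    "Zip": [60002, 60004, 99999],
    "positive_rate": [0.07, 0.04, 0.05],
})

# Hypothetical reference table of demographics by zip code.
reference = pd.DataFrame({
    "Zip": [60002, 60004],
    "percent_black": [0.01, 0.02],
})

# A left merge with indicator=True adds a `_merge` column that marks
# rows found only in the left table as "left_only".
joined = data.merge(reference, on="Zip", how="left", indicator=True)
unmatched = joined.loc[joined["_merge"] == "left_only", "Zip"].tolist()

print(f"{len(unmatched)} row(s) could not be matched out of {len(data)}")
print(unmatched)
```

Unmatched rows are often caused by zip codes stored with a different type or with leading zeros dropped, so comparing the column types of both tables is a reasonable first check.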

Documentation notebook for generating the reference data

The process used to map zip codes to demographic data is documented in a Jupyter notebook in the notebooks folder. To run the notebook, clone this repository, which contains the data it uses.
