
GenHealth's submission to the ExpeditionHacks Bias Detection Challenge


Overview

GenHealth's submission to the NIH NCATS Bias Detection Tools in Healthcare Challenge

Setup

Here's how to set up the project locally:

We recommend using virtualenv to create a local environment for this project. The following steps will create a virtual environment and install the required packages:

virtualenv -p python3.10 genhealth_bias
source genhealth_bias/bin/activate
pip install -r requirements.txt

Running the Scripts

Measure Disparity

The measure_disparity.py script by default expects a CSV file as input. For convenience, we have included a demo CSV file in the demo directory, demo/diabetes_with_predict_probability.csv, which contains diabetes admission and treatment information.
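If you want a quick look at the demo data before running the script, pandas can be used to inspect it. The snippet below is illustrative and not part of the repository; the column names listed in the comment are the ones referenced in the example command further down.

import pandas as pd

# Peek at the demo file to see the columns the script will be pointed at.
df = pd.read_csv("demo/diabetes_with_predict_probability.csv")

print(df.shape)
print(df.columns.tolist())

# Columns used in the example command below include:
#   predict_probability, readmitted, race_AfricanAmerican, race_Asian,
#   race_Hispanic, race_Caucasian, gender_Female, gender_Male
print(df[["predict_probability", "readmitted"]].head())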

The script can be run with the following command line arguments:

| Name | Description | Expected Values | Example |
|------|-------------|-----------------|---------|
| probability-col | A column containing the model's predicted probability of the outcome being true (1) | float between 0 and 1 | 0.7 |
| binary-outcome-col | The binary outcome predicted for the row | 0/1 or y/n or true/false | true |
| pos-outcome-indicator | The label indicating a "true" value from the binary outcome column. (Case sensitive) | string | 1 |
| sample-weights-col | The sample weights | float | 5.2 |
| protected-classes | A comma separated list of columns containing protected classes | pro1,pro2,pro3 | sex_f,race_african_american |
| reference-classes | A comma separated list of columns containing reference classes | ref1,ref2,ref3 | sex_m,race_white |
| input-file | The location of the input file | An absolute or relative path to a file | demo/compass_data_with_predict_probability.csv |
| debug | Enable additional debug logging | True/False | True |

Example:

python measure_disparity.py \
    --probability-col=predict_probability \
    --binary-outcome-col=readmitted \
    --protected-classes=race_AfricanAmerican,race_Asian,race_Hispanic,gender_Female \
    --reference-classes=race_Caucasian,gender_Male \
    --input-file=demo/diabetes_with_predict_probability.csv \
    --pos-outcome-indicator=1

Expected Output:

| protected_or_reference   | class                |   num_samples |   equal_odds_difference |   equal_odds_ratio |   demographic_parity_difference |   demographic_parity_ratio |
|--------------------------|----------------------|---------------|-------------------------|--------------------|---------------------------------|----------------------------|
| protected                | race_AfricanAmerican |         19210 |                   0.044 |              0.923 |                           0.034 |                      0.934 |
| protected                | race_Asian           |           641 |                   0.107 |              0.705 |                           0.128 |                      0.738 |
| protected                | race_Hispanic        |          2037 |                   0.052 |              0.857 |                           0.039 |                      0.919 |
| protected                | gender_Female        |         54708 |                   0.005 |              0.993 |                           0.006 |                      0.987 |
| reference                | race_Caucasian       |         76099 |                   0.025 |              0.961 |                           0.001 |                      0.998 |
| reference                | gender_Male          |         47055 |                   0.001 |              0.997 |                           0.007 |                      0.986 |
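As a rough sanity check, the demographic parity columns can be reproduced by hand from the demo CSV by comparing per-group selection rates. The snippet below is an illustrative sketch, not code from the repository: it assumes a 0.5 probability threshold and mirrors fairlearn's definition for a binary sensitive feature (group vs. everyone else), so the exact numbers may differ slightly from the script's output.

import pandas as pd

# Demographic parity compares the positive-prediction (selection) rate of a
# group against the rate outside that group.
df = pd.read_csv("demo/diabetes_with_predict_probability.csv")
df["pred"] = (df["predict_probability"] >= 0.5).astype(int)  # threshold assumed

for col in ["race_AfricanAmerican", "gender_Female", "race_Caucasian"]:
    rate_in = df.loc[df[col] == 1, "pred"].mean()
    rate_out = df.loc[df[col] == 0, "pred"].mean()
    diff = abs(rate_in - rate_out)
    ratio = min(rate_in, rate_out) / max(rate_in, rate_out)
    print(f"{col}: difference={diff:.3f} ratio={ratio:.3f}")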

Mitigate Disparity

The mitigate_disparity.py script by default also expects a CSV file as input. For convenience, we have included a demo CSV file, demo/diabetes_with_predict_probability.csv, which can be used to test the output. The script accepts the following command line arguments:

| Name | Description | Expected Values | Example |
|------|-------------|-----------------|---------|
| input-file | The absolute or relative path to the train file | directory + filename | ./demo/diabetes.csv |
| test-file | The absolute or relative path to the test file. If no test file is defined, the code will divide the input file into a train/test split | directory + filename | ./demo/diabetes.csv |
| protected-classes | Comma separated list of sensitive classes | string | race_AfricanAmerican,race_Hispanic |
| reference-classes | Comma separated list of reference classes | string | race_White,sex_M |
| binary-outcome-col | Column containing binary outcome data on the patient | 0/1 or y/n or true/false | 1 |
| use-pos-weights | Use autogenerated positive weights. Note: this will cause sample-weights-col to be ignored if it is specified | True or False | True |
| sample-weights-col | Sample weights column. Note: specifying use-pos-weights will cause this to be ignored | string | sample_weights |
| do-hyperparameter-optimization | Optional flag to perform hyperparameter optimization | True or False | True |
| train-test-split | Train/test ratio split of the data. Defaults to 0.8 | Float | 0.7 |
| random-state | Seed for initializing the random state. Defaults to 0 | Integer | 2 |
| pos-outcome-indicator | The label indicating a "true" value from the binary outcome column. (Case sensitive) | string | 1 |
| debug | Enable additional debug logging | True/False | True |

To run the script on the demo data, use the following command:

python mitigate_disparity.py \
    --binary-outcome-col=readmitted \
    --protected-classes=race_AfricanAmerican,race_Asian,race_Hispanic,gender_Female \
    --input-file=demo/diabetes.csv \
    --pos-outcome-indicator=1 \
    --reference-classes=race_Caucasian,gender_Male

Expected output:

Base model bias metric prior to mitigation strategy

| model   | protected_or_reference   | class                |   num_samples |   equal_odds_difference |   equal_odds_ratio |   demographic_parity_difference |   demographic_parity_ratio |
|---------|--------------------------|----------------------|---------------|-------------------------|--------------------|---------------------------------|----------------------------|
| base    | protected                | race_AfricanAmerican |          3817 |                   0.038 |              0.856 |                           0.033 |                      0.912 |
| base    | protected                | race_Asian           |           142 |                   0.127 |              0.716 |                           0.112 |                      0.674 |
| base    | protected                | race_Hispanic        |           401 |                   0.014 |              0.941 |                           0.017 |                      0.95  |
| base    | protected                | gender_Female        |         10962 |                   0.026 |              0.946 |                           0.01  |                      0.972 |
| base    | reference                | race_Caucasian       |         15278 |                   0.062 |              0.882 |                           0.023 |                      0.937 |
| base    | reference                | gender_Male          |          9392 |                   0.023 |              0.952 |                           0.007 |                      0.98  |



Bias-optimized model metrics

| model   | protected_or_reference   | class                |   num_samples |   equal_odds_difference |   equal_odds_ratio |   demographic_parity_difference |   demographic_parity_ratio |
|---------|--------------------------|----------------------|---------------|-------------------------|--------------------|---------------------------------|----------------------------|
| fair    | protected                | race_AfricanAmerican |          3817 |                   0.025 |              0.944 |                           0.016 |                      0.971 |
| fair    | protected                | race_Asian           |           142 |                   0.132 |              0.798 |                           0.07  |                      0.867 |
| fair    | protected                | race_Hispanic        |           401 |                   0.061 |              0.873 |                           0.001 |                      0.999 |
| fair    | protected                | gender_Female        |         10962 |                   0.028 |              0.957 |                           0.01  |                      0.982 |
| fair    | reference                | race_Caucasian       |         15278 |                   0.017 |              0.974 |                           0.001 |                      0.999 |
| fair    | reference                | gender_Male          |          9392 |                   0.027 |              0.96  |                           0.008 |                      0.986 |

Background and Design

The measure_disparity.py file uses the fairlearn library's built-in functions to derive the equalized_odds_difference, equalized_odds_ratio, demographic_parity_difference, and demographic_parity_ratio values for the input data.
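For reference, those fairlearn calls look roughly like the sketch below. This is an illustration of the library API rather than the exact code in measure_disparity.py; the 0.5 threshold and the choice of a single sensitive column are assumptions.

import pandas as pd
from fairlearn.metrics import (
    equalized_odds_difference,
    equalized_odds_ratio,
    demographic_parity_difference,
    demographic_parity_ratio,
)

df = pd.read_csv("demo/diabetes_with_predict_probability.csv")

y_true = (df["readmitted"].astype(str) == "1").astype(int)  # pos-outcome-indicator=1
y_pred = (df["predict_probability"] >= 0.5).astype(int)     # threshold is an assumption
sensitive = df["race_AfricanAmerican"]                       # one protected column at a time

for name, metric in [
    ("equalized odds difference", equalized_odds_difference),
    ("equalized odds ratio", equalized_odds_ratio),
    ("demographic parity difference", demographic_parity_difference),
    ("demographic parity ratio", demographic_parity_ratio),
]:
    print(name, metric(y_true, y_pred, sensitive_features=sensitive))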

The mitigate_disparity.py file creates a binary classifier that dynamically adjusts weights to minimize the equal odds difference between the protected and reference classes. It also feeds the results into the measure_disparity.py script to measure the reduction in bias after training.
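The weighting logic itself lives in mitigate_disparity.py; as a rough illustration of the general idea (not the repository's actual algorithm), the sketch below refits a classifier while upweighting the protected group until the equalized odds difference falls under a tolerance. The use of LogisticRegression, the multiplicative update, and the tolerance are all assumptions, and the features are assumed to already be numeric/one-hot encoded, as the race_*/gender_* column names suggest.

import pandas as pd
from sklearn.linear_model import LogisticRegression
from fairlearn.metrics import equalized_odds_difference

df = pd.read_csv("demo/diabetes.csv")  # path from the example command above

y = (df["readmitted"].astype(str) == "1").astype(int)
X = df.drop(columns=["readmitted"])     # assumes remaining columns are numeric
protected = df["race_AfricanAmerican"]  # one protected column, for illustration

weights = pd.Series(1.0, index=df.index)
model = LogisticRegression(max_iter=1000)

# Refit, measure disparity on the training data, and nudge the weights of the
# protected group upward while the disparity stays above a tolerance.
for _ in range(10):
    model.fit(X, y, sample_weight=weights)
    disparity = equalized_odds_difference(
        y, model.predict(X), sensitive_features=protected
    )
    if disparity < 0.02:            # tolerance chosen arbitrarily for this sketch
        break
    weights[protected == 1] *= 1.1  # crude update; the real script's scheme differs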

License

Distributed under the BSD 3-Clause License. See license.txt for more information.

Contact

  • Ricky Sahu
  • Eric Marriott
  • Ethan Siegel

Project Link: https://github.com/genhealth/genhealth.github.io
