# Intersectional Fairness Metrics Demonstration

This Jupyter notebook demonstrates the `metrics` module of our fairness package, which evaluates differences in the performance of predictive health models across intersectional groups.

In [1]:
import fairness.metrics as fm

In this notebook we use observations from 30 fictional participants to show how the functions in our package should be used. Each participant has an age label, a sex label, a model prediction and a true diagnosis.

In [2]:
age_labels = ['Younger', 'Older', 'Older', 'Younger', 'Older', 'Older',
              'Younger', 'Older', 'Older', 'Younger', 'Older', 'Younger',
              'Older', 'Younger', 'Younger', 'Older', 'Older', 'Older',
              'Younger', 'Older', 'Younger', 'Older', 'Younger', 'Older',
              'Younger', 'Younger', 'Younger', 'Older', 'Older', 'Older']

sex_labels = ['Male', 'Female', 'Male', 'Male', 'Female', 'Male',
              'Male', 'Female', 'Male', 'Female', 'Female', 'Male',
              'Female', 'Male', 'Female', 'Male', 'Female', 'Female',
              'Female', 'Female', 'Female', 'Female', 'Female', 'Male',
              'Male', 'Male', 'Male', 'Male', 'Male', 'Female']            

model_predictions = [1, 1, 1, 1, 1, 1, 
                     0, 0, 0, 0, 0, 0,
                     1, 0, 1, 0, 1, 1,
                     0, 1, 0, 1, 0, 0,
                     0, 0, 0, 0, 1, 1]

true_diagnoses = [1, 1, 1, 0, 1, 0,
                  0, 1, 0, 1, 1, 0,
                  0, 0, 1, 1, 0, 0, 
                  0, 1, 0, 1, 1, 0,
                  1, 0, 0, 1, 1, 1]

all_group_labels_dict = {'Age': age_labels,
                         'Sex': sex_labels}

print(all_group_labels_dict)

{'Age': ['Younger', 'Older', 'Older', 'Younger', 'Older', 'Older', 'Younger', 'Older', 'Older', 'Younger', 'Older', 'Younger', 'Older', 'Younger', 'Younger', 'Older', 'Older', 'Older', 'Younger', 'Older', 'Younger', 'Older', 'Younger', 'Older', 'Younger', 'Younger', 'Younger', 'Older', 'Older', 'Older'], 'Sex': ['Male', 'Female', 'Male', 'Male', 'Female', 'Male', 'Male', 'Female', 'Male', 'Female', 'Female', 'Male', 'Female', 'Male', 'Female', 'Male', 'Female', 'Female', 'Female', 'Female', 'Female', 'Female', 'Female', 'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Female']}
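Each list in `all_group_labels_dict` holds one label per participant, in the same order as `model_predictions` and `true_diagnoses`. The printed keys later in this notebook (for example `'Older + Female'`) suggest that groups are named by joining a participant's labels with `' + '`. The sketch below groups participant indices under keys of that form. `intersect_groups` is a hypothetical helper for illustration, not part of `fairness.metrics`, and the key format is an assumption based on that output.

```python
from collections import defaultdict


def intersect_groups(subject_labels_dict):
    """Map each intersectional group key (e.g. 'Older + Female') to participant indices.

    Hypothetical helper for illustration; not part of fairness.metrics.
    """
    label_lists = list(subject_labels_dict.values())
    n = len(label_lists[0])
    # every label list must describe the same participants, in the same order
    if any(len(labels) != n for labels in label_lists):
        raise ValueError("all label lists must have the same length")
    groups = defaultdict(list)
    for i in range(n):
        # join this participant's labels, in dict order, into one group key
        key = " + ".join(labels[i] for labels in label_lists)
        groups[key].append(i)
    # sort keys alphabetically so the output order is predictable
    return dict(sorted(groups.items()))


# Toy example with four participants
toy_labels = {"Age": ["Younger", "Older", "Older", "Younger"],
              "Sex": ["Male", "Female", "Male", "Male"]}
print(intersect_groups(toy_labels))
# {'Older + Female': [1], 'Older + Male': [2], 'Younger + Male': [0, 3]}
```

Applied to `all_group_labels_dict`, this gives group sizes of 10 (older females), 7 (older males), 5 (younger females) and 8 (younger males), which are the denominators behind the accuracies reported below.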


## Accuracies

The `fairness.metrics` module provides functions for evaluating and comparing a model's overall accuracy across intersectional groups. Here, 'overall accuracy' means the proportion of model predictions that are correct, whether the prediction is that a person does or does not have the disease.
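In terms of confusion-matrix counts, this is (true positives + true negatives) / (number of participants). A minimal sketch of that calculation follows; the name `accuracy` is a hypothetical helper, not the package's function. It is checked against the five younger female participants from the data above.

```python
def accuracy(predictions, true_statuses):
    """Fraction of predictions equal to the true status (hypothetical helper)."""
    if len(predictions) != len(true_statuses):
        raise ValueError("predictions and true_statuses must have the same length")
    if not predictions:
        return float("nan")  # accuracy of an empty group is undefined
    # count predictions that match the true status, whether positive or negative
    correct = sum(p == t for p, t in zip(predictions, true_statuses))
    return correct / len(predictions)


# Younger female participants (zero-based positions 9, 14, 18, 20 and 22)
yf_predictions = [0, 1, 0, 0, 0]
yf_true = [1, 1, 0, 0, 1]
print(accuracy(yf_predictions, yf_true))  # 0.6
```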

The `intersect_acc` function allows us to input multiple group labels, and find the accuracy of a model for people who fall into all of the specified categories.
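Conceptually, a call like the ones below keeps only the participants whose labels match every entry in `group_labels_dict` and computes the accuracy on that subset. The sketch below assumes that behaviour. `intersect_accuracy` is a hypothetical re-implementation that mirrors the argument names used in this notebook; it is not the package's code.

```python
def intersect_accuracy(group_labels_dict, subject_labels_dict,
                       predictions, true_statuses):
    """Accuracy for participants matching every label in group_labels_dict.

    Hypothetical re-implementation for illustration only.
    """
    # indices of participants who belong to all of the requested categories
    members = [
        i for i in range(len(predictions))
        if all(subject_labels_dict[attr][i] == value
               for attr, value in group_labels_dict.items())
    ]
    if not members:
        return float("nan")  # no participant falls into this group
    correct = sum(predictions[i] == true_statuses[i] for i in members)
    return correct / len(members)


toy_labels = {"Age": ["Younger", "Older", "Younger", "Younger"],
              "Sex": ["Female", "Female", "Female", "Male"]}
toy_predictions = [1, 0, 0, 1]
toy_true = [1, 1, 1, 1]
print(intersect_accuracy({"Age": "Younger", "Sex": "Female"},
                         toy_labels, toy_predictions, toy_true))  # 0.5
```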

In [3]:
acc_younger_female = fm.intersect_acc(
                         group_labels_dict={'Age': 'Younger', 'Sex': 'Female'},
                         subject_labels_dict=all_group_labels_dict,
                         predictions=model_predictions,
                         true_statuses=true_diagnoses)

print("Overall accuracy (younger females):", acc_younger_female)

acc_older_male = fm.intersect_acc(
                     group_labels_dict={'Age': 'Older', 'Sex': 'Male'},
                     subject_labels_dict=all_group_labels_dict,
                     predictions=model_predictions,
                     true_statuses=true_diagnoses)

print("Overall accuracy (older males):", acc_older_male)

Overall accuracy (younger females): 0.6
Overall accuracy (older males): 0.5714285714285714


Using the `all_intersect_accs` function, we can find the accuracy for all intersectional groups represented in the dataset.
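One way to reproduce this is to loop once over the participants, tally correct predictions per label combination that actually occurs, and divide. In this sketch `all_intersect_accuracies` is a hypothetical name. The `' + '` key format and the alphabetical ordering are chosen to match the printed output below and may not be how the package builds its keys.

```python
from collections import defaultdict


def all_intersect_accuracies(subject_labels_dict, predictions, true_statuses):
    """Accuracy for every intersectional group present in the data (hypothetical)."""
    label_lists = list(subject_labels_dict.values())
    correct = defaultdict(int)
    total = defaultdict(int)
    for i, (p, t) in enumerate(zip(predictions, true_statuses)):
        key = " + ".join(labels[i] for labels in label_lists)
        total[key] += 1
        correct[key] += int(p == t)
    # only combinations that occur in the data appear, so no group is empty
    return {key: correct[key] / total[key] for key in sorted(total)}


toy_labels = {"Age": ["Younger", "Older", "Older", "Younger"],
              "Sex": ["Male", "Female", "Female", "Male"]}
print(all_intersect_accuracies(toy_labels, [1, 0, 1, 0], [1, 0, 0, 0]))
# {'Older + Female': 0.5, 'Younger + Male': 1.0}
```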

In [4]:
all_accs = fm.all_intersect_accs(subject_labels_dict=all_group_labels_dict,
                                 predictions=model_predictions,
                                 true_statuses=true_diagnoses)
print("Overall accuracy in different groups:", all_accs)

Overall accuracy in different groups: {'Older + Female': 0.5, 'Older + Male': 0.5714285714285714, 'Younger + Female': 0.6, 'Younger + Male': 0.75}


The `max_intersect_acc_diff` and `max_intersect_acc_ratio` functions allow us to find the greatest disparity in accuracy between intersectional groups.
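The values printed below are consistent with taking the largest group value minus the smallest (0.75 − 0.5 = 0.25) and the largest divided by the smallest (0.75 / 0.5 = 1.5), with `natural_log=True` returning the natural log of that ratio. The sketch below implements those two summaries under that assumption. The helper names `max_diff` and `max_ratio` are hypothetical, and returning `nan` when the smallest value is 0 mirrors what the notebook prints for FPR and FDR further down.

```python
import math


def max_diff(group_values):
    """Largest minus smallest value across groups (hypothetical helper)."""
    values = list(group_values.values())
    return max(values) - min(values)


def max_ratio(group_values, natural_log=False):
    """Largest divided by smallest value across groups (hypothetical helper).

    Returns nan when the smallest value is 0, because the ratio is undefined.
    """
    values = list(group_values.values())
    smallest, largest = min(values), max(values)
    if smallest == 0:
        return float("nan")
    ratio = largest / smallest
    return math.log(ratio) if natural_log else ratio


# Group accuracies as printed by all_intersect_accs above
accs = {"Older + Female": 0.5, "Older + Male": 4 / 7,
        "Younger + Female": 0.6, "Younger + Male": 0.75}
print(max_diff(accs))                    # 0.25
print(max_ratio(accs))                   # 1.5
print(max_ratio(accs, natural_log=True))
```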

In [5]:
max_acc_diff = fm.max_intersect_acc_diff(
                  subject_labels_dict=all_group_labels_dict,
                  predictions=model_predictions,
                  true_statuses=true_diagnoses)
print("Maximum intersectional accuracy difference:", max_acc_diff)

max_acc_rat = fm.max_intersect_acc_ratio(
                  subject_labels_dict=all_group_labels_dict,
                  predictions=model_predictions,
                  true_statuses=true_diagnoses,
                  natural_log=False)
print("Maximum intersectional accuracy ratio:", max_acc_rat)

max_acc_rat_log = fm.max_intersect_acc_ratio(
                      subject_labels_dict=all_group_labels_dict,
                      predictions=model_predictions,
                      true_statuses=true_diagnoses,
                      natural_log=True)
print("Natural log of maximum intersectional accuracy ratio:",
      max_acc_rat_log)

Maximum intersectional accuracy difference: 0.25
Maximum intersectional accuracy ratio: 1.5
Natural log of maximum intersectional accuracy ratio: 0.4054651081081644


## False Negative Rates

The false negative rate (FNR) of a predictive model is the proportion of people who actually have the disease who are given a negative prediction by the model. The `fairness.metrics` module provides functions for evaluating and comparing the FNR of a model across intersectional groups.
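Writing TP for true positives and FN for false negatives, FNR = FN / (FN + TP). A minimal sketch of that formula follows; `false_negative_rate` is a hypothetical helper, not the package's function. It is checked against the younger female participants, three of whom have the disease and two of whom are missed.

```python
def false_negative_rate(predictions, true_statuses):
    """FN / (FN + TP): share of people with the disease who are predicted negative.

    Hypothetical helper; returns nan if nobody in the group has the disease.
    """
    # predictions made for the people who truly have the disease
    preds_for_positives = [p for p, t in zip(predictions, true_statuses) if t == 1]
    if not preds_for_positives:
        return float("nan")
    false_negatives = sum(p == 0 for p in preds_for_positives)
    return false_negatives / len(preds_for_positives)


# Younger female participants (zero-based positions 9, 14, 18, 20 and 22)
print(false_negative_rate([0, 1, 0, 0, 0], [1, 1, 0, 0, 1]))  # 0.6666666666666666
```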

The `intersect_fnr` function allows us to input multiple group labels, and find the false negative rate of a model for people who fall into all of the specified categories.

In [6]:
fnr_younger_female = fm.intersect_fnr(
                         group_labels_dict={'Age': 'Younger', 'Sex': 'Female'},
                         subject_labels_dict=all_group_labels_dict,
                         predictions=model_predictions,
                         true_statuses=true_diagnoses)

print("FNR (younger females):", fnr_younger_female)

fnr_older_male = fm.intersect_fnr(
                     group_labels_dict={'Age': 'Older', 'Sex': 'Male'},
                     subject_labels_dict=all_group_labels_dict,
                     predictions=model_predictions,
                     true_statuses=true_diagnoses)

print("FNR (older males):", fnr_older_male)

FNR (younger females): 0.6666666666666666
FNR (older males): 0.5


Using the `all_intersect_fnrs` function, we can find the false negative rates for all intersectional groups represented in the dataset.

In [7]:
all_fnrs = fm.all_intersect_fnrs(subject_labels_dict=all_group_labels_dict,
                                 predictions=model_predictions,
                                 true_statuses=true_diagnoses)
print("FNR in different groups:", all_fnrs)

FNR in different groups: {'Older + Female': 0.2857142857142857, 'Older + Male': 0.5, 'Younger + Female': 0.6666666666666666, 'Younger + Male': 0.5}


The `max_intersect_fnr_diff` and `max_intersect_fnr_ratio` functions allow us to find the greatest disparity in false negative rates between intersectional groups.

In [8]:
max_fnr_diff = fm.max_intersect_fnr_diff(
                  subject_labels_dict=all_group_labels_dict,
                  predictions=model_predictions,
                  true_statuses=true_diagnoses)
print("Maximum intersectional FNR difference:", max_fnr_diff)

max_fnr_rat = fm.max_intersect_fnr_ratio(
                  subject_labels_dict=all_group_labels_dict,
                  predictions=model_predictions,
                  true_statuses=true_diagnoses,
                  natural_log=False)
print("Maximum intersectional FNR ratio:", max_fnr_rat)

max_fnr_rat_log = fm.max_intersect_fnr_ratio(
                      subject_labels_dict=all_group_labels_dict,
                      predictions=model_predictions,
                      true_statuses=true_diagnoses,
                      natural_log=True)
print("Natural log of maximum intersectional FNR ratio:",
      max_fnr_rat_log)

Maximum intersectional FNR difference: 0.38095238095238093
Maximum intersectional FNR ratio: 2.3333333333333335
Natural log of maximum intersectional FNR ratio: 0.8472978603872037


## False Positive Rates

The false positive rate (FPR) of a predictive model is the proportion of people who do not actually have the disease who are given a positive prediction by the model. The `fairness.metrics` module provides functions for evaluating and comparing the FPR of a model across intersectional groups.
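Writing FP for false positives and TN for true negatives, FPR = FP / (FP + TN). A minimal sketch of that formula follows; `false_positive_rate` is a hypothetical helper, not the package's function. It is checked against the older male participants, three of whom do not have the disease and one of whom is flagged.

```python
def false_positive_rate(predictions, true_statuses):
    """FP / (FP + TN): share of people without the disease who are predicted positive.

    Hypothetical helper; returns nan if everyone in the group has the disease.
    """
    # predictions made for the people who truly do not have the disease
    preds_for_negatives = [p for p, t in zip(predictions, true_statuses) if t == 0]
    if not preds_for_negatives:
        return float("nan")
    false_positives = sum(p == 1 for p in preds_for_negatives)
    return false_positives / len(preds_for_negatives)


# Older male participants (zero-based positions 2, 5, 8, 15, 23, 27 and 28)
print(false_positive_rate([1, 1, 0, 0, 0, 0, 1],
                          [1, 0, 0, 1, 0, 1, 1]))  # 0.3333333333333333
```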

The `intersect_fpr` function allows us to input multiple group labels, and find the false positive rate of a model for people who fall into all of the specified categories.

In [9]:
fpr_younger_female = fm.intersect_fpr(
                         group_labels_dict={'Age': 'Younger', 'Sex': 'Female'},
                         subject_labels_dict=all_group_labels_dict,
                         predictions=model_predictions,
                         true_statuses=true_diagnoses)

print("FPR (younger females):", fpr_younger_female)

fpr_older_male = fm.intersect_fpr(
                     group_labels_dict={'Age': 'Older', 'Sex': 'Male'},
                     subject_labels_dict=all_group_labels_dict,
                     predictions=model_predictions,
                     true_statuses=true_diagnoses)

print("FPR (older males):", fpr_older_male)

FPR (younger females): 0.0
FPR (older males): 0.3333333333333333


Using the `all_intersect_fprs` function, we can find the false positive rates for all intersectional groups represented in the dataset.

In [10]:
all_fprs = fm.all_intersect_fprs(subject_labels_dict=all_group_labels_dict,
                                 predictions=model_predictions,
                                 true_statuses=true_diagnoses)
print("FPR in different groups:", all_fprs)

FPR in different groups: {'Older + Female': 1.0, 'Older + Male': 0.3333333333333333, 'Younger + Female': 0.0, 'Younger + Male': 0.16666666666666666}


The `max_intersect_fpr_diff` and `max_intersect_fpr_ratio` functions allow us to find the greatest disparity in false positive rates between intersectional groups.

In [11]:
max_fpr_diff = fm.max_intersect_fpr_diff(
                  subject_labels_dict=all_group_labels_dict,
                  predictions=model_predictions,
                  true_statuses=true_diagnoses)
print("Maximum intersectional FPR difference:", max_fpr_diff)

max_fpr_rat = fm.max_intersect_fpr_ratio(
                  subject_labels_dict=all_group_labels_dict,
                  predictions=model_predictions,
                  true_statuses=true_diagnoses,
                  natural_log=False)
print("Maximum intersectional FPR ratio:", max_fpr_rat)

max_fpr_rat_log = fm.max_intersect_fpr_ratio(
                      subject_labels_dict=all_group_labels_dict,
                      predictions=model_predictions,
                      true_statuses=true_diagnoses,
                      natural_log=True)
print("Natural log of maximum intersectional FPR ratio:",
      max_fpr_rat_log)

Maximum intersectional FPR difference: 1.0
Maximum intersectional FPR ratio: nan
Natural log of maximum intersectional FPR ratio: nan


Here, the maximum intersectional FPR ratio is undefined (`nan`) because younger females have a false positive rate of 0, so the ratio's denominator is 0. The maximum difference is unaffected.
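Assuming, as the accuracy, FNR and FOR outputs suggest, that the ratio divides the largest group value by the smallest, the FPR values printed above (1.0 for older females, 0.0 for younger females) give:

```latex
\max_g \mathrm{FPR}_g - \min_g \mathrm{FPR}_g = 1.0 - 0.0 = 1.0,
\qquad
\frac{\max_g \mathrm{FPR}_g}{\min_g \mathrm{FPR}_g}
  = \frac{\mathrm{FPR}_{\text{Older + Female}}}{\mathrm{FPR}_{\text{Younger + Female}}}
  = \frac{1.0}{0.0}
```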

## False Omission Rates

The false omission rate (FOR) of a predictive model is the proportion of people given a negative prediction who do in fact have the disease. The `fairness.metrics` module provides functions for evaluating and comparing the FOR of a model across intersectional groups.
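Unlike FNR and FPR, the FOR conditions on the prediction rather than the true status: FOR = FN / (FN + TN). A minimal sketch of that formula follows; `false_omission_rate` is a hypothetical helper, not the package's function. It is checked against the younger male participants, who receive six negative predictions, one of them for a person with the disease.

```python
def false_omission_rate(predictions, true_statuses):
    """FN / (FN + TN): share of negative predictions given to people with the disease.

    Hypothetical helper; returns nan if the group receives no negative predictions.
    """
    # true statuses of the people the model predicts to be negative
    truths_for_predicted_negatives = [
        t for p, t in zip(predictions, true_statuses) if p == 0
    ]
    if not truths_for_predicted_negatives:
        return float("nan")
    false_negatives = sum(t == 1 for t in truths_for_predicted_negatives)
    return false_negatives / len(truths_for_predicted_negatives)


# Younger male participants (zero-based positions 0, 3, 6, 11, 13, 24, 25 and 26)
print(false_omission_rate([1, 1, 0, 0, 0, 0, 0, 0],
                          [1, 0, 0, 0, 0, 1, 0, 0]))  # 0.16666666666666666
```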

The `intersect_for` function allows us to input multiple group labels, and find the false omission rate of a model for people who fall into all of the specified categories.

In [12]:
for_younger_female = fm.intersect_for(
                         group_labels_dict={'Age': 'Younger', 'Sex': 'Female'},
                         subject_labels_dict=all_group_labels_dict,
                         predictions=model_predictions,
                         true_statuses=true_diagnoses)

print("FOR (younger females):", for_younger_female)

for_older_male = fm.intersect_for(
                     group_labels_dict={'Age': 'Older', 'Sex': 'Male'},
                     subject_labels_dict=all_group_labels_dict,
                     predictions=model_predictions,
                     true_statuses=true_diagnoses)

print("FOR (older males):", for_older_male)

FOR (younger females): 0.5
FOR (older males): 0.5


Using the `all_intersect_fors` function, we can find the false omission rates for all intersectional groups represented in the dataset.

In [13]:
all_fors = fm.all_intersect_fors(subject_labels_dict=all_group_labels_dict,
                                 predictions=model_predictions,
                                 true_statuses=true_diagnoses)
print("FOR in different groups:", all_fors)

FOR in different groups: {'Older + Female': 1.0, 'Older + Male': 0.5, 'Younger + Female': 0.5, 'Younger + Male': 0.16666666666666666}


The `max_intersect_for_diff` and `max_intersect_for_ratio` functions allow us to find the greatest disparity in false omission rates between intersectional groups.

In [14]:
max_for_diff = fm.max_intersect_for_diff(
                  subject_labels_dict=all_group_labels_dict,
                  predictions=model_predictions,
                  true_statuses=true_diagnoses)
print("Maximum intersectional FOR difference:", max_for_diff)

max_for_rat = fm.max_intersect_for_ratio(
                  subject_labels_dict=all_group_labels_dict,
                  predictions=model_predictions,
                  true_statuses=true_diagnoses,
                  natural_log=False)
print("Maximum intersectional FOR ratio:", max_for_rat)

max_for_rat_log = fm.max_intersect_for_ratio(
                      subject_labels_dict=all_group_labels_dict,
                      predictions=model_predictions,
                      true_statuses=true_diagnoses,
                      natural_log=True)
print("Natural log of maximum intersectional FOR ratio:",
      max_for_rat_log)

Maximum intersectional FOR difference: 0.8333333333333334
Maximum intersectional FOR ratio: 6.0
Natural log of maximum intersectional FOR ratio: 1.791759469228055


## False Discovery Rates

The false discovery rate (FDR) of a predictive model is the proportion of people given a positive prediction who do not in fact have the disease. The `fairness.metrics` module provides functions for evaluating and comparing the FDR of a model across intersectional groups.
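Like the FOR, the FDR conditions on the prediction: FDR = FP / (FP + TP). A minimal sketch of that formula follows; `false_discovery_rate` is a hypothetical helper, not the package's function. It is checked against the older female participants, who receive eight positive predictions, three of them for people without the disease.

```python
def false_discovery_rate(predictions, true_statuses):
    """FP / (FP + TP): share of positive predictions given to people without the disease.

    Hypothetical helper; returns nan if the group receives no positive predictions.
    """
    # true statuses of the people the model predicts to be positive
    truths_for_predicted_positives = [
        t for p, t in zip(predictions, true_statuses) if p == 1
    ]
    if not truths_for_predicted_positives:
        return float("nan")
    false_positives = sum(t == 0 for t in truths_for_predicted_positives)
    return false_positives / len(truths_for_predicted_positives)


# Older female participants (zero-based positions 1, 4, 7, 10, 12, 16, 17, 19, 21 and 29)
print(false_discovery_rate([1, 1, 0, 0, 1, 1, 1, 1, 1, 1],
                           [1, 1, 1, 1, 0, 0, 0, 1, 1, 1]))  # 0.375
```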

The `intersect_fdr` function allows us to input multiple group labels, and find the false discovery rate of a model for people who fall into all of the specified categories.

In [15]:
fdr_younger_female = fm.intersect_fdr(
                         group_labels_dict={'Age': 'Younger', 'Sex': 'Female'},
                         subject_labels_dict=all_group_labels_dict,
                         predictions=model_predictions,
                         true_statuses=true_diagnoses)

print("FDR (younger females):", fdr_younger_female)

fdr_older_male = fm.intersect_fdr(
                     group_labels_dict={'Age': 'Older', 'Sex': 'Male'},
                     subject_labels_dict=all_group_labels_dict,
                     predictions=model_predictions,
                     true_statuses=true_diagnoses)

print("FDR (older males):", fdr_older_male)

FDR (younger females): 0.0
FDR (older males): 0.3333333333333333


Using the `all_intersect_fdrs` function, we can find the false discovery rates for all intersectional groups represented in the dataset.

In [16]:
all_fdrs = fm.all_intersect_fdrs(subject_labels_dict=all_group_labels_dict,
                                 predictions=model_predictions,
                                 true_statuses=true_diagnoses)
print("FDR in different groups:", all_fdrs)

FDR in different groups: {'Older + Female': 0.375, 'Older + Male': 0.3333333333333333, 'Younger + Female': 0.0, 'Younger + Male': 0.5}


The `max_intersect_fdr_diff` and `max_intersect_fdr_ratio` functions allow us to find the greatest disparity in false discovery rates between intersectional groups.

In [17]:
max_fdr_diff = fm.max_intersect_fdr_diff(
                  subject_labels_dict=all_group_labels_dict,
                  predictions=model_predictions,
                  true_statuses=true_diagnoses)
print("Maximum intersectional FDR difference:", max_fdr_diff)

max_fdr_rat = fm.max_intersect_fdr_ratio(
                  subject_labels_dict=all_group_labels_dict,
                  predictions=model_predictions,
                  true_statuses=true_diagnoses,
                  natural_log=False)
print("Maximum intersectional FDR ratio:", max_fdr_rat)

max_fdr_rat_log = fm.max_intersect_fdr_ratio(
                      subject_labels_dict=all_group_labels_dict,
                      predictions=model_predictions,
                      true_statuses=true_diagnoses,
                      natural_log=True)
print("Natural log of maximum intersectional FDR ratio:",
      max_fdr_rat_log)

Maximum intersectional FDR difference: 0.5
Maximum intersectional FDR ratio: nan
Natural log of maximum intersectional FDR ratio: nan


Here, the maximum intersectional FDR ratio is undefined (`nan`) because younger females have a false discovery rate of 0, so the ratio's denominator is 0. The maximum difference is unaffected.