## Fairness Analysis of NamSor's Gender API Endpoint using Aequitas

In [1]:
import pandas as pd
import seaborn as sns
from aequitas.group import Group
from aequitas.bias import Bias
from aequitas.fairness import Fairness
from aequitas.plotting import Plot

# import warnings; warnings.simplefilter('ignore')

%matplotlib inline

In [2]:
df = pd.read_csv("data/compas_gender_predictions.csv")
df.head()

Unnamed: 0.1,Unnamed: 0,entity_id,first,last,sex,sex_pred,race,score,label_value
0,0,1,miguel,hernandez,Male,Male,Other,0.999286,1
1,1,3,kevon,dixon,Male,Male,African-American,0.95672,1
2,2,4,ed,philo,Male,Male,African-American,0.968813,1
3,3,5,marcu,brown,Male,Male,African-American,0.622665,1
4,4,6,bouthy,pierrelouis,Male,Male,Other,0.509131,1


In [3]:
# Non-string attribute columns cause problems later, so check whether there are any.
non_attr_cols = ['id', 'model_id', 'entity_id', 'score', 'label_value', 'rank_abs', 'rank_pct']
attr_cols = df.columns[~df.columns.isin(non_attr_cols)]  # index of the columns treated as attributes
df.columns[(df.dtypes != object) & (df.dtypes != str) & (df.columns.isin(attr_cols))]

Index(['Unnamed: 0'], dtype='object')

In [4]:
# And delete them.
df = df.drop(['Unnamed: 0'], axis=1)
df.head()

Unnamed: 0,entity_id,first,last,sex,sex_pred,race,score,label_value
0,1,miguel,hernandez,Male,Male,Other,0.999286,1
1,3,kevon,dixon,Male,Male,African-American,0.95672,1
2,4,ed,philo,Male,Male,African-American,0.968813,1
3,5,marcu,brown,Male,Male,African-American,0.622665,1
4,6,bouthy,pierrelouis,Male,Male,Other,0.509131,1


In [5]:
df.shape

(7214, 8)

In [73]:
g = Group()
xtab, _ = g.get_crosstabs(df, attr_cols=["sex"], score_thresholds= {'score': [0.95]})

model_id, score_thresholds 0 {'score': [0.95]}


In [74]:
absolute_metrics = g.list_absolute_metrics(xtab)

In [75]:
t = 0.95

In [76]:
f_pp = df[((df['sex'] == 'Female') & (df['score'] >= t))]
f_pp.count()

entity_id      1109
first          1109
last           1109
sex            1109
sex_pred       1109
race           1109
score          1109
label_value    1109
dtype: int64

In [77]:
f_pn = df[((df['sex'] == 'Female') & (df['score'] < t))]
f_pn.count()

entity_id      286
first          286
last           286
sex            286
sex_pred       286
race           286
score          286
label_value    286
dtype: int64

In [78]:
f_p = df[((df['sex'] == 'Female') & (df['label_value'] == 1))]
f_p.count()

entity_id      1289
first          1289
last           1289
sex            1289
sex_pred       1289
race           1289
score          1289
label_value    1289
dtype: int64

In [79]:
f_n = df[((df['sex'] == 'Female') & (df['label_value'] == 0))]
f_n.count()

entity_id      106
first          106
last           106
sex            106
sex_pred       106
race           106
score          106
label_value    106
dtype: int64

In [80]:
f_tn = df[((df['sex'] == 'Female') & (df['label_value'] == 0) & (df['score'] < t))]
f_tn.count()

entity_id      64
first          64
last           64
sex            64
sex_pred       64
race           64
score          64
label_value    64
dtype: int64

In [81]:
f_tp = df[((df['sex'] == 'Female') & (df['label_value'] == 1) & (df['score'] >= t))]
f_tp.count()

entity_id      1067
first          1067
last           1067
sex            1067
sex_pred       1067
race           1067
score          1067
label_value    1067
dtype: int64

In [82]:
f_fn = df[((df['sex'] == 'Female') & (df['label_value'] == 1) & (df['score'] < t))]
f_fn.count()

entity_id      222
first          222
last           222
sex            222
sex_pred       222
race           222
score          222
label_value    222
dtype: int64

In [83]:
f_fp = df[((df['sex'] == 'Female') & (df['label_value'] == 0) & (df['score'] >= t))]
f_fp.count()

entity_id      42
first          42
last           42
sex            42
sex_pred       42
race           42
score          42
label_value    42
dtype: int64

In [84]:
xtab[[col for col in xtab.columns if col not in absolute_metrics]]

Unnamed: 0,model_id,score_threshold,k,attribute_name,attribute_value,pp,pn,fp,fn,tn,tp,group_label_pos,group_label_neg,group_size,total_entities
0,0,0.95_ore,1552,sex,Female,286,1109,64,1067,42,222,1289,106,1395,7214
1,0,0.95_ore,1552,sex,Male,1266,4553,204,4411,142,1062,5473,346,5819,7214


In [None]:
# This looks very weird. See https://github.com/dssg/aequitas/issues/84
# To be continued ...
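
The manual counts in the cells above can be collected in one pass, which makes them easier to compare with the crosstab. The sketch below is only an illustration: `confusion_counts` is a hypothetical helper (not part of Aequitas), and the small frame is made-up data rather than the notebook's CSV. With the notebook's data, the manual count of Female rows at or above the threshold was 1109, which matches the crosstab's `pn` column rather than its `pp` column (286), so the crosstab appears to use the opposite orientation from these filters.

```python
import pandas as pd

def confusion_counts(frame, group_col, group, threshold,
                     score_col="score", label_col="label_value"):
    """Count confusion-matrix cells for one group at a score threshold."""
    sub = frame[frame[group_col] == group]
    pred_pos = sub[score_col] >= threshold   # predicted positive
    actual_pos = sub[label_col] == 1         # labelled positive
    return {
        "pp": int(pred_pos.sum()),
        "pn": int((~pred_pos).sum()),
        "tp": int((pred_pos & actual_pos).sum()),
        "fp": int((pred_pos & ~actual_pos).sum()),
        "fn": int((~pred_pos & actual_pos).sum()),
        "tn": int((~pred_pos & ~actual_pos).sum()),
    }

# Tiny made-up frame, only to show the call; not the notebook's data.
toy = pd.DataFrame({
    "sex": ["Female", "Female", "Female", "Female", "Male"],
    "score": [0.99, 0.96, 0.50, 0.97, 0.99],
    "label_value": [1, 0, 1, 1, 1],
})
counts = confusion_counts(toy, "sex", "Female", threshold=0.95)
```

On the real data, the call would be `confusion_counts(df, "sex", "Female", threshold=t)`.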

In [85]:
xtab[['attribute_name', 'attribute_value'] + absolute_metrics].round(2)

Unnamed: 0,attribute_name,attribute_value,tpr,tnr,for,fdr,fpr,fnr,npv,precision,ppr,pprev,prev
0,sex,Female,0.17,0.4,0.96,0.22,0.6,0.83,0.04,0.78,0.18,0.21,0.92
1,sex,Male,0.19,0.41,0.97,0.16,0.59,0.81,0.03,0.84,0.82,0.22,0.94
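
As a cross-check, the rates in this table can be recomputed from the counts in the crosstab shown earlier. The sketch below uses the standard confusion-matrix definitions and the Female row's counts exactly as the crosstab labels them (`tp=222`, `fp=64`, `tn=42`, `fn=1067`, `k=1552`). The function name is hypothetical, and the definitions are assumed to match those used by Aequitas.

```python
def absolute_metrics_from_counts(tp, fp, tn, fn, k):
    """Recompute group metrics from confusion-matrix counts.

    k is the total number of predicted positives across all groups.
    """
    pp, pn = tp + fp, tn + fn      # predicted positive / negative
    pos, neg = tp + fn, tn + fp    # labelled positive / negative
    size = pos + neg
    return {
        "tpr": tp / pos, "tnr": tn / neg, "for": fn / pn, "fdr": fp / pp,
        "fpr": fp / neg, "fnr": fn / pos, "npv": tn / pn, "precision": tp / pp,
        "ppr": pp / k, "pprev": pp / size, "prev": pos / size,
    }

# Counts copied from the Female row of the crosstab above.
female = absolute_metrics_from_counts(tp=222, fp=64, tn=42, fn=1067, k=1552)
rounded = {name: round(value, 2) for name, value in female.items()}
```

The rounded values agree with the Female row of the table, so the rates are internally consistent with the crosstab's count columns, whichever orientation those columns use.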


In [None]:
aqp = Plot()  # instantiate the plotting helper; `aqp` is used by the plotting cells below
fnr = aqp.plot_group_metric(xtab, 'fnr')

In [None]:
fnr_original = aqp.plot_group_metric(xtab_original, 'fnr')

### View group metrics for only groups over a certain size threshold
Extremely small group sizes increase the standard error of estimates and can contribute to prediction errors such as false negatives. Use the `min_group_size` parameter to visualize only those sample population groups above a user-specified percentage of the total sample size. When we remove groups below 5% of the sample size, we are left with only two of the six 'race' groups, as there are much smaller groups in that attribute category than in 'sex' or 'age_cat' (age category).
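
The effect of a minimum group share can be sketched with plain pandas. The group sizes below are copied from the 'sex' crosstab earlier in this notebook; `filter_small_groups` is a hypothetical helper, and treating the threshold as inclusive is an assumption, not a statement about how the Aequitas plotting methods apply it.

```python
import pandas as pd

# Group sizes copied from the crosstab shown earlier in this notebook.
sizes = pd.DataFrame({
    "attribute_name": ["sex", "sex"],
    "attribute_value": ["Female", "Male"],
    "group_size": [1395, 5819],
    "total_entities": [7214, 7214],
})

def filter_small_groups(frame, min_share):
    """Keep groups whose share of all entities is at least min_share."""
    share = frame["group_size"] / frame["total_entities"]
    return frame[share >= min_share]

kept_5pct = filter_small_groups(sizes, 0.05)    # both groups remain
kept_25pct = filter_small_groups(sizes, 0.25)   # only the Male group remains
```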

In [None]:
fnr = aqp.plot_group_metric(xtab, 'fnr', min_group_size=0.05)

In [None]:
fnr_original = aqp.plot_group_metric(xtab_original, 'fnr', min_group_size=0.05)

### Visualizing multiple user-specified absolute group metrics across all population groups

The charts below display all calculated group metrics across each attribute, colored based on absolute metric magnitude. The group size is included in parentheses for context.

We can see that the largest 'race' group, African Americans, is predicted positive more often than any other race group (predicted positive rate `PPR` of 0.66), and is more likely to be incorrectly classified as 'high' risk (false positive rate `FPR` of 0.45) than incorrectly classified as 'low' or 'medium' risk (false negative rate `FNR` of 0.28). Note that Native Americans are predicted positive at a higher _prevalence_ `PPREV` in relation to their group size than all other 'race' groups (predicted prevalence of 0.67).

In [None]:
p = aqp.plot_group_metric_all(xtab, metrics=['ppr','pprev','fnr','fpr'], ncols=4)

In [None]:
p_original = aqp.plot_group_metric_all(xtab_original, metrics=['ppr','pprev','fnr','fpr'], ncols=4)

### Visualizing default absolute group metrics across all population groups
#### Default absolute group metrics
When visualizing more than one absolute group metric, you can specify a list of metrics, specify `'all'` metrics, or use the Aequitas default metrics by not supplying an argument:
- Predicted Positive Group Rate (pprev)
- Predicted Positive Rate (ppr)
- False Discovery Rate (fdr)
- False Omission Rate (for)
- False Positive Rate (fpr)
- False Negative Rate (fnr)

The charts below display the default group metrics calculated across each attribute, colored based on number of samples in the attribute group. 

Note that the 45+ age category is almost twice as likely to be incorrectly included in an intervention group (false discovery rate `FDR` of 0.46) as to be incorrectly excluded from intervention (false omission rate `FOR` of 0.24). We can also see that the model is equally likely to incorrectly predict a woman as 'high' risk as it is a man (false positive rate `FPR` of 0.32 for both Male and Female).

In [None]:
a = aqp.plot_group_metric_all(xtab, ncols=3)

In [None]:
a_original = aqp.plot_group_metric_all(xtab_original, ncols=3)

[Back to Top](#top_cell)
<a id='disparities'></a>

## What levels of disparity exist between population groups?

### _Aequitas Bias() Class_
We use the Aequitas `Bias()` class to calculate disparities between groups based on the crosstab returned by the `Group()` class **`get_crosstabs()`** method described above. Disparities are calculated as a ratio of a metric for a group of interest compared to a base group. For example, the False Negative Rate Disparity for black defendants vis-a-vis whites is:
$$Disparity_{FNR} =  \frac{FNR_{black}}{FNR_{white}}$$ 

Below, we use **`get_disparity_predefined_groups()`** which allows us to choose reference groups that clarify the output for the practitioner. 

The Aequitas `Bias()` class includes two additional get disparity functions: **`get_disparity_major_group()`** and **`get_disparity_min_metric()`**, which automate base group selection based on sample majority (across each attribute) and minimum value for each calculated bias metric, respectively.  

The **`get_disparity_predefined_groups()`** method allows the user to define a base group for each attribute, as illustrated below.
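
To make the ratio concrete, the sketch below computes an FNR disparity by hand for the 'sex' attribute, using the counts from the crosstab earlier in this notebook and 'Male' as the reference group. The `disparity` helper is hypothetical and only mirrors the formula above; it is not the Aequitas implementation.

```python
# False negative rates from the crosstab counts: fn / group_label_pos.
fnr = {"Female": 1067 / 1289, "Male": 4411 / 5473}

def disparity(metric_by_group, ref_group):
    """Divide every group's metric by the reference group's metric."""
    ref = metric_by_group[ref_group]
    return {group: value / ref for group, value in metric_by_group.items()}

fnr_disparity = disparity(fnr, ref_group="Male")
# The reference group's disparity is 1 by construction;
# the Female value is roughly 1.03.
```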

#### Disparities Calculated:

| Metric | Column Name |
| --- | --- |
| True Positive Rate Disparity | 'tpr_disparity' |
| True Negative Rate Disparity | 'tnr_disparity' |
| False Omission Rate Disparity | 'for_disparity' |
| False Discovery Rate Disparity | 'fdr_disparity' |
| False Positive Rate Disparity | 'fpr_disparity' |
| False Negative Rate Disparity | 'fnr_disparity' |
| Negative Predictive Value Disparity | 'npv_disparity' |
| Precision Disparity | 'precision_disparity' |
| Predicted Positive Ratio$_k$ Disparity | 'ppr_disparity' |
| Predicted Positive Ratio$_g$ Disparity | 'pprev_disparity' |


Columns for each disparity are appended to the crosstab dataframe, along with a column indicating the reference group for each calculated metric (denoted by `[METRIC NAME]_ref_group_value`). We see a slice of the dataframe with calculated metrics in the next section.

In [None]:
b = Bias()

#### Disparities calculated in relation to a user-specified group for each attribute

In [None]:
bdf = b.get_disparity_predefined_groups(xtab, original_df=df, 
                                        ref_groups_dict={'race':'Caucasian', 'sex':'Male', 'age_cat':'25 - 45'}, 
                                        alpha=0.05, check_significance=True, 
                                        mask_significance=True)
bdf.style

In [None]:
bdf_original = b.get_disparity_predefined_groups(xtab_original, original_df=df_original, 
                                        ref_groups_dict={'race':'Caucasian', 'sex':'Male', 'age_cat':'25 - 45'}, 
                                        alpha=0.05, check_significance=True, 
                                        mask_significance=True)
bdf_original.style

The `Bias()` class includes a method to quickly return a list of calculated disparities from the dataframe returned by the **`get_disparity_`** methods.

In [None]:
# View disparity metrics added to dataframe
bdf[['attribute_name', 'attribute_value'] +
     b.list_disparities(bdf) + b.list_significance(bdf)].style

In [None]:
bdf_original[['attribute_name', 'attribute_value'] +
     b.list_disparities(bdf_original) + b.list_significance(bdf_original)].style

[Back to Top](#top_cell)
<a id='interpret_disp'></a>

### How do I interpret calculated disparity ratios?
The calculated disparities from the dataframe returned by the `Bias()` class **`get_disparity_`** methods are in relation to a reference group, which will always have a disparity of 1.0.

The differences in False Positive Rates, noted in the discussion of the `Group()` class above, are clarified using the disparity ratio (`fpr_disparity`). Black people are falsely identified as being high or medium risk at 1.9 times the rate for white people.

As seen above, False Discovery Rates (the fraction of false positives among the predicted positives in a group) show much less disparity (`fdr_disparity`). As reference groups have disparity = 1 by design in Aequitas, the lower disparity is highlighted by the `fdr_disparity` value close to 1.0 (0.906) for the race attribute group 'African-American' when disparities are calculated using the predefined base group 'Caucasian'. Note that COMPAS is calibrated to balance False Positive Rate and False Discovery Rates across groups.

[Back to Top](#top_cell)
<a id='disparity_calc'></a>

### How does the selected reference group affect disparity calculations?

Disparities calculated in the Aequitas `Bias()` class based on the crosstab returned by the `Group()` class **`get_crosstabs()`** method can be derived using several different base groups. In addition to using the user-specified groups illustrated above, Aequitas can automate base group selection based on dataset characteristics.

#### Evaluating disparities calculated in relation to a different 'race' reference group
Changing even one attribute in the predefined groups will alter calculated disparities. When a different predefined group, 'Hispanic', is used, we can see that Black people are 2.1 times as likely as Hispanic people to be falsely identified as being high or medium risk (compared to 1.9 times as likely as white people), and even less likely to be falsely identified as low risk when compared to Hispanic people rather than white people.

In [None]:
hbdf = b.get_disparity_predefined_groups(xtab, original_df=df, 
                                         ref_groups_dict={'race':'Hispanic', 'sex':'Male', 'age_cat':'25 - 45'},
                                         alpha=0.05,
                                         check_significance=True,
                                         mask_significance=False, 
                                         selected_significance=['fpr', 'for', 'fdr'])

In [None]:
hbdf_original = b.get_disparity_predefined_groups(xtab_original, original_df=df_original, 
                                         ref_groups_dict={'race':'Hispanic', 'sex':'Male', 'age_cat':'25 - 45'},
                                         alpha=0.05,
                                         check_significance=True,
                                         mask_significance=False, 
                                         selected_significance=['fpr', 'for', 'fdr'])

In [None]:
# View disparity metrics added to dataframe
hbdf[['attribute_name', 'attribute_value'] +  
     b.list_disparities(hbdf) + b.list_significance(hbdf)]

In [None]:
hbdf_original[['attribute_name', 'attribute_value'] +  
     b.list_disparities(hbdf_original) + b.list_significance(hbdf_original)]

#### Disparities calculated in relation to sample population majority group (in terms of group prevalence) for each attribute
The majority population groups for each attribute ('race', 'sex', 'age_cat') in the COMPAS dataset are 'African American', 'Male', and '25 - 45'. Using the **`get_disparity_major_group()`** method of calculation allows researchers to quickly evaluate how much more (or less) often other groups are falsely or correctly identified as high- or medium-risk in relation to the group they have the most data on.
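
Majority-group selection can be sketched in pandas as "take the largest group per attribute, then divide by its metric". The frame below reuses the 'sex' group sizes and FPR counts from the crosstab earlier in this notebook. This is an illustration of the idea under those assumptions, not the Aequitas implementation; the column name `fpr_ref_group_value` follows the `[METRIC NAME]_ref_group_value` convention described above.

```python
import pandas as pd

groups = pd.DataFrame({
    "attribute_name": ["sex", "sex"],
    "attribute_value": ["Female", "Male"],
    "group_size": [1395, 5819],
    "fpr": [64 / 106, 204 / 346],   # fp / group_label_neg from the crosstab
})

# Row label of the largest group within each attribute.
major_idx = groups.groupby("attribute_name")["group_size"].idxmax()
major = groups.loc[major_idx].set_index("attribute_name")

# Divide each group's metric by its attribute's majority-group metric.
ref_fpr = groups["attribute_name"].map(major["fpr"])
groups["fpr_disparity"] = groups["fpr"] / ref_fpr
groups["fpr_ref_group_value"] = groups["attribute_name"].map(major["attribute_value"])
```

Here 'Male' is selected as the reference for 'sex' because it is the larger group, so its `fpr_disparity` is 1.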

In [None]:
majority_bdf = b.get_disparity_major_group(xtab, original_df=df)

In [None]:
majority_bdf_original = b.get_disparity_major_group(xtab_original, original_df=df_original)

In [None]:
majority_bdf[['attribute_name', 'attribute_value'] +  b.list_disparities(majority_bdf)]

In [None]:
majority_bdf_original[['attribute_name', 'attribute_value'] +  b.list_disparities(majority_bdf_original)]

#### Disparities calculated in relation to the minimum value for each metric

When you do not have a pre-existing frame of reference or policy context for the dataset (e.g., Caucasians or males historically favored), you may choose to view disparities in relation to the group with the lowest value for every disparity metric, as then every group's value will be at least 1.0, and relationships can be evaluated more linearly.


Note that disparities are much more varied, and may have larger magnitude, when the minimum value per metric is used as a reference group versus one of the other two methods.
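
The minimum-value approach can be sketched as follows: for each metric, the group with the smallest value becomes the reference, so every resulting ratio is at least 1. The rates are derived from the 'sex' crosstab counts earlier in this notebook, and `min_metric_disparities` is a hypothetical helper rather than the Aequitas method.

```python
metrics = {
    "fpr": {"Female": 64 / 106, "Male": 204 / 346},
    "tpr": {"Female": 222 / 1289, "Male": 1062 / 5473},
}

def min_metric_disparities(by_group):
    """Use the group with the smallest metric value as the reference."""
    ref_group = min(by_group, key=by_group.get)
    ref_value = by_group[ref_group]
    # Note: a minimum of exactly 0 would make these ratios undefined.
    ratios = {group: value / ref_value for group, value in by_group.items()}
    return ref_group, ratios

results = {name: min_metric_disparities(values) for name, values in metrics.items()}
fpr_ref, fpr_ratios = results["fpr"]
tpr_ref, tpr_ratios = results["tpr"]
```

In this example the reference group differs by metric ('Male' for FPR, 'Female' for TPR), which is one reason disparities from this method are harder to compare with the other two.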

In [None]:
min_metric_bdf = b.get_disparity_min_metric(df=xtab, original_df=df,
                                            check_significance=True)
min_metric_bdf.style

In [None]:
min_metric_bdf_original = b.get_disparity_min_metric(df=xtab_original, original_df=df_original,
                                            check_significance=True)
min_metric_bdf_original.style

[Back to Top](#top_cell)
<a id='disparity_viz'></a>

## How do I visualize disparities in my model?
To visualize disparities in the dataframe returned by one of the `Bias()` class **`get_disparity_`** methods, use one of two methods in the Aequitas `Plot()` class:

A particular disparity metric can be specified with **`plot_disparity()`**. To plot a single disparity, a metric and an attribute must be specified.

Disparities related to a list of particular metrics of interest or `'all'` metrics can be plotted with **`plot_disparity_all()`**. At least one metric or at least one attribute must be specified when plotting multiple disparities (or the same disparity across multiple attributes). For example, to plot PPR and Precision disparity for all attributes, specify `metrics=['ppr', 'precision']` with no attribute specified; to plot the default metrics by race, specify `attributes=['race']` with no metrics specified.

**Reference groups are displayed in grey, and always have a disparity = 1.** Note that disparities greater than 10x the reference group are visualized as 10x, and disparities less than 0.1x the reference group are visualized as 0.1x.

Statistical significance (at a default value of 0.05) is denoted by two asterisks (**) next to a treemap square's value.
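
The capping rule described above amounts to clipping each disparity to the interval [0.1, 10] before it is drawn. A minimal sketch with made-up disparity values (not results from this notebook):

```python
import numpy as np

# Hypothetical disparity ratios; 1.0 would be the reference group.
raw = np.array([0.05, 0.5, 1.0, 2.1, 14.0])

# Values outside [0.1, 10] are displayed at the nearest bound.
capped = np.clip(raw, 0.1, 10.0)
```

Only the first and last values change (to 0.1 and 10.0); the underlying dataframe values are unaffected by this display rule.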

### Visualizing disparities between groups in a single user-specified attribute for a single user-specified disparity metric

The treemap below displays precision disparity values calculated using a predefined group, in this case the 'Caucasian' group within the race attribute, sized based on the group size and colored based on disparity magnitude. We can see from asterisks that the disparities between the 'Caucasian' reference population group and both the 'African-American' and 'Other' race population groups are statistically significant at the 5% level.

**Note**: Groups are visualized at no less than 0.1 times the size of the reference group, and no more than 10 times the size of the reference group.

In [None]:
aqp.plot_disparity(bdf, group_metric='fpr_disparity', attribute_name='race', significance_alpha=0.05)

In [None]:
aqp.plot_disparity(bdf_original, group_metric='fpr_disparity', attribute_name='race', significance_alpha=0.05)

When another group, 'Hispanic', is the reference group, the colors change to indicate higher or lower disparity in relation to that group. Treemap square sizes may also be adjusted, as group size limits for visualization are in relation to the reference group (minimum 0.1 times reference group size and maximum 10 times the reference group size).

In [None]:
aqp.plot_disparity(hbdf, group_metric='fpr_disparity', attribute_name='race', significance_alpha=0.05)

In [None]:
aqp.plot_disparity(hbdf_original, group_metric='fpr_disparity', attribute_name='race', significance_alpha=0.05)

### Visualizing disparities between all groups for a single user-specified disparity metric

The treemaps below display False Positive Rate disparities calculated based on predefined reference groups ('race' attribute: Hispanic, 'sex' attribute: Male, 'age_cat' attribute: 25-45), sized based on group size, and colored based on disparity magnitude.

It is clear that the majority of samples in the data are African-American, male, and 25-45 for the 'race', 'sex', and age category attributes, respectively. Based on the lighter colors in the treemaps, we see that there is precision disparity relatively close to 1 (a disparity of 1 indicates no disparity) across all attributes.

In [None]:
j = aqp.plot_disparity_all(majority_bdf, metrics=['precision_disparity'], significance_alpha=0.05)

In [None]:
j_original = aqp.plot_disparity_all(majority_bdf_original, metrics=['precision_disparity'], significance_alpha=0.05)

### Visualizing disparities between groups in a single user-specified attribute for default metrics
##### Default Metrics
When visualizing more than one disparity, you can specify a list of disparity metrics, `'all'` disparity metrics, or use the Aequitas default disparity metrics by not supplying an argument:
- Predicted Positive Group Rate Disparity (pprev_disparity)
- Predicted Positive Rate Disparity (ppr_disparity)
- False Discovery Rate Disparity (fdr_disparity)
- False Omission Rate Disparity (for_disparity)
- False Positive Rate Disparity (fpr_disparity)
- False Negative Rate Disparity (fnr_disparity)

The treemaps below display the default disparities between 'age_cat' groups calculated based on the minimum value of each metric, colored based on disparity magnitude. We can see based on coloring that there is a lower level of false discovery rate disparity ('fdr_disparity') between age categories than predicted positive group rate disparity ('pprev_disparity') or predicted positive rate disparity ('ppr_disparity').

In [None]:
min_met = aqp.plot_disparity_all(min_metric_bdf, attributes=['age_cat'], significance_alpha=0.05)

### Visualizing disparities between groups in a single user-specified attribute for all calculated disparity metrics

The treemaps below display disparities between 'race' attribute groups calculated based on predefined reference groups ('race' attribute: Hispanic, 'sex' attribute: Male, 'age_cat' attribute: 25-45) for all 10 disparity metrics, colored based on disparity magnitude.

In [None]:
tm_capped = aqp.plot_disparity_all(hbdf, attributes=['race'], metrics = 'all', significance_alpha=0.05)

In [None]:
tm_capped_original = aqp.plot_disparity_all(hbdf_original, attributes=['race'], metrics = 'all', significance_alpha=0.05)

### Visualizing disparity between all groups for multiple user-specified disparity metrics

The treemaps below display False Omission Rate and False Positive Rate disparities (calculated in relation to the sample majority group for each attribute) between groups across all three attributes, colored based on disparity magnitude.

We see that several groups (Asian, Native American) have a much lower false omission rate than African Americans, with fairly close false omission rates between the sexes and the two oldest age groups. Though there are many more men in the sample, the two sexes have nearly identical false positive rates, while color tells us that there are larger false positive rate disparities between races and age categories than false omission rate disparities.

In [None]:
dp = aqp.plot_disparity_all(majority_bdf, metrics=['for_disparity', 'fpr_disparity'], significance_alpha=0.05)

In [None]:
dp_original = aqp.plot_disparity_all(majority_bdf_original, metrics=['for_disparity', 'fpr_disparity'], significance_alpha=0.05)

[Back to Top](#top_cell)
<a id='fairness'></a>

## How do I assess model fairness?

### _Aequitas Fairness() Class_
Finally, the Aequitas `Fairness()` class offers three functions that provide a high-level summary of fairness. This class builds on the dataframe returned from one of the three `Bias()` class **`get_disparity_`** methods.

Using FPR disparity as an example and the default fairness threshold, we have:

$$ 0.8 < Disparity_{FPR} =  \frac{FPR_{group}}{FPR_{base\ group}} < 1.25 $$
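
The threshold test above can be written as a one-line check. In this sketch, `has_parity` and its `tau` parameter are hypothetical names; the bounds 0.8 and 1.25 (that is, `tau` and `1 / tau`) come from the inequality above, and using strict inequalities follows that formula rather than any knowledge of how Aequitas treats values exactly on the boundary.

```python
def has_parity(disparity, tau=0.8):
    """Return True when the disparity lies strictly between tau and 1 / tau."""
    return tau < disparity < 1 / tau

# FPR disparity for Female relative to Male, from the crosstab counts
# earlier in this notebook (fp / group_label_neg for each group).
fpr_disparity = (64 / 106) / (204 / 346)

female_fpr_parity = has_parity(fpr_disparity)   # ratio is about 1.02
large_gap_parity = has_parity(1.9)              # a ratio of 1.9 falls outside the bounds
```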

We can assess fairness at various levels of detail:

### Group Level Fairness
When the `label_value` column is not included in the original data set, Aequitas calculates only Statistical Parity and Impact Parities.

When the `label_value` column is included in the original data set, the **`get_group_value_fairness()`** function builds on the previous dataframe and gives us attribute group-level statistics for fairness determinations:

#### Parities Calculated:

| Parity | Column Name |
| --- | --- |
| True Positive Rate Parity | 'TPR Parity' |
| True Negative Rate Parity | 'TNR Parity' |
| False Omission Rate Parity | 'FOR Parity' |
| False Discovery Rate Parity | 'FDR Parity' |
| False Positive Rate Parity | 'FPR Parity' |
| False Negative Rate Parity | 'FNR Parity' |
| Negative Predictive Value Parity | 'NPV Parity' |
| Precision Parity | 'Precision Parity' |
| Predicted Positive Ratio$_k$ Parity | 'Statistical Parity' |
| Predicted Positive Ratio$_g$ Parity | 'Impact Parity' |

#### Also assessed:
- **_Type I Parity_**: Fairness in both FDR Parity and FPR Parity
- **_Type II Parity_**: Fairness in both FOR Parity and FNR Parity
- **_Equalized Odds_**: Fairness in both FPR Parity and TPR Parity
- **_Unsupervised Fairness_**: Fairness in both Statistical Parity and Impact Parity
- **_Supervised Fairness_**: Fairness in both Type I and Type II Parity
- **_Overall Fairness_**: Fairness across all parities for all attributes
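
The composite determinations in this list are logical ANDs of the individual parities. The sketch below spells that out for a single group; the input flags are made up, `composite_parities` is a hypothetical helper, and the dictionary keys mirror the labels used in this list, which may differ from the exact column names Aequitas emits.

```python
def composite_parities(p):
    """Combine individual parity flags for one group into composite ones."""
    type_i = p["FDR Parity"] and p["FPR Parity"]
    type_ii = p["FOR Parity"] and p["FNR Parity"]
    return {
        "Type I Parity": type_i,
        "Type II Parity": type_ii,
        "Equalized Odds": p["FPR Parity"] and p["TPR Parity"],
        "Unsupervised Fairness": p["Statistical Parity"] and p["Impact Parity"],
        "Supervised Fairness": type_i and type_ii,
    }

# Made-up flags for one hypothetical group.
flags = {
    "FDR Parity": True, "FPR Parity": False,
    "FOR Parity": True, "FNR Parity": True,
    "TPR Parity": True,
    "Statistical Parity": False, "Impact Parity": True,
}
summary = composite_parities(flags)
```

A single failed parity (here FPR) is enough to fail Type I Parity, Equalized Odds, and Supervised Fairness for the group.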

In [None]:
f = Fairness()
fdf = f.get_group_value_fairness(bdf)
fdf_original = f.get_group_value_fairness(bdf_original)

The `Fairness()` class includes a method to quickly return a list of fairness determinations from the dataframe returned by the **`get_group_value_fairness()`** method.

In [None]:
parity_determinations = f.list_parities(fdf)
parity_determinations_original = f.list_parities(fdf_original)

In [None]:
fdf[['attribute_name', 'attribute_value'] + absolute_metrics + b.list_disparities(fdf) + parity_determinations].style

In [None]:
fdf_original[['attribute_name', 'attribute_value'] + absolute_metrics + b.list_disparities(fdf_original) + parity_determinations_original].style

[Back to Top](#top_cell)
<a id='interpret_fairness'></a>

### How do I interpret parities?
Calling the Aequitas `Fairness()` class **`get_group_value_fairness()`** method on the dataframe returned from a `Bias()` class **`get_disparity_`** method will return the dataframe with additional columns indicating parities, as seen in the slice of the `get_group_value_fairness` dataframe directly above.

In this case, our base groups are Caucasian for race, Male for gender, and 25-45 for age_cat. By construction, the base group has supervised fairness (its disparity ratio is 1). Relative to the base groups, the COMPAS predictions only provide supervised fairness to one group, Hispanic.

Above, the African-American false omission and false discovery rates are within the bounds of fairness. This result is expected because COMPAS is calibrated. (Given calibration, it is surprising that Asian and Native American rates are so low. This may be a matter of having few observations for these groups.)

On the other hand, African-Americans are roughly twice as likely to have false positives and 40 percent less likely to have false negatives. In real terms, 44.8% of African-Americans who did not recidivate were marked high or medium risk (with potential for associated penalties), compared with 23.4% of Caucasian non-reoffenders. This is unfair and is marked False below.

These findings mark an inherent trade-off between FPR Fairness, FNR Fairness and calibration, which is present in any decision system where base rates are not equal. See [Chouldechova (2017)](https://www.andrew.cmu.edu/user/achoulde/files/disparate_impact.pdf). Aequitas helps bring this trade-off to the forefront with clear metrics and asks system designers to make a reasoned decision based on their use case.
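
The trade-off can be made concrete with the identity from the paper cited above, which ties the false positive rate to prevalence $p$, positive predictive value (precision), and false negative rate:

$$FPR = \frac{p}{1-p} \cdot \frac{1-PPV}{PPV} \cdot (1-FNR)$$

If two groups share the same PPV but differ in prevalence, they cannot also share both FPR and FNR. The sketch below only checks the identity numerically, using the Female counts from the 'sex' crosstab earlier in this notebook; the function name is hypothetical.

```python
def fpr_from_identity(prevalence, ppv, fnr):
    """FPR implied by prevalence, PPV and FNR (identity shown above)."""
    return prevalence / (1 - prevalence) * (1 - ppv) / ppv * (1 - fnr)

# Female counts as labelled in the crosstab earlier in this notebook.
tp, fp, tn, fn = 222, 64, 42, 1067

prevalence = (tp + fn) / (tp + fp + tn + fn)
ppv = tp / (tp + fp)
fnr = fn / (tp + fn)

fpr_direct = fp / (fp + tn)                            # definition of FPR
fpr_identity = fpr_from_identity(prevalence, ppv, fnr)  # same value via the identity
```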

### Attribute Level Fairness
Use the **`get_group_attribute_fairness()`** function to view only the calculated parities from the **`get_group_value_fairness()`** function at the attribute level.

In [None]:
gaf = f.get_group_attribute_fairness(fdf)
gaf

In [None]:
gaf_original = f.get_group_attribute_fairness(fdf_original)
gaf_original

### Overall Fairness
The **`get_overall_fairness()`** function gives a quick boolean assessment of the output of **`get_group_value_fairness()`** or **`get_group_attribute_fairness()`**, returning a dictionary with a determination across all attributes for each of:
- Unsupervised Fairness
- Supervised Fairness
- Overall Fairness
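
Conceptually, the roll-up from group level to attribute level to an overall verdict is a chain of "all must be True" reductions. The pandas sketch below illustrates that idea with made-up flags and group names; it is not the Aequitas implementation, and defining Overall Fairness as the AND of the two summary flags is an assumption made for this illustration.

```python
import pandas as pd

# Made-up group-level determinations for two hypothetical attributes.
group_flags = pd.DataFrame({
    "attribute_name": ["sex", "sex", "race", "race"],
    "attribute_value": ["Female", "Male", "Group A", "Group B"],
    "Unsupervised Fairness": [True, True, True, True],
    "Supervised Fairness": [True, False, True, True],
})

flag_cols = ["Unsupervised Fairness", "Supervised Fairness"]

# An attribute is fair only if every one of its groups is fair.
attribute_level = group_flags.groupby("attribute_name")[flag_cols].all()

# The overall verdict requires every attribute to be fair.
overall = {col: bool(attribute_level[col].all()) for col in flag_cols}
overall["Overall Fairness"] = (
    overall["Unsupervised Fairness"] and overall["Supervised Fairness"]
)
```

One unfair group ('Female' under Supervised Fairness here) is enough to make the attribute, and therefore the overall determination, False.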

In [None]:
gof = f.get_overall_fairness(fdf)
gof

In [None]:
gof_original = f.get_overall_fairness(fdf_original)
gof_original

[Back to Top](#top_cell)
<a id='fairness_group_viz'></a>

## How do I visualize bias metric parity?
Once you have run the `Group()` class to retrieve a crosstab of absolute group value bias metrics, added calculated disparities via one of the `Bias()` class **`get_disparity_`** functions, and added parity determinations via the `Fairness()` class **`get_group_value_fairness()`** or **`get_group_attribute_fairness()`** method, you are ready to visualize biases and disparities in terms of fairness determination.

For visualizing absolute metric fairness with the Aequitas `Plot()` class, a particular metric can be specified with **`plot_fairness_group()`**. A list of particular metrics of interest or 'all' metrics can be plotted with **`plot_fairness_group_all()`**.

### Visualizing parity of a single absolute group metric across all population groups

The chart below displays the absolute group metric Predicted Positive Rate (ppr) across each attribute, colored based on fairness determination for that attribute group (green = 'True' and red = 'False').

We can see from the green color that only the 25-45 age group, the Male category, and the Caucasian group have been determined to be fair. Sound familiar? It should! These are the groups selected as reference groups, so this model is not fair in terms of Statistical Parity for any of the other groups.

In [None]:
z = aqp.plot_fairness_group(fdf, group_metric='ppr')

In [None]:
z_original = aqp.plot_fairness_group(fdf_original, group_metric='ppr')

### Visualizing all absolute group metrics across all population groups
The charts below display all calculated absolute group metrics across each attribute, colored based on fairness determination for that attribute group (green = 'True' and red = 'False'). 

Immediately we can see that negative predictive parity status is 'True' for all population groups, and that only two groups had a 'False' determination for true negative parity. 

In [None]:
fg = aqp.plot_fairness_group_all(fdf, ncols=5, metrics = "all")

In [None]:
fg_original = aqp.plot_fairness_group_all(fdf_original, ncols=5, metrics = "all")

[Back to Top](#top_cell)
<a id='fairness_disp_viz'></a>

## How do I visualize parity between groups in my model? 
To visualize disparity fairness based on the dataframe returned from the `Fairness()` class **`get_group_value_fairness()`** method, a particular disparity metric can be specified with the **`plot_fairness_disparity()`** method in the Aequitas `Plot()` class. To plot a single disparity, a metric and an attribute must be specified.

Disparities related to a list of particular metrics of interest or `'all'` metrics can be plotted with **`plot_fairness_disparity_all()`**. At least one metric or at least one attribute **must** be specified when plotting multiple fairness disparities (or the same disparity across multiple attributes).

### Visualizing parity between groups in a single user-specified attribute for a single user-specified disparity metric

The treemap below displays False Discovery Rate disparity values between race attribute groups calculated based on a predefined reference group ('Caucasian'), colored based on fairness determination for that attribute group (green = 'True' and red = 'False'). We see very quickly that only two groups have a 'False' parity determination.

In [None]:
m = aqp.plot_fairness_disparity(fdf, group_metric='fdr', attribute_name='race')

In [None]:
m_original = aqp.plot_fairness_disparity(fdf_original, group_metric='fdr', attribute_name='race')

In [None]:
fpr = aqp.plot_fairness_disparity(fdf, group_metric='fpr', attribute_name='race')

In [None]:
fpr_original = aqp.plot_fairness_disparity(fdf_original, group_metric='fpr', attribute_name='race')

### Researcher Check: Could the unfairness I am seeing be related to small group sizes in my sample?

Use the `min_group_size` parameter on all visualization methods to visualize parities for only those sample population groups above a user-specified percentage of the total sample size. Note that only the smallest groups had a 'False' determination for false discovery rate parity above. The parity determination is 'True' for all groups that make up at least 1% of the sample.

In [None]:
m = aqp.plot_fairness_disparity(fdf, group_metric='fdr', attribute_name='race', 
                                min_group_size=0.01, significance_alpha=0.05)

In [None]:
m_original = aqp.plot_fairness_disparity(fdf_original, group_metric='fdr', attribute_name='race',
                                         min_group_size=0.01, significance_alpha=0.05)
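
Before filtering the plot, it can help to see which groups actually fall below the cutoff. The following is a minimal sketch in plain pandas, not part of Aequitas, that computes each group's share of the sample and lists the groups under 1%. The `toy` dataframe and its counts are invented for illustration; on real data you would pass your own dataframe and attribute column. How Aequitas treats a group sitting exactly at the cutoff is not assumed here.

```python
import pandas as pd

# Invented toy sample (200 rows); the group counts are illustrative only.
toy = pd.DataFrame({
    "race": ["African-American"] * 110
            + ["Caucasian"] * 80
            + ["Other"] * 9
            + ["Asian"] * 1
})

# Share of the total sample held by each group.
shares = toy["race"].value_counts(normalize=True)

# Groups below 1% of the sample. This mirrors the intent of
# min_group_size=0.01, but is computed here independently of Aequitas.
cutoff = 0.01
small_groups = shares[shares < cutoff]

print(shares)
print("Groups below cutoff:", list(small_groups.index))
```

A 'False' determination for a group listed here rests on very few observations, so it is worth reading with caution.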

### Visualizing parity between groups in a single user-specified attribute for all calculated disparity metrics

The treemaps below display disparities between race attribute groups calculated based on a predefined reference group ('Caucasian') for all 10 disparity metrics, colored based on fairness determination for that attribute group (green = 'True' and red = 'False').

Because all treemap squares are sized and positioned by group size, each population group appears in the same place on every subplot. This makes it easy to compare fairness determinations for each 'race' group across every calculated disparity metric.

In [None]:
a_tm = aqp.plot_fairness_disparity_all(fdf, attributes=['race'], metrics='all', 
                                       significance_alpha=0.05)

In [None]:
a_tm_original = aqp.plot_fairness_disparity_all(fdf_original, attributes=['race'], metrics='all',
                                                significance_alpha=0.05)

### Visualizing parity between all groups for multiple user-specified disparity metrics

The treemaps below display Predicted Positive Group Rate (pprev) and Predicted Positive Rate (ppr) disparities between attribute groups for all three attributes (race, sex, age category) calculated based on predefined reference groups ('race' attribute: Caucasian, 'sex' attribute: Male, 'age_cat' attribute: 25-45), colored based on fairness determination for that attribute group (green = 'True' and red = 'False'). As we want to plot for all groups, there is no need to specify any attributes. 

We can see that the Predicted Positive Group Rate Parity (Impact Parity) determination was 'False' for nearly every race in comparison to Caucasians and for every other age category in comparison to the 25-45 age group. Overall, Predicted Positive Rate Parity (Statistical Parity) did not have any 'True' fairness determinations at all.

In [None]:
r_tm = aqp.plot_fairness_disparity_all(fdf, metrics=['pprev_disparity', 'ppr_disparity'], 
                                       significance_alpha=0.05)

In [None]:
r_tm_original = aqp.plot_fairness_disparity_all(fdf_original, metrics=['pprev_disparity', 'ppr_disparity'],
                                                significance_alpha=0.05)
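
To make the difference between the two metrics concrete, here is a small sketch that computes `ppr` and `pprev` and their disparities by hand. It assumes the usual Aequitas definitions: `ppr` is a group's predicted positives divided by all predicted positives, and `pprev` is a group's predicted positives divided by the group's size. The counts are invented, and the 0.8 to 1.25 fairness band (`tau = 0.8`) is an assumption modeled on the 80% rule. The threshold that applies to your plots is whatever was used when the fairness dataframe was built.

```python
import pandas as pd

# Invented counts per group: predicted positives (pp) and group size.
counts = pd.DataFrame(
    {"pp": [30, 60, 10], "group_size": [100, 120, 40]},
    index=["Caucasian", "African-American", "Other"],
)

total_pp = counts["pp"].sum()

# ppr: share of all predicted positives that fall in the group.
counts["ppr"] = counts["pp"] / total_pp
# pprev: share of the group that is predicted positive.
counts["pprev"] = counts["pp"] / counts["group_size"]

# Disparity = group metric divided by the reference group's metric.
ref = "Caucasian"
counts["ppr_disparity"] = counts["ppr"] / counts.loc[ref, "ppr"]
counts["pprev_disparity"] = counts["pprev"] / counts.loc[ref, "pprev"]

# Assumed fairness band: tau <= disparity <= 1 / tau.
tau = 0.8
counts["ppr_fair"] = counts["ppr_disparity"].between(tau, 1 / tau)
counts["pprev_fair"] = counts["pprev_disparity"].between(tau, 1 / tau)

print(counts.round(3))
```

Because `ppr` depends on how large each group is, groups of different sizes will rarely satisfy it, while `pprev` corrects for group size. That is one reason the two determinations can disagree for the same group.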

### Visualizing parity between groups in multiple user-specified attributes

The treemaps below display disparities between attribute groups for two attributes (sex, age category) calculated based on predefined reference groups ('sex' attribute: Male, 'age_cat' attribute: 25-45) for the six default disparity metrics, colored based on fairness determination for that attribute group (green = 'True' and red = 'False'). As we want to see only the default metrics, we do not need to set the 'metrics' parameter.

Note that there is slightly more parity between the sexes (FNR, FDR, FNR, and Statistical Parity) than between age categories (FDR Parity only).

In [None]:
n_tm = aqp.plot_fairness_disparity_all(fdf, attributes=['sex', 'age_cat'], 
                                       significance_alpha=0.05)

In [None]:
n_tm_original = aqp.plot_fairness_disparity_all(fdf_original, attributes=['sex', 'age_cat'],
                                                significance_alpha=0.05)
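
The same comparison can be read as a table instead of from the treemap colors. The sketch below assumes the fairness dataframe has an `attribute_name` column and boolean columns whose names end in `Parity` (for example `FDR Parity`). The `toy_fdf` frame and its values are invented stand-ins, so substitute your own fairness dataframe to get real results.

```python
import pandas as pd

# Invented stand-in for a fairness dataframe; values are illustrative only.
toy_fdf = pd.DataFrame({
    "attribute_name": ["sex", "sex", "age_cat", "age_cat", "age_cat"],
    "attribute_value": ["Male", "Female", "25 - 45",
                        "Less than 25", "Greater than 45"],
    "FDR Parity": [True, True, True, True, True],
    "FPR Parity": [True, False, True, False, False],
})

# Pick out the per-metric parity determinations by column name.
parity_cols = [c for c in toy_fdf.columns if c.endswith("Parity")]

# An attribute satisfies a parity only if every one of its groups does.
summary = toy_fdf.groupby("attribute_name")[parity_cols].all()

print(summary)
```

Each row of `summary` is an attribute and each column a parity metric, which makes it straightforward to see which metrics hold across all groups of an attribute.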

## The Aequitas Effect

By breaking down the COMPAS predictions using a variety of bias and disparity metrics calculated using different reference groups, we are able to surface the specific metrics for which the model is imposing bias on given attribute groups, and have a clearer lens when evaluating models and making recommendations for intervention. 

Researchers utilizing Aequitas will be able to make similar evaluations on their own data sets, and as they continue to use the tool, will begin to identify patterns in where biases exist and which models appear to produce less bias, thereby helping to reduce bias and its effects in future algorithm-based decision-making.

[Back to Top](#top_cell)