# Fairness Post-Processing on COMPAS

In this notebook, we explore how simple, model-agnostic post-processing techniques can help reduce racial disparities in a recidivism prediction task.  
We use the COMPAS dataset, treating **race** (Caucasian vs. African-American) as our protected attribute, and compare:

1. **Origin** - no fairness correction
2. **Unconstrained** - full mass-transport repair without thresholding
3. **Barycentre** - optimal-transport-based repair to equalize distributions
4. **Partial** - a tunable "dial" that trades off fairness vs. accuracy
5. **ROC** - reject option classification: learning a threshold on validation data under a fairness constraint

We will walk through:

1. Imports & setup

2. Data loading & utility helpers

3. Measuring per-feature imbalance (TV distance)

4. Main training & post-processing loop

5. Aggregating results

6. Feature-importance analysis

7. Conclusions

## 1  Imports & setup

In [1]:
import sys
import os
import numpy as np
import pandas as pd

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

from aif360.algorithms.postprocessing.reject_option_classification import RejectOptionClassification
from aif360.datasets import BinaryLabelDataset,CompasDataset

from humancompatible.repair.methods.data_analysis import rdata_analysis
from humancompatible.repair.postprocess.roc_postprocess import ROCpostprocess
from humancompatible.repair.postprocess.proj_postprocess import Projpostprocess

pip install 'aif360[AdversarialDebiasing]'
pip install 'aif360[Reductions]'
pip install 'aif360[inFairness]'


In [2]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

## 2  Utility Helpers

The next cell does two things:

- **Load & preprocess** the COMPAS dataset into a single DataFrame, renaming race to `S`, recidivism to `Y`, and dropping unused features.

- **Compute Total-Variation distance** (`tv_dist`) per feature to identify which inputs differ most between Caucasian and African-American groups.

In [3]:
pa = 'race'
label_map = {1.0: 'Did recid.', 0.0: 'No recid.'}
protected_attribute_maps = {1.0: 'Caucasian', 0.0: 'African-American'}
favorable_label = 0
privileged_groups = [{pa: 1}]
unprivileged_groups = [{pa: 0}]
cd = CompasDataset(protected_attribute_names=[pa],privileged_classes=[['Caucasian'],[1]], 
                    metadata={'label_map': label_map,'protected_attribute_maps': protected_attribute_maps},
                    features_to_drop=['age', 'sex', 'c_charge_desc'])
train,test = cd.split([0.6], shuffle=True) #len(test.instance_names) = 2057
var_list = cd.feature_names.copy()
var_list.remove(pa)
var_dim=len(var_list)

K=200
e=0.01
thresh=0.05

messydata=cd.convert_to_dataframe()[0]
messydata=messydata.rename(columns={pa:'S',cd.label_names[0]:'Y'})
messydata=messydata[(messydata['S']==1)|(messydata['S']==0)]
for col in var_list+['S','Y']:
    messydata[col]=messydata[col].astype('category')
messydata['W']=cd.instance_weights
X=messydata[var_list+['S','W']].to_numpy() # [X,S,W]
y=messydata['Y'].to_numpy() #[Y]

tv_dist=dict()
for x_name in var_list:
    x_range_single=list(pd.pivot_table(messydata,index=x_name,values=['W'])[('W')].index) 
    dist=rdata_analysis(messydata,x_range_single,x_name)
    tv_dist[x_name]=sum(abs(dist['x_0']-dist['x_1']))/2
x_list=[]
for key,val in tv_dist.items():
    if val>0.1:
        x_list+=[key]        
tv_dist

{'juv_fel_count': np.float64(0.03210337325453563),
 'juv_misd_count': np.float64(0.04323143324022939),
 'juv_other_count': np.float64(0.021763780679615215),
 'priors_count': np.float64(0.12622233191661625),
 'age_cat=25 - 45': np.float64(0.054431947619680315),
 'age_cat=Greater than 45': np.float64(0.13519019921101838),
 'age_cat=Less than 25': np.float64(0.08075825159133806),
 'c_charge_degree=F': np.float64(0.07840757396162046),
 'c_charge_degree=M': np.float64(0.07840757396162046)}

## 3  Measuring Group Imbalance

We compute the Total-Variation (TV) distance for each feature to see which attributes are distributed differently  
across race. Features whose TV distance exceeds 0.1 are collected in `x_list`; these are the axes along which we apply our  
post-processing repairs.
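To make the measure concrete, here is a minimal sketch of a weighted TV distance between two groups for one discrete feature. It is not the `rdata_analysis` helper used above; the function name `tv_distance` and the toy data are illustrative assumptions, with columns `S` (group) and `W` (instance weight) named as in the notebook.

```python
import pandas as pd

def tv_distance(df, feature, group_col="S", weight_col="W"):
    """Weighted total-variation distance between the two groups' distributions of `feature`.

    Hypothetical helper for illustration; assumes the group column takes values 0 and 1.
    """
    # Total weight per (group, feature value), one row per group.
    mass = df.groupby([group_col, feature])[weight_col].sum().unstack(fill_value=0.0)
    # Normalise each row to a probability distribution.
    probs = mass.div(mass.sum(axis=1), axis=0)
    # TV distance = half the L1 distance between the two distributions.
    return 0.5 * (probs.loc[0] - probs.loc[1]).abs().sum()

# Toy data (made up): group 0 has priors {0: 1/4, 1: 2/4, 2: 1/4},
# group 1 has priors {0: 2/4, 1: 1/4, 2: 1/4}.
toy = pd.DataFrame({
    "S": [0, 0, 0, 0, 1, 1, 1, 1],
    "priors": [0, 1, 1, 2, 0, 0, 1, 2],
    "W": [1.0] * 8,
})
tv_toy = tv_distance(toy, "priors")
print(tv_toy)  # 0.25
```

A value of 0 means the two groups share the same distribution on that feature; a value of 1 means their supports do not overlap.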

In [4]:
x_list

['priors_count', 'age_cat=Greater than 45']

The features **`priors_count`** (TV ≈ 0.126) and **`age_cat=Greater than 45`** (TV ≈ 0.135) are the only two that exceed the 0.1 cutoff.  
These will be our coordinates for all subsequent repair strategies.
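To give some intuition for what "repairing" along one coordinate means, the sketch below moves each group's values toward a common one-dimensional barycentre by quantile matching, with a dial `lam` for partial repair. This is a swapped-in, simplified technique for illustration only and is not how `Projpostprocess` works internally; the function name `quantile_repair`, the equal weighting of the two groups, and the toy data are all assumptions.

```python
import numpy as np

def quantile_repair(x, s, lam=1.0):
    """Shift each group's values toward the 1-D barycentre of the group distributions.

    Illustrative sketch: lam=0 leaves x unchanged, lam=1 is full repair.
    Groups are weighted equally (an assumption made for simplicity).
    """
    x = np.asarray(x, dtype=float)
    s = np.asarray(s)
    groups = np.unique(s)
    repaired = x.copy()
    for g in groups:
        mask = s == g
        # Rank of each point within its own group, mapped into (0, 1).
        order = np.argsort(np.argsort(x[mask], kind="stable"), kind="stable")
        q = (order + 0.5) / mask.sum()
        # Barycentre quantile function = average of the groups' quantile functions.
        target = np.mean([np.quantile(x[s == h], q) for h in groups], axis=0)
        # Interpolate between the original value and its barycentre target.
        repaired[mask] = (1.0 - lam) * x[mask] + lam * target
    return repaired

# Toy data (made up): group 1 is shifted upward relative to group 0.
x = np.array([0, 1, 2, 3, 2, 3, 4, 5])
s = np.array([0, 0, 0, 0, 1, 1, 1, 1])
full = quantile_repair(x, s, lam=1.0)
half = quantile_repair(x, s, lam=0.5)
```

With `lam=1.0` and equal group sizes, both groups end up with the same set of values, so the TV distance on that coordinate is zero; intermediate `lam` values trade some of that parity for staying closer to the original data.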

## 4  Training & Post-Processing Experiment

For each of 10 random splits:

1. Fit a depth-5 Random Forest on the training set.

2. Apply each post-processor to the test set:
   - **origin** (baseline, no repair)
   - **unconstrained** (full coupling)
   - **barycentre** (optimal transport)
   - **partial** (tunable coupling via `t`)
   - **ROC** (thresholding learned under a parity constraint)

3. Record:
   - **DI** (Disparate Impact)
   - **F1** (macro, micro, weighted)
   - **TV distance** (remaining gap on repaired axes)
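The recorded metrics can be sketched as follows. The disparate-impact helper below is a hypothetical illustration, not the library's code: it assumes DI is the favorable-prediction rate of the unprivileged group divided by that of the privileged group, with `favorable_label = 0` as set earlier. The F1 variants use scikit-learn's `f1_score` with its `average` argument; the toy arrays are made up.

```python
import numpy as np
from sklearn.metrics import f1_score

def disparate_impact(y_pred, s, favorable_label=0, unprivileged=0, privileged=1):
    """Ratio of favorable-prediction rates: unprivileged group over privileged group."""
    y_pred = np.asarray(y_pred)
    s = np.asarray(s)
    rate_unpriv = np.mean(y_pred[s == unprivileged] == favorable_label)
    rate_priv = np.mean(y_pred[s == privileged] == favorable_label)
    return rate_unpriv / rate_priv

# Toy predictions (made up): first four rows are the unprivileged group.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_pred = np.array([0, 1, 1, 1, 0, 0, 0, 1])
s = np.array([0, 0, 0, 0, 1, 1, 1, 1])

di = disparate_impact(y_pred, s)  # 0.25 / 0.75
scores = {
    avg: f1_score(y_true, y_pred, average=avg)
    for avg in ("macro", "micro", "weighted")
}
print(di, scores)
```

A DI below 1 means the unprivileged group receives the favorable prediction less often than the privileged group; values above 1 mean the reverse.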

In [5]:
methods=['origin','unconstrained','barycentre','partial','ROC'] # keep ROC last: methods[:-1] go through Projpostprocess, ROC is handled separately
report=pd.DataFrame(columns=['DI','f1 macro','f1 micro','f1 weighted','TV distance','method'])
for _ in range(10):
    # train/val/test is roughly 4:2:4 (exactly 0.42 : 0.18 : 0.40)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)
    X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.3)

    clf=RandomForestClassifier(max_depth=5).fit(X_train[:,0:var_dim],y_train)
    projpost = Projpostprocess(X_test,y_test,x_list,var_list,clf,K,e,thresh,favorable_label,linspace_range=(0.01,0.1),theta=1e-2)
    for method in methods[:-1]:
        # report = pd.concat([report,projpost.postprocess(method,para=1e-2)], ignore_index=True)
        report = pd.concat([report,projpost.postprocess(method,para=1e-3)], ignore_index=True)

    ROCpost = ROCpostprocess(X_val,y_val,var_list,clf,favorable_label) # use validation set to train a ROC model
    report = pd.concat([report,ROCpost.postprocess(X_test,y_test,tv_origin=projpost.tv_origin)], ignore_index=True)

report.to_csv('../data/report_postprocess_compas_'+str(pa)+'.csv',index=None)

Optimal classification threshold (with fairness constraints) = 0.7300
Optimal ROC margin = 0.0000
Optimal classification threshold (with fairness constraints) = 0.7300
Optimal ROC margin = 0.0000
Optimal classification threshold (with fairness constraints) = 0.7100
Optimal ROC margin = 0.0000
Optimal classification threshold (with fairness constraints) = 0.6900
Optimal ROC margin = 0.0000
Optimal classification threshold (with fairness constraints) = 0.7900
Optimal ROC margin = 0.0233
Optimal classification threshold (with fairness constraints) = 0.7100
Optimal ROC margin = 0.0000
Optimal classification threshold (with fairness constraints) = 0.7100
Optimal ROC margin = 0.0000
Optimal classification threshold (with fairness constraints) = 0.7300
Optimal ROC margin = 0.0000
Optimal classification threshold (with fairness constraints) = 0.7100
Optimal ROC margin = 0.0000
Optimal classification threshold (with fairness constraints) = 0.7700
Optimal ROC margin = 0.0000
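The two numbers logged per split are the parameters of reject option classification: a classification threshold and a margin around it. As a rough sketch of the general idea (not the `aif360` implementation, and with a hypothetical function name and made-up scores), instances whose score falls within the margin of the threshold are treated as uncertain, and in that band the unprivileged group receives the favorable outcome while the privileged group receives the unfavorable one.

```python
import numpy as np

def reject_option_predict(fav_scores, s, threshold, margin, unprivileged=0):
    """Return 1 for a favorable decision, 0 otherwise.

    fav_scores: estimated probability of the favorable outcome.
    Illustrative sketch of the reject-option idea, under the assumptions above.
    """
    fav_scores = np.asarray(fav_scores, dtype=float)
    s = np.asarray(s)
    # Plain thresholding outside the critical band.
    decision = (fav_scores > threshold).astype(int)
    # Critical band: scores within `margin` of the threshold.
    critical = np.abs(fav_scores - threshold) <= margin
    decision[critical & (s == unprivileged)] = 1
    decision[critical & (s != unprivileged)] = 0
    return decision

fav_scores = [0.20, 0.48, 0.52, 0.90, 0.48, 0.52]
s = [0, 0, 0, 0, 1, 1]
with_margin = reject_option_predict(fav_scores, s, threshold=0.5, margin=0.05)
no_margin = reject_option_predict(fav_scores, s, threshold=0.5, margin=0.0)
print(with_margin)  # [0 1 1 1 0 0]
print(no_margin)    # [0 0 1 1 0 1]
```

Under this reading, a margin of 0.0000, as logged for most splits above, would mean the method is behaving essentially like a plain tuned threshold.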


## 5  Aggregated Results

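Below is the combined table across the random splits. Each row shows:

- **DI**: Ratio of favorable-outcome rates between the unprivileged and privileged groups (ideal ≈ 1 for parity)

- **F1** (macro/micro/weighted): Overall predictive performance

- **TV distance**: Remaining imbalance on the repaired features

- **method**: Which post-processor was applied

In the rows shown, the baseline and unconstrained methods keep the highest F1 (about 0.67 macro) but leave DI near 0.75 to 0.78. Barycentre repair drives the TV distance to about 0 but overshoots parity (DI of 1.15 to 1.23). Partial repair and ROC bring DI closest to 1 (0.94 to 0.97) at the largest F1 cost.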

In [6]:
report

|   | DI | f1 macro | f1 micro | f1 weighted | TV distance | method |
|---|----------|----------|----------|-------------|-------------|--------|
| 0 | 0.783545 | 0.668244 | 0.673957 | 0.672282 | 0.151335 | origin |
| 1 | 0.753625 | 0.667108 | 0.667882 | 0.668597 | 0.148897 | unconstrained |
| 2 | 1.229796 | 0.625635 | 0.625759 | 0.626268 | 0.0 | barycentre |
| 3 | 0.946193 | 0.553434 | 0.563791 | 0.547127 | 0.019839 | partial_0.001 |
| 4 | 0.971861 | 0.439968 | 0.572701 | 0.465256 | 0.151335 | ROC |
| 5 | 0.777348 | 0.668627 | 0.675172 | 0.673399 | 0.163487 | origin |
| 6 | 0.753501 | 0.662402 | 0.664237 | 0.664952 | 0.1573 | unconstrained |
| 7 | 1.147662 | 0.622508 | 0.622924 | 0.623793 | 2.8e-05 | barycentre |
| 8 | 0.936825 | 0.569089 | 0.572296 | 0.56528 | 0.020784 | partial_0.001 |
| 9 | 0.955358 | 0.445895 | 0.583637 | 0.474205 | 0.163487 | ROC |
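Since each method appears once per split, a per-method average is easier to compare than the raw rows. The sketch below averages the ten rows displayed above (copied by hand, with only three of the metric columns kept for brevity); in the notebook itself the same `groupby` could presumably be applied to `report` directly.

```python
import pandas as pd

# Rows copied from the table above: (DI, f1 macro, TV distance, method).
rows = [
    (0.783545, 0.668244, 0.151335, "origin"),
    (0.753625, 0.667108, 0.148897, "unconstrained"),
    (1.229796, 0.625635, 0.0, "barycentre"),
    (0.946193, 0.553434, 0.019839, "partial_0.001"),
    (0.971861, 0.439968, 0.151335, "ROC"),
    (0.777348, 0.668627, 0.163487, "origin"),
    (0.753501, 0.662402, 0.1573, "unconstrained"),
    (1.147662, 0.622508, 2.8e-05, "barycentre"),
    (0.936825, 0.569089, 0.020784, "partial_0.001"),
    (0.955358, 0.445895, 0.163487, "ROC"),
]
shown = pd.DataFrame(rows, columns=["DI", "f1 macro", "TV distance", "method"])

# Mean of each metric per method over the splits shown.
summary = shown.groupby("method")[["DI", "f1 macro", "TV distance"]].mean()
print(summary.round(3))
```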


In [7]:
valpost = Projpostprocess(X_val,y_val,x_list,var_list,clf,K,e,'auto',linspace_range=(0.01,0.1),theta=1e-2)
valpost.thresh

Optional threshold =  [0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 ]
Disparate Impact =  [1.12849189 1.13844394 1.13844394 1.13844394 1.13844394 1.15129734
 1.14285082 1.14299072 1.13422458 1.14674675]
f1 scores =  [0.59507956 0.59813328 0.59813328 0.59813328 0.59813328 0.59584842
 0.61918294 0.61825019 0.6163674  0.61261952]


np.float64(0.06000000000000001)

## 6  Feature-Importance Analysis

Finally, we revisit our Random Forest baseline to see which inputs drive recidivism prediction the most.  
Understanding these drivers helps highlight potential proxy features that deserve extra fairness scrutiny.

In [8]:
# Compute average feature importance
importance=[]
for _ in range(10):
    # train/val/test is roughly 4:2:4 (exactly 0.42 : 0.18 : 0.40)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)
    X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.3)

    clf=RandomForestClassifier(max_depth=5).fit(X_train[:,0:var_dim],y_train)
    importance.append(list(clf.feature_importances_))
importance=np.array(importance)
print("features", var_list)
print("mean importances", importance.mean(axis=0))

features ['juv_fel_count', 'juv_misd_count', 'juv_other_count', 'priors_count', 'age_cat=25 - 45', 'age_cat=Greater than 45', 'age_cat=Less than 25', 'c_charge_degree=F', 'c_charge_degree=M']
mean importances [0.02886132 0.05202298 0.08352282 0.59343752 0.02858455 0.07039007
 0.07961757 0.03311442 0.03044874]
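The two printed arrays are easier to read when paired and sorted. A small sketch using the names and mean importances printed above (copied by hand):

```python
import pandas as pd

var_list = [
    "juv_fel_count", "juv_misd_count", "juv_other_count", "priors_count",
    "age_cat=25 - 45", "age_cat=Greater than 45", "age_cat=Less than 25",
    "c_charge_degree=F", "c_charge_degree=M",
]
mean_importance = [
    0.02886132, 0.05202298, 0.08352282, 0.59343752, 0.02858455,
    0.07039007, 0.07961757, 0.03311442, 0.03044874,
]

# Pair each feature with its mean importance and sort from most to least important.
ranked = pd.Series(mean_importance, index=var_list).sort_values(ascending=False)
print(ranked.round(3))
```

`priors_count` comes first with about 0.59, followed by `juv_other_count` at about 0.08.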


## 7  Conclusions

- **Post-processing is plug-and-play**: you can retrofit fairness on top of any model without retraining.

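- **Barycentre repair** closes the gap on the repaired features (TV≈0) at a cost of about 0.04 to 0.05 macro F1 in the runs shown, though DI overshoots parity (1.15 to 1.23).

- **Partial repair** offers a smooth "fairness dial" - a small `t` nudges the predictions, a large `t` approaches full parity. With the setting used here (`para=1e-3`), DI reaches about 0.94 and the TV distance drops to about 0.02.

- **ROC** (reject option classification) learns a decision threshold and margin on validation data under a parity constraint. It brings DI to about 0.96 to 0.97 in the runs shown, but macro F1 falls to about 0.44 and the feature-level TV distance is unchanged.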

On COMPAS, **priors_count** dominates the predictions (mean importance of about 0.59) and is also one of the two features with a TV distance above 0.1,  
so interventions there (or data collection) could be especially impactful for reducing racial bias.