# Prejudice Remover: Fairness through Regularization

The Prejudice Remover is an in-processing fairness algorithm that learns a classifier while removing direct and indirect prejudice by adding a regularization term to the objective function.

This algorithm works by penalizing the mutual information between the sensitive attribute and the prediction, effectively reducing the influence of protected attributes on the model's decisions.
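
As a rough illustration of that penalty, the sketch below estimates the mutual information between a binary prediction and a sensitive attribute from predicted probabilities. It uses only the standard library. The function name `prejudice_index` and the toy inputs are assumptions made for this example and are not taken from fairlib, whose implementation may differ.

```python
import math
from collections import defaultdict


def prejudice_index(probs, sensitive, eps=1e-12):
    """Estimate the mutual information (in nats) between a binary prediction
    and a sensitive attribute, given P(y=1 | x_i) for each sample.

    Hypothetical helper for illustration; not part of fairlib.
    """
    n = len(probs)
    # Marginal P(y=1): average predicted probability over all samples
    p_y1 = sum(probs) / n

    # Conditional P(y=1 | s): average predicted probability within each group
    group_sum = defaultdict(float)
    group_count = defaultdict(int)
    for p, s in zip(probs, sensitive):
        group_sum[s] += p
        group_count[s] += 1
    p_y1_given_s = {s: group_sum[s] / group_count[s] for s in group_sum}

    def clip(value):
        # Keep probabilities away from 0 and 1 so the logarithms stay finite
        return min(max(value, eps), 1.0 - eps)

    total = 0.0
    marg = clip(p_y1)
    for p, s in zip(probs, sensitive):
        cond = clip(p_y1_given_s[s])
        # Sum over both outcomes (y=1 and y=0), weighted by the model's probability
        total += p * math.log(cond / marg)
        total += (1.0 - p) * math.log((1.0 - cond) / (1.0 - marg))
    return total / n


# Predictions that follow the group exactly: the penalty is clearly positive
biased = prejudice_index([0.9, 0.9, 0.1, 0.1], [0, 0, 1, 1])
# The same predictions spread evenly across groups: the penalty is near zero
balanced = prejudice_index([0.9, 0.1, 0.9, 0.1], [0, 0, 1, 1])
print(f"biased: {biased:.4f}, balanced: {balanced:.4f}")
```

A regularizer of this kind is differentiable in the predicted probabilities, which is what allows it to be added to a gradient-based training loss.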

Key features of the Prejudice Remover algorithm:
- **Regularization-based approach**: Adds a fairness term to the loss function
- **Tunable fairness parameter (eta)**: Controls the trade-off between accuracy and fairness
- **Handles both direct and indirect discrimination**: Penalizes dependence on the sensitive attribute itself as well as on features correlated with it
- **Works with any differentiable model**: Can be applied to various neural network architectures
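
The role of eta can be written compactly. Assuming the usual form of a regularized objective (the symbols below are introduced here for illustration and are not taken from the fairlib source), training minimizes:

$$
\mathcal{L}(\theta) = \mathcal{L}_{\text{task}}(\theta) + \eta \, R_{\text{PR}}(\theta)
$$

Here $\mathcal{L}_{\text{task}}$ is the prediction loss (binary cross-entropy in this demo) and $R_{\text{PR}}$ is the fairness term, an estimate of the mutual information between the prediction and the sensitive attribute. With $\eta = 0$ the fairness term vanishes, and larger $\eta$ gives it more weight relative to the prediction loss.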

In this demo, we'll explore how different values of the regularization parameter (eta) affect the fairness-accuracy trade-off.

In [18]:
import sys
import os

# Add the root directory of the project to PYTHONPATH
sys.path.append(os.path.abspath(os.path.join('..')))


In [19]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
import openml

from fairlib import DataFrame
from fairlib.inprocessing import PrejudiceRemover
from fairlib.metrics import statistical_parity_difference, disparate_impact

# Set random seed for reproducibility
np.random.seed(42)
torch.manual_seed(42)

<torch._C.Generator at 0x10d1e39d0>

## Loading and Preparing the Adult Dataset
We will use the Adult dataset from OpenML, which contains demographic information along with a label indicating whether an individual earns more than $50K per year. We'll use 'sex' as our sensitive attribute.

In [20]:
# Load the Adult dataset from OpenML
adult_dataset = openml.datasets.get_dataset(179)
adult_X, _, _, _ = adult_dataset.get_data(dataset_format="dataframe")

# Rename the target column for clarity
adult_X.rename(columns={'class': 'income'}, inplace=True)

# Create a DataFrame object and specify target and sensitive attributes
adult = DataFrame(adult_X)
adult.targets = 'income'
adult.sensitive = ['sex']

# Drop unnecessary columns
adult.drop(columns=["fnlwgt"], inplace=True)

# Encode categorical features
label_maps = {}
for col in adult.columns:
    if adult[col].dtype == 'object' or adult[col].dtype == 'category':
        adult[col], uniques = pd.factorize(adult[col])
        label_maps[col] = uniques

print(f"Dataset shape: {adult.shape}")
print(f"Target column: {adult.targets}")
print(f"Sensitive attributes: {adult.sensitive}")

adult.head()

Dataset shape: (48842, 14)
Target column: {'income'}
Sensitive attributes: {'sex'}


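| | age | workclass | education | education-num | marital-status | occupation | relationship | race | sex | capitalgain | capitalloss | hoursperweek | native-country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 0 | 13 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 2 | 0 | 2 | 1 | 9 | 2 | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 3 | 1 | 2 | 2 | 7 | 1 | 2 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| 4 | 2 | 2 | 0 | 13 | 1 | 3 | 2 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |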


## Splitting the Dataset
We'll split our data into training and testing sets to evaluate our models.

In [21]:
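# Split the data into features and target.
# Read the target name without mutating the set returned by `adult.targets`.
target_col = next(iter(adult.targets))
X = adult.drop(columns=target_col)
y = adult[target_col]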

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create DataFrames for training and testing
train_df = DataFrame(pd.concat([X_train, y_train], axis=1))
train_df.targets = adult.targets
train_df.sensitive = adult.sensitive

test_df = DataFrame(pd.concat([X_test, y_test], axis=1))
test_df.targets = adult.targets
test_df.sensitive = adult.sensitive

print(f"Training set shape: {train_df.shape}")
print(f"Testing set shape: {test_df.shape}")

Training set shape: (39073, 14)
Testing set shape: (9769, 14)


## Creating a Neural Network Model
We'll define a simple neural network model to use with the Prejudice Remover algorithm.

In [22]:
class SimpleNN(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        self.layer1 = nn.Linear(input_size, 64)
        self.layer2 = nn.Linear(64, 32)
        self.layer3 = nn.Linear(32, 1)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()
        
    def forward(self, x):
        x = self.relu(self.layer1(x))
        x = self.relu(self.layer2(x))
        x = self.sigmoid(self.layer3(x))
        return x

# Get the number of features in our dataset
input_size = X_train.shape[1]

# Create a model instance
model = SimpleNN(input_size)
print(f"Model created with input size: {input_size}")

Model created with input size: 13


## Training and Evaluating Models with Different Eta Values
Now we'll train multiple Prejudice Remover models with different eta values to see how this parameter affects the fairness-accuracy trade-off.
- eta = 0: No fairness regularization (standard model)
- eta > 0: Increasing values impose stronger fairness constraints
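
To make the two fairness metrics used in this evaluation concrete, here is a minimal sketch of statistical parity difference and disparate impact for a binary prediction and a binary sensitive attribute. The helper names, the choice of which group counts as unprivileged, and the toy data are assumptions for illustration. fairlib's `statistical_parity_difference` and `disparate_impact` return 2x2 arrays in this notebook and may follow different sign or ordering conventions.

```python
def positive_rate(preds, sensitive, group):
    """Share of positive predictions among samples in the given group."""
    selected = [p for p, s in zip(preds, sensitive) if s == group]
    return sum(selected) / len(selected)


def spd_binary(preds, sensitive, unprivileged=1, privileged=0):
    """P(pred=1 | unprivileged) - P(pred=1 | privileged); 0 means parity."""
    return (positive_rate(preds, sensitive, unprivileged)
            - positive_rate(preds, sensitive, privileged))


def di_binary(preds, sensitive, unprivileged=1, privileged=0):
    """P(pred=1 | unprivileged) / P(pred=1 | privileged); 1 means parity."""
    return (positive_rate(preds, sensitive, unprivileged)
            / positive_rate(preds, sensitive, privileged))


# Toy data: group 0 gets 3 positive predictions out of 4, group 1 gets 1 out of 4
toy_preds     = [1, 1, 1, 0, 1, 0, 0, 0]
toy_sensitive = [0, 0, 0, 0, 1, 1, 1, 1]

print(spd_binary(toy_preds, toy_sensitive))  # 0.25 - 0.75 = -0.5
print(di_binary(toy_preds, toy_sensitive))   # 0.25 / 0.75, roughly 0.333
```

Under these definitions, a fairer model moves the statistical parity difference toward 0 and the disparate impact toward 1.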

In [23]:
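def train_and_evaluate(train_data, test_data, eta, epochs=200, batch_size=128):
    """Train a Prejudice Remover model with the given eta and score it on the test set."""
    # Build a fresh, untrained network for each eta so runs do not share weights
    model_copy = SimpleNN(input_size)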
    prejudice_remover = PrejudiceRemover(
        model_copy,
        loss=nn.BCELoss(),
        eta=eta
    )

    prejudice_remover.fit(
        train_data,
        epochs=epochs,
        batch_size=batch_size
    )

    # Convert from set to list before indexing
    target_col = list(test_data.targets)[0]
    sensitive_col = list(test_data.sensitive)[0]

    # Prepare test data
    X_test = test_data.drop(columns=target_col)
    y_true = test_data[target_col].to_numpy()
    sensitive = test_data[sensitive_col].to_numpy()

    # Predictions
    y_pred = prejudice_remover.predict(X_test)
    y_pred_binary = (y_pred > 0.5).float().numpy()

    # Accuracy
    accuracy = accuracy_score(y_true, y_pred_binary)

    # Fairness metrics, computed on the model's predictions rather than the true labels
    spd = statistical_parity_difference(y_pred_binary, sensitive)
    di  = disparate_impact(y_pred_binary, sensitive)

    return {
        'eta': eta,
        'accuracy': accuracy,
        'statistical_parity_difference': spd,
        'disparate_impact': di
    }


# Define eta values to test
eta_values = [0.0, 0.1, 0.3, 0.5, 0.8]

# Train and evaluate models with different eta values
results = []
for eta in eta_values:
    print(f"Training model with eta = {eta}...")
    result = train_and_evaluate(train_df, test_df, eta)
    results.append(result)
    print(f"  Accuracy: {result['accuracy']:.4f}")
    print(f"  Statistical Parity Difference: {result['statistical_parity_difference']}")
    print(f"  Disparate Impact: {result['disparate_impact']}")
    print()

# Convert results to DataFrame for easier analysis
results_df = pd.DataFrame(results)
results_df

Training model with eta = 0.0...
  Accuracy: 0.8396
  Statistical Parity Difference: [[-0.20984906  0.20984906]
 [ 0.20984906 -0.20984906]]
  Disparate Impact: [[1.30677107 0.33579978]
 [0.76524498 2.97796505]]

Training model with eta = 0.1...
  Accuracy: 0.8404
  Statistical Parity Difference: [[-0.17647815  0.17647815]
 [ 0.17647815 -0.17647815]]
  Disparate Impact: [[1.24171441 0.34611044]
 [0.80533816 2.88925118]]

Training model with eta = 0.3...
  Accuracy: 0.8450
  Statistical Parity Difference: [[-0.17701002  0.17701002]
 [ 0.17701002 -0.17701002]]
  Disparate Impact: [[1.24037763 0.32853307]
 [0.80620609 3.0438336 ]]

Training model with eta = 0.5...
  Accuracy: 0.8410
  Statistical Parity Difference: [[-0.16196253  0.16196253]
 [ 0.16196253 -0.16196253]]
  Disparate Impact: [[1.21290971 0.32315405]
 [0.82446368 3.09449939]]

Training model with eta = 0.8...
  Accuracy: 0.8403
  Statistical Parity Difference: [[-0.16649617  0.16649617]
 [ 0.16649617 -0.16649617]]
  Disparate 

| | eta | accuracy | statistical_parity_difference | disparate_impact |
|---|---|---|---|---|
| 0 | 0.0 | 0.839595 | [[-0.20984906076829335, 0.20984906076829335], ... | [[1.3067710716129648, 0.3357997766675229], [0.... |
| 1 | 0.1 | 0.840414 | [[-0.17647814895427016, 0.17647814895427016], ... | [[1.2417144135719007, 0.3461104412896203], [0.... |
| 2 | 0.3 | 0.84502 | [[-0.17701002437758417, 0.17701002437758412], ... | [[1.2403776271206919, 0.3285330706141092], [0.... |
| 3 | 0.5 | 0.841028 | [[-0.16196252613709372, 0.16196252613709372], ... | [[1.2129097085342004, 0.3231540467825802], [0.... |
| 4 | 0.8 | 0.840311 | [[-0.16649617375284942, 0.1664961737528494], [... | [[1.2217235109308524, 0.3315608159406489], [0.... |
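
Because the fairness metrics come back as 2x2 arrays, the results are hard to scan. The sketch below copies the accuracy and one off-diagonal statistical parity entry from the printed output above and ranks the runs by the magnitude of that entry. Treating the `[0][1]` entry as the between-group difference is an assumption about fairlib's output layout, and the variable names are illustrative.

```python
# (eta, accuracy, SPD entry [0][1]) copied from the printed results above
runs = [
    (0.0, 0.8396, 0.20984906),
    (0.1, 0.8404, 0.17647815),
    (0.3, 0.8450, 0.17701002),
    (0.5, 0.8410, 0.16196253),
    (0.8, 0.8403, 0.16649617),
]

baseline_spd = runs[0][2]  # eta = 0.0, no fairness regularization

print(f"{'eta':>4} {'accuracy':>9} {'|SPD|':>8} {'change vs eta=0':>16}")
for eta, acc, spd_entry in sorted(runs, key=lambda r: abs(r[2])):
    change = abs(spd_entry) - abs(baseline_spd)
    print(f"{eta:>4} {acc:>9.4f} {abs(spd_entry):>8.4f} {change:>16.4f}")

# Run with the smallest disparity in this particular experiment
fairest = min(runs, key=lambda r: abs(r[2]))
print(f"Smallest |SPD| at eta = {fairest[0]}")
```

In this run the disparity shrinks from about 0.21 at eta = 0 to about 0.16 at eta = 0.5, while accuracy stays between roughly 0.840 and 0.845. The trend is not strictly monotone in eta, so results from a single seed should be read with caution.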
