<h1 align="center"> Single Group Fairness Metrics for the Heart Disease Dataset: End-to-End Demonstration</h1>


![Fairness](Fairness.jpg)
<p align="center">
  <em>Figure 1: Fairness over Time</em><br>
  <em>Source: TimeBots</em>
</p>



# Group Fairness: Definition 🐝
Group fairness in machine learning and AI refers to the idea that an algorithm's performance should not disproportionately disadvantage or harm specific groups of people defined by sensitive attributes (e.g., sex, age).

## Distinction Between Group Fairness and Intersectional Fairness
While this package emphasises the development of intersectional fairness metrics, single-group fairness metrics remain an essential component of any comprehensive fairness evaluation. Group fairness metrics assess model behaviour with respect to a single protected attribute (e.g., sex or age), providing an initial, high-level understanding of whether an algorithm performs equitably across broad demographic categories.

In contrast, intersectional fairness metrics evaluate fairness across combinations of multiple protected attributes simultaneously. This enables the identification of “compounded” disadvantages experienced by individuals belonging to multiple marginalised groups; such patterns may remain hidden when attributes are analysed in isolation.

Together, single-group and intersectional fairness metrics offer complementary perspectives: the former provides a general overview of fairness performance, while the latter facilitates a more granular and nuanced analysis of algorithmic bias.
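The sketch below makes the distinction concrete. It builds both kinds of group label from a toy table with the two protected attributes used later in this notebook (`Sex` and `age_group`). The `"Sex=1"` style mirrors the `subject_label` values that `make_eval_df` produces further down. The `|`-joined intersectional format is a hypothetical choice for illustration and is not necessarily how this package labels intersectional groups.

```python
import pandas as pd

# Toy records with two protected attributes (illustrative values only)
df = pd.DataFrame({
    "Sex": [1, 0, 1, 0, 1],
    "age_group": ["young", "young", "older", "older", "young"],
})

# Single-group labels: one protected attribute at a time
single = "Sex=" + df["Sex"].astype(str)

# Intersectional labels: combinations of attributes (hypothetical label format)
intersectional = single + "|age_group=" + df["age_group"]

print(single.value_counts().to_dict())
print(intersectional.value_counts().to_dict())
```

Even with two binary attributes, the single-group view yields two groups while the intersectional view yields four smaller ones. This is why intersectional estimates need more data to be reliable.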

## Literature Review and Rationale for Metric Selection 
### Equalised Opportunity Difference (EOD)
$$
\text{EOD} =
\text{TPR}_{\text{unpriv}} - \text{TPR}_{\text{priv}}
$$

EOD is a group fairness metric derived from the equal opportunity criterion proposed by Hardt et al. (2016). It measures the difference in true positive rates (TPR) between the unprivileged and privileged groups. The metric assesses whether a classifier provides equal opportunity for individuals who truly belong to the positive class, irrespective of their membership in a protected group.

In the context of heart disease risk prediction, the positive outcome corresponds to correctly identifying individuals with heart disease, rather than simply receiving a positive model prediction. EOD lies in the range [-1, 1]. An ideal value of zero indicates "fairness", whereas larger magnitudes indicate increasing levels of unfairness. The sign conveys the direction of the disparity. Given the definition above, positive values indicate that the unprivileged group has the higher true positive rate, and negative values indicate that the privileged group benefits from a higher true positive rate.
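Here is a minimal NumPy sketch of the EOD formula above. It assumes binary labels (1 = heart disease) and pools every group other than the privileged one into the unprivileged group. The helper names are hypothetical, and this is not the package's `calculate_EOD` implementation.

```python
import numpy as np


def true_positive_rate(y_true, y_pred):
    """TPR = TP / (TP + FN): share of actual positives predicted positive."""
    return np.mean(y_pred[y_true == 1] == 1)


def equal_opportunity_difference(y_true, y_pred, groups, privileged):
    """EOD = TPR_unpriv - TPR_priv; all non-privileged rows form the unprivileged group."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    priv = groups == privileged
    tpr_priv = true_positive_rate(y_true[priv], y_pred[priv])
    tpr_unpriv = true_positive_rate(y_true[~priv], y_pred[~priv])
    return tpr_unpriv - tpr_priv


# Toy data: 4 actual positives in each group
y_true = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0]
groups = ["priv"] * 6 + ["unpriv"] * 6

eod = equal_opportunity_difference(y_true, y_pred, groups, "priv")
print(eod)  # -0.5: the unprivileged TPR (0.5) trails the privileged TPR (1.0)
```

If a group contains no actual positives, its TPR is undefined. In that case, `np.mean` on the empty selection returns `nan` with a warning.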

### Average Odds Difference (AOD)
$$
\text{AOD} = \frac{1}{2} \Big[
(\text{TPR}_{\text{unpriv}} - \text{TPR}_{\text{priv}})
+
(\text{FPR}_{\text{unpriv}} - \text{FPR}_{\text{priv}})
\Big]
$$
AOD is a group fairness metric closely tied to the Equalised Odds criterion of Hardt et al. (2016), which requires a classifier to have equal true positive rates and equal false positive rates across groups. Rather than being introduced as a standalone theoretical framework, AOD is commonly discussed in the literature as a practical, single-number summary of how far a classifier departs from Equalised Odds.

Statistical Parity Difference (SPD) considers only disparities in positive prediction rates, and EOD focuses exclusively on differences in true positive rates. AOD, by contrast, incorporates both true positive rates and false positive rates.

Like EOD, AOD lies in the range [-1, 1]. A value of 0 indicates fairness, larger magnitudes (up to 1) indicate greater unfairness, and the sign follows the same unprivileged-minus-privileged convention.
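The sketch below computes AOD directly from the formula above. It assumes binary labels and a single privileged label, with all other rows pooled as unprivileged; the helper names are hypothetical. The toy data is chosen so that the TPR gap (+0.5) and the FPR gap (-0.5) cancel. This shows that an AOD of 0 does not by itself guarantee equal error rates, so it is safer to inspect both gaps separately.

```python
import numpy as np


def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR) for one group with binary labels."""
    tpr = np.mean(y_pred[y_true == 1] == 1)  # TP / (TP + FN)
    fpr = np.mean(y_pred[y_true == 0] == 1)  # FP / (FP + TN)
    return tpr, fpr


def average_odds_difference(y_true, y_pred, groups, privileged):
    """AOD = 0.5 * [(TPR_unpriv - TPR_priv) + (FPR_unpriv - FPR_priv)]."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    priv = groups == privileged
    tpr_p, fpr_p = tpr_fpr(y_true[priv], y_pred[priv])
    tpr_u, fpr_u = tpr_fpr(y_true[~priv], y_pred[~priv])
    tpr_gap, fpr_gap = tpr_u - tpr_p, fpr_u - fpr_p
    return 0.5 * (tpr_gap + fpr_gap), tpr_gap, fpr_gap


# Toy data: group "A" is privileged, group "B" unprivileged
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
groups = ["A"] * 4 + ["B"] * 4

aod, tpr_gap, fpr_gap = average_odds_difference(y_true, y_pred, groups, "A")
print(aod, tpr_gap, fpr_gap)  # 0.0 0.5 -0.5
```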

### Disparate Impact (DI)
$$
\text{DI} =
\frac{P(\hat{Y} = 1 \mid A = \text{unpriv})}
     {P(\hat{Y} = 1 \mid A = \text{priv})}
$$
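Disparate Impact (DI) is a fairness metric that compares how often the unprivileged and privileged groups receive positive predictions, expressed as a ratio of the two rates. A value of 1 indicates parity. Values below 1 mean the privileged group receives positive predictions more often, and values above 1 mean the unprivileged group is favoured. A widely used rule of thumb, the "four-fifths rule", treats DI values below 0.8 as evidence of adverse impact.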
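A minimal sketch of the DI ratio above follows, using NumPy and toy predictions. The function name is hypothetical and not the package's `calculate_DI`. Unlike EOD and AOD, DI needs only predictions and group labels, not the true outcomes.

```python
import numpy as np


def disparate_impact(y_pred, groups, privileged):
    """DI = P(y_hat = 1 | unprivileged) / P(y_hat = 1 | privileged)."""
    y_pred, groups = np.asarray(y_pred), np.asarray(groups)
    priv = groups == privileged
    rate_priv = np.mean(y_pred[priv] == 1)
    rate_unpriv = np.mean(y_pred[~priv] == 1)
    # A privileged positive rate of 0 makes the ratio undefined (NumPy returns inf or nan)
    return rate_unpriv / rate_priv


# Toy predictions: 3 of 4 privileged vs 1 of 4 unprivileged predicted positive
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["priv"] * 4 + ["unpriv"] * 4

di = disparate_impact(y_pred, groups, "priv")
print(round(di, 3), di < 0.8)  # 0.333 True -> below the four-fifths threshold
```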


## Code Demonstration

### Data Preprocessing and Training

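First, move up one level from the notebook's folder to the repository root so that the `fairness` package and the `data/` folder can be found. Then import the necessary packages.

```python
%cd ..
```

```text
/home/ubuntu/hpdm139/HPDM139_assignment
```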

```python
import itertools

import numpy as np
import pandas as pd

from fairness.data import load_heart_csv
from fairness.preprocess import (
    add_age_group,
    map_binary_column,
    apply_transforms,
    preprocess_tabular,
    make_train_test_split,
)
from fairness.groups import make_eval_df
from fairness.single_metrics import calculate_EOD, calculate_AOD, calculate_DI
```

```python
from pathlib import Path

# Resolve the repository root whether the notebook runs from examples/ or from the root
ROOT = Path.cwd().parent if Path.cwd().name == "examples" else Path.cwd()
DATA_PATH = ROOT / "data" / "heart.csv"

df_raw = load_heart_csv(DATA_PATH)
df_raw.head()
```


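|   | Age | Sex | ChestPainType | RestingBP | Cholesterol | FastingBS | RestingECG | MaxHR | ExerciseAngina | Oldpeak | ST_Slope | HeartDisease |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 40 | M | ATA | 140 | 289 | 0 | Normal | 172 | N | 0.0 | Up | 0 |
| 1 | 49 | F | NAP | 160 | 180 | 0 | Normal | 156 | N | 1.0 | Flat | 1 |
| 2 | 37 | M | ATA | 130 | 283 | 0 | ST | 98 | N | 0.0 | Up | 0 |
| 3 | 48 | F | ASY | 138 | 214 | 0 | Normal | 108 | Y | 1.5 | Flat | 1 |
| 4 | 54 | M | NAP | 150 | 195 | 0 | Normal | 122 | N | 0.0 | Up | 0 |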


```python
# Add a protected attribute for fairness analysis
df_fair = add_age_group(
    df_raw, age_col="Age", new_col="age_group", bins=(0, 55, 120), labels=("young", "older")
)

# Map binary/categorical encodings if needed (e.g. when Sex is stored as M/F)
if "Sex" in df_fair.columns and df_fair["Sex"].dtype == object:
    df_fair = map_binary_column(df_fair, col="Sex", mapping={"M": 1, "F": 0})

df_fair[["Age", "age_group", "Sex"]].head()
```

|   | Age | age_group | Sex |
|---|---|---|---|
| 0 | 40 | young | 1 |
| 1 | 49 | young | 0 |
| 2 | 37 | young | 1 |
| 3 | 48 | young | 0 |
| 4 | 54 | young | 1 |


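```python
# Alternative: chain the same steps with apply_transforms
# (add_age_group is called here with its default bins and labels)
df_fair2 = apply_transforms(
    df_raw,
    transforms=[
        lambda d: add_age_group(d, age_col="Age", new_col="age_group"),
        lambda d: map_binary_column(d, col="Sex", mapping={"M": 1, "F": 0}),
    ],
)

df_fair2[["Age", "age_group", "Sex"]].head()
```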

|   | Age | age_group | Sex |
|---|---|---|---|
| 0 | 40 | young | 1 |
| 1 | 49 | young | 0 |
| 2 | 37 | young | 1 |
| 3 | 48 | young | 0 |
| 4 | 54 | young | 1 |


```python
# Drop the age_group helper column and one-hot encode the categorical features
df_model = preprocess_tabular(df_fair, drop_cols=("age_group",))
df_model.head()
```

|   | Age | Sex | RestingBP | Cholesterol | FastingBS | MaxHR | Oldpeak | HeartDisease | ChestPainType_ATA | ChestPainType_NAP | ChestPainType_TA | RestingECG_Normal | RestingECG_ST | ExerciseAngina_Y | ST_Slope_Flat | ST_Slope_Up |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 40 | 1 | 140 | 289 | 0 | 172 | 0.0 | 0 | True | False | False | True | False | False | False | True |
| 1 | 49 | 0 | 160 | 180 | 0 | 156 | 1.0 | 1 | False | True | False | True | False | False | True | False |
| 2 | 37 | 1 | 130 | 283 | 0 | 98 | 0.0 | 0 | True | False | False | False | True | False | False | True |
| 3 | 48 | 0 | 138 | 214 | 0 | 108 | 1.5 | 1 | False | False | False | True | False | True | True | False |
| 4 | 54 | 1 | 150 | 195 | 0 | 122 | 0.0 | 0 | False | True | False | True | False | False | False | True |


```python
split = make_train_test_split(
    df_model,
    target_col="HeartDisease",
    test_size=0.3,
    random_state=42,
    stratify=True,
)

split.X_train.shape, split.X_test.shape, split.y_train.shape, split.y_test.shape
```

```text
((642, 15), (276, 15), (642,), (276,))
```

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

model = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000))
])

model.fit(split.X_train, split.y_train)
y_pred = model.predict(split.X_test)

print("\nClassification report:")
print(classification_report(split.y_test, y_pred))
```

```text
Classification report:
              precision    recall  f1-score   support

           0       0.90      0.84      0.87       123
           1       0.88      0.92      0.90       153

    accuracy                           0.88       276
   macro avg       0.89      0.88      0.88       276
weighted avg       0.88      0.88      0.88       276
```

```python
df_test = df_fair.loc[split.X_test.index]

eval_df = make_eval_df(
    df_test=df_test,
    protected=["Sex"],
    y_pred=y_pred,
    y_true=split.y_test.to_numpy(),
)

eval_df.head(5)
```

|     | subject_label | y_pred | y_true |
|---|---|---|---|
| 351 | Sex=1 | 1 | 1 |
| 596 | Sex=1 | 1 | 1 |
| 491 | Sex=1 | 1 | 1 |
| 794 | Sex=1 | 0 | 0 |
| 544 | Sex=0 | 0 | 0 |
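Because `eval_df` holds one row per test subject with `subject_label`, `y_pred` and `y_true`, the per-group rates that drive EOD, AOD and DI can be tabulated directly. The sketch below uses a hypothetical helper, `per_group_rates`, on a small toy frame with the same columns. Calling `per_group_rates(eval_df)` on the real data could help sanity-check the package's metric functions, assuming those functions follow the formulas defined earlier.

```python
import pandas as pd


def per_group_rates(eval_df):
    """Tabulate n, TPR, FPR and positive-prediction rate for each subject_label."""
    def rates(g):
        positives = g["y_true"] == 1
        return pd.Series({
            "n": len(g),
            "TPR": (g.loc[positives, "y_pred"] == 1).mean(),
            "FPR": (g.loc[~positives, "y_pred"] == 1).mean(),
            "positive_rate": (g["y_pred"] == 1).mean(),
        })

    # Select only the label columns so apply() does not receive the grouping column
    return eval_df.groupby("subject_label")[["y_pred", "y_true"]].apply(rates)


# Toy frame with the same columns as eval_df (values are illustrative only)
toy = pd.DataFrame({
    "subject_label": ["Sex=1", "Sex=1", "Sex=1", "Sex=0", "Sex=0", "Sex=0"],
    "y_pred": [1, 0, 1, 0, 1, 0],
    "y_true": [1, 1, 0, 0, 1, 0],
})

summary = per_group_rates(toy)
print(summary)
```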


```python
from fairness.adapters import unpack_eval_df, make_subject_labels_dict

subject_labels, predictions, true_statuses = unpack_eval_df(eval_df)
```

## ⭐ Demonstration for the three single metrics: EOD, AOD and DI

```python
import fairness.single_metrics as sm
```

### Equalised Opportunity Difference

```python
sm.calculate_EOD(
    y_test=true_statuses,
    y_pred=predictions,
    group_labels=subject_labels,
    privileged_label='Sex=0'
)
```

```text
0.023076923076923106
```

### Average Odds Difference

```python
sm.calculate_AOD(
    y_test=true_statuses,
    y_pred=predictions,
    group_labels=subject_labels,
    privileged_label='Sex=0'
)
```

```text
-0.04887938148807708
```

### Disparate Impact

```python
sm.calculate_DI(
    y_pred=predictions,
    group_labels=subject_labels,
    privileged_label='Sex=1'
)
```

```text
np.float64(0.3450772200772201)
```
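The DI call passes `privileged_label='Sex=1'`, whereas the EOD and AOD calls use `'Sex=0'`, so the three results do not share a reference group. Assume `calculate_DI` follows the two-group ratio defined earlier. Then swapping the reference group simply inverts DI, so the reported value can be re-expressed with `Sex=0` as privileged and checked against the four-fifths rule of thumb. The short arithmetic check below uses only the number printed above. The symmetric [0.8, 1.25] band is a common convention, not something this package is known to apply.

```python
# DI reported above by sm.calculate_DI with privileged_label='Sex=1'
di_sex1_privileged = 0.3450772200772201

# With two groups, swapping privileged and unprivileged inverts the ratio
di_sex0_privileged = 1 / di_sex1_privileged

# Four-fifths rule of thumb: DI outside [0.8, 1.25] is commonly flagged
flagged = not (0.8 <= di_sex1_privileged <= 1.25)

print(round(di_sex0_privileged, 2), flagged)  # 2.9 True
```

Under that reading, with `Sex=1` (male) as privileged, female patients receive positive predictions at roughly 35% of the male rate. DI compares prediction rates without conditioning on the true label, so part of this gap may reflect differing heart disease prevalence between the groups rather than model error alone. This is why DI is read alongside EOD and AOD.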