# Group Fairness Analysis - Demonstration

This notebook demonstrates how to perform comprehensive group fairness analysis using the fairness evaluation framework. It shows how to:

1. **Load and prepare your data** for group fairness evaluation
2. **Calculate group fairness metrics** across demographic attributes
3. **Visualize fairness disparities** using bar charts and radar plots
4. **Analyze specific metrics** by demographic groups
5. **Interpret results** to identify potential biases

**To run the code**: Replace the data loading sections below with your own dataset following the structure requirements described in the README.

## Group Fairness Metrics Covered:
- **Demographic Parity (DP)**: Equal positive prediction rates across groups; reported as the Demographic Parity Difference (DPD)
- **Equalized Odds (EO)**: Equal true positive and false positive rates across groups
- **Equal Opportunity (EOD)**: Equal true positive rates across groups; reported as the Equal Opportunity Difference
- **Error Rate Disparity Index (EDDI)**: Normalized error rate differences across groups
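
The first three definitions can be made concrete with a few lines of NumPy. The following is a minimal sketch, not the framework's implementation: the helper names `group_rates` and `fairness_gaps` are hypothetical, gaps are taken as the largest minus the smallest group rate, and the equalized odds gap is assumed to be the larger of the TPR and FPR gaps (one common convention; `src.group` may combine them differently).

```python
import numpy as np

def group_rates(y_true, y_pred, groups):
    """Return {group: (positive_rate, tpr, fpr)} for binary labels and predictions."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    rates = {}
    for g in np.unique(groups):
        in_group = groups == g
        yt, yp = y_true[in_group], y_pred[in_group]
        pos_rate = yp.mean()                                      # P(pred = 1 | group)
        tpr = yp[yt == 1].mean() if (yt == 1).any() else np.nan   # P(pred = 1 | y = 1, group)
        fpr = yp[yt == 0].mean() if (yt == 0).any() else np.nan   # P(pred = 1 | y = 0, group)
        rates[g] = (pos_rate, tpr, fpr)
    return rates

def fairness_gaps(y_true, y_pred, groups):
    """Max-minus-min gap across groups for each rate (assumed convention)."""
    r = np.array(list(group_rates(y_true, y_pred, groups).values()))
    gap = np.nanmax(r, axis=0) - np.nanmin(r, axis=0)
    return {
        "demographic_parity_diff": float(gap[0]),
        "equal_opportunity_diff": float(gap[1]),
        "equalized_odds_diff": float(max(gap[1], gap[2])),
    }

# Toy data: two groups of four samples each (illustrative only)
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

gaps = fairness_gaps(y_true, y_pred, groups)
print(gaps)
```

In this toy example group 1 receives positive predictions three times as often as group 0, so all three gaps are well above zero.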

## 1. Environment Setup

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import sys
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Add the parent directory to the Python path
sys.path.append(os.path.join(os.getcwd(), '..'))

In [None]:
# Import fairness analysis functions
from src.util import TestPredictions
from src.constants import ATTRIBUTES_REVERSE_MAPPINGS
from src.group import (
    calculate_and_visualize_fairness_metrics,
    visualize_metric_by_group
)

## 2. Data Loading and Preparation

**To use**: Replace this section with your own data loading code. You need:
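- `X_test`: DataFrame of features, including the sensitive attribute columns
- `y_test`: Binary ground truth labels (0/1)
- `y_pred`: Binary model predictions (0/1); if your model outputs probabilities, threshold them first
- `demographic_variables`: Dictionary mapping each attribute name to the integer group code of every test sample
- `demographic_mappings`: Dictionary mapping each attribute to a lookup from group code to readable group name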
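
To make the expected shapes concrete, here is a minimal sketch that builds synthetic stand-ins for these objects. Everything in it is made up for illustration: the group codes, the `age` column, and the 0.5 threshold are assumptions, and the real code-to-label encoding comes from `ATTRIBUTES_REVERSE_MAPPINGS`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Hypothetical feature table; the demographic column names mirror those used in this notebook
X_test = pd.DataFrame({
    "age": rng.integers(18, 90, size=n),
    "race_ethnicity": rng.integers(-1, 4, size=n),   # -1 marks an unknown value
    "sex": rng.integers(0, 2, size=n),
    "insurance_type": rng.integers(-1, 3, size=n),
})

y_test = rng.integers(0, 2, size=n)       # binary ground truth
y_prob = rng.random(n)                    # stand-in for model probabilities
y_pred = (y_prob >= 0.5).astype(int)      # threshold to binary predictions

# One integer-coded Series per demographic attribute, one entry per test sample
demographic_variables = {
    "Race_Ethnicity": X_test["race_ethnicity"].astype(int),
    "Sex": X_test["sex"].astype(int),
    "Insurance": X_test["insurance_type"].astype(int),
}

# Sanity checks worth running on real data as well
assert len(X_test) == len(y_test) == len(y_pred)
assert set(np.unique(y_pred)) <= {0, 1}
```

The two assertions at the end catch the most common mismatches: arrays of different lengths and predictions that were left as probabilities.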

In [None]:
# DEMO DATA LOADING - Replace with your own data loading code
# This section shows the expected data structure

# Example: Load your trained model predictions
# model_pred = TestPredictions.load('../output/delirium_xgb_model.pkl')
# y_test = model_pred.y_test      # Ground truth binary labels
# y_pred = model_pred.y_pred      # Model binary predictions
# X_test = model_pred.X_test      # Feature data with demographic attributes

print("For this demo, you would load your data here.")
print("Expected data structure:")
print("- y_test: Binary ground truth labels (0/1)")
print("- y_pred: Binary model predictions (0/1)")
print("- X_test: DataFrame with demographic attributes")

# Other ways to load data:
# UNCOMMENT AND MODIFY FOR YOUR DATA:
# y_test = np.load('your_true_labels.npy')     # Binary ground truth
# y_pred = np.load('your_predictions.npy')    # Binary predictions (use threshold on probabilities)
# X_test = pd.read_csv('your_features.csv')   # Must include demographic columns

In [None]:
# Create demographic mappings for your data
# MODIFY these mappings to match your data encoding

demographic_mappings = {
    'race_ethnicity': ATTRIBUTES_REVERSE_MAPPINGS['race_ethnicity'],
    'sex': ATTRIBUTES_REVERSE_MAPPINGS['sex'],
    'insurance_type': ATTRIBUTES_REVERSE_MAPPINGS['insurance_type']
}

print("Available demographic group mappings:")
for attr, mapping in demographic_mappings.items():
    print(f"\n{attr.upper()}:")
    for code, label in mapping.items():
        print(f"  {code}: {label}")

In [None]:
# Prepare demographic variables from your data
# UNCOMMENT AND MODIFY FOR YOUR DATA:
# demographic_variables = {
#     'Race_Ethnicity': X_test['race_ethnicity'].astype(int),
#     'Sex': X_test['sex'].astype(int),
#     'Insurance': X_test['insurance_type'].astype(int)
# }

print("Demographic variables would be extracted here.")
print("These define which demographic group each patient belongs to.")

## 3. Data Filtering and Preprocessing

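Remove samples whose demographic value is unknown (coded as `-1`) so that each analysis compares only known groups. Filtering is applied separately for each attribute, so the three analyses may use different subsets of the test set.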
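
The same masking step can be wrapped in a small helper. This is a sketch under stated assumptions: `drop_unknown` is a hypothetical name, not part of the framework, and it returns NumPy arrays, which assumes the downstream functions accept arrays as well as Series.

```python
import numpy as np
import pandas as pd

def drop_unknown(group_codes, y_true, y_pred, unknown_code=-1):
    """Keep only samples whose group code is known; returns positionally aligned arrays."""
    codes = np.asarray(group_codes)
    keep = codes != unknown_code
    return codes[keep], np.asarray(y_true)[keep], np.asarray(y_pred)[keep]

# Toy inputs (illustrative only)
codes = pd.Series([0, 1, -1, 2, -1, 1])
y_true = np.array([1, 0, 1, 0, 1, 1])
y_pred = np.array([1, 0, 0, 0, 1, 0])

codes_clean, y_true_clean, y_pred_clean = drop_unknown(codes, y_true, y_pred)
print(f"Kept {len(codes_clean)} of {len(codes)} samples")
```

Converting everything to NumPy arrays before masking makes the selection positional, which avoids surprises when `X_test` and `y_test` carry different pandas indexes.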

In [None]:
# Filter out samples with unknown demographic information
# UNCOMMENT FOR ACTUAL FILTERING:

# # Masks are converted to NumPy so that selection is positional,
# # even if X_test and y_test carry different pandas indexes.
# # Filter Race/Ethnicity (remove -1 values)
# mask_race = (demographic_variables['Race_Ethnicity'] != -1).to_numpy()
# race_ethnicity_clean = demographic_variables['Race_Ethnicity'][mask_race]
# y_test_race = y_test[mask_race]
# y_pred_race = y_pred[mask_race]
# 
# # Filter Sex (remove -1 values)
# mask_sex = (demographic_variables['Sex'] != -1).to_numpy()
# sex_clean = demographic_variables['Sex'][mask_sex]
# y_test_sex = y_test[mask_sex]
# y_pred_sex = y_pred[mask_sex]
# 
# # Filter Insurance (remove -1 values)
# mask_insurance = (demographic_variables['Insurance'] != -1).to_numpy()
# insurance_clean = demographic_variables['Insurance'][mask_insurance]
# y_test_insurance = y_test[mask_insurance]
# y_pred_insurance = y_pred[mask_insurance]

print("Data filtering would happen here.")
print("This removes samples with unknown (-1) demographic values for cleaner analysis.")

# Display filtering statistics
# print(f"Original dataset size: {len(y_test)}")
# print(f"Race/Ethnicity analysis: {len(y_test_race)} samples ({len(y_test_race)/len(y_test)*100:.1f}%)")
# print(f"Sex analysis: {len(y_test_sex)} samples ({len(y_test_sex)/len(y_test)*100:.1f}%)")
# print(f"Insurance analysis: {len(y_test_insurance)} samples ({len(y_test_insurance)/len(y_test)*100:.1f}%)")

## 4. Race/Ethnicity Fairness Analysis

Analyze fairness across racial and ethnic groups.

In [None]:
# Calculate and visualize fairness metrics for Race/Ethnicity
# UNCOMMENT FOR ACTUAL ANALYSIS:
# race_metrics, race_viz_metrics = calculate_and_visualize_fairness_metrics(
#     y_test_race, 
#     y_pred_race, 
#     race_ethnicity_clean,
#     demographic_mappings['race_ethnicity'],
#     title='Model Fairness Metrics by Race and Ethnicity',
#     show_plot="both",  # Shows both bar chart and radar chart
#     save_pdf=False     # Set to True to save plots
# )

print("Race/ethnicity fairness analysis would run here.")
print("This generates comprehensive fairness metrics and visualizations.")
print("Results include bar charts and radar plots showing disparity levels.")

In [None]:
# Visualize specific fairness metric by race/ethnicity groups
# UNCOMMENT FOR SPECIFIC METRIC VISUALIZATION:
# # Remove 'Unknown' group if present for cleaner visualization
# if 'Unknown or declined to state' in race_metrics[0]:
#     del race_metrics[0]['Unknown or declined to state']
#
# # Visualize Equalized Odds differences
# visualize_metric_by_group(
#     race_metrics[0], 
#     'Equalized Odds', 
#     'Equalized Odds',
#     title='Equalized Odds Difference by Race/Ethnicity',
#     figsize=(10, 6),
#     save_pdf=False
# )

print("Individual metric visualization would appear here.")
print("This shows detailed breakdown of a specific fairness metric across groups.")

## 5. Sex/Gender Fairness Analysis

Analyze fairness across sex/gender groups.

In [None]:
# Calculate and visualize fairness metrics for Sex
# UNCOMMENT FOR ACTUAL ANALYSIS:
# sex_metrics, sex_viz_metrics = calculate_and_visualize_fairness_metrics(
#     y_test_sex, 
#     y_pred_sex, 
#     sex_clean, 
#     demographic_mappings['sex'],
#     title='Model Fairness Metrics by Sex',
#     show_plot="both",
#     save_pdf=False
# )

print("Sex/gender fairness analysis would run here.")
print("This compares fairness metrics between male and female patients.")

## 6. Insurance Type Fairness Analysis

Analyze fairness across insurance/socioeconomic groups.

In [None]:
# Calculate and visualize fairness metrics for Insurance Type
# UNCOMMENT FOR ACTUAL ANALYSIS:
# insurance_metrics, insurance_viz_metrics = calculate_and_visualize_fairness_metrics(
#     y_test_insurance, 
#     y_pred_insurance, 
#     insurance_clean,
#     demographic_mappings['insurance_type'],
#     title='Model Fairness Metrics by Insurance Type',
#     show_plot="both",
#     save_pdf=False
# )

print("Insurance type fairness analysis would run here.")
print("This examines fairness across different insurance/socioeconomic groups.")

In [None]:
# Visualize specific metric for insurance groups
# UNCOMMENT FOR SPECIFIC METRIC VISUALIZATION:
# visualize_metric_by_group(
#     insurance_metrics[0], 
#     'EDDI',           # Error Rate Disparity Index
#     'Error Rate',     # Raw metric to display alongside
#     title='Error Rate Disparity Index (EDDI) by Insurance Type',
#     figsize=(10, 6),
#     save_pdf=False
# )

print("EDDI (Error Rate Disparity Index) visualization would appear here.")
print("EDDI measures normalized error rate differences across insurance groups.")

## 7. Results Interpretation Guide

### Understanding Group Fairness Metrics:

**Metric Interpretations (Lower values = Better fairness):**

1. **Demographic Parity Difference (DPD)**:
   - Measures difference in positive prediction rates between groups
   - **Ideal value**: 0 (equal positive rates across all groups)
   - **Threshold guidance**: <0.05 good, 0.05-0.10 moderate concern, >0.10 high concern

2. **Equal Opportunity Difference (EOD)**:
   - Measures difference in true positive rates (sensitivity) between groups
   - **Ideal value**: 0 (equal benefit for positive cases across groups)
   - **Clinical significance**: A value near 0 indicates that truly positive cases are detected at similar rates in every group

3. **Equalized Odds (EO)**:
   - Combines true positive rate and false positive rate differences
   - **Ideal value**: 0 (equal performance across groups)
   - **Comprehensive measure**: Considers both benefits and harms

4. **Error Rate Disparity Index (EDDI)**:
   - Normalized measure of error rate differences
   - **Range**: 0 to 1, where 0 = perfect fairness
   - **Advantage**: Accounts for baseline error rates
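
This notebook does not show how the framework computes EDDI, so the following is only a sketch of one formulation that matches the description above: each group's error rate is compared with the overall error rate, the deviation is divided by `max(overall, 1 - overall)`, and the normalized deviations are averaged over groups. The function name `eddi` and this exact formula are assumptions; check `src.group` for the definition actually used.

```python
import numpy as np

def eddi(y_true, y_pred, groups):
    """Mean normalized deviation of per-group error rates from the overall error rate."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    errors = (y_true != y_pred).astype(float)
    overall = errors.mean()
    norm = max(overall, 1.0 - overall)   # keeps every term within [0, 1]
    deviations = [
        abs(errors[groups == g].mean() - overall) / norm
        for g in np.unique(groups)
    ]
    return float(np.mean(deviations))

# Toy data: group 0 is classified perfectly, group 1 has a 50% error rate
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

value = eddi(y_true, y_pred, groups)
print(f"EDDI = {value:.3f}")
```

Under this formulation the index is 0 when every group has the same error rate, and no single term can exceed 1, which is consistent with the 0 to 1 range stated above.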

### Visualization Guide:

**Bar Charts**:
- **Green bars**: Low disparity (good fairness)
- **Yellow bars**: Moderate disparity (monitor)
- **Orange/Red bars**: High disparity (requires attention)

**Radar Charts**:
- **Center (0)**: Perfect fairness
- **Outer rings**: Increasing disparity levels
- **Shape**: Balanced polygon indicates consistent fairness across metrics
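
The color bands can be reproduced when summarizing results outside the plots. The sketch below assumes the bar colors follow the DPD threshold guidance given earlier (below 0.05, 0.05 to 0.10, above 0.10); the cutoffs used by the plotting functions themselves are not documented here, and `disparity_level` is a hypothetical helper.

```python
def disparity_level(value, moderate=0.05, high=0.10):
    """Classify a disparity value by magnitude; cutoffs are assumed, not the framework's."""
    magnitude = abs(value)
    if magnitude < moderate:
        return "low", "green"
    if magnitude <= high:
        return "moderate", "yellow"
    return "high", "red"

# Illustrative values only, not results from any model
example = {
    "Demographic Parity": 0.03,
    "Equal Opportunity": 0.08,
    "Equalized Odds": 0.14,
}

for name, value in example.items():
    level, color = disparity_level(value)
    print(f"{name}: {value:.2f} -> {level} ({color})")
```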