# Individual Fairness Analysis - Demonstration

This notebook demonstrates how to perform comprehensive individual fairness analysis using the fairness evaluation framework. It shows how to:

1. **Load and prepare your data** for fairness analysis
2. **Calculate distance-based fairness metrics** across different feature sets
3. **Analyze fairness by demographic groups** with heatmap visualizations
4. **Generate Mean Prediction Probability Difference (MPPD)** plots

**To run the code**: Replace the data-loading section below with your own dataset, following the structure requirements described in the README, then uncomment the analysis and plotting cells (they are commented out in this demo).

## 1. Environment Setup

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import sys
import os
import pickle
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Add the parent directory to the Python path
sys.path.append(os.path.join(os.getcwd(), '..'))

In [None]:
# Import fairness analysis functions
from src.util import TestPredictions
from src.constants import ATTRIBUTES_REVERSE_MAPPINGS
from src.individual import (
    calculate_distance_based_fairness,
    cosine_distance,
    analyze_individual_fairness_by_group,
    filter_fairness_results,
    plot_fairness_heatmap,
    plot_fairness_comparison,
    plot_multiple_group_comparison_proba_diff,
    save_fairness_results,
    load_fairness_results
)

## 2. Data Loading and Preparation

**To use**: Replace this section with your own data loading code. You need:
- `X_test`: DataFrame with features including sensitive attributes
- `y_test`: Binary ground truth labels  
- `y_pred_proba`: Model probability predictions
- `demographic_mappings`: Mappings from codes to group names

In [None]:
# DEMO DATA LOADING - Replace with your own data loading code
# This section shows the expected data structure

# Example: Load your trained model predictions
# delirium_pred = TestPredictions.load('../output/delirium_xgb_model.pkl')
# X_test = delirium_pred.X_test
# y_test = delirium_pred.y_test  
# y_pred_proba = delirium_pred.y_probs

# For demonstration purposes, print the expected data structure:
print("For this demo, you would load your data here.")
print("Expected data structure:")
print("- X_test: DataFrame with features + sensitive attributes")
print("- y_test: Binary labels (0/1)")
print("- y_pred_proba: Probability predictions (0-1)")

# Other ways to load data:
# UNCOMMENT AND MODIFY FOR YOUR DATA:
# X_test = pd.read_csv('your_features.csv')  # Must include demographic columns
# y_test = np.load('your_labels.npy')        # Binary outcomes
# y_pred_proba = np.load('your_preds.npy')  # Model probabilities
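
To exercise the notebook end to end before plugging in real data, a minimal sketch like the one below generates random inputs with the expected shapes. The helper `make_demo_data`, the clinical column names (`age`, `heart_rate`, `lab_value`) and the integer codes for `race_ethnicity`, `sex` and `insurance_type` are hypothetical placeholders. The codes must match whatever encoding your `demographic_mappings` use, and the framework's `feature_set='all_clinical'` option may expect specific clinical columns that this toy frame does not contain.

```python
import numpy as np
import pandas as pd

def make_demo_data(n=500, seed=42):
    """Generate toy inputs with the structure described above.

    All column names and integer codes are hypothetical placeholders.
    """
    rng = np.random.default_rng(seed)
    X_test = pd.DataFrame({
        # Hypothetical clinical features
        'age': rng.integers(18, 90, size=n),
        'heart_rate': rng.normal(80, 12, size=n),
        'lab_value': rng.normal(1.0, 0.3, size=n),
        # Hypothetical integer-coded sensitive attributes
        'race_ethnicity': rng.integers(0, 4, size=n),
        'sex': rng.integers(0, 2, size=n),
        'insurance_type': rng.integers(0, 3, size=n),
    })
    y_test = rng.integers(0, 2, size=n)           # binary labels (0/1)
    y_pred_proba = rng.uniform(0.0, 1.0, size=n)  # probabilities in [0, 1]
    return X_test, y_test, y_pred_proba

X_test, y_test, y_pred_proba = make_demo_data()
print(X_test.shape, y_test.shape, y_pred_proba.shape)
```

Random predictions carry no signal, so any fairness numbers computed on this toy data are meaningless; it only checks that the pipeline runs.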

In [None]:
# Create demographic mappings for your data
# MODIFY these mappings to match your data encoding

demographic_mappings = {
    'race_ethnicity': ATTRIBUTES_REVERSE_MAPPINGS['race_ethnicity'],
    'sex': ATTRIBUTES_REVERSE_MAPPINGS['sex'],
    'insurance_type': ATTRIBUTES_REVERSE_MAPPINGS['insurance_type']
}

print("Demographic group mappings:")
for attr, mapping in demographic_mappings.items():
    print(f"{attr}: {mapping}")
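
If your data do not use the encoding in `ATTRIBUTES_REVERSE_MAPPINGS`, you can write the mappings by hand. The sketch below assumes each entry maps an encoded value to a group label; apart from the labels this notebook uses later (`'White'`, `'Female'`, `'Private'`, `'Unknown'`, `'Unknown or declined to state'`), the codes and labels are hypothetical. The helper `find_unmapped_codes` (also hypothetical) flags values present in the data that have no label.

```python
import pandas as pd

# Hypothetical hand-written mappings: {attribute column: {code: group label}}.
# Replace codes and labels with the encoding used in your X_test.
custom_mappings = {
    'race_ethnicity': {0: 'White', 1: 'Black', 2: 'Hispanic', 3: 'Asian',
                       4: 'Unknown or declined to state'},
    'sex': {0: 'Female', 1: 'Male'},
    'insurance_type': {0: 'Private', 1: 'Medicare', 2: 'Medicaid', 3: 'Unknown'},
}

def find_unmapped_codes(X, mappings):
    """Return {attribute: sorted codes} for values in X that have no label."""
    missing = {}
    for attr, mapping in mappings.items():
        # .tolist() converts NumPy scalars to plain Python values
        unmapped = set(X[attr].unique().tolist()) - set(mapping)
        if unmapped:
            missing[attr] = sorted(unmapped)
    return missing

# Toy frame for illustration: code 9 in 'sex' has no label
toy = pd.DataFrame({'race_ethnicity': [0, 1, 4],
                    'sex': [0, 1, 9],
                    'insurance_type': [0, 2, 3]})
print(find_unmapped_codes(toy, custom_mappings))  # {'sex': [9]}
```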

## 3. Demographic Group Fairness Analysis

This section runs the individual fairness analysis separately for each sensitive attribute (race/ethnicity, sex and insurance type); the visualizations follow in the next sections.

In [None]:
# Function to analyze fairness across all demographic attributes
def analyze_fairness_across_demographics(
    X_test,
    y_pred_proba,
    feature_set='all_clinical',
    distance_threshold=0.01,
    max_samples=10000,
    attribute_mappings=None
):
    """Run individual fairness analysis once per sensitive attribute.

    Returns a dict mapping each attribute name (a column of X_test) to the
    result of analyze_individual_fairness_by_group for that attribute.
    """
    if attribute_mappings is None:
        raise ValueError("attribute_mappings is required (e.g. demographic_mappings)")

    results = {}
    # Each key of attribute_mappings is a sensitive-attribute column in X_test
    for demo_key in attribute_mappings:
        print(f"\nAnalyzing individual fairness for {demo_key}...")

        results[demo_key] = analyze_individual_fairness_by_group(
            X_test=X_test,
            y_pred_proba=y_pred_proba,
            group_column=demo_key,
            feature_set=feature_set,
            distance_threshold=distance_threshold,
            max_samples=max_samples,
            attribute_mappings=attribute_mappings,
            process_specific_group=None  # compare all groups, not one reference
        )

        print(f"Completed analysis for {demo_key}")

    return results

print("Demographic group analysis function defined.")
print("It analyzes fairness separately for each sensitive attribute.")

In [None]:
# Set random seed for reproducibility
np.random.seed(42)

# Run demographic analysis
# UNCOMMENT FOR ACTUAL ANALYSIS:
# results = analyze_fairness_across_demographics(
#     X_test=X_test,
#     y_pred_proba=y_pred_proba,
#     feature_set='all_clinical',
#     distance_threshold=0.01,
#     max_samples=10000,
#     attribute_mappings=demographic_mappings
# )

print("Demographic group analysis would run here.")
print("This analyzes fairness within and across different demographic groups.")

## 4. Fairness Heatmap Visualizations

Create heatmap visualizations showing prediction differences between demographic groups.

In [None]:
# Filter out unknown groups and extract results by demographic attribute
# UNCOMMENT FOR ACTUAL FILTERING:
# race_result_df = filter_fairness_results(
#     results['race_ethnicity'], 
#     exclude_groups=['Unknown or declined to state']
# )
# 
# sex_result_df = results['sex']
# 
# insurance_result_df = filter_fairness_results(
#     results['insurance_type'], 
#     exclude_groups=['Unknown']
# )

print("Results filtering would happen here.")
print("This removes the excluded categories (here, unknown or undisclosed groups) before plotting.")
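
The internal format of the results returned by `analyze_individual_fairness_by_group` is not shown in this notebook, so the sketch below assumes a hypothetical long-format DataFrame with one row per group pair (`group_a`, `group_b`) and a metric column. It illustrates the kind of exclusion `filter_fairness_results(..., exclude_groups=...)` performs, not its actual implementation; `exclude_groups` here is a hypothetical helper, and the metric values are illustrative, not results.

```python
import pandas as pd

def exclude_groups(df, groups, cols=('group_a', 'group_b')):
    """Drop rows where any group column holds a label in `groups` (hypothetical schema)."""
    mask = pd.Series(False, index=df.index)
    for col in cols:
        mask |= df[col].isin(groups)
    return df.loc[~mask].reset_index(drop=True)

# Illustrative pairwise table; values are made up for the example
pairs = pd.DataFrame({
    'group_a': ['White', 'White', 'Black', 'Unknown or declined to state'],
    'group_b': ['Black', 'Unknown or declined to state', 'Black', 'White'],
    'mean_proba_diff': [0.05, 0.09, 0.03, 0.08],
})
filtered = exclude_groups(pairs, ['Unknown or declined to state'])
print(filtered)
```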

In [None]:
# Create fairness heatmap for race/ethnicity
# UNCOMMENT FOR ACTUAL PLOTTING:
# plot_fairness_heatmap(
#     race_result_df,
#     title="Model Fairness Analysis - Race/Ethnicity Groups",
#     figsize=(10, 8),
#     annot=True,
#     vmin=0.03,
#     vmax=0.125,
#     savefig=False
# )

print("Race/ethnicity fairness heatmap would appear here.")
print("Darker colors indicate larger prediction differences between groups.")

## 5. Mean Prediction Probability Difference (MPPD) Analysis

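MPPD measures the mean difference in predicted probabilities between clinically similar patients (patients whose clinical features lie within `distance_threshold` of each other) from different demographic groups. Below, each attribute is compared against a single reference group.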
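
The idea can be sketched as follows: pair each reference-group patient with every comparison-group patient whose clinical feature vector lies within a cosine-distance threshold, then average the absolute difference in predicted probabilities over those pairs. This is a minimal illustration assuming cosine distance (the framework imports a `cosine_distance` function) and a threshold like the `distance_threshold=0.01` used in this notebook; the helper names are hypothetical, the absolute (unsigned) difference is an assumption, and the framework's actual pairing rules, feature scaling and sampling (`max_samples`) may differ.

```python
import numpy as np

def cosine_distance_matrix(A, B):
    """Pairwise cosine distance (1 - cosine similarity) between rows of A and B.

    Assumes no all-zero rows (their norm would be zero).
    """
    A_unit = A / np.linalg.norm(A, axis=1, keepdims=True)
    B_unit = B / np.linalg.norm(B, axis=1, keepdims=True)
    return 1.0 - A_unit @ B_unit.T

def mppd(X_ref, p_ref, X_cmp, p_cmp, threshold=0.01):
    """Mean |p_ref - p_cmp| over cross-group pairs within `threshold` distance.

    Returns (mppd_value, n_pairs); mppd_value is NaN if no pair qualifies.
    """
    dist = cosine_distance_matrix(X_ref, X_cmp)
    i, j = np.nonzero(dist <= threshold)  # indices of "clinically similar" pairs
    if i.size == 0:
        return float('nan'), 0
    diffs = np.abs(p_ref[i] - p_cmp[j])
    return float(diffs.mean()), int(i.size)

# Toy example: only reference patient 0 and comparison patient 0 point in
# the same direction, so MPPD is |0.2 - 0.3|, about 0.1, from one pair.
X_ref = np.array([[1.0, 0.0], [0.0, 1.0]])
p_ref = np.array([0.2, 0.7])
X_cmp = np.array([[2.0, 0.0], [1.0, 1.0]])
p_cmp = np.array([0.3, 0.4])
print(mppd(X_ref, p_ref, X_cmp, p_cmp, threshold=0.01))
```

Cosine distance ignores vector length (rows `[1, 0]` and `[2, 0]` count as identical), so in practice features are usually standardized first; whether the framework does this is not shown here.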

In [None]:
# Analyze specific demographic groups for MPPD
# UNCOMMENT FOR ACTUAL ANALYSIS:
# # Focus on specific reference groups for clearer analysis
# race_mppd_df = analyze_individual_fairness_by_group(
#     X_test,
#     y_pred_proba,
#     feature_set='all_clinical',
#     distance_threshold=0.01,
#     max_samples=10000,
#     group_column='race_ethnicity',
#     attribute_mappings=demographic_mappings,
#     process_specific_group='White'  # Use White as reference group
# )
#
# sex_mppd_df = analyze_individual_fairness_by_group(
#     X_test,
#     y_pred_proba,
#     feature_set='all_clinical',
#     distance_threshold=0.01,
#     max_samples=10000,
#     group_column='sex',
#     attribute_mappings=demographic_mappings,
#     process_specific_group='Female'  # Use Female as reference group
# )
#
# insurance_mppd_df = analyze_individual_fairness_by_group(
#     X_test,
#     y_pred_proba,
#     feature_set='all_clinical',
#     distance_threshold=0.01,
#     max_samples=10000,
#     group_column='insurance_type',
#     attribute_mappings=demographic_mappings,
#     process_specific_group='Private'  # Use Private as reference group
# )
#
# # Filter unknown groups
# race_mppd_df = filter_fairness_results(race_mppd_df, exclude_groups=['Unknown or declined to state'])
# insurance_mppd_df = filter_fairness_results(insurance_mppd_df, exclude_groups=['Unknown'])

print("MPPD analysis would run here.")
print("Each attribute is compared against one reference group (White, Female, Private) for cleaner comparisons.")

In [None]:
# Create MPPD comparison plot
# UNCOMMENT FOR ACTUAL PLOTTING:
# plot_multiple_group_comparison_proba_diff(
#     [race_mppd_df, sex_mppd_df, insurance_mppd_df],
#     sensitive_attributes=['Race/Ethnicity', 'Sex', 'Insurance Type'],
#     title='Mean Prediction Probability Differences of Clinically Similar Patients\nby Sensitive Attribute Groups',
#     figsize=(16, 10),
#     save_path=None  # Set to file path to save
# )

print("MPPD comparison plot would appear here.")
print("This shows prediction differences between clinically similar patients from different groups.")
print("Larger differences indicate potential bias in the model.")

## 6. Results Interpretation Guide

### Understanding Individual Fairness Metrics:

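**Fairness Heatmaps:**
- **Darker colors = larger prediction differences** between groups
- Diagonal cells show prediction differences among clinically similar patients within the same group (a within-group baseline)
- Off-diagonal cells show prediction differences between clinically similar patients from two different groups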
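
To make the diagonal/off-diagonal reading concrete, the sketch below fills a group-by-group matrix in which cell (a, b) is the mean absolute prediction difference between similar patients from groups a and b, with diagonal cells pairing distinct patients within one group. It uses the same cosine-distance-and-threshold pairing idea as the MPPD section; `pairwise_group_matrix` is a hypothetical helper, and this is an assumption about how such a heatmap could be built, not the framework's `plot_fairness_heatmap` implementation. The resulting DataFrame could be passed to `seaborn.heatmap`.

```python
import numpy as np
import pandas as pd

def _unit_rows(X):
    # Normalize rows to unit length (assumes no all-zero rows)
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def pairwise_group_matrix(X, proba, groups, threshold=0.01):
    """Group x group matrix of mean |p_i - p_j| over similar pairs (NaN if none)."""
    X = np.asarray(X, dtype=float)
    proba = np.asarray(proba, dtype=float)
    groups = np.asarray(groups)
    labels = sorted(set(groups.tolist()))
    out = pd.DataFrame(np.nan, index=labels, columns=labels)
    for a in labels:
        ia = np.flatnonzero(groups == a)
        for b in labels:
            ib = np.flatnonzero(groups == b)
            dist = 1.0 - _unit_rows(X[ia]) @ _unit_rows(X[ib]).T
            close = dist <= threshold
            if a == b:
                np.fill_diagonal(close, False)  # skip pairing a patient with itself
            r, c = np.nonzero(close)
            if r.size:
                out.loc[a, b] = np.abs(proba[ia[r]] - proba[ib[c]]).mean()
    return out

# Toy data: A-A is about 0.2, A-B and B-A about 0.1, B-B is NaN (no similar pair)
X = np.array([[1.0, 0.0], [2.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
proba = np.array([0.1, 0.3, 0.2, 0.9])
groups = np.array(['A', 'A', 'B', 'B'])
m = pairwise_group_matrix(X, proba, groups)
print(m.round(3))
```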

**MPPD (Mean Prediction Probability Difference):**
- Measures prediction differences between clinically similar patients from different demographic groups
- **Higher MPPD values indicate potential bias**
- The Δ (delta) values show the range of prediction differences within each sensitive attribute
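
Assuming Δ is the spread between the largest and smallest MPPD among the groups of one sensitive attribute (the text above describes it as a range), it could be computed as below. The helper `mppd_range` is hypothetical, and the per-group values are illustrative, not results.

```python
import math

def mppd_range(mppd_by_group):
    """Assumed definition: Δ = max - min of per-group MPPD values, ignoring NaN."""
    values = [v for v in mppd_by_group.values() if not math.isnan(v)]
    if not values:
        return float('nan')
    return max(values) - min(values)

# Illustrative (not real) MPPD values of each group relative to a reference group
example = {'Black': 0.06, 'Hispanic': 0.04, 'Asian': 0.09}
print(round(mppd_range(example), 3))  # 0.05
```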

### Next Steps for Your Analysis:

1. **Replace the demo data** with your trained model's predictions
2. **Update demographic mappings** to match your data encoding 
3. **Run the full analysis** by uncommenting the analysis code blocks
4. **Interpret results** in the context of your specific use case
5. **Consider model improvements** if significant fairness issues are detected

## 7. Save Results (Optional)

Save your analysis results for future reference.

In [None]:
# Save fairness analysis results
# UNCOMMENT TO SAVE RESULTS:
# save_fairness_results(
#     [race_mppd_df, sex_mppd_df, insurance_mppd_df], 
#     suffix='_your_model_name'
# )

print("Results saving would happen here.")
print("This preserves your analysis for future comparison and reporting.")
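
The on-disk format used by `save_fairness_results` is not shown here. If you want a plain fallback, `pickle` (imported at the top of the notebook) can round-trip a dict of result DataFrames; the helper names and the file name below are hypothetical examples. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile

import pandas as pd

def save_results_pickle(results, path):
    """Serialize a dict of result DataFrames to a single pickle file."""
    with open(path, 'wb') as f:
        pickle.dump(results, f)

def load_results_pickle(path):
    """Load results written by save_results_pickle (trusted files only)."""
    with open(path, 'rb') as f:
        return pickle.load(f)

# Round-trip a toy result dict through a temporary directory
toy_results = {'sex': pd.DataFrame({'group': ['Female', 'Male'], 'mppd': [0.0, 0.02]})}
path = os.path.join(tempfile.mkdtemp(), 'fairness_results_your_model_name.pkl')
save_results_pickle(toy_results, path)
restored = load_results_pickle(path)
print(restored['sex'])
```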