# X6: Ethics & Bias Detection - Building Fair and Responsible AI

Machine learning models can perpetuate and amplify societal biases, leading to unfair outcomes that harm individuals and communities. In 2025, building ethical AI is not optional: it is a legal, moral, and business imperative.

This notebook teaches you how to detect, measure, and mitigate bias in machine learning systems, preparing you to build fair and responsible AI for production deployment.

## Why Ethics and Fairness Matter

### Real-World Failures

**COMPAS Recidivism Algorithm (2016)**
- Used to predict criminal reoffending
- A 2016 ProPublica analysis found it biased against Black defendants
- Among defendants who did not reoffend, Black defendants were labeled high risk at nearly twice the rate of white defendants
- Impact: Biased decisions affected real people's freedom

**Amazon Hiring Algorithm (2018)**
- Trained on historical hiring data (mostly men)
- Learned to penalize resumes containing words like "women's"
- Discriminated against female candidates
- Amazon scrapped the system

**Facial Recognition Systems**
- Higher error rates for people of color, especially women
- Led to wrongful arrests
- Many cities banned police use of facial recognition

**Google Photos (2015)**
- Tagged Black people as "gorillas"
- Massive reputational damage
- Revealed training data bias

### Legal and Regulatory Requirements (2025)

**EU AI Act**
- High-risk AI systems must undergo bias testing
- Fines up to €35 million or 7% of global annual turnover for the most serious violations
- Mandatory fairness assessments

**US Equal Employment Opportunity Laws**
- Algorithmic hiring tools must not discriminate
- Employers liable for biased algorithms

**Fair Housing Act, Fair Lending Laws**
- ML models for housing, credit must be demonstrably fair
- Regular audits required

### Business Case

- **Avoid lawsuits**: Discrimination lawsuits are expensive
- **Reputation**: Biased AI causes lasting brand damage
- **Market access**: Unfair models excluded from regulated markets
- **Better decisions**: Fair models often perform better
- **Talent**: Engineers increasingly refuse to work on unethical AI

## Table of Contents

1. [Understanding Bias in ML](#understanding-bias)
2. [Protected Attributes and Fairness](#protected-attributes)
3. [Fairness Metrics](#fairness-metrics)
   - Demographic Parity
   - Equalized Odds
   - Equal Opportunity
   - Predictive Parity
4. [Detecting Bias in Data](#detecting-bias-data)
5. [Detecting Bias in Models](#detecting-bias-models)
6. [Bias Mitigation Strategies](#mitigation)
   - Pre-processing (data)
   - In-processing (training)
   - Post-processing (predictions)
7. [Real-World Case Study](#case-study)
8. [Best Practices and Frameworks](#best-practices)
9. [Ethical Decision-Making](#ethical-decisions)

## Setup and Installation

In [None]:
# Install fairness libraries
import sys
import subprocess

# Install fairlearn for fairness metrics and mitigation
try:
    import fairlearn
    print(f'✅ fairlearn version {fairlearn.__version__} found')
except ImportError:
    print('Installing fairlearn...')
    subprocess.check_call([sys.executable, "-m", "pip", "install", "fairlearn"])
    import fairlearn
    print(f'✅ fairlearn version {fairlearn.__version__} installed')

# Install aif360 for additional bias detection and mitigation
try:
    import aif360
    print('✅ aif360 found')
except ImportError:
    print('Installing aif360...')
    subprocess.check_call([sys.executable, "-m", "pip", "install", "aif360"])
    import aif360
    print('✅ aif360 installed')

In [None]:
# Import core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Fairness libraries
from fairlearn.metrics import (
    MetricFrame,
    demographic_parity_difference,
    demographic_parity_ratio,
    equalized_odds_difference,
    equalized_odds_ratio,
    selection_rate
)
from fairlearn.reductions import ExponentiatedGradient, DemographicParity, EqualizedOdds
from fairlearn.postprocessing import ThresholdOptimizer

# Visualization
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
np.random.seed(42)

print('✅ All libraries loaded successfully!')

<a name="understanding-bias"></a>
## 1. Understanding Bias in ML

### Sources of Bias

**Historical Bias**
- Data reflects historical discrimination
- Example: Hiring data shows mostly men in leadership → model learns to prefer men
- **Not fixable by better algorithms alone** - requires societal awareness

**Representation Bias**
- Training data doesn't represent all groups equally
- Example: Facial recognition trained mostly on white faces → fails on other races
- **Fix**: Collect diverse, representative data

**Measurement Bias**
- Features measured differently for different groups
- Example: Credit scores systematically lower for certain neighborhoods
- **Fix**: Audit measurement processes, use alternative features

**Aggregation Bias**
- One model for all groups when groups behave differently
- Example: Medical diagnostic model trained on adults applied to children
- **Fix**: Consider group-specific models or features

**Evaluation Bias**
- Test data doesn't represent deployment population
- Example: Testing fraud detection only on one demographic
- **Fix**: Stratified evaluation across all groups
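
Representation bias and evaluation bias can both be checked the same way: compare each group's share of a dataset against a reference population. The sketch below is a minimal pure-Python illustration. The function name `representation_gaps`, the reference shares, and the 0.05 tolerance are hypothetical choices made for this example, not values from any standard.

```python
from collections import Counter


def representation_gaps(groups, reference_shares):
    """Return each group's share in `groups` minus its reference share."""
    counts = Counter(groups)
    total = len(groups)
    return {
        name: counts.get(name, 0) / total - expected
        for name, expected in reference_shares.items()
    }


# Hypothetical test set: 80 samples from group A, 20 from group B
test_groups = ['A'] * 80 + ['B'] * 20
# Hypothetical deployment population: 60% A, 40% B
gaps = representation_gaps(test_groups, {'A': 0.6, 'B': 0.4})

for name, gap in gaps.items():
    # Assumed tolerance: flag groups more than 5 points below their reference share
    flag = 'under-represented' if gap < -0.05 else 'ok'
    print(f'{name}: gap {gap:+.2f} ({flag})')
```

A negative gap means the group is scarcer in the dataset than in the population the model will serve, so metrics computed on that dataset say less about that group.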

### Types of Harm

**Allocation Harm**
- System allocates resources or opportunities unfairly
- Examples: Loan denials, job rejections, healthcare access

**Quality-of-Service Harm**
- System works better for some groups than others
- Examples: Speech recognition, facial recognition, translation

**Stereotyping Harm**
- System reinforces negative stereotypes
- Examples: Gender-biased word associations, racist image tagging

**Denigration Harm**
- System actively insults or demeans groups
- Examples: Offensive auto-completions, hate speech generation

**Representation Harm**
- System over- or under-represents certain groups
- Examples: Search results, recommendation systems

<a name="protected-attributes"></a>
## 2. Protected Attributes and Fairness

### Protected Attributes (Sensitive Features)

Attributes that should not be used to discriminate:
- **Race / Ethnicity**
- **Gender / Sex**
- **Age**
- **Religion**
- **Disability status**
- **Sexual orientation**
- **National origin**

### The "Fairness Through Unawareness" Myth

**Myth**: "If we don't include race/gender in the model, it will be fair."

**Reality**: This doesn't work because:
1. **Proxy variables**: Other features correlate with protected attributes
   - ZIP code correlates with race
   - Name correlates with gender and ethnicity
   - Alma mater correlates with socioeconomic status

2. **Historical bias in labels**: Even if you exclude protected attributes, historical discrimination is baked into the labels

**Correct approach**: 
- Measure fairness metrics using protected attributes
- Don't necessarily exclude them from training (depends on context)
- Apply bias mitigation techniques
- Regular fairness audits
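
A quick first check for proxy variables is to correlate each candidate feature with the protected attribute. The sketch below uses synthetic data; the feature names and the way `neighborhood_index` is generated are assumptions made purely for illustration. Correlation only catches linear, single-feature proxies, so treat it as a screening step rather than a full audit.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic data (all values are illustrative assumptions)
group = rng.integers(0, 2, size=n)                 # protected attribute, 0 or 1
neighborhood_index = group + rng.normal(0, 1, n)   # proxy: shifts with group
years_employed = rng.exponential(5, n)             # generated independently of group

features = {
    'neighborhood_index': neighborhood_index,
    'years_employed': years_employed,
}

# Correlation of each candidate feature with the protected attribute
proxy_strength = {
    name: float(np.corrcoef(values, group)[0, 1])
    for name, values in features.items()
}

for name, corr in proxy_strength.items():
    print(f'{name}: correlation with group = {corr:+.3f}')
```

A stronger variant of the same idea is to train a classifier that predicts the protected attribute from all features together: if it does much better than chance, the features jointly encode the attribute even when no single correlation stands out.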

## Create Synthetic Dataset for Demonstration

We'll create a synthetic loan approval dataset that exhibits realistic bias patterns.

In [None]:
# Create synthetic biased dataset for loan approval
np.random.seed(42)
n_samples = 5000

# Protected attribute: gender
gender = np.random.choice(['Male', 'Female'], size=n_samples, p=[0.55, 0.45])

# Features that correlate with loan approval
# Income (with gender bias - women systematically paid less historically)
# Draw a full-length sample for each group, then pick per row by gender
is_male = gender == 'Male'
income = np.where(
    is_male,
    np.random.normal(75000, 25000, size=n_samples),
    np.random.normal(65000, 22000, size=n_samples),
)
income = np.maximum(income, 20000)  # Minimum income

# Credit score (with slight gender bias)
credit_score = np.where(
    is_male,
    np.random.normal(720, 60, size=n_samples),
    np.random.normal(710, 65, size=n_samples),
)
credit_score = np.clip(credit_score, 300, 850)

# Loan amount requested
loan_amount = np.random.normal(250000, 100000, size=n_samples)
loan_amount = np.maximum(loan_amount, 50000)

# Years employed
years_employed = np.random.exponential(5, size=n_samples)
years_employed = np.clip(years_employed, 0, 40)

# Debt-to-income ratio
debt_to_income = np.random.beta(2, 5, size=n_samples) * 0.6

# Loan approval (with historical bias)
# Base probability from legitimate factors
base_prob = (
    0.3 * (credit_score - 300) / 550 +
    0.25 * (income - 20000) / 180000 +
    0.15 * (1 - debt_to_income / 0.6) +
    0.15 * np.minimum(years_employed / 10, 1) +
    0.15 * (1 - (loan_amount - 50000) / 500000)
)

# Add gender bias (historical discrimination)
# Women have systematically lower approval rates even with same qualifications
gender_bias = np.where(gender == 'Male', 0.1, -0.1)
approval_prob = np.clip(base_prob + gender_bias, 0, 1)

# Generate approvals
approved = (np.random.random(n_samples) < approval_prob).astype(int)

# Create DataFrame
df = pd.DataFrame({
    'gender': gender,
    'income': income,
    'credit_score': credit_score,
    'loan_amount': loan_amount,
    'years_employed': years_employed,
    'debt_to_income': debt_to_income,
    'approved': approved
})

print('Synthetic Loan Approval Dataset Created')
print(f'Total samples: {len(df):,}')
print(f'\nGender distribution:')
print(df['gender'].value_counts())
print(f'\nOverall approval rate: {df["approved"].mean():.2%}')
print(f'\nApproval rate by gender:')
print(df.groupby('gender')['approved'].mean())
print('\n⚠️ Notice the approval rate difference - this is the bias we need to detect and address!')

df.head()

In [None]:
# Visualize bias in data
fig, axes = plt.subplots(2, 3, figsize=(18, 10))

# Income distribution by gender
df.boxplot(column='income', by='gender', ax=axes[0, 0])
axes[0, 0].set_title('Income Distribution by Gender')
axes[0, 0].set_ylabel('Income ($)')

# Credit score by gender
df.boxplot(column='credit_score', by='gender', ax=axes[0, 1])
axes[0, 1].set_title('Credit Score by Gender')
axes[0, 1].set_ylabel('Credit Score')

# Approval rate by gender
approval_by_gender = df.groupby('gender')['approved'].mean()
axes[0, 2].bar(approval_by_gender.index, approval_by_gender.values, 
               color=['steelblue', 'coral'], alpha=0.7, edgecolor='black')
axes[0, 2].set_title('Approval Rate by Gender', fontweight='bold')
axes[0, 2].set_ylabel('Approval Rate')
axes[0, 2].set_ylim([0, 1])
for i, v in enumerate(approval_by_gender.values):
    axes[0, 2].text(i, v + 0.02, f'{v:.2%}', ha='center', fontweight='bold')

# Income vs Credit Score colored by approval
for gender_val in ['Male', 'Female']:
    gender_data = df[df['gender'] == gender_val]
    axes[1, 0].scatter(gender_data['income'], gender_data['credit_score'],
                      c=gender_data['approved'], cmap='RdYlGn',
                      alpha=0.5, label=gender_val, s=20)
axes[1, 0].set_xlabel('Income ($)')
axes[1, 0].set_ylabel('Credit Score')
axes[1, 0].set_title('Income vs Credit Score\n(Green=Approved, Red=Denied)')
axes[1, 0].legend()

# Approval rate by income quintile and gender
df['income_quintile'] = pd.qcut(df['income'], q=5, labels=['Q1', 'Q2', 'Q3', 'Q4', 'Q5'])
approval_by_quintile = df.groupby(['income_quintile', 'gender'])['approved'].mean().unstack()
approval_by_quintile.plot(kind='bar', ax=axes[1, 1], color=['steelblue', 'coral'], alpha=0.7)
axes[1, 1].set_title('Approval Rate by Income Quintile and Gender')
axes[1, 1].set_xlabel('Income Quintile')
axes[1, 1].set_ylabel('Approval Rate')
axes[1, 1].legend(title='Gender')
axes[1, 1].set_xticklabels(axes[1, 1].get_xticklabels(), rotation=0)

# Approval rate by credit score range and gender
df['credit_range'] = pd.cut(df['credit_score'], bins=[300, 600, 650, 700, 750, 850],
                            labels=['<600', '600-650', '650-700', '700-750', '750+'])
approval_by_credit = df.groupby(['credit_range', 'gender'])['approved'].mean().unstack()
approval_by_credit.plot(kind='bar', ax=axes[1, 2], color=['steelblue', 'coral'], alpha=0.7)
axes[1, 2].set_title('Approval Rate by Credit Score and Gender')
axes[1, 2].set_xlabel('Credit Score Range')
axes[1, 2].set_ylabel('Approval Rate')
axes[1, 2].legend(title='Gender')
axes[1, 2].set_xticklabels(axes[1, 2].get_xticklabels(), rotation=45)

plt.tight_layout()
plt.show()

print('\n🔍 Key Observations:')
print('  • Women have systematically lower approval rates across ALL income and credit levels')
print('  • This suggests bias beyond just income/credit differences')
print('  • A model trained on this data will learn and perpetuate this bias')

<a name="fairness-metrics"></a>
## 3. Fairness Metrics

### The Impossibility Theorem

**Important**: When groups have different base rates, you generally **cannot** satisfy all of these fairness metrics at once unless the classifier is perfect. You must choose which definition of fairness is most appropriate for your context.
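
A small calculation makes the conflict concrete. By Bayes' rule, a group's precision is determined by its base rate $p$, true positive rate, and false positive rate: $\text{PPV} = \frac{\text{TPR} \cdot p}{\text{TPR} \cdot p + \text{FPR} \cdot (1 - p)}$. The sketch below holds TPR and FPR equal across two groups and varies only the base rate; the specific numbers are made up for illustration.

```python
def precision_from_rates(base_rate, tpr, fpr):
    """Precision (PPV) implied by a group's base rate, TPR, and FPR via Bayes' rule."""
    true_pos = tpr * base_rate
    false_pos = fpr * (1 - base_rate)
    return true_pos / (true_pos + false_pos)


# Same error rates for both groups, so equalized odds holds exactly
shared_tpr, shared_fpr = 0.8, 0.2

# Hypothetical base rates P(Y=1 | A) that differ between the groups
ppv_a = precision_from_rates(base_rate=0.5, tpr=shared_tpr, fpr=shared_fpr)
ppv_b = precision_from_rates(base_rate=0.2, tpr=shared_tpr, fpr=shared_fpr)

print(f'Precision, group a: {ppv_a:.3f}')
print(f'Precision, group b: {ppv_b:.3f}')
```

With identical error rates, group a ends up with precision 0.800 and group b with 0.500, so equalized odds and predictive parity cannot both hold here. Closing the precision gap would require giving the groups different error rates.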

### Key Fairness Metrics

#### 1. Demographic Parity (Statistical Parity)
**Definition**: Positive prediction rate should be the same across groups

$P(\hat{Y}=1 | A=a) = P(\hat{Y}=1 | A=b)$ for all groups $a, b$

**When to use**: 
- When you want equal representation in outcomes
- University admissions, job screening (first round)

**Limitation**: Ignores differences in qualification/merit
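
To make the definition concrete, the sketch below computes per-group selection rates by hand on a toy example, then summarizes them as a difference and a ratio. The helper name `selection_rates` and the toy data are assumptions for illustration; in the notebook itself Fairlearn's built-in functions do this work.

```python
import numpy as np


def selection_rates(y_pred, group):
    """Positive-prediction rate P(Y_hat = 1 | A = g) for each group g."""
    y_pred = np.asarray(y_pred)
    group = np.asarray(group)
    return {str(g): float(y_pred[group == g].mean()) for g in np.unique(group)}


# Toy predictions: 1 = approved, 0 = denied
toy_pred = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
toy_group = ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b']

toy_rates = selection_rates(toy_pred, toy_group)
toy_dp_difference = max(toy_rates.values()) - min(toy_rates.values())
toy_dp_ratio = min(toy_rates.values()) / max(toy_rates.values())

print(toy_rates)
print(f'difference = {toy_dp_difference:.2f}, ratio = {toy_dp_ratio:.2f}')
```

Group a is approved 80% of the time and group b 20%, giving a difference of 0.60 and a ratio of 0.25. Perfect demographic parity corresponds to a difference of 0 and a ratio of 1.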

#### 2. Equalized Odds
**Definition**: True positive rate AND false positive rate should be equal across groups

$P(\hat{Y}=1 | Y=y, A=a) = P(\hat{Y}=1 | Y=y, A=b)$ for $y \in \{0,1\}$ and all groups $a, b$

**When to use**:
- When both types of errors matter equally
- Healthcare diagnostics, loan approval

**Interpretation**: Model makes same mistakes across all groups
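
The sketch below computes the two conditional rates in the definition (TPR for $y=1$, FPR for $y=0$) per group on toy data and summarizes the violation as the larger of the two gaps. The helper name `rates_by_group` and the data are illustrative assumptions.

```python
import numpy as np


def rates_by_group(y_true, y_pred, group):
    """True positive rate and false positive rate for each group."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    result = {}
    for g in np.unique(group):
        mask = group == g
        positives = y_true[mask] == 1
        negatives = y_true[mask] == 0
        result[str(g)] = {
            'tpr': float(y_pred[mask][positives].mean()),
            'fpr': float(y_pred[mask][negatives].mean()),
        }
    return result


# Toy labels and predictions for two groups
toy_true = [1, 1, 0, 0, 1, 1, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 0, 0, 0]
toy_group = ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']

toy_rates = rates_by_group(toy_true, toy_pred, toy_group)
tpr_gap = abs(toy_rates['a']['tpr'] - toy_rates['b']['tpr'])
fpr_gap = abs(toy_rates['a']['fpr'] - toy_rates['b']['fpr'])
# Summarize with the larger of the two gaps
toy_eo_gap = max(tpr_gap, fpr_gap)

print(toy_rates)
print(f'equalized odds gap = {toy_eo_gap:.2f}')
```

Here group a has TPR 1.0 and FPR 0.5 while group b has TPR 0.5 and FPR 0.0, so both gaps are 0.5. Equalized odds is satisfied only when both gaps are zero.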

#### 3. Equal Opportunity
**Definition**: True positive rate should be equal across groups (relaxed equalized odds)

$P(\hat{Y}=1 | Y=1, A=a) = P(\hat{Y}=1 | Y=1, A=b)$ for all groups $a, b$

**When to use**:
- When false negatives are more harmful than false positives
- Disease screening, fraud detection
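
One way to see what equal opportunity asks for is to compare a single decision threshold with per-group thresholds chosen so that each group's TPR lands near the same target. The sketch below is a simplified illustration on synthetic scores; the score distribution, the 0.55 threshold, and the 0.8 target are all assumptions. It also uses the protected attribute at decision time, which may not be permitted in every setting.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000

# Synthetic scores (assumption): group b's scores are shifted down by 0.1
group = rng.choice(['a', 'b'], size=n)
y_true = rng.integers(0, 2, size=n)
score = 0.3 * y_true + rng.normal(0.4, 0.15, n) - 0.1 * (group == 'b')


def group_tpr(y_true, y_pred, mask):
    """True positive rate within the rows selected by `mask`."""
    return float(y_pred[mask & (y_true == 1)].mean())


# Single threshold for everyone
pred_single = score >= 0.55
gap_single = abs(group_tpr(y_true, pred_single, group == 'a')
                 - group_tpr(y_true, pred_single, group == 'b'))

# Per-group thresholds chosen so each group's TPR is about 0.8
target_tpr = 0.8
pred_group = np.zeros(n, dtype=bool)
for g in ['a', 'b']:
    mask = group == g
    positive_scores = score[mask & (y_true == 1)]
    threshold = np.quantile(positive_scores, 1 - target_tpr)
    pred_group[mask] = score[mask] >= threshold

gap_group = abs(group_tpr(y_true, pred_group, group == 'a')
                - group_tpr(y_true, pred_group, group == 'b'))

print(f'TPR gap, single threshold: {gap_single:.3f}')
print(f'TPR gap, per-group thresholds: {gap_group:.3f}')
```

The per-group thresholds are fit on the same data they are evaluated on, so the near-zero gap is optimistic. On held-out data the gap would be small but not exactly zero.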

#### 4. Predictive Parity (Precision Parity)
**Definition**: Precision should be equal across groups

$P(Y=1 | \hat{Y}=1, A=a) = P(Y=1 | \hat{Y}=1, A=b)$ for all groups $a, b$

**When to use**:
- When false positives are particularly harmful
- Criminal justice, accusatory systems
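
Predictive parity conditions on the prediction rather than the true label, which the toy sketch below makes explicit. The helper name `precision_by_group` and the data are illustrative assumptions.

```python
import numpy as np


def precision_by_group(y_true, y_pred, group):
    """P(Y = 1 | Y_hat = 1, A = g): share of positive predictions that are correct."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    result = {}
    for g in np.unique(group):
        predicted_positive = (group == g) & (y_pred == 1)
        result[str(g)] = float(y_true[predicted_positive].mean())
    return result


# Toy data: y_pred = 1 means flagged, y_true = 1 means actually positive
toy_true = [1, 1, 1, 0, 1, 0, 0, 0]
toy_pred = [1, 1, 1, 1, 1, 1, 1, 0]
toy_group = ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']

toy_precisions = precision_by_group(toy_true, toy_pred, toy_group)
print(toy_precisions)
```

In this example three of the four people flagged in group a are true positives (precision 0.75), but only one of the three flagged in group b is (precision about 0.33). A flag therefore means something different depending on the group, which is the disparity predictive parity is meant to rule out.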

In [None]:
# Train a baseline model (will inherit bias from data)
# Create features and target
feature_cols = ['income', 'credit_score', 'loan_amount', 'years_employed', 'debt_to_income']
X = df[feature_cols]
y = df['approved']
sensitive_feature = df['gender']

# Split data
X_train, X_test, y_train, y_test, sensitive_train, sensitive_test = train_test_split(
    X, y, sensitive_feature, test_size=0.2, random_state=42, stratify=sensitive_feature
)

# Train baseline model
baseline_model = LogisticRegression(random_state=42, max_iter=1000)
baseline_model.fit(X_train, y_train)

# Predictions
y_pred_baseline = baseline_model.predict(X_test)

# Overall accuracy
baseline_accuracy = accuracy_score(y_test, y_pred_baseline)
print(f'Baseline Model Accuracy: {baseline_accuracy:.4f}')
print(f'\nOverall Classification Report:')
print(classification_report(y_test, y_pred_baseline, target_names=['Denied', 'Approved']))

In [None]:
# Calculate fairness metrics for baseline model
from sklearn.metrics import recall_score, precision_score

# Create MetricFrame for disaggregated metrics
metrics = {
    'accuracy': accuracy_score,
    'precision': precision_score,
    'recall': recall_score,
    'selection_rate': selection_rate
}

metric_frame = MetricFrame(
    metrics=metrics,
    y_true=y_test,
    y_pred=y_pred_baseline,
    sensitive_features=sensitive_test
)

print('üìä Disaggregated Metrics by Gender:\n')
print(metric_frame.by_group)

print('\nüìä Fairness Metric Summary:\n')
print(f'Overall metrics:')
print(metric_frame.overall)

# Calculate specific fairness metrics
dp_diff = demographic_parity_difference(y_test, y_pred_baseline, sensitive_features=sensitive_test)
dp_ratio = demographic_parity_ratio(y_test, y_pred_baseline, sensitive_features=sensitive_test)
eo_diff = equalized_odds_difference(y_test, y_pred_baseline, sensitive_features=sensitive_test)

print(f'\n⚖️ Fairness Metrics:')
print(f'  Demographic Parity Difference: {dp_diff:.4f}')
print(f'  Demographic Parity Ratio: {dp_ratio:.4f}')
print(f'  Equalized Odds Difference: {eo_diff:.4f}')

print(f'\n🎯 Interpretation:')
print(f'  • Demographic Parity Difference = {abs(dp_diff):.4f}')
print(f'    Selection rates differ by {abs(dp_diff) * 100:.1f} percentage points between groups')
print(f'  • Demographic Parity Ratio = {dp_ratio:.4f}')
print(f'    Smallest selection rate divided by largest, so 1.0 is parity (0.8 or higher passes the four-fifths rule of thumb)')
print(f'  • Equalized Odds Difference = {eo_diff:.4f}')
print(f'    Larger of the TPR gap and the FPR gap between groups')

if abs(dp_diff) > 0.1:
    print(f'\n⚠️ WARNING: Significant demographic parity violation detected!')
    print(f'   The model approves loans at different rates for different genders.')

In [None]:
# Visualize fairness metrics
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Selection rate by group
selection_rates = metric_frame.by_group['selection_rate']
axes[0].bar(selection_rates.index, selection_rates.values, 
            color=['steelblue', 'coral'], alpha=0.7, edgecolor='black')
axes[0].set_ylabel('Selection Rate (Approval Rate)', fontweight='bold')
axes[0].set_title('Selection Rate by Gender\n(Demographic Parity)', fontweight='bold')
axes[0].set_ylim([0, 1])
axes[0].axhline(y=metric_frame.overall['selection_rate'], color='red', 
                linestyle='--', label='Overall Rate')
axes[0].legend()
for i, v in enumerate(selection_rates.values):
    axes[0].text(i, v + 0.02, f'{v:.2%}', ha='center', fontweight='bold')

# True Positive Rate (Recall) by group
tpr = metric_frame.by_group['recall']
axes[1].bar(tpr.index, tpr.values, 
            color=['steelblue', 'coral'], alpha=0.7, edgecolor='black')
axes[1].set_ylabel('True Positive Rate (Recall)', fontweight='bold')
axes[1].set_title('TPR by Gender\n(Equal Opportunity)', fontweight='bold')
axes[1].set_ylim([0, 1])
axes[1].axhline(y=metric_frame.overall['recall'], color='red', 
                linestyle='--', label='Overall TPR')
axes[1].legend()
for i, v in enumerate(tpr.values):
    axes[1].text(i, v + 0.02, f'{v:.2%}', ha='center', fontweight='bold')

# Precision by group
precision = metric_frame.by_group['precision']
axes[2].bar(precision.index, precision.values, 
            color=['steelblue', 'coral'], alpha=0.7, edgecolor='black')
axes[2].set_ylabel('Precision', fontweight='bold')
axes[2].set_title('Precision by Gender\n(Predictive Parity)', fontweight='bold')
axes[2].set_ylim([0, 1])
axes[2].axhline(y=metric_frame.overall['precision'], color='red', 
                linestyle='--', label='Overall Precision')
axes[2].legend()
for i, v in enumerate(precision.values):
    axes[2].text(i, v + 0.02, f'{v:.2%}', ha='center', fontweight='bold')

plt.tight_layout()
plt.show()

print('\n📊 What These Plots Show:')
print('  Left: Demographic Parity - Are approval rates equal across groups?')
print('  Middle: Equal Opportunity - Do qualified applicants have equal approval rates?')
print('  Right: Predictive Parity - Is precision (% of approvals that should be approved) equal?')

<a name="mitigation"></a>
## 6. Bias Mitigation Strategies

### Three Approaches

**Pre-processing** (Fix the data)
- Reweighting samples
- Resampling
- Learning fair representations

**In-processing** (Fix the algorithm)
- Add fairness constraints during training
- Adversarial debiasing
- Fairness-aware regularization

**Post-processing** (Fix the predictions)
- Threshold optimization
- Calibration
- Reject option classification
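
As a concrete instance of pre-processing, the sketch below implements sample reweighting in the style usually called reweighing: each sample gets the weight $P(A=a) \cdot P(Y=y) / P(A=a, Y=y)$, which makes the label statistically independent of the group in the weighted data. The function name `reweighing_weights` and the toy data are assumptions for illustration, and the code assumes every (group, label) combination occurs at least once.

```python
import numpy as np


def reweighing_weights(y, group):
    """Weight each sample by P(A=a) * P(Y=y) / P(A=a, Y=y)."""
    y = np.asarray(y)
    group = np.asarray(group)
    weights = np.empty(len(y), dtype=float)
    for g in np.unique(group):
        for label in np.unique(y):
            cell = (group == g) & (y == label)
            # Probability of this cell if group and label were independent
            expected = (group == g).mean() * (y == label).mean()
            weights[cell] = expected / cell.mean()
    return weights


# Toy labels: group a is approved 75% of the time, group b 25%
toy_y = np.array([1, 1, 1, 0, 1, 0, 0, 0])
toy_group = np.array(['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'])

toy_weights = reweighing_weights(toy_y, toy_group)
weighted_rates = {}
for g in ['a', 'b']:
    mask = toy_group == g
    weighted_rates[g] = float(np.average(toy_y[mask], weights=toy_weights[mask]))
    print(f'group {g}: weighted approval rate = {weighted_rates[g]:.2f}')
```

After reweighting, both groups have a weighted approval rate of 0.50. The weights can be passed as `sample_weight` to the `fit` method of scikit-learn estimators that support it, such as `LogisticRegression`. AIF360 ships a comparable `Reweighing` pre-processor.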

In [None]:
# Mitigation Strategy 1: In-processing with Fairness Constraints
# Using Exponentiated Gradient with Demographic Parity constraint

print('Training fair model with Demographic Parity constraint...\n')

# Create base estimator
estimator = LogisticRegression(random_state=42, max_iter=1000)

# Apply fairness constraint (Demographic Parity)
mitigator_dp = ExponentiatedGradient(
    estimator=estimator,
    constraints=DemographicParity(),
    max_iter=50
)

# Fit fair model
mitigator_dp.fit(X_train, y_train, sensitive_features=sensitive_train)

# Predictions
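# ExponentiatedGradient predictions are randomized; fix the seed for reproducibility
y_pred_fair_dp = mitigator_dp.predict(X_test, random_state=42)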

# Evaluate
fair_dp_accuracy = accuracy_score(y_test, y_pred_fair_dp)
print(f'Fair Model (Demographic Parity) Accuracy: {fair_dp_accuracy:.4f}')
print(f'Baseline Model Accuracy: {baseline_accuracy:.4f}')
print(f'Accuracy change: {fair_dp_accuracy - baseline_accuracy:+.4f}')

In [None]:
# Compare fairness metrics: Baseline vs Fair Model
metric_frame_fair_dp = MetricFrame(
    metrics=metrics,
    y_true=y_test,
    y_pred=y_pred_fair_dp,
    sensitive_features=sensitive_test
)

# Calculate fairness metrics for fair model
dp_diff_fair = demographic_parity_difference(y_test, y_pred_fair_dp, sensitive_features=sensitive_test)
dp_ratio_fair = demographic_parity_ratio(y_test, y_pred_fair_dp, sensitive_features=sensitive_test)
eo_diff_fair = equalized_odds_difference(y_test, y_pred_fair_dp, sensitive_features=sensitive_test)

# Create comparison table
comparison = pd.DataFrame({
    'Metric': ['Accuracy', 'Demographic Parity Diff', 'Demographic Parity Ratio', 'Equalized Odds Diff'],
    'Baseline Model': [baseline_accuracy, dp_diff, dp_ratio, eo_diff],
    'Fair Model (DP)': [fair_dp_accuracy, dp_diff_fair, dp_ratio_fair, eo_diff_fair],
    'Change': [
        fair_dp_accuracy - baseline_accuracy,
        dp_diff_fair - dp_diff,
        dp_ratio_fair - dp_ratio,
        eo_diff_fair - eo_diff
    ]
})

print('\n📊 Baseline vs Fair Model Comparison:\n')
print(comparison.to_string(index=False))

print('\n✅ Improvements:')
print(f'  • Demographic Parity Difference reduced by {abs(dp_diff - dp_diff_fair):.4f}')
print(f'  • Demographic Parity Ratio now: {dp_ratio_fair:.4f} (closer to 1.0 = fairer)')
print(f'  • Small accuracy tradeoff: {baseline_accuracy - fair_dp_accuracy:.4f}')
print('\n💡 This demonstrates the accuracy-fairness tradeoff!')

In [None]:
# Mitigation Strategy 2: Post-processing with Threshold Optimization
# Optimize thresholds per group to achieve Equalized Odds

print('Training fair model with post-processing (Threshold Optimization)...\n')

# First train regular model and get probability predictions
baseline_model_proba = LogisticRegression(random_state=42, max_iter=1000)
baseline_model_proba.fit(X_train, y_train)
y_pred_proba = baseline_model_proba.predict_proba(X_test)[:, 1]

# Apply threshold optimizer for Equalized Odds
threshold_optimizer = ThresholdOptimizer(
    estimator=baseline_model_proba,
    constraints='equalized_odds',
    predict_method='predict_proba'
)

threshold_optimizer.fit(X_train, y_train, sensitive_features=sensitive_train)
# Equalized-odds thresholds can be randomized; fix the seed for reproducibility
y_pred_threshold = threshold_optimizer.predict(
    X_test, sensitive_features=sensitive_test, random_state=42
)

# Evaluate
threshold_accuracy = accuracy_score(y_test, y_pred_threshold)
print(f'Threshold-Optimized Model Accuracy: {threshold_accuracy:.4f}')
print(f'Baseline Model Accuracy: {baseline_accuracy:.4f}')

# Fairness metrics
dp_diff_threshold = demographic_parity_difference(y_test, y_pred_threshold, sensitive_features=sensitive_test)
eo_diff_threshold = equalized_odds_difference(y_test, y_pred_threshold, sensitive_features=sensitive_test)

print(f'\nFairness Metrics:')
print(f'  Demographic Parity Diff: {dp_diff_threshold:.4f} (was {dp_diff:.4f})')
print(f'  Equalized Odds Diff: {eo_diff_threshold:.4f} (was {eo_diff:.4f})')
print(f'\nEqualized Odds Difference changed by {eo_diff_threshold - eo_diff:+.4f} (negative = fairer)')

In [None]:
# Comprehensive comparison visualization
models = ['Baseline', 'Fair (DP Constraint)', 'Threshold Optimized']
accuracies = [baseline_accuracy, fair_dp_accuracy, threshold_accuracy]
dp_diffs = [abs(dp_diff), abs(dp_diff_fair), abs(dp_diff_threshold)]
eo_diffs = [abs(eo_diff), abs(eo_diff_fair), abs(eo_diff_threshold)]

fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Accuracy comparison
axes[0].bar(models, accuracies, color=['red', 'orange', 'green'], alpha=0.7, edgecolor='black')
axes[0].set_ylabel('Accuracy', fontweight='bold')
axes[0].set_title('Model Accuracy Comparison', fontweight='bold', fontsize=13)
axes[0].set_ylim([0.5, 1.0])
for i, v in enumerate(accuracies):
    axes[0].text(i, v + 0.01, f'{v:.3f}', ha='center', fontweight='bold')

# Demographic Parity violation
axes[1].bar(models, dp_diffs, color=['red', 'orange', 'green'], alpha=0.7, edgecolor='black')
axes[1].set_ylabel('Demographic Parity Difference (abs)', fontweight='bold')
axes[1].set_title('Fairness: Demographic Parity\n(Lower = Fairer)', fontweight='bold', fontsize=13)
axes[1].axhline(y=0.1, color='orange', linestyle='--', label='Acceptable threshold')
axes[1].legend()
for i, v in enumerate(dp_diffs):
    axes[1].text(i, v + 0.005, f'{v:.3f}', ha='center', fontweight='bold')

# Equalized Odds violation
axes[2].bar(models, eo_diffs, color=['red', 'orange', 'green'], alpha=0.7, edgecolor='black')
axes[2].set_ylabel('Equalized Odds Difference (abs)', fontweight='bold')
axes[2].set_title('Fairness: Equalized Odds\n(Lower = Fairer)', fontweight='bold', fontsize=13)
axes[2].axhline(y=0.1, color='orange', linestyle='--', label='Acceptable threshold')
axes[2].legend()
for i, v in enumerate(eo_diffs):
    axes[2].text(i, v + 0.005, f'{v:.3f}', ha='center', fontweight='bold')

plt.tight_layout()
plt.show()

print('\n🎯 Key Takeaways:')
print('  1. The baseline model is typically the most accurate but the least fair')
print('  2. Mitigated models usually reduce bias at a small accuracy cost; check the numbers above')
print('  3. Different mitigation strategies optimize for different fairness definitions')
print('  4. You must choose the fairness metric appropriate for your use case')
print('\n⚠️ The accuracy-fairness tradeoff is real but often acceptable')
print('   A few percentage points of accuracy is often a small price for fairness!')

<a name="best-practices"></a>
## 8. Best Practices and Frameworks

### Production ML Fairness Checklist

**Before Training**:
- [ ] Audit data for representation bias (are all groups represented?)
- [ ] Examine label distribution across protected groups
- [ ] Check for proxy variables (features that correlate with protected attributes)
- [ ] Document data collection process and potential bias sources
- [ ] Define which fairness metric(s) matter for your use case

**During Training**:
- [ ] Track metrics separately for each protected group
- [ ] Consider fairness constraints if needed
- [ ] Document model architecture and hyperparameters
- [ ] Save training data statistics and distributions

**After Training**:
- [ ] Measure all relevant fairness metrics
- [ ] Create fairness report card
- [ ] Test on held-out data from all groups
- [ ] Perform error analysis by subgroup
- [ ] Consider post-processing if fairness violations detected

**Deployment**:
- [ ] Monitor fairness metrics in production
- [ ] Set up alerts for fairness metric degradation
- [ ] Regular fairness audits (quarterly minimum)
- [ ] Document model limitations and known biases
- [ ] Provide mechanism for users to appeal decisions
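
The monitoring and alerting items can be sketched as a simple threshold check that runs each time fairness metrics are recomputed on production data. The function name `check_fairness_drift`, the two limits, and the monthly values below are hypothetical; real limits should come from your own policy and legal requirements.

```python
def check_fairness_drift(baseline_gap, current_gap, max_gap=0.10, max_increase=0.05):
    """Return a list of alert messages; an empty list means no alert."""
    alerts = []
    if current_gap > max_gap:
        alerts.append(f'gap {current_gap:.3f} exceeds limit {max_gap:.3f}')
    if current_gap - baseline_gap > max_increase:
        alerts.append(f'gap rose by {current_gap - baseline_gap:.3f} since baseline')
    return alerts


# Hypothetical monthly demographic parity differences from production logs
baseline_gap = 0.04
monthly_gaps = {'2025-01': 0.05, '2025-02': 0.08, '2025-03': 0.12}

for month, gap in monthly_gaps.items():
    alerts = check_fairness_drift(baseline_gap, gap)
    status = '; '.join(alerts) if alerts else 'ok'
    print(f'{month}: {status}')
```

Checking both an absolute limit and the change since deployment catches two different problems: a model that was never fair enough, and one that has drifted as the population changed.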

### Frameworks and Tools

**Fairlearn** (Microsoft)
- Fairness metrics and mitigation algorithms
- Dashboard for comparing models (now maintained separately in the `raiwidgets` package)
- Python library, well-documented

**AIF360** (IBM)
- 70+ fairness metrics
- 10+ bias mitigation algorithms
- More comprehensive than Fairlearn

**What-If Tool** (Google)
- Visual interface for exploring model behavior
- Slice analysis by subgroups
- Counterfactual analysis

**Aequitas** (University of Chicago)
- Bias and fairness audit toolkit
- Focus on criminal justice

### Choosing the Right Fairness Metric

| Use Case | Primary Fairness Metric | Reasoning |
|----------|------------------------|----------|
| **Loan Approval** | Equalized Odds | Both false positives and false negatives cause harm |
| **Disease Screening** | Equal Opportunity | Missing a disease (false negative) is worst error |
| **University Admissions** | Demographic Parity | Want representation from all groups |
| **Hiring (initial screen)** | Demographic Parity | Equal access to opportunity |
| **Criminal Risk Assessment** | Equalized Odds + Calibration | High stakes, both errors matter |
| **Content Moderation** | False Positive Rate Parity | Wrongly censoring speech is harmful |

### When Fairness Interventions Don't Work

Sometimes technical solutions aren't enough:

1. **Problem is upstream**: If data collection is fundamentally flawed, no algorithm fixes it
2. **Prediction task itself is harmful**: Maybe you shouldn't build this model at all
3. **Societal bias too strong**: Historical discrimination may be impossible to fully correct
4. **Wrong problem framing**: You're optimizing the wrong objective

**Example**: Recidivism prediction
- Even "fair" models may perpetuate mass incarceration
- The question isn't "how do we make fair predictions?"
- The question is "should we be making these predictions at all?"

<a name="ethical-decisions"></a>
## 9. Ethical Decision-Making Framework

### Questions to Ask Before Building Any ML System

**1. Should this system exist?**
- What problem does it solve?
- Who benefits? Who is harmed?
- Are there less risky alternatives?

**2. Who is affected?**
- Who are the stakeholders?
- Who has power? Who is vulnerable?
- Are affected communities consulted?

**3. What are the failure modes?**
- How does the system fail?
- Who bears the cost of failures?
- Can failures be detected and corrected?

**4. What are the long-term effects?**
- Feedback loops that amplify bias?
- Concentration of power?
- Societal implications?

**5. Is there accountability?**
- Who is responsible when things go wrong?
- Can decisions be appealed?
- Is there transparency?

### Red Flags (Don't Build This)

🚫 **Target variable is itself biased**
- Example: "Predict who will be a good employee" when historical hires are biased

🚫 **High-stakes decisions on vulnerable populations without human oversight**
- Example: Fully automated benefit denials

🚫 **Surveillance or control of marginalized groups**
- Example: Predictive policing in over-policed neighborhoods

🚫 **Impossible to achieve acceptable fairness level**
- Example: Using fundamentally biased data with no alternative

🚫 **No mechanism for recourse**
- Example: Opaque decisions that cannot be appealed

### Green Lights (Good Practices)

✅ **Augments human decision-making, doesn't replace it**
- Human has final say, algorithm provides input

✅ **Measurable benefit to affected communities**
- Not just efficiency for organization

✅ **Robust fairness monitoring and correction**
- Regular audits, clear thresholds for intervention

✅ **Transparency and explainability**
- Affected individuals understand how decisions are made

✅ **Participatory design**
- Affected communities involved in system design

✅ **Clear accountability and recourse**
- Someone responsible, decisions can be appealed

## Conclusion: Building Fair and Responsible AI

### Key Takeaways

1. **Bias is inevitable** - All ML systems can perpetuate bias. Your job is to measure and mitigate it.

2. **Fairness is not automatic** - Removing protected attributes from features doesn't make models fair.

3. **Trade-offs are real** - You typically sacrifice some accuracy for fairness. That's usually okay.

4. **No single fairness definition** - Choose the metric that matches your ethical and legal requirements.

5. **Measure everything** - Track fairness metrics by subgroup. What you don't measure, you can't fix.

6. **Document thoroughly** - Record data sources, model decisions, known limitations.

7. **Monitor continuously** - Fairness can degrade over time. Set up production monitoring.

8. **Sometimes don't build it** - Some problems are better not solved with ML.

### Production Workflow

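```python
# Recommended production fairness workflow (sketch built on Fairlearn and scikit-learn)
import numpy as np
import pandas as pd
from fairlearn.metrics import MetricFrame, demographic_parity_difference, selection_rate
from fairlearn.reductions import DemographicParity, ExponentiatedGradient
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


def fair_ml_pipeline(X_train, y_train, s_train, X_test, y_test, s_test, threshold=0.1):
    """
    Fair ML pipeline sketch using demographic parity as the fairness metric.

    Steps:
    1. Audit data for bias
    2. Train baseline model
    3. Measure fairness metrics
    4. Apply mitigation if needed
    5. Generate fairness report
    6. Set up monitoring (deployment-specific, not implemented here)
    """
    # 1. Data audit: positive-label rate per group in the training data
    label_rates = pd.Series(np.asarray(y_train)).groupby(np.asarray(s_train)).mean()
    print(f'Training label rate by group:\n{label_rates}')

    # 2. Train baseline
    model = LogisticRegression(max_iter=1000, random_state=42).fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # 3. Measure fairness on held-out data
    gap = demographic_parity_difference(y_test, y_pred, sensitive_features=s_test)

    # 4. Mitigate if the gap exceeds the chosen threshold
    if gap > threshold:
        model = ExponentiatedGradient(
            LogisticRegression(max_iter=1000, random_state=42),
            constraints=DemographicParity(),
        )
        model.fit(X_train, y_train, sensitive_features=s_train)
        y_pred = model.predict(X_test)

    # 5. Generate report: per-group accuracy and selection rate
    report = MetricFrame(
        metrics={'accuracy': accuracy_score, 'selection_rate': selection_rate},
        y_true=y_test, y_pred=y_pred, sensitive_features=s_test,
    )
    print(report.by_group)

    # 6. Monitoring: recompute this report on production data on a schedule
    return model, report
```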

### Resources for Continued Learning

**Books**:
- *Weapons of Math Destruction* by Cathy O'Neil
- *Race After Technology* by Ruha Benjamin
- *Fairness and Machine Learning* by Barocas, Hardt, Narayanan (free online)

**Tools**:
- Fairlearn: https://fairlearn.org/
- AIF360: https://aif360.mybluemix.net/
- What-If Tool: https://pair-code.github.io/what-if-tool/

**Papers**:
- *Fairness Definitions Explained* (Verma & Rubin, 2018)
- *A Survey on Bias and Fairness in Machine Learning* (Mehrabi et al., 2021)

**Organizations**:
- Partnership on AI: https://partnershiponai.org/
- AI Now Institute: https://ainowinstitute.org/

### Final Thoughts

**Ethics is not optional in 2025.**

Between legal requirements (EU AI Act, anti-discrimination laws), reputational risks, and moral imperatives, you MUST build fair AI systems.

The techniques in this notebook - fairness metrics, bias detection, mitigation algorithms - are your tools. But tools alone aren't enough. You need:

- **Critical thinking**: Question whether the system should exist
- **Stakeholder engagement**: Listen to affected communities  
- **Humility**: Acknowledge what you don't know
- **Courage**: Speak up when you see harmful systems

**You are responsible for the systems you build.**

Use your skills wisely. Build AI that helps people, not harms them. And when in doubt, ask: "Would I want this algorithm making decisions about my life?"

If the answer is no, don't build it.