# 🎯 Constitutional AI & Medical Ethics: Hands-On Practice

## 📚 Table of Contents
1. [Bias Detection in Medical AI](#practice-1-bias-detection-in-medical-ai)
2. [Fairness Metrics Calculation](#practice-2-fairness-metrics-calculation)
3. [Risk-Benefit Analysis](#practice-3-risk-benefit-analysis)
4. [Output Filtering Simulation](#practice-4-output-filtering-simulation)

**⏱️ Estimated Time: 15 minutes**

---

## Importing Essential Libraries

In [None]:
# Import essential libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report
import warnings
warnings.filterwarnings('ignore')

# Visualization settings
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 11
sns.set_style('whitegrid')

print("✅ All libraries loaded successfully!")
print("📊 Ready for Constitutional AI Ethics Practice")

---
## Practice 1: Bias Detection in Medical AI

### 🎯 Learning Objectives
- Detect performance disparities across demographic groups
- Calculate group-specific accuracy metrics
- Identify potential bias in AI predictions

### 📖 Key Concepts
**Demographic Parity:** Equal positive prediction rates across all groups  
**Equalized Odds:** Equal TPR and FPR across all groups
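
To make the two definitions concrete, here is a minimal sketch in plain Python. The records are made-up toy values and `group_rates` is a hypothetical helper, not part of the notebook's later cells. Demographic parity looks only at how often each group is predicted positive, while equalized odds compares error behaviour (TPR and FPR) conditional on the true label.

```python
# Toy labels and predictions for two hypothetical groups (illustrative values only).
records = [
    # (group, true_label, predicted)
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 0), ("A", 0, 1),
    ("B", 1, 1), ("B", 1, 0), ("B", 0, 0), ("B", 0, 0),
]

def group_rates(rows, group):
    """Return (positive_rate, tpr, fpr) for one group."""
    rows = [r for r in rows if r[0] == group]
    positives = [r for r in rows if r[1] == 1]
    negatives = [r for r in rows if r[1] == 0]
    positive_rate = sum(r[2] for r in rows) / len(rows)
    tpr = sum(r[2] for r in positives) / len(positives)
    fpr = sum(r[2] for r in negatives) / len(negatives)
    return positive_rate, tpr, fpr

rate_a, tpr_a, fpr_a = group_rates(records, "A")
rate_b, tpr_b, fpr_b = group_rates(records, "B")

# Demographic parity compares positive prediction rates only.
parity_gap = abs(rate_a - rate_b)
# Equalized odds compares TPR and FPR; both gaps must be small.
odds_gap = max(abs(tpr_a - tpr_b), abs(fpr_a - fpr_b))
print(f"parity gap={parity_gap:.2f}, equalized-odds gap={odds_gap:.2f}")
```

In this toy data both gaps are large, but the two criteria are independent in general: a model can satisfy one while violating the other.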

In [None]:
# 1.1 Generate simulated medical AI predictions
def generate_medical_predictions():
    """Simulate medical AI predictions with potential bias"""
    np.random.seed(42)
    
    # Create dataset with two demographic groups (A and B)
    n_samples = 500
    
    # Group A (e.g., majority population)
    group_a_true = np.random.binomial(1, 0.3, n_samples//2)  # 30% disease rate
    group_a_pred = np.where(group_a_true == 1, 
                            np.random.binomial(1, 0.90, n_samples//2),  # 90% sensitivity
                            np.random.binomial(1, 0.10, n_samples//2))  # 10% false positive
    
    # Group B (e.g., minority population) - with bias
    group_b_true = np.random.binomial(1, 0.3, n_samples//2)  # Same 30% disease rate
    group_b_pred = np.where(group_b_true == 1,
                            np.random.binomial(1, 0.70, n_samples//2),  # 70% sensitivity (LOWER!)
                            np.random.binomial(1, 0.15, n_samples//2))  # 15% false positive
    
    # Combine data
    df = pd.DataFrame({
        'group': ['A']*len(group_a_true) + ['B']*len(group_b_true),
        'true_label': np.concatenate([group_a_true, group_b_true]),
        'predicted': np.concatenate([group_a_pred, group_b_pred])
    })
    
    return df

# Generate data
medical_data = generate_medical_predictions()

print("📊 Medical AI Prediction Data Generated")
print(f"Total samples: {len(medical_data)}")
print(f"\nGroup distribution:")
print(medical_data['group'].value_counts())
print(f"\nDisease prevalence by group:")
print(medical_data.groupby('group')['true_label'].mean())

In [None]:
# 1.2 Calculate performance metrics by group
def evaluate_fairness(df):
    """Evaluate AI fairness across demographic groups"""
    
    results = {}
    
    for group in df['group'].unique():
        group_data = df[df['group'] == group]
        y_true = group_data['true_label']
        y_pred = group_data['predicted']
        
        # Confusion matrix
        # labels=[0, 1] guarantees a 2x2 matrix even if a group lacks one class
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
        
        # Calculate metrics
        results[group] = {
            'Accuracy': (tp + tn) / (tp + tn + fp + fn),
            'TPR (Sensitivity)': tp / (tp + fn) if (tp + fn) > 0 else 0,
            'FPR': fp / (fp + tn) if (fp + tn) > 0 else 0,
            'PPV (Precision)': tp / (tp + fp) if (tp + fp) > 0 else 0,
            'Positive Rate': (tp + fp) / len(group_data)
        }
    
    return pd.DataFrame(results).T

# Evaluate fairness
fairness_metrics = evaluate_fairness(medical_data)

print("⚖️ Fairness Evaluation Results")
print("=" * 60)
print(fairness_metrics.round(3))

# Signed gaps (Group A minus Group B); the sign shows which group is worse off
tpr_gap = fairness_metrics.loc['A', 'TPR (Sensitivity)'] - fairness_metrics.loc['B', 'TPR (Sensitivity)']
acc_gap = fairness_metrics.loc['A', 'Accuracy'] - fairness_metrics.loc['B', 'Accuracy']

print("\n📌 Key Observations:")
print(f"  • TPR difference: {abs(tpr_gap):.3f}")
print(f"  • Accuracy difference: {abs(acc_gap):.3f}")

# 0.1 is an illustrative threshold, not a statistical significance test
if abs(tpr_gap) > 0.1:
    worse_group = 'B' if tpr_gap > 0 else 'A'
    print(f"\n⚠️ WARNING: Large sensitivity gap detected! Group {worse_group} has lower sensitivity.")
else:
    print("\n✅ No large sensitivity gap detected.")
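
The 0.1 cutoff used above is a fixed threshold, so it says nothing about sampling noise. One way to gauge uncertainty is a percentile bootstrap on the TPR gap. The sketch below is standalone plain Python and does not reuse `medical_data`. It assumes roughly 75 diseased patients per group (30% of 250) and the same 90% and 70% sensitivities as the simulation. The helper names are hypothetical.

```python
import random

rng = random.Random(0)

def simulate_positive_cases(n_cases, sensitivity):
    """Predictions (1 = detected) for patients who truly have the disease."""
    return [1 if rng.random() < sensitivity else 0 for _ in range(n_cases)]

# Assumed sizes: about 75 diseased patients per group (30% of 250).
preds_a = simulate_positive_cases(75, 0.90)
preds_b = simulate_positive_cases(75, 0.70)

def tpr(preds):
    """True positive rate among truly diseased patients."""
    return sum(preds) / len(preds)

def bootstrap_gap_ci(a, b, n_boot=2000, alpha=0.05):
    """Percentile bootstrap interval for TPR(a) - TPR(b)."""
    gaps = []
    for _ in range(n_boot):
        resample_a = rng.choices(a, k=len(a))  # sample with replacement
        resample_b = rng.choices(b, k=len(b))
        gaps.append(tpr(resample_a) - tpr(resample_b))
    gaps.sort()
    lower = gaps[int((alpha / 2) * n_boot)]
    upper = gaps[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

observed_gap = tpr(preds_a) - tpr(preds_b)
ci_low, ci_high = bootstrap_gap_ci(preds_a, preds_b)
print(f"TPR gap: {observed_gap:.3f}, 95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
```

If the interval excludes zero, the gap is unlikely to be a sampling artifact. With small subgroups the interval can be wide even when the point estimate looks alarming.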

In [None]:
# 1.3 Visualize bias
def visualize_bias(metrics_df):
    """Visualize performance disparities across groups"""
    
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Plot 1: Bar chart comparison
    metrics_to_plot = ['Accuracy', 'TPR (Sensitivity)', 'PPV (Precision)']
    metrics_df[metrics_to_plot].plot(kind='bar', ax=axes[0], rot=0, width=0.7)
    axes[0].set_title('Performance Metrics by Group', fontsize=14, fontweight='bold')
    axes[0].set_ylabel('Score')
    axes[0].set_ylim(0, 1)
    axes[0].legend(loc='lower right')
    axes[0].grid(axis='y', alpha=0.3)
    
    # Plot 2: Gap visualization
    gaps = metrics_df.loc['A'] - metrics_df.loc['B']
    # Use the absolute gap so disparities in either direction are flagged
    colors = ['red' if abs(x) > 0.05 else 'green' for x in gaps]
    axes[1].barh(gaps.index, gaps.values, color=colors, alpha=0.7)
    axes[1].axvline(x=0, color='black', linestyle='--', linewidth=1)
    axes[1].axvline(x=0.05, color='orange', linestyle=':', linewidth=1, label='5% threshold')
    axes[1].axvline(x=-0.05, color='orange', linestyle=':', linewidth=1)
    axes[1].set_title('Performance Gap (Group A - Group B)', fontsize=14, fontweight='bold')
    axes[1].set_xlabel('Difference')
    axes[1].legend()
    axes[1].grid(axis='x', alpha=0.3)
    
    plt.tight_layout()
    plt.show()

visualize_bias(fairness_metrics)

---
## Practice 2: Fairness Metrics Calculation

### 🎯 Learning Objectives
- Calculate Disparate Impact Ratio (4/5ths rule)
- Understand different fairness definitions
- Apply fairness thresholds

### 📖 Key Concepts
**Disparate Impact Ratio:** P(Ŷ=1|Group=B) / P(Ŷ=1|Group=A)  
**4/5ths Rule:** A ratio ≥ 0.8 is conventionally treated as acceptable. This is a rule of thumb from US employment guidelines, not a clinical standard.
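
As a worked example, the expected ratio can be computed directly from the simulation parameters in Practice 1 (30% prevalence in both groups, sensitivity 0.90 vs 0.70, false positive rate 0.10 vs 0.15). This is a sketch of the expected values only. The sampled data will differ slightly, and `expected_positive_rate` is a hypothetical helper.

```python
def expected_positive_rate(prevalence, sensitivity, false_positive_rate):
    """P(prediction = 1) = P(disease) * TPR + P(no disease) * FPR."""
    return prevalence * sensitivity + (1 - prevalence) * false_positive_rate

rate_a = expected_positive_rate(0.30, 0.90, 0.10)  # 0.27 + 0.07
rate_b = expected_positive_rate(0.30, 0.70, 0.15)  # 0.21 + 0.105

ratio = rate_b / rate_a
passes_four_fifths = ratio >= 0.8

print(f"Expected rates: A={rate_a:.3f}, B={rate_b:.3f}")
print(f"Expected disparate impact ratio: {ratio:.3f} (passes: {passes_four_fifths})")
```

The expected ratio is about 0.93, so the 4/5ths rule passes even though Group B's sensitivity is 20 points lower. Group B's higher false positive rate partly masks its missed diagnoses, which is why the ratio should not be read on its own.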

In [None]:
# 2.1 Calculate Disparate Impact Ratio
def calculate_disparate_impact(df):
    """Calculate disparate impact ratio and apply 4/5ths rule"""
    
    # Positive prediction rates by group
    positive_rates = df.groupby('group')['predicted'].mean()
    
    print("📊 Disparate Impact Analysis")
    print("=" * 60)
    print("\nPositive Prediction Rates:")
    for group in positive_rates.index:
        print(f"  Group {group}: {positive_rates[group]:.3f} ({positive_rates[group]*100:.1f}%)")
    
    # Calculate Disparate Impact Ratio
    dir_ratio = positive_rates['B'] / positive_rates['A']
    
    print(f"\n⚖️ Disparate Impact Ratio: {dir_ratio:.3f}")
    print(f"   Formula: P(Ŷ=1|B) / P(Ŷ=1|A) = {positive_rates['B']:.3f} / {positive_rates['A']:.3f}")
    
    # Apply 4/5ths rule
    threshold = 0.8
    print(f"\n📏 4/5ths Rule Threshold: {threshold}")
    
    if dir_ratio >= threshold:
        print(f"✅ PASS: Ratio {dir_ratio:.3f} ≥ {threshold} - No disparate impact detected")
    else:
        print(f"❌ FAIL: Ratio {dir_ratio:.3f} < {threshold} - Disparate impact detected!")
        print(f"   ⚠️ Group B receives {(1-dir_ratio)*100:.1f}% fewer positive predictions than Group A")
    
    return dir_ratio, positive_rates

di_ratio, pos_rates = calculate_disparate_impact(medical_data)

In [None]:
# 2.2 Compare multiple fairness definitions
def compare_fairness_definitions(metrics_df):
    """Compare different fairness criteria"""
    
    print("⚖️ Fairness Criteria Comparison")
    print("=" * 60)
    
    # 1. Demographic Parity
    pos_rate_diff = abs(metrics_df.loc['A', 'Positive Rate'] - metrics_df.loc['B', 'Positive Rate'])
    print(f"\n1️⃣ Demographic Parity")
    print(f"   Positive rate difference: {pos_rate_diff:.3f}")
    print(f"   Status: {'✅ PASS' if pos_rate_diff < 0.05 else '❌ FAIL'} (threshold: 0.05)")
    
    # 2. Equalized Odds
    tpr_diff = abs(metrics_df.loc['A', 'TPR (Sensitivity)'] - metrics_df.loc['B', 'TPR (Sensitivity)'])
    fpr_diff = abs(metrics_df.loc['A', 'FPR'] - metrics_df.loc['B', 'FPR'])
    print(f"\n2️⃣ Equalized Odds")
    print(f"   TPR difference: {tpr_diff:.3f}")
    print(f"   FPR difference: {fpr_diff:.3f}")
    print(f"   Status: {'✅ PASS' if (tpr_diff < 0.1 and fpr_diff < 0.1) else '❌ FAIL'} (threshold: 0.1 each)")
    
    # 3. Predictive Parity
    ppv_diff = abs(metrics_df.loc['A', 'PPV (Precision)'] - metrics_df.loc['B', 'PPV (Precision)'])
    print(f"\n3️⃣ Predictive Parity")
    print(f"   PPV difference: {ppv_diff:.3f}")
    print(f"   Status: {'✅ PASS' if ppv_diff < 0.1 else '❌ FAIL'} (threshold: 0.1)")
    
    # Summary
    print("\n" + "=" * 60)
    print("💡 Key Insight: Except in special cases (equal base rates or a perfect")
    print("   classifier), these criteria cannot all hold at once (fairness impossibility results)")

compare_fairness_definitions(fairness_metrics)
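
The tension between these criteria follows from the identity PPV = TPR·p / (TPR·p + FPR·(1 − p)), where p is the disease prevalence in a group. The sketch below uses assumed prevalences of 10% and 30% (not the notebook's simulation, where both groups have 30%) to show that a classifier with identical TPR and FPR in both groups, which therefore satisfies equalized odds, still produces different PPVs when base rates differ.

```python
def ppv(prevalence, tpr, fpr):
    """Positive predictive value from prevalence, TPR, and FPR."""
    true_pos = tpr * prevalence
    false_pos = fpr * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Same error rates in both groups, so equalized odds holds exactly.
tpr, fpr = 0.90, 0.10

ppv_low_prevalence = ppv(0.10, tpr, fpr)   # hypothetical group with 10% prevalence
ppv_high_prevalence = ppv(0.30, tpr, fpr)  # hypothetical group with 30% prevalence

print(f"PPV at 10% prevalence: {ppv_low_prevalence:.3f}")
print(f"PPV at 30% prevalence: {ppv_high_prevalence:.3f}")
print(f"PPV gap: {ppv_high_prevalence - ppv_low_prevalence:.3f}")
```

Here predictive parity fails by almost 0.3 even though equalized odds is met exactly, so which criterion to prioritise has to be decided per clinical context.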

---
## Practice 3: Risk-Benefit Analysis

### 🎯 Learning Objectives
- Assess AI system risks and benefits
- Create risk matrices
- Make deployment decisions

### 📖 Key Concepts
**Risk Matrix:** Maps likelihood vs impact to categorize risks  
**Benefit-Risk Ratio:** Weighs potential benefits against potential harms
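
The risk scores used in the next cell are consistent with multiplying ordinal levels (Low = 1, Medium = 2, High = 3) for likelihood and impact. That reading is inferred from the numbers rather than stated anywhere, so treat the sketch below as one plausible scoring scheme. The band cutoffs in `risk_band` are hypothetical.

```python
# Ordinal encoding of the qualitative levels (assumed, see note above).
LEVELS = {"Low": 1, "Medium": 2, "High": 3}

def risk_score(likelihood, impact):
    """Score a risk as likelihood level times impact level (range 1 to 9)."""
    return LEVELS[likelihood] * LEVELS[impact]

def risk_band(score):
    """Map a score to a band; these cutoffs are illustrative only."""
    if score >= 6:
        return "High"
    if score >= 3:
        return "Medium"
    return "Low"

examples = {
    "Misdiagnosis": ("Medium", "High"),
    "Privacy Breach": ("Low", "High"),
    "Biased Recommendation": ("High", "Medium"),
    "System Downtime": ("Low", "Low"),
}

for name, (likelihood, impact) in examples.items():
    score = risk_score(likelihood, impact)
    print(f"{name:22s} score={score} band={risk_band(score)}")
```

A multiplicative score treats a rare but severe event (Low × High = 3) the same as a frequent but minor one (High × Low = 3), which is a known limitation of this kind of matrix.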

In [None]:
# 3.1 Risk-Benefit Assessment
def assess_risk_benefit():
    """Perform systematic risk-benefit analysis"""
    
    # Define scenarios
    scenarios = {
        'Misdiagnosis': {'likelihood': 'Medium', 'impact': 'High', 'score': 6},
        'Privacy Breach': {'likelihood': 'Low', 'impact': 'High', 'score': 3},
        'Biased Recommendation': {'likelihood': 'High', 'impact': 'Medium', 'score': 6},
        'System Downtime': {'likelihood': 'Low', 'impact': 'Low', 'score': 1}
    }
    
    benefits = {
        'Early Detection': {'magnitude': 'High', 'score': 9},
        'Cost Reduction': {'magnitude': 'Medium', 'score': 6},
        'Accessibility': {'magnitude': 'High', 'score': 8},
        'Efficiency': {'magnitude': 'Medium', 'score': 5}
    }
    
    print("⚖️ Risk-Benefit Analysis Matrix")
    print("=" * 60)
    
    # Calculate scores
    total_risk = sum(s['score'] for s in scenarios.values())
    total_benefit = sum(b['score'] for b in benefits.values())
    benefit_risk_ratio = total_benefit / total_risk
    
    print("\n📊 Risk Assessment:")
    for risk, details in scenarios.items():
        print(f"  • {risk:25s} | Likelihood: {details['likelihood']:6s} | Impact: {details['impact']:6s} | Score: {details['score']}")
    print(f"\n  Total Risk Score: {total_risk}")
    
    print("\n💚 Benefit Assessment:")
    for benefit, details in benefits.items():
        print(f"  • {benefit:25s} | Magnitude: {details['magnitude']:6s} | Score: {details['score']}")
    print(f"\n  Total Benefit Score: {total_benefit}")
    
    print("\n" + "=" * 60)
    print(f"📈 Benefit-Risk Ratio: {benefit_risk_ratio:.2f}")
    
    # Decision
    if benefit_risk_ratio > 2.0:
        decision = "✅ RECOMMEND DEPLOYMENT"
        note = "Benefits significantly outweigh risks"
    elif benefit_risk_ratio > 1.5:
        decision = "⚠️ CONDITIONAL APPROVAL"
        note = "Deploy with strict monitoring and safeguards"
    elif benefit_risk_ratio > 1.0:
        decision = "🔍 REQUIRE REVIEW"
        note = "Additional risk mitigation needed"
    else:
        decision = "❌ DO NOT DEPLOY"
        note = "Risks outweigh benefits"
    
    print(f"\n🎯 Decision: {decision}")
    print(f"   Rationale: {note}")
    
    return scenarios, benefits, benefit_risk_ratio

risks, benefits, ratio = assess_risk_benefit()
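
Because the scores are subjective, it helps to check how robust the decision is. With the values above, total benefit is 28 and total risk is 16, giving a ratio of 1.75. The sketch below reuses the same thresholds and shifts the total risk by a few points in either direction. The `decide` helper and the range of shifts are hypothetical choices for illustration.

```python
# Scores copied from the risk-benefit cell above.
risk_scores = {"Misdiagnosis": 6, "Privacy Breach": 3,
               "Biased Recommendation": 6, "System Downtime": 1}
benefit_scores = {"Early Detection": 9, "Cost Reduction": 6,
                  "Accessibility": 8, "Efficiency": 5}

def decide(ratio):
    """Same thresholds as the decision rule above, without the emoji."""
    if ratio > 2.0:
        return "RECOMMEND DEPLOYMENT"
    if ratio > 1.5:
        return "CONDITIONAL APPROVAL"
    if ratio > 1.0:
        return "REQUIRE REVIEW"
    return "DO NOT DEPLOY"

total_benefit = sum(benefit_scores.values())
total_risk = sum(risk_scores.values())

# Shift the total risk score by -3..+3 points and record each decision.
decisions = {}
for shift in range(-3, 4):
    ratio = total_benefit / (total_risk + shift)
    decisions[shift] = decide(ratio)
    print(f"risk shift {shift:+d}: ratio={ratio:.2f} -> {decisions[shift]}")
```

A shift of three points in either direction changes the outcome, which is about the size of re-rating a single risk by one level. The decision should therefore be reported together with this kind of sensitivity check.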

In [None]:
# 3.2 Visualize Risk Matrix
def plot_risk_matrix(scenarios):
    """Visualize risks on a likelihood-impact matrix"""
    
    # Map categories to numerical values
    likelihood_map = {'Low': 1, 'Medium': 2, 'High': 3}
    impact_map = {'Low': 1, 'Medium': 2, 'High': 3}
    
    fig, ax = plt.subplots(figsize=(10, 8))
    
    # Plot background zones (x = likelihood, y = impact)
    # Low risk (green): low likelihood, low impact
    ax.add_patch(plt.Rectangle((0.5, 0.5), 1, 1, color='green', alpha=0.2))
    # Medium risk (yellow): one dimension medium, the other low
    ax.add_patch(plt.Rectangle((1.5, 0.5), 1, 1, color='yellow', alpha=0.2))
    ax.add_patch(plt.Rectangle((0.5, 1.5), 1, 1, color='yellow', alpha=0.2))
    # High risk (orange): medium/medium, or high likelihood with low-to-medium impact
    ax.add_patch(plt.Rectangle((1.5, 1.5), 1, 1, color='orange', alpha=0.2))
    ax.add_patch(plt.Rectangle((2.5, 0.5), 1, 2, color='orange', alpha=0.2))
    # Critical risk (red): any high-impact cell
    ax.add_patch(plt.Rectangle((0.5, 2.5), 2, 1, color='red', alpha=0.2))
    ax.add_patch(plt.Rectangle((2.5, 2.5), 1, 1, color='red', alpha=0.3))
    
    # Plot risk points
    for risk_name, details in scenarios.items():
        x = likelihood_map[details['likelihood']]
        y = impact_map[details['impact']]
        ax.plot(x, y, 'ko', markersize=15)
        ax.text(x, y, risk_name, ha='center', va='center', 
                fontsize=9, fontweight='bold', color='white',
                bbox=dict(boxstyle='round,pad=0.3', facecolor='black', alpha=0.7))
    
    ax.set_xlim(0.5, 3.5)
    ax.set_ylim(0.5, 3.5)
    ax.set_xticks([1, 2, 3])
    ax.set_yticks([1, 2, 3])
    ax.set_xticklabels(['Low', 'Medium', 'High'])
    ax.set_yticklabels(['Low', 'Medium', 'High'])
    ax.set_xlabel('Likelihood', fontsize=12, fontweight='bold')
    ax.set_ylabel('Impact', fontsize=12, fontweight='bold')
    ax.set_title('⚠️ Risk Assessment Matrix', fontsize=14, fontweight='bold', pad=20)
    ax.grid(True, alpha=0.3)
    
    # Add legend
    from matplotlib.patches import Patch
    legend_elements = [
        Patch(facecolor='green', alpha=0.3, label='Low Risk'),
        Patch(facecolor='yellow', alpha=0.3, label='Medium Risk'),
        Patch(facecolor='orange', alpha=0.3, label='High Risk'),
        Patch(facecolor='red', alpha=0.3, label='Critical Risk')
    ]
    ax.legend(handles=legend_elements, loc='upper left')
    
    plt.tight_layout()
    plt.show()

plot_risk_matrix(risks)

---
## Practice 4: Output Filtering Simulation

### 🎯 Learning Objectives
- Implement basic content safety filters
- Understand multi-layer filtering
- Measure filter effectiveness

### 📖 Key Concepts
**Output Filtering:** Multi-stage validation before AI response delivery  
**Safety Guardrails:** Automated checks to prevent harmful outputs

In [None]:
# 4.1 Implement output filtering system
def output_filtering_system():
    """Simulate a multi-layer output filtering pipeline"""
    
    # Sample AI responses to filter
    test_responses = [
        "Based on your symptoms, I recommend taking aspirin and resting.",
        "You definitely have cancer. Start chemotherapy immediately.",
        "Patient John Doe (SSN: 123-45-6789) should increase dosage.",
        "Your condition seems mild. Consider consulting a doctor if it persists.",
        "Stop all medications now! They are poisoning you.",
        "Studies show that treatment A has 85% efficacy for this condition.",
        "Women are naturally worse at handling this type of condition.",
        "I suggest scheduling a follow-up appointment with your physician."
    ]
    
    print("🛡️ Multi-Layer Output Filtering System")
    print("=" * 70)
    
    results = []
    
    for idx, response in enumerate(test_responses, 1):
        print(f"\n📝 Response {idx}: {response[:60]}...")
        
        passed_filters = True
        flags = []
        
        # Filter 1: Content Safety
        harmful_keywords = ['definitely', 'cancer', 'immediately', 'stop all', 'poisoning']
        if any(keyword in response.lower() for keyword in harmful_keywords):
            flags.append("⚠️ Filter 1 FAIL: Potentially harmful content")
            passed_filters = False
        else:
            flags.append("✅ Filter 1 PASS: Content safety")
        
        # Filter 2: Privacy Protection
        import re
        if re.search(r'\b\d{3}-\d{2}-\d{4}\b', response):  # SSN pattern
            flags.append("⚠️ Filter 2 FAIL: Privacy violation (PII detected)")
            passed_filters = False
        else:
            flags.append("✅ Filter 2 PASS: Privacy protection")
        
        # Filter 3: Bias Detection
        biased_terms = ['women are', 'men are', 'naturally worse', 'naturally better']
        if any(term in response.lower() for term in biased_terms):
            flags.append("⚠️ Filter 3 FAIL: Biased language detected")
            passed_filters = False
        else:
            flags.append("✅ Filter 3 PASS: Bias detection")
        
        # Filter 4: Medical disclaimer check (a simplified stand-in for medical accuracy review)
        requires_disclaimer = any(word in response.lower() for word in ['recommend', 'should', 'treatment'])
        has_disclaimer = 'consult' in response.lower() or 'doctor' in response.lower() or 'physician' in response.lower()
        
        if requires_disclaimer and not has_disclaimer:
            flags.append("⚠️ Filter 4 FAIL: Medical advice without disclaimer")
            passed_filters = False
        else:
            flags.append("✅ Filter 4 PASS: Disclaimer check")
        
        # Print results
        for flag in flags:
            print(f"   {flag}")
        
        if passed_filters:
            print("   🎯 DECISION: ✅ APPROVED for output")
            results.append('PASS')
        else:
            print("   🎯 DECISION: ❌ BLOCKED from output")
            results.append('FAIL')
    
    # Summary
    n_pass = results.count('PASS')
    n_fail = results.count('FAIL')
    print("\n" + "=" * 70)
    print("📊 Filtering Summary:")
    print(f"   Total responses: {len(results)}")
    print(f"   Approved: {n_pass} ({n_pass/len(results)*100:.1f}%)")
    print(f"   Blocked: {n_fail} ({n_fail/len(results)*100:.1f}%)")
    # The block rate alone does not measure effectiveness: that requires
    # ground-truth labels for which responses are actually harmful.
    print(f"\n   🎯 Block rate: {n_fail/len(results)*100:.1f}% of responses blocked")
    
    return results

filter_results = output_filtering_system()
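
A block rate says how much was filtered, not whether the right things were filtered. To measure effectiveness, block decisions need to be compared against ground-truth labels. In the sketch below, `blocked` lists the decisions that the keyword rules above imply for the eight test responses, while `harmful` is a hypothetical set of reviewer labels assumed purely for illustration (responses 1 and 6 are treated as benign).

```python
# Block decisions implied by the keyword rules above (1 = blocked), responses 1-8.
blocked = [1, 1, 1, 0, 1, 1, 1, 0]
# Hypothetical reviewer labels (1 = actually harmful); assumed for illustration.
harmful = [0, 1, 1, 0, 1, 0, 1, 0]

pairs = list(zip(blocked, harmful))
tp = sum(1 for b, h in pairs if b == 1 and h == 1)  # harmful and blocked
fp = sum(1 for b, h in pairs if b == 1 and h == 0)  # benign but blocked
fn = sum(1 for b, h in pairs if b == 0 and h == 1)  # harmful but approved
tn = sum(1 for b, h in pairs if b == 0 and h == 0)  # benign and approved

recall = tp / (tp + fn)            # share of harmful responses that were blocked
precision = tp / (tp + fp)         # share of blocked responses that were harmful
false_block_rate = fp / (fp + tn)  # share of benign responses that were blocked

print(f"Recall: {recall:.2f}, Precision: {precision:.2f}, "
      f"False block rate: {false_block_rate:.2f}")
```

Under these assumed labels the filter catches every harmful response but also blocks half of the benign ones, a cost that the block rate alone hides.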

---
## 🎯 Practice Complete!

### Summary of What We Learned:

1. **Bias Detection** 👥
   - Measured performance disparities across demographic groups
   - Calculated TPR, FPR, and PPV by group
   - Identified potential AI bias in medical predictions

2. **Fairness Metrics** ⚖️
   - Applied the 4/5ths rule (Disparate Impact Ratio)
   - Compared multiple fairness definitions
   - Saw that fairness criteria can conflict with one another

3. **Risk-Benefit Analysis** 📊
   - Created risk assessment matrices
   - Calculated benefit-risk ratios
   - Made evidence-based deployment decisions

4. **Output Filtering** 🛡️
   - Implemented multi-layer safety filters
   - Detected harmful content, privacy violations, and bias
   - Measured the filter's block rate

### Key Takeaways:

✅ **Constitutional AI requires continuous monitoring** - Bias and safety issues must be actively detected and mitigated

✅ **No single fairness metric is perfect** - Different contexts require different fairness definitions

✅ **Multi-layer defenses are essential** - Safety cannot rely on a single filtering mechanism

✅ **Ethics must be built into the system** - Not added as an afterthought

### Next Steps:
- Implement more sophisticated bias mitigation techniques (reweighting, adversarial debiasing)
- Explore Red Teaming methodologies
- Study real-world case studies of AI failures and successes
- Practice with actual medical datasets (with proper ethical approval)

---

## 📚 Additional Resources:

**Tools for Fairness Testing:**
- Fairlearn (Microsoft): https://fairlearn.org/
- AI Fairness 360 (IBM): https://aif360.mybluemix.net/
- What-If Tool (Google): https://pair-code.github.io/what-if-tool/

**Further Reading:**
- Beauchamp & Childress - Principles of Biomedical Ethics
- Anthropic's Constitutional AI paper
- FDA guidance on AI/ML in medical devices

---

**✨ Thank you for completing this hands-on practice! ✨**