# Legal Compliance Metrics: Regulatory Fairness in AI Systems

## Executive Summary

**Legal Compliance Metrics** are fairness measures specifically designed to help organizations meet regulatory requirements and avoid discrimination lawsuits. These metrics focus on **selection-rate parity** across demographic groups, which is often how regulators initially screen for potential discrimination, though full legal analysis also considers job-relatedness and business necessity.

### Key Business Insights:
- **Regulatory Protection**: Designed to meet specific legal standards (80% Rule, EEOC guidelines)
- **Selection-Rate Focus**: Measures whether different groups receive positive outcomes at similar rates
- **Legal Defensibility**: Provides measurable evidence of non-discriminatory practices
- **Risk Mitigation**: Reduces exposure to discrimination lawsuits and regulatory penalties

### Deployment Recommendation:
**ESSENTIAL** - Legal Compliance metrics should be treated as a baseline check for any AI system used in regulated contexts such as hiring, lending, insurance, or housing decisions.

## Understanding Legal Compliance Metrics

Legal compliance metrics emerged from decades of civil rights legislation and court decisions. They represent an initial screening tool for algorithmic fairness in regulated industries.

### The Two Core Legal Compliance Metrics:

#### 1. Disparate Impact
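- **Legal Foundation**: Based on the four-fifths ("80%") rule in the Uniform Guidelines on Employee Selection Procedures (1978), adopted jointly by the EEOC and other federal agencies
- **Measurement**: Ratio of selection rates between groups
- **Threshold**: Each group's selection rate ≥ 80% of the rate for the group with the highest selection rate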
- **Focus**: Employment law compliance
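
The four-fifths rule also works beyond two groups: each group's selection rate is compared with the rate of the group that has the highest rate. A minimal sketch in plain NumPy, assuming binary predictions (1 = selected) and one group label per person; `selection_rates` and `impact_ratios` are hypothetical helper names, not part of Jurity.

```python
import numpy as np


def selection_rates(predictions, groups):
    """Share of each group with a positive (1) prediction."""
    predictions = np.asarray(predictions)
    groups = np.asarray(groups)
    return {g: predictions[groups == g].mean() for g in np.unique(groups)}


def impact_ratios(predictions, groups, threshold=0.8):
    """Ratio of each group's rate to the highest group rate, plus a pass flag.

    Assumes every group has at least one member and the highest rate is > 0.
    """
    rates = selection_rates(predictions, groups)
    top_rate = max(rates.values())
    return {g: (r / top_rate, r / top_rate >= threshold) for g, r in rates.items()}


# Toy data: group A selected 6 of 10, group B 3 of 10, group C 5 of 10.
preds = [1] * 6 + [0] * 4 + [1] * 3 + [0] * 7 + [1] * 5 + [0] * 5
grps = ["A"] * 10 + ["B"] * 10 + ["C"] * 10

for group, (ratio, passes) in impact_ratios(preds, grps).items():
    print(f"{group}: ratio={ratio:.3f} {'PASS' if passes else 'FAIL'}")
# Expected: A ratio=1.000 PASS, B ratio=0.500 FAIL, C ratio=0.833 PASS
```

Comparing against the highest-rate group, rather than a designated "majority" group, follows the Guidelines' wording of the rule.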

#### 2. Statistical Parity
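- **Legal Foundation**: General anti-discrimination principles from civil rights law; no statute sets a numeric parity threshold
- **Measurement**: Difference in positive prediction rates
- **Threshold**: A difference within ±5 percentage points is a common practitioner convention, not a legal standard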
- **Focus**: Broader anti-discrimination compliance
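
Statistical parity, as used here, is the plain difference between two groups' selection rates. A sketch with hypothetical helper names; the sign convention (reference group minus comparison group) and the 0.05 tolerance follow this notebook's choices rather than any legal standard.

```python
import numpy as np


def parity_difference(predictions, groups, reference, comparison):
    """Selection rate of `reference` minus selection rate of `comparison`.

    Positive values mean the reference group is selected more often.
    """
    predictions = np.asarray(predictions)
    groups = np.asarray(groups)
    ref_rate = predictions[groups == reference].mean()
    cmp_rate = predictions[groups == comparison].mean()
    return ref_rate - cmp_rate


def within_tolerance(diff, tolerance=0.05):
    """Notebook convention: |difference| <= 5 percentage points is 'acceptable'."""
    return abs(diff) <= tolerance


preds = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]
grps = ["M"] * 5 + ["F"] * 5  # M selected 3 of 5, F selected 2 of 5
diff = parity_difference(preds, grps, reference="M", comparison="F")
print(f"SPD (M - F) = {diff:+.2f}, acceptable: {within_tolerance(diff)}")
```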

### Key Differences from Merit-Based Metrics:
| Aspect | Legal Compliance | Merit-Based |
|--------|------------------|-------------|
| **Focus** | Selection-rate parity | Fair outcomes for qualified |
| **Qualification Consideration** | Not directly measured | Central |
| **Legal Basis** | Specific statutes | General fairness principles |
| **Business Impact** | Risk reduction | Performance optimization |
| **Measurement** | Group-level rates | Individual-level fairness |
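
The first two rows of the table are easiest to see side by side: the legal compliance metrics need only predictions and group labels, while a merit-based metric such as equal opportunity (the gap in true-positive rates) also needs the true outcomes. A toy sketch on invented data in which a perfectly accurate classifier selects exactly the qualified people, yet the two groups have different qualification base rates:

```python
import numpy as np

# Invented toy data: 10 men, 10 women; 6 men and 3 women are qualified (y_true == 1).
group = np.array(["M"] * 10 + ["F"] * 10)
y_true = np.array([1] * 6 + [0] * 4 + [1] * 3 + [0] * 7)
# A perfectly accurate classifier: it selects exactly the qualified people.
y_pred = y_true.copy()


def selection_rate(pred, mask):
    """Share of the group (mask) that is selected: uses predictions only."""
    return pred[mask].mean()


def true_positive_rate(true, pred, mask):
    """Share of the group's qualified members who are selected."""
    return pred[mask & (true == 1)].mean()


is_m, is_f = group == "M", group == "F"

# Legal compliance view: female selection rate relative to male rate.
di = selection_rate(y_pred, is_f) / selection_rate(y_pred, is_m)
# Merit-based view (equal opportunity): gap in true-positive rates.
eo_gap = true_positive_rate(y_true, y_pred, is_m) - true_positive_rate(y_true, y_pred, is_f)

print(f"Disparate impact (F/M): {di:.2f}")             # 0.50 -> fails the 80% screen
print(f"Equal opportunity gap (M - F): {eo_gap:.2f}")  # 0.00 -> qualified people selected alike
```

The model fails the four-fifths screen purely because base rates differ, which is the situation in which job-relatedness and business necessity become the legal question.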

### Important Legal Nuance:
These metrics measure selection-rate parity, which regulators use as an **initial screen** for potential discrimination. However, under frameworks like Title VII and EEOC guidelines, differences in selection rates may be legally justified if the employer can demonstrate **job-relatedness and business necessity**. Think of these metrics as red flags that trigger deeper legal analysis, not absolute prohibitions.

### When Legal Compliance Metrics Are Required:
- **Employment decisions** (hiring, promotion, termination)
- **Credit and lending** (mortgages, loans, credit cards)
- **Insurance** (pricing, coverage decisions)
- **Housing** (rental, sales, zoning)
- **Education** (admissions, financial aid)
- **Healthcare** (treatment access, insurance coverage)

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report
from jurity.fairness import BinaryFairnessMetrics
import warnings
warnings.filterwarnings('ignore')

# Set plotting style
plt.style.use('default')
sns.set_palette("husl")

print("Libraries imported successfully!")
print("Ready to analyze Legal Compliance Metrics:")
print("• Disparate Impact (80% Rule)")
print("• Statistical Parity (Equal Treatment)")

## Data Loading and Preprocessing

We'll use the Adult Income dataset as a stand-in for an employment decision: a positive prediction (income >$50K) plays the role of a "selection", the kind of outcome these metrics are applied to in hiring and promotion.

In [None]:
# Load the Adult Income dataset
print("=== LOADING ADULT INCOME DATASET ===")
print("Context: Income prediction as a stand-in for an employment selection decision")
print()

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
column_names = ['age', 'workclass', 'fnlwgt', 'education', 'education_num', 
                'marital_status', 'occupation', 'relationship', 'race', 'sex', 
                'capital_gain', 'capital_loss', 'hours_per_week', 'native_country', 'income']

df = pd.read_csv(url, names=column_names, skipinitialspace=True)
print(f"Dataset shape: {df.shape}")
print(f"\nIncome distribution (target variable):")
print(df['income'].value_counts())

print(f"\nGender distribution (protected attribute):")
print(df['sex'].value_counts())

# Show cross-tabulation for legal compliance context
print(f"\nIncome by Gender (Legal Compliance Focus):")
income_by_gender = pd.crosstab(df['sex'], df['income'], margins=True)
print(income_by_gender)

# Calculate actual selection rates (important for legal compliance)
print(f"\nActual Selection Rates by Gender:")
female_rate = (df[df['sex'] == 'Female']['income'] == '>50K').mean()
male_rate = (df[df['sex'] == 'Male']['income'] == '>50K').mean()
print(f"Female selection rate: {female_rate:.3f} ({female_rate*100:.1f}%)")
print(f"Male selection rate: {male_rate:.3f} ({male_rate*100:.1f}%)")

# Calculate disparate impact in actual data
actual_disparate_impact = female_rate / male_rate
print(f"\n📊 Actual Disparate Impact: {actual_disparate_impact:.3f}")
print(f"80% Rule Compliance: {'✅ PASS' if actual_disparate_impact >= 0.8 else '❌ FAIL'}")
print(f"(Threshold: Female rate must be ≥80% of male rate)")

df.head()

In [None]:
# Clean and preprocess data
print("=== DATA PREPROCESSING FOR LEGAL COMPLIANCE ANALYSIS ===")

# Handle missing values
df_clean = df.replace('?', np.nan).dropna()
print(f"After removing missing values: {df_clean.shape[0]} rows")

# Create binary variables for legal compliance analysis
df_clean['high_income'] = (df_clean['income'] == '>50K').astype(int)
df_clean['is_male'] = (df_clean['sex'] == 'Male').astype(int)

print(f"\nLegal Compliance Variables:")
print(f"• Target: high_income (1 = >$50K, 0 = ≤$50K)")
print(f"• Protected attribute: is_male (1 = Male, 0 = Female)")

# Select features for modeling (excluding protected attributes per legal best practice)
features = ['age', 'education_num', 'hours_per_week', 'capital_gain', 'capital_loss']
X = df_clean[features]
y = df_clean['high_income']
sensitive_attr = df_clean['is_male']

print(f"\nModel features: {features}")
print(f"Note: Gender is excluded from model features (legal best practice)")
print(f"Final dataset: {X.shape[0]} samples, {X.shape[1]} features")

# Show baseline statistics for legal compliance
baseline_stats = df_clean.groupby('sex')['high_income'].agg(['count', 'sum', 'mean']).round(3)
baseline_stats.columns = ['Total_Count', 'High_Income_Count', 'Selection_Rate']
print(f"\nBaseline Selection Rates:")
print(baseline_stats)

## Model Training and Evaluation

We'll train a model and then evaluate it using both legal compliance metrics to understand how they differ in practice.
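
The training cell below calls `predict`, which for a binary random forest returns the class with the higher predicted probability, in effect a 0.5 cutoff on `predict_proba`. Both metrics consume the resulting selection rates, so they move with that cutoff. A minimal sketch on synthetic scores (not the Adult model); the Beta score distributions are assumptions chosen only to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic scores (assumption): group 1 scores are drawn slightly higher on average.
scores_g1 = rng.beta(3.0, 4.0, size=5000)
scores_g0 = rng.beta(2.5, 4.0, size=5000)


def rates_at(threshold):
    """Selection rates for both groups under one shared cutoff."""
    return (scores_g0 >= threshold).mean(), (scores_g1 >= threshold).mean()


for t in (0.3, 0.4, 0.5, 0.6):
    rate_g0, rate_g1 = rates_at(t)
    di = rate_g0 / rate_g1   # group 0 rate relative to group 1 rate
    spd = rate_g1 - rate_g0  # group 1 rate minus group 0 rate
    print(f"threshold={t:.1f}  rate_g0={rate_g0:.3f}  rate_g1={rate_g1:.3f}  "
          f"DI={di:.3f}  SPD={spd:+.3f}")
```

The sweep keeps one shared cutoff for everyone; it deliberately does not use group-specific thresholds.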

In [None]:
# Train model for legal compliance evaluation
print("=== MODEL TRAINING ===")
print("Training employment prediction model (subject to legal compliance requirements)")

# Split data
X_train, X_test, y_train, y_test, sensitive_train, sensitive_test = train_test_split(
    X, y, sensitive_attr, test_size=0.3, random_state=42, stratify=y
)

print(f"Training set: {X_train.shape[0]} samples")
print(f"Test set: {X_test.shape[0]} samples")

# Train Random Forest model
rf_model = RandomForestClassifier(n_estimators=100, random_state=42, max_depth=10)
rf_model.fit(X_train, y_train)

# Make predictions
y_pred = rf_model.predict(X_test)
y_prob = rf_model.predict_proba(X_test)[:, 1]

# Model performance
accuracy = (y_pred == y_test).mean()
print(f"\nModel Performance:")
print(f"Overall accuracy: {accuracy:.3f}")

# Calculate prediction rates by gender (key for legal compliance)
female_mask = sensitive_test == 0
male_mask = sensitive_test == 1

female_pred_rate = y_pred[female_mask].mean()
male_pred_rate = y_pred[male_mask].mean()

print(f"\nPredicted Selection Rates:")
print(f"Female: {female_pred_rate:.3f} ({female_pred_rate*100:.1f}%)")
print(f"Male: {male_pred_rate:.3f} ({male_pred_rate*100:.1f}%)")

print(f"\n📋 Ready for Legal Compliance Analysis...")

## Legal Compliance Metrics Analysis

Now let's evaluate our model using both legal compliance metrics and understand what each tells us about regulatory risk.
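
The next cell passes `is_male` (1 = male) as Jurity's `memberships`, and its printed interpretation assumes Jurity puts the group coded 1 in the numerator of the ratio and first in the difference. The sketch below reimplements that assumed convention in NumPy (hypothetical helpers, not Jurity's code; confirm the convention against the Jurity documentation for your installed version) to show what the coding choice does: swapping to `is_female` inverts the ratio and flips the sign of the difference, so coding females as 1 would yield the Female/Male ratio the 80% rule uses without a `1/score` step.

```python
import numpy as np


def rates_by_membership(predictions, memberships, label=1):
    """Return (rate for members == label, rate for everyone else)."""
    predictions = np.asarray(predictions, dtype=float)
    in_group = np.asarray(memberships) == label
    return predictions[in_group].mean(), predictions[~in_group].mean()


def disparate_impact(predictions, memberships, label=1):
    """Assumed convention: labelled group's rate / other group's rate."""
    a, b = rates_by_membership(predictions, memberships, label)
    return a / b


def statistical_parity(predictions, memberships, label=1):
    """Assumed convention: labelled group's rate - other group's rate."""
    a, b = rates_by_membership(predictions, memberships, label)
    return a - b


preds = np.array([1, 1, 1, 0, 1, 0, 0, 0])
is_male = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # male rate 0.75, female rate 0.25
is_female = 1 - is_male

print(disparate_impact(preds, is_male))      # male/female = 3.0
print(disparate_impact(preds, is_female))    # female/male ~ 0.333, the 80%-rule quantity
print(statistical_parity(preds, is_male))    # 0.5
print(statistical_parity(preds, is_female))  # -0.5
```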

In [None]:
# Calculate Legal Compliance Metrics using Jurity
print("=== LEGAL COMPLIANCE METRICS ANALYSIS ===")

bfm = BinaryFairnessMetrics()

# 1. Disparate Impact (80% Rule)
disparate_impact_score = bfm.DisparateImpact.get_score(
    predictions=y_pred,
    memberships=sensitive_test.values
)

# 2. Statistical Parity (Equal Treatment)
statistical_parity_score = bfm.StatisticalParity.get_score(
    predictions=y_pred,
    memberships=sensitive_test.values
)

print("\n🏛️ DISPARATE IMPACT ANALYSIS (80% Rule)")
print(f"Disparate Impact Score: {disparate_impact_score:.3f}")
print(f"Interpretation:")
print(f"• Score represents Male/Female selection rate ratio")
print(f"• Legal requirement: Female rate ≥ 80% of Male rate")
print(f"• Inverse ratio (Female/Male): {1/disparate_impact_score:.3f}")

# Legal compliance assessment for Disparate Impact
female_male_ratio = 1 / disparate_impact_score
if female_male_ratio >= 0.8:
    di_status = "✅ LEGALLY COMPLIANT"
    di_risk = "LOW"
elif female_male_ratio >= 0.7:
    di_status = "⚠️ BORDERLINE - LEGAL REVIEW NEEDED"
    di_risk = "MODERATE"
else:
    di_status = "❌ NON-COMPLIANT - LEGAL RISK"
    di_risk = "HIGH"

print(f"\nLegal Assessment: {di_status}")
print(f"Legal Risk Level: {di_risk}")
print(f"Female/Male Ratio: {female_male_ratio:.3f} (minimum required: 0.800)")

print("\n📊 STATISTICAL PARITY ANALYSIS (Equal Treatment)")
print(f"Statistical Parity Score: {statistical_parity_score:.3f}")
print(f"Interpretation:")
print(f"• Score represents Male - Female selection rate difference")
print(f"• Positive = Male advantage, Negative = Female advantage")
print(f"• Typical tolerance: ±0.05 (5 percentage points)")

# Legal compliance assessment for Statistical Parity
if abs(statistical_parity_score) <= 0.05:
    sp_status = "✅ ACCEPTABLE DIFFERENCE"
    sp_risk = "LOW"
elif abs(statistical_parity_score) <= 0.1:
    sp_status = "⚠️ NOTABLE DIFFERENCE - MONITOR"
    sp_risk = "MODERATE"
else:
    sp_status = "❌ SIGNIFICANT DISPARITY"
    sp_risk = "HIGH"

print(f"\nAssessment: {sp_status}")
print(f"Risk Level: {sp_risk}")
print(f"Difference: {statistical_parity_score:+.3f} (tolerance: ±0.050)")

print(f"\n🎯 COMPARATIVE ANALYSIS")
print(f"Disparate Impact focuses on: Ratio compliance (80% rule)")
print(f"Statistical Parity focuses on: Absolute difference")
print(f"")
if di_risk == "LOW" and sp_risk == "LOW":
    overall_risk = "✅ LOW LEGAL RISK"
elif di_risk == "HIGH" or sp_risk == "HIGH":
    overall_risk = "❌ HIGH LEGAL RISK"
else:
    overall_risk = "⚠️ MODERATE LEGAL RISK"
    
print(f"Overall Legal Risk Assessment: {overall_risk}")

## Detailed Comparison: Disparate Impact vs Statistical Parity

Let's dive deeper into how these two metrics differ in their mathematical approach and legal implications.
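
In symbols first: writing $r_F$ and $r_M$ for the female and male selection rates, and using this notebook's thresholds (0.8 for the ratio, 0.05 for the difference), the two metrics are linked by a simple identity:

$$
\begin{aligned}
\mathrm{DI} &= \frac{r_F}{r_M}, \qquad \mathrm{SPD} = r_M - r_F = r_M\,(1 - \mathrm{DI}) \\[4pt]
\text{DI passes} &\iff r_F \ge 0.8\, r_M \\
\text{SPD passes (when } r_F \le r_M) &\iff r_F \ge r_M - 0.05 \\
0.8\, r_M = r_M - 0.05 &\iff r_M = 0.25
\end{aligned}
$$

The same ratio therefore corresponds to a larger absolute gap as base rates rise: when the male rate is below 0.25 the 80% rule is the stricter test, and above 0.25 the ±0.05 tolerance is stricter.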

In [None]:
# Detailed comparison of legal compliance metrics
print("=== DETAILED METRIC COMPARISON ===")

# Manual calculations to show the mathematics
female_count = np.sum(sensitive_test == 0)
male_count = np.sum(sensitive_test == 1)
female_selected = np.sum(y_pred[sensitive_test == 0])
male_selected = np.sum(y_pred[sensitive_test == 1])

female_rate_manual = female_selected / female_count
male_rate_manual = male_selected / male_count

print("📊 RAW NUMBERS:")
print(f"Female group: {female_selected:,} selected out of {female_count:,} total ({female_rate_manual:.1%})")
print(f"Male group: {male_selected:,} selected out of {male_count:,} total ({male_rate_manual:.1%})")

print("\n🧮 MATHEMATICAL CALCULATIONS:")

# Disparate Impact calculation: female rate relative to the (higher) male rate
manual_female_male_ratio = female_rate_manual / male_rate_manual

print(f"\n1. DISPARATE IMPACT (Ratio Method):")
print(f"   Formula: Female_Rate / Male_Rate (compared against the highest-rate group)")
print(f"   Calculation: {female_rate_manual:.3f} / {male_rate_manual:.3f} = {manual_female_male_ratio:.3f}")
print(f"   Jurity result (inverted to Female/Male): {1/disparate_impact_score:.3f}")
print(f"   Legal threshold: ≥0.800")
print(f"   Compliance: {'PASS' if manual_female_male_ratio >= 0.8 else 'FAIL'}")

# Statistical Parity calculation
manual_sp_diff = male_rate_manual - female_rate_manual

print(f"\n2. STATISTICAL PARITY (Difference Method):")
print(f"   Formula: Male_Rate - Female_Rate")
print(f"   Calculation: {male_rate_manual:.3f} - {female_rate_manual:.3f} = {manual_sp_diff:+.3f}")
print(f"   Jurity result: {statistical_parity_score:+.3f}")
print(f"   Typical tolerance: ±0.050")
print(f"   Assessment: {'ACCEPTABLE' if abs(manual_sp_diff) <= 0.05 else 'CONCERNING'}")

print(f"\n🔍 WHY THESE METRICS CAN GIVE DIFFERENT RESULTS:")

# Demonstrate scenarios where they disagree
print(f"\nCurrent scenario:")
print(f"• Disparate Impact cares about: {manual_female_male_ratio:.3f} ≥ 0.800")
print(f"• Statistical Parity cares about: |{manual_sp_diff:.3f}| ≤ 0.050")

# Show example scenarios (rates chosen so each one actually disagrees)
print(f"\n📋 EXAMPLE SCENARIOS WHERE THEY DISAGREE:")
scenarios = [
    {"female_rate": 0.10, "male_rate": 0.14, "context": "Low base rates"},
    {"female_rate": 0.42, "male_rate": 0.50, "context": "High base rates"},
    {"female_rate": 0.05, "male_rate": 0.09, "context": "Very low base rates"}
]

for scenario in scenarios:
    f_rate = scenario["female_rate"]
    m_rate = scenario["male_rate"]
    di_ratio = f_rate / m_rate
    sp_diff = m_rate - f_rate
    # Round before comparing so floating-point noise cannot flip a boundary case
    di_pass = round(di_ratio, 6) >= 0.8
    sp_pass = round(abs(sp_diff), 6) <= 0.05

    print(f"\n{scenario['context']} (F:{f_rate:.2f}, M:{m_rate:.2f}):")
    print(f"  Disparate Impact: {di_ratio:.3f} ({'PASS' if di_pass else 'FAIL'})")
    print(f"  Statistical Parity: {sp_diff:+.3f} ({'OK' if sp_pass else 'FAIL'})")

    if di_pass != sp_pass:
        print(f"  ⚠️ METRICS DISAGREE!")

## Comprehensive Legal Compliance Visualization

Let's create visualizations that clearly show how our model performs on both legal compliance metrics.

In [None]:
# Create comprehensive legal compliance visualization
fig, axes = plt.subplots(3, 3, figsize=(18, 16))
fig.suptitle('Legal Compliance Metrics: Comprehensive Analysis Dashboard', fontsize=16, fontweight='bold')

# 1. Selection Rates by Gender
selection_data = pd.DataFrame({
    'Gender': ['Female', 'Male'],
    'Selection_Rate': [female_rate_manual, male_rate_manual],
    'Count_Selected': [female_selected, male_selected],
    'Total_Count': [female_count, male_count]
})

bars = axes[0, 0].bar(selection_data['Gender'], selection_data['Selection_Rate'], 
                      color=['pink', 'lightblue'], alpha=0.7)
axes[0, 0].set_title('Selection Rates by Gender')
axes[0, 0].set_ylabel('Selection Rate')
axes[0, 0].set_ylim(0, max(selection_data['Selection_Rate']) * 1.2)

# Add value labels and counts
for i, (rate, selected, total) in enumerate(zip(selection_data['Selection_Rate'], 
                                                 selection_data['Count_Selected'],
                                                 selection_data['Total_Count'])):
    axes[0, 0].text(i, rate + 0.01, f'{rate:.3f}\n({selected}/{total})', 
                    ha='center', va='bottom', fontweight='bold')

# 2. Disparate Impact Visualization
di_threshold = 0.8
current_di = manual_female_male_ratio

# Create disparate impact gauge
di_values = [current_di, di_threshold, 1.0]
di_labels = [f'Current\n{current_di:.3f}', f'Threshold\n{di_threshold:.3f}', 'Perfect\n1.000']
di_colors = ['red' if current_di < di_threshold else 'green', 'orange', 'blue']

bars = axes[0, 1].bar(di_labels, di_values, color=di_colors, alpha=0.7)
axes[0, 1].set_title('Disparate Impact (80% Rule)')
axes[0, 1].set_ylabel('Female/Male Rate Ratio')
axes[0, 1].axhline(y=di_threshold, color='red', linestyle='--', alpha=0.7, label='Legal Minimum')
axes[0, 1].legend()

for bar, value in zip(bars, di_values):
    axes[0, 1].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.02,
                    f'{value:.3f}', ha='center', va='bottom', fontweight='bold')

# 3. Statistical Parity Visualization
sp_tolerance = 0.05
current_sp = abs(manual_sp_diff)

sp_values = [current_sp, sp_tolerance]
sp_labels = [f'Current Diff\n{manual_sp_diff:+.3f}', f'Tolerance\n±{sp_tolerance:.3f}']
sp_colors = ['red' if current_sp > sp_tolerance else 'green', 'orange']

bars = axes[0, 2].bar(sp_labels, [current_sp, sp_tolerance], color=sp_colors, alpha=0.7)
axes[0, 2].set_title('Statistical Parity (Equal Treatment)')
axes[0, 2].set_ylabel('Absolute Rate Difference')
axes[0, 2].axhline(y=sp_tolerance, color='red', linestyle='--', alpha=0.7, label='Typical Tolerance')
axes[0, 2].legend()

# 4. Legal Risk Assessment Matrix
risk_matrix = pd.DataFrame({
    'Metric': ['Disparate Impact', 'Statistical Parity'],
    'Current_Value': [current_di, abs(manual_sp_diff)],
    'Threshold': [di_threshold, sp_tolerance],
    'Status': ['PASS' if current_di >= di_threshold else 'FAIL',
              'PASS' if current_sp <= sp_tolerance else 'FAIL']
})

# Create status visualization
status_colors = ['green' if status == 'PASS' else 'red' for status in risk_matrix['Status']]
bars = axes[1, 0].bar(risk_matrix['Metric'], [1, 1], color=status_colors, alpha=0.7)
axes[1, 0].set_title('Legal Compliance Status')
axes[1, 0].set_ylabel('Compliance Status')
axes[1, 0].set_ylim(0, 1.2)

for i, (metric, status) in enumerate(zip(risk_matrix['Metric'], risk_matrix['Status'])):
    axes[1, 0].text(i, 0.5, status, ha='center', va='center', 
                    fontweight='bold', fontsize=14, color='white')

# 5. Historical Context (Actual vs Predicted)
comparison_data = pd.DataFrame({
    'Scenario': ['Actual Data', 'Model Predictions'],
    'Female_Rate': [female_rate, female_rate_manual],
    'Male_Rate': [male_rate, male_rate_manual],
    'Disparate_Impact': [female_rate/male_rate, manual_female_male_ratio]
})

x = np.arange(len(comparison_data['Scenario']))
width = 0.35

axes[1, 1].bar(x - width/2, comparison_data['Female_Rate'], width, 
               label='Female Rate', color='pink', alpha=0.7)
axes[1, 1].bar(x + width/2, comparison_data['Male_Rate'], width, 
               label='Male Rate', color='lightblue', alpha=0.7)
axes[1, 1].set_title('Actual Data vs Model Predictions')
axes[1, 1].set_ylabel('Selection Rate')
axes[1, 1].set_xticks(x)
axes[1, 1].set_xticklabels(comparison_data['Scenario'])
axes[1, 1].legend()

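# 6. Metric Sensitivity Analysis
# Show how small changes in the female selection rate affect both metrics
rate_changes = np.arange(-0.02, 0.025, 0.005)
di_impacts = []
sp_impacts = []

for change in rate_changes:
    new_female_rate = female_rate_manual + change
    new_di = new_female_rate / male_rate_manual
    new_sp = abs(male_rate_manual - new_female_rate)
    di_impacts.append(new_di)
    sp_impacts.append(new_sp)

axes[1, 2].plot(rate_changes, di_impacts, 'b-', label='Disparate Impact', linewidth=2)
axes[1, 2].axhline(y=0.8, color='blue', linestyle='--', alpha=0.7)
axes[1, 2].set_title('Sensitivity to Female Rate Changes')
axes[1, 2].set_xlabel('Change in Female Rate')
axes[1, 2].set_ylabel('Disparate Impact Ratio')
axes[1, 2].grid(True, alpha=0.3)

# Plot statistical parity on a secondary y-axis with its own tolerance line
ax_sp = axes[1, 2].twinx()
ax_sp.plot(rate_changes, sp_impacts, 'r-', label='|Statistical Parity|', linewidth=2)
ax_sp.axhline(y=0.05, color='red', linestyle='--', alpha=0.7)
ax_sp.set_ylabel('Absolute Rate Difference')
lines_1, labels_1 = axes[1, 2].get_legend_handles_labels()
lines_2, labels_2 = ax_sp.get_legend_handles_labels()
axes[1, 2].legend(lines_1 + lines_2, labels_1 + labels_2, loc='best')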

# 7. Legal Framework Comparison
framework_data = pd.DataFrame({
    'Framework': ['80% Rule\n(EEOC)', 'Equal Treatment\n(Civil Rights)'],
    'Current_Score': [current_di, abs(manual_sp_diff)],
    'Compliance': ['PASS' if current_di >= 0.8 else 'FAIL',
                   'PASS' if abs(manual_sp_diff) <= 0.05 else 'FAIL']
})

colors = ['green' if comp == 'PASS' else 'red' for comp in framework_data['Compliance']]
bars = axes[2, 0].bar(framework_data['Framework'], framework_data['Current_Score'], 
                      color=colors, alpha=0.7)
axes[2, 0].set_title('Legal Framework Compliance')
axes[2, 0].set_ylabel('Metric Value')

for i, (score, comp) in enumerate(zip(framework_data['Current_Score'], framework_data['Compliance'])):
    axes[2, 0].text(i, score + 0.01, f'{score:.3f}\n{comp}', 
                    ha='center', va='bottom', fontweight='bold')

# 8. Business Risk Assessment
risk_factors = ['Regulatory Fines', 'Lawsuits', 'Reputation', 'Compliance Costs']
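# One overall level (1=Low, 2=Moderate, 3=High) applied to every factor;
# the worst individual metric risk determines it
overall_level = (3 if 'HIGH' in (di_risk, sp_risk)
                 else 2 if 'MODERATE' in (di_risk, sp_risk) else 1)
risk_levels = [overall_level for _ in risk_factors]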

risk_colors = ['green' if r == 1 else 'orange' if r == 2 else 'red' for r in risk_levels]
bars = axes[2, 1].bar(risk_factors, risk_levels, color=risk_colors, alpha=0.7)
axes[2, 1].set_title('Business Risk Assessment')
axes[2, 1].set_ylabel('Risk Level (1=Low, 3=High)')
axes[2, 1].tick_params(axis='x', rotation=45)
axes[2, 1].set_ylim(0, 3.5)

# 9. Overall Compliance Dashboard
overall_status = "COMPLIANT" if current_di >= 0.8 and abs(manual_sp_diff) <= 0.05 else "NON-COMPLIANT"
overall_color = 'green' if overall_status == "COMPLIANT" else 'red'

axes[2, 2].text(0.5, 0.7, 'Legal Compliance', ha='center', va='center', 
                fontsize=18, fontweight='bold', transform=axes[2, 2].transAxes)
axes[2, 2].text(0.5, 0.5, overall_status, ha='center', va='center', 
                fontsize=24, fontweight='bold', color=overall_color,
                transform=axes[2, 2].transAxes)
axes[2, 2].text(0.5, 0.3, f'DI: {current_di:.3f}\nSP: {abs(manual_sp_diff):.3f}', 
                ha='center', va='center', fontsize=12,
                transform=axes[2, 2].transAxes)
axes[2, 2].text(0.5, 0.1, 'Both metrics must pass', ha='center', va='center', 
                fontsize=10, style='italic', transform=axes[2, 2].transAxes)
axes[2, 2].set_xlim(0, 1)
axes[2, 2].set_ylim(0, 1)
axes[2, 2].axis('off')

plt.tight_layout()
plt.show()

## Legal Risk Assessment and Business Impact

Let's analyze the business implications of our legal compliance findings and provide concrete recommendations.

In [None]:
# Comprehensive legal risk assessment
print("="*80)
print("LEGAL COMPLIANCE RISK ASSESSMENT & BUSINESS IMPACT ANALYSIS")
print("="*80)

print(f"REGULATORY CONTEXT: Employment Income Prediction")
print(f"APPLICABLE LAWS: Title VII, EEOC Guidelines, Equal Pay Act")
print(f"PROTECTED CHARACTERISTIC: Gender (Male vs Female)")
print(f"SAMPLE SIZE: {len(y_test):,} test-set predictions (proxy for employment decisions)")

print(f"\n" + "="*60)
print(f"LEGAL COMPLIANCE RESULTS")
print(f"="*60)

print(f"\n🏛️ DISPARATE IMPACT (80% RULE - EEOC STANDARD):")
print(f"   Current Ratio: {manual_female_male_ratio:.3f}")
print(f"   Legal Minimum: 0.800")
print(f"   Status: {di_status}")
print(f"   ")
print(f"   Raw Numbers:")
print(f"   • Female selection rate: {female_rate_manual:.1%} ({female_selected:,} of {female_count:,})")
print(f"   • Male selection rate: {male_rate_manual:.1%} ({male_selected:,} of {male_count:,})")
print(f"   • Gap analysis: Female rate is {female_male_ratio*100:.1f}% of male rate")

print(f"\n📊 STATISTICAL PARITY (EQUAL TREATMENT STANDARD):")
print(f"   Current Difference: {manual_sp_diff:+.3f}")
print(f"   Typical Tolerance: ±0.050")
print(f"   Status: {sp_status}")
print(f"   ")
print(f"   Interpretation:")
print(f"   • {'Males' if manual_sp_diff >= 0 else 'Females'} have a {abs(manual_sp_diff)*100:.1f}pp higher selection rate")
print(f"   • Difference of {abs(manual_sp_diff)*100:.1f}pp {'exceeds' if abs(manual_sp_diff) > 0.05 else 'is within'} typical tolerance")

print(f"\n" + "="*60)
print(f"BUSINESS RISK ANALYSIS")
print(f"="*60)

# Calculate business impact metrics
total_decisions = len(y_test)
# Additional women who would be selected if the female rate matched the male rate
affected_females = int(female_count * manual_sp_diff) if manual_sp_diff > 0 else 0
potential_liability = affected_females * 50000  # Illustrative placeholder only: $50K per person is not an actuarial estimate

print(f"\n💰 FINANCIAL RISK ASSESSMENT:")
# Cost bands below are illustrative placeholders, not actuarial estimates
if di_risk == "HIGH" or sp_risk == "HIGH":
    financial_risk = "SIGNIFICANT"
    cost_range = "$500K - $5M+"
    action_urgency = "IMMEDIATE"
elif di_risk == "MODERATE" or sp_risk == "MODERATE":
    financial_risk = "MODERATE"
    cost_range = "$100K - $500K"
    action_urgency = "WITHIN 30 DAYS"
else:
    financial_risk = "LOW"
    cost_range = "<$100K"
    action_urgency = "ROUTINE MONITORING"

print(f"   • Overall Financial Risk: {financial_risk}")
print(f"   • Illustrative Cost Range (placeholder, not an estimate): {cost_range}")
print(f"   • Potentially Affected Individuals: ~{affected_females:,}")
print(f"   • Action Timeline: {action_urgency}")

print(f"\n⚖️ LEGAL RISK FACTORS:")
risk_factors = {
    "Class Action Lawsuit": "HIGH" if di_risk == "HIGH" else "MODERATE" if di_risk == "MODERATE" else "LOW",
    "EEOC Investigation": "HIGH" if female_male_ratio < 0.7 else "MODERATE" if female_male_ratio < 0.8 else "LOW",
    "Regulatory Fines": "MODERATE" if di_risk != "LOW" or sp_risk != "LOW" else "LOW",
    "Reputation Damage": "HIGH" if overall_status == "NON-COMPLIANT" else "LOW",
    "Regulatory Scrutiny": "MODERATE" if di_risk != "LOW" else "LOW"
}

for factor, risk in risk_factors.items():
    emoji = "🚨" if risk == "HIGH" else "⚠️" if risk == "MODERATE" else "✅"
    print(f"   • {factor}: {emoji} {risk}")

print(f"\n" + "="*60)
print(f"STRATEGIC RECOMMENDATIONS")
print(f"="*60)

print(f"\n📋 IMMEDIATE ACTIONS (Required):")
if overall_status == "NON-COMPLIANT":
    print(f"   1. 🚨 HALT DEPLOYMENT - Model fails legal compliance")
    print(f"   2. 🚨 Legal review required before any production use")
    print(f"   3. 🚨 Implement bias mitigation techniques:")
    print(f"      • Threshold adjustment (Title VII §703(l) prohibits group-specific cutoff scores on employment tests)")
    print(f"      • Post-processing calibration")
    print(f"      • Training data rebalancing")
    print(f"   4. 🚨 Document compliance efforts for legal protection")
else:
    print(f"   1. ✅ Model passes basic legal compliance")
    print(f"   2. ✅ Implement enhanced monitoring system")
    print(f"   3. ✅ Establish monthly compliance reporting")
    print(f"   4. ✅ Train HR team on bias monitoring")

print(f"\n🔄 ONGOING MONITORING STRATEGY:")
print(f"   • Disparate Impact: Monitor monthly, alert if ratio < 0.85")
print(f"   • Statistical Parity: Monitor weekly, alert if difference > 0.03")
print(f"   • Audit trail: Log all predictions with demographic breakdowns")
print(f"   • Legal review: Quarterly assessment with employment attorney")

print(f"\n💼 BUSINESS PROCESS INTEGRATION:")
print(f"   • HR Training: Legal compliance requirements for AI systems")
print(f"   • Executive Reporting: Monthly legal risk dashboard")
print(f"   • Vendor Management: Ensure third-party AI tools meet standards")
print(f"   • Documentation: Maintain compliance records for regulatory audits")

print(f"\n🎓 LEGAL COMPLIANCE BEST PRACTICES:")
print(f"   1. Always test BOTH disparate impact and statistical parity")
print(f"   2. Document business justification for any AI hiring tools")
print(f"   3. Provide alternative selection procedures if adverse impact exists")
print(f"   4. Regular validation studies to show job relevance")
print(f"   5. Employee training on unconscious bias and fair hiring")

print(f"\n" + "="*80)
print(f"CONCLUSION: {'✅ PROCEED WITH MONITORING' if overall_status == 'COMPLIANT' else '❌ COMPLIANCE WORK REQUIRED'}")
print(f"")
if overall_status == "COMPLIANT":
    print(f"This model demonstrates acceptable legal compliance for employment")
    print(f"decisions. Implement robust monitoring to maintain compliance and")
    print(f"protect against regulatory risk.")
else:
    print(f"This model requires immediate attention to address legal compliance")
    print(f"issues. Do not deploy until disparate impact and statistical parity")
    print(f"metrics meet acceptable thresholds.")
print(f"="*80)

## Key Insights: Legal Compliance vs Merit-Based Fairness

### Understanding the Fundamental Difference:

**Legal Compliance Metrics (Disparate Impact & Statistical Parity):**
- Focus on **selection-rate parity** across demographic groups
- Serve as initial screening tools for potential discrimination
- Required by law in many jurisdictions
- Important legal nuance: Differences in selection rates may be legally justified through job-relatedness and business necessity
- Designed to prevent discrimination in historically biased systems

**Merit-Based Metrics (Equal Opportunity, Average Odds):**
- Focus on **fair outcomes** for qualified individuals
- Explicitly allow different overall rates if justified by qualification differences
- Optimize for both accuracy and fairness
- Better align with business performance goals

### The Legal Framework Reality:

Legal compliance metrics provide a **statistical trigger** for further analysis. Under U.S. employment law (Title VII, EEOC guidelines), failing the 80% rule doesn't automatically mean illegal discrimination, but it does shift the burden to the employer to demonstrate that the selection criteria are:
1. **Job-related** - Connected to actual job requirements
2. **Business necessity** - Essential for safe and efficient operations
3. **Validated** - Shown to predict job performance

Similarly, passing these metrics doesn't guarantee legal safety if the selection process is otherwise discriminatory.

### When Each Type Matters:

#### Use Legal Compliance Metrics When:
- ✅ **Regulatory requirement** (hiring, lending, housing)
- ✅ **High legal risk** environment
- ✅ **Historical discrimination** in the domain
- ✅ **Public sector** or government contracting
- ✅ **Initial fairness screening** before deployment

#### Use Merit-Based Metrics When:
- ✅ **Performance optimization** is critical
- ✅ **Clear qualification standards** exist and can be defended
- ✅ **Internal business decisions** (less regulated)
- ✅ **Innovation and competitiveness** are priorities
- ✅ **Job-relatedness** can be demonstrated

### Best Practice:
**Monitor BOTH types simultaneously** - Legal compliance metrics for regulatory screening and risk management, merit-based metrics for business optimization and performance fairness. This dual approach provides comprehensive fairness coverage while managing legal risk.
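
A minimal monitoring sketch of that dual approach for one batch of decisions, assuming binary predictions, true outcomes (once known), and `is_male` coded as in this notebook. The 0.85 ratio and 0.03 difference alert levels repeat those proposed in the monitoring strategy printed earlier; the equal-opportunity check and its 0.05 limit are illustrative assumptions, and `monitor_batch` is a hypothetical name.

```python
import numpy as np


def monitor_batch(y_true, y_pred, is_male, di_alert=0.85, spd_alert=0.03, eo_alert=0.05):
    """Compute legal-compliance and merit-based metrics for one batch and list alerts.

    Assumes binary arrays and that each group has at least one member and at
    least one qualified member (y_true == 1); otherwise a mean is NaN.
    """
    y_true, y_pred, is_male = (np.asarray(a) for a in (y_true, y_pred, is_male))
    male, female = is_male == 1, is_male == 0

    # Legal-compliance screen: selection rates only.
    rate_m, rate_f = y_pred[male].mean(), y_pred[female].mean()
    di = rate_f / rate_m if rate_m > 0 else float("nan")
    spd = rate_m - rate_f

    # Merit-based check (equal opportunity): selection rate among the qualified.
    tpr_m = y_pred[male & (y_true == 1)].mean()
    tpr_f = y_pred[female & (y_true == 1)].mean()
    eo_gap = tpr_m - tpr_f

    alerts = []
    if not di >= di_alert:  # written this way so a NaN ratio also raises an alert
        alerts.append(f"disparate impact {di:.3f} below {di_alert}")
    if abs(spd) > spd_alert:
        alerts.append(f"selection-rate difference {spd:+.3f} beyond ±{spd_alert}")
    if abs(eo_gap) > eo_alert:
        alerts.append(f"equal-opportunity gap {eo_gap:+.3f} beyond ±{eo_alert}")
    return {"di": di, "spd": spd, "eo_gap": eo_gap}, alerts


metrics, alerts = monitor_batch(
    y_true=[1, 1, 0, 0, 1, 1, 0, 0],
    y_pred=[1, 1, 1, 0, 1, 0, 0, 0],
    is_male=[1, 1, 1, 1, 0, 0, 0, 0],
)
print(metrics)
for alert in alerts:
    print("ALERT:", alert)
```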

### Business Impact:
- **Legal Protection**: Reduces lawsuit and regulatory risk by catching statistical red flags
- **Stakeholder Trust**: Demonstrates commitment to non-discrimination
- **Operational Clarity**: Provides clear initial pass/fail criteria
- **Industry Standards**: Aligns with established legal frameworks
- **Defensibility**: Enables evidence-based justification when needed

## Appendix: Legal Resources and Documentation

### Legal Framework References:
- **Uniform Guidelines on Employee Selection Procedures (1978)**: Establish the four-fifths (80%) rule for adverse impact (29 CFR § 1607.4(D))
- **Griggs v. Duke Power Co. (1971)**: Supreme Court case establishing disparate impact doctrine
- **Title VII of the Civil Rights Act of 1964**: Prohibits employment discrimination
- **Equal Credit Opportunity Act (1974)**: Prohibits credit discrimination

### Technical Implementation:
- **Jurity Library Documentation**: [Disparate Impact](https://jurity.readthedocs.io/en/latest/fairness_metrics.html#disparate-impact) and [Statistical Parity](https://jurity.readthedocs.io/en/latest/fairness_metrics.html#statistical-parity)
- **NIST AI Risk Management Framework**: Guidelines for responsible AI deployment
- **IEEE Standards for AI Systems**: Technical standards for algorithmic accountability

### Industry Guidelines:
- **Partnership on AI**: Best practices for fair and beneficial AI
- **Aequitas Toolkit**: Open-source bias audit toolkit
- **Google AI Principles**: Responsible AI development guidelines

### Compliance Monitoring:
- Document all bias testing and mitigation efforts
- Maintain audit trails for regulatory review
- Regular validation studies to demonstrate business necessity
- Legal review of AI systems before deployment
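
For the audit-trail item, one lightweight pattern is an append-only JSON Lines file with one record per fairness check. A sketch using only the Python standard library; the file name, record fields, and `log_fairness_check` are assumptions for illustration, not a regulatory format, and the metric values are example numbers rather than results from this notebook.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_fairness_check(path, model_id, metrics, thresholds, notes=""):
    """Append one audit record to a JSON Lines file and return the record."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_id": model_id,
        "metrics": metrics,
        "thresholds": thresholds,
        "notes": notes,
    }
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


# A temporary directory keeps the example self-contained.
audit_path = Path(tempfile.mkdtemp()) / "fairness_audit.jsonl"
log_fairness_check(
    audit_path,
    model_id="rf_income_demo",  # hypothetical identifier
    metrics={"disparate_impact": 0.90, "statistical_parity": 0.02},  # example values, not results
    thresholds={"disparate_impact_min": 0.8, "statistical_parity_max": 0.05},
    notes="Quarterly review",
)
print(audit_path.read_text(encoding="utf-8"))
```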

**Disclaimer**: This analysis is for educational purposes only. Consult with employment attorneys and compliance experts for specific legal guidance in your jurisdiction and industry.