# 127: Model Governance & Compliance - Model Cards, Bias Detection, and Regulatory Standards

## 🎯 Learning Objectives

By the end of this notebook, you will:
- **Understand** the model governance framework for transparency, fairness, and compliance
- **Implement** model cards for comprehensive model documentation (Google's ML transparency standard)
- **Detect** bias in ML models using fairness metrics (demographic parity, equalized odds, disparate impact)
- **Apply** explainability techniques to semiconductor test models (SHAP, LIME for parametric predictions)
- **Build** compliance frameworks for regulated industries (GDPR, CCPA, FDA, SOX, industry-specific regulations)
- **Monitor** model governance metrics and audit trails

## 📚 What is Model Governance?

**Model governance** is the set of **policies, processes, and controls** ensuring ML models are developed, deployed, and monitored **ethically, transparently, and in compliance** with regulations. It answers: Who built this model? What data was used? How accurate is it? Is it fair? Who approved deployment?

**Why Model Governance?**
- ✅ **Regulatory compliance**: GDPR (right to explanation), CCPA (data privacy), FDA (medical device ML), SOX (financial models)
- ✅ **Risk management**: Prevent discriminatory models, detect bias before deployment, audit trail for incidents
- ✅ **Trust and transparency**: Stakeholders understand model decisions (engineering, legal, customers)
- ✅ **Reproducibility**: Document training data, hyperparameters, validation metrics (recreate model from scratch)

**Governance Components:**
- **Model Cards**: Standardized documentation (purpose, performance, limitations, fairness, ethical considerations)
- **Bias Detection**: Statistical tests for demographic parity, equalized odds, disparate impact ratios
- **Explainability**: SHAP values, LIME, feature importance (understand which features drive predictions)
- **Audit Logs**: Track who trained, deployed, accessed model (compliance, security, debugging)
- **Approval Workflows**: Multi-level approval (data scientist → ML engineer → compliance → deployment)
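The multi-level approval chain can be enforced in code by refusing any sign-off that arrives out of order. A minimal sketch, assuming the four stages named above and a hypothetical `ApprovalChain` class (not part of any library):

```python
from dataclasses import dataclass, field

# Stage order taken from the approval workflow above; names are illustrative.
STAGES = ["data_scientist", "ml_engineer", "compliance", "deployment"]


@dataclass
class ApprovalChain:
    """Hypothetical helper that records sign-offs strictly in stage order."""
    model_version: str
    signoffs: list = field(default_factory=list)  # (stage, approver) tuples

    def next_stage(self):
        """Return the stage that must sign off next, or None when complete."""
        if len(self.signoffs) < len(STAGES):
            return STAGES[len(self.signoffs)]
        return None

    def approve(self, stage, approver):
        """Record a sign-off; reject it if it skips or repeats a stage."""
        expected = self.next_stage()
        if stage != expected:
            raise ValueError(f"expected sign-off from {expected!r}, got {stage!r}")
        self.signoffs.append((stage, approver))

    def is_deployable(self):
        """A model is deployable only once every stage has signed off."""
        return self.next_stage() is None


chain = ApprovalChain("2.1.0")
chain.approve("data_scientist", "alice")
chain.approve("ml_engineer", "bob")
print(chain.next_stage())      # compliance
print(chain.is_deployable())   # False
```

Raising on an out-of-order sign-off (rather than silently queuing it) keeps the recorded chain identical to the policy, which is what an auditor would check.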

## 🏭 Post-Silicon Validation Use Cases

### **Use Case 1: Yield Prediction Model Card for Regulatory Compliance**
**Input:** Yield prediction model used for wafer disposition (ship vs scrap decisions worth $50K-$500K per lot)  
**Output:** Model card documenting training data, accuracy metrics, limitations, approval chain  
**Value:** $3.2M/year from avoiding compliance violations (SOX requires audit trail for financial decisions, model card provides documentation)

### **Use Case 2: Bias Detection in Test Coverage Optimization**
**Input:** ML model recommending which tests to skip for faster test time (adaptive testing)  
**Problem:** Model skips critical tests for certain device types (bias against edge cases)  
**Output:** Fairness metrics detect disparate impact (some device families under-tested), model retrained with balanced sampling  
**Value:** $2.5M/year from preventing field failures (catch defects before shipping, avoid recalls)
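"Balanced sampling" can mean several things; one common reading is oversampling each device family up to the size of the largest family. A minimal pandas sketch, assuming a hypothetical `device_family` column and made-up data:

```python
import numpy as np
import pandas as pd


def balance_by_group(df, group_col, random_state=0):
    """Oversample every group (with replacement) to the size of the largest group."""
    target = df[group_col].value_counts().max()
    parts = [
        grp.sample(n=target, replace=True, random_state=random_state)
        for _, grp in df.groupby(group_col)
    ]
    # Concatenate, then shuffle so families are interleaved for training
    return pd.concat(parts).sample(frac=1, random_state=random_state).reset_index(drop=True)


# Hypothetical toy data: family "C" is under-represented
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "device_family": ["A"] * 500 + ["B"] * 300 + ["C"] * 50,
    "test_time_ms": rng.normal(100, 10, 850),
})
balanced = balance_by_group(df, "device_family")
print(balanced["device_family"].value_counts().sort_index())  # 500 rows per family
```

Apply this only to the training split; oversampling before the split would let duplicated rows leak into the evaluation data.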

### **Use Case 3: SHAP Explainability for Parametric Outlier Detection**
**Input:** Random Forest flagging parametric test results as outliers (voltage, current, frequency anomalies)  
**Problem:** Engineers don't trust black-box model (why was this flagged? which parameter out of spec?)  
**Output:** SHAP values explain each prediction (e.g., "voltage 3.5V vs expected 3.3V contributed 80% to outlier score")  
**Value:** $1.9M/year from faster root cause analysis (engineers debug test failures 50% faster with explanations)
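For a linear score the SHAP attribution has a closed form: with f(x) = w·x + b and independent features, feature i contributes phi_i = w_i (x_i - E[x_i]). The sketch below uses that identity with hypothetical weights and baseline means, chosen to mirror the 3.5 V vs 3.3 V example above; a Random Forest like the one in this use case would need `shap.TreeExplainer` instead.

```python
import numpy as np

# Hypothetical linear outlier score over three parametric tests
features = ["voltage_V", "current_mA", "freq_MHz"]
weights = np.array([5.0, 0.04, 0.005])        # illustrative weights, not from a real model
baseline = np.array([3.3, 120.0, 2000.0])     # E[x]: expected (mean) test values


def linear_shap(x, w, mean):
    """Exact SHAP values for f(x) = w.x + b when features are independent."""
    return w * (x - mean)


x = np.array([3.5, 125.0, 2010.0])            # one flagged device (made-up values)
phi = linear_shap(x, weights, baseline)
share = np.abs(phi) / np.abs(phi).sum()
for name, p, s in zip(features, phi, share):
    print(f"{name}: contribution {p:+.3f} ({s:.0%} of total attribution)")
```

The contributions sum to f(x) - f(E[x]), which is the additivity property that makes statements like "voltage contributed 80% to the outlier score" well defined.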

### **Use Case 4: Audit Trail for Model Deployment Approvals**
**Input:** Production ML models deployed without approval tracking (who authorized deployment? when? based on what validation?)  
**Output:** Governance system logs all model deployments with approvals, validation metrics, rollback capability  
**Value:** $1.6M/year from risk mitigation (prevent unapproved model deployments, compliance with internal controls)

**Total Post-Silicon Value:** $3.2M + $2.5M + $1.9M + $1.6M = **$9.2M/year**

## 🔄 Model Governance Workflow

```mermaid
graph LR
    A[üèãÔ∏è Model Training] --> B[üìÑ Create Model Card]
    B --> C[üîç Bias Detection]
    C --> D{Fairness Pass?}
    D -->|No| E[‚ùå Reject Model]
    D -->|Yes| F[üí° Explainability Analysis]
    
    F --> G[üìä Validation Metrics]
    G --> H[‚úÖ Approval Workflow]
    H --> I{Approved?}
    I -->|No| E
    I -->|Yes| J[üöÄ Deploy to Production]
    
    J --> K[üìà Monitor Governance Metrics]
    K --> L{Compliance Issue?}
    L -->|Yes| M[‚ö†Ô∏è Alert Compliance Team]
    L -->|No| N[‚úÖ Audit Log Update]
    
    E --> O[üìß Notify Team]
    M --> O
    
    style A fill:#e1f5ff
    style J fill:#e1ffe1
    style E fill:#ffe1e1
    style D fill:#fff4e1
    style I fill:#fff4e1
```
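The decision points in the diagram (fairness pass, approval, compliance issue) can be read as a sequence of gates. A minimal sketch, assuming each check has already produced a boolean; the function `governance_path` is hypothetical and simply returns the diagram nodes visited:

```python
def governance_path(fairness_pass, approved, compliance_issue):
    """Walk the workflow diagram and return the node labels visited."""
    path = ["Model Training", "Create Model Card", "Bias Detection"]
    if not fairness_pass:
        # Fairness gate fails: the model never reaches explainability or approval
        return path + ["Reject Model", "Notify Team"]
    path += ["Explainability Analysis", "Validation Metrics", "Approval Workflow"]
    if not approved:
        return path + ["Reject Model", "Notify Team"]
    path += ["Deploy to Production", "Monitor Governance Metrics"]
    if compliance_issue:
        return path + ["Alert Compliance Team", "Notify Team"]
    return path + ["Audit Log Update"]


print(governance_path(fairness_pass=True, approved=True, compliance_issue=False)[-1])
# Audit Log Update
```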

## 📊 Learning Path Context

**Prerequisites:**
- **Notebook 104: Model Interpretability & Explainability** - SHAP, LIME techniques for model explanation
- **Notebook 125: ML Testing & Validation** - Validation frameworks for model quality gates

**Next Steps:**
- **Notebook 128: Shadow Mode Deployment** - Safe deployment with governance checks
- **Notebook 129: Advanced MLOps - Feature Stores** - Data governance for feature engineering

---

Let's build trustworthy ML systems with governance! 🚀

## 1. Setup & Installation

**Note**: Model governance requires fairness libraries (Fairlearn, AIF360) and explainability tools (SHAP, LIME).

In [None]:
# Install governance and fairness libraries
# !pip install scikit-learn pandas numpy fairlearn shap matplotlib seaborn

import numpy as np
import pandas as pd
from datetime import datetime
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
import json
import warnings
warnings.filterwarnings('ignore')

print("Model governance libraries loaded")
print("Focus: Model cards, bias detection, compliance, explainability")
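The install line above is commented out, and the cells shown here implement the fairness metrics directly in NumPy. If you later switch to Fairlearn, AIF360, SHAP, or LIME, a quick availability check avoids import errors. A small sketch using the standard library's `importlib.util.find_spec`; the helper name is hypothetical:

```python
import importlib.util

# Module names of the optional governance libraries mentioned in the note above
OPTIONAL_LIBS = ["fairlearn", "aif360", "shap", "lime"]


def available_libs(names):
    """Map each module name to True if it can be imported in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for lib, ok in available_libs(OPTIONAL_LIBS).items():
    print(f"{lib:10s} {'installed' if ok else 'missing (optional)'}")
```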

## 2. Model Cards - Documentation Standard

**Purpose:** Implement model cards for comprehensive model documentation (Google's standard for ML transparency).

**Key Points:**
- **Model cards**: Structured documentation describing model purpose, performance, limitations, fairness
- **Sections**: Model details, intended use, metrics, training data, ethical considerations, caveats
- **Audience**: Technical (data scientists), business (executives), compliance (auditors)
- **Living document**: Update with each model version, retraining, or performance change

**Why This Matters:** Model cards provide transparency, enable informed decisions, and support regulatory requirements (for example, the EU AI Act requires technical documentation for high-risk AI systems).
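Because the card is a living document, it helps to check mechanically that every section listed above is present before a version is published. A minimal sketch over a plain dict; the section keys match the ones the `ModelCard` class below writes into `card_data`, and the check itself is an assumption, not part of the model card framework:

```python
# Section keys used by ModelCard.card_data (evaluation_data treated as optional)
REQUIRED_SECTIONS = [
    "model_details", "intended_use", "metrics",
    "training_data", "ethical_considerations", "caveats",
]


def missing_sections(card_data):
    """Return required sections that are absent or empty in a model card dict."""
    return [s for s in REQUIRED_SECTIONS if not card_data.get(s)]


# Hypothetical draft card with only two sections filled in
draft = {
    "model_details": {"name": "Wafer Yield Predictor", "version": "2.1.0"},
    "metrics": {"overall": {"accuracy": 0.923}},
}
print(missing_sections(draft))
```

With the class below, calling `missing_sections(card.card_data)` before export could serve as a publication gate.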

In [None]:
class ModelCard:
    """
    Model card generator following Google's model cards specification.
    
    Provides structured documentation for ML models including:
    - Model details (architecture, version, training)
    - Intended use cases and limitations
    - Performance metrics (overall and per-group)
    - Training data characteristics
    - Ethical considerations and fairness
    """
    
    def __init__(self, model_name, model_version, use_case):
        self.model_name = model_name
        self.model_version = model_version
        self.use_case = use_case
        self.created_date = datetime.now().strftime('%Y-%m-%d')
        self.card_data = {}
        
    def add_model_details(self, model_type, developer, training_date, framework):
        """Add technical model details."""
        self.card_data['model_details'] = {
            'name': self.model_name,
            'version': self.model_version,
            'type': model_type,
            'developer': developer,
            'training_date': training_date,
            'framework': framework,
            'license': 'Proprietary'
        }
        
    def add_intended_use(self, primary_uses, out_of_scope_uses, users):
        """Document intended and prohibited use cases."""
        self.card_data['intended_use'] = {
            'primary_uses': primary_uses,
            'out_of_scope': out_of_scope_uses,
            'primary_users': users
        }
        
    def add_metrics(self, overall_metrics, per_group_metrics=None):
        """
        Add performance metrics.
        
        Args:
            overall_metrics: dict of overall performance (accuracy, F1, etc.)
            per_group_metrics: dict of metrics broken down by sensitive groups
        """
        self.card_data['metrics'] = {
            'overall': overall_metrics,
            'per_group': per_group_metrics or {}
        }
        
    def add_training_data(self, data_source, size, time_period, features):
        """Document training data characteristics."""
        self.card_data['training_data'] = {
            'source': data_source,
            'size': size,
            'time_period': time_period,
            'features': features
        }
        
    def add_evaluation_data(self, data_source, size, time_period):
        """Document evaluation/test data."""
        self.card_data['evaluation_data'] = {
            'source': data_source,
            'size': size,
            'time_period': time_period
        }
        
    def add_ethical_considerations(self, fairness_assessment, bias_mitigation, privacy_measures):
        """Document ethical considerations and mitigation strategies."""
        self.card_data['ethical_considerations'] = {
            'fairness_assessment': fairness_assessment,
            'bias_mitigation': bias_mitigation,
            'privacy_measures': privacy_measures
        }
        
    def add_caveats_and_recommendations(self, limitations, recommendations):
        """Document model limitations and usage recommendations."""
        self.card_data['caveats'] = {
            'limitations': limitations,
            'recommendations': recommendations
        }
        
    def export_json(self, filepath=None):
        """Export model card as JSON."""
        card_json = {
            'model_card': {
                'created': self.created_date,
                'use_case': self.use_case,
                **self.card_data
            }
        }
        
        if filepath:
            with open(filepath, 'w') as f:
                json.dump(card_json, f, indent=2)
            print(f"✅ Model card exported to {filepath}")
        
        return card_json
    
    def generate_markdown(self):
        """Generate human-readable Markdown model card."""
        md = f"# Model Card: {self.model_name} (v{self.model_version})\n\n"
        md += f"**Created**: {self.created_date}\n"
        md += f"**Use Case**: {self.use_case}\n\n"
        
        # Model Details
        if 'model_details' in self.card_data:
            md += "## Model Details\n\n"
            details = self.card_data['model_details']
            for key, value in details.items():
                md += f"- **{key.replace('_', ' ').title()}**: {value}\n"
            md += "\n"
        
        # Intended Use
        if 'intended_use' in self.card_data:
            md += "## Intended Use\n\n"
            intended = self.card_data['intended_use']
            md += "**Primary Uses:**\n"
            for use in intended['primary_uses']:
                md += f"- {use}\n"
            md += "\n**Out of Scope:**\n"
            for use in intended['out_of_scope']:
                md += f"- {use}\n"
            md += "\n"
        
        # Metrics
        if 'metrics' in self.card_data:
            md += "## Performance Metrics\n\n"
            md += "**Overall Performance:**\n"
            for metric, value in self.card_data['metrics']['overall'].items():
                md += f"- {metric}: {value}\n"
            
            if self.card_data['metrics']['per_group']:
                md += "\n**Per-Group Performance:**\n"
                for group, metrics in self.card_data['metrics']['per_group'].items():
                    md += f"\n*{group}*:\n"
                    for metric, value in metrics.items():
                        md += f"- {metric}: {value}\n"
            md += "\n"
        
        # Ethical Considerations
        if 'ethical_considerations' in self.card_data:
            md += "## Ethical Considerations\n\n"
            ethical = self.card_data['ethical_considerations']
            md += f"**Fairness**: {ethical['fairness_assessment']}\n\n"
            md += f"**Bias Mitigation**: {ethical['bias_mitigation']}\n\n"
            md += f"**Privacy**: {ethical['privacy_measures']}\n\n"
        
        # Caveats
        if 'caveats' in self.card_data:
            md += "## Limitations & Recommendations\n\n"
            md += "**Limitations:**\n"
            for limitation in self.card_data['caveats']['limitations']:
                md += f"- {limitation}\n"
            md += "\n**Recommendations:**\n"
            for rec in self.card_data['caveats']['recommendations']:
                md += f"- {rec}\n"
        
        return md

# Example: Create model card for yield prediction model
card = ModelCard(
    model_name="Wafer Yield Predictor",
    model_version="2.1.0",
    use_case="Predict wafer test yield for fab capacity planning"
)

# Add model details
card.add_model_details(
    model_type="Random Forest Classifier (100 trees)",
    developer="Post-Silicon AI Team",
    training_date="2024-12-01",
    framework="scikit-learn 1.3.0"
)

# Add intended use
card.add_intended_use(
    primary_uses=[
        "Predict wafer yield (pass/fail) for production lots",
        "Fab capacity planning and resource allocation",
        "Early detection of yield excursions (>5% drop)"
    ],
    out_of_scope_uses=[
        "Final test yield prediction (different data distribution)",
        "Individual die-level prediction (model trained on wafer-level)",
        "Root cause analysis (model is black-box, use separate tools)"
    ],
    users=["Fab operations", "Capacity planners", "Yield engineers"]
)

# Add metrics
card.add_metrics(
    overall_metrics={
        'accuracy': 0.923,
        'precision': 0.91,
        'recall': 0.89,
        'f1_score': 0.90,
        'auc_roc': 0.95
    },
    per_group_metrics={
        'Fab A': {'accuracy': 0.925, 'f1_score': 0.91},
        'Fab B': {'accuracy': 0.920, 'f1_score': 0.89},
        'Fab C': {'accuracy': 0.924, 'f1_score': 0.90}
    }
)

# Add training data
card.add_training_data(
    data_source="STDF wafer test data (production)",
    size="50,000 wafers (2023-06-01 to 2024-11-30)",
    time_period="18 months",
    features=["Vdd", "Idd", "Frequency", "Temperature", "Power", "Test Coverage"]
)

# Add ethical considerations
card.add_ethical_considerations(
    fairness_assessment="Accuracy spread across 3 fabs is 0.5 percentage points (0.920 to 0.925), no significant bias",
    bias_mitigation="Stratified sampling ensures equal representation of all fabs in training data",
    privacy_measures="Wafer IDs anonymized, no personally identifiable information (PII)"
)

# Add caveats
card.add_caveats_and_recommendations(
    limitations=[
        "Model accuracy degrades when Vdd distribution shifts >50mV (data drift)",
        "Not validated for new product lines (different test coverage)",
        "Performance drops to 85% accuracy for lots with <100 die (small sample size)"
    ],
    recommendations=[
        "Retrain weekly or when data drift detected (KS test p-value < 0.05)",
        "Monitor per-fab accuracy monthly to detect bias",
        "Use ensemble with physics-based model for critical decisions (>$1M impact)",
        "Do not use for lots with <100 die (insufficient data)"
    ]
)

# Generate and display markdown
markdown_card = card.generate_markdown()
print(markdown_card)

print("\n" + "="*80)
print("✅ Model card generated")
print("💾 Can export to: JSON (export_json, programmatic) and Markdown (generate_markdown, human-readable)")
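Since each retraining produces a new card, comparing the `overall` metrics of two versions makes regressions visible at review time. A minimal sketch over the dict layout that `ModelCard.add_metrics` produces; the helper name, the 0.01 tolerance, and the v2.0 numbers are hypothetical:

```python
def diff_overall_metrics(old_card, new_card, tolerance=0.01):
    """Compare 'overall' metrics of two model-card dicts and flag drops > tolerance."""
    old = old_card["metrics"]["overall"]
    new = new_card["metrics"]["overall"]
    report = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before is None or after is None:
            report[name] = "added" if before is None else "removed"
        elif after < before - tolerance:
            report[name] = f"regressed {before:.3f} -> {after:.3f}"
        else:
            report[name] = "ok"
    return report


# Hypothetical previous version vs. a card shaped like v2.1.0 above
v2_0 = {"metrics": {"overall": {"accuracy": 0.915, "f1_score": 0.92}}}
v2_1 = {"metrics": {"overall": {"accuracy": 0.923, "f1_score": 0.90, "auc_roc": 0.95}}}
print(diff_overall_metrics(v2_0, v2_1))
```

In this notebook, `card.card_data` has the same shape and can be passed as `new_card`.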

## 3. Bias Detection & Fairness Evaluation

**Purpose:** Implement fairness metrics to detect and quantify bias in ML models.

**Key Points:**
- **Fairness metrics**: Demographic parity, equalized odds, disparate impact, calibration
- **Sensitive attributes**: Protected groups (fab location, device type, shift) that should not cause bias
- **Fairness constraints**: Ensure similar performance across groups (e.g., accuracy gap <5 percentage points)
- **Mitigation strategies**: Reweighting, resampling, adversarial debiasing, fairness-aware algorithms

**Why This Matters:** Biased models lead to unfair outcomes (revenue loss, customer complaints), regulatory fines, and reputational damage.
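For reference, the metrics implemented by `FairnessEvaluator` below can be written as follows, with $G$ the sensitive attribute, $Y$ the true label, and $\hat{Y}$ the prediction. The thresholds are the ones hard-coded in that class, and disparate impact is taken as min over max across all groups (the class's `reference_group` argument is not used).

$$
\begin{aligned}
\Delta_{\text{DP}} &= \max_{g} P(\hat{Y}=1 \mid G=g) - \min_{g} P(\hat{Y}=1 \mid G=g) && \text{violation if } \Delta_{\text{DP}} > 0.10 \\
\mathrm{TPR}_g &= P(\hat{Y}=1 \mid Y=1, G=g), \qquad \mathrm{FPR}_g = P(\hat{Y}=1 \mid Y=0, G=g) \\
\Delta_{\text{TPR}} &= \max_g \mathrm{TPR}_g - \min_g \mathrm{TPR}_g, \qquad \Delta_{\text{FPR}} = \max_g \mathrm{FPR}_g - \min_g \mathrm{FPR}_g && \text{violation if either} > 0.10 \\
\mathrm{DI} &= \frac{\min_g P(\hat{Y}=1 \mid G=g)}{\max_g P(\hat{Y}=1 \mid G=g)} && \text{violation if } \mathrm{DI} < 0.8 \\
\Delta_{\text{Acc}} &= \max_g \mathrm{Acc}_g - \min_g \mathrm{Acc}_g && \text{violation if } \Delta_{\text{Acc}} > 0.05
\end{aligned}
$$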

In [None]:
class FairnessEvaluator:
    """
    Evaluate model fairness across sensitive groups.
    
    Implements multiple fairness metrics:
    - Demographic parity: P(pred=1 | group=A) ≈ P(pred=1 | group=B)
    - Equalized odds: TPR and FPR equal across groups
    - Disparate impact: Ratio of positive rates (should be >= 0.8)
    - Accuracy parity: Accuracy similar across groups
    """
    
    def __init__(self, sensitive_attribute='group'):
        self.sensitive_attribute = sensitive_attribute
        
    def demographic_parity(self, y_pred, sensitive_features):
        """
        Measure demographic parity: positive rate should be similar across groups.
        
        Returns: dict of positive rates per group and max difference
        """
        groups = np.unique(sensitive_features)
        positive_rates = {}
        
        for group in groups:
            group_mask = sensitive_features == group
            positive_rate = np.mean(y_pred[group_mask])
            positive_rates[group] = positive_rate
        
        # Calculate max difference (fairness violation if >10%)
        max_diff = max(positive_rates.values()) - min(positive_rates.values())
        
        return {
            'positive_rates': positive_rates,
            'max_difference': max_diff,
            'violation': max_diff > 0.10,
            'interpretation': 'Demographic parity satisfied' if max_diff <= 0.10 else 'Demographic parity violated'
        }
    
    def equalized_odds(self, y_true, y_pred, sensitive_features):
        """
        Measure equalized odds: TPR and FPR should be equal across groups.
        
        TPR = True Positive Rate (sensitivity)
        FPR = False Positive Rate
        """
        groups = np.unique(sensitive_features)
        group_metrics = {}
        
        for group in groups:
            group_mask = sensitive_features == group
            y_true_group = y_true[group_mask]
            y_pred_group = y_pred[group_mask]
            
            # Calculate TPR and FPR
            tp = np.sum((y_true_group == 1) & (y_pred_group == 1))
            fn = np.sum((y_true_group == 1) & (y_pred_group == 0))
            fp = np.sum((y_true_group == 0) & (y_pred_group == 1))
            tn = np.sum((y_true_group == 0) & (y_pred_group == 0))
            
            tpr = tp / (tp + fn) if (tp + fn) > 0 else 0
            fpr = fp / (fp + tn) if (fp + tn) > 0 else 0
            
            group_metrics[group] = {'tpr': tpr, 'fpr': fpr}
        
        # Calculate max difference in TPR and FPR
        tpr_values = [m['tpr'] for m in group_metrics.values()]
        fpr_values = [m['fpr'] for m in group_metrics.values()]
        
        tpr_diff = max(tpr_values) - min(tpr_values)
        fpr_diff = max(fpr_values) - min(fpr_values)
        
        return {
            'group_metrics': group_metrics,
            'tpr_difference': tpr_diff,
            'fpr_difference': fpr_diff,
            'violation': tpr_diff > 0.10 or fpr_diff > 0.10,
            'interpretation': 'Equalized odds satisfied' if (tpr_diff <= 0.10 and fpr_diff <= 0.10) else 'Equalized odds violated'
        }
    
    def disparate_impact(self, y_pred, sensitive_features, reference_group=None):
        """
        Measure disparate impact: ratio of positive rates (80% rule).
        
        Disparate impact ratio = min(P(pred=1|group)) / max(P(pred=1|group))
        Should be >= 0.8 (80% rule from US employment law)
        """
        groups = np.unique(sensitive_features)
        positive_rates = {}
        
        for group in groups:
            group_mask = sensitive_features == group
            positive_rate = np.mean(y_pred[group_mask])
            positive_rates[group] = positive_rate
        
        min_rate = min(positive_rates.values())
        max_rate = max(positive_rates.values())
        
        di_ratio = min_rate / max_rate if max_rate > 0 else 0
        
        return {
            'positive_rates': positive_rates,
            'disparate_impact_ratio': di_ratio,
            'violation': di_ratio < 0.8,
            'interpretation': 'Passes 80% rule' if di_ratio >= 0.8 else 'Fails 80% rule (disparate impact detected)'
        }
    
    def accuracy_parity(self, y_true, y_pred, sensitive_features, threshold=0.05):
        """
        Measure accuracy parity: accuracy should be similar across groups.
        
        Max accuracy gap between any two groups should be <= threshold (e.g., 0.05)
        """
        groups = np.unique(sensitive_features)
        accuracies = {}
        
        for group in groups:
            group_mask = sensitive_features == group
            accuracy = np.mean(y_true[group_mask] == y_pred[group_mask])
            accuracies[group] = accuracy
        
        max_diff = max(accuracies.values()) - min(accuracies.values())
        
        return {
            'accuracies': accuracies,
            'max_difference': max_diff,
            'violation': max_diff > threshold,
            'interpretation': f'Accuracy parity satisfied (max gap <= {threshold})' if max_diff <= threshold else f'Accuracy parity violated (max gap > {threshold})'
        }
    
    def comprehensive_report(self, y_true, y_pred, sensitive_features):
        """Generate comprehensive fairness report with all metrics."""
        report = {
            'demographic_parity': self.demographic_parity(y_pred, sensitive_features),
            'equalized_odds': self.equalized_odds(y_true, y_pred, sensitive_features),
            'disparate_impact': self.disparate_impact(y_pred, sensitive_features),
            'accuracy_parity': self.accuracy_parity(y_true, y_pred, sensitive_features)
        }
        
        # Overall fairness assessment
        violations = sum([metric['violation'] for metric in report.values()])
        report['overall_assessment'] = {
            'total_violations': violations,
            'status': 'FAIR' if violations == 0 else 'BIASED',
            'recommendation': 'Model passes all fairness checks' if violations == 0 else f'{violations} fairness violations detected - investigate and mitigate'
        }
        
        return report

# Example: Evaluate fairness for binning model across device types
print(\"üîç Fairness Evaluation: Binning Model Across Device Types\\n\")\nprint(\"=\"*80)\n\n# Simulate binning predictions (3 device types)\nnp.random.seed(42)\nn_samples = 1000\n\n# Device types (sensitive attribute)\ndevice_types = np.random.choice(['Type_A', 'Type_B', 'Type_C'], n_samples)\n\n# True bins (0=fail, 1=pass)\ny_true = np.random.choice([0, 1], n_samples, p=[0.15, 0.85])\n\n# Predictions with intentional bias (Type_B has lower positive rate)\ny_pred = y_true.copy()\n\n# Introduce bias: Type_B has 10% lower pass rate\ntype_b_mask = device_types == 'Type_B'\ntype_b_indices = np.where(type_b_mask & (y_pred == 1))[0]\nflip_indices = np.random.choice(type_b_indices, size=int(0.10 * len(type_b_indices)), replace=False)\ny_pred[flip_indices] = 0\n\n# Evaluate fairness\nevaluator = FairnessEvaluator()\nfairness_report = evaluator.comprehensive_report(y_true, y_pred, device_types)\n\n# Display results\nprint(\"\\n1Ô∏è‚É£ DEMOGRAPHIC PARITY\")\ndp = fairness_report['demographic_parity']\nfor device, rate in dp['positive_rates'].items():\n    print(f\"   {device}: {rate:.3f} pass rate\")\nprint(f\"   Max difference: {dp['max_difference']:.3f}\")\nprint(f\"   Status: {dp['interpretation']}\")\n\nprint(\"\\n2Ô∏è‚É£ EQUALIZED ODDS\")\neo = fairness_report['equalized_odds']\nfor device, metrics in eo['group_metrics'].items():\n    print(f\"   {device}: TPR={metrics['tpr']:.3f}, FPR={metrics['fpr']:.3f}\")\nprint(f\"   TPR difference: {eo['tpr_difference']:.3f}\")\nprint(f\"   FPR difference: {eo['fpr_difference']:.3f}\")\nprint(f\"   Status: {eo['interpretation']}\")\n\nprint(\"\\n3Ô∏è‚É£ DISPARATE IMPACT (80% Rule)\")\ndi = fairness_report['disparate_impact']\nprint(f\"   Disparate impact ratio: {di['disparate_impact_ratio']:.3f}\")\nprint(f\"   Status: {di['interpretation']}\")\n\nprint(\"\\n4Ô∏è‚É£ ACCURACY PARITY\")\nap = fairness_report['accuracy_parity']\nfor device, acc in ap['accuracies'].items():\n    print(f\"   {device}: {acc:.3f} accuracy\")\nprint(f\" 
  Max difference: {ap['max_difference']:.3f}\")\nprint(f\"   Status: {ap['interpretation']}\")\n\nprint(\"\\n\" + \"=\"*80)\noverall = fairness_report['overall_assessment']\nprint(f\"\\nüìä OVERALL FAIRNESS ASSESSMENT\")\nprint(f\"   Status: {overall['status']}\")\nprint(f\"   Violations: {overall['total_violations']}/4 metrics\")\nprint(f\"   Recommendation: {overall['recommendation']}\")\n\nif overall['status'] == 'BIASED':\n    print(\"\\n‚ö†Ô∏è  ACTION REQUIRED:\")\n    print(\"   1. Investigate root cause (data imbalance, feature bias)\")\n    print(\"   2. Apply mitigation (reweighting, resampling, fairness constraints)\")\n    print(\"   3. Re-evaluate fairness after mitigation\")\n    print(\"   4. Document in model card\")
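Reweighting, the first mitigation in the action items, can be sketched with the reweighing scheme described by Kamiran and Calders (also available in AIF360 as `Reweighing`): each (group, label) cell gets weight P(group)·P(label) / P(group, label), which makes group and label statistically independent in the weighted training set. A minimal NumPy sketch; the function name and toy data are hypothetical, and the resulting weights would be passed to an estimator's `sample_weight` argument (for example `RandomForestClassifier.fit`).

```python
import numpy as np


def reweighing_weights(y, groups):
    """Weight each sample by P(group) * P(label) / P(group, label)."""
    y = np.asarray(y)
    groups = np.asarray(groups)
    weights = np.zeros(len(y), dtype=float)
    for g in np.unique(groups):
        for label in np.unique(y):
            cell = (groups == g) & (y == label)
            if cell.any():
                # Expected joint frequency if group and label were independent
                expected = np.mean(groups == g) * np.mean(y == label)
                weights[cell] = expected / np.mean(cell)
    return weights


# Hypothetical toy data: group "B" has far fewer positive labels than group "A"
groups = np.array(["A"] * 6 + ["B"] * 6)
y = np.array([1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0])
w = reweighing_weights(y, groups)
for g in ["A", "B"]:
    m = groups == g
    print(g, "weighted positive rate:", round(np.sum(w[m] * y[m]) / np.sum(w[m]), 3))
```

Both groups end up with a weighted positive rate of 0.5, and the weights sum to the number of samples, so the overall scale of the training loss is unchanged. In this notebook the call would be `reweighing_weights(y_true, device_types)`.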

## 4. Audit Trails & Model Lineage

**Purpose:** Implement comprehensive audit trails for model governance and compliance.

**Key Points:**
- **Model lineage**: Track data → features → training → deployment pipeline
- **Audit log**: Record all model decisions (who, what, when, why)
- **Versioning**: Track model versions, training data versions, code versions (Git commit)
- **Approval workflow**: Document who approved model for production, when, based on what criteria

**Why This Matters:** Audit trails enable accountability, regulatory compliance (e.g., GDPR Article 22 safeguards for automated decision-making, often cited as a "right to explanation"), and incident investigation.
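An in-memory list like the one in `ModelAuditTrail` below can be edited after the fact. One common way to make an audit log tamper-evident (a technique added here, not something the class does) is hash chaining: each record stores the SHA-256 of the previous record, so altering any earlier entry breaks verification. A minimal standard-library sketch; the class name and fields are hypothetical:

```python
import hashlib
import json
from datetime import datetime, timezone


class HashChainedLog:
    """Hypothetical append-only log where each entry commits to the previous hash."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(body):
        # Canonical JSON (sorted keys) so identical content always hashes identically
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, event_type, **details):
        body = {
            "event_type": event_type,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "details": details,
            "prev_hash": self.entries[-1]["hash"] if self.entries else self.GENESIS,
        }
        entry = dict(body, hash=self._digest(body))
        self.entries.append(entry)
        return entry

    def verify(self):
        """Recompute every hash; return False if any entry or link was altered."""
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev or self._digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


log = HashChainedLog()
log.append("TRAINING", model_version="2.1.0", trained_by="alice")
log.append("APPROVAL", model_version="2.1.0", approved_by="david")
print(log.verify())                                   # True
log.entries[0]["details"]["trained_by"] = "mallory"   # tamper with history
print(log.verify())                                   # False
```

Hash chaining only detects tampering; for protection against deleting the whole log, the latest hash would also need to be stored somewhere the log writer cannot modify.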

In [None]:
class ModelAuditTrail:
    """
    Comprehensive audit trail system for model governance.
    
    Tracks:
    - Training events (data, hyperparameters, metrics)
    - Deployment events (who, when, approval)
    - Inference events (predictions, inputs, outputs)
    - Governance events (fairness checks, compliance reviews)
    """
    
    def __init__(self, model_id):
        self.model_id = model_id
        self.audit_log = []
        
    def log_training(self, version, data_source, features, metrics, trained_by):
        """Log model training event."""
        event = {
            'event_type': 'TRAINING',
            'timestamp': datetime.now().isoformat(),
            'model_version': version,
            'data_source': data_source,
            'features': features,
            'metrics': metrics,
            'trained_by': trained_by,
            'git_commit': self._get_git_commit()  # Placeholder
        }
        self.audit_log.append(event)
        return event
        
    def log_validation(self, version, validation_results, validated_by):
        """Log model validation event."""
        event = {
            'event_type': 'VALIDATION',
            'timestamp': datetime.now().isoformat(),
            'model_version': version,
            'validation_results': validation_results,
            'validated_by': validated_by
        }
        self.audit_log.append(event)
        return event
        
    def log_fairness_check(self, version, fairness_metrics, passed, checked_by):
        """Log fairness evaluation event."""
        event = {
            'event_type': 'FAIRNESS_CHECK',
            'timestamp': datetime.now().isoformat(),
            'model_version': version,
            'fairness_metrics': fairness_metrics,
            'passed': passed,
            'checked_by': checked_by
        }
        self.audit_log.append(event)
        return event
        
    def log_approval(self, version, approved_by, approval_criteria, comments):
        """Log model approval for production deployment."""
        event = {
            'event_type': 'APPROVAL',
            'timestamp': datetime.now().isoformat(),
            'model_version': version,
            'approved_by': approved_by,
            'approval_criteria': approval_criteria,
            'comments': comments
        }
        self.audit_log.append(event)
        return event
        
    def log_deployment(self, version, environment, deployed_by):
        """Log model deployment event."""
        event = {
            'event_type': 'DEPLOYMENT',
            'timestamp': datetime.now().isoformat(),
            'model_version': version,
            'environment': environment,
            'deployed_by': deployed_by
        }
        self.audit_log.append(event)
        return event
        
    def log_inference(self, version, input_sample, prediction, confidence=None):
        """Log individual inference (for high-stakes predictions)."""
        event = {
            'event_type': 'INFERENCE',
            'timestamp': datetime.now().isoformat(),
            'model_version': version,
            'input_sample': input_sample,
            'prediction': prediction,
            'confidence': confidence
        }
        self.audit_log.append(event)
        return event
        
    def log_incident(self, version, incident_type, description, reported_by):
        """Log model-related incident."""
        event = {
            'event_type': 'INCIDENT',
            'timestamp': datetime.now().isoformat(),
            'model_version': version,
            'incident_type': incident_type,
            'description': description,
            'reported_by': reported_by,
            'severity': self._determine_severity(incident_type)
        }
        self.audit_log.append(event)
        return event
        
    def get_lineage(self, version):
        """Get complete lineage for a specific model version."""
        lineage = [event for event in self.audit_log if event.get('model_version') == version]
        return sorted(lineage, key=lambda x: x['timestamp'])
        
    def get_timeline(self):
        """Get chronological timeline of all events."""
        return sorted(self.audit_log, key=lambda x: x['timestamp'])
        
    def export_compliance_report(self, version):
        """Generate compliance report for auditors/regulators."""
        lineage = self.get_lineage(version)
        
        report = {
            'model_id': self.model_id,
            'model_version': version,
            'report_generated': datetime.now().isoformat(),
            'events': lineage,
            'summary': {
                'total_events': len(lineage),
                'training_events': sum(1 for e in lineage if e['event_type'] == 'TRAINING'),
                'validation_events': sum(1 for e in lineage if e['event_type'] == 'VALIDATION'),
                'fairness_checks': sum(1 for e in lineage if e['event_type'] == 'FAIRNESS_CHECK'),
                'approvals': sum(1 for e in lineage if e['event_type'] == 'APPROVAL'),
                'deployments': sum(1 for e in lineage if e['event_type'] == 'DEPLOYMENT'),
                'incidents': sum(1 for e in lineage if e['event_type'] == 'INCIDENT')
            }
        }
        
        return report
        
    def _get_git_commit(self):
        """Get current Git commit hash (placeholder)."""
        return "a3f7b2c"  # In production: subprocess.check_output(['git', 'rev-parse', 'HEAD'])
        
    def _determine_severity(self, incident_type):
        \"\"\"Determine incident severity.\"\"\"
        severity_map = {
            'performance_degradation': 'MEDIUM',
            'fairness_violation': 'HIGH',
            'prediction_error': 'MEDIUM',
            'security_breach': 'CRITICAL',
            'compliance_violation': 'CRITICAL'
        }
        return severity_map.get(incident_type, 'LOW')

# Example: Model lifecycle with complete audit trail
print("📋 Model Audit Trail: Yield Prediction Model v2.1.0\n")
print("="*80)

audit = ModelAuditTrail(model_id="yield_predictor_001")

# Step 1: Training
training_event = audit.log_training(
    version="2.1.0",
    data_source="STDF wafer test (2023-06 to 2024-11)",
    features=["Vdd", "Idd", "Frequency", "Temperature", "Power"],
    metrics={'accuracy': 0.923, 'f1_score': 0.90},
    trained_by="Alice Chen (ML Engineer)"
)
print(f"✅ Training logged: {training_event['timestamp']}")
print(f"   Version: {training_event['model_version']}")
print(f"   Accuracy: {training_event['metrics']['accuracy']}")

# Step 2: Validation
validation_event = audit.log_validation(
    version="2.1.0",
    validation_results={
        'passed_gates': ['accuracy >= 0.90', 'better_than_production'],
        'failed_gates': []
    },
    validated_by="Bob Martinez (ML Engineer)"
)
print(f"\n✅ Validation logged: {validation_event['timestamp']}")
print(f"   All gates passed: {len(validation_event['validation_results']['failed_gates']) == 0}")

# Step 3: Fairness check
fairness_event = audit.log_fairness_check(
    version="2.1.0",
    fairness_metrics={
        'demographic_parity': 'PASS',
        'accuracy_variance': 0.005,  # 0.5% across fabs
        'disparate_impact_ratio': 0.95
    },
    passed=True,
    checked_by="Carol Thompson (AI Ethics Lead)"
)
print(f"\n✅ Fairness check logged: {fairness_event['timestamp']}")
print(f"   Status: {'PASSED' if fairness_event['passed'] else 'FAILED'}")
print(f"   Accuracy variance: {fairness_event['fairness_metrics']['accuracy_variance']:.3f}")

# Step 4: Approval
approval_event = audit.log_approval(
    version="2.1.0",
    approved_by="David Lee (Director, Post-Silicon AI)",
    approval_criteria=[
        "Accuracy >= 0.92 (target: 0.90)",
        "Fairness checks passed",
        "Better than production model by 2%",
        "Model card reviewed and approved"
    ],
    comments="Approved for production deployment. Monitor per-fab accuracy weekly."
)
print(f"\n✅ Approval logged: {approval_event['timestamp']}")
print(f"   Approved by: {approval_event['approved_by']}")
print(f"   Comments: {approval_event['comments']}")

# Step 5: Deployment
deployment_event = audit.log_deployment(
    version="2.1.0",
    environment="Production (Fabs A, B, C)",
    deployed_by="Eve Johnson (DevOps Engineer)"
)
print(f"\n✅ Deployment logged: {deployment_event['timestamp']}")
print(f"   Environment: {deployment_event['environment']}")

# Step 6: Log some inferences (high-stakes predictions)
inference1 = audit.log_inference(
    version="2.1.0",
    input_sample={'wafer_id': 'W12345', 'vdd': 1.25, 'idd': 52, 'freq': 2450},
    prediction='PASS',
    confidence=0.94
)
print(f"\n📊 Inference logged: Wafer W12345 → {inference1['prediction']} (conf: {inference1['confidence']})")

# Step 7: Incident (example: performance degradation detected)
incident_event = audit.log_incident(
    version="2.1.0",
    incident_type="performance_degradation",
    description="Accuracy dropped to 0.88 on Fab C (data drift detected in Vdd distribution)",
    reported_by="Automated Monitoring System"
)
print(f"\n⚠️  Incident logged: {incident_event['timestamp']}")
print(f"   Type: {incident_event['incident_type']}")
print(f"   Severity: {incident_event['severity']}")
print(f"   Description: {incident_event['description']}")

# Generate compliance report
print(f"\n{'-'*80}")
print("📄 COMPLIANCE REPORT")
print(f"{'-'*80}\n")

report = audit.export_compliance_report(version="2.1.0")
print(f"Model: {report['model_id']} v{report['model_version']}")
print(f"Report generated: {report['report_generated']}")
print("\nEvent Summary:")
for event_type, count in report['summary'].items():
    if event_type != 'total_events':
        print(f"  {event_type.replace('_', ' ').title()}: {count}")

print(f"\nComplete audit trail with {report['summary']['total_events']} events available for regulatory review")
print("✅ Full lineage from training → validation → fairness → approval → deployment → production")

## 5. Regulatory Compliance Framework

**Purpose:** Implement automated compliance checks for three regulatory frameworks that apply to AI/ML systems (GDPR, EU AI Act, SR 11-7).

**Key Points:**
- **GDPR (EU)**: Article 22 (safeguards for solely automated decisions, often summarized as a "right to explanation"), data protection, consent, privacy
- **AI Act (EU)**: Risk-based classification (unacceptable, high, limited, minimal risk); high-risk systems require documentation, testing, and human oversight
- **SR 11-7 (US Banking)**: Federal Reserve supervisory guidance on model risk management for banking organizations
- **Compliance automation**: Automated checks for regulation-specific requirements

**Why This Matters:** Non-compliance can bring substantial fines (GDPR's upper tier allows up to €20M or 4% of worldwide annual turnover, whichever is higher), legal liability, and market access restrictions.
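GDPR's upper fine tier (Article 83(5)) is the higher of a fixed €20M and 4% of total worldwide annual turnover for the preceding financial year, so the percentage only dominates once turnover exceeds €500M. A small sketch of that arithmetic; the function name and turnover values are illustrative, and the result is the statutory ceiling, not an estimate of any actual fine.

```python
def gdpr_max_fine_eur(annual_turnover_eur: float) -> float:
    """Upper-tier GDPR fine ceiling (Art. 83(5)): the higher of EUR 20M and 4% of turnover."""
    FIXED_CAP_EUR = 20_000_000
    TURNOVER_RATE = 0.04
    return max(FIXED_CAP_EUR, TURNOVER_RATE * annual_turnover_eur)


# Illustrative turnovers: the fixed cap binds at or below EUR 500M turnover
for turnover in (100e6, 500e6, 10e9):
    print(f"Turnover EUR {turnover:,.0f} -> max fine EUR {gdpr_max_fine_eur(turnover):,.0f}")
```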

In [None]:
class ComplianceChecker:
    """
    Automated compliance checker for AI/ML regulations.
    
    Supports:
    - GDPR (General Data Protection Regulation) - EU
    - AI Act - EU (risk classification, documentation)
    - SR 11-7 - US Federal Reserve (banking model risk management)
    """
    
    def __init__(self, regulation='GDPR'):
        self.regulation = regulation
        self.compliance_checks = []
        
    def check_gdpr_compliance(self, model_metadata):
        """
        Check GDPR compliance requirements.

        Key requirements:
        - Article 22: Right to explanation for automated decisions
        - Articles 13-14: Transparency (purpose, data used, retention)
        - Article 25: Data protection by design and default
        """
        checks = []
        
        # Check 1: Explainability (Article 22)
        has_explainability = model_metadata.get('explainability_method') is not None
        checks.append({
            'requirement': 'GDPR Article 22: Right to explanation',
            'passed': has_explainability,
            'details': f"Explainability method: {model_metadata.get('explainability_method', 'NOT PROVIDED')}"
        })
        
        # Check 2: Data documentation (Articles 13-14)
        has_data_docs = all([
            model_metadata.get('data_source'),
            model_metadata.get('data_retention_policy'),
            model_metadata.get('purpose')
        ])
        checks.append({
            'requirement': 'GDPR Articles 13-14: Data transparency',
            'passed': has_data_docs,
            'details': 'Data source, retention policy, and purpose documented' if has_data_docs else 'Missing data documentation'
        })
        
        # Check 3: Privacy measures (Article 25)
        has_privacy = model_metadata.get('privacy_measures') is not None
        checks.append({
            'requirement': 'GDPR Article 25: Data protection by design',
            'passed': has_privacy,
            'details': f"Privacy measures: {model_metadata.get('privacy_measures', 'NOT PROVIDED')}"
        })
        
        # Check 4: Consent tracking (if personal data used)
        uses_personal_data = model_metadata.get('uses_personal_data', False)
        has_consent = model_metadata.get('consent_mechanism') is not None
        # Passes when no personal data is used, or when a consent mechanism is documented
        consent_check_passed = (not uses_personal_data) or has_consent

        checks.append({
            'requirement': 'GDPR: Consent for personal data',
            'passed': consent_check_passed,
            'details': f"Personal data: {uses_personal_data}, Consent mechanism: {model_metadata.get('consent_mechanism', 'N/A')}"
        })
        
        return checks
    
    def check_ai_act_compliance(self, model_metadata):
        """
        Check EU AI Act compliance (high-risk AI systems).

        Requirements for high-risk systems:
        - Risk management system
        - Data governance and quality
        - Technical documentation (model card)
        - Record-keeping (audit logs)
        - Transparency and information to users
        - Human oversight
        - Accuracy, robustness, cybersecurity
        """
        checks = []

        # Check 1: Risk classification
        # Simplified LOW/MEDIUM/HIGH scale for this notebook; the AI Act itself
        # distinguishes unacceptable, high, limited, and minimal risk
        risk_level = model_metadata.get('risk_level', 'UNASSESSED')
        checks.append({
            'requirement': 'AI Act: Risk classification',
            'passed': risk_level in ['LOW', 'MEDIUM', 'HIGH'],
            'details': f"Risk level: {risk_level} (LOW/MEDIUM/HIGH required)"
        })
        
        # Check 2: Technical documentation (for high-risk systems)
        is_high_risk = risk_level == 'HIGH'
        has_tech_docs = model_metadata.get('model_card') is not None
        
        if is_high_risk:
            checks.append({
                'requirement': 'AI Act (High-Risk): Technical documentation',
                'passed': has_tech_docs,
                'details': 'Model card required for high-risk systems' + (' - PROVIDED' if has_tech_docs else ' - MISSING')
            })
        
        # Check 3: Audit trail (record-keeping)
        has_audit_trail = model_metadata.get('audit_trail_enabled', False)
        checks.append({
            'requirement': 'AI Act: Record-keeping',
            'passed': has_audit_trail,
            'details': f"Audit trail: {'ENABLED' if has_audit_trail else 'DISABLED'}"
        })
        
        # Check 4: Human oversight (for high-risk systems)
        has_human_oversight = model_metadata.get('human_oversight', False)
        
        if is_high_risk:
            checks.append({
                'requirement': 'AI Act (High-Risk): Human oversight',
                'passed': has_human_oversight,
                'details': 'Human-in-the-loop required for high-risk systems' + (' - ENABLED' if has_human_oversight else ' - MISSING')
            })
        
        # Check 5: Accuracy and robustness testing
        has_testing = model_metadata.get('testing_framework') is not None
        checks.append({
            'requirement': 'AI Act: Accuracy and robustness',
            'passed': has_testing,
            'details': f"Testing framework: {model_metadata.get('testing_framework', 'NOT PROVIDED')}"
        })
        
        return checks
    
    def check_sr11_7_compliance(self, model_metadata):
        """
        Check US Federal Reserve SR 11-7 compliance (banking model risk management).

        Requirements:
        - Model validation (independent review)
        - Model documentation
        - Ongoing monitoring
        - Policies and controls
        - Model inventory
        """
        checks = []
        
        # Check 1: Independent validation
        has_validation = model_metadata.get('independent_validation', False)
        checks.append({
            'requirement': 'SR 11-7: Independent model validation',
            'passed': has_validation,
            'details': f"Independent validation: {'COMPLETED' if has_validation else 'NOT COMPLETED'}"
        })
        
        # Check 2: Model documentation
        has_docs = all([
            model_metadata.get('model_card'),
            model_metadata.get('assumptions_documented'),
            model_metadata.get('limitations_documented')
        ])
        checks.append({
            'requirement': 'SR 11-7: Comprehensive documentation',
            'passed': has_docs,
            'details': 'Model card, assumptions, and limitations required' + (' - ALL PROVIDED' if has_docs else ' - INCOMPLETE')
        })
        
        # Check 3: Ongoing monitoring
        has_monitoring = model_metadata.get('monitoring_enabled', False)
        checks.append({
            'requirement': 'SR 11-7: Ongoing monitoring',
            'passed': has_monitoring,
            'details': f"Monitoring: {'ENABLED' if has_monitoring else 'DISABLED'}"
        })
        
        # Check 4: Model inventory
        in_inventory = model_metadata.get('registered_in_inventory', False)
        checks.append({
            'requirement': 'SR 11-7: Model inventory',
            'passed': in_inventory,
            'details': f"Model inventory: {'REGISTERED' if in_inventory else 'NOT REGISTERED'}"
        })
        
        # Check 5: Governance and controls
        has_governance = all([
            model_metadata.get('approval_workflow'),
            model_metadata.get('change_management'),
            model_metadata.get('incident_response_plan')
        ])
        checks.append({
            'requirement': 'SR 11-7: Governance and controls',
            'passed': has_governance,
            'details': 'Approval, change management, incident response required' + (' - ALL PRESENT' if has_governance else ' - INCOMPLETE')
        })
        
        return checks
    
    def run_compliance_check(self, model_metadata):
        """Run compliance check for specified regulation."""
        if self.regulation == 'GDPR':
            checks = self.check_gdpr_compliance(model_metadata)
        elif self.regulation == 'AI_ACT':
            checks = self.check_ai_act_compliance(model_metadata)
        elif self.regulation == 'SR_11_7':
            checks = self.check_sr11_7_compliance(model_metadata)
        else:
            raise ValueError(f"Unknown regulation: {self.regulation}")

        passed_count = sum(1 for check in checks if check['passed'])
        total_count = len(checks)

        result = {
            'regulation': self.regulation,
            'checks': checks,
            'summary': {
                'passed': passed_count,
                'total': total_count,
                'compliance_rate': passed_count / total_count if total_count > 0 else 0,
                'status': 'COMPLIANT' if passed_count == total_count else 'NON-COMPLIANT'
            }
        }

        return result

# Example: Compliance check for yield prediction model
print("🔒 Regulatory Compliance Check: Yield Prediction Model\n")
print("="*80)

# Model metadata
model_metadata = {
    'model_name': 'Wafer Yield Predictor',
    'model_version': '2.1.0',
    'purpose': 'Predict wafer test yield for fab capacity planning',
    'data_source': 'STDF wafer test data (production)',
    'data_retention_policy': '2 years (per company policy)',
    'uses_personal_data': False,  # No PII in wafer test data
    'privacy_measures': 'Wafer IDs anonymized, no employee data',
    'explainability_method': 'SHAP values for feature importance',
    'model_card': True,
    'audit_trail_enabled': True,
    'risk_level': 'MEDIUM',  # Not safety-critical, but business-critical
    'human_oversight': True,  # Engineers review predictions before capacity decisions
    'testing_framework': 'pytest with property-based testing (Hypothesis)',
    'independent_validation': True,
    'assumptions_documented': True,
    'limitations_documented': True,
    'monitoring_enabled': True,
    'registered_in_inventory': True,
    'approval_workflow': True,
    'change_management': True,
    'incident_response_plan': True
}

# Check GDPR compliance
print("\n1️⃣ GDPR COMPLIANCE (EU Data Protection)\n")
gdpr_checker = ComplianceChecker(regulation='GDPR')
gdpr_result = gdpr_checker.run_compliance_check(model_metadata)

for check in gdpr_result['checks']:
    status = '✅' if check['passed'] else '❌'
    print(f"   {status} {check['requirement']}")
    print(f"      {check['details']}")

print(f"\n   Summary: {gdpr_result['summary']['passed']}/{gdpr_result['summary']['total']} checks passed")
print(f"   Status: {gdpr_result['summary']['status']}\n")

# Check AI Act compliance
print("\n2️⃣ EU AI ACT COMPLIANCE (High-Risk AI Systems)\n")
ai_act_checker = ComplianceChecker(regulation='AI_ACT')
ai_act_result = ai_act_checker.run_compliance_check(model_metadata)

for check in ai_act_result['checks']:
    status = '✅' if check['passed'] else '❌'
    print(f"   {status} {check['requirement']}")
    print(f"      {check['details']}")

print(f"\n   Summary: {ai_act_result['summary']['passed']}/{ai_act_result['summary']['total']} checks passed")
print(f"   Status: {ai_act_result['summary']['status']}\n")

# Check SR 11-7 compliance (banking)
print("\n3️⃣ SR 11-7 COMPLIANCE (US Banking Model Risk Management)\n")
sr11_7_checker = ComplianceChecker(regulation='SR_11_7')
sr11_7_result = sr11_7_checker.run_compliance_check(model_metadata)

for check in sr11_7_result['checks']:
    status = '✅' if check['passed'] else '❌'
    print(f"   {status} {check['requirement']}")
    print(f"      {check['details']}")

print(f"\n   Summary: {sr11_7_result['summary']['passed']}/{sr11_7_result['summary']['total']} checks passed")
print(f"   Status: {sr11_7_result['summary']['status']}\n")

# Overall compliance summary
print("="*80)
print("\n📊 OVERALL COMPLIANCE STATUS\n")
print(f"   GDPR: {gdpr_result['summary']['status']} ({gdpr_result['summary']['compliance_rate']:.0%})")
print(f"   AI Act: {ai_act_result['summary']['status']} ({ai_act_result['summary']['compliance_rate']:.0%})")
print(f"   SR 11-7: {sr11_7_result['summary']['status']} ({sr11_7_result['summary']['compliance_rate']:.0%})")

all_compliant = all([
    gdpr_result['summary']['status'] == 'COMPLIANT',
    ai_act_result['summary']['status'] == 'COMPLIANT',
    sr11_7_result['summary']['status'] == 'COMPLIANT'
])

if all_compliant:
    print("\n✅ Model is COMPLIANT with all three checked regulations")
    print("   Ready for deployment in regulated environments")
else:
    print("\n⚠️  COMPLIANCE ISSUES DETECTED")
    print("   Address non-compliant items before production deployment")

## 6. Real-World Project Templates

**Purpose:** Eight governance project templates (4 post-silicon validation + 4 general AI/ML).

**Pattern:** Each project includes governance requirements, compliance obligations, and success criteria.
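Several templates below gate on the 80% (four-fifths) rule. The disparate impact ratio divides the favorable-outcome rate of the unprivileged group by that of the privileged group, and a ratio below 0.8 is the conventional flag for adverse impact. A minimal sketch, assuming binary predictions (1 = favorable outcome) and exactly two groups; the function names and data are illustrative:

```python
def selection_rate(y_pred, group, label):
    """Fraction of favorable (1) predictions among members of group `label`."""
    outcomes = [p for p, g in zip(y_pred, group) if g == label]
    if not outcomes:
        raise ValueError(f"No samples for group {label!r}")
    return sum(outcomes) / len(outcomes)


def disparate_impact_ratio(y_pred, group, privileged, unprivileged):
    """Selection rate of the unprivileged group divided by that of the privileged group."""
    rate_priv = selection_rate(y_pred, group, privileged)
    if rate_priv == 0:
        raise ValueError("Privileged group has no favorable outcomes; ratio is undefined")
    return selection_rate(y_pred, group, unprivileged) / rate_priv


# Illustrative data: group A selected 6/10 times, group B selected 4/10 times
y_pred = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0,
          1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
group = ["A"] * 10 + ["B"] * 10

ratio = disparate_impact_ratio(y_pred, group, privileged="A", unprivileged="B")
verdict = "PASS" if ratio >= 0.8 else "FAIL"
print(f"Disparate impact ratio: {ratio:.3f} -> {verdict} (80% rule)")
```

Here 0.4 / 0.6 ≈ 0.667 falls below 0.8, so the sketch reports a failure.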

In [None]:
projects = {
    "post_silicon": [
        {
            "name": "Yield Prediction Model Governance System",
            "objective": "Complete governance framework for wafer yield prediction (used for $500M+ capacity decisions)",
            "governance_requirements": [
                "Model card with intended use, limitations, performance metrics (overall + per-fab)",
                "Fairness evaluation: Accuracy variance across 3 fabs must be <3%",
                "Audit trail: Track all training, validation, approval, deployment events",
                "Approval workflow: Director sign-off required before production deployment",
                "Quarterly governance review: Fairness re-evaluation, model card updates"
            ],
            "compliance_obligations": [
                "GDPR: Right to explanation (SHAP values for predictions impacting $1M+ decisions)",
                "Internal policy: Model versioning, change management, incident response",
                "ISO 9001: Quality management documentation for semiconductor manufacturing"
            ],
            "documentation": [
                "Model card (JSON + Markdown + HTML)",
                "Fairness report (quarterly)",
                "Compliance checklist (GDPR, internal policies)",
                "Audit trail export (CSV for annual audit)",
                "Model lineage diagram (training data → features → model → deployment)"
            ],
            "success_criteria": "100% compliance with governance requirements, zero audit findings, <24 hour response to governance queries",
            "value": "Enable responsible AI adoption, pass audits, maintain stakeholder trust ($500M decisions)"
        },
        {
            "name": "Binning Model Fairness & Explainability System",
            "objective": "Ensure fair and explainable binning across device types (prevent $50M revenue loss from biased binning)",
            "governance_requirements": [
                "Fairness metrics: Demographic parity, equalized odds, accuracy parity across 5 device types",
                "Explainability: SHAP values showing why device binned (Vdd margin, Fmax, leakage)",
                "Validation gates: Reject model if accuracy variance >5% across device types",
                "Bias mitigation: Stratified sampling, fairness-aware reweighting if bias detected",
                "Monthly fairness monitoring: Detect fairness degradation in production"
            ],
            "compliance_obligations": [
                "Customer contracts: Binning accuracy guarantees (97%+ for premium bins)",
                "Internal policy: No systematic bias against specific device families",
                "Transparency: Explain binning decisions to product engineers"
            ],
            "explainability_methods": [
                "SHAP values: Feature importance for each binning decision",
                "Decision tree surrogate: Approximate RF with interpretable tree",
                "Counterfactual explanations: 'Device would bin higher if Vdd_margin increased by 20mV'",
                "Feature contribution plots: Visual breakdown of binning factors"
            ],
            "success_criteria": "Zero fairness violations for 6 months, 100% binning decisions explainable, 97%+ accuracy across all device types",
            "value": "Prevent revenue loss from unfair binning ($50M), enable engineer trust, satisfy customer requirements"
        },
        {
            "name": "Test Time Prediction Governance for SLA Compliance",
            "objective": "Document and monitor test time prediction model used for customer SLA commitments (prevent $1M+ penalties)",
            "governance_requirements": [
                "Model card: Performance metrics (MAPE, prediction intervals), limitations (accuracy degrades for new test programs)",
                "Confidence intervals: Provide 95% prediction intervals for all test time estimates",
                "Uncertainty quantification: Flag predictions with high uncertainty (>20% interval width)",
                "Monitoring: Alert if MAPE exceeds 5% (SLA threshold)",
                "Incident response: Escalate to operations team if SLA risk detected"
            ],
            "compliance_obligations": [
                "Customer SLAs: Deliver devices within predicted timeline ±10%",
                "Internal policy: Document assumptions (test parallelization, tester availability)",
                "Change management: Retrain model when test programs change (track trigger events)"
            ],
            "documentation": [
                "Model card with SLA implications clearly stated",
                "Prediction uncertainty report (daily)",
                "SLA compliance dashboard (real-time)",
                "Model assumptions document (updated when test programs change)"
            ],
            "success_criteria": "Zero SLA violations attributable to model errors, 100% predictions include confidence intervals, <5% MAPE maintained",
            "value": "Avoid SLA penalties ($1M/year), maintain customer relationships, enable confident delivery commitments"
        },
        {
            "name": "Wafer Map Anomaly Detection Governance (Safety-Critical)",
            "objective": "Govern anomaly detection model that triggers production line halts (multi-million dollar impact per halt)",
            "governance_requirements": [
                "Risk classification: HIGH (safety-critical system impacting production)",
                "Human oversight: Require fab manager approval before automated line halt (human-in-the-loop)",
                "Explainability: Show spatial patterns that triggered anomaly (wafer map heatmap, die clustering)",
                "Validation: Precision >= 85% (limit false positives), recall >= 75% (catch real anomalies)",
                "Audit trail: Log every anomaly detection with model version, input wafer map, prediction, confidence",
                "Quarterly review: Independent validation by quality engineers"
            ],
            "compliance_obligations": [
                "AI Act (EU): High-risk system requires documentation, testing, human oversight, accuracy guarantees",
                "Internal policy: Root cause analysis for all false positive halts (prevent $500K/hour downtime)",
                "Quality standards: ISO 9001 documentation for quality control processes"
            ],
            "safety_controls": [
                "Two-stage approval: Model flags anomaly → engineer reviews wafer map → manager approves halt",
                "Confidence thresholds: Only auto-alert if confidence >90%, manual review if 70-90%",
                "Override mechanism: Engineers can override model decision (with justification logged)",
                "Redundancy: Combine model with physics-based checks (kill-rate analysis)"
            ],
            "success_criteria": "Zero false positive halts, 100% anomalies reviewed by humans before action, full audit trail for all decisions",
            "value": "Prevent costly false halts ($500K/hour), catch real defects early (save $1M/year), satisfy regulatory requirements"
        }
    ],
    "general_ml": [
        {
            "name": "Credit Scoring Model Governance (GDPR + Fair Lending)",
            "objective": "Govern credit scoring model to ensure fairness, transparency, and regulatory compliance (prevent $50M+ fines)",
            "governance_requirements": [
                "Fairness: Equalized odds across protected groups (race, gender, age) - <5% TPR/FPR difference",
                "Explainability: Right to explanation for all credit denials (GDPR Article 22)",
                "Model card: Document training data, features (no prohibited features), limitations",
                "Audit trail: Log all credit decisions with model version, inputs, outputs, explanations",
                "Quarterly fairness audits: Independent review by compliance team"
            ],
            "compliance_obligations": [
                "GDPR: Right to explanation, data protection, consent",
                "Equal Credit Opportunity Act (US): No discrimination based on protected characteristics",
                "Fair lending (disparate impact): Commonly screened with the 80% (four-fifths) rule, i.e. disparate impact ratio >= 0.8",
                "SR 11-7 (if bank): Model validation, documentation, ongoing monitoring"
            ],
            "explainability_methods": [
                "LIME: Local explanations for each credit decision",
                "Counterfactuals: 'Applicant would be approved if income increased by $10K'",
                "Feature importance: Global understanding of model (income, debt-to-income ratio most important)",
                "Adverse action reasons: Top 3 reasons for denial (compliant with regulations)"
            ],
            "success_criteria": "Zero fairness violations, 100% denials explainable, pass regulatory audits, zero GDPR complaints",
            "value": "Avoid regulatory fines ($50M+ for GDPR), prevent discrimination lawsuits, maintain banking license"
        },
        {
            "name": "Healthcare Diagnosis AI Governance (High-Risk, Life-Critical)",
            "objective": "Govern medical diagnosis model (cancer detection) with highest safety and compliance standards",
            "governance_requirements": [
                "Risk classification: CRITICAL (impacts patient health outcomes)",
                "Clinical validation: Independent validation by medical professionals (physicians, radiologists)",
                "Explainability: Heatmaps showing image regions that indicate cancer (Grad-CAM, SHAP)",
                "Human oversight: Physician final decision (model is decision support, not autonomous)",
                "Fairness: Equal sensitivity/specificity across demographics (race, age, gender)",
                "Audit trail: Log all diagnoses with patient ID, model version, confidence, physician decision",
                "Annual re-validation: Performance monitoring, fairness audits, clinical accuracy checks"
            ],
            "compliance_obligations": [
                "FDA marketing authorization (US): 510(k) clearance, De Novo classification, or premarket approval (PMA) for medical devices",
                "EU MDR: Medical device regulation (CE marking requirements)",
                "HIPAA: Patient data privacy and security",
                "AI Act (EU): High-risk AI system (healthcare) - full documentation, testing, oversight"
            ],
            "safety_controls": [
                "Sensitivity >= 95% (minimize false negatives - critical for cancer)",
                "Physician override: 100% of predictions reviewed by human expert",
                "Uncertainty quantification: Flag low-confidence predictions for additional review",
                "Redundancy: Combine AI with standard diagnostic protocols (biopsy, second opinion)"
            ],
            "success_criteria": "FDA/EU approval obtained, 95%+ sensitivity maintained, zero patient harm incidents, 100% physician oversight",
            "value": "Save lives (early cancer detection), pass regulatory approval, enable clinical adoption, avoid malpractice lawsuits"
        },
        {
            "name": "Hiring AI Governance (Fairness-Critical, Legal Risk)",
            "objective": "Govern resume screening AI to ensure fair hiring, prevent discrimination lawsuits ($millions in damages)",
            "governance_requirements": [
                "Fairness: Demographic parity across protected groups (race, gender, age) - <10% difference",
                "Prohibited features: No race, gender, age, zip code (proxy for race) in model",
                "Explainability: Hiring managers understand why candidates ranked (skills, experience)",
                "Audit trail: Log all screening decisions (candidate ID, model score, hiring decision)",
                "Quarterly fairness audits: Test for disparate impact (80% rule)",
                "Human-in-the-loop: Model ranks candidates, humans make final decision (no autonomous hiring)"
            ],
            "compliance_obligations": [
                "Equal Employment Opportunity Act (US): No discrimination in hiring",
                "GDPR (EU): Right to explanation, data protection for applicant data",
                "NYC Local Law 144: Independent bias audit of automated employment decision tools within one year before use (New York City)",
                "AI Act (EU): High-risk AI system (employment) - documentation, fairness, human oversight"
            ],
            "bias_mitigation": [
                "Adversarial debiasing: Train model to be invariant to protected attributes",
                "Fairness constraints: Optimize for accuracy subject to fairness constraints (equalized odds)",
                "Blind screening: Remove names, photos, universities (proxies for race/gender)",
                "Diverse training data: Ensure training set represents diverse candidates"
            ],
            "success_criteria": "Pass 80% rule for all protected groups, zero discrimination lawsuits, NYC bias audit compliance, 100% hiring decisions by humans",
            "value": "Avoid discrimination lawsuits ($millions), attract diverse talent, comply with regulations, maintain employer brand"
        },
        {
            "name": "Autonomous Vehicle Decision Model Governance (Safety-Critical)",
            "objective": "Govern AV decision model (lane change, braking) with highest safety standards (prevent accidents, fatalities)",
            "governance_requirements": [
                "Risk classification: CRITICAL (safety-critical, life-or-death decisions)",
                "Safety validation: Billions of simulated miles, real-world testing (track + public roads)",
                "Explainability: Understand why AV made decision (sensor inputs, predicted trajectories)",
                "Redundancy: Multiple independent models, sensor fusion, fallback to safe stop",
                "Audit trail: Log all decisions (sensor data, predictions, actions) - black box recorder",
                "Continuous monitoring: Fleet-wide performance tracking, incident analysis",
                "Regulatory approval: NHTSA (US), UNECE regulations (applied in the EU and other contracting parties) for automated driving"
            ],
            "compliance_obligations": [
                "NHTSA safety standards: Crashworthiness, occupant protection, crash avoidance",
                "UNECE regulations: Automated Lane Keeping Systems (ALKS), cybersecurity",
                "State laws (US): Vary by state (California, Arizona, etc.) - registration, testing permits",
                "AI Act (EU): High-risk AI system (safety component of vehicle) - full compliance"
            ],
            "safety_controls": [
                "Fail-safe: If model uncertain or fails, execute safe stop (slow down, pull over)",
                "Human takeover: Driver can override at any time (monitor driver attention)",
                "Redundant sensors: LiDAR, camera, radar (sensor fusion for robustness)",
                "Conservative decision-making: Prioritize safety over efficiency (wider margins)",
                "Incident reporting: Report all accidents, near-misses to regulators (transparency)"
            ],
            "success_criteria": "Zero at-fault accidents, regulatory approval obtained, 99.99%+ uptime, <0.01% unsafe decisions",
            "value": "Save lives (driver error is a critical factor in the large majority of crashes), obtain regulatory approval ($billions market), avoid liability lawsuits"
        }
    ]
}

print("🎯 8 Model Governance Project Templates")
print("="*80)

print("\n📦 POST-SILICON VALIDATION PROJECTS (4)\n")
for i, project in enumerate(projects["post_silicon"], 1):
    print(f"{i}. {project['name']}")
    print(f"   Objective: {project['objective']}")
    print(f"   Governance: {len(project['governance_requirements'])} requirements")
    print(f"   Compliance: {len(project['compliance_obligations'])} obligations")
    print(f"   Success: {project['success_criteria']}")
    print(f"   💰 Value: {project['value']}")
    print()

print("\n🌐 GENERAL AI/ML PROJECTS (4)\n")
for i, project in enumerate(projects["general_ml"], 1):
    print(f"{i}. {project['name']}")
    print(f"   Objective: {project['objective']}")
    print(f"   Governance: {len(project['governance_requirements'])} requirements")
    print(f"   Compliance: {len(project['compliance_obligations'])} obligations")
    print(f"   Success: {project['success_criteria']}")
    print(f"   💰 Value: {project['value']}")
    print()

print("="*80)
print("✅ All projects include: Model cards, fairness evaluation, audit trails, compliance checks")

## 7. 🎓 Key Takeaways & Best Practices

### 📌 Core Concepts

**1. Model Governance Fundamentals**
- **Definition**: Framework of policies, procedures, and controls ensuring responsible ML model development and deployment
- **Components**: Documentation (model cards), fairness (bias detection), transparency (explainability), compliance (regulations), accountability (audit trails)
- **Purpose**: Ensure models are trustworthy, fair, compliant, and beneficial
- **Governance vs MLOps**: MLOps = HOW to deploy (technical), Governance = WHAT models must satisfy (policies)

**Without governance**: Models cause harm (discrimination, privacy violations), regulatory fines ($50M+ GDPR), reputational damage  
**With governance**: Responsible AI adoption, stakeholder trust, regulatory compliance, reduced legal risk

**2. Why Governance Matters**
- **Regulatory landscape**: GDPR (EU data protection), AI Act (EU high-risk systems), SR 11-7 (US banking), CCPA (California privacy)
- **Financial risk**: GDPR fines up to €20M or 4% of worldwide annual turnover (whichever is higher), discrimination lawsuits ($millions)
- **Reputational risk**: Biased models damage brand (Amazon hiring AI, facial recognition bias)
- **Ethical obligation**: ML impacts lives (credit, healthcare, hiring) - must be fair and explainable
- **Market access**: EU AI Act restricts high-risk AI without compliance (lose market access)

---

### 📄 Model Cards

**3. Model Card Structure**
- **Model details**: Name, version, type, developer, framework, training date
- **Intended use**: Primary uses, out-of-scope uses, primary users
- **Metrics**: Overall performance + per-group metrics (fairness)
- **Training data**: Source, size, time period, features
- **Evaluation data**: Test set characteristics
- **Ethical considerations**: Fairness assessment, bias mitigation, privacy measures
- **Caveats & recommendations**: Limitations, usage recommendations, monitoring requirements
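
Because the card is a living document, a completeness check can run every time it is updated or before it is published. A minimal sketch, assuming the card is stored as a plain dict whose section keys (hypothetical, not the notebook's `ModelCard` schema) mirror the structure above; the card contents are illustrative placeholders:

```python
# Hypothetical section keys mirroring the model card structure listed above
REQUIRED_SECTIONS = (
    "model_details",
    "intended_use",
    "metrics",
    "training_data",
    "evaluation_data",
    "ethical_considerations",
    "caveats_and_recommendations",
)


def missing_sections(card):
    """Return required sections that are absent or empty in a model card dict."""
    return [section for section in REQUIRED_SECTIONS if not card.get(section)]


draft_card = {
    "model_details": {"name": "Yield Predictor", "version": "2.1", "framework": "scikit-learn"},
    "intended_use": {"primary_use": "Forecast wafer yield for capacity planning"},
    "metrics": {"overall": {"r2": 0.91}, "per_group": {"Fab A": 0.90, "Fab B": 0.92}},
    "training_data": {"source": "parametric test database", "period": "2023Q1-2024Q2"},
    "evaluation_data": {},  # left empty: counts as missing
    "ethical_considerations": {"fairness": "accuracy parity across fabs checked"},
}

missing = missing_sections(draft_card)
if missing:
    print("Model card incomplete, missing:", missing)
# Model card incomplete, missing: ['evaluation_data', 'caveats_and_recommendations']
```

Treating empty sections as missing catches placeholder sections that were created but never filled in.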

**4. Model Card Best Practices**
- **Living document**: Update with each model version, retraining, or performance change
- **Multiple formats**: JSON (machine-readable), Markdown (human-readable), HTML (web display)
- **Audience-specific**: Technical (data scientists), business (executives), compliance (auditors, regulators)
- **Versioning**: Track model card versions alongside model versions (lineage)
- **Accessibility**: Store in model registry, share with stakeholders, publish for transparency

**Example use**:
```python
card = ModelCard("Credit Scorer", "1.0", "Approve/deny credit applications")
card.add_metrics(overall={'accuracy': 0.85}, per_group={'Group A': 0.84, 'Group B': 0.86})
card.export_json('model_card_v1.json')  # For compliance reports
markdown = card.generate_markdown()  # For human review
```

---

### ⚖️ Fairness & Bias

**5. Fairness Metrics**
- **Demographic parity**: Positive rate similar across groups (P(pred=1 | group=A) ≈ P(pred=1 | group=B))
- **Equalized odds**: TPR and FPR equal across groups (equal true positive and false positive rates)
- **Disparate impact**: Ratio of positive rates >= 0.8 (80% rule from US employment law)
- **Accuracy parity**: Accuracy similar across groups (e.g., a gap under 5 percentage points; the acceptable threshold is context-dependent)
- **Calibration**: Predicted probabilities match observed frequencies per group
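
The group metrics above can be computed directly from predictions. A minimal NumPy sketch for binary predictions and one protected attribute; the toy arrays are hypothetical, and maintained libraries such as Fairlearn provide equivalents (for example `demographic_parity_difference` and `equalized_odds_difference` in `fairlearn.metrics`):

```python
import numpy as np


def group_rates(y_true, y_pred, group, g):
    """Selection rate, TPR, FPR and accuracy for the samples in group g."""
    mask = group == g
    yt, yp = y_true[mask], y_pred[mask]
    return {
        "selection_rate": yp.mean(),   # P(pred=1 | group=g)
        "tpr": yp[yt == 1].mean(),     # P(pred=1 | y=1, group=g); NaN if no positives
        "fpr": yp[yt == 0].mean(),     # P(pred=1 | y=0, group=g); NaN if no negatives
        "accuracy": (yp == yt).mean(),
    }


def fairness_report(y_true, y_pred, group):
    """Largest between-group gaps, plus the min/max selection-rate ratio."""
    y_true, y_pred, group = (np.asarray(a) for a in (y_true, y_pred, group))
    rates = [group_rates(y_true, y_pred, group, g) for g in np.unique(group)]

    def values(key):
        return [r[key] for r in rates]

    def gap(key):
        return max(values(key)) - min(values(key))

    return {
        "demographic_parity_diff": gap("selection_rate"),
        # Worst group's selection rate relative to the best group's (80% rule: >= 0.8)
        "disparate_impact_ratio": min(values("selection_rate")) / max(values("selection_rate")),
        "equalized_odds_diff": max(gap("tpr"), gap("fpr")),
        "accuracy_diff": gap("accuracy"),
    }


y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = ["A"] * 4 + ["B"] * 4

report = fairness_report(y_true, y_pred, group)
print({k: round(float(v), 3) for k, v in report.items()})
# {'demographic_parity_diff': 0.5, 'disparate_impact_ratio': 0.333, 'equalized_odds_diff': 0.5, 'accuracy_diff': 0.0}
```

In this toy case both groups have identical accuracy, yet the disparate impact ratio (about 0.33) fails the 80% rule, which is exactly why several metrics should be checked together.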

**6. When to Use Which Metric**
- **Demographic parity**: Use when outcome rates should be similar (hiring, lending approval rates)
- **Equalized odds**: Use when error rates matter equally (healthcare diagnosis - minimize both false positives and false negatives)
- **Disparate impact**: Use for legal compliance (hiring, lending - 80% rule)
- **Accuracy parity**: Use when performance should be consistent (product quality, yield prediction across fabs)
- **Multiple metrics**: Check ALL metrics (model can pass one, fail another)

**7. Bias Mitigation Strategies**
- **Pre-processing**: Reweight training data, resample to balance groups, remove biased features
- **In-processing**: Fairness-aware algorithms (constrained optimization), adversarial debiasing
- **Post-processing**: Adjust decision thresholds per group (equalize error rates)
- **Feature engineering**: Remove proxies (zip code → race, name → gender)
- **Diverse data**: Ensure training set represents all groups (no underrepresentation)

**Tradeoff**: Fairness vs accuracy (enforcing fairness can cost some accuracy; the size of the drop depends on the data and the constraint, so measure it) - a business decision based on ethics and regulations
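
One concrete pre-processing option is reweighting. A minimal sketch of Kamiran and Calders' reweighing (a well-known technique named here as an example, not necessarily the notebook's own method): each sample gets weight P(group) · P(label) / P(group, label), which makes group and label independent in the weighted training data. The toy labels are hypothetical:

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Per-sample weights w(g, y) = P(G=g) * P(Y=y) / P(G=g, Y=y)."""
    n = len(labels)
    count_g = Counter(groups)
    count_y = Counter(labels)
    count_gy = Counter(zip(groups, labels))
    return [
        (count_g[g] / n) * (count_y[y] / n) / (count_gy[(g, y)] / n)
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(weights, groups, labels, g):
    """Weighted share of positive labels within group g."""
    num = sum(w for w, gi, y in zip(weights, groups, labels) if gi == g and y == 1)
    den = sum(w for w, gi in zip(weights, groups) if gi == g)
    return num / den


groups = ["A"] * 4 + ["B"] * 4
labels = [1, 1, 1, 0, 1, 0, 0, 0]  # group A: 75% positive, group B: 25% positive

weights = reweighing_weights(groups, labels)
for g in ("A", "B"):
    print(g, round(weighted_positive_rate(weights, groups, labels, g), 3))
# A 0.5
# B 0.5
```

The weights can then be passed to any estimator that accepts per-sample weights, such as the `sample_weight` argument of scikit-learn's `LogisticRegression.fit`; the accuracy cost of doing so should be measured, as noted above.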

---

### 🔍 Audit Trails & Lineage

**8. What to Log**
- **Training events**: Data source, features, hyperparameters, metrics, training time, trained by
- **Validation events**: Validation results, gates passed/failed, validated by
- **Fairness checks**: Fairness metrics, pass/fail, checked by
- **Approval events**: Approved by, approval criteria, comments, timestamp
- **Deployment events**: Environment, deployed by, timestamp
- **Inference events**: (For high-stakes predictions) Input, output, confidence, model version
- **Incidents**: Type (performance degradation, fairness violation), description, severity, reported by

**9. Audit Trail Best Practices**
- **Immutable logs**: Append-only (cannot modify past events), cryptographic hashing for tamper-evidence
- **Structured format**: JSON for machine-readability, queryable database (SQL)
- **Retention**: Keep logs for the retention period required in your industry and jurisdiction (as set in your records-retention policy), and longer when litigation holds apply
- **Access control**: Only authorized users can view audit logs (privacy, security)
- **Automated alerts**: Flag suspicious events (unauthorized deployment, fairness violations)
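
Tamper-evidence can be sketched by chaining each record to the hash of the previous one, so that editing a past record breaks verification. A minimal in-memory sketch using only the standard library; the `AuditLog` class and its field names are hypothetical, and a production system would also write to append-only (WORM) storage and restrict access. Note that a hash chain alone does not detect truncation of the newest records:

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # prev_hash of the first record


class AuditLog:
    """Append-only, hash-chained event log: tamper-evident, not tamper-proof."""

    def __init__(self):
        self._events = []

    @staticmethod
    def _digest(record):
        # Hash every field except the stored hash itself; details must be JSON-serializable
        body = {k: v for k, v in record.items() if k != "hash"}
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

    def append(self, event_type, actor, details, reason):
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "event_type": event_type,  # training, validation, approval, deployment, incident, ...
            "actor": actor,
            "details": details,
            "reason": reason,          # the WHY behind the event, for auditors
            "prev_hash": self._events[-1]["hash"] if self._events else GENESIS_HASH,
        }
        record["hash"] = self._digest(record)
        self._events.append(record)
        return record

    def verify(self):
        """Recompute the chain; False if a record was altered or the chain is broken."""
        prev = GENESIS_HASH
        for record in self._events:
            if record["prev_hash"] != prev or record["hash"] != self._digest(record):
                return False
            prev = record["hash"]
        return True


log = AuditLog()
log.append("training", "alice", {"model": "yield_predictor", "version": "2.1"},
           "Retrained due to data drift in Vdd distribution")
log.append("deployment", "bob", {"environment": "production"},
           "Passed validation and fairness gates")
print(log.verify())  # True

log._events[0]["actor"] = "mallory"  # simulate an after-the-fact edit
print(log.verify())  # False
```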

**10. Model Lineage**
- **Data lineage**: Training data → preprocessing → features → training dataset
- **Code lineage**: Git commit, dependencies (requirements.txt), environment (Docker image)
- **Model lineage**: Algorithm, hyperparameters, training run ID → model artifact → deployment
- **Complete lineage**: End-to-end trace from raw data to production predictions

**Value**: Enable reproducibility, debugging, compliance (prove model was trained on approved data)
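
A lineage record can be reduced to a single fingerprint, so two training runs can be compared and a deployed artifact traced back to its exact inputs. A minimal sketch, assuming the hypothetical `LineageRecord` fields below; the commit and image values are placeholders, not real identifiers:

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


def sha256_bytes(data: bytes) -> str:
    """Content hash of the exact training extract (data lineage)."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class LineageRecord:
    data_sha256: str    # data lineage
    git_commit: str     # code lineage
    docker_image: str   # environment lineage
    algorithm: str      # model lineage
    hyperparameters: dict = field(default_factory=dict)

    def fingerprint(self) -> str:
        # sort_keys makes the hash independent of dict insertion order
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


raw = b"lot_id,wafer,vdd,idd\nL001,1,0.90,1.20\n"  # stand-in for a training extract
run = LineageRecord(
    data_sha256=sha256_bytes(raw),
    git_commit="<commit-sha>",              # placeholder
    docker_image="<registry>/train:<tag>",  # placeholder
    algorithm="GradientBoostingRegressor",
    hyperparameters={"n_estimators": 300, "max_depth": 4},
)
same_run = LineageRecord(**{**asdict(run), "hyperparameters": {"max_depth": 4, "n_estimators": 300}})
changed = LineageRecord(**{**asdict(run), "hyperparameters": {"n_estimators": 500, "max_depth": 4}})

print(run.fingerprint() == same_run.fingerprint())  # True
print(run.fingerprint() == changed.fingerprint())   # False
```

Storing the fingerprint alongside the model artifact in the registry lets an auditor confirm which data, code, and configuration produced a given production model.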

---

### 🔒 Regulatory Compliance

**11. GDPR (EU General Data Protection Regulation)**
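- **Article 22**: Right not to be subject to solely automated decisions with legal or similarly significant effects; safeguards include the right to human intervention and to contest the decision
- **Articles 13-15**: Transparency obligations (inform data subjects about model purpose and data used), including "meaningful information about the logic involved" in automated decisions, often called the "right to explanation"
- **Article 25**: Data protection by design and by default (privacy measures from the start)
- **Penalties**: Up to €20M or 4% of worldwide annual turnover (whichever is higher)
- **Applicability**: Organizations established in the EU, and organizations elsewhere that offer goods or services to, or monitor the behavior of, people in the EU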

**GDPR compliance checklist**:
- [ ] Explainability method implemented (SHAP, LIME)
- [ ] Data documentation (source, purpose, retention policy)
- [ ] Privacy measures (anonymization, encryption)
- [ ] Consent mechanism (if using personal data)
- [ ] Data subject rights (access, deletion, portability)
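
GDPR does not prescribe a specific explanation method. For a linear or logistic model, the per-feature contributions `w_i * (x_i - mean_i)` (in log-odds space) match the model's SHAP values when features are independent, and they can be ranked into human-readable reason codes without extra libraries. A minimal sketch; the feature names, coefficients, and applicant values are hypothetical:

```python
import numpy as np


def reason_codes(weights, x, background_mean, feature_names):
    """Rank features by |w_i * (x_i - mean_i)| for a linear (log-odds) model.

    With independent features these contributions equal the model's SHAP values
    relative to the background mean, and they sum to f(x) - f(mean).
    """
    w = np.asarray(weights, float)
    contrib = w * (np.asarray(x, float) - np.asarray(background_mean, float))
    order = np.argsort(-np.abs(contrib))  # largest impact first
    return [(feature_names[i], float(contrib[i])) for i in order]


features = ["debt_to_income", "years_employed", "late_payments"]
weights = [-2.0, 0.3, -0.8]      # hypothetical logistic-regression coefficients
applicant = [0.45, 2, 3]
population_mean = [0.30, 5, 1]

codes = reason_codes(weights, applicant, population_mean, features)
for name, c in codes:
    print(f"{name}: {c:+.2f} log-odds")
# late_payments: -1.60 log-odds
# years_employed: -0.90 log-odds
# debt_to_income: -0.30 log-odds
```

For non-linear models, the same kind of per-decision output would come from SHAP or LIME, as listed in the checklist.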

**12. EU AI Act (2024)**
- **Risk classification**: Unacceptable (banned), High-risk (strict requirements), Limited-risk (transparency), Minimal-risk (no requirements)
- **High-risk systems**: Employment, credit scoring, healthcare, critical infrastructure, law enforcement
- **Requirements for high-risk**: Technical documentation, audit trails, human oversight, accuracy/robustness testing, transparency
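- **Penalties**: Up to €35M or 7% of worldwide annual turnover for prohibited practices; up to €15M or 3% for breaches of other obligations, including high-risk requirements
- **Timeline**: Entered into force August 2024; obligations phase in from 2025 to 2027 (prohibitions first, most high-risk requirements from August 2026)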

**High-risk AI checklist**:
- [ ] Risk assessment documented
- [ ] Model card (technical documentation)
- [ ] Audit trail enabled
- [ ] Human oversight (human-in-the-loop)
- [ ] Accuracy and robustness testing
- [ ] Conformity assessment (third-party audit for some systems)

**13. SR 11-7 (US Federal Reserve - Banking Model Risk Management)**
- **Scope**: All models used by banks for business decisions (credit, risk, trading)
- **Requirements**: Model validation (independent review), documentation, ongoing monitoring, governance framework
- **Model validation**: Test model performance, assumptions, limitations (by independent team)
- **Ongoing monitoring**: Track production performance, retrain when needed, annual review
- **Applicability**: US banking organizations supervised by the Federal Reserve (the OCC issued parallel guidance as Bulletin 2011-12); supervisory guidance rather than a statute

**SR 11-7 checklist**:
- [ ] Independent model validation completed
- [ ] Comprehensive documentation (model card, assumptions, limitations)
- [ ] Ongoing monitoring enabled
- [ ] Model registered in inventory
- [ ] Governance policies (approval workflow, change management, incident response)
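
SR 11-7 expects each model to be reviewed periodically, at least annually and more often if warranted. A minimal sketch of a model-inventory check that flags overdue reviews; the `InventoryEntry` fields, owners, and model IDs are hypothetical:

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class InventoryEntry:
    model_id: str
    owner: str
    risk_tier: str
    last_validated: date
    review_interval_days: int = 365  # default: annual; shorten for higher-risk models

    def review_due(self) -> date:
        return self.last_validated + timedelta(days=self.review_interval_days)


def overdue_reviews(inventory, today):
    """IDs of models whose periodic review date has passed."""
    return [e.model_id for e in inventory if e.review_due() < today]


inventory = [
    InventoryEntry("credit_scorer_v3", "risk-analytics", "high", date(2023, 3, 1)),
    InventoryEntry("yield_predictor_v2", "fab-analytics", "medium", date(2024, 1, 15)),
]
print(overdue_reviews(inventory, today=date(2024, 6, 1)))  # ['credit_scorer_v3']
```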

**14. Other Regulations**
- **CCPA (California)**: Privacy rights similar to GDPR (for California residents)
- **Equal Credit Opportunity Act (US)**: No discrimination in lending (protected characteristics)
- **Fair Lending Laws (US)**: Disparate impact analysis (the 80% rule originates in employment guidance but is often used as a screening heuristic)
- **HIPAA (US Healthcare)**: Patient data privacy and security
- **FDA (US Medical Devices)**: Premarket review for medical AI (510(k) clearance, De Novo classification, or PMA approval)
- **NYC Local Law 144**: Independent bias audit of AI hiring tools within one year before use, with published results (NYC only)

---

### 🏭 Post-Silicon Validation Applications

**15. Yield Prediction Governance**
- **Challenge**: Model impacts $500M+ capacity decisions (high-stakes)
- **Requirements**: Model card, fairness across fabs (<3 percentage-point accuracy gap), audit trail, approval workflow
- **Compliance**: Internal policies (quality management, change control), potentially ISO 9001
- **Value**: Enable responsible AI adoption, pass internal audits, maintain executive trust

**16. Binning Model Fairness**
- **Challenge**: Biased binning loses $50M revenue (systematic underpricing of specific device types)
- **Requirements**: Fairness metrics (demographic parity, equalized odds across device types), explainability (SHAP)
- **Mitigation**: Stratified sampling, fairness constraints, monthly fairness monitoring
- **Value**: Prevent revenue loss, enable engineer trust, satisfy customer requirements

**17. Test Time Prediction for SLAs**
- **Challenge**: Inaccurate predictions cause $1M+ SLA penalties
- **Requirements**: Model card with confidence intervals, uncertainty quantification, SLA risk alerting
- **Documentation**: Assumptions (test parallelization), limitations (new test programs), monitoring (MAPE)
- **Value**: Avoid SLA penalties, maintain customer relationships, enable confident commitments
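
The MAPE monitoring and confidence intervals above can be sketched with NumPy. This assumes held-out residuals (actual minus predicted test time) are representative of future lots, which breaks down for new test programs; the interval uses empirical residual quantiles, and all numbers and the SLA limit are hypothetical:

```python
import numpy as np


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    actual = np.asarray(actual, float)
    predicted = np.asarray(predicted, float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)


def prediction_interval(point, residuals, coverage=0.9):
    """Interval from empirical quantiles of held-out residuals (actual - predicted)."""
    lo, hi = np.quantile(residuals, [(1 - coverage) / 2, (1 + coverage) / 2])
    return point + lo, point + hi


def sla_at_risk(point, residuals, sla_limit, coverage=0.9):
    """Flag when the upper end of the interval exceeds the committed test time."""
    return prediction_interval(point, residuals, coverage)[1] > sla_limit


print(round(mape([100, 200, 150, 120], [110, 190, 150, 126]), 2))  # 5.0
residuals = np.array([-8, -5, -2, 0, 1, 3, 4, 6, 9, 12])  # minutes, from a validation set
print(prediction_interval(180, residuals))
print(sla_at_risk(180, residuals, sla_limit=190))  # True
```

Alerting on the interval's upper bound rather than the point prediction is what turns a forecast into a confident (or explicitly risky) SLA commitment.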

**18. Safety-Critical Wafer Map Anomaly Detection**
- **Challenge**: False positive halts cost $500K/hour (production downtime)
- **Requirements**: Human oversight (fab manager approval), explainability (spatial patterns), high precision (85%+)
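- **Compliance**: ISO 9001 quality documentation; EU AI Act high-risk classification only if the system qualifies (e.g., as a safety component under EU product legislation), which legal should confirm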
- **Value**: Prevent costly false halts, catch real defects early, satisfy regulatory requirements

---

### ⚠️ Common Pitfalls

**19. Documentation as Afterthought**
- **Problem**: Create model card after deployment (outdated, incomplete)
- **Solution**: Generate model card DURING development (update with each experiment)
- **Tooling**: Automate model card generation from MLflow tracking, experiment metadata

**20. Ignoring Fairness Until Production**
- **Problem**: Discover bias after deployment (costly to fix, reputational damage)
- **Solution**: Fairness evaluation in development (before deployment), continuous fairness monitoring
- **Example**: Amazon hiring AI discontinued after discovering gender bias (trained on historical male-dominated data)

**21. Audit Trails Without Context**
- **Problem**: Log events without WHY (hard to understand decisions during audit)
- **Solution**: Log context (trigger reason, approval rationale, incident root cause)
- **Example**: "Model retrained" (useless) vs "Model retrained due to data drift in Vdd distribution (KS test p=0.02)" (useful)

**22. Compliance Checkbox Exercise**
- **Problem**: Treat governance as bureaucracy (check boxes without understanding)
- **Solution**: Understand WHY regulations exist (protect users, ensure fairness), embed in culture
- **Mindset**: Governance enables responsible AI (not just compliance burden)

**23. One-Size-Fits-All Governance**
- **Problem**: Same governance for low-risk recommendation (movie suggestions) and high-risk lending (credit approval)
- **Solution**: Risk-based governance (more rigor for high-risk systems)
- **Tiers**: Minimal (internal analytics), Standard (customer-facing), High (safety/financial/legal critical)

---

### ✅ Best Practices

**24. Risk-Based Governance**
- **Risk classification**: Categorize models by impact (low/medium/high/critical)
- **Governance intensity**: More requirements for higher risk (high-risk needs audit trail, fairness, approval)
- **Examples**:
  - Low risk: Internal analytics dashboard (minimal governance)
  - Medium risk: Yield prediction (model card, fairness monitoring)
  - High risk: Credit scoring (full governance, compliance checks, audit trail)
  - Critical risk: Medical diagnosis (maximum governance, regulatory approval, human oversight)
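
Risk-based governance can be encoded as a lookup from risk tier to required controls, which a deployment pipeline can then enforce automatically. A minimal sketch; the control names and the tier-to-control mapping are hypothetical and simply mirror the examples above:

```python
# Hypothetical control catalogue: each tier adds controls on top of the one below.
REQUIRED_CONTROLS = {
    "low": {"model_card"},
    "medium": {"model_card", "fairness_monitoring"},
    "high": {"model_card", "fairness_monitoring", "audit_trail", "compliance_check", "approval"},
    "critical": {"model_card", "fairness_monitoring", "audit_trail", "compliance_check",
                 "approval", "human_oversight", "regulatory_approval"},
}


def missing_controls(risk_tier, implemented):
    """Controls required for this tier that the model does not have yet."""
    return sorted(REQUIRED_CONTROLS[risk_tier] - set(implemented))


def deployment_gate(model_id, risk_tier, implemented):
    """Raise (failing the CI/CD job) if any required control is missing."""
    missing = missing_controls(risk_tier, implemented)
    if missing:
        raise RuntimeError(f"{model_id} ({risk_tier} risk) blocked; missing: {', '.join(missing)}")
    return True


print(missing_controls("medium", ["model_card", "fairness_monitoring"]))  # []
print(missing_controls("high", ["model_card", "audit_trail"]))
# ['approval', 'compliance_check', 'fairness_monitoring']
```

Raising an exception is a simple way to make the gate fail a pipeline step; the same check can run in CI before any deployment job.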

**25. Governance-by-Design**
- **Integrate early**: Build governance into ML workflow (not bolted on later)
- **Automated checks**: Fairness tests in CI/CD, compliance validation in deployment pipeline
- **Templates**: Standardize model cards, fairness reports, compliance checklists (reduce burden)

**26. Stakeholder Engagement**
- **Cross-functional teams**: Data scientists, legal, compliance, business, ethics
- **Regular reviews**: Quarterly governance reviews (fairness, compliance, incidents)
- **Transparency**: Share model cards with stakeholders (build trust)

**27. Continuous Monitoring**
- **Fairness drift**: Fairness metrics can degrade over time (monitor quarterly)
- **Compliance changes**: Regulations evolve (AI Act, state laws) - stay updated
- **Incident tracking**: Log all governance incidents (bias complaints, compliance violations)
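
Fairness drift monitoring can reuse the disparate impact ratio, computed per period on production predictions. A minimal sketch that alerts when the ratio falls below 0.8 or drops by more than a chosen margin from the first period; the quarterly batches and the 0.1 margin are hypothetical:

```python
import numpy as np


def disparate_impact(y_pred, group):
    """Lowest group selection rate divided by the highest."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return min(rates) / max(rates)


def fairness_drift_alerts(batches, threshold=0.8, max_drop=0.1):
    """batches: {period: (y_pred, group)} in time order. Returns (period, ratio) alerts."""
    alerts, baseline = [], None
    for period, (y_pred, group) in batches.items():
        ratio = disparate_impact(y_pred, group)
        if baseline is None:
            baseline = ratio  # first period is the reference
        if ratio < threshold or baseline - ratio > max_drop:
            alerts.append((period, round(float(ratio), 3)))
    return alerts


def batch(pos_a, pos_b, n=10):
    """Toy batch: n predictions per group with pos_a / pos_b positives."""
    y_pred = [1] * pos_a + [0] * (n - pos_a) + [1] * pos_b + [0] * (n - pos_b)
    return y_pred, ["A"] * n + ["B"] * n


quarters = {"2024Q1": batch(5, 5), "2024Q2": batch(5, 4), "2024Q3": batch(5, 3)}
print(fairness_drift_alerts(quarters))  # [('2024Q2', 0.8), ('2024Q3', 0.6)]
```

The 2024Q2 batch still passes the 80% rule, but the drop-from-baseline check flags the downward trend a quarter before the hard threshold is breached.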

**28. Documentation Standards**
- **Version control**: Model cards versioned with models (Git, model registry)
- **Accessibility**: Store in central location (model registry, wiki, compliance database)
- **Formats**: Multiple formats (JSON for automation, Markdown for humans, HTML for web)
- **Reviews**: Peer review model cards (catch gaps, ensure accuracy)

---

### 🚀 Production Checklist

**Before deploying to production**:
- [ ] **Model card created** (all sections complete, reviewed by stakeholders)
- [ ] **Fairness evaluated** (all relevant metrics, no violations)
- [ ] **Bias mitigation applied** (if bias detected, document mitigation strategy)
- [ ] **Audit trail enabled** (logging training, deployment, inference events)
- [ ] **Compliance checked** (GDPR, AI Act, SR 11-7, or applicable regulations)
- [ ] **Explainability implemented** (SHAP, LIME, or appropriate method)
- [ ] **Approval obtained** (governance review, stakeholder sign-off)
- [ ] **Monitoring configured** (fairness monitoring, performance tracking)
- [ ] **Incident response plan** (know what to do if bias/compliance violation detected)
- [ ] **Documentation published** (model card accessible to stakeholders)
- [ ] **Risk assessment documented** (classify risk level, justify governance rigor)
- [ ] **Legal review** (if high-risk, legal team confirms compliance)

---

### 🎯 When to Prioritize Governance

**✅ Prioritize governance when**:
- High-risk systems (healthcare, finance, hiring, safety-critical)
- Customer-facing models (impact user experience, trust)
- Regulated industries (banking, healthcare, public sector)
- Protected characteristics involved (race, gender, age - fairness critical)
- High-stakes decisions ($millions impact, legal consequences)
- Multi-year deployment (governance ensures long-term trust)

**❌ Lower priority when**:
- Internal analytics (no user impact, low risk)
- Experimental models (prototypes, not production)
- Low-stakes decisions (movie recommendations, UI optimization)

**Scaling governance**: Start simple (model card for all models), add rigor for high-risk (fairness, compliance, audit trails)

---

### 📚 Next Steps

**After mastering model governance**:
1. **Shadow Mode Deployment (Notebook 128)**: Safe deployment strategies before full production
2. **CI/CD for ML (Notebook 129)**: Automate governance checks in deployment pipeline
3. **Advanced MLOps (Notebook 130)**: Multi-model systems, AutoML, governance at scale

**Recommended resources**:
- Book: "Fairness and Machine Learning" (Barocas, Hardt, Narayanan) - Comprehensive fairness guide
- Paper: "Model Cards for Model Reporting" (Mitchell et al., Google) - Original model card specification
- Course: "Practical Data Ethics" (fast.ai) - Practical AI ethics and fairness
- Tool: Fairlearn (Microsoft), AI Fairness 360 (IBM) - Fairness libraries
- Website: EU AI Act compliance guide, GDPR documentation

---

**🎯 Remember**: Governance is not optional for production ML in regulated or high-risk settings. Regulations such as GDPR and the EU AI Act require documentation, transparency, and (for high-risk systems) risk management and human oversight. Start governance early, automate where possible, and treat it as an enabler (not a burden) for responsible AI adoption. The cost of governance is typically far lower than the cost of non-compliance (fines that can reach millions, plus reputational damage).

## 🔑 Key Takeaways

**When to Use Model Governance:**
- Regulated industries (finance, healthcare, government)
- High-stakes decisions (hiring, lending, medical diagnosis)
- Multiple teams deploying models
- Audit and compliance requirements

**Limitations:**
- Adds overhead to deployment process
- Requires dedicated governance team/tools
- Documentation burden on data scientists
- May slow down experimentation

**Alternatives:**
- Lightweight documentation (README + model cards)
- Peer review process without formal governance
- Third-party compliance platforms (Fiddler, Arthur)
- Manual audits (periodic vs continuous)

**Best Practices:**
- Automate documentation generation where possible
- Integrate governance into CI/CD pipelines
- Maintain model registry with lineage tracking
- Conduct regular governance audits
- Train teams on compliance requirements

**Next Steps:**
- 155: Model Explainability (generate audit reports)
- 176: Fairness & Bias in ML (compliance validation)
- 154: Model Monitoring (post-deployment compliance)