# CLEAR Demo: End-to-End Model Card Generation

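This notebook walks through the complete workflow:
1. Generate a synthetic dataset
2. Train a simple classifier and compute evaluation metrics
3. Document model metadata, limitations, and use-case constraints
4. Define risk assessments
5. Generate a model card
6. Generate a risk report
7. Preview the generated files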

In [None]:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

from modelcardgen.core.trainer import ClassifierTrainer
from modelcardgen.core.models import (
    ModelMetadata,
    DatasetMetadata,
    ModelLimitations,
    UseCaseConstraints,
    RiskAssessment,
)
from modelcardgen.reports.markdown import MarkdownCardGenerator
from modelcardgen.reports.risk import RiskReportGenerator

## Step 1: Generate Sample Data

Create a synthetic binary classification dataset for demonstration.

In [None]:
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=15,
    n_redundant=5,
    random_state=42
)

print(f"Dataset shape: {X.shape}")
print(f"Class distribution: {np.bincount(y)}")
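
The feature counts passed to make_classification have to fit together: scikit-learn requires n_informative + n_redundant + n_repeated to be at most n_features, and any remaining columns are filled with random noise. A minimal sketch of that bookkeeping follows. The helper noise_feature_count is hypothetical and is not part of scikit-learn or modelcardgen.

```python
def noise_feature_count(n_features, n_informative, n_redundant, n_repeated=0):
    """Return how many pure-noise columns would be left over.

    Hypothetical helper for illustration only; it mirrors the documented
    constraint that structured features cannot exceed n_features.
    """
    structured = n_informative + n_redundant + n_repeated
    if structured > n_features:
        raise ValueError(
            f"{structured} structured features requested, "
            f"but n_features={n_features}"
        )
    return n_features - structured


# Values from the cell above: 15 informative + 5 redundant fill all 20 columns,
# so this dataset contains no pure-noise features.
print(noise_feature_count(n_features=20, n_informative=15, n_redundant=5))
```

With the settings used here every column carries signal, which is worth keeping in mind when reading the metrics in the next step.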

## Step 2: Train and Evaluate Classifier

Use ClassifierTrainer to fit a Random Forest and compute accuracy, precision, recall, and F1 on a 20% holdout split (test_size=0.2).

In [None]:
model = RandomForestClassifier(n_estimators=50, random_state=42, max_depth=10)

trainer = ClassifierTrainer(model, test_size=0.2, random_state=42)
metrics = trainer.train_and_evaluate(X, y)

print(f"Accuracy: {metrics.accuracy:.3f}")
print(f"Precision: {metrics.precision:.3f}")
print(f"Recall: {metrics.recall:.3f}")
print(f"F1-Score: {metrics.f1_score:.3f}")
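
The four printed values are the standard binary-classification metrics. The internals of ClassifierTrainer are not shown in this notebook, so the following is only a plain-Python sketch of how these quantities are defined, assuming label 1 is the positive class. The name binary_metrics is hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("at least one prediction is required")

    # Confusion-matrix counts, treating 1 as the positive class.
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Fall back to 0.0 when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1_score": f1,
    }


# Toy labels, not results from this notebook.
demo_metrics = binary_metrics(
    y_true=[1, 1, 1, 0, 0, 0, 1, 0],
    y_pred=[1, 1, 0, 0, 0, 1, 1, 0],
)
for name, value in demo_metrics.items():
    print(f"{name}: {value:.3f}")
```

In practice the equivalent scikit-learn functions (accuracy_score, precision_score, recall_score, f1_score) are the better choice; the sketch only makes the definitions explicit.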

## Step 3: Define Model Metadata and Documentation

Create descriptive metadata about the model, datasets, limitations, and use cases.

In [None]:
metadata = ModelMetadata(
    name="Binary Classification Model",
    version="1.0.0",
    description="A Random Forest classifier for binary classification tasks. Trained on synthetic data to demonstrate the CLEAR reporting framework.",
    owner="Data Science Team",
    license="Apache-2.0",
    framework="scikit-learn"
)

training_data = DatasetMetadata(
    name="Synthetic Training Set",
    description="Synthetically generated binary classification dataset with 20 features (15 informative, 5 redundant) and roughly balanced classes.",
    size=800,
    features=[f"feature_{i}" for i in range(20)],
    target="binary_label"
)

eval_data = DatasetMetadata(
    name="Synthetic Test Set",
    description="Holdout test set from the same distribution as training data.",
    size=200,
    features=[f"feature_{i}" for i in range(20)],
    target="binary_label"
)
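
The size values 800 and 200 are hard-coded to match n_samples=1000 and test_size=0.2, so they go stale if either setting changes. The sketch below derives both counts instead. It assumes ClassifierTrainer splits the way scikit-learn's train_test_split does for a fractional test_size (test count rounded up, remainder used for training), which this notebook does not confirm. The helper split_sizes is hypothetical.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional holdout split.

    Assumption: the test count is rounded up and the remaining
    samples are used for training.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be a fraction between 0 and 1")
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


# Same settings as the cells above.
n_train, n_test = split_sizes(n_samples=1000, test_size=0.2)
print(f"train: {n_train}, test: {n_test}")
```

The two results could then be passed as size= to the DatasetMetadata objects instead of the literals.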

In [None]:
limitations = ModelLimitations(
    unsuitable_inputs=[
        "Missing or null feature values",
        "Features outside the training range",
        "Non-numeric input types"
    ],
    environmental_constraints="Requires Python 3.10+ and scikit-learn>=1.0. Inference latency: <10ms per sample.",
    out_of_scope_uses=[
        "Real-time decision systems without human review",
        "Safety-critical applications (medical, aviation)"
    ]
)

constraints = UseCaseConstraints(
    intended_users=[
        "Data scientists evaluating model performance",
        "ML engineers deploying the model",
        "Stakeholders reviewing model capabilities"
    ],
    intended_use_cases=[
        "Demonstrating CLEAR reporting framework",
        "Educational examples",
        "Model card generation workflows"
    ],
    prohibited_uses=[
        "Production deployment without additional validation",
        "Use with real-world data without retraining"
    ]
)
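
The unsuitable_inputs entries describe conditions that can be checked mechanically before a row reaches the model. One possible pre-inference guard is sketched below. The function find_unsuitable_inputs and the per-feature bounds are hypothetical; in practice the bounds would be taken from the training split.

```python
import math
from numbers import Real


def find_unsuitable_inputs(row, lower, upper):
    """Return a list of human-readable problems found in one feature row."""
    problems = []
    for i, value in enumerate(row):
        name = f"feature_{i}"
        if value is None:
            problems.append(f"{name}: missing value")
        # bool counts as a number in Python, so reject it explicitly.
        elif isinstance(value, bool) or not isinstance(value, Real):
            problems.append(f"{name}: non-numeric value {value!r}")
        elif math.isnan(value):
            problems.append(f"{name}: missing value (NaN)")
        elif not lower[i] <= value <= upper[i]:
            problems.append(
                f"{name}: {value} outside training range "
                f"[{lower[i]}, {upper[i]}]"
            )
    return problems


# Illustrative row with one problem of each kind; bounds are made up.
row = [0.5, None, "high", float("nan"), 9.0]
for problem in find_unsuitable_inputs(row, lower=[-1.0] * 5, upper=[1.0] * 5):
    print(problem)
```

An empty list means the row passed all three checks listed in unsuitable_inputs.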

## Step 4: Define Risk Assessments

Identify and document potential risks associated with the model.

In [None]:
risks = [
    RiskAssessment(
        risk_type="Dataset Shift",
        description="Model was trained on synthetic data with balanced classes. Real-world data may have different distributions.",
        mitigation_strategy="Monitor prediction distributions in production. Retrain quarterly with new data.",
        severity="Medium"
    ),
    RiskAssessment(
        risk_type="Limited Generalization",
        description="Only 20 features used. May not capture domain-specific relationships important for decision-making.",
        mitigation_strategy="Conduct feature importance analysis. Engage domain experts to validate feature selection.",
        severity="Medium"
    ),
    RiskAssessment(
        risk_type="No Uncertainty Quantification",
        description="The workflow uses hard class predictions only. The probabilities available from predict_proba are neither calibrated nor evaluated.",
        mitigation_strategy="Implement prediction confidence thresholding. Flag low-confidence predictions for human review.",
        severity="Low"
    )
]
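
The mitigation for the third risk, confidence thresholding, can be illustrated without any modelcardgen API. The sketch below assumes positive-class probabilities are available (for a scikit-learn classifier, the second column of predict_proba) and routes predictions near the decision boundary to human review. The function name and the 0.2 margin are illustrative choices, not recommendations.

```python
def triage_predictions(positive_probs, margin=0.2):
    """Split predictions into automatic decisions and cases for review.

    A probability within `margin` of 0.5 is treated as low confidence.
    Returns (decided, review): decided holds (index, label) pairs,
    review holds the indices that need a human decision.
    """
    decided, review = [], []
    for index, prob in enumerate(positive_probs):
        if abs(prob - 0.5) < margin:
            review.append(index)
        else:
            decided.append((index, int(prob >= 0.5)))
    return decided, review


# Made-up probabilities for illustration.
decided, review = triage_predictions([0.95, 0.55, 0.10, 0.42, 0.71])
print("automatic decisions:", decided)
print("flagged for review:", review)
```

Because Random Forest probabilities are not calibrated by default, a margin chosen this way should be validated against held-out data before it is relied on.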

## Step 5: Generate Model Card

Create a comprehensive model card in Markdown format.

In [None]:
card_generator = MarkdownCardGenerator()

card_path = card_generator.generate(
    metadata=metadata,
    training_data=training_data,
    eval_data=eval_data,
    metrics=metrics,
    limitations=limitations,
    constraints=constraints,
    risks=risks,
    output_path="DEMO_MODEL_CARD.md"
)

print(f"Model card generated: {card_path}")
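
The implementation of MarkdownCardGenerator lives in the package and is not reproduced here. To give a sense of what rendering one section of a card to Markdown involves, here is a standalone sketch. The function render_metrics_table is hypothetical, and the real generator's layout may differ.

```python
def render_metrics_table(metrics):
    """Render a name -> value mapping as a two-column Markdown table."""
    lines = ["| Metric | Value |", "| --- | --- |"]
    for name, value in metrics.items():
        lines.append(f"| {name} | {value:.3f} |")
    return "\n".join(lines)


# Placeholder values, not results from this notebook.
print(render_metrics_table({"Accuracy": 0.5, "F1-Score": 0.75}))
```

Keeping rendering in small pure functions like this makes each section of a card easy to test in isolation.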

## Step 6: Generate Risk Report

Create a detailed risk assessment report.

In [None]:
risk_generator = RiskReportGenerator()

risk_path = risk_generator.generate(
    model_name="Binary Classification Model",
    model_version="1.0.0",
    risks=risks,
    metrics=metrics,
    output_path="DEMO_RISK_REPORT.md"
)

print(f"Risk report generated: {risk_path}")
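
A risk report is easier to act on when the most severe items come first. Whether RiskReportGenerator orders its output this way is not shown in this notebook, so the following is a sketch under the assumption that severities are the strings "Low", "Medium", and "High" (only the first two appear above). It uses plain dictionaries rather than RiskAssessment objects so that it runs without the package.

```python
# Assumed severity scale; lower rank sorts first.
SEVERITY_RANK = {"High": 0, "Medium": 1, "Low": 2}


def sort_by_severity(risks):
    """Return risks ordered from most to least severe.

    sorted() is stable, so risks with equal severity keep their input order.
    """
    unknown = {r["severity"] for r in risks} - SEVERITY_RANK.keys()
    if unknown:
        raise ValueError(f"unknown severity levels: {sorted(unknown)}")
    return sorted(risks, key=lambda r: SEVERITY_RANK[r["severity"]])


# Risk types and severities copied from Step 4.
demo_risks = [
    {"risk_type": "No Uncertainty Quantification", "severity": "Low"},
    {"risk_type": "Dataset Shift", "severity": "Medium"},
    {"risk_type": "Limited Generalization", "severity": "Medium"},
]
for risk in sort_by_severity(demo_risks):
    print(f"- [{risk['severity']}] {risk['risk_type']}")
```

Rejecting unknown severity strings early avoids a report that silently drops or misorders a risk because of a typo such as "medium".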

## Step 7: Preview Generated Files

Display the first 500 characters of each generated report.

In [None]:
with open("DEMO_MODEL_CARD.md", "r") as f:
    model_card_content = f.read()

print("=== MODEL CARD (First 500 characters) ===")
print(model_card_content[:500])
print("\n...")

In [None]:
with open("DEMO_RISK_REPORT.md", "r") as f:
    risk_report_content = f.read()

print("=== RISK REPORT (First 500 characters) ===")
print(risk_report_content[:500])
print("\n...")
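
Both preview cells repeat the same read-and-truncate pattern and print the trailing ellipsis even when a file is shorter than 500 characters. A small helper, sketched here under the hypothetical name preview_text, removes the duplication and appends the ellipsis only when something was cut off. It assumes the reports are written as UTF-8.

```python
from pathlib import Path


def preview_text(path, limit=500):
    """Return the first `limit` characters of a text file.

    An ellipsis line is appended only if the file was truncated.
    Assumes the file is UTF-8 encoded.
    """
    text = Path(path).read_text(encoding="utf-8")
    if len(text) <= limit:
        return text
    return text[:limit] + "\n..."
```

Calling print(preview_text("DEMO_MODEL_CARD.md")) and print(preview_text("DEMO_RISK_REPORT.md")) would then replace the bodies of the two preview cells.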

## Summary

This notebook demonstrated the complete CLEAR workflow:

1. **Trained a classifier** using the ClassifierTrainer wrapper
2. **Computed evaluation metrics** (accuracy, precision, recall, F1) on the held-out test split
3. **Defined model metadata** and documented intended use, limitations, and risks
4. **Generated a model card** in Markdown format
5. **Generated a risk report** for governance and compliance

All the heavy lifting happens in the package modules. The notebook stays focused on the workflow, not implementation details.