# AI Model Governance Toolkit - Explainability Demo

This notebook demonstrates how to use the Explainability Engine with a credit scoring model. We'll:
1. Load and prepare sample credit data
2. Train a simple credit scoring model
3. Use the ModelExplainer to generate global and local explanations
4. Summarize how these explanations support regulatory compliance

In [None]:
import sys
sys.path.append('..')

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns

from explainability.shap_explainer import ModelExplainer

## 1. Load and Prepare Sample Credit Data

For this demo, we'll use a synthetic credit scoring dataset. In a real scenario, you would use your actual credit data.

In [None]:
# Generate synthetic credit data
np.random.seed(42)
n_samples = 1000

# Generate features
data = {
    'age': np.random.normal(35, 10, n_samples),
    'income': np.random.normal(50000, 20000, n_samples),
    'employment_length': np.random.normal(5, 3, n_samples),
    'debt_to_income': np.random.normal(0.3, 0.1, n_samples),
    'credit_score': np.random.normal(700, 50, n_samples),
    'payment_history': np.random.normal(0.95, 0.05, n_samples),
    'loan_amount': np.random.normal(10000, 5000, n_samples)
}

df = pd.DataFrame(data)

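# Generate target (loan approval) based on features.
# Standardize each driver first: on raw scales the income term alone
# (0.05 * ~50000) saturates the sigmoid, so nearly every applicant is approved.
def zscore(col):
    return (col - col.mean()) / col.std()

# payment_history enters positively here, assuming a higher value means a better record
logit = (
    0.1 * zscore(df['credit_score']) +
    0.05 * zscore(df['income']) -
    0.2 * zscore(df['debt_to_income']) +
    0.1 * zscore(df['payment_history'])
)
prob = 1 / (1 + np.exp(-logit))
df['approved'] = (prob > 0.5).astype(int)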

# Split data
X = df.drop('approved', axis=1)
y = df['approved']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features (fit the scaler on the training split only, then reuse it on the test split)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("Dataset shape:", df.shape)
print("\nSample of the data:")
display(df.head())
print("\nClass distribution:")
display(df['approved'].value_counts(normalize=True))

## 2. Train Credit Scoring Model

We'll use a Random Forest classifier as our credit scoring model.

In [None]:
# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train_scaled, y_train)

# Evaluate model
train_score = model.score(X_train_scaled, y_train)
test_score = model.score(X_test_scaled, y_test)

print(f"Training accuracy: {train_score:.3f}")
print(f"Test accuracy: {test_score:.3f}")
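
Accuracy alone can look good when one class dominates, so it helps to compare it against a majority-class baseline and to look at the confusion matrix and ROC AUC as well. The sketch below is self-contained and uses stand-in data from `make_classification` rather than the notebook's `df`; the names `X_demo`, `y_demo`, and `clf` are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): an imbalanced binary problem, not the notebook's df
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=7, n_informative=4,
    weights=[0.2, 0.8], random_state=42,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# Accuracy obtained by always predicting the majority class
majority_rate = max(y_te.mean(), 1 - y_te.mean())

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_te, clf.predict(X_te))
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

print(f"Majority-class baseline accuracy: {majority_rate:.3f}")
print(f"Test accuracy: {clf.score(X_te, y_te):.3f}")
print(f"ROC AUC: {auc:.3f}")
print("Confusion matrix:")
print(cm)
```

If test accuracy is close to the majority-class baseline, the model has learned little beyond the class prior, whatever the raw accuracy figure suggests.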

## 3. Model Explainability

Now we'll use our ModelExplainer to understand how the model makes decisions.

In [None]:
# Initialize explainer
explainer = ModelExplainer(
    model=model,
    background_data=X_train_scaled,
    feature_names=X_train.columns.tolist()
)

# Get global feature importance
importance = explainer.get_feature_importance(n_samples=100, plot=True)
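
One way to sanity-check a global importance ranking is to compare it with an independent method. The sketch below uses scikit-learn's `permutation_importance`, which measures how much the test score drops when a feature's values are shuffled. This is a different technique from whatever `ModelExplainer` computes internally, and the toy data (`X_demo`, with a deliberately irrelevant `noise` column) is a hypothetical stand-in for the notebook's dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 600

# Hypothetical stand-in features; only the first two drive the label
X_demo = pd.DataFrame({
    "credit_score": rng.normal(700, 50, n),
    "debt_to_income": rng.normal(0.3, 0.1, n),
    "noise": rng.normal(0, 1, n),
})
logit = (X_demo["credit_score"] - 700) / 50 - (X_demo["debt_to_income"] - 0.3) / 0.1
y_demo = (logit + rng.normal(0, 0.5, n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Shuffle each column 10 times on held-out data and record the mean accuracy drop
result = permutation_importance(clf, X_te, y_te, n_repeats=10, random_state=0)
perm_importance = pd.Series(
    result.importances_mean, index=X_te.columns
).sort_values(ascending=False)
print(perm_importance)
```

If the two rankings disagree sharply, that is worth investigating before relying on either one, for example by checking for strongly correlated features.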

### Local Explanations

Let's examine how the model made decisions for specific applicants.

In [None]:
# Explain a few predictions
for i in range(3):
    print(f"\nApplicant {i+1}:")
    sample = X_test_scaled[i:i+1]
    
    # Get prediction
    pred = model.predict_proba(sample)[0]
    print(f"Prediction probabilities: [Reject: {pred[0]:.3f}, Approve: {pred[1]:.3f}]")
    
    # Get local explanation
    contributions = explainer.explain_prediction(sample, plot=True)
    
    # Display feature values in original (unscaled) units; row i of X_test matches row i of X_test_scaled
    print("\nFeature values:")
    for feature, value in zip(X_test.columns, X_test.iloc[i]):
        print(f"{feature}: {value:.2f}")
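
To make the idea of per-feature contributions concrete, here is a minimal sketch of exact Shapley values computed by enumerating every feature coalition. It assumes the contributions are Shapley-style attributions (the module name `shap_explainer` suggests this, but the source does not confirm it), and it is not `ModelExplainer`'s implementation. The toy `predict` function, `weights`, and `background` sample are hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(predict, x, background):
    """Exact Shapley values for one row x by enumerating all coalitions.

    Features outside a coalition keep their background values and the
    predictions are averaged over the background rows. The cost grows as
    2**n_features, so this is only suitable for small illustrations.
    """
    n_features = x.shape[0]

    def value(coalition):
        data = background.copy()
        idx = list(coalition)
        if idx:
            data[:, idx] = x[idx]  # fix coalition features to the explained row
        return predict(data).mean()

    phi = np.zeros(n_features)
    for j in range(n_features):
        others = [k for k in range(n_features) if k != j]
        for size in range(n_features):
            weight = (
                factorial(size) * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                phi[j] += weight * (value(subset + (j,)) - value(subset))
    return phi


rng = np.random.default_rng(0)
weights = np.array([1.5, -2.0, 0.5])  # hypothetical toy model coefficients


def predict(data):
    return 1 / (1 + np.exp(-(data @ weights)))


background = rng.normal(size=(50, 3))
x = np.array([1.0, -0.5, 2.0])

phi = exact_shapley(predict, x, background)
base_value = predict(background).mean()
print("Contributions:", np.round(phi, 3))
print("Base value + contributions:", round(base_value + phi.sum(), 6))
print("Model output for x:", round(predict(x[None, :])[0], 6))
```

The last two printed values agree: Shapley contributions always sum to the difference between the model output for the explained row and the average output over the background data. That additivity is what makes it possible to read a local explanation as a breakdown of a single decision.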

## 4. Regulatory Compliance Report

The explainability engine supports regulatory compliance work by providing:
1. Global feature importance analysis
2. Local explanations for individual decisions
3. A documented, reproducible view of how input features drive model decisions

This information can be used to:
- Document model behavior for regulatory requirements
- Identify potential bias in the model
- Provide explanations to customers when requested
- Monitor model stability over time
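
For the stability-monitoring use case, one common approach is the Population Stability Index (PSI), which compares the distribution of a feature or score in a reference sample against a more recent sample. The sketch below is one possible implementation and is not part of the toolkit; the function name, the bin count, and the simulated samples are assumptions.

```python
import numpy as np


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a reference sample and a newer sample of one numeric variable."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # Interior bin edges from quantiles of the reference sample;
    # the first and last bins are open-ended
    interior = np.quantile(expected, np.linspace(0, 1, n_bins + 1)[1:-1])

    expected_counts = np.bincount(
        np.searchsorted(interior, expected, side="right"), minlength=n_bins
    )
    actual_counts = np.bincount(
        np.searchsorted(interior, actual, side="right"), minlength=n_bins
    )

    # Clip to avoid log(0) when a bin is empty in one of the samples
    expected_frac = np.clip(expected_counts / len(expected), eps, None)
    actual_frac = np.clip(actual_counts / len(actual), eps, None)

    return float(
        np.sum((actual_frac - expected_frac) * np.log(actual_frac / expected_frac))
    )


rng = np.random.default_rng(0)
reference = rng.normal(700, 50, 5000)   # e.g. credit scores at training time
same_dist = rng.normal(700, 50, 5000)   # new sample, no drift
shifted = rng.normal(750, 50, 5000)     # new sample, mean shifted by one std

psi_stable = population_stability_index(reference, same_dist)
psi_shifted = population_stability_index(reference, shifted)
print(f"PSI without drift: {psi_stable:.4f}")
print(f"PSI with shifted mean: {psi_shifted:.4f}")
```

A widely used rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, though suitable thresholds depend on the portfolio and should be agreed with the model risk function.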