# Classifier Model Interpreter Demo

This notebook demonstrates `Interpreter`, a small tool for interpreting classifier models with SHAP values, using an XGBoost model trained on a credit dataset.

## Core Visualizations
- **Model Performance**: Accuracy, AUC, confusion matrix
- **Global Feature Importance**: Which features matter most
- **Beeswarm Plot**: SHAP value distributions for each feature
- **Dependence Plots**: How feature values affect predictions
- **Prediction Surface**: 2D heatmaps and 3D surfaces of predicted probability over a pair of features

## Setup

In [None]:
import sys
from pathlib import Path

# Add the project root (the parent of the working directory) to sys.path
# so that the `src` package can be imported
parent_dir = Path.cwd().parent
if str(parent_dir) not in sys.path:
    sys.path.insert(0, str(parent_dir))

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from src.core import Interpreter
import warnings

# Silence all warnings to keep the demo output readable
warnings.filterwarnings('ignore')

print("Setup complete")

## Load Data and Train Model

In [None]:
# Load credit data
data_path = Path.cwd().parent / 'data' / 'test_credit_data.csv'
df = pd.read_csv(data_path)

print(f"Dataset: {df.shape[0]:,} rows, {df.shape[1]} columns")
print(f"Target rate: {df['net_booking'].mean():.1%}")
print(f"\nFeatures: {list(df.columns)}")
df.head()

In [None]:
# Prepare data
X = df.drop(['application_date', 'net_booking'], axis=1).copy()
y = df['net_booking'].values

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train model
model = XGBClassifier(
    n_estimators=200,
    max_depth=6,
    learning_rate=0.1,
    random_state=42,
    eval_metric='logloss'
)
model.fit(X_train, y_train)

print(f"Test accuracy: {model.score(X_test, y_test):.3f}")

## Initialize Interpreter

Pass the trained model together with the test features and test labels:

In [None]:
# Create the interpreter; SHAP values are computed automatically at this point
interp = Interpreter(model, X_test, y_test)

print("Interpreter ready!")
print(f"Features: {interp.feature_names}")
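
To make concrete what a SHAP value represents, the sketch below computes exact Shapley values for a single sample by enumerating feature coalitions. It is an illustration only, not how `Interpreter` or the SHAP library computes them: `shapley_values`, `toy_model`, and the single `baseline` row are hypothetical stand-ins, and real explainers use a background dataset and much faster algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are replaced by their baseline value
    (an assumption made here for simplicity).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


# Hypothetical scoring function standing in for a trained model
def toy_model(features):
    fico, dti = features
    return 0.001 * fico - 0.5 * dti + 0.0005 * fico * dti


x = [720.0, 0.30]         # sample to explain
baseline = [650.0, 0.40]  # reference point
phi = shapley_values(toy_model, x, baseline)
print(phi, sum(phi), toy_model(x) - toy_model(baseline))
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that makes SHAP values readable as per-feature impacts.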

---
## 1. Model Performance

Check model performance before interpreting the model:

In [None]:
# plot_performance() returns a dict of figures keyed by name
performance = interp.plot_performance()
performance['metrics_summary'].show()

In [None]:
performance['confusion_matrix'].show()

In [None]:
performance['roc_curve'].show()
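
For reference, the three headline metrics can be computed by hand. The sketch below is a plain-Python illustration on a made-up six-row example; the helper names and the 0.5 decision threshold are assumptions, and `Interpreter` may compute these differently (for example through scikit-learn).

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels in {0, 1}."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return tn, fp, fn, tp


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities, for illustration only
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(confusion_counts(y_true, y_pred))  # (3, 0, 1, 2)
print(accuracy(y_true, y_pred))
print(roc_auc(y_true, scores))
```

Accuracy and the confusion matrix depend on the chosen threshold, while AUC depends only on how the scores rank positives against negatives.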

---
## 2. Global Feature Importance

Which features matter most for predictions?

In [None]:
fig = interp.plot_global_importance()
fig.show()
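
A common way to rank features globally is the mean absolute SHAP value per feature. The sketch below assumes that definition; whether `plot_global_importance()` uses exactly this aggregation is not confirmed here, and the SHAP matrix is made up for illustration.

```python
def global_importance(shap_rows, feature_names):
    """Rank features by mean |SHAP value| across samples, largest first."""
    n_samples = len(shap_rows)
    means = [
        sum(abs(row[j]) for row in shap_rows) / n_samples
        for j in range(len(feature_names))
    ]
    return sorted(zip(feature_names, means), key=lambda pair: pair[1], reverse=True)


# Illustrative SHAP matrix: one row per sample, one column per feature
shap_rows = [
    [0.20, -0.05, 0.01],
    [-0.30, 0.10, 0.02],
    [0.10, -0.15, -0.03],
]
ranking = global_importance(shap_rows, ["FICO", "DTI", "LTV"])
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling, so a feature that pushes predictions strongly in both directions still ranks high.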

---
## 3. Beeswarm Plot

Distribution of SHAP values for each feature:
- Each dot = one sample
- X-axis = SHAP value (impact on prediction)
- Color = feature value (red=high, blue=low)

In [None]:
fig = interp.plot_beeswarm()
fig.show()
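
The encoding described above can be sketched as a mapping from each sample to an (x position, color position) pair for one feature. This is a simplified illustration with made-up values: `beeswarm_points` is a hypothetical helper, and min-max scaling for the color is an assumption (plotting libraries may clip to percentiles instead).

```python
def beeswarm_points(shap_column, feature_column):
    """Pair each SHAP value with a color position in [0, 1].

    0.0 corresponds to the lowest feature value (blue end),
    1.0 to the highest (red end).
    """
    lo, hi = min(feature_column), max(feature_column)
    span = hi - lo
    points = []
    for shap_value, feature_value in zip(shap_column, feature_column):
        # A constant feature gets the middle of the color scale
        color = 0.5 if span == 0 else (feature_value - lo) / span
        points.append((shap_value, color))
    return points


# Illustrative values for a single feature
fico = [600, 650, 700, 750, 800]
fico_shap = [-0.40, -0.15, 0.05, 0.20, 0.30]
points = beeswarm_points(fico_shap, fico)
print(points)
```

In this toy example the red points (high FICO) sit at positive SHAP values and the blue points at negative ones, which is the pattern to look for when reading the direction of a feature's effect.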

---
## 4. Dependence Plots

How does each feature value affect predictions?

In [None]:
# FICO Score
fig = interp.plot_dependence('FICO')
fig.show()

In [None]:
# DTI (Debt-to-Income)
fig = interp.plot_dependence('DTI')
fig.show()

In [None]:
# LTV (Loan-to-Value)
fig = interp.plot_dependence('LTV')
fig.show()

In [None]:
# Financing Amount
fig = interp.plot_dependence('Fin_amt')
fig.show()
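
A dependence plot is a scatter of feature value against SHAP value. One way to read the trend numerically is to bin the feature and average the SHAP values per bin, as in the sketch below. The helper `binned_dependence` and the data are hypothetical and not part of `Interpreter`.

```python
def binned_dependence(feature_values, shap_values, n_bins=4):
    """Average SHAP value per equal-width bin of the feature.

    Returns a list of (bin_center, mean_shap) for non-empty bins.
    """
    lo, hi = min(feature_values), max(feature_values)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # constant feature: everything falls into the first bin
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for value, shap_value in zip(feature_values, shap_values):
        # The maximum value is clamped into the last bin
        idx = min(int((value - lo) / width), n_bins - 1)
        sums[idx] += shap_value
        counts[idx] += 1
    return [
        (lo + (k + 0.5) * width, sums[k] / counts[k])
        for k in range(n_bins)
        if counts[k] > 0
    ]


# Illustrative FICO scores and SHAP values
fico = [600, 620, 660, 700, 740, 780, 800]
fico_shap = [-0.30, -0.20, -0.10, 0.05, 0.15, 0.25, 0.30]
trend = binned_dependence(fico, fico_shap, n_bins=2)
print(trend)
```

A rising sequence of bin means, as in this toy data, indicates that higher feature values push the prediction upward.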

---
## 5. Prediction Surface (2D Heatmap)

How do two features together affect predicted probability?

In [None]:
# FICO vs DTI
fig = interp.plot_prediction_surface('FICO', 'DTI', n_grid=30)
fig.show()

In [None]:
# FICO vs LTV
fig = interp.plot_prediction_surface('FICO', 'LTV', n_grid=30)
fig.show()
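
A prediction surface can be built by sweeping two features over a grid while holding the remaining features fixed, then scoring every grid point. The sketch below illustrates that idea under stated assumptions: `prediction_surface`, `toy_proba`, the feature ranges, and the fixed `base_row` are all hypothetical, and `Interpreter` may choose grid ranges and fixed values differently.

```python
import math


def linspace(lo, hi, n):
    """n evenly spaced values from lo to hi inclusive (n >= 2)."""
    return [lo + (hi - lo) * k / (n - 1) for k in range(n)]


def prediction_surface(predict_proba, base_row, feat_x, range_x, feat_y, range_y, n_grid=30):
    """Score an n_grid x n_grid grid over two features.

    All other features keep the values given in base_row.
    Returns (xs, ys, grid) with grid[i][j] = probability at (xs[j], ys[i]).
    """
    xs = linspace(range_x[0], range_x[1], n_grid)
    ys = linspace(range_y[0], range_y[1], n_grid)
    grid = []
    for y_value in ys:
        row = []
        for x_value in xs:
            sample = dict(base_row)
            sample[feat_x] = x_value
            sample[feat_y] = y_value
            row.append(predict_proba(sample))
        grid.append(row)
    return xs, ys, grid


# Hypothetical logistic scorer standing in for model.predict_proba
def toy_proba(row):
    z = 0.02 * (row["FICO"] - 680) - 4.0 * (row["DTI"] - 0.35)
    return 1.0 / (1.0 + math.exp(-z))


base_row = {"FICO": 680, "DTI": 0.35, "LTV": 0.80}
xs, ys, grid = prediction_surface(
    toy_proba, base_row, "FICO", (550, 850), "DTI", (0.05, 0.65), n_grid=30
)
print(len(grid), len(grid[0]), grid[0][-1], grid[-1][0])
```

The number of model evaluations grows with the square of `n_grid`, which is one reason to use a coarser grid for the interactive 3D view than for the heatmap.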

---
## 6. Prediction Surface (3D)

Interactive 3D view of the prediction surface:

In [None]:
# FICO vs DTI (3D)
fig = interp.plot_prediction_surface_3d('FICO', 'DTI', n_grid=25)
fig.show()

In [None]:
# FICO vs LTV (3D)
fig = interp.plot_prediction_surface_3d('FICO', 'LTV', n_grid=25)
fig.show()

---
## Summary

In [None]:
# Get interpretation summary
summary = interp.summary()

print(f"Model: {summary['model_type']}")
print(f"Samples: {summary['n_samples']:,}")
print(f"Features: {summary['n_features']}")
print("\nTop Features:")
for f in summary['top_features'][:5]:
    print(f"  {f['feature']}: {f['importance']:.4f}")
print("\nPerformance:")
print(f"  Accuracy: {summary['performance']['accuracy']:.3f}")
print(f"  AUC: {summary['performance']['auc']:.3f}")

---
## Quick Reference

```python
from src.core import Interpreter

# Initialize
interp = Interpreter(model, X_test, y_test)

# Core visualizations
interp.plot_performance()                      # Model metrics (do this first!)
interp.plot_global_importance()                # Feature importance
interp.plot_beeswarm()                         # SHAP distributions
interp.plot_dependence('feature')              # Single feature effect
interp.plot_prediction_surface('f1', 'f2')     # 2D heatmap
interp.plot_prediction_surface_3d('f1', 'f2')  # 3D surface
interp.summary()                               # Summary dict
```