# Credit Data Visualization Demo: Classifier Model Interpreter

This notebook demonstrates all visualization capabilities using **credit/lending data**:
- **Target Variable**: `net_booking` (loan booking/conversion)
- **Features**: FICO score, LTV, DTI, Financing Amount

## Package Features Demonstrated:

1. **Interactive Plotly visualizations** instead of static matplotlib
2. **Prediction surface visualizations** (contour & 3D)
3. **Threshold detection** - finds non-linear breakpoints automatically
4. **Segment discovery** - finds behavioral groups traditional analytics misses
5. **Local explanation plots** - waterfall charts for individual predictions
6. **Simple, integrated API** - one Interpreter class for everything

## Setup

In [None]:
import sys
from pathlib import Path

parent_dir = Path.cwd().parent
if str(parent_dir) not in sys.path:
    sys.path.insert(0, str(parent_dir))

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from src.core import Interpreter
import warnings
warnings.filterwarnings('ignore')

print("Setup complete")

In [None]:
# Load the credit data
data_path = Path.cwd().parent / 'data' / 'test_credit_data.csv'
df = pd.read_csv(data_path)

print(f"Dataset: {df.shape[0]:,} rows, {df.shape[1]} columns")
print(f"Booking rate: {df['net_booking'].mean():.1%}")
print(f"\nFeatures: {list(df.columns)}")
print("\nData sample:")
df.head()

In [None]:
# Prepare data - drop non-feature columns
X = df.drop(['application_date', 'net_booking'], axis=1).copy()
y = df['net_booking'].values

print("Features used for modeling:")
print(f"  - FICO: Credit score ({X['FICO'].min():.0f} - {X['FICO'].max():.0f})")
print(f"  - LTV: Loan-to-value ratio ({X['LTV'].min():.1f}% - {X['LTV'].max():.1f}%)")
print(f"  - DTI: Debt-to-income ratio ({X['DTI'].min():.1f}% - {X['DTI'].max():.1f}%)")
print(f"  - Fin_amt: Financing amount (${X['Fin_amt'].min():,.0f} - ${X['Fin_amt'].max():,.0f})")

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"\nTrain size: {len(X_train):,}")
print(f"Test size: {len(X_test):,}")

In [None]:
# Train model
model = XGBClassifier(
    n_estimators=200,
    max_depth=6,
    learning_rate=0.1,
    random_state=42,
    eval_metric='logloss'
)
model.fit(X_train, y_train)

print(f"Test accuracy: {model.score(X_test, y_test):.3f}")

In [None]:
# Initialize Interpreter
interp = Interpreter(model, X_test, y_test, config='detailed_analysis')

print("Interpreter ready")

---
## 1. Global Feature Importance

**What it shows:** Which features matter most for predicting loan bookings
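
SHAP-based tools conventionally define global importance as the mean absolute SHAP value of each feature across all rows. This notebook does not confirm that `plot_global_importance` uses this convention. Assuming it does, the ranking can be reproduced with a few lines of NumPy. `mean_abs_shap_importance` is a hypothetical helper, and the SHAP matrix below is toy data rather than real model output.

```python
import numpy as np
import pandas as pd


def mean_abs_shap_importance(shap_values, feature_names):
    """Rank features by mean |SHAP| across rows (larger = more influential)."""
    importance = np.abs(shap_values).mean(axis=0)
    return pd.Series(importance, index=feature_names).sort_values(ascending=False)


# Toy SHAP matrix: 3 applications x 4 features (illustrative numbers only)
toy_shap = np.array([
    [0.80, -0.10, 0.05, 0.00],
    [-0.60, 0.20, -0.05, 0.10],
    [0.40, -0.30, 0.10, -0.05],
])
ranking = mean_abs_shap_importance(toy_shap, ["FICO", "DTI", "LTV", "Fin_amt"])
print(ranking)
```

Taking the absolute value first means that a feature that pushes some applications up and others down still counts as important.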

In [None]:
fig = interp.plot_global_importance(top_n=10)
fig.show()

---
## 2. Beeswarm Plot

**What it shows:** Distribution of SHAP values for each feature

- Each dot = one application
- Color = feature value (red=high, blue=low)
- X position = SHAP value, i.e. the feature's push on the model output (for XGBoost classifiers this is usually log-odds, not probability)

In [None]:
fig = interp.plot_beeswarm(top_n=4)
fig.show()

---
## 3. Feature Dependence Plots

**What it shows:** How each feature value affects booking probability

Key questions answered:
- At what FICO score does booking probability increase?
- How does LTV affect booking?
- Is there a DTI threshold?
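
A dependence plot scatters each row's SHAP value against that row's feature value. You can summarize the same information as a table by taking the mean SHAP value within quantile bins of the feature, which makes the shape of the curve (rising, falling, flat, stepped) easy to read. The sketch below runs on synthetic data, and `binned_dependence` is a hypothetical helper that is not part of the package.

```python
import numpy as np
import pandas as pd


def binned_dependence(feature, shap_col, n_bins=5):
    """Mean SHAP value per quantile bin of a feature: a numeric view of a dependence plot."""
    frame = pd.DataFrame({"value": feature, "shap": shap_col})
    frame["bin"] = pd.qcut(frame["value"], q=n_bins, duplicates="drop")
    return frame.groupby("bin", observed=True)["shap"].agg(["mean", "count"])


# Synthetic example: FICO's SHAP contribution rises with FICO (hypothetical data)
rng = np.random.default_rng(0)
fico = rng.uniform(580, 820, size=1000)
fico_shap = (fico - 700) / 100 + rng.normal(0, 0.05, size=1000)
table = binned_dependence(fico, fico_shap)
print(table)
```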

In [None]:
# FICO Score dependence
fig = interp.plot_dependence('FICO')
fig.show()

print("\nExpected pattern (check against the plot): higher FICO scores increase booking probability")

In [None]:
# LTV dependence
fig = interp.plot_dependence('LTV')
fig.show()

print("\nQuestion: How does loan-to-value ratio affect booking?")

In [None]:
# DTI dependence
fig = interp.plot_dependence('DTI')
fig.show()

print("\nExpected pattern (check against the plot): higher DTI typically reduces booking probability")

In [None]:
# Financing amount dependence
fig = interp.plot_dependence('Fin_amt')
fig.show()

print("\nQuestion: How does loan size affect booking probability?")

---
# Threshold Detection

## 4. Automatic Threshold Detection

**What it does:** Finds feature values where the feature's effect on predictions CHANGES sharply (breakpoints in its SHAP dependence curve)

**Why it matters for credit:** 
- Find the FICO score where booking rates jump
- Identify DTI cutoffs where risk increases
- Discover LTV thresholds

**NATIVE SHAP DOESN'T HAVE THIS!**
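
This notebook does not say which algorithm `detect_thresholds` uses; the package reports it in `threshold_result['method']`. One simple way to find a single breakpoint in a SHAP dependence curve is a one-split regression stump. You sort the rows by feature value and then choose the cut that minimizes the squared error when each side is approximated by its own mean SHAP value. The sketch below implements that technique on synthetic data. It is a swapped-in method, not necessarily the package's. The returned keys mirror fields printed in the cells that follow, and `best_single_threshold` is hypothetical.

```python
import numpy as np


def best_single_threshold(feature, shap_col, min_leaf=20):
    """Find the cut in `feature` that best splits SHAP values into two constant levels.

    Minimizes within-group sum of squared errors (a one-split regression stump)
    and reports the mean SHAP value on each side of the cut.
    """
    order = np.argsort(feature)
    x, s = feature[order], shap_col[order]
    n = len(s)
    csum, csum_sq = np.cumsum(s), np.cumsum(s ** 2)
    best = None
    for i in range(min_leaf, n - min_leaf):
        if x[i] == x[i - 1]:
            continue  # cannot cut between identical feature values
        left_n, right_n = i, n - i
        left_sum, right_sum = csum[i - 1], csum[-1] - csum[i - 1]
        left_sq, right_sq = csum_sq[i - 1], csum_sq[-1] - csum_sq[i - 1]
        # SSE of a group = sum(s^2) - (sum s)^2 / count
        sse = (left_sq - left_sum ** 2 / left_n) + (right_sq - right_sum ** 2 / right_n)
        if best is None or sse < best[0]:
            best = (sse, i, left_sum / left_n, right_sum / right_n)
    if best is None:
        raise ValueError("no valid split with the given min_leaf")
    _, i, before, after = best
    return {
        "value": (x[i - 1] + x[i]) / 2,  # midpoint between neighbouring feature values
        "shap_before": before,
        "shap_after": after,
        "effect_change": after - before,
    }


# Synthetic SHAP curve with a step at FICO = 680 (hypothetical data)
rng = np.random.default_rng(1)
fico = rng.uniform(580, 820, size=2000)
fico_shap = np.where(fico >= 680, 0.5, -0.3) + rng.normal(0, 0.1, size=2000)
result = best_single_threshold(fico, fico_shap)
print(result)
```

To find several thresholds, you could apply the same split recursively on each side, stopping when the reduction in error becomes small.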

In [None]:
# Detect thresholds for FICO
threshold_result = interp.detect_thresholds('FICO')

print("="*70)
print("THRESHOLD DETECTION: FICO Score")
print("="*70)
print(f"\nMethod: {threshold_result['method']}")
print(f"Samples analyzed: {threshold_result['n_samples']:,}")

print("\nDetected Thresholds:")
print("-"*70)
for t in threshold_result['thresholds']:
    print(f"\n  Threshold at FICO: {t['value']:.0f}")
    print(f"    SHAP before: {t['shap_before']:.4f}")
    print(f"    SHAP after:  {t['shap_after']:.4f}")
    print(f"    Effect change: {t['effect_change']:+.4f}")
    print(f"    Confidence: {t['confidence']:.1%}")
    print(f"    -> {t['interpretation']}")

In [None]:
# Detect thresholds for DTI
threshold_result = interp.detect_thresholds('DTI')

print("="*70)
print("THRESHOLD DETECTION: DTI (Debt-to-Income)")
print("="*70)

print("\nDetected Thresholds:")
for t in threshold_result['thresholds']:
    print(f"\n  Threshold at DTI: {t['value']:.1f}%")
    print(f"    Effect change: {t['effect_change']:+.4f}")
    print(f"    -> {t['interpretation']}")

In [None]:
# Detect thresholds for LTV
threshold_result = interp.detect_thresholds('LTV')

print("="*70)
print("THRESHOLD DETECTION: LTV (Loan-to-Value)")
print("="*70)

print("\nDetected Thresholds:")
for t in threshold_result['thresholds']:
    print(f"\n  Threshold at LTV: {t['value']:.1f}%")
    print(f"    Effect change: {t['effect_change']:+.4f}")
    print(f"    -> {t['interpretation']}")

In [None]:
# Detect thresholds across ALL features
all_thresholds = interp.detect_all_thresholds(top_n_features=4)

print("ALL DETECTED THRESHOLDS")
print("="*70)
all_thresholds[['feature', 'value', 'effect_change', 'interpretation']]

---
# Segment Discovery

## 5. Auto Segment Discovery

**What it does:** Finds groups of applicants for whom the MODEL REASONS DIFFERENTLY

**Why it's different from traditional segmentation:**
- Traditional: Groups by raw feature values (FICO bands, loan size tiers)
- This: Groups by SHAP patterns (how the model explains each prediction)

Can reveal behavioral segments such as:
- "FICO Driven" - predictions mainly driven by credit score
- "DTI Sensitive" - predictions driven by debt ratios
- "High LTV Risk" - predictions driven by loan-to-value

**NATIVE SHAP DOESN'T HAVE THIS!**
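
The text above says that segments are formed from SHAP patterns, but it does not name the clustering algorithm behind `discover_segments`. Assuming the package clusters applicants on their SHAP vectors, you can sketch the core idea with scikit-learn's k-means. K-means is an assumption here. `segment_by_shap` is a hypothetical helper, and the SHAP matrix is synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans


def segment_by_shap(shap_values, n_segments=4, seed=42):
    """Cluster rows on their SHAP vectors so each segment shares a similar explanation pattern."""
    km = KMeans(n_clusters=n_segments, n_init=10, random_state=seed)
    return km.fit_predict(shap_values)


# Synthetic SHAP matrix with two reasoning patterns (hypothetical data):
# rows 0-99 are driven by FICO (column 0), rows 100-199 by DTI (column 1)
rng = np.random.default_rng(2)
fico_driven = np.column_stack([rng.normal(1.0, 0.1, 100), rng.normal(0.0, 0.1, 100)])
dti_driven = np.column_stack([rng.normal(0.0, 0.1, 100), rng.normal(-1.0, 0.1, 100)])
labels = segment_by_shap(np.vstack([fico_driven, dti_driven]), n_segments=2)
print(np.bincount(labels))
```

Because the clustering runs on SHAP values rather than raw features, two applicants with similar FICO scores can land in different segments if the model relies on different features to score them.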

In [None]:
# Discover behavioral segments
segments = interp.discover_segments(n_segments=4, top_n_features=4)

# Print summary
print(interp.get_segment_summary(segments))

In [None]:
# Visualize segment profiles
fig = interp.plot_segment_profiles(segments, top_n_features=4)
fig.show()

In [None]:
# Compare feature importance across segments
fig = interp.plot_segment_comparison(segments, top_n=4)
fig.show()

---
# Local Explanations

## 6. Waterfall & Force Plots

**What it shows:** How a SINGLE prediction was made

- Base value = average model output
- Each feature pushes prediction up or down
- Final prediction = base value + sum of all feature contributions

Essential for:
- Explaining individual loan decisions
- Understanding edge cases
- Regulatory compliance (explainability)
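
For an XGBoost binary classifier, SHAP contributions are usually expressed in log-odds (the raw margin). On that scale the contributions add up exactly, and the logistic function converts the total back into a probability. This notebook does not state whether `plot_waterfall` displays log-odds or probability, and the numbers below are made up for illustration.

```python
import math


def sigmoid(z):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical explanation for one application, in log-odds units
base_value = -1.0  # average model output (log-odds)
contributions = {"FICO": 0.9, "DTI": -0.4, "LTV": 0.3, "Fin_amt": 0.2}

log_odds = base_value + sum(contributions.values())  # -1.0 + 1.0
probability = sigmoid(log_odds)
print(f"log-odds = {log_odds:.2f}, probability = {probability:.2f}")
```

Additivity holds in log-odds but not in probability. A +0.5 log-odds push moves the probability more near 50% than near 5% or 95%.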

In [None]:
# Find a high-probability booking and a low-probability one
# IMPORTANT: Use X_shap (the sampled data used for SHAP) for local explanations
X_shap = interp.X_shap
y_proba_shap = model.predict_proba(X_shap)[:, 1]

high_prob_idx = np.argmax(y_proba_shap)  # Highest booking probability in sampled data
low_prob_idx = np.argmin(y_proba_shap)   # Lowest booking probability in sampled data
mid_idx = np.argsort(y_proba_shap)[len(y_proba_shap)//2]  # Middle

print(f"High probability application: idx={high_prob_idx}, prob={y_proba_shap[high_prob_idx]:.3f}")
print(f"Low probability application: idx={low_prob_idx}, prob={y_proba_shap[low_prob_idx]:.3f}")
print(f"Mid probability application: idx={mid_idx}, prob={y_proba_shap[mid_idx]:.3f}")

# Show feature values for high prob case
print("\nHigh probability case features:")
print(X_shap.iloc[high_prob_idx])

In [None]:
# Waterfall plot for HIGH probability prediction
fig = interp.plot_waterfall(high_prob_idx, top_n=4)
fig.show()

print("\nThis application has HIGH booking probability because:")
print(interp.explain_observation_text(high_prob_idx, top_n=4))

In [None]:
# Waterfall plot for LOW probability prediction
fig = interp.plot_waterfall(low_prob_idx, top_n=4)
fig.show()

print("\nThis application has LOW booking probability because:")
print(interp.explain_observation_text(low_prob_idx, top_n=4))

In [None]:
# Force plot (alternative horizontal view)
fig = interp.plot_force(high_prob_idx, top_n=4)
fig.show()

In [None]:
# Compare multiple applications side by side
fig = interp.plot_multiple_observations([high_prob_idx, mid_idx, low_prob_idx], top_n=4)
fig.show()

---
## 7. Interaction Detection

**What it shows:** Which feature pairs interact most

For credit data:
- Do FICO and DTI interact? (credit score matters differently at different debt levels)
- Do LTV and Fin_amt interact? (loan size matters differently at different LTV levels)
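
This notebook does not document what the `method='shap_variance'` option computes. One illustrative heuristic works as follows: bin feature i so that it is held roughly fixed within each bin, then check whether feature j still explains the spread in feature i's SHAP values. If it does, the model's use of feature i depends on feature j. `interaction_score` is hypothetical, the data are synthetic with features rescaled to [1, 2], and the method is not claimed to match the package's.

```python
import numpy as np


def interaction_score(x_i, shap_i, x_j, n_bins=10):
    """Size-weighted mean |corr(SHAP_i, x_j)| within quantile bins of x_i.

    Near 0: once x_i is fixed, x_j says nothing about SHAP_i (no interaction).
    Near 1: SHAP_i moves almost entirely with x_j (strong interaction).
    """
    edges = np.quantile(x_i, np.linspace(0, 1, n_bins + 1))
    bin_ids = np.clip(np.searchsorted(edges, x_i, side="right") - 1, 0, n_bins - 1)
    scores, weights = [], []
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.sum() < 3 or np.std(x_j[mask]) == 0 or np.std(shap_i[mask]) == 0:
            continue  # correlation undefined for tiny or constant bins
        scores.append(abs(np.corrcoef(shap_i[mask], x_j[mask])[0, 1]))
        weights.append(mask.sum())
    return float(np.average(scores, weights=weights)) if scores else 0.0


# Synthetic example: FICO's effect scales with DTI; LTV plays no part (hypothetical data)
rng = np.random.default_rng(3)
n = 2000
fico, dti, ltv = rng.uniform(1, 2, n), rng.uniform(1, 2, n), rng.uniform(1, 2, n)
fico_shap = fico * dti
score_dti = interaction_score(fico, fico_shap, dti)
score_ltv = interaction_score(fico, fico_shap, ltv)
print(f"FICO x DTI: {score_dti:.2f}, FICO x LTV: {score_ltv:.2f}")
```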

In [None]:
interactions = interp.detect_interactions(top_n=10, method='shap_variance')
print("Top Feature Interactions:")
interactions

---
## 8. Prediction Surface Visualizations

**What it shows:** Predicted booking probability across two features

**Unlike SHAP plots**, these show actual predictions (Y, the predicted booking probability), not SHAP values

Business questions:
- How do FICO and DTI together affect booking?
- What combinations lead to the highest booking probability?
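
This notebook does not state how `plot_interaction_contour` sets the features that are not on the axes. This sketch assumes a common choice: vary the two chosen features over a grid and hold every other feature at its median. If you average predictions over the data instead, you get a two-way partial dependence surface, which scikit-learn's `sklearn.inspection.partial_dependence` computes. `prediction_grid` is hypothetical. The logistic-regression pipeline and the data are synthetic stand-ins for the XGBoost model and the credit data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def prediction_grid(model, X, f1, f2, n_grid=30):
    """Predicted probability over an f1-by-f2 grid, other features held at their medians."""
    g1 = np.linspace(X[f1].min(), X[f1].max(), n_grid)
    g2 = np.linspace(X[f2].min(), X[f2].max(), n_grid)
    mesh1, mesh2 = np.meshgrid(g1, g2)  # rows follow f2, columns follow f1
    grid = pd.DataFrame(np.tile(X.median().to_numpy(), (mesh1.size, 1)), columns=X.columns)
    grid[f1] = mesh1.ravel()
    grid[f2] = mesh2.ravel()
    proba = model.predict_proba(grid)[:, 1]
    return g1, g2, proba.reshape(mesh1.shape)


# Synthetic stand-in data: booking odds rise with FICO and fall with DTI (hypothetical)
rng = np.random.default_rng(4)
n = 2000
X_demo = pd.DataFrame({
    "FICO": rng.uniform(580, 820, n),
    "DTI": rng.uniform(10, 50, n),
    "LTV": rng.uniform(60, 120, n),
})
logit = 0.02 * (X_demo["FICO"] - 700) - 0.05 * (X_demo["DTI"] - 30)
y_demo = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
demo_model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_demo, y_demo)

fico_axis, dti_axis, z = prediction_grid(demo_model, X_demo, "FICO", "DTI")
print(z.shape, round(float(z.min()), 3), round(float(z.max()), 3))
```

The matrix `z` can be passed directly to a contour or surface plot, with `fico_axis` on x and `dti_axis` on y. Holding the other features at their medians is a simple choice, but it can show unrealistic combinations when features are strongly correlated.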

In [None]:
# Heatmap: FICO × DTI
fig = interp.plot_interaction_contour('FICO', 'DTI', n_grid=30)
fig.show()

print("\nBusiness insight: How does the combination of credit score and debt ratio affect booking?")

In [None]:
# Heatmap: FICO × LTV
fig = interp.plot_interaction_contour('FICO', 'LTV', n_grid=30)
fig.show()

print("\nBusiness insight: How do credit score and loan-to-value interact?")

In [None]:
# Heatmap: LTV × Fin_amt
fig = interp.plot_interaction_contour('LTV', 'Fin_amt', n_grid=30)
fig.show()

print("\nBusiness insight: How do loan-to-value and financing amount interact?")

In [None]:
# 3D surface plot: FICO × DTI
fig = interp.plot_interaction_surface_3d('FICO', 'DTI', n_grid=25)
fig.show()

In [None]:
# 3D surface plot: FICO × LTV
fig = interp.plot_interaction_surface_3d('FICO', 'LTV', n_grid=25)
fig.show()

---
## 9. Model Performance

**Includes:** Confusion matrix, ROC curve, metrics summary
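
This notebook does not list which metrics `plot_performance` reports. If bookings are rare, accuracy at the default 0.5 cut-off (the cut-off used by `model.score` above) can look high even when few bookings are caught. It is therefore worth checking the confusion matrix together with a threshold-free metric such as ROC AUC. Here is a minimal scikit-learn sketch on toy values:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, roc_auc_score

y_true = np.array([0, 0, 1, 1])
y_proba = np.array([0.1, 0.4, 0.35, 0.8])  # predicted booking probabilities (toy values)
y_pred = (y_proba >= 0.5).astype(int)       # default 0.5 cut-off

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_proba)  # ranking quality, independent of the cut-off
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"precision={precision:.2f} recall={recall:.2f} AUC={auc:.2f}")
```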

In [None]:
performance = interp.plot_performance()

performance['metrics_summary'].show()

In [None]:
performance['confusion_matrix'].show()

In [None]:
performance['roc_curve'].show()

---
# Summary: Credit Model Interpretation

## Key Insights from This Analysis

### Feature Importance
- Identified which credit factors (FICO, DTI, LTV, Fin_amt) drive booking decisions
- Quantified the relative importance of each factor

### Threshold Discovery
- Found specific FICO score breakpoints where booking probability changes
- Identified DTI and LTV cutoffs that matter

### Segment Discovery
- Found groups of applicants where the model reasons differently
- Revealed hidden patterns in how credit factors combine

### Local Explanations
- Can explain individual loan decisions
- Useful for regulatory compliance and customer communication

### Interaction Effects
- Visualized how feature combinations affect booking probability
- Identified non-linear relationships

## Why This Package Beats Native SHAP

- **Interactive Plotly** vs static matplotlib
- **Business-focused** - shows predictions, not just SHAP
- **Discovers insights** - thresholds, segments, interactions
- **Simple API** - one Interpreter class
- **Great for presentations** - interactive, professional visuals