# German Credit: XGBoost SHAP Computation

This notebook trains an XGBoost model on the **German Credit** dataset and computes SHAP values and SHAP interaction values for model explainability. The results are saved to disk for downstream visualization.

In [1]:
import os
import shap
import xgboost
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_openml



## Load the dataset

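Fetch the German Credit dataset from OpenML. The target variable is binarized as `1` (good credit) vs `0` (bad credit). Categorical features are one-hot encoded, with the first level of each feature dropped (`drop_first=True`). The feature matrix and the target are persisted as pickles for reuse in visualization notebooks. They are saved before the column names are cleaned, so the pickled files keep the original feature names.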

In [2]:
credit = fetch_openml(name="credit-g", version=1, as_frame=True)
X = credit.data
y = (credit.target == "good").astype(int)
X = pd.get_dummies(X, drop_first=True).astype(float)

# Save the feature matrix and target, keeping the original feature names
os.makedirs("../../data/credit/xgboost", exist_ok=True)
X.to_pickle("../../data/credit/x_values.pkl")
y.to_pickle("../../data/credit/y_values.pkl")

# Clean feature names for XGBoost/SHAP (remove problematic characters)
X.columns = X.columns.str.replace(r'[<>\[\]]', '', regex=True)
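
The encoding and name-cleaning steps above can be illustrated on a toy frame. This is a minimal sketch: the `toy` DataFrame and its category labels are made up for illustration and are not read from OpenML. It also checks that the cleaned names stay unique, since stripping characters can in principle make two columns collide.

```python
import pandas as pd

# Hypothetical toy data: one numeric and one categorical feature
toy = pd.DataFrame({
    "duration": [6, 48, 12],
    "checking_status": ["<0", "0<=X<200", ">=200"],
})

# One-hot encode; drop_first removes the first (sorted) level of each categorical
encoded = pd.get_dummies(toy, drop_first=True).astype(float)

# Strip the characters that XGBoost rejects in feature names
cleaned = encoded.columns.str.replace(r"[<>\[\]]", "", regex=True)

print(list(encoded.columns))
print(list(cleaned))
print("unique after cleaning:", cleaned.is_unique)
```

If `is_unique` were `False` on the real data, the columns would need a different renaming scheme (for example, replacing `<` with `lt` rather than deleting it).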

## Train/test split

Split the data into 80% training and 20% test sets and wrap them in `xgboost.DMatrix` objects, the data container used by XGBoost's native training API. A `DMatrix` over the full dataset (`xgb_full`) is built as well.

In [3]:
xgb_full = xgboost.DMatrix(X, label=y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)
xgb_train = xgboost.DMatrix(X_train, label=y_train)
xgb_test = xgboost.DMatrix(X_test, label=y_test)
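
The split above is not stratified. A minimal sketch of a stratified variant follows, using a synthetic stand-in with 1,000 rows and a 70/30 class balance (both assumed here for illustration). Passing `stratify` is an optional addition, not something the notebook does.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature matrix and binarized target (assumed sizes)
X_demo = np.arange(1000 * 3, dtype=float).reshape(1000, 3)
y_demo = np.array([1] * 700 + [0] * 300)

# stratify keeps the class ratio the same in both partitions
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=7, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
print("positive rate (train, test):", y_tr.mean(), y_te.mean())
```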

## Train the XGBoost survival model

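Train an XGBoost model with the Cox proportional-hazards objective (`survival:cox`) on the **full** dataset for 5,000 boosting rounds with a low learning rate (0.002), a maximum tree depth of 3, and 50% row subsampling. The evaluation set labeled `test` in the log is the same full dataset, so the reported `cox-nloglik` is a training metric rather than a held-out one. Note that `survival:cox` interprets the label as a survival time (negative values mark right-censored rows), so the binarized target is treated here as a time of 0 or 1 rather than as a class.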

In [4]:
params = {"eta": 0.002, "max_depth": 3, "objective": "survival:cox", "subsample": 0.5}
model = xgboost.train(params, xgb_full, 5000, evals=[(xgb_full, "test")], verbose_eval=1000)

[0]	test-cox-nloglik:6.55108
[1000]	test-cox-nloglik:6.56052
[2000]	test-cox-nloglik:6.57230
[3000]	test-cox-nloglik:6.58401
[4000]	test-cox-nloglik:6.59267
[4999]	test-cox-nloglik:6.60038


## Compute SHAP values

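Use `shap.TreeExplainer` to compute SHAP values for the first 500 samples. The booster has a single output, so `shap_values` returns one array with a row per sample and a column per feature, and it is saved as is with no per-class slicing. The values are expressed on the model's raw margin scale (for `survival:cox`, the log hazard ratio).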

In [5]:
num_samples = 500
X_shapley = X.iloc[:num_samples, :]
explainer = shap.TreeExplainer(model)

In [10]:
shap_values = explainer.shap_values(X_shapley)
np.save("../../data/credit/xgboost/shap_values.npy", shap_values)

## Compute SHAP interaction values

Compute pairwise SHAP interaction values for the same 500 samples. These capture feature-pair synergies and redundancies and are saved for network-based visualization.

In [11]:
shap_interaction_values = explainer.shap_interaction_values(X_shapley)
np.save("../../data/credit/xgboost/shap_interaction_values.npy", shap_interaction_values)