# NHANES I: XGBoost SHAP Computation

This notebook trains an XGBoost survival model on the **NHANES I** dataset and computes SHAP values and SHAP interaction values for model explainability. The results are saved to disk for downstream visualization.

In [1]:
import shap
import xgboost
import numpy as np
from sklearn.model_selection import train_test_split

## Load the dataset

Load the NHANES I survival dataset from the `shap` library and persist the feature matrix as a pickle for reuse in visualization notebooks.

In [None]:
X, y = shap.datasets.nhanesi()
X.to_pickle("../../nhanesi/data/x_values.pkl")
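A note on the labels, as a minimal sketch: XGBoost's `survival:cox` objective treats negative labels as right-censored observations, and the `y` returned by `shap.datasets.nhanesi()` is assumed here to follow that signed convention (positive = observed time of death, negative = censored follow-up time). The helper name `split_signed_labels` and the toy values are illustrative and not part of the notebook.

```python
import numpy as np


def split_signed_labels(y):
    """Split signed survival labels into (follow-up time, event observed)."""
    y = np.asarray(y, dtype=float)
    time = np.abs(y)   # follow-up duration, whatever the outcome
    event = y > 0      # True where the event was observed, False where censored
    return time, event


# Hypothetical toy labels: two observed events and two censored follow-ups.
toy_y = np.array([5.0, -12.0, 3.5, -8.0])
toy_time, toy_event = split_signed_labels(toy_y)
print(toy_time, toy_event)
```

Applied to the real labels, `split_signed_labels(y)` would give a quick way to check the censoring rate before training, for example with `event.mean()`.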

## Train/test split

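Split the data into 80% training and 20% test sets (seeded with `random_state=7` for reproducibility) and wrap each split, along with the full dataset, in the `xgboost.DMatrix` objects that the native XGBoost training API expects.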

In [9]:
xgb_full = xgboost.DMatrix(X, label=y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)
xgb_train = xgboost.DMatrix(X_train, label=y_train)
xgb_test = xgboost.DMatrix(X_test, label=y_test)

## Train the XGBoost survival model

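Train an XGBoost model with the Cox proportional-hazards objective (`survival:cox`) on the **full** dataset for 5,000 boosting rounds, with a low learning rate (`eta` = 0.002), a maximum tree depth of 3, and 50% row subsampling. The evaluation set labeled `"test"` is the same full dataset, so the logged `cox-nloglik` is a training loss rather than a held-out score; the train/test split above is not used in this fit.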

In [11]:
params = {"eta": 0.002, "max_depth": 3, "objective": "survival:cox", "subsample": 0.5}
model = xgboost.train(params, xgb_full, 5000, evals=[(xgb_full, "test")], verbose_eval=1000)

[0]	test-cox-nloglik:8.28400
[1000]	test-cox-nloglik:8.60868
[2000]	test-cox-nloglik:8.53110
[3000]	test-cox-nloglik:8.49458
[4000]	test-cox-nloglik:8.47055
[4999]	test-cox-nloglik:8.45201
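To give a sense of what the logged `cox-nloglik` metric measures, here is a simplified NumPy sketch of the negative Cox partial log-likelihood. It is not XGBoost's implementation: it returns an unnormalized sum, applies no tie correction, and assumes signed labels (negative = censored). The function name `cox_neg_partial_loglik` is hypothetical.

```python
import numpy as np


def cox_neg_partial_loglik(risk_score, signed_time):
    """Negative Cox partial log-likelihood (simplified, no tie correction).

    risk_score  : model margin (log hazard ratio) for each sample
    signed_time : positive = event time, negative = censored at abs(time)
    """
    risk_score = np.asarray(risk_score, dtype=float)
    signed_time = np.asarray(signed_time, dtype=float)
    time = np.abs(signed_time)
    event = signed_time > 0

    total = 0.0
    for i in np.flatnonzero(event):
        # Risk set: everyone still under observation at the i-th event time.
        at_risk = time >= time[i]
        total += risk_score[i] - np.log(np.sum(np.exp(risk_score[at_risk])))
    return -total


# Toy check: scoring earlier events as higher risk lowers the loss.
toy_times = np.array([1.0, 2.0, 3.0])  # all events observed
loss_good = cox_neg_partial_loglik([2.0, 0.0, -2.0], toy_times)
loss_flat = cox_neg_partial_loglik([0.0, 0.0, 0.0], toy_times)
loss_bad = cox_neg_partial_loglik([-2.0, 0.0, 2.0], toy_times)
print(loss_good, loss_flat, loss_bad)
```

Only the ordering of risk scores relative to event times matters here, which is why the loss falls as the model learns to rank shorter-lived patients as higher risk.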


## Compute SHAP values

Use `shap.TreeExplainer` to compute SHAP values for the first 500 patients and save them to disk. The result is an array with one row per patient and one column per feature.

In [20]:
num_patients = 500
X_shapley = X.iloc[:num_patients, :]
explainer = shap.TreeExplainer(model)

In [None]:
shap_values = explainer.shap_values(X_shapley)
np.save("../../data/nhanesi/xgboost/shap_values.npy", shap_values)
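A downstream notebook might summarize the saved array as a global feature ranking. The sketch below ranks features by mean absolute SHAP value; the helper name `rank_features_by_mean_abs_shap` and the toy data are illustrative and not part of this notebook.

```python
import numpy as np


def rank_features_by_mean_abs_shap(shap_values, feature_names):
    """Return (feature, mean |SHAP|) pairs sorted from most to least important."""
    shap_values = np.asarray(shap_values, dtype=float)
    importance = np.abs(shap_values).mean(axis=0)   # one score per feature
    order = np.argsort(importance)[::-1]            # descending importance
    return [(feature_names[i], float(importance[i])) for i in order]


# Hypothetical toy input: 2 samples x 3 features.
toy_shap = np.array([[0.1, -2.0, 0.5],
                     [-0.3, 1.0, 0.5]])
toy_names = ["a", "b", "c"]
ranking = rank_features_by_mean_abs_shap(toy_shap, toy_names)
print(ranking)
```

With the real outputs, the same call would take the array loaded by `np.load` from the path used above together with `list(X.columns)`.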

## Compute SHAP interaction values

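Compute pairwise SHAP interaction values for the same 500 patients. The result holds one feature-by-feature matrix per patient: diagonal entries are main effects, and off-diagonal entries are pairwise interaction effects that capture feature-pair synergies and redundancies. The values are saved for network-based visualization.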

In [None]:
shap_interaction_values = explainer.shap_interaction_values(X_shapley)
np.save("../../data/nhanesi/xgboost/shap_interaction_values.npy", shap_interaction_values)
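As a sanity check on the saved tensor, the sketch below separates main effects from per-feature totals. It assumes the layout `(n_samples, n_features, n_features)` that `TreeExplainer` returns for a single-output model; summing each patient's matrix over one feature axis should then recover that patient's SHAP values up to numerical tolerance. The helper name `split_interactions` and the toy tensor are hypothetical.

```python
import numpy as np


def split_interactions(interaction_values):
    """Return (main effects, per-feature totals) from an (n, p, p) tensor."""
    inter = np.asarray(interaction_values, dtype=float)
    n_samples, p, p2 = inter.shape
    if p != p2:
        raise ValueError("expected a square feature-by-feature matrix per sample")
    main = np.diagonal(inter, axis1=1, axis2=2)   # shape (n, p): diagonal entries
    totals = inter.sum(axis=2)                    # shape (n, p): row sums
    return main, totals


# Hypothetical toy tensor: 1 sample, 2 features, symmetric interactions.
toy_inter = np.array([[[1.0, 0.25],
                       [0.25, -0.5]]])
toy_main, toy_totals = split_interactions(toy_inter)
print(toy_main, toy_totals)
```

With the arrays computed above, `np.allclose(split_interactions(shap_interaction_values)[1], shap_values, atol=1e-4)` would be one way to confirm that the two outputs agree; the tolerance is an arbitrary choice.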