# Jane Street: GPU accelerated SHAP values
Version 1.3 of [XGBoost](https://xgboost.ai/) was released less than a month ago, bringing GPU acceleration to the calculation of SHAP values and reducing computation time by up to 20x for SHAP values and 340x for SHAP interaction values.
For more details see the blog post ["*GPU-Accelerated SHAP values with XGBoost 1.3 and RAPIDS*"](https://medium.com/rapids-ai/gpu-accelerated-shap-values-with-xgboost-1-3-and-rapids-587fad6822) by Rory Mitchell. This was made possible with [GPUTreeSHAP](https://github.com/rapidsai/gputreeshap) by the [RAPIDS](https://rapids.ai/) team.

The intention of this notebook is not to explain the features in this dataset, but rather to give an example of calculating GPU-accelerated SHAP values with XGBoost.
Here we shall look at the last 5 days' worth of training data for the [Jane Street Market Prediction competition](https://www.kaggle.com/c/jane-street-market-prediction), using the value of `resp` as our target.

**First we shall install the latest version of XGBoost**

In [None]:
!pip -q install xgboost --upgrade
import xgboost as xgb
# check the installed version (GPU-accelerated SHAP values need 1.3 or later)
xgb.__version__

In [None]:
import numpy  as np
import pandas as pd

# plotting
#import matplotlib.pyplot as plt

!pip install -q datatable 
import datatable as dt

import shap

In [None]:
# read in the train dataset
train_data = dt.fread('../input/jane-street-market-prediction/train.csv').to_pandas()

# filter out the zero weights
#train_data = train_data.query('weight > 0').reset_index(drop = True)

# create our action
#train_data['action'] = ( (train_data['resp']) > 0 )*1

# let us look only at the last 5 days
train_data = train_data.query('date > 494').reset_index(drop = True)

X_train = train_data.loc[:, train_data.columns.str.contains('feature')]
X_train = X_train.fillna(X_train.mean())
y_train = train_data.loc[:, 'resp']

# free up some RAM
del train_data

Our input consists of well over 26 thousand rows of data, and has 130 features:

In [None]:
X_train.shape

### XGBoost

In [None]:
n_trees = 500
dmat = xgb.DMatrix(X_train, y_train)
bst  = xgb.train({"tree_method": "gpu_hist"}, dmat, n_trees)
# Make sure GPU prediction is enabled
bst.set_param({"predictor": "gpu_predictor"})

### SHAP values

In [None]:
%%time 

# Compute the shap values
shap_values = bst.predict(dmat, pred_contribs=True)

In [None]:
# and plot
shap.summary_plot(shap_values[:,:-1], X_train)
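
The slice `shap_values[:,:-1]` above is needed because `pred_contribs=True` returns one column per feature plus a final bias column, and each row sums to the raw prediction for that row. The sketch below illustrates that layout with NumPy only. It is a toy, not the booster trained above: a hypothetical linear model stands in for XGBoost, since for a linear model with independent features the SHAP value of feature `i` is simply `w_i * (x_i - mean(x_i))`. All names and numbers here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained model: f(x) = w . x + b (NOT the XGBoost booster).
n_rows, n_features = 6, 3
X = rng.normal(size=(n_rows, n_features))
w = np.array([2.0, -1.0, 0.5])
b = 0.3
preds = X @ w + b

# For a linear model with independent features, the SHAP value of feature i
# is w_i * (x_i - mean(x_i)); the bias term is the mean prediction.
phi = w * (X - X.mean(axis=0))
bias = np.full((n_rows, 1), preds.mean())

# Same layout as the pred_contribs=True output: feature columns, bias last.
contribs = np.hstack([phi, bias])

# Drop the bias column before plotting, as in shap_values[:, :-1].
feature_contribs = contribs[:, :-1]

# Local accuracy: every row of contributions sums to that row's prediction.
assert contribs.shape == (n_rows, n_features + 1)
assert np.allclose(contribs.sum(axis=1), preds)
```

The same check, `shap_values.sum(axis=1)` against the model's raw predictions, should be a quick sanity test on the real output, up to floating-point tolerance.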

### SHAP interactions
Use `pred_interactions=True` to compute the pairwise SHAP interaction terms.

In [None]:
%%time 

# Compute the shap interactions
shap_interactions = bst.predict(dmat, pred_interactions=True)

In [None]:
# and plot
shap.summary_plot(shap_interactions[:,:-1,:-1], X_train)
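
With 130 features the interaction plot can be hard to read, so it may help to rank feature pairs by their mean absolute interaction value. The sketch below shows one way to do that on an array with the same `(n_rows, n_features + 1, n_features + 1)` layout as the `pred_interactions=True` output, where the last row and column hold the bias. The tiny array and the feature names are hypothetical placeholders, not values from the model above.

```python
import numpy as np

# Hypothetical interaction array for 2 rows and 3 features, laid out as
# (n_rows, n_features + 1, n_features + 1); the last row/column is the bias.
feature_names = ["feature_0", "feature_1", "feature_2"]
inter = np.zeros((2, 4, 4))
inter[:, 0, 1] = inter[:, 1, 0] = [0.40, -0.60]  # feature_0 x feature_1
inter[:, 0, 2] = inter[:, 2, 0] = [0.10, 0.10]   # feature_0 x feature_2
inter[:, 1, 2] = inter[:, 2, 1] = [-0.05, 0.25]  # feature_1 x feature_2
inter[:, 0, 0] = [1.0, -2.0]                     # main effect of feature_0

# Drop the bias row and column, as in shap_interactions[:, :-1, :-1].
pairwise = inter[:, :-1, :-1]

# Mean absolute interaction per pair; zero the diagonal (main effects).
strength = np.abs(pairwise).mean(axis=0)
np.fill_diagonal(strength, 0.0)

# The matrix is symmetric, so rank only the pairs above the diagonal.
i_idx, j_idx = np.triu_indices(len(feature_names), k=1)
order = np.argsort(strength[i_idx, j_idx])[::-1]
ranked = [
    (feature_names[i_idx[k]], feature_names[j_idx[k]],
     float(strength[i_idx[k], j_idx[k]]))
    for k in order
]

for name_a, name_b, value in ranked:
    print(f"{name_a} x {name_b}: {value:.3f}")
```

Applied to the real output, `inter` would be replaced by `shap_interactions` and `feature_names` by `list(X_train.columns)`.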

## Related reading
* ["*GPU-Accelerated SHAP values with XGBoost 1.3 and RAPIDS*"](https://medium.com/rapids-ai/gpu-accelerated-shap-values-with-xgboost-1-3-and-rapids-587fad6822) by Rory Mitchell
* [Rory Mitchell, Eibe Frank, Geoffrey Holmes "*GPUTreeShap: Fast Parallel Tree Interpretability*", arXiv:2010.13972 (2020)](https://arxiv.org/pdf/2010.13972.pdf)
* [GPUTreeShap](https://github.com/rapidsai/gputreeshap) by [RAPIDS](https://rapids.ai/) on GitHub
* ["*Basic SHAP Interaction Value Example in XGBoost*"](https://shap.readthedocs.io/en/latest/example_notebooks/tabular_examples/tree_based_models/Basic%20SHAP%20Interaction%20Value%20Example%20in%20XGBoost.html) from the SHAP documentation
* [SHAP (SHapley Additive exPlanations)](https://github.com/slundberg/shap) by Scott Lundberg on GitHub