# SHAP vs EBM Comparison

SHAP is the most popular blackbox explanation method available today, so it’s natural to ask what the similarities and differences are between SHAP and EBM explanations. Let’s find out!

One reason you might prefer an Explainable Boosting Machine (EBM) over a blackbox model with SHAP explanations is that the EBM provides its own explanations directly, and those explanations are always exact. SHAP does a great job of decomposing a complex model into independent per-feature importances, but that process loses some of the information present in the blackbox model. SHAP explanations, while usually informative, are only approximate explanations. Any explanation method that condenses a model into independent per-feature components shares this limitation.
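For reference, the two additive forms being compared can be written side by side. The notation here is the conventional one and is not taken from this notebook: $g$ is a link function, $f_i$ and $f_{ij}$ are the EBM's learned per-feature and pairwise shape functions, and $\phi_i$ are the SHAP values for a single sample $x$.

$$
\text{EBM:}\quad g(\mathbb{E}[y]) = \beta_0 + \sum_i f_i(x_i) + \sum_{i<j} f_{ij}(x_i, x_j)
$$

$$
\text{SHAP:}\quad f(x) = \phi_0 + \sum_i \phi_i(x)
$$

The EBM form has a place for pairwise terms, while the SHAP form only has one number per feature, so any interaction in the model has to be folded into the per-feature values.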

Let's look at some examples to illustrate. We'll start by training a simple GAM (generalized additive model) and applying SHAP to it. For this initial example we would expect the two explanations to be very similar, since the model decomposes cleanly into independent per-feature terms.

In [None]:
# boilerplate

import numpy as np
from sklearn.metrics import mean_squared_error

from interpret import set_visualize_provider, show
from interpret.blackbox import ShapKernel
from interpret.glassbox import ExplainableBoostingRegressor
from interpret.provider import InlineProvider

# render visualizations inline in the notebook
set_visualize_provider(InlineProvider())

In [None]:
X = np.repeat([[1, 1], [-1, -1], [-1, 1], [1, -1]], 20, axis=0)

In [None]:
# y_simple can be predicted with a simple generalized additive model
y_simple = X[:, 0] * 3 + X[:, 1] * 3
# y_simple == np.repeat([6, -6, 0, 0], 20)

ebm_simple = ExplainableBoostingRegressor()
ebm_simple.fit(X, y_simple)

# low MSE indicates the model is a good fit
print(mean_squared_error(y_simple, ebm_simple.predict(X)))

In [None]:
# generate an EBM explanation and show the first sample

show(ebm_simple.explain_local(X, y_simple), 0)



In [None]:
# generate a SHAP explanation and show the first sample

shap_simple = ShapKernel(ebm_simple.predict, X)
show(shap_simple.explain_local(X, y_simple), 0)

As expected, they produce nearly identical explanations.

Let's now construct an extreme dataset that consists entirely of pairwise interaction. Traditional GAMs cannot fit a useful model on this dataset, but EBMs can because they include pairwise interactions.
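To illustrate why a purely additive model has nothing to work with here, the sketch below computes the main effects of the XOR-style target on the four distinct rows of the dataset. It is a hand calculation in plain Python rather than a fitted GAM, and the names `rows`, `main_effect`, and `additive_pred` are made up for this illustration. It relies on the fact that, on a balanced grid like this one, the least-squares additive fit is the grand mean plus the per-feature main effects.

```python
from itertools import product

# The four distinct rows of X and the pure-interaction target (XOR of the signs).
rows = list(product([1, -1], repeat=2))
y = {r: (-6 if (r[0] > 0) != (r[1] > 0) else 6) for r in rows}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


grand_mean = mean(y.values())

# Main effect of a feature value: mean target at that value minus the grand mean.
main_effect = {
    (j, v): mean(y[r] for r in rows if r[j] == v) - grand_mean
    for j in range(2)
    for v in (1, -1)
}

# Best additive prediction on this balanced grid: grand mean + main effects.
additive_pred = {
    r: grand_mean + sum(main_effect[(j, r[j])] for j in range(2)) for r in rows
}
additive_mse = mean((y[r] - additive_pred[r]) ** 2 for r in rows)

print(main_effect)   # every main effect is 0.0
print(additive_mse)  # 36.0
```

Every main effect is zero, so the additive prediction is 0 for all rows and the MSE equals the full variance of the target. All of the signal lives in the pair.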

In [None]:
# y_pairwise requires an interaction term to predict
y_pairwise = np.where((X[:, 0] > 0) ^ (X[:, 1] > 0), -6, 6)
# y_pairwise == np.repeat([6, 6, -6, -6], 20)

ebm_pairwise = ExplainableBoostingRegressor()
ebm_pairwise.fit(X, y_pairwise)

# low MSE indicates the model is a good fit
print(mean_squared_error(y_pairwise, ebm_pairwise.predict(X)))

In [None]:
# generate an EBM explanation and show the first sample

show(ebm_pairwise.explain_local(X, y_pairwise), 0)



In [None]:
# generate a SHAP explanation and show the first sample

shap_pairwise = ShapKernel(ebm_pairwise.predict, X)
show(shap_pairwise.explain_local(X, y_pairwise), 0)

The explanations have clearly diverged now. In this example, SHAP does not have an innate understanding of the pairwise term, so it's mapping the importance of the EBM's pair onto the two individual features. Spreading the importance this way is the best that any method could do if the importance must be expressed on a per-feature basis.
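To see where the even split comes from, the two-feature Shapley formula can be written out for the first sample, $x = (1, 1)$. This is a sketch under two assumptions that are not stated in the notebook: the model reproduces `y_pairwise` exactly, and the value function $v(S)$ is the mean prediction over the background rows when the features in $S$ are fixed to the sample's values.

$$
\phi_1 = \tfrac{1}{2}\bigl[v(\{1\}) - v(\varnothing)\bigr] + \tfrac{1}{2}\bigl[v(\{1,2\}) - v(\{2\})\bigr]
$$

Fixing only one feature leaves the prediction at $+6$ for half of the background rows and $-6$ for the other half, so $v(\varnothing) = v(\{1\}) = v(\{2\}) = 0$, while $v(\{1,2\}) = 6$. Substituting:

$$
\phi_1 = \tfrac{1}{2}(0 - 0) + \tfrac{1}{2}(6 - 0) = 3, \qquad \phi_2 = 3 \text{ by symmetry.}
$$

The whole contribution of 6 belongs to the pair, and the formula divides it equally between the two features.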

It should also be noted that, for this first sample, the SHAP explanations for the simple GAM and the pairwise model are identical. This does not carry over to every sample: for a row such as [-1, 1], SHAP assigns the two features opposite signs under the simple model and the same sign under the pairwise model. The dropdown in the UI is live, so the explanations for the other samples can also be viewed.

These are two very different models that produce different outputs. Given only the SHAP explanation of the first sample, though, it would be impossible to tell which of the two models was being analyzed. The SHAP values for that sample would also be the same if the two models were blended together in any proportion. Most blackbox models will not be composed of pure interaction effects as this one is, but most will have at least some interaction effect present in the model.
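The first-sample comparison can be reproduced without any fitted model. The sketch below computes exact Shapley values by brute force over feature orderings, using idealized stand-ins for the two EBMs (`simple` and `pairwise` return the targets exactly) and the four distinct rows of `X` as an equally weighted background. The helper names `value`, `shapley`, and `blend` are hypothetical and are not part of the `interpret` or `shap` APIs.

```python
from itertools import permutations, product

# Background data: the four distinct rows of X, equally weighted.
background = list(product([1, -1], repeat=2))


def simple(x):
    # idealized stand-in for ebm_simple (assumption: a perfect fit to y_simple)
    return 3 * x[0] + 3 * x[1]


def pairwise(x):
    # idealized stand-in for ebm_pairwise (assumption: a perfect fit to y_pairwise)
    return -6 if (x[0] > 0) != (x[1] > 0) else 6


def value(model, x, coalition):
    """Mean prediction with features in `coalition` fixed to x, the rest from the background."""
    preds = [
        model(tuple(x[j] if j in coalition else b[j] for j in range(len(x))))
        for b in background
    ]
    return sum(preds) / len(preds)


def shapley(model, x):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        coalition = set()
        for j in order:
            before = value(model, x, coalition)
            coalition.add(j)
            phi[j] += (value(model, x, coalition) - before) / len(orders)
    return phi


def blend(weight):
    # convex mix of the two stand-in models
    return lambda x: weight * simple(x) + (1 - weight) * pairwise(x)


first_sample = (1, 1)
print(shapley(simple, first_sample))       # [3.0, 3.0]
print(shapley(pairwise, first_sample))     # [3.0, 3.0]
print(shapley(blend(0.25), first_sample))  # [3.0, 3.0]
```

All three calls return the same per-feature values for the first sample, even though the three functions disagree on other inputs. Other rows can be passed to `shapley` in the same way.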

The examples above used Kernel SHAP. Unlike Kernel SHAP, Tree SHAP is known to produce exact SHAP values. Many people therefore make the mistake of believing that Tree SHAP produces exact explanations. In reality, Tree SHAP produces exact SHAP values, but exact SHAP values are still only approximate explanations.

Decision trees are another form of glassbox model. Like EBMs, they can learn pairwise interactions. Let's see how Tree SHAP handles `y_pairwise`.

In [None]:
from interpret.glassbox import RegressionTree

tree = RegressionTree()
tree.fit(X, y_pairwise)

# low MSE indicates the model is a good fit
print(mean_squared_error(y_pairwise, tree.predict(X)))

In [None]:
show(tree.explain_global())

We can see that the decision tree has learned the same XOR function as the pairwise EBM, so it should produce nearly identical predictions. The MSE between the predictions of the two models confirms this: it is low, unlike the much larger MSE between the tree and the original simple EBM.

In [None]:
print(mean_squared_error(ebm_pairwise.predict(X), tree.predict(X)))
print(mean_squared_error(ebm_simple.predict(X), tree.predict(X)))

In [None]:
# generate a Tree SHAP explanation and show the first sample

from interpret.greybox import ShapTree

tree_shap = ShapTree(tree._model(), X)
show(tree_shap.explain_local(X, y_pairwise), 0)

As you can see, Tree SHAP returns almost the same explanation as Kernel SHAP. With Tree SHAP, both feature importances are exactly 3.0, while Kernel SHAP returned feature importances of 2.99. The pairwise detail is still lost in the Tree SHAP explanation, which means these Tree SHAP explanations are also only approximate explanations.

It should be mentioned that the SHAP package has some optional functionality to handle interaction terms. Simpler models like an EBM that have a limited number of pairwise terms could have all the possible pairwise SHAP values calculated. In terms of getting exact explanations though, this method only works in practice for what are already glassbox models. Even simple blackbox models tend to have large numbers of much higher-dimensional terms expressed in the model. The default XGBoost model is built with max_depth=6 and n_estimators=100, which means that there are 2^5 * 100 = 3,200 potential 6-way interactions inside that model. The model would also have an even larger number of 5-ways, 4-ways, 3-ways, and pairs that came from the purification distillates of the 6-way interactions. In theory all of these could have SHAP values calculated, but such a complex explanation would not be understandable by humans. It would also take far too long to compute these SHAP values except on trivial datasets. EBMs avoid these issues by forcing the model to learn as much as possible from the individual features and a curated set of interactions.
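The arithmetic behind the 3,200 figure can be spelled out as a small calculation. This is a rough upper-bound sketch, not an inspection of a real XGBoost model: it assumes every tree is grown to full depth and that each root-to-deepest-split path uses six distinct features. The variable names are made up for this illustration.

```python
from math import comb

max_depth = 6       # default depth quoted in the text
n_estimators = 100  # default number of trees quoted in the text

# A fully grown binary tree of depth d has 2**(d - 1) split nodes on its deepest
# level, and each of them ends a root-to-split path made of d splits.
deepest_paths_per_tree = 2 ** (max_depth - 1)
six_way_terms = deepest_paths_per_tree * n_estimators
print(six_way_terms)  # 3200

# Each 6-feature path contains this many lower-order feature subsets
# (assumption: the six splits on the path use six distinct features).
subsets_per_path = {k: comb(max_depth, k) for k in range(2, max_depth)}
print(subsets_per_path)  # {2: 15, 3: 20, 4: 15, 5: 6}
```

Subsets shared between paths are counted once per path here, so these numbers bound the count of distinct lower-order terms from above rather than giving it exactly.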

For more information about SHAP interaction terms, see [Basic SHAP Interaction Value Example in XGBoost](https://github.com/slundberg/shap/blob/master/notebooks/tabular_examples/tree_based_models/Basic%20SHAP%20Interaction%20Value%20Example%20in%20XGBoost.ipynb).