# Similarities and Differences between SHAP and EBMs

SHAP is the most popular blackbox explanation method available today, so it’s natural to ask how SHAP explanations compare with EBM explanations. Let’s find out!

One reason you might prefer an EBM over a blackbox model with SHAP explanations is that an EBM provides its own explanations, and those explanations are always exact: for a regression EBM, the prediction is exactly the intercept plus the sum of the term contributions it reports. SHAP does a great job of decomposing a complex model into independent feature importances, but that process loses some of the information present in the blackbox model, so SHAP explanations, while usually informative, are only approximate. Any explanation method that condenses a model into independent per-feature components shares this limitation.
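To make "exact" concrete, here is a minimal sketch of an additive model built from lookup tables: one per feature plus one for a pairwise term. The tables, values, and names (`main_terms`, `pair_term`, `local_explanation`) are hypothetical and only mimic the structure of an EBM; they are not interpret's internal representation. Because the explanation reads from the same tables the prediction is computed from, adding the contributions back to the intercept reproduces the prediction with nothing estimated.

```python
# Hypothetical additive model: an intercept, one lookup table per feature,
# and one lookup table for a pairwise term. This mirrors the structure of an
# EBM but is NOT interpret's internal representation.
intercept = 0.5
main_terms = {
    0: {-1: -3.0, 1: 3.0},  # shape function for feature 0
    1: {-1: -1.0, 1: 1.0},  # shape function for feature 1
}
pair_term = {(1, 1): 2.0, (-1, -1): 2.0, (-1, 1): -2.0, (1, -1): -2.0}


def predict(x):
    """Model output: the intercept plus every term's score for sample x."""
    return intercept + main_terms[0][x[0]] + main_terms[1][x[1]] + pair_term[(x[0], x[1])]


def local_explanation(x):
    """Per-term contributions, read directly from the same lookup tables."""
    return {
        "feature_0": main_terms[0][x[0]],
        "feature_1": main_terms[1][x[1]],
        "feature_0 & feature_1": pair_term[(x[0], x[1])],
    }


points = [(1, 1), (-1, -1), (-1, 1), (1, -1)]
checks = []
for x in points:
    contributions = local_explanation(x)
    reconstructed = intercept + sum(contributions.values())
    checks.append((x, reconstructed, predict(x)))
    print(x, contributions, reconstructed, predict(x))
```

Note that the pairwise term shows up as its own entry in the explanation rather than being spread over the two features.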

Let's look at some examples. We'll start by training an EBM on data generated by a simple generalized additive model (GAM) and applying SHAP to it. For this first example we expect the two explanations to be very similar, since the model decomposes cleanly into independent per-feature components.
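Why should the two agree here? For an additive model whose features are independent in the background data, each exact SHAP value reduces to that feature's shape function at the sample, minus the shape function's average over the background data. The brute-force sketch below checks this for the toy GAM 3·x0 + 3·x1. It computes interventional Shapley values by enumerating every feature subset, filling in features outside the subset from background rows (the quantity Kernel SHAP approximates). The helper names `coalition_value` and `exact_shapley` are hypothetical, not part of interpret or the shap package.

```python
import itertools
import math

import numpy as np


def coalition_value(f, x, background, fixed):
    """Mean model output with the features in `fixed` set to x's values and
    the remaining features taken from each background row."""
    z = background.astype(float).copy()
    for j in fixed:
        z[:, j] = x[j]
    return f(z).mean()


def exact_shapley(f, x, background):
    """Exact (interventional) Shapley values by enumerating every feature subset."""
    m = len(x)
    phi = np.zeros(m)
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            weight = math.factorial(size) * math.factorial(m - size - 1) / math.factorial(m)
            for subset in itertools.combinations(others, size):
                gain = (coalition_value(f, x, background, subset + (i,))
                        - coalition_value(f, x, background, subset))
                phi[i] += weight * gain
    return phi


# Toy GAM with one shape function per feature (hypothetical, matches the data below)
def shape0(v):
    return 3.0 * v


def shape1(v):
    return 3.0 * v


def gam(X):
    return shape0(X[:, 0]) + shape1(X[:, 1])


background = np.repeat(np.array([[1, 1], [-1, -1], [-1, 1], [1, -1]], dtype=float), 20, axis=0)

x = np.array([1.0, -1.0])
phi = exact_shapley(gam, x, background)
# Closed form for additive models: shape function at x, centered on the background
centered = np.array([
    shape0(x[0]) - shape0(background[:, 0]).mean(),
    shape1(x[1]) - shape1(background[:, 1]).mean(),
])
print(phi, centered)
```

Both arrays come out as [3, -3]: SHAP recovers the GAM's own per-feature terms.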

In [None]:
# boilerplate

from interpret.glassbox import ExplainableBoostingRegressor
from interpret.blackbox import ShapKernel
from interpret import show
import numpy as np
from sklearn.metrics import mean_squared_error

from interpret import set_visualize_provider
from interpret.provider import InlineProvider
set_visualize_provider(InlineProvider())

In [None]:
# make a dataset that could be generated from a generalized additive model
X1 = np.array([[1, 1], [-1, -1], [-1, 1], [1, -1]])
y1 = np.array([6, -6, 0, 0])  # equivalent to: X1[:, 0] * 3 + X1[:, 1] * 3

# repeat this pattern to make the dataset larger
X1 = np.repeat(X1, 20, axis=0)
y1 = np.repeat(y1, 20)

In [None]:
ebm1 = ExplainableBoostingRegressor()
ebm1.fit(X1, y1)

print(mean_squared_error(y1, ebm1.predict(X1)))

In [None]:
# generate an EBM explanation and show the first sample

show(ebm1.explain_local(X1, y1), 0)

In [None]:
# generate a SHAP explanation and show the first sample

shap1 = ShapKernel(ebm1.predict, X1)
show(shap1.explain_local(X1, y1), 0)

As expected, it appears that both methods are producing nearly identical explanations.

Let's now construct an extreme dataset that consists entirely of a pure pairwise interaction. Traditional GAMs cannot fit a useful model on this dataset, but EBMs can, because they include pairwise interaction terms.
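One way to see why a GAM fails here: with binary (+1/-1) features, any shape function f_i(x_i) can be written as a + b·x_i, so an ordinary least-squares fit on the columns [1, x0, x1] is the best any GAM can do on this data. The sketch below (plain NumPy, not interpret) rebuilds the XOR data and compares that best additive fit with a fit that adds the single pairwise column x0·x1.

```python
import numpy as np

X2 = np.repeat(np.array([[1, 1], [-1, -1], [-1, 1], [1, -1]], dtype=float), 20, axis=0)
y2 = 6.0 * X2[:, 0] * X2[:, 1]  # the XOR target used below

# Best additive (GAM) fit: intercept plus one linear term per binary feature
design = np.column_stack([np.ones(len(X2)), X2])
coef, *_ = np.linalg.lstsq(design, y2, rcond=None)
gam_mse = np.mean((y2 - design @ coef) ** 2)

# Add one pairwise column, as an EBM's pair term can
design_pair = np.column_stack([design, X2[:, 0] * X2[:, 1]])
coef_pair, *_ = np.linalg.lstsq(design_pair, y2, rcond=None)
pair_mse = np.mean((y2 - design_pair @ coef_pair) ** 2)

print(gam_mse, pair_mse)
```

The additive fit predicts 0 everywhere (MSE of about 36, the variance of the target), while the model with the pairwise term fits exactly.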

In [None]:
# make a dataset that expresses pure pairwise interaction (the XOR function)
X2 = np.array([[1, 1], [-1, -1], [-1, 1], [1, -1]])
y2 = np.array([6, 6, -6, -6])  # equivalent to: X2[:, 0] * X2[:, 1] * 6

# repeat this pattern to make the dataset larger
X2 = np.repeat(X2, 20, axis=0)
y2 = np.repeat(y2, 20)

In [None]:
ebm2 = ExplainableBoostingRegressor()
ebm2.fit(X2, y2)

print(mean_squared_error(y2, ebm2.predict(X2)))

In [None]:
# generate an EBM explanation and show the first sample

show(ebm2.explain_local(X2, y2), 0)

In [None]:
# generate a SHAP explanation and show the first sample

shap2 = ShapKernel(ebm2.predict, X2)
show(shap2.explain_local(X2, y2), 0)

The explanations have clearly diverged now. SHAP has no way to represent the pairwise term, so it maps the importance of the EBM's pair onto the individual features. Splitting the importance equally here is the best any method can do when importance must be expressed on a per-feature basis.
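The equal split falls straight out of the Shapley formula. For the pure interaction f(x) = 6·x0·x1 on this balanced data, fixing either feature alone leaves the expected output at 0, so each feature receives exactly half of f(x) at every sample. The sketch below checks this with a brute-force two-feature Shapley computation; the helpers `coalition_value` and `shapley_2d` are hypothetical, written for illustration rather than taken from interpret or shap.

```python
import numpy as np


def coalition_value(f, x, background, fixed):
    """Mean model output with the features in `fixed` set to x's values and
    the remaining features taken from each background row."""
    z = background.astype(float).copy()
    for j in fixed:
        z[:, j] = x[j]
    return f(z).mean()


def shapley_2d(f, x, background):
    """Exact Shapley values for a two-feature model (all coalitions enumerated)."""
    v = {s: coalition_value(f, x, background, s) for s in [(), (0,), (1,), (0, 1)]}
    phi0 = 0.5 * (v[(0,)] - v[()]) + 0.5 * (v[(0, 1)] - v[(1,)])
    phi1 = 0.5 * (v[(1,)] - v[()]) + 0.5 * (v[(0, 1)] - v[(0,)])
    return np.array([phi0, phi1])


def xor_model(X):
    return 6.0 * X[:, 0] * X[:, 1]


patterns = np.array([[1, 1], [-1, -1], [-1, 1], [1, -1]], dtype=float)
background = np.repeat(patterns, 20, axis=0)
base_value = xor_model(background).mean()  # 0 for this balanced data

results = []
for x in patterns:
    phi = shapley_2d(xor_model, x, background)
    fx = xor_model(x[None, :])[0]
    results.append((x, fx, phi))
    print(x, fx, phi)  # each feature gets half of f(x) - base_value
```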

It should also be noted that, for this first sample, the SHAP explanations of the pure GAM and XOR models are essentially identical. These are two very different models that produce different outputs on three of the four input patterns, yet at this sample SHAP assigns them the same feature importances. The SHAP values at this sample would also be the same if I blended these two models together in any proportion. Given only this SHAP explanation, it would be impossible to tell which of these models was being analyzed.
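The sketch below checks the first sample, (1, 1), with exact two-feature Shapley values for the ideal GAM (3·x0 + 3·x1), the ideal XOR model (6·x0·x1), and several hypothetical blends alpha·GAM + (1 - alpha)·XOR. For contrast, it also evaluates the sample (-1, -1), where the two models' outputs differ. The models are the noise-free target functions, not the fitted EBMs, and the helper names are hypothetical.

```python
import numpy as np


def coalition_value(f, x, background, fixed):
    """Mean model output with the features in `fixed` set to x's values and
    the remaining features taken from each background row."""
    z = background.astype(float).copy()
    for j in fixed:
        z[:, j] = x[j]
    return f(z).mean()


def shapley_2d(f, x, background):
    """Exact Shapley values for a two-feature model (all coalitions enumerated)."""
    v = {s: coalition_value(f, x, background, s) for s in [(), (0,), (1,), (0, 1)]}
    phi0 = 0.5 * (v[(0,)] - v[()]) + 0.5 * (v[(0, 1)] - v[(1,)])
    phi1 = 0.5 * (v[(1,)] - v[()]) + 0.5 * (v[(0, 1)] - v[(0,)])
    return np.array([phi0, phi1])


def gam_model(X):
    return 3.0 * X[:, 0] + 3.0 * X[:, 1]


def xor_model(X):
    return 6.0 * X[:, 0] * X[:, 1]


def blend(alpha):
    """Hypothetical mixture: alpha * GAM + (1 - alpha) * XOR."""
    return lambda X: alpha * gam_model(X) + (1 - alpha) * xor_model(X)


background = np.repeat(np.array([[1, 1], [-1, -1], [-1, 1], [1, -1]], dtype=float), 20, axis=0)

first_sample = np.array([1.0, 1.0])
blend_phis = {a: shapley_2d(blend(a), first_sample, background) for a in (0.0, 0.25, 0.5, 0.75, 1.0)}
for a, phi in blend_phis.items():
    print(f"alpha={a}: {phi}")  # [3, 3] (up to rounding) for every alpha

other_sample = np.array([-1.0, -1.0])
gam_other = shapley_2d(gam_model, other_sample, background)  # [-3, -3]
xor_other = shapley_2d(xor_model, other_sample, background)  # [3, 3]
print(gam_other, xor_other)
```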

The examples above used Kernel SHAP. Unlike Kernel SHAP, which estimates SHAP values, Tree SHAP is known to compute exact SHAP values for tree-based models. Many people therefore mistakenly believe that Tree SHAP produces exact explanations. In reality, Tree SHAP produces exact SHAP values, but even exact SHAP values are still only an approximate explanation of a model.

Decision trees are another form of glassbox model. Similar to EBMs, they also have the ability to learn pairwise interactions like the ones expressed in the XOR dataset above. Let's see how Tree SHAP handles this.
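A hand-written depth-2 tree shows how nested splits encode the XOR interaction: the root splits on x0, both branches then split on x1, and the leaf values flip sign between branches, so each leaf depends on both features jointly. This function is a hypothetical illustration; the tree that `RegressionTree` actually learns may choose different thresholds or split order.

```python
def xor_tree(x0, x1):
    """Hypothetical depth-2 regression tree for the XOR data.

    Root split on x0; each branch then splits on x1. The same x1 split leads
    to opposite leaf values depending on the x0 branch, which is exactly a
    pairwise interaction.
    """
    if x0 <= 0:
        return 6.0 if x1 <= 0 else -6.0
    else:
        return -6.0 if x1 <= 0 else 6.0


for x0, x1 in [(1, 1), (-1, -1), (-1, 1), (1, -1)]:
    print((x0, x1), xor_tree(x0, x1))
```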

In [None]:
from interpret.glassbox import RegressionTree

tree = RegressionTree()
tree.fit(X2, y2)

print(mean_squared_error(y2, tree.predict(X2)))

In [None]:
show(tree.explain_global())

We can see above that the decision tree has learned the same XOR function as the EBM, so it should produce nearly identical predictions.

In [None]:
# generate a Tree SHAP explanation and show the first sample

from interpret.greybox import ShapTree

shap3 = ShapTree(tree._model(), X2)
show(shap3.explain_local(X2, y2), 0)

As you can see, Tree SHAP returns essentially the same explanation as Kernel SHAP: Tree SHAP assigns both features an importance of exactly 3, while Kernel SHAP returned feature importances of 2.99. However, the pairwise detail is still lost in the Tree SHAP explanation.

We should mention here that the SHAP package has some optional functionality to handle interaction terms (SHAP interaction values). For simpler models like an EBM, which have a limited number of pairwise terms, all the possible pairwise SHAP values could be calculated. In terms of getting exact explanations, though, this method only works in practice for models that are already glassbox. Even simple blackbox models tend to contain a large number of much higher-dimensional terms. The default XGBoost model is built with max_depth=6 and n_estimators=100. Each depth-6 tree has 2^6 = 64 leaves, and each pair of sibling leaves shares the same path of six splits, so there are 2^5 * 100 = 3,200 potential 6-way interactions inside that model, along with many 5-way, 4-way, 3-way, and pairwise ones. In theory, all of these could have SHAP values calculated, but such a complex explanation would not be understandable by humans. Computing these SHAP values would also take far too long except on trivial datasets.
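For two features, SHAP interaction values can be written out by hand, which shows why they separate the GAM from the XOR model where ordinary SHAP values cannot. The sketch assumes the standard definition: each off-diagonal cell holds half of the pairwise effect v({0,1}) - v({0}) - v({1}) + v({}), and the diagonal holds the remaining main effect, so each row sums to that feature's SHAP value. It is a hand-rolled illustration with hypothetical helper names, not the shap package's implementation (which exposes interaction values through `TreeExplainer.shap_interaction_values` for tree models).

```python
import numpy as np


def coalition_value(f, x, background, fixed):
    """Mean model output with the features in `fixed` set to x's values and
    the remaining features taken from each background row."""
    z = background.astype(float).copy()
    for j in fixed:
        z[:, j] = x[j]
    return f(z).mean()


def shap_interaction_2d(f, x, background):
    """2x2 interaction matrix for a two-feature model; rows sum to SHAP values."""
    v = {s: coalition_value(f, x, background, s) for s in [(), (0,), (1,), (0, 1)]}
    phi0 = 0.5 * (v[(0,)] - v[()]) + 0.5 * (v[(0, 1)] - v[(1,)])
    phi1 = 0.5 * (v[(1,)] - v[()]) + 0.5 * (v[(0, 1)] - v[(0,)])
    # Pairwise effect, split equally between the (0, 1) and (1, 0) cells
    inter = 0.5 * (v[(0, 1)] - v[(0,)] - v[(1,)] + v[()])
    # Diagonal cells hold the main effects left after removing the interaction
    return np.array([[phi0 - inter, inter], [inter, phi1 - inter]])


def gam_model(X):
    return 3.0 * X[:, 0] + 3.0 * X[:, 1]


def xor_model(X):
    return 6.0 * X[:, 0] * X[:, 1]


background = np.repeat(np.array([[1, 1], [-1, -1], [-1, 1], [1, -1]], dtype=float), 20, axis=0)
first_sample = np.array([1.0, 1.0])

gam_matrix = shap_interaction_2d(gam_model, first_sample, background)
xor_matrix = shap_interaction_2d(xor_model, first_sample, background)
print(gam_matrix)  # all importance on the diagonal (main effects)
print(xor_matrix)  # all importance off the diagonal (interaction)
```

Summing each row gives [3, 3] for both models, which is the ordinary SHAP explanation; only the full matrix reveals the difference.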

For more information about SHAP interaction terms, see [Basic SHAP Interaction Value Example in XGBoost](https://github.com/slundberg/shap/blob/master/notebooks/tabular_examples/tree_based_models/Basic%20SHAP%20Interaction%20Value%20Example%20in%20XGBoost.ipynb).