Explain the output of any machine learning model using expectations and Shapley values.
Jupyter Notebook Python
Latest commit 8de2cda Feb 17, 2017 Scott Lundberg Don't do future selection when we have most of the sample space enume…
…rated.

README.md


Shap explains the output of any machine learning model using expectations and Shapley values. Under certain assumptions it can be shown to be the optimal linear explanation of a model prediction (see our current short paper for details).

Install

pip install shap

Example (run in a Jupyter notebook)

from shap import KernelExplainer, DenseData, visualize, initjs
from sklearn import datasets,neighbors
from numpy import random, arange

# print the JS visualization code to the notebook
initjs()

# train a k-nearest neighbors classifier on a random subset 
iris = datasets.load_iris()
random.seed(2)
inds = arange(len(iris.target))
random.shuffle(inds)
knn = neighbors.KNeighborsClassifier()
knn.fit(iris.data, iris.target == 0)

# use Shap to explain a single prediction
background = DenseData(iris.feature_names, iris.data[inds[:100],:]) # name the features
explainer = KernelExplainer(knn.predict, background, nsamples=100)
x = iris.data[inds[102:103],:]
visualize(explainer.explain(x))

The above explanation shows three features each contributing to push the model output from the base value (the average model output over the training dataset we passed) to zero. If there were any features pushing the class label higher they would be shown in red.

If we take many explanations such as the one shown above, rotate them 90 degrees, and then stack them horizontally, we can see explanations for an entire dataset. This is exactly what we do below for all the examples in the iris test set:

# use Shap to explain all test set predictions
visualize([explainer.explain(iris.data[inds[i:i+1],:]) for i in range(100,len(iris.target))])

Sample notebooks

The notebooks below demonstrate different use cases for Shap. Look inside the notebooks directory of the repository if you want to try playing with the original notebooks yourself. If you have your own notebook you would like to share, or have an improvement to the notebooks below, pull requests are welcome :)

  • Iris classification - A basic demonstration using the popular iris species dataset. It explains predictions from six different models in scikit-learn using Shap.
  • Census income classification - Using the standard adult census income dataset, this notebook trains a random forest classifier using scikit-learn and then explains predictions using Shap.