Shap explains the output of any machine learning model using expectations and Shapley values. Under certain assumptions it can be shown to be the optimal linear explanation of a model prediction (see our current short paper for details).
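To make "expectations and Shapley values" concrete, here is a minimal brute-force sketch using only the standard library. It is an illustration of the idea, not how `KernelExplainer` computes its result: it enumerates every feature subset, which is only feasible for a handful of features. The names `shapley_values` and `toy_model`, and the data, are hypothetical and not part of Shap.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, background):
    """Exact Shapley values for model(x); 'missing' features are filled in
    from each background row and the outputs are averaged (an expectation)."""
    n = len(x)

    def value(subset):
        # Expected output when only the features in `subset` come from x.
        total = 0.0
        for row in background:
            z = [x[i] if i in subset else row[i] for i in range(n)]
            total += model(z)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Classic Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)

    # value(set()) is the base value: the mean output over the background rows.
    return value(set()), phi


# Hypothetical toy model and data, for illustration only.
def toy_model(z):
    return 2.0 * z[0] + z[1] * z[2]


background = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 0.0, 1.0], [1.0, 3.0, 0.0]]
x = [3.0, 2.0, 1.0]

base_value, phi = shapley_values(toy_model, x, background)
print(base_value, phi)
```

By construction, the contributions in `phi` sum to the difference between the model output for `x` and the base value.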
pip install shap
Example (run in a Jupyter notebook)
    from shap import KernelExplainer, DenseData, visualize, initjs
    from sklearn import datasets, neighbors
    from numpy import random, arange

    # print the JS visualization code to the notebook
    initjs()

    # shuffle the row indices, then train a k-nearest neighbors classifier
    # (setosa vs. the rest) on the iris data
    iris = datasets.load_iris()
    random.seed(2)
    inds = arange(len(iris.target))
    random.shuffle(inds)
    knn = neighbors.KNeighborsClassifier()
    knn.fit(iris.data, iris.target == 0)

    # use Shap to explain a single prediction, with 100 shuffled rows as background
    background = DenseData(iris.feature_names, iris.data[inds[:100], :])  # name the features
    explainer = KernelExplainer(knn.predict, background, nsamples=100)
    x = iris.data[inds[102:103], :]
    visualize(explainer.explain(x))
The explanation above shows three features, each helping to push the model output from the base value (the average model output over the background dataset we passed) down to zero. Any features pushing the class label higher would be shown in red.
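As a rough check on what the base value means, the sketch below computes the average model output over the same 100 shuffled rows using only scikit-learn and NumPy. It assumes the base value is simply the mean prediction over those rows, as described above, and does not call Shap itself.

```python
from numpy import arange, random
from sklearn import datasets, neighbors

# Same setup as the example: shuffled indices and a setosa-vs-rest classifier.
iris = datasets.load_iris()
random.seed(2)
inds = arange(len(iris.target))
random.shuffle(inds)

knn = neighbors.KNeighborsClassifier()
knn.fit(iris.data, iris.target == 0)

# Assumption: the base value is the mean prediction over the background rows.
background_rows = iris.data[inds[:100], :]
base_value = knn.predict(background_rows).mean()

# The prediction being explained, for comparison with the base value.
x = iris.data[inds[102:103], :]
prediction = knn.predict(x)[0]

print("base value:", base_value)
print("prediction for x:", prediction)
```

The feature contributions in an explanation account for the gap between these two numbers.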
If we take many explanations such as the one shown above, rotate them 90 degrees, and then stack them horizontally, we can see explanations for an entire dataset. This is exactly what we do below for all the examples in the iris test set:
    # use Shap to explain all test set predictions
    visualize([explainer.explain(iris.data[inds[i:i+1], :]) for i in range(100, len(iris.target))])
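The "rotate and stack" layout can be pictured as a simple transpose: each explanation becomes a column, and each feature becomes a row. The sketch below uses made-up contribution values (hypothetical numbers, not Shap output) purely to show the resulting shape.

```python
# Hypothetical per-sample explanations: one contribution per feature.
feature_names = ["sepal length", "sepal width", "petal length", "petal width"]
explanations = [
    [0.00, 0.00, -0.20, -0.13],
    [0.02, 0.00, 0.40, 0.25],
    [0.00, -0.01, -0.18, -0.14],
]

# Transposing puts features on rows and samples on columns,
# which is the layout of the stacked dataset view.
stacked = [list(column) for column in zip(*explanations)]

for name, row in zip(feature_names, stacked):
    print(f"{name:>12}: " + " ".join(f"{v:+.2f}" for v in row))
```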
The notebooks below demonstrate different use cases for Shap. Look inside the notebooks directory of the repository if you want to try playing with the original notebooks yourself. If you have your own notebook you would like to share, or have an improvement to the notebooks below, pull requests are welcome :)
- Iris classification - A basic demonstration using the popular iris species dataset. It explains predictions from six different models in scikit-learn using Shap.
- Census income classification - Using the standard adult census income dataset, this notebook trains a random forest classifier using scikit-learn and then explains predictions using Shap.