Interpretation of Base Value and Predicted Value in SHAP Plots #352
I believe the base value is, as you say, "the value that would be predicted if we did not know any features [for] the current output", which is just the mean prediction: mean(yhat) = sum(yhat_i)/N over all rows x_i. Each individual Shapley value, phi_ij for some feature j and some row x_i, is interpreted as follows: the feature value x_ij contributed phi_ij towards the prediction yhat_i for instance x_i, relative to the average prediction for the dataset, i.e. mean(yhat). Now, you may be wondering why these values are larger than expected for a binary target, or why they are not in probability units. By default for binary classification the Shapley values are displayed in logit space, such that logistic(sum_j(phi_ij) + mean(yhat)) = probability_i, over all non-missing features j for row x_i. See the discussion here for more info: #350 Also, aside from the academic literature, I found these two resources very helpful in explaining/understanding Shapley values:
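The logit-space relationship described above can be checked numerically. This is a minimal sketch with made-up numbers, assuming three features and a base value already expressed in log-odds units:

```python
import numpy as np

def logistic(x):
    """Sigmoid / inverse-logit function."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical SHAP values (log-odds units) for one row with 3 features,
# and a hypothetical base value (the mean model output over the background set).
phi = np.array([0.4, -0.1, 0.25])   # phi_ij for row i, features j
base_value = -0.2                    # mean(yhat) in log-odds space

# The predicted probability for this row is recovered as:
prob = logistic(phi.sum() + base_value)
print(round(prob, 4))  # logistic(0.35) ≈ 0.5866
```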
So if I understand this correctly, the predicted value should be the predicted probability of belonging to the correct class, while the base value should be the proportion of samples belonging to that class. In the example discussed in the notebook, 0.325 should then be the mean probability of the corresponding class. Upon investigation, I found that the data point passed to the plot (the 0th point of the test set) is of class 2. Class 2 has 44 samples, so its mean probability comes to 44/120 ≈ 0.367. Interestingly, the 0th class has 39 samples, and thus its mean probability is 39/120 = 0.325! Here is my notebook with this analysis.
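The class-proportion arithmetic above can be reproduced directly. This is a sketch, assuming a 120-sample training split of iris; the exact per-class counts depend on the split, so `random_state=0` here is illustrative, not necessarily the notebook's split:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# With a 120-sample training split of iris, the per-class sample
# proportions act as the "no information" prediction for each output.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, train_size=120, random_state=0)

counts = np.bincount(y_train)          # samples per class in the training split
proportions = counts / len(y_train)    # class proportions; these sum to 1
print(len(y_train), proportions)
```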
Sorry, I'm not the right person to give an informed explanation of the multinomial case. It does look like the examples you are referring to are probably using Kernel SHAP, which is based on LIME. In case you (or others) have not seen Kernel SHAP, it is described here, starting on p. 5: https://arxiv.org/pdf/1705.07874.pdf
Thanks @jphall663. I'll read up, but I think this question is distinct from the workings of Kernel SHAP.
Thanks @jphall663 and @ShubhamRathi for looking into this! The value 0.325 is the base value because you passed the expected value of the first class as the base_value in the notebook:
Whenever you explain a multi-class output model with Kernel SHAP you will get a list of shap_values arrays as the explanation, one for each of the outputs, and the expected_value attribute will then also be a vector. Note also that while the true label of this sample might be class 2, you are plotting the explanation of the output corresponding to the first class. For future reference, I should note this example is a bit unusual since it only explains a single sample; typically you would explain many samples at once.
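The multi-class output layout described above can be sketched without running an explainer. The names here mirror the structure (a list of per-class arrays plus a vector of base values) but the arrays are random placeholders, not the shap API itself:

```python
import numpy as np

# For a K-class model explained with Kernel SHAP, shap_values is a list
# of K arrays, each of shape (n_samples, n_features), and expected_value
# is a length-K vector of per-class base values.
n_samples, n_features, n_classes = 5, 4, 3
rng = np.random.default_rng(0)
shap_values = [rng.normal(size=(n_samples, n_features)) for _ in range(n_classes)]
expected_value = rng.normal(size=n_classes)

# To plot the explanation of output k for sample i, pair the base value
# of that class with that class's SHAP array:
k, i = 0, 0
base_k = expected_value[k]     # scalar base value for class k
phi_ik = shap_values[k][i]     # SHAP values of sample i for class k
print(phi_ik.shape)            # (4,)
```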
Thanks Scott! Helps clarify! 👍
Hi @slundberg, I also have a question about the expected output of the model. To my understanding, the base value / expected value should equal the average of the model's predicted outputs over the training dataset. However, the following code shows that the two values are not equal. Is there anything wrong?

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
import xgboost as xgb
import numpy as np

iris = datasets.load_iris()
data = iris.data[:100]
print(data.shape)
label = iris.target[:100]
print(label.shape)

train_x, test_x, train_y, test_y = train_test_split(data, label, random_state=0)
feature_names = [
    'sepal_length_(cm)',
    'sepal_width_(cm)',
    'petal_length_(cm)',
    'petal_width_(cm)']

dtrain = xgb.DMatrix(train_x, label=train_y, feature_names=feature_names)
dtest = xgb.DMatrix(test_x, feature_names=feature_names)
params = {'booster': 'gbtree',
          'objective': 'binary:logistic',
          'max_depth': 4,
          'eta': 0.1,
          'nthread': -1}
bst = xgb.train(params=params, dtrain=dtrain, num_boost_round=100)

# The last column of pred_contribs holds the bias (base value)
shap_values = bst.predict(dtrain, pred_contribs=True)
print('base value is %s' % shap_values[0, -1])  # 0.0538114

log_odds = bst.predict(dtrain, output_margin=True)
print('average output of training dataset is %s' % np.mean(log_odds))  # 0.0614985
```
@hcz28 That gets at a tricky limitation of how XGBoost models are implemented. XGBoost does not actually record how many samples went through each node; it records only the sum of the hessians. Throughout the XGBoost code base, the "sum of the hessians" is used wherever you would normally use the "number of samples". For the linear regression loss these are the same values, but not for the logistic loss (or other losses). Changing this would require modifying XGBoost to record the sample counts when it builds a model. See also #29 (comment)
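The hessian-versus-count mismatch above is easy to verify. For the logistic loss, the per-sample hessian with respect to the margin is p(1 - p), which is at most 0.25, so summing hessians systematically undercounts samples (illustrative probabilities below):

```python
import numpy as np

# For squared-error loss the per-sample hessian is a constant, so summing
# hessians is proportional to counting samples; for logistic loss the
# hessian is p * (1 - p) < 1, so the "cover" stored per node is NOT the
# sample count.
p = np.array([0.5, 0.8, 0.9, 0.99])   # predicted probabilities at a node
hess_logistic = p * (1 - p)

n_samples = len(p)                     # 4
print(n_samples, hess_logistic.sum())  # sum of hessians = 0.5099, far below 4
```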
Thank you very much for your answer!
@slundberg I have a question: how is the base value computed? I have 165 samples and two classes, [1, 0]. Class 1 has 85 samples, and when I use force_plot the base value is 0.456.
@NancyLele the base value is the mean of the model output over the background dataset, so it depends on what your model output is and what the background dataset is. For TreeExplainer the background dataset is typically the training dataset, and the model output depends on the type of problem (it is log-odds for XGBoost by default).
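A small numeric sketch of the point above, with made-up probabilities (not the shap API): for a model whose output is in log-odds, the base value is the mean margin, which is not the same number as the mean probability or the class proportion.

```python
import numpy as np

def logit(p):
    """Log-odds transform."""
    return np.log(p / (1 - p))

# Hypothetical model outputs over a tiny background dataset:
probs = np.array([0.2, 0.4, 0.7, 0.9])

base_value = logit(probs).mean()   # mean in log-odds space ≈ 0.3132
mean_prob = probs.mean()           # mean in probability space = 0.55

print(round(base_value, 4), round(mean_prob, 4))
# Note that logistic(base_value) != mean_prob in general, because the
# mean does not commute with the nonlinear logit transform.
```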
@slundberg So I still don't understand the base value.
@NancyLele
@slundberg thank you for your great explanation; however, after reading all the feedback I still do not understand what the predicted value is. I am working on a binary classification problem. Computing these, I get that the expected value is 0.78 and the predicted output is 0.89. However, with link='logit' the predicted output is 0.71, as seen in the figure below, and when using the default link value the predicted output is 0.89. Suppose the first class is 0; this means the probability of this instance belonging to class 0 is 71%, so what does the value 0.89 represent? One more issue: when using XGBoost with 'logit' I only get the predicted output to be 0 or 1, and no feature names appear on the force plot when the predicted output shown on the plot is 0.
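One possible reconciliation of the two numbers quoted above (this is an assumption about the units, not something the figures confirm): if the model output is in log-odds, the default force plot shows the raw margin (0.89), while link='logit' maps it through the inverse logit to a probability:

```python
import math

# If 0.89 is a log-odds margin, the link='logit' plot would show the
# corresponding probability:
margin = 0.89
prob = 1.0 / (1.0 + math.exp(-margin))
print(round(prob, 2))  # ≈ 0.71, matching the link='logit' value
```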
Hello, can I say that the DeepExplainer force plot is not well suited to regression on time-series data, where the multiple points of each output vector represent consecutive time steps rather than the distinct classes used in classification problems? Thanks.
Hi there, I just wanted to clarify: is the base value the expectation of the model output over the entire background dataset, or just over the background dataset excluding the sample x that we are currently trying to explain?
AFAIK no, it doesn't include the sample x, unless the sample x you are currently trying to explain happens to belong to the training set. But generally you want to explain samples that don't belong to your training set.
Hi, I am using LogisticRegression, and the base value shown in the waterfall plot differs from the mean prediction over the training samples.

```python
f = LogisticRegression()
shap_values = explainer(X_test)
```

The base_value is 1.264 (I can't see how it can be greater than 1), while f.predict(X_train).mean() returns 0.6432160804020101.
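Two likely sources of the mismatch above (assumptions, since the full notebook is not shown): the waterfall's base value may be in log-odds units, in which case values above 1 are perfectly fine; and f.predict() returns hard 0/1 labels, so its mean is the fraction of samples predicted as class 1, not a mean probability. Converting the quoted base value to probability units:

```python
import math

# The quoted base value, interpreted as a log-odds margin:
base_value_log_odds = 1.264
base_value_prob = 1.0 / (1.0 + math.exp(-base_value_log_odds))
print(round(base_value_prob, 4))  # the same base value in probability units
```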
I've been trying to understand the SHAP tool and have been puzzled by two terms: the base value and the predicted value.
The documentation only vaguely explains what the base value is: it says the base value is 'the average model output over the training dataset we passed'. For a classification problem such as this one, I don't understand the notion of a base value or a predicted value, since the prediction of a classifier is a discrete categorization.
In this example, which applies SHAP to a classification task on the iris dataset, the diagram plots the base value (0.325) and the predicted value (0.00).
In the original paper, the base value E[f(z)] is the value that would be predicted if we did not know any features for the current output.
What does this mean, especially for a classification problem such as the one in the example?
This is a common point of confusion; this forum has an ongoing discussion about it, and clarity here would benefit many.
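One way to reconcile "discrete classifier" with a continuous base value, sketched below: SHAP explains a continuous model output, and for probability-space explanations of a classifier that output is predict_proba, so the base value is its mean over the background data, one value per class (illustrative model and data; the original issue's notebook may differ):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Even though .predict() returns a discrete label, the quantity being
# explained is continuous: here, the per-class predicted probability.
iris = load_iris()
clf = LogisticRegression(max_iter=1000).fit(iris.data, iris.target)

probs = clf.predict_proba(iris.data)   # shape (150, 3), continuous outputs
base_values = probs.mean(axis=0)       # one base value per class output
print(base_values.shape)               # (3,)
```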