Interpretation of Base Value and Predicted Value in SHAP Plots #352

Open
ShubhamRathi opened this issue Dec 10, 2018 · 19 comments

@ShubhamRathi

I've been trying to understand the SHAP tool and have been puzzled by two terms: the base value and the predicted value.
The documentation only vaguely describes the base value: it says the base value is 'the average model output over the training dataset we passed'. For a classification problem such as this one, I don't understand the notion of a base value or a predicted value, since a classifier's prediction is a discrete categorization.
In this example, which applies SHAP to a classification task on the IRIS dataset, the plot shows a base value of 0.325 and a predicted value of 0.00.

In the original paper, the base value E[f(z)] is 'the value that would be predicted if we did not know any features' for the current output.
What does this mean, especially for a classification problem such as the one in the example?

This is a common point of confusion; this forum thread has an ongoing discussion about it. Clarity on this would benefit many.

@jphall663

jphall663 commented Dec 10, 2018

I believe base value is, as you say, "the value that would be predicted if we did not know any features [for] the current output" - which is just the mean prediction, or mean(yhat) = sum(yhat_i)/N for all rows x_i.

Each individual Shapley value, phi_ij for some feature j and some row x_i, is interpreted as follows: the feature value x_ij contributed phi_ij towards the prediction, yhat_i, for instance x_i compared to the average prediction for the dataset, i.e. mean(yhat).

Now, you may be wondering why these values are larger than expected for binary targets, or why they are not in probability units. By default, for binary classification the Shapley values are displayed in logit space, such that: logistic(sum(phi_ij) + mean(yhat)) = probability_i, for all non-missing features j and row x_i.
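
As a concrete illustration, here is a minimal, hedged sketch (using XGBoost's built-in pred_contribs output on a toy dataset of my choosing, not the notebook from the question above) that the SHAP values plus the base value live in log-odds space and recover the predicted probability after a logistic transform:

import numpy as np
import xgboost as xgb
from scipy.special import expit
from sklearn.datasets import load_breast_cancer

# Small binary classification problem
X, y = load_breast_cancer(return_X_y=True)
dtrain = xgb.DMatrix(X, label=y)
bst = xgb.train({'objective': 'binary:logistic', 'max_depth': 3}, dtrain, num_boost_round=50)

# pred_contribs returns one SHAP value per feature plus the base value in the last column
contribs = bst.predict(dtrain, pred_contribs=True)
margin_from_shap = contribs.sum(axis=1)   # sum(phi_ij) + base value, in log-odds
proba = bst.predict(dtrain)               # predicted probabilities

print(np.allclose(expit(margin_from_shap), proba, atol=1e-5))  # expected: True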

See discussion here for more info: #350

Also, aside from academic literature I found these two resources very helpful in explaining/understanding Shapley values:
https://christophm.github.io/interpretable-ml-book/shapley.html
https://medium.com/@gabrieltseng/interpreting-complex-models-with-shap-values-1c187db6ec83

@ShubhamRathi
Author

ShubhamRathi commented Dec 11, 2018

So if I understand this correctly, the predicted value should be the predicted probability of belonging to the relevant class, while the base value should be the proportion of training samples belonging to that class.

That is, in the example discussed in the notebook, 0.325 should be the mean probability of the corresponding class. Upon investigation, I found that the data point passed to the plot (the 0th point of the test set) is of class 2. Class 2 has 44 samples, so its mean probability comes to 0.36.

Interestingly, the 0th class has 39 samples, so its mean probability is 39/120 = 0.325!
This makes me think: is the SHAP tool displaying the prediction for the 0th data point but showing the base value for the wrong class? If so, that is a flaw in the SHAP tool; it should show the base value for the 2nd class but is instead showing it for the 0th class. Is this the case?

Here is my notebook which has this analysis.

@jphall663

Sorry, I'm not the right person to give an informed explanation on the multinomial case.

It does look like the examples you are referring to are probably using Kernel SHAP, which is based on LIME. In case you (or others) have not seen Kernel SHAP, it is described here: https://arxiv.org/pdf/1705.07874.pdf, starting on p. 5.

@ShubhamRathi
Author

ShubhamRathi commented Dec 11, 2018

Thanks @jphall663. I shall read up, but I believe this question is distinct from the workings of Kernel SHAP.
@slundberg Please let us know what the case is with multinomial classification.

@slundberg
Collaborator

Thanks @jphall663 and @ShubhamRathi for looking into this! The value 0.325 is the base value because you passed the expected value of the first class as the base_value in the notebook:

shap.force_plot(explainer.expected_value[0], shap_values[0], X_test.iloc[0,:])

The three arguments to force_plot above represent the expected_value of the first class, the SHAP values of the first class prediction model, and the data of the first sample row. If you want to explain the output of the second class you would need to change the index to 1 for the first two arguments.

Whenever you explain a multi-class output model with KernelSHAP you will get a list of shap_value arrays as the explanation, one for each of the outputs. The expected_value attribute will also then be a vector. Note also that while the true label of this sample might be class 2, you are plotting the explanation of the output corresponding to the first class.

For future reference I should note this example is a bit unusual since it only explains a single sample. Typically you would need to do shap_values[0][0,:] to get this effect if you explained a matrix of samples (and so had a matrix of shap_values).
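
For readers following along, a brief hedged sketch of what that indexing change looks like (assuming the variable names from the notebook above):

# Explain the output for the second class (index 1) instead of the first (index 0)
shap.force_plot(explainer.expected_value[1], shap_values[1], X_test.iloc[0, :])

# If a whole matrix of samples had been explained, shap_values[1] would itself be a matrix,
# so the row would need to be selected as well, e.g.:
# shap.force_plot(explainer.expected_value[1], shap_values[1][0, :], X_test.iloc[0, :])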

@ShubhamRathi
Author

Thanks Scott! Helps clarify! 👍

@hcz28

hcz28 commented Jan 19, 2019

Hi @slundberg, I also have a question about the expected output of the model. To my understanding, the base value / expected value equals the average of the model's predicted outputs over the training dataset. However, the following code shows that the two values are not equal. Is there anything wrong?

from sklearn import datasets
from sklearn.model_selection import train_test_split  # sklearn.cross_validation is deprecated
import xgboost as xgb
import numpy as np

# Keep only the first two iris classes to get a binary problem
iris = datasets.load_iris()
data = iris.data[:100]
print(data.shape)

label = iris.target[:100]
print(label.shape)

train_x, test_x, train_y, test_y = train_test_split(data, label, random_state=0)

feature_names = [
    'sepal_length_(cm)',
    'sepal_width_(cm)',
    'petal_length_(cm)',
    'petal_width_(cm)']

dtrain = xgb.DMatrix(train_x, label=train_y, feature_names=feature_names)
dtest = xgb.DMatrix(test_x, feature_names=feature_names)

params = {'booster': 'gbtree',
          'objective': 'binary:logistic',
          'max_depth': 4,
          'eta': 0.1,
          'nthread': -1}

bst = xgb.train(params=params, dtrain=dtrain, num_boost_round=100)

# SHAP values computed by XGBoost itself; the last column is the base (expected) value
shap_values = bst.predict(dtrain, pred_contribs=True)

print('base value is %s' % shap_values[0, -1])  # 0.0538114

# Raw log-odds output of the model on the training set
log_odds = bst.predict(dtrain, output_margin=True)

print('average output of training dataset is %s' % np.mean(log_odds))  # 0.0614985

@slundberg
Collaborator

@hcz28 That gets at a tricky limitation of how XGBoost models are implemented. XGBoost does not actually record how many samples went through each node; it only records the sum of the hessians. Throughout the XGBoost code base, the "sum of the hessians" is used wherever you would normally use the "number of samples". For the linear regression loss these are the same values, but not for the logistic loss (or other losses). Changing this would require changing XGBoost to record the number of samples when it builds a model. See also #29 (comment)
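
As a hedged illustration of that point (on synthetic data of my choosing, not from this thread): with a squared-error objective the hessian is constant for every sample, so "sum of hessians" coincides with the sample count and the recorded base value should come out close to the mean raw output over the training data, unlike in the logistic example above.

import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression

# Synthetic regression data so the example is self-contained
X, y = make_regression(n_samples=500, n_features=5, noise=0.1, random_state=0)
dtrain = xgb.DMatrix(X, label=y)

# Squared-error loss: the hessian is 1 for every sample
bst = xgb.train({'objective': 'reg:squarederror', 'max_depth': 4, 'eta': 0.1},
                dtrain, num_boost_round=100)

base_value = bst.predict(dtrain, pred_contribs=True)[0, -1]   # last column is the base value
mean_margin = bst.predict(dtrain, output_margin=True).mean()  # average raw model output

print(base_value, mean_margin)  # these should be close for the squared-error objective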

@hcz28

hcz28 commented Jan 21, 2019

Thank you very much for your answer!

slundberg added a commit that referenced this issue Jan 22, 2019
@NancyLele

@slundberg I have a question: how is the base value computed? I have 165 samples and two classes, [1, 0]. Class 1 has 85 samples. When I use the force_plot function, the base value is 0.456.

@slundberg
Collaborator

@NancyLele the base value is the mean of the model output over the background dataset. So it depends on what your model output is and what the background dataset is. For TreeExplainer the background dataset is typically the training dataset, and the model output depends on the type of problem (it is log-odds for XGBoost by default).

@NancyLele

@slundberg
I used LightGBM, and the model output is a probability. I used TreeExplainer, so the background dataset is the training dataset. I computed the mean probability over the training dataset: the mean is 0.504, not 0.456.


So I still don't understand the base value.

@floidgilbert
Contributor

floidgilbert commented Jun 17, 2019

@NancyLele
Nancy, I also use LightGBM, and this code works for me (note that it compares the explainer's expected value against the mean raw score, not the mean probability):

print(explainer.expected_value)              # base value reported by the explainer
ytp_raw = model.predict(Xt, raw_score=True)  # raw (pre-sigmoid) model output
print(np.mean(ytp_raw))                      # mean raw score, to compare with the value above

@DaliaJaber

DaliaJaber commented Jan 11, 2020

@slundberg thank you for your great explanation. However, having read all of the feedback above, I still do not understand what the predicted value is. I am working with a binary classification problem.
With a RandomForestClassifier, I use the force plot as below with link='logit':
shap.force_plot(explainer.expected_value[0], shap_values[0][Index], X_test.iloc[Index,:], link='logit')
and I calculate the predicted output as follows:
explainer.expected_value[0] + shap_values[0][Index].sum()

Computing these, I get that the expected value is 0.78 and the predicted output is 0.89. However, with link='logit' the predicted output shown is 0.71, as seen in the figure below:
[Figure_3: force plot with link='logit']

and when using the default link value the predicted output shown is 0.89:
[Figure_2: force plot with the default link]

Suppose the first class is 0; does this mean that the probability of this instance belonging to class 0 is 71%, and what does the value 0.89 represent?
And in this case, is each feature shown in blue pushing the prediction away from the actual predicted value (in my case, the actual predicted value is 0), while all the pink features are the ones increasing the probability of this instance belonging to class 0?

A separate issue: when using XGBoost with link='logit', I only get a predicted output of 0 or 1, and no feature names appear on the force plot when the predicted output shown on the plot is 0.

[screenshot: force plot showing predicted output 0 with no feature names]
Compared to:
[screenshot: force plot showing feature names]

@annezhangxue

Hello, can I say that the DeepExplainer force plot is not well suited to regression on time-series data, where the multiple points of each output vector represent consecutive time steps rather than the actual classes used in classification problems? Thanks.

@alexmirrington

Hi there, I just wanted to clarify: is the base value the expectation of the model output over the entire background dataset, or only over the background dataset excluding the sample x that we are currently trying to explain?

@marcosbd

Hi there, I just wanted to clarify: is the base value the expectation of the model output over the entire background dataset, or only over the background dataset excluding the sample x that we are currently trying to explain?

AFAIK no, it doesn't include the sample x, unless the sample x you are currently trying to explain happens to belong to the training set. But generally you want to explain samples that don't belong to your training set.

@amylee-lixinyi

amylee-lixinyi commented Dec 12, 2023

Hi. If shap.kmeans(X_train_scaled, 10) is used to generate the background data, how is the base_value calculated then (for regression)?

@matifq

matifq commented Dec 13, 2023

Hi, I am using LogisticRegression and looking at the base value via the waterfall plot, and it differs from the mean prediction over the training samples.

import shap
from sklearn.linear_model import LogisticRegression

f = LogisticRegression()
f.fit(X_train, y_train)

# X_train also serves as the background data for the explainer
explainer = shap.Explainer(f, X_train)

shap_values = explainer(X_test)
shap.plots.waterfall(shap_values[0])

base_value is 1.264 (I don't understand how it can be greater than 1)

f.predict(X_train).mean() # this returns 0.6432160804020101
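
A hedged note in case it helps: assuming the explainer here operates in the model's log-odds (margin) space rather than in label or probability space, the base value should be compared against the mean decision-function output over the background data, not against the mean of the predicted class labels. A minimal sketch of that check, using the variable names above:

from scipy.special import expit

print(f.decision_function(X_train).mean())  # mean raw margin; should be close to the reported base value

print(expit(1.264))  # ~0.78: the reported base value expressed as a probability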
