TreeSHAP Values are inconsistent #670

Closed

ntrost-targ opened this issue Jun 9, 2021 · 11 comments

@ntrost-targ

Describe the bug
When calculating TreeSHAP values for random forest classification, they don't add up. I would expect the prediction from .vote() minus the respective SHAP values to give the base value, which is constant and should be the same across observations. Note that this is the behaviour we observe in Lundberg's Python shap module. It would also be really handy to have a function that just calculates the base value (expected_value in Python).
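In the meantime, a base value analogous to Python's expected_value could be approximated by hand as the mean predicted posterior over the data. A minimal sketch in the Smile Scala shell, assuming the model and iris from the snippet below:

// Hypothetical sketch: approximate the base value (expected_value) of a class
// as the mean predicted posterior probability over the whole dataset.
def baseValue(cls: Int, k: Int = 3): Double = {
  val posteriors = (0 until iris.size).map { i =>
    val prob = new Array[Double](k)
    model.vote(iris(i), prob) // fills prob with the class posteriors
    prob(cls)
  }
  posteriors.sum / posteriors.size
}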

Expected behavior
When calculating TreeSHAP values, I expect them, together with the base value, to add up to the predicted probability.

Actual behavior
The calculated base values vary even for observations from the same class.

Code snippet

val iris = read.arff("../data/weka/iris.arff")

val formula: Formula = "class" ~  // response "class", all other columns as predictors

val model = smile.classification.randomForest(formula, iris)

// vote() fills the array with the per-class posterior probabilities
val arr50 = new Array[Double](3)
val arr52 = new Array[Double](3)
model.vote(iris(50), arr50)
model.vote(iris(52), arr52)

// shap() returns a flattened array; the 3 class components of each
// feature's SHAP value are stored consecutively
val shap_50 = model.shap(iris(50))
val shap_52 = model.shap(iris(52))

// posterior of class 1 minus the sum of its SHAP components
// (indices 1, 4, 7, 10, i.e. every 3rd entry starting at 1)
arr50(1) - shap_50.indices.filter(x => (x + 2) % 3 == 0).map(shap_50).sum
// res15: Double = 0.41123849878987584
arr52(1) - shap_52.indices.filter(x => (x + 2) % 3 == 0).map(shap_52).sum
// res16: Double = 0.4571260444068466

Input data
Iris data set

Additional context

  • using Smile from the Try-It-Online binder
@haifengl (Owner) commented Jun 9, 2021

What if you use model.predict() instead of model.vote()?

@ntrost-targ (Author)

When I tried it, model.predict() gave me only the most probable class as output, not the class probabilities.

@ntrost-targ (Author)

I also tried model.score(), but that threw an error.

@haifengl (Owner)

predict is overloaded. Try predict(x, prob), where prob is an output array that receives the posterior probabilities.
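For example, a minimal sketch (assuming the model and iris from the snippet above):

val prob = new Array[Double](3)
val label = model.predict(iris(50), prob)  // prob now holds the class posteriors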

@ntrost-targ (Author)

I tried; the problem persists, albeit with smaller variance. I now get 0.3267 vs. 0.3289, which is a more realistic value given the balanced three-class data set. With Lundberg's package, the variance is much smaller.

@haifengl (Owner)

I don't understand this part:

I would expect the prediction from .vote() minus the respective SHAP values to give the base value, which is constant

Why? Do you have a link or a paper to support this?

@mroettig

Hi,

I am a colleague of ntrost-targ. Our assumption comes from the local accuracy / additivity property of explainability (see https://ema.drwhy.ai/breakDown.html#BDMethodGen and the Titanic example there, with base value 0.2353095 and the posterior probabilities as f(x)): the posterior probability of any sample x is the sum of a common base probability (the base SHAP value \phi_0, i.e. the mean class posterior over the full dataset) and the local attribution effects (the local SHAP values from model.shap(iris(50))) for that sample.

That is,

f(x) = \phi_0 + \sum_{i=1}^{M} \phi_i(x) = p(x), the class posterior

(see also https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7326367/#S10title, Property 1).

We just wanted to assert equality with the Python (Lundberg) implementation, so we did the reverse computation

\phi_0 = p(x) - \sum_{i=1}^{M} \phi_i(x),

which should give an (approximately) constant \phi_0 for all samples, modulo numerical issues. The Python implementation always gives the same \phi_0, diverging only from the fourth decimal place onwards; the SMILE values start to diverge from the third decimal place. The range for \phi_0 in one setting was [0.50026, 0.50597]. Might be nitpicking here ;) We were just wondering.
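A minimal sketch of that check over the whole dataset (assuming the model, formula, and iris from the snippet above, and assuming the flattened class-major layout of the shap() output):

// Hypothetical check: recover phi_0 = p(x) - sum_i phi_i(x) for every sample
// and look at its spread; it should be (nearly) constant.
val cls = 1  // class of interest
val phi0 = (0 until iris.size).map { i =>
  val prob = new Array[Double](3)
  model.vote(iris(i), prob)
  val shap = model.shap(iris(i))
  prob(cls) - shap.indices.filter(_ % 3 == cls).map(shap).sum
}
println(s"min = ${phi0.min}, max = ${phi0.max}")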

Cheers,
Marc

@haifengl (Owner)

Hi Marc, thanks for the explanation. Although not 100% sure, I think this small difference comes from the smoothing of the posterior probability. Depending on the leaf node size, this smoothing may have a slightly different impact on the posterior probability calculation.

If you choose two samples hitting the same leaf node, I guess this difference will be smaller. It is hard to know whether two samples arrive at the same leaf node. As a workaround, I suggest you compute the difference on all the samples of one class; I guess you will find several clusters of values with only tiny differences within each cluster, as in the sketch below.
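A rough sketch of that workaround (hypothetical code, reusing the model and data from above; Iris rows 50-99 are the versicolor class):

// Collect the recovered base values for all samples of one class and sort
// them; clusters of near-equal values would suggest shared leaf nodes.
val diffs = (50 until 100).map { i =>
  val prob = new Array[Double](3)
  model.vote(iris(i), prob)
  val shap = model.shap(iris(i))
  prob(1) - shap.indices.filter(_ % 3 == 1).map(shap).sum
}
diffs.sorted.foreach(println)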

@ntrost-targ (Author) commented Jun 11, 2021

Hi Haifeng,

I tried the same calculation with gradient boosted trees (smile.classification.gbm(formula, iris)) but arrive at 0.59 vs. 0.69 (which is also an odd value in absolute terms; I'm expecting roughly 0.33). Also, the variance over the whole Iris dataset in Python is negligible, on the order of 1e-16; see the script below. Local explainability with SHAP is very important for us, and we would be grateful if you took a deeper look into the issue. As far as I can tell, any numerical issues should produce far smaller variance than what I see here.

import shap
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

iris = datasets.load_iris()
clf = RandomForestClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)  # the model must be fitted before explaining
explainer = shap.TreeExplainer(clf)

probs = clf.predict_proba(iris.data).transpose()[0]  # class-0 posteriors
shap_values = explainer.shap_values(iris.data)       # one array per class

# recover the base value of class 0 from every sample
b_0 = np.array([probs[i] - shap_values[0][i].sum() for i in range(150)])

b_0.std()
# 1.6922557229846184e-16

b_0.mean()
# 0.32906666666666645

explainer.expected_value
# array([0.32906667, 0.33373333, 0.3372    ])

Notice how the backward calculation matches explainer.expected_value for the first class.

Best,
Nikolaus :)

@mroettig

Hi Haifeng,

I just came across the Commercial License Usage clause in the SMILE license, which applies when using SMILE in a commercial setting (i.e. incorporating SMILE into commercial products). However, I could not find any further details on the website regarding the modalities and costs of a commercial license.

Could you give us details on that topic and on when commercial licensing is required? And, as commercial subscribers, could we request a deeper look into our SHAP issue on your side?

Thanks a lot in advance + Cheers,
Marc

@haifengl (Owner)

@mroettig please contact me by email.

haifengl closed this as completed Feb 1, 2024