TreeSHAP Values are inconsistent #670

Closed

ntrost-targ opened this issue Jun 9, 2021 · 11 comments

@ntrost-targ

Describe the bug
When calculating TreeSHAP values for random forest classification, they don't add up. I would expect the prediction from .vote() minus the respective SHAP values to give the base value, which is constant and should be the same across observations. Note that this is the behaviour we observe in Lundberg's Python shap module. It would also be really handy to have a function that just calculates the base value (expected_value in Python).
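In the meantime, a base value analogous to Python's expected_value could be approximated by hand as the mean predicted posterior over the data. A minimal sketch in the Smile Scala shell, assuming the model and iris from the snippet below:

// Hypothetical sketch: approximate the base value (expected_value) of a class
// as the mean predicted posterior probability over the whole dataset.
def baseValue(cls: Int, k: Int = 3): Double = {
  val posteriors = (0 until iris.size).map { i =>
    val prob = new Array[Double](k)
    model.vote(iris(i), prob) // fills prob with the class posteriors
    prob(cls)
  }
  posteriors.sum / posteriors.size
}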

Expected behavior
When calculating TreeSHAP values, I expect them, together with the base value, to add up to the predicted probability.

Actual behavior
The calculated base values vary even for observations from the same class.

Code snippet

val iris = read.arff("../data/weka/iris.arff")

val formula: Formula = "class" ~  // response "class", all other columns as predictors

val model = smile.classification.randomForest(formula, iris)

// vote() fills the array with the per-class posterior probabilities
val arr50 = new Array[Double](3)
val arr52 = new Array[Double](3)
model.vote(iris(50), arr50)
model.vote(iris(52), arr52)

// shap() returns a flattened array; the 3 class components of each
// feature's SHAP value are stored consecutively
val shap_50 = model.shap(iris(50))
val shap_52 = model.shap(iris(52))

// posterior of class 1 minus the sum of its SHAP components
// (indices 1, 4, 7, 10, i.e. every 3rd entry starting at 1)
arr50(1) - shap_50.indices.filter(x => (x + 2) % 3 == 0).map(shap_50).sum
// res15: Double = 0.41123849878987584
arr52(1) - shap_52.indices.filter(x => (x + 2) % 3 == 0).map(shap_52).sum
// res16: Double = 0.4571260444068466

Input data
Iris data set

Additional context

  • using Smile from the Try-It-Online binder
@haifengl (Owner) commented Jun 9, 2021

What if you use model.predict() instead of model.vote()?

@ntrost-targ (Author)

When I tried it, model.predict() gave me only the most probable class as output, not the class probabilities.

@ntrost-targ (Author)

I also tried model.score(), but that threw an error.

@haifengl (Owner)

predict is overloaded. Try predict(x, prob), where prob is an output array that receives the posterior probabilities.
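For example, a minimal sketch (assuming the model and iris from the snippet above):

val prob = new Array[Double](3)
val label = model.predict(iris(50), prob)  // prob now holds the class posteriors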

@ntrost-targ (Author)

I tried; the problem persists, albeit with smaller variance. I now get 0.3267 vs. 0.3289, which is a more realistic value given the balanced three-class data set. With Lundberg's package, the variance is much smaller.

@haifengl (Owner)

I don't understand this part:

I would expect the prediction from .vote() minus the respective SHAP values to give the base value, which is constant

Why? Do you have a link or a paper to support this?

@mroettig

Hi,

I am a colleague of ntrost-targ. Our assumption comes from the local accuracy / additivity property of explainability (see https://ema.drwhy.ai/breakDown.html#BDMethodGen and the Titanic example there, with base value 0.2353095 and the posterior probabilities as f(x)): the posterior probability of any sample x is the sum of a common base probability (the base SHAP value \phi_0, i.e. the mean class posterior over the full dataset) and the local attribution effects (the local SHAP values from model.shap(iris(50))) for that sample.

That is,

f(x) = \phi_0 + \sum_{i=1}^{M} \phi_i(x) = p(x), the class posterior

(see also https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7326367/#S10title, Property 1).

We just wanted to assert equality with the Python (Lundberg) implementation, so we did the reverse computation

\phi_0 = p(x) - \sum_{i=1}^{M} \phi_i(x),

which should give an (approximately) constant \phi_0 for all samples, modulo numerical issues. The Python implementation always gives the same \phi_0, diverging only from the fourth decimal place onwards; the SMILE values start to diverge from the third decimal place. The range for \phi_0 in one setting was [0.50026, 0.50597]. Might be nitpicking here ;) We were just wondering.
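A minimal sketch of that check over the whole dataset (assuming the model, formula, and iris from the snippet above, and assuming the flattened class-major layout of the shap() output):

// Hypothetical check: recover phi_0 = p(x) - sum_i phi_i(x) for every sample
// and look at its spread; it should be (nearly) constant.
val cls = 1  // class of interest
val phi0 = (0 until iris.size).map { i =>
  val prob = new Array[Double](3)
  model.vote(iris(i), prob)
  val shap = model.shap(iris(i))
  prob(cls) - shap.indices.filter(_ % 3 == cls).map(shap).sum
}
println(s"min = ${phi0.min}, max = ${phi0.max}")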

Cheers,
Marc

@haifengl (Owner)

Hi Marc, thanks for the explanation. Although not 100% sure, I think this small difference comes from the smoothing of the posterior probability. Depending on the leaf node size, this smoothing may have a slightly different impact on the posterior probability calculation.

If you choose two samples hitting the same leaf node, I guess this difference will be smaller. It is hard to know whether two samples arrive at the same leaf node. As a workaround, I suggest you compute the difference on all the samples of one class; I guess you will find several clusters of values with only tiny differences within each cluster, as in the sketch below.
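A rough sketch of that workaround (hypothetical code, reusing the model and data from above; Iris rows 50-99 are the versicolor class):

// Collect the recovered base values for all samples of one class and sort
// them; clusters of near-equal values would suggest shared leaf nodes.
val diffs = (50 until 100).map { i =>
  val prob = new Array[Double](3)
  model.vote(iris(i), prob)
  val shap = model.shap(iris(i))
  prob(1) - shap.indices.filter(_ % 3 == 1).map(shap).sum
}
diffs.sorted.foreach(println)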

@ntrost-targ (Author) commented Jun 11, 2021

Hi Haifeng,

I tried the same calculation with gradient boosted trees (smile.classification.gbm(formula, iris)) but arrive at 0.59 vs. 0.69 (which is also an odd value in absolute terms; I'm expecting roughly 0.33). Also, the variance over the whole Iris dataset in Python is negligible, on the order of 1e-16; see the script below. Local explainability with SHAP is very important for us, and we would be grateful if you took a deeper look into the issue. As far as I can tell, any numerical issues should produce far smaller variance than what I see here.

import shap
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

iris = datasets.load_iris()
clf = RandomForestClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)  # the model must be fitted before explaining
explainer = shap.TreeExplainer(clf)

probs = clf.predict_proba(iris.data).transpose()[0]  # class-0 posteriors
shap_values = explainer.shap_values(iris.data)       # one array per class

# recover the base value of class 0 from every sample
b_0 = np.array([probs[i] - shap_values[0][i].sum() for i in range(150)])

b_0.std()
# 1.6922557229846184e-16

b_0.mean()
# 0.32906666666666645

explainer.expected_value
# array([0.32906667, 0.33373333, 0.3372    ])

Notice how the backward calculation matches explainer.expected_value for the first class.

Best,
Nikolaus :)

@mroettig

Hi Haifeng,

I just came across the Commercial License Usage clause in the SMILE license, which applies when using SMILE in a commercial setting (i.e. incorporating SMILE into commercial products). However, I could not find any further details on the website regarding the modalities and costs of a commercial license.

Could you give us details on that topic and on when commercial licensing is required? And, as commercial subscribers, could we request a deeper look into our SHAP issue on your side?

Thanks a lot in advance + Cheers,
Marc

@haifengl (Owner)

@mroettig please contact me by email.

haifengl closed this as completed Feb 1, 2024