BUG: Strange base value of exact explainer #3174

Closed
mayer79 opened this issue Aug 1, 2023 · 8 comments
Labels: bug (indicates an unexpected problem or unintended behaviour)

Comments

mayer79 commented Aug 1, 2023

Issue Description

I expect the baseline of the exact explainer to equal the average prediction on the background data (masker). However, there are cases where this does not hold; see the example below.

Minimal Reproducible Example

import numpy as np
import pandas as pd
import shap

n = 101
x = np.arange(n)
X = pd.DataFrame(dict(x1=x, x2=np.flip(x)))
print(X.x1.mean())  # 50
X.head()

def true_model(X):
    return X.x1

ex = shap.Explainer(true_model, masker=X, algorithm="exact")

ps = ex(X[0:3])
ps

# Output
#.values =
#array([[-50.06,   0.  ],
#       [-49.06,   0.  ],
#       [-48.06,   0.  ]])
#
#.base_values =
#array([50.06, 50.06, 50.06])
#
#.data =
#array([[  0, 100],
#       [  1,  99],
#       [  2,  98]])

Expected Behavior

The average of 0, ..., 100 is 50, so I'd expect the baseline to equal 50.

Running the above example with n = 100 gives a baseline of 49.5 (as expected).
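
For reference, the expected baseline here is just the mean model output over the full background data, which can be checked directly with the objects defined above:

# Mean prediction over the full background data, i.e. the value that
# base_values is expected to match.
print(true_model(X).mean())  # 50.0 for n = 101 (and 49.5 for n = 100)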

Bug report checklist

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest release of shap.
  • I have confirmed this bug exists on the master branch of shap.
  • I'd be interested in making a PR to fix this bug

Installed Versions

0.42.1

mayer79 added the bug label on Aug 1, 2023
znacer (Contributor) commented Aug 14, 2023

Hi,
This seems to be caused by a hard-coded limit on the masker size.
The class handling Tabular maskers has a limit (max_samples) set to 100. Above this limit, only a subsample of the masker dataset is considered.

A possible fix could be to add a max_samples property to Explainer, making it possible for users to tune this limit.
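
A quick way to see the subsampling in action (a sketch, assuming the Independent masker keeps its default max_samples=100 and exposes the possibly subsampled background as a .data array):

# Wrapping the 101-row background in the default Independent masker
# subsamples it down to max_samples=100 rows, so the background mean
# (and hence the base value) drifts away from the exact 50.
masker = shap.maskers.Independent(X)   # default max_samples=100
print(masker.data.shape)               # expected: (100, 2)
print(masker.data[:, 0].mean())        # typically not exactly 50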

mayer79 (Author) commented Aug 17, 2023

You are right, it affects even linear explainers.
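
For completeness, a sketch of how the same cap could show up with a linear explainer (assuming scikit-learn is available and that a plain DataFrame background goes through the same Tabular masker path):

from sklearn.linear_model import LinearRegression

# Fit a model that reproduces x1 exactly, then explain it against the
# same 101-row background; the expected value would again drift from 50.
lin = LinearRegression().fit(X, X.x1)
ex_lin = shap.LinearExplainer(lin, X)
print(ex_lin.expected_value)  # typically not exactly 50 once n > 100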

CloseChoice (Collaborator) commented
@znacer That sounds like a good approach to me; I implemented your proposal.

mayer79 (Author) commented Sep 29, 2023

Awesome, thanks @znacer and @CloseChoice

connortann (Collaborator) commented
This seems to be caused by a hard-coded limit on the masker size. The class handling Tabular maskers has a limit (max_samples) set to 100. Above this limit, only a subsample of the masker dataset is considered.

A possible fix could be to add a max_samples property to Explainer, making it possible for users to tune this limit.

The explainer object already accepts a masker, so I think it would be preferable to pass in a masker with the desired number of samples rather than expose more masker params in the Explainer class.

What do you think, would this be acceptable?

ex = shap.Explainer(true_model, masker=shap.maskers.Independent(X, max_samples=1000), algorithm="exact")
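
With a masker sized like that, all 101 background rows fit under max_samples, so no subsampling happens and the base value should come out as the exact background mean. A sketch continuing from the line above:

ps = ex(X[0:3])
print(ps.base_values)  # expected: [50. 50. 50.]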

mayer79 (Author) commented Dec 4, 2023

Good idea, that would indeed be quite elegant.

connortann (Collaborator) commented
What do you think, @CloseChoice? Are you happy for me to close this issue for now, with the approach above as the recommended way to set the masker params?

CloseChoice (Collaborator) commented
@connortann, that's fine with me.
