
BUG: Inconsistencies between different tree-based ensembles with the TreeExplainer #3432

Open
2 of 4 tasks
joelostblom opened this issue Dec 10, 2023 · 1 comment
Labels
bug Indicates an unexpected problem or unintended behaviour

Comments

@joelostblom
Contributor

joelostblom commented Dec 10, 2023

Issue Description

I'm struggling with some inconsistencies between different tree-based models when passing 'probability' vs 'predict_proba' as the model_output to a shap TreeExplainer. There are two distinct issues here:

  1. XGBClassifier and HistGradientBoostingClassifier both work with either model_output='predict_proba' or 'probability', whereas RandomForestClassifier and LGBMClassifier only work with 'probability' and raise an error with 'predict_proba' (this error seems incorrect, since the TreeExplainer works well for the other two tree-based models; also reported in LightGBM categorical feature support for Shap values in probability #2899):
ExplainerError: Currently TreeExplainer can only handle models with categorical splits when feature_perturbation="tree_path_dependent" and no background data is passed. Please try again using shap.TreeExplainer(model, feature_perturbation="tree_path_dependent").
  2. The shape of the explanation array is inconsistent when model_output='probability'. For XGBClassifier, HistGradientBoostingClassifier, and LGBMClassifier, the .base_values attribute holds a single number per observation (the proportion/probability of the majority class). With RandomForestClassifier, however, there are two values per observation: the probabilities of both the majority and minority class. It would be convenient if this could be standardized, since 'probability' is a built-in option in shap, so that the plotting syntax could be the same (currently the indexing for the random forest looks like shap.plots.waterfall(explanation[0, :, 1]), whereas for the other three it is just shap.plots.waterfall(explanation[0])).
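As a stopgap for point 2, the per-class output can be collapsed to a single class slice before plotting. This is a minimal sketch using plain NumPy arrays to stand in for the Explanation attributes; the helper name select_class_slice and the class_index parameter are my own, not part of the shap API:

```python
import numpy as np

def select_class_slice(values, base_values, class_index=1):
    """Hypothetical helper: reduce per-class SHAP output of shape
    (n_obs, n_features, n_classes) to a single class, so that a
    RandomForestClassifier explanation indexes like the other models."""
    values = np.asarray(values)
    base_values = np.asarray(base_values)
    if values.ndim == 3:  # one SHAP value per class, as RF produces
        values = values[:, :, class_index]
    if base_values.ndim == 2:  # one base value per class
        base_values = base_values[:, class_index]
    return values, base_values

# Shapes mimicking the RF case above: 1000 observations, 4 features, 2 classes
v, b = select_class_slice(np.zeros((1000, 4, 2)), np.zeros((1000, 2)))
print(v.shape, b.shape)  # → (1000, 4) (1000,)
```

Inputs that are already two-dimensional (the XGBoost/HistGradientBoosting/LightGBM case) pass through unchanged, so the same downstream indexing works for all four models.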

Minimal Reproducible Example

from lightgbm.sklearn import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier, RandomForestClassifier
from xgboost import XGBClassifier
import shap


X, y = make_classification(
    n_samples=1000,
    n_features=4,
    n_informative=2,
    n_redundant=0,
    random_state=0,
    shuffle=False
)
# Works with 'predict_proba' and 'probability'
# clf = HistGradientBoostingClassifier()
# clf = XGBClassifier()

# Only works with 'probability'
# clf = RandomForestClassifier()
clf = LGBMClassifier()

clf.fit(X, y)

explainer = shap.TreeExplainer(
    clf,
    data=X,
    model_output='probability'
    # model_output='predict_proba'
)
explanation = explainer(X)
print(explanation.shape)  # The shape and indexing is different for RF
shap.plots.waterfall(explanation[0])

Traceback

No response

Expected Behavior

I would expect the tree-based models to behave similarly: all should work with 'predict_proba', and all should produce an explanation object of the same shape when used with 'probability', since as far as I know they all have a predict_proba method that outputs values in the same format.

Bug report checklist

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest release of shap.
  • I have confirmed this bug exists on the master branch of shap.
  • I'd be interested in making a PR to fix this bug

Installed Versions

0.43

@joelostblom joelostblom added the bug Indicates an unexpected problem or unintended behaviour label Dec 10, 2023
@CloseChoice
Collaborator

related to #3318
