
BUG: Inconsistencies between different tree-based ensembles with the TreeExplainer #3432

Open
2 of 4 tasks
joelostblom opened this issue Dec 10, 2023 · 1 comment
Labels
bug Indicates an unexpected problem or unintended behaviour

Comments

@joelostblom
Contributor

joelostblom commented Dec 10, 2023

Issue Description

I'm struggling with some inconsistencies between different tree-based models when passing 'probability' vs 'predict_proba' as the model_output to a shap TreeExplainer. There are two distinct issues here:

  1. XGBClassifier and HistGradientBoostingClassifier both work with either model_output='predict_proba' or 'probability', whereas RandomForestClassifier and LGBMClassifier only work with 'probability' and raise an error with 'predict_proba' (this error seems incorrect, since the TreeExplainer works well for the other two tree-based models; also reported in LightGBM categorical feature support for Shap values in probability #2899):
ExplainerError: Currently TreeExplainer can only handle models with categorical splits when feature_perturbation="tree_path_dependent" and no background data is passed. Please try again using shap.TreeExplainer(model, feature_perturbation="tree_path_dependent").
  2. The shape of the explanation array is inconsistent when model_output='probability'. For XGBClassifier, HistGradientBoostingClassifier, and LGBMClassifier, the .base_values attribute holds a single number per observation (the proportion/probability of the majority class). With RandomForestClassifier, however, there are two values per observation: the probabilities of both the majority and minority class. It would be convenient if this could be standardized, since 'probability' is a built-in option in shap, so that the plotting syntax could be the same (currently the indexing for the random forest looks like shap.plots.waterfall(explanation[0, :, 1]), whereas for the other three it is just shap.plots.waterfall(explanation[0])).
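As a stopgap for point 2, the per-class output can be collapsed to a single class slice before plotting. This is a minimal sketch using plain NumPy arrays to stand in for the Explanation attributes; the helper name select_class_slice and the class_index parameter are my own, not part of the shap API:

```python
import numpy as np

def select_class_slice(values, base_values, class_index=1):
    """Hypothetical helper: reduce per-class SHAP output of shape
    (n_obs, n_features, n_classes) to a single class, so that a
    RandomForestClassifier explanation indexes like the other models."""
    values = np.asarray(values)
    base_values = np.asarray(base_values)
    if values.ndim == 3:  # one SHAP value per class, as RF produces
        values = values[:, :, class_index]
    if base_values.ndim == 2:  # one base value per class
        base_values = base_values[:, class_index]
    return values, base_values

# Shapes mimicking the RF case above: 1000 observations, 4 features, 2 classes
v, b = select_class_slice(np.zeros((1000, 4, 2)), np.zeros((1000, 2)))
print(v.shape, b.shape)  # → (1000, 4) (1000,)
```

Inputs that are already two-dimensional (the XGBoost/HistGradientBoosting/LightGBM case) pass through unchanged, so the same downstream indexing works for all four models.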

Minimal Reproducible Example

from lightgbm.sklearn import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier, RandomForestClassifier
from xgboost import XGBClassifier
import shap


X, y = make_classification(
    n_samples=1000,
    n_features=4,
    n_informative=2,
    n_redundant=0,
    random_state=0,
    shuffle=False
)
# Works with 'predict_proba' and 'probability'
# clf = HistGradientBoostingClassifier()
# clf = XGBClassifier()

# Only works with 'probability'
# clf = RandomForestClassifier()
clf = LGBMClassifier()

clf.fit(X, y)

explainer = shap.TreeExplainer(
    clf,
    data=X,
    model_output='probability'
    # model_output='predict_proba'
)
explanation = explainer(X)
print(explanation.shape)  # The shape and indexing is different for RF
shap.plots.waterfall(explanation[0])

Traceback

No response

Expected Behavior

I would expect the tree-based models to behave similarly: all should work with 'predict_proba', and all should produce an explanation object of the same shape when used with 'probability', since as far as I know they all have a predict_proba method that outputs values in the same format.

Bug report checklist

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest release of shap.
  • I have confirmed this bug exists on the master branch of shap.
  • I'd be interested in making a PR to fix this bug

Installed Versions

0.43

@joelostblom joelostblom added the bug Indicates an unexpected problem or unintended behaviour label Dec 10, 2023
@CloseChoice
Collaborator

related to #3318
