BUG: Inconsistencies between different tree-based ensembles with the TreeExplainer #3432
Issue Description
I'm struggling with some inconsistencies between different tree-based models when passing `'probability'` vs. `'predict_proba'` as the `model_output` to a shap `TreeExplainer`. There are two different issues here:

1. `XGBClassifier` and `HistGradientBoostingClassifier` both work with either `model_output='predict_proba'` or `'probability'`, whereas `RandomForestClassifier` and `LGBMClassifier` only work with `'probability'` and throw a warning with `'predict_proba'` (this warning seems incorrect, since the `TreeExplainer` works well for the other two tree-based models; also reported in #2899, "LightGBM categorical feature support for Shap values in probability").
2. The `explanation` array is inconsistent when `model_output='probability'`. For `XGBClassifier`, `HistGradientBoostingClassifier`, and `LGBMClassifier` the array has a single number per observation for the `.base_values` attribute (the proportion/probability of the majority class). However, with `RandomForestClassifier`, there are two values per observation: the probabilities of both the majority and minority class. It would be convenient if this could be standardized, since `'probability'` is a built-in option in shap, so that the plotting syntax could be the same (currently the indexing for the random forest looks like `shap.plots.waterfall(explainer[0, :, 1])`, whereas for the other three it is just `shap.plots.waterfall(explainer[0])`).
Minimal Reproducible Example
Traceback
No response
Expected Behavior
I would expect the tree-based models to behave similarly: all of them should work with `predict_proba`, and all of them should output the same shape for the explanation object when used with `'probability'`, since as far as I know they all have a `predict_proba` method that outputs values in the same format.
Installed Versions
shap 0.43