Hi! I have noticed that shap produces weird results when used on the HistGradientBoostingRegressor from scikit-learn.
This model has a parameter categorical_features to indicate which columns should be treated as categorical, but using it leads to strange results.
Below is a simple example with a single feature (column A of X) that takes 5 discrete values, and y = A ** 2. I compare the SHAP values with and without the categorical_features parameter.
Figure 1: SHAP values WITHOUT categorical_features
Figure 2: SHAP values WITH categorical_features
Minimal Reproducible Example
import shap
import numpy as np
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor

X = pd.DataFrame(np.random.randint(0, 5, size=1000), columns=["A"])
y = X["A"] ** 2

# Use HistGradientBoostingRegressor WITHOUT categorical features
model = HistGradientBoostingRegressor()
model.fit(X, y)
explainer = shap.Explainer(model)
shap_values = explainer(X)
shap.plots.scatter(shap_values[:, "A"], color=shap_values, hist=False)

# Use HistGradientBoostingRegressor WITH categorical features
model = HistGradientBoostingRegressor(categorical_features=["A"])
model.fit(X, y)
explainer = shap.Explainer(model)
shap_values = explainer(X)
shap.plots.scatter(shap_values[:, "A"], color=shap_values, hist=False)
Traceback
No response
Expected Behavior
No response
Bug report checklist
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest release of shap.
I have confirmed this bug exists on the master branch of shap.
I'd be interested in making a PR to fix this bug
Installed Versions
0.44