
Error with sklearn.ensemble.GradientBoostingClassifier #2

Closed
james-pearce opened this issue Jun 29, 2020 · 3 comments

Comments

@james-pearce

With a model using GradientBoostingClassifier, I get an error:

AssertionError: len(shap_explainer.expected_value)=1 and len(labels)={len(self.labels)} do not match!

Code:

from explainerdashboard.explainers import ClassifierExplainer
from explainerdashboard.datasets import titanic_survive, titanic_names

from sklearn.ensemble import GradientBoostingClassifier
# from sklearn.ensemble import RandomForestClassifier

# load classifier data
X_train, y_train, X_test, y_test = titanic_survive()
train_names, test_names = titanic_names()

# one-line example:
# model = RandomForestClassifier(n_estimators=50, max_depth=5)  # this works
model = GradientBoostingClassifier(n_estimators=50, max_depth=5)  # this raises
model.fit(X_train, y_train)

explainer = ClassifierExplainer(model, X_test, y_test)
explainer.plot_shap_contributions(index=0)
@oegedijk
Owner

Ah, one of the models that I had not gotten around to writing a test for :)

In any case I can see the issue. shap.TreeExplainer(model).expected_value outputs array([-0.5844817]). For most binary classification models, expected_value outputs an np.ndarray([float probability/logodds negative class, float probability/logodds positive class]), or simply a float with the probability/logodds of the positive class. However, for GradientBoostingClassifier it for some reason outputs an np.ndarray([logodds positive class]). I had not taken care of that corner case yet. (I think shap tries to follow the output format of the underlying model, but this does lead to some confusing heterogeneity in output formats.)

In any case I will add some code to autodetect this (and also force GradientBoostingClassifier to output probabilities by default, and add some integration tests for GradientBoostingClassifier and HistGradientBoostingClassifier).
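The autodetection described above could look roughly like the following minimal sketch. The helper name and the exact normalization are hypothetical (not taken from the explainerdashboard source); it assumes a single-element expected_value is the positive-class log-odds, as observed for GradientBoostingClassifier, and mirrors its sign for the negative class:

```python
import numpy as np

def normalize_expected_value(expected_value):
    """Normalize shap's expected_value into a two-element array:
    [base value negative class, base value positive class].

    Hypothetical helper illustrating the corner case discussed above.
    """
    ev = np.atleast_1d(np.asarray(expected_value, dtype=float))
    if ev.size == 2:
        # already one base value per class (the common case)
        return ev
    if ev.size == 1:
        # GradientBoostingClassifier-style output: a single log-odds
        # value for the positive class; the negative class gets the
        # opposite sign (log-odds are symmetric around zero)
        return np.array([-ev[0], ev[0]])
    raise ValueError(f"unexpected expected_value shape: {ev.shape}")
```

For the array([-0.5844817]) seen above this would yield a length-2 base-value array, so a length check against the two class labels would then pass.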

Thanks for letting me know, and let me know if you run into any other issues with other models!

@oegedijk
Owner

Released a fix with version 0.1.13.

Seems to work on my end; let me know if it works for you as well.

@james-pearce
Author

Works for me! I am astonished by the speed of your response.

Best
James
