feature: Added AdaBoostClassifier and its test_sum_match #1546

Open
wants to merge 5 commits into
base: master

diarmaidfinnerty

This commit adds AdaBoostClassifier support to
shap/explainers/_tree.py. I have also added the corresponding
test_sum_match function in tests/explainers/test_tree.py.

This is an extension of a previous pull request, #1219, which
was awaiting the test in test_tree.py.

Pull Request #1219

I had issues using AdaBoostClassifier with your package, and came across #335 and this Stack Overflow question and answer. It seemed like the OP never added a pull request, so I figured I'd add it with the updated code that uses safe_isinstance instead.

@diarmaidfinnerty
Author

I only just realised that tree_branch is a really stupid branch name :(

@slundberg
Collaborator

Thanks! It looks like the unit test for AdaBoostClassifier is failing right now. I am happy to review this once it is passing :)

@diarmaidfinnerty
Author

diarmaidfinnerty commented Nov 1, 2020

Hi,

I've implemented the necessary changes, but I'm having trouble with the unit test. Everything appears to be working as expected; however, the SHAP values in the test_sum_match_adaboost_classifier test don't sum exactly to the model output.

For some samples the sum matches well, while for others it does not.

np.abs(shap_values[0].sum(1) + ex.expected_value[0] - predicted[:,0]).max()

>> 0.06125793925777512

Here's the image below. I am hoping you could give some direction here as I am at a loss as to why this is not summing correctly.

image
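
The check quoted above measures the largest gap between "sum of attributions plus expected value" and the model output. A minimal, dependency-free sketch of that same check follows. It is only an illustration: the linear model, its weights, and the helper name additivity_gap are hypothetical, and the closed-form attributions used here hold for a linear model with independent features, not for the tree ensemble in this PR.

```python
def additivity_gap(shap_rows, expected_value, predictions):
    """Largest absolute difference between (sum of attributions + baseline)
    and the model output, taken over all samples."""
    return max(
        abs(sum(row) + expected_value - pred)
        for row, pred in zip(shap_rows, predictions)
    )


# Hypothetical linear model f(x) = 0.5*x0 - 0.2*x1 with background means (1.0, 2.0).
weights = [0.5, -0.2]
background_mean = [1.0, 2.0]
samples = [[2.0, 0.0], [0.0, 3.0]]


def predict(x):
    return sum(w * v for w, v in zip(weights, x))


# The baseline is the model output at the background mean.
expected_value = predict(background_mean)

# For a linear model with independent features, each attribution is w_i * (x_i - mean_i).
shap_rows = [
    [w * (v - m) for w, v, m in zip(weights, x, background_mean)]
    for x in samples
]
predictions = [predict(x) for x in samples]

gap = additivity_gap(shap_rows, expected_value, predictions)
print(gap)
```

When the attributions are exact, the gap is at floating-point noise level. A gap of about 0.06, as reported above, is far larger than rounding error, which suggests the attributions and the model output are being computed on different scales or by different combination rules.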

@diarmaidfinnerty
Author

@slundberg can you provide any guidance here?

@slundberg
Collaborator

Sorry to leave this hanging! It could be because AdaBoost uses a < comparison for thresholds instead of <= like the other sklearn methods. Or it could be that some parameter is not passed properly during setup. The best way to sort this out is to generate the smallest, simplest model that still has the issue and then see what broke. I can help with seeing what broke. Do you mind trying to strip the unit test down to a small single-tree ensemble for debugging?
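
To make the first hypothesis concrete, here is a small sketch of how a < split and a <= split can route samples differently. It does not assert which comparison any sklearn estimator actually uses; the function names and the threshold are hypothetical and chosen only for illustration.

```python
def route_le(value, threshold):
    """Send the sample left when value <= threshold."""
    return "left" if value <= threshold else "right"


def route_lt(value, threshold):
    """Send the sample left when value < threshold."""
    return "left" if value < threshold else "right"


threshold = 2.5
values = [1.0, 2.5, 4.0]

# Collect the values on which the two comparison rules disagree.
disagreements = [
    v for v in values if route_le(v, threshold) != route_lt(v, threshold)
]
print(disagreements)
```

The two rules disagree only for feature values that sit exactly on a threshold, so if this were the cause, the mismatch would be expected on a small subset of samples rather than across the board.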

@diarmaidfinnerty
Author

diarmaidfinnerty commented Jan 9, 2021

Hi, I've only just seen this. I will break the unit test down to each of the trees involved in the AdaBoost model. I'll probably need some 'explain like I'm five' level of help on certain parts. I'll come back in a couple of days with some extra info.

@slundberg
Collaborator

slundberg commented Feb 10, 2021

Just checking if there are any updates here? Thanks!

@diarmaidfinnerty
Author

Hi, apologies, this completely slipped my mind. I'll take a look this weekend and report back.

@Kuchteq

Kuchteq commented Sep 7, 2021

Hello, has there been any progress on adding this feature? Unfortunately, my paper depends on an analysis of the AdaBoost results. Do you know of another way to explain the model?

@ArkanEmre

ArkanEmre commented Aug 9, 2022

Hi, I am trying to solve this problem at the moment. I think I could use some support in identifying the problem, @slundberg 😊
I implemented a test for a model that consists of a single tree, and that works without a problem. As soon as the ensemble contains more than one tree, the additivity check fails.

Here is the if statement I am using for the AdaBoostClassifier:

...
elif safe_isinstance(model, ("sklearn.ensemble.AdaBoostClassifier", "sklearn.ensemble._weight_boosting.AdaBoostClassifier", "imblearn.ensemble.RUSBoostClassifier", "imblearn.ensemble._weight_boosting.RUSBoostClassifier")):
    assert hasattr(model, "estimators_"), "Model has no `estimators_`! Have you called `model.fit`?"
    self.internal_dtype = model.estimators_[0].tree_.value.dtype.type
    self.input_dtype = np.float64
    # Scale each tree by its share of the total estimator weight.
    scaling = model.estimator_weights_ / model.estimator_weights_.sum()
    self.trees = [
        SingleTree(e.tree_, normalize=True, scaling=weight, data=data, data_missing=data_missing)
        for e, weight in zip(model.estimators_, scaling)
    ]
    # Look up the split criterion of the base estimator (for example gini).
    self.objective = objective_name_map.get(model.base_estimator_.criterion, None)
    self.tree_output = "probability"
...

Here is the additivity check:

def test_sum_match_adaboost_classifier(): 
    X_train,X_test,Y_train,Y_test = sklearn.model_selection.train_test_split(*shap.datasets.adult(), test_size=0.2, random_state=0) 
    clf = sklearn.ensemble.AdaBoostClassifier(random_state=202, n_estimators=1)
    clf.fit(X_train, Y_train) 
    predicted = clf.predict_proba(X_test) 
    ex = shap.TreeExplainer(clf) 
    shap_values = ex.shap_values(X_test) 
    assert np.abs(shap_values[0].sum(1) + ex.expected_value[0] - predicted[:,0]).max() < 1e-4, \
        "SHAP values don't sum to model output!" 

Looking forward to hearing your insights!
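
One hypothesis that would fit the "one tree passes, several trees fail" pattern, offered here as an assumption and not confirmed anywhere in this thread: the explainer setup treats the ensemble output as a weighted average of per-tree probabilities, whereas the boosted classifier may combine per-tree log-probabilities and then apply a softmax. The sketch below uses plain Python with made-up per-tree probabilities and hypothetical function names to show that the two rules agree for a single tree and diverge for two.

```python
import math


def weighted_average(probas, weights):
    """Arithmetic weighted mean of per-tree class-0 probabilities."""
    total = sum(weights)
    return sum(w * p for p, w in zip(probas, weights)) / total


def log_softmax_combination(probas):
    """Average the per-tree log-odds, then map back through a two-class softmax.

    For two classes this equals the normalised geometric mean of the
    per-tree probabilities.
    """
    # Mean over trees of (log p0 - log p1), halved as in a two-class softmax.
    half_margin = sum(
        math.log(p) - math.log(1.0 - p) for p in probas
    ) / (2 * len(probas))
    e0, e1 = math.exp(half_margin), math.exp(-half_margin)
    return e0 / (e0 + e1)


# Made-up class-0 probabilities from one tree and from two trees.
one_tree = [0.8]
two_trees = [0.8, 0.3]

gap_one = abs(weighted_average(one_tree, [1.0]) - log_softmax_combination(one_tree))
gap_two = abs(
    weighted_average(two_trees, [1.0, 1.0]) - log_softmax_combination(two_trees)
)
print(gap_one, gap_two)
```

If this is what is happening, the per-tree attributions would sum to the weighted average rather than to predict_proba, and the gap would appear only once a second tree is added. Comparing predict_proba against a weighted average of the individual estimators' predict_proba outputs on the two-tree model would be a quick way to confirm or rule this out.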
