feature: Added AdaBoostClassifier and its test_sum_match #1546

diarmaidfinnerty · 2020-10-24T16:00:26Z

This commit adds AdaBoostClassifier support to the
shap/explainers/_tree.py file. I have also added the corresponding
test_sum_match function in tests/explainers/test_tree.py.

This is an extension of a previous pull request #1219 which
was awaiting the test in test_tree.py.

Pull Request #1219

I had issues using AdaBoostClassifier with your package, and came across #335 and this Stack Overflow question and answer. It seemed like the OP never added a pull request, so I figured I'd add it with the updated code that uses safe_isinstance instead.


diarmaidfinnerty · 2020-10-24T16:07:25Z

I only just realised that tree_branch is a really stupid branch name :(

Somehow copy/pasted some incorrect code into shap/explainers/_tree.py

slundberg · 2020-10-27T21:21:12Z

Thanks! It looks like the unit test for AdaBoostClassifier is failing right now. I am happy to review this once it is passing :)

diarmaidfinnerty · 2020-11-01T18:41:41Z

Hi,

I've implemented the necessary changes, but I'm having trouble with the unit test. Everything appears to be working as expected; however, the test_sum_match_adaboost_classifier test doesn't sum exactly.

In some cases it appears to sum well and in other cases there are some issues.

np.abs(shap_values[0].sum(1) + ex.expected_value[0] - predicted[:,0]).max()

>> 0.06125793925777512

Here's the image below. I am hoping you could give some direction here as I am at a loss as to why this is not summing correctly.
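
For readers following along, the check above measures the largest gap between the model output and the sum of the SHAP values plus the base value. A minimal, dependency-free sketch of that quantity follows. The function name `max_additivity_gap` and the toy numbers are made up for illustration; they are not taken from the PR or from the adult dataset.

```python
def max_additivity_gap(shap_values, expected_value, predicted):
    """Largest absolute gap between (attribution sum + base value) and the model output.

    shap_values: one list of per-feature attributions per sample
    expected_value: scalar base value for the class being checked
    predicted: model output for that class, one value per sample
    """
    gaps = [
        abs(sum(row) + expected_value - y)
        for row, y in zip(shap_values, predicted)
    ]
    return max(gaps)


# Toy numbers, invented for illustration only.
toy_shap = [[0.10, -0.05], [0.20, 0.05]]
toy_base = 0.50
toy_pred = [0.55, 0.80]  # the second sample is deliberately off by 0.05

gap = max_additivity_gap(toy_shap, toy_base, toy_pred)
passes = gap < 1e-4  # same tolerance as the unit test in this thread
print(passes)
```

With these toy values the first sample sums correctly and the second does not, so the maximum gap is driven by a single row. Looking at per-sample gaps rather than only the maximum can help show whether a failure is systematic or limited to a few rows.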

diarmaidfinnerty · 2020-11-06T10:12:19Z

@slundberg can you provide any guidance here?

slundberg · 2020-12-09T19:24:27Z

Sorry to leave this hanging! It could be because AdaBoost uses a < comparison for thresholds instead of <= like the other sklearn methods, or it could be that some parameter is not passed properly in setup. The best way to sort this out is to generate the smallest, simplest model that still has the issue and then see what broke. I can help with seeing what broke. Do you mind trying to strip the unit test down to a small single-tree ensemble for debugging?
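
To make the first hypothesis concrete: a strict `<` split and an inclusive `<=` split disagree only on samples that sit exactly on the threshold. The sketch below is illustrative only. `route` is a hypothetical helper, not part of shap or scikit-learn, and whether AdaBoost's trees actually use `<` is not confirmed in this thread.

```python
def route(x, threshold, strict):
    """Return 'left' or 'right' for a single split on one feature value.

    strict=True  sends x left when x <  threshold
    strict=False sends x left when x <= threshold
    """
    goes_left = x < threshold if strict else x <= threshold
    return "left" if goes_left else "right"


threshold = 2.0
samples = [1.5, 2.0, 2.5]

strict_routes = [route(x, threshold, strict=True) for x in samples]
inclusive_routes = [route(x, threshold, strict=False) for x in samples]

# Only values exactly equal to the threshold are routed differently.
disagreements = [
    x for x, a, b in zip(samples, strict_routes, inclusive_routes) if a != b
]
print(disagreements)
```

If this were the cause, the mismatch would be limited to rows with a feature value equal to some split threshold, which is something a single-tree reproduction could confirm or rule out.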

diarmaidfinnerty · 2021-01-09T17:18:55Z

Hi, I've just seen this now. I will break the unit test down to each of the trees involved in the AdaBoost model. I'll probably need some 'explain like I'm five' level of help on certain parts. I'll come back in a couple of days with some extra info.

slundberg · 2021-02-10T17:59:19Z

Just checking if there are any updates here? Thanks!

diarmaidfinnerty · 2021-02-10T18:35:57Z

Hi, apologies, this completely slipped my mind. I'll take a look this weekend and get back to you.

Kuchteq · 2021-09-07T00:34:03Z

Hello, has there been any progress on adding this feature? Unfortunately, my paper depends on an analysis of the AdaBoost results. Do you know of another way to explain the model?

ArkanEmre · 2022-08-09T16:00:42Z

Hi, I am trying to solve this problem at the moment. I think I could use some support in identifying the problem, @slundberg 😊
I implemented a test for a model that consists of a single tree, and that works without a problem. As soon as the ensemble contains more than one tree, the additivity check fails.

Here is the if statement I am using for the AdaBoostClassifier:

...
elif safe_isinstance(model, ("sklearn.ensemble.AdaBoostClassifier", "sklearn.ensemble._weight_boosting.AdaBoostClassifier", "imblearn.ensemble.RUSBoostClassifier", "imblearn.ensemble._weight_boosting.RUSBoostClassifier")):
    assert hasattr(model, "estimators_"), "Model has no `estimators_`! Have you called `model.fit`?"
    self.internal_dtype = model.estimators_[0].tree_.value.dtype.type
    self.input_dtype = np.float64
    self.trees = [SingleTree(e.tree_, normalize=True, scaling=weight, data=data, data_missing=data_missing) for e, weight in zip(model.estimators_, model.estimator_weights_/sum(model.estimator_weights_))]
    self.objective = objective_name_map.get(model.base_estimator_.criterion, None)  # Look up the split criterion, for example gini.
    self.tree_output = "probability"
...

Here is the additivity check:

def test_sum_match_adaboost_classifier():
    X_train, X_test, Y_train, Y_test = sklearn.model_selection.train_test_split(*shap.datasets.adult(), test_size=0.2, random_state=0)
    clf = sklearn.ensemble.AdaBoostClassifier(random_state=202, n_estimators=1)
    clf.fit(X_train, Y_train)
    predicted = clf.predict_proba(X_test)
    ex = shap.TreeExplainer(clf)
    shap_values = ex.shap_values(X_test)
    assert np.abs(shap_values[0].sum(1) + ex.expected_value[0] - predicted[:,0]).max() < 1e-4, \
        "SHAP values don't sum to model output!"

Looking forward to hearing your insights!
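
One possibility worth ruling out, offered purely as an assumption that nothing in this thread confirms: the snippet above treats the ensemble output as a weighted arithmetic mean of per-tree probabilities. If the model instead combines its trees nonlinearly, the two agree for a single tree and drift apart as soon as there are two, which would match the reported symptom. The sketch below uses a normalized geometric mean as a stand-in nonlinear combiner. Both function names are hypothetical, and the combiner is chosen for illustration, not taken from scikit-learn's source.

```python
import math


def arithmetic_mean_proba(tree_probas, weights):
    """Weighted arithmetic mean of per-tree class probabilities."""
    total = sum(weights)
    n_classes = len(tree_probas[0])
    return [
        sum(w * p[c] for p, w in zip(tree_probas, weights)) / total
        for c in range(n_classes)
    ]


def normalized_geometric_mean_proba(tree_probas, weights):
    """Weighted geometric mean of per-tree probabilities, renormalized to sum to 1.

    Stand-in for "some nonlinear combination"; an assumption for illustration.
    """
    total = sum(weights)
    n_classes = len(tree_probas[0])
    log_scores = [
        sum(w * math.log(p[c]) for p, w in zip(tree_probas, weights)) / total
        for c in range(n_classes)
    ]
    unnormalized = [math.exp(s) for s in log_scores]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


def max_gap(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


# Made-up per-tree probabilities for one sample and two classes.
one_tree = [[0.8, 0.2]]
two_trees = [[0.8, 0.2], [0.6, 0.4]]

single_gap = max_gap(
    arithmetic_mean_proba(one_tree, [1.0]),
    normalized_geometric_mean_proba(one_tree, [1.0]),
)
multi_gap = max_gap(
    arithmetic_mean_proba(two_trees, [1.0, 1.0]),
    normalized_geometric_mean_proba(two_trees, [1.0, 1.0]),
)
print(single_gap < 1e-12, multi_gap > 1e-3)
```

A quick way to test this on the real model would be to compare `clf.predict_proba` against the weighted mean of each estimator's own `predict_proba` for `n_estimators=2`. If those two already disagree, the gap comes from how the ensemble is combined rather than from the per-tree SHAP values.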

diarmaidfinnerty added 4 commits October 24, 2020 18:36

fix: fixed issue causing integration failures

aac8e96

Somehow copy/pasted some incorrect code into shap/explainers/_tree.py

fix: trailing whitespace removed which was causing syntaxerror

5644509

fix: removed trailing whitepace in test_tree.py

57af0ca

fix: Changed Tree to SingleTree in Adaboost

ca625c7

Helias mentioned this pull request Oct 8, 2023

feat: add support for AdaBoostClassifier #3319

Open

2 tasks
