Ability to Calculate Shapley Values on a Re-weighted Tree #7550

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 6 comments

Comments

@exalate-issue-sync

Add the ability to update the node weights of a tree based on a subset of the training population while keeping the original leaf-node predictions, then calculate Shapley values on this re-weighted tree. The goal is to be able to calculate SHAP values based on a subset of the population.

This is somewhat similar to the refit option in LightGBM with decay_rate = 1.
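For reference, the LightGBM documentation for `Booster.refit` describes the refit as a blend of the old and new leaf outputs controlled by `decay_rate`. A minimal sketch of that formula is below; the function name `refit_leaf_output` is hypothetical and only illustrates the arithmetic, it is not LightGBM code.

```python
def refit_leaf_output(old_leaf_output: float, new_leaf_output: float, decay_rate: float) -> float:
    """Blend formula from the LightGBM refit docs (hypothetical helper name)."""
    return decay_rate * old_leaf_output + (1.0 - decay_rate) * new_leaf_output


# With decay_rate = 1 the new estimate is ignored and the original leaf output is kept.
kept = refit_leaf_output(old_leaf_output=0.75, new_leaf_output=-2.0, decay_rate=1.0)
print(kept)
```

With `decay_rate = 1` the leaf predictions are unchanged, which is the property this request relies on.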

Note: Keep the original predictions so that the Shapley values sum up to the actual raw model prediction (before the sigmoid transformation).
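To make the request concrete, here is a minimal sketch on a hand-built toy tree. It is not H2O code: the dict-based tree layout and the helper names (`update_weights`, `cond_expectation`, `shapley`) are assumptions for illustration, and the Shapley values are computed by brute force over feature orderings rather than with the TreeSHAP algorithm. Features that are "unknown" are averaged out using the node weights, so changing the weights changes the contributions while the leaf values, and therefore the prediction, stay fixed.

```python
from itertools import permutations


def make_tree():
    # Leaf values are fixed; only the "weight" fields are ever updated.
    leaf_a = {"value": -1.0, "weight": 0.0}
    leaf_b = {"value": 0.5, "weight": 0.0}
    leaf_c = {"value": 2.0, "weight": 0.0}
    inner = {"feature": 1, "threshold": 0.5, "left": leaf_b, "right": leaf_c, "weight": 0.0}
    return {"feature": 0, "threshold": 0.5, "left": leaf_a, "right": inner, "weight": 0.0}


def child_for(node, x):
    return node["left"] if x[node["feature"]] < node["threshold"] else node["right"]


def reset_weights(node):
    node["weight"] = 0.0
    if "value" not in node:
        reset_weights(node["left"])
        reset_weights(node["right"])


def update_weights(tree, rows, weights):
    """Recompute node weights from the given rows; leaf values are untouched."""
    reset_weights(tree)
    for x, w in zip(rows, weights):
        node = tree
        while True:
            node["weight"] += w
            if "value" in node:
                break
            node = child_for(node, x)


def cond_expectation(node, x, known):
    """Expected output when only the features in `known` are fixed to x.

    Assumes every split node reached here has a positive weight.
    """
    if "value" in node:
        return node["value"]
    if node["feature"] in known:
        return cond_expectation(child_for(node, x), x, known)
    left, right = node["left"], node["right"]
    total = (left["weight"] * cond_expectation(left, x, known)
             + right["weight"] * cond_expectation(right, x, known))
    return total / node["weight"]


def shapley(tree, x, n_features):
    """Brute-force Shapley values: average marginal gain over all feature orderings."""
    phi = [0.0] * n_features
    orders = list(permutations(range(n_features)))
    for order in orders:
        known = set()
        prev = cond_expectation(tree, x, known)
        for f in order:
            known.add(f)
            cur = cond_expectation(tree, x, known)
            phi[f] += (cur - prev) / len(orders)
            prev = cur
    bias = cond_expectation(tree, x, set())
    return phi, bias


rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
x = (1, 1)  # lands in the leaf with value 2.0
tree = make_tree()

update_weights(tree, rows, [1.0] * 4)          # original population
phi_full, bias_full = shapley(tree, x, 2)

update_weights(tree, rows, [2.0] * 4)          # uniformly scaled weights
phi_scaled, bias_scaled = shapley(tree, x, 2)

update_weights(tree, rows[1:], [1.0] * 3)      # subset of the population
phi_sub, bias_sub = shapley(tree, x, 2)

for name, phi, bias in [("full", phi_full, bias_full),
                        ("scaled", phi_scaled, bias_scaled),
                        ("subset", phi_sub, bias_sub)]:
    print(name, phi, bias, sum(phi) + bias)
```

In this toy setup, uniform scaling leaves the contributions unchanged, while re-weighting on the subset shifts both the contributions and the bias term. In every case the contributions plus the bias still add up to the unchanged leaf prediction.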

@exalate-issue-sync
Author

Michal Kurka commented: First version implemented in an experimental API, example usage:

```python
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator

h2o.init()

prostate_frame = h2o.import_file("https://h2o-public-test-data.s3.amazonaws.com/smalldata/prostate/prostate.csv")
prostate_frame["RACE"] = prostate_frame["RACE"].asfactor()
prostate_frame["CAPSULE"] = prostate_frame["CAPSULE"].asfactor()

x = ["AGE", "RACE", "GLEASON", "DCAPS", "PSA", "VOL", "DPROS"]
y = "CAPSULE"

gbm_model = H2OGradientBoostingEstimator()
gbm_model.train(x=x, y=y, training_frame=prostate_frame)

# 1. Get original contributions
contribs_original = gbm_model.predict_contributions(prostate_frame)
print(contribs_original)

# 2. Scale weights => contributions should stay the same
prostate_frame["weights"] = 2
h2o.rapids('(tree.update.weights {} {} "{}")'.format(gbm_model.model_id, prostate_frame.frame_id, "weights"))
contribs_reweighted = gbm_model.predict_contributions(prostate_frame)
print(contribs_reweighted)

# 3. Reweight based on small subset of the data => contributions are expected to change
prostate_subset = prostate_frame.head(10)
h2o.rapids('(tree.update.weights {} {} "{}")'.format(gbm_model.model_id, prostate_subset.frame_id, "weights"))
contribs_subset = gbm_model.predict_contributions(prostate_subset)
print(contribs_subset)
```

@exalate-issue-sync
Author

Michal Kurka commented: I changed the name of the Rapids function from `sharedtree.update.weights` to `tree.update.weights` because we will now also support XGBoost.

@exalate-issue-sync
Author

Michal Kurka commented: PR for reweighting in XGBoost: https://github.com//pull/5502

@exalate-issue-sync
Author

Neema Mashayekhi commented: Reopening to add a fix for zero weights (in some cases, contributions were not calculated correctly when weights were zero).

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

JIRA Issue Details

Jira Issue: PUBDEV-8099
Assignee: Michal Kurka
Reporter: Megan Kurka
State: Resolved
Fix Version: 3.34.0.1
Attachments: N/A
Development PRs: Available

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

Linked PRs from JIRA

#5465
#5502
#5518
#5568
