
SHAP scores for feature importance #945

Open
shuttie opened this issue Mar 4, 2023 · 0 comments
Labels
enhancement New feature or request

shuttie (Collaborator) commented Mar 4, 2023

Currently we emit split counts as feature importance for all the boosters, but split counts have several drawbacks:

  • they depend on the number of trees and the tree depth
  • high-cardinality features appear in splits more often, which inflates their importance
  • importance values are hard to compare across boosters and across runs

We propose computing SHAP scores for feature importance instead:

  • a linear (additive) estimate of how much each feature contributes to the final prediction
  • the same implementation for all the boosters
  • independent of the number of trees and the tree depth
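For intuition on the additivity mentioned above: the per-feature SHAP scores sum exactly to the difference between the model's prediction for the instance and the baseline prediction, and interaction effects get split fairly between the participating features. A minimal brute-force sketch of exact Shapley values over a toy 3-feature model (all names here are illustrative, not project code; absent features are filled from a baseline vector, which is an independence assumption):

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    # Exact Shapley values: for each feature i, average its marginal
    # contribution f(S ∪ {i}) - f(S) over all subsets S of the other
    # features, using the standard Shapley weights. Features outside
    # the subset are replaced by their baseline values.
    n = len(x)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                present = set(subset)
                w = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if j in present or j == i else baseline[j] for j in range(n)]
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                total += w * (f(with_i) - f(without_i))
        phi.append(total)
    return phi

# Toy model with an interaction term between features 0 and 2.
f = lambda v: 2.0 * v[0] + v[1] + v[0] * v[2]
phi = shapley_values(f, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
# Additivity: the scores sum to f(x) - f(baseline) = 7.0,
# and the interaction's credit (1.0 * 3.0) is split between features 0 and 2.
```

The brute force is exponential in the number of features, which is exactly why a sampled estimator like KernelSHAP is needed in practice; the exact version only serves to make the additivity property concrete.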

TreeSHAP is a major PITA to integrate (shap4j does not support Windows and requires Python-specific pickling), so we can just implement KernelSHAP and call it a day:

  • there should be a time budget for the whole Monte Carlo sampling loop during KernelSHAP evaluation
  • we may also emit the dispersion of the sampled contributions as a measure of confidence in each feature's estimate
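A sketch of what the budgeted estimation could look like. This uses permutation sampling of Shapley values rather than KernelSHAP's weighted linear regression (a simpler cousin of the same idea), but the time-budget and dispersion mechanics carry over directly; all names are hypothetical:

```python
import random
import time
from statistics import mean, stdev

def sampled_shap(f, x, baseline, time_budget_s=0.25, seed=42):
    # Monte Carlo Shapley estimation via random feature permutations.
    # Walk each permutation, switching features from their baseline
    # value to the real value one at a time, and record the marginal
    # change in the model output. Stops when the time budget runs out.
    rng = random.Random(seed)
    n = len(x)
    contribs = [[] for _ in range(n)]
    deadline = time.monotonic() + time_budget_s
    while time.monotonic() < deadline:
        order = list(range(n))
        rng.shuffle(order)
        current = list(baseline)
        prev = f(current)
        for j in order:
            current[j] = x[j]
            cur = f(current)
            contribs[j].append(cur - prev)
            prev = cur
    phi = [mean(c) for c in contribs]
    # Dispersion of the sampled marginal contributions: a feature whose
    # contribution varies a lot across permutations has an uncertain score.
    disp = [stdev(c) if len(c) > 1 else 0.0 for c in contribs]
    return phi, disp

# Same toy model as above: interaction between features 0 and 2.
f = lambda v: 2.0 * v[0] + v[1] + v[0] * v[2]
phi, disp = sampled_shap(f, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
```

One nice property of the permutation walk: within each permutation the marginal contributions telescope to f(x) - f(baseline), so the estimated scores sum to that difference exactly regardless of how many samples the budget allowed. A feature with no interactions (feature 1 here) gets zero dispersion, while interacting features show non-zero dispersion until enough permutations are sampled.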
@shuttie shuttie added the enhancement New feature or request label Mar 4, 2023