Now we emit split counts as the feature importance for all the boosters. But split counts have some drawbacks:

- they depend on the number of trees and the tree depth
- features with high cardinality appear in splits more often
- importance values are hard to compare across boosters and across runs
We propose to compute SHAP scores for feature importance instead:

- a linear estimate of how much each feature contributes to the final metric
- the same implementation works for all the boosters
- does not depend on the number of trees or the tree depth
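For context, the Shapley value of feature $i$ is its marginal contribution to the model output $f$, averaged over all subsets $S$ of the remaining features $F \setminus \{i\}$ (this is the quantity any SHAP method approximates):

$$\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F|-|S|-1)!}{|F|!}\,\bigl[f(S \cup \{i\}) - f(S)\bigr]$$

The exact sum is exponential in the number of features, which is why a sampling-based estimator is needed in practice.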
TreeSHAP is a major PITA (shap4j does not support Windows and requires Python-specific pickling), so we can just implement KernelSHAP and call it a day:

- there should be a time budget for the whole MC sampling during the KernelSHAP evaluation
- we may also emit the dispersion of each feature's estimate as a measure of confidence
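To make the time-budget and dispersion points concrete, here is a rough sketch of the sampling loop. It uses the simpler permutation-based Monte Carlo Shapley estimator rather than KernelSHAP's weighted regression, but the budget and dispersion mechanics would be the same; all names (`predict`, `background`, `shapley_mc`) are hypothetical, with `predict` standing in for a booster's scoring call:

```python
import random
import statistics
import time


def shapley_mc(predict, x, background, time_budget_s=1.0, max_perms=200):
    """Permutation-based Monte Carlo estimate of Shapley values.

    predict     -- callable: feature vector (list of floats) -> float score
    x           -- the instance to explain
    background  -- baseline values substituted for "absent" features
    Returns (phi, dispersion): per-feature mean marginal contribution
    and its standard deviation across sampled permutations.
    """
    m = len(x)
    samples = [[] for _ in range(m)]  # marginal contributions per feature
    deadline = time.monotonic() + time_budget_s
    for _ in range(max_perms):
        if time.monotonic() >= deadline:
            break  # respect the overall time budget for the MC sampling
        order = list(range(m))
        random.shuffle(order)  # one random feature ordering
        current = list(background)
        prev = predict(current)
        for i in order:
            current[i] = x[i]  # add feature i to the coalition
            cur = predict(current)
            samples[i].append(cur - prev)  # marginal contribution of i
            prev = cur
    phi = [statistics.fmean(s) for s in samples]
    disp = [statistics.stdev(s) if len(s) > 1 else 0.0 for s in samples]
    return phi, disp


if __name__ == "__main__":
    # toy additive model: f(x) = 3*x0 + 1*x1, baseline of zeros
    f = lambda v: 3 * v[0] + 1 * v[1]
    phi, disp = shapley_mc(f, [1.0, 1.0], [0.0, 0.0], time_budget_s=0.5)
    print(phi, disp)  # phi ≈ [3.0, 1.0]; dispersion 0 for a linear model
```

For an additive model every permutation yields the same marginal contributions, so the dispersion is zero; for models with feature interactions it grows, which is exactly why it is a useful confidence signal to emit alongside the importance itself.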