[ML] Rework computing SHAP to avoid using data frame storage #1023
Conversation
…ame and finish up changes for multi-parameter loss functions
This looks good. My only comment concerns the computation of internal node values. I think it is unnecessary if SHAP computation is not required.
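For context, here is a minimal self-contained sketch of what skipping the internal node value computation could look like. TreeSHAP needs the expected prediction at every internal node, which can be filled in bottom-up as the sample-weighted average of the children; plain prediction only ever reads leaf values, so the pass can be skipped when feature importance is off. All names below are hypothetical illustrations, not the actual ml-cpp API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical node layout: training fills 'value' at the leaves; the
// TreeSHAP recursion additionally needs an expected value at every
// internal node, while plain prediction never reads it.
struct SNode {
    int leftChild = -1;   // index into the tree's node vector, -1 at a leaf
    int rightChild = -1;
    double value = 0.0;         // leaf prediction / internal expected value
    double numberSamples = 0.0; // training rows that reached this node
};

using TTree = std::vector<SNode>;

// Bottom-up pass setting each internal node's value to the sample-weighted
// average of its children, i.e. the expected prediction at that node.
void computeInternalNodeValues(TTree& tree, std::size_t node = 0) {
    SNode& n = tree[node];
    if (n.leftChild < 0) {
        return; // leaf: value already set by training
    }
    auto left = static_cast<std::size_t>(n.leftChild);
    auto right = static_cast<std::size_t>(n.rightChild);
    computeInternalNodeValues(tree, left);
    computeInternalNodeValues(tree, right);
    double total = tree[left].numberSamples + tree[right].numberSamples;
    n.value = (tree[left].numberSamples * tree[left].value +
               tree[right].numberSamples * tree[right].value) / total;
}

// The guard this comment suggests: skip the whole traversal when
// SHAP values were not requested.
void maybeComputeInternalNodeValues(TTree& tree, bool shapRequested) {
    if (shapRequested) {
        computeInternalNodeValues(tree);
    }
}
```

Guarding the pass this way saves a full traversal of every tree in the forest when feature importance is disabled.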
Thanks @valeriy42. I followed your suggestion, but left a TODO. Can you take another look?
LGTM
This also finishes up multi-parameter loss function support.
Note that I don't save and restore `CTreeShapFeatureImportance`, because train is always run and so this object will always be reinitialised before any SHAP values are computed. For failover we should almost certainly keep track of SHAP values already computed and written, but it might make more sense for the Java side to pass this information; I'm therefore deferring this detail to a later PR.

Finally, I've removed multithreading of the SHAP code, because it now only gets passed one row at a time. We should be able to parallelise over trees, but this will require a small tweak to `parallel_for_each`. Since we don't currently pass the number of threads from Java, I'll make this change in a follow-on PR.