Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Rework computing SHAP to avoid using data frame storage #1023

Merged
merged 5 commits into from
Feb 26, 2020

Conversation

tveasey
Copy link
Contributor

@tveasey tveasey commented Feb 24, 2020

This also finishes up multi-parameter loss function support.

Note I don't save and restore CTreeShapFeatureImportance because train is always run and so this will always be reinitialised before any SHAP values are computed. For failover we should almost certainly keep track of SHAP values already computed and written, but it might make more sense for the Java to pass this information. I'm therefore deferring this detail to a later PR.

Finally, I've removed multithreading of the SHAP code, because it only gets passed one row at a time now. We should be able to parallelise over trees, but this will require a small tweak to parallel_for_each. Since we don't currently pass number threads from Java, I'll make this change in a follow on PR.

…ame and finish up changes for multi-parameter loss functions
Copy link
Contributor

@valeriy42 valeriy42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. My only comment concerns the computation of internal node values. I think it is unnecessary if SHAP computation is not required.

lib/maths/CBoostedTreeImpl.cc Outdated Show resolved Hide resolved
@tveasey
Copy link
Contributor Author

tveasey commented Feb 26, 2020

Thanks @valeriy42. I followed your suggestion, but left a TODO. Can you take another look.

Copy link
Contributor

@valeriy42 valeriy42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tveasey tveasey merged commit 2fcd104 into elastic:master Feb 26, 2020
@tveasey tveasey deleted the multiclass-7 branch February 26, 2020 16:59
tveasey added a commit to tveasey/ml-cpp-1 that referenced this pull request Feb 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants