[ML] Rework computing SHAP to avoid using data frame storage #1023
Conversation
…ame and finish up changes for multi-parameter loss functions
This looks good. My only comment concerns the computation of internal node values. I think it is unnecessary if SHAP computation is not required.
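For context, here is a minimal self-contained sketch of what skipping the internal node value computation could look like. TreeSHAP needs the expected prediction at every internal node, which can be filled in bottom-up as the sample-weighted average of the children; plain prediction only ever reads leaf values, so the pass can be skipped when feature importance is off. All names below are hypothetical illustrations, not the actual ml-cpp API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical node layout: training fills 'value' at the leaves; the
// TreeSHAP recursion additionally needs an expected value at every
// internal node, while plain prediction never reads it.
struct SNode {
    int leftChild = -1;   // index into the tree's node vector, -1 at a leaf
    int rightChild = -1;
    double value = 0.0;         // leaf prediction / internal expected value
    double numberSamples = 0.0; // training rows that reached this node
};

using TTree = std::vector<SNode>;

// Bottom-up pass setting each internal node's value to the sample-weighted
// average of its children, i.e. the expected prediction at that node.
void computeInternalNodeValues(TTree& tree, std::size_t node = 0) {
    SNode& n = tree[node];
    if (n.leftChild < 0) {
        return; // leaf: value already set by training
    }
    auto left = static_cast<std::size_t>(n.leftChild);
    auto right = static_cast<std::size_t>(n.rightChild);
    computeInternalNodeValues(tree, left);
    computeInternalNodeValues(tree, right);
    double total = tree[left].numberSamples + tree[right].numberSamples;
    n.value = (tree[left].numberSamples * tree[left].value +
               tree[right].numberSamples * tree[right].value) / total;
}

// The guard this comment suggests: skip the whole traversal when
// SHAP values were not requested.
void maybeComputeInternalNodeValues(TTree& tree, bool shapRequested) {
    if (shapRequested) {
        computeInternalNodeValues(tree);
    }
}
```

Guarding the pass this way saves a full traversal of every tree in the forest when feature importance is disabled.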
Thanks @valeriy42. I followed your suggestion, but left a TODO. Can you take another look?
LGTM
This also finishes up multi-parameter loss function support.
Note that I don't save and restore `CTreeShapFeatureImportance`, because train is always run and so this object will always be reinitialised before any SHAP values are computed. For failover we should almost certainly keep track of SHAP values already computed and written, but it might make more sense for the Java side to pass this information; I'm therefore deferring this detail to a later PR.

Finally, I've removed multithreading of the SHAP code, because it now only gets passed one row at a time. We should be able to parallelise over trees, but this will require a small tweak to `parallel_for_each`. Since we don't currently pass the number of threads from Java, I'll make this change in a follow-on PR.