[ML] Feature importance performance optimization #1005
Conversation
Overall this looks great @valeriy42! My main comment is that it feels like we could better encapsulate this implementation if we made use of CPathElementAccessor to wrap both iterators. Can you see any obstacles to doing this? Also, WDYT?
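For context, here is a minimal sketch of what wrapping the two parallel iterators in one accessor might look like. The element layout, member names, and method signatures below are illustrative assumptions, not the actual CPathElementAccessor interface in ml-cpp.

```cpp
#include <cstddef>
#include <vector>

// Illustrative path element; the real ml-cpp definition differs.
struct SPathElement {
    double s_FractionOnes = 0.0;
    double s_FractionZeros = 0.0;
    int s_SplitIndex = -1;
};

// Bundles the two parallel iterators so callers index the path
// elements and their scale factors through one object.
class CPathElementAccessor {
public:
    using TElementItr = std::vector<SPathElement>::iterator;
    using TDoubleItr = std::vector<double>::iterator;

    CPathElementAccessor(TElementItr path, TDoubleItr scale)
        : m_Path{path}, m_Scale{scale} {}

    SPathElement& element(std::size_t i) const { return m_Path[i]; }
    double& scale(std::size_t i) const { return m_Scale[i]; }

private:
    TElementItr m_Path;
    TDoubleItr m_Scale;
};
```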
TDoubleVec scaleVector;
// need a bit more memory than max depth
pathVector.reserve(((maxDepthOverall + 2) * (maxDepthOverall + 3)) / 2);
scaleVector.reserve(((maxDepthOverall + 2) * (maxDepthOverall + 3)) / 2);
I think these probably need to be resizes to avoid the copies in shapRecursive being undefined behaviour. Or else you need to use std::back_inserter iterator wrappers for the containers.
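To illustrate the point (a generic C++ sketch, not the PR's code): reserve only allocates capacity, so writing to elements through raw iterators or indices is undefined behaviour until the size is grown, either by resize or by appending through std::back_inserter.

```cpp
#include <iterator>
#include <vector>

int main() {
    std::vector<double> pathVector;
    pathVector.reserve(10);   // capacity is 10, but size is still 0

    // Undefined behaviour: the element doesn't exist yet, even though
    // the memory is allocated.
    // pathVector[0] = 1.0;

    // Option 1: resize so the elements actually exist.
    pathVector.resize(10);
    pathVector[0] = 1.0;

    // Option 2: append through std::back_inserter, which grows the size.
    std::vector<double> scaleVector;
    scaleVector.reserve(10);
    auto out = std::back_inserter(scaleVector);
    *out++ = 1.0;             // equivalent to scaleVector.push_back(1.0)
    return 0;
}
```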
I also think it would be nice to note down the origin of the + 2 and + 3. I can see that this comes from the sum of an arithmetic progression in the worst case up to maxDepthOverall, but it would be useful to explain this better. Also, at the same time you could explain the overall strategy of copying the "current path" to the end of each memory arena.
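For reference, one way the bound can arise (a sketch under the assumption that each recursion level appends a copy of the current path, one element longer than the previous copy, to the same arena; the exact accounting in the PR may differ): with m = maxDepthOverall and a longest copy of m + 2 elements, the worst-case total is

\sum_{k=1}^{m+2} k = \frac{(m+2)(m+3)}{2},

which is exactly the reserved size.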
I added a comment in ca3d891
Thank you @tveasey for the review comments. I refactored the code and implemented your suggestions. Please let me know if everything is ok now.
Thanks @valeriy42. There are a couple of other loop simplifications in unwindPath that got missed. However, I'm happy to go ahead and approve. Great work!
This PR aims to improve the computational efficiency of the feature importance computation. To this end, it introduces a contiguous memory array, reserved up front, to store elements from the split path. Furthermore, I split the scale values away from fractionOnes, fractionZeros, and splitIndex to improve cache efficiency. Altogether, I could reduce the computation time by half. I will look into improving the SHAP algorithm by introducing suitable heuristics in a follow-up PR.
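The cache-efficiency change amounts to a structure-of-arrays split. A minimal sketch, with illustrative names derived from the identifiers above (the actual ml-cpp types differ):

```cpp
#include <vector>

// Before (conceptually): scale stored alongside the other fields, so a
// hot loop that only touches scale still pulls whole elements into cache.
struct SPathElementAoS {
    double s_FractionOnes;
    double s_FractionZeros;
    int s_SplitIndex;
    double s_Scale;
};
std::vector<SPathElementAoS> path;

// After: scale lives in its own contiguous array parallel to the path,
// so scans over the scale values touch far less memory.
struct SPathElement {
    double s_FractionOnes;
    double s_FractionZeros;
    int s_SplitIndex;
};
std::vector<SPathElement> pathVector;
std::vector<double> scaleVector; // scaleVector[i] scales pathVector[i]
```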