[ML] Speed up the lat_long function #1102

tveasey · 2020-03-30T15:07:42Z

Clustering can be the bottleneck for the lat_long function.

This reworks how k-means++ initialisation calculates the distances to the selected points. Previously we built a k-d tree for each point we added and looked up nearest neighbours. This is unnecessary because we can simply update the distances directly, i.e. distance_i = min(distance_i, distance(selected, x_i)). The other main speedup comes from #1037, in which I reworked online k-means to remove the points buffer. This saves memory, which we can spend by accumulating more points before we re-cluster.
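For illustration, here is a minimal standalone sketch of the direct distance update described above. It is not the code in this PR: the names TPoint, squaredDistance, sampleProportional and kMeansPlusPlusSeeds are hypothetical, points are assumed to be 2-d, and sampling is assumed to be proportional to squared Euclidean distance, as in standard k-means++.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical 2-d point type used only for this sketch.
using TPoint = std::array<double, 2>;

// Squared Euclidean distance between two points.
double squaredDistance(const TPoint& a, const TPoint& b) {
    double dx = a[0] - b[0];
    double dy = a[1] - b[1];
    return dx * dx + dy * dy;
}

// Sample an index with probability proportional to its weight. Falls back
// to uniform sampling if every weight is zero (all points coincide).
std::size_t sampleProportional(const std::vector<double>& weights, std::mt19937& rng) {
    double total = std::accumulate(weights.begin(), weights.end(), 0.0);
    if (total <= 0.0) {
        std::uniform_int_distribution<std::size_t> uniform(0, weights.size() - 1);
        return uniform(rng);
    }
    std::uniform_real_distribution<double> draw(0.0, total);
    double u = draw(rng);
    double cumulative = 0.0;
    std::size_t last = 0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        if (weights[i] > 0.0) {
            cumulative += weights[i];
            last = i;
            if (cumulative > u) {
                return i;
            }
        }
    }
    return last; // Guards against rounding in the cumulative sum.
}

// Choose k seed centres. Instead of building a nearest neighbour structure
// over the centres each time one is added, keep each point's distance to its
// closest centre and refresh it with a single pass per new centre.
std::vector<TPoint> kMeansPlusPlusSeeds(const std::vector<TPoint>& points,
                                        std::size_t k,
                                        std::mt19937& rng) {
    assert(k > 0 && k <= points.size());

    std::vector<TPoint> centres;
    centres.reserve(k);

    // The first centre is chosen uniformly at random.
    std::uniform_int_distribution<std::size_t> uniform(0, points.size() - 1);
    centres.push_back(points[uniform(rng)]);

    std::vector<double> distances(points.size(), std::numeric_limits<double>::max());
    while (true) {
        // distance_i = min(distance_i, distance(selected, x_i))
        const TPoint& selected = centres.back();
        for (std::size_t i = 0; i < points.size(); ++i) {
            distances[i] = std::min(distances[i], squaredDistance(selected, points[i]));
        }
        if (centres.size() == k) {
            break;
        }
        centres.push_back(points[sampleProportional(distances, rng)]);
    }
    return centres;
}
```

Each new centre costs one linear pass over the points, so choosing k centres from n points takes O(nk) distance evaluations and no tree construction.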

I also made some tweaks to cut down the number of allocations (principally by moving various variables into place). Finally, I corrected CKMeansOnline::split to copy all the parent clustering parameters into each split and made a couple of other small tidy-ups.

In total, I get around a 60% improvement in runtime from these changes for CXMeansOnline.

droberts195

LGTM

docs/CHANGELOG.asciidoc

Backport #1102.

Speedup k-means++ initialization

b9e6153

tveasey added >enhancement review :ml v8.0.0 v7.8.0 labels Mar 30, 2020

tveasey requested a review from droberts195 March 30, 2020 15:07

Docs

1235cc2

droberts195 approved these changes Apr 2, 2020


Merge branch 'master' into speedup-x-means

4c64b67

tveasey merged commit d54de38 into elastic:master Apr 2, 2020

tveasey deleted the speedup-x-means branch April 2, 2020 16:15

tveasey added a commit to tveasey/ml-cpp-1 that referenced this pull request Apr 2, 2020

[ML] Speed up the lat_long function (elastic#1102)

7c2a810

tveasey mentioned this pull request Apr 2, 2020

[7.8][ML] Speed up the lat_long function #1118

Merged

tveasey added a commit that referenced this pull request Apr 3, 2020

[7.8][ML] Speed up the lat_long function (#1118)

a0f2908

Backport #1102.
