Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Speed up the lat_long function #1102

Merged
merged 3 commits into from
Apr 2, 2020
Merged

Conversation

tveasey
Copy link
Contributor

@tveasey tveasey commented Mar 30, 2020

Clustering can be the bottleneck for the lat_long function.

This reworks the calculation of the distances to the selected points in k-means++ initialisation. Before we were creating a k-d tree for each point we added and looking up nearest neighbours. This is unnecessary since we can simply update the distances directly, i.e. distance_i = min(distance_i, distance(selected, x_i)). The other main speedup comes from the fact that in #1037 I reworked online k-means to remove the points buffer. This saves us memory and we can spend this memory by accumulating more points before we re-cluster.

I also made some tweaks to cutdown the number of allocations (principally by moving various variables into place). Finally, I corrected CKMeansOnline::split to copy all the parent clustering parameters into each split and made a couple of other small tidy ups.

In total, I get around a 60% improvement in runtime from these changes for CXMeansOnline.

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

docs/CHANGELOG.asciidoc Outdated Show resolved Hide resolved
@tveasey tveasey merged commit d54de38 into elastic:master Apr 2, 2020
@tveasey tveasey deleted the speedup-x-means branch April 2, 2020 16:15
tveasey added a commit to tveasey/ml-cpp-1 that referenced this pull request Apr 2, 2020
tveasey added a commit that referenced this pull request Apr 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants