Add k-means++ initialization. #2813
Conversation
```cpp
const double sampleValue = mlpack::math::Random();
double* elem = std::lower_bound(distribution.begin(), distribution.end(),
    sampleValue);
size_t position = (size_t) (elem - distribution.begin()) / sizeof(double);
```
Perhaps we should use `const` here as well, to help the compiler.
Good point; done in 79d2621.
Co-authored-by: Marcus Edel <marcus.edel@fu-berlin.de>
Oops, I forgot to run
Looks like it would make sense to slightly increase the threshold for the kmeans test.
You're right---I ran the test 1000 times locally and it failed a decent number of times. I reset the tolerance to something a good bit larger than the largest value I saw. So, hopefully, we should never see this test fail in practice. :)
Looks great to me.
Second approval provided automatically after 24 hours. 👍
I implemented the k-means++ initialization strategy about three years ago, but for some reason never got around to contributing it back upstream. So, here it is.
https://en.wikipedia.org/wiki/K-means%2B%2B
It's quite an effective initialization strategy, and is often used in practice. It often seems to outperform the Bradley/Fayyad refined start strategy that we have implemented.