Adding implementation of Hartigan's K-Means #16123

assaftibm · 2020-01-14T20:59:24Z

Hi,

I'm implementing Hartigan's K-Means in C++ with a Cython wrapper, and when it's done I'd be glad to contribute it to scikit-learn. The implementation follows the pseudocode described in the IJCAI '13 paper by Slonim, Aharoni and Crammer (https://dl.acm.org/doi/10.5555/2540128.2540369), plus some optimizations of my own that make the run time comparable to Lloyd's K-Means.
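
For readers unfamiliar with the method, here is a minimal pure-Python sketch of the textbook form of Hartigan's update. It is not the C++ implementation proposed here, and it includes none of the optimizations mentioned above. Unlike Lloyd's algorithm, which reassigns all points and then recomputes centroids, Hartigan's method considers one point at a time: it moves x from cluster A to cluster B whenever n_B/(n_B+1) * ||x - c_B||^2 is smaller than n_A/(n_A-1) * ||x - c_A||^2, and updates both centroids immediately. The function name `hartigan_kmeans`, its parameters, and the round-robin initialization are assumptions made for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def hartigan_kmeans(points, k, max_iter=100, seed=0):
    """Illustrative Hartigan-style K-Means (hypothetical helper, not a scikit-learn API)."""
    n, dim = len(points), len(points[0])
    if n < k:
        raise ValueError("need at least k points")

    # Assumed initialization: a shuffled round-robin partition, so every
    # cluster starts non-empty.
    rng = random.Random(seed)
    labels = [i % k for i in range(n)]
    rng.shuffle(labels)

    # Keep per-cluster sizes and coordinate sums so centroids can be
    # updated incrementally after each single-point move.
    sizes = [0] * k
    sums = [[0.0] * dim for _ in range(k)]
    for x, c in zip(points, labels):
        sizes[c] += 1
        for d in range(dim):
            sums[c][d] += x[d]

    def centroid(c):
        return [s / sizes[c] for s in sums[c]]

    for _ in range(max_iter):
        moved = False
        for i, x in enumerate(points):
            a = labels[i]
            if sizes[a] == 1:
                continue  # never empty a cluster
            # Reduction in within-cluster SSE if x leaves its cluster a.
            best = a
            best_cost = sizes[a] / (sizes[a] - 1) * sq_dist(x, centroid(a))
            for b in range(k):
                if b == a:
                    continue
                # Increase in within-cluster SSE if x joins cluster b.
                cost = sizes[b] / (sizes[b] + 1) * sq_dist(x, centroid(b))
                if cost < best_cost:
                    best, best_cost = b, cost
            if best != a:
                # Move x and update both clusters' statistics right away.
                for d in range(dim):
                    sums[a][d] -= x[d]
                    sums[best][d] += x[d]
                sizes[a] -= 1
                sizes[best] += 1
                labels[i] = best
                moved = True
        if not moved:
            break  # no single-point move lowers the objective
    return labels, [centroid(c) for c in range(k)]
```

Calling `hartigan_kmeans(points, k=2)` on a list of coordinate tuples returns one label per point and the final centroids. Recomputing each centroid from the running sums keeps the sketch short. An optimized implementation would presumably cache centroids and distances instead.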

I'd like to know if the community welcomes this addition.

Thank you.

ogrisel · 2020-01-15T14:20:54Z

I was not familiar with Hartigan's K-Means but it looks interesting.

However, we would rather not add any more C++ to the scikit-learn codebase and would rather focus on Cython.

But before considering implementing Hartigan's K-Means in Cython, let's focus on finishing the new implementation of Lloyd's in #11950, which is significantly more memory efficient and scales better on machines with many CPU cores.

glemaitre added the Enhancement label Jan 15, 2020

cmarmo added the module:cluster label Jan 29, 2020
