Adding K-Medoids clustering algorithm #5085

terkkila · 2015-08-04T19:02:55Z

Added K-Medoids clustering algorithm

Introducing new KMedoids class, which is derived from the BaseEstimator
Unit test coverage 91%
Unit tests passing: nosetests -v sklearn/cluster/tests/test_k_medoids.py
Code is documented
Example script: python examples/cluster/plot_kmedoids_digits.py

terkkila · 2015-08-04T19:07:36Z

Additional notes:

PEP8 passing without complaints: pep8 sklearn/cluster/k_medoids_.py
Pyflakes passing without complaints: pyflakes sklearn/cluster/k_medoids_.py

amueller · 2015-08-04T19:12:01Z

Travis is unhappy.

terkkila · 2015-08-06T10:12:33Z

Travis is happy now. Is there anything I can do about the AppVeyor error?

TomDLT · 2015-08-06T12:24:29Z

Nice work!
AppVeyor is down.

TomDLT · 2015-08-06T12:28:24Z

sklearn/cluster/k_medoids_.py

+
+        # Check n_clusters
+        if (n_clusters is None or 
+            n_clusters <= 0 or 


Trailing whitespace on lines 52 and 53.

TomDLT · 2016-08-18T14:24:00Z

sklearn/cluster/k_medoids_.py

+        return labels
+
+    def inertia(self, X):
+


TomDLT · 2016-08-18T14:40:52Z

This work looks good to me, thanks @terkkila!
Address my comments and I'll give my +1.

agramfort · 2016-08-20T14:41:17Z

sklearn/cluster/k_medoids_.py

+
+    Parameters
+    ----------
+    n_clusters : int, optional, default: 8


is it the same default as kmeans?

Kornel (Oct 24, 2016): yes, it is.

agramfort · 2016-08-20T14:46:07Z

all CIs are not happy

Before considering estimators for inclusion, we ask for arguments as to why this does better/more than what we already have. I see that you motivate KMedoids by the fact that any metric can be employed. OK, but in terms of use cases, do you see any compelling argument? Said differently: for what problem should I use KMedoids rather than KMeans, and why?

amueller · 2016-08-26T19:53:13Z

I think the main advantage is that you can specify custom metrics for which computing the mean is not easy / feasible / known. An example illustrating that would be nice. Maybe L1 (though I have no idea what that actually looks like)? Demonstrating robustness to outliers would also be interesting (see also http://stackoverflow.com/questions/21619794/what-makes-the-distance-measure-in-k-medoid-better-than-k-means).
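A minimal sketch of that idea, assuming a naive alternating scheme (assign, then update) rather than the implementation in this PR. Because a medoid is always one of the input points, only pairwise distances are needed, so any metric can be used, and a single outlier cannot drag the center away the way it drags a mean. The names `manhattan`, `medoid`, `assign`, and `k_medoids` are hypothetical and chosen for illustration only.

```python
def manhattan(a, b):
    """L1 distance between two equal-length tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def medoid(points, metric):
    """Return the member of `points` with the smallest total distance to all members."""
    return min(points, key=lambda c: sum(metric(c, p) for p in points))


def assign(points, medoids, metric):
    """Group every point with its nearest medoid; returns one list per medoid."""
    clusters = [[] for _ in medoids]
    for p in points:
        nearest = min(range(len(medoids)), key=lambda i: metric(p, medoids[i]))
        clusters[nearest].append(p)
    return clusters


def k_medoids(points, initial_medoids, metric, max_iter=100):
    """Naive alternating k-medoids (illustrative only, not the PR's code)."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        clusters = assign(points, medoids, metric)
        # An empty cluster keeps its previous medoid.
        new_medoids = [medoid(c, metric) if c else m
                       for c, m in zip(clusters, medoids)]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids, metric)


# Robustness to an outlier in one dimension: the mean moves, the medoid does not.
points_1d = [(1,), (2,), (3,), (4,), (100,)]
center_1d = medoid(points_1d, manhattan)                  # (3,)
mean_1d = sum(p[0] for p in points_1d) / len(points_1d)   # 22.0

# Two groups plus one outlier at (1, 40), clustered with the L1 metric.
points_2d = [(0, 0), (1, 0), (2, 0), (1, 1), (1, 40),
             (10, 0), (11, 0), (12, 0)]
medoids_2d, clusters_2d = k_medoids(points_2d, [(0, 0), (12, 0)], manhattan)
# medoids_2d == [(1, 0), (11, 0)]
```

The outlier `(1, 40)` ends up in the first cluster, whose mean would be `(1, 8.2)`, while its medoid stays at `(1, 0)`. Swapping `manhattan` for any other callable that returns a distance requires no other change.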

terkkila · 2016-08-26T20:18:59Z

Hi,

The most compelling use cases I have faced involve time series data, say
demand for a product or household electricity consumption, where comparing
series point by point along the original time axis is not optimal because
they are shifted by some offset. In such cases Dynamic Time Warping can be
used, and it is just one of many metrics that can be plugged into
K-Medoids. I have used this DTW & KMedoids combination on several
datasets, and it seems to be a robust and well-performing choice.

Cheers,
Timo

amueller · 2016-08-26T20:38:06Z

there's no dtw in scipy, right?

@bmcfee you're gonna contribute upstream, IIRC ;)

It sounds like an awesome example, but without the metric being in scipy, no dice.
EMD would also be cool, but same problem.

bmcfee · 2016-08-26T21:33:33Z

We recently merged a numba-accelerated dtw in librosa. No plans yet to push
upstream, but it might be worth looking into. I'm not all that keen on
reimplementing it in cython though.

jnothman · 2016-08-27T11:17:49Z

I guess the questions are:

1. Can dtw be written simply enough that it can be included directly
   in an example?
2. Is there another frequently used distance measure that could motivate
   KMedoids?
3. Are we able to show the benefit using an external library to get this
   merged, but merely mention it in the documentation without a motivating
   example (and/or point to an external library where the example is shown)?
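On the first question: a textbook dynamic time warping distance for 1-D sequences fits in about a dozen lines of plain Python. The sketch below is an unoptimized O(n*m) version with an absolute-difference local cost, written for illustration only. It is not the numba-accelerated librosa implementation mentioned above, and `dtw_distance` is a hypothetical name.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = step + min(cost[i - 1][j],      # a advances, b repeats
                                    cost[i][j - 1],      # b advances, a repeats
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# The same bump, shifted by one time step.
series_a = [0, 0, 1, 2, 1, 0]
series_b = [0, 1, 2, 1, 0, 0]

pointwise = sum(abs(x - y) for x, y in zip(series_a, series_b))  # 4
warped = dtw_distance(series_a, series_b)                        # 0.0
```

Compared point by point the two series differ by 4, while the warped distance is 0.0 because the shift is absorbed by the alignment. A function like this could serve as the metric for any clustering routine that only needs pairwise distances.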

Kornel · 2016-10-15T14:05:21Z

Hi, what's the status of this pull request? As far as I can follow the discussion, there is no agreement on whether this should be merged or not. In my opinion, it would be worth including this in scikit. Recently I tried to find clusters in a space where the distance is calculated in an unusual way, and the code copy-pasted from this PR helped me gain some insight into my data.

amueller · 2016-10-17T16:20:23Z

I think we do want this. It would have been nice to have a compelling example, but I don't want this to be the blocker, either. The tests are failing, though, and the PR needs additional reviews.

Kornel · 2016-10-17T19:11:00Z

@terkkila
I did a rebase on top of scikit-learn/master and created a PR to @terkkila's repository here: terkkila#1. It was an easy exercise, as there was only a small conflict in the doc. Please take a look and see if it makes sense.

@amueller
However, I can't compile the rebased version, I get a lot of errors like these:

ERROR: Failure: ImportError (dlopen(/Users/kornel/workspace/github/scikit-learn/sklearn/metrics/pairwise_fast.so, 2): Library not loaded: libmkl_intel_lp64.dylib
  Referenced from: /Users/kornel.kielczewski/workspace/github/scikit-learn/sklearn/metrics/pairwise_fast.so
  Reason: image not found)

I tried setting

LD_LIBRARY_PATH=/Users/kornel/anaconda/envs/pymc1/lib make

as this is where libmkl_intel_lp64.dylib is located, but that did not change anything.

amueller · 2016-10-17T22:23:21Z

@Kornel sorry, I'm not sure why that is. Possibly issues with which numpy you're using? what does ldd say?

Kornel · 2016-10-18T08:50:35Z

@amueller an update to mkl fixed this issue. Now make succeeds!

I've fixed the tests, rebased, and created a new pull request here: #7694

WDYT?

terkkila · 2016-10-20T09:32:21Z

Thanks a lot Kornel for taking this forward, much appreciated!

Cheers,
Timo

qinhanmin2014 · 2018-07-08T14:01:44Z

Work continued in #11099

qinhanmin2014 · 2018-07-08T14:02:57Z

Thanks @terkkila for your great work.

terkkila added 19 commits July 31, 2015 20:48

e90d3f5 Added new input arguments: clustering and distance_metric.
cd26565 Removed deprecated mlpy import.
61a41a8 Allowed usage of any pairwise distance metric defined in Scikit-Learn.
c889133 Added unit tests. Renamed k -> n_clusters. Made some member variables private.
2782f95 KMedoids is now derived from BaseEstimator, and has proper mixins. Added tests for KMedoids::fit() and KMedoids::fit_predict()
88d3fe4 Added KMedoids::transform(). Update transform and predict methods and tests.
3654625 Updated initialization. Updated unit tests.
8125449 Added naive test for KMedoids::fit(). Updated KMedoids interface.
0ddc25c Modifed according to PEP8 requirements.
f635fca Added authors and license. Changed some variable names to more descriptive ones.
0748852 Testing all available pairwise distance functions for fitting in a loop.
de7048e Added testing with iris data set.
2a1ee1d In unit test, comparing K-Medoids to K-Means when K-Medoids is using Euclidean distance.
e1bdf71 Refactored. Added comments.
4343efe Small tweaks.
bffcd9e Merge branch 'master' of https://github.com/scikit-learn/scikit-learn into kmedoids
a9f7682 Testing the initialization method for K-Medoids. Unit test coverage 91%.
179421f Adding example script, which runs K-Medoids with different distance metrics and plots the results.
a795c45 Merge branch 'master' of https://github.com/scikit-learn/scikit-learn into kmedoids

terkkila added 4 commits August 4, 2015 22:39

cabb24b Added checks for input data, which also convert the data if needed.
1dc8107 Added try/catch around importing exceptions.
5f12bb2 Added try/catch around importing exceptions in testing script of K-Medoids.
59f2348 Made parts compatible with python 3.4 while retaining compatibility with python 2.7

TomDLT reviewed Aug 6, 2015

TomDLT reviewed Aug 18, 2016

sklearn/cluster/k_medoids_.py

return labels

def inertia(self, X):

TomDLT Aug 18, 2016

docstring

agramfort reviewed Aug 20, 2016

amueller added Need Contributor New Feature and removed Need Contributor labels Aug 26, 2016

Kornel mentioned this pull request Oct 17, 2016

Rebase of your code on top of scikit-learn/master terkkila/scikit-learn#1

Closed

Kornel mentioned this pull request Oct 18, 2016

[MRG] Adding K-Medoids clustering algorithm revival #7694

Closed

qinhanmin2014 closed this Jul 8, 2018
