Adding K-Medoids clustering algorithm #5085
…ded tests for KMedoids::fit() and KMedoids::fit_predict()
…Euclidean distance.
…etrics and plots the results.
Additional notes:
Travis is unhappy.
Travis is happy now. Is there something I can do about the AppVeyor error?
Nice work!
sklearn/cluster/k_medoids_.py
# Check n_clusters
if (n_clusters is None or
        n_clusters <= 0 or
Trailing whitespace on lines 52 and 53.
        return labels

    def inertia(self, X):
Missing docstring.
This work looks good to me, thanks @terkkila!
Parameters
----------
n_clusters : int, optional, default: 8
Is it the same default as KMeans?
Yes, it is.
All CIs are not happy. Before considering estimators for inclusion, we ask for arguments why this does better/more than what we already have. I see that you motivate KMedoids by the fact that any metric can be employed. OK, but in terms of use cases, do you see any compelling argument? Said differently: for what problem should I use KMedoids vs. KMeans, and why?
I think the main advantage is that you can specify custom metrics for which computing the mean is not easy / feasible / known. An example illustrating that would be nice. Maybe L1? (Though I have no idea how that actually looks.) Demonstrating robustness to outliers would also be interesting (see also http://stackoverflow.com/questions/21619794/what-makes-the-distance-measure-in-k-medoid-better-than-k-means).
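To make the custom-metric argument concrete, here is a minimal sketch of the alternating K-medoids scheme driven purely by a precomputed distance matrix. It is not the code from this PR: the names `pairwise_l1` and `k_medoids`, the random initialisation, and the toy data are all assumptions made for illustration. The point it tries to show is that the update step only ever looks up distances and never computes a mean, so any metric (L1 here) can be swapped in.

```python
import numpy as np


def pairwise_l1(X):
    """Manhattan (L1) distance matrix; any other metric could be swapped in."""
    return np.abs(X[:, None, :] - X[None, :, :]).sum(axis=-1)


def k_medoids(D, n_clusters, max_iter=100, seed=0):
    """Alternating K-medoids on a precomputed distance matrix D.

    Hypothetical helper for illustration, not the estimator from this PR.
    Returns (indices of the medoids, cluster label of every sample).
    """
    rng = np.random.RandomState(seed)
    n_samples = D.shape[0]
    # Assumption: plain random initialisation from the data points.
    medoids = rng.choice(n_samples, size=n_clusters, replace=False)
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for k in range(n_clusters):
            members = np.flatnonzero(labels == k)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            # Update step: the medoid is the member with the smallest
            # total distance to the other members. No mean is needed.
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[k] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels


# Toy data: two tight groups plus one far-away outlier (last row).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
              [100.0, 100.0]])
medoids, labels = k_medoids(pairwise_l1(X), n_clusters=2)
```

Because every medoid is an actual row of `X`, an outlier can become a cluster centre but cannot drag a centre to a location where no data exists, which is one way to read the robustness argument. The result still depends on the initialisation, so this sketch makes no claim about which partition it finds.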
Hi,
The most compelling use case that I have faced is when working with time …
Cheers,
On Fri, Aug 26, 2016 at 10:54 PM, Andreas Mueller notifications@github.com
There's no dtw in scipy, right? @bmcfee you're gonna contribute upstream, IIRC ;) It sounds like an awesome example, but without the metric being in scipy, no dice.
We recently merged a numba-accelerated dtw in librosa. No plans yet to push …
On Fri, Aug 26, 2016, 16:39 Andreas Mueller notifications@github.com
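For readers unfamiliar with the metric under discussion, the following is a plain, unaccelerated sketch of dynamic time warping (DTW) and of building a pairwise DTW matrix. It is neither librosa's implementation nor anything from this PR; `dtw_distance`, `dtw_matrix`, and the absolute-difference local cost are assumptions chosen for illustration. It shows why the mean is awkward here: the series may have different lengths, yet a distance between any two of them is still well defined.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with an absolute-difference local cost.

    Hypothetical helper for illustration only.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # acc[i, j] is the cheapest cost of aligning a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m]


def dtw_matrix(series):
    """Symmetric pairwise DTW matrix for a list of 1-D series of any lengths."""
    n = len(series)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = dtw_distance(series[i], series[j])
    return D


series = [[1, 2, 3], [1, 2, 2, 3], [0, 0], [1, 1]]
D = dtw_matrix(series)
```

A matrix like `D` could then be handed to any medoid-based clustering routine that accepts precomputed distances; whether the estimator in this PR does so is not established by the thread.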
I guess the questions are:
On 27 August 2016 at 07:33, Brian McFee notifications@github.com wrote:
Hi, what's the status of this pull request? As far as I can follow the discussion, there is no agreement on whether this should be merged. In my opinion, it would be worth including this in scikit-learn. Recently I tried to find clusters within a space where the distance was calculated in an unusual way, and the code copy-pasted from this PR helped me gain some insights into my data.
I think we do want this. It would have been nice to have a compelling example, but I don't want this to be the blocker, either. The tests are failing, though, and the PR needs additional reviews.
@terkkila @amueller
I tried setting
as this is where libmkl_intel_lp64.dylib is located, but that did not change anything.
@Kornel sorry, I'm not sure why that is. Possibly an issue with which numpy you're using? What does …
Thanks a lot Kornel for taking this forward, much appreciated!
Cheers,
On Tue, Oct 18, 2016 at 11:51 AM, Kornel Kiełczewski <
Work continued in #11099
Thanks @terkkila for your great work.
Added K-Medoids clustering algorithm
nosetests -v sklearn/cluster/tests/test_k_medoids.py
python examples/cluster/plot_kmedoids_digits.py