[MRG] Adding K-Medoids clustering algorithm revival #7694
Oct 24, 2016
@zdog234 A precomputed matrix should definitely work. @jnothman Yes, I think I need help with this one; I can't find enough time to fix it. Sorry it's taking so long :( Recently I was struggling with a repetitive build process on my local machine, one issue being e.g. 10213.…
On 16 April 2018 at 00:08, zdog234 ***@***.***> wrote: Also, I don’t know if this has been explored for KMedoids, but it could be useful to implement something like the kmeans++ algorithm for initial medoid selection if it hasn’t been already.
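To make the kmeans++ suggestion concrete, here is a rough sketch of how such seeding could look when only a distance matrix is available. The function name `kmedoids_plus_plus_init` and its parameters are hypothetical and are not part of this PR; it samples each new medoid with probability proportional to the squared distance to the closest medoid chosen so far.

```python
import numpy as np


def kmedoids_plus_plus_init(D, n_clusters, random_state=0):
    """Hypothetical helper: pick initial medoid indices from a square distance matrix D."""
    rng = np.random.default_rng(random_state)
    n_samples = D.shape[0]
    # First medoid: chosen uniformly at random.
    medoids = [int(rng.integers(n_samples))]
    # Distance from every sample to its closest medoid chosen so far.
    closest = D[:, medoids[0]].astype(float)
    for _ in range(1, n_clusters):
        weights = closest ** 2
        total = weights.sum()
        if total == 0:
            # Every sample coincides with a chosen medoid; fall back to a
            # uniform draw among the indices not chosen yet.
            candidates = np.setdiff1d(np.arange(n_samples), medoids)
            nxt = int(rng.choice(candidates))
        else:
            # Already chosen medoids have weight 0, so they are never re-drawn.
            nxt = int(rng.choice(n_samples, p=weights / total))
        medoids.append(nxt)
        closest = np.minimum(closest, D[:, nxt])
    return np.array(medoids)


# Toy usage on 1-D points with absolute-difference distances.
X = np.array([[0.0], [0.1], [10.0], [10.1], [20.0]])
D = np.abs(X - X.T)
init_idx = kmedoids_plus_plus_init(D, n_clusters=3, random_state=0)
```

Because it only reads columns of `D`, the same sketch would work with metric='precomputed' input.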
Shouldn’t deduplication be a separate step in a processing pipeline, specific to the task you’re trying to solve? Deduplication raises the question of what it means for two objects to be equal. That could be answered with the similarity/distance function used in the kmedoids algorithm itself, but for some use cases it’s enough to treat two objects as equal when their distance d is less than a specific epsilon, while in others they should be equal without any margin. I’d implement that as a separate class. What do you think? Kornel…
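As an illustration of what such a separate step might do, here is a small greedy sketch that works on a distance matrix. The name `dedupe_within_epsilon` is hypothetical, and the comparison uses `<=` (an assumption) so that `eps=0` means equality without any margin.

```python
import numpy as np


def dedupe_within_epsilon(D, eps=0.0):
    """Hypothetical helper: greedily drop samples within eps of a kept sample.

    Returns the kept indices and, for every sample, the index of the kept
    sample that represents it.
    """
    n_samples = D.shape[0]
    kept = []
    representative = np.empty(n_samples, dtype=int)
    for i in range(n_samples):
        for j in kept:
            if D[i, j] <= eps:
                # Sample i is treated as equal to the already kept sample j.
                representative[i] = j
                break
        else:
            # No kept sample is close enough, so sample i is kept.
            kept.append(i)
            representative[i] = i
    return np.array(kept), representative


X = np.array([[0.0], [0.0], [0.05], [1.0]])
D = np.abs(X - X.T)
kept_exact, rep_exact = dedupe_within_epsilon(D, eps=0.0)
kept_loose, rep_loose = dedupe_within_epsilon(D, eps=0.1)
```

The greedy result depends on sample order when `eps > 0`, which is one more reason to keep this choice outside the clustering estimator.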
On 17 Apr 2018, at 04:36, zdog234 ***@***.***> wrote: In this situation, by degeneracy, I mean having duplicate samples. I’m pretty sure KMedoids can be done with just the unique values and the count of each unique value, potentially cutting the size of the distance matrix by multiple orders of magnitude.
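A quick sketch of the unique-values-plus-counts idea, assuming exact duplicates and Euclidean distance. The helper `weighted_cost` is hypothetical; it only shows that the usual sum of distances to the nearest medoid can be computed from the smaller matrix by weighting each unique row with its multiplicity.

```python
import numpy as np

X = np.array([[0.0, 0.0], [0.0, 0.0],
              [1.0, 1.0], [1.0, 1.0], [1.0, 1.0],
              [5.0, 5.0]])

# Collapse exact duplicates, keeping how often each unique row occurs.
X_unique, inverse, counts = np.unique(
    X, axis=0, return_inverse=True, return_counts=True)
inverse = inverse.reshape(-1)  # guard against shape differences across NumPy versions

# Distance matrix over unique rows only: 3 x 3 instead of 6 x 6.
diff = X_unique[:, None, :] - X_unique[None, :, :]
D_unique = np.sqrt((diff ** 2).sum(axis=-1))


def weighted_cost(D, counts, medoid_idx):
    """Hypothetical helper: cost of a medoid set, weighted by multiplicity."""
    return float((counts * D[:, medoid_idx].min(axis=1)).sum())


medoid_idx = np.array([0, 2])  # indices into X_unique, chosen by hand here
cost_small = weighted_cost(D_unique, counts, medoid_idx)

# Same cost computed the long way, one row per original sample.
cost_full = float(D_unique[inverse][:, medoid_idx].min(axis=1).sum())
```

Both costs agree, so at least the objective can be evaluated without ever building the full matrix.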
D_train = pairwise_distances(X_train)
D_test = pairwise_distances(X_test, X_train)
This is what we work with in NearestNeighbors given metric='precomputed'.
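For reference, a minimal example of that convention in NearestNeighbors, with toy data made up for illustration: fit takes the square train-to-train matrix, and queries take a matrix of shape (n_test, n_train).

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.neighbors import NearestNeighbors

# Toy data, made up for illustration.
X_train = np.array([[0.0], [1.0], [5.0]])
X_test = np.array([[0.2], [4.0]])

D_train = pairwise_distances(X_train)         # shape (n_train, n_train)
D_test = pairwise_distances(X_test, X_train)  # shape (n_test, n_train)

nn = NearestNeighbors(n_neighbors=1, metric="precomputed").fit(D_train)
# Each row of D_test holds the distances from one test sample to all training samples.
dist, idx = nn.kneighbors(D_test)
```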
Predict relies on checking which is the nearest centroid, yeah? It doesn't need the complete distance matrix in general, but surely it needs the relative distances to each centroid.
Users who want to be efficient will have to be clever to pass:
D_test = np.full((len(X_test), len(X_train)), np.inf)
D_test[:, kmedoids.centroid_idx_] = pairwise_distances(X_test, X_train[kmedoids.centroid_idx_])
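Here is a self-contained sketch of how predict could consume such a matrix. `medoid_idx` stands in for `kmedoids.centroid_idx_`, and `predict_from_precomputed` is a hypothetical function, not the PR's implementation; it only looks at the medoid columns, so the inf placeholders elsewhere are never read.

```python
import numpy as np

# Toy data, made up for illustration.
X_train = np.array([[0.0], [1.0], [5.0], [6.0]])
X_test = np.array([[0.4], [5.8]])
medoid_idx = np.array([0, 3])  # stand-in for kmedoids.centroid_idx_

# Fill only the medoid columns with real distances; the rest stay at inf.
D_test = np.full((len(X_test), len(X_train)), np.inf)
D_test[:, medoid_idx] = np.abs(X_test - X_train[medoid_idx].T)


def predict_from_precomputed(D, medoid_idx):
    """Hypothetical helper: label each row by its nearest medoid column."""
    return np.argmin(D[:, medoid_idx], axis=1)


labels = predict_from_precomputed(D_test, medoid_idx)
```

The returned labels index into `medoid_idx`, so label 1 here means the medoid at training index 3.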
So we can document in