Hierarchical clustering: distance threshold #3796

vmirly · 2014-10-23T17:04:24Z

The output of hierarchical clustering can be determined either by the number of clusters or by a distance threshold at which the tree is cut. However, scikit-learn only supports the first option:

class sklearn.cluster.AgglomerativeClustering(n_clusters=2, ...

As a suggestion, would it be possible to add the other option: accept distance_threshold as an input argument and return however many clusters result from the cut? Exactly one of the two arguments should be given, either n_clusters or the threshold, not both.
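To make the requested behaviour concrete, here is a minimal pure-Python sketch of cutting at a distance threshold. It assumes single linkage, where cutting the tree at a threshold is equivalent to joining every pair of points whose distance is within that threshold. The function name `threshold_clusters` is hypothetical and is not part of scikit-learn, and whether the boundary is inclusive (`<=`) is a choice made here for illustration.

```python
from itertools import combinations
from math import dist  # Python 3.8+


def threshold_clusters(points, distance_threshold):
    """Label points by single-linkage clusters cut at distance_threshold.

    Hypothetical helper for illustration only. The number of clusters is
    not given up front; it falls out of the threshold.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Join every pair of points that lies within the threshold.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= distance_threshold:
            parent[find(i)] = find(j)

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels_by_root = {}
    return [
        labels_by_root.setdefault(find(i), len(labels_by_root))
        for i in range(len(points))
    ]


points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
tight = threshold_clusters(points, 1.5)
loose = threshold_clusters(points, 10)
print(tight, loose)
```

A small threshold leaves many clusters and a large one merges them, so the cluster count is an output rather than an input.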

agramfort · 2014-10-23T18:18:21Z

Sure, it is possible!

The code is open, and we always welcome good contributions...

eyaler · 2015-02-05T15:04:43Z

Actually, there are more options for the stopping/threshold criteria:
https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.cluster.hierarchy.fcluster.html#scipy.cluster.hierarchy.fcluster
http://www.mathworks.com/help/stats/cluster.html

The inconsistency coefficient seems interesting and is the default in MATLAB and scipy...
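For comparison, a short sketch of three of the criteria that scipy's `fcluster` accepts, applied to the same linkage matrix. The toy data, the average linkage method, and the threshold values are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy data: two tight pairs and one distant point (illustrative only).
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [20.0, 20.0]])

# Build the full merge tree once; each criterion below cuts it differently.
Z = linkage(X, method="average")

# Stop by cluster count: ask for at most 3 flat clusters.
by_count = fcluster(Z, t=3, criterion="maxclust")

# Stop by distance: cut the tree where the cophenetic distance exceeds 1.5.
by_distance = fcluster(Z, t=1.5, criterion="distance")

# Stop by inconsistency coefficient (scipy's default criterion).
by_inconsistency = fcluster(Z, t=0.8, criterion="inconsistent", depth=2)

print(by_count, by_distance, by_inconsistency)
```

The first two cuts agree on this data because the tree has a clear gap between merge heights; the inconsistency criterion depends on how each merge height compares with the merges below it, so its threshold is not on the same scale as a distance.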

mjbommar · 2015-02-05T20:31:37Z

You might also review some of the literature on graph modularity as a more robust measure of cluster quality. The igraph package (with Python bindings) has a large number of these functions available for reference.

Some discussion here also re: clustering and "community detection", agglomerative/divisive/spectral approaches, etc.:
http://arxiv.org/abs/0906.0612
http://bommaritollc.com/2012/06/summary-community-detection-algorithms-igraph-0-6/

VathsalaAchar · 2017-05-11T10:12:54Z

Do you mind if I pick this up? I implemented clustering in C++ years ago, so I'll brush up and get back with questions.

jnothman · 2017-05-11T13:37:27Z

I think you'd be welcome to

VathsalaAchar · 2017-05-12T09:45:46Z

Thanks @jnothman
I saw that @vmichel has made the necessary changes for a distance threshold (and no PR?), but I was interested in the criterion that scipy's fcluster uses (mentioned by @eyaler) and was wondering whether that would be useful alongside the distance threshold. Or is it a feature of its own?

jnothman · 2017-05-13T23:53:15Z

Designing an API that is extensible to alternate stopping criteria may be a good idea, but it's often good to implement one at a time.

amueller added the New Feature label Jan 22, 2015

vmichel pushed a commit to vmichel/scikit-learn that referenced this issue Apr 2, 2015

Add distance threshold on Hierarchical Clustering, see scikit-learn#3796

4125c57

vmichel pushed a commit to vmichel/scikit-learn that referenced this issue Apr 2, 2015

Add distance threshold on Hierarchical Clustering, see scikit-learn#3796

b016074

amueller added the Need Contributor label Oct 27, 2016

VathsalaAchar mentioned this issue Jun 8, 2017

Added distance_threshold parameter to hierarchical clustering #9069

Merged

lesteve added help wanted and removed Need Contributor labels Oct 18, 2017

jnothman mentioned this issue Nov 9, 2017

implement different cut criteria for agglomerative clustering #6197

Closed

NicolasHug closed this as completed in #9069 Apr 29, 2019
