Hierarchical clustering: distance threshold #3796

Closed
vmirly opened this issue Oct 23, 2014 · 7 comments · Fixed by #9069

Comments

@vmirly

vmirly commented Oct 23, 2014

So, the output of hierarchical clustering can be determined either by the number of clusters or by a distance threshold at which the tree is cut. However, scikit-learn only supports the first option!

class sklearn.cluster.AgglomerativeClustering(n_clusters=2, ...
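To make the two ways of cutting the tree concrete, here is a minimal pure-Python sketch (not scikit-learn's code). It builds a single-linkage tree naively, records each merge with its linkage distance, and then cuts the same tree either by a number of clusters or by a distance threshold. All function names are hypothetical, and it assumes the convention that merges at or above the threshold are not performed.

```python
import math
from itertools import combinations


def single_linkage(points):
    """Naive O(n^3) single-linkage agglomeration (illustrative only).

    Returns the merges in order as (cluster_a, cluster_b, distance), where
    clusters are frozensets of point indices.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Single linkage: cluster distance = closest pair of members.
            d = min(math.dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        merges.append((clusters[a], clusters[b], d))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merges[-1][0] | merges[-1][1])
    return merges


def replay(n, merges):
    """Apply the given merges to n singletons; return the resulting clusters."""
    clusters = {frozenset([i]) for i in range(n)}
    for a, b, _ in merges:
        clusters -= {a, b}
        clusters.add(a | b)
    return clusters


def cut_by_n_clusters(n, merges, n_clusters):
    # Each merge removes one cluster, so the first n - n_clusters merges leave n_clusters.
    return replay(n, merges[:n - n_clusters])


def cut_by_threshold(n, merges, threshold):
    # Assumed convention: merges at or above the threshold are not performed.
    # Single-linkage heights never decrease, so this keeps a prefix of the merges.
    return replay(n, [m for m in merges if m[2] < threshold])


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5), (20.0, 0.0)]
merges = single_linkage(points)
print([d for _, _, d in merges])                          # [1.0, 1.5, 5.0, 15.0]
print(len(cut_by_threshold(len(points), merges, 3.0)))    # 3
print(len(cut_by_n_clusters(len(points), merges, 3)))     # 3
```

Both cuts replay a prefix of the same merge sequence; they differ only in how the length of that prefix is chosen.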

As a suggestion, would it be possible to add the other option, i.e. to accept distance_threshold as an input argument and return however many clusters result from cutting the tree at that threshold? Exactly one of these arguments should be given, either n_clusters or the threshold, not both!
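A minimal sketch of the "exactly one of the two" rule, assuming the stopping rule is turned into a predicate that the merge loop checks before each merge. `make_stop_condition` and `cluster_1d` are hypothetical names, not scikit-learn API, and the loop is restricted to 1-D data so that single linkage reduces to merging the closest neighbouring pair.

```python
def make_stop_condition(n_clusters=None, distance_threshold=None):
    """Hypothetical helper: turn the user's stopping rule into a predicate.

    Exactly one of n_clusters / distance_threshold must be given.
    """
    if (n_clusters is None) == (distance_threshold is None):
        raise ValueError("give exactly one of n_clusters or distance_threshold")
    if n_clusters is not None:
        if n_clusters < 1:
            raise ValueError("n_clusters must be >= 1")
        # Stop as soon as the requested number of clusters remains.
        return lambda n_current, next_gap: n_current <= n_clusters
    # Stop when the closest pair is at or above the threshold (assumed convention).
    return lambda n_current, next_gap: next_gap >= distance_threshold


def cluster_1d(values, **stop_kwargs):
    """Single-linkage agglomeration of 1-D values with a pluggable stopping rule."""
    should_stop = make_stop_condition(**stop_kwargs)
    clusters = [[v] for v in sorted(values)]
    while len(clusters) > 1:
        # In 1-D, the single-linkage distance between neighbouring clusters
        # is the gap between one cluster's max and the next cluster's min.
        gaps = [clusters[i + 1][0] - clusters[i][-1] for i in range(len(clusters) - 1)]
        i = min(range(len(gaps)), key=gaps.__getitem__)
        if should_stop(len(clusters), gaps[i]):
            break
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters


data = [0, 1, 5, 6.5, 20]
print(cluster_1d(data, distance_threshold=3.0))  # [[0, 1], [5, 6.5], [20]]
print(cluster_1d(data, n_clusters=3))            # [[0, 1], [5, 6.5], [20]]
```

Validating the pair up front keeps the merge loop itself agnostic about which rule the user picked.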

@agramfort
Member

Sure, it is possible!

The code is open, and we always welcome good contributions...

@eyaler

eyaler commented Feb 5, 2015

Actually, there are more options for the stopping/threshold criterion:
https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.cluster.hierarchy.fcluster.html#scipy.cluster.hierarchy.fcluster
http://www.mathworks.com/help/stats/cluster.html

The inconsistency coefficient seems interesting, and it is the default criterion in MATLAB and SciPy...
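The inconsistency coefficient compares each link's height with the mean and standard deviation of the heights of the links just below it, so an unusually tall link stands out. SciPy exposes this as `scipy.cluster.hierarchy.inconsistent(Z, d=2)`. The pure-Python sketch below follows the same idea on a SciPy-style linkage list, but its details (counting the link itself as level 1, using the sample standard deviation, returning 0 when the deviation is 0) are assumptions and may not match SciPy or MATLAB exactly.

```python
import statistics


def inconsistency(linkage, n, depth=2):
    """Inconsistency coefficient for each merge in a SciPy-style linkage.

    linkage: list of (left, right, height) rows; ids < n are original points,
    id n + i is the cluster created by row i.
    """
    coeffs = []
    for row, (_, _, height) in enumerate(linkage):
        # Collect heights of this link and the non-singleton links below it,
        # going down at most `depth` levels (the link itself is level 1).
        heights, frontier = [], [row]
        for _ in range(depth):
            children = []
            for r in frontier:
                left, right, h = linkage[r]
                heights.append(h)
                children += [c - n for c in (left, right) if c >= n]
            frontier = children
        # Assumption: sample standard deviation; a lone link has no spread.
        std = statistics.stdev(heights) if len(heights) > 1 else 0.0
        mean = statistics.mean(heights)
        coeffs.append((height - mean) / std if std > 0 else 0.0)
    return coeffs


# Five points: {0,1} merge at 1.0, {2,3} at 1.5, those two at 5.0, then point 4 at 15.0.
linkage = [(0, 1, 1.0), (2, 3, 1.5), (5, 6, 5.0), (4, 7, 15.0)]
print([round(c, 4) for c in inconsistency(linkage, n=5)])  # [0.0, 0.0, 1.1471, 0.7071]
```

Unlike a fixed distance threshold, this score is relative to the local structure of the tree, which is why it can separate clusters of very different scales.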

@mjbommar
Contributor

mjbommar commented Feb 5, 2015

You might also review some of the literature on graph modularity as a more robust measure of cluster quality. The igraph package (with Python bindings) has a large number of these functions available for reference.

Some discussion here also re: clustering and "community detection", agglomerative/divisive/spectral approaches, etc.:
http://arxiv.org/abs/0906.0612
http://bommaritollc.com/2012/06/summary-community-detection-algorithms-igraph-0-6/
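For reference, Newman-Girvan modularity scores a partition of a graph as the fraction of edges that fall inside communities minus the fraction expected if edges were placed at random while preserving node degrees. The sketch below computes it in pure Python for an undirected, unweighted edge list; it is an illustration of the formula, not igraph's implementation, and the function name is hypothetical.

```python
from collections import Counter


def modularity(edges, labels):
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where m is the number
    of edges, L_c the edges inside c and d_c the total degree of c's nodes.
    """
    m = len(edges)
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Observed fraction of edges whose endpoints share a community.
    inside = sum(1 for u, v in edges if labels[u] == labels[v]) / m
    # Expected fraction under the degree-preserving random model.
    community_degree = Counter()
    for node, k in degree.items():
        community_degree[labels[node]] += k
    expected = sum((d / (2 * m)) ** 2 for d in community_degree.values())
    return inside - expected


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(round(modularity(edges, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}), 4))  # 0.3571
print(modularity(edges, {v: 0 for v in range(6)}))                         # 0.0
```

Putting every node in one community always gives Q = 0, so a cut of the dendrogram can be judged by how far its partition raises Q above that baseline.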

vmichel pushed a commit to vmichel/scikit-learn that referenced this issue Apr 2, 2015
@VathsalaAchar
Contributor

Do you mind if I pick this up? I implemented clustering in C++ years ago, so I'll brush up and get back with questions.

@jnothman
Member

jnothman commented May 11, 2017 via email

@VathsalaAchar
Contributor

Thanks @jnothman
I saw that @vmichel has made the necessary changes for a distance threshold (though there's no PR?), but I was interested in the criterion that scipy.cluster.hierarchy.fcluster uses (mentioned by @eyaler) and was wondering whether it would be useful alongside the distance threshold. Or is it a feature of its own?

@jnothman
Member

jnothman commented May 13, 2017 via email

8 participants