Hierarchical clustering: distance threshold #3796
Comments
Sure, it is possible! The code is open and we always welcome good contributions... |
Actually, there are more options for the stopping/threshold criterion: the inconsistency coefficient seems interesting and is the default in MATLAB and SciPy... |
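For readers unfamiliar with the inconsistency coefficient mentioned above, here is a minimal sketch using SciPy's `scipy.cluster.hierarchy` module. The toy data, the `average` linkage, and the cutoff `t=1.0` are illustrative assumptions, not values proposed in this thread.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, inconsistent, linkage

# Toy data (an assumption for illustration): two Gaussian blobs in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(20, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(20, 2)),
])

# Build the full merge tree; each row of Z describes one merge.
Z = linkage(X, method="average")

# One row per merge: mean and std of link heights within depth d,
# number of links considered, and the inconsistency coefficient.
R = inconsistent(Z, d=2)

# Flat clusters using the inconsistency criterion with an explicit cutoff.
labels = fcluster(Z, t=1.0, criterion="inconsistent", depth=2)

# criterion="inconsistent" with depth=2 is what fcluster uses by default.
labels_default = fcluster(Z, t=1.0)
```

The number of clusters this produces depends strongly on `t` and `depth`, which is one reason a plain distance cutoff may be easier to reason about as a first option.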
You might also review some of the literature on graph modularity as a more robust measure of cluster quality. The igraph package (with Python bindings) has a large number of these functions available for reference. Some discussion here also re: clustering and "community detection", agglomerative/divisive/spectral approaches, etc.: |
Do you mind if I pick this up? I implemented clustering in C++ years ago, so I'll brush up and get back with questions. |
I think you'd be welcome to
|
Designing an API that is extensible to alternate stopping criteria may be a
good idea, but it's often good to implement one at a time.
…On 12 May 2017 7:45 pm, "Vathsala Achar" wrote:
Thanks @jnothman <https://github.com/jnothman>
I saw that @vmichel <https://github.com/vmichel> has made the necessary
changes for a distance threshold (and no PR?), but I was interested in the
criterion that scipy.fcluster uses (mentioned by @eyaler
<https://github.com/eyaler>) and was wondering if that would be useful
along with the distance threshold. Or is it a feature of its own?
|
So, the output of hierarchical clustering can be determined either by the number of clusters or by a distance threshold at which the tree is cut. However, scikit-learn supports only the first option:
class sklearn.cluster.AgglomerativeClustering(n_clusters=2, ...
As a suggestion, would it be possible to add the other option: accept distance_threshold as an input argument and return however many clusters result? Exactly one of these arguments should be given, either n_clusters or the threshold, not both.
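To illustrate the two ways of cutting the tree, here is a minimal sketch using SciPy's `linkage` and `fcluster` rather than scikit-learn. The toy points, the `average` linkage, and the threshold of `1.0` are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy data (an assumption for illustration): two tight, well-separated groups.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
])

# Build the full merge tree once.
Z = linkage(X, method="average")

# Option 1: cut at a distance threshold; the cluster count is an output.
labels_by_distance = fcluster(Z, t=1.0, criterion="distance")
n_found = len(np.unique(labels_by_distance))

# Option 2: ask for at most a given number of clusters instead.
labels_by_count = fcluster(Z, t=2, criterion="maxclust")
```

Here the merges inside each group happen far below the threshold and the merge joining the two groups happens far above it, so the distance cut recovers both groups without the caller specifying a count.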