implement different cut criteria for agglomerative clustering #6197

amueller · 2016-01-20T16:07:24Z

we currently have "n_clusters" as a criterion, and a single way to cut the tree (I'm not sure what our strategy is called). scipy implements many more strategies, in particular "distance": http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.cluster.hierarchy.fcluster.html#scipy.cluster.hierarchy.fcluster

@GaelVaroquaux do we have the "inconsistent" in this nomenclature?

BiaDarkia · 2016-01-20T16:22:50Z

I can take a closer look at the current implementation and expand it to offer more cut strategies, similar to scipy.

GaelVaroquaux · 2016-01-21T06:52:40Z

> scipy implements many more strategies, in particular "distance":
> http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.cluster.hierarchy.fcluster.html
>
> @GaelVaroquaux do we have the "inconsistent" in this nomenclature?

I don't understand a significant fraction of the information on the above
page :$. I don't know the definitions of terms like "cophenetic
distance" or "inconsistent value".

The strategy "distance" is what people have been most asking for, and I
guess that it should be the priority for scikit-learn.

BiaDarkia · 2016-01-21T14:17:14Z

I will look into implementing the "distance" criterion this weekend.

BiaDarkia · 2016-01-25T04:31:24Z

I am nearly done with a working implementation for distance, but I have one more thing I would like to check on. I will submit a pull request once I am done.

BiaDarkia · 2016-01-26T13:14:16Z

I submitted a pull request for clustering based on cophenetic distance. When the additional arguments distance="True" and a threshold (float) are passed to AgglomerativeClustering, the data is clustered based on the cophenetic distance.

twistedcubic · 2016-01-27T04:34:41Z

@BiaDarkia I looked over your code, and I'm wondering: how and where are you determining the cophenetic distances? I only see the distance list being used. Thanks.

BiaDarkia · 2016-01-27T07:33:53Z

@twistedcubic: When building the tree in lines 842 to 846, I set the argument 'return_distance' to true. According to the documentation of linkage_tree, the returned distance[i] is the distance between children[i][0] and children[i][1] at which they are merged. If I understood the concept of cophenetic distance correctly, it is the distance at which two children are merged into a single branch, which is exactly what the returned distance list contains.
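
To illustrate the idea in the comment above, here is a minimal pure-Python sketch of cutting a merge tree at a distance threshold. It is not the code from the pull request. The function name `cut_tree_by_distance` is hypothetical, the `children`/`distances` layout follows the description above (merge `i` creates node `n_samples + i`), and two points are assumptions: merge distances are taken to be non-decreasing along each branch, and merges at exactly the threshold are applied (a strict inequality would be an equally valid convention).

```python
def cut_tree_by_distance(children, distances, n_samples, threshold):
    """Return flat cluster labels, applying only merges with distance <= threshold.

    children[i] is the pair of node ids merged at step i; distances[i] is the
    distance at which they were merged. Leaves are 0..n_samples-1 and the node
    created at step i has id n_samples + i.
    """
    parent = list(range(n_samples))  # union-find forest over the samples

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # rep[node] is one sample contained in that node (leaf or internal).
    rep = list(range(n_samples))
    for (left, right), dist in zip(children, distances):
        rep.append(rep[left])
        if dist <= threshold:  # assumption: ties are merged
            parent[find(rep[left])] = find(rep[right])

    # Relabel roots as consecutive integers in order of first appearance.
    labels, seen = [], {}
    for sample in range(n_samples):
        root = find(sample)
        labels.append(seen.setdefault(root, len(seen)))
    return labels


# Toy tree over 4 samples: (0,1) merge at 1.0, (2,3) at 2.0, then both at 5.0.
children = [(0, 1), (2, 3), (4, 5)]
distances = [1.0, 2.0, 5.0]
labels = cut_tree_by_distance(children, distances, n_samples=4, threshold=2.5)
```

With the toy tree, a threshold of 2.5 keeps the two low merges and skips the top one, so the samples fall into two clusters.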

twistedcubic · 2016-01-28T00:21:15Z

@BiaDarkia I see, you are using the distance tree_builder returns. I will look over the code for linkage_tree more carefully.

jnothman · 2017-11-09T02:09:34Z

See also #3796, #9069

jnothman · 2017-11-09T03:36:23Z

From the MATLAB docs: "The inconsistency coefficient characterizes each link in a cluster tree by comparing its height with the average height of other links at the same level of the hierarchy. The higher the value of this coefficient, the less similar the objects connected by the link." I still think some of the terms here are confusing. The distances between clusters at the time they are merged are included when calculating this coefficient.
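
A rough sketch of one way to read that definition, in plain Python. The choices here are assumptions and not taken from any library's source: each link is compared with itself plus its immediate child links only (a depth of 2), the spread is the sample standard deviation, and the coefficient is set to 0 when a link has no child links. The function name `inconsistency` is hypothetical.

```python
from statistics import mean, stdev


def inconsistency(children, heights, n_samples):
    """Return one coefficient per link: (height - mean) / std over nearby links.

    children[i] is the pair of node ids merged by link i, heights[i] its merge
    distance. Node ids >= n_samples refer to earlier links (id - n_samples).
    """
    coeffs = []
    for i, pair in enumerate(children):
        # Heights of this link and of any links directly below it.
        nearby = [heights[i]]
        for node in pair:
            if node >= n_samples:
                nearby.append(heights[node - n_samples])
        if len(nearby) < 2:
            coeffs.append(0.0)  # nothing to compare against
            continue
        spread = stdev(nearby)
        coeffs.append((heights[i] - mean(nearby)) / spread if spread > 0 else 0.0)
    return coeffs


# Same toy tree: two low merges joined by one much higher link.
children = [(0, 1), (2, 3), (4, 5)]
heights = [1.0, 2.0, 5.0]
coeffs = inconsistency(children, heights, n_samples=4)
```

In the toy tree only the top link has child links, so it is the only one with a non-zero coefficient. Its height is well above the average of the links beneath it, which is the situation the quoted definition describes.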

amueller · 2019-08-14T22:02:34Z

we have distance thresholding now
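
For readers landing here, a small usage sketch, assuming a scikit-learn version in which `AgglomerativeClustering` accepts the `distance_threshold` parameter (it must be combined with `n_clusters=None`). The data and the threshold value are made up for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Two well-separated pairs of points on a line (illustrative data).
X = np.array([[0.0], [0.1], [5.0], [5.1]])

# Cut the tree by distance instead of fixing the number of clusters.
model = AgglomerativeClustering(
    n_clusters=None,
    distance_threshold=1.0,
    linkage="single",
)
labels = model.fit_predict(X)
n_found = model.n_clusters_  # number of clusters implied by the threshold
```

The two within-pair merges happen at distance 0.1 and the final merge at 4.9, so a threshold of 1.0 leaves two clusters.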

amueller added the Enhancement label Jan 20, 2016

BiaDarkia mentioned this issue Jan 26, 2016

Clustering based on cophenetic distance added #6234

Closed

amueller closed this as completed Aug 14, 2019
