Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added distance_threshold parameter to hierarchical clustering #9069
Added distance_threshold parameter to hierarchical clustering #9069
Changes from 24 commits
6c5f957
8ea9afa
e56670d
03709c8
fda248d
c45ec1f
6efc3ec
47790a3
71dd010
0764729
0649b29
83600ea
43ef071
7a8fc68
b37c183
38659e4
b17a818
fd44b65
cd6c9aa
c07fa4d
ba326ac
0b94bef
680e515
c84f429
3981918
3f1a6be
36cd205
4ae66fd
8060020
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think this needs a description. It isn't clear what this parameter does right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should test something more explicit, just to be sure that your logic here is correct, like check that in single linkage, the maximum within-cluster pairwise distance for each sample is under the threshold and the minimum out-of-cluster pairwise distance is greater.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean along the lines of this test?
I could use the same dataset to do a more explicit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how it relates to that test. I mean that for some
X
and some predictedlabels
:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the delayed response, but as far as I understand the pairwise distance will give the distance between each point in
X
not the distance between the clusters as they join up. The distance between each cluster is in thedistances
matrix calculated using thescipy.cluster.hierarchy.linkage
method.So is there still a need to have an explicit test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. But surely a similar invariance could be constructed about the average distances with average linkage...? I've not thought about it too rigorously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although what I said is true only when
connectivity
isNone
. If aconnectivity
matrix is passed in then calculating the clusters would mean deciphering what is happening in theward_tree
andlinkage_tree
methods, and I'm not sure it's worth the effort...I suppose, but is there a need to do this?
I'm really not sure how to do a more explicit test, so I'd really appreciate help with this.