Distances for agglomerativeclustering #17984
Fix conflict with master on sklearn/cluster/_agglomerative.py
Thanks @FrancescoCasalegno, we would need to add a test for this parameter in sklearn/cluster/tests/test_hierarchical.py.
Please add an entry to the change log at doc/whats_new/v0.24.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Thank you for the PR @FrancescoCasalegno !
I believe the documentation of the distances_ attribute should be reworked to better describe what those distances mean.
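To make the meaning of distances_ concrete: in scikit-learn, children_[i] holds the two nodes merged at step i (an index below n_samples is an original sample, an index j >= n_samples refers to the cluster formed at step j - n_samples), and distances_[i] is the linkage distance between those two nodes at the moment they are merged. The deliberately naive, pure-Python single-linkage sketch below only illustrates that convention; it is not scikit-learn's implementation, and the function name naive_single_linkage is hypothetical.

```python
import math
from itertools import combinations


def naive_single_linkage(points):
    """Hypothetical helper: repeatedly merge the two closest clusters.

    Returns (children, distances) laid out like scikit-learn's
    children_ / distances_: merge i joins children[i][0] and children[i][1],
    and distances[i] is the single-linkage distance between them.
    New clusters get ids n_samples, n_samples + 1, ...
    """
    n = len(points)
    # Map cluster id -> list of original sample indices in that cluster.
    clusters = {i: [i] for i in range(n)}
    children, distances = [], []
    next_id = n
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # Single linkage: minimum pairwise Euclidean distance.
            d = min(math.dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        children.append((a, b))
        distances.append(d)
        # The merged cluster replaces its two children under a fresh id.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return children, distances


children, distances = naive_single_linkage([(0, 0), (0, 1), (5, 0), (5, 2)])
print(children)
print(distances)
```

Here the last entry of distances is the height at which the two final clusters are joined, which is exactly the quantity a dendrogram draws on its vertical axis.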
Hi @ogrisel, thank you for your detailed feedback! I hope I managed to address all your concerns.
On this specific point, could you please clarify what you mean by "better describing"? Moreover, notice that the
LGTM. Thank you very much @FrancescoCasalegno and @EmilieDel.
Maybe in a follow-up PR, could you please try to extend the example https://scikit-learn.org/stable/auto_examples/cluster/plot_agglomerative_dendrogram.html to show how to plot the dendrogram of a non-complete clustering, where we would use the compute_distance parameter without setting the threshold to 0 (for instance with n_clusters=4)?
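Plotting such a dendrogram comes down to converting the model's children_ and distances_ into the SciPy linkage-matrix layout [child_a, child_b, distance, sample_count] expected by scipy.cluster.hierarchy.dendrogram. The sketch below does that conversion with NumPy, following the counting approach of the example linked above; the helper name to_linkage_matrix is hypothetical, and it assumes the full tree was built (for instance with compute_full_tree=True), so that there are n_samples - 1 merges.

```python
import numpy as np


def to_linkage_matrix(children, distances, n_samples):
    """Hypothetical helper: build a SciPy-style linkage matrix.

    Each row is [child_a, child_b, merge_distance, n_samples_under_node].
    Assumes `children` covers the full tree (n_samples - 1 merges).
    """
    children = np.asarray(children)
    counts = np.zeros(children.shape[0])
    for i, merge in enumerate(children):
        current_count = 0
        for child_idx in merge:
            if child_idx < n_samples:
                current_count += 1  # leaf: a single original sample
            else:
                # Internal node: reuse the count of the earlier merge.
                current_count += counts[child_idx - n_samples]
        counts[i] = current_count
    return np.column_stack([children, distances, counts]).astype(float)


# Toy merges for 4 samples: (0,1) at 1.0, (2,3) at 2.0, then both groups at 5.0.
Z = to_linkage_matrix([(0, 1), (2, 3), (4, 5)], [1.0, 2.0, 5.0], n_samples=4)
print(Z)
```

With a fitted model this would presumably be called as to_linkage_matrix(model.children_, model.distances_, len(model.labels_)), and the result passed to scipy.cluster.hierarchy.dendrogram; options such as truncate_mode can then limit how much of the tree is drawn.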
Thanks @EmilieDel and @FrancescoCasalegno.
LGTM, aside from a minor comment about the what's new.
Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
Merged! Thanks again to all.
Co-authored-by: Michael Riedmann <michael_riedmann@live.com>
Co-authored-by: Emilie Delattre <emilie.delattre@epfl.ch>
Co-authored-by: EmilieDel <47669575+EmilieDel@users.noreply.github.com>
Reference Issues/PRs
Fixes #16701.
Closes #16903.
What does this implement/fix? Explain your changes.
It adds an optional argument to AgglomerativeClustering that turns on the return_distance switch in all subsequent tree-building calls, so that the distances_ attribute is set. This should make plotting a dendrogram from the fitted model much easier (as requested in #16701).
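As an illustration of the intended usage: in released scikit-learn (0.24 and later) the argument is named compute_distances. The sketch below fits a model with n_clusters instead of distance_threshold and still obtains distances_; the toy data and the choice of single linkage are assumptions made only so the merge distances are easy to check by hand.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy data: two well-separated groups of 2-D points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 2.0]])

# compute_distances=True records merge distances even though the tree is
# cut with n_clusters rather than distance_threshold.
model = AgglomerativeClustering(
    n_clusters=2,
    linkage="single",
    compute_full_tree=True,
    compute_distances=True,
)
model.fit(X)

print(model.labels_)     # cluster assignment after cutting at 2 clusters
print(model.children_)   # nodes merged at each step
print(model.distances_)  # linkage distance of each merge
```

Since the distances are computed during the tree building that happens anyway, the main cost is storing one extra value per merge.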
Any other comments?
This PR picks up where the stalled PR #16903 left off.
ToDo
Add a test for distances_ in sklearn/cluster/tests/test_hierarchical.py
Document the distances_ attribute (we can just use [MRG] DOC document distances_ attribute #17308).