Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distances for agglomerativeclustering #17984

Merged
merged 14 commits into from Jul 26, 2020
Merged

Distances for agglomerativeclustering #17984

merged 14 commits into from Jul 26, 2020

Conversation

FrancescoCasalegno
Copy link
Contributor

@FrancescoCasalegno FrancescoCasalegno commented Jul 25, 2020

Reference Issues/PRs

Fixes #16701.
Closes #16903.

What does this implement/fix? Explain your changes.

It adds an optional argument to AgglomerativeClustering that enables the return_distance switch on all subsequent calls to set the distances_ attribute. This could make plotting a dendrogram from the resulting model much easier (as requested in #16701)

Any other comments?

This PR takes over where the stalled PR #16903 left.

ToDo

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @FrancescoCasalegno , we would need to add a test for this parameter in sklearn/cluster/tests/test_hierarchical.py

Please add an entry to the change log at doc/whats_new/v0.24.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

sklearn/cluster/_agglomerative.py Outdated Show resolved Hide resolved
doc/whats_new/v0.24.rst Outdated Show resolved Hide resolved
EmilieDel and others added 3 commits July 25, 2020 15:30
@FrancescoCasalegno FrancescoCasalegno marked this pull request as ready for review July 25, 2020 13:59
doc/whats_new/v0.24.rst Outdated Show resolved Hide resolved
sklearn/cluster/_agglomerative.py Outdated Show resolved Hide resolved
sklearn/cluster/_agglomerative.py Show resolved Hide resolved
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @FrancescoCasalegno !

sklearn/cluster/_agglomerative.py Outdated Show resolved Hide resolved
sklearn/cluster/_agglomerative.py Outdated Show resolved Hide resolved
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the documentation of the distances_ attribute should be reworked to better describe what those distances mean.

@FrancescoCasalegno
Copy link
Contributor Author

Bonjour @ogrisel , thank you for your detailed feedback! I hope I managed to address all your concerns.

I believe the documentation of the distances_ attribute should be reworked to better describe what those distances mean.

On this specific point, could you please clarify what you mean by "better describing"?
Imho reading the doc of the previous attributes it is quite clear we are talking about the nodes of the hierarchical tree representing the recursive agglomerative approach of this clustering model. What would you precisely like to further precise in the docstring?

Moreover, notice that the distances_ and the definition of its meaning are not introduced by this PR, so maybe this can be addressed in another Issue/PR?

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you very much @FrancescoCasalegno and @EmilieDel.

Maybe in a follow-up PR, could you please try to extehd the example https://scikit-learn.org/stable/auto_examples/cluster/plot_agglomerative_dendrogram.html to show how to plot the dendrogram of non-complete clustering where we would use the compute_distance parameter without setting the threshold to 0 (for instance with n_clusters=4)?

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @EmilieDel and @FrancescoCasalegno .

LGTM, aside for a minor comment about the what's new.

doc/whats_new/v0.24.rst Outdated Show resolved Hide resolved
@ogrisel ogrisel merged commit 5a804fb into scikit-learn:master Jul 26, 2020
1 check was pending
@ogrisel
Copy link
Member

ogrisel commented Jul 26, 2020

Merged! Thanks again to all.

@FrancescoCasalegno FrancescoCasalegno deleted the distances-for-agglomerativeclustering branch July 26, 2020 19:22
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
Co-authored-by: Michael Riedmann <michael_riedmann@live.com>
Co-authored-by: Emilie Delattre <emilie.delattre@epfl.ch>
Co-authored-by: EmilieDel <47669575+EmilieDel@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AttributeError: 'AgglomerativeClustering' object has no attribute 'distances_'
6 participants