add method to AgglomerativeClustering which returns the linkage matrix #17322

Open
fotisj opened this issue May 24, 2020 · 4 comments

Comments

@fotisj

fotisj commented May 24, 2020

Describe the workflow you want to enable

Many people want to visualize the output of AgglomerativeClustering with a dendrogram. A recent update added an example of how to do this (thanks, very useful in teaching!), but people still have to build by hand the linkage matrix that scipy's dendrogram function needs as input. It would be really cool to be able to plug the output of AgglomerativeClustering directly into dendrogram.

Describe your proposed solution

Add a method that returns the linkage matrix.

Describe alternatives you've considered, if relevant

An attribute like 'linkage_matrix' would be more in line with how scikit-learn works, but then, I guess, it would be calculated every time, even if people don't need the matrix. With large numbers of data points the computation would add to the runtime. So a method sounds like the better solution.

@thomasjpfan
Member

Are you setting distance_threshold?

This issue could be related to #16903

@fotisj
Author

fotisj commented May 25, 2020

Yes, I do. I am not talking about the bug where you don't get any distances; I am proposing a convenience function which returns the whole linkage matrix. At the moment, we have 3 of the 4 columns in children_ and distances_, but we have to calculate the last column ourselves (see the example in the documentation of AgglomerativeClustering). The code is already there ...
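
For readers following along, here is a minimal sketch of what computing that last column involves, modeled on the dendrogram example in the scikit-learn documentation. The function name `build_linkage_matrix` and its signature are hypothetical and not part of scikit-learn; the sketch assumes `children` and `distances` have the layout of a fitted model's `children_` and `distances_`.

```python
import numpy as np


def build_linkage_matrix(children, distances, n_samples):
    """Assemble a SciPy-style linkage matrix (hypothetical helper).

    children  : (n_samples - 1, 2) integer array, laid out like ``children_``
    distances : (n_samples - 1,) float array, laid out like ``distances_``
    n_samples : number of original observations (leaves of the tree)
    """
    children = np.asarray(children)
    # counts[i] = number of original samples contained in the i-th merge
    counts = np.zeros(children.shape[0])
    for i, merge in enumerate(children):
        current_count = 0
        for child_idx in merge:
            if child_idx < n_samples:
                # Indices below n_samples are leaves: one sample each.
                current_count += 1
            else:
                # Larger indices refer to an earlier merge; reuse its size.
                current_count += counts[child_idx - n_samples]
        counts[i] = current_count
    # Columns: child 1, child 2, merge distance, cluster size.
    return np.column_stack([children, distances, counts]).astype(float)
```

With a model fitted so that `distances_` is populated (for example with `distance_threshold` set, as discussed above), the call would presumably look like `build_linkage_matrix(model.children_, model.distances_, len(model.labels_))`, and the result can be passed to `scipy.cluster.hierarchy.dendrogram`.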

@jnothman
Member

jnothman commented May 25, 2020 via email

@fotisj
Author

fotisj commented May 26, 2020

Maybe it could be done as an attribute. Using the code from the documentation and creating synthetic data with 50,000 samples, it only takes 136 ms (on an Intel i7-6700 CPU / Windows / Python 3.7) to create the linkage matrix.
