Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scipy.cluster.hierarchy.optimal_leaf_ordering should use newer algorithm from Bar-Joseph et al. 2001 #11228

Open
xplat opened this issue Dec 16, 2019 · 2 comments
Labels
enhancement A new feature or improvement scipy.cluster

Comments

@xplat
Copy link

xplat commented Dec 16, 2019

The current algorithm is O(n^4) so borderline impractical even for moderately-sized problems.

In [BJ01] is described an O(n^3) algorithm for the same problem. Compared to the current algorithm, the list of authors overlaps considerably and it is actually only a slight variation. It should be practical for considerably larger problems and more in line with the time spent computing the actual clustering, so it should see a lot more use.

[BJ01] Bar-Joseph Z, Biedl T, Brejova B, Demaine E, Gifford D, Hamel A, Jaakola T, Srebro N, Vinar T: Optimal arrangement of leaves in the tree representing hierarchical clustering of gene expression data. In Tech Rep 14. Department of Computer Science, University of Waterloo; 2001.

@tylerjereddy tylerjereddy added enhancement A new feature or improvement scipy.cluster labels Dec 16, 2019
@rgommers
Copy link
Member

Thanks for the suggestion @xplat, sounds like a good idea to update the algorithm. Would you be interested in working on this? Or @jamestwebber maybe?

@jamestwebber
Copy link
Contributor

It seems like a good idea. I will have to read the paper before knowing how hard it might be to implement...and it seems like that we'll need some tests to verify the results and prevent issues like #11227 from happening again. @xplat if you are willing to help test and review code that would be great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.cluster
Projects
None yet
Development

No branches or pull requests

4 participants