Python implementations of a few algorithms to cluster data based on minimum spanning trees, including:
- HEMST (Grygorash, Zhou, Jorgensen, 2006)
- CTCEHC (Ma et al., 2021)
- MSDR [experimental]
- Zahn's method (1971) [experimental]
For other algorithms based on MSTs, see:
- single linkage in sklearn and fastcluster
- genieclust – the Genie algorithm
- ITM – information-theoretic clustering
- hdbscan
HEMST and MSDR: Grygorash, O., Zhou, Y., Jorgensen, Z., 2006. Minimum spanning tree based clustering algorithms, in: Proc. ICTAI’06, pp. 1–9. DOI: 10.1109/ICTAI.2006.83.
CTCEHC: Ma, Y., Lin, H., Wang, Y., Huang, H., He, X., 2021. A multi-stage hierarchical clustering algorithm based on centroid of tree and cut edge constraint. Information Sciences 557, 194–219. DOI: 10.1016/j.ins.2020.12.016.
Zahn's algoritm: Zahn, C., 1971. Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Transactions on Computers C-20, 68–86. DOI: 10.1109/T-C.1971.223083.
Campello R.J.G.B., Moulavi D., Sander J. (2013). Density-based clustering based on hierarchical density estimates. Lecture Notes in Computer Science 7819:160–172, DOI:10.1007/978-3-642-37456-2_14.
Gagolewski M. (2022). A framework for benchmarking clustering algorithms, SoftwareX 20:101270. URL: https://clustering-benchmarks.gagolewski.com. DOI: 10.1016/j.softx.2022.101270.
Gagolewski M., Bartoszuk M., Cena A. (2016). Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm. Information Sciences 363:8–23. DOI: 10.1016/j.ins.2016.05.003.
Gagolewski M. (2021). genieclust: Fast and robust hierarchical clustering. SoftwareX 15:100722. URL: https://genieclust.gagolewski.com/. DOI: 10.1016/j.softx.2021.100722.
McInnes L., Healy J., Astels S. (2017). hdbscan: Hierarchical density based clustering. The Journal of Open Source Software 2(11):205. DOI: 10.21105/joss.00205.
Müller A.C., Nowozin S., Lampert C.H. (2012). Information theoretic clustering using minimum spanning trees. In: Proc. German Conference on Pattern Recognition. URL: https://github.com/amueller/information-theoretic-mst.
Müllner D. (2013). fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software 53(9):1–18. DOI: 10.18637/jss.v053.i09.