Commit
Use cut_tree as a backup for fcluster
I observed that fcluster sometimes returns fewer clusters than the number requested. This seems to happen when n_clusters is close to n_samples, and it causes problems downstream. In the examples I checked, cut_tree returned the correct number of clusters where fcluster did not. However, I didn't want to replace fcluster with cut_tree outright, because cut_tree has an issue of its own, see scipy/scipy#8063, which is not yet resolved in scipy.

An alternative solution is to use AgglomerativeClustering in sklearn: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html. See also https://stackoverflow.com/questions/47535256/how-to-make-fcluster-to-return-the-same-output-as-cut-tree. However, doing that requires a lot of code changes.

So for now, as a temporary solution, I use cut_tree as a backup for fcluster: when the fcluster output does not contain the requested number of clusters, I run cut_tree. When neither output contains the requested number of clusters, I emit a warning. We'll need a better fix in the future if this turns out to happen often. So far it does not seem to.
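The fallback described above can be sketched roughly as follows. This is a minimal illustration and not the code in this commit. The function name cluster_with_fallback is hypothetical, and so are the choices to shift cut_tree's 0-based labels by one so they match fcluster's 1-based labels, and to return the cut_tree labels when neither result has the requested count.

```python
import warnings

import numpy as np
from scipy.cluster.hierarchy import cut_tree, fcluster, linkage


def cluster_with_fallback(Z, n_clusters):
    """Return flat cluster labels, trying fcluster first and cut_tree second.

    Z is a linkage matrix as returned by scipy.cluster.hierarchy.linkage.
    (Hypothetical helper, written only to illustrate the fallback idea.)
    """
    # First attempt: fcluster with the "maxclust" criterion.
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    if np.unique(labels).size == n_clusters:
        return labels

    # Backup: cut_tree returns an (n_samples, 1) array of 0-based labels.
    # Assumption: add 1 so the labels start at 1, like fcluster's.
    labels = cut_tree(Z, n_clusters=n_clusters).ravel() + 1
    if np.unique(labels).size != n_clusters:
        warnings.warn(
            "Neither fcluster nor cut_tree returned the requested "
            f"number of clusters ({n_clusters})."
        )
    return labels


# Usage: request almost as many clusters as there are samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
Z = linkage(X, method="average")
labels = cluster_with_fallback(Z, n_clusters=18)
print(np.unique(labels).size)
```

Counting the distinct labels, rather than taking the maximum label, checks how many clusters were actually produced without assuming the labels are contiguous.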