ENH: Add min_samples_leaf parameter to fcluster to be able to set minimal number of samples in cluster #17228

Open
dimadgo11 opened this issue Oct 14, 2022 · 0 comments
Labels
enhancement A new feature or improvement scipy.cluster

Comments

@dimadgo11

Is your feature request related to a problem? Please describe.

The function scipy.cluster.hierarchy.fcluster(Z=Z, t=30, criterion='maxclust') may return very small clusters, containing only 1, 2, or 3 samples.
I believe this behavior is undesirable in many cases.
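
A minimal sketch of the problem, using made-up one-dimensional data (two groups of ten points plus one stray point) and single linkage; the data, the linkage method, and t=3 are assumptions chosen only for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Illustrative data: two groups of 10 points and one stray point at 115.
points = list(range(0, 10)) + list(range(100, 110)) + [115]
X = np.array(points, dtype=float).reshape(-1, 1)

Z = linkage(X, method="single")
labels = fcluster(Z, t=3, criterion="maxclust")

# Cluster sizes, one entry per label (labels start at 1).
sizes = np.bincount(labels)[1:]
print(sorted(sizes.tolist()))
```

With this data the stray point ends up in a cluster of its own, so one of the three requested clusters contains a single sample.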

Describe the solution you'd like.

I expect a min_samples_leaf (or min_samples_per_cluster) parameter to work this way: if it is set, we walk through the dendrogram/tree/linkage matrix as usual, but stop splitting a node if the number of samples in a resulting leaf would be less than the set value. We then freeze that branch and keep walking down the other branches of the tree until we either reach the required number of clusters or hit this limit everywhere.
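
The idea above could be sketched roughly as follows. This is not an existing SciPy API: the function name fcluster_min_size and its parameters are hypothetical, and the sketch assumes that a node is frozen when either of its two children would have fewer than min_samples samples, and that nodes are split in order of decreasing merge height (which matches a 'maxclust'-style cut only for monotonic linkages).

```python
import heapq

import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree


def fcluster_min_size(Z, max_clusters, min_samples):
    """Hypothetical helper: top-down cut with a minimum cluster size."""
    root = to_tree(Z)
    # Max-heap on merge height; node ids are unique and break ties.
    heap = [(-root.dist, root.id, root)]
    frozen = []  # nodes that may not be split any further
    while heap and len(heap) + len(frozen) < max_clusters:
        _, _, node = heapq.heappop(heap)
        left, right = node.left, node.right
        # Freeze leaves and nodes whose split would create a too-small child.
        if node.is_leaf() or min(left.count, right.count) < min_samples:
            frozen.append(node)
            continue
        for child in (left, right):
            heapq.heappush(heap, (-child.dist, child.id, child))

    # Every remaining node (frozen or still splittable) becomes one cluster.
    clusters = frozen + [node for _, _, node in heap]
    labels = np.empty(root.count, dtype=int)
    for label, node in enumerate(clusters, start=1):
        labels[node.pre_order()] = label  # pre_order() lists the leaf ids
    return labels


# Illustrative data: two groups of 10 points and one stray point at 115.
points = list(range(0, 10)) + list(range(100, 110)) + [115]
X = np.array(points, dtype=float).reshape(-1, 1)
Z = linkage(X, method="single")

labels = fcluster_min_size(Z, max_clusters=3, min_samples=5)
print(np.bincount(labels)[1:])
```

In this example the stray point at 115 stays with its neighbouring group instead of forming a singleton cluster. Note that the function may return fewer clusters than requested when every remaining branch is frozen.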

Describe alternatives you've considered.

No response

Additional context (e.g. screenshots, GIFs)

No response

@dimadgo11 dimadgo11 added the enhancement A new feature or improvement label Oct 14, 2022
3 participants
@dimadgo11 @j-bowhay and others