better description of the hierarchical clustering parameter #9171
The pypy3 failure on this PR seems to be for reasons completely unrelated to my changes. Any idea what's going on?
scipy/cluster/hierarchy.py
@@ -2471,8 +2471,11 @@ def fcluster(Z, t, criterion='inconsistent', depth=2, R=None, monocrit=None):
  Z : ndarray
      The hierarchical clustering encoded with the matrix returned
      by the `linkage` function.
- t : float
-     The threshold to apply when forming flat clusters.
+ t : float or int
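To make the two readings of `t` concrete, here is a minimal pure-Python sketch. It is not SciPy's implementation: `flat_clusters` is a hypothetical helper, merges are assumed to be `(i, j, height)` tuples of original point indices rather than a real `linkage` matrix, and ties in merge height are ignored (the documented `maxclust` criterion only promises no more than `t` clusters).

```python
# Hypothetical toy sketch, NOT SciPy's implementation: it only shows how one
# parameter `t` means a distance threshold or a cluster count depending on
# the criterion.


def flat_clusters(merges, n, t, criterion):
    """Assign flat cluster labels 1, 2, ... to n points.

    merges: list of (i, j, height) tuples using original point indices
    (a simplification of a real linkage matrix).
    """
    if criterion not in ("distance", "maxclust"):
        raise ValueError(f"unsupported criterion {criterion!r}")

    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    n_clusters = n
    # Apply merges from lowest to highest height until the stopping rule fires.
    for i, j, height in sorted(merges, key=lambda m: m[2]):
        if criterion == "distance" and height > t:
            break  # here t is a distance threshold (a float)
        if criterion == "maxclust" and n_clusters <= t:
            break  # here t is the maximum number of clusters (an integer)
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            n_clusters -= 1

    # Relabel each root as 1, 2, ... in order of first appearance.
    labels, out = {}, []
    for p in range(n):
        root = find(p)
        labels.setdefault(root, len(labels) + 1)
        out.append(labels[root])
    return out
```

With `merges = [(0, 1, 0.5), (2, 3, 0.7), (1, 2, 3.0)]`, both `flat_clusters(merges, 4, 1.0, "distance")` and `flat_clusters(merges, 4, 2, "maxclust")` give `[1, 1, 2, 2]`: the same cut expressed once as a height and once as a count.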
This could really be any numeric that can be safely cast to `int` and `float`, correct? A boolean would work here as well. I like your new comment below, but I am unsure whether changing to "float or int" is an improvement.
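The casting point can be checked directly in Python: `bool` is a subclass of `int`, so `True` passes numeric checks and both casts, while a value like `2.5` casts to `int` "unsafely", by silent truncation. `casts_cleanly` below is a hypothetical helper for illustration, not part of SciPy.

```python
import math
import numbers


# Hypothetical helper for illustration; not part of SciPy.
def casts_cleanly(x):
    """Return True if x is a real number that survives int() without loss."""
    if not isinstance(x, numbers.Real):
        return False  # e.g. strings, None, complex numbers
    if not math.isfinite(x):
        return False  # nan and inf have no integer equivalent
    return float(x) == int(x)


# bool is a subclass of int, so True passes numeric checks and both casts.
print(isinstance(True, numbers.Integral), int(True), float(True))  # True 1 1.0
# A non-integral float is truncated by int() without any error.
print(int(2.5))  # 2
```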
It's more of a semantic hint, to reinforce the point that `t` can refer to a number of clusters, which is an integer. It makes this explicit in Python terms. Without that hint, people may supply 2.0, thinking a float is expected.
In fact, it might even be better to rename `t` to `t_or_num_clust`.
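Besides documentation, the "someone passes 2.0" concern could be handled by normalizing `t` according to the criterion. A hypothetical sketch, keyed on the criterion names that `fcluster` documents (`inconsistent`, `distance`, `monocrit`, `maxclust`, `maxclust_monocrit`); `normalize_t` is an illustrative name and not necessarily how SciPy validates `t`.

```python
import numbers

# Criterion names as documented for scipy.cluster.hierarchy.fcluster.
THRESHOLD_CRITERIA = {"inconsistent", "distance", "monocrit"}
COUNT_CRITERIA = {"maxclust", "maxclust_monocrit"}


def normalize_t(t, criterion):
    """Hypothetical helper: coerce t to a float threshold or an int count."""
    if not isinstance(t, numbers.Real):
        raise TypeError(f"t must be a real scalar, got {type(t).__name__}")
    if criterion in THRESHOLD_CRITERIA:
        return float(t)
    if criterion in COUNT_CRITERIA:
        # Accept 2 or 2.0, but reject 2.5, nan, inf and counts below 1.
        if not float(t).is_integer() or t < 1:
            raise ValueError(f"{criterion!r} needs a positive whole number, got {t!r}")
        return int(t)
    raise ValueError(f"unknown criterion {criterion!r}")
```

Under this sketch `normalize_t(2.0, "maxclust")` returns the integer `2`, `normalize_t(3, "distance")` returns `3.0`, and `normalize_t(2.5, "maxclust")` raises `ValueError`.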
Elsewhere, for `kmeans`, SciPy already uses `k_or_guess`.
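The `k_or_guess` name in `scipy.cluster.vq.kmeans` signals that one argument carries two meanings chosen by type: an integer count of centroids, or an array of initial centroids. A hypothetical pure-Python sketch of that dispatch pattern follows; `initial_centroids` is an illustrative name, and picking random observations as starting centroids is this sketch's own choice, not a description of SciPy's code.

```python
import random


# Hypothetical sketch of the "k_or_guess" convention; not SciPy's kmeans code.
def initial_centroids(obs, k_or_guess, seed=0):
    """Return starting centroids for a k-means style algorithm.

    An int is read as "how many centroids to pick"; anything else is read as
    an explicit sequence of initial centroids.
    """
    if isinstance(k_or_guess, int):
        if k_or_guess < 1:
            raise ValueError("the number of centroids must be at least 1")
        rng = random.Random(seed)
        # Use k distinct observations as starting points (sketch's choice).
        return [list(p) for p in rng.sample(list(obs), k_or_guess)]
    # Otherwise treat the argument as the initial centroids themselves.
    return [list(c) for c in k_or_guess]
```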
The generic term to use for `float or int` is `scalar`.
I take your point, @raamana; I think `scalar` is the right way to go.
Sure. What about renaming the variable to `t_or_num_clust`?
You cannot rename parameters in the signature; that breaks backwards compatibility.
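The compatibility problem is that existing user code may pass `t` by keyword, e.g. `fcluster(Z, t=3, criterion='maxclust')`, and that call fails once the parameter is renamed. The sketch below uses hypothetical stand-in functions (`fcluster_v1`, `fcluster_v2`, `fcluster_v2_shim`) to show the failure, plus one common deprecation pattern a library could use if it did want to rename; this is illustrative only, and the PR kept the name `t`.

```python
import warnings


# Hypothetical stand-ins; the bodies only echo their arguments so the calling
# conventions can be compared.
def fcluster_v1(Z, t, criterion="inconsistent"):
    return (t, criterion)


def fcluster_v2(Z, t_or_num_clust, criterion="inconsistent"):
    # Same function after renaming the parameter `t`.
    return (t_or_num_clust, criterion)


def fcluster_v2_shim(Z, t_or_num_clust=None, criterion="inconsistent", *, t=None):
    # One common deprecation pattern: keep accepting the old keyword for a while.
    if t is not None:
        if t_or_num_clust is not None:
            raise TypeError("pass only one of 't' and 't_or_num_clust'")
        warnings.warn("'t' is deprecated; use 't_or_num_clust'",
                      DeprecationWarning, stacklevel=2)
        t_or_num_clust = t
    if t_or_num_clust is None:
        raise TypeError("missing required argument 't_or_num_clust'")
    return (t_or_num_clust, criterion)


def call_with_keyword(func):
    """Call func the way existing user code might, passing t by keyword."""
    try:
        return func(None, t=3, criterion="maxclust")
    except TypeError as exc:
        return f"TypeError: {exc}"


print(call_with_keyword(fcluster_v1))  # (3, 'maxclust')
print(call_with_keyword(fcluster_v2))  # a TypeError message about the keyword 't'
```

Positional calls such as `fcluster_v2(Z, 3, "maxclust")` keep working after the rename; only keyword callers break, which is why the signature has to stay as it is.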
Sure. I've revised the docs now.
Merged, thanks @raamana, @jeffyancey.
Yay! Thanks. So glad to be able to add a few characters to the mighty
Clarifies that `t` could be an integer specifying the maximum number of clusters under the `maxclust*` criteria.
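For reference, one way the `t` entry could read in numpydoc style once the type is given as `scalar` and both meanings are spelled out. This is an illustrative sketch on a hypothetical stub (`fcluster_doc_sketch`), not a copy of the merged SciPy docstring.

```python
def fcluster_doc_sketch(Z, t, criterion="inconsistent"):
    """Form flat clusters from a hierarchical clustering (docstring sketch only).

    Parameters
    ----------
    Z : ndarray
        The hierarchical clustering encoded with the matrix returned
        by the `linkage` function.
    t : scalar
        For criteria 'inconsistent', 'distance' or 'monocrit',
        this is the threshold to apply when forming flat clusters.
        For criteria 'maxclust' or 'maxclust_monocrit',
        this is the maximum number of clusters requested.
    criterion : str, optional
        The criterion to use in forming flat clusters.
    """
    # Hypothetical stub: only the docstring layout matters here.
    raise NotImplementedError("docstring sketch only")
```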