FIX Fix inconsistent naming convention for algorithm selection of HDBSCAN #26744
sklearn/cluster/_hdbscan/hdbscan.py
Outdated
  mst_func = _hdbscan_prims
  kwargs["algo"] = "kd_tree"
  kwargs["leaf_size"] = self.leaf_size
- elif self.algorithm == "balltree":
+ elif self.algorithm == "ball_tree":
Patch code coverage failed with a partial hit for this line (800) due to the missing else block at elif == "ball_tree". It was probably not added since parameter validation is completed by validate_parameter_constraints.
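The coverage remark above can be illustrated with a small sketch. It assumes a hypothetical validate_algorithm helper standing in for scikit-learn's parameter validation and a hypothetical select_tree dispatcher; neither is the actual HDBSCAN code, and the option set used here is an assumption for illustration. Because invalid values are rejected up front, the last elif needs no else, which is the kind of branch a coverage tool reports as a partial hit.

```python
# Hypothetical stand-ins for illustration, not the scikit-learn implementation.
VALID_ALGORITHMS = {"auto", "brute", "kd_tree", "ball_tree"}


def validate_algorithm(algorithm):
    """Reject unknown values before any dispatch happens."""
    if algorithm not in VALID_ALGORITHMS:
        raise ValueError(
            f"algorithm must be one of {sorted(VALID_ALGORITHMS)}, "
            f"got {algorithm!r}"
        )


def select_tree(algorithm):
    """Return the tree name for the tree-based options, else None."""
    validate_algorithm(algorithm)
    tree = None
    if algorithm == "kd_tree":
        tree = "kd_tree"
    elif algorithm == "ball_tree":
        # No else branch: validation already excluded invalid values, so the
        # "condition is false" path is only taken for "auto" or "brute".
        tree = "ball_tree"
    return tree
```

With validation done first, a misspelled name such as "balltree" fails loudly in validate_algorithm instead of silently falling through the dispatch.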
Thanks for the PR! Since scikit-learn 1.3.0 has been released with those names, our policy would mandate a deprecation cycle for the inconsistent names. Though since HDBSCAN was just introduced in that release, we could maybe make an exception here? Personally, I feel we should follow our backward-compatibility policy. Any opinion, @glemaitre and others?
I should have pushed for the change before the release :) When speaking about this issue with @jeremiedbb, we thought to put it as a fix and include it in the next bug-fix release for which […]. However, I don't want us to introduce bad practices.
I can make sure the […]
So, for backstory purposes, the reason that the algorithm names here are inconsistent is that originally they were going to also encode the MST algorithm in their names when we had both […]. Still, I agree that after trimming […]. I'd personally be okay considering it a backport fix, since most likely very few people are working with this new scikit-learn implementation of […]
Although this estimator is new, I prefer deprecating the parameter. The default is "auto", so it should not impact many users.
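As a rough sketch of what the deprecation cycle discussed above can look like in code: the old names keep working for a while, emit a FutureWarning, and are mapped to the new names. The resolve_algorithm helper, the mapping dictionary, and the warning text are all hypothetical and are not the code or message used in scikit-learn.

```python
import warnings

# Hypothetical mapping from the names released in 1.3 to the consistent names.
_DEPRECATED_NAMES = {"kdtree": "kd_tree", "balltree": "ball_tree"}


def resolve_algorithm(algorithm):
    """Map a deprecated name to its replacement, warning on each such call."""
    if algorithm in _DEPRECATED_NAMES:
        new_name = _DEPRECATED_NAMES[algorithm]
        warnings.warn(
            f"algorithm={algorithm!r} is deprecated; "
            f"use algorithm={new_name!r} instead.",
            FutureWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        return new_name
    # New names and any other value pass through unchanged.
    return algorithm
```

Since the default is "auto", a shim like this only affects users who explicitly passed one of the old spellings.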
…stent_kdtree_balltree
# TODO(1.6): Remove
filterwarnings_kdtree = pytest.mark.filterwarnings(
    "ignore:`algorithm='kdtree'`has been deprecated in 1.4 and will be renamed"
    " to'kd_tree'`in 1.6. To keep the past behaviour, set `algorithm=kd_tree`."
)
# TODO(1.6): Remove
filterwarnings_balltree = pytest.mark.filterwarnings(
    "ignore:`algorithm='balltree'`has been deprecated in 1.4 and will be renamed"
    " to'ball_tree'`in 1.6. To keep the past behaviour, set `algorithm=ball_tree`."
)
Rather than filterwarnings everywhere, we can go ahead and use the new algorithm names. We need only test that the deprecation works as expected in one place and move on to future behavior everywhere else.
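A minimal sketch of that testing pattern follows. It uses the standard-library warnings module so that it stays self-contained (in a pytest suite, pytest.warns would be the usual tool), and fit_with is a hypothetical stand-in for fitting the estimator, not scikit-learn code.

```python
import warnings


def fit_with(algorithm):
    """Hypothetical stand-in for an estimator's fit: warns on the old names."""
    renamed = {"kdtree": "kd_tree", "balltree": "ball_tree"}
    if algorithm in renamed:
        warnings.warn(f"{algorithm!r} is deprecated", FutureWarning)
        algorithm = renamed[algorithm]
    return algorithm


def test_old_names_warn():
    # The single place where the deprecated spellings are exercised.
    for old, new in [("kdtree", "kd_tree"), ("balltree", "ball_tree")]:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            assert fit_with(old) == new
        assert [w.category for w in caught] == [FutureWarning]


def test_new_names_are_silent():
    # Every other test uses the new names, so no filterwarnings mark is needed.
    for name in ["kd_tree", "ball_tree"]:
        with warnings.catch_warnings():
            warnings.simplefilter("error")  # any warning would raise here
            assert fit_with(name) == name
```

Keeping the old spellings in one dedicated test means that removing the deprecation later only requires deleting that test.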
Co-authored-by: Meekail Zain <34613774+Micky774@users.noreply.github.com>
LGTM, failing CI is unrelated. Thanks!
LGTM
Thank you for your work, @Shreesha3112 😄
…SCAN (scikit-learn#26744) Co-authored-by: shreesha3112 <shreesha3112.com> Co-authored-by: Meekail Zain <34613774+Micky774@users.noreply.github.com>
Reference Issues/PRs
Fixes #26732
What does this implement/fix? Explain your changes.
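Fix the inconsistent naming convention for algorithm selection in HDBSCAN: the tree-based options are now "kd_tree" and "ball_tree".
The old names "kdtree" and "balltree" are deprecated and will be removed in 1.6.
Current naming conventions: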
Other estimators (K-NN, DBSCAN, etc.): algorithm : {"kd_tree", "ball_tree"}
HDBSCAN: algorithm : {"kdtree", "balltree"}
Updated naming convention for HDBSCAN:
algorithm : {"kd_tree", "ball_tree"}