[BUG] Discrepancy in HDBSCAN core distance computation between cuML and HDBSCAN #4877

Closed
tarang-jain opened this issue Aug 29, 2022 · 0 comments · Fixed by #4872
Labels
? - Needs Triage (Need team to review and classify), bug (Something isn't working)

Comments

@tarang-jain
Contributor

Describe the bug
In HDBSCAN, when computing the core distance of every point in the dataset, the scikit-learn-contrib implementation uses k = min_samples + 1. cuML, however, uses k = min_samples, as seen in:

return knn_dists[row * min_samples + (min_samples - 1)];
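To make the off-by-one concrete, here is a minimal pure-Python sketch of the two conventions. It is not the cuML or hdbscan code. The helper names (`knn_distances`, `core_distances`, `include_extra_neighbor`) are hypothetical, and it assumes the kNN query includes the query point itself as its own nearest neighbor at distance 0, which is why the extra neighbor changes the result.

```python
import math

def knn_distances(points, k):
    """For each point, return its k smallest distances to the dataset,
    sorted ascending. Assumption: the point itself is included, so the
    first entry of every row is 0.0."""
    rows = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)
        rows.append(dists[:k])
    return rows

def core_distances(points, min_samples, include_extra_neighbor):
    # include_extra_neighbor=True  mimics k = min_samples + 1
    # include_extra_neighbor=False mimics k = min_samples
    k = min_samples + 1 if include_extra_neighbor else min_samples
    # Same indexing idea as knn_dists[row * k + (k - 1)]: take the last
    # column of each row of the kNN distance matrix.
    return [row[k - 1] for row in knn_distances(points, k)]

points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
min_samples = 2

core_k = core_distances(points, min_samples, include_extra_neighbor=False)
core_k_plus_1 = core_distances(points, min_samples, include_extra_neighbor=True)

print(core_k)         # [1.0, 1.0, 2.0, 4.0]
print(core_k_plus_1)  # [3.0, 2.0, 3.0, 6.0]
```

With the self-distance occupying the first column, k = min_samples yields the distance to the (min_samples - 1)-th other point, while k = min_samples + 1 yields the distance to the min_samples-th other point, so the two conventions produce different core distances for every point in this toy dataset.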

tarang-jain added the "? - Needs Triage" and "bug" labels Aug 29, 2022
github-actions bot added this to Needs prioritizing in Bug Squashing Aug 29, 2022
rapids-bot bot closed this as completed in #4872 Sep 3, 2022
Bug Squashing automation moved this from Needs prioritizing to Closed Sep 3, 2022
rapids-bot bot pushed a commit that referenced this issue Sep 3, 2022
PR for HDBSCAN approximate_predict

- [x] Building cluster_map
- [x] Modifying PredictionData class
- [x] Obtaining nearest neighbor in MR space
- [x] Computing probability
- [x] Tests

Closes #4877
Closes #4448

Authors:
  - Tarang Jain (https://github.com/tarang-jain)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #4872
jakirkham pushed a commit to jakirkham/cuml that referenced this issue Feb 27, 2023