
Fixing intermittent HDBSCAN pytest failure in CI #4025

Merged: 2 commits merged on Jul 6, 2021


divyegala (Member)

There was an out-of-bounds (OOB) access occurring on this line:

`value_t core_dist_b = max(core_dist_rit, max(core_dists[b.key], b.value));`

when `b.key == 1`. This error was being swallowed, and what we were seeing in CI were the resulting `thrust::transform` errors.

Tagging @cjnolet to confirm whether this is the intended way to use this functor and whether the fix is correct.
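For illustration, here is a minimal C++ sketch of the kind of guard this fix implies, written as plain host code rather than a device functor. `KVP`, `GuardedReachabilityMinOp`, and the convention that `key == -1` marks an empty slot are all hypothetical; this is not the functor in `runner.h`. The op combines two (index, distance) candidates by their reachability distance `max(core_dists[row], core_dists[key], distance)`, as in the line above, but checks every key before using it to index `core_dists`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical (index, distance) pair; key == -1 is ASSUMED to mean
// "no neighbor found yet" (a sentinel that must never be used as an index).
struct KVP {
  int key;
  float value;
};

// Hypothetical reduction op: keeps the candidate with the smaller
// reachability distance, but validates keys before indexing core_dists.
struct GuardedReachabilityMinOp {
  const std::vector<float>& core_dists;

  bool valid(int key) const {
    return key >= 0 && key < static_cast<int>(core_dists.size());
  }

  // max(core distance of row, core distance of neighbor, raw distance)
  float reach(int row, const KVP& p) const {
    assert(valid(row) && valid(p.key));
    return std::max(core_dists[row], std::max(core_dists[p.key], p.value));
  }

  KVP operator()(int row, const KVP& a, const KVP& b) const {
    const bool a_ok = valid(a.key);
    const bool b_ok = valid(b.key);
    if (!a_ok && !b_ok) return a;                 // nothing usable yet
    if (!a_ok) return KVP{b.key, reach(row, b)};  // only b is a real neighbor
    if (!b_ok) return KVP{a.key, reach(row, a)};  // only a is a real neighbor
    const float ra = reach(row, a);
    const float rb = reach(row, b);
    return rb < ra ? KVP{b.key, rb} : KVP{a.key, ra};
  }
};
```

With `core_dists = {1.0f, 0.5f, 2.0f}`, calling `op(0, KVP{-1, 1e30f}, KVP{1, 0.2f})` returns key 1 with reachability 1.0 instead of reading `core_dists[-1]`.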

divyegala requested a review from a team as a code owner July 2, 2021 01:13
divyegala added the labels "3 - Ready for Review", "bug", "non-breaking", and "CUDA / C++" and removed the "CUDA/C++" label Jul 2, 2021
divyegala mentioned this pull request Jul 2, 2021
divyegala added this to PR-WIP in v21.08 Release via automation Jul 2, 2021
codecov-commenter commented:

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.08@f7fb363).
The diff coverage is n/a.


@@               Coverage Diff               @@
##             branch-21.08    #4025   +/-   ##
===============================================
  Coverage                ?   85.46%           
===============================================
  Files                   ?      230           
  Lines                   ?    18133           
  Branches                ?        0           
===============================================
  Hits                    ?    15498           
  Misses                  ?     2635           
  Partials                ?        0           
| Flag | Coverage Δ |
|---|---|
| dask | 48.14% <0.00%> (?) |
| non-dask | 77.75% <0.00%> (?) |

Flags with carried forward coverage won't be shown.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f7fb363...47989a0.

cjnolet (Member) left a comment:

LGTM. Thanks for chasing this down, Divye. The assumptions and expectations in the fused L2 kNN primitive are not clearly documented anywhere, and it's been a bit of a challenge to decompose them while customizing the k-select behavior for the reachability distances in HDBSCAN.
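To make the point about undocumented assumptions concrete, the following is a serial C++ stand-in for a fused L2 1-NN with a pluggable reduction op. It is a sketch under stated assumptions, not cuML's API: `fused_l2_1nn`, `MinOp`, and in particular the `{-1, max}` starting accumulator are hypothetical. What it shows is that whatever the primitive seeds its accumulator with becomes part of the contract every custom op (such as a reachability-distance op) has to honor.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical (index, distance) pair.
struct KVP {
  int key;
  float value;
};

// Hypothetical serial stand-in for a fused L2 1-NN: for each row of X, scan
// every row of Y, compute the squared L2 distance inline, and fold each
// candidate into an accumulator through a caller-supplied reduction op.
template <typename ReduceOp>
std::vector<KVP> fused_l2_1nn(const std::vector<std::vector<float>>& X,
                              const std::vector<std::vector<float>>& Y,
                              ReduceOp op) {
  std::vector<KVP> out;
  out.reserve(X.size());
  for (std::size_t i = 0; i < X.size(); ++i) {
    // ASSUMED convention: start from a sentinel "no neighbor" pair.
    KVP acc{-1, std::numeric_limits<float>::max()};
    for (std::size_t j = 0; j < Y.size(); ++j) {
      assert(X[i].size() == Y[j].size());
      float d = 0.0f;  // squared L2 distance
      for (std::size_t k = 0; k < X[i].size(); ++k) {
        const float diff = X[i][k] - Y[j][k];
        d += diff * diff;
      }
      acc = op(static_cast<int>(i), acc, KVP{static_cast<int>(j), d});
    }
    out.push_back(acc);  // still {-1, max} if Y was empty
  }
  return out;
}

// Plain min-distance op: it never reads per-point data, so the sentinel key
// in the accumulator is harmless here.
struct MinOp {
  KVP operator()(int /*row*/, const KVP& a, const KVP& b) const {
    return b.value < a.value ? b : a;
  }
};
```

`MinOp` can ignore the sentinel because it only compares distances; an op that looks up something like `core_dists[a.key]` would first have to check that `a.key` is a real index.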

v21.08 Release automation moved this from PR-WIP to PR-Reviewer approved Jul 6, 2021
cjnolet (Member) commented Jul 6, 2021

@gpucibot merge

rapids-bot merged commit 91abe67 into rapidsai:branch-21.08 on Jul 6, 2021
v21.08 Release automation moved this from PR-Reviewer approved to Done Jul 6, 2021
rapids-bot pushed a commit that referenced this pull request on Jul 14, 2021
Closes #4025 again. The failure recurred because the underlying assumptions in the fused 1-NN have likely changed, and that trickles down to HDBSCAN and the functors we use to access that API, as discussed with @cjnolet.

Authors:
  - Divye Gala (https://github.com/divyegala)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #4052
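One way to keep a change in those assumptions from being silently masked again is to validate the 1-NN output before any functor indexes per-point arrays with its keys. The C++ sketch below is hypothetical (`validate_nn_output` is not part of cuML) and assumes every row is expected to end up with a real neighbor; if "no neighbor" is a legitimate result, the key check would need to accept the sentinel.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical (index, distance) pair.
struct KVP {
  int key;
  float value;
};

// Hypothetical host-side check: throws at the source if any neighbor key is
// outside [0, n_points) or any distance is non-finite, instead of letting a
// bad key surface later as an unrelated error.
void validate_nn_output(const std::vector<KVP>& nn, int n_points) {
  assert(n_points >= 0);
  for (std::size_t i = 0; i < nn.size(); ++i) {
    const KVP& p = nn[i];
    if (p.key < 0 || p.key >= n_points) {
      throw std::out_of_range("row " + std::to_string(i) + ": neighbor key " +
                              std::to_string(p.key) + " outside [0, " +
                              std::to_string(n_points) + ")");
    }
    if (!std::isfinite(p.value)) {
      throw std::domain_error("row " + std::to_string(i) +
                              ": non-finite distance");
    }
  }
}
```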
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
There was an OOB access occurring here https://github.com/rapidsai/cuml/blob/0707a46f717023ca0b5047c3aeee9ead1a093272/cpp/src/hdbscan/runner.h#L74 when `b.key == 1`. This error was getting swallowed up and we were seeing those `thrust::transform` errors in CI.

Tagging @cjnolet to confirm if this is the intended way to use this functor and the fix is correct.

Authors:
  - Divye Gala (https://github.com/divyegala)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#4025