Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix out-of-bound access in cudf::detail::label_segments #11497

Merged
merged 3 commits into from
Aug 9, 2022

Conversation

ttnghia
Copy link
Contributor

@ttnghia ttnghia commented Aug 9, 2022

In cudf::detail::label_segments, when the input lists column has empty/nulls lists at the end of the column, its offsets column will contain out-of-bound indices. This leads to invalid memory access bug. Such bug is elusive and doesn't show up consistently. Test failures reported in NVIDIA/spark-rapids#6249 are due to this.

The existing unit tests already cover such corner case. Unfortunately, the bug didn't show up until being tested on some systems. Even that, it was very difficult to reproduce it.

Closes #11495.

@ttnghia ttnghia added bug Something isn't working 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. ! - Hotfix Hotfix is a bug that affects the majority of users for which there is no reasonable workaround Spark Functionality that helps Spark RAPIDS non-breaking Non-breaking change labels Aug 9, 2022
@ttnghia ttnghia requested a review from a team as a code owner August 9, 2022 13:57
@ttnghia ttnghia self-assigned this Aug 9, 2022
@ttnghia ttnghia added this to PR-WIP in v22.08 Release via automation Aug 9, 2022
@ttnghia ttnghia requested review from upsj and bdice August 9, 2022 13:57
@ttnghia ttnghia added the 5 - DO NOT MERGE Hold off on merging; see PR for details label Aug 9, 2022
@ttnghia
Copy link
Contributor Author

ttnghia commented Aug 9, 2022

Added DO_NOT_MERGE label, waiting for Spark team to verify.

Copy link
Contributor

@upsj upsj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

cpp/include/cudf/detail/labeling/label_segments.cuh Outdated Show resolved Hide resolved
v22.08 Release automation moved this from PR-WIP to PR-Reviewer approved Aug 9, 2022
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks for the fix.

v22.08 Release automation moved this from PR-Reviewer approved to PR-Needs review Aug 9, 2022
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test that catches this?

@ttnghia
Copy link
Contributor Author

ttnghia commented Aug 9, 2022

Can we add a test that catches this?

No way to catch this consistently. I had to run tests in Spark with large random input columns to get the test inconsistently failed.

compute-sanitizer doesn't catch anything unless the test failed.

@ttnghia
Copy link
Contributor Author

ttnghia commented Aug 9, 2022

In addition, the current list distinct tests already covered this case but they never failed.

v22.08 Release automation moved this from PR-Needs review to PR-Reviewer approved Aug 9, 2022
@ttnghia
Copy link
Contributor Author

ttnghia commented Aug 9, 2022

I guess maybe rmm typically allocates more than enough memory for the column thus reading just off-by-one out-of-bound still be a valid read? Only in some special situations when such off-by-one out-of-bound indices are actually out of valid memory bound then the bug shows up?

@abellina
Copy link
Contributor

abellina commented Aug 9, 2022

I ran this locally with a test that was reproducing and with the patch it doesn't anymore. LGTM

Copy link
Collaborator

@jbrennan333 jbrennan333 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 good catch!

@ttnghia ttnghia removed the 5 - DO NOT MERGE Hold off on merging; see PR for details label Aug 9, 2022
@ttnghia ttnghia changed the base branch from branch-22.10 to branch-22.08 August 9, 2022 15:37
@ttnghia
Copy link
Contributor Author

ttnghia commented Aug 9, 2022

Rerun tests.

@codecov
Copy link

codecov bot commented Aug 9, 2022

Codecov Report

Merging #11497 (dcff820) into branch-22.08 (622e0f4) will not change coverage.
The diff coverage is n/a.

❗ Current head dcff820 differs from pull request most recent head 0489a3f. Consider uploading reports for the commit 0489a3f to get more accurate results

@@              Coverage Diff              @@
##           branch-22.08   #11497   +/-   ##
=============================================
  Coverage         86.47%   86.47%           
=============================================
  Files               144      144           
  Lines             22856    22856           
=============================================
  Hits              19765    19765           
  Misses             3091     3091           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@mythrocks
Copy link
Contributor

👍 Yikes. This is a good corner case to look out for.

@raydouglass raydouglass merged commit dccb586 into rapidsai:branch-22.08 Aug 9, 2022
v22.08 Release automation moved this from PR-Reviewer approved to Done Aug 9, 2022
@ttnghia ttnghia deleted the fix_out_of_bound branch August 9, 2022 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team ! - Hotfix Hotfix is a bug that affects the majority of users for which there is no reasonable workaround bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Spark Functionality that helps Spark RAPIDS
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

[BUG] Out-of-bound access in cudf::detail::label_segments
8 participants