Fix hash join when the input tables have nulls on only one side #13120

Merged: 22 commits, Apr 13, 2023

@ttnghia (Contributor) commented Apr 11, 2023

This is very similar to #11284: it fixes a bug that occurs when only one input table has nulls while the other doesn't. The cause is that the new experimental hasher produces different hash values depending on an input flag, has_nulls. To use the hasher correctly, has_nulls must either be computed by checking all possible input tables or be set to a constant value (true).
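
To see why mismatched flags break the join, here is a minimal C++ sketch. It does not use libcudf: `toy_row_hasher`, `contains_nulls`, `count_hash_matches`, and `demo` are hypothetical names, and the hashing scheme is invented for illustration. The only assumption carried over from the description is that the hash of a valid row depends on the has_nulls flag.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_set>
#include <vector>

using column = std::vector<std::optional<int>>;

// Hypothetical toy hasher, NOT libcudf's: its output for a valid row depends
// on the has_nulls flag it was constructed with.
struct toy_row_hasher {
  bool has_nulls;

  std::uint64_t operator()(std::optional<int> const& row) const
  {
    constexpr std::uint64_t multiplier = 0x9e3779b97f4a7c15ull;
    auto const value_hash = static_cast<std::uint32_t>(row.value_or(0)) * multiplier;
    if (!has_nulls) { return value_hash; }               // fast path: validity is ignored
    if (!row.has_value()) { return ~std::uint64_t{0}; }  // sentinel hash for null rows
    return value_hash ^ 1u;                              // null-aware path mixes in a validity bit
  }
};

bool contains_nulls(column const& c)
{
  return std::any_of(c.begin(), c.end(), [](auto const& row) { return !row.has_value(); });
}

// Counts non-null probe rows whose hash equals the hash of some non-null build row.
std::size_t count_hash_matches(column const& build, bool build_flag,
                               column const& probe, bool probe_flag)
{
  toy_row_hasher const build_hasher{build_flag};
  toy_row_hasher const probe_hasher{probe_flag};

  std::unordered_set<std::uint64_t> build_hashes;
  for (auto const& row : build) {
    if (row.has_value()) { build_hashes.insert(build_hasher(row)); }
  }

  std::size_t matches = 0;
  for (auto const& row : probe) {
    if (row.has_value() && build_hashes.count(probe_hasher(row)) > 0) { ++matches; }
  }
  return matches;
}

inline void demo()
{
  column const build{1, 2, 3};             // no nulls
  column const probe{2, std::nullopt, 3};  // nulls on one side only

  // Buggy: each flag is computed from its own table, so the hashers disagree.
  assert(count_hash_matches(build, contains_nulls(build), probe, contains_nulls(probe)) == 0);

  // Fixed: a single flag is derived from both tables (or simply set to true).
  bool const has_nulls = contains_nulls(build) || contains_nulls(probe);
  assert(count_hash_matches(build, has_nulls, probe, has_nulls) == 2);
}
```

In this toy setup the keys 2 and 3 exist in both tables, yet the per-table flags find no matches because the build side hashes through the fast path and the probe side through the null-aware path. Sharing one flag across both sides restores the two expected matches.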

Closes:

@ttnghia added the bug, 3 - Ready for Review, libcudf, Spark, and non-breaking labels Apr 11, 2023
@ttnghia self-assigned this Apr 11, 2023
@ttnghia changed the title from "Fix hash join when the input tables have nulls in only one side" to "Fix hash join when the input tables have nulls on only one side" Apr 11, 2023
@github-actions bot added the Java label Apr 11, 2023
@ttnghia added the breaking label and removed the non-breaking label Apr 11, 2023
@ttnghia marked this pull request as ready for review April 11, 2023 23:11
@ttnghia requested review from a team as code owners April 11, 2023 23:11
Review thread on cpp/include/cudf/join.hpp (outdated, resolved)
Review thread on cpp/src/join/hash_join.cu (outdated, resolved)
Review thread on cpp/tests/join/join_tests.cpp (outdated, resolved)
Review thread on cpp/src/join/hash_join.cu (resolved)
@ttnghia removed the request for review from a team April 12, 2023 17:31
Review thread on cpp/include/cudf/detail/join.hpp (outdated, resolved)
Review thread on cpp/src/join/hash_join.cu (outdated, resolved)
Review thread on cpp/tests/join/join_tests.cpp (outdated, resolved)
ttnghia and others added 2 commits April 12, 2023 12:00
Co-authored-by: Divye Gala <divyegala@gmail.com>
@ttnghia added the non-breaking label and removed the Java and breaking labels Apr 12, 2023

@PointKernel (Member) left a comment

LGTM

Review thread on cpp/include/cudf/detail/join.hpp (outdated, resolved)

@ttnghia (Contributor, Author) commented Apr 13, 2023

/merge

@rapids-bot bot merged commit d415ffe into rapidsai:branch-23.06 Apr 13, 2023
@ttnghia deleted the fix_hash_join branch April 13, 2023 05:22
Labels
3 - Ready for Review: Ready for review by team
bug: Something isn't working
libcudf: Affects libcudf (C++/CUDA) code.
non-breaking: Non-breaking change
Spark: Functionality that helps Spark RAPIDS
Development

Successfully merging this pull request may close these issues.

[BUG] Left joins on struct key producing incorrect null results for right table
5 participants