Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor semi_anti_join #11100

Merged
merged 63 commits into from
Jul 14, 2022
Merged

Conversation

ttnghia
Copy link
Contributor

@ttnghia ttnghia commented Jun 13, 2022

A (left) semi-join between the left and right tables returns a set of rows in the left table that has matching rows (i.e., compared equally) in the right table. As such, for each row in the left table, it needs to check if that row has a match in the right table.

Such check is very generic and has applications in many other places, not just in semi-join. This PR exposes that check functionality as a new cudf::detail::contains(table_view, table_view) for internal usage.

Closes #11037.

Depends on:

@ttnghia ttnghia added feature request New feature or request 2 - In Progress Currently a work in progress libcudf blocker libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS non-breaking Non-breaking change labels Jun 13, 2022
@ttnghia ttnghia self-assigned this Jun 13, 2022
@github-actions github-actions bot added the CMake CMake build issue label Jun 13, 2022
@ttnghia ttnghia changed the title Refactor semi_anti_join to expose semi_join_contains Refactor semi_anti_join to expose left_semi_join_contains Jun 14, 2022
@ttnghia
Copy link
Contributor Author

ttnghia commented Jun 14, 2022

Benchmark comparing before and after the modification:

Benchmark                                                                                           Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Join<int32_t, int32_t>/left_anti_join_32bit/100000/100000/manual_time                            +0.0080         +0.0195             0             0             0             0
Join<int32_t, int32_t>/left_anti_join_32bit/100000/400000/manual_time                            +0.2293         +0.2204             0             0             0             0
Join<int32_t, int32_t>/left_anti_join_32bit/100000/1000000/manual_time                           +0.3630         +0.3643             0             0             0             0
Join<int32_t, int32_t>/left_anti_join_32bit/10000000/10000000/manual_time                        -0.0812         -0.0811            15            14            15            14
Join<int32_t, int32_t>/left_anti_join_32bit/10000000/40000000/manual_time                        -0.1041         -0.1040            33            30            33            30
Join<int32_t, int32_t>/left_anti_join_32bit/10000000/100000000/manual_time                       -0.1277         -0.1281            71            62            71            62
Join<int32_t, int32_t>/left_anti_join_32bit/100000000/100000000/manual_time                      -0.0772         -0.0778           155           143           155           143
Join<int32_t, int32_t>/left_anti_join_32bit/80000000/240000000/manual_time                       -0.1123         -0.1123           228           203           228           203
Join<int64_t, int64_t>/left_anti_join_64bit/50000000/50000000/manual_time                        -0.1051         -0.1046            81            73            81            73
Join<int64_t, int64_t>/left_anti_join_64bit/40000000/120000000/manual_time                       -0.1215         -0.1215           119           105           119           105
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/100000/100000/manual_time                      +0.0291         +0.0243             0             0             0             0
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/100000/400000/manual_time                      +0.1580         +0.1450             0             0             0             0
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/100000/1000000/manual_time                     +0.2847         +0.2648             0             0             0             0
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/10000000/10000000/manual_time                  -0.1329         -0.1318             5             5             5             5
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/10000000/40000000/manual_time                  +0.0241         +0.0231            11            11            11            11
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/10000000/100000000/manual_time                 +0.0847         +0.0868            22            24            23            24
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/100000000/100000000/manual_time                -0.1402         -0.1401            52            44            52            45
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/80000000/240000000/manual_time                 +0.0208         +0.0201            72            74            72            74
Join<int64_t, int64_t>/left_anti_join_64bit_nulls/50000000/50000000/manual_time                  -0.1225         -0.1221            28            24            28            25
Join<int64_t, int64_t>/left_anti_join_64bit_nulls/40000000/120000000/manual_time                 -0.0015         -0.0008            40            40            40            40
Join<int32_t, int32_t>/left_semi_join_32bit/100000/100000/manual_time                            -0.0386         -0.0345             0             0             0             0
Join<int32_t, int32_t>/left_semi_join_32bit/100000/400000/manual_time                            +0.1529         +0.1410             0             0             0             0
Join<int32_t, int32_t>/left_semi_join_32bit/100000/1000000/manual_time                           +0.3739         +0.3534             0             0             0             0
Join<int32_t, int32_t>/left_semi_join_32bit/10000000/10000000/manual_time                        -0.1041         -0.1122            15            13            15            13
Join<int32_t, int32_t>/left_semi_join_32bit/10000000/40000000/manual_time                        -0.1314         -0.1398            33            28            33            28
Join<int32_t, int32_t>/left_semi_join_32bit/10000000/100000000/manual_time                       -0.1445         -0.1522            68            58            68            58
Join<int32_t, int32_t>/left_semi_join_32bit/100000000/100000000/manual_time                      -0.1175         -0.1253           157           138           157           137
Join<int32_t, int32_t>/left_semi_join_32bit/80000000/240000000/manual_time                       -0.1209         -0.1290           223           196           223           194
Join<int64_t, int64_t>/left_semi_join_64bit/50000000/50000000/manual_time                        -0.1392         -0.1467            82            71            82            70
Join<int64_t, int64_t>/left_semi_join_64bit/40000000/120000000/manual_time                       -0.1570         -0.1585           115            97           115            97
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/100000/100000/manual_time                      +0.0003         -0.0080             0             0             0             0
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/100000/400000/manual_time                      +0.1304         +0.1115             0             0             0             0
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/100000/1000000/manual_time                     +0.3206         +0.2783             0             0             0             0
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/10000000/10000000/manual_time                  -0.1502         -0.1571             4             4             5             4
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/10000000/40000000/manual_time                  -0.0050         -0.0156             8             8             8             8
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/10000000/100000000/manual_time                 +0.0594         +0.0490            17            18            17            17
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/100000000/100000000/manual_time                -0.1602         -0.1688            46            39            46            38
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/80000000/240000000/manual_time                 -0.0376         -0.0472            58            56            58            55
Join<int64_t, int64_t>/left_semi_join_64bit_nulls/50000000/50000000/manual_time                  -0.1639         -0.1720            23            20            23            19
Join<int64_t, int64_t>/left_semi_join_64bit_nulls/40000000/120000000/manual_time                 -0.0424         -0.0523            29            28            29            28

So the performance is not always better, although in most cases we gain >10% speedup. The slower places are due to null search. I'm still trying to optimize it.

@ttnghia ttnghia added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Jun 14, 2022
@ttnghia ttnghia marked this pull request as ready for review June 14, 2022 20:01
@ttnghia ttnghia requested review from a team as code owners June 14, 2022 20:01
@ttnghia ttnghia requested a review from jrhemstad July 8, 2022 00:27
@ttnghia ttnghia removed 0 - Blocked Cannot progress due to external reasons 5 - Merge After Dependencies labels Jul 12, 2022
@ttnghia ttnghia removed the request for review from codereport July 13, 2022 03:26
Copy link
Member

@PointKernel PointKernel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some non-blocking nitpick. Thanks @ttnghia ! Great Work!

cpp/src/join/join_common_utils.cuh Show resolved Hide resolved
cpp/src/join/join_common_utils.hpp Outdated Show resolved Hide resolved
// Otherwise, we have only one nullable column and can use its null mask directly.
auto const row_bitmask =
haystack_nullable_columns.size() > 1
? std::move(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This move is redundant.

Copy link
Contributor Author

@ttnghia ttnghia Jul 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move is necessary here because we need to access to the .first element of the bitmask_and result.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow I didn't know that. Thanks 😄

Copy link
Contributor Author

@ttnghia ttnghia Jul 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was wrong, the .first element here is a device_buffer instead.
It seems that by not using std::move, the device_buffer is incorrectly copied (??) and CI test failed:

13:43:43 [ RUN      ] DeathTest.CudaFatalError
13:43:43 /workspace/.conda-bld/work/cpp/tests/error/error_handling_test.cu:102: Failure
13:43:43 Death test: call_kernel()
13:43:43     Result: failed to die.
13:43:43  Error msg:
13:43:43 [  DEATH   ] 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's related. #11100 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me try one more time to see if removing std::move actually can get rid of the death test failure.

Copy link
Contributor Author

@ttnghia ttnghia Jul 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the failure here is just random? I just checked rmm::device_buffer and see that its copy constructor is deleted so there should not be any copying.

Let's wait to see the CI test result.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay that test failed again. So indeed removing std::move is not related.

Copy link
Contributor Author

@ttnghia ttnghia Jul 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now all CI tests passed (std::move removed)! Definitely the test failed randomly. I'm going to merge this soon.

@ttnghia
Copy link
Contributor Author

ttnghia commented Jul 13, 2022

Rerun tests.

@ttnghia ttnghia dismissed jrhemstad’s stale review July 14, 2022 15:22

Unfortunately Jake has no time to review again

@ttnghia
Copy link
Contributor Author

ttnghia commented Jul 14, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit ed9355f into rapidsai:branch-22.08 Jul 14, 2022
@ttnghia ttnghia deleted the refactor_semijoin branch July 14, 2022 15:23
rapids-bot bot pushed a commit that referenced this pull request Jul 26, 2022
This PR adds the following APIs for set operations:
 * `lists::have_overlap`
 * `lists::intersect_distinct`
 * `lists::union_distinct`
 * `lists::difference_distinct`

### Name Convention
Except for the first API (`lists::have_overlap`) that returns a boolean column, the suffix `_distinct` of the rest APIs denotes that their results will be lists columns in which all list rows have been post-processed to remove duplicates. As such, their results are actually "set" columns in which each row is a "set" of distinct elements.

---

Depends on:
 * #10945
 * #11017
 * NVIDIA/cuCollections#175
 * #11052
 * #11118
 * #11100
 * #11149

Closes #10409.

Authors:
  - Nghia Truong (https://github.com/ttnghia)
  - Yunsong Wang (https://github.com/PointKernel)

Approvers:
  - Michael Wang (https://github.com/isVoid)
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Bradley Dice (https://github.com/bdice)
  - Yunsong Wang (https://github.com/PointKernel)

URL: #11043
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team CMake CMake build issue feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Spark Functionality that helps Spark RAPIDS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Share code between left_semi_anti_join, cudf::contains, and set operations
4 participants