Refactor `semi_anti_join` #11100

ttnghia · 2022-06-13T17:10:16Z

A (left) semi-join between the left and right tables returns the rows of the left table that have matching rows (i.e., rows that compare equal) in the right table. As such, for each row in the left table, we need to check whether that row has a match in the right table.

Such a check is generic and has applications in many other places, not just in semi-join. This PR exposes that functionality as a new `cudf::detail::contains(table_view, table_view)` for internal use.
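To make the row-wise check concrete, here is a minimal host-side sketch in plain C++. It is not the libcudf implementation, which runs on the GPU. The `Row` and `Table` aliases, the `(haystack, needles)` argument order, and the helper `semi_or_anti_join` are assumptions made for illustration only, and nulls are not modeled.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical host-side stand-ins: a row is a tuple of ints, a table is a list of rows.
using Row   = std::vector<int>;
using Table = std::vector<Row>;

// For each row of `needles`, report whether an equal row exists in `haystack`.
std::vector<bool> contains(Table const& haystack, Table const& needles)
{
  std::set<Row> const lookup(haystack.begin(), haystack.end());
  std::vector<bool> result;
  result.reserve(needles.size());
  for (auto const& row : needles) { result.push_back(lookup.count(row) > 0); }
  return result;
}

// Indices of the left rows selected by a semi-join (keep_matches = true)
// or by an anti-join (keep_matches = false).
std::vector<std::size_t> semi_or_anti_join(Table const& left, Table const& right, bool keep_matches)
{
  auto const flags = contains(right, left);
  std::vector<std::size_t> indices;
  for (std::size_t i = 0; i < flags.size(); ++i) {
    if (flags[i] == keep_matches) { indices.push_back(i); }
  }
  return indices;
}
```

Both join flavors reduce to the same `contains` call and differ only in which flag value they keep, which is why the check is worth exposing on its own.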

Closes #11037.

Depends on:

Add static_multimap::pair_contains NVIDIA/cuCollections#175

ttnghia · 2022-06-14T17:44:11Z

Benchmark comparing before and after the modification:

Benchmark                                                                                           Time             CPU      Time Old      Time New       CPU Old       CPU New
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Join<int32_t, int32_t>/left_anti_join_32bit/100000/100000/manual_time                            +0.0080         +0.0195             0             0             0             0
Join<int32_t, int32_t>/left_anti_join_32bit/100000/400000/manual_time                            +0.2293         +0.2204             0             0             0             0
Join<int32_t, int32_t>/left_anti_join_32bit/100000/1000000/manual_time                           +0.3630         +0.3643             0             0             0             0
Join<int32_t, int32_t>/left_anti_join_32bit/10000000/10000000/manual_time                        -0.0812         -0.0811            15            14            15            14
Join<int32_t, int32_t>/left_anti_join_32bit/10000000/40000000/manual_time                        -0.1041         -0.1040            33            30            33            30
Join<int32_t, int32_t>/left_anti_join_32bit/10000000/100000000/manual_time                       -0.1277         -0.1281            71            62            71            62
Join<int32_t, int32_t>/left_anti_join_32bit/100000000/100000000/manual_time                      -0.0772         -0.0778           155           143           155           143
Join<int32_t, int32_t>/left_anti_join_32bit/80000000/240000000/manual_time                       -0.1123         -0.1123           228           203           228           203
Join<int64_t, int64_t>/left_anti_join_64bit/50000000/50000000/manual_time                        -0.1051         -0.1046            81            73            81            73
Join<int64_t, int64_t>/left_anti_join_64bit/40000000/120000000/manual_time                       -0.1215         -0.1215           119           105           119           105
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/100000/100000/manual_time                      +0.0291         +0.0243             0             0             0             0
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/100000/400000/manual_time                      +0.1580         +0.1450             0             0             0             0
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/100000/1000000/manual_time                     +0.2847         +0.2648             0             0             0             0
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/10000000/10000000/manual_time                  -0.1329         -0.1318             5             5             5             5
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/10000000/40000000/manual_time                  +0.0241         +0.0231            11            11            11            11
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/10000000/100000000/manual_time                 +0.0847         +0.0868            22            24            23            24
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/100000000/100000000/manual_time                -0.1402         -0.1401            52            44            52            45
Join<int32_t, int32_t>/left_anti_join_32bit_nulls/80000000/240000000/manual_time                 +0.0208         +0.0201            72            74            72            74
Join<int64_t, int64_t>/left_anti_join_64bit_nulls/50000000/50000000/manual_time                  -0.1225         -0.1221            28            24            28            25
Join<int64_t, int64_t>/left_anti_join_64bit_nulls/40000000/120000000/manual_time                 -0.0015         -0.0008            40            40            40            40
Join<int32_t, int32_t>/left_semi_join_32bit/100000/100000/manual_time                            -0.0386         -0.0345             0             0             0             0
Join<int32_t, int32_t>/left_semi_join_32bit/100000/400000/manual_time                            +0.1529         +0.1410             0             0             0             0
Join<int32_t, int32_t>/left_semi_join_32bit/100000/1000000/manual_time                           +0.3739         +0.3534             0             0             0             0
Join<int32_t, int32_t>/left_semi_join_32bit/10000000/10000000/manual_time                        -0.1041         -0.1122            15            13            15            13
Join<int32_t, int32_t>/left_semi_join_32bit/10000000/40000000/manual_time                        -0.1314         -0.1398            33            28            33            28
Join<int32_t, int32_t>/left_semi_join_32bit/10000000/100000000/manual_time                       -0.1445         -0.1522            68            58            68            58
Join<int32_t, int32_t>/left_semi_join_32bit/100000000/100000000/manual_time                      -0.1175         -0.1253           157           138           157           137
Join<int32_t, int32_t>/left_semi_join_32bit/80000000/240000000/manual_time                       -0.1209         -0.1290           223           196           223           194
Join<int64_t, int64_t>/left_semi_join_64bit/50000000/50000000/manual_time                        -0.1392         -0.1467            82            71            82            70
Join<int64_t, int64_t>/left_semi_join_64bit/40000000/120000000/manual_time                       -0.1570         -0.1585           115            97           115            97
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/100000/100000/manual_time                      +0.0003         -0.0080             0             0             0             0
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/100000/400000/manual_time                      +0.1304         +0.1115             0             0             0             0
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/100000/1000000/manual_time                     +0.3206         +0.2783             0             0             0             0
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/10000000/10000000/manual_time                  -0.1502         -0.1571             4             4             5             4
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/10000000/40000000/manual_time                  -0.0050         -0.0156             8             8             8             8
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/10000000/100000000/manual_time                 +0.0594         +0.0490            17            18            17            17
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/100000000/100000000/manual_time                -0.1602         -0.1688            46            39            46            38
Join<int32_t, int32_t>/left_semi_join_32bit_nulls/80000000/240000000/manual_time                 -0.0376         -0.0472            58            56            58            55
Join<int64_t, int64_t>/left_semi_join_64bit_nulls/50000000/50000000/manual_time                  -0.1639         -0.1720            23            20            23            19
Join<int64_t, int64_t>/left_semi_join_64bit_nulls/40000000/120000000/manual_time                 -0.0424         -0.0523            29            28            29            28

So the performance is not always better, although in most cases we gain a >10% speedup. The slower cases are due to the null search, which I'm still trying to optimize.
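For reading the table above: the `Time` and `CPU` columns appear to be relative changes, (new - old) / old, so negative values mean the new code is faster. The sketch below shows that arithmetic under this assumed convention. `relative_change` is a hypothetical helper, and because the table prints rounded timings, recomputing from the displayed values only approximates the listed ratios.

```cpp
#include <cmath>

// Relative change of a measurement: negative means `new_value` is smaller (faster).
// Assumes old_value is non-zero; rows whose timings display as 0 are rounded values.
double relative_change(double old_value, double new_value)
{
  return (new_value - old_value) / std::abs(old_value);
}
```

For instance, the `left_anti_join_32bit/10000000/100000000` row shows 71 and 62, and `relative_change(71, 62)` is about -0.127, close to the listed -0.1277.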

PointKernel

Some non-blocking nitpicks. Thanks @ttnghia! Great work!

cpp/src/join/join_common_utils.cuh

cpp/src/join/join_common_utils.hpp

PointKernel · 2022-07-13T14:55:21Z

cpp/src/search/contains_table.cu

+      // Otherwise, we have only one nullable column and can use its null mask directly.
+      auto const row_bitmask =
+        haystack_nullable_columns.size() > 1
+          ? std::move(


This move is redundant.

`std::move` is necessary here because we need to access the `.first` element of the `bitmask_and` result.

https://godbolt.org/z/4576f5s9M

Wow I didn't know that. Thanks 😄

Sorry, I was wrong: the `.first` element here is a `device_buffer` instead.
It seems that without `std::move`, the `device_buffer` is incorrectly copied (??) and a CI test failed:

13:43:43 [ RUN ] DeathTest.CudaFatalError
13:43:43 /workspace/.conda-bld/work/cpp/tests/error/error_handling_test.cu:102: Failure
13:43:43 Death test: call_kernel()
13:43:43 Result: failed to die.
13:43:43 Error msg:
13:43:43 [ DEATH ]

I don't think it's related. #11100 (comment)

Let me try one more time to see whether removing `std::move` actually gets rid of the death test failure.

Maybe the failure here is just random? I just checked `rmm::device_buffer` and saw that its copy constructor is deleted, so there should not be any copying.

Let's wait to see the CI test result.

Okay, that test failed again. So the failure is indeed not related to removing `std::move`.

Now all CI tests pass (with `std::move` removed)! The test was definitely failing randomly. I'm going to merge this soon.
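On the `std::move` question in this thread: in standard C++, accessing a non-reference data member of a temporary, such as `.first` on a pair returned by value, already yields an rvalue (an xvalue), so it can initialize a move-only object without an explicit `std::move`. The sketch below illustrates this with a hypothetical move-only `Buffer` standing in for `rmm::device_buffer` and a hypothetical `make_mask` standing in for `bitmask_and`. It says nothing about the cause of the CI failure.

```cpp
#include <utility>

// Hypothetical move-only stand-in for a buffer type whose copy constructor is deleted.
struct Buffer {
  int size{0};
  explicit Buffer(int s) : size(s) {}
  Buffer(Buffer const&)            = delete;
  Buffer& operator=(Buffer const&) = delete;
  Buffer(Buffer&& other) noexcept : size(other.size) { other.size = 0; }
  Buffer& operator=(Buffer&&)      = delete;
};

// Hypothetical stand-in for a function returning a (buffer, count) pair by value.
std::pair<Buffer, int> make_mask(int size)
{
  return std::pair<Buffer, int>(Buffer{size}, 0);
}

// Mirrors the shape of the reviewed code: `.first` of the temporary pair is an
// xvalue and the other branch is a prvalue, so the result is move-constructed.
// This compiles even though Buffer cannot be copied and no std::move is written.
Buffer pick_mask(bool combine)
{
  return combine ? make_mask(8).first : Buffer{4};
}
```

Since `Buffer` has no copy constructor, the fact that `pick_mask` compiles at all shows that no copy is involved on either branch.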

ttnghia · 2022-07-13T20:26:27Z

Rerun tests.

This reverts commit 0c15f97.

Unfortunately Jake has no time to review again

ttnghia · 2022-07-14T15:22:27Z

@gpucibot merge

This PR adds the following APIs for set operations:

* `lists::have_overlap`
* `lists::intersect_distinct`
* `lists::union_distinct`
* `lists::difference_distinct`

### Name Convention

Except for the first API (`lists::have_overlap`), which returns a boolean column, the suffix `_distinct` of the remaining APIs denotes that their results will be lists columns in which all list rows have been post-processed to remove duplicates. As such, their results are actually "set" columns in which each row is a "set" of distinct elements.

---

Depends on:

* #10945
* #11017
* NVIDIA/cuCollections#175
* #11052
* #11118
* #11100
* #11149

Closes #10409.

Authors:

- Nghia Truong (https://github.com/ttnghia)
- Yunsong Wang (https://github.com/PointKernel)

Approvers:

- Michael Wang (https://github.com/isVoid)
- AJ Schmidt (https://github.com/ajschmidt8)
- Bradley Dice (https://github.com/bdice)
- Yunsong Wang (https://github.com/PointKernel)

URL: #11043

Add member function interface

326f8d4

ttnghia added feature request New feature or request 2 - In Progress Currently a work in progress libcudf blocker libcudf Affects libcudf (C++/CUDA) code. Spark Functionality that helps Spark RAPIDS non-breaking Non-breaking change labels Jun 13, 2022

ttnghia self-assigned this Jun 13, 2022

ttnghia added 2 commits June 13, 2022 13:55

Fix stale comment

26958d5

Initial implementation

5a22c2b

github-actions bot added the CMake CMake build issue label Jun 13, 2022

ttnghia added 6 commits June 13, 2022 18:40

Switch to use new implementation

910e05f

All test passed

e4622b1

Add public and detail API

be85cd2

Cleanup and add comments

f299f4f

Fix style

82dc340

Rename function and variables

15c8daf

ttnghia changed the title ~~Refactor semi_anti_join to expose semi_join_contains~~ Refactor semi_anti_join to expose left_semi_join_contains Jun 14, 2022

ttnghia added 2 commits June 14, 2022 10:02

Fix a serious bug

5ae3ef8

Optimize null insertion

58fb9d7

ttnghia added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Jun 14, 2022

ttnghia marked this pull request as ready for review June 14, 2022 20:01

ttnghia requested review from a team as code owners June 14, 2022 20:01

ttnghia requested review from jrhemstad and codereport June 14, 2022 20:01

ttnghia requested a review from jrhemstad July 8, 2022 00:27

ttnghia added 2 commits July 12, 2022 16:36

Merge branch 'branch-22.08' into refactor_semijoin

9038f03

Reverse cmake for cuco

943cf61

ttnghia removed 0 - Blocked Cannot progress due to external reasons 5 - Merge After Dependencies labels Jul 12, 2022

Some cleanup

5ec0634

ttnghia removed the request for review from codereport July 13, 2022 03:26

PointKernel approved these changes Jul 13, 2022

ttnghia added 4 commits July 13, 2022 10:04

Misc

4e54545

Move function into table_view.*

d16de35

Reverse changes in join_common_utils.hpp

032c2b6

Remove std::move

0c15f97

ttnghia added 4 commits July 13, 2022 13:52

Revert "Remove std::move"

00a034d

This reverts commit 0c15f97.

Merge branch 'branch-22.08' into refactor_semijoin

88fcb37

Remove std::move

0916751

Merge branch 'branch-22.08' into refactor_semijoin

ea5a513

rapids-bot bot merged commit ed9355f into rapidsai:branch-22.08 Jul 14, 2022

ttnghia deleted the refactor_semijoin branch July 14, 2022 15:23

res-life mentioned this pull request Jul 15, 2022

[BUG] join_test failed in integration tests NVIDIA/spark-rapids#6003

Closed

abellina mentioned this pull request Jul 19, 2022

[BUG] performance regression after semi_anti_join refactor #11299

Closed

This was referenced Jul 20, 2022

Temporarily reverse semi-anti-join implementation #11310

Closed

[FEA] Replace cuco::static_multimap by cuco::static_map in semi-anti-join #11313

Closed

ttnghia mentioned this pull request Jul 29, 2022

Fully support nested types in cudf::contains #10656

Merged

wence- mentioned this pull request Nov 22, 2023

[ENH] Audit cudf APIs for use of inappropriate algorithms #14479

Open
