Fix performance issue and add a new code path to `cudf::detail::contains` #11330

ttnghia · 2022-07-22T04:39:49Z

The current implementation of cudf::detail::contains can process input with arbitrary nested types. However, it was reported to have a severe performance issue when the input tables have many duplicate rows (#11299). To fix the issue, #11310 and #11325 were created.

Unfortunately, #11310 separates semi-anti-join from cudf::detail::contains, which duplicates the implementation. On the other hand, #11325 addresses issue #11299, but semi-anti-join built on it still performs worse than the previous semi-anti-join implementation.

The changes in this PR include the following:

Fix the performance issue reported in [BUG] performance regression after semi_anti_join refactor #11299 for the current cudf::detail::contains implementation that supports nested types.
Add a separate code path to cudf::detail::contains such that:
- Input without a lists column (at any nested level) is processed by a code path identical to the old implementation of semi-anti-join. This ensures that the performance of semi-anti-join remains the same as before.
- Input with a nested lists column, or with NaNs compared as unequal, is processed by another code path that supports nested types and different NaN behavior. This ensures that support for nested types is not dropped.
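The choice between the two code paths described above can be sketched roughly as follows. This is a minimal host-only illustration, not the code in this PR: `type_id`, `column_desc`, `code_path`, and `choose_code_path` are hypothetical stand-ins, and only the name `has_nested_list` is borrowed from the commit list.

```cpp
#include <cstddef>

// Hypothetical, simplified stand-ins for illustration only; these are NOT libcudf types.
enum class type_id { INT32, FLOAT64, STRUCT, LIST };

struct column_desc {
  type_id type;
  column_desc const* children;  // pointer to the first child, or nullptr if there are none
  std::size_t num_children;
};

// Returns true if the column, or any column nested inside it, is a LIST.
constexpr bool has_nested_list(column_desc const& col)
{
  if (col.type == type_id::LIST) { return true; }
  for (std::size_t i = 0; i < col.num_children; ++i) {
    if (has_nested_list(col.children[i])) { return true; }
  }
  return false;
}

enum class code_path { FLAT, NESTED };

// Picks the code path for one input column (a real table would check every column).
constexpr code_path choose_code_path(column_desc const& col, bool nans_equal)
{
  return (has_nested_list(col) || !nans_equal) ? code_path::NESTED : code_path::FLAT;
}

// Example inputs, written as constexpr functions so they can be checked at compile time.
constexpr bool flat_input_uses_flat_path()
{
  column_desc const ints{type_id::INT32, nullptr, 0};
  return choose_code_path(ints, true) == code_path::FLAT;
}

constexpr bool struct_of_list_uses_nested_path()
{
  column_desc const list{type_id::LIST, nullptr, 0};
  column_desc const strct{type_id::STRUCT, &list, 1};
  return choose_code_path(strct, true) == code_path::NESTED;
}

constexpr bool unequal_nans_use_nested_path()
{
  column_desc const floats{type_id::FLOAT64, nullptr, 0};
  return choose_code_path(floats, false) == code_path::NESTED;
}
```

The point of the sketch is only that the list check is recursive ("at any nested level") and that either condition alone is enough to select the nested-type path.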

Closes #11299.

vyasr

Ack looks like I refreshed during an update and my comments vanished. Will start again.

vyasr · 2022-07-22T18:04:07Z

cpp/src/search/contains_table.cu

+template <typename Hasher>
+struct strong_index_hasher_adapter {
+  strong_index_hasher_adapter(Hasher const& hasher) : _hasher{hasher} {}
+
+  template <typename T>
+  __device__ inline auto operator()(T const idx) const noexcept
+  {
+    return _hasher(static_cast<size_type>(idx));
+  }
+
+ private:
+  Hasher _hasher;
+};


tl;dr you have two options that allow you to switch to a struct with public members:

Define a constructor

Explicitly specify the dependent type when you construct the adapter, e.g. strong_index_hasher_adapter<HasherT> h{...}.

Removing the constructor requires C++20 support for improved CTAD. CTAD in C++17 does not include the aggregate initializer that exists by default for aggregate types such as this one, so doing strong_index_hasher_adapter h{HasherT{}} will not infer the dependent type for the strong_index_hasher_adapter instance. This behavior is improved in C++20 (see the C++20 note in the implicitly-generated deduction guides section of the CTAD documentation). Here's an example that shows that supplying the template parameters explicitly always works, but deduction only works if you set the compiler flag to -std=c++20.
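A minimal host-only sketch of the point about deduction, assuming a simplified adapter: `__device__` is dropped, `size_type` is a local alias, and `identity_hasher` plus the two helper functions are hypothetical names introduced for illustration. The deduced form is guarded by a feature-test macro so the snippet still compiles where C++20 aggregate CTAD is unavailable.

```cpp
// Simplified host-only sketch: `__device__` is dropped and `size_type` is a local alias.
using size_type = int;

// Hypothetical hasher used only for this example.
struct identity_hasher {
  constexpr size_type operator()(size_type idx) const noexcept { return idx; }
};

// Aggregate version of the adapter: public const member, no user-declared constructor.
template <typename Hasher>
struct strong_index_hasher_adapter {
  template <typename T>
  constexpr auto operator()(T const idx) const noexcept
  {
    return _hasher(static_cast<size_type>(idx));
  }

  Hasher const _hasher;
};

// Naming the template argument explicitly works in C++17 and later.
constexpr size_type hash_with_explicit_type(size_type idx)
{
  strong_index_hasher_adapter<identity_hasher> const h{identity_hasher{}};
  return h(idx);
}

#if defined(__cpp_deduction_guides) && __cpp_deduction_guides >= 201907L
// Deducing `Hasher` from an aggregate initializer needs C++20 CTAD for aggregates.
constexpr size_type hash_with_deduced_type(size_type idx)
{
  strong_index_hasher_adapter const h{identity_hasher{}};
  return h(idx);
}
#endif
```

Under `-std=c++17` only the explicit form is compiled; keeping the user-defined constructor is the other way to get deduction there.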

vyasr · 2022-07-22T18:04:28Z

Also yes, the hasher should be made const if it's public.

vyasr

Some very minor suggestions, but this looks good to me. Thanks for putting together a solution so quickly @ttnghia and @PointKernel! Fantastic work here.

vyasr · 2022-07-22T18:24:54Z

@ttnghia I think we can close #11310 and #11325 based on our discussions, correct?

ttnghia · 2022-07-22T18:33:14Z

> @ttnghia I think we can close #11310 and #11325 based on our discussions, correct?

Right, I'm closing them.

abellina · 2022-07-22T20:17:53Z

Overall I see this PR as having a very minor effect on the time for the NDS benchmark, where it is 4 seconds slower (less than 1% impact). I am comparing this patch (applied on top of the original refactor PR) abellina@6919828 against a baseline based on commit abellina@9034b1b, which was before the original refactor work.

One of the queries, query80, showed a statistically significant difference, with a mean ~700ms worse. I have taken an nsys trace of the baseline and the patched runs of this query, and it's not clear from the trace what the differences are. It is something that we should look at more closely, but we feel it is minor enough.

bdice · 2022-07-22T21:01:03Z

cpp/src/search/contains_table.cu

+  template <typename T,
+            typename U,
+            CUDF_ENABLE_IF(is_strong_index_type<T>() && is_strong_index_type<U>())>
+  __device__ constexpr auto operator()(T const lhs_index, U const rhs_index) const noexcept


@ttnghia Can you explain what types this functor expects to get? Only (lhs_index_type, rhs_index_type) and (rhs_index_type, lhs_index_type), or also things like (lhs_index_type, lhs_index_type), (rhs_index_type, rhs_index_type), and (size_type, size_type)? I want to simplify this but I don't fully understand the requirements you're aiming for here because the templates are fairly broad.

(size_type, size_type) will never be instantiated, because only strong index types are used in the code. The template function will not be enabled for this case.

Only (lhs_index_type, lhs_index_type) and (lhs_index_type, rhs_index_type) are instantiated in this file at this time; (rhs_index_type, lhs_index_type) may potentially be instantiated in the future:

(lhs_index_type, lhs_index_type) will be used for self-comparing the haystack table, when inserting row indices of that table into the map.

(lhs_index_type, rhs_index_type) will be used for checking containment via cuco::static_map::contains. The internal implementation of cuco::static_map::contains currently only calls (lhs_index_type, rhs_index_type), but there is no guarantee that it will not call (rhs_index_type, lhs_index_type) in the future. In that case, this function flips the order so that the right indices are passed into the underlying comparator.
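The behavior described in this reply can be sketched as below. This is a host-only approximation under stated assumptions, not the PR's code: `__device__` is dropped, `CUDF_ENABLE_IF` is replaced by a plain `std::enable_if_t`, the strong index types are redeclared locally, and `order_recorder`, `adapter_t`, and `adapter` are hypothetical names. The recorder encodes the order in which it receives its arguments so the flip is observable.

```cpp
#include <type_traits>

using size_type = int;

// Local stand-ins for the strong index types used in the file under review.
enum class lhs_index_type : size_type {};
enum class rhs_index_type : size_type {};

template <typename T>
constexpr bool is_strong_index_type()
{
  return std::is_same_v<T, lhs_index_type> || std::is_same_v<T, rhs_index_type>;
}

// Hypothetical comparator on plain indices. It encodes the argument order it was
// called with, so a (left, right) call and a (right, left) call are distinguishable.
struct order_recorder {
  constexpr int operator()(size_type left, size_type right) const noexcept
  {
    return left * 10 + right;
  }
};

template <typename Comparator>
struct strong_index_comparator_adapter {
  // Enabled only when both arguments are strong index types, so
  // (size_type, size_type) is never a viable call.
  template <typename T,
            typename U,
            std::enable_if_t<is_strong_index_type<T>() && is_strong_index_type<U>()>* = nullptr>
  constexpr auto operator()(T const first, U const second) const noexcept
  {
    auto const first_idx  = static_cast<size_type>(first);
    auto const second_idx = static_cast<size_type>(second);
    if constexpr (std::is_same_v<T, rhs_index_type> && std::is_same_v<U, lhs_index_type>) {
      // Called as (right, left): flip so the comparator still sees (left, right).
      return _comparator(second_idx, first_idx);
    } else {
      return _comparator(first_idx, second_idx);
    }
  }

  Comparator const _comparator;
};

using adapter_t = strong_index_comparator_adapter<order_recorder>;
constexpr adapter_t adapter{order_recorder{}};
```

With this sketch, `adapter(lhs_index_type{1}, rhs_index_type{2})` and `adapter(rhs_index_type{2}, lhs_index_type{1})` both forward `(1, 2)` to the wrapped comparator, while a call with two plain `size_type` values does not compile.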

Could we use three explicit operator()s, one that addresses the self-comparison lhs_index_type, lhs_index_type case, and two that address the left/right (right/left) pairings? This template logic is quite complicated. I propose:

/**
 * @brief An adapter functor to support strong index type for table row comparator that must be
 * operating on `cudf::size_type`.
 */
template <typename Comparator>
struct strong_index_comparator_adapter {
  strong_index_comparator_adapter(Comparator const& comparator) : _comparator{comparator} {}

  __device__ constexpr auto operator()(lhs_index_type const left,
                                       lhs_index_type const right) const noexcept
  {
    return _comparator(static_cast<size_type>(left), static_cast<size_type>(right));
  }

  __device__ constexpr auto operator()(lhs_index_type const left,
                                       rhs_index_type const right) const noexcept
  {
    return _comparator(static_cast<size_type>(left), static_cast<size_type>(right));
  }

  __device__ constexpr auto operator()(rhs_index_type const right,
                                       lhs_index_type const left) const noexcept
  {
    // Reverse the order to left, right.
    return this->operator()(left, right);
  }

 private:
  Comparator const _comparator;
};

Sounds good.

Wait, no. I have tried that. It causes a compiler warning: operator() defined but never referred.

That sounds like a good explanation to me! I couldn’t quite piece that together myself but I think it sounds accurate.

Sorry that is just half correct. Here we have two separate code paths:

One code path uses the experimental row operators (two_table_comparator) and doesn't need this adapter (here), and

Another code path uses the traditional cudf::row_equality for comparing rows from two tables (here). This cudf::row_equality doesn't support strong index types, so it must be wrapped in this adapter, as in the link.

The reason for the warnings is that this adapter is a template struct, which means the compiler creates a separate version of it for each type it is instantiated with. In this file it is used 3 times, so 3 different versions of the template struct are created. For each version, only one of the 3 overloads of operator() is called, and the unused overloads trigger the warnings.

Okay, that is helpful to know. Perhaps that indicates that we should split this struct into multiple structs which only have the functions used in each case.

Ah good catch @ttnghia, thanks for the correction.

ttnghia · 2022-07-22T22:52:40Z

@gpucibot merge

ttnghia added 10 commits July 21, 2022 09:00

Complete refactor

70bba07

Fix type in numeric_limits

e2f0a3a

Merge branch 'branch-22.08' into refactor_contains

a87d532

Remove remap sentinel

0b7f6bf

Change headers order

132bb1c

Merge branch 'branch-22.08' into refactor_contains

792728e

Add has_nested_list

d747329

Add a new code path to handle non-list input

d73cd58

Small optimization

4a0460c

Fix NaN comparison

140cb20

ttnghia added the bug, 3 - Ready for Review, libcudf, Performance, Spark, and non-breaking labels Jul 22, 2022

ttnghia added this to PR-WIP in v22.08 Release via automation Jul 22, 2022

ttnghia requested a review from a team as a code owner July 22, 2022 04:39

ttnghia self-assigned this Jul 22, 2022

ttnghia requested review from cwharris, mythrocks, PointKernel, abellina, bdice and hyperbolic2346 and removed request for cwharris July 22, 2022 04:39

ttnghia moved this from PR-WIP to PR-Needs review in v22.08 Release Jul 22, 2022

abellina added a commit to abellina/cudf that referenced this pull request Jul 22, 2022

Apply nghias patch from rapidsai#11330

6919828

Add overload to enforce comparator symmetry

783b93c

ttnghia added 4 commits July 22, 2022 10:18

Add doxygen

ba96035

Rename functions

a12576a

Unify operator() to avoid compiler warning

80407f0

Extract is_strong_index_type

2533e88

vyasr reviewed Jul 22, 2022

vyasr approved these changes Jul 22, 2022

ttnghia added 2 commits July 22, 2022 11:32

Reverse a change that was make by mistake

b5fa767

Fix comment

86d9507

This was referenced Jul 22, 2022

Temporarily reverse semi-anti-join implementation #11310

Closed

Refactor cudf::detail::contains(table_view, table_view) #11325

Closed

ttnghia requested a review from hyperbolic2346 July 22, 2022 18:34

Add @return tag

58f6aee

Rewrite comments, and minor change to build_row_bitmask

f070b02

bdice reviewed Jul 22, 2022

hyperbolic2346 approved these changes Jul 22, 2022

v22.08 Release automation moved this from PR-Needs review to PR-Reviewer approved Jul 22, 2022

abellina approved these changes Jul 22, 2022

rapids-bot bot merged commit 204218a into rapidsai:branch-22.08 Jul 22, 2022

v22.08 Release automation moved this from PR-Reviewer approved to Done Jul 22, 2022

ttnghia deleted the add_code_path_to_contains branch July 22, 2022 23:09

ttnghia mentioned this pull request Jul 26, 2022

[FEA] Replace cuco::static_multimap by cuco::static_map in semi-anti-join #11313

Closed

GregoryKimball mentioned this pull request Oct 3, 2022

[FEA] Implement full support for nested types #11844

Closed

ttnghia mentioned this pull request Dec 19, 2022

Updating stream_compaction/unique to use new row comparators #12159

Merged

3 tasks

GregoryKimball mentioned this pull request Jan 23, 2023

[FEA] Refactor experimental/row_operators.cuh and make it default #12593

Open

4 tasks

ttnghia mentioned this pull request Apr 11, 2023

[FEA] Do not depend on the internal implementation of cuco:: maps for argument order #13116

Closed

vyasr mentioned this pull request Apr 24, 2023

Update contains_table to experimental row hasher and equality comparator #13119

Merged

3 tasks
