Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Two-table comparators with strong index types #10730

Merged
merged 35 commits into from
May 18, 2022

Conversation

bdice
Copy link
Contributor

@bdice bdice commented Apr 25, 2022

This PR resolves #10508. It introduces two-table lexicographic row comparators with strongly typed index types. Given tables lhs and rhs, the two_table_comparator can create a device comparator whose strongly typed call operator can compare bidirectionally: lhs[i] < rhs[j] and rhs[i] < lhs[j]. The strong typing indicates which index belongs to which table.

This PR also contains a sample implementation in search_ordered.cu, which implements lower_bound and upper_bound algorithms.

@bdice bdice added libcudf Affects libcudf (C++/CUDA) code. improvement Improvement / enhancement to an existing function labels Apr 25, 2022
@bdice bdice self-assigned this Apr 25, 2022
@bdice bdice added this to PR-WIP in v22.06 Release via automation Apr 25, 2022
@bdice bdice changed the title Use a strongly typed index for two-table comparators Two-table comparators with strong index types May 2, 2022
@codecov

This comment was marked as off-topic.

Comment on lines 449 to 450
__device__ bool operator()(lhs_index_type const lhs_index,
rhs_index_type const rhs_index) const noexcept
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll need overloads for (lhs_index_type, lhs_index_type) and (rhs_index_type, rhs_index_type) as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible — but I haven’t found an algorithm that needs those yet. For example, merge requires sorted inputs, so the lhs/lhs and rhs/rhs overloads are never needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, overloads for (lhs_index_type, lhs_index_type) and (rhs_index_type, rhs_index_type) would require this class to own two additional comparators (self comparators for left and right tables).

Copy link
Contributor

@ttnghia ttnghia May 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey guys! I stumbled on to this: using self_comparator with strong index types! So we are going to need (lhs_index_type, lhs_index_type) and (rhs_index_type, rhs_index_type) overloads for self_comparator.

These overloads just convert lhs_index_type or lhs_index_type into size_type and call the normal operator()(size_type).

Update: I may not need this as I have another idea to avoid using strong index type, but this is still a relevant need.

cpp/src/search/search.cu Outdated Show resolved Hide resolved
cpp/src/search/search.cu Outdated Show resolved Hide resolved
@ttnghia
Copy link
Contributor

ttnghia commented May 11, 2022

search.cu currently has merge conflict. I resolved it here: https://github.com/rapidsai/cudf/pull/10656/files#diff-664305c8b9b69efc1edae1d07f4466e6b9af6a3f41760497eb7414ee3ad9b610, and some temporary fix for unit tests here: f3dd8e7

It seems that we must rely on the upcoming weak order comparator from #10793.

@bdice bdice requested a review from devavret May 16, 2022 19:35
Copy link
Contributor

@ttnghia ttnghia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve, pending CI tests to pass all 😃

v22.06 Release automation moved this from PR-WIP to PR-Reviewer approved May 17, 2022
Copy link
Contributor

@jrhemstad jrhemstad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very clean and concise PR!

@bdice
Copy link
Contributor Author

bdice commented May 18, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 7f9d51b into rapidsai:branch-22.06 May 18, 2022
v22.06 Release automation moved this from PR-Reviewer approved to Done May 18, 2022
Comment on lines +78 to +79
template <typename Index, typename Underlying = std::underlying_type_t<Index>>
struct strong_index_iterator : public thrust::iterator_facade<strong_index_iterator<Index>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bdice I realize that I am late to the party here, but this struct has zero docs. A new reader of the code won't have any idea why this struct exists since the purpose is not conveyed by its implementation, but rather by its existence alone. Could you make a PR with some brief documentation (or maybe just stick that into one of the downstream PRs that either you or @ttnghia is working on)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing. I can create a standalone PR for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rapids-bot bot pushed a commit that referenced this pull request May 19, 2022
This adds strong index types for equality comparator, along with #10730 to unblock #10548, #10656, and several others nested type feature requests.

Authors:
  - Nghia Truong (https://github.com/ttnghia)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Bradley Dice (https://github.com/bdice)

URL: #10883
rapids-bot bot pushed a commit that referenced this pull request Aug 17, 2022
This extends the `cudf::contains` API to support nested types (lists + structs) with arbitrarily nested levels. As such, `cudf::contains` will work with literally any type of input data.

In addition, this fixes null handling of `cudf::contains` with structs column + struct scalar input when the structs column contains null rows at the top level while the scalar key is valid but all nulls at children levels.

Closes: #8965
Depends on:
 * #10730
 * #10883
 * #10802
 * #10997
 * NVIDIA/cuCollections#172
 * NVIDIA/cuCollections#173
 * #11037
 * #11356

Authors:
  - Nghia Truong (https://github.com/ttnghia)
  - Devavret Makkar (https://github.com/devavret)
  - Bradley Dice (https://github.com/bdice)
  - Karthikeyan (https://github.com/karthikeyann)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Bradley Dice (https://github.com/bdice)
  - Yunsong Wang (https://github.com/PointKernel)

URL: #10656
rapids-bot bot pushed a commit that referenced this pull request Aug 22, 2022
This extends the `lists::contains` API to support nested types (lists + structs) with arbitrarily nested levels. As such, `lists::contains` will work with literally any type of input data.

In addition, the related implementation has been significantly refactored to facilitate adding new implementation.

Closes #8958.
Depends on:
 * #10730
 * #10883
 * #10999
 * #11019
 * #11037

Authors:
  - Nghia Truong (https://github.com/ttnghia)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - MithunR (https://github.com/mythrocks)
  - Bradley Dice (https://github.com/bdice)

URL: #10548
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

[FEA] row_comparators should use strongly typed index types to ensure commutativity
5 participants