
List lexicographic comparator #11129
Merged (91 commits), Sep 12, 2022
Conversation

@devavret (Contributor) commented Jun 21, 2022

Contributes to #10186

This PR enables lexicographic comparisons between list columns. The comparison is robust to arbitrary levels of nesting, but does not yet support lists of (lists of...) structs. The comparison is based on the Dremel encoding already in use in the Parquet file format. To assist reviewers, here is a reasonably complete list of the changes in this PR:

  1. A helper function to get per-column Dremel data (for list columns) when constructing a preprocessed table, which now owns the Dremel data.
  2. Updating the set of lexicographically compatible columns to now include list columns as long as they do not have any nested structs within.
  3. An implementation of lexicographic::device_row_comparator::operator() for lists. This function is the heart of the change to enable comparisons between list columns.
  4. A new benchmark for sorting that uses list data.
  5. An update to a preexisting rolling collect set test that previously failed (because it requires list comparison) but now works.
  6. New tests for list comparison.

@ttnghia (Contributor) commented Sep 8, 2022

> The struct performance slowing down is fine IMO.

I'm starting to review this PR, and the statement above makes me a bit worried. In Spark there are many cases that compare non-nested structs (structs of basic types) using the flattening approach plus the basic lexicographic comparator. Switching those to the new lexicographic comparator could cause significant performance regressions, and the Spark team may scream in panic if they see their benchmarks slow down because of it.

@ttnghia (Contributor) commented Sep 8, 2022

As per my argument above, if this PR changes the performance of existing APIs (such as binary search and sorting), then I would suggest not merging it until it has been tested against the spark-rapids performance benchmarks and we have confirmation that it is safe to merge.

Comment on lines +77 to +113
  auto const d_comparator = comparator.less<true>(nullate::DYNAMIC{has_nulls});
  if (find_first) {
    thrust::lower_bound(rmm::exec_policy(stream),
                        haystack_it,
                        haystack_it + haystack.num_rows(),
                        needles_it,
                        needles_it + needles.num_rows(),
                        out_it,
                        d_comparator);
  } else {
    thrust::upper_bound(rmm::exec_policy(stream),
                        haystack_it,
                        haystack_it + haystack.num_rows(),
                        needles_it,
                        needles_it + needles.num_rows(),
                        out_it,
                        d_comparator);
  }
} else {
  auto const d_comparator = comparator.less<false>(nullate::DYNAMIC{has_nulls});
  if (find_first) {
    thrust::lower_bound(rmm::exec_policy(stream),
                        haystack_it,
                        haystack_it + haystack.num_rows(),
                        needles_it,
                        needles_it + needles.num_rows(),
                        out_it,
                        d_comparator);
  } else {
    thrust::upper_bound(rmm::exec_policy(stream),
                        haystack_it,
                        haystack_it + haystack.num_rows(),
                        needles_it,
                        needles_it + needles.num_rows(),
                        out_it,
                        d_comparator);
  }
}

Previously I tried to use a lambda to simplify the code:

  auto const do_search = [find_first](auto&&... args) {
    if (find_first) {
      thrust::lower_bound(std::forward<decltype(args)>(args)...);
    } else {
      thrust::upper_bound(std::forward<decltype(args)>(args)...);
    }
  };
  do_search(rmm::exec_policy(stream),
            count_it,
            count_it + haystack.num_rows(),
            count_it,
            count_it + needles.num_rows(),
            out_it,
            comp);

Unfortunately, using such a variadic template causes false compiler warnings. I'm bringing it back up here in case you have a new idea for improving this.

@ttnghia (Contributor) Sep 8, 2022

Even if not, we can still cut the code in half by writing a lambda for the lower_bound/upper_bound block and calling it twice.


See #11667 for discussion on how we can improve this. The plan is to do that in a follow-up so that we don't hold up this feature any longer for refactoring-like tasks.

@vyasr (Contributor) commented Sep 8, 2022

> As per my argument above, if this PR changes the performance of the existing APIs (like binary search and sorting) then I would suggest not to merge this until it has been tested with the performance benchmark in spark-rapids and got confirmation of a safe-merge.

This makes me sad, but it is a good point. I agree that we probably want to get this tested by spark-rapids. Can you follow up with the appropriate person (perhaps @abellina?) to get those benchmarks run?

The old comparator will eventually be phased out entirely in favor of the new one. If there are cases where the new comparator's performance measures up poorly against the old one, then we'll have to consider ways to optimize it.

I take your point that the new comparator may slow down the case of structs that don't contain other structs, but I'm honestly not sure how far we want to go here. Truly providing peak performance for every possible combination would probably require many different template overloads of this type, each with different operator overloads enabled or disabled at compile time (lists without structs, structs without lists, no lists or structs, different types of mixed structs/lists), and perhaps even based on whether or not nulls are present. Going down this route will, I suspect, very quickly start to incur a much higher maintenance burden than what we currently have, since these performance characteristics are also likely to change with each compiler version.

In my view it is expected that enabling more features for comparing nested types could have performance costs, and to some extent that is acceptable.

CC @GregoryKimball

@ttnghia ttnghia added 5 - DO NOT MERGE Hold off on merging; see PR for details and removed 5 - DO NOT MERGE Hold off on merging; see PR for details labels Sep 9, 2022
@vyasr (Contributor) commented Sep 12, 2022

@gpucibot merge

Labels: CMake (CMake build issue), feature request (New feature or request), libcudf (Affects libcudf (C++/CUDA) code), non-breaking (Non-breaking change)