
Refactor bit counting APIs, introduce valid/null count functions, and split host/device side code for segmented counts. #9588

Merged 57 commits into rapidsai:branch-22.02 on Dec 10, 2021

Conversation


@bdice (Contributor) commented Nov 3, 2021

This PR is a first step towards #9552. I split the host and device code for segmented_count_set_bits, so that APIs with existing indices on the device can call the device function directly. Along the way, I discussed some pain points in the current design of bit counting functions with @jrhemstad and @mythrocks and made the following changes:

  • Fixed inconsistent behavior of the bit counting functions (`count_set_bits` returned zero when `bitmask == nullptr`, but `segmented_count_set_bits` returned the segment lengths) by raising an error when `bitmask == nullptr` (breaking change)

  • Moved all bit counting functions to detail namespaces (breaking change)

  • Separated "validity" logic from pure bit counting by introducing new `valid_count`, `null_count`, `segmented_valid_count`, and `segmented_null_count` detail functions. These functions treat all elements as valid when `bitmask == nullptr` and otherwise behave identically to the corresponding bit counting functions.

  • Split device-side code from host-side code to enable future work on segmented null mask reductions over lists, with indices (offsets) already on the device.

  • Updated all calling code (and added tests) to use the new valid/null count detail functions.
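The split between pure bit counting and validity counting can be illustrated with a host-side sketch. This is illustrative only: the function names follow the PR, but the actual detail functions operate on device memory and take CUDA streams.

```cpp
#include <cstdint>
#include <stdexcept>

using bitmask_type = std::uint32_t;  // stand-in for cudf's bitmask word type

// Pure bit counting: a null bitmask is now an error (the breaking change).
int count_set_bits(bitmask_type const* bitmask, int start, int stop) {
  if (bitmask == nullptr) { throw std::invalid_argument("bitmask is null"); }
  int count = 0;
  for (int i = start; i < stop; ++i) {
    if (bitmask[i / 32] & (bitmask_type{1} << (i % 32))) { ++count; }
  }
  return count;
}

// Validity semantics: a null bitmask means "all elements are valid".
int valid_count(bitmask_type const* bitmask, int start, int stop) {
  if (bitmask == nullptr) { return stop - start; }
  return count_set_bits(bitmask, start, stop);
}

int null_count(bitmask_type const* bitmask, int start, int stop) {
  return (stop - start) - valid_count(bitmask, start, stop);
}
```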

@bdice bdice requested a review from a team as a code owner November 3, 2021 18:09
@bdice bdice self-assigned this Nov 3, 2021
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Nov 3, 2021
@bdice bdice added 3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Nov 3, 2021
@mythrocks (Contributor) commented Nov 3, 2021

> I think I can change this behavior for all internal libcudf uses.

For clarity's sake, your suggestion is to change offsets to 2N for all internal APIs that use offsets?

@mythrocks (Contributor)

> I don't see a use case where we would want to have arbitrary (e.g. overlapping, non-monotonic) segment indices rather than the style of indexing used by list offsets.

I'm trying to understand why we'd need support for 2N indices. I was wondering if it might be needed not for overlapping ranges, but instead for ranges that "don't touch".

Supporting overlapping ranges might be useful to speed up grouped_rolling_window() for COUNT_VALID. :]

@bdice (Contributor, Author) commented Nov 3, 2021

> I think I can change this behavior for all internal libcudf uses.
>
> For clarity's sake, your suggestion is to change offsets to 2N for all internal APIs that use offsets?

I meant the opposite: I hoped to change this specific API, `segmented_count_set_bits`, to use N+1 indices instead of 2N. There is a case where the 2N format is used that probably can't be changed: `slice`. I suppose we will just have to pre-process N+1 indices into the 2N format for this function.
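The pre-processing mentioned here can be sketched on the host. `offsets_to_index_pairs` is a hypothetical helper for illustration, not a libcudf API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: expand N+1 list-style offsets into the 2N
// {start, stop} index pairs expected by the segmented bit counting API.
std::vector<int> offsets_to_index_pairs(std::vector<int> const& offsets) {
  std::vector<int> pairs;
  if (offsets.size() < 2) { return pairs; }  // no complete segments
  pairs.reserve(2 * (offsets.size() - 1));
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    pairs.push_back(offsets[i]);      // segment start (inclusive)
    pairs.push_back(offsets[i + 1]);  // segment stop (exclusive)
  }
  return pairs;
}
```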

@nvdbaranec (Contributor) commented Nov 3, 2021

> 2N. There is a case where this 2N format is used that probably can't be changed: slice. I suppose we will just have to pre-process N+1 indices into the 2N format for this function.

I bet slice could be changed as well. Although that's a pretty delicate function to break :) For what it's worth, I'm all for changing the interface to use offsets in the more standard sense.

@jrhemstad (Contributor)

What we should really do is make the implementation take iterators for the start/stop points. You can generate those from either 2N or N+1 offsets with transform iterators.

@bdice (Contributor, Author) commented Nov 4, 2021

> What we should really do is make the implementation take iterators for the start/stop points. You can generate those from either 2N or N+1 offsets with transform iterators.

To clarify, you mean that the implementation would take two independent iterators as arguments: an iterator of "start" indices and an iterator of "end" indices? Such that (very loosely speaking) the 2N implementations would be strided iterators (start = even indices / end = odd indices) and the N+1 implementations would be iterators returning [0, 1, ..., N] and [1, 2, ..., N+1]. That's viable. However, I would prefer to keep this PR simple and not break conventions here. I'd push that to a future issue / design discussion for all segmented APIs.
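A host-side sketch of that two-iterator design (hypothetical names; not libcudf code): the implementation only sees start/stop generators, so both the N+1 and 2N layouts can feed the same loop.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of a segmented API taking independent start/stop
// generators instead of a fixed index layout. Each generator maps a
// segment index to an element index.
template <typename StartFn, typename StopFn>
std::vector<int> segment_lengths(StartFn start, StopFn stop,
                                 std::size_t num_segments) {
  std::vector<int> out(num_segments);
  for (std::size_t i = 0; i < num_segments; ++i) {
    out[i] = stop(i) - start(i);  // length of segment i
  }
  return out;
}
```

With N+1 offsets {0, 3, 7}, `start(i)` reads `offsets[i]` and `stop(i)` reads `offsets[i + 1]`; with 2N pairs {0, 3, 3, 7}, they read `pairs[2 * i]` and `pairs[2 * i + 1]`.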

@codecov bot commented Nov 4, 2021

Codecov Report

Merging #9588 (f5e9d6a) into branch-22.02 (967a333) will decrease coverage by 0.06%.
The diff coverage is 4.16%.

❗ Current head f5e9d6a differs from pull request most recent head 3f689bc. Consider uploading reports for the commit 3f689bc to get more accurate results

@@               Coverage Diff                @@
##           branch-22.02    #9588      +/-   ##
================================================
- Coverage         10.49%   10.42%   -0.07%     
================================================
  Files               119      119              
  Lines             20305    20479     +174     
================================================
+ Hits               2130     2134       +4     
- Misses            18175    18345     +170     
Impacted Files Coverage Δ
python/cudf/cudf/__init__.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/_base_index.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/column.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/datetime.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/string.py 0.00% <ø> (ø)
python/cudf/cudf/core/dataframe.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/frame.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/groupby/groupby.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/index.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/indexed_frame.py 0.00% <0.00%> (ø)
... and 12 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jrhemstad (Contributor)

> I bet slice could be changed as well

We definitely cannot change slice from taking N potentially non-overlapping ranges. We have separate split and slice APIs for a reason.

> To clarify, you mean that the implementation would take two independent iterators as arguments: an iterator of "start" indices and an iterator of "end" indices?

Yes, that would match CUB segmented reduce.

> However, I would prefer to keep this PR simple and not break conventions here. I'd push that to a future issue / design discussion for all segmented APIs.

As is, this PR will preclude using iterators for the start/end indices by moving everything to a .cu file.

@bdice (Contributor, Author) commented Nov 4, 2021

@jrhemstad Good points. I'll think some more on this and try to come up with a better approach.


@bdice commented on the following code on Dec 9, 2021:

    stream.synchronize(); // now ret is valid.
    struct index_alternator {

There are at least three places in this PR where the code would benefit from a strided iterator (input validation in validate_segmented_indices, constructing first/last index iterators from the iterator of segmented indices in segmented_count_bits, and constructing a vector of segment lengths in segmented_valid_count). I would like to work on https://github.com/NVIDIA/thrust/issues/912 in the near future. I might introduce a strided iterator into cudf/detail/iterator.cuh to test the design before working on a Thrust implementation.
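A minimal host-side sketch of the strided-access idea (hypothetical; a real Thrust strided iterator would be a full random-access iterator with arithmetic and traversal support):

```cpp
#include <cstddef>

// Hypothetical strided view over a flat index array: element i maps to
// base[offset + i * stride]. With stride 2, offsets 0 and 1 select the
// even (segment start) and odd (segment stop) entries of a 2N index array.
template <typename T>
struct strided_iterator {
  T const* base;
  std::size_t offset;
  std::size_t stride;
  T operator[](std::size_t i) const { return base[offset + i * stride]; }
};
```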

@bdice bdice changed the title Split host/device side code in segmented_count_set_bits. Refactor bit counting APIs, introduce valid/null count functions, and split host/device side code for segmented counts. Dec 9, 2021
@bdice bdice added breaking Breaking change and removed non-breaking Non-breaking change labels Dec 9, 2021
@mythrocks (Contributor) left a review:

Some minor nitpicks. LGTM!

Resolved review threads:
  • cpp/include/cudf/detail/null_mask.cuh (outdated)
  • cpp/include/cudf/detail/null_mask.cuh
  • cpp/tests/bitmask/bitmask_tests.cpp
v22.02 Release automation moved this from PR-Needs review to PR-Reviewer approved Dec 10, 2021
@bdice (Contributor, Author) commented Dec 10, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit b359a0f into rapidsai:branch-22.02 Dec 10, 2021
v22.02 Release automation moved this from PR-Reviewer approved to Done Dec 10, 2021
rapids-bot bot pushed a commit that referenced this pull request Dec 20, 2021
Fixes #9164.

### Prelude
`lists::contains()` (introduced in #7039) returns a `BOOL8` column indicating whether the specified search key(s) exist at all in each corresponding list row of an input `LIST` column. It does not return the actual position.

### `index_of()`
This commit introduces `lists::index_of()`, to return the `INT32` positions of the specified search key(s) in a `LIST` column.

The search keys may be searched for using either `FIND_FIRST` (which finds the position of the first occurrence) or `FIND_LAST` (which finds the last occurrence). Both `column_view` and scalar search keys are supported.

As with `lists::contains()`, nested types are not supported as search keys in `lists::index_of()`.

If the search key cannot be found, that output row is set to `-1`. Additionally, the row `output[i]` is set to null if:
  1. The `search_key` (scalar) or `search_keys[i]` (column_view) is null.
  2. The list row `lists[i]` is null.

In all other cases, `output[i]` contains a non-negative value.
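Loosely, the row-wise semantics for a scalar key can be modeled on the host; this is a sketch with `std::optional` standing in for column nulls, not the actual libcudf implementation.

```cpp
#include <optional>
#include <vector>

enum class duplicate_find_option { FIND_FIRST, FIND_LAST };

// Sketch of the row-wise semantics for one list row and a scalar key:
// -1 when the key is absent, null (nullopt) when the row or key is null.
std::optional<int> index_of(std::optional<std::vector<int>> const& list_row,
                            std::optional<int> search_key,
                            duplicate_find_option option) {
  if (!list_row || !search_key) { return std::nullopt; }
  int position = -1;
  for (int i = 0; i < static_cast<int>(list_row->size()); ++i) {
    if ((*list_row)[i] == *search_key) {
      position = i;
      if (option == duplicate_find_option::FIND_FIRST) { break; }
    }
  }
  return position;
}
```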

### Semantic changes for `lists::contains()`
This commit also modifies the semantics of `lists::contains()`: it now returns nulls only in the following cases:
  1. The `search_key` (scalar) or `search_keys[i]` (column_view) is null.
  2. The list row `lists[i]` is null.

In all other cases, a non-null bool is returned. Specifically, `lists::contains()` no longer conforms to the SQL semantics of returning `NULL` for list rows that don't contain the search key while simultaneously containing nulls; in this case, `false` is returned.
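The revised behavior can be sketched the same way on the host (illustrative only, with `std::optional` standing in for nulls):

```cpp
#include <optional>
#include <vector>

// Sketch of the revised semantics: a null row or null key yields null;
// otherwise a plain bool, with false for rows that lack the key (even if
// those rows also hold nulls alongside other values).
std::optional<bool> contains(std::optional<std::vector<int>> const& list_row,
                             std::optional<int> search_key) {
  if (!list_row || !search_key) { return std::nullopt; }
  for (int element : *list_row) {
    if (element == *search_key) { return true; }
  }
  return false;
}
```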

### `lists::contains_null_elements()`
A new function has been introduced to check whether each list row contains null elements. The semantics are similar to `lists::contains()`, in that the returned column is `BOOL8` typed:
  1. If even one element in a list row is null, the returned row is `true`.
  2. If no element is null, the returned row is `false`.
  3. If the list row is null, the returned row is `null`.
  4. If the list row is empty, the returned row is `false`.
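The four rules above can be condensed into a small host-side sketch (illustrative; the real implementation operates column-wise on the device):

```cpp
#include <optional>
#include <vector>

// Row-wise sketch: true if any element is null, false otherwise (including
// empty rows), and null if the list row itself is null.
std::optional<bool> contains_null_elements(
    std::optional<std::vector<std::optional<int>>> const& list_row) {
  if (!list_row) { return std::nullopt; }
  for (auto const& element : *list_row) {
    if (!element) { return true; }
  }
  return false;
}
```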

The current implementation is an inefficient placeholder, to be replaced once #9588 is available. It is included here to reconstruct the SQL semantics dropped from `lists::contains()`.

Authors:
  - MithunR (https://github.com/mythrocks)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Jason Lowe (https://github.com/jlowe)
  - Mark Harris (https://github.com/harrism)
  - Conor Hoekstra (https://github.com/codereport)

URL: #9510
Labels
3 - Ready for Review Ready for review by team breaking Breaking change improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code.