
Refactor bit counting APIs, introduce valid/null count functions, and split host/device side code for segmented counts. #9588

Merged 57 commits into rapidsai:branch-22.02 on Dec 10, 2021

Conversation


@bdice (Contributor) commented Nov 3, 2021

This PR is a first step towards #9552. I split the host and device code for segmented_count_set_bits, so that APIs with existing indices on the device can call the device function directly. Along the way, I discussed some pain points in the current design of bit counting functions with @jrhemstad and @mythrocks and made the following changes:

  • Fixed inconsistent behavior of the bit counting functions (`count_set_bits` returned zero when `bitmask == nullptr`, but `segmented_count_set_bits` returned the segment lengths) by raising an error when `bitmask == nullptr` (breaking change)

  • Moved all bit counting functions to detail namespaces (breaking change)

  • Separated "validity" logic from pure bit counting by introducing new `valid_count`, `null_count`, `segmented_valid_count`, and `segmented_null_count` detail functions. These functions treat all elements as valid when `bitmask == nullptr` and otherwise behave identically to the corresponding bit counting functions.

  • Split device-side code from host-side code to enable future work on segmented null mask reductions over lists, with indices (offsets) already on the device.

  • Updated all calling code (and added tests) to use the new valid/null count detail functions.
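The split between pure bit counting and validity counting can be illustrated with a host-side sketch. This is illustrative only: the function names follow the PR, but the actual detail functions operate on device memory and take CUDA streams.

```cpp
#include <cstdint>
#include <stdexcept>

using bitmask_type = std::uint32_t;  // stand-in for cudf's bitmask word type

// Pure bit counting: a null bitmask is now an error (the breaking change).
int count_set_bits(bitmask_type const* bitmask, int start, int stop) {
  if (bitmask == nullptr) { throw std::invalid_argument("bitmask is null"); }
  int count = 0;
  for (int i = start; i < stop; ++i) {
    if (bitmask[i / 32] & (bitmask_type{1} << (i % 32))) { ++count; }
  }
  return count;
}

// Validity semantics: a null bitmask means "all elements are valid".
int valid_count(bitmask_type const* bitmask, int start, int stop) {
  if (bitmask == nullptr) { return stop - start; }
  return count_set_bits(bitmask, start, stop);
}

int null_count(bitmask_type const* bitmask, int start, int stop) {
  return (stop - start) - valid_count(bitmask, start, stop);
}
```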

@bdice bdice requested a review from a team as a code owner November 3, 2021 18:09
@bdice bdice self-assigned this Nov 3, 2021
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Nov 3, 2021
@bdice bdice added 3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Nov 3, 2021
@mythrocks (Contributor) commented Nov 3, 2021

> I think I can change this behavior for all internal libcudf uses.

For clarity's sake, your suggestion is to change offsets to 2N for all internal APIs that use offsets?

@mythrocks (Contributor)

> I don't see a use case where we would want to have arbitrary (e.g. overlapping, non-monotonic) segment indices rather than the style of indexing used by list offsets.

I'm trying to understand why we'd need support for 2N indices. I was wondering if it might be needed not for overlapping ranges, but instead for ranges that "don't touch".

Supporting overlapping ranges might be useful to speed up grouped_rolling_window() for COUNT_VALID. :]

@bdice (Contributor, Author) commented Nov 3, 2021

> I think I can change this behavior for all internal libcudf uses.
>
> For clarity's sake, your suggestion is to change offsets to 2N for all internal APIs that use offsets?

I meant the opposite: I hoped to change this specific API, `segmented_count_set_bits`, to use N+1 indices instead of 2N. There is a case where the 2N format is used that probably can't be changed: `slice`. I suppose we will just have to pre-process N+1 indices into the 2N format for this function.
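The pre-processing mentioned here can be sketched on the host. `offsets_to_index_pairs` is a hypothetical helper for illustration, not a libcudf API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: expand N+1 list-style offsets into the 2N
// {start, stop} index pairs expected by the segmented bit counting API.
std::vector<int> offsets_to_index_pairs(std::vector<int> const& offsets) {
  std::vector<int> pairs;
  if (offsets.size() < 2) { return pairs; }  // no complete segments
  pairs.reserve(2 * (offsets.size() - 1));
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    pairs.push_back(offsets[i]);      // segment start (inclusive)
    pairs.push_back(offsets[i + 1]);  // segment stop (exclusive)
  }
  return pairs;
}
```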

@nvdbaranec (Contributor) commented Nov 3, 2021

> 2N. There is a case where this 2N format is used that probably can't be changed: slice. I suppose we will just have to pre-process N+1 indices into the 2N format for this function.

I bet slice could be changed as well. Although that's a pretty delicate function to break :) For what it's worth, I'm all for changing the interface to use offsets in the more standard sense.

@jrhemstad (Contributor)

What we should really do is make the implementation take iterators for the start/stop points. You can generate those from either 2N or N+1 offsets with transform iterators.

@bdice (Contributor, Author) commented Nov 4, 2021

> What we should really do is make the implementation take iterators for the start/stop points. You can generate those from either 2N or N+1 offsets with transform iterators.

To clarify, you mean that the implementation would take two independent iterators as arguments: an iterator of "start" indices and an iterator of "end" indices? Such that (very loosely speaking) the 2N implementations would be strided iterators (start = even indices / end = odd indices) and the N+1 implementations would be iterators returning [0, 1, ..., N] and [1, 2, ..., N+1]. That's viable. However, I would prefer to keep this PR simple and not break conventions here. I'd push that to a future issue / design discussion for all segmented APIs.
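A host-side sketch of that two-iterator design (hypothetical names; not libcudf code): the implementation only sees start/stop generators, so both the N+1 and 2N layouts can feed the same loop.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of a segmented API taking independent start/stop
// generators instead of a fixed index layout. Each generator maps a
// segment index to an element index.
template <typename StartFn, typename StopFn>
std::vector<int> segment_lengths(StartFn start, StopFn stop,
                                 std::size_t num_segments) {
  std::vector<int> out(num_segments);
  for (std::size_t i = 0; i < num_segments; ++i) {
    out[i] = stop(i) - start(i);  // length of segment i
  }
  return out;
}
```

With N+1 offsets {0, 3, 7}, `start(i)` reads `offsets[i]` and `stop(i)` reads `offsets[i + 1]`; with 2N pairs {0, 3, 3, 7}, they read `pairs[2 * i]` and `pairs[2 * i + 1]`.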

@codecov bot commented Nov 4, 2021

Codecov Report

Merging #9588 (f5e9d6a) into branch-22.02 (967a333) will decrease coverage by 0.06%.
The diff coverage is 4.16%.

❗ Current head f5e9d6a differs from pull request most recent head 3f689bc. Consider uploading reports for the commit 3f689bc to get more accurate results

@@               Coverage Diff                @@
##           branch-22.02    #9588      +/-   ##
================================================
- Coverage         10.49%   10.42%   -0.07%     
================================================
  Files               119      119              
  Lines             20305    20479     +174     
================================================
+ Hits               2130     2134       +4     
- Misses            18175    18345     +170     
Impacted Files Coverage Δ
python/cudf/cudf/__init__.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/_base_index.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/column.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/datetime.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/column/string.py 0.00% <ø> (ø)
python/cudf/cudf/core/dataframe.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/frame.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/groupby/groupby.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/index.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/indexed_frame.py 0.00% <0.00%> (ø)
... and 12 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jrhemstad (Contributor)

> I bet slice could be changed as well

We definitely cannot change slice from taking N potentially non-overlapping ranges. We have separate split and slice APIs for a reason.

> To clarify, you mean that the implementation would take two independent iterators as arguments: an iterator of "start" indices and an iterator of "end" indices?

Yes, that would match CUB segmented reduce.

> However, I would prefer to keep this PR simple and not break conventions here. I'd push that to a future issue / design discussion for all segmented APIs.

As is, this PR will preclude using iterators for the start/end indices by moving everything to a .cu file.

@bdice (Contributor, Author) commented Nov 4, 2021

@jrhemstad Good points. I'll think some more on this and try to come up with a better approach.


@bdice commented on the following code on Dec 9, 2021:

    stream.synchronize(); // now ret is valid.
    struct index_alternator {

There are at least three places in this PR where the code would benefit from a strided iterator (input validation in validate_segmented_indices, constructing first/last index iterators from the iterator of segmented indices in segmented_count_bits, and constructing a vector of segment lengths in segmented_valid_count). I would like to work on https://github.com/NVIDIA/thrust/issues/912 in the near future. I might introduce a strided iterator into cudf/detail/iterator.cuh to test the design before working on a Thrust implementation.
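A minimal host-side sketch of the strided-access idea (hypothetical; a real Thrust strided iterator would be a full random-access iterator with arithmetic and traversal support):

```cpp
#include <cstddef>

// Hypothetical strided view over a flat index array: element i maps to
// base[offset + i * stride]. With stride 2, offsets 0 and 1 select the
// even (segment start) and odd (segment stop) entries of a 2N index array.
template <typename T>
struct strided_iterator {
  T const* base;
  std::size_t offset;
  std::size_t stride;
  T operator[](std::size_t i) const { return base[offset + i * stride]; }
};
```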

@bdice bdice changed the title Split host/device side code in segmented_count_set_bits. Refactor bit counting APIs, introduce valid/null count functions, and split host/device side code for segmented counts. Dec 9, 2021
@bdice bdice added breaking Breaking change and removed non-breaking Non-breaking change labels Dec 9, 2021
@mythrocks (Contributor) left a review:

Some minor nitpicks. LGTM!

Resolved review threads:
  • cpp/include/cudf/detail/null_mask.cuh (outdated)
  • cpp/include/cudf/detail/null_mask.cuh
  • cpp/tests/bitmask/bitmask_tests.cpp
v22.02 Release automation moved this from PR-Needs review to PR-Reviewer approved Dec 10, 2021
@bdice (Contributor, Author) commented Dec 10, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit b359a0f into rapidsai:branch-22.02 Dec 10, 2021
v22.02 Release automation moved this from PR-Reviewer approved to Done Dec 10, 2021
rapids-bot bot pushed a commit that referenced this pull request Dec 20, 2021
Fixes #9164.

### Prelude
`lists::contains()` (introduced in #7039) returns a `BOOL8` column indicating whether the specified search key(s) exist at all in each corresponding list row of an input `LIST` column. It does not return the actual position.

### `index_of()`
This commit introduces `lists::index_of()`, to return the `INT32` positions of the specified search key(s) in a `LIST` column.

The search keys may be searched for using either `FIND_FIRST` (which finds the position of the first occurrence) or `FIND_LAST` (which finds the last occurrence). Both `column_view` and scalar search keys are supported.

As with `lists::contains()`, nested types are not supported as search keys in `lists::index_of()`.

If the search key cannot be found, that output row is set to `-1`. Additionally, the row `output[i]` is set to null if:
  1. The `search_key` (scalar) or `search_keys[i]` (column_view) is null.
  2. The list row `lists[i]` is null.

In all other cases, `output[i]` contains a non-negative value.
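Loosely, the row-wise semantics for a scalar key can be modeled on the host; this is a sketch with `std::optional` standing in for column nulls, not the actual libcudf implementation.

```cpp
#include <optional>
#include <vector>

enum class duplicate_find_option { FIND_FIRST, FIND_LAST };

// Sketch of the row-wise semantics for one list row and a scalar key:
// -1 when the key is absent, null (nullopt) when the row or key is null.
std::optional<int> index_of(std::optional<std::vector<int>> const& list_row,
                            std::optional<int> search_key,
                            duplicate_find_option option) {
  if (!list_row || !search_key) { return std::nullopt; }
  int position = -1;
  for (int i = 0; i < static_cast<int>(list_row->size()); ++i) {
    if ((*list_row)[i] == *search_key) {
      position = i;
      if (option == duplicate_find_option::FIND_FIRST) { break; }
    }
  }
  return position;
}
```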

### Semantic changes for `lists::contains()`
This commit also modifies the semantics of `lists::contains()`: it now returns nulls only in the following cases:
  1. The `search_key` (scalar) or `search_keys[i]` (column_view) is null.
  2. The list row `lists[i]` is null.

In all other cases, a non-null bool is returned. Specifically, `lists::contains()` no longer conforms to the SQL semantics of returning `NULL` for list rows that don't contain the search key while simultaneously containing nulls; in this case, `false` is returned.
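The revised behavior can be sketched the same way on the host (illustrative only, with `std::optional` standing in for nulls):

```cpp
#include <optional>
#include <vector>

// Sketch of the revised semantics: a null row or null key yields null;
// otherwise a plain bool, with false for rows that lack the key (even if
// those rows also hold nulls alongside other values).
std::optional<bool> contains(std::optional<std::vector<int>> const& list_row,
                             std::optional<int> search_key) {
  if (!list_row || !search_key) { return std::nullopt; }
  for (int element : *list_row) {
    if (element == *search_key) { return true; }
  }
  return false;
}
```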

### `lists::contains_null_elements()`
A new function has been introduced to check whether each list row contains null elements. The semantics are similar to `lists::contains()`, in that the returned column is `BOOL8` typed:
  1. If even one element in a list row is null, the returned row is `true`.
  2. If no element is null, the returned row is `false`.
  3. If the list row is null, the returned row is `null`.
  4. If the list row is empty, the returned row is `false`.
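The four rules above can be condensed into a small host-side sketch (illustrative; the real implementation operates column-wise on the device):

```cpp
#include <optional>
#include <vector>

// Row-wise sketch: true if any element is null, false otherwise (including
// empty rows), and null if the list row itself is null.
std::optional<bool> contains_null_elements(
    std::optional<std::vector<std::optional<int>>> const& list_row) {
  if (!list_row) { return std::nullopt; }
  for (auto const& element : *list_row) {
    if (!element) { return true; }
  }
  return false;
}
```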

The current implementation is an inefficient placeholder, to be replaced once #9588 is available. It is included here to reconstruct the SQL semantics dropped from `lists::contains()`.

Authors:
  - MithunR (https://github.com/mythrocks)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Jason Lowe (https://github.com/jlowe)
  - Mark Harris (https://github.com/harrism)
  - Conor Hoekstra (https://github.com/codereport)

URL: #9510
Labels
3 - Ready for Review Ready for review by team breaking Breaking change improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code.