Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace unnecessary uses of UNKNOWN_NULL_COUNT #13102

Merged
merged 14 commits into from
Apr 10, 2023

Conversation

vyasr
Copy link
Contributor

@vyasr vyasr commented Apr 10, 2023

Description

This PR replaces uses of cudf::UNKNOWN_NULL_COUNT where the null count is either already known or trivially computed.

Contributes to #11968

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vyasr vyasr added 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Apr 10, 2023
@vyasr vyasr requested a review from a team as a code owner April 10, 2023 16:15
@vyasr vyasr self-assigned this Apr 10, 2023
@vyasr vyasr added this to the Enable streams milestone Apr 10, 2023
Co-authored-by: David Wendt <45795991+davidwendt@users.noreply.github.com>
@bdice
Copy link
Contributor

bdice commented Apr 10, 2023

It'd be nice to know if we're eliminating many "real-world" calls to the null_count kernel by supplying these values. Not sure how to measure that well, but perhaps benchmarks would show it?

@vyasr
Copy link
Contributor Author

vyasr commented Apr 10, 2023

It'd be nice to know if we're eliminating many "real-world" calls to the null_count kernel by supplying these values. Not sure how to measure that well, but perhaps benchmarks would show it?

I agree on both counts, that is an interesting question, and one that is hard to answer. A good starting point would be defining what really constitutes a real-world use case. Also, the changes in this PR come in two flavors: 1) precomputing a null count, in which case we're doing more work up front under the assumption that it will eventually be necessary, and 2) propagating a known null count, which is a strict reduction in work. The latter is more common in this PR and is the case where we'd hope for kernel reduction of course, but depending on the workflow being benchmarked that affected might be washed out by changes of the first type in instances where the null count is not actually used. Hard to say without asking a pretty precise question I suspect.

@vyasr
Copy link
Contributor Author

vyasr commented Apr 10, 2023

/merge

@rapids-bot rapids-bot bot merged commit cab6522 into rapidsai:branch-23.06 Apr 10, 2023
@vyasr vyasr deleted the feat/set_null_count_no_unknown branch April 10, 2023 23:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants