Support null_policy::EXCLUDE for COLLECT rolling aggregation #7264
The build failure might be because
I just checked that there isn't a pressing need to similarly exclude
rerun tests
Codecov Report
@@ Coverage Diff @@
## branch-0.19 #7264 +/- ##
==============================================
Coverage ? 82.19%
==============================================
Files ? 100
Lines ? 16968
Branches ? 0
==============================================
Hits ? 13947
Misses ? 3021
Partials ? 0
Continue to review full report at Codecov.
Jake has covered almost all the essentials; I have only a minor question.
Just a few suggestions, already looks great!
(Instead of modifying in place.)
1. Iterator-based `iterator_with_null_at()`.
2. `purge_null_values()` interface.
@gpucibot merge
Thanks for the reviews, all. I've learnt a bit on this one.
Since cudf now supports skipping null values via PRs rapidsai/cudf#7264 and rapidsai/cudf#7457.
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
This PR adds support for skipping nulls in the `collect` aggregation in the JVM bindings by creating a new class, `CollectAggregation`, which accepts a `NullPolicy` argument indicating whether to include nulls. Skipping nulls is already supported natively by the `collect` rolling aggregation (#7264), so this PR just exposes the feature in the JVM. This PR also introduces `NullPolicy` and updates the related aggregates.
Signed-off-by: firestarman <firestarmanllc@gmail.com>
Authors:
- Liangcai Li (@firestarman)
Approvers:
- Robert (Bobby) Evans (@revans2)
- MithunR (@mythrocks)
URL: #7457
Closes #7258.
#7189 implements `COLLECT` aggregations for window functions. Null input rows are handled consistently with CUDF semantics. E.g.
Note that the null element (∅) is replicated in the first 3 rows of the output.
SparkSQL (like Hive and other big-data SQL systems) has different semantics: all null elements are purged. The same operation should yield the following:
CUDF should allow the `COLLECT` aggregation to be constructed with an optional `null_policy` argument (defaulting to `INCLUDE`). The `COLLECT` window function should check the policy and filter out null list elements a posteriori.
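Below is a minimal, self-contained C++ sketch of the proposed semantics. It does not use libcudf. `null_policy`, `rolling_collect`, and `purge_null_elements` are hypothetical stand-ins, nulls are modelled with `std::optional`, and each output row is a plain `std::vector` rather than a cudf LIST column. The window for row i is assumed to cover `preceding` rows ending at i (counting i itself) plus `following` rows after it. With the default `INCLUDE` policy, a null element is kept in every window that covers it. With `EXCLUDE`, nulls are filtered out after the lists have been collected, i.e. a posteriori as proposed above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a null-handling policy; only the two values
// named in this issue (INCLUDE, EXCLUDE) are modelled.
enum class null_policy { INCLUDE, EXCLUDE };

using row = std::optional<int>;  // std::nullopt models a null input row
using list = std::vector<row>;   // one collected list per output row

// A posteriori step: drop null elements from every already-collected list.
std::vector<list> purge_null_elements(std::vector<list> lists) {
  for (auto& l : lists) {
    l.erase(std::remove_if(l.begin(), l.end(),
                           [](row const& r) { return !r.has_value(); }),
            l.end());
  }
  return lists;
}

// Fixed-size rolling COLLECT. Assumption: `preceding` counts the current row,
// so row i's window is [i - preceding + 1, i + following], clamped to the input.
std::vector<list> rolling_collect(std::vector<row> const& input,
                                  std::size_t preceding, std::size_t following,
                                  null_policy policy = null_policy::INCLUDE) {
  std::vector<list> out;
  out.reserve(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    std::size_t const begin = (i + 1 >= preceding) ? i + 1 - preceding : 0;
    std::size_t const end = std::min(input.size(), i + following + 1);
    assert(begin <= end);  // holds because begin <= i + 1 <= end
    // Copy every element in the window, nulls included (the INCLUDE behaviour).
    out.emplace_back(input.begin() + static_cast<std::ptrdiff_t>(begin),
                     input.begin() + static_cast<std::ptrdiff_t>(end));
  }
  if (policy == null_policy::EXCLUDE) {
    return purge_null_elements(std::move(out));
  }
  return out;
}
```

For example, take input `[1, ∅, 3, 4]` with `preceding = 2` and `following = 1`. The default policy yields `[1, ∅]`, `[1, ∅, 3]`, `[∅, 3, 4]`, `[3, 4]`, while `null_policy::EXCLUDE` yields `[1]`, `[1, 3]`, `[3, 4]`, `[3, 4]`. Filtering after collection leaves the windowing logic untouched. In a columnar implementation, the equivalent step would presumably also need to recompute the list offsets once the null child elements are removed.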