New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(agg,over_window): support first_value
, last_value
and refactor LogicalAgg
#10740
Conversation
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
last_value
and refactor LogicalAggfirst_value
, last_value
and refactor LogicalAgg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sqlsmith part LGTM
Looks there might be some bugs introduced though, see fuzzing-1.log in build artifacts - https://buildkite.com/risingwavelabs/pull-request/builds/26777#01892404-766d-4c48-a47f-49224f2ee6e0:
CREATE MATERIALIZED VIEW m14 AS
SELECT
approx_count_distinct ('5D1ZlMzATx') AS col_0
FROM
m8 AS t_0
WHERE
TRUE
GROUP BY
t_0.col_0,
t_0.col_2
HAVING
TRUE
|
| AggKind::JsonbObjectAgg | ||
| AggKind::PercentileCont | ||
| AggKind::PercentileDisc | ||
| AggKind::Mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't approx count distinct fall under this category? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And StringAgg
too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to be the source of errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the src/meta/src/stream/stream_graph/fragment.rs:279:9
is because visit_stream_node_tables_inner
didn't correctly visit state table of DistinctApproxCount
, will dive in later. While OTOH 2-phase sum0(hyperloglog)
may indeed not equal to hyperloglog
, not quite sure. I'll add it to this category.
For StringAgg
, I was thinking that when user doesn't provide ORDER BY
, the call can actually be 2-phased by just concatenating strings in arbitrary nondeterministic order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For StringAgg, I was thinking that when user doesn't provide ORDER BY, the call can actually be 2-phased by just concatenating strings in arbitrary nondeterministic order.
Agree on correctness, although I cant think of a usecase.
I suppose there's some other bug around it then. Maybe it is executing in non-append-only. We should add another check for that.
An e2e test will be great to test two-phase string_agg as well, I don't think it is currently tested. Same goes for any other two-phase agg we have just started supporting in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems 2-phase StringAgg has other problem, that is the final phase can't get delim
argument from input. Maybe better to disable it again for now😇.
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
Signed-off-by: Richard Chien <stdrc@outlook.com>
/// Macros to generate match arms for [`AggKind`]. | ||
/// IMPORTANT: These macros must be carefully maintained especially when adding new [`AggKind`] | ||
/// variants. | ||
pub(crate) mod agg_kinds { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can infer or allow developers to specify these properties in the #[aggregate] macro later. 🤔
…8924bf-b1eb-4dd2-bf75-efef7f554a23 Signed-off-by: Richard Chien <stdrc@outlook.com>
Codecov Report
@@ Coverage Diff @@
## main #10740 +/- ##
=======================================
Coverage 70.14% 70.15%
=======================================
Files 1296 1296
Lines 221698 221748 +50
=======================================
+ Hits 155516 155569 +53
+ Misses 66182 66179 -3
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 12 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
last_value
aggregate function and hence corresponding window functionfirst_value
andlast_value
to userChecklist
[ ] My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future)../risedev check
(or alias,./risedev c
)[ ] My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)[ ] My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the [details](https://github.com/risingwavelabs/risingwave/blob/main/CONTRIBUTING.md))Documentation
Click here for Documentation
Types of user-facing changes
Please keep the types that apply to your changes, and remove the others.
Release note
first_value
andlast_value
aggregate function and window function