-
Notifications
You must be signed in to change notification settings - Fork 525
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(optimizer): implement 2-phase stream agg #3320
Conversation
3bb9200
to
ae99ce4
Compare
Currently failing:
Fails as it tries to serialize The stream agg that should be failing is probably the second one, since the first should work as per normal. Backtrace + prints of pk row and order types:
|
ae99ce4
to
5b16473
Compare
I guess the issue is caused by wrong induction of state table schema. cc @BowenXiao1999 any ideas? |
Let me take a look |
From your query, it should be simple agg + not append_only + value states. So the generate_table_state should infer order type as 0 length. |
For value states (except min, max), if there is no group key, the pk must be empty. |
Closing this in favour of stateful approach to be used: https://singularity-data.quip.com/KtaRA6CspqRK#RBEADAF6kqO. It will be able to handle non-append only min / max too. |
I think we'll still need this. The stateless approach can only be used on append-only min max and value state, and the stateful approach can only be used on max min topN. |
5b16473
to
7cd2ddc
Compare
fda67fb
to
4d45994
Compare
src/frontend/src/optimizer/plan_node/stream_local_simple_agg.rs
Outdated
Show resolved
Hide resolved
Codecov Report
@@ Coverage Diff @@
## main #3320 +/- ##
==========================================
+ Coverage 74.28% 74.30% +0.01%
==========================================
Files 768 769 +1
Lines 106684 106775 +91
==========================================
+ Hits 79250 79338 +88
- Misses 27434 27437 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
src/frontend/src/optimizer/plan_node/stream_local_simple_agg.rs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rest LGTM, good work!
src/frontend/src/optimizer/plan_node/stream_local_simple_agg.rs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rest LGTM, Good work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, good work!
What's changed and what's your intention?
As per title:
Summarize your change (mandatory)
Perform 2-phase agg if input is distributed.
How does this PR work? Need a brief introduction for the changed logic (optional)
When converting from logical plan to stream plan, if input is distributed, insert a
LocalAgg
andExchange(single)
to do 2-phase aggregation.Describe clearly one logical change and avoid lazy messages (optional)
A new plan node,
LocalSimpleAgg
is included, since the semantics ofLocalSimpleAgg
differs fromGlobalSimpleAgg
.It is used for distributed input, does not persist state. Additionally output is append-only, compared to
GlobalSimpleAgg
which has stateful output.Describe any limitations of the current code (optional)
Currently does not support
non-append-only min/max
.In the future perhaps
LocalSimpleAgg
should be renamed toLocalAppendOnlySimpleAgg
. It is an optimization on top of the more general statefulLocalSimpleAgg
.We will also need to replace the single value
LocalSimpleAgg
with the stateful one, since it is more general.Add the 'user-facing changes' label if your PR contains changes that are visible to users (optional)
NIL
Checklist
./risedev check
(or alias,./risedev c
)Refer to a related PR or issue link (optional)
#2997