refactor(batch): Use futures-async-stream to implement HashJoin executor #2119
Signed-off-by: d2lark <lichengamoy@gmail.com>
There are still some unresolved issues. For example, calling child.execute() on both children at the same time runs into an ownership problem.
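For readers following along, here is a minimal sketch of one common way to sidestep this kind of ownership problem when `execute` consumes a boxed executor: destructure the box once, then consume each child on its own. The `Executor` trait, `Values`, and `Join` below are hypothetical stand-ins, not the RisingWave types, and the thread does not record which fix the PR actually adopted.

```rust
/// Hypothetical executor trait: `execute` consumes the boxed executor.
trait Executor {
    fn execute(self: Box<Self>) -> Box<dyn Iterator<Item = i32>>;
}

/// Hypothetical leaf executor that just yields stored values.
struct Values(Vec<i32>);

impl Executor for Values {
    fn execute(self: Box<Self>) -> Box<dyn Iterator<Item = i32>> {
        Box::new(self.0.into_iter())
    }
}

/// Hypothetical two-child executor.
struct Join {
    left_child: Box<dyn Executor>,
    right_child: Box<dyn Executor>,
}

impl Join {
    fn run(self: Box<Self>) -> Vec<i32> {
        // Move both children out of the box in one step, so that each
        // can be consumed independently afterwards.
        let Join { left_child, right_child } = *self;
        let left = left_child.execute();
        let right = right_child.execute();
        // Concatenate the two outputs just to show both were usable.
        left.chain(right).collect()
    }
}
```

Calling `run` on a boxed `Join` with children `Values(vec![1, 2])` and `Values(vec![3])` yields the left output followed by the right output.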
@@ -0,0 +1,212 @@
// Copyright 2022 Singularity Data
Should we also remove chunked_data in the executor/join mod?
I tried this once but ran into problems: if I delete executor/join/chunked_data, many functions and structs under executor2/join/chunked_data have to be changed to pub. Can I keep executor/join/chunked_data for now and delete it after everything has been ported?
#[try_stream(boxed, ok = DataChunk, error = RwError)]
async fn do_execute(mut self: Box<Self>) {
    let mut left_child_stream = self.left_child.execute();
    let mut right_child_stream = self.right_child.execute();
The execution of the right child should be postponed; we should execute the left child first to build the table.
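A minimal sketch of the ordering being asked for, using plain iterators instead of async streams: the build side is drained into a hash table first, and the probe side is only created afterwards. The names `Row`, `hash_join`, and `make_probe` are hypothetical, and the sketch deliberately says "build" and "probe" rather than "left" and "right", since the thread itself is not consistent about which child is the build side.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a row: (join key, payload).
type Row = (i32, &'static str);

/// Inner hash join over two row sources.
/// `make_probe` is only called after the build side has been fully
/// consumed, which models postponing the probe-side `execute()` call.
fn hash_join<B, P, F>(build: B, make_probe: F) -> Vec<(i32, &'static str, &'static str)>
where
    B: Iterator<Item = Row>,
    P: Iterator<Item = Row>,
    F: FnOnce() -> P,
{
    // Phase 1: drain the build side into a hash table.
    let mut table: HashMap<i32, Vec<&'static str>> = HashMap::new();
    for (key, payload) in build {
        table.entry(key).or_default().push(payload);
    }

    // Phase 2: only now start the probe side and look up each key.
    let mut out = Vec::new();
    for (key, probe_payload) in make_probe() {
        if let Some(matches) = table.get(&key) {
            for build_payload in matches {
                out.push((key, *build_payload, probe_payload));
            }
        }
    }
    out
}
```

Passing the probe side as a closure is just one way to make the "do not start it yet" requirement explicit in the signature; in a stream-based executor the same effect comes from calling the second `execute()` after the build loop.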
  /// Build side
- right_child: BoxedExecutor,
+ right_child: BoxedExecutor2,
  state: HashJoinState<K>,
In fact, with the stream API we no longer need this state machine. We just need to keep BuildTable and ProbeTable<K> as local variables in the do_execute method.
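A rough sketch of what "local variables instead of a state machine" could look like. `BuildTable` and `ProbeTable` here are simplified, hypothetical stand-ins that only share their names with the real types, and `do_execute` is a synchronous function over slices rather than a `try_stream` method.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in: accumulates build-side rows.
struct BuildTable {
    rows: Vec<(u32, String)>,
}

/// Hypothetical stand-in: hash index created from a finished BuildTable.
struct ProbeTable {
    index: HashMap<u32, Vec<String>>,
}

impl BuildTable {
    fn new() -> Self {
        BuildTable { rows: Vec::new() }
    }

    fn append(&mut self, key: u32, value: &str) {
        self.rows.push((key, value.to_string()));
    }

    /// Consumes the build table, so nothing can be appended once probing starts.
    fn into_probe_table(self) -> ProbeTable {
        let mut index: HashMap<u32, Vec<String>> = HashMap::new();
        for (key, value) in self.rows {
            index.entry(key).or_default().push(value);
        }
        ProbeTable { index }
    }
}

impl ProbeTable {
    /// Returns all build-side values for `key`, or an empty slice.
    fn probe(&self, key: u32) -> &[String] {
        self.index.get(&key).map(Vec::as_slice).unwrap_or(&[])
    }
}

/// Both tables live on the stack of this one function; no state enum
/// or executor field is needed to remember which phase we are in.
fn do_execute(build_input: &[(u32, &str)], probe_input: &[u32]) -> Vec<(u32, String)> {
    let mut build_table = BuildTable::new();
    for &(key, value) in build_input {
        build_table.append(key, value);
    }

    let probe_table = build_table.into_probe_table();
    let mut out = Vec::new();
    for &key in probe_input {
        for value in probe_table.probe(key) {
            out.push((key, value.clone()));
        }
    }
    out
}
```

The phase transition is expressed by ordinary control flow (the first loop ends, then the second begins), which is the simplification the comment is pointing at.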
As I understand it, the previous open/next/close control flow forced a lot of variables to be bound to the executor, which made the whole logic very complicated. So this morning I wrote a version based on the risinglight implementation and simplified it.
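To illustrate why an open/next/close interface pushes state onto the executor, here is a small, hypothetical pull-based join. `VolcanoJoin` and `JoinState` are made up for this sketch and are not the code under review; the point is only that everything that must survive between `next` calls (the phase, the hash table, the probe cursor) becomes a struct field.

```rust
use std::collections::HashMap;

/// Phase of the join; must be stored because `next` returns between rows.
#[derive(Debug, PartialEq)]
enum JoinState {
    Build,
    Probe,
    Done,
}

/// Hypothetical open/next/close executor over in-memory rows.
struct VolcanoJoin {
    build_rows: Vec<(i32, char)>,
    probe_rows: Vec<(i32, char)>,
    state: JoinState,
    table: HashMap<i32, char>,
    probe_pos: usize,
}

impl VolcanoJoin {
    fn new(build_rows: Vec<(i32, char)>, probe_rows: Vec<(i32, char)>) -> Self {
        VolcanoJoin {
            build_rows,
            probe_rows,
            state: JoinState::Done,
            table: HashMap::new(),
            probe_pos: 0,
        }
    }

    fn open(&mut self) {
        self.state = JoinState::Build;
        self.table.clear();
        self.probe_pos = 0;
    }

    /// Returns the next joined row, or `None` when the join is exhausted.
    fn next(&mut self) -> Option<(i32, char, char)> {
        loop {
            match self.state {
                JoinState::Build => {
                    for &(key, value) in &self.build_rows {
                        self.table.insert(key, value);
                    }
                    self.state = JoinState::Probe;
                }
                JoinState::Probe => match self.probe_rows.get(self.probe_pos).copied() {
                    None => self.state = JoinState::Done,
                    Some((key, probe_value)) => {
                        self.probe_pos += 1;
                        if let Some(&build_value) = self.table.get(&key) {
                            return Some((key, build_value, probe_value));
                        }
                    }
                },
                JoinState::Done => return None,
            }
        }
    }

    fn close(&mut self) {
        self.table.clear();
        self.state = JoinState::Done;
    }
}
```

With a stream-style body, the `state`, `table`, and `probe_pos` fields would instead be ordinary locals inside one function.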
Thanks for your contribution, I'll take a look and review it.
In fact, with the stream API we no longer need this state machine. We just need to keep BuildTable and ProbeTable as local variables in the do_execute method.
I see what you mean, I'll optimize it.
Yes, |
yes, because of that we usually put the |
Just set them as |
Thanks for the tip~ |
@D2Lark The rustdoc check failed. The failing command is: cargo doc --document-private-items --no-deps
See https://github.com/singularity-data/risingwave/runs/6205527285?check_suite_focus=true
LGTM
Codecov Report
@@ Coverage Diff @@
## main #2119 +/- ##
==========================================
- Coverage 70.92% 70.90% -0.02%
==========================================
Files 650 652 +2
Lines 82725 82790 +65
==========================================
+ Hits 58671 58705 +34
- Misses 24054 24085 +31
LGTM
The check passed, merged.
while let Some(chunk) = self.right_child.next().await? {
impl<K: HashKey + Send + Sync> HashJoinExecutor2<K> {
    #[try_stream(boxed, ok = DataChunk, error = RwError)]
    async fn do_execute(mut self: Box<Self>) {
I don't think this is an elegant implementation, and I'd like to refactor it.
What's changed and what's your intention?
Implement HashJoin using the Executor2 trait.
Refer to a related PR or issue link (optional)
close #1947