-
Notifications
You must be signed in to change notification settings - Fork 526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(streaming): apply StateTable to hash join #3085
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3085 +/- ##
=======================================
Coverage 73.22% 73.22%
=======================================
Files 747 747
Lines 101907 101907
=======================================
Hits 74619 74619
Misses 27288 27288
Flags with carried forward coverage won't be shown. Click here to find out more. 📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
…sh_join_with_state_table
…sh_join_with_state_table
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this PR consists of mutiple changes:
Update
on mem_table.- Integrate StateTable
etc
Need to fix #2794 or test fails. To do that we must have update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM. The hash agg is more natural to integrate with StateTable so the logic looks clean.
But indeed this PR can be divided (?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM.
/// Get an owned `Row` by the given `indices` from current row. | ||
/// | ||
/// Use `datum_refs_by_indices` if possible instead to avoid allocating owned datums. | ||
pub fn by_indices(&self, indices: &[usize]) -> Row { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Off the topic: We should really review the current implementation of serialize
functions. In most cases there's no need to allocate a Row
.
/// When evicted, `cached` does not hold any entries. | ||
/// If a `JoinEntryState` exists for a join key, the all records under this | ||
/// join key will be presented in the cache. | ||
pub struct JoinEntryState { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems this struct is definitely a BTreeMap
now. 🤣 How about implement Deref
for it?
let old_row = join_row.clone().into_row(); | ||
join_row.inc_degree(); | ||
let new_row = join_row.clone().into_row(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be too costly. 😢 Let's optimize it later.
Yes! But I might keep changing the API anyway |
…sh_join_with_state_table
What's changed and what's your intention?
Checklist
./risedev check
(or alias,./risedev c
)Refer to a related PR or issue link (optional)
#2794
#2795