refactor(streaming): use iterator directly instead of materializing all kv pairs from iterator #1998
Codecov Report
@@            Coverage Diff             @@
##             main    #1998      +/-   ##
==========================================
+ Coverage   70.94%   71.01%   +0.07%
==========================================
  Files         625      625
  Lines       80597    80538      -59
==========================================
+ Hits        57177    57194      +17
+ Misses      23420    23344      -76
@BugenZhao @st1page @wcy-fdu
Can top_n use
pub struct PkAndRowIterator<'a, I: StateStoreIter<Item = (Bytes, Bytes)>, const TOP_N_TYPE: usize> {
    iter: I,
    ordered_row_deserializer: &'a mut OrderedRowDeserializer,
    cell_based_row_deserializer: &'a mut CellBasedRowDeserializer,
}
Maybe this can be part of `CellBasedTable` and replace the `CellBasedTableRowIter`? After that, we can implement the iterator of `StateTable`, composed of `MemTable` (flush_buffer) and `CellBasedTable`, and all stream operators can use the `StateTable` to store their states.
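To make the suggestion above concrete, here is a minimal sketch of an iterator that overlays a sorted in-memory buffer on a sorted storage iterator. Everything in it is an assumption made for illustration: `MergedIter`, `MemOp`, the `u64` keys, and the `String` rows are hypothetical and are not the project's `StateTable`, `MemTable`, or `CellBasedTable` APIs. It also assumes that a buffered write shadows the stored row with the same key and that a buffered delete hides it.

```rust
use std::cmp::Ordering;
use std::iter::Peekable;

/// Hypothetical buffered operation: `Some(row)` is an insert or update,
/// `None` is a delete that hides the stored row with the same key.
pub type MemOp = Option<String>;

/// Merges a sorted mem-table iterator with a sorted storage iterator.
/// Both inputs must yield keys in ascending order.
pub struct MergedIter<M, S>
where
    M: Iterator<Item = (u64, MemOp)>,
    S: Iterator<Item = (u64, String)>,
{
    mem: Peekable<M>,
    store: Peekable<S>,
}

impl<M, S> MergedIter<M, S>
where
    M: Iterator<Item = (u64, MemOp)>,
    S: Iterator<Item = (u64, String)>,
{
    pub fn new(mem: M, store: S) -> Self {
        Self { mem: mem.peekable(), store: store.peekable() }
    }
}

impl<M, S> Iterator for MergedIter<M, S>
where
    M: Iterator<Item = (u64, MemOp)>,
    S: Iterator<Item = (u64, String)>,
{
    type Item = (u64, String);

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // Compare the two heads once per step; an exhausted side always loses.
            let ord = match (self.mem.peek(), self.store.peek()) {
                (None, None) => return None,
                (Some(_), None) => Ordering::Less,
                (None, Some(_)) => Ordering::Greater,
                (Some((mk, _)), Some((sk, _))) => mk.cmp(sk),
            };
            match ord {
                // Storage has the smaller key: emit it unchanged.
                Ordering::Greater => return self.store.next(),
                // Same key: the buffered operation shadows the stored row.
                Ordering::Equal => {
                    self.store.next();
                }
                Ordering::Less => {}
            }
            let (key, op) = self.mem.next()?;
            if let Some(row) = op {
                return Some((key, row));
            }
            // A buffered delete yields nothing; continue with the next key.
        }
    }
}
```

Because the merge pulls one entry at a time from each side, a caller such as a TopN cache refill can stop after the first few rows without scanning the rest of either source.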
It seems that only `TopNExecutor` requires that the `next` of the iterator returns both key and value, with the key being an `OrderedRow`. This is not required in other stateful executors, e.g. `HashAgg` and `HashJoin`. It is also not required in MV. This is because only `TopNExecutor` needs an ordered key to fill in its cache, and this filling process does not rely only on the key of the single `(insert, row)` whose state is being updated.
Use a flag to denote whether it should deserialize and return the key? 🤣
Considering we will store a column in either the key or the value, the caller should just decide whether it is necessary to return the key. This iterator should figure out by itself whether it needs to deserialize the key.
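One way to read the two comments above is sketched below: the caller passes a `return_key` flag, while the iterator decides on its own whether the key has to be decoded, based on where each output column lives. This is only an illustration under stated assumptions. `RowIter`, `Source`, the comma-separated text encoding, and the `key_decodes` counter are all hypothetical and do not correspond to existing types in the codebase.

```rust
/// Hypothetical column location: a column is stored either in the key or in the value.
#[derive(Clone, Copy)]
pub enum Source {
    Key(usize),
    Value(usize),
}

/// Assumed toy encoding for this sketch: comma-separated integers.
fn decode(raw: &str) -> Vec<i64> {
    raw.split(',')
        .map(|s| s.parse().expect("sketch assumes well-formed input"))
        .collect()
}

pub struct RowIter<I> {
    inner: I,
    layout: Vec<Source>,
    return_key: bool,
    needs_key: bool,
    /// Counts key decodes, only to make the behaviour observable in this sketch.
    pub key_decodes: usize,
}

impl<I: Iterator<Item = (String, String)>> RowIter<I> {
    pub fn new(inner: I, layout: Vec<Source>, return_key: bool) -> Self {
        // The iterator works out by itself whether the key must be decoded:
        // either the caller asked for it, or some column lives only in the key.
        let needs_key = return_key || layout.iter().any(|s| matches!(s, Source::Key(_)));
        Self { inner, layout, return_key, needs_key, key_decodes: 0 }
    }
}

impl<I: Iterator<Item = (String, String)>> Iterator for RowIter<I> {
    type Item = (Option<Vec<i64>>, Vec<i64>);

    fn next(&mut self) -> Option<Self::Item> {
        let (raw_key, raw_value) = self.inner.next()?;
        let value = decode(&raw_value);
        let key = if self.needs_key {
            self.key_decodes += 1;
            Some(decode(&raw_key))
        } else {
            None
        };
        // Assemble the output row from whichever side holds each column.
        let row: Vec<i64> = self
            .layout
            .iter()
            .map(|s| match *s {
                Source::Key(i) => key.as_ref().expect("key decoded when layout needs it")[i],
                Source::Value(i) => value[i],
            })
            .collect();
        // The key is handed back only when the caller asked for it.
        Some((if self.return_key { key } else { None }, row))
    }
}
```

With this shape, an executor that never needs the key pays for key deserialization only when the row layout forces it, and a TopN-style caller sets `return_key` to get the key back as well.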
Maybe we merge first and take some time to decide how to merge this with `CellBasedTable` and replace the `CellBasedTableRowIter`? Some other fixes are pending in `TopN`.
LGTM
What's changed and what's your intention?
In #1969, we replaced `scan` in `TopNExecutor` with `iter`. However, we still materialize all the kv pairs from `iter` first.
This PR adds a `PkAndRowIterator` so that the two TopN states only need to deal with `OrderedRow` and `Row` instead of raw `Bytes`.
It also eliminates an extra comparison between `key_from_storage` and `key_from_buffer` caused by the `encounter_same_key` check.
Checklist
Refer to a related PR or issue link (optional)
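As a rough illustration of the change described under "What's changed", the sketch below wraps an iterator of raw key/value bytes and deserializes one pair per `next` call, instead of collecting every pair up front. It is not the PR's `PkAndRowIterator`: `LazyPkAndRowIter`, the simplified `OrderedRow` and `Row` stand-ins, and the fixed 8-byte big-endian `i64` encoding are assumptions chosen to keep the example self-contained.

```rust
/// Hypothetical stand-ins for `OrderedRow` and `Row`; the real types are richer.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OrderedRow(pub Vec<i64>);

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Row(pub Vec<i64>);

/// Assumed encoding for this sketch: each i64 is stored as 8 big-endian bytes.
pub fn encode_i64s(values: &[i64]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_be_bytes()).collect()
}

fn decode_i64s(bytes: &[u8]) -> Vec<i64> {
    bytes
        .chunks_exact(8)
        .map(|chunk| {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(chunk);
            i64::from_be_bytes(buf)
        })
        .collect()
}

/// Wraps any iterator of raw (key, value) bytes and deserializes lazily,
/// so callers only ever see typed rows.
pub struct LazyPkAndRowIter<I> {
    inner: I,
}

impl<I: Iterator<Item = (Vec<u8>, Vec<u8>)>> LazyPkAndRowIter<I> {
    pub fn new(inner: I) -> Self {
        Self { inner }
    }
}

impl<I: Iterator<Item = (Vec<u8>, Vec<u8>)>> Iterator for LazyPkAndRowIter<I> {
    type Item = (OrderedRow, Row);

    fn next(&mut self) -> Option<Self::Item> {
        // Only the pair that is actually requested gets decoded.
        let (raw_key, raw_value) = self.inner.next()?;
        Some((OrderedRow(decode_i64s(&raw_key)), Row(decode_i64s(&raw_value))))
    }
}
```

A caller that needs only the first few entries can use `take(n)` on the wrapper and leave the remaining pairs undecoded, which is the saving that materializing the whole scan gives up.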