-
Notifications
You must be signed in to change notification settings - Fork 525
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(storage): remove vnode pruning from state store and keyspace #3208
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3208 +/- ##
==========================================
+ Coverage 73.50% 73.62% +0.12%
==========================================
Files 746 744 -2
Lines 101925 101858 -67
==========================================
+ Hits 74916 74996 +80
+ Misses 27009 26862 -147
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as PR title
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I'd like to hold this PR for a little bit. Looks like a big decision indeed 🤣 |
I'd like to know the concern 🤔 |
The changes look huge but it's primarily removing an argument from functions. No real logic touched. <- That's why I clicked approve button 😁 |
Thought for a moment, I have no clear idea on this change. Also thought about some edge use cases (like simple agg, MV and index), and all of them can still work without passing vnode to the storage layer to do any filtering. I think executors can still fit in the current design, with only vnode in compute layer and without vnode in storage layer. |
Then how does batch RowSeqScan partition? 😇 |
BTW, does #2887 also become unnecessary (executors doesn't need |
With the vnode prepending to the key, it seems partition scan can be simply implemented with range scan with the vnode prefix? |
So each scan executor's vnodes will be a consecutive range (instead of consistent hash)? |
As discussed in Patrick's proposal, the new row format will be like: table_id, vnode_id, pk_columns... Thus, partition by consistent-hashing can be represented as a set of prefix range of keys. |
According to the new design, we won't rely on vnode bitmap to do read pruning or compaction, so exactly, they are not required by executors. |
What's changed and what's your intention?
Changes
According to our offline discussion, we'll encode vnode into storage key, instead of passing it directly into keyspace and state store. Therefore, relevant interfaces should be modified. This PR do the following refactors:
struct VNodeBitmap
Limitations
VNodeBitmap
in SST meta still exists. This should be discarded as well to make our compaction no longer consistent-hash-aware.Checklist
./risedev check
(or alias,./risedev c
)