feat(storage): initially introduce vnode encoding in cell-based table #3407
Conversation
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Codecov Report
```diff
@@            Coverage Diff             @@
##             main    #3407      +/-   ##
==========================================
- Coverage   73.84%   73.81%   -0.04%
==========================================
  Files         765      765
  Lines      105273   105351      +78
==========================================
+ Hits        77742    77766      +24
- Misses      27531    27585      +54
```
LGTM! I may ask someone else to review the implementation details.
```rust
use crate::{Keyspace, StateStore, StateStoreIter};

pub type AccessType = bool;
pub const READ_ONLY: AccessType = false;
pub const READ_WRITE: AccessType = true;
```
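As a hypothetical sketch (not the actual `CellBasedTable` code), a `bool` const generic like this can gate the write interface at compile time, which matches the PR's goal of only exposing write methods under `READ_WRITE`:

```rust
pub type AccessType = bool;
pub const READ_ONLY: AccessType = false;
pub const READ_WRITE: AccessType = true;

// Hypothetical table type, for illustration only.
pub struct Table<const MODE: AccessType> {
    rows: Vec<i32>,
}

impl<const MODE: AccessType> Table<MODE> {
    pub fn new() -> Self {
        Self { rows: Vec::new() }
    }

    // The read interface is available in both modes.
    pub fn get(&self, i: usize) -> Option<&i32> {
        self.rows.get(i)
    }
}

// The write interface is only exposed for READ_WRITE instances;
// calling `insert` on a `Table<{ READ_ONLY }>` fails to compile.
impl Table<{ READ_WRITE }> {
    pub fn insert(&mut self, v: i32) {
        self.rows.push(v);
    }
}
```

Note the braces in `Table<{ READ_WRITE }>`: a bare path in generic-argument position is resolved as a type, so a block expression is needed to force const-argument interpretation.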
```diff
-pub const READ_WRITE: AccessType = true;
+pub const READ_AND_WRITE: AccessType = true;
```
IMO `READ_WRITE` is better 🤔
```diff
 /// Get a [`StreamingIter`] with given `encoded_key_range`.
 pub(super) async fn streaming_iter_with_encoded_key_range<R, B>(
     &self,
     encoded_key_range: R,
     epoch: u64,
 ) -> StorageResult<StreamingIter<S>>
 where
-    R: RangeBounds<B> + Send,
+    R: RangeBounds<B> + Send + Clone,
```
Just curious: why add this `Clone`?
We may construct multiple iterators if there are multiple vnodes, so we need to clone the range for each of them. We may let the caller pass an `Arc` to avoid allocations.
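A rough sketch of why the bound is needed, with a hypothetical helper name: each vnode gets its own vnode-prefixed copy of the caller's range, so the range must be `Clone` (unless the caller passes an `Arc` instead):

```rust
use std::ops::{Bound, RangeBounds};

// Hypothetical helper (not from the PR): build one encoded key range per
// vnode by prefixing the vnode byte. Each vnode needs its own copy of the
// caller's range, hence the `Clone` bound. Bound kinds (inclusive vs.
// exclusive) are ignored here for brevity.
fn ranges_per_vnode<R>(range: R, vnodes: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)>
where
    R: RangeBounds<Vec<u8>> + Clone,
{
    vnodes
        .iter()
        .map(|&vnode| {
            let r = range.clone(); // this is why `Clone` is required
            let start = match r.start_bound() {
                Bound::Included(k) | Bound::Excluded(k) => k.clone(),
                Bound::Unbounded => Vec::new(),
            };
            let end = match r.end_bound() {
                Bound::Included(k) | Bound::Excluded(k) => k.clone(),
                Bound::Unbounded => vec![0xff],
            };
            // Prefix the vnode byte to both ends of the range.
            let mut s = vec![vnode];
            s.extend(start);
            let mut e = vec![vnode];
            e.extend(end);
            (s, e)
        })
        .collect()
}
```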
```rust
let iter = match iterators.len() {
    0 => unreachable!(),
    1 => iterators.into_iter().next().unwrap(),
    _ => todo!("merge multiple vnode ranges"),
};
```
What's the relationship between `iterators.len()` and the number of vnodes? Is it one-to-one?
It depends on whether we need to preserve the order of primary keys. If so, we need to construct an iterator for every single vnode and merge them into a sorted one; otherwise, we can construct a single iterator for a continuous range.
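The order-preserving case described above is essentially a k-way merge. A minimal sketch over plain sorted vectors (standing in for per-vnode iterators, which the PR leaves as `todo!`):

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Hypothetical sketch: merge several streams, each already sorted by
// primary key, into one globally sorted stream using a min-heap.
fn merge_sorted(inputs: Vec<Vec<u64>>) -> Vec<u64> {
    let mut iters: Vec<_> = inputs.into_iter().map(|v| v.into_iter()).collect();
    let mut heap = BinaryHeap::new();
    // Seed the heap with the head of every non-empty iterator.
    for (i, it) in iters.iter_mut().enumerate() {
        if let Some(k) = it.next() {
            heap.push(Reverse((k, i)));
        }
    }
    let mut out = Vec::new();
    // Repeatedly pop the smallest key and refill from its source iterator.
    while let Some(Reverse((k, i))) = heap.pop() {
        out.push(k);
        if let Some(next) = iters[i].next() {
            heap.push(Reverse((next, i)));
        }
    }
    out
}
```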
Rest LGTM. Good PR!
```rust
self.deserialize_inner::<true>(raw_key, cell)
}

// TODO: remove this once we refactored lookup in delta join with cell-based table
```
I still don't quite understand this workaround. I see in the code we already `.extend(DEFAULT_VNODE)` when calculating the prefix of the scan key in the lookup executor. Why still call `deserialize_without_vnode` instead of `deserialize`?
This is actually a pk prefix scan with `Keyspace.append::scan` instead of `Keyspace::scan`, so the vnode part and the pk prefix of the key passed into the deserializer are truncated.
```rust
} else {
    ValueMeta::default()
};
let vnode = self.compute_vnode_by_row(&row);
```
Should the vnode be computed by the cell-based table or the state table? If we are to pass the vnode from the upper layer and do filtering accordingly, then the memtable should also have the vnode info.
What do you mean by "do filtering accordingly"? 🤔 Currently the memtable associated with the cell-based table will also be partitioned into the same vnode, so there's no need to do filtering.
For example, an executor only needs the data of a specific vnode. (Is that possible?)
The streaming executor should tell the cell-based table the vnodes it cares about when constructing a `CellBasedTable` instance. The vnode bitmap changes only if the system scales in/out, and the executors will be dropped and rebuilt according to our design.
The batch executors may need this, which is, however, unrelated to the memtable. Will implement in the following PRs.
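To make `compute_vnode_by_row` concrete, here is a minimal sketch of mapping a row's distribution-key values to a vnode. The vnode count of 256 and the use of `DefaultHasher` are assumptions for illustration, not the actual hash scheme; the `None` fallback to vnode 0x00 matches the behavior described in this PR:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Assumed vnode count, for illustration only.
const VNODE_COUNT: u64 = 256;

// Hypothetical sketch: hash the distribution-key values of a row to pick
// a vnode, falling back to the default vnode 0x00 when the table has no
// distribution keys.
fn compute_vnode(dist_key_values: Option<&[&str]>) -> u8 {
    match dist_key_values {
        None => 0x00, // the DEFAULT_VNODE fallback described in the PR
        Some(values) => {
            let mut hasher = DefaultHasher::new();
            for v in values {
                v.hash(&mut hasher);
            }
            (hasher.finish() % VNODE_COUNT) as u8
        }
    }
}
```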
Co-authored-by: Yuanxin Cao <60498509+xx01cyx@users.noreply.github.com>
LGTM
I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.
What's changed and what's your intention?
This PR includes multiple changes.

- Introduce a generic `AccessMode` for cell-based table. With `READ_ONLY`, the write interfaces will not be exposed, and column pruning will be done by specifying a subset `output_columns` of all columns. With `READ_WRITE`, the instance should operate on all columns of this table, i.e., the output columns are a complete set. Under this assumption, we can ensure the correctness of some interfaces much more easily.
- Encode/decode the vnode in `cell_based_row_serializer/deserializer`, all handled by `CellBasedTable`. If `None` is given for the distribution keys, we'll fall back to the default vnode of 0x00 and encode/decode it anyway. This preserves the order of real keys and is compatible with the old implementation for point get and iteration. Besides, the vnode value of 0x00 can also be compressed by Hummock.
- Remove the const generic `ITER_TYPE`. There'll be two dimensions of iterator types in the future: whether to wait for the epoch, and whether to preserve the order among vnodes. Let's make them runtime parameters to avoid type exercises.

There remain several things to do as well. Check #3316 for the roadmap.

- The merge iterator over multiple vnode ranges is not implemented yet, to avoid making this PR huge. Thus, all of the vnodes fall back to 0x00 for now to keep the iterator behavior correct. 🤣
- Lookup of delta join still uses the `Keyspace` API to scan the arrangement, so some workarounds are introduced. It may need refactoring before enabling real vnode computing.

Checklist

- `./risedev check` (or alias, `./risedev c`)

Refer to a related PR or issue link (optional)
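The order-preservation claim for the default vnode can be sketched in a few lines: prefixing every key with the same 0x00 byte keeps the relative order of the encoded keys (the helper name is hypothetical):

```rust
// Hypothetical helper: encode a key under the default vnode 0x00.
// Since every key gets the same one-byte prefix, the lexicographic
// order of the encoded keys matches that of the original keys.
fn with_default_vnode(key: &[u8]) -> Vec<u8> {
    let mut out = vec![0x00];
    out.extend_from_slice(key);
    out
}
```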