Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(storage): introduce dedup_pk_row encoding for StateTable #3214

Merged
merged 67 commits into from
Jun 28, 2022

Conversation

kwannoel
Copy link
Contributor

@kwannoel kwannoel commented Jun 14, 2022

What's changed and what's your intention?

Alternate design for dedup_pk_row encoding to #3143.

See #3143 (comment) for context.

Please explain IN DETAIL what the changes are in this PR and why they are needed:

  • Summarize your change (mandatory)
    • Parameterizes relational layer on cell serializer used.
    • Allows us to use dedup pk serialization.
    • Maintains existing interfaces of StateTable, CellTable.
  • How does this PR work? Need a brief introduction for the changed logic (optional)
    • Introduces a new trait, CellSerializer which relational layer is parameterized on.
  • Describe any limitations of the current code (optional)
    • For DedupPkStateTable, DedupPkCellTable, they are only parameterized on serializer. This means serializing and deserializing differs. (For StateTable, CellTable, serializing and deserializing still matches, behaviour is unchanged).
    • It should be parameterized on deserializer too for consistency, but this can be done in separate PR, since these interfaces are not yet wired up to any executors.
    • This also means dedup_pk_iter in cell_based_table can be removed eventually.
  • Next steps will be
    1. Parameterize celltable, state table on deserializer.
    2. Deprecate dedup_pk_iter and related structs / traits, replace with parameterized cell table iter.
    3. Update lookup executor to use dedup pk encoding.
    4. Link e2e with mview. (This can't be done until all store reads use dedup pk encoding).

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

Previous implementaiton: #3143

Issue: #588

@kwannoel kwannoel changed the title Kwannoel/dedup pk enc 2 feat(storage): introduce dedup_pk_row encoding for StateTable Jun 14, 2022
@codecov
Copy link

codecov bot commented Jun 14, 2022

Codecov Report

Merging #3214 (806d832) into main (6e793fd) will increase coverage by 0.04%.
The diff coverage is 95.14%.

@@            Coverage Diff             @@
##             main    #3214      +/-   ##
==========================================
+ Coverage   74.41%   74.46%   +0.04%     
==========================================
  Files         768      769       +1     
  Lines      107793   108054     +261     
==========================================
+ Hits        80218    80464     +246     
- Misses      27575    27590      +15     
Flag Coverage Δ
rust 74.46% <95.14%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/storage/src/lib.rs 100.00% <ø> (ø)
.../storage/src/dedup_pk_cell_based_row_serializer.rs 92.59% <92.59%> (ø)
src/storage/src/table/test_relational_table.rs 98.12% <98.82%> (+0.03%) ⬆️
src/storage/src/cell_based_row_serializer.rs 100.00% <100.00%> (ø)
src/storage/src/table/cell_based_table.rs 71.80% <100.00%> (+0.12%) ⬆️
src/storage/src/table/state_table.rs 95.86% <100.00%> (ø)
src/meta/src/hummock/mock_hummock_meta_client.rs 40.56% <0.00%> (-0.95%) ⬇️
src/storage/src/hummock/iterator/merge_inner.rs 89.37% <0.00%> (-0.63%) ⬇️
src/connector/src/filesystem/file_common.rs 80.35% <0.00%> (-0.45%) ⬇️
src/storage/src/hummock/local_version_manager.rs 79.15% <0.00%> (-0.13%) ⬇️
... and 2 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has totally checked 852 files.

Valid Invalid Ignored Fixed
850 1 1 0
Click to see the invalid file list
  • src/storage/src/table/dedup_pk_state_table.rs

src/storage/src/table/dedup_pk_state_table.rs Outdated Show resolved Hide resolved
@kwannoel
Copy link
Contributor Author

kwannoel commented Jun 14, 2022

Currently breaking with ./e2e_test/streaming/index.slt.

Simplified, the query is:

create table iii_t1 (v1 int, v2 int);

create table iii_t2 (v3 int, v4 int);

insert into iii_t1 values (2, 0), (3, 0), (0, 0), (1, 0);

insert into iii_t2 values (2, 5), (3, 4), (0, 3), (1, 2);

flush;

create index iii_index_1 on iii_t1(v1);

create index iii_index_2 on iii_t2(v3);

create materialized view iii_mv2 as select * from iii_t1, iii_t2 where iii_t1.v1 = iii_t2.v3;

select v1, v2, v3, v4 from iii_mv2;

As can be seen by the plan, this uses DeltaJoin:

dev=> explain  create materialized view iii_mv2 as select * from iii_t1, iii_t2 where iii_t1.v1 = iii_t2.v3;
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 StreamMaterialize { columns: [v1, v2, v3, v4, _row_id(hidden), _row_id#1(hidden)], pk_columns: [_row_id, _row_id#1] }
   StreamExchange { dist: HashShard([4, 5]) }
     StreamProject { exprs: [$0, $1, $3, $4, $2, $5] }
       StreamDeltaJoin { type: Inner, predicate: $0 = $3 }
         StreamIndexScan { index: iii_index_1, columns: [v1, v2, _row_id], pk_indices: [2] }
         StreamIndexScan { index: iii_index_2, columns: [v3, v4, _row_id], pk_indices: [2] }

Under the hood DeltaJoin is converted to a LookupExecutor which reads from storage, without dedup pk encoding. As such it is unable to fetch missing pk datums:

e2e_test/streaming/index.slt                                 .. [FAILED]

failed to run `e2e_test/streaming/index.slt`

Caused by:
    test error at e2e_test/streaming/index.slt:52: query result mismatch:
    SQL: select v1, v2, v3, v4 from iii_mv2;
    Expected:
    0 0 0 3
    1 0 1 2
    2 0 2 5
    3 0 3 4
    Actual:
    0 0 NULL 3
    1 0 NULL 2
    2 0 NULL 5
    3 0 NULL 4

Still thinking of how to deal with this.

@kwannoel
Copy link
Contributor Author

kwannoel commented Jun 14, 2022

All readers will need to use dedup pk decoding, as long as they read from storage which has dedup pk encoding.

We need to extend lookup to use that too.

@kwannoel
Copy link
Contributor Author

kwannoel commented Jun 14, 2022

In this or a future PR, we need to make it easy to toggle dedup pk encoding.
The encoding should be determined during plan phase. The executors built from the plan should be configured to use this encoding.

@kwannoel
Copy link
Contributor Author

Did further digging into how lookup executor works.

When scanning from storage we first get a key_prefix:
https://github.com/singularity-data/risingwave/blob/4f0ef58145b0ac72fc71dbbf463ccc70c141c292/src/stream/src/executor/lookup/impl_.rs#L402-L409

Which gets masked from the pk:
https://github.com/singularity-data/risingwave/blob/4f0ef58145b0ac72fc71dbbf463ccc70c141c292/src/storage/src/keyspace.rs#L125

This also means that the pk being used for deduplication is in-complete. In this case it only contains row_id of the underlying table the index referred to.

Need to think of how to get this dedupped datum.

@kwannoel
Copy link
Contributor Author

kwannoel commented Jun 15, 2022

Thinking whether to implement this e2e just yet, because of changes to various executors. Will elaborate more soon.

@kwannoel
Copy link
Contributor Author

kwannoel commented Jun 16, 2022

Partially fixed e2e tests (e2e_test/streaming/index.slt works now). Am sufficiently confident of current approach for encoding.

Checkpointing work here for now. I will be breaking this PR up into two chunks:

  1. With just dedup pk encoding for state table, not linked to mview yet.
  2. decoding for lookup in a separate PR + link to mview. To be continued later on.

@BugenZhao BugenZhao self-requested a review June 17, 2022 03:01
@kwannoel kwannoel marked this pull request as ready for review June 17, 2022 05:32
@kwannoel kwannoel requested review from wcy-fdu and skyzh June 17, 2022 05:36
src/storage/src/dedup_pk_cell_based_row_serializer.rs Outdated Show resolved Hide resolved
src/storage/src/table/dedup_pk_state_table.rs Outdated Show resolved Hide resolved
src/storage/src/table/test_relational_table.rs Outdated Show resolved Hide resolved
#[derive(Clone)]
pub struct StateTable<S: StateStore> {
pub struct StateTableExtended<S: StateStore, SER: CellSerializer> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I‘m not sure is this by design?
Maybe I missed some of the discussion, is there any RFC about this?

@kwannoel kwannoel requested a review from lmatz June 23, 2022 05:20
@kwannoel
Copy link
Contributor Author

Currently fixing merge conflicts, will update when ready for review again

Ready for review again~

@BugenZhao
Copy link
Member

I'll help to resolve the conflicts brought by #3407.

Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
Signed-off-by: Bugen Zhao <i@bugenzhao.com>
@BugenZhao
Copy link
Member

I'll help to resolve the conflicts brought by #3407.

Done. Could you please review the resolving results? @kwannoel

Besides, I've also made these changes:

  • Add a factory method of create to the trait of serializer, so we can simplify the table construction methods.
  • Rename Extended to Base since it sounds more reasonable.

@kwannoel
Copy link
Contributor Author

Besides, I've also made these changes:

Add a factory method of create to the trait of serializer, so we can simplify the table construction methods.
Rename Extended to Base since it sounds more reasonable.

Both of these changes LGTM, thanks for the help!

@kwannoel
Copy link
Contributor Author

Done. Could you please review the resolving results? @kwannoel

Reviewed and LGTM too 👍

Copy link
Contributor

@BowenXiao1999 BowenXiao1999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM

}
}

impl CellSerializer for DedupPkCellBasedRowSerializer {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name looks a bit confusing for me.
RowSerializer seems serialize a row (vec), while CellSerializer for a cell (Datum).

src/storage/src/dedup_pk_cell_based_row_serializer.rs Outdated Show resolved Hide resolved
/// 1. Row indices of datums not in pk,
/// 2. or datums which have to be stored regardless
/// (e.g. if memcomparable not equal to value encoding)
dedup_datum_indices: HashSet<usize>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether consider a situation like:

all cells of a state table is pk (table_columns.len = pk.len). just like #3474 . In this case, we can not dedup all pk, but at least one pk is not dedup.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can SENTINEL_CELL_ID help with this case? Would require changes to the logic of deserialization though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding some tests to verify this

Copy link
Contributor Author

@kwannoel kwannoel Jun 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would require changes to the logic of deserialization though.

Regarding deserialization, we can also rely on cell_based_row_deserializer to do it, so I don't think much changes are needed:

https://github.com/singularity-data/risingwave/blob/cc4d1d196313f9bebfa2fd44e2cd3436d70652b8/src/storage/src/dedup_pk_cell_based_row_serializer.rs#L222-L243

Just need to replace the deduplicated datums with datums from pk. This logic can be in dedup_pk_cell_based_row_deserializer.

@kwannoel kwannoel enabled auto-merge (squash) June 28, 2022 07:33
@kwannoel kwannoel merged commit 8203ac7 into main Jun 28, 2022
@kwannoel kwannoel deleted the kwannoel/dedup-pk-enc-2 branch June 28, 2022 08:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants