Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(executor): implement dedup pk decoding for BatchQueryExecutor #3060

Merged
merged 13 commits into from
Jun 9, 2022

Conversation

kwannoel
Copy link
Contributor

@kwannoel kwannoel commented Jun 8, 2022

What's changed and what's your intention?

  • Summarize your change (mandatory)
    • Continued from perf(executor): decode row datums from pk #2957.
    • Adds dedup pk decoding to BatchQueryExecutor.
    • Changes pk_descs to pass by reference to DedupPkCellBasedTableIter. This avoids a clone of pk_descs each time we call the execute_inner method of BatchQueryExecutor.

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

related #588

@kwannoel kwannoel changed the title perf(executor): implement dedup pk decoding for batch query perf(executor): implement dedup pk decoding for BatchQueryExecutor Jun 8, 2022
@codecov
Copy link

codecov bot commented Jun 8, 2022

Codecov Report

Merging #3060 (3f77a51) into main (c8c4acf) will increase coverage by 0.00%.
The diff coverage is 61.36%.

@@           Coverage Diff           @@
##             main    #3060   +/-   ##
=======================================
  Coverage   73.54%   73.54%           
=======================================
  Files         735      735           
  Lines      100309   100340   +31     
=======================================
+ Hits        73769    73793   +24     
- Misses      26540    26547    +7     
Flag Coverage Δ
rust 73.54% <61.36%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/batch/src/executor/row_seq_scan.rs 23.27% <0.00%> (ø)
...ntend/src/optimizer/plan_node/stream_index_scan.rs 20.00% <0.00%> (-0.94%) ⬇️
src/stream/src/from_proto/batch_query.rs 32.05% <0.00%> (-2.20%) ⬇️
...frontend/src/optimizer/plan_node/batch_seq_scan.rs 95.76% <100.00%> (ø)
...ntend/src/optimizer/plan_node/stream_table_scan.rs 96.37% <100.00%> (+0.16%) ⬆️
src/storage/src/table/cell_based_table.rs 72.61% <100.00%> (ø)
src/storage/src/table/test_relational_table.rs 98.21% <100.00%> (ø)
src/stream/src/executor/batch_query.rs 91.42% <100.00%> (+1.31%) ⬆️
src/meta/src/hummock/mock_hummock_meta_client.rs 43.82% <0.00%> (-1.13%) ⬇️
src/storage/src/hummock/local_version_manager.rs 84.12% <0.00%> (-0.16%) ⬇️
... and 1 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@kwannoel kwannoel requested a review from lmatz June 8, 2022 12:11
@kwannoel kwannoel marked this pull request as ready for review June 8, 2022 12:12
@kwannoel kwannoel requested a review from skyzh June 8, 2022 12:12
@kwannoel kwannoel force-pushed the kwannoel/dedup-pk-dec branch 3 times, most recently from 7ebf9c2 to af5505a Compare June 9, 2022 03:04
@kwannoel
Copy link
Contributor Author

kwannoel commented Jun 9, 2022

rebased

Copy link
Contributor

@skyzh skyzh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except the naming. By the way, I found that every time we need to access a table, our executors only get parts of its information. e.g., in this case, BatchExecutor only get column desc without ordered columns. We should have a single struct to describe a storage table in the future, maybe after the state table refactor.

related: #2212
cc @BowenXiao1999 @wcy-fdu

schema_ref_id: Default::default(),
table_desc: Some(CellBasedTableDesc {
table_id: self.logical.table_desc().table_id.into(),
pk: self
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field name should be order_key instead? pk is different from order key in the streaming engine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ineed. Maybe Meta/FE should embed this info into Stream Plan, so executor can just read it and build state table.

Copy link
Contributor Author

@kwannoel kwannoel Jun 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated: 2206ad5 Sorry realized it should be order_key rather than order_keys.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skyzh
Copy link
Contributor

skyzh commented Jun 9, 2022

StateTable will need:

  • column desc (for encoding value and generate cell-based encoding)
  • ordered column (for encoding key)
  • distribution key (for calculating vnode)
    To read a table. Maybe we can make this a new StateTableDesc in proto, and pass it everywhere.

@kwannoel kwannoel merged commit 46d8875 into main Jun 9, 2022
@kwannoel kwannoel deleted the kwannoel/dedup-pk-dec branch June 9, 2022 04:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants