Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(storage): use ReadOptions and WriteOptions in StateStore. #3216

Merged
merged 8 commits into from
Jun 24, 2022

Conversation

zwang28
Copy link
Contributor

@zwang28 zwang28 commented Jun 14, 2022

What's changed and what's your intention?

Use ReadOptions and WriteOptions to wrap parameters for StateStore.

  • Store state table id in KeySpace, in order to use it in state store's read/write.
  • Major code changes is quite small in store.rs, but lots of call sites are affected.

Checklist

  • I have written necessary docs and comments
  • All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

#2065

@zwang28 zwang28 requested review from BugenZhao and hzxa21 June 14, 2022 10:17
@zwang28 zwang28 marked this pull request as ready for review June 14, 2022 10:25
@codecov
Copy link

codecov bot commented Jun 14, 2022

Codecov Report

Merging #3216 (4281d49) into main (93e02e7) will increase coverage by 0.00%.
The diff coverage is 74.00%.

@@           Coverage Diff            @@
##             main    #3216    +/-   ##
========================================
  Coverage   73.74%   73.74%            
========================================
  Files         768      768            
  Lines      105736   106454   +718     
========================================
+ Hits        77972    78507   +535     
- Misses      27764    27947   +183     
Flag Coverage Δ
rust 73.74% <74.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/bench/ss_bench/operations/get.rs 0.00% <0.00%> (ø)
...rc/bench/ss_bench/operations/prefix_scan_random.rs 0.00% <0.00%> (ø)
src/bench/ss_bench/operations/write_batch.rs 0.00% <0.00%> (ø)
src/ctl/src/cmd_impl/hummock/list_kv.rs 0.00% <0.00%> (ø)
src/storage/src/monitor/monitored_store.rs 1.94% <0.00%> (-0.07%) ⬇️
src/storage/src/panic_store.rs 0.00% <0.00%> (ø)
src/storage/src/store.rs 54.54% <50.00%> (-12.13%) ⬇️
src/storage/src/write_batch.rs 68.62% <65.00%> (-0.27%) ⬇️
src/storage/src/hummock/state_store_tests.rs 76.25% <73.61%> (-3.00%) ⬇️
src/storage/src/keyspace.rs 82.22% <75.00%> (-3.65%) ⬇️
... and 14 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@BugenZhao
Copy link
Member

Why there's a TableId inside the Keyspace interface?

@zwang28
Copy link
Contributor Author

zwang28 commented Jun 14, 2022

Why there's a TableId inside the Keyspace interface?

It is cell table id and will be used to decide the compaction group during state store's read/write.

@BugenZhao
Copy link
Member

Why there's a TableId inside the Keyspace interface?

It is cell table id and will be used to decide the compaction group during state store's read/write.

So is this PR a continuation of #1812? It still seems to expose too many implementation details. 😢

@BugenZhao BugenZhao requested review from skyzh and wenym1 June 14, 2022 10:57
@wenym1
Copy link
Contributor

wenym1 commented Jun 16, 2022

Seems that we have a lot of merge conflict right now.

@hzxa21
Copy link
Collaborator

hzxa21 commented Jun 16, 2022

Why there's a TableId inside the Keyspace interface?

It is cell table id and will be used to decide the compaction group during state store's read/write.

So is this PR a continuation of #1812? It still seems to expose too many implementation details. 😢

StateTable has table_id and StateStore needs table_id for reads/writes. The issue here is how to pass StateTable's table_id to StateStore. How about putting table_id in WriteBatch instead of Keyspace?

@zwang28
Copy link
Contributor Author

zwang28 commented Jun 17, 2022

How about putting table_id in WriteBatch instead of Keyspace

But state store's read interfaces still require table_id?

@zwang28 zwang28 force-pushed the wangzheng/refactor_state_store branch from b7f3808 to cc40b26 Compare June 17, 2022 07:20
@zwang28
Copy link
Contributor Author

zwang28 commented Jun 17, 2022

I resolved the conflict and left the test unfixed for now.
@hzxa21 WriteBatch now store table_id. But KeySpace still keep table_id to use in read path.
@BugenZhao KeySpace's interface is not modified. Would you explain which "implementation details" we don't want to expose?

@hzxa21
Copy link
Collaborator

hzxa21 commented Jun 17, 2022

How about putting table_id in WriteBatch instead of Keyspace

But state store's read interfaces still require table_id?

You are right. We either pass table_id to the Keyspace read interface or store it in Keyspace just like this PR. Considering this, I think storing it in Keyspace is okay. @BugenZhao What do you think?

@zwang28
Copy link
Contributor Author

zwang28 commented Jun 21, 2022

@skyzh Would you please take a look at this refactor?

let val_size = match store
.get(
&key,
ReadOptions {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about use the same ReadOptions with Hummock? (for prefetch)

Copy link
Contributor Author

@zwang28 zwang28 Jun 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean reusing the existed hummock ReadOptions for state store interfaces?
I think it's a concept in state store layer(although other state store impl doesn't make use of it)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. In the future we may need executor to tell us which it is a large scan request so that storage can decide whether we need to prefetch more data from object storage.

Copy link
Contributor

@skyzh skyzh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

// --- Flush states to the state store ---
// Some state will have the correct output only after their internal states have been
// fully flushed.
let (write_batch, dirty_cnt) = {
let mut write_batch = store.start_write_batch();
let mut write_batch = store.start_write_batch(WriteOptions { epoch, table_id });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid this is not correct. HashAgg uses a lot of tables, we should produce multiple write batches in hash agg.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right both hash_agg.rs and global_simple_agg.rs have been refactored after this PR. Will fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -202,6 +203,7 @@ impl<S: StateStore> SimpleAggExecutor<S> {
) -> StreamExecutorResult<Option<StreamChunk>> {
// The state store of each keyspace is the same so just need the first.
let store = keyspace[0].state_store();
let table_id = keyspace[0].table_id();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

# Conflicts:
#	src/ctl/src/cmd_impl/hummock/list_kv.rs
#	src/storage/src/hummock/state_store.rs
#	src/storage/src/store.rs
#	src/stream/src/executor/global_simple_agg.rs
#	src/stream/src/executor/hash_agg.rs
Copy link
Contributor

@skyzh skyzh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. cc @BugenZhao are you working on string agg now? Seems that this PR might affect how string_agg stores and reads things. As it is not covered by our e2e test, you might want to pay attention to this PR.

@zwang28 zwang28 merged commit 118540a into main Jun 24, 2022
@zwang28 zwang28 deleted the wangzheng/refactor_state_store branch June 24, 2022 06:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants