feat(storage): iterator support min_epoch check #3542

Merged
merged 6 commits into from
Jun 30, 2022

Contributor

@Li0k Li0k commented Jun 29, 2022

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

Changes related to TTL, as described in the title.

Please explain IN DETAIL what the changes are in this PR and why they are needed:

  • Keyspace adapts table properties from CatalogTable
  • UserIterator adds a check against min_epoch
  • ReadOptions adds an Option field (ttl) and a method to calculate min_epoch for the iterator

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests

Refer to a related PR or issue link (optional)

@Li0k Li0k force-pushed the li0k/feat_keyspace_table_properties branch from 6891217 to 68704b9 Compare June 29, 2022 04:16
@codecov

codecov bot commented Jun 29, 2022

Codecov Report

Merging #3542 (e1e27e5) into main (edab3f5) will increase coverage by 0.02%.
The diff coverage is n/a.

❗ Current head e1e27e5 differs from pull request most recent head 1516a71. Consider uploading reports for the commit 1516a71 to get more accurate results

@@            Coverage Diff             @@
##             main    #3542      +/-   ##
==========================================
+ Coverage   74.39%   74.41%   +0.02%     
==========================================
  Files         771      770       -1     
  Lines      108714   108659      -55     
==========================================
- Hits        80875    80858      -17     
+ Misses      27839    27801      -38     
Flag Coverage Δ
rust 74.41% <0.00%> (+0.02%) ⬆️

Impacted Files Coverage Δ
src/stream/src/executor/aggregation/row_count.rs 78.72% <0.00%> (-1.72%) ⬇️
src/stream/src/executor/top_n_executor.rs 51.51% <0.00%> (-1.61%) ⬇️
src/expr/src/expr/template.rs 68.04% <0.00%> (-0.65%) ⬇️
...ontend/src/optimizer/plan_node/stream_hash_join.rs 97.79% <0.00%> (-0.57%) ⬇️
src/common/src/array/column_proto_readers.rs 77.96% <0.00%> (-0.37%) ⬇️
src/common/src/array/mod.rs 72.17% <0.00%> (-0.09%) ⬇️
src/sqlparser/src/ast/mod.rs 89.32% <0.00%> (-0.05%) ⬇️
src/batch/src/executor/join/hash_join_state.rs 88.91% <0.00%> (-0.03%) ⬇️
src/batch/src/executor/join/hash_join.rs 84.78% <0.00%> (-0.03%) ⬇️
src/common/src/util/chunk_coalesce.rs 96.66% <0.00%> (-0.02%) ⬇️
... and 19 more

@Li0k Li0k force-pushed the li0k/feat_keyspace_table_properties branch from 68704b9 to 8eb6fcb Compare June 29, 2022 07:39
@Li0k Li0k marked this pull request as ready for review June 29, 2022 07:40
@Li0k Li0k changed the title feat(storage): keyspace adapt table properties and iterator support min_epoch check feat(storage): iterator support min_epoch check Jun 29, 2022
@Li0k Li0k force-pushed the li0k/feat_keyspace_table_properties branch from 8eb6fcb to 565eb54 Compare June 29, 2022 09:57
@BowenXiao1999
Contributor

UserIterator add check through min_epoch

How is this different from the previous behavior (with vs. without min_epoch)?

Contributor

@BowenXiao1999 BowenXiao1999 left a comment

Seems easy to understand. LGTM

@@ -53,6 +53,8 @@ pub struct BackwardUserIterator {
/// Only reads values if `epoch <= self.read_epoch`.
read_epoch: Epoch,

min_epoch: Epoch,
Collaborator

Please add a comment describing how this field is used. It should explicitly mention that it is only used for TTL.

impl From<&TableOption> for risingwave_pb::hummock::TableOption {
fn from(table_option: &TableOption) -> Self {
Self {
ttl: table_option.ttl.unwrap_or(0),
Collaborator

-> unwrap_or(TABLE_OPTION_DUMMY_TTL)

We use 0 as the special value. Do we need to disallow users from setting ttl to 0?

Contributor Author

It's a good idea to disallow setting ttl to 0 in the frontend (although the frontend does not yet distinguish with_option value types), even if the case is already masked in the storage layer.
Conclusion: if ttl is set to 0, the storage layer can handle it.
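
A minimal sketch of the sentinel handling discussed in this thread, assuming 0 is the value stored when no TTL is configured. `ProstTableOption` is a hypothetical stand-in for `risingwave_pb::hummock::TableOption`, and the reverse conversion is an illustration of how the storage layer could treat a user-supplied 0, not the merged code.

```rust
/// Sentinel written to the protobuf message when no TTL is configured.
/// The value 0 and the constant name follow the review comment above.
pub const TABLE_OPTION_DUMMY_TTL: u32 = 0;

/// In-memory table option: `None` means "no TTL".
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct TableOption {
    /// Time-to-live in seconds.
    pub ttl: Option<u32>,
}

/// Hypothetical stand-in for the generated protobuf struct.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct ProstTableOption {
    pub ttl: u32,
}

impl From<&TableOption> for ProstTableOption {
    fn from(table_option: &TableOption) -> Self {
        Self {
            ttl: table_option.ttl.unwrap_or(TABLE_OPTION_DUMMY_TTL),
        }
    }
}

impl From<&ProstTableOption> for TableOption {
    fn from(pb: &ProstTableOption) -> Self {
        // A stored 0 cannot be told apart from "unset", so both map to `None`.
        let ttl = if pb.ttl == TABLE_OPTION_DUMMY_TTL {
            None
        } else {
            Some(pb.ttl)
        };
        Self { ttl }
    }
}
```

With this shape, `Some(0)` survives a round trip as `None`, which is one way the storage layer can cover a TTL of 0 even if the frontend does not reject it.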

impl ReadOptions {
pub fn min_epoch(&self) -> u64 {
match self.ttl {
Some(ttl_u32) => self.epoch - ttl_u32 as u64,
Collaborator

Only the upper 48 bits of the epoch represent physical time, so you cannot simply subtract ttl from epoch to construct min_epoch. See epoch.rs for more details.

Contributor Author

@Li0k Li0k Jun 29, 2022

Fixed the above.
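
To illustrate the point about the epoch layout, here is a rough sketch assuming the physical time in milliseconds occupies the upper 48 bits and the remaining 16 low bits carry no time information. The constant name, `from_physical_time`, and `subtract_ms` are illustrative names rather than the contents of epoch.rs; zeroing the low bits in the result is a choice made for this sketch.

```rust
/// Number of low bits that do not carry physical time (64 - 48 = 16).
/// Hypothetical constant name, used only in this sketch.
const EPOCH_PHYSICAL_SHIFT_BITS: u32 = 16;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Epoch(pub u64);

impl Epoch {
    /// Builds an epoch from a physical time in milliseconds, low bits zeroed.
    pub fn from_physical_time(physical_ms: u64) -> Self {
        Epoch(physical_ms << EPOCH_PHYSICAL_SHIFT_BITS)
    }

    /// Physical time in milliseconds, read from the upper 48 bits.
    pub fn physical_time(&self) -> u64 {
        self.0 >> EPOCH_PHYSICAL_SHIFT_BITS
    }

    /// Moves the epoch back by `relative_time_ms` milliseconds.
    /// The subtraction happens on the physical part and clamps at zero;
    /// the low 16 bits of the result are zero.
    pub fn subtract_ms(&self, relative_time_ms: u64) -> Self {
        let physical_ms = self.physical_time().saturating_sub(relative_time_ms);
        Epoch(physical_ms << EPOCH_PHYSICAL_SHIFT_BITS)
    }
}
```

Under these assumptions, subtracting 3000 directly from the raw `u64` of an epoch changes its physical time by at most one millisecond, whereas `subtract_ms(3000)` moves it back by the intended three seconds.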

@Li0k
Contributor Author

Li0k commented Jun 29, 2022

UserIterator add check through min_epoch

How is this different from the previous behavior (with vs. without min_epoch)?

First, min_epoch means that the user cannot access keys whose epoch is below min_epoch. It is used to keep reads consistent under TTL. Without min_epoch, a user may read a key that is being removed by compaction, which leads to inconsistency between two reads.

@BowenXiao1999
Contributor

BowenXiao1999 commented Jun 29, 2022

But as I remember, iter always needs an epoch as a parameter? The user only cares about data at a specific epoch.

@Li0k
Contributor Author

Li0k commented Jun 29, 2022

Yes, the user only cares about read_epoch. (read_epoch is an upper bound; min_epoch is a lower bound when TTL is used.) A practical scenario: the user holds a window 30 minutes wide, so the TTL can be set to 40 minutes. In this case, the user never accesses a key whose epoch is below min_epoch (read_epoch - ttl), even if that key is the only one in Hummock.
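
The visibility window described here can be sketched as a simple range check. This is an illustration only: both function names are hypothetical, epochs are treated as plain comparable `u64` values, and the real UserIterator does more than filter by epoch.

```rust
/// Returns true when a key written at `key_epoch` is visible to a reader:
/// `read_epoch` is the upper bound and `min_epoch` is the lower bound.
pub fn is_visible(key_epoch: u64, read_epoch: u64, min_epoch: u64) -> bool {
    min_epoch <= key_epoch && key_epoch <= read_epoch
}

/// Keeps only the (key, epoch) pairs that fall inside the window.
pub fn visible_versions<'a>(
    versions: &[(&'a str, u64)],
    read_epoch: u64,
    min_epoch: u64,
) -> Vec<(&'a str, u64)> {
    versions
        .iter()
        .copied()
        .filter(|&(_, epoch)| is_visible(epoch, read_epoch, min_epoch))
        .collect()
}
```

A key below `min_epoch` is hidden even when it is the only version present, so two reads at the same `read_epoch` return the same result whether or not compaction has already dropped the expired key.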

@Li0k Li0k force-pushed the li0k/feat_keyspace_table_properties branch 2 times, most recently from d8a84d6 to 2ca781e Compare June 30, 2022 02:50
@@ -70,6 +70,16 @@ impl Epoch {
pub fn as_system_time(&self) -> SystemTime {
*UNIX_SINGULARITY_DATE_EPOCH + Duration::from_millis(self.physical_time())
}

pub fn min_epoch_with_interval(&self, interval: u64) -> Self {
Collaborator

Calling it min_epoch_with_interval is a bit confusing, since the Epoch struct should be unaware of how the epoch is used. I suggest renaming it to something like subtract_ms(&self, relative_time_ms: u64) and adding some docs explaining the usage.

pub fn min_epoch(&self) -> u64 {
let epoch = Epoch(self.epoch);
match self.ttl {
Some(ttl_u32) => epoch.min_epoch_with_interval(ttl_u32 as u64).0,
Collaborator

Is ttl_u32 measured in seconds? We should convert it to ms before doing the epoch subtraction.

Contributor Author

Unified the unit of ttl to seconds.
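
A hedged sketch of the unit conversion raised here, assuming ttl is stored in seconds and the epoch keeps physical milliseconds in its upper 48 bits. The struct is a simplified stand-in for ReadOptions; the constant name and returning 0 when no TTL is set are assumptions of this sketch.

```rust
/// Low bits of an epoch that do not carry physical time (64 - 48 = 16).
/// Hypothetical constant name, used only in this sketch.
const EPOCH_PHYSICAL_SHIFT_BITS: u32 = 16;

/// Simplified stand-in for the storage `ReadOptions`.
pub struct ReadOptions {
    /// Read epoch, the upper bound for visible keys.
    pub epoch: u64,
    /// TTL in seconds; `None` disables the lower bound.
    pub ttl: Option<u32>,
}

impl ReadOptions {
    /// Lower bound for visible keys.
    pub fn min_epoch(&self) -> u64 {
        match self.ttl {
            Some(ttl_second) => {
                // Seconds to milliseconds; widening to u64 first avoids overflow.
                let ttl_ms = ttl_second as u64 * 1000;
                // Subtract on the physical part only, clamping at zero.
                let physical_ms =
                    (self.epoch >> EPOCH_PHYSICAL_SHIFT_BITS).saturating_sub(ttl_ms);
                physical_ms << EPOCH_PHYSICAL_SHIFT_BITS
            }
            // Assumption: without a TTL every epoch is readable.
            None => 0,
        }
    }
}
```

For a read epoch whose physical time is one hour and a TTL of 40 minutes (2400 seconds), this gives a `min_epoch` whose physical time is 20 minutes.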

@Li0k Li0k force-pushed the li0k/feat_keyspace_table_properties branch 2 times, most recently from 1516a71 to ee00898 Compare June 30, 2022 11:53
Collaborator

@hzxa21 hzxa21 left a comment

Rest LGTM

@@ -514,6 +523,7 @@ mod tests {
.map(|table_info| table_info.id)
.collect::<Vec<_>>();

// println!("table_ids_from_version {:?}", table_ids_from_version);
Collaborator

Remove the commented-out line.

fix(storage): unify the unit of ttl to second

fix(storage): doc comment

fix(storage): use TableOption parse to handle ttl which zero
@Li0k Li0k force-pushed the li0k/feat_keyspace_table_properties branch from ee00898 to 861773d Compare June 30, 2022 13:52
@Li0k Li0k added the mergify/can-merge Indicates that the PR can be added to the merge queue label Jun 30, 2022
@mergify mergify bot merged commit 2d5c68f into main Jun 30, 2022
@mergify mergify bot deleted the li0k/feat_keyspace_table_properties branch June 30, 2022 14:05
for (k, _) in scan_result {
let table_id = get_table_id(&k).unwrap();
let epoch = get_epoch(&k);
assert_eq!(table_id, existing_table_id);
assert!(epoch >= (watermark - ttl_expire as u64));
assert!(epoch >= min_epoch.0);
Member

@xxchan xxchan Jun 30, 2022

huangjw806 pushed a commit that referenced this pull request Jul 5, 2022
* feat(storage): keyspace support table_properties

* feat(storage): iterator shadow keys which epoch under min_epoch

* feat(storage): TableOption for common

* feat(storage): ReadOption add ttl and calculate min_epoch function

* chore(storage): adapt some test with ReadOption

fix(storage): fix clippy

chore(storage): some unit-test for min_epoch check

chore(storage): remove table_option in keyspace

* fix(storage): fix min_epoch check in compaction_filter and test

fix(storage): unify the unit of ttl to second

fix(storage): doc comment

fix(storage): use TableOption parse to handle ttl which zero
Labels
mergify/can-merge Indicates that the PR can be added to the merge queue type/feature

4 participants