-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add option preserve_internal_time_seconds
to preserve the time info
#10747
Conversation
47e0f50
to
4fceba5
Compare
track_internal_time_seconds
to enable internal time trackingpreserve_internal_time_seconds
to preserve the time info
@jay-zhuang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
HISTORY.md
Outdated
@@ -6,6 +6,9 @@ | |||
* Fixed an optimistic transaction validation bug caused by DBImpl::GetLatestSequenceForKey() returning non-latest seq for merge (#10724). | |||
* Fixed a bug in iterator refresh which could segfault for DeleteRange users (#10739). | |||
|
|||
### New Features | |||
* Add option `preserve_internal_time_seconds` for tiered storage migration, which can start collecting the preserving the time information for the new data, but won't enable the tiering feature (`preclude_last_level_data_seconds`) yet. The time information can be used to decide the temperature of the existing data when the tiering is enabled later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
preserve_internal_time_seconds can potentially be used for more accurate oldest ancestor time estimation, right? So maybe explain it more in a boarder way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The feature hasn't been implemented yet (we abandoned #8185 because based on test, there's limited improvement with random dataset).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know. I mean just be vague about what the feature does, and not limit to tiered storage. Even now, the feature can be used for manual checking the age of the keys.
db/compaction/compaction_job.cc
Outdated
uint64_t preserve_time_duration = | ||
c->immutable_options()->preclude_last_level_data_seconds == 0 | ||
? c->immutable_options()->preserve_internal_time_seconds | ||
: c->immutable_options()->preclude_last_level_data_seconds; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be max(preserve_internal_time_seconds, preclude_last_level_data_seconds)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was thinking about that too, but only slightly leaning to have "preclude_last_level" to override "preserve_time":
- For simplification, so only one number is valid.
- Implementation wise, it also simpler, that only one number needs to be stored in
compaction_job
.
But the main question is: is there usecase that the use want to set these 2 number differently?
Currently I think the feature is still be part of "preclude_last_level".
Enable time tracking may improve the oldest_ancestor_time estimation, but the feature hasn't be implemented and for normal dataset, it doesn't improve the estimation too much based on some simulator test (that's one of the reason we abandon the original PR #8185 and currently we hope RoundRobin can fundamentally fix the TTL compaction issue).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if we only limit the feature to tiered storage (which I would argue we shouldn't), I assume we would leave room where preserve_internal_time_seconds > preclude_last_level_data_seconds
, if users would preserve the choice that they would increase preclude_last_level_data_seconds
later and hope the it takes effective sooner.
db/db_impl/db_impl.cc
Outdated
uint64_t track_time_duration = | ||
cfd->ioptions()->preclude_last_level_data_seconds == 0 | ||
? cfd->ioptions()->preserve_internal_time_seconds | ||
: cfd->ioptions()->preclude_last_level_data_seconds; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, should it be max() if preclude_last_level_data_seconds <> 0?
include/rocksdb/advanced_options.h
Outdated
// It's typically used for the migration of existing data to tiered storage. | ||
// The user can enable this feature first, then the existing data will have | ||
// time information for `preclude_last_level_data_seconds` feature which will | ||
// be enabled later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments usually explain the default value and its behavior. We should say whether it is dynamically changeable or not. I think it should be changeable if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add comment to specify that it's not dynamic changeable for now (the same as "preclude_last_level"). And give more information about what it's doing.
1beaeb2
to
3f822df
Compare
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
db/seqno_time_test.cc
Outdated
SeqnoTimeTablePropTest() : SeqnoTimeTest() {} | ||
|
||
void SetTrackTimeDurationOptions(Options& options, | ||
uint64_t track_time_duration) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Google C++ Style suggests "put all input-only parameters before any output parameters".
|
||
ASSERT_OK(db_->CompactRange(cro, nullptr, nullptr)); | ||
|
||
ASSERT_GT(num_seqno_zeroing, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be nice to cover the case where in the last level, some keys are zeroed out and some aren't, based on time of insertion.
It is also better to validate by reading internal keys too. Perhaps GetAllKeyVersions() can be used for the validation.
? _current_time - preserve_time_duration | ||
: 0; | ||
preserve_time_min_seqno_ = | ||
seqno_time_mapping_.GetOldestSequenceNum(preserve_time); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't completely get why now the oldest entry isn't seqno to preserve anymore, but at least it's still correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you're right, it is the oldest entry for this version of code. (my first version was having preserve_time and preclude_time differently, then I changed mind to have preclude_time to override preserve_time if both set, but forgot to revert this code. Now I think we're going back to have them differently).
3f822df
to
d56a5f5
Compare
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
@jay-zhuang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@jay-zhuang has updated the pull request. You must reimport the pull request before landing. |
@jay-zhuang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Lock the penultimate level for the whole compaction inputs range, so any key in that compaction is safe to move up from the last level to penultimate level. It depends on facebook#10747 (for the migration test)
Add option
preserve_internal_time_seconds
to preserve the internaltime information.
It's mostly for the migration of the existing data to tiered storage (
preclude_last_level_data_seconds
). When the tiering feature is justenabled, the existing data won't have the time information to decide if
it's hot or cold. Enabling this feature will start collect and preserve
the time information for the new data.