Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track per-SST user-defined timestamp information in MANIFEST #9092

Closed
wants to merge 6 commits into from

Conversation

sunlike-Lipo
Copy link
Contributor

@sunlike-Lipo sunlike-Lipo commented Oct 29, 2021

Track per-SST user-defined timestamp information in MANIFEST #8957

Rockdb has supported user-defined timestamp feature. Application can specify a timestamp
when writing each k-v pair. When data flush from memory to disk file called SST files, file
creation activity will commit to MANIFEST. This commit is for tracking timestamp info in the
MANIFEST for each file. The changes involved are as follows:

Summary:

  1. Track max/min timestamp in FileMetaData, and fix invoved codes.
  2. Add NewFileCustomTag::kMinTimestamp and NewFileCustomTag::kMinTimestamp in
    NewFileCustomTag ( in the kNewFile4 part ), and support invoved codes such as
    VersionEdit Encode and Decode etc.
  3. Add unit test code for VersionEdit EncodeDecodeNewFile4, and fix invoved test codes.

@sunlike-Lipo sunlike-Lipo changed the title add some description about timestamp in metadata Track per-SST user-defined timestamp information in MANIFEST Oct 29, 2021
@wolfkdy
Copy link
Contributor

wolfkdy commented Nov 4, 2021

@riversand963 I've reviewed this mr and my colleague post it.
The original mr is splitted into two mrs.
What you commented in the original mr are all fixed.
Shall you take a look when free, thanks!

Copy link
Contributor

@riversand963 riversand963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry again for the delay, have been a little busier lately.
Thanks @sunlike-Lipo for the PR, what this PR currently has LGTM.
I cannot find the logic of actually computing min/max timestamp for each file during flush/compaction/ingestion/..., is that going to me in another PR, or am I missing anything here?
cc @wolfkdy

@@ -186,6 +186,13 @@ bool VersionEdit::EncodeTo(std::string* dst) const {
PutVarint32(dst, NewFileCustomTag::kFileChecksumFuncName);
PutLengthPrefixedSlice(dst, Slice(f.file_checksum_func_name));

if (f.max_timestamp != kDisableUserTimestamp) {
assert(f.min_timestamp != kDisableUserTimestamp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we return false if assertion fails?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, fixed

f.min_timestamp = field.ToString();
break;
case kMaxTimestamp:
f.max_timestamp = field.ToString();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a check for the equality of min_timestamp.size() and max_timestamp.size() somewhere in this method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, added

@riversand963
Copy link
Contributor

oh, you split that to #9093 .
I can see the confusion now. What I originally meant was to also store the min/max timestamp in the properties block of each SST, which can be in another PR, (I think).
What #9093 does is to compute min/max timestamps using properties collectors.
nvm, we can still merge them separately.

@riversand963
Copy link
Contributor

I also suggest that the description of this PR be updated since it's too short now. It would be great if the description can briefly mention we introduce (compatible) format changes to MANIFEST, etc.

Copy link
Contributor

@riversand963 riversand963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few more comments.

@@ -186,6 +186,15 @@ bool VersionEdit::EncodeTo(std::string* dst) const {
PutVarint32(dst, NewFileCustomTag::kFileChecksumFuncName);
PutLengthPrefixedSlice(dst, Slice(f.file_checksum_func_name));

if (f.max_timestamp != kDisableUserTimestamp) {
if (f.min_timestamp == kDisableUserTimestamp) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer a more strict check.

if (f.min_timestamp.size() != f.max_timestamp.size()) {
  assert(false);
  return false;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I fixed it as your suggestion.

@@ -315,6 +324,7 @@ const char* VersionEdit::DecodeNewFile4From(Slice* input) {
return "new-file4 custom field";
}
if (custom_tag == kTerminate) {
assert(f.min_timestamp.size() == f.max_timestamp.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of error, we need to return a string literal to indicate error. THe assertion will be optimized away in opt build mode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I added some return msg after assert.

…8957

Rockdb has supported user-defined timestamp feature. Application can
specify a timestamp when writing each k-v pair. When data flush from
memory to disk file called SST files, file creation activity will
commit to MANIFEST. This commit is for tracking timestamp info in the
MANIFEST for each file. The changes involved are as follows:

1) Track max/min timestamp in FileMetaData, and fix invoved codes.
2) Add max/min timestamp in NewFileCustomTag(in the knEWfILE4 part),
   and support invoved codes such as VersionEdit Encode and Decode etc.
3) Add unit test code for VersionEdit EncodeDecodeNewFile4, and fix
   invoved test codes.
@sunlike-Lipo
Copy link
Contributor Author

I also suggest that the description of this PR be updated since it's too short now. It would be great if the description can briefly mention we introduce (compatible) format changes to MANIFEST, etc.

I add some description about this pr. Thanks a lot for suggestion.

Copy link
Contributor

@riversand963 riversand963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sunlike-Lipo for the PR. Mostly LGTM except two minor comments.

Comment on lines 804 to 807
r.append(" min_timestamp:");
r.append(f.min_timestamp);
r.append(" max_timestamp:");
r.append(f.max_timestamp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for this late comment.
I think printing the timestamps in hex format seems better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. now fix it to Slice(f.min_timestamp).ToString(true).

Comment on lines 923 to 924
jw << "MinTimestamp" << f.min_timestamp;
jw << "MaxTimestamp" << f.max_timestamp;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Print timestamp in hex format?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. now fix it to Slice(f.min_timestamp).ToString(true).

@facebook-github-bot
Copy link
Contributor

@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@sunlike-Lipo has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@riversand963
Copy link
Contributor

Turns out our internal linter is unhappy about putting kDisableUserTimestamp definition in dbformat.h. Can we declare it in dbformat.h and define it in dbformat.cc?

@facebook-github-bot
Copy link
Contributor

@sunlike-Lipo has updated the pull request. You must reimport the pull request before landing.

@sunlike-Lipo
Copy link
Contributor Author

Turns out our internal linter is unhappy about putting kDisableUserTimestamp definition in dbformat.h. Can we declare it in dbformat.h and define it in dbformat.cc?

Sure. That sounds more reasonable. fixed

@facebook-github-bot
Copy link
Contributor

@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@riversand963
Copy link
Contributor

Thanks @sunlike-Lipo a lot for the PR! I have already initiated internal review process for merging.
Just realized one minor thing, maybe we should still mention this PR in HISTORY.md although this PR does not currently enable the functionality of tracking per-file timestamps. I was a little hesitant before, but now I think we should still update HISTORY. After that, I think we are good to go.

@facebook-github-bot
Copy link
Contributor

@sunlike-Lipo has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@sunlike-Lipo has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@sunlike-Lipo
Copy link
Contributor Author

Thanks @sunlike-Lipo a lot for the PR! I have already initiated internal review process for merging. Just realized one minor thing, maybe we should still mention this PR in HISTORY.md although this PR does not currently enable the functionality of tracking per-file timestamps. I was a little hesitant before, but now I think we should still update HISTORY. After that, I think we are good to go.

OK, I modified the HISTORY.md.

@facebook-github-bot
Copy link
Contributor

@riversand963 merged this pull request in 937fbcb.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants