Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle file boundaries when timestamps should not be persisted #11578

Closed
wants to merge 3 commits into from

Conversation

jowlyzhang
Copy link
Contributor

Handle file boundaries FileMetaData.smallest, FileMetaData.largest for when persist_user_defined_timestamps is false:
1) on the manifest write path, the original user-defined timestamps in file boundaries are stripped. This stripping is done during VersionEdit::Encode to limit the effect of the stripping to only the persisted version of the file boundaries.
2) on the manifest read path during DB open, a a min timestamp is padded to the file boundaries. Ideally, this padding should happen during VersionEdit::Decode so that all in memory file boundaries have a compatible user key format as the running user comparator. However, because the user-defined timestamp size information is not available at that time. This change is added to VersionEditHandler::OnNonCfOperation.

Test Plan:

make all check
./version_edit_test --gtest_filter="*EncodeDecodeNewFile4HandleFileBoundary*". 
./db_with_timestamp_basic_test --gtest_filter="*HandleFileBoundariesTest*"

@jowlyzhang jowlyzhang marked this pull request as draft June 30, 2023 03:10
@jowlyzhang jowlyzhang force-pushed the file_boundary_handling branch 2 times, most recently from 51237aa to 9b91dbf Compare June 30, 2023 17:24
@jowlyzhang jowlyzhang marked this pull request as ready for review June 30, 2023 18:01
Copy link
Contributor

@pdillinger pdillinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the good comments and tests! (some recommendations)

PutLengthPrefixedSlice(dst, meta.largest.Encode());
return;
}
std::string smallest_buf;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that this is not performance critical code, but it bothers me somewhat that there's an extra key copy to do this. Not really because of this code but I expect something similar to happen in code that is more performance critical. I believe it's not simple because the timestamp is between the user key and the seqno+kind, but there could be a version of StripTimestampFromInternalKey that includes the length prefix. Or something like SliceParts could be used (but SliceParts does not encapsulate the sequence of Slices, just provides a view into an existing array of Slices). Something to think about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate a lot all the performance related suggestions! You are right that even this path is not a hot one, similar ideas are used in more performance critical places. Also this is a great idea! It looks like there is an implementation PutLengthPrefixedSliceParts that is ready to use to help save this one copy. I will work on a follow up to update this and places with similar logic to use an in place stripping version of function StripTimestampFromInternalKey that returns a SliceParts, and encoding with PutLengthPrefixedSliceParts to save this extra copy.

db/version_edit.h Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@jowlyzhang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@jowlyzhang merged this pull request in f745263.

@jowlyzhang jowlyzhang deleted the file_boundary_handling branch July 11, 2023 18:35
mayuehappy added a commit to mayuehappy/rocksdb that referenced this pull request Aug 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants