-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for timestamp in Get/Put #5079
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! This would be a great feature to have in RocksDB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! We need to document in some comments that it is just a feature in-progress, what is supported, not supported, and mention that format compatibility is not guaranteed and API is applicable to change. And I think it's pretty close to be merged!
db/db_impl.cc
Outdated
if (read_options.timestamp != nullptr) { | ||
Slice akey; | ||
std::string buf; | ||
Status s = AppendTimestamp(key, *(read_options.timestamp), &akey, &buf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a TODO: in GetImpl() we already copy the key once to LookupKey lkey
. We should do the copying together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I have TODOs for both Write/Put and Get. Will be addressed in another PR. This one focuses more on API and tries to demonstrate how the internal can work with timestamp.
table/get_context.cc
Outdated
@@ -182,7 +182,8 @@ bool GetContext::SaveValue(const ParsedInternalKey& parsed_key, | |||
assert(matched); | |||
assert((state_ != kMerge && parsed_key.type != kTypeMerge) || | |||
merge_context_ != nullptr); | |||
if (ucmp_->Equal(parsed_key.user_key, user_key_)) { | |||
if (ucmp_->Equal(parsed_key.user_key, user_key_) || | |||
ucmp_->CompareWithoutTimestamp(parsed_key.user_key, user_key_) == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it is correct to check only key without timestamp for hit, I think merge should only happen if both of user key and timestamp are the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, merge should also look at timestamp. To RocksDB, two equal keys with different user-timestamps are just two different keys. Without further instruction from the user, e.g. merge policy, RocksDB needs to treat them as separate keys.
I didn't implement merge in this PR though.
db/db_impl.cc
Outdated
@@ -1352,6 +1352,15 @@ ColumnFamilyHandle* DBImpl::DefaultColumnFamily() const { | |||
Status DBImpl::Get(const ReadOptions& read_options, | |||
ColumnFamilyHandle* column_family, const Slice& key, | |||
PinnableSlice* value) { | |||
if (read_options.timestamp != nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe not in this PR, but we should think about returning which timestamp for the entry returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean we return the timestamp to user? Say if user specifies 55 as timestamp, and the db has a record with ts 54, we need to return 54 as well as the key-value pair?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We have been talking about a hidden column or something like that in SQL tables which is timestamp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are very close to get it committed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's land the first commit!!! Please mention it in HISTORY.md
Failed tests are legitimate though. |
Taking a look now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@siying, I think we should also persist timestamp size info to the MANIFEST using the safe ignore mask. |
Discussed with @siying offline. For now we can still rely on comparator name to identify comparators without persisting timestamp size in the MANIFEST. It may be a good idea to postpone MANIFEST format change (although can be forward-compatible) until we are ready to officially release timestamp support. |
@riversand963 has updated the pull request. Re-import the pull request |
@siying. Since we may likely allow users to construct their own |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@riversand963 has updated the pull request. Re-import the pull request |
@riversand963 has updated the pull request. Re-import the pull request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@riversand963 has updated the pull request. Re-import the pull request |
Summary: Explicitly override ComparaWithoutTimestamp in BytewiseComparator and ReverseBytewiseComparator to avoid an extra virtual function call incurred by Comparator::CompareWithoutTimestamp.
Summary: We need to make timestamp_size_ non const because we implicitly use assignment operator. Also make it private to disable subclass modifying its value. Finally, add an explicit overloaded assignment.
@riversand963 has updated the pull request. Re-import the pull request |
@riversand963 has updated the pull request. Re-import the pull request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@riversand963 has updated the pull request. Re-import the pull request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@riversand963 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@riversand963 merged this pull request in 340ed4f. |
Summary: It's useful to be able to (optionally) associate key-value pairs with user-provided timestamps. This PR is an early effort towards this goal and continues the work of facebook#4942. A suite of new unit tests exist in DBBasicTestWithTimestampWithParam. Support for timestamp requires the user to provide timestamp as a slice in `ReadOptions` and `WriteOptions`. All timestamps of the same database must share the same length, format, etc. The format of the timestamp is the same throughout the same database, and the user is responsible for providing a comparator function (Comparator) to order the <key, timestamp> tuples. Once created, the format and length of the timestamp cannot change (at least for now). Test plan (on devserver): ``` $COMPILE_WITH_ASAN=1 make -j32 all $./db_basic_test --gtest_filter=Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/* $make check ``` All tests must pass. We also run the following db_bench tests to verify whether there is regression on Get/Put while timestamp is not enabled. ``` $TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillseq,readrandom -num=1000000 $TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num=1000000 ``` Repeat for 6 times for both versions. Results are as follows: ``` | | readrandom | fillrandom | | master | 16.77 MB/s | 47.05 MB/s | | PR5079 | 16.44 MB/s | 47.03 MB/s | ``` Pull Request resolved: facebook#5079 Differential Revision: D15132946 Pulled By: riversand963 fbshipit-source-id: 833a0d657eac21182f0f206c910a6438154c742c
Summary: In previous #5079, we added user-specified timestamp to `DB::Get()` and `DB::Put()`. Limitation is that these two functions may cause extra memory allocation and key copy. The reason is that `WriteBatch` does not allocate extra memory for timestamps because it is not aware of timestamp size, and we did not provide an API to assign/update timestamp of each key within a `WriteBatch`. We address these issues in this PR by doing the following. 1. Add a `timestamp_size_` to `WriteBatch` so that `WriteBatch` can take timestamps into account when calling `WriteBatch::Put`, `WriteBatch::Delete`, etc. 2. Add APIs `WriteBatch::AssignTimestamp` and `WriteBatch::AssignTimestamps` so that application can assign/update timestamps for each key in a `WriteBatch`. 3. Avoid key copy in `GetImpl` by adding new constructor to `LookupKey`. Test plan (on devserver): ``` $make clean && COMPILE_WITH_ASAN=1 make -j32 all $./db_basic_test --gtest_filter=Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/* $make check ``` If the API extension looks good, I will add more unit tests. Some simple benchmark using db_bench. ``` $rm -rf /dev/shm/dbbench/* && TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillseq,readrandom -num=1000000 $rm -rf /dev/shm/dbbench/* && TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num=1000000 -disable_wal=true ``` Master is at a78503b. ``` | | readrandom | fillrandom | | master | 15.53 MB/s | 25.97 MB/s | | PR5502 | 16.70 MB/s | 25.80 MB/s | ``` Pull Request resolved: #5502 Differential Revision: D16340894 Pulled By: riversand963 fbshipit-source-id: 51132cf792be07d1efc3ac33f5768c4ee2608bb8
…k#5502) Summary: In previous facebook#5079, we added user-specified timestamp to `DB::Get()` and `DB::Put()`. Limitation is that these two functions may cause extra memory allocation and key copy. The reason is that `WriteBatch` does not allocate extra memory for timestamps because it is not aware of timestamp size, and we did not provide an API to assign/update timestamp of each key within a `WriteBatch`. We address these issues in this PR by doing the following. 1. Add a `timestamp_size_` to `WriteBatch` so that `WriteBatch` can take timestamps into account when calling `WriteBatch::Put`, `WriteBatch::Delete`, etc. 2. Add APIs `WriteBatch::AssignTimestamp` and `WriteBatch::AssignTimestamps` so that application can assign/update timestamps for each key in a `WriteBatch`. 3. Avoid key copy in `GetImpl` by adding new constructor to `LookupKey`. Test plan (on devserver): ``` $make clean && COMPILE_WITH_ASAN=1 make -j32 all $./db_basic_test --gtest_filter=Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/* $make check ``` If the API extension looks good, I will add more unit tests. Some simple benchmark using db_bench. ``` $rm -rf /dev/shm/dbbench/* && TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillseq,readrandom -num=1000000 $rm -rf /dev/shm/dbbench/* && TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num=1000000 -disable_wal=true ``` Master is at a78503b. ``` | | readrandom | fillrandom | | master | 15.53 MB/s | 25.97 MB/s | | PR5502 | 16.70 MB/s | 25.80 MB/s | ``` Pull Request resolved: facebook#5502 Differential Revision: D16340894 Pulled By: riversand963 fbshipit-source-id: 51132cf792be07d1efc3ac33f5768c4ee2608bb8
Summary: Fixes facebook#5079. So what happens was 1. Durable uploads CLOUDMANIFEST 2. Non-durable downloads CLOUDMANIFEST 3. Non-durable tried to open MANIFEST, couldn't find it locally nor in the cloud, assuming it's a new DB. 4. Durable uploads new MANIFEST 5. Non-durable wrote new epoch to its local CLOUDMANIFEST 6. Non-durable failed to open MANIFEST, restarts. At restart, it resuses its local CLOUDMANIFEST without compatible MANIFEST, goes into crash loop. If 3 and 4 reverses order, non-durable will be able to open the DB. This diff ensures that if this scenario occurs, non-durable will reinitializes the entire directory. Test Plan: added a unittest to describe such scenario. Reviewers: dhruba, igor Reviewed By: igor Differential Revision: https://rockset.phacility.com/D5571
It's useful to be able to (optionally) associate key-value pairs with user-provided timestamps. This PR is an early effort towards this goal and continues the work of #4942. A suite of new unit tests exist in DBBasicTestWithTimestampWithParam. Support for timestamp requires the user to provide timestamp as a slice in
ReadOptions
andWriteOptions
. All timestamps of the same database must share the same length, format, etc. The format of the timestamp is the same throughout the same database, and the user is responsible for providing a comparator function (Comparator) to order the <key, timestamp> tuples. Once created, the format and length of the timestamp cannot change (at least for now).Test plan (on devserver):
All tests must pass.
We also run the following db_bench tests to verify whether there is regression on Get/Put while timestamp is not enabled.
Repeat for 6 times for both versions.
Results are as follows: