-
Notifications
You must be signed in to change notification settings - Fork 6.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use the comparator from the sst file table properties in sst_dump_tool #9491
Conversation
@@ -1079,7 +1080,7 @@ INSTANTIATE_TEST_CASE_P(CompactionIteratorWithAllowIngestBehindTestInstance, | |||
class CompactionIteratorTsGcTest : public CompactionIteratorTest { | |||
public: | |||
CompactionIteratorTsGcTest() | |||
: CompactionIteratorTest(test::ComparatorWithU64Ts()) {} | |||
: CompactionIteratorTest(rocksdb::ComparatorWithU64Ts()) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other things aside, we no longer hard-code the namespace name as rocksdb
any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed it.
I think this PR tries to address two issues:
IMO, we can split this PR into two for 1 and 2 respectively. For 1, we can create an SST file using For 2, we can spend more time thinking about the approach based on 1. |
@riversand963 this PR addresses 1 and validates with |
For 1, we do not have to move the two test comparators from test namespace to ROCKDB_NAMESPACE. The API additions are unnecessary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @satyajanga for the PR.
I think we can keep the two test comparators in the test namespace in this PR, and implement support for obtaining comparator name from table properties. With unit tests addition, I think we are in the right direction.
tools/sst_dump_test.cc
Outdated
@@ -90,7 +90,8 @@ class SSTDumpToolTest : public testing::Test { | |||
snprintf(usage[2], kOptLength, "--file=%s", file_path.c_str()); | |||
} | |||
|
|||
void createSST(const Options& opts, const std::string& file_name) { | |||
void createSST(const Options& opts, const std::string& file_name, | |||
bool reverse = false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of passing bool reverse
, I think we can pass a comparator name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I misread the code. Ignore this one.
tools/sst_dump_test.cc
Outdated
for (uint32_t i = 0; i < num_keys; i++) { | ||
tb->Add(MakeKey(i), MakeValue(i)); | ||
if (reverse) { | ||
for (uint32_t i = num_keys; i > 0; i--) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, 0 is not tested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I avoided it because the condition i>=0 will cause the infinite loop since i is unsigned.
I will change the i type int32_t
table/sst_file_dumper.cc
Outdated
table_properties_->comparator_name, | ||
&user_comparator); | ||
if (s.ok()) { | ||
internal_comparator_ = InternalKeyComparator(user_comparator, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we can put /*named=*/
before true
. Usually, we try to do this to help understand the code without relying on IDE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated it
Introduce a CountedFileSystem for counting file operations (facebook#9283)
util/comparator.cc
Outdated
|
||
/* Comparator with 64-bit integer timestamp. | ||
This is still experimental, use it with caution. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use //
for multi-line comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in the latest commit
/* Comparator with 64-bit integer timestamp. | ||
This is still experimental, use it with caution. | ||
*/ | ||
class ComparatorWithU64TsImpl : public Comparator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this comparator is now in ROCKSDB_NAMESPACE and will be exposed via public API, we need improve it more.
Currently, the constructor hard-codes BytewiseComparator()
, which is not necessary. We should be able to wrap an arbitrary comparator (without ts) using this class. When wrapping around a pointer to another comparator, we should consider object lifetime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its slightly tricky as I think about it.
If we can pass any arbitrarory comparator, we need a way to store/retrieve it from the sst file properties.
We need two things
- timestamp comparator -> we have it
- and the comparator without ts -> we dont have it If I understand correctly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe there is some confusion here.
Say we want to support BytewiseComparatorWithU64Ts
, ReverseBytewiseComparatorWithU64Ts
and FancyComparatorWithU64Ts()
. In the object registry, we have the following:
"leveldb.BytewiseComparator" => BytewiseComparator()
"rocksdb.ReverseBytewiseComparator" => ReverseBytewiseComparator()
"rocksdb.BytewiseComparator.u64ts" => BytewiseComparatorWithU64Ts() (this does not have to be API yet)
"rocksdb.ReverseBytewiseComparator.u64ts"=> ReverseBytewiseComparatorWithU64Ts() (does not have to be public API yet)
"rocksdb.FancyComparator.u64ts" => FancyComparatorWithU64Ts() // I made this up
BytewiseComparatorWithU64Ts()
, ReverseBytewiseComparatorWithU64Ts()
and FancyComparatorWithU64Ts()
can all use ComparatorWithU64TsImpl
class. In each of them, they can determine how they construct ComparatorWithU64TsImpl
object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated this based on the internal chat
include/rocksdb/comparator.h
Outdated
@@ -150,4 +150,6 @@ extern const Comparator* BytewiseComparator(); | |||
// ordering. | |||
extern const Comparator* ReverseBytewiseComparator(); | |||
|
|||
extern const Comparator* ComparatorWithU64Ts(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should put the comments about its limitation here, and explicitly state it's experimental and subject to change/removal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also we should think more about the API name and contract.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As commented earlier, we can avoid public API change
@satyajanga has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Thinking more about this. I still think we should avoid adding public API if there is a way to do with out it. Adding a new API to expose an unoptimized comparator binding with a specific timestamp implementation seems hacky to me. Maybe we can add it later after longer period of time and broader discussion.
|
@satyajanga has updated the pull request. You must reimport the pull request before landing. |
util/comparator.cc
Outdated
public: | ||
ComparatorWithU64TsImpl() | ||
: Comparator(/*ts_sz=*/sizeof(uint64_t)), | ||
cmp_without_ts_(BytewiseComparator()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to change constructor to pass a
ComparatorWithU64TsImpl(T ). In the constructor I will cast and assign to Comparator
@satyajanga has updated the pull request. You must reimport the pull request before landing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update. Left a few comments.
HISTORY.md
Outdated
@@ -30,6 +30,8 @@ | |||
|
|||
## New Features | |||
* Introduced an option `BlockBasedTableBuilder::detect_filter_construct_corruption` for detecting corruption during Bloom Filter (format_version >= 5) and Ribbon Filter construction. | |||
* Introduced a new experimental comparator for user defined timestamp feature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed it
include/rocksdb/comparator.h
Outdated
@@ -150,4 +150,6 @@ extern const Comparator* BytewiseComparator(); | |||
// ordering. | |||
extern const Comparator* ReverseBytewiseComparator(); | |||
|
|||
extern const Comparator* ComparatorWithU64Ts(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As commented earlier, we can avoid public API change
util/comparator.cc
Outdated
"T must be a inherited type of comparator"); | ||
|
||
public: | ||
ComparatorWithU64TsImpl() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add explicit
.
util/comparator.cc
Outdated
static const char* kClassName() { | ||
// Putting this up just for getting opinion, need to make this better. | ||
static char buf[128]; | ||
sprintf(buf, "%s.u64", T::kClassName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discourage the usage of sprint
. When you have to, use stringstream, especially when perf is not an issue here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the pointer modified it
util/comparator.cc
Outdated
} | ||
|
||
private: | ||
const Comparator* cmp_without_ts_{nullptr}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can actually be const T*
.
util/comparator.cc
Outdated
// EXPERIMENTAL | ||
// Comparator with 64-bit integer timestamp. | ||
// We did not performance test this yet. | ||
template <typename T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: TComp
or TComparator
or ComparatorType
.
Modified the diff based on our chat |
@satyajanga has updated the pull request. You must reimport the pull request before landing. |
1 similar comment
@satyajanga has updated the pull request. You must reimport the pull request before landing. |
@satyajanga has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
db/db_test2.cc
Outdated
@@ -17,6 +17,7 @@ | |||
#include "options/options_helper.h" | |||
#include "port/port.h" | |||
#include "port/stack_trace.h" | |||
#include "rocksdb/comparator.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed?
@@ -9,6 +9,7 @@ | |||
#include <vector> | |||
|
|||
#include "db/dbformat.h" | |||
#include "options/options_parser.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed?
const Comparator* BytewiseComparatorWithU64TsWrapper() { | ||
ConfigOptions config_options; | ||
const Comparator* user_comparator = nullptr; | ||
Status s = Comparator::CreateFromString( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to check s
so that the failing CircleCI test can pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or just do s.PermitUncheckedError()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added the s.PermitUncheckedError() for addressing circleci test failures
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this more, I think we may want to expose the error s
to caller via something like
Status GetBytewiseComparatorWithU64TsWrapper(const Comparator** cmp);
sorry for the late comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should definitely expose it if it is a public API or prod code,
Please remember that this is a test only function.
if we want to expose the Status then we probably dont need this util, because this wrapper is not saving much code.
Every usage in the test looks like below
const Comparator* user_comparator = nullptr;
Status s = BytewiseComparatorWithU64TsWrapper(&user_comparator);
assert(s.ok());
options.comparator = user_comparator;
Let me know if. you think we should do this.
tools/sst_dump_test.cc
Outdated
@@ -7,6 +7,10 @@ | |||
// Use of this source code is governed by a BSD-style license that can be | |||
// found in the LICENSE file. See the AUTHORS file for names of contributors. | |||
|
|||
#include <cstdint> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this here?
If we do, is it possible to remove the #include <stdint.h>
in line 16?
tools/sst_dump_test.cc
Outdated
for (uint32_t i = 0; i < num_keys; i++) { | ||
tb->Add(MakeKey(i), MakeValue(i)); | ||
const std::string comparator_name = ikc.user_comparator()->Name(); | ||
if (comparator_name == ReverseBytewiseComparator()->Name()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: very minor. there are several implicit data cast and string creation here. It may be easier to just use strcmp
for the two comparator's Name()
. Not a big deal for test.
tools/sst_dump_test.cc
Outdated
char buf[100]; | ||
snprintf(buf, sizeof(buf), "v_%04d", i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to wrap this logic in a helper function MakeInternalkey(...)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be more interesting to make the keys have different timestamps.
util/comparator.cc
Outdated
#include <memory> | ||
#include <mutex> | ||
#include <sstream> | ||
#include <string> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed.
util/comparator.cc
Outdated
@@ -12,9 +12,13 @@ | |||
#include <stdint.h> | |||
|
|||
#include <algorithm> | |||
#include <cstring> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not needed.
util/comparator.cc
Outdated
} | ||
|
||
TComparator cmp_without_ts_; | ||
std::string name_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this ever used?
util/comparator.cc
Outdated
// For TComparator = BytewiseComparator the return value is | ||
// "leveldb.BytewiseComparator.u64ts" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary and can be removed.
table_properties_->comparator_name, | ||
&user_comparator); | ||
if (s.ok()) { | ||
internal_comparator_ = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's assert user_comparator is not nullptr at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Also, can you remove the following from PR description
The PR description here will be the final commit message, thus we want it to be concise and precise. |
@satyajanga has updated the pull request. You must reimport the pull request before landing. |
Summary: We introduced a new Comparator for timestamp in user keys. In the sst_dump_tool by default we use BytewiseComparator to read sst files. This change allows us to read comparator_name from table properties in meta data block and use it to read. Test Plan: make check Reviewers: Subscribers: Tasks: Tags:
Summary: We introduced a new Comparator for timestamp in user keys. In the sst_dump_tool by default we use BytewiseComparator to read sst files. This change allows us to read comparator_name from table properties in meta data block and use it to read. Test Plan: make check Reviewers: Subscribers: Tasks: Tags:
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags:
- removed the public facing api - templatized the ComparatorWithU64Ts
@satyajanga has updated the pull request. You must reimport the pull request before landing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @satyajanga for the PR.
The changes LGTM. Can you also add a manual test to the test plan? For example, run
make -j16 sst_dump db_with_timestamp_basic_test
KEEP_DB=1 ./db_with_timestamp_basic_test --gtest_filter=<TestName>
Then use sst_dump on the resulting database.
./sst_dump --from=<from> --file=<>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM again. Thanks @satyajanga for the contribution
@satyajanga has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
1 similar comment
@satyajanga has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@satyajanga has updated the pull request. You must reimport the pull request before landing. |
@satyajanga has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
1 similar comment
@satyajanga has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary: rocksdb.compact_deletes testcase failed ``` sst_dump: rocksdb/table/sst_file_dumper.cc:147: my_rocksdb::Status my_rocksdb::SstFileDumper::GetTableReader(const std::string &): Assertion `user_comparator' failed. suite/rocksdb/t/sst_count_rows.sh: line 27: [: =: unary operator expected ``` The issue is related to Rocksdb PR: facebook/rocksdb#9491 "Use the comparator from the sst file table properties in sst_dump_tool". After this PR, sst_dump will use The comparator from myrocks sst file which is "RocksDB_SE_v3.10". the assert is triggered by rocksdb couldn't find comparator. similar to sst_mysql_ldb, register mysql comparator during startup Reviewed By: yizhang82 Differential Revision: D36391186 fbshipit-source-id: 520242b0a4db2dfbdc46c968457fa2c9e9bf44f4
facebook#9491) Summary: We introduced a new Comparator for timestamp in user keys. In the sst_dump_tool by default we use BytewiseComparator to read sst files. This change allows us to read comparator_name from table properties in meta data block and use it to read. Pull Request resolved: facebook#9491 Test Plan: added unittests for new functionality. make check ![image](https://user-images.githubusercontent.com/4923556/152915444-28b88a1f-7b4e-47d0-815f-7011552bd9a2.png) ![image](https://user-images.githubusercontent.com/4923556/152916196-bea3d2a1-a3d5-4362-b911-036131b83e8d.png) Reviewed By: riversand963 Differential Revision: D33993614 Pulled By: satyajanga fbshipit-source-id: 4b5cf938e6d2cb3931d763bef5baccc900b8c536 Signed-off-by: v01dstar <yang.zhang@pingcap.com>
facebook#9491) Summary: We introduced a new Comparator for timestamp in user keys. In the sst_dump_tool by default we use BytewiseComparator to read sst files. This change allows us to read comparator_name from table properties in meta data block and use it to read. Pull Request resolved: facebook#9491 Test Plan: added unittests for new functionality. make check ![image](https://user-images.githubusercontent.com/4923556/152915444-28b88a1f-7b4e-47d0-815f-7011552bd9a2.png) ![image](https://user-images.githubusercontent.com/4923556/152916196-bea3d2a1-a3d5-4362-b911-036131b83e8d.png) Reviewed By: riversand963 Differential Revision: D33993614 Pulled By: satyajanga fbshipit-source-id: 4b5cf938e6d2cb3931d763bef5baccc900b8c536 Signed-off-by: v01dstar <yang.zhang@pingcap.com>
Summary:
We introduced a new Comparator for timestamp in user keys. In the sst_dump_tool by default we use BytewiseComparator to read sst files. This change allows us to read comparator_name from table properties in meta data block and use it to read.
Test Plan:
added unittests for new functionality.
make check