Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a seek issue with prefix extractor and timestamp #7644

Closed
wants to merge 7 commits into from

Conversation

jay-zhuang
Copy link
Contributor

During seek, prefix compare should not include timestamp.

Test Plan: added unittest

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jay-zhuang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@riversand963 riversand963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jay-zhuang for the fix. Mostly LGTM except a comment on test.

Options options = CurrentOptions();
options.env = env_;
options.create_if_missing = true;
options.prefix_extractor.reset(NewFixedPrefixTransform(3));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a prefix extractor that does not simply extract a prefix from user key?
If the prefix extractor takes <user_key, ts> and just returns a substring of user_key, then the fix introduced in this PR won't actually make a difference.
Is there a prefix extractor that takes <user_key, ts> and returns a result based on both user_key and ts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never thought about such usecase. Are you saying there could be usecase like: get prefix with the whole key + part of timestamp?
For example, with the following data:

foo1, ts1 -> value1
foo1, ts2 -> value2
foo1, ts3 -> value3

And prefix extractor is: key + first_part_of(ts). Is that correct? Or a valid case? And I assume the prefix extractor can only return a prefix of the user key, so to include timestamp, it has to include the whole key. Is that right?

This unittest just to reproduce the issue that the prefix key is compared with CompareWithoutTimestamp() instead of Compare().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe so, according to a discussion with @pdillinger. #7589 (comment)

Some of our existing code feeds <user_key, ts> as input to prefix extractor, and we cannot assume that the ts part does not affect output of prefix extraction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should include timestamp for prefix extractor:

  1. it's an internal implementation detail that we should not expose to the user. That we have timestamp right after user key and how the timestamp is encoded (I think you mentioned that we might remove timestamp if the data is on the bottom level like seq_num and maybe delta encoding?);
  2. All user's existing prefix extractors need to update to handle the timestamp, as the prefix may or may not include the timestamp:
    explicit FixedPrefixTransform(size_t prefix_len)
  3. I don't feel it's a useful usecase (if it's still valid) to include timestamp in prefix extractor. If we do need it, we should have a new prefix extractor interface to explicitly having the timestamp.

What you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion, but I don't think the unit test is actually testing what we meant to fix. In fact, if just change version_set.cc as follows, the unit test still can pass.

diff --git a/db/version_set.cc b/db/version_set.cc
index 0fe09ef34..1d4ef1853 100644
--- a/db/version_set.cc
+++ b/db/version_set.cc
@@ -1100,9 +1100,9 @@ void LevelIterator::Seek(const Slice& target) {
     Slice file_user_key = ExtractUserKey(file_iter_.key());
     if (prefix_extractor_->InDomain(target_user_key) &&
         (!prefix_extractor_->InDomain(file_user_key) ||
-         user_comparator_.Compare(
-             prefix_extractor_->Transform(target_user_key),
-             prefix_extractor_->Transform(file_user_key)) != 0)) {
+         user_comparator_.CompareWithoutTimestamp(
+             prefix_extractor_->Transform(target_user_key), false,
+             prefix_extractor_->Transform(file_user_key), false) != 0)) {
       SetFileIterator(nullptr);
     }
   }

Namely, without stripping the timestamp from user key, the unit test can still pass.
I was saying we need a test case in which your change will make the unit pass, but the above will not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original comment was proposing a test case which, without your change, will mistakenly include timestamp for prefix extraction and yield a different result.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. Added one.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// db.NewIterator(). For prefix filtering to work properly,
// "prefix_extractor" and "comparator" must be such that the following
// properties hold:
//
// 1) key.starts_with(prefix(key))
// 2) Compare(prefix(key), key) <= 0.
// 3) If Compare(k1, k2) <= 0, then Compare(prefix(k1), prefix(k2)) <= 0
// 4) prefix(prefix(key)) == prefix(key)

I think we have to assume things might break if "prefix" is not actually a prefix of user key (without timestamp), but I haven't thought through what might break.

@riversand963
Copy link
Contributor

Or, we can keep unit test, but change only one line in version_set.cc as follows

diff --git a/db/version_set.cc b/db/version_set.cc
index 0fe09ef34..1d4ef1853 100644
--- a/db/version_set.cc
+++ b/db/version_set.cc
@@ -1100,9 +1100,9 @@ void LevelIterator::Seek(const Slice& target) {
     Slice file_user_key = ExtractUserKey(file_iter_.key());
     if (prefix_extractor_->InDomain(target_user_key) &&
         (!prefix_extractor_->InDomain(file_user_key) ||
-         user_comparator_.Compare(
-             prefix_extractor_->Transform(target_user_key),
-             prefix_extractor_->Transform(file_user_key)) != 0)) {
+         user_comparator_.CompareWithoutTimestamp(
+             prefix_extractor_->Transform(target_user_key), false,
+             prefix_extractor_->Transform(file_user_key), false) != 0)) {
       SetFileIterator(nullptr);
     }
   }

Then, we update the PR description so that it describes the real fix here: change Compre to CompareWithoutTimestamp.
Then, I think this PR is ready to go.

Copy link
Contributor

@riversand963 riversand963 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jay-zhuang for the fix!

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

@jay-zhuang
Copy link
Contributor Author

Or, we can keep unit test, but change only one line in version_set.cc as follows

diff --git a/db/version_set.cc b/db/version_set.cc
index 0fe09ef34..1d4ef1853 100644
--- a/db/version_set.cc
+++ b/db/version_set.cc
@@ -1100,9 +1100,9 @@ void LevelIterator::Seek(const Slice& target) {
     Slice file_user_key = ExtractUserKey(file_iter_.key());
     if (prefix_extractor_->InDomain(target_user_key) &&
         (!prefix_extractor_->InDomain(file_user_key) ||
-         user_comparator_.Compare(
-             prefix_extractor_->Transform(target_user_key),
-             prefix_extractor_->Transform(file_user_key)) != 0)) {
+         user_comparator_.CompareWithoutTimestamp(
+             prefix_extractor_->Transform(target_user_key), false,
+             prefix_extractor_->Transform(file_user_key), false) != 0)) {
       SetFileIterator(nullptr);
     }
   }

Then, we update the PR description so that it describes the real fix here: change Compre to CompareWithoutTimestamp.
Then, I think this PR is ready to go.

I added an unit test to cover the case that prefix extractor includes timestamp, which will return invalid result if timestamp is included. As prefix bloom filter uses prefix without timestamp, so as here.

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jay-zhuang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@jay-zhuang jay-zhuang changed the title Fix a seek issue with timestamp Fix a seek issue with prefix extractor and timestamp Nov 10, 2020
@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jay-zhuang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@jay-zhuang has updated the pull request. You must reimport the pull request before landing.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jay-zhuang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@jay-zhuang merged this pull request in 18aee7d.

codingrhythm pushed a commit to SafetyCulture/rocksdb that referenced this pull request Mar 5, 2021
Summary:
During seek, prefix compare should not include timestamp.

Pull Request resolved: facebook#7644

Test Plan: added unittest

Reviewed By: riversand963

Differential Revision: D24772066

Pulled By: jay-zhuang

fbshipit-source-id: 3982655a8bf8da256a738e8497b73b3d9bdac92e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants