Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a bug in format_version 3 + partition filters + prefix search #5835

Closed

Conversation

maysamyabandeh
Copy link
Contributor

@maysamyabandeh maysamyabandeh commented Sep 20, 2019

Partitioned filters make use of a top-level index to find the partition in which the filter resides. The top-level index has a key per partition. The key is guaranteed to be larger or equal than any key in that partition. When used with format_version 3, which excludes the sequence number form index keys, the separator key in the index could be equal to the prefix of the keys in the next partition. In this way, when searching for the key, the top-level index will lead us to the previous partition, which has no key with that prefix. The prefix bloom test thus returns false, although the prefix exists in the bloom of the next partition.
The patch fixes that by always adding the prefix of the first key of the next partition to the bloom of the current partition. In this way, in the corner cases that the index will lead us to the previous partition, we still can find the bloom filter there.

Here is an example: Keys are P1K1, and P2K2, each in a separate block. The separator in the index is P2. When seek to prefix P2, the index matches with separator P2 and points to the block that contains P1K1, where there used to be no bloom of prefix P2.
This is not an issue with format_version 2, since separator will be <P2, MaxSeq> while the search key will be <P2, snapshot_seq>, and since in rocksdb keys are sorted in reverse order of their sequence number we have <P2, snapshot_seq> larger than <P2, MaxSeq>, and thus the index will lead the search to the next 2nd block, where we have filter for P2K2 and its prefix P2.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@@ -59,12 +59,13 @@ class FullFilterBlockBuilder : public FilterBlockBuilder {
virtual void AddKey(const Slice& key);
std::unique_ptr<FilterBitsBuilder> filter_bits_builder_;
virtual void Reset();
const SliceTransform* prefix_extractor_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of making it public, can you add an accesser instead?

@siying
Copy link
Contributor

siying commented Sep 21, 2019

I think the bug is more than in format_version 3, so I suggest we modify the word in HISTORY.md and PR title.

@siying
Copy link
Contributor

siying commented Sep 21, 2019

Btw, I don't think the fix is a hack. It's a solid solution.

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh has updated the pull request. Re-import the pull request

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh has updated the pull request. Re-import the pull request

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maysamyabandeh has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@maysamyabandeh
Copy link
Contributor Author

Discussed offline. The description is updated to explain why the bug does not manifest with format version 2.

@facebook-github-bot
Copy link
Contributor

@maysamyabandeh merged this pull request in 6652c94.

merryChris pushed a commit to merryChris/rocksdb that referenced this pull request Nov 18, 2019
…cebook#5835)

Summary:
Partitioned filters make use of a top-level index to find the partition in which the filter resides. The top-level index has a key per partition. The key is guaranteed to be larger or equal than any key in that partition. When used with format_version 3, which excludes the sequence number form index keys, the separator key in the index could be equal to the prefix of the keys in the next partition. In this way, when searching for the key, the top-level index will lead us to the previous partition, which has no key with that prefix. The prefix bloom test thus returns false, although the prefix exists in the bloom of the next partition.
The patch fixes that by a hack: It always adds the prefix of the first key of the next partition to the bloom of the current partition. In this way, in the corner cases that the index will lead us to the previous partition, we still can find the bloom filter there.
Pull Request resolved: facebook#5835

Differential Revision: D17513585

Pulled By: maysamyabandeh

fbshipit-source-id: e2d1ff26c759e6e03875c4d57f4228316ecf50e9
pdillinger added a commit to pdillinger/rocksdb that referenced this pull request Jul 17, 2024
Summary: Basically, the fix in facebook#8137 was incomplete (and I missed it in
the review), because if `whole_key_filtering` is false, then
`last_prefix_str_` will never be set to non-empty and the fix doesn't
work. Also related to facebook#5835.

This is intended as a safe, simple fix that will regress CPU efficiency
slightly (for `whole_key_filtering=false` cases, because of extra prefix
string copies during flush & compaction). An efficient fix is not
possible without some substantial refactoring.

Also in this PR: new test DBBloomFilterTest.FilterNumEntriesCoalesce
tests an adjacent code path that was previously untested for its effect
of ensuring the number of unique prefixes and keys is tracked properly
when both prefixes and whole keys are going into a filter. (Test fails
when either of the two code segments checking for duplicates is
disabled.) In addition, the same test would fail before the main bug fix
here because the code would inappropriately add the empty string to the
filter (because of unmodified `last_prefix_str_`).

Test Plan: In addition to DBBloomFilterTest.FilterNumEntriesCoalesce,
extended DBBloomFilterTest.SeekForPrevWithPartitionedFilters to cover
the broken case. (Mostly whitespace change.)
facebook-github-bot pushed a commit that referenced this pull request Jul 17, 2024
…2872)

Summary:
Basically, the fix in #8137 was incomplete (and I missed it in the review), because if `whole_key_filtering` is false, then `last_prefix_str_` will never be set to non-empty and the fix doesn't work. Also related to #5835.

This is intended as a safe, simple fix that will regress CPU efficiency slightly (for `whole_key_filtering=false` cases, because of extra prefix string copies during flush & compaction). An efficient fix is not possible without some substantial refactoring.

Also in this PR: new test DBBloomFilterTest.FilterNumEntriesCoalesce tests an adjacent code path that was previously untested for its effect of ensuring the number of unique prefixes and keys is tracked properly when both prefixes and whole keys are going into a filter. (Test fails when either of the two code segments checking for duplicates is disabled.) In addition, the same test would fail before the main bug fix here because the code would inappropriately add the empty string to the filter (because of unmodified `last_prefix_str_`).

Pull Request resolved: #12872

Test Plan: In addition to DBBloomFilterTest.FilterNumEntriesCoalesce, extended DBBloomFilterTest.SeekForPrevWithPartitionedFilters to cover the broken case. (Mostly whitespace change.)

Reviewed By: jowlyzhang

Differential Revision: D59873793

Pulled By: pdillinger

fbshipit-source-id: 2a7b7f09ca73dc188fb4dab833826ad6da7ebb11
pdillinger added a commit to pdillinger/rocksdb that referenced this pull request Jul 18, 2024
…cebook#12872)

Summary:
Basically, the fix in facebook#8137 was incomplete (and I missed it in the review), because if `whole_key_filtering` is false, then `last_prefix_str_` will never be set to non-empty and the fix doesn't work. Also related to facebook#5835.

This is intended as a safe, simple fix that will regress CPU efficiency slightly (for `whole_key_filtering=false` cases, because of extra prefix string copies during flush & compaction). An efficient fix is not possible without some substantial refactoring.

Also in this PR: new test DBBloomFilterTest.FilterNumEntriesCoalesce tests an adjacent code path that was previously untested for its effect of ensuring the number of unique prefixes and keys is tracked properly when both prefixes and whole keys are going into a filter. (Test fails when either of the two code segments checking for duplicates is disabled.) In addition, the same test would fail before the main bug fix here because the code would inappropriately add the empty string to the filter (because of unmodified `last_prefix_str_`).

Pull Request resolved: facebook#12872

Test Plan: In addition to DBBloomFilterTest.FilterNumEntriesCoalesce, extended DBBloomFilterTest.SeekForPrevWithPartitionedFilters to cover the broken case. (Mostly whitespace change.)

Reviewed By: jowlyzhang

Differential Revision: D59873793

Pulled By: pdillinger

fbshipit-source-id: 2a7b7f09ca73dc188fb4dab833826ad6da7ebb11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants