Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PrefixSeek: ignore prefix_same_as_start when enable total_order_seek #11771

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

guoxiangCN
Copy link

@guoxiangCN guoxiangCN commented Aug 29, 2023

The following is a comment on the "prefix_same_as_start" configuration item found in the source code, which explicitly informs you to enable “total_order_seek”, It is invalid during seek, but in practical use, it was found that this parameter was not ignored.

 // Enforce that the iterator only iterates over the same prefix as the seek.
  // This option is effective only for prefix seeks, i.e. prefix_extractor is
  // non-null for the column family and total_order_seek is false.  Unlike
  // iterate_upper_bound, prefix_same_as_start only works within a prefix
  // but in both directions.
  // Default: false
  bool prefix_same_as_start;

In some scenarios, although we have set up PrefixExtractor, some keys are not within the Domain . When performing these queries with TotalOrderSeek, "prefix_same_as_start" will still take effect, and it will cause the program to abort due to SeekKey or LowerBound/UpperBound not being within the Domain, so abort when extract prefix.

@jowlyzhang jowlyzhang self-requested a review September 29, 2023 21:45
@jowlyzhang
Copy link
Contributor

jowlyzhang commented Sep 29, 2023

Thank you for this fix. You are right the documentation for this option is not aligned with the behavior. Specifically if users have this setting:
prefix_extractor != nullptr // prefix bloom filter is available
ReadOptions.total_order_seek = true // but do not use prefix bloom filter
ReadOptions.prefix_same_as_start = true // only iterate within the same prefix

The iterator will be set to invalid state when it encounters a key that doesn't have the same prefix as the target key. This behavior is not aligned with the documentation.

I think the original motivation for this option is to protect users that is using prefix bloom filter(total_order_seek = false) to not accidentally iterate out of the prefix bound because that can potentially give incorrect results. It might be even better to enforce prefix_same_as_start to be true in this case.

But it also doesn't hurt to allow users that are not using prefix bloom filter(total_order_seek = true) to explicitly express their need to just iterate within the target's prefix. For this, we can just fix the inconsistency in the documentation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants