Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fail DB::Open if hashing features enabled with incompatible Comparator #10293

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

pdillinger
Copy link
Contributor

Summary: RocksDB allows Comparators to treat keys with different byte
contents as equal, if that's appropriate for the application, but that
makes the comparator incompatible with a number of hashing-based
features that assume equal keys generate the same hash. Before now, the
only protection against getting wrong results with such incompatible
combinations is quietly disabling data block hash index if the
Comparator returns true for CanKeysWithDifferentByteContentsBeEqual().

This change attempts to detect all the cases in which RocksDB can
produce wrong results with such a Comparator and fail DB::Open with an
appropriate status. (More nuiance in HISTORY entry.) Many of the cases
were found by expanding the unit test and seeing what fails in data
correctness.

Also included:

  • Override CanKeysWithDifferentByteContentsBeEqual() for most of the
    testing comparators, as appropriate. (Didn't find any "bad configuration"
    cases in unit tests.)
  • Allow setting can_keys_with_different_byte_contents_be_equal in C API,
    and move name to be a char* rather than a function pointer to a
    function returning char*. (What would be the reason for that complexity?)
  • Include memtable Bloom in standard testing configurations, by adding to
    an existing configuration. (Now kVectorRepAndMemtableBloom)
  • Clarify API comment for ReverseBytewiseComparator
  • Rename a testing comparator that was named like
    ReverseBytewiseComparator but actually quite different.

Related to #10256

Test Plan: substantially updated unit tests

Summary: RocksDB allows Comparators to treat keys with different byte
contents as equal, if that's appropriate for the application, but that
makes the comparator incompatible with a number of hashing-based
features that assume equal keys generate the same hash. Before now, the
only protection against getting wrong results with such incompatible
combinations is quietly disabling data block hash index if the
Comparator returns true for `CanKeysWithDifferentByteContentsBeEqual()`.

This change attempts to detect all the cases in which RocksDB can
produce wrong results with such a Comparator and fail DB::Open with an
appropriate status. (More nuiance in HISTORY entry.)

Also included:
* ...

Test Plan: substantially updated unit tests
@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@hx235 hx235 self-requested a review July 8, 2022 20:19
@pdillinger pdillinger requested review from hx235 and removed request for hx235 July 8, 2022 20:21
@isaac-io
Copy link

@pdillinger note that the added checks make the check in the BlockBasedTableBuilder::Rep contructor redundant.

Also, a possible issue is that with the added checks, Java code that calls OptimizeForPointQuery() with a custom comparator will be broken because CanKeysWithDifferentByteContentsBeEqual() is not overridable on the Java side, so the default implementation which returns true kicks in.

@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. You must reimport the pull request before landing.

int (*compare_ts_)(void*, const char* a_ts, size_t a_tslen, const char* b_ts,
size_t b_tslen);
int (*compare_without_ts_)(void*, const char* a, size_t alen,
unsigned char a_has_ts, const char* b, size_t blen,
unsigned char b_has_ts);
const char* name_;
unsigned char ckwdbcbe_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we replace this with the full name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The full name on snake case is really really long. It's easy to see what it means with code navigation

{
Status s = TryReopen(new_options);
if (expect_open_failure) {
ASSERT_NOK(s);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we at least assert the NOK status is InvalidArg to prevent false positive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do that

HISTORY.md Show resolved Hide resolved
HISTORY.md Show resolved Hide resolved
@hx235 hx235 self-requested a review July 15, 2022 10:41
Copy link
Contributor

@hx235 hx235 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for a very helpful PR summary! Mostly LGTM with minor comments!

Before approving, just want to double check with the author on any thoughts about how this might affect internal users (https://github.com/facebook/rocksdb/pull/10293/files?show-viewed-files=true&file-filters%5B%5D=#r922044847) - we can chat more internally offline.

Thanks!

@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. You must reimport the pull request before landing.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@pdillinger has updated the pull request. You must reimport the pull request before landing.

@pdillinger
Copy link
Contributor Author

I'm considering putting this on hold because

  • It's not totally clear to what users should do if they only provide certain kinds of keys as input and expect the comparator to only deal with those keys. In that case, you might have keys with different bytes that would be accepted and compare as equal, but the user never intends to provide such keys. (For example, fixed size keys.) What should CanKeysWithDifferentByteContentsBeEqual return in such cases? Also, the comparator has to accept prefixes (from prefix_extractor), separators (from FindShortestSeparator), and successors (from FindShortSuccessor).
  • As mentioned in No way to have a custom comparator with a filter policy in RocksDB 7.x #10256, the previous mechanism (awkward, incomplete) for specifying a hash function to deal with equivalence classes among keys was removed in 7.0. It might be preferable to simply provide a mechanism for a Comparator to provide a custom hash function and have all the hashing-based structures use that. To minimize virtual call overhead, we could compute the hash once per op and carry it through calls to various table readers, etc. We probably still want to make it difficult to accidentally misuse hashing features with a custom comparator, so likely keep this change in some form.

@pdillinger
Copy link
Contributor Author

Also, the comparator has to accept prefixes (from prefix_extractor)

Correction: only if they are provided as Seek key

@ajkr
Copy link
Contributor

ajkr commented Jul 22, 2022

It's not totally clear to what users should do if they only provide certain kinds of keys as input and expect the comparator to only deal with those keys.

The Comparator mentions providing ordering for slices "used as keys in an sstable or a database". Should CanKeysWithDifferentByteContentsBeEqual() confine itself similarly, in which case users in the scenario you mentioned should return false? (It's not explicit what keys in an sstable are, but I'm guessing it refers to FindShortestSeparator().)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants