Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hide deprecated, inefficient block-based filter from public API #9535

Closed
wants to merge 3 commits into from

Conversation

pdillinger
Copy link
Contributor

@pdillinger pdillinger commented Feb 9, 2022

Summary: This change removes the ability to configure the deprecated,
inefficient block-based filter in the public API. Options that would
have enabled it now use "full" (and optionally partitioned) filters.
Existing block-based filters can still be read and used, and a "back
door" way to build them still exists, for testing and in case of trouble.

About the only way this removal would cause an issue for users is if
temporary memory for filter construction greatly increases. In
HISTORY.md we suggest a few possible mitigations: partitioned filters,
smaller SST files, or setting reserve_table_builder_memory=true.

Or users who have customized a FilterPolicy using the
CreateFilter/KeyMayMatch mechanism removed in #9501 will have to upgrade
their code. (It's long past time for people to move to the new
builder/reader customization interface.)

This change also introduces some internal-use-only configuration strings
for testing specific filter implementations while bypassing some
compatibility / intelligence logic. This is intended to hint at a path
toward making FilterPolicy Customizable, but it also gives us a "back
door" way to configure block-based filter.

Aside: updated db_bench so that -readonly implies -use_existing_db

Test Plan: Unit tests updated. Specifically,

  • BlockBasedTableTest.BlockReadCountTest is tweaked to validate the back
    door configuration interface and ignoring of use_block_based_builder.
  • BlockBasedTableTest.TracingGetTest is migrated from testing
    block-based filter access pattern to full filter access patter, by
    re-ordering some things.
  • Options test (pretty self-explanatory)

Performance test - create with ./db_bench -db=/dev/shm/rocksdb1 -bloom_bits=10 -cache_index_and_filter_blocks=1 -benchmarks=fillrandom -num=10000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 with and without -use_block_based_filter, which creates a DB with 21 SST files in L0. Read with ./db_bench -db=/dev/shm/rocksdb1 -readonly -bloom_bits=10 -cache_index_and_filter_blocks=1 -benchmarks=readrandom -num=10000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -duration=30

Without -use_block_based_filter: readrandom 464 ops/sec, 689280 KB DB
With -use_block_based_filter: readrandom 169 ops/sec, 690996 KB DB
No consistent difference with fillrandom

Summary: This change removes the ability to configure the deprecated,
inefficient block-based filter in the public API. Options that would
have enabled it now use "full" (and optionally partitioned) filters.
Existing block-based filters can still be read and used, and a "back
door" way to build them still exists, for testing and in case of trouble.

About the only way this removal would cause an issue for users is if
temporary memory for filter construction greatly increases. In
HISTORY.md we suggest a few possible mitigations: partitioned filters,
smaller SST files, or setting reserve_table_builder_memory=true.

Or users who have customized a FilterPolicy using the
CreateFilter/KeyMayMatch mechanism removed in facebook#9501 will have to upgrade
their code. (It's long past time for people to move to the new
builder/reader customization interface.)

This change also introduces some internal-use-only configuration strings
for testing specific filter implementations while bypassing some
compatibility / intelligence logic. This is intended to hint at a path
toward making FilterPolicy Customizable, but it also gives us a "back
door" way to configure block-based filter.

Test Plan: Unit tests updated. Specifically,

* BlockBasedTableTest.BlockReadCountTest is tweaked to validate the back
door configuration interface and ignoring of `use_block_based_builder`.
* BlockBasedTableTest.TracingGetTest is migrated from testing
block-based filter access pattern to full filter access patter, by
re-ordering some things.
* Options test (pretty self-explanatory)
Comment on lines 240 to +241
extern const FilterPolicy* NewBloomFilterPolicy(
double bits_per_key, bool use_block_based_builder = false);
double bits_per_key, bool IGNORED_use_block_based_builder = false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you create a separate signature that does not have the ignored flag and have this signature just call the other one? That will allow this API to be deprecate in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's not a material difference in source code compatibility, and we make almost no promises for binary compatibility, so I don't think it's necessary. I hope we can deprecate this whole function before long because it's ridiculous that this returns a raw pointer, especially because we track FilterPolicy using shared_ptrs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the function should be deprecated but we need to come up with the alternative.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future work that doesn't need to be in a major release.

options/options_test.cc Show resolved Hide resolved
// use_block_based_builder=true now ignored in public API (same as false)
ASSERT_OK(GetBlockBasedTableOptionsFromString(
config_options, table_opt, "filter_policy=bloomfilter:4:true", &new_opt));
bfp = dynamic_cast<const BloomFilterPolicy*>(new_opt.filter_policy.get());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be an assert to validate the cast worked?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the case is invalid, there will be a null deref, which for us reliably crashes in a discernible way. We always compile unit tests with RTTI.

@@ -4217,12 +4217,22 @@ class Benchmark {
table_options->filter_policy = BlockBasedTableOptions().filter_policy;
} else if (FLAGS_bloom_bits == 0) {
table_options->filter_policy.reset();
} else if (FLAGS_use_block_based_filter) {
// Use back-door way of enabling obsolete block-based Bloom
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also print a warning about using a deprecated flag/option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to be so cautious with RocksDB developers. The connection to public API is loose, and digging into the connection would reveal the current status.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would we go about deprecating command line arguments? Or do we expect them to "live forever" and just not do anything?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a developer tool like this, I recommend simply removing options that do nothing. For now, the command line option still works, e.g. for the performance test in this PR. In some months, we might be able to remove the "back door" option and the db_bench command line option. In some more months, we can arguably remove read support for block-based filter, as it's not critical data, just a performance booster.

@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@mrambacher mrambacher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@jay-zhuang jay-zhuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.👍👍
I hope we could also update all the tests that using deprecated NewBloomFilterPolicy(, true/false).

pdillinger added a commit to pdillinger/rocksdb that referenced this pull request Feb 15, 2022
Summary: Obsolete block-based filter no longer in public API, from facebook#9535

Test Plan: existing tests
facebook-github-bot pushed a commit that referenced this pull request Feb 15, 2022
Summary:
Obsolete block-based filter no longer in public API, from #9535

Pull Request resolved: #9569

Test Plan: existing tests

Reviewed By: jay-zhuang

Differential Revision: D34243579

Pulled By: pdillinger

fbshipit-source-id: ec5127d9bb9cc3f70501c531829a735bffdd1418
facebook-github-bot pushed a commit that referenced this pull request Jun 16, 2022
Summary:
In #9535, release 7.0, we hid the old block-based filter from being created using
the public API, because of its inefficiency. Although we normally maintain read compatibility
on old DBs forever, filters are not required for reading a DB, only for optimizing read
performance. Thus, it should be acceptable to remove this code and the substantial
maintenance burden it carries as useful features are developed and validated (such
as user timestamp).

This change completely removes the code for reading and writing the old block-based
filters, net removing about 1370 lines of code no longer needed. Options removed from
testing / benchmarking tools. The prior existence is only evident in a couple of places:
* `CacheEntryRole::kDeprecatedFilterBlock` - We can update this public API enum in
a major release to minimize source code incompatibilities.
* A warning is logged when an old table file is opened that used the old block-based
filter. This is provided as a courtesy, and would be a pain to unit test, so manual testing
should suffice. Unfortunately, sst_dump does not tell you whether a file uses
block-based filter, and the structure of the code makes it very difficult to fix.
* To detect that case, `kObsoleteFilterBlockPrefix` (renamed from `kFilterBlockPrefix`)
for metaindex is maintained (for now).

Other notes:
* In some cases where numbers are associated with filter configurations, we have had to
update the assigned numbers so that they all correspond to something that exists.
* Fixed potential stat counting bug by assuming `filter_checked = false` for cases
like `filter == nullptr` rather than assuming `filter_checked = true`
* Removed obsolete `block_offset` and `prefix_extractor` parameters from several
functions.
* Removed some unnecessary checks `if (!table_prefix_extractor() && !prefix_extractor)`
because the caller guarantees the prefix extractor exists and is compatible

Pull Request resolved: #10184

Test Plan:
tests updated, manually test new warning in LOG using base version to
generate a DB

Reviewed By: riversand963

Differential Revision: D37212647

Pulled By: pdillinger

fbshipit-source-id: 06ee020d8de3b81260ffc36ad0c1202cbf463a80
gitbw95 pushed a commit to gitbw95/rocksdb that referenced this pull request Jun 28, 2022
Summary:
In facebook#9535, release 7.0, we hid the old block-based filter from being created using
the public API, because of its inefficiency. Although we normally maintain read compatibility
on old DBs forever, filters are not required for reading a DB, only for optimizing read
performance. Thus, it should be acceptable to remove this code and the substantial
maintenance burden it carries as useful features are developed and validated (such
as user timestamp).

This change completely removes the code for reading and writing the old block-based
filters, net removing about 1370 lines of code no longer needed. Options removed from
testing / benchmarking tools. The prior existence is only evident in a couple of places:
* `CacheEntryRole::kDeprecatedFilterBlock` - We can update this public API enum in
a major release to minimize source code incompatibilities.
* A warning is logged when an old table file is opened that used the old block-based
filter. This is provided as a courtesy, and would be a pain to unit test, so manual testing
should suffice. Unfortunately, sst_dump does not tell you whether a file uses
block-based filter, and the structure of the code makes it very difficult to fix.
* To detect that case, `kObsoleteFilterBlockPrefix` (renamed from `kFilterBlockPrefix`)
for metaindex is maintained (for now).

Other notes:
* In some cases where numbers are associated with filter configurations, we have had to
update the assigned numbers so that they all correspond to something that exists.
* Fixed potential stat counting bug by assuming `filter_checked = false` for cases
like `filter == nullptr` rather than assuming `filter_checked = true`
* Removed obsolete `block_offset` and `prefix_extractor` parameters from several
functions.
* Removed some unnecessary checks `if (!table_prefix_extractor() && !prefix_extractor)`
because the caller guarantees the prefix extractor exists and is compatible

Pull Request resolved: facebook#10184

Test Plan:
tests updated, manually test new warning in LOG using base version to
generate a DB

Reviewed By: riversand963

Differential Revision: D37212647

Pulled By: pdillinger

fbshipit-source-id: 06ee020d8de3b81260ffc36ad0c1202cbf463a80
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants