Hide deprecated, inefficient block-based filter from public API #9535

pdillinger · 2022-02-09T20:59:18Z

Summary: This change removes the ability to configure the deprecated,
inefficient block-based filter in the public API. Options that would
have enabled it now use "full" (and optionally partitioned) filters.
Existing block-based filters can still be read and used, and a "back
door" way to build them still exists, for testing and in case of trouble.

About the only way this removal would cause an issue for users is if
temporary memory for filter construction greatly increases. In
HISTORY.md we suggest a few possible mitigations: partitioned filters,
smaller SST files, or setting reserve_table_builder_memory=true.

Or users who have customized a FilterPolicy using the
CreateFilter/KeyMayMatch mechanism removed in #9501 will have to upgrade
their code. (It's long past time for people to move to the new
builder/reader customization interface.)

This change also introduces some internal-use-only configuration strings
for testing specific filter implementations while bypassing some
compatibility / intelligence logic. This is intended to hint at a path
toward making FilterPolicy Customizable, but it also gives us a "back
door" way to configure block-based filter.

Aside: updated db_bench so that -readonly implies -use_existing_db

Test Plan: Unit tests updated. Specifically,

* BlockBasedTableTest.BlockReadCountTest is tweaked to validate the back
  door configuration interface and ignoring of `use_block_based_builder`.
* BlockBasedTableTest.TracingGetTest is migrated from testing
  block-based filter access pattern to full filter access pattern, by
  re-ordering some things.
* Options test (pretty self-explanatory)

Performance test: create with `./db_bench -db=/dev/shm/rocksdb1 -bloom_bits=10 -cache_index_and_filter_blocks=1 -benchmarks=fillrandom -num=10000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0` with and without `-use_block_based_filter`, which creates a DB with 21 SST files in L0.

Read with `./db_bench -db=/dev/shm/rocksdb1 -readonly -bloom_bits=10 -cache_index_and_filter_blocks=1 -benchmarks=readrandom -num=10000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -duration=30`

Without -use_block_based_filter: readrandom 464 ops/sec, 689280 KB DB
With -use_block_based_filter: readrandom 169 ops/sec, 690996 KB DB
No consistent difference with fillrandom
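
To put the two readrandom results side by side, the sketch below computes the throughput ratio and the relative DB size difference from the figures reported above. The constant and function names are illustrative only and do not come from db_bench.

```cpp
#include <cstdio>

// Figures copied from the benchmark results reported above.
constexpr double kFullFilterOps = 464.0;    // readrandom ops/sec, full filter
constexpr double kBlockBasedOps = 169.0;    // readrandom ops/sec, block-based
constexpr double kFullFilterKB = 689280.0;  // DB size in KB, full filter
constexpr double kBlockBasedKB = 690996.0;  // DB size in KB, block-based

// Ratio of full-filter read throughput to block-based read throughput.
double ThroughputRatio() { return kFullFilterOps / kBlockBasedOps; }

// Extra DB size of the block-based configuration, as a percentage.
double SizeOverheadPercent() {
  return (kBlockBasedKB - kFullFilterKB) / kFullFilterKB * 100.0;
}

// Prints both derived figures with two decimal places.
void PrintSummary() {
  std::printf("read throughput ratio: %.2fx\n", ThroughputRatio());
  std::printf("size overhead: %.2f%%\n", SizeOverheadPercent());
}
```

With these inputs, the full filter reads roughly 2.7 times faster, while the DB size differs by about a quarter of a percent.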

mrambacher · 2022-02-10T13:32:47Z

include/rocksdb/filter_policy.h

 extern const FilterPolicy* NewBloomFilterPolicy(
-    double bits_per_key, bool use_block_based_builder = false);
+    double bits_per_key, bool IGNORED_use_block_based_builder = false);


Should you create a separate signature that does not have the ignored flag and have this signature just call the other one? That would allow this API to be deprecated in the future.

There's not a material difference in source code compatibility, and we make almost no promises for binary compatibility, so I don't think it's necessary. I hope we can deprecate this whole function before long because it's ridiculous that this returns a raw pointer, especially because we track FilterPolicy using shared_ptrs.

I agree the function should be deprecated but we need to come up with the alternative.

Future work that doesn't need to be in a major release.
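
The two ideas in this thread (a separate signature without the ignored flag that the old one forwards to, and avoiding a raw owning pointer) can be sketched with stand-in types. This is only an illustration under stated assumptions: `FilterPolicy` here is a hypothetical minimal struct, not the RocksDB class, and `NewBloomFilterPolicyShared` is an invented name, not an existing RocksDB API.

```cpp
#include <memory>

// Hypothetical stand-in for a filter policy; not the RocksDB class.
struct FilterPolicy {
  explicit FilterPolicy(double bits) : bits_per_key(bits) {}
  double bits_per_key;
};

// Signature without the ignored flag (the shape suggested in review).
const FilterPolicy* NewBloomFilterPolicy(double bits_per_key) {
  return new FilterPolicy(bits_per_key);
}

// Legacy signature: the flag is accepted for source compatibility but
// ignored, and the call forwards to the overload above.
const FilterPolicy* NewBloomFilterPolicy(
    double bits_per_key, bool /*IGNORED_use_block_based_builder*/) {
  return NewBloomFilterPolicy(bits_per_key);
}

// Invented helper: take ownership immediately so that callers never hold
// a raw owning pointer.
std::shared_ptr<const FilterPolicy> NewBloomFilterPolicyShared(
    double bits_per_key) {
  return std::shared_ptr<const FilterPolicy>(
      NewBloomFilterPolicy(bits_per_key));
}
```

In this shape the legacy overload could later be marked deprecated on its own, without touching callers that already use the one-argument form.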

mrambacher · 2022-02-10T13:37:20Z

options/options_test.cc

+  // use_block_based_builder=true now ignored in public API (same as false)
+  ASSERT_OK(GetBlockBasedTableOptionsFromString(
+      config_options, table_opt, "filter_policy=bloomfilter:4:true", &new_opt));
+  bfp = dynamic_cast<const BloomFilterPolicy*>(new_opt.filter_policy.get());


Should there be an assert to validate the cast worked?

If the cast is invalid, there will be a null deref, which for us reliably crashes in a discernible way. We always compile unit tests with RTTI.
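
The exchange above turns on what a failed `dynamic_cast` does: for pointer casts it yields `nullptr`, so an unchecked dereference crashes while an explicit check fails cleanly. A minimal sketch with hypothetical stand-in classes (not RocksDB's own), where `BitsPerKeyOrNegative` is an invented helper:

```cpp
#include <memory>

// Hypothetical stand-ins; the base class must be polymorphic for
// dynamic_cast to work.
struct FilterPolicy {
  virtual ~FilterPolicy() = default;
};

struct BloomFilterPolicy : FilterPolicy {
  explicit BloomFilterPolicy(int bits) : bits_per_key(bits) {}
  int bits_per_key;
};

struct OtherFilterPolicy : FilterPolicy {};

// Returns bits_per_key if the policy is a BloomFilterPolicy, otherwise -1.
// The nullptr check is the explicit validation asked about in review.
int BitsPerKeyOrNegative(const std::shared_ptr<const FilterPolicy>& policy) {
  const auto* bfp = dynamic_cast<const BloomFilterPolicy*>(policy.get());
  if (bfp == nullptr) {
    return -1;  // wrong dynamic type, or no policy at all
  }
  return bfp->bits_per_key;
}
```

In a googletest-based test, the equivalent guard would be an `ASSERT_NE(bfp, nullptr)` placed before the first dereference.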

mrambacher · 2022-02-10T13:43:44Z

tools/db_bench_tool.cc

@@ -4217,12 +4217,22 @@ class Benchmark {
        table_options->filter_policy = BlockBasedTableOptions().filter_policy;
      } else if (FLAGS_bloom_bits == 0) {
        table_options->filter_policy.reset();
+      } else if (FLAGS_use_block_based_filter) {
+        // Use back-door way of enabling obsolete block-based Bloom


Should this also print a warning about using a deprecated flag/option?

I don't think we need to be so cautious with RocksDB developers. The connection to public API is loose, and digging into the connection would reveal the current status.

How would we go about deprecating command line arguments? Or do we expect them to "live forever" and just not do anything?

For a developer tool like this, I recommend simply removing options that do nothing. For now, the command line option still works, e.g. for the performance test in this PR. In some months, we might be able to remove the "back door" option and the db_bench command line option. In some more months, we can arguably remove read support for block-based filter, as it's not critical data, just a performance booster.

facebook-github-bot · 2022-02-10T21:39:31Z

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

mrambacher

LGTM

jay-zhuang

LGTM.👍👍
I hope we could also update all the tests that use the deprecated NewBloomFilterPolicy(, true/false).

Summary: Obsolete block-based filter no longer in public API, from #9535

Pull Request resolved: #9569

Test Plan: existing tests

Reviewed By: jay-zhuang
Differential Revision: D34243579
Pulled By: pdillinger
fbshipit-source-id: ec5127d9bb9cc3f70501c531829a735bffdd1418

Summary: In #9535, release 7.0, we hid the old block-based filter from being created using the public API, because of its inefficiency. Although we normally maintain read compatibility on old DBs forever, filters are not required for reading a DB, only for optimizing read performance. Thus, it should be acceptable to remove this code and the substantial maintenance burden it carries as useful features are developed and validated (such as user timestamp).

This change completely removes the code for reading and writing the old block-based filters, net removing about 1370 lines of code no longer needed. Options removed from testing / benchmarking tools. The prior existence is only evident in a couple of places:
* `CacheEntryRole::kDeprecatedFilterBlock` - We can update this public API enum in a major release to minimize source code incompatibilities.
* A warning is logged when an old table file is opened that used the old block-based filter. This is provided as a courtesy, and would be a pain to unit test, so manual testing should suffice. Unfortunately, sst_dump does not tell you whether a file uses block-based filter, and the structure of the code makes it very difficult to fix.
* To detect that case, `kObsoleteFilterBlockPrefix` (renamed from `kFilterBlockPrefix`) for metaindex is maintained (for now).

Other notes:
* In some cases where numbers are associated with filter configurations, we have had to update the assigned numbers so that they all correspond to something that exists.
* Fixed potential stat counting bug by assuming `filter_checked = false` for cases like `filter == nullptr` rather than assuming `filter_checked = true`
* Removed obsolete `block_offset` and `prefix_extractor` parameters from several functions.
* Removed some unnecessary checks `if (!table_prefix_extractor() && !prefix_extractor)` because the caller guarantees the prefix extractor exists and is compatible

Pull Request resolved: #10184

Test Plan: tests updated, manually test new warning in LOG using base version to generate a DB

Reviewed By: riversand963
Differential Revision: D37212647
Pulled By: pdillinger
fbshipit-source-id: 06ee020d8de3b81260ffc36ad0c1202cbf463a80
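
The `filter_checked` note in the commit message above can be illustrated with a small sketch: the flag starts out false and is set only when a filter was actually consulted, so a missing filter never increments filter statistics. Everything here (`ToyFilter`, `ToyStats`, `MayMatch`, the counter names) is hypothetical and does not reproduce the RocksDB code; an exact set stands in for a real Bloom filter.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical stand-in for a filter reader; an exact set replaces a
// real Bloom filter, so there are no false positives in this toy.
struct ToyFilter {
  std::unordered_set<std::string> keys;
  bool KeyMayMatch(const std::string& key) const {
    return keys.count(key) > 0;
  }
};

// Invented counters, loosely modeled on "filter consulted" and
// "filter ruled the key out" statistics.
struct ToyStats {
  int filter_checked_count = 0;
  int filter_useful_count = 0;
};

// Returns whether the key may be present. filter_checked defaults to
// false, so the filter == nullptr case is never counted as a check.
bool MayMatch(const ToyFilter* filter, const std::string& key,
              ToyStats* stats) {
  bool filter_checked = false;
  bool may_match = true;  // with no filter, the key must be assumed possible
  if (filter != nullptr) {
    filter_checked = true;
    may_match = filter->KeyMayMatch(key);
  }
  if (filter_checked) {
    ++stats->filter_checked_count;
    if (!may_match) {
      ++stats->filter_useful_count;
    }
  }
  return may_match;
}
```

Defaulting the flag to true instead would count lookups on filterless files as filter checks, which is the kind of miscount the commit message describes.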

pdillinger requested a review from jay-zhuang February 9, 2022 20:59

facebook-github-bot added the CLA Signed label Feb 9, 2022

pdillinger added 2 commits February 9, 2022 14:30

Fix LITE, db_bench aside

8b324da

Fix^2 LITE

ec51196

mrambacher reviewed Feb 10, 2022

mrambacher approved these changes Feb 11, 2022

jay-zhuang approved these changes Feb 12, 2022

facebook-github-bot closed this in 479eb1a Feb 12, 2022

pdillinger added a commit to pdillinger/rocksdb that referenced this pull request Feb 15, 2022

Update Java API for FilterPolicy changes

5f8ba15

This was referenced Feb 15, 2022

Update Java API for FilterPolicy changes #9569

Closed

Refactor FilterPolicies toward Customizable #9567

Closed

pdillinger mentioned this pull request Jun 16, 2022

Remove deprecated block-based filter #10184

Closed
