-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hide deprecated, inefficient block-based filter from public API #9535
Conversation
Summary: This change removes the ability to configure the deprecated, inefficient block-based filter in the public API. Options that would have enabled it now use "full" (and optionally partitioned) filters. Existing block-based filters can still be read and used, and a "back door" way to build them still exists, for testing and in case of trouble. About the only way this removal would cause an issue for users is if temporary memory for filter construction greatly increases. In HISTORY.md we suggest a few possible mitigations: partitioned filters, smaller SST files, or setting reserve_table_builder_memory=true. Or users who have customized a FilterPolicy using the CreateFilter/KeyMayMatch mechanism removed in facebook#9501 will have to upgrade their code. (It's long past time for people to move to the new builder/reader customization interface.) This change also introduces some internal-use-only configuration strings for testing specific filter implementations while bypassing some compatibility / intelligence logic. This is intended to hint at a path toward making FilterPolicy Customizable, but it also gives us a "back door" way to configure block-based filter. Test Plan: Unit tests updated. Specifically, * BlockBasedTableTest.BlockReadCountTest is tweaked to validate the back door configuration interface and ignoring of `use_block_based_builder`. * BlockBasedTableTest.TracingGetTest is migrated from testing block-based filter access pattern to full filter access patter, by re-ordering some things. * Options test (pretty self-explanatory)
extern const FilterPolicy* NewBloomFilterPolicy( | ||
double bits_per_key, bool use_block_based_builder = false); | ||
double bits_per_key, bool IGNORED_use_block_based_builder = false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should you create a separate signature that does not have the ignored flag and have this signature just call the other one? That will allow this API to be deprecate in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's not a material difference in source code compatibility, and we make almost no promises for binary compatibility, so I don't think it's necessary. I hope we can deprecate this whole function before long because it's ridiculous that this returns a raw pointer, especially because we track FilterPolicy using shared_ptrs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the function should be deprecated but we need to come up with the alternative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Future work that doesn't need to be in a major release.
// use_block_based_builder=true now ignored in public API (same as false) | ||
ASSERT_OK(GetBlockBasedTableOptionsFromString( | ||
config_options, table_opt, "filter_policy=bloomfilter:4:true", &new_opt)); | ||
bfp = dynamic_cast<const BloomFilterPolicy*>(new_opt.filter_policy.get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be an assert to validate the cast worked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the case is invalid, there will be a null deref, which for us reliably crashes in a discernible way. We always compile unit tests with RTTI.
@@ -4217,12 +4217,22 @@ class Benchmark { | |||
table_options->filter_policy = BlockBasedTableOptions().filter_policy; | |||
} else if (FLAGS_bloom_bits == 0) { | |||
table_options->filter_policy.reset(); | |||
} else if (FLAGS_use_block_based_filter) { | |||
// Use back-door way of enabling obsolete block-based Bloom |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also print a warning about using a deprecated flag/option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to be so cautious with RocksDB developers. The connection to public API is loose, and digging into the connection would reveal the current status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would we go about deprecating command line arguments? Or do we expect them to "live forever" and just not do anything?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a developer tool like this, I recommend simply removing options that do nothing. For now, the command line option still works, e.g. for the performance test in this PR. In some months, we might be able to remove the "back door" option and the db_bench command line option. In some more months, we can arguably remove read support for block-based filter, as it's not critical data, just a performance booster.
@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.👍👍
I hope we could also update all the tests that using deprecated NewBloomFilterPolicy(, true/false)
.
Summary: Obsolete block-based filter no longer in public API, from facebook#9535 Test Plan: existing tests
Summary: In #9535, release 7.0, we hid the old block-based filter from being created using the public API, because of its inefficiency. Although we normally maintain read compatibility on old DBs forever, filters are not required for reading a DB, only for optimizing read performance. Thus, it should be acceptable to remove this code and the substantial maintenance burden it carries as useful features are developed and validated (such as user timestamp). This change completely removes the code for reading and writing the old block-based filters, net removing about 1370 lines of code no longer needed. Options removed from testing / benchmarking tools. The prior existence is only evident in a couple of places: * `CacheEntryRole::kDeprecatedFilterBlock` - We can update this public API enum in a major release to minimize source code incompatibilities. * A warning is logged when an old table file is opened that used the old block-based filter. This is provided as a courtesy, and would be a pain to unit test, so manual testing should suffice. Unfortunately, sst_dump does not tell you whether a file uses block-based filter, and the structure of the code makes it very difficult to fix. * To detect that case, `kObsoleteFilterBlockPrefix` (renamed from `kFilterBlockPrefix`) for metaindex is maintained (for now). Other notes: * In some cases where numbers are associated with filter configurations, we have had to update the assigned numbers so that they all correspond to something that exists. * Fixed potential stat counting bug by assuming `filter_checked = false` for cases like `filter == nullptr` rather than assuming `filter_checked = true` * Removed obsolete `block_offset` and `prefix_extractor` parameters from several functions. * Removed some unnecessary checks `if (!table_prefix_extractor() && !prefix_extractor)` because the caller guarantees the prefix extractor exists and is compatible Pull Request resolved: #10184 Test Plan: tests updated, manually test new warning in LOG using base version to generate a DB Reviewed By: riversand963 Differential Revision: D37212647 Pulled By: pdillinger fbshipit-source-id: 06ee020d8de3b81260ffc36ad0c1202cbf463a80
Summary: In facebook#9535, release 7.0, we hid the old block-based filter from being created using the public API, because of its inefficiency. Although we normally maintain read compatibility on old DBs forever, filters are not required for reading a DB, only for optimizing read performance. Thus, it should be acceptable to remove this code and the substantial maintenance burden it carries as useful features are developed and validated (such as user timestamp). This change completely removes the code for reading and writing the old block-based filters, net removing about 1370 lines of code no longer needed. Options removed from testing / benchmarking tools. The prior existence is only evident in a couple of places: * `CacheEntryRole::kDeprecatedFilterBlock` - We can update this public API enum in a major release to minimize source code incompatibilities. * A warning is logged when an old table file is opened that used the old block-based filter. This is provided as a courtesy, and would be a pain to unit test, so manual testing should suffice. Unfortunately, sst_dump does not tell you whether a file uses block-based filter, and the structure of the code makes it very difficult to fix. * To detect that case, `kObsoleteFilterBlockPrefix` (renamed from `kFilterBlockPrefix`) for metaindex is maintained (for now). Other notes: * In some cases where numbers are associated with filter configurations, we have had to update the assigned numbers so that they all correspond to something that exists. * Fixed potential stat counting bug by assuming `filter_checked = false` for cases like `filter == nullptr` rather than assuming `filter_checked = true` * Removed obsolete `block_offset` and `prefix_extractor` parameters from several functions. * Removed some unnecessary checks `if (!table_prefix_extractor() && !prefix_extractor)` because the caller guarantees the prefix extractor exists and is compatible Pull Request resolved: facebook#10184 Test Plan: tests updated, manually test new warning in LOG using base version to generate a DB Reviewed By: riversand963 Differential Revision: D37212647 Pulled By: pdillinger fbshipit-source-id: 06ee020d8de3b81260ffc36ad0c1202cbf463a80
Summary: This change removes the ability to configure the deprecated,
inefficient block-based filter in the public API. Options that would
have enabled it now use "full" (and optionally partitioned) filters.
Existing block-based filters can still be read and used, and a "back
door" way to build them still exists, for testing and in case of trouble.
About the only way this removal would cause an issue for users is if
temporary memory for filter construction greatly increases. In
HISTORY.md we suggest a few possible mitigations: partitioned filters,
smaller SST files, or setting reserve_table_builder_memory=true.
Or users who have customized a FilterPolicy using the
CreateFilter/KeyMayMatch mechanism removed in #9501 will have to upgrade
their code. (It's long past time for people to move to the new
builder/reader customization interface.)
This change also introduces some internal-use-only configuration strings
for testing specific filter implementations while bypassing some
compatibility / intelligence logic. This is intended to hint at a path
toward making FilterPolicy Customizable, but it also gives us a "back
door" way to configure block-based filter.
Aside: updated db_bench so that -readonly implies -use_existing_db
Test Plan: Unit tests updated. Specifically,
door configuration interface and ignoring of
use_block_based_builder
.block-based filter access pattern to full filter access patter, by
re-ordering some things.
Performance test - create with
./db_bench -db=/dev/shm/rocksdb1 -bloom_bits=10 -cache_index_and_filter_blocks=1 -benchmarks=fillrandom -num=10000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0
with and without-use_block_based_filter
, which creates a DB with 21 SST files in L0. Read with./db_bench -db=/dev/shm/rocksdb1 -readonly -bloom_bits=10 -cache_index_and_filter_blocks=1 -benchmarks=readrandom -num=10000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -duration=30
Without -use_block_based_filter: readrandom 464 ops/sec, 689280 KB DB
With -use_block_based_filter: readrandom 169 ops/sec, 690996 KB DB
No consistent difference with fillrandom