-
Notifications
You must be signed in to change notification settings - Fork 6.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support skipping filters in custom FilterPolicy, near-zero bpk #8250
Conversation
Summary: I am in the process of building the next generation of APIs for configuring filters. To get there, we need to increase the power of custom filter policies, in the information they have available (facebook#8246) and in the actions they can take in generating filters (this change). Here we give custom FilterPolicy the power to specify that generating filters should be skipped in some contexts. Previously, a non-null FilterPolicy would have to build some filter in all cases (with FilterBitsBuilder), except when optimize_filters_for_hits applies, which is out of control of the FilterPolicy. With this change, the FilterPolicy has the same power to disable filters completely in some contexts as optimize_filters_for_hits does. With facebook#8246 and this change, a FilterPolicy can implement the same behavior as optimize_filters_for_hits, obviously with more freedom to tweak the behavior. The new API support for this is a bit ugly for API compatibility reasons (see comment on FilterBitsBuilder::ShouldSkipFilter) but it should work fine in practice. Here we also change the behavior of passing < 0.5 bits_per_key to NewBloomFilterPolicy (or NewExperimentalRibbonFilterPolicy), to skip generating a filter instead of rounding up to 1 bit per key. The motivation doesn't really have to do with the current API but imagines a future API in which some mathematics determines a "best" bits_per_key for each level, perhaps based on a min and max range. In such a scenario, to avoid filter lookup overhead (e.g. block cache CPU overhead) in cases where it would be unproductive, we round bpk < 0.5 down to 0 (and SKIP generating a filter) and round 0.5 <= bpk < 1.0 up to 1.0. (We can tweak these details later.) To make the current (soon to be "old") API a simple instantiation of the new API, which simplifies testing considerably, we preliminarily apply this same strategy to the current API. Thus the new API will be a pure generalization. Also included: I noticed that the FilterBitsBuilder API docs do not mention re-use but re-use is required for building partitioned filters. API docs updated to call out that requirement. Test Plan: For the behavior change on < 0.5 bits per key, FullBloomTest::FilterSize is updated. Added test DBBloomFilterTestWithParam::SkipFilterOnEssentiallyZeroBpk uses this setting and custom API to ensure that filter construction and query is actually skipped, vs. using a trivial filter (which is tested for comparison).
@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator. |
It is not so simple. |
DestroyAndReopen(options); | ||
PutAndGetFn(); | ||
|
||
// Verify no filter access nor contruction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Construction".
How is this validating not constructed?
EXPECT_EQ(TestGetTickerCount(options, BLOOM_FILTER_FULL_POSITIVE), 0); | ||
EXPECT_EQ(TestGetTickerCount(options, BLOOM_FILTER_FULL_POSITIVE), 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for the same test 2x?
EXPECT_EQ(TestGetTickerCount(options, BLOOM_FILTER_FULL_POSITIVE), 0); | ||
EXPECT_EQ(TestGetTickerCount(options, BLOOM_FILTER_FULL_POSITIVE), 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same test twice?
use_delta_encoding_for_index_values, p_index_builder, partition_size); | ||
} else { | ||
return new FullFilterBlockBuilder(mopt.prefix_extractor.get(), | ||
table_opt.whole_key_filtering, | ||
filter_bits_builder); | ||
filter_bits_builder.release()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than the released pointer, maybe have a std::unique_ptr<>&& passed in and use std::move? That is what is done in most places in the code.
Personally I find rounding |
size_t CalculateSpace(size_t) override { return 6U; } | ||
double EstimatedFpRate(size_t, size_t) override { return 1.0; } | ||
size_t ApproximateNumEntries(size_t) override { return SIZE_MAX; } | ||
void AddKey(const Slice&) override {} | ||
Slice Finish(std::unique_ptr<const char[]>* buf) override { | ||
// Interpreted as "always true" filter (0 probes over 1 byte of | ||
// payload, 5 bytes metadata) | ||
buf->reset(new char[6]{}); | ||
return Slice(buf->get(), 6); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: define size 6.
}; | ||
|
||
// See FilterBitsBuilder::ShouldSkipFilter | ||
class SkipFilterBitsBuilder : public AlwaysTrueBitsBuilder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should SkipFilterBitsBuilder
based on AlwaysTrueBitsBuilder
or it could inherit from BuiltinFilterBitsBuilder
directly?
Summary: * Inefficient block-based filter is no longer customizable in the public API, though (for now) can still be enabled. * Removed deprecated FilterPolicy::CreateFilter() and FilterPolicy::KeyMayMatch() * Removed `rocksdb_filterpolicy_create()` from C API * Change meaning of nullptr return from GetBuilderWithContext() from "use block-based filter" to "generate no filter in this case." This is a cleaner solution to the proposal in facebook#8250. * Also, when user specifies bits_per_key < 0.5, we now round this down to "no filter" because we expect a filter with >= 80% FP rate is unlikely to be worth the CPU cost of accessing it (esp with cache_index_and_filter_blocks=1 or partition_filters=1). * bits_per_key >= 0.5 and < 1.0 is still rounded up to 1.0 (for 62% FP rate) * This also gives us some support for configuring filters from OPTIONS file as currently saved: `filter_policy=rocksdb.BuiltinBloomFilter`. Opening from such an options file will enable reading filters (an improvement) but not writing new ones. (See Customizable follow-up below.) * Also removed deprecated functions * FilterBitsBuilder::CalculateNumEntry() * FilterPolicy::GetFilterBitsBuilder() * NewExperimentalRibbonFilterPolicy() * Remove default implementations of * FilterBitsBuilder::EstimateEntriesAdded() * FilterBitsBuilder::ApproximateNumEntries() * FilterPolicy::GetBuilderWithContext() * Remove support for "filter_policy=experimental_ribbon" configuration string. * Allow "filter_policy=bloomfilter:n" without bool to discourage use of block-based filter. Likely follow-up (later PRs): * Refactoring toward FilterPolicy Customizable, so that we can generate filters with same configuration as before when configuring from options file. * Remove support for user enabling block-based filter (ignore `bool use_block_based_builder`) * Some months after this change, we could even remove read support for block-based filter, because it is not critical to DB data preservation. * Make FilterBitsBuilder::FinishV2 to avoid `using FilterBitsBuilder::Finish` mess and add support for specifying a MemoryAllocator (for cache warming) Test Plan: A number of obsolete tests deleted and new tests or test cases added or updated.
Summary: * Inefficient block-based filter is no longer customizable in the public API, though (for now) can still be enabled. * Removed deprecated FilterPolicy::CreateFilter() and FilterPolicy::KeyMayMatch() * Removed `rocksdb_filterpolicy_create()` from C API * Change meaning of nullptr return from GetBuilderWithContext() from "use block-based filter" to "generate no filter in this case." This is a cleaner solution to the proposal in #8250. * Also, when user specifies bits_per_key < 0.5, we now round this down to "no filter" because we expect a filter with >= 80% FP rate is unlikely to be worth the CPU cost of accessing it (esp with cache_index_and_filter_blocks=1 or partition_filters=1). * bits_per_key >= 0.5 and < 1.0 is still rounded up to 1.0 (for 62% FP rate) * This also gives us some support for configuring filters from OPTIONS file as currently saved: `filter_policy=rocksdb.BuiltinBloomFilter`. Opening from such an options file will enable reading filters (an improvement) but not writing new ones. (See Customizable follow-up below.) * Also removed deprecated functions * FilterBitsBuilder::CalculateNumEntry() * FilterPolicy::GetFilterBitsBuilder() * NewExperimentalRibbonFilterPolicy() * Remove default implementations of * FilterBitsBuilder::EstimateEntriesAdded() * FilterBitsBuilder::ApproximateNumEntries() * FilterPolicy::GetBuilderWithContext() * Remove support for "filter_policy=experimental_ribbon" configuration string. * Allow "filter_policy=bloomfilter:n" without bool to discourage use of block-based filter. Some pieces for #9389 Likely follow-up (later PRs): * Refactoring toward FilterPolicy Customizable, so that we can generate filters with same configuration as before when configuring from options file. * Remove support for user enabling block-based filter (ignore `bool use_block_based_builder`) * Some months after this change, we could even remove read support for block-based filter, because it is not critical to DB data preservation. * Make FilterBitsBuilder::FinishV2 to avoid `using FilterBitsBuilder::Finish` mess and add support for specifying a MemoryAllocator (for cache warming) Pull Request resolved: #9501 Test Plan: A number of obsolete tests deleted and new tests or test cases added or updated. Reviewed By: hx235 Differential Revision: D34008011 Pulled By: pdillinger fbshipit-source-id: a39a720457c354e00d5b59166b686f7f59e392aa
Obsolete with 9501 |
Summary: I am in the process of building the next generation of APIs for
configuring filters. To get there, we need to increase the power of
custom filter policies, in the information they have available (#8246)
and in the actions they can take in generating filters (this change).
Here we give custom FilterPolicy the power to specify that generating
filters should be skipped in some contexts. Previously, a non-null
FilterPolicy would have to build some filter in all cases (with
FilterBitsBuilder), except when optimize_filters_for_hits applies, which
is out of control of the FilterPolicy. With this change, the
FilterPolicy has the same power to disable filters completely in some
contexts as optimize_filters_for_hits does. With #8246 and this change,
a FilterPolicy can implement the same behavior as
optimize_filters_for_hits, obviously with more freedom to tweak the
behavior.
The new API support for this is a bit ugly for API compatibility reasons
(see comment on FilterBitsBuilder::ShouldSkipFilter) but it should work
fine in practice.
Here we also change the behavior of passing < 0.5 bits_per_key to
NewBloomFilterPolicy (or NewExperimentalRibbonFilterPolicy), to skip
generating a filter instead of rounding up to 1 bit per key. The
motivation doesn't really have to do with the current API but imagines a
future API in which some math determines a "best" bits_per_key
for each level, perhaps based on a min and max range. In such a
scenario, to avoid filter lookup overhead (e.g. block cache CPU
overhead) in cases where it would be unproductive, we round bpk < 0.5
down to 0 (and SKIP generating a filter) and round 0.5 <= bpk < 1.0 up
to 1.0. (We can tweak these details later.) To make the current (soon to
be "old") API a simple instantiation of the new API, which simplifies
testing considerably, we preliminarily apply this same strategy to the
current API. Thus the new API will be a pure generalization.
Also included:
I noticed that the FilterBitsBuilder API docs do not mention re-use but
re-use is required for building partitioned filters. API docs updated to
call out that requirement.
Test Plan: For the behavior change on < 0.5 bits per key,
FullBloomTest::FilterSize is updated. New test
DBBloomFilterTestWithParam::SkipFilterOnEssentiallyZeroBpk uses this
setting and custom API to ensure that filter construction and query is
actually skipped, vs. using a trivial filter (which is tested for
comparison).