Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support skipping filters in custom FilterPolicy, near-zero bpk #8250

Closed
wants to merge 2 commits into from

Conversation

pdillinger
Copy link
Contributor

Summary: I am in the process of building the next generation of APIs for
configuring filters. To get there, we need to increase the power of
custom filter policies, in the information they have available (#8246)
and in the actions they can take in generating filters (this change).

Here we give custom FilterPolicy the power to specify that generating
filters should be skipped in some contexts. Previously, a non-null
FilterPolicy would have to build some filter in all cases (with
FilterBitsBuilder), except when optimize_filters_for_hits applies, which
is out of control of the FilterPolicy. With this change, the
FilterPolicy has the same power to disable filters completely in some
contexts as optimize_filters_for_hits does. With #8246 and this change,
a FilterPolicy can implement the same behavior as
optimize_filters_for_hits, obviously with more freedom to tweak the
behavior.

The new API support for this is a bit ugly for API compatibility reasons
(see comment on FilterBitsBuilder::ShouldSkipFilter) but it should work
fine in practice.

Here we also change the behavior of passing < 0.5 bits_per_key to
NewBloomFilterPolicy (or NewExperimentalRibbonFilterPolicy), to skip
generating a filter instead of rounding up to 1 bit per key. The
motivation doesn't really have to do with the current API but imagines a
future API in which some math determines a "best" bits_per_key
for each level, perhaps based on a min and max range. In such a
scenario, to avoid filter lookup overhead (e.g. block cache CPU
overhead) in cases where it would be unproductive, we round bpk < 0.5
down to 0 (and SKIP generating a filter) and round 0.5 <= bpk < 1.0 up
to 1.0. (We can tweak these details later.) To make the current (soon to
be "old") API a simple instantiation of the new API, which simplifies
testing considerably, we preliminarily apply this same strategy to the
current API. Thus the new API will be a pure generalization.

Also included:

I noticed that the FilterBitsBuilder API docs do not mention re-use but
re-use is required for building partitioned filters. API docs updated to
call out that requirement.

Test Plan: For the behavior change on < 0.5 bits per key,
FullBloomTest::FilterSize is updated. New test
DBBloomFilterTestWithParam::SkipFilterOnEssentiallyZeroBpk uses this
setting and custom API to ensure that filter construction and query is
actually skipped, vs. using a trivial filter (which is tested for
comparison).

Summary: I am in the process of building the next generation of APIs for
configuring filters. To get there, we need to increase the power of
custom filter policies, in the information they have available (facebook#8246)
and in the actions they can take in generating filters (this change).

Here we give custom FilterPolicy the power to specify that generating
filters should be skipped in some contexts. Previously, a non-null
FilterPolicy would have to build some filter in all cases (with
FilterBitsBuilder), except when optimize_filters_for_hits applies, which
is out of control of the FilterPolicy. With this change, the
FilterPolicy has the same power to disable filters completely in some
contexts as optimize_filters_for_hits does. With facebook#8246 and this change,
a FilterPolicy can implement the same behavior as
optimize_filters_for_hits, obviously with more freedom to tweak the
behavior.

The new API support for this is a bit ugly for API compatibility reasons
(see comment on FilterBitsBuilder::ShouldSkipFilter) but it should work
fine in practice.

Here we also change the behavior of passing < 0.5 bits_per_key to
NewBloomFilterPolicy (or NewExperimentalRibbonFilterPolicy), to skip
generating a filter instead of rounding up to 1 bit per key. The
motivation doesn't really have to do with the current API but imagines a
future API in which some mathematics determines a "best" bits_per_key
for each level, perhaps based on a min and max range. In such a
scenario, to avoid filter lookup overhead (e.g. block cache CPU
overhead) in cases where it would be unproductive, we round bpk < 0.5
down to 0 (and SKIP generating a filter) and round 0.5 <= bpk < 1.0 up
to 1.0. (We can tweak these details later.) To make the current (soon to
be "old") API a simple instantiation of the new API, which simplifies
testing considerably, we preliminarily apply this same strategy to the
current API. Thus the new API will be a pure generalization.

Also included:

I noticed that the FilterBitsBuilder API docs do not mention re-use but
re-use is required for building partitioned filters. API docs updated to
call out that requirement.

Test Plan: For the behavior change on < 0.5 bits per key,
FullBloomTest::FilterSize is updated. Added test
DBBloomFilterTestWithParam::SkipFilterOnEssentiallyZeroBpk uses this
setting and custom API to ensure that filter construction and query is
actually skipped, vs. using a trivial filter (which is tested for
comparison).
@facebook-github-bot
Copy link
Contributor

@pdillinger has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@pdillinger
Copy link
Contributor Author

With #8246 and this change, a FilterPolicy can implement the same behavior as
optimize_filters_for_hits, obviously with more freedom to tweak the behavior.

It is not so simple. optimize_filters_for_hits also operates on the table read side, which this cannot duplicate unless we start consulting FilterPolicy for whether to skip filters. That might be a good idea, except that logic is checked every time Get queries a table file. More customizability likely means more performance hit, unless we can make these determinations on a less critical path.

DestroyAndReopen(options);
PutAndGetFn();

// Verify no filter access nor contruction
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Construction".

How is this validating not constructed?

Comment on lines +579 to +580
EXPECT_EQ(TestGetTickerCount(options, BLOOM_FILTER_FULL_POSITIVE), 0);
EXPECT_EQ(TestGetTickerCount(options, BLOOM_FILTER_FULL_POSITIVE), 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for the same test 2x?

Comment on lines +596 to +597
EXPECT_EQ(TestGetTickerCount(options, BLOOM_FILTER_FULL_POSITIVE), 0);
EXPECT_EQ(TestGetTickerCount(options, BLOOM_FILTER_FULL_POSITIVE), 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same test twice?

use_delta_encoding_for_index_values, p_index_builder, partition_size);
} else {
return new FullFilterBlockBuilder(mopt.prefix_extractor.get(),
table_opt.whole_key_filtering,
filter_bits_builder);
filter_bits_builder.release());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than the released pointer, maybe have a std::unique_ptr<>&& passed in and use std::move? That is what is done in most places in the code.

@jay-zhuang
Copy link
Contributor

Personally I find rounding <0.5 to 0 is confusing, why not just have the user to set it 0 to explicitly skip filter?

Comment on lines +45 to +53
size_t CalculateSpace(size_t) override { return 6U; }
double EstimatedFpRate(size_t, size_t) override { return 1.0; }
size_t ApproximateNumEntries(size_t) override { return SIZE_MAX; }
void AddKey(const Slice&) override {}
Slice Finish(std::unique_ptr<const char[]>* buf) override {
// Interpreted as "always true" filter (0 probes over 1 byte of
// payload, 5 bytes metadata)
buf->reset(new char[6]{});
return Slice(buf->get(), 6);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: define size 6.

};

// See FilterBitsBuilder::ShouldSkipFilter
class SkipFilterBitsBuilder : public AlwaysTrueBitsBuilder {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should SkipFilterBitsBuilder based on AlwaysTrueBitsBuilder or it could inherit from BuiltinFilterBitsBuilder directly?

pdillinger added a commit to pdillinger/rocksdb that referenced this pull request Feb 4, 2022
Summary:
* Inefficient block-based filter is no longer customizable in the public
API, though (for now) can still be enabled.
  * Removed deprecated FilterPolicy::CreateFilter() and
  FilterPolicy::KeyMayMatch()
  * Removed `rocksdb_filterpolicy_create()` from C API
* Change meaning of nullptr return from GetBuilderWithContext() from "use
block-based filter" to "generate no filter in this case." This is a
cleaner solution to the proposal in facebook#8250.
  * Also, when user specifies bits_per_key < 0.5, we now round this down
  to "no filter" because we expect a filter with >= 80% FP rate is
  unlikely to be worth the CPU cost of accessing it (esp with
  cache_index_and_filter_blocks=1 or partition_filters=1).
  * bits_per_key >= 0.5 and < 1.0 is still rounded up to 1.0 (for 62% FP
  rate)
  * This also gives us some support for configuring filters from OPTIONS
  file as currently saved: `filter_policy=rocksdb.BuiltinBloomFilter`.
  Opening from such an options file will enable reading filters (an
  improvement) but not writing new ones. (See Customizable follow-up
  below.)
* Also removed deprecated functions
  * FilterBitsBuilder::CalculateNumEntry()
  * FilterPolicy::GetFilterBitsBuilder()
  * NewExperimentalRibbonFilterPolicy()
* Remove default implementations of
  * FilterBitsBuilder::EstimateEntriesAdded()
  * FilterBitsBuilder::ApproximateNumEntries()
  * FilterPolicy::GetBuilderWithContext()
* Remove support for "filter_policy=experimental_ribbon" configuration
string.
* Allow "filter_policy=bloomfilter:n" without bool to discourage use of
block-based filter.

Likely follow-up (later PRs):
* Refactoring toward FilterPolicy Customizable, so that we can generate
filters with same configuration as before when configuring from options
file.
* Remove support for user enabling block-based filter (ignore `bool
use_block_based_builder`)
  * Some months after this change, we could even remove read support for
  block-based filter, because it is not critical to DB data
  preservation.
* Make FilterBitsBuilder::FinishV2 to avoid `using
FilterBitsBuilder::Finish` mess and add support for specifying a
MemoryAllocator (for cache warming)

Test Plan: A number of obsolete tests deleted and new tests or test
cases added or updated.
facebook-github-bot pushed a commit that referenced this pull request Feb 8, 2022
Summary:
* Inefficient block-based filter is no longer customizable in the public
API, though (for now) can still be enabled.
  * Removed deprecated FilterPolicy::CreateFilter() and
  FilterPolicy::KeyMayMatch()
  * Removed `rocksdb_filterpolicy_create()` from C API
* Change meaning of nullptr return from GetBuilderWithContext() from "use
block-based filter" to "generate no filter in this case." This is a
cleaner solution to the proposal in #8250.
  * Also, when user specifies bits_per_key < 0.5, we now round this down
  to "no filter" because we expect a filter with >= 80% FP rate is
  unlikely to be worth the CPU cost of accessing it (esp with
  cache_index_and_filter_blocks=1 or partition_filters=1).
  * bits_per_key >= 0.5 and < 1.0 is still rounded up to 1.0 (for 62% FP
  rate)
  * This also gives us some support for configuring filters from OPTIONS
  file as currently saved: `filter_policy=rocksdb.BuiltinBloomFilter`.
  Opening from such an options file will enable reading filters (an
  improvement) but not writing new ones. (See Customizable follow-up
  below.)
* Also removed deprecated functions
  * FilterBitsBuilder::CalculateNumEntry()
  * FilterPolicy::GetFilterBitsBuilder()
  * NewExperimentalRibbonFilterPolicy()
* Remove default implementations of
  * FilterBitsBuilder::EstimateEntriesAdded()
  * FilterBitsBuilder::ApproximateNumEntries()
  * FilterPolicy::GetBuilderWithContext()
* Remove support for "filter_policy=experimental_ribbon" configuration
string.
* Allow "filter_policy=bloomfilter:n" without bool to discourage use of
block-based filter.

Some pieces for #9389

Likely follow-up (later PRs):
* Refactoring toward FilterPolicy Customizable, so that we can generate
filters with same configuration as before when configuring from options
file.
* Remove support for user enabling block-based filter (ignore `bool
use_block_based_builder`)
  * Some months after this change, we could even remove read support for
  block-based filter, because it is not critical to DB data
  preservation.
* Make FilterBitsBuilder::FinishV2 to avoid `using
FilterBitsBuilder::Finish` mess and add support for specifying a
MemoryAllocator (for cache warming)

Pull Request resolved: #9501

Test Plan:
A number of obsolete tests deleted and new tests or test
cases added or updated.

Reviewed By: hx235

Differential Revision: D34008011

Pulled By: pdillinger

fbshipit-source-id: a39a720457c354e00d5b59166b686f7f59e392aa
@pdillinger
Copy link
Contributor Author

Obsolete with 9501

@pdillinger pdillinger closed this Feb 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants