[4/4][ResourceMngmt] Account Bloom/Ribbon filter construction memory in global memory limit #9073
Conversation
hx235 force-pushed from 784cdab to 8baad09
Update:
@hx235 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
util/bloom_test.cc
Outdated
const std::size_t num_entries = 10000;
char key_buffer[sizeof(int)];
for (std::size_t i = 0; i < num_entries; ++i) {
  filter_bits_builder->AddKey(Key(static_cast<int>(i), key_buffer));
Note to @pdillinger:
It's a bit difficult to verify the charged `hash_entries`' size, since we reserve and release its memory within `Finish()` to avoid the expense of reserving memory per insertion. After all, we don't need per-insertion granularity here, since we don't react to the cache insertion result.
The test `TEST_P(FullBloomTest, RibbonFilterFallBackOnLargeBanding)` indirectly verifies this through some math, and I also printed the values out during development for verification.
If more tests are needed, I am thinking of adding a sync point, but are there any other suggestions?
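For context, the bucket-charging idea mentioned above can be pictured with a minimal standalone sketch. This is not the RocksDB implementation; the class name and the 256 KiB bucket size are made up for illustration. The point is that the reservation only grows when usage crosses into a new bucket, rather than once per inserted key:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: charge memory against a budget in coarse
// kDummyEntrySize buckets instead of once per inserted key.
class BucketedCharger {
 public:
  static constexpr std::size_t kDummyEntrySize = 256 * 1024;  // assumed size

  // Record `bytes` of new usage; only grow the reservation when the
  // used amount crosses into a new bucket.
  void Charge(std::size_t bytes) {
    used_ += bytes;
    while (reserved_ < used_) {
      reserved_ += kDummyEntrySize;
    }
  }

  // Release everything at once, e.g. at the end of Finish().
  void ReleaseAll() {
    used_ = 0;
    reserved_ = 0;
  }

  std::size_t used() const { return used_; }
  std::size_t reserved() const { return reserved_; }

 private:
  std::size_t used_ = 0;    // actual bytes in use
  std::size_t reserved_ = 0;  // bytes charged to the cache budget
};
```

With this scheme, thousands of small insertions trigger only a handful of reservation updates instead of one per key, at the cost of the reservation over-approximating usage by up to one bucket.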
If you have even a rough bound on peak memory usage by Bloom construction, that suffices to check for hash_entries because that is the overwhelming majority of usage for Bloom.
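A back-of-the-envelope form of such a bound check could look like the sketch below. The numbers and the 10% slack factor are assumptions for illustration, not measured RocksDB values; the idea is just that with n keys, Bloom construction keeps roughly n 64-bit hash entries, so the peak charge should sit near n * 8 bytes:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bound check: hash entries dominate Bloom construction
// memory, so peak charge should be close to num_entries * sizeof(uint64_t).
bool PeakWithinBloomBound(std::size_t num_entries, std::size_t peak_charged,
                          double slack = 1.1) {
  const std::size_t expected = num_entries * sizeof(uint64_t);
  return peak_charged >= expected &&
         peak_charged <= static_cast<std::size_t>(expected * slack);
}
```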
util/bloom_test.cc
Outdated
// -1 = Marker for newer Bloom implementations
// -2 = Marker for Standard128 Ribbon
if (will_fall_back) {
  EXPECT_EQ(filter.data()[filter.size() - 5], static_cast<char>(-1));
Note to @pdillinger:
It's also a bit difficult to verify that the fallback is indeed due to large banding, considering there are multiple paths leading to fallback. I had to verify it by printing values out.
If more tests are needed, I am again thinking of using a sync point, but are there any other suggestions?
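In case it helps the discussion: the sync-point idea would amount to tagging the specific fallback cause in production code and observing it from the test. The sketch below is a minimal stand-in for that pattern, not RocksDB's actual SyncPoint API; the registry class, function names, and the point name are all hypothetical:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Minimal stand-in for a sync-point registry: production code calls
// Hit("name") at interesting points; tests install callbacks to observe
// which path (e.g. which fallback reason) was actually taken.
class MiniSyncPoint {
 public:
  void SetCallBack(const std::string& point, std::function<void()> cb) {
    callbacks_[point] = std::move(cb);
  }
  void Hit(const std::string& point) {
    auto it = callbacks_.find(point);
    if (it != callbacks_.end()) it->second();
  }

 private:
  std::map<std::string, std::function<void()>> callbacks_;
};

// Production-side sketch: tag the specific fallback cause.
void BuildFilter(MiniSyncPoint& sp, bool banding_too_large) {
  if (banding_too_large) {
    sp.Hit("RibbonFallback::LargeBanding");  // hypothetical point name
  }
}
```

A test can then assert the fallback happened for the expected reason rather than via one of the other paths.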
I'm satisfied with your single control (the `!will_fall_back` case), but if you wanted to also cover your other concern, you could have another case/iteration that turns off `reserve_table_builder_memory`.
table/block_based/filter_policy.cc
Outdated
if (cache_res_mgr) {
  Status s = cache_res_mgr->UpdateCacheReservation<
      CacheEntryRole::kXXPH3FilterConstruction>(
      cache_res_mgr->GetTotalMemoryUsed() + requested);
Question to @pdillinger:
In the branch under `#ifdef ROCKSDB_MALLOC_USABLE_SIZE` with `aggregate_rounding_balance_ != nullptr`, will the returned value `rv` be hugely different from the `requested` value that is actually used in allocation?
The reason I ask is that:
(1) I used the returned value `rv` (i.e., `len_with_metadata`) to approximate the length of `buf`/`mutable_buf` in all cases.
(2) I had a bit of difficulty fully understanding `RoundDownUsableSpace`, so I want to make sure it's okay to do so in this case (although I had a feeling that these two values are just around `kExtraPadding` apart).
(3) I also had some difficulty covering this branch (`#ifdef ROCKSDB_MALLOC_USABLE_SIZE` with `aggregate_rounding_balance_ != nullptr`) in testing, but I will try harder later.
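The round-down concern in (2) can be illustrated with a toy allocator model. This is not RocksDB's `RoundDownUsableSpace`; the fixed 16-byte granularity stands in for malloc's real (and platform-dependent) size classes. The point is that rounding a length down to a size whose usable size equals itself keeps the accounted and actually-usable sizes within one granule of the request:

```cpp
#include <cassert>
#include <cstddef>

// Toy allocator model: usable size is the request rounded up to a
// 16-byte boundary (real malloc behavior varies; this is an assumption).
constexpr std::size_t kToyGranularity = 16;

std::size_t ToyUsableSize(std::size_t requested) {
  return (requested + kToyGranularity - 1) / kToyGranularity * kToyGranularity;
}

// Round `len` down to a size whose usable size equals itself, so the
// accounted length and the actually-usable allocation agree exactly.
std::size_t RoundDownToUsable(std::size_t len) {
  return len / kToyGranularity * kToyGranularity;
}
```

Under this model the rounded value differs from the request by less than one granule, matching the intuition that `rv` and `requested` stay within a small padding of each other.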
util/bloom_test.cc
Outdated
@@ -601,6 +602,351 @@ TEST_P(FullBloomTest, OptimizeForMemory) {
  }
}

TEST_P(FullBloomTest, ReserveBloomAndRibbonFilterConstructionMemory) {
Note to @pdillinger:
Just to follow up on our discussion on supporting partitioned filters: for Partitioned Filter, I think we need to do testing at the abstraction level of `partitioned_filter_block_test.cc`. One reason is that we need to add up all the final filter lengths to make the right assertion on charging the whole `mutable_buf`. I haven't looked deep into that, but it should be doable.
We can either test that, or we can guard our feature from Partitioned Filter.
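The assertion I have in mind for the partitioned case would look roughly like this standalone sketch (function names are hypothetical): sum the per-partition final filter sizes and check the charged amount against the total rather than any single partition:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical check for partitioned filters: the charged amount should
// cover the sum of all partitions' final filter lengths, not just one.
std::size_t TotalFilterBytes(const std::vector<std::size_t>& partition_sizes) {
  std::size_t total = 0;
  for (std::size_t s : partition_sizes) total += s;
  return total;
}

bool ChargeCoversAllPartitions(const std::vector<std::size_t>& partition_sizes,
                               std::size_t charged) {
  return charged >= TotalFilterBytes(partition_sizes);
}
```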
hx235 force-pushed from 8baad09 to d5c8077
@hx235 has updated the pull request. You must reimport the pull request before landing.
@hx235 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
hx235 force-pushed from d5c8077 to 72d9dd3
LGTM, thanks!
@@ -175,6 +175,7 @@ TEST_F(OptionsSettableTest, BlockBasedTableOptionsAllFieldsSettable) {
    "optimize_filters_for_memory=true;"
    "index_block_restart_interval=4;"
    "filter_policy=bloomfilter:4:true;whole_key_filtering=1;"
    "reserve_table_builder_memory=false;"
Ideally you should also add a small case to OptionsTest.GetBlockBasedTableOptionsFromString to ensure the configuration string is properly connected to the field.
    std::make_tuple(true, BloomFilterPolicy::Mode::kDeprecatedBlock, false),
    std::make_tuple(true, BloomFilterPolicy::Mode::kLegacyBloom, false)));

TEST_P(DBFilterConstructionReserveMemoryTestWithParam, ReserveMemory) {
Maybe drop a TODO comment here about improving the running time of the test, possible approaches, and the challenges they present to keeping the production code simple and the test reflecting production reality.
…cket modular, api comment
hx235 force-pushed from 76c84f1 to f15e591
@hx235 has updated the pull request. You must reimport the pull request before landing.
@hx235 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Update:
…ter earlier (#9345)

Summary:
Note: rebase on and merge after #9349, as part of #9342

**Context:** #9073 charged the hash entries' memory in block cache with `CacheReservationHandle`. However, in the edge case where Ribbon Filter falls back to Bloom Filter and swaps its hash entries to the embedded Bloom Filter object, the handles associated with those entries are not swapped and thus not released as soon as those entries are cleared during the Bloom Filter's finish process. Although this is a minor issue, since RocksDB internally calls `FilterBitsBuilder->Reset()` right after `FilterBitsBuilder->Finish()` on the main path, which releases all the cache reservations related to both the Ribbon Filter and its embedded Bloom Filter, it is still worth fixing to avoid confusion.

**Summary:**
- Swapped the `CacheReservationHandle` associated with the hash entries on Ribbon Filter's fallback

Pull Request resolved: #9345

Test Plan:
- Added a unit test to verify the number of cache reservations after clearing hash entries, which failed before the change and now succeeds

Reviewed By: pdillinger
Differential Revision: D33377225
Pulled By: hx235
fbshipit-source-id: 7487f4c40dfb6ee7928232021f93ef2c5329cffa
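The fix described above amounts to keeping a reservation handle and the data it covers in lockstep, so a swap moves both together. A minimal sketch of that pattern (the handle type and charge bookkeeping here are illustrative stand-ins, not RocksDB's types):

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Illustrative stand-in for a cache reservation handle: destroying it
// un-charges the reserved bytes from a shared counter.
struct ToyHandle {
  std::size_t* charged;
  std::size_t bytes;
  ~ToyHandle() { *charged -= bytes; }
};

// Bundle the hash entries with the handle that charges for them, so a
// swap moves both together and the charge always follows the data.
struct HashEntriesInfo {
  std::vector<uint64_t> entries;
  std::unique_ptr<ToyHandle> handle;

  void Swap(HashEntriesInfo& other) {
    std::swap(entries, other.entries);
    std::swap(handle, other.handle);
  }
};
```

Without swapping the handle alongside the entries, the original owner would keep holding the charge after its entries were already moved away, which mirrors the edge case the commit fixes.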
Summary:
Note: rebase on and merge after #9349, #9345, (optional) #9393

**Context:** (Quoted from pdillinger) Layers of information during new Bloom/Ribbon Filter construction in building block-based tables include the following:
a) set of keys to add to filter
b) set of hashes to add to filter (64-bit hash applied to each key)
c) set of Bloom indices to set in filter, with duplicates
d) set of Bloom indices to set in filter, deduplicated
e) final filter and its checksum

This PR aims to detect corruption (e.g., unexpected hardware/software corruption of data structures residing in memory for a long time) from b) to e), and leaves a) as future work at the application level.
- b)'s corruption is detected by verifying the XOR checksum of the hash entries, calculated as the entries accumulate before being added to the filter (i.e., `XXPH3FilterBitsBuilder::MaybeVerifyHashEntriesChecksum()`).
- c) - e)'s corruption is detected by verifying that the hash entries indeed exist in the constructed filter, by re-querying these hash entries in the filter (i.e., `FilterBitsBuilder::MaybePostVerify()`) after computing the block checksum (except for PartitionedFilter, where it is done right after each `FilterBitsBuilder::Finish` for implementation simplicity; see code comment for more). For this stage of detection, we assume hash entries are not corrupted after checking b), since the time interval from b) to c) is relatively short.

The option to enable this detection feature is `BlockBasedTableOptions::detect_filter_construct_corruption`, which is false by default.

**Summary:**
- Implemented new functions `XXPH3FilterBitsBuilder::MaybeVerifyHashEntriesChecksum()` and `FilterBitsBuilder::MaybePostVerify()`
- Ensured hash entries, final filter and banding and their [cache reservation](#9073) are released properly despite corruption
  - See [Filter.construction.artifacts.release.point.pdf](https://github.com/facebook/rocksdb/files/7923487/Design.Filter.construction.artifacts.release.point.pdf) for the high-level design
  - Bundled and refactored hash-entries-related artifacts in XXPH3FilterBitsBuilder into `HashEntriesInfo` for better control over the lifetime of these artifacts during `SwapEntries`, `ResetEntries`
- Ensured the RocksDB block-based table builder calls `FilterBitsBuilder::MaybePostVerify()` after constructing the filter with `FilterBitsBuilder::Finish()`
- When encountering such filter construction corruption, stop writing the filter content to files and mark the block-based table build non-ok by storing the corruption status in the builder

Pull Request resolved: #9342

Test Plan:
- Added new unit test `DBFilterConstructionCorruptionTestWithParam.DetectCorruption`
- Included this new feature in `DBFilterConstructionReserveMemoryTestWithParam.ReserveMemory`, as this feature heavily touches ReserveMemory's implementation
- For the fallback case, ran `./filter_bench -impl=3 -detect_filter_construct_corruption=true -reserve_table_builder_memory=true -strict_capacity_limit=true -quick -runs 10 | grep 'Build avg'` to make sure nothing breaks
- Added to `filter_bench`: increased filter construction time by **30%**, mostly by `MaybePostVerify()`
  - FastLocalBloom
    - Before change: `./filter_bench -impl=2 -quick -runs 10 | grep 'Build avg'`: **28.86643s**
    - After change:
      - `./filter_bench -impl=2 -detect_filter_construct_corruption=false -quick -runs 10 | grep 'Build avg'` (expect a tiny increase since MaybePostVerify is always called regardless): **27.6644s (-4%; this perf improvement might be because we no longer drop bloom hash entries in `AddAllEntries` along the iteration but in bulk later, same with the bypassing-MaybePostVerify case below)**
      - `./filter_bench -impl=2 -detect_filter_construct_corruption=true -quick -runs 10 | grep 'Build avg'` (expect acceptable increase): **34.41159s (+20%)**
      - `./filter_bench -impl=2 -detect_filter_construct_corruption=true -quick -runs 10 | grep 'Build avg'` (bypassing MaybePostVerify, expect minor increase): **27.13431s (-6%)**
  - Standard128Ribbon
    - Before change: `./filter_bench -impl=3 -quick -runs 10 | grep 'Build avg'`: **122.5384s**
    - After change:
      - `./filter_bench -impl=3 -detect_filter_construct_corruption=false -quick -runs 10 | grep 'Build avg'` (expect a tiny increase since MaybePostVerify is always called regardless; verified by removing MaybePostVerify under this case and finding only a +-1ns difference): **124.3588s (+2%)**
      - `./filter_bench -impl=3 -detect_filter_construct_corruption=true -quick -runs 10 | grep 'Build avg'` (expect acceptable increase): **159.4946s (+30%)**
      - `./filter_bench -impl=3 -detect_filter_construct_corruption=true -quick -runs 10 | grep 'Build avg'` (bypassing MaybePostVerify, expect minor increase): **125.258s (+2%)**
- Added to `db_stress`: `make crash_test`, `./db_stress --detect_filter_construct_corruption=true`
- Manually smoke-tested: manually corrupted the filter construction in some db-level tests with basic Put and background flush. As expected, the error did get returned to users in subsequent Put and Flush statuses.

Reviewed By: pdillinger
Differential Revision: D33746928
Pulled By: hx235
fbshipit-source-id: cb056426be5a7debc1cd16f23bc250f36a08ca57
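The hash-entry checksum in b) can be pictured with a small standalone sketch: maintain a running XOR of the 64-bit hashes as they accumulate, then recompute from storage and compare before building the filter. This mirrors the idea behind `MaybeVerifyHashEntriesChecksum`, not its exact code; the struct and method names are made up:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Running XOR checksum over hash entries: cheap to maintain per insert,
// and a single flipped bit in a stored entry changes the recomputed XOR.
struct CheckedEntries {
  std::vector<uint64_t> entries;
  uint64_t xor_checksum = 0;

  void Add(uint64_t h) {
    entries.push_back(h);
    xor_checksum ^= h;
  }

  // Recompute from storage and compare; false means corruption.
  bool Verify() const {
    uint64_t recomputed = 0;
    for (uint64_t h : entries) recomputed ^= h;
    return recomputed == xor_checksum;
  }
};
```

Note the usual XOR caveat: two identical corruptions cancel out, so this catches the common single-flip case rather than all possible corruption.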
Summary:
**Context:** Previous PRs #9748, #9073, #8428 added a separate flag for each charged memory area. Such an API design is not scalable as we charge more and more memory areas. Also, we foresee an opportunity to consolidate this feature with other cache-usage-related features such as `cache_index_and_filter_blocks` using `CacheEntryRole`. Therefore we decided to consolidate all these flags into `CacheUsageOptions cache_usage_options`, and this PR serves as the first step by consolidating the memory-charging-related flags.

**Summary:**
- Replaced old API references with new ones, including making `kCompressionDictionaryBuildingBuffer` opt-out, and added a unit test for that
- Added missing db bench/stress tests for some memory charging features
- Renamed related test suites to indicate they are under the same theme of memory charging
- Refactored a commonly used mocked cache component in memory-charging-related tests to reduce code duplication
- Replaced the phrases "memory tracking" / "cache reservation" (other than CacheReservationManager-related ones) with "memory charging" as the standard description of this feature

Pull Request resolved: #9926

Test Plan:
- New unit test for opt-out `kCompressionDictionaryBuildingBuffer`: `TEST_F(ChargeCompressionDictionaryBuildingBufferTest, Basic)`
- New unit test for option validation/sanitization: `TEST_F(CacheUsageOptionsOverridesTest, SanitizeAndValidateOptions)`
- CI
- db bench (in case querying new options introduces regression) **+0.5% micros/op**: `TEST_TMPDIR=/dev/shm/testdb ./db_bench -benchmarks=fillseq -db=$TEST_TMPDIR -charge_compression_dictionary_building_buffer=1(remove this for comparison) -compression_max_dict_bytes=10000 -disable_auto_compactions=1 -write_buffer_size=100000 -num=4000000 | egrep 'fillseq'`

#-run | (pre-PR) avg micros/op | std micros/op | (post-PR) avg micros/op | std micros/op | change (%)
-- | -- | -- | -- | -- | --
10 | 3.9711 | 0.264408 | 3.9914 | 0.254563 | 0.5111933721
20 | 3.83905 | 0.0664488 | 3.8251 | 0.0695456 | **-0.3633711465**
40 | 3.86625 | 0.136669 | 3.8867 | 0.143765 | **0.5289363078**

- db_stress: `python3 tools/db_crashtest.py blackbox -charge_compression_dictionary_building_buffer=1 -charge_filter_construction=1 -charge_table_reader=1 -cache_size=1` killed as normal

Reviewed By: ajkr
Differential Revision: D36054712
Pulled By: hx235
fbshipit-source-id: d406e90f5e0c5ea4dbcb585a484ad9302d4302af
Note: This PR is the 4th part of a bigger PR stack (#9073) and will rebase/merge only after the first three PRs (#9070, #9071, #9130) merge.

Context:
Similar to #8428, this PR is to track memory usage during (new) Bloom Filter (i.e., FastLocalBloom) and Ribbon Filter (i.e., Ribbon128) construction, by charging dummy entries to block cache, moving toward the goal of a single global memory limit using block cache capacity. It also constrains the size of the banding portion of Ribbon Filter during construction, by falling back to Bloom Filter if that banding is, at some point, larger than the available space in the cache under `LRUCacheOptions::strict_capacity_limit=true`.

The option to turn on this feature is `BlockBasedTableOptions::reserve_table_builder_memory = true`, which by default is set to `false`. We decided not to have a separate option for each memory user in table building; therefore their memory accounting is all bundled under one general option.

Summary:
- Reserved/released memory for the main memory users in filter construction:
  - Bloom/Ribbon hash entries (i.e., `hash_entries.size()`; we bucket-charge hash entries during insertion for performance)
  - Ribbon banding (i.e., `bytes_coeff_rows` + `bytes_result_rows` + `bytes_backtrack`)
  - the final filter (i.e., `mutable_buf`'s size)
- In order to use `CacheReservationManager::CacheReservationHandle` to account the final filter's memory, we have to store the `CacheReservationManager` object and the `CacheReservationHandle` for the final filter in `XXPH3BitsFilterBuilder`, as well as explicitly delete the filter bits builder when done with the final filter in block-based table
- Added an option to run `filter_bench` with this memory reservation feature

Test:
- Added tests to `db_bloom_filter_test` to verify filter construction peak cache reservation under combinations of `BlockBasedTable::Rep::FilterType` (e.g., `kFullFilter`, `kPartitionedFilter`), `BloomFilterPolicy::Mode` (e.g., `kFastLocalBloom`, `kStandard128Ribbon`, `kDeprecatedBlock`) and `BlockBasedTableOptions::reserve_table_builder_memory`
  - The cases with `kFullFilter` + `kStandard128Ribbon` and with `kPartitionedFilter` take around 3000 - 6000 ms and others take around 1500 - 2000 ms, in total adding 20000 - 25000 ms to the test suite running locally
- Added a test to `bloom_test` to verify Ribbon Filter fallback on large banding in FullFilter
- Ran `filter_bench` to verify that this feature does not significantly slow down Bloom/Ribbon Filter construction speed. Local results averaged over 20 runs:
  - FastLocalBloom
    - `./filter_bench -impl=2 -quick -runs 20 | grep 'Build avg'`
    - `./filter_bench -impl=2 -quick -runs 20 -reserve_table_builder_memory=true | grep 'Build avg'`
    - `./filter_bench -impl=2 -quick -runs 20 -reserve_table_builder_memory=true -strict_capacity_limit=true | grep 'Build avg'`
  - Ribbon128
    - `./filter_bench -impl=3 -quick -runs 20 | grep 'Build avg'`
    - `./filter_bench -impl=3 -quick -runs 20 -reserve_table_builder_memory=true | grep 'Build avg'`
    - `./filter_bench -impl=3 -quick -runs 20 -reserve_table_builder_memory=true -strict_capacity_limit=true | grep 'Build avg'`
- Verified that `"Cache reservation for Ribbon filter banding failed due to cache full"` is indeed logged to console
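The banding-size constraint described in the Context above can be sketched end to end: attempt to charge the banding against a strictly capped budget, and fall back to Bloom when the charge fails. This is a standalone model under assumed names, not RocksDB's `CacheReservationManager`:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy strict-capacity budget standing in for block cache capacity
// under strict_capacity_limit=true.
class StrictBudget {
 public:
  explicit StrictBudget(std::size_t capacity) : capacity_(capacity) {}

  bool TryCharge(std::size_t bytes) {
    if (used_ + bytes > capacity_) return false;  // "cache full"
    used_ += bytes;
    return true;
  }

 private:
  std::size_t capacity_;
  std::size_t used_ = 0;
};

// Choose the filter implementation based on whether the Ribbon banding
// fits in the remaining budget; otherwise fall back to Bloom.
std::string ChooseFilter(StrictBudget& budget, std::size_t banding_bytes) {
  if (budget.TryCharge(banding_bytes)) return "ribbon";
  return "bloom";  // fallback on reservation failure
}
```

In the real feature the failed reservation is also what triggers the "Cache reservation for Ribbon filter banding failed due to cache full" log line mentioned above.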