Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[4/4][ResourceMngmt] Account Bloom/Ribbon filter construction memory in global memory limit #9073

Closed
wants to merge 9 commits into from

Conversation

hx235
Copy link
Contributor

@hx235 hx235 commented Oct 24, 2021

Note: This PR is the 4th part of a bigger PR stack (#9073) and will rebase/merge only after the first three PRs (#9070, #9071, #9130) merge.

Context:
Similar to #8428, this PR is to track memory usage during (new) Bloom Filter (i.e,FastLocalBloom) and Ribbon Filter (i.e, Ribbon128) construction by charging dummy entry to block cache, moving toward the goal of single global memory limit using block cache capacity. It also constrains the size of the banding portion of Ribbon Filter during construction by falling back to Bloom Filter if that banding is, at some point, larger than the available space in the cache under LRUCacheOptions::strict_capacity_limit=true.

The option to turn on this feature is BlockBasedTableOptions::reserve_table_builder_memory = true which by default is set to false. We decided not to have separate option for separate memory user in table building therefore their memory accounting are all bundled under one general option.

Summary:

  • Reserved/released cache for creation/destruction of three main memory users using CacheReservationManager:
    • hash entries (i.ehash_entries.size(), we bucket-charge hash entries during insertion for performance),
    • banding (Ribbon Filter only, bytes_coeff_rows +bytes_result_rows + bytes_backtrack),
    • final filter (i.e, mutable_buf's size).
      • Implementation details: in order to use CacheReservationManager::CacheReservationHandle to account final filter's memory, we have to store the CacheReservationManager object and CacheReservationHandle for final filter in XXPH3BitsFilterBuilder as well as explicitly delete the filter bits builder when done with the final filter in block based table.
  • Added option fo run filter_bench with this memory reservation feature

Test:

  • Added new tests in db_bloom_filter_test to verify filter construction peak cache reservation under combination of BlockBasedTable::Rep::FilterType (e.g, kFullFilter, kPartitionedFilter), BloomFilterPolicy::Mode(e.g, kFastLocalBloom, kStandard128Ribbon, kDeprecatedBlock) and BlockBasedTableOptions::reserve_table_builder_memory
    • To address the concern for slow test: tests with memory reservation under kFullFilter + kStandard128Ribbon and kPartitionedFilter take around 3000 - 6000 ms and others take around 1500 - 2000 ms, in total adding 20000 - 25000 ms to the test suit running locally
  • Added new test in bloom_test to verify Ribbon Filter fallback on large banding in FullFilter
  • Added test in filter_bench to verify that this feature does not significantly slow down Bloom/Ribbon Filter construction speed. Local result averaged over 20 run as below:
    • FastLocalBloom

      • baseline ./filter_bench -impl=2 -quick -runs 20 | grep 'Build avg':
        • Build avg ns/key: 29.56295 (DEBUG_LEVEL=1), 29.98153 (DEBUG_LEVEL=0)
      • new feature (expected to be similar as above)./filter_bench -impl=2 -quick -runs 20 -reserve_table_builder_memory=true | grep 'Build avg':
        • Build avg ns/key: 30.99046 (DEBUG_LEVEL=1), 30.48867 (DEBUG_LEVEL=0)
      • new feature of RibbonFilter with fallback (expected to be similar as above) ./filter_bench -impl=2 -quick -runs 20 -reserve_table_builder_memory=true -strict_capacity_limit=true | grep 'Build avg' :
        • Build avg ns/key: 31.146975 (DEBUG_LEVEL=1), 30.08165 (DEBUG_LEVEL=0)
    • Ribbon128

      • baseline ./filter_bench -impl=3 -quick -runs 20 | grep 'Build avg':
        • Build avg ns/key: 129.17585 (DEBUG_LEVEL=1), 130.5225 (DEBUG_LEVEL=0)
      • new feature (expected to be similar as above) ./filter_bench -impl=3 -quick -runs 20 -reserve_table_builder_memory=true | grep 'Build avg' :
        • Build avg ns/key: 131.61645 (DEBUG_LEVEL=1), 132.98075 (DEBUG_LEVEL=0)
      • new feature of RibbonFilter with fallback (expected to be a lot faster than above due to fallback) ./filter_bench -impl=3 -quick -runs 20 -reserve_table_builder_memory=true -strict_capacity_limit=true | grep 'Build avg' :
        • Build avg ns/key: 52.032965 (DEBUG_LEVEL=1), 52.597825 (DEBUG_LEVEL=0)
        • And the warning message of "Cache reservation for Ribbon filter banding failed due to cache full" is indeed logged to console.

@hx235 hx235 marked this pull request as draft October 24, 2021 05:18
@hx235 hx235 changed the title [3/3][ResourceMngmt] Account Bloom/Ribbon filter construction memory in global memory limit [no-review][3/3][ResourceMngmt] Account Bloom/Ribbon filter construction memory in global memory limit Oct 24, 2021
@hx235 hx235 force-pushed the reserve_filter_construction_mem branch 8 times, most recently from 784cdab to 8baad09 Compare October 28, 2021 10:23
@hx235 hx235 changed the title [no-review][3/3][ResourceMngmt] Account Bloom/Ribbon filter construction memory in global memory limit [3/3][ResourceMngmt] Account Bloom/Ribbon filter construction memory in global memory limit Oct 28, 2021
@hx235
Copy link
Contributor Author

hx235 commented Oct 28, 2021

Update:

  • Left some PR comments on the current test plan and impl
  • Ready for review after the signals are cleared

@facebook-github-bot
Copy link
Contributor

@hx235 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

const std::size_t num_entries = 10000;
char key_buffer[sizeof(int)];
for (std::size_t i = 0; i < num_entries; ++i) {
filter_bits_builder->AddKey(Key(static_cast<int>(i), key_buffer));
Copy link
Contributor Author

@hx235 hx235 Oct 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to @pdillinger:

It's a bit difficult to verify the charged hash_entries's size as we reserve and release its memory within Finish to avoid reserving memory per insertion expensively. After all, we don't need per-insertion granularity here as we don't react to cache insertion result.

The test TEST_P(FullBloomTest, RibbonFilterFallBackOnLargeBanding) indirectly verifies that through some math calculation and I also printed them out during development for verification.

If more tests are needed, I am thinking of adding sync point but any other suggestions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have even a rough bound on peak memory usage by Bloom construction, that suffices to check for hash_entries because that is the overwhelming majority of usage for Bloom.

util/bloom_test.cc Outdated Show resolved Hide resolved
// -1 = Marker for newer Bloom implementations
// -2 = Marker for Standard128 Ribbon
if (will_fall_back) {
EXPECT_EQ(filter.data()[filter.size() - 5], static_cast<char>(-1));
Copy link
Contributor Author

@hx235 hx235 Oct 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to @pdillinger:

It's also a bit difficult to verify the fallback is indeed due to large banding, considering there are multiple paths leading to falling backs. I had to verify it by printing stuff out.

If more tests are needed, I am again thinking of using sync point but any other suggestions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm satisfied with your single control (!will_fall_back case) but if you wanted to also control your other concern, you could have another case/iteration that turns off reserve_table_builder_memory.

if (cache_res_mgr) {
Status s = cache_res_mgr->UpdateCacheReservation<
CacheEntryRole::kXXPH3FilterConstruction>(
cache_res_mgr->GetTotalMemoryUsed() + requested);
Copy link
Contributor Author

@hx235 hx235 Oct 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question to @pdillinger:
In the branch of ifdef ROCKSDB_MALLOC_USABLE_SIZ && if (aggregate_rounding_balance_ != nullptr), will the returned value rv hugely different from the requested value, that is actually used in allocation?

The reason I asked is that
(1) I used the returned value rv (i.e, len_with_metadata) to approximate the length of buf/mutable_buf in all cases. (2)I had a bit difficulty in fully understanding RoundDownUsableSpaceto make sure it's okay to do so in this case (although I had a feeling that these two values are just aroundkExtraPadding away ) (3) I also had some difficulties in covering this branch (ifdef ROCKSDB_MALLOC_USABLE_SIZ && if (aggregate_rounding_balance_ != nullptr)` in testing but will try harder later.

include/rocksdb/table.h Outdated Show resolved Hide resolved
@@ -601,6 +602,351 @@ TEST_P(FullBloomTest, OptimizeForMemory) {
}
}

TEST_P(FullBloomTest, ReserveBloomAndRibbonFilterConstructionMemory) {
Copy link
Contributor Author

@hx235 hx235 Oct 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to @pdillinger:

Just to follow up on our discussion on supporting partitioned filter, for Partitioned Filter, I think we need to do testing at the abstraction level like partitioned_filter_block_test.cc. One reason is that we need to adds up all the final filter length to make assertion on charging the wholemutable_buf right. I haven't looked deep into that but it should be doable.

We can either test that or we can guard our feature from Partitioned Filter

@hx235 hx235 requested a review from pdillinger October 28, 2021 19:36
@hx235 hx235 marked this pull request as ready for review October 28, 2021 23:19
@hx235 hx235 force-pushed the reserve_filter_construction_mem branch from 8baad09 to d5c8077 Compare October 30, 2021 00:36
@facebook-github-bot
Copy link
Contributor

@hx235 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@hx235 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@hx235
Copy link
Contributor Author

hx235 commented Oct 30, 2021

@hx235 hx235 force-pushed the reserve_filter_construction_mem branch from d5c8077 to 72d9dd3 Compare November 2, 2021 00:45
Copy link
Contributor

@pdillinger pdillinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@@ -175,6 +175,7 @@ TEST_F(OptionsSettableTest, BlockBasedTableOptionsAllFieldsSettable) {
"optimize_filters_for_memory=true;"
"index_block_restart_interval=4;"
"filter_policy=bloomfilter:4:true;whole_key_filtering=1;"
"reserve_table_builder_memory=false;"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally you should also add a small case to OptionsTest.GetBlockBasedTableOptionsFromString to ensure configuration string is properly connected to the field

std::make_tuple(true, BloomFilterPolicy::Mode::kDeprecatedBlock, false),
std::make_tuple(true, BloomFilterPolicy::Mode::kLegacyBloom, false)));

TEST_P(DBFilterConstructionReserveMemoryTestWithParam, ReserveMemory) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe drop a TODO comment here about improving running time of the test, possible approach & challenges that presents to keeping the production code simple and the test reflecting production reality.

@hx235 hx235 force-pushed the reserve_filter_construction_mem branch from 76c84f1 to f15e591 Compare November 18, 2021 00:54
@facebook-github-bot
Copy link
Contributor

@hx235 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@hx235 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@hx235
Copy link
Contributor Author

hx235 commented Nov 18, 2021

Update:

  • Addressed comment on option test and todo comment
  • Rebased
  • Will merge after signals are cleared

@facebook-github-bot
Copy link
Contributor

@hx235 merged this pull request in 74544d5.

facebook-github-bot pushed a commit that referenced this pull request Jan 7, 2022
…ter earlier (#9345)

Summary:
Note: rebase on and merge after #9349, as part of #9342
**Context:**
#9073 charged the hash entries' memory in block cache with `CacheReservationHandle`. However, in the edge case where Ribbon Filter falls back to Bloom Filter and swaps its hash entries to the embedded bloom filter object, the handles associated with those entries are not swapped and thus not released as soon as those entries are cleared during Bloom Filter's finish process.

Although this is a minor issue since RocksDB internal calls `FilterBitsBuilder->Reset()` right after `FilterBitsBuilder->Finish()` on the main path, which releases all the cache reservation related to both the Ribbon Filter and its embedded Bloom Filter, it still worths this fix to avoid confusion.

**Summary:**
- Swapped the `CacheReservationHandle` associated with the hash entries on Ribbon Filter's fallback

Pull Request resolved: #9345

Test Plan: - Added a unit test to verify the number of cache reservation after clearing hash entries, which failed before the change and now succeeds

Reviewed By: pdillinger

Differential Revision: D33377225

Pulled By: hx235

fbshipit-source-id: 7487f4c40dfb6ee7928232021f93ef2c5329cffa
facebook-github-bot pushed a commit that referenced this pull request Feb 2, 2022
Summary:
Note: rebase on and merge after #9349, #9345, (optional) #9393
**Context:**
(Quoted from pdillinger) Layers of information during new Bloom/Ribbon Filter construction in building block-based tables includes the following:
a) set of keys to add to filter
b) set of hashes to add to filter (64-bit hash applied to each key)
c) set of Bloom indices to set in filter, with duplicates
d) set of Bloom indices to set in filter, deduplicated
e) final filter and its checksum

This PR aims to detect corruption (e.g, unexpected hardware/software corruption on data structures residing in the memory for a long time) from b) to e) and leave a) as future works for application level.
- b)'s corruption is detected by verifying the xor checksum of the hash entries calculated as the entries accumulate before being added to the filter. (i.e, `XXPH3FilterBitsBuilder::MaybeVerifyHashEntriesChecksum()`)
- c) - e)'s corruption is detected by verifying the hash entries indeed exists in the constructed filter by re-querying these hash entries in the filter (i.e, `FilterBitsBuilder::MaybePostVerify()`) after computing the block checksum (except for PartitionFilter, which is done right after each `FilterBitsBuilder::Finish` for impl simplicity - see code comment for more). For this stage of detection, we assume hash entries are not corrupted after checking on b) since the time interval from b) to c) is relatively short IMO.

Option to enable this feature of detection is `BlockBasedTableOptions::detect_filter_construct_corruption` which is false by default.

**Summary:**
- Implemented new functions `XXPH3FilterBitsBuilder::MaybeVerifyHashEntriesChecksum()` and `FilterBitsBuilder::MaybePostVerify()`
- Ensured hash entries, final filter and banding and their [cache reservation ](#9073) are released properly despite corruption
   - See [Filter.construction.artifacts.release.point.pdf ](https://github.com/facebook/rocksdb/files/7923487/Design.Filter.construction.artifacts.release.point.pdf) for high-level design
   -  Bundled and refactored hash entries's related artifact in XXPH3FilterBitsBuilder into `HashEntriesInfo` for better control on lifetime of these artifact during `SwapEntires`, `ResetEntries`
- Ensured RocksDB block-based table builder calls `FilterBitsBuilder::MaybePostVerify()` after constructing the filter by `FilterBitsBuilder::Finish()`
- When encountering such filter construction corruption, stop writing the filter content to files and mark such a block-based table building non-ok by storing the corruption status in the builder.

Pull Request resolved: #9342

Test Plan:
- Added new unit test `DBFilterConstructionCorruptionTestWithParam.DetectCorruption`
- Included this new feature in `DBFilterConstructionReserveMemoryTestWithParam.ReserveMemory` as this feature heavily touch ReserveMemory's impl
   - For fallback case, I run `./filter_bench -impl=3 -detect_filter_construct_corruption=true -reserve_table_builder_memory=true -strict_capacity_limit=true  -quick -runs 10 | grep 'Build avg'` to make sure nothing break.
- Added to `filter_bench`: increased filter construction time by **30%**, mostly by `MaybePostVerify()`
   -  FastLocalBloom
       - Before change: `./filter_bench -impl=2 -quick -runs 10 | grep 'Build avg'`: **28.86643s**
       - After change:
          -  `./filter_bench -impl=2 -detect_filter_construct_corruption=false -quick -runs 10 | grep 'Build avg'` (expect a tiny increase due to MaybePostVerify is always called regardless): **27.6644s (-4% perf improvement might be due to now we don't drop bloom hash entry in `AddAllEntries` along iteration but in bulk later, same with the bypassing-MaybePostVerify case below)**
          - `./filter_bench -impl=2 -detect_filter_construct_corruption=true -quick -runs 10 | grep 'Build avg'` (expect acceptable increase): **34.41159s (+20%)**
          - `./filter_bench -impl=2 -detect_filter_construct_corruption=true -quick -runs 10 | grep 'Build avg'` (by-passing MaybePostVerify, expect minor increase): **27.13431s (-6%)**
    -  Standard128Ribbon
       - Before change: `./filter_bench -impl=3 -quick -runs 10 | grep 'Build avg'`: **122.5384s**
       - After change:
          - `./filter_bench -impl=3 -detect_filter_construct_corruption=false -quick -runs 10 | grep 'Build avg'` (expect a tiny increase due to MaybePostVerify is always called regardless - verified by removing MaybePostVerify under this case and found only +-1ns difference): **124.3588s (+2%)**
          - `./filter_bench -impl=3 -detect_filter_construct_corruption=true -quick -runs 10 | grep 'Build avg'`(expect acceptable increase): **159.4946s (+30%)**
          - `./filter_bench -impl=3 -detect_filter_construct_corruption=true -quick -runs 10 | grep 'Build avg'`(by-passing MaybePostVerify, expect minor increase) : **125.258s (+2%)**
- Added to `db_stress`: `make crash_test`, `./db_stress --detect_filter_construct_corruption=true`
- Manually smoke-tested: manually corrupted the filter construction in some db level tests with basic PUT and background flush. As expected, the error did get returned to users in subsequent PUT and Flush status.

Reviewed By: pdillinger

Differential Revision: D33746928

Pulled By: hx235

fbshipit-source-id: cb056426be5a7debc1cd16f23bc250f36a08ca57
facebook-github-bot pushed a commit that referenced this pull request May 17, 2022
Summary:
**Context:**
Previous PR #9748, #9073, #8428 added separate flag for each charged memory area. Such API design is not scalable as we charge more and more memory areas. Also, we foresee an opportunity to consolidate this feature with other cache usage related features such as `cache_index_and_filter_blocks` using `CacheEntryRole`.

Therefore we decided to consolidate all these flags with `CacheUsageOptions cache_usage_options` and this PR serves as the first step by consolidating memory-charging related flags.

**Summary:**
- Replaced old API reference with new ones, including making `kCompressionDictionaryBuildingBuffer` opt-out and added a unit test for that
- Added missing db bench/stress test for some memory charging features
- Renamed related test suite to indicate they are under the same theme of memory charging
- Refactored a commonly used mocked cache component in memory charging related tests to reduce code duplication
- Replaced the phrases "memory tracking" / "cache reservation" (other than CacheReservationManager-related ones) with "memory charging" for standard description of this feature.

Pull Request resolved: #9926

Test Plan:
- New unit test for opt-out `kCompressionDictionaryBuildingBuffer` `TEST_F(ChargeCompressionDictionaryBuildingBufferTest, Basic)`
- New unit test for option validation/sanitization `TEST_F(CacheUsageOptionsOverridesTest, SanitizeAndValidateOptions)`
- CI
- db bench (in case querying new options introduces regression) **+0.5% micros/op**: `TEST_TMPDIR=/dev/shm/testdb ./db_bench -benchmarks=fillseq -db=$TEST_TMPDIR  -charge_compression_dictionary_building_buffer=1(remove this for comparison)  -compression_max_dict_bytes=10000 -disable_auto_compactions=1 -write_buffer_size=100000 -num=4000000 | egrep 'fillseq'`

#-run | (pre-PR) avg micros/op | std micros/op | (post-PR)  micros/op | std micros/op | change (%)
-- | -- | -- | -- | -- | --
10 | 3.9711 | 0.264408 | 3.9914 | 0.254563 | 0.5111933721
20 | 3.83905 | 0.0664488 | 3.8251 | 0.0695456 | **-0.3633711465**
40 | 3.86625 | 0.136669 | 3.8867 | 0.143765 | **0.5289363078**

- db_stress: `python3 tools/db_crashtest.py blackbox  -charge_compression_dictionary_building_buffer=1 -charge_filter_construction=1 -charge_table_reader=1 -cache_size=1` killed as normal

Reviewed By: ajkr

Differential Revision: D36054712

Pulled By: hx235

fbshipit-source-id: d406e90f5e0c5ea4dbcb585a484ad9302d4302af
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants