
[2/4][ResourceMngmt] Add new API CacheReservationManager::GetTotalMemoryUsage() #9071

Closed
wants to merge 7 commits

Conversation

hx235
Contributor

@hx235 hx235 commented Oct 24, 2021

Note: This PR is the 2nd PR of a bigger PR stack (#9073).

Context:
CacheReservationManager::UpdateCacheReservation(std::size_t new_memory_used) accepts an accumulated total memory used (e.g., 10 MB used so far) instead of a usage change (e.g., increase by 5 MB, decrease by 5 MB). Its benefits include consolidating the increase and decrease APIs into one, as described in #8506.

However, not every CacheReservationManager user keeps track of this accumulated total memory usage. For example, Bloom/Ribbon Filter construction (e.g., here in #9073) does not, while WriteBufferManager and compression dictionary buffering do.

Since future users may or may not keep track of this counter, and since implementing the counter inside CacheReservationManager is easy given the new_memory_used value already passed to CacheReservationManager::UpdateCacheReservation(std::size_t new_memory_used), this PR adds a new API, CacheReservationManager::GetTotalMemoryUsage().

As noted in the API comments, since CacheReservationManager is NOT thread-safe, external synchronization around UpdateCacheReservation() is needed if you want GetTotalMemoryUsed() to return the truly latest memory used.

Summary:

  • Added a private counter memory_used_ that is updated every time CacheReservationManager::UpdateCacheReservation(std::size_t new_memory_used) is called, regardless of whether the call returns a non-okay status
  • Added CacheReservationManager::GetTotalMemoryUsage() to return memory_used_
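The summary above can be sketched minimally as follows. This is a hypothetical, simplified stand-in for the real class (the names, capacity, and ReserveInCache helper are illustrative only), showing that the counter is recorded even when the reservation call fails:

```cpp
#include <cstddef>

// Hypothetical, simplified stand-in for CacheReservationManager; illustrates
// only the counter semantics described in the summary above.
enum class Status { kOk, kMemoryLimit };

class CacheReservationManagerSketch {
 public:
  Status UpdateCacheReservation(std::size_t new_memory_used) {
    // Updated on every call, regardless of the status returned below.
    memory_used_ = new_memory_used;
    return ReserveInCache(new_memory_used);
  }

  // The new API: returns the last accumulated total passed in.
  std::size_t GetTotalMemoryUsage() const { return memory_used_; }

 private:
  Status ReserveInCache(std::size_t bytes) {
    // Stand-in for inserting/releasing dummy entries in the block cache.
    return bytes <= kCacheCapacity ? Status::kOk : Status::kMemoryLimit;
  }

  static constexpr std::size_t kCacheCapacity = 1 << 20;  // arbitrary 1 MiB
  std::size_t memory_used_ = 0;
};
```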

Test:

  • Passing new tests
  • Passing existing tests

@hx235 hx235 changed the title [Depend on #9070][ResourceMngmt] Add new API CacheReservationManager::GetTotalMemoryUsage() [2/3][ResourceMngmt] Add new API CacheReservationManager::GetTotalMemoryUsage() Oct 24, 2021
@hx235 hx235 requested a review from ajkr October 24, 2021 19:48
@facebook-github-bot
Contributor

@hx235 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@@ -90,6 +118,7 @@ class CacheReservationManager {
std::shared_ptr<Cache> cache_;
bool delayed_decrease_;
std::atomic<std::size_t> cache_allocated_size_;
std::atomic<std::size_t> memory_used_;
Contributor

I found it confusing that the class is marked NOT thread safe and yet we are using atomics here. Are GetTotalReservedCacheSize() and GetTotalMemoryUsed() actually considered thread safe?

Contributor Author

@hx235 hx235 Oct 26, 2021

Good question! GetTotalReservedCacheSize() can be considered thread-safe, and I just updated the class comment to specify that.

For GetTotalReservedCacheSize(), the reason for using an atomic variable is that I decided to replace WriteBufferManager's atomic variable dummy_size_ from its previous implementation with CacheReservationManager::cache_allocated_size_, because they calculate the same thing, as you can see from here and here.

So for consistency, I used an atomic variable for cache_allocated_size_.

In terms of using std::memory_order_relaxed for such an atomic variable, there was a concern that GetTotalReservedCacheSize() might not pick up the latest update from another thread, but this discussion here convinced me.

However, lately I've been thinking that this can cause an inaccuracy issue in CacheReservationManager::UpdateCacheReservation() in a multi-threaded setting, even with external synchronization, because the implementation depends on the latest cache size. That inaccuracy can lead to overcharging/undercharging the cache. So far the callers accounting for memory in compression buffer building and bloom/ribbon filter construction are all single-threaded, so this issue is less of a problem; for WriteBufferManager, however, it can be an issue.

(I was actually planning to include this in my tech debt week, because the more I look back on my previous work, the more tech debt I find :p Btw, cc @ajkr here since I was planning to reach out about this for tech debt week.)

For memory_used_: yes, it suffices to use a plain size_t with external synchronization, or an atomic with memory_order_seq_cst (or possibly memory_order_acquire, but I need to look into that) for a multi-threaded setting, since we always want the latest memory usage for the same reason as above. I updated my PR to use the former, since this class was already emphasized as non-thread-safe.
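The conclusion here, that a plain std::size_t suffices when callers provide external synchronization, might be sketched as below; the names are illustrative and not the actual RocksDB call sites:

```cpp
#include <cstddef>
#include <mutex>
#include <thread>

// Minimal non-thread-safe manager: memory_used_ is a plain std::size_t, as in
// the updated PR, and correctness relies on the caller's external lock.
class ManagerSketch {
 public:
  void UpdateCacheReservation(std::size_t new_memory_used) {
    memory_used_ = new_memory_used;  // plain write; no atomic needed
  }
  std::size_t GetTotalMemoryUsage() const { return memory_used_; }

 private:
  std::size_t memory_used_ = 0;
};

// Caller-side external synchronization: every access goes through one mutex,
// so each thread sees the latest value written under the lock.
inline std::size_t UpdateFromTwoThreads(ManagerSketch& mgr) {
  std::mutex mu;
  auto update = [&](std::size_t v) {
    std::lock_guard<std::mutex> guard(mu);
    mgr.UpdateCacheReservation(v);
  };
  std::thread a(update, 10);
  a.join();  // run threads back to back to keep the example deterministic
  std::thread b(update, 20);
  b.join();
  std::lock_guard<std::mutex> guard(mu);
  return mgr.GetTotalMemoryUsage();
}
```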

Contributor

@ajkr ajkr Oct 29, 2021

However, lately I've been thinking that this can cause an inaccuracy issue in CacheReservationManager::UpdateCacheReservation() in a multi-threaded setting, even with external synchronization, because the implementation depends on the latest cache size. That inaccuracy can lead to overcharging/undercharging the cache. So far the callers accounting for memory in compression buffer building and bloom/ribbon filter construction are all single-threaded, so this issue is less of a problem; for WriteBufferManager, however, it can be an issue.

Without external synchronization, we only expose read-only access to cache_allocated_size_ via GetTotalReservedCacheSize(), right? I am not sure how UpdateCacheReservation() could do the wrong thing since any function that modifies cache_allocated_size_ should be run entirely before or entirely after UpdateCacheReservation() thanks to the external synchronization requirement.

Contributor Author

modifies cache_allocated_size_ should be run entirely before or entirely after UpdateCacheReservation() thanks to the external synchronization requirement.

That's correct :) What I worry about is the memory order memory_order_relaxed currently being used by cache_allocated_size_. If I understand correctly, memory_order_relaxed does not guarantee that the latest write from thread A can be seen by thread B. So if thread A does an update changing cache_allocated_size_, and thread B does an update after that based on cache_allocated_size_, thread B might not see thread A's update due to the memory order.

I am thinking of changing to memory_order_seq_cst or memory_order_acquire/memory_order_release, but I need to be careful due to performance penalties (which I assume are the reason memory_order_relaxed was used) and deadlock (since we have external synchronization using a lock plus the internal memory-order "lock").

Contributor

If I understand correctly, memory_order_relaxed does not guarantee the latest write from thread A can be seen by thread B

It should guarantee it in the relevant case:

  • A updates cache_allocated_size_
  • A releases mutex
  • B acquires mutex
  • B reads cache_allocated_size_

I think all memory operations prior to A's release of the mutex should be visible to B after its acquire of the mutex.
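The four-step interleaving above can be written out as a small, deterministic sketch; note that the join() between the threads also establishes ordering here, so this only illustrates the mutex-based reasoning rather than proving it:

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <thread>

// Thread A stores to a relaxed atomic under a mutex; thread B then takes the
// same mutex and reads. The mutex release in A synchronizes-with the acquire
// in B, so B observes A's store even though both accesses are relaxed.
inline std::size_t RelaxedStoreUnderMutex() {
  std::atomic<std::size_t> cache_allocated_size{0};
  std::mutex mu;

  std::thread a([&] {
    std::lock_guard<std::mutex> guard(mu);  // A acquires mutex
    cache_allocated_size.store(42, std::memory_order_relaxed);
  });                                       // A releases mutex at scope exit
  a.join();

  std::size_t seen = 0;
  std::thread b([&] {
    std::lock_guard<std::mutex> guard(mu);  // B acquires mutex
    seen = cache_allocated_size.load(std::memory_order_relaxed);
  });
  b.join();
  return seen;
}
```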

Contributor Author

@hx235 hx235 Oct 29, 2021

Oh interesting! I didn't know a mutex could affect memory ordering :) Let me think more about that.

Contributor

The "4.12 Memory Synchronization" section is a relevant reference for the pthread_mutex_* functions: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_12.

Contributor Author

Interesting! I used to think C++'s atomic memory orders work independently of mutexes, but the second answer here seems to support your argument.

The author said "In addition, memory operations that are released by a mutex are synchronized (visible) with another thread that acquires the same mutex." by reading between the lines of the C++ standard.

I wish the C++ standard had a reference explicitly stating that :/

Contributor

@ajkr ajkr Nov 10, 2021

My feeling is that memory_order_relaxed is an ambiguous name, because it could mean (a) no special memory ordering, or (b) especially loose memory ordering. Since in my understanding it actually means (a), I think of it the same as updating a non-atomic, like a plain int, inside a mutex (which of course has to be visible to the next thread that grabs the mutex), with the additional guarantee that out-of-mutex observers cannot see partial writes (because it's atomic).

Contributor Author

Good point 👍

@hx235 hx235 marked this pull request as draft October 26, 2021 02:20
@facebook-github-bot
Contributor

@hx235 has updated the pull request. You must reimport the pull request before landing.

@hx235
Contributor Author

hx235 commented Oct 26, 2021

Update:

  • Clarified in the header comment an exception to the non-thread-safe usage of CacheReservationManager when calling GetTotalReservedCacheSize()
  • Implemented memory_used_ as an ordinary std::size_t instead of an atomic, for the reason mentioned here
  • Ready for review when the signals are cleared

@facebook-github-bot
Contributor

@hx235 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@hx235 hx235 marked this pull request as ready for review October 26, 2021 03:44
@hx235 hx235 marked this pull request as draft October 26, 2021 03:48
@facebook-github-bot
Contributor

@hx235 has updated the pull request. You must reimport the pull request before landing.

@hx235 hx235 marked this pull request as ready for review October 26, 2021 04:51
@facebook-github-bot
Contributor

@hx235 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@hx235
Contributor Author

hx235 commented Oct 26, 2021

Update:

  • Rebased onto upstream/main for clearer commit history

@facebook-github-bot
Contributor

@hx235 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@hx235 has updated the pull request. You must reimport the pull request before landing.


@facebook-github-bot
Contributor

@hx235 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@hx235 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Contributor

@hx235 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@hx235
Contributor Author

hx235 commented Oct 26, 2021

Update

  • Added new line at the end of file as required
  • Ready for review after signals are clear

Comment on lines 80 to 85
// If the new_memory_used passed in reflects a hypothetical usage
// instead of the actual memory usage, and this hypothetical usage
// ends up not happening, consider calling
// UpdateCacheReservation() again to reverse the update to the actual
// memory usage, since the hypothetical usage passed in will affect
// the GetTotalMemoryUsed() result.
Contributor

What about adding a higher-level SetTotalMemoryUsed() API that internally sets memory_used_ and calls UpdateCacheReservation()? And if there are no users of hypothetical usage (are there?), we can even make UpdateCacheReservation() private.

Contributor

Or, if there are no users of hypothetical usage, we can also delete the comment to achieve the same effect of simplifying the API. The parameter is currently called new_mem_used so it is what I'd expect GetTotalMemoryUsed() to return (and not what I'd expect to be a hypothetical number).

Contributor Author

What about adding a higher-level SetTotalMemoryUsed() API that internally sets memory_used_ and calls UpdateCacheReservation()?

That's an interesting idea! The reason I didn't expose SetTotalMemoryUsed() is to save the class user one step when calling UpdateCacheReservation(). My underlying assumption is that the class user is more likely to care about modifying the cache reservation than about modifying the memory being tracked.

And if there are no users of hypothetical usage (are there?)

There wasn't one before, but there is now one in filter construction here.

This implementation is based on the discussion with @pdillinger: we can potentially skip the expensive operation ResetAndFindSeedToSolve, which also allocates the actual memory, by checking the cache status first. So the hypothetical memory usage might not become reality in some cases. That is exactly the case this comment emphasizes: "reserve-before-use" followed by "use-not-happening-after", where we care about GetTotalMemoryUsed() being accurate.

if there are no users of hypothetical usage, we can also delete the comment to achieve the same effect of simplifying the API.

Or maybe we remove the comment to discourage this kind of usage and consider the one in filter construction a "hack".
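The "reserve-before-use, use-not-happening-after" case discussed in this thread might look roughly like the following, with a hypothetical simplified manager (names, capacity, and the fallback logic are illustrative only):

```cpp
#include <cstddef>

enum class Status { kOk, kMemoryLimit };

// Hypothetical simplified manager: records the passed-in total and fails the
// reservation when it exceeds a fixed capacity.
class MgrSketch {
 public:
  Status UpdateCacheReservation(std::size_t new_memory_used) {
    memory_used_ = new_memory_used;
    return new_memory_used <= kCapacity ? Status::kOk : Status::kMemoryLimit;
  }
  std::size_t GetTotalMemoryUsage() const { return memory_used_; }

 private:
  static constexpr std::size_t kCapacity = 1000;
  std::size_t memory_used_ = 0;
};

// Charge a hypothetical usage first; if the cache has no room, roll the
// counter back so GetTotalMemoryUsage() stays accurate, and skip the
// expensive step (e.g., fall back from Ribbon banding to Bloom).
inline bool MaybeRunExpensiveStep(MgrSketch& mgr, std::size_t banding_bytes) {
  const std::size_t old_total = mgr.GetTotalMemoryUsage();
  if (mgr.UpdateCacheReservation(old_total + banding_bytes) != Status::kOk) {
    mgr.UpdateCacheReservation(old_total);  // hypothetical usage undone
    return false;
  }
  return true;  // reservation held; the step would actually allocate now
}
```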

Contributor Author

@hx235 hx235 Nov 4, 2021

Or maybe we remove the comment to discourage this kind of usage and consider the one in filter construction a "hack".

@ajkr is this an acceptable solution based on the discussion above?

Contributor

Yes that works for me as it makes the API easy to understand. Sorry for the delay.

facebook-github-bot pushed a commit that referenced this pull request Nov 1, 2021
Summary:
Note: it might conflict with another CRM-related PR (#9071), so it will merge after that one is merged.

Context:
As `CacheReservationManager` is used by more memory users, it is convenient to retrieve the dummy entry size from `CacheReservationManager` instead of hard-coding `256 * 1024` when writing tests. It also gives us more flexibility to change our implementation of the dummy entry size.

A follow-up PR is needed to replace those hard-coded dummy entry size value in `db_test2.cc`, `db_write_buffer_manager_test.cc`, `write_buffer_manager_test.cc`, `table_test.cc` and the ones introduced in #9072 (comment).
- Exposed the private static constexpr `kDummyEntrySize` through public static `CacheReservationManager::GetDummyEntrySize()`
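The accessor described in the quoted summary can be sketched as below (a simplified stand-in, not the actual RocksDB declaration):

```cpp
#include <cstddef>

// Tests can ask the class for its dummy entry size instead of hard-coding
// 256 * 1024, which also keeps them valid if the constant ever changes.
class CacheReservationManagerSketch {
 public:
  static constexpr std::size_t GetDummyEntrySize() { return kDummyEntrySize; }

 private:
  static constexpr std::size_t kDummyEntrySize = 256 * 1024;
};
```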

Pull Request resolved: #9072

Test Plan:
- Passing new tests
- Passing existing tests

Reviewed By: ajkr

Differential Revision: D32043684

Pulled By: hx235

fbshipit-source-id: ddefc6921c052adab6a2cda2394eb26da3076a50
@hx235
Contributor Author

hx235 commented Nov 4, 2021

Update:

@facebook-github-bot
Contributor

@hx235 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Contributor

@hx235 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@hx235
Contributor Author

hx235 commented Nov 5, 2021

Update:

  • Removed confusing API comment
  • Rebased
  • Ready for review when signals are cleared

@facebook-github-bot
Contributor

@hx235 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Contributor

@hx235 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@hx235 hx235 requested a review from ajkr November 5, 2021 15:09
Contributor

@ajkr ajkr left a comment

LGTM!

@facebook-github-bot
Contributor

@hx235 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Contributor

@hx235 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


@facebook-github-bot
Contributor

@hx235 merged this pull request in ffd6085.

facebook-github-bot pushed a commit that referenced this pull request Nov 9, 2021
Summary:
Note: This PR is the 3rd PR of a bigger PR stack (#9073) and depends on the second PR (#9071). **See changes from this PR only https://github.com/facebook/rocksdb/pull/9130/commits/00447324d082136b0e777d3ab6a3df3a8452c633**

Context:
pdillinger brought up a good [point](#9073 (comment)) about lacking RAII support for per cache reservation in `CacheReservationManager`  when reviewing #9073.

To summarize the discussion, the current API `CacheReservationManager::UpdateCacheReservation()` requires callers to explicitly calculate and pass in a correct`new_mem_used` to release a cache reservation (if they don't want to rely on the clean-up during `CacheReservationManager`'s destruction - such as they want to release it earlier).

While this implementation is convenient for some use cases such as `WriteBufferManager`, where the [reservation](https://github.com/facebook/rocksdb/blob/main/memtable/write_buffer_manager.cc#L69-L91) and [release](https://github.com/facebook/rocksdb/blob/main/memtable/write_buffer_manager.cc#L109-L129) amounts do not necessarily correspond symmetrically and thus a flexible `new_mem_used` input is needed, it can be prone to caller calculation errors. It also leads to a mass of cache-releasing code in other use cases such as filter construction, where reservation and release amounts do correspond symmetrically and many code paths require a cache release, as [pointed out](#9073 (comment)) by pdillinger.

Therefore we decided to provide a new API in `CacheReservationManager` to update the reservation with better RAII support, using a handle to manage the lifetime of each individual cache reservation.
- Added a new class `CacheReservationHandle`
- Added a new API `CacheReservationManager::MakeCacheReservation()` that outputs a `CacheReservationHandle` for managing the reservation
- Updated class comments to clarify two different cache reservation methods
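The RAII idea in the quoted summary might be sketched as follows, with simplified stand-ins for the RocksDB classes; the handle releases its portion of the reservation on destruction, so callers no longer hand-compute new_mem_used for releases:

```cpp
#include <cstddef>
#include <memory>

class MgrSketch {
 public:
  // RAII handle: its destructor returns this reservation's bytes to the cache.
  class CacheReservationHandle {
   public:
    CacheReservationHandle(MgrSketch* mgr, std::size_t bytes)
        : mgr_(mgr), bytes_(bytes) {}
    ~CacheReservationHandle() { mgr_->reserved_ -= bytes_; }
    CacheReservationHandle(const CacheReservationHandle&) = delete;
    CacheReservationHandle& operator=(const CacheReservationHandle&) = delete;

   private:
    MgrSketch* mgr_;
    std::size_t bytes_;
  };

  std::unique_ptr<CacheReservationHandle> MakeCacheReservation(
      std::size_t bytes) {
    reserved_ += bytes;
    return std::make_unique<CacheReservationHandle>(this, bytes);
  }

  std::size_t GetTotalReservedCacheSize() const { return reserved_; }

 private:
  std::size_t reserved_ = 0;
};
```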

Tests:
- Passing new tests

Pull Request resolved: #9130

Reviewed By: pdillinger

Differential Revision: D32199446

Pulled By: hx235

fbshipit-source-id: 1cba7c636e5ecfb55b0c1e0c2d218cc9b5b30b4e
facebook-github-bot pushed a commit that referenced this pull request Nov 18, 2021

Summary:
Note: This PR is the 4th part of a bigger PR stack (#9073) and will rebase/merge only after the first three PRs (#9070, #9071, #9130) merge.

**Context:**
Similar to #8428, this PR tracks memory usage during (new) Bloom Filter (i.e., FastLocalBloom) and Ribbon Filter (i.e., Ribbon128) construction, moving toward the goal of [single global memory limit using block cache capacity](https://github.com/facebook/rocksdb/wiki/Projects-Being-Developed#improving-memory-efficiency). It also constrains the size of the banding portion of the Ribbon Filter during construction by falling back to a Bloom Filter if that banding is, at some point, larger than the available space in the cache under `LRUCacheOptions::strict_capacity_limit=true`.

The option to turn on this feature is `BlockBasedTableOptions::reserve_table_builder_memory = true`, which defaults to `false`. We [decided](#9073 (comment)) not to have a separate option per memory user in table building; therefore their memory accounting is all bundled under one general option.

**Summary:**
- Reserved/released cache for the creation/destruction of three main memory users, using the passed-in `FilterBuildingContext::cache_res_mgr` during filter construction:
   - hash entries (i.e., `hash_entries.size()`; we bucket-charge hash entries during insertion for performance),
   - banding (Ribbon Filter only, `bytes_coeff_rows` + `bytes_result_rows` + `bytes_backtrack`),
   - final filter (i.e., `mutable_buf`'s size).
      - Implementation details: in order to use `CacheReservationManager::CacheReservationHandle` to account for the final filter's memory, we have to store the `CacheReservationManager` object and `CacheReservationHandle` for the final filter in `XXPH3BitsFilterBuilder`, as well as explicitly delete the filter bits builder when done with the final filter in the block-based table.
- Added an option to run `filter_bench` with this memory reservation feature

Pull Request resolved: #9073

Test Plan:
- Added new tests in `db_bloom_filter_test` to verify filter construction peak cache reservation under combination of  `BlockBasedTable::Rep::FilterType` (e.g, `kFullFilter`, `kPartitionedFilter`), `BloomFilterPolicy::Mode`(e.g, `kFastLocalBloom`, `kStandard128Ribbon`, `kDeprecatedBlock`) and `BlockBasedTableOptions::reserve_table_builder_memory`
  • To address the concern about slow tests: tests with memory reservation under `kFullFilter` + `kStandard128Ribbon` and `kPartitionedFilter` take around **3000 - 6000 ms** and others take around **1500 - 2000 ms**, adding **20000 - 25000 ms** in total to the test suite running locally
- Added new test in `bloom_test` to verify Ribbon Filter fallback on large banding in FullFilter
- Added test in `filter_bench` to verify that this feature does not significantly slow down Bloom/Ribbon Filter construction speed. Local result averaged over **20** run as below:
   - FastLocalBloom
      - baseline `./filter_bench -impl=2 -quick -runs 20 | grep 'Build avg'`:
         - **Build avg ns/key: 29.56295** (DEBUG_LEVEL=1), **29.98153** (DEBUG_LEVEL=0)
      - new feature (expected to be similar as above)`./filter_bench -impl=2 -quick -runs 20 -reserve_table_builder_memory=true | grep 'Build avg'`:
         - **Build avg ns/key: 30.99046** (DEBUG_LEVEL=1), **30.48867** (DEBUG_LEVEL=0)
      - new feature of RibbonFilter with fallback  (expected to be similar as above) `./filter_bench -impl=2 -quick -runs 20 -reserve_table_builder_memory=true -strict_capacity_limit=true | grep 'Build avg'` :
         - **Build avg ns/key: 31.146975** (DEBUG_LEVEL=1), **30.08165** (DEBUG_LEVEL=0)

    - Ribbon128
       - baseline `./filter_bench -impl=3 -quick -runs 20 | grep 'Build avg'`:
           - **Build avg ns/key: 129.17585** (DEBUG_LEVEL=1), **130.5225** (DEBUG_LEVEL=0)
       - new feature  (expected to be similar as above) `./filter_bench -impl=3 -quick -runs 20 -reserve_table_builder_memory=true | grep 'Build avg' `:
           - **Build avg ns/key: 131.61645** (DEBUG_LEVEL=1), **132.98075** (DEBUG_LEVEL=0)
       - new feature of RibbonFilter with fallback (expected to be a lot faster than above due to fallback) `./filter_bench -impl=3 -quick -runs 20 -reserve_table_builder_memory=true -strict_capacity_limit=true | grep 'Build avg'` :
          - **Build avg ns/key: 52.032965** (DEBUG_LEVEL=1), **52.597825** (DEBUG_LEVEL=0)
          - And the warning message of `"Cache reservation for Ribbon filter banding failed due to cache full"` is indeed logged to console.

Reviewed By: pdillinger

Differential Revision: D31991348

Pulled By: hx235

fbshipit-source-id: 9336b2c60f44d530063da518ceaf56dac5f9df8e