
Some clean-up of secondary cache #10730

Closed
wants to merge 10 commits

Conversation


@pdillinger (Contributor) commented Sep 24, 2022

Summary:
This is intended as a step toward possibly separating secondary cache integration from the
Cache implementation as much as possible, to (hopefully) minimize code duplication in
adding secondary cache support to HyperClockCache.

  • Major clarifications to API docs of secondary cache compatible parts of Cache. For example, previously the docs seemed to suggest that Wait() was not needed if IsReady()==true. And it wasn't clear what operations were actually supported on pending handles.
  • Add some assertions related to these requirements, such as that we don't Release() before Wait() (which would leak a secondary cache handle).
  • Fix a leaky abstraction with dummy handles, which are supposed to be internal to the Cache. Previously, these just used value=nullptr to indicate dummy handle, which meant that they could be confused with legitimate value=nullptr cases like cache reservations. Also fixed blob_source_test which was relying on this leaky abstraction.
  • Drop "incomplete" terminology, which was another name for "pending".
  • Remove the unnecessary "standalone" flag on a handle. It can be a simple internal parameter instead.
  • Split handle flags into "mutable" ones requiring mutex and "immutable" ones which do not. Because of single-threaded access to pending handles, the "Is Pending" flag can be in the "immutable" set. This allows removal of a TSAN work-around and removing a mutex acquire-release in IsReady().
  • Remove some unnecessary handling of charges on handles of failed lookups. Keeping total_charge=0 means no special handling needed. (Removed one unnecessary mutex acquire/release.)
  • Simplify handling of dummy handle in Lookup(). There is no need to explicitly Ref & Release w/Erase if we generally overwrite the dummy anyway. (Removed one mutex acquire/release, a call to Release().)

Intended follow-up:

  • Clarify APIs in secondary_cache.h
    • Doesn't SecondaryCacheResultHandle transfer ownership of the Value() on success (implementations should not release the value in destructor)?
    • Does Wait() need to be called if IsReady() == true? (This would be different from Cache.)
    • Do Value() and Size() have undefined behavior if IsReady() == false?
    • Why have a custom API for what is essentially a std::future<std::pair<void*, size_t>>?
  • Improve unit testing of standalone handle case
  • Apparent null e bug in free_standalone_handle case
  • Clean up secondary cache testing in lru_cache_test
    • Why does TestSecondaryCacheResultHandle hold on to a Cache::Handle?
    • Why does TestSecondaryCacheResultHandle::Wait() do nothing? Shouldn't it establish the post-condition IsReady() == true?
    • (Assuming that is sorted out...) Shouldn't TestSecondaryCache::WaitAll simply wait on each handle in order (no casting required)? How about making that the default implementation?
    • Why does TestSecondaryCacheResultHandle::Size() check Value() first? If the API is intended to be returning 0 before IsReady(), then that is weird but should at least be documented. Otherwise, if it's intended to be undefined behavior, we should assert IsReady().
  • Consider replacing "standalone" and "dummy" entries with a single kind of "weak" entry that deletes its value when it reaches zero refs. Suppose you are using compressed secondary cache and have two iterators at similar places. It will probably be common for one iterator to have standalone results pinned (out of cache) when the second iterator needs those same blocks and has to re-load them from secondary cache and duplicate the memory. Combining the dummy and the standalone should fix this.

Test Plan: existing tests (minor update), and crash test with sanitizers and secondary cache

Performance test for any regressions in LRUCache (primary only):
Create DB with

TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num=30000000 -disable_wal=1 -bloom_bits=16

Test before & after (run at same time) with

TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=readrandom[-X100] -readonly -num=30000000 -bloom_bits=16 -cache_index_and_filter_blocks=1 -cache_size=233000000 -duration 30 -threads=16

Before: readrandom [AVG 100 runs] : 22234 (± 63) ops/sec; 1.6 (± 0.0) MB/sec
After: readrandom [AVG 100 runs] : 22197 (± 64) ops/sec; 1.6 (± 0.0) MB/sec
That's within 0.2%, which is not statistically significant given the confidence intervals.

@facebook-github-bot
@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@@ -579,12 +576,10 @@ Cache::Handle* LRUCacheShard::Lookup(
e->Ref();
e->SetIsInSecondaryCache(is_in_sec_cache);

Contributor:

  1. For a standalone handle, wait can be either true or false.
  2. If wait is false, the Promote() in WaitAll() should still promote it as a standalone handle; but in WaitAll() it does:
     shard->Promote(lru_handle, /*standalone=*/false);

Contributor (Author):

OK, oops I missed that path

found_dummy_entry = true;
// Let the dummy entry be overwritten
e = nullptr;
Contributor:

This may cause an ASAN issue.

Contributor (Author):

table_.Lookup(key, hash) does not acquire a reference

IN_LOW_PRI_POOL = (1 << 8),
// Whether this entry is not inserted into the cache (both hash table and
// LRU list).
IS_STANDALONE = (1 << 9),
Contributor:

If it works after removing IS_STANDALONE, please update the comment at line 43.

Cache::Priority priority =
e->IsHighPri() ? Cache::Priority::HIGH : Cache::Priority::LOW;
s = Insert(e->key(), e->hash, /*value=*/nullptr, 0,
s = Insert(e->key(), e->hash, kDummyValueMarker, /*charge=*/0,
Contributor:

kDummyValueMarker is 18 characters long, so the charge should be 18; otherwise, this becomes hidden memory usage.

Contributor (Author):

kDummyValueMarker[...] is static data. There is no incremental hidden memory usage.

Contributor (Author):

(aside from the pre-existing limitation of not accounting for metadata used by dummy handles)

@gitbw95 (Contributor) left a comment:

Please address the comments.

@facebook-github-bot

@pdillinger has updated the pull request. You must reimport the pull request before landing.


@anand1976 (Contributor) left a comment:

LGTM

// it's not ready. The caller should then call Value() to check if the
// item was successfully retrieved. If unsuccessful (perhaps due to an
// IO error), Value() will return nullptr.
// ======================== Async Lookup (wait=true) ======================
Contributor:

I think you mean "wait==false" here and below? If wait==true, then it's a sync call and the handle will never be pending.

