
Some clean-up of secondary cache #10730

Closed
wants to merge 10 commits

Conversation


@pdillinger (Contributor) commented Sep 24, 2022

Summary:
This is intended as a step toward possibly separating secondary cache integration from the
Cache implementation as much as possible, to (hopefully) minimize code duplication in
adding secondary cache support to HyperClockCache.

  • Major clarifications to API docs of secondary cache compatible parts of Cache. For example, previously the docs seemed to suggest that Wait() was not needed if IsReady()==true. And it wasn't clear what operations were actually supported on pending handles.
  • Add some assertions related to these requirements, such as that we don't Release() before Wait() (which would leak a secondary cache handle).
  • Fix a leaky abstraction with dummy handles, which are supposed to be internal to the Cache. Previously, these just used value=nullptr to indicate dummy handle, which meant that they could be confused with legitimate value=nullptr cases like cache reservations. Also fixed blob_source_test which was relying on this leaky abstraction.
  • Drop "incomplete" terminology, which was another name for "pending".
  • Remove the unnecessary "standalone" flag on a handle. It can be a simple internal parameter instead.
  • Split handle flags into "mutable" ones requiring mutex and "immutable" ones which do not. Because of single-threaded access to pending handles, the "Is Pending" flag can be in the "immutable" set. This allows removal of a TSAN work-around and removing a mutex acquire-release in IsReady().
  • Remove some unnecessary handling of charges on handles of failed lookups. Keeping total_charge=0 means no special handling needed. (Removed one unnecessary mutex acquire/release.)
  • Simplify handling of dummy handle in Lookup(). There is no need to explicitly Ref & Release w/Erase if we generally overwrite the dummy anyway. (Removed one mutex acquire/release, a call to Release().)

Intended follow-up:

  • Clarify APIs in secondary_cache.h
    • Doesn't SecondaryCacheResultHandle transfer ownership of the Value() on success (implementations should not release the value in destructor)?
    • Does Wait() need to be called if IsReady() == true? (This would be different from Cache.)
    • Do Value() and Size() have undefined behavior if IsReady() == false?
    • Why have a custom API for what is essentially a std::future<std::pair<void*, size_t>>?
  • Improve unit testing of standalone handle case
  • Apparent null e bug in free_standalone_handle case
  • Clean up secondary cache testing in lru_cache_test
    • Why does TestSecondaryCacheResultHandle hold on to a Cache::Handle?
    • Why does TestSecondaryCacheResultHandle::Wait() do nothing? Shouldn't it establish the post-condition IsReady() == true?
    • (Assuming that is sorted out...) Shouldn't TestSecondaryCache::WaitAll simply wait on each handle in order (no casting required)? How about making that the default implementation?
    • Why does TestSecondaryCacheResultHandle::Size() check Value() first? If the API is intended to be returning 0 before IsReady(), then that is weird but should at least be documented. Otherwise, if it's intended to be undefined behavior, we should assert IsReady().
  • Consider replacing "standalone" and "dummy" entries with a single kind of "weak" entry that deletes its value when it reaches zero refs. Suppose you are using compressed secondary cache and have two iterators at similar places. It will probably be common for one iterator to have standalone results pinned (out of cache) when the second iterator needs those same blocks and has to re-load them from secondary cache and duplicate the memory. Combining the dummy and the standalone should fix this.

Test Plan: existing tests (minor update), and crash test with sanitizers and secondary cache

Performance test for any regressions in LRUCache (primary only):
Create DB with

TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num=30000000 -disable_wal=1 -bloom_bits=16

Test before & after (run at same time) with

TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=readrandom[-X100] -readonly -num=30000000 -bloom_bits=16 -cache_index_and_filter_blocks=1 -cache_size=233000000 -duration 30 -threads=16

Before: readrandom [AVG 100 runs] : 22234 (± 63) ops/sec; 1.6 (± 0.0) MB/sec
After: readrandom [AVG 100 runs] : 22197 (± 64) ops/sec; 1.6 (± 0.0) MB/sec
That's within 0.2%, which is not statistically significant given the confidence intervals.

@facebook-github-bot
@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@@ -579,12 +576,10 @@ Cache::Handle* LRUCacheShard::Lookup(
e->Ref();
e->SetIsInSecondaryCache(is_in_sec_cache);

Contributor:

  1. For a standalone handle, wait can be either true or false.
  2. If wait is false, the Promote() in WaitAll() should still promote it as a standalone handle; but in WaitAll() it does:
     shard->Promote(lru_handle, /*standalone=*/false);

Contributor (Author):

OK, oops I missed that path

found_dummy_entry = true;
// Let the dummy entry be overwritten
e = nullptr;
Contributor:

This may cause an ASAN issue.

Contributor (Author):

table_.Lookup(key, hash) does not acquire a reference

IN_LOW_PRI_POOL = (1 << 8),
// Whether this entry is not inserted into the cache (both hash table and
// LRU list).
IS_STANDALONE = (1 << 9),
Contributor:

If it works after removing IS_STANDALONE, please update the comment at line 43.

Cache::Priority priority =
e->IsHighPri() ? Cache::Priority::HIGH : Cache::Priority::LOW;
s = Insert(e->key(), e->hash, /*value=*/nullptr, 0,
s = Insert(e->key(), e->hash, kDummyValueMarker, /*charge=*/0,
Contributor:

kDummyValueMarker is 18 characters long, so the charge should be 18; otherwise, this becomes hidden memory usage.

Contributor (Author):

kDummyValueMarker[...] is static data. There is no incremental hidden memory usage.

Contributor (Author):

(aside from the pre-existing limitation of not accounting for metadata used by dummy handles)

@gitbw95 (Contributor) left a comment:

Please address the comments.

@facebook-github-bot

@pdillinger has updated the pull request. You must reimport the pull request before landing.


@anand1976 (Contributor) left a comment:

LGTM

// it's not ready. The caller should then call Value() to check if the
// item was successfully retrieved. If unsuccessful (perhaps due to an
// IO error), Value() will return nullptr.
// ======================== Async Lookup (wait=true) ======================
Contributor:

I think you mean "wait==false" here and below? If wait==true, then it's a sync call and the handle will never be pending.

