-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Towards a production-quality ClockCache #10418
Conversation
cache/cache_test.cc
Outdated
if (type == kFast || type == kClock) { | ||
ROCKSDB_GTEST_BYPASS("FastLRUCache and ClockCache require 16-byte keys."); | ||
return; | ||
} | ||
|
||
// cache is std::shared_ptr and will be automatically cleaned up. | ||
const uint64_t kCapacity = 200000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use size_t
, to keep 32-bit compilers happy
autovector<ClockHandle> deleted; | ||
uint32_t max_iterations = | ||
1 + static_cast<uint32_t>(GetTableSize() * kLoadFactor); | ||
ClockHandle::ClockPriority::HIGH * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may take up to HIGH passes over the array to evict an item. I will add a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the shift by kClockPriorityOffset seems unnecessarily and confusingly intermingled.
cache/clock_cache.h
Outdated
@@ -274,14 +278,18 @@ struct ClockHandle { | |||
|
|||
std::atomic<uint32_t> refs; | |||
|
|||
// True iff the handle is allocated separately from hash table. | |||
bool dangling; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like "dangling" to describe this kind of handle, because something "dangling" is loosely attached. To me that sounds more like erased entries. How about "stopgap"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about "detached"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine with me. To me, "stopgap" is more associated with rare and degraded functionality (can't lookup the entry) than "detached." But not a big difference.
cache/clock_cache.cc
Outdated
if (occupancy_local + 1 > table_.GetOccupancyLimit()) { | ||
// Even if the user wishes to overload the cache, we can't insert into | ||
// the hash table. Instead, we dynamically allocate a new handle. | ||
h = reinterpret_cast<ClockHandle*>(new ClockHandle()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary cast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test comment.
Test comment. |
double load_factor = | ||
std::min(fast_lru_cache::kLoadFactor, clock_cache::kLoadFactor); | ||
for (int i = 0; i < 2 * static_cast<int>(kCacheSize / load_factor); i++) { | ||
for (int i = 0; i < 100 * kCacheSize; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the load_factor thing so we don't rely so specifically on implementation details.
e->SetHit(); | ||
// The handle is now referenced, so we take it out of clock. | ||
ClockOff(e); | ||
e->InternalToExternalRef(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not for correctness. I want the following invariant to hold: with an external reference, no field is modified.
cache/clock_cache.cc
Outdated
@@ -358,6 +356,7 @@ ClockCacheShard::ClockCacheShard( | |||
size_t capacity, size_t estimated_value_size, bool strict_capacity_limit, | |||
CacheMetadataChargePolicy metadata_charge_policy) | |||
: strict_capacity_limit_(strict_capacity_limit), | |||
dangling_usage_(0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the usage by elements that have been dynamically allocated separately from the table.
if (handle == nullptr) { | ||
// Don't insert the entry but still return ok, as if the entry inserted | ||
// into cache and get evicted immediately. | ||
deleted.push_back(tmp); | ||
tmp.FreeData(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bugfix.
cache/clock_cache.cc
Outdated
// Free space with the clock policy until enough space is freed or there are | ||
// no evictable elements. | ||
table_.ClockRun(tmp.total_charge); | ||
table_.ClockRun(tmp.total_charge + dangling_usage); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ClockHandleTable is oblivious to dangling handles.
new (cache_) fast_lru_cache::LRUCacheShard( | ||
capacity, 1 /*estimated_value_size*/, false /*strict_capacity_limit*/, | ||
kDontChargeCacheMetadata); | ||
cache_ = reinterpret_cast<LRUCacheShard*>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FastLRUCache tests didn't change. Only cosmetic changes.
// ASSERT_TRUE(in_high_pri_pool); | ||
// ASSERT_EQ(num_high_pri_pool_keys, high_pri_pool_keys); | ||
// } | ||
size_t CalcEstimatedHandleChargeWrapper( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ported tests from FastLRUCache.
autovector<ClockHandle> deleted; | ||
uint32_t max_iterations = | ||
1 + static_cast<uint32_t>(GetTableSize() * kLoadFactor); | ||
ClockHandle::ClockPriority::HIGH * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may take up to HIGH passes over the array to evict an item. I will add a comment.
cache/clock_cache.h
Outdated
@@ -274,14 +278,18 @@ struct ClockHandle { | |||
|
|||
std::atomic<uint32_t> refs; | |||
|
|||
// True iff the handle is allocated separately from hash table. | |||
bool dangling; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about "detached"?
cache/clock_cache.cc
Outdated
if (occupancy_local + 1 > table_.GetOccupancyLimit()) { | ||
// Even if the user wishes to overload the cache, we can't insert into | ||
// the hash table. Instead, we dynamically allocate a new handle. | ||
h = reinterpret_cast<ClockHandle*>(new ClockHandle()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM
autovector<ClockHandle> deleted; | ||
uint32_t max_iterations = | ||
1 + static_cast<uint32_t>(GetTableSize() * kLoadFactor); | ||
ClockHandle::ClockPriority::HIGH * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the shift by kClockPriorityOffset seems unnecessarily and confusingly intermingled.
cache/clock_cache.cc
Outdated
@@ -530,6 +550,20 @@ bool ClockCacheShard::Release(Cache::Handle* handle, bool erase_if_last_ref) { | |||
} | |||
|
|||
ClockHandle* h = reinterpret_cast<ClockHandle*>(handle); | |||
|
|||
if (h->IsDangling()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice observation, like other immutable fields. And in normal operation, it's a very predictable branch.
Nit: without historical information, IIRC default branch prediction usually goes into an if
rather than skipping over it (because on average, jumps are quite in the negative direction). We have a hint mechanism for cases like this: if (UNLIKELY(...))
. Feel free to use LIKELY and UNLIKELY in more places when there's an "almost always" direction.
cache/clock_cache.h
Outdated
@@ -274,14 +278,18 @@ struct ClockHandle { | |||
|
|||
std::atomic<uint32_t> refs; | |||
|
|||
// True iff the handle is allocated separately from hash table. | |||
bool dangling; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine with me. To me, "stopgap" is more associated with rare and degraded functionality (can't lookup the entry) than "detached." But not a big difference.
@guidotag has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@guidotag has updated the pull request. You must reimport the pull request before landing. |
@guidotag has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@guidotag has updated the pull request. You must reimport the pull request before landing. |
@guidotag has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@guidotag has updated the pull request. You must reimport the pull request before landing. |
@guidotag has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
1 similar comment
@guidotag has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary: In this PR we bring ClockCache closer to production quality. We implement the following changes:
strict_capacity_limit == false
: When an insertion over capacity is commanded, we allocate a handle separately from the hash table.As a byproduct of 1. and 2. we are able to turn on ClockCache in the stress tests.
Test plan:
make -j24 USE_CLANG=1 COMPILE_WITH_ASAN=1 COMPILE_WITH_UBSAN=1 check
make -j24 USE_CLANG=1 COMPILE_WITH_TSAN=1 check
make -j24 USE_CLANG=1 COMPILE_WITH_ASAN=1 COMPILE_WITH_UBSAN=1 CRASH_TEST_EXT_ARGS="--duration=960 --cache_type=clock_cache" blackbox_crash_test_with_atomic_flush
make -j24 USE_CLANG=1 COMPILE_WITH_TSAN=1 CRASH_TEST_EXT_ARGS="--duration=960 --cache_type=clock_cache" blackbox_crash_test_with_atomic_flush