Skip to content

Commit

Permalink
Refactor ShardedCache for more sharing, static polymorphism (#10801)
Browse files Browse the repository at this point in the history
Summary:
The motivations for this change include
* Free up space in ClockHandle so that we can add data for secondary cache handling while still keeping within single cache line (64 byte) size.
  * This change frees up space by eliminating the need for the `hash` field by making the fixed-size key itself a hash, using a 128-bit bijective (lossless) hash.
* Generally more customizability of ShardedCache (such as hashing) without worrying about virtual call overheads
  * ShardedCache now uses static polymorphism (template) instead of dynamic polymorphism (virtual overrides) for the CacheShard. No obvious performance benefit is seen from the change (as mostly expected; most calls to virtual functions in CacheShard could already be optimized to static calls), but offers more flexibility without incurring the runtime cost of adhering to a common interface (without type parameters or static callbacks).
  * You'll also notice less `reinterpret_cast`ing and other boilerplate in the Cache implementations, as this can go in ShardedCache.

More detail:
* Don't have LRUCacheShard maintain `std::shared_ptr<SecondaryCache>` copies (extra refcount) when LRUCache can be in charge of keeping a `shared_ptr`.
* Renamed `capacity_mutex_` to `config_mutex_` to better represent the scope of what it guards.
* Some preparation for 64-bit hash and indexing in LRUCache, but didn't include the full change because of slight performance regression.

Pull Request resolved: #10801

Test Plan:
Unit test updates were non-trivial because of major changes to the ClockCacheShard interface in handling of key vs. hash.

Performance:
Create with `TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num=30000000 -disable_wal=1 -bloom_bits=16`

Test with
```
TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=readrandom[-X1000] -readonly -num=30000000 -bloom_bits=16 -cache_index_and_filter_blocks=1 -cache_size=610000000 -duration 20 -threads=16
```

Before: `readrandom [AVG 150 runs] : 321147 (± 253) ops/sec`
After: `readrandom [AVG 150 runs] : 321530 (± 326) ops/sec`

So possibly ~0.1% improvement.

And with `-cache_type=hyper_clock_cache`:
Before: `readrandom [AVG 30 runs] : 614126 (± 7978) ops/sec`
After: `readrandom [AVG 30 runs] : 645349 (± 8087) ops/sec`

So roughly 5% improvement!

Reviewed By: anand1976

Differential Revision: D40252236

Pulled By: pdillinger

fbshipit-source-id: ff8fc70ef569585edc95bcbaaa0386f61355ae5b
  • Loading branch information
pdillinger authored and facebook-github-bot committed Oct 19, 2022
1 parent e267909 commit 7555243
Show file tree
Hide file tree
Showing 14 changed files with 806 additions and 879 deletions.
8 changes: 4 additions & 4 deletions cache/cache_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1023,21 +1023,21 @@ TEST_P(CacheTest, DefaultShardBits) {
(GetParam() == kHyperClock ? 32U * 1024U : 512U) * 1024U;

std::shared_ptr<Cache> cache = NewCache(32U * min_shard_size);
ShardedCache* sc = dynamic_cast<ShardedCache*>(cache.get());
ShardedCacheBase* sc = dynamic_cast<ShardedCacheBase*>(cache.get());
ASSERT_EQ(5, sc->GetNumShardBits());

cache = NewCache(min_shard_size / 1000U * 999U);
sc = dynamic_cast<ShardedCache*>(cache.get());
sc = dynamic_cast<ShardedCacheBase*>(cache.get());
ASSERT_EQ(0, sc->GetNumShardBits());

cache = NewCache(3U * 1024U * 1024U * 1024U);
sc = dynamic_cast<ShardedCache*>(cache.get());
sc = dynamic_cast<ShardedCacheBase*>(cache.get());
// current maximum of 6
ASSERT_EQ(6, sc->GetNumShardBits());

if constexpr (sizeof(size_t) > 4) {
cache = NewCache(128U * min_shard_size);
sc = dynamic_cast<ShardedCache*>(cache.get());
sc = dynamic_cast<ShardedCacheBase*>(cache.get());
// current maximum of 6
ASSERT_EQ(6, sc->GetNumShardBits());
}
Expand Down
Loading

0 comments on commit 7555243

Please sign in to comment.