Improve LRU cache's performance #7611

Open

acelyc111 wants to merge 2 commits into facebook:main from acelyc111:lru_optimize

Conversation

@acelyc111
Contributor

When LRUCache inserts and evicts a large number of entries, LRUHandleTable::Remove(e->key, e->hash) is called frequently, and each call has to look the entry up in the hash table again. Since we already know the entry 'e' to remove, we can unlink it directly from the hash table's collision list if that list is doubly linked.
This patch refactors the collision list into a doubly linked list. The simple benchmark CacheTest.SimpleBenchmark shows the time cost reduced by about 18% in my test environment.
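
Conceptually, the change looks roughly like the sketch below (illustrative only; member names such as prev_hash and list_ are assumptions, and the real patch differs in details). With a back link in each collision chain, Remove can unlink an entry it already holds in O(1) instead of walking the chain again:

#include <cstdint>
#include <vector>

struct LRUHandle {
  LRUHandle* next_hash = nullptr;  // next entry in the same bucket
  LRUHandle* prev_hash = nullptr;  // added by this patch: previous entry in the bucket
  uint32_t hash = 0;
  // ... key, refs, charge, etc.
};

class LRUHandleTable {
 public:
  explicit LRUHandleTable(uint32_t length) : length_(length), list_(length, nullptr) {}

  // Unlink 'e' from its collision chain without re-finding it.
  void Remove(LRUHandle* e) {
    if (e->prev_hash != nullptr) {
      e->prev_hash->next_hash = e->next_hash;
    } else {
      // 'e' is the first entry of its bucket.
      list_[e->hash & (length_ - 1)] = e->next_hash;
    }
    if (e->next_hash != nullptr) {
      e->next_hash->prev_hash = e->prev_hash;
    }
    e->next_hash = e->prev_hash = nullptr;
  }

 private:
  uint32_t length_;               // number of buckets (a power of two)
  std::vector<LRUHandle*> list_;  // bucket heads
};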

@acelyc111 force-pushed the lru_optimize branch 4 times, most recently from cc8183d to 341b5fa on October 31, 2020 09:40
@acelyc111 marked this pull request as ready for review on November 1, 2020 16:23
@acelyc111
Contributor Author

@mrambacher could you please review this PR? thanks!

@pdillinger
Contributor

If we took the space spent on reverse links and instead dedicated that to reducing load factor on the hash table, how much speed improvement is seen?

Btw, we know a lot of improvement in LRUCache should be possible, especially because it uses a lot of unpredictable branching and indirection.

Contributor

@mrambacher left a comment

So I ran some performance tests via db_bench (seekrandom, readrandom) using these changes. The performance improvement is about 1%.

Perf runs of the new versus the original code show:
New:
3.86% 0.12% ShardedCache::Insert
3.52% 0.13% LRUCacheShard::Insert
3.01% 0.02% LRUCacheShard::Lookup
2.83% 2.83% LRUHandleTable::FindPointer
1.66% 0.27% LRUCacheShard::EvictFromLRU
1.34% 1.34% LRUCacheShared::LRU_Remove

Original:
4.53% 0.13% ShardedCache::Insert
4.13% 0.15% LRUCacheShard::Insert
2.00% 0.02% LRUCacheShard::Lookup
2.31% 2.31% LRUHandleTable::FindPointer
1.99% 0.10% LRUCacheShard::EvictFromLRU
1.34% 1.34% LRUCacheShared::LRU_Remove

Note that in this test, perf shows that most of the time is spent in ZSTD compression.

I think making a change like this is probably worthwhile, but I wonder if the same thing can be accomplished without the extra "prev" pointer. For example, what if FindPointer returned both the current and the previous entry -- could you accomplish the same thing? The size of the metadata for the cache entry should be kept to a minimum.
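
A hypothetical sketch of that alternative (not RocksDB's actual FindPointer API; the names here are made up for illustration):

#include <cstdint>
#include <string>

struct Handle {
  Handle* next_hash = nullptr;  // singly linked collision chain
  uint32_t hash = 0;
  std::string key;
};

struct FindResult {
  Handle* prev = nullptr;  // entry just before the match (nullptr if the match is first)
  Handle* curr = nullptr;  // the matching entry (nullptr if not found)
};

// Walk the chain once, remembering the previous entry as we go.
FindResult FindWithPrev(Handle* bucket_head, uint32_t hash, const std::string& key) {
  FindResult r;
  for (Handle* h = bucket_head; h != nullptr; r.prev = h, h = h->next_hash) {
    if (h->hash == hash && h->key == key) {
      r.curr = h;
      break;
    }
  }
  return r;
}

// Unlink the found entry using the remembered previous entry.
void RemoveFound(Handle** bucket_head, const FindResult& r) {
  if (r.curr == nullptr) return;
  if (r.prev != nullptr) {
    r.prev->next_hash = r.curr->next_hash;
  } else {
    *bucket_head = r.curr->next_hash;
  }
}

This keeps the per-entry metadata unchanged; the chain is still walked once inside the find.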

Have you done any experiments or calculations that show how the change in the metadata size affects the number of entries that can be stored in the cache? Are items evicted more frequently and at what rate/size?

I support this performance improvement, but would like to better understand the tradeoffs we are making here.


namespace ROCKSDB_NAMESPACE {

class LRUHandleTableTest : public testing::Test {};
Contributor

Is there a reason not to put these tests in the existing lru_cache_test file?

Contributor Author

I just want to keep these tests independent: lru_cache_test covers LRUCacheShard, and cache_hash_table_test covers the internal LRUHandleTable.

@siying
Contributor

siying commented Dec 30, 2020

Are we supposed to resize the hash table to more than 1.5x the number of elements? There aren't supposed to be many hash table buckets with more than 1-2 elements, right? Can you measure how frequently we remove an element that is in the third position or lower, in your test that shows the improvement?

@mrambacher
Contributor

Are we supposed to resize the hash table to more than 1.5x the number of elements? There aren't supposed to be many hash table buckets with more than 1-2 elements, right? Can you measure how frequently we remove an element that is in the third position or lower, in your test that shows the improvement?

For kicks, I am running some experiments now to test this out using the seekrandom test. I can see the collision list depth reaching 8. Here are a couple of samples of the depths. Each element in depth is the count of how many lists had that depth for a given table (e.g., the first table had 2981 lists with no elements, 3062 with 1, 1511 with 2, etc.).
length=8192 depth=2981/3062/1511/477/132/26/2/0
length=8192 depth=2988/3057/1491/500/129/24/3/0
length=8192 depth=3011/3012/1522/505/101/31/7/3
length=8192 depth=3015/3008/1523/480/138/23/3/2
length=8192 depth=3031/2967/1531/509/136/15/3/0
length=8192 depth=3022/2987/1536/493/125/21/6/2
length=8192 depth=3023/3007/1481/532/120/27/1/1

So a rough estimate is that 8-10% of the lists within a table have a depth of >=3 and 2% of the lists have a depth >= 4.
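
A rough sketch of the kind of instrumentation that could produce these counts (illustrative only, not the actual print statements):

#include <cstdio>
#include <vector>

struct Handle {
  Handle* next_hash = nullptr;
};

// Count, for each chain depth, how many buckets have that depth, and print
// one "length=... depth=a/b/c/..." line like the samples above.
void PrintDepthHistogram(const std::vector<Handle*>& buckets, size_t max_depth = 8) {
  std::vector<size_t> depth_counts(max_depth, 0);
  for (Handle* head : buckets) {
    size_t depth = 0;
    for (Handle* h = head; h != nullptr; h = h->next_hash) {
      ++depth;
    }
    if (depth >= max_depth) depth = max_depth - 1;  // clamp deeper chains into the last bin
    ++depth_counts[depth];
  }
  std::printf("length=%zu depth=", buckets.size());
  for (size_t i = 0; i < depth_counts.size(); ++i) {
    std::printf("%zu%s", depth_counts[i], i + 1 < depth_counts.size() ? "/" : "\n");
  }
}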

@siying
Contributor

siying commented Dec 30, 2020

Are we supposed to resize the hash table to more than 1.5x the number of elements? There aren't supposed to be many hash table buckets with more than 1-2 elements, right? Can you measure how frequently we remove an element that is in the third position or lower, in your test that shows the improvement?

For kicks, I am running some experiments now to test this out using the seekrandom test. I can see the collision list depth reaching 8. Here are a couple of samples of the depths. Each element in depth is the count of how many lists had that depth for a given table (e.g., the first table had 2981 lists with no elements, 3062 with 1, 1511 with 2, etc.).
length=8192 depth=2981/3062/1511/477/132/26/2/0
length=8192 depth=2988/3057/1491/500/129/24/3/0
length=8192 depth=3011/3012/1522/505/101/31/7/3
length=8192 depth=3015/3008/1523/480/138/23/3/2
length=8192 depth=3031/2967/1531/509/136/15/3/0
length=8192 depth=3022/2987/1536/493/125/21/6/2
length=8192 depth=3023/3007/1481/532/120/27/1/1

So a rough estimate is that 8-10% of the lists within a table have a depth of >=3 and 2% of the lists have a depth >= 4.

Interesting. If we change these two thresholds: https://github.com/facebook/rocksdb/blob/master/cache/lru_cache.cc#L73-L75 and https://github.com/facebook/rocksdb/blob/master/cache/lru_cache.cc#L44 to keep the hash table larger, what would be the result?

@mrambacher
Contributor

Another interesting point is that approximately 35% of the lists contain no elements. From my print statements, it appears that this percentage stays roughly constant as the table size increases (as more elements are added to the cache).

@acelyc111
Contributor Author

I think making a change like this is probably worthwhile, but I wonder if the same thing can be accomplished without the extra "prev" pointer. For example, what if FindPointer returned both the current and the previous entry -- could you accomplish the same thing? The size of the metadata for the cache entry should be kept to a minimum.

That doesn't seem possible to implement. My goal is to avoid the list iteration when calling Remove(LRUHandle* h): callers such as EraseUnRefEntries, EvictFromLRU and Release already hold the entry and can remove it directly, so FindPointer is not called in those places, and that lookup is exactly what I want to avoid.

@acelyc111
Contributor Author

acelyc111 commented Jan 4, 2021

Have you done any experiments or calculations that show how the change in the metadata size affects the number of entries that can be stored in the cache? Are items evicted more frequently and at what rate/size?

An 8-byte LRUHandle* prev_hash is added to the original 72 bytes (the size of struct LRUHandle).
Following how capacity is calculated in https://github.com/facebook/rocksdb/blob/master/cache/lru_cache.h#L134, a simple formula for the extra space is:

extra_space_rate = 8 / (72 - 1 + key_length + charge)   // charge is roughly the value length
                 ~= 8 / (71 + kv_length)
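
As a rough illustrative calculation (not a measurement): with a 16-byte key and a 4 KiB data block (charge ≈ 4096), extra_space_rate ≈ 8 / (71 + 16 + 4096) ≈ 0.19%, so the added prev_hash pointer costs well under 1% of cache capacity for typical block sizes; only for very small entries (kv_length in the tens of bytes) does the overhead reach several percent.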
