Replies: 3 comments
-
Hi @ben-manes , thank you for the detailed post! I think it's a good idea to add support for hill-climbing window size adaptation to W-TinyLFU. It's unlikely that we can implement it any time soon, since TinyLFU does not have many use cases at Facebook, but we are more than happy to review a pull request if you send one. We can also help supply traces for evaluation in cachebench. Regarding the additional suggestions:
It's nice to see you here, Ben! I also worked at Addepar a while back, on Greenbaum's team, and had the chance to take over a lot of your code in the financial graph.
-
Unfortunately I am not much of a C++ programmer, so I'd leave this to someone else if there's interest. What is nice is that this avoids requiring developers to think about the eviction policy, since those differences mostly go away. On that note, your 2Q policy is misnamed. The algorithm by Johnson & Shasha uses a non-resident queue, whereas cachelib implemented OpenBSD's TU-Q policy, named after Ted Unangst.
This is actually in addition to the CMS. The idea is to avoid polluting the CMS with one-valued counters, which increase the error rate and require more counters to compensate. Instead, the CMS counters are only incremented if the key is found in the BF. For large caches this reduces the CMS space, which at 100M+ entries could be a benefit. Caffeine doesn't use this trick, though, because as an application cache it is usually small enough to not be worth the trouble.
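To make the "doorkeeper" idea above concrete, here is a minimal sketch (not cachelib's or Caffeine's actual code; class and parameter names are illustrative): a Bloom filter absorbs the first occurrence of each key, so one-hit wonders never consume count-min sketch counters, and the frequency estimate adds back the one access the Bloom filter recorded.

```python
import hashlib

def _hash(key, seed, width):
    # Deterministic hash of (seed, key) into [0, width); blake2b is overkill
    # but keeps the sketch simple and dependency-free.
    h = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8)
    return int.from_bytes(h.digest(), "big") % width

class DoorkeeperCMS:
    """Count-min sketch guarded by a 'doorkeeper' Bloom filter."""

    def __init__(self, width=1024, depth=4, bloom_bits=8192):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]
        self.bloom_bits = bloom_bits
        self.bloom = bytearray(bloom_bits // 8)

    def _bloom_positions(self, key):
        return [_hash(key, 100 + i, self.bloom_bits) for i in range(3)]

    def _bloom_contains_or_add(self, key):
        # Returns True if every bit was already set (key was seen before),
        # setting any missing bits along the way.
        present = True
        for pos in self._bloom_positions(key):
            byte, bit = divmod(pos, 8)
            if not (self.bloom[byte] >> bit) & 1:
                present = False
                self.bloom[byte] |= 1 << bit
        return present

    def record_access(self, key):
        # Only keys the doorkeeper has already admitted touch the sketch,
        # so one-valued counters never pollute the CMS.
        if self._bloom_contains_or_add(key):
            for i in range(self.depth):
                self.table[i][_hash(key, i, self.width)] += 1

    def estimate(self, key):
        in_bloom = all((self.bloom[p // 8] >> (p % 8)) & 1
                       for p in self._bloom_positions(key))
        cms = min(self.table[i][_hash(key, i, self.width)]
                  for i in range(self.depth))
        # The doorkeeper absorbed the first access, so add it back.
        return cms + (1 if in_bloom else 0)
```

A key accessed once costs only a few Bloom bits; only the second and later accesses spend sketch counters, which is where the space saving for very large caches comes from.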
Ahh, thanks for pointing that out. This is what memcached does too. A flaw is that it is coupled to the entry, so operations like row scans would trigger many tryLock attempts. The docs do discuss lock contention problems, so if that becomes a problem again then this is a technique to consider.
Nope, thanks for pointing that out.
ha, what a mess. I hope you don't hold that against me; I inherited much of it. My "starter project" was to make the graph distributed instead of running all of Addepar on one big machine. I created a transaction log, replayed it to "catch up" the nodes and allow for snapshot reads, added a persistent data structure for concurrent readers, and used select-for-update to exclusively row-level lock the graph during a write transaction. Since everything in a large code base depended on that logic, which had been designed as a SPOF, that solution gave enough breathing room to allow for an eventual rebuild. I hope by now that situation is a lot better. Anyway, small world! 🙂
-
I'll leave this open then.
Sure it is! AMP is a piece of art. Let's leave it there.
-
Goal
Simplify deployment requirements by reducing the need to select the best eviction policy, which is workload dependent and can change over the lifetime of an application. Instead, use simple algorithmic techniques to dynamically reconfigure the cache for the current workload. This allows the cache to robustly make near optimal eviction choices in a wider range of workload patterns.
Context
The TinyLFU implementation uses a static configuration between the tiny / main regions. Per the paper, this defaults to 1% / 99% to favor frequency-biased workloads, such as databases and search engines. However, some workloads are highly skewed towards recency, such as blockchain mining and social networks; see for example Twitter's data sets, where LRU is near optimal. In such cases the frequency filter can degrade the hit ratio, as shown below.
The implementation states that this static parameter does not need to be tuned. While some users might realize their workload bias and choose a different eviction policy, ideally the algorithm is intelligent enough to discover the optimal setting on its own. In the Caffeine library, this is done by using hill climbing (short article, paper).
Suggested Approach
Use simple hill climbing: guess at a new configuration, sample the hit rate, calculate a new step size based on whether the change was an improvement, adjust, and repeat. In Caffeine the initial step size is 6.25% and it decays at a rate of 0.98, so that the policy converges on (rather than oscillates around) the best configuration. This process restarts if the hit rate changes by 5% or more. The sample period should be large enough to avoid a noisy hit rate and can piggyback on the reset interval for decaying the access frequency counters. As shown below, this approach can handle highly skewed workloads that change with the environment.
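The loop above can be sketched as follows. This is a simplified illustration using the numbers quoted for Caffeine (6.25% initial step, 0.98 decay, restart on a 5%+ hit-rate shift), not Caffeine's or cachelib's actual implementation; the class and method names are hypothetical.

```python
class HillClimber:
    """Adapts the window size by hill climbing on the sampled hit rate."""

    INITIAL_STEP = 0.0625       # 6.25% of capacity, as quoted for Caffeine
    DECAY = 0.98                # shrinks the step so the climber converges
    RESTART_THRESHOLD = 0.05    # a 5%+ hit-rate shift suggests a new workload

    def __init__(self):
        self.step = self.INITIAL_STEP  # signed: sign encodes direction
        self.prev_hit_rate = None

    def adjust(self, hit_rate):
        """Called once per sample period; returns the window-size delta
        (as a fraction of capacity) to apply for the next period."""
        if self.prev_hit_rate is None:
            self.prev_hit_rate = hit_rate
            return self.step
        delta = hit_rate - self.prev_hit_rate
        self.prev_hit_rate = hit_rate
        if abs(delta) >= self.RESTART_THRESHOLD:
            # Hit rate shifted sharply: the workload changed, so restart
            # climbing with the full initial step size.
            self.step = self.INITIAL_STEP if delta >= 0 else -self.INITIAL_STEP
            return self.step
        if delta < 0:
            self.step = -self.step  # last move hurt the hit rate: reverse
        self.step *= self.DECAY     # decay so we settle rather than oscillate
        return self.step
```

Each period the cache applies the returned delta to the window/main split, so repeated small losses shrink and flip the step until it settles near the best configuration, while a large hit-rate swing resets the search.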
Success Metrics
Additional Suggestions