Conditional/Delayed Eviction #147

Open
UnknownEclipse opened this issue Jun 11, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@UnknownEclipse

Hi! First, thanks so much for building moka! I recently encountered a situation where, when storing large cached values in Arc, the cache would evict an entry while it was still shared, so getting it again would result in a cache miss even though the object was still alive elsewhere.

A rough idea I had for countering this is to allow setting a conditional eviction callback in the cache builder with the signature Fn(&K, &T) -> bool. When an entry is selected for eviction, the callback is invoked, and if false is returned that entry is kept in the cache. (In the common Arc case, this would be |_, v| Arc::strong_count(v) <= 1, so an entry may only be evicted once the cache holds the last reference.) Entries that skip eviction could be added to a separate list that is revisited each time an item is evicted from the cache, with the potential to make it adaptive (e.g., items that are skipped repeatedly aren't checked on every iteration, saving the overhead for long-lived entries).

This is all a very rough sketch, but it could be helpful in high-performance scenarios.

@UnknownEclipse UnknownEclipse changed the title Conditional Eviction Callback Conditional Eviction Jun 11, 2022
@UnknownEclipse UnknownEclipse changed the title Conditional Eviction Conditional/Delayed Eviction Jun 11, 2022
tatsuya6502 (Member) commented Jun 22, 2022

Hi. Sorry for the late reply, and thank you for the request. Your idea seems feasible to me.

Before we talk about your idea, let me explain how the Moka cache works today.

When the cache's capacity is exceeded, there are two points at which an entry can be evicted:

  1. Right after the insertion: An entry can be evicted by the historic LFU (Least Frequently Used) filter.
    • This filter prevents unpopular keys from getting into the cache.
  2. Some time after the insertion: An entry can be evicted after dropping off from the LRU (Least Recently Used) queue.

Please see the TinyLFU diagram in this section of the wiki page.

The TinyLFU policy may not work very well for certain access patterns, and I wonder if you are hitting this problem. The access pattern is like the following:

  • Inserting an entry whose key has never been accessed before.
  • But once inserted, the key will be accessed often for a while.

While the cache has enough capacity, everything will be okay because new entries will be admitted anyway. After the admission, they will be accessed so they will build up popularity. However, once the cache becomes full, this will become a problem. New entries will not be admitted because their keys were never accessed before; they are less popular than the old ones. They will be evicted immediately after the insertion.

In the future, we will upgrade TinyLFU to W-TinyLFU, which has an admission LRU window in front of the LFU filter. W-TinyLFU will work better for the access pattern above because the LRU window will help new entries build up popularity.

If you are hitting this problem, W-TinyLFU will mitigate it. W-TinyLFU will work automatically for all users; no need to set the callback. So I wonder which one we should implement first: W-TinyLFU? or your conditional eviction idea?

One could argue that W-TinyLFU still does not guarantee that an entry will not be evicted while it is alive (e.g., an entry with a very long lifetime).

If we implement your idea, we could add two pending eviction queues to the cache: one for case 1 and another for case 2. If the callback returns false on an entry, that entry will be added to one of these queues and kept in the cache.

Also, there are other cases in which an entry will be removed from the cache, so we need to decide what to do in each of them:

  • An entry can expire when its TTL (time-to-live) or TTI (time-to-idle) timer is exceeded:
    • With the current spec, the get operation never returns an expired entry.
    • Should the callback prevent the entry from expiring?
      • We could add a config flag to the cache builder to customize this behavior.
  • An entry can be replaced by another insert on the same key:
    • With the current spec, insert always succeeds and replaces the old entry.
      • insert has no return value, so there is no way to tell when an insert has failed.
    • So, should we keep the current insert behavior by ignoring the callback?
    • Note that one can use get_with to avoid replacing an existing entry.
  • An entry can be invalidated by invalidate, invalidate_all, etc.:
    • With the current spec, invalidate etc. always succeed and remove the entry.
      • invalidate has no return value, so there is no way to tell when an invalidation has failed.
    • So, should we keep the current invalidate behavior by ignoring the callback?
