Conditional/Delayed Eviction #147

Open
UnknownEclipse opened this issue Jun 11, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@UnknownEclipse

Hi! First, thanks so much for building moka! I recently encountered a situation where, when storing large cached values in Arc, the cache would evict an entry while it was still shared, so getting it again would result in a cache miss even though the object was still alive elsewhere.

A rough idea I had for countering this is to allow setting a conditional eviction callback in the cache builder with the signature Fn(&K, &T) -> bool. When an entry is selected for eviction, the callback is invoked, and if false is returned that entry is kept in the cache. (In the common Arc case, this would be |_, v| Arc::strong_count(v) <= 1, so an entry may only be evicted once the cache holds the last reference.) Entries that skip eviction could be added to a separate list that is revisited each time an item is evicted from the cache, with the potential to make it adaptive (e.g., items that are skipped repeatedly aren't checked on every iteration, saving the overhead for long-lived entries).

This is all a very rough sketch, but it could be helpful in high-performance scenarios.

@UnknownEclipse UnknownEclipse changed the title Conditional Eviction Callback Conditional Eviction Jun 11, 2022
@UnknownEclipse UnknownEclipse changed the title Conditional Eviction Conditional/Delayed Eviction Jun 11, 2022
tatsuya6502 (Member) commented Jun 22, 2022

Hi. Sorry for the late reply, and thank you for the request. Your idea seems feasible to me.

Before we talk about your idea, let me explain how the Moka cache works today.

When the cache's capacity is exceeded, there are two points at which an entry can be evicted:

  1. Right after the insertion: An entry can be evicted by the historic LFU (Least Frequently Used) filter.
    • This filter prevents unpopular keys from getting into the cache.
  2. Some time after the insertion: An entry can be evicted after dropping off from the LRU (Least Recently Used) queue.

Please see the TinyLFU diagram in this section of the wiki page.

The TinyLFU policy may not work very well for certain access patterns, and I wonder if you are hitting this problem. The access pattern is like the following:

  • Inserting an entry whose key has never been accessed before.
  • But once inserted, the key will be accessed often for a while.

While the cache has enough capacity, everything will be okay because new entries will be admitted anyway. After the admission, they will be accessed so they will build up popularity. However, once the cache becomes full, this will become a problem. New entries will not be admitted because their keys were never accessed before; they are less popular than the old ones. They will be evicted immediately after the insertion.

In the future, we will upgrade TinyLFU to W-TinyLFU, which has an admission LRU window in front of the LFU filter. W-TinyLFU will work better for the access pattern above because the LRU window will help new entries build up popularity.

If you are hitting this problem, W-TinyLFU will mitigate it. W-TinyLFU will work automatically for all users; no need to set the callback. So I wonder which one we should implement first: W-TinyLFU? or your conditional eviction idea?

One could argue that W-TinyLFU still does not guarantee that an entry will not be evicted while it is alive (e.g., an entry with a very long lifetime).

If we implement your idea, we could add two pending eviction queues to the cache: one for case 1 and another for case 2. If the callback returns false on an entry, that entry will be added to one of these queues and kept in the cache.

Also, there are other cases in which an entry will be removed from the cache, so we need to decide what to do in each of them:

  • An entry can expire when its TTL (time-to-live) or TTI (time-to-idle) timer is exceeded:
    • With the current spec, the get operation never returns an expired entry.
    • Should the callback prevent the entry from expiring?
      • We could add a config flag to the cache builder to customize this behavior.
  • An entry can be replaced by another insert on the same key:
    • With the current spec, insert always succeeds and replaces the old entry.
      • insert has no return value, so there is no way to tell when an insert has failed.
    • So, should we keep the current insert behavior by ignoring the callback?
    • Note that one can use get_with to avoid replacing an existing entry.
  • An entry can be invalidated by invalidate, invalidate_all, etc.:
    • With the current spec, invalidate etc. always succeed and remove the entry.
      • invalidate has no return value, so there is no way to tell when an invalidation has failed.
    • So, should we keep the current invalidate behavior by ignoring the callback?
