Feature request: Add an eviction_callback parameter to the cache builder #128

Closed
fbernier opened this issue May 19, 2022 · 14 comments · Fixed by #145

@fbernier

Hey there, first and foremost thanks for moka!

I currently have a use case where I have other data structures that I need to keep in sync with a moka cache. One way I thought of doing this is that the cache builder could support a new eviction_callback method which takes a closure that is called every time there is a cache eviction, ideally with both the key and value as parameters to the closure. Do you think that's feasible, and if so, is this something you'd be interested in adding to moka?

I envision something like:

pub fn eviction_callback<F>(&self, predicate: F)
where
    F: Fn(&K, &V) + Send + Sync + 'static,

or

pub fn eviction_callback<F>(&self, predicate: F)
where
    F: Fn(K, &V) + Send + Sync + 'static,
    K: Copy
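
For illustration only, here is a rough sketch of how such a callback could keep an external data structure in sync with the cache. The eviction_callback method is the hypothetical one proposed above (a chaining builder is assumed), so this does not compile against the current moka API:

use std::collections::HashSet;
use std::sync::{Arc, Mutex};

use moka::sync::Cache;

fn main() {
    // External structure to keep in sync with the cache.
    let evicted: Arc<Mutex<HashSet<String>>> = Arc::new(Mutex::new(HashSet::new()));
    let evicted_in_cb = Arc::clone(&evicted);

    let cache: Cache<String, String> = Cache::builder()
        .max_capacity(100)
        // Hypothetical method from the proposal above.
        .eviction_callback(move |key: &String, _value: &String| {
            evicted_in_cb.lock().unwrap().insert(key.clone());
        })
        .build();

    cache.insert("a".to_string(), "1".to_string());
}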
@tatsuya6502 tatsuya6502 added the enhancement New feature or request label May 19, 2022
@tatsuya6502 tatsuya6502 self-assigned this May 19, 2022
@tatsuya6502
Member

Hi. Yes, notification on evictions is on our road map ("feat: Notifications on eviction, etc."). I just moved it up to second on the road map list (from sixth). I will start working on it next week.

@tatsuya6502 tatsuya6502 added this to the v0.9.0 milestone May 26, 2022
@tatsuya6502 tatsuya6502 linked a pull request May 30, 2022 that will close this issue
@tatsuya6502
Member

tatsuya6502 commented May 30, 2022

FYI, I started working on this via the linked pull request (#145).

It is a work in progress, so it is still missing some notifications, such as the one for expiration. But the unit tests in the PR will illustrate how to use it.

@ben-manes

ben-manes commented Jun 6, 2022

@tatsuya6502 I'm always excitedly curious to see what you do, so I took a peek at the draft.

I noticed that your eviction listener differs slightly in execution semantics from Caffeine's. I thought it might help if I clarify the history and intent of our api, though feel free to ignore our semantics.

The cache originally offered a RemovalListener for any cause, which is invoked outside of any cache locks. In CLHM / Guava this runs on a calling thread: the write operation stashes the event into an mpmc queue, and all threads poll the queue at the end of their operation. The caller could defer that to a background pool themselves if the work is expensive. Caffeine defaults to being async by submitting to the Java 8+ shared thread pool, so it avoids having an internal queue that threads might contend on. I'm not a fan of libraries silently creating threads, so this was taking advantage of a new platform capability, but it might not be ideal in Rust. Regardless, when the event is processed, and in what order notifications arrive, are both unspecified.

A complaint was that sometimes the business logic requires key-based ordering of events that are performed synchronously with the operation. For example if removal deletes a local file then a race to add it back might find the file still exists and the listener leaves the cache in an inconsistent state (a cache entry but no file). Therefore, an ability to invoke the listener as part of the map operation was needed.

Since RemovalListener was already defined as non-blocking, we needed some new concept. At first I tried to borrow a concept from distributed caches, CacheWriter, but that was a bad fit, deprecated, and removed. Instead, I added a variant called EvictionListener that reuses the RemovalListener lambda interface and is synchronous. This variant is only called during an automatic removal, e.g. size eviction, and not on a manual (explicit) removal. For a callback on explicit removals, the Map.compute methods allow for rich functionality, so a user can invoke their logic explicitly. This evolution minimized the new concepts and used familiar terms, but it might not be what you would arrive at if designing the api from scratch.

It might be that in a new api you would want to register a RemovalListener with a flag / builder method to specify if blocking or non-blocking. The non-blocking might optionally defer to a thread pool, either explicitly or implicitly depending on what is idiomatic to Rust. My guess is that you would prefer to avoid a thread pool, like Guava/CLHM, and require users to defer it explicitly. Or you might have that pool anyway for async caching, so I am clueless here.

Anyway, I hope that helps you in your api design!

@tatsuya6502
Member

Thanks @ben-manes for sharing your experience and insight. I am learning lots from them!

As you might have already seen, the current implementation is similar (but not identical) to Caffeine's RemovalListener:

  • It is non-blocking.
  • It defers notifications to a thread pool.

and I noticed that unordered events for a key can cause the issue you described:

For example if removal deletes a local file then a race to add it back might find the file still exists and the listener leaves the cache in an inconsistent state (a cache entry but no file).

So I made the example in the doc avoid the issue by generating a unique file name on every file creation. I was thinking of adding some text to explain why the file names have to be unique even for the same key.

It might be that in a new api you would want to register a RemovalListener with a flag / builder method to specify if blocking or non-blocking.

I think it is a great idea to have a blocking or non-blocking option per cache. I have a feeling that adding a blocking option is not very hard, though I will have to add a per-key writer lock in some places, which could be tricky. I will give it a try now.

The non-blocking might optionally defer to a thread pool, either explicitly or implicitly depending on what is idiomatic to Rust. My guess is that you would prefer to avoid a thread pool, like Guava/CLHM, and require users to defer it explicitly.

Avoiding a thread pool by default (for non-blocking notification and other housekeeping tasks like entry expiration) is something I have wanted to do in Moka. I will try to implement it for notifications now (I hope it is easy enough), and will defer it for other housekeeping tasks to a near-future release.

@tatsuya6502
Member

tatsuya6502 commented Jun 13, 2022

Hi @fbernier, which cache are you using? (sync::Cache or future::Cache?)

I found that supporting blocking notification in future::Cache will require a non-trivial amount of work. I will have to duplicate a couple of thousand lines of code that use regular fns and sync locks, and convert them into async fns and async locks. I am still checking whether there are other ways to minimize the changes.

If you are using sync::Cache, I could soon release a version with the eviction listener only for sync::Cache, and then a few weeks or a month later, release another version that supports future::Cache.

EDIT
Or if you are using future::Cache but you are okay with non-blocking notification, I could release such a version soon too.

@fbernier
Author

Hey, sorry for the delay. My use-case with the eviction callback is currently with the sync cache, so all good for me on that front. Thanks!

@MrCroxx

MrCroxx commented Jun 14, 2022

Hi folks, thanks for your efforts on building such a good lib. 🥰 I'm also a user that needs this feature with future::Cache. I'd like to try it first. Is using the branch of #145 okay now?

@tatsuya6502
Member

Hi @fbernier,

my use-case with the eviction callback is currently with the sync cache, so all good for me on that front.

It is great to hear! Thanks.

@tatsuya6502
Member

tatsuya6502 commented Jun 14, 2022

Hi @MrCroxx,

I'm also a user that needs this feature with future::Cache. I'd like to try it first. Is using the branch of #145 okay now?

Yes, the eviction-listener branch should be ready for you. I will add a few more changes in the next few days, but it should already be good enough for now.

moka = { git = "https://github.com/moka-rs/moka", branch = "eviction-listener", features = ["future"] }

To use the eviction listener, see the "Eviction Listener" example in the doc comment: https://github.com/moka-rs/moka/blob/eviction-listener/src/future/builder.rs#L55
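
For readers skimming this thread, here is a rough usage sketch based on that doc comment. It assumes the builder method is named eviction_listener and that the closure receives the key as an Arc, the value, and a RemovalCause, as in the released v0.9 API; the exact details on the branch may differ, and Tokio is used only as an example runtime:

use std::sync::Arc;

use moka::future::Cache;
use moka::notification::RemovalCause;

#[tokio::main]
async fn main() {
    // Called when an entry is removed (evicted, expired, replaced, or invalidated).
    let listener = |key: Arc<String>, value: String, cause: RemovalCause| {
        println!("removed key={key}, value={value}, cause={cause:?}");
    };

    let cache: Cache<String, String> = Cache::builder()
        .max_capacity(100)
        .eviction_listener(listener)
        .build();

    cache.insert("hello".to_string(), "world".to_string()).await;
    cache.invalidate("hello").await;

    // Give the cache a moment to deliver the queued (non-blocking) notification.
    tokio::time::sleep(std::time::Duration::from_millis(200)).await;
}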

You may have already read this, but in the coming release (v0.9.0), future::Cache will only support the non-blocking notification mode. With this mode, notifications arrive with a short delay after the evictions actually occur, and it will not work very well for some use cases. It may leave the cache and the external resources inconsistent.

For example:

Non-blocking notification mode

This mode does not guarantee that the notifications for a specific key are ordered.

(Tn means wall clock time)

  1. T1: Create a cache with a time-to-live configuration.
  2. T2: Insert a key, and create an external resource.
  3. T3: The key has expired, but it may not have been removed from the cache yet.
  4. T4: Get the key; no value is returned, but the key may not have been removed from the cache yet.
  5. T5: Insert the key again, and create the external resource again.
    • It may notice that the external resource from T2 still exists, or it may not notice and simply overwrite the resource.
    • This may replace the entry inserted at T2 with the one inserted at T5.
  6. T6: The eviction listener is called with the expiration event from T3 (actually triggered at T5).
    • The listener may remove the external resource created at T2 (correct) or the one created at T5 (wrong).
  7. T7: The key expires again.
  8. T8: The housekeeping thread removes the key from the cache.
  9. T9: The eviction listener is called with the expiration event from T7 (actually triggered at T8).
    • The listener may remove the external resource created at T5, but it may have already been removed at T6.

Blocking notification mode

This mode guarantees that the notifications for a specific key are ordered.

  1. T1: Create a cache with a time-to-live configuration.
  2. T2: Insert a key, and create an external resource.
  3. T3: The key has expired, but it may not have been removed from the cache yet.
  4. T4: Get the key; no value is returned, but the key may not have been removed from the cache yet.
  5. T5: Insert the key again...
  6. T6: Before the insertion at T5 returns, the eviction listener is called with the expiration event from T3.
    • The listener will remove the external resource created at T2 (correct).
  7. T7: The insertion at T5 returns after the listener has returned.
    • So it will create the external resource again.
  8. T8: The key expires again.
  9. T9: The housekeeping thread removes the key from the cache.
  10. T10: The eviction listener is called with the expiration event from T8 (actually triggered at T9).
    • The listener will remove the external resource created at T7 (correct).

To avoid the issues in the non-blocking example above, you could assign unique names to the external resources created at T2 and T5. (See the example in the doc.)
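
As a rough sketch of that mitigation (the helper names, paths, and the uuid crate are illustrative assumptions, not taken from the doc example; the eviction_listener builder method is the v0.9 sync API as described above): store the unique path inside the cached value, and have the listener delete exactly the path carried by the evicted value, so a late notification for an old entry can never delete a newer file.

use std::path::PathBuf;
use std::sync::Arc;
use std::time::Duration;

use moka::notification::RemovalCause;
use moka::sync::Cache;
use uuid::Uuid; // any source of unique names would do

// The cached value carries the unique path of its own external file.
#[derive(Clone)]
struct Entry {
    path: PathBuf,
}

fn main() {
    let cache: Cache<String, Entry> = Cache::builder()
        .time_to_live(Duration::from_secs(30))
        // The listener removes only the file named in the evicted value, so a
        // stale notification for the T2 entry cannot delete the file from T5.
        .eviction_listener(|_key: Arc<String>, value: Entry, _cause: RemovalCause| {
            let _ = std::fs::remove_file(&value.path);
        })
        .build();

    // A fresh, unique file name on every insertion, even for the same key.
    let path = PathBuf::from(format!("/tmp/data-{}.bin", Uuid::new_v4()));
    std::fs::write(&path, b"payload").expect("failed to write file");
    cache.insert("my-key".to_string(), Entry { path });
}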

@ben-manes

I had forgotten about async caches and synchronous removal listening in my earlier message. That's a great reason not to support a generic synchronous listener for explicit removals. For evictions, an in-flight future should probably be skipped, so a synchronous callback is viable. That is how Caffeine gets around this, where futures are ineligible for eviction until complete.

@tatsuya6502
Member

Sorry for the delay. I could not find enough time to work on this during weekdays.

But finally, I think #145 is now ready to merge. I will self-review it tomorrow morning (I live in mainland China, UTC+0800, and it is 10pm now) and will merge it if I do not see any issues.

I implemented both the blocking and non-blocking modes for the sync caches, and only the non-blocking mode for the future cache. To configure an eviction listener with a mode, I created an enum DeliveryMode with two variants: Immediate, meaning blocking, and Queued, meaning non-blocking.

I could not implement a version of the immediate/queued modes that does not use worker threads, simply for lack of time. So currently both rely on the worker threads.
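
For reference, the shape of that enum is roughly as follows (a sketch of the idea described above, not the exact definition in the PR; how a mode is attached to a listener through the builder may also differ):

// Sketch: how an eviction notification is delivered to the listener.
pub enum DeliveryMode {
    // Blocking: the listener runs synchronously, as part of the cache
    // operation that triggered the removal.
    Immediate,
    // Non-blocking: the notification is queued and delivered later
    // (currently by the worker threads mentioned above).
    Queued,
}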

@tatsuya6502
Member

Closing this issue as I merged PR #145 into the next branch, and then merged the next branch into the master branch.

I will run some pre-release testing, and if everything goes well, I will publish Moka v0.9.0 to crates.io.

@tatsuya6502
Member

@fbernier @MrCroxx

I have published Moka v0.9.0 to crates.io.

Documents:

@MrCroxx

MrCroxx commented Jul 5, 2022

Thanks a lot! Happy to hear that and can't wait to try! 🥰
