
[FEATURE] 📢 Backplane #11

Closed · jodydonetti opened this issue Mar 29, 2021 · 16 comments
Labels: enhancement (New feature or request)

@jodydonetti commented Mar 29, 2021

Scenario

In a multi-node scenario the typical multi-level cache configuration uses a number of local memory caches (one per node) and a single distributed cache, used to share entries between the different nodes.

When an entry is set in a local memory cache it is also set in the distributed cache, so that other nodes will get the entry from there when they see it's not in their local memory cache.

Problem

A problem may arise when an entry is already in one or more nodes' memory caches and the entry is overwritten on another node: in this situation the memory cache for which the Set method has been called will be updated, and the same goes for the distributed cache, but the other nodes would keep using their old entries until they expire.

There are 2 ways to alleviate this situation:

  1. use a very low cache duration, but that in turn may increase the load on the data source (eg: a database)

  2. use a lower duration for the memory cache and a higher one for the distributed cache, so that the shared (updated) entries are frequently re-read by the nodes. This is not (currently) possible in FusionCache, may also lead to a potentially higher load on the distributed cache (instead of on the data source), and on top of that still means using stale data, even if for a shorter amount of time

Both of these workarounds may be good in some use cases, and thanks to FusionCache's combo of fail-safe and advanced timeouts with background factory completion the result for your end users would be good, but they are not a real solution to the problem.

Solution

The idea is to introduce the concept of a backplane which would allow a communication between all the nodes involved about update/removal of entries, so that they can stay up to date about the state of the system.

Design proposal

A new IFusionCacheBackplane interface to model a generic backplane, which could then be implemented in various ways on top of different systems.

It should contain a couple of core methods to notify the change or removal of an entry, with 2 different semantics for them (a rough sketch follows this list) because:

  • an explicit remove on a node (eg: a call to Remove(key, ...)) should actually remove the entries on the other nodes to avoid finding a value that should not be there anymore

  • an update (eg: a call to Set(key, ...)) should not remove the entries on the other nodes, but just mark those entries - if there - as "logically expired" (eg: change their FusionCacheEntryMetadata.LogicalExpiration), so that at the next access the factory would be executed to get the new value, while still keeping the ability to use the stale value in case of problems or timeouts during the factory execution, which is an added bonus of using FusionCache
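
Here is a minimal sketch of what such an interface could look like, just to make the two semantics concrete. This is an illustration based on the description above, not necessarily the final API, and all the names are assumptions:

```csharp
using System;

// Hypothetical sketch of the backplane abstraction described above.
// Method names and shapes are illustrative, not the final API.
public interface IFusionCacheBackplane
{
    // Notify other nodes that an entry changed: receivers should mark
    // their local copy (if any) as logically expired, keeping it usable
    // as a stale fallback thanks to fail-safe.
    void NotifySet(string key);

    // Notify other nodes that an entry was explicitly removed: receivers
    // should actually evict their local copy.
    void NotifyRemove(string key);

    // Start listening for notifications coming from other nodes.
    void Subscribe(Action<BackplaneNotification> onNotification);
}

// Hypothetical message shape: the operation performed plus the cache key.
public record BackplaneNotification(string Operation, string Key);
```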

It should be noted that directly sending the updated values along with the notifications themselves is not considered, for various reasons:

  • it would send the new value to every node, even the ones that do not have that entry in their memory cache
  • doing so would consume a lot of bandwidth unnecessarily
  • the same can be said for memory consumption
  • even the nodes which have that entry in their memory cache may not actually require the new value, since it may not be needed anymore before expiring completely

Additionally, a small circuit-breaker like the one already present in FusionCache when talking to the distributed cache would be a nice addition, since the same problems of intermittent connection can potentially happen with the backplane.
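
For reference, the core of such a time-based circuit breaker could look like the following sketch (illustrative only, not FusionCache's actual implementation): after a failure, backplane calls are skipped entirely for a configured duration instead of being retried immediately.

```csharp
using System;
using System.Threading;

// Minimal time-based circuit breaker sketch (illustrative only): after a
// failure the circuit stays "open" for a while, and callers skip the
// backplane instead of hammering a component that is known to be down.
public class SimpleCircuitBreaker
{
    private readonly TimeSpan _breakDuration;
    private long _openUntilTicks; // UTC ticks until which calls are skipped

    public SimpleCircuitBreaker(TimeSpan breakDuration)
        => _breakDuration = breakDuration;

    // True when the circuit is closed and the backplane can be used.
    public bool CanExecute
        => DateTime.UtcNow.Ticks >= Interlocked.Read(ref _openUntilTicks);

    // Called when a backplane operation fails: open the circuit.
    public void RecordFailure()
        => Interlocked.Exchange(ref _openUntilTicks, DateTime.UtcNow.Add(_breakDuration).Ticks);
}
```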

Ideally I would also explore a form of batching to allow sending an invalidation notification for multiple keys at once, to save some bandwidth (but that may introduce a higher complexity in the codebase which I would like to keep as readable as possible).

First implementation

The first implementation would be on Redis, because:

  • it is already a ubiquitous and rock solid key component in a lot of infrastructures out there
  • when a distributed cache is needed, Redis is typically the one being used
  • it natively contains a pub/sub mechanism which would be the backbone of the implementation

One thing to know about the pub/sub mechanism in Redis is that any message sent will be received by all the connected nodes, including the sender itself. To avoid evicting the entry on the same node that originated the notification, a form of sender identifier (like a UUID/ULID or similar) should be included in the message payload.

Also, the design should be evolvable, to avoid a future situation where a new protocol design would break a live system in which nodes are still communicating with v1 while v2 is being introduced.
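
Putting the last two points together, a hypothetical wire payload (an illustration, not the actual FusionCache protocol) could carry a version marker for evolvability plus the sender id for self-filtering:

```csharp
using System;

// Hypothetical backplane message (not the actual FusionCache wire format).
public record BackplaneMessage(byte Version, string SenderId, string Operation, string Key);

public class BackplaneReceiver
{
    private const byte SupportedVersion = 1;

    // This node's identity, included in every message it publishes.
    private readonly string _selfId = Guid.NewGuid().ToString("N");

    public void OnMessage(BackplaneMessage msg)
    {
        // Redis pub/sub echoes messages back to the sender: skip our own.
        if (msg.SenderId == _selfId)
            return;

        // Skip messages from a protocol version we don't understand, so a
        // mixed v1/v2 deployment keeps working during a rolling upgrade.
        if (msg.Version != SupportedVersion)
            return;

        // ...apply the notification (evict or logically expire) locally...
    }
}
```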

Of course other implementations may be done with different technologies.

@fmendez89

Hi Jody,
that functionality would be awesome.
I have implemented something similar using IMemoryCache, Redis and RabbitMq. The idea is similar, but in this case it's just Redis, which is great because Redis is already in use, so no extra external services are needed.

Thank you for this project, I'm currently thinking about using it in the project I'm involved in.

In the meantime, would this functionality be implemented with an action triggered on removed and set events?

@jodydonetti commented Nov 7, 2021

Hi @fmendez89

that functionality would be awesome. I have implemented something similar using IMemoryCache, Redis and RabbitMq. The idea is similar, but in this case it's just Redis, which is great because Redis is already in use, so no extra external services are needed.

Yep, that is the idea, one tool (in this case Redis) to do both things, distributed cache and backplane: less stuff to maintain.

I'll add that the backplane would not be tied to any specific technology: the Redis one would just be the first implementation, but someone may decide to implement it using something else (like RabbitMq, as you mentioned).

Thank you for this project, I'm currently thinking about using it in the project I'm involved in.

Thank you for considering using it! If you end up doing that please let me know how it went, I would be interested to know.

In the meantime, would this functionality be implemented with an action triggered on removed and set events?

Basically yes, that is the idea. I'm playing with different designs (as a normal plugin, as something else more specific, etc) but in the end it will listen for local events to push remote notifications, and at the same time it will listen for remote notifications and modify the cache locally (see the sketch below).
I'm also playing with some other small features that I think would be great to have, but the gist of it is what you described.
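
As a rough illustration of that bridge (a sketch only: the real design was still in flux at this point; it reuses the hypothetical IFusionCacheBackplane sketched in the proposal above, and ILocalCache and its members are likewise invented for the example):

```csharp
using System;

// Hypothetical local cache surface used by the bridge (an assumption,
// not a real FusionCache API).
public interface ILocalCache
{
    event Action<string> EntrySet;
    event Action<string> EntryRemoved;
    void Evict(string key);
    void MarkLogicallyExpired(string key);
}

// Wiring in both directions: local events out, remote notifications in.
public class BackplaneBridge
{
    public BackplaneBridge(IFusionCacheBackplane backplane, ILocalCache localCache)
    {
        // Local events -> remote notifications.
        localCache.EntrySet += key => backplane.NotifySet(key);
        localCache.EntryRemoved += key => backplane.NotifyRemove(key);

        // Remote notifications -> local cache changes.
        backplane.Subscribe(n =>
        {
            if (n.Operation == "remove")
                localCache.Evict(n.Key); // explicit remove: actually evict
            else
                localCache.MarkLogicallyExpired(n.Key); // update: keep stale fallback
        });
    }
}
```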

@fmendez89

Thanks @jodydonetti for the response.
For sure, will let you know if we end up using it ;)

@jodydonetti jodydonetti changed the title 🔀 Backplane 📡 Backplane Nov 30, 2021
@jodydonetti jodydonetti changed the title 📡 Backplane 📢 Backplane Dec 1, 2021
@jodydonetti commented Jan 27, 2022

Hi there, I'm happy to say that I've finally been able to complete the design and implementation of the backplane feature.

Please take a look here, try it out and let me know what you think so I can move forward with the final version.

Thanks everybody 🙏

📦 It's a pre-release!

Note that the NuGet packages are marked as pre-release, so please be sure to enable the related filter, otherwise you won't see them:

[image: enabling the pre-release filter in the NuGet package manager]
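
Alternatively, from the command line the pre-release packages can be added with the --prerelease flag (package id shown for the Redis backplane; check NuGet for the exact id):

```
dotnet add package ZiggyCreatures.FusionCache.Backplane.StackExchangeRedis --prerelease
```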

@jodydonetti

Meanwhile I published a (hopefully) even better alpha2 release.

@gabrielmaldi commented Jan 31, 2022

Hi, @jodydonetti. First of all I wanted to thank you for building this great library! And particularly for putting so much effort into docs, even for alpha releases; they are incredibly helpful 💪🏻.

Even though my use case is simple (two nodes and one Redis instance for distributed cache, which now also acts as the backplane), I wanted to give you feedback: onboarding to the backplane was very straightforward. Last week I tried the alpha1 release and now I just upgraded to alpha2, and everything seems to work great! This is the output when calling IFusionCache.RemoveAsync(key):

[screenshot: log output of the RemoveAsync call]

Speaking about removing: does the backplane enable any new ways of getting all the cache keys or clearing everything? Right now I'm tracking the keys this way:

[screenshot: the key-tracking code]

So that I can expose an API that returns all the keys, which then allows manually calling another API with one of those keys to invalidate an entry from the cache. But this feels fragile.
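
A rough sketch of that key-tracking pattern (an illustration of the approach only, not the actual code from the screenshot; the wrapper type and its names are invented):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using ZiggyCreatures.Caching.Fusion;

// Fragile manual key tracking (sketch): every key that goes through this
// wrapper is recorded, so an admin API can list keys and invalidate entries.
public class TrackedCache
{
    private readonly IFusionCache _cache;
    private readonly ConcurrentDictionary<string, byte> _keys = new();

    public TrackedCache(IFusionCache cache) => _cache = cache;

    public async Task<T> GetOrSetAsync<T>(string key, Func<CancellationToken, Task<T>> factory)
    {
        _keys.TryAdd(key, 0); // remember the key for the admin API
        return await _cache.GetOrSetAsync<T>(key, factory);
    }

    // Used by the admin API: per-node only, and lost on restart (the fragile part).
    public IReadOnlyCollection<string> GetAllKeys() => _keys.Keys.ToArray();

    public async Task RemoveAsync(string key)
    {
        await _cache.RemoveAsync(key);
        _keys.TryRemove(key, out _);
    }
}
```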

It would be great to have IFusionCache.GetAllKeys() and IFusionCache.Clear/Flush(). I know this would be complicated because it depends on the implementation of IDistributedCache, but maybe if I know that I'm using Redis (or some other store that has similar support for KEYS and FLUSHDB) I could cast to a specific type and have access to those methods? This would greatly simplify clearing the cache manually through an admin UI by a "normal" user without direct access to the Redis instance. What are your thoughts on this? (I didn't want to create a new issue yet, but if you think it's better to have a separate discussion, I will).

Thank you again!

@jodydonetti

First of all I wanted to thank you for building this great library! And particularly for putting so much effort into docs, even for alpha releases; they are incredibly helpful 💪🏻.

Hi @gabrielmaldi, thanks for trying FusionCache and taking the time to give me feedback, I appreciate it!
Also I'm glad you're liking it and that you've found the docs nice: having spent a good amount of time on them, it's nice to know somebody found them helpful 🙂

onboarding to the backplane was very simple [...] and everything seems to work great!

Thanks, it's nice to hear that and important to know! If you have any suggestions please don't hesitate.
I hope to release the final version this week or the next, after a little bit more testing.

This is the output when calling IFusionCache.RemoveAsync(key):

First of all thanks, by looking at your screenshot I just noticed I forgot to put some of the new FusionCacheEntryOptions props into the log string 😅 !
Second: do you feel like something is missing there? If you are thinking about that final null, it's because that is the distributeOptions and in a remove operation it's not needed (so I don't allocate it). Do you feel it should be clearer?

Speaking about removing: does the backplane enable any new ways of getting all the cache keys or clearing everything?

Well, yes and no:

  • locally no, because if you wanted to you could have already listened to remove events for that, and this does not change that

  • globally yes, because if you want to have a local list of cache keys and one of them is removed on another node and you want to know that, then yes, in theory you may use the backplane for that

The problem with the second point is that right now there's no way to discern between the two.
But now that you mention it, I may have something interesting to play with, that may change the backplane design a little bit before releasing it 🤔

Stay tuned!

It would be great to have IFusionCache.GetAllKeys() and IFusionCache.Clear/Flush().

Eheh, this is not the first time it comes up 😅
See below for more...

I know this would be complicated because it depends on the implementation of IDistributedCache, but maybe if I know that I'm using Redis (or some other store that has similar support for KEYS and FLUSHDB) I could cast to a specific type and have access to those methods?

I thought about this. Even more, I thought about introducing a new abstraction so that I know beforehand what a specific cache impl can do or not.

There are a couple of problems though, things like:

  • enumerating all the keys on demand is typically very expensive to do in any possible implementation, and something you probably would like to avoid
  • with Redis in SaaS mode (like on Azure, AWS, etc) some commands like KEYS are typically disabled, because they can be very expensive and make the system unstable
  • it's something that can result in potentially a lot of items (keys), and that is typically handled via paging or streaming of some sort, so it would end up being non-trivial to use (I can imagine people just wanting something that returns a List<string> or something like that)

All of this is for the "get all keys" case; then there's the "clear" case, which may have different but similar things to solve (even though I think the "clear" case is way more doable, at least in theory).

What are your thoughts on this? (I didn't want to create a new issue yet, but if you think it's better to have a separate discussion, I will).

I think the right thing to do would be to create a separate issue to reason about it, make a list of pros/cons, propose a design and whatnot.
I'll do that asap, thanks!

@jodydonetti
Copy link
Collaborator Author

Oh, I almost forgot: would you care to explain what your needs are behind the "get all keys" and "clear" methods, separately?

Is it just to allow a hypothetical admin UI to clear the entire cache (so the "get all keys" would be needed just to be able to do a for loop with a remove call per each key) or is there more?

I'm asking because, for each of the 2 methods ("get all keys" and "clear"), there may be solutions around them and/or different designs that may get you to the same result, but in a different way.

A very quick example (NOT AN OFFICIAL PROPOSAL): if all your cache access is via FusionCache and you care about "clearing the cache" logically, but do not care to actually remove stuff from the underlying cache (eg: Redis), one idea may be to have a kind of "clear" method that just saves the current date. From there on, on every "get" operation, FusionCache may check if the entry had been saved before that threshold date and, if so, discard it, because a "clear" operation happened after that. Inside Redis the cache entries would still be there, and they would gradually expire one after another and/or be overwritten by new entries, but via FusionCache they would not be visible. All of this would require basically no big computation: it would be basically instantaneous.
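
A minimal sketch of that idea (purely illustrative of the example above, not an actual FusionCache API):

```csharp
using System;

// Sketch of the "logical clear" example above (illustrative only, not an
// actual FusionCache API): clearing just records a timestamp, and every
// get treats entries saved before that threshold as missing.
public class LogicalClearTracker
{
    private DateTimeOffset _clearedAt = DateTimeOffset.MinValue;

    // "Clearing" the cache is just saving the current instant; with a
    // backplane, this timestamp could also be propagated to other nodes.
    public void Clear() => _clearedAt = DateTimeOffset.UtcNow;

    // Checked on every get: anything saved before the threshold is
    // discarded, even if it is still physically present in the cache.
    public bool IsStillValid(DateTimeOffset entrySavedAt) => entrySavedAt > _clearedAt;
}
```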

@gabrielmaldi

by looking at your screenshot I just noticed I forgot to put some of the new FusionCacheEntryOptions props into the log string 😅 !

Great that we could get something useful out of my post 😊.

Second: do you feel like something is missing there? If you are thinking about that final null it's because that is the distributeOptions and in a remove operation it's not needed (so I don't allocate it). Do you feel it should be clearer?

I think I wouldn't include the null in that log entry just in case someone reads it and thinks something's wrong; but it's a nit, keeping it is also fine with me 😊.


Regarding the "keys and clear" discussion: as you mentioned, all I care about is clearing the cache, and the "get all keys" is just a way to get there. Right now having all the keys allows clearing entries individually, but I wouldn't mind losing that at all in exchange for a big red "Clear Cache" button that nukes everything.

So your idea of a ClearedOnDate would work for this scenario 💪🏻. And now that the backplane exists, it can coordinate that date between all the nodes, so calling Clear() in one would propagate it to all. Perhaps it would be desirable to actually free the local memory in each node, because I guess people would expect that to happen (and not just make cached entries "invisible" to FusionCache).

If you open new tickets for any of these things please let me know so I can get involved. Thanks again for everything!

@jodydonetti commented Feb 2, 2022

Great that we could get something useful out of my post 😊.

🎉

I think I wouldn't include the null in that log entry just in case someone reads it and thinks something's wrong; but it's a nit, keeping it is also fine with me 😊.

It seems to make sense honestly, but I don't remember if there was a reason for that (I don't think so).
I'll think about it and will probably change it to reflect your observation.

Regarding the "keys and clear" discussion: as you mentioned, all I care about is clearing the cache, and the "get all keys" is just a way to get there.
Right now having all the keys allows clearing entries individually, but I wouldn't mind losing that at all in exchange for a big red "Clear Cache" button that nukes everything.

Nice, so I can exclude the "get all keys" feature, at least for now!

So your idea of a ClearedOnDate would work for this scenario 💪🏻
And now that the backplane exists, it can coordinate that date between all the nodes, so calling Clear() in one would propagate it to all.

Exactly!

Perhaps it would be desirable to actually free the local memory in each node, because I guess people would expect that to happen (and not just make cached entries "invisible" to FusionCache).

This also makes sense: sadly though, in general I'm limited to the API available on the IMemoryCache interface or the MemoryCache class. Let me see if I can think of something...

If you open new tickets for any of these things please let me know so I can get involved.

Absolutely, will do after a minimum of thinking about it.

Thanks again for everything!

Thanks to you for being a part of this 💪

@jodydonetti commented Feb 11, 2022

Hi all, just wanted to update you on the next version: I've just released BETA2 of the next big release, which includes, among other small things, a big fix for the DI setup part.

Right now I'm:

  • using it in production on a couple of environments and observing how it works
  • finishing a couple of docs

This will probably be the last release before the official one: barring unexpected bugs, I think I'll release it officially this weekend or early next week.
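
For anyone trying it out, the DI setup being referred to looks roughly like this (a sketch based on the builder-style API of later FusionCache releases; the exact registration calls may differ between versions, and the connection strings are placeholders):

```csharp
using Microsoft.Extensions.Caching.StackExchangeRedis;
using Microsoft.Extensions.DependencyInjection;
using ZiggyCreatures.Caching.Fusion;
using ZiggyCreatures.Caching.Fusion.Backplane.StackExchangeRedis;
using ZiggyCreatures.Caching.Fusion.Serialization.NewtonsoftJson;

var services = new ServiceCollection();

// Rough sketch of the setup: memory cache + Redis distributed cache +
// Redis backplane. Builder methods reflect later FusionCache releases
// and may differ in the betas discussed here.
services.AddFusionCache()
    .WithSerializer(new FusionCacheNewtonsoftJsonSerializer())
    .WithDistributedCache(new RedisCache(new RedisCacheOptions { Configuration = "localhost:6379" }))
    .WithBackplane(new RedisBackplane(new RedisBackplaneOptions { Configuration = "localhost:6379" }));
```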

@jodydonetti

Hi all, yesterday I released the BETA 3.

Unless some big problem comes up, this will be the very last release before the official one, which will be in the next few days 🎉

@jodydonetti

And here we go: v0.9 is finally out 🥳🎉


Thanks everybody for the involvement, and if you try it out let me know!

@JoeShook (Contributor)

Wow, I am late to this party! 🎉
I pulled the v0.9 version into FusionCache.Plugins.Metrics, and a few tests failed because of one specific assumption that I believe was wrong in the first place. I created a PR if anyone is interested; it's listed above this comment.

I'll probably just merge it later tomorrow.

Looks like I need to take another pass on my metrics work to include backplane events. I'd better get the lab spun back up.

@jodydonetti

Thanks @JoeShook, you are in fact correct: with v0.9 I slightly changed the internal behaviour to better reflect what I consider to be the right one. In a GetOrSet call, a SET event should be raised only if the factory is executed successfully.

Looks like I need to take another pass on my metrics work to include backplane events

That is a good idea, now that there are backplane events too. Thanks!

@jodydonetti jodydonetti changed the title 📢 Backplane [FEATURE] 📢 Backplane Dec 6, 2022
@rafael-canelas

Regarding the clear cache method... were you able to implement something?
If I wanted to reset only a named instance, is that possible?

Thanks!
