
feat: add ability to clear cache #54

Open · wants to merge 2 commits into master
Conversation

@ct16k commented Dec 17, 2022

While the LRU package has the ability to purge all items from the cache, this functionality was not available to ProtoGetter, making it impossible to clear the cache without restarting all peers. This change adds a Clear() method to ProtoGetter, which enables clearing the cache with no downtime.
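To make the shape of the change concrete, here is a minimal sketch; the interface and helper names below are illustrative assumptions, not the PR's actual diff (the real ProtoGetter has more methods and different signatures). The idea is to purge the local cache and then ask every peer to do the same.

```go
// Illustrative sketch only; not the groupcache codebase's real types.
package purgesketch

import (
	"context"
	"fmt"
)

// peerClearer is a hypothetical stand-in for the Clear() method this PR adds
// to the peer interface.
type peerClearer interface {
	Clear(ctx context.Context) error
}

// clearCluster purges the local cache first (localPurge stands in for the
// existing LRU purge call), then asks every peer to drop its cache too.
func clearCluster(ctx context.Context, localPurge func(), peers []peerClearer) error {
	localPurge()
	for _, p := range peers {
		if err := p.Clear(ctx); err != nil {
			return fmt.Errorf("clearing peer: %w", err)
		}
	}
	return nil
}
```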

@thrawn01 (Contributor)

Thanks for the PR! What is the use case for this feature? We already have remove functionality; why would we need to purge the entire cache?

We should note in the documentation that this is a point-in-time purge: a value from before the purge can survive if another instance in the cluster still holds it and hasn't been purged yet, and that value is then requested by an instance that has already been purged. If this functionality is expected to purge all values with no chance of them re-propagating, a different solution is needed.

This can be confusing, so here is a sequence diagram

```mermaid
sequenceDiagram
    actor User
    User->>Node1: Purge
    Node1->>Node2: Purge
    Node1->>Node3: Request a value from Node3
    Node1->>Node3: Purge
```

In this way, a value from Node3 survives the purge.

In the case of a single Remove() call, we delete the value from the value owner first, then tell all others to clear the value.
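For reference, here is a rough sketch of that ordering. The names are assumptions made for illustration, not the library's actual Remove implementation: the key's owner is cleared first, so no other peer can re-fetch the stale value from it, then the removal fans out to the remaining peers.

```go
// Illustrative sketch of the remove-from-owner-first ordering.
package removesketch

import "context"

// peerRemover is a hypothetical stand-in for the peer interface.
type peerRemover interface {
	Remove(ctx context.Context, key string) error
}

// removeKey clears the value on its owner before telling the other peers to
// drop their copies, so the owner cannot serve the stale value in between.
func removeKey(ctx context.Context, key string, owner peerRemover, others []peerRemover) error {
	if err := owner.Remove(ctx, key); err != nil {
		return err
	}
	for _, p := range others {
		if err := p.Remove(ctx, key); err != nil {
			return err
		}
	}
	return nil
}
```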

Running a Purge and a Get at the same time could allow an old key to be pulled back into the cache if the request was initiated by a node that had been cleared and sent to a node that hadn't. This patch tries to mitigate that by adding an additional `generation` field that keeps track of the number of purges issued. Requests are fulfilled successfully only if both ends of a request are on the same generation.
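To illustrate the generation gate described in that commit message, here is a small sketch; the field and method names are assumptions for illustration, not the PR's actual code. Each request carries the sender's generation, and a mismatch is treated as a cache miss so the caller falls back to its getter rather than re-propagating a pre-purge value.

```go
// Sketch of a per-node generation counter gating cache reads.
package generationsketch

import "sync"

type node struct {
	mu         sync.Mutex
	generation uint64
	cache      map[string][]byte
}

// Purge drops every cached value and moves this node to the next generation.
func (n *node) Purge() {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.cache = make(map[string][]byte)
	n.generation++
}

// Get answers from cache only when the requester reports the same generation;
// a mismatch is reported as a miss so the caller fetches the value itself.
func (n *node) Get(reqGeneration uint64, key string) ([]byte, bool) {
	n.mu.Lock()
	defer n.mu.Unlock()
	if reqGeneration != n.generation {
		return nil, false
	}
	v, ok := n.cache[key]
	return v, ok
}
```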
@ct16k (Author) commented Jan 22, 2023

My use case is related to mistakes I've made in past lives with caching layers, either setting a longer TTL on a key than intended or poisoning the cache by choosing the wrong hash key. While the first scenario can be mitigated with the current remove functionality, if there are too many keys to remove, the ability to flush the cache completely without restarting the service is a useful tool to have around. As for the second scenario, there's no way around it other than flushing the cache.

To prevent an old value from surviving the purge, I've added a new field I've unimaginatively called generation. The server will honor requests only if they come from a client on the same generation. I'm not 100% certain my implementation doesn't still allow keys to survive a flush, so I'm trying to write a proper test for this. It may also be that the extra field and complexity aren't worth it. It's almost as if cache invalidation were a hard problem. :)

@thrawn01 (Contributor) commented Mar 3, 2023

Sorry for not coming back to this PR for a while; you wouldn't believe the 2023 I've had so far 😭

I think I follow the use case of the generation.

My understanding is that generation is used when getting a value: if the generation from the requestor doesn't match the generation of the peer, then the value should not be retrieved from the peer's cache but instead fetched by the getter. This prevents a peer that has already been purged from re-propagating data from a cache that is on an older generation.

However, what happens if a peer in the cluster is on the wrong generation? In that case every request to that peer will be served by the getter and never from the cache. There is no mechanism to ensure that all peers in the cluster are on the same generation until a purge is requested. For example, suppose a purge is requested in an existing cluster, so everyone is now on generation 1, and then a new peer is added. The new peer will be on generation 0 and has no idea that everyone else is on generation 1.

I understand the use case for this type of thing, but these are hard problems to solve. It's one of the reasons Brad originally made this library very simple, as opposed to his previous work in memcached. We get away with Set() and Remove() because we added TTL, which means the cluster eventually becomes consistent again. Synchronization is a hard problem to solve when performance matters.

The best thing to do when we pollute the cache with bad values is to redeploy the entire application.

I could be convinced we should merge this feature anyway (without the generation stuff) as long as we CLEARLY document that this purge is not guaranteed to purge all cache values in flight. It might be useful for an operator to send purge requests multiple times in hopes of purging in flight data or if the application can pause processing while the purge completes.

Thoughts?

```go
	defer res.Body.Close()

	if res.StatusCode != http.StatusOK {
		body, err := ioutil.ReadAll(res.Body)
```
Suggested change:

```diff
-	body, err := ioutil.ReadAll(res.Body)
+	body, err := io.ReadAll(res.Body)
```
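For context, ioutil.ReadAll has been deprecated since Go 1.16 in favor of io.ReadAll. A self-contained sketch of the surrounding error-handling pattern (assumed names and wrapping, not the PR's exact code) with the suggestion applied:

```go
package client

import (
	"fmt"
	"io"
	"net/http"
)

// readErrorBody shows the pattern the suggestion applies to: on a non-200
// response, read the body with io.ReadAll (ioutil.ReadAll is deprecated since
// Go 1.16) and surface it in the returned error.
func readErrorBody(res *http.Response) error {
	defer res.Body.Close()

	if res.StatusCode != http.StatusOK {
		body, err := io.ReadAll(res.Body)
		if err != nil {
			return fmt.Errorf("reading response body: %w", err)
		}
		return fmt.Errorf("server returned %d: %s", res.StatusCode, body)
	}
	return nil
}
```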

@thrawn01 (Contributor) commented Jun 9, 2023

I know it's been a while, but there is renewed interest in this feature. I'm not sure support for "generations" is needed. In the case I'm thinking about, we should just purge the cache multiple times and hope we got everything. 😆

Also, leaving generations out would make the implementation simpler.

@ct16k Are you still interested in this PR? We might make a new PR with your changes and credit you.
