invalidate entire cache #38
Comments
Michael Bayer (zzzeek) wrote: there are two great ways, one is http://dogpilecache.readthedocs.org/en/latest/api.html#dogpile.cache.region.CacheRegion.invalidate and the other is the recipe or variants of http://dogpilecache.readthedocs.org/en/latest/usage.html#invalidating-a-group-of-related-keys . |
jvanasco (jvanasco) wrote: i thought invalidate only works on a key. it works on the entire region? SWEET! |
Michael Bayer (zzzeek) wrote: but there's a catch. it only works in that process. |
jvanasco (jvanasco) wrote: Yeah, i'm concerned with resetting the cache(s) of a long-running process(es) without restarting the process(es). With a DBM based cache, if I want to drop the cache it seems I can just delete the dir and then run a script to re-generate the cache files. That doesn't seem to cause too many errors. Not sure how to handle memcached, etc. Cycling the cache backend tends to cause errors. The best way to handle invalidating memcached without dogpile errors seems to be site-stop, memcached off, memcached on, sleep(5), site-start |
Morgan Fainberg () wrote: In theory, it would be possible to have the backend support a site-wide invalidate without too much extra code. Just make the current CacheRegion.invalidate() check to see if the backend has a similar method, and call that. Have the backend store (in its actual store) a special key that indicates that anything older than is invalid. I think that would be a reasonable feature add. It would add another lookup (perhaps something that could be done on an interval and is stored in a local var) to verify cache validity. |
Michael Bayer (zzzeek) wrote: OK seems like you're talking about two things. "if the backend has a similar method", I guess you mean if the backend is a dict, we want dict.clear() type of thing, we have an existing convention for backend-specific features which is that you call it from the backend directly:
now, if the feature is instead "Have the backend store (in its actual store) a special key that indicates that anything older than is invalid", that's not specific to a backend, that could be done agnostically with CacheRegion. What I don't like about it is that it's slow: it adds an extra cache hit to all operations. If we make it optional, we're cluttering up CacheRegion with ever more conditionals to suit use cases that are extremely rare (I'd never need a feature like this). I'd like to explore first how CacheRegion could allow extensibility in ways like this without cluttering it up, then the "augment all cache operations with an explicit invalidation key check" can be an extension feature in a separate module. |
Morgan Fainberg () wrote: I was actually thinking of the same mechanism as the current CacheRegion invalidate. With regards to something like dict.clear(), I think it is useful to pass that on as a utility for cache invalidation on that backend, but I see that as a one-off not as a globally acceptable mechanism (based upon how the back ends work). But that being said, I agree, you don't want the overhead of having to do that lookup every time. The mechanism to load in that specific "invalidate" information would need to be smarter than "check if invalidate is set, load, then check cache". I'm not yet sure how I would approach this in a universally acceptable way. Allowing elegant extension use is never a bad idea (in my opinion). |
Michael Bayer (zzzeek) wrote: you either have to check that invalidate key every time, or you can "box" it by having a function that looks at the current time, and on a per-region basis only checks the "invalidate" key every N seconds. So a very active cache region would not be doing this second hit more than every N seconds. A not very active region would be doing the hit for a majority of accesses, but it's not active so not a big deal. it's definitely logic I'd want to have "somewhere else", and nicely tested in isolation against a mock backend. |
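The "boxed" check described above can be sketched in plain Python. This is only an illustration under stated assumptions, not dogpile.cache code: the class name `ThrottledInvalidationCheck`, the `fetch` callable (standing in for a read of the shared "invalidate" key) and the injectable `clock` are all hypothetical.

```python
import time


class ThrottledInvalidationCheck:
    """Hypothetical helper: re-read a shared invalidation timestamp
    from the backing store at most once every `interval` seconds."""

    def __init__(self, fetch, interval, clock=time.time):
        self._fetch = fetch          # callable returning a timestamp or None
        self._interval = interval    # the "N seconds" from the discussion
        self._clock = clock          # injectable so the logic can be tested
        self._last_checked = None
        self._cached = None

    def invalidation_time(self):
        now = self._clock()
        # Only pay for the second backend hit when the box has expired.
        if self._last_checked is None or now - self._last_checked >= self._interval:
            self._cached = self._fetch()
            self._last_checked = now
        return self._cached

    def is_valid(self, created_at):
        """A value is valid if it was created after the invalidation time."""
        cutoff = self.invalidation_time()
        return cutoff is None or created_at > cutoff
```

The trade-off is the one stated in the comment: a busy region performs the extra lookup at most once per interval, at the cost of noticing a remote invalidation up to `interval` seconds late.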
Michael Bayer (zzzeek) wrote: a simple hook to CacheRegion here would be that it consults some injected function in order to get at the "invalidation time" value. |
n01s3 (n01s3) wrote: I'm not sure if this is the best place to ask, but I'm using async_runner to repopulate my cache (memory backend) in the background, and found that calling region.invalidate() forces the next call to do a synchronous/blocking repopulate. I've hunted around but can't find a good way to invalidate the whole region in a way that will continue to allow serving stale data while repopulating via async_runners. Is this possible with the current implementation? |
Michael Bayer (zzzeek) wrote: that's a great point, as invalidate() was written to just force a regen immediately. I've broken it out into "hard" and "soft" options in 138d3d7 where you can see that a "soft" invalidation does the invalidate by faking the creation time to be "now - expiration time", rather than raising a NeedRegen or returning a hard "0" value for creation time. I haven't tested this in an integration context (e.g. with multiple threads), please let me know if this flag solves this issue for you. |
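To make the hard/soft distinction above concrete, here is a toy model of the idea in plain Python. It is a sketch of the behaviour described in the comment, not the code from the referenced commit; the function name `effective_creation_time` and its parameters are hypothetical. In dogpile.cache itself the choice is exposed as the `hard` flag on `region.invalidate()`.

```python
def effective_creation_time(created_at, invalidated_at, hard, expiration_time, now):
    """Hypothetical model: return the creation time that the
    regeneration logic should see for a cached value."""
    if invalidated_at is None or created_at >= invalidated_at:
        # Value was created after the invalidation (or there was none).
        return created_at
    if hard:
        # Hard: behave as if there is no usable value at all, so the
        # next caller must block while the value is regenerated.
        return 0
    # Soft: pretend the value expired just now. The stale value is still
    # available, so it can be served while a background runner regenerates.
    return now - expiration_time
```

With `hard=False` the value merely looks expired, which is what lets an async runner keep serving stale data instead of forcing a blocking regeneration.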
n01s3 (n01s3) wrote: Wow, that was fast! Thanks so much, I'll test this out shortly and let you know. |
n01s3 (n01s3) wrote: That worked beautifully and saves me a bunch of work. Thanks again for the quick fix. For anyone who later finds this, the use-case I'm using it for:
|
Michael Bayer (zzzeek) wrote: OK might be time for a release. |
Michael Bayer (zzzeek) wrote: the "soft" flag has resolved this. |
Changes by Michael Bayer (zzzeek):
|
zoomorph (zoomorph) wrote: When running multiple forked processes, you have to invalidate in every process because it doesn't actually delete or invalidate the keys from the backend. Would it be possible to delete an entire region from the backend, and if so could a flag or separate method be added to accomplish this? |
Michael Bayer (zzzeek) wrote: @zoomorph it sounds like you're going back around to the beginning of the ticket here. Backends like memcached or redis don't have a keys() function that we could use to "delete the entire region". Hence we do it with invalidation timestamps instead. Those are currently local to a specific Python process that sets that up, but the notion here is, hey let's get that invalidation time from the server instead. great ! but how do we do that and not double our cache accesses, how do we do it without messing up the dogpile internals too much? one answer right now is that each app queries the datastore periodically, like with a background thread, for a single "invalidation" timestamp, and sets it up as needed using region.invalidate(). So this can be rolled entirely on the outside - though that doesn't mean we can't add some helpers or at least examples in the recipes section that talk about this. |
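The "rolled entirely on the outside" approach might look roughly like the sketch below. Everything here is an assumption made for illustration: `InvalidationPoller`, `fetch_timestamp` (a read of the single shared "invalidation" timestamp) and `on_invalidate` (which in a real application would presumably call `region.invalidate()`) are hypothetical names, not part of dogpile.cache.

```python
import threading


class InvalidationPoller:
    """Hypothetical helper: poll a shared invalidation timestamp and
    trigger a local invalidation whenever a newer one appears."""

    def __init__(self, fetch_timestamp, on_invalidate, interval=5.0):
        self._fetch = fetch_timestamp      # reads the shared timestamp (or None)
        self._on_invalidate = on_invalidate
        self._interval = interval
        self._seen = None                  # newest timestamp already acted on
        self._stop = threading.Event()

    def poll_once(self):
        """Check once; return True if a local invalidation was triggered."""
        ts = self._fetch()
        if ts is not None and (self._seen is None or ts > self._seen):
            self._seen = ts
            self._on_invalidate()
            return True
        return False

    def run(self):
        # Event.wait() returns False on timeout, True once stop() is called.
        while not self._stop.wait(self._interval):
            self.poll_once()

    def start(self):
        thread = threading.Thread(target=self.run, daemon=True)
        thread.start()
        return thread

    def stop(self):
        self._stop.set()
```

Each worker process would start one poller. Because the local invalidation happens when the poll notices the change rather than at the stored timestamp, values written between the two moments are also discarded, which is conservative but safe.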
zoomorph (zoomorph) wrote: Thanks for the explanation. |
zoomorph (zoomorph) wrote: Thanks n01s3. I'm using uWSGI so I'm using their signal framework to handle invalidation on all workers. |
jvanasco (jvanasco) wrote: A while back I thought about handling this with a custom ProxyBackend- Create an 'invalidation' ProxyBackend; calls to 'get' first check for This value could probably only be hit it periodically, and cached into The tricky part though, is this proxy backend would have to hit a
I think the logic would be something like : APP - ProxyBackend - I ended up not implementing this, because it was easier to construct The only time /we/ would necessarily need to refresh an entire region |
Morgan Fainberg () wrote: Let's revisit this and allow for passing in an override to the _hard_invalidate and _soft_invalidate that can work on the backends. The default can be only within the region, but we just hit this exact issue within Keystone (OpenStack) and we're willing to take the overhead hit of asking the backend for the "expiration time" each time for the benefit of not hitting SQL. 2x Memcache hits will still be better than inconsistent results. For clarity, the idea is that the region-wide .invalidate would make some calls on an Abstract Base Class (or similar) instead of just setting the values on the region itself. This allows a developer to override but the default can remain local to the in-process. |
Michael Bayer (zzzeek) wrote: @morgan_fainberg you mean you want an extension to use a second "get" from the backend to "get" an invalidation token, right? the idiomatic approach within region.py is that callable objects can be passed in; right now for example you can pass to get_or_create a "should_cache_fn". It seems like we'd add the ability for a "should_invalidate_fn" or similar. |
Morgan Fainberg () wrote: @zzzeek Correct, something like that. The only concern I have is that it also needs to hook into the region.invalidate to be as transparent to the developer as possible. As a temporary workaround (until we have something in dogpile), I patched the ._hard_invalidated and ._soft_invalidated with a property that did the work with a setter/deleter. So as long as we can hook the .invalidate method into whatever should_invalidate_fn does, we should be good. |
Michael Bayer (zzzeek) wrote: What I am very much hoping not to do is to introduce a plethora of ad-hoc "abstract" classes all over the place as a means of arbitrary extension, because then you have a mess, and also inconsistent vs. the many current region hooks that are currently sent as arbitrary callables. Where we do have an "abstract" class as an extension point is CacheBackend. If we made "invalidate" a hook that consulted the backend, we have the ProxyBackend which allows you to inject "middleware" of sorts in between region and the actual backend. I wonder if this kind of thing could happen there. |
jvanasco (jvanasco) wrote: FWIW this touches on an earlier attempt at a PR I had, and allowing the logic of the cache validator to be configurable. As a quick refresher, the current system validates the cache by managing a dict payload that includes a dogpile API version and timestamp. If that functionality were pluggable, the validity could be based on other factors. |
Morgan Fainberg () wrote: @zzzeek Totally fair. I really would also prefer to not use abstract classes if we can get there without it. I would be happy to have invalidate do the same thing mutex does, let the cachebackend (or via proxy) easily then cover the needs of region wide invalidation. Defaults can stay the same as today, but it becomes extensible. |
jvanasco (jvanasco) wrote:
if defined, the value is piped into it, and the return values are: • True - valid |
Morgan Fainberg () wrote: Reopening this based on the conversation in the comments. |
Changes by Morgan Fainberg ():
|
Michael Bayer (zzzeek) wrote: Make cache region invalidation pluggable Introduce class RegionInvalidationStrategy that performs region Fixes: #38 → d521db7 |
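For readers arriving from the earlier discussion, a pluggable strategy that keeps the invalidation record in a shared store could be sketched as follows. This is a stdlib-only illustration, not the class from the commit: `SharedStoreInvalidationStrategy` and its `store`, `key` and `clock` parameters are hypothetical, and the method names are modelled on the `RegionInvalidationStrategy` interface as I understand it (consult the dogpile.cache API documentation for the authoritative signatures and for how a strategy is attached to a region).

```python
import time


class SharedStoreInvalidationStrategy:
    """Hypothetical strategy: keep (timestamp, hard) in a shared,
    dict-like store so every process sees the same invalidation."""

    def __init__(self, store, key="region:invalidated", clock=time.time):
        self._store = store      # stand-in for memcached/redis access
        self._key = key
        self._clock = clock

    def invalidate(self, hard=True):
        self._store[self._key] = (self._clock(), hard)

    def _read(self):
        # One extra store read per check: the "2x hits" cost from the thread.
        return self._store.get(self._key)

    def is_invalidated(self, timestamp):
        rec = self._read()
        return rec is not None and timestamp < rec[0]

    def was_hard_invalidated(self):
        rec = self._read()
        return rec is not None and rec[1]

    def was_soft_invalidated(self):
        rec = self._read()
        return rec is not None and not rec[1]

    def is_hard_invalidated(self, timestamp):
        return self.was_hard_invalidated() and self.is_invalidated(timestamp)

    def is_soft_invalidated(self, timestamp):
        return self.was_soft_invalidated() and self.is_invalidated(timestamp)
```

Two instances sharing the same store behave like two processes: an `invalidate()` call on one is visible to the other on its next check, at the price of an extra read each time.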
Changes by Michael Bayer (zzzeek):
|
Migrated issue, originally created by jvanasco (jvanasco)
there's no good way to invalidate an entire cache / region.
it would be nice if there were.