add decorated_fn.refresh(*args) feature #36

Open
sqlalchemy-bot opened this issue Jun 18, 2013 · 25 comments

Migrated issue, originally created by lxyu (lxyu)

Does dogpile.cache have this feature yet:

one thread refreshes the cached value while the others directly return the old one?

That way I can update the cache in the background without slowing down response time.


lxyu (lxyu) wrote:

It would be great if I could call a refresh explicitly, e.g. some_decorated_func.refresh(*args).


Michael Bayer (zzzeek) wrote:

I assume the "one thread" here is some background thread that isn't directly returning to a user. You can build this approach using async_creation_runner; see http://dogpilecache.readthedocs.org/en/latest/api.html#module-dogpile.cache.region .

As for dogpile.cache providing the background thread system itself, that's out of scope. Users need to roll their own background mechanism (thread, message queue, etc.).
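A minimal sketch of the hook being pointed to, close in shape to the example in the dogpile.cache docs, but with a plain dict standing in for the region backend (the dict, key, and demo values are illustrative, not dogpile API):

```python
import threading

def async_creation_runner(cache, somekey, creator, mutex):
    """Background-refresh hook in the shape dogpile.cache expects:
    regenerate the value in a worker thread and release the dogpile
    mutex when done, so callers keep serving the stale value."""
    def runner():
        try:
            value = creator()
            cache[somekey] = value  # a real region backend would use cache.set()
        finally:
            mutex.release()
    thread = threading.Thread(target=runner)
    thread.start()
    return thread  # returned only so the demo below can join it

# demo: a plain dict stands in for the region's backend
cache = {"greeting": "stale"}
mutex = threading.Lock()
mutex.acquire()  # dogpile hands the runner an already-acquired mutex
t = async_creation_runner(cache, "greeting", lambda: "fresh", mutex)
t.join()
print(cache["greeting"])  # -> fresh
```

In real use the region, not the demo, joins nothing: the caller returns the stale value immediately while the thread repopulates the cache.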


Michael Bayer (zzzeek) wrote:

Already implemented as async_creation_runner.


Changes by Michael Bayer (zzzeek):

  • changed status to closed


lxyu (lxyu) wrote:

I didn't notice it already exists!

It works great except for one thing: it seems I can't manually expire a value.

Maybe we could change the invalidate behavior to expire rather than delete? Or maybe add an expire function?


lxyu (lxyu) wrote:

The problem I'm solving is this:

I'm working on a client-server system based on Thrift. Sometimes, when certain events happen, I need to bypass the cache and get a real-time value, and if I've fetched a real-time value anyway, why not refresh the cache with it right after?

So how about a more specific feature: get a non-cached value from a decorated function and refresh the cache in the same step?

Maybe new_value = some_decorated_func(*args, refresh=True), or just new_value = some_decorated_func.refresh(*args)?
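A toy sketch of the API shape being proposed; the cache_on_arguments here is a hypothetical stand-in (a bare dict cache, positional-args-only key), not dogpile's real decorator:

```python
import functools

cache = {}

def cache_on_arguments(fn):
    """Hypothetical sketch: memoize calls, and expose a .refresh()
    that always calls through to the real function and re-caches."""
    @functools.wraps(fn)
    def wrapper(*args):
        key = (fn.__name__,) + args
        if key not in cache:
            cache[key] = fn(*args)
        return cache[key]

    def refresh(*args):
        key = (fn.__name__,) + args
        cache[key] = fn(*args)  # bypass the cache, then write back
        return cache[key]

    wrapper.refresh = refresh
    return wrapper

calls = []

@cache_on_arguments
def lookup(x):
    calls.append(x)  # track real invocations
    return x * 2

lookup(3); lookup(3)        # second call is served from cache
fresh = lookup.refresh(3)   # forces a real call and re-caches
```

After this runs, `calls` is `[3, 3]`: two real invocations, one from the initial miss and one from the forced refresh.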


Changes by lxyu (lxyu):

  • changed status to reopened


Changes by Michael Bayer (zzzeek):

  • removed labels: feature
  • added labels: feature


Changes by Michael Bayer (zzzeek):

  • changed title from "About "refresh" feature" to "add decorated_fn.refresh(*args) feature"


Michael Bayer (zzzeek) wrote:

pullreq ?


Michael Bayer (zzzeek) wrote:

767dd6a


Changes by Michael Bayer (zzzeek):

  • changed status to closed


lxyu (lxyu) wrote:

Wow, you're fast!

I was preparing to open a pull request today, and you beat me to it again. ;)

And glad to see the repo migrated to git!


lxyu (lxyu) wrote:

Hi, I've observed that in some rare cases refresh may not work as expected in a distributed environment.

Take the following example; suppose we have two threads calling refresh:

thread-a arrives at 0s and, for whatever reason, returns its value at 3s.

thread-b arrives at 1s and returns its value at 2s.

Now, what if the underlying value changed at 0.5s? The cache ends up overwritten with a stale value.

So maybe refresh requires a lock too?


Changes by lxyu (lxyu):

  • changed status to reopened


lxyu (lxyu) wrote:

I think we may need to change the _value behavior here.

Currently _value records the time at which the value finished generating, via time.time():

#!python
def _value(self, value):
    """Return a :class:`.CachedValue` given a value."""
    return CachedValue(value, {
        "ct": time.time(),
        "v": value_version
    })

The enter time may make more sense here: for example, if we enter at 0s and it takes 2s to generate the value, we would set ct to 0s.

That way we know which value is newer under concurrency.
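The proposal can be sketched as follows; CachedValue is simplified here, and the creation_time parameter is the hypothetical addition being discussed, not existing dogpile API:

```python
import time

value_version = 1  # mirrors the module-level constant in dogpile.cache

class CachedValue:
    """Simplified stand-in for dogpile.cache.api.CachedValue."""
    def __init__(self, payload, metadata):
        self.payload = payload
        self.metadata = metadata

def _value(value, creation_time=None):
    # hypothetical change: let the caller stamp the value with the
    # time the refresh *started* rather than when the creator returned
    return CachedValue(value, {
        "ct": creation_time if creation_time is not None else time.time(),
        "v": value_version,
    })

enter = time.time()      # t=0s: the refresh begins
time.sleep(0.01)         # the creator takes a while to run
cached = _value("payload", creation_time=enter)
assert cached.metadata["ct"] == enter  # stamped with the enter time
```

With the enter time stored in "ct", two concurrent refreshes can be ordered by when they began, not by when they happened to finish.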


Michael Bayer (zzzeek) wrote:

Have "time" be an optional argument to _value() and go for it.


lxyu (lxyu) wrote:

OK, since pull request 1 was declined, I'll resolve the #2 part later. But since creation_time is only needed by refresh, I think we need a more detailed discussion of the refresh feature first.

On why refresh needs a mutex: as in the comments above, refresh can hit a concurrency issue when the first thread to enter returns last, which causes the correct cached value to be replaced by a stale one. So it's not as simple as a set. What we do is check creation_time and, if the new value's creation_time is older, skip the write-back. Since that check-and-write is not atomic, we need a lock. And it's not "a get_or_create with a guaranteed expiration"; it's a forced refresh.

Also, refresh doesn't support methods on classes; I still can't find a solution for that, and invalidate seems to have the same issue.


Michael Bayer (zzzeek) wrote:

About why mutex in refresh. Refer to comments above, the refresh may have concurrent issue when the first-enter returns later, which cause the correct cache be replaced with a wrong old one.

I'd really like all concurrency issues addressed by dogpile.cache to go through get_or_create(). You can pass an expiration time of "now" or "the past", which guarantees a regeneration. Anything in the "decorator" method is just convenience on top of that. As for "force": in your pull req, it looks like you call the creator unconditionally, but then you don't actually do the "set" if someone else got to it first. So that just seems like a more broken form of what get_or_create already does (it guarantees that only one creator runs).

And the refresh don't support class method, I still can't find a solution, invalidate seems to have this issue as well.

I think you're talking about https://bitbucket.org/zzzeek/dogpile.cache/issue/24/cache-invalidation-for-class-or-instance. That's a separate issue.
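The expiration-time idea above can be illustrated with a toy version of get_or_create; the dict store and function names here are illustrative only, and the real region.get_or_create() also serializes creators behind a dogpile mutex, which this sketch omits:

```python
import time

store = {}  # key -> (value, created_at); stands in for a region backend

def get_or_create(key, creator, expiration_time):
    """Toy illustration of the expiration idea: return the cached
    value unless it is older than expiration_time, otherwise run the
    creator and store the fresh value (no dogpile mutex here)."""
    entry = store.get(key)
    now = time.time()
    if entry is not None and now - entry[1] < expiration_time:
        return entry[0]  # still fresh enough: serve from cache
    value = creator()
    store[key] = (value, now)
    return value

store["k"] = ("stale", time.time())
# an expiration_time of 0 ("now") makes every entry too old,
# so regeneration is guaranteed:
fresh = get_or_create("k", lambda: "fresh", expiration_time=0)
print(fresh)  # -> fresh
```

With a generous expiration_time the same call serves the cached value instead, which is the "convenience on top" the decorator provides.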


lxyu (lxyu) wrote:

Do you mean triggering a refresh via a get_or_create with expiration_time=0? OK, I'll try that. I'd dived too deep into the cache_on_arguments decorator and didn't think of it.

Done that way, creation_time is no longer needed. Great.


lxyu (lxyu) wrote:

I've tried that, but it doesn't behave the same.

When refresh uses get_or_create, it acts like dogpile and forbids parallel calls. If we have 10 concurrent refresh calls and the first one blocks or times out, all the following calls time out as well.

What I want is for refresh to always return the fresh, non-cached value and update the cache to the newest state.

Since we may have many refresh calls at the same time, it's not a dogpile-style refresh, but a refresh that supports concurrent calling.


lxyu (lxyu) wrote:

it looks like you call the creator unconditionally, but then you don't actually do the "set" if someone else got to it already.

It's not "if someone else got to it already"; it compares the enter times of the refresh calls to solve this problem:

Take the following example; suppose we have two threads calling refresh:
thread-a arrives at 0s and, for whatever reason, returns its value at 3s.
thread-b arrives at 1s and returns its value at 2s.
Without the check, the cache is overwritten by thread-a at 3s.
And what if the underlying value changed at 0.5s? The cache ends up holding a stale value.

So this concurrency model is a different approach from dogpile's.
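The enter-time comparison can be sketched as below; the names are illustrative, and the in-process lock stands in for whatever a distributed setup would actually need:

```python
import threading

lock = threading.Lock()
cache = {"v": ("initial", -1.0)}  # value, enter_time of the refresh that wrote it

def write_back(key, value, enter_time):
    """The write-back step of the proposed refresh: keep the result
    only if no later-started refresh has already written one.  The
    check-and-set is not atomic on its own, hence the lock."""
    with lock:
        if enter_time > cache[key][1]:
            cache[key] = (value, enter_time)

# thread-b entered at t=1s and finishes first, at t=2s:
write_back("v", "b-value", 1.0)
# thread-a entered earlier (t=0s) but finishes later, at t=3s;
# its stale result is discarded instead of clobbering the cache:
write_back("v", "a-value", 0.0)
print(cache["v"][0])  # -> b-value
```

An unconditional set would have left "a-value" in the cache; the enter-time guard is what keeps the later-started refresh's result.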


Michael Bayer (zzzeek) wrote:

This isn't a use case I care to support directly. If you have two or three threads all calling the same "creation" function at roughly the same time, you have no idea which one has the "fresher" value: concurrency is non-deterministic, the creator that started second could finish first, and so on; there's no way to determine it. The point of the dogpile library is to prevent ever wastefully running the same creation function concurrently.

If you are trying to get multiple threads to purposefully pile up on the same creation function with the same arguments all at the same time and then pick a winner, you can roll that yourself on the outside.


Michael Bayer (zzzeek) wrote:

Let me know if you work out a system that does this, and whether any API features or adjustments are needed on the dogpile side to help it along.


lxyu (lxyu) wrote:

yah have to say I have not worked out a really good solution on this. In my specific situation, the later started thread will always get the newest value, so I added an enter_time to cache value, so that the result from later started thread will always be left in cache, no matter when it finished.

It's working fine for months since, but the system itself is not perfect and I don't know if this situation will also applied to other systems.

And I'm agree with your opinion about "The point of the dogpile library is to prevent ever wastefully running the same creation function concurrently. " I'll try to refine the system outside dogpile.cache, maybe later. If I got anything new, I'll open pull request here. :)
