-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
get_or_create
makes two GETs on a cache fail because of an oddity in Lock
#122
Comments
Michael Bayer (zzzeek) wrote: the race that this extra get solves is that there is no value, every thread races to reach _enter_create(), then one of them gets the lock, the rest wait; then when complete, the next one waiting to be the "creator" gets into the block, sees that someone just created, and then can return that. the emphasis is on not calling the creation function twice which is assumed to be vastly more expensive than get(). so then in the other case, we have the value, we acquired the mutex with block=False, which means we are definitely the ones chosen to create the new value, so we shouldnt need to check someone else did it. I need to re-learn what the async thing does but this sort of should work:
|
Michael Bayer (zzzeek) wrote: the async_creator thing makes this more intricate than that. i wonder if this issue was introduced by async_creator. |
Michael Bayer (zzzeek) wrote: so https://gerrit.sqlalchemy.org/#/c/zzzeek/dogpile.cache/+/740 will run through the test suite, though it would be nice to add a test that demonstrates the double-get behavior in the old version and confirms it's fixed. |
jvanasco (jvanasco) wrote: I can put together a test. i tracked this down via a "wrapper", which I thought was misconfigured. I ended up tearing apart dogpile from 11pm to 4am trying to work this out ;) I remember the race condition you mentioned above well. I think this is just an artifact of dogpile being modified over the years - i'm pretty sure it only triggered a second get when necessary when I looked at this in the past. |
Michael Bayer (zzzeek) wrote: have you reviewed and tested https://gerrit.sqlalchemy.org/#/c/zzzeek/dogpile.cache/+/740 yet? it's ready to go. no hurry, jsut want to make sure you look at it since you're more focused on this than me. |
jvanasco (jvanasco) wrote: i got a bit frustrated with gerrit so just patched it myself and am still writing tests. fyi, i have a few small cpython optimizations that i'm going to suggest in a separate pr too. |
Michael Bayer (zzzeek) wrote: when you're on the gerrit there's a "download" dropdown in the upper-right, you can bypass git and get a patchfile directly: |
jvanasco (jvanasco) wrote: This looks okay. I had one failure on one test run, but I haven't been able to replicate it. I think my computer froze and a cache value expired. I haven't figured out how to write the test case for the old behavior yet.
|
Michael Bayer (zzzeek) wrote: Don't call getter twice for existing, expired value Fixed a condition in the :class: Change-Id: I7b4f529a191130f99bc15763a446d1d441c50cbe → 5b75a9f |
Changes by Michael Bayer (zzzeek):
|
Migrated issue, originally created by jvanasco (jvanasco)
I originally thought this was a documentation issue (#89) and desired, but now I think this is just an artifact of code changing over time. I started looking at this deeper when an app was slamming Redis when it's cache wasn't prime.
looking at
lock.py
lines 86-94https://bitbucket.org/zzzeek/dogpile.cache/src/e0c1f63907e63ed1678f44ec2443314808ada1c3/dogpile/lock.py#lines-86:94
in line 87, dogpile calls
value_fn()
which is reallyself.value_and_created_fn
to set thegenerated
andcreatedtime
values.in line 94, dogpile calls
._enter_create(createdtime)
in that method, it looks like dogpile immediately tries to create a mutex, then calls
self.value_and_created_fn
again (line 131).the two calls to
get
happen immediately after each other, within a microsecond or two.i don't know all the scenarios that are needed in this area, but one 'fix' that works on tests is the following:
(value, createdtime)
instead of justcreatedtime
i'm not sure how that would affect the async model, or the call to value_fn in line 101 either.
The text was updated successfully, but these errors were encountered: