About get_or_create_multi feature #33

Closed
sqlalchemy-bot opened this issue May 31, 2013 · 16 comments


Migrated issue, originally created by lxyu (lxyu)

I need a feature to get/create multiple values in one call.

Take this code for example:

#!python

def get_order(order_id):
    return DBSession().query(Order).get(order_id)


def mget_order(order_ids):
    orders = DBSession().query(Order).filter(Order.id.in_(order_ids))
    return {order.id: order for order in orders}

The cache_on_arguments decorator works great for the first get_order function. But for the latter one, if I mget [1, 2, 3] and then [1, 2], they'll generate 2 separate cached values, and if a single element among the ids expires, the whole cached result expires.

So I want to let the latter one make use of the first function's cache. Then mget [1, 2, 3, 4, 5] will directly use the cache generated by 'get', and if, say, [2, 3] expired, we only need to regenerate [2, 3]. It'll be much faster and more efficient as the ids list grows larger.
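For illustration, the behavior being asked for can be sketched with a plain dict standing in for the cache region (all names here are hypothetical, not dogpile.cache API): each id is cached under its own key, and only the missing ids are passed to the creator.

```python
# Minimal sketch of per-key multi-get caching: only keys missing from
# the cache go to the creator, and every key is cached individually so
# any combination of ids can reuse previously cached entries.
cache = {}
creator_calls = []


def creator(missing_ids):
    # Stands in for one DB query: SELECT ... WHERE id IN (missing_ids)
    creator_calls.append(list(missing_ids))
    return {i: "order-%d" % i for i in missing_ids}


def mget(order_ids):
    missing = [i for i in order_ids if i not in cache]
    if missing:
        cache.update(creator(missing))
    return {i: cache[i] for i in order_ids}


mget([1, 2, 3])   # creator sees [1, 2, 3]
mget([1, 2])      # fully served from cache; creator not called
del cache[2]      # simulate key 2 expiring
mget([1, 2, 3])   # creator sees only [2]
```

After the run above, the creator has been invoked twice: once for the cold cache and once for the single expired key.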

While going through the source code, I saw that cache_on_arguments uses a Lock from dogpile.core with this code, which doesn't seem well suited to the multi-key case in my situation.

#!python
with Lock(
        self._mutex(key),
        gen_value,
        get_value,
        expiration_time,
        async_creator) as value:
    return value

So do you have any suggestions on how to accomplish this goal?


Attachments: 33.patch


lxyu (lxyu) wrote:

And it's a little confusing to me that some code checks the value version while some does not:

#!python
if value is NO_VALUE or \
    value.metadata['v'] != value_version or \
        (self._invalidated and
        value.metadata["ct"] < self._invalidated):

This checks the version while get_multi does not.


Michael Bayer (zzzeek) wrote:

It might be possible, and as far as the lock goes, I think you'd use the async_creation_runner feature. This delivers the mutex held by Lock to the actual creator function. The expiration times and dogpile locks are per-key, so the function here would have a list of locks which it would need to close out once it has retrieved all values.
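A rough stdlib sketch of that per-key locking idea, with threading.Lock standing in for dogpile's distributed mutex (names and structure are hypothetical, not the patch's actual implementation):

```python
import threading

cache = {}
key_mutexes = {}                 # one mutex per cache key, as in dogpile
registry_lock = threading.Lock()


def _mutex(key):
    # Lazily create one lock per key.
    with registry_lock:
        return key_mutexes.setdefault(key, threading.Lock())


def get_or_create_multi(keys, creator):
    missing = [k for k in keys if k not in cache]
    # Collect every per-key lock for the keys we must generate, and
    # release them all only once the creator has produced all values.
    held = [_mutex(k) for k in missing]
    for m in held:
        m.acquire()
    try:
        # Re-check under the locks: another thread may have filled some.
        still_missing = [k for k in missing if k not in cache]
        if still_missing:
            cache.update(zip(still_missing, creator(still_missing)))
    finally:
        for m in held:
            m.release()
    return [cache[k] for k in keys]
```

For example, `get_or_create_multi([2, 5, 6], lambda ks: [k * 10 for k in ks])` generates all three values in one creator call; a second identical call is served entirely from the cache.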


Michael Bayer (zzzeek) wrote:

get_multi() checks the expiration time using _unexpired_value_fn().


lxyu (lxyu) wrote:

Yes, I know, but get_multi doesn't check the version like value.metadata['v'] != value_version does.


Michael Bayer (zzzeek) wrote:

Oh, that. That's a separate issue; it's just a check in case we upgrade how dogpile.cache stores values in the cache, at which point we'd bump "value_version" to 2. So sure, we'd want to add the value_version check everywhere it's missing now, but that only matters if and when we bump "value_version". That logic has no effect right now.
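The idea behind that check can be sketched in a few lines (this mirrors the 'v' and 'ct' metadata keys quoted above, but the helpers here are hypothetical, not dogpile internals): a reader that expects a newer payload format treats entries stored under an older format as cache misses instead of misreading them.

```python
import time

NO_VALUE = object()
value_version = 1            # would be bumped to 2 on a format change


def stored(payload):
    # Store the payload alongside its format version and creation time.
    return (payload, {"v": value_version, "ct": time.time()})


def unpack(entry):
    if entry is NO_VALUE:
        return NO_VALUE
    payload, metadata = entry
    if metadata["v"] != value_version:
        return NO_VALUE      # written under an older format: treat as miss
    return payload


assert unpack(stored("some-order")) == "some-order"
# An entry written under a hypothetical older format version is a miss:
assert unpack(("some-order", {"v": 0, "ct": time.time()})) is NO_VALUE
```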


Michael Bayer (zzzeek) wrote:

Possible system for #33, not tested with concurrency.


Changes by Michael Bayer (zzzeek):

  • attached file 33.patch


Michael Bayer (zzzeek) wrote:

Try out that patch. I've only tested it in a non-concurrent case. I'd need to think pretty deeply about it to see if it actually maintains "dogpile" behavior; it might not, because it does the get_multi() outside of the mutex. The tricky thing here is that the mutexes have to remain per-key, because you want all kinds of combinations of get(x, y, z, ...) to coordinate on keys.


Michael Bayer (zzzeek) wrote:

silly test:

#!python

from dogpile.cache import make_region
from itertools import count

counter = count()

region = make_region().configure(
    "dogpile.cache.memory"
)

def creator(keys):
    print("creator:", keys)
    return [k + next(counter) for k in keys]

values = region.get_or_create_multi([2, 5, 6, 7, 8], creator)
print(values)

region.delete(5)
region.delete(8)

values = region.get_or_create_multi([2, 5, 6, 7, 8], creator)
print(values)


lxyu (lxyu) wrote:

Thanks, I'll test the performance tomorrow.

IMHO, we use get_multi mainly for its performance (1 request instead of n requests), so performance is the first thing to consider, even at the cost of slightly less accuracy.


Michael Bayer (zzzeek) wrote:

I think it's a great feature, but with the mutexing etc. I'm worried about deadlocks. This stuff deadlocks easily when it's not correct.


Michael Bayer (zzzeek) wrote:

In fact we might need to sort the keys to prevent this.

Consider one call against (3, 4) and another against (4, 3). If get_multi calls go out against both simultaneously, you can deadlock: A waiting for 4, B waiting for 3. I need to look more closely, maybe produce a test, to see if that can happen here.
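The sorted-key fix can be demonstrated with plain threading.Lock objects (a toy sketch, not the patch's code): because both callers acquire locks in the same global order, neither can hold one lock while waiting on the other's.

```python
import threading

locks = {3: threading.Lock(), 4: threading.Lock()}
log = []


def worker(keys):
    # Sorting gives every caller the same global acquisition order, so
    # (3, 4) and (4, 3) can never hold-and-wait on each other.
    for k in sorted(keys):
        locks[k].acquire()
    try:
        log.append(tuple(keys))
    finally:
        for k in sorted(keys):
            locks[k].release()


t1 = threading.Thread(target=worker, args=((3, 4),))
t2 = threading.Thread(target=worker, args=((4, 3),))
t1.start()
t2.start()
t1.join(timeout=5)
t2.join(timeout=5)
assert not t1.is_alive() and not t2.is_alive()   # both finished: no deadlock
```

Without the `sorted()` calls, the two threads could each acquire their first lock and then block forever on the other's.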


lxyu (lxyu) wrote:

Just tested the code, modified a little; you can check it here: https://gist.github.com/lxyu/5712915

Changes:

  1. get_multi returns a dict, so I think get_or_create_multi should keep the same behavior.
  2. Added should_cache_fn support.
  3. Check keys_to_get before calling the creator.
  4. Added key sorting.

And another consideration: the creator function differs from get_or_create's too, as it accepts the keys as args.

So it becomes a little complex to add a namespace to the keys; see the following code for example.

#!python

# Shared should_cache_fn: don't cache misses that came back as None.
_dont_cache_none = lambda value: value is not None


def get(order_id):

    def _creator():
        return DBSession().query(Order).get(order_id)

    _get_key = lambda x: "cache:order:{}".format(x)
    return region.get_or_create(
        key=_get_key(order_id),
        creator=_creator,
        should_cache_fn=_dont_cache_none)


def mget(order_ids):

    def _creator(order_keys):
        ids = [_extract_key(k) for k in order_keys]
        orders = DBSession().query(Order).filter(Order.id.in_(ids)).all()
        return {_get_key(order.id): order for order in orders}

    _get_key = lambda x: "cache:order:{}".format(x)
    _extract_key = lambda x: int(x[x.rfind(':') + 1:])

    keys = [_get_key(i) for i in order_ids]
    orders = region.get_or_create_multi(
        keys=keys,
        creator=_creator,
        should_cache_fn=_dont_cache_none)

    return {i: orders[_get_key(i)] for i in order_ids}

Any idea on how to refine this situation? And I'm also wondering if this feature could be written as a multi decorator.


Michael Bayer (zzzeek) wrote:

I've created the function decorator method, addressed the namespace issue, and produced some real tests in b323546. The method still needed a lot of changes in order to function fully. This will now be 0.5.0. Let me know whether this implementation addresses your issues, and reopen if you have problems. Thanks.
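The decorator that landed in 0.5.0 is dogpile.cache's cache_multi_on_arguments; a simplified stdlib sketch of the pattern it implements (the `cache_multi` helper below is hypothetical, with deliberately reduced semantics: one positional argument per key, and the decorated function must return values in the same order it received the missing keys):

```python
cache = {}


def cache_multi(namespace):
    # Toy multi-key caching decorator: namespaces each argument into a
    # cache key, calls the wrapped function only for the missing keys,
    # and returns values for all requested keys in argument order.
    def decorate(fn):
        def wrapper(*ids):
            keys = ["%s:%s" % (namespace, i) for i in ids]
            missing = [i for i, k in zip(ids, keys) if k not in cache]
            if missing:
                for i, value in zip(missing, fn(*missing)):
                    cache["%s:%s" % (namespace, i)] = value
            return [cache[k] for k in keys]
        return wrapper
    return decorate


@cache_multi("order")
def mget_order(*order_ids):
    # Stands in for one IN-clause query; returns values in id order.
    return ["order-%d" % i for i in order_ids]


mget_order(1, 2, 3)   # generates and caches all three
mget_order(2, 3)      # served entirely from cache
```

Note how the namespace bookkeeping from the previous comment disappears: the decorator owns the key-building, so callers deal only in plain ids.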


Changes by Michael Bayer (zzzeek):

  • changed status to closed


lxyu (lxyu) wrote:

Yes, this resolved my problems.

I actually implemented a decorator of my own and was just thinking of a pull request, then found out that your implementation is better than mine. Hah!

Thank you very much!
