About get_or_create_multi feature #33

Closed
sqlalchemy-bot opened this issue May 31, 2013 · 16 comments


Migrated issue, originally created by lxyu (lxyu)

I need a feature to get/create multiple values in one call.

Take this code for example:

#!python

def get_order(order_id):
    return DBSession().query(Order).get(order_id)


def mget_order(order_ids):
    orders = DBSession().query(Order).filter(Order.id.in_(order_ids))
    return {order.id: order for order in orders}

The cache_on_arguments decorator works great for the first get_order function. But for the latter one, if I mget [1, 2, 3] and then [1, 2], they'll generate 2 separate cached values, and if a single element among the ids expires, the whole cached result expires.

So I want to let the latter one make use of the first function's cache. Then mget [1, 2, 3, 4, 5] will directly use the cache generated by 'get', and if, say, [2, 3] expired, we only need to regenerate [2, 3]. It'll be much faster and more efficient as the ids list grows larger.
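For illustration, the behavior being asked for can be sketched with a plain dict standing in for the cache region (all names here are hypothetical, not dogpile.cache API): each id is cached under its own key, and only the missing ids are passed to the creator.

```python
# Minimal sketch of per-key multi-get caching: only keys missing from
# the cache go to the creator, and every key is cached individually so
# any combination of ids can reuse previously cached entries.
cache = {}
creator_calls = []


def creator(missing_ids):
    # Stands in for one DB query: SELECT ... WHERE id IN (missing_ids)
    creator_calls.append(list(missing_ids))
    return {i: "order-%d" % i for i in missing_ids}


def mget(order_ids):
    missing = [i for i in order_ids if i not in cache]
    if missing:
        cache.update(creator(missing))
    return {i: cache[i] for i in order_ids}


mget([1, 2, 3])   # creator sees [1, 2, 3]
mget([1, 2])      # fully served from cache; creator not called
del cache[2]      # simulate key 2 expiring
mget([1, 2, 3])   # creator sees only [2]
```

After the run above, the creator has been invoked twice: once for the cold cache and once for the single expired key.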

While going through the source code, I saw that cache_on_arguments uses a Lock from dogpile.core with this code, which doesn't seem well suited to the multi-key case in my situation.

#!python
with Lock(
        self._mutex(key),
        gen_value,
        get_value,
        expiration_time,
        async_creator) as value:
    return value

So do you have any suggestions on how to accomplish this goal?


Attachments: 33.patch


lxyu (lxyu) wrote:

And it's a little confusing to me that some code checks the value version while some does not:

#!python
if value is NO_VALUE or \
    value.metadata['v'] != value_version or \
        (self._invalidated and
        value.metadata["ct"] < self._invalidated):

This checks the version while get_multi does not.


Michael Bayer (zzzeek) wrote:

It might be possible, and as far as the lock goes, I think you'd use the async_creation_runner feature. This delivers the mutex held by Lock to the actual creator function. The expiration times and dogpile locks are per-key, so the function here would have a list of locks which it would need to close out once it has retrieved all values.
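A rough stdlib sketch of that per-key locking idea, with threading.Lock standing in for dogpile's distributed mutex (names and structure are hypothetical, not the patch's actual implementation):

```python
import threading

cache = {}
key_mutexes = {}                 # one mutex per cache key, as in dogpile
registry_lock = threading.Lock()


def _mutex(key):
    # Lazily create one lock per key.
    with registry_lock:
        return key_mutexes.setdefault(key, threading.Lock())


def get_or_create_multi(keys, creator):
    missing = [k for k in keys if k not in cache]
    # Collect every per-key lock for the keys we must generate, and
    # release them all only once the creator has produced all values.
    held = [_mutex(k) for k in missing]
    for m in held:
        m.acquire()
    try:
        # Re-check under the locks: another thread may have filled some.
        still_missing = [k for k in missing if k not in cache]
        if still_missing:
            cache.update(zip(still_missing, creator(still_missing)))
    finally:
        for m in held:
            m.release()
    return [cache[k] for k in keys]
```

For example, `get_or_create_multi([2, 5, 6], lambda ks: [k * 10 for k in ks])` generates all three values in one creator call; a second identical call is served entirely from the cache.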


Michael Bayer (zzzeek) wrote:

get_multi() checks the expiration time using _unexpired_value_fn().


lxyu (lxyu) wrote:

Yes, I know, but get_multi doesn't check the version like value.metadata['v'] != value_version does.


Michael Bayer (zzzeek) wrote:

Oh, that. That's a separate issue; it's just a check in case we upgrade how dogpile.cache stores values in the cache, at which point we'd bump "value_version" to 2. So sure, we'd want to add the value_version check everywhere it's missing now, but that only matters if and when we bump "value_version". That logic has no effect right now.
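The idea behind that check can be sketched in a few lines (this mirrors the 'v' and 'ct' metadata keys quoted above, but the helpers here are hypothetical, not dogpile internals): a reader that expects a newer payload format treats entries stored under an older format as cache misses instead of misreading them.

```python
import time

NO_VALUE = object()
value_version = 1            # would be bumped to 2 on a format change


def stored(payload):
    # Store the payload alongside its format version and creation time.
    return (payload, {"v": value_version, "ct": time.time()})


def unpack(entry):
    if entry is NO_VALUE:
        return NO_VALUE
    payload, metadata = entry
    if metadata["v"] != value_version:
        return NO_VALUE      # written under an older format: treat as miss
    return payload


assert unpack(stored("some-order")) == "some-order"
# An entry written under a hypothetical older format version is a miss:
assert unpack(("some-order", {"v": 0, "ct": time.time()})) is NO_VALUE
```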


Michael Bayer (zzzeek) wrote:

Possible system for #33, not tested with concurrency.


Changes by Michael Bayer (zzzeek):

  • attached file 33.patch


Michael Bayer (zzzeek) wrote:

Try out that patch. I've only tested it in a non-concurrent case. I'd need to think pretty deeply about it to see if it actually maintains "dogpile" behavior; it might not, because it does the get_multi() outside of the mutex. The tricky thing here is that the mutexes have to remain per-key, because you want all kinds of combinations of get(x, y, z, ...) to coordinate on keys.


Michael Bayer (zzzeek) wrote:

silly test:

#!python

from dogpile.cache import make_region
from itertools import count

counter = count()

region = make_region().configure(
    "dogpile.cache.memory"
)

def creator(keys):
    print("creator:", keys)
    return [k + next(counter) for k in keys]

values = region.get_or_create_multi([2, 5, 6, 7, 8], creator)
print(values)

region.delete(5)
region.delete(8)

values = region.get_or_create_multi([2, 5, 6, 7, 8], creator)
print(values)


lxyu (lxyu) wrote:

Thanks, I'll test the performance tomorrow.

IMHO, we use get_multi mainly for its performance (1 request instead of n requests), so performance is the first thing to consider, even at the cost of slightly less accuracy.


Michael Bayer (zzzeek) wrote:

I think it's a great feature, but with the mutexing etc. I'm worried about deadlocks. This stuff deadlocks easily when it's not correct.


Michael Bayer (zzzeek) wrote:

In fact we might need to sort the keys to prevent this.

Consider one call against (3, 4) and another against (4, 3). If get_multi calls go out against both simultaneously, you can deadlock: A waiting for 4, B waiting for 3. I need to look more closely, maybe produce a test, to see if that can happen here.
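The sorted-key fix can be demonstrated with plain threading.Lock objects (a toy sketch, not the patch's code): because both callers acquire locks in the same global order, neither can hold one lock while waiting on the other's.

```python
import threading

locks = {3: threading.Lock(), 4: threading.Lock()}
log = []


def worker(keys):
    # Sorting gives every caller the same global acquisition order, so
    # (3, 4) and (4, 3) can never hold-and-wait on each other.
    for k in sorted(keys):
        locks[k].acquire()
    try:
        log.append(tuple(keys))
    finally:
        for k in sorted(keys):
            locks[k].release()


t1 = threading.Thread(target=worker, args=((3, 4),))
t2 = threading.Thread(target=worker, args=((4, 3),))
t1.start()
t2.start()
t1.join(timeout=5)
t2.join(timeout=5)
assert not t1.is_alive() and not t2.is_alive()   # both finished: no deadlock
```

Without the `sorted()` calls, the two threads could each acquire their first lock and then block forever on the other's.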


lxyu (lxyu) wrote:

Just tested the code, modified a little; you can check it here: https://gist.github.com/lxyu/5712915

Changes:

  1. get_multi returns a dict, so I think get_or_create_multi should keep the same behavior.
  2. Added should_cache_fn support.
  3. Check keys_to_get before calling the creator.
  4. Added key sorting.

And another consideration: the creator function differs from get_or_create's too, as it accepts the keys as args.

So it becomes a little complex to add a namespace to the keys; see the following code for example.

#!python

# Shared should_cache_fn: don't cache misses that came back as None.
_dont_cache_none = lambda value: value is not None


def get(order_id):

    def _creator():
        return DBSession().query(Order).get(order_id)

    _get_key = lambda x: "cache:order:{}".format(x)
    return region.get_or_create(
        key=_get_key(order_id),
        creator=_creator,
        should_cache_fn=_dont_cache_none)


def mget(order_ids):

    def _creator(order_keys):
        ids = [_extract_key(k) for k in order_keys]
        orders = DBSession().query(Order).filter(Order.id.in_(ids)).all()
        return {_get_key(order.id): order for order in orders}

    _get_key = lambda x: "cache:order:{}".format(x)
    _extract_key = lambda x: int(x[x.rfind(':') + 1:])

    keys = [_get_key(i) for i in order_ids]
    orders = region.get_or_create_multi(
        keys=keys,
        creator=_creator,
        should_cache_fn=_dont_cache_none)

    return {i: orders[_get_key(i)] for i in order_ids}

Any idea on how to refine this situation? And I'm also wondering if this feature could be written as a multi decorator.


Michael Bayer (zzzeek) wrote:

I've created the function decorator method, addressed the namespace issue, and produced some real tests in b323546. The method still needed a lot of changes in order to function fully. This will now be 0.5.0. Let me know whether this implementation addresses your issues, and reopen if you have problems. Thanks.
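The decorator that landed in 0.5.0 is dogpile.cache's cache_multi_on_arguments; a simplified stdlib sketch of the pattern it implements (the `cache_multi` helper below is hypothetical, with deliberately reduced semantics: one positional argument per key, and the decorated function must return values in the same order it received the missing keys):

```python
cache = {}


def cache_multi(namespace):
    # Toy multi-key caching decorator: namespaces each argument into a
    # cache key, calls the wrapped function only for the missing keys,
    # and returns values for all requested keys in argument order.
    def decorate(fn):
        def wrapper(*ids):
            keys = ["%s:%s" % (namespace, i) for i in ids]
            missing = [i for i, k in zip(ids, keys) if k not in cache]
            if missing:
                for i, value in zip(missing, fn(*missing)):
                    cache["%s:%s" % (namespace, i)] = value
            return [cache[k] for k in keys]
        return wrapper
    return decorate


@cache_multi("order")
def mget_order(*order_ids):
    # Stands in for one IN-clause query; returns values in id order.
    return ["order-%d" % i for i in order_ids]


mget_order(1, 2, 3)   # generates and caches all three
mget_order(2, 3)      # served entirely from cache
```

Note how the namespace bookkeeping from the previous comment disappears: the decorator owns the key-building, so callers deal only in plain ids.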


Changes by Michael Bayer (zzzeek):

  • changed status to closed


lxyu (lxyu) wrote:

Yes, this resolved my problems.

I actually implemented a decorator of my own and was just thinking of a pull request, then found out that your implementation is better than mine. Hah!

Thank you very much!
