Async python support: aiopynamodb #802

Open
kamadorueda opened this issue Jun 21, 2020 · 12 comments

Comments

@kamadorueda

https://github.com/aio-libs/aiobotocore
https://github.com/terrycain/aioboto3

@garrettheel
Member

The discussion here is relevant: #525 (comment)

I'd like to support asyncio natively in the library, but I'm still a little hesitant to adopt aiobotocore right now, as it's not maintained by AWS. We don't rely on all that much of botocore right now, so one option would be to drop that altogether and provide a separate async interface.

@dwatkinsweb

Any idea when this might happen? We could really use this feature right now. I've been attempting to do this myself but I've been having to duplicate a lot of your code for a few small changes.

@kamadorueda
Author

There is another approach that is used by many libraries out there (keep reading for examples):

When a library exposes a high-latency function, for instance:

for item in TestModel.view_index.query(1):
    print("Item queried from index: {0}".format(item))

One can wrap the calls in a sub-thread via loop.run_in_executor.
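
A minimal sketch of that wrapping using only the standard library. `slow_query` is a hypothetical stand-in for a blocking PynamoDB call, not part of the library:

```python
import asyncio
import functools
import time


def slow_query(hash_key, limit=10):
    """Hypothetical stand-in for a blocking, high-latency call."""
    time.sleep(0.05)  # simulates network latency
    return [f"item-{hash_key}-{i}" for i in range(limit)]


async def main():
    loop = asyncio.get_running_loop()
    # run_in_executor only forwards positional arguments, so keyword
    # arguments have to be bound with functools.partial first.
    call = functools.partial(slow_query, 1, limit=3)
    # Passing None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, call)


items = asyncio.run(main())
print(items)
```

The `functools.partial` step and the explicit loop lookup are the boilerplate that helper libraries hide.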

Since that's a little verbose, there are nice libraries that make it human-friendly, for example aioextensions.

So the syntax would be something like:

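from aioextensions import in_thread

# query() returns a lazy iterator, so consume it inside the thread;
# otherwise the page fetches would still block the event loop.
items = await in_thread(lambda: list(TestModel.view_index.query(1)))
for item in items:
    print("Item queried from index: {0}".format(item))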

Which would run the high-latency thing in a sub-thread that allows for concurrency.
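
One caveat applies when the wrapped function returns a lazy iterator, which, as far as I can tell, is the case for PynamoDB's query (pages are fetched as the result is consumed). The sketch below uses a hypothetical generator, `lazy_query`, to show where the work actually runs; the names are illustrative only:

```python
import asyncio
import threading


def lazy_query(n):
    """Hypothetical stand-in for a call that returns a lazy iterator."""
    for i in range(n):
        # Imagine a blocking page fetch here; record the thread that runs it.
        yield i, threading.get_ident()


async def main():
    loop = asyncio.get_running_loop()
    loop_thread = threading.get_ident()

    # Only the generator object is created in the worker thread; its body
    # runs later, on the event loop thread, when it is iterated here.
    lazy = await loop.run_in_executor(None, lazy_query, 3)
    on_loop_thread = [tid == loop_thread for _, tid in lazy]

    # Consuming the iterator inside the worker keeps the blocking work
    # off the event loop thread.
    eager = await loop.run_in_executor(None, lambda: list(lazy_query(3)))
    on_worker_thread = [tid != loop_thread for _, tid in eager]
    return on_loop_thread, on_worker_thread


on_loop_thread, on_worker_thread = asyncio.run(main())
print(on_loop_thread, on_worker_thread)
```

Materializing with `list(...)` trades streaming for concurrency, so it only suits result sets that fit in memory.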

It's a very minimalistic interface and requires no work from pynamodb since it's on the consumer side to do the wrapping:

from aioextensions import in_thread, collect

# Equivalent to pynamodb_func(arg_1, arg_2, kwarg_a=3, kwarg_b=4)
one_query = await in_thread(pynamodb_func, arg_1, arg_2, kwarg_a=3, kwarg_b=4)

# The same call for many inputs, with all queries running concurrently (overlapping in time)
many_queries = await collect([
    in_thread(pynamodb_func, arg_1, arg_2, kwarg_a=kwarg_a, kwarg_b=kwarg_b)
    for arg_1, arg_2, kwarg_a, kwarg_b in things_to_fetch  # the long list of things to fetch
])

Another alternative is to provide async versions of the functions, which could use the wrappers mentioned above internally and hide them from the end user:

def pynamodb_func(arg_1, arg_2, kwarg_a=3, kwarg_b=4) -> Data:
    ...

async def async_pynamodb_func(arg_1, arg_2, kwarg_a=3, kwarg_b=4) -> Data:
    return await in_thread(pynamodb_func, arg_1, arg_2, kwarg_a=kwarg_a, kwarg_b=kwarg_b)
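
To avoid hand-writing one wrapper per function, the async variants could be generated by a small decorator. This is only a sketch using the standard library; `make_async` is a hypothetical helper, and `pynamodb_func` here is a placeholder rather than a real PynamoDB function:

```python
import asyncio
import functools


def make_async(func):
    """Return an awaitable version of a blocking function (hypothetical helper)."""

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        loop = asyncio.get_running_loop()
        # Bind all arguments up front; run_in_executor takes no kwargs.
        call = functools.partial(func, *args, **kwargs)
        return await loop.run_in_executor(None, call)

    return wrapper


def pynamodb_func(arg_1, arg_2, kwarg_a=3, kwarg_b=4):
    """Placeholder for a blocking library call."""
    return (arg_1, arg_2, kwarg_a, kwarg_b)


async_pynamodb_func = make_async(pynamodb_func)

result = asyncio.run(async_pynamodb_func(1, 2, kwarg_b=5))
print(result)
```

Because of `functools.wraps`, the generated coroutine function keeps the name and docstring of the sync original.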

aioextensions also offers some nice helpers that we could find useful, like workers, batching and rate limits.

I think I'm volunteering to implement the async wrappers if you think it's a nice approach, you tell me! @garrettheel

These are examples of the mentioned sub-thread wrapping:

I've personally used it in production and the benefits from concurrency are worth the small overhead it adds to every call

It's common to use a_, async_ or _async notation when both flavors are offered by a library

@garrettheel
Member

garrettheel commented Sep 28, 2020

loop.run_in_executor is an interesting approach, but I have tried it before and seen performance issues in high-throughput applications. Introducing threads also introduces new and interesting failure modes that didn't exist before. I'd be concerned about going down that path, especially since the vast majority of users would still use the sync interface and pay that tax.

I've been experimenting with a different approach in #853, which could be characterized as a hackier version of the above suggestion (to the benefit of not requiring threads).

@brunobelloni

This can also be done with asyncio alone, and the calling code will already be prepared for an eventual real async PynamoDB.
Works on Python 3.9+, where asyncio.to_thread was added.

asyncio.to_thread uses ThreadPoolExecutor under the hood

import asyncio


async def main():
    # Equivalent to pynamodb_func(arg_1, arg_2, kwarg_a=3, kwarg_b=4)
    one_query = await asyncio.to_thread(pynamodb_func, arg_1, arg_2, kwarg_a=3, kwarg_b=4)

    # The same call for many inputs, with all queries running concurrently (overlapping in time).
    # gather takes awaitables as separate positional arguments, hence the * unpacking.
    many_queries = await asyncio.gather(*[
        asyncio.to_thread(pynamodb_func, arg_1, arg_2, kwarg_a=kwarg_a, kwarg_b=kwarg_b)
        for arg_1, arg_2, kwarg_a, kwarg_b in things_to_fetch  # the long list of things to fetch
    ])


if __name__ == '__main__':
    asyncio.run(main())

@aaronclong

aaronclong commented Feb 7, 2023

Would it be possible to create a separate async module in this library and create a similar but async api for people to use?

There are a few third-party async DynamoDB/boto3 libraries available for use. They could be used until Amazon finally updates boto3 to support asyncio (😔 cries from botocore maintainer).

I think this approach has a lot of benefits. PynamoDB will have a working async module when boto3 supports it, and if designed correctly, could be swapped out with these third party libs dynamically. Would the maintainer be okay with that?
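
A rough sketch of what "swappable" could look like: the async module programs against a small backend interface, and the concrete client is injected. Every name here (`AsyncBackend`, `InMemoryBackend`, `AsyncTable`) is hypothetical and exists in neither PynamoDB nor boto3; the in-memory backend only stands in for a third-party async client:

```python
import asyncio
from typing import Any, Dict, Optional, Protocol


class AsyncBackend(Protocol):
    """Hypothetical interface the async module would depend on."""

    async def put_item(self, table: str, key: str, item: Dict[str, Any]) -> None: ...

    async def get_item(self, table: str, key: str) -> Optional[Dict[str, Any]]: ...


class InMemoryBackend:
    """Toy backend; a real one might wrap a third-party async client."""

    def __init__(self) -> None:
        self._tables: Dict[str, Dict[str, Dict[str, Any]]] = {}

    async def put_item(self, table: str, key: str, item: Dict[str, Any]) -> None:
        self._tables.setdefault(table, {})[key] = dict(item)

    async def get_item(self, table: str, key: str) -> Optional[Dict[str, Any]]:
        return self._tables.get(table, {}).get(key)


class AsyncTable:
    """Async facade that never imports a concrete client directly."""

    def __init__(self, name: str, backend: AsyncBackend) -> None:
        self._name = name
        self._backend = backend

    async def save(self, key: str, item: Dict[str, Any]) -> None:
        await self._backend.put_item(self._name, key, item)

    async def get(self, key: str) -> Optional[Dict[str, Any]]:
        return await self._backend.get_item(self._name, key)


async def main():
    table = AsyncTable("users", InMemoryBackend())
    await table.save("u1", {"name": "Ada"})
    return await table.get("u1"), await table.get("missing")


found, missing = asyncio.run(main())
print(found, missing)
```

Swapping in a different client would then mean writing another class with the same two methods, without touching `AsyncTable`.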

@aaronclong

@tasn I notice you tried to do this with threading: #968

@abend-arg

abend-arg commented Feb 9, 2023

I am working on a project that will benefit from adding async support to this package. We will implement our solution by basically wrapping everything you have using Gevent. Why Gevent? Because you do not need to worry about async/await syntax, and you do not need to rewrite everything as async methods.

We will probably implement this before June, so as soon as I get some results from it, I will come back with a PR implementing it.

In the meantime, I would really appreciate some feedback. For more context, Gevent is great, but its support for Windows, for example, is limited:

http://www.gevent.org/install.html#supported-platforms

It will probably also narrow the set of Python versions that your library supports.

@aaronclong

@AbendGithub I think long-term async/await is the future of python, though. Gevent isn't native or widely used by most python programmers.

@ikonst
Contributor

ikonst commented Feb 11, 2023

We use pynamodb with gevent pretty much everywhere at Lyft without any modifications to this library (with standard gevent monkey-patching).

There's been a lot of community interest in adding an asyncio layer to this library over the years. It's not entirely trivial and will probably result in lots of duplication (seen this in redis-py) which is probably why we haven't yet.

I'd also see it as a negative testimony to the asyncio approach (aka blue/green functions), but this train left the station and most of us are invested into one of those two approaches, so I can definitely see the value in an asyncio layer.

@aaronclong

Yeah, I know the blue/green function debate is quite polarizing. However, as you said, the language is natively adopting the one approach. Eventually, I feel like even boto3 will be forced to adopt asyncio.

@dbfreem

dbfreem commented Sep 11, 2024

Hey, just curious if this ever gained traction. I find asyncio to be one of the easiest ways to improve I/O-bound apps.


8 participants