
Add an async variant of lru_cache for coroutines. #90780

Closed
uranusjr mannequin opened this issue Feb 3, 2022 · 23 comments
Labels
3.11 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@uranusjr (mannequin) commented Feb 3, 2022

BPO 46622
Nosy @rhettinger, @asvetlov, @serhiy-storchaka, @1st1, @uranusjr, @achimnol
PRs
  • bpo-46622: Async support for cached_property #31314
  • bpo-46622: Async support for lru_cache (and cache) #31495
    Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2022-02-03.07:28:28.631>
    labels = ['type-feature', 'library', '3.11']
    title = 'Add an async variant of lru_cache for coroutines.'
    updated_at = <Date 2022-02-24.16:03:50.476>
    user = 'https://github.com/uranusjr'

    bugs.python.org fields:

    activity = <Date 2022-02-24.16:03:50.476>
    actor = 'rhettinger'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2022-02-03.07:28:28.631>
    creator = 'uranusjr'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 46622
    keywords = ['patch']
    message_count = 16.0
    messages = ['412422', '412906', '412985', '412987', '412988', '413187', '413222', '413223', '413225', '413226', '413764', '413790', '413791', '413816', '413819', '413923']
    nosy_count = 6.0
    nosy_names = ['rhettinger', 'asvetlov', 'serhiy.storchaka', 'yselivanov', 'uranusjr', 'achimnol']
    pr_nums = ['31314', '31495']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue46622'
    versions = ['Python 3.11']

    @uranusjr (mannequin, author) commented Feb 3, 2022

    Currently, decorating a coroutine function with cached_property caches the coroutine object itself. But this is not useful, since a coroutine cannot be awaited more than once.

    Running this code:

        import asyncio
        import functools
    
        class A:
            @functools.cached_property
            async def hello(self):
                return 'yo'
    
        async def main():
            a = A()
            print(await a.hello)
            print(await a.hello)
    
        asyncio.run(main())

    produces:

        yo
        Traceback (most recent call last):
          File "t.py", line 15, in <module>
            asyncio.run(main())
          File "/lib/python3.10/asyncio/runners.py", line 44, in run
            return loop.run_until_complete(main)
          File "/lib/python3.10/asyncio/base_events.py", line 641, in run_until_complete
            return future.result()
          File "t.py", line 12, in main
            print(await a.hello)
        RuntimeError: cannot reuse already awaited coroutine

    The third-party cached_property package, on the other hand, detects a coroutine and caches its result instead. I feel this is a more useful behaviour. pydanny/cached-property#85

    @uranusjr uranusjr mannequin added 3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Feb 3, 2022
    @merwok merwok removed 3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes labels Feb 8, 2022
    @asvetlov (Contributor) commented Feb 9, 2022

    A pull request is welcome!
    I would say that the PR should not use asyncio directly, but rather 'async def', 'await', and inspect.iscoroutine() / inspect.iscoroutinefunction().
    The property should work with any async framework, not only asyncio.
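
    (An illustrative aside, not part of the original comment: inspect.iscoroutinefunction lets a decorator pick an async-aware code path without importing any particular framework.)

    import inspect

    async def fetch():
        return 1

    def compute():
        return 1

    # A framework-neutral check that a caching decorator could use:
    print(inspect.iscoroutinefunction(fetch))    # True
    print(inspect.iscoroutinefunction(compute))  # False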

    @uranusjr (mannequin, author) commented Feb 10, 2022

    > the PR should not use asyncio directly, but rather 'async def', 'await', and inspect.iscoroutine() / inspect.iscoroutinefunction()

    Hmm, this introduces some difficulties. Since a coroutine can only be awaited once, a new coroutine (one that simply returns the cached result when awaited) needs to be returned each time __get__ is called. But this means we need a way to somehow obtain the result inside __get__. If there were a separate cached_property_async, it would be possible to make __get__ a coroutine function, but personally I’d prefer the interface to match the PyPI cached_property.

    @asvetlov (Contributor) commented:

    You can return a wrapper from __get__ that awaits the inner function and saves the result somewhere.

    @serhiy-storchaka (Member) commented:

    Something like:

    _unset = ['unset']
    class CachedAwaitable:
        def __init__(self, awaitable):
            self.awaitable = awaitable
            self.result = _unset
        def __await__(self):
            if self.result is _unset:
                self.result = yield from self.awaitable.__await__()
            return self.result
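
    (Illustrative only, not part of the original comment: a quick demo of how this wrapper behaves, using asyncio just to run it.)

    import asyncio

    async def fetch():
        print('computed once')
        return 'yo'

    async def main():
        cached = CachedAwaitable(fetch())
        print(await cached)  # drives the wrapped coroutine
        print(await cached)  # replays the stored result without re-running it

    asyncio.run(main())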

    @asvetlov (Contributor) commented:

    I have a design question.
    Does print(await a.hello) look awkward?
    I'm not speaking about correctness.

    In asyncio code I have seen before, await val means waiting for a future object, and await func() means an async function call.

    await obj.attr looks more like waiting on a future, but in fact it is not.
    Should we encourage such syntax?
    I have no strong opinion here and would like to hear other devs' thoughts.

    @uranusjr (mannequin, author) commented Feb 14, 2022

    I agree that print(await a.hello) does look awkward, although I know some would disagree. (Context: I submitted this BPO after a colleague of mine at $WORK pointed out the behavioural difference between functools.cached_property and the PyPI cached_property package to me.)

    Personally I’d find this more natural:

    class Foo:
        @functools.cache
        async def go(self):
            print(1)
    
    async def main():
        foo = Foo()
        await foo.go()
        await foo.go()

    Although now I just noticed this actually does not work either.

    Perhaps we should fix this instead and add a line in the documentation under cached_property to point people to the correct path?

    @asvetlov (Contributor) commented:

    Agreed. Let's start with async function support in functools.lru_cache.

    If we reach agreement that cached_property is an important use case, we can return to this issue.

    I have a feeling that an async lru_cache is much more important. https://pypi.org/project/async_lru/ has 0.5 million downloads per month: https://pypistats.org/packages/async-lru

    @serhiy-storchaka (Member) commented:

    Note that there is a similar issue with cached generators.

    >>> from functools import *
    >>> @lru_cache()
    ... def g():
    ...     yield 1
    ... 
    >>> list(g())
    [1]
    >>> list(g())
    []

    I am not sure that it is safe to detect awaitables and iterables in caching decorators and automatically wrap them in re-awaitable and re-iterable objects. But we can add explicit decorators and combine them with arbitrary caching decorators. For example:

    @lru_cache()
    @reiterable
    def g():
        yield 1
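
    (Editorial sketch, not from the thread: one possible shape for such a reiterable wrapper. It buffers generated items so the cached object can be iterated repeatedly.)

    from functools import lru_cache, wraps

    class _Reiterable:
        # Buffers items from the underlying iterator so it can be iterated repeatedly.
        def __init__(self, iterable):
            self._it = iter(iterable)
            self._buffer = []

        def __iter__(self):
            i = 0
            while True:
                if i < len(self._buffer):
                    yield self._buffer[i]      # replay an already-seen item
                else:
                    try:
                        item = next(self._it)
                    except StopIteration:
                        return
                    self._buffer.append(item)
                    yield item
                i += 1

    def reiterable(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            return _Reiterable(func(*args, **kwargs))
        return wrapper

    @lru_cache()
    @reiterable
    def g():
        yield 1

    print(list(g()))  # [1]
    print(list(g()))  # [1] again; the cached object replays its buffer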

    @asvetlov (Contributor) commented:

    From my point of view, both sync and async functions can be cached.

    Sync and async iterators/generators are other beasts: they don't return a value but generate a series of items. The series can be long and memory-consuming, so I doubt it can be cached safely.

    @asvetlov asvetlov changed the title Support decorating a coroutine with functools.cached_property Support decorating a coroutine with functools.lru_cache Feb 22, 2022
    @rhettinger (Contributor) commented:

    If this goes forward, my strong preference is to have a separate async_lru() function just like the referenced external project.

    For non-async uses, overloading the current lru_cache makes it confusing to reason about. It becomes harder to describe clearly what the caches do or to show a pure Python equivalent. People are already challenged to digest the current capabilities and already find it difficult to reason about when references are held. I don't want to add complexity, expand the docs to be more voluminous, cover the new corner cases, and add examples that don't make sense to non-async users (i.e. the vast majority of Python users). Nor do I want to update the recipes for lru_cache variants to all be async aware. So, please keep this separate (it is okay to share some of the underlying implementation, but the APIs should be distinct).

    Also, as a matter of fair and equitable policy, I am concerned about taking the core of a popular external project and putting it in the standard library. That would tend to kill the external project, either stealing all its users or leaving it as something that only offers a few incremental features above those in the standard library. That is profoundly unfair to the people who created, maintained, and promoted the project.

    Various SC members periodically voice a desire to move functionality *out* of the standard library and into PyPI rather than the reverse. If a popular external package is meeting needs, perhaps it should be left alone.

    As noted by the other respondents, caching sync and async iterators/generators is venturing out on thin ice. Even if it could be done reliably (which is dubious), it is likely to be bug bait for users. Remember, we already get people who try to cache time(), random(), or other impure functions. They cache methods and cannot understand why a reference to the instance is held. Assuredly, coroutines and futures will invite even more misunderstandings.

    Also, automatic reiterability is a can of worms and would likely require a PEP. Every time the subject has come up before, we've decided not to go there. Let's not make a tool that makes users' lives worse. Not everything should be cached, and even if we can cache something, it doesn't mean we should.

    @rhettinger rhettinger changed the title Support decorating a coroutine with functools.lru_cache Add a async variant of lru_cache for coroutines. Feb 23, 2022
    @rhettinger rhettinger changed the title Add a async variant of lru_cache for coroutines. Add an async variant of lru_cache for coroutines. Feb 23, 2022
    @asvetlov (Contributor) commented:

    Thanks, Raymond.

    I agree that caching of iterators and generators is out of the issue scope.

    Also, I agree that a separate async cache decorator should be added. I prefer async_lru_cache (and maybe async_cache for API symmetry). We already have contextmanager and asynccontextmanager in contextlib, along with closing / aclosing, ExitStack / AsyncExitStack, etc.

    async_lru_cache should have the same arguments as accepted by lru_cache but work with async functions.

    I think this function should be part of the stdlib because the implementation shares the internal _lru_cache_wrapper that does all the dirty work (and has a C accelerator). A third-party library would have to either copy all these implementation details or import a private function from the stdlib and keep its fingers crossed that the private API stays backward compatible in future Python versions.

    Similar reasons were applied to contextlib async APIs.

    Third parties can have different features (time-to-live, expiration events, etc., etc.) and can be async-framework specific (work with asyncio or trio only) -- I don't care about these extensions here.

    My point is: the stdlib has built-in LRU cache support, and I love it. Let's add exactly what we already have for sync functions, but for async ones.

    @uranusjr (mannequin, author) commented Feb 23, 2022

    Another thing to point out is that the existing third-party solutions (both alru_cache and cached_property) only work with asyncio, so the stdlib version (as implemented now) would be an improvement. And if the position is that improvements should only be submitted to third-party solutions, I have to disagree: both lru_cache and cached_property had third-party solutions predating their stdlib implementations, and it is a double standard IMO if an async solution is kept out while the sync version is kept in.

    @serhiy-storchaka (Member) commented:

    I think that it would be simpler to add a decorator which wraps the result of an asynchronous function into an object which can be awaited more than once:

    def reawaitable(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            return CachedAwaitable(func(*args, **kwargs))
        return wrapper

    It can be combined with lru_cache, cached_property, or any third-party caching decorator. No access to the internals of the cache is needed.

    @lru_cache()
    @reawaitable
    async def coro(...):
        ...
    
    @cached_property
    @reawaitable
    async def prop(self):
        ...

    @serhiy-storchaka (Member) commented:

    async_lru_cache() and async_cached_property() can be written using that decorator. The implementation of async_lru_cache() is complicated because the interface of lru_cache() is complicated, but it is still simpler than using _lru_cache_wrapper().

    def async_lru_cache(maxsize=128, typed=False):
        if callable(maxsize) and isinstance(typed, bool):
            user_function, maxsize = maxsize, 128
            return lru_cache(maxsize, typed)(reawaitable(user_function))
    
        def decorating_function(user_function):
            return lru_cache(maxsize, typed)(reawaitable(user_function))
    
        return decorating_function
    
    def async_cached_property(user_function):
        return cached_property(reawaitable(user_function))
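
    (Illustrative only, assuming the reawaitable / CachedAwaitable helpers from the earlier comments: a minimal demo of the async_lru_cache sketched above.)

    import asyncio

    @async_lru_cache(maxsize=32)
    async def double(x):
        print('computing', x)
        return x * 2

    async def main():
        print(await double(21))  # first call: prints 'computing 21', then 42
        print(await double(21))  # cache hit: prints only 42

    asyncio.run(main())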

    @rhettinger (Contributor) commented:

    [Andrew Svetlov]

    > A third-party library would have to either copy all these
    > implementation details or import a private function from the stdlib

    OrderedDict provides just about everything needed to roll lru_cache variants. It simply isn't true that this can only be done efficiently in the standard library.
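
    (Editorial sketch, not from the thread: a minimal OrderedDict-based LRU cache illustrating the point above; positional arguments only, no typed keys, no thread safety.)

    from collections import OrderedDict
    from functools import wraps

    def tiny_lru(maxsize=128):
        def decorating_function(func):
            cache = OrderedDict()

            @wraps(func)
            def wrapper(*args):
                if args in cache:
                    cache.move_to_end(args)    # mark as most recently used
                    return cache[args]
                result = func(*args)
                cache[args] = result
                if len(cache) > maxsize:
                    cache.popitem(last=False)  # evict the least recently used entry
                return result

            return wrapper
        return decorating_function

    @tiny_lru(maxsize=2)
    def square(x):
        return x * x

    square(2); square(3); square(4)  # adding a third entry evicts square(2)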

    [Serhiy]

    > it would be simpler to add a decorator which wraps the result
    > of an asynchronous function into an object which can be awaited
    > more than once

    This is much more sensible.

    > It can be combined with lru_cache, cached_property, or any third-party
    > caching decorator. No access to the internals of the cache is needed.

    Right. The premise that this can only be done in the standard library was false.

    > async_lru_cache() and async_cached_property() can be written
    > using that decorator.

    The task becomes trivially easy :-)

    [Andrew Svetlov]

    > A pull request is welcome!

    ISTM it was premature to ask for a PR before the idea had been thought through. We risk wasting a user's time or committing too early, before simpler, better-designed alternatives emerge.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @kstrauser commented:
    Just as a data point: as of today, the async_lru package doesn't work with Python 3.10. I stumbled across this issue just now when converting some existing code to be async, and from an end user's POV it initially seems like I can't have both async and lru_cache at the same time.

    (I know that's not the case, and I could just use the fixed async_lru commit for my own code, but that might be a bridge too far for a newer Python user.)

    @rhettinger (Contributor) commented:

    Marking this as closed. If needed, a new issue can be opened for Serhiy's reawaitable decorator, which would be a much cleaner and more universal solution.

    @NewUserHa (Contributor) commented:

    But should the stdlib support lru_cache for async functions or not?

    @wouterdb commented Sep 4, 2024

    For future reference, the solution that can be pieced together from this thread is

    # Async support for @functools.lru_cache
    # From https://github.com/python/cpython/issues/90780
    from functools import wraps, lru_cache
    
    _unset = ['unset']
    class CachedAwaitable:
        def __init__(self, awaitable):
            self.awaitable = awaitable
            self.result = _unset
        def __await__(self):
            if self.result is _unset:
                self.result = yield from self.awaitable.__await__()
            return self.result
    
    def reawaitable(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            return CachedAwaitable(func(*args, **kwargs))
        return wrapper
    
    
    def async_lru_cache(maxsize=128, typed=False):
        if callable(maxsize) and isinstance(typed, bool):
            user_function, maxsize = maxsize, 128
            return lru_cache(maxsize, typed)(reawaitable(user_function))
    
        def decorating_function(user_function):
            return lru_cache(maxsize, typed)(reawaitable(user_function))
    
        return decorating_function

    With a pytest test case:

    async def test_async_lru():
        hit_count = []
        @async_lru_cache
        async def coro(arg: str) -> str:
            hit_count.append(arg)
            return arg
    
        assert "A" == await coro("A")
        assert len(hit_count) == 1
        assert "A" == await coro("A")
        assert len(hit_count) == 1
        assert "B" == await coro("B")
        assert len(hit_count) == 2
    

    @wouterdb commented Sep 5, 2024

    Follow-up: the code above has a bug.

    If I update my test case a bit:

    async def test_async_lru():
        hit_count = []
    
        @async_lru_cache
        async def coro(arg: str) -> str:
            hit_count.append(arg)
            await asyncio.sleep(0.01)
            return arg
    
        async def work(arg: str) -> str:
            return await coro("A")
    
        a_fut_1 = asyncio.create_task(work("A"))
        a_fut_2 = asyncio.create_task(work("A"))
        assert "A" == await a_fut_1
        assert len(hit_count) == 1
        assert "A" == await a_fut_2
        assert len(hit_count) == 1
        assert "A" == await coro("A")
        assert "B" == await coro("B")
        assert len(hit_count) == 2

    I get

    self = <inmanta.util.async_lru.CachedAwaitable object at 0x7fd86921a090>
    
        def __await__(self) -> Generator[Any, None, T]:
            if self.result is _unset:
    >           self.result = yield from self.awaitable.__await__()
    E           RuntimeError: cannot reuse already awaited coroutine
    

    If I patch it up to this

    import asyncio
    from asyncio import Future
    from collections.abc import Awaitable, Generator
    from typing import Any, TypeVar
    
    T = TypeVar("T")
    
    class CachedAwaitable(Awaitable[T]):
        def __init__(self, awaitable: Awaitable[T]) -> None:
            self.awaitable = awaitable
            self.result: Future[T] | None = None
    
        def __await__(self) -> Generator[Any, None, T]:
            if self.result is None:
                # First awaiter: publish a Future immediately so concurrent
                # awaiters wait on it instead of re-awaiting the coroutine.
                fut = asyncio.get_event_loop().create_future()
                self.result = fut
                result = yield from self.awaitable.__await__()
                fut.set_result(result)
            if not self.result.done():
                # Concurrent awaiter: wait for the first one to finish.
                yield from self.result
            return self.result.result()

    But I can only just understand this code, so it would be nice if someone could confirm this is correct?

    @jmehnle
    Copy link

    jmehnle commented Sep 23, 2024

    @wouterdb wrote:

    class CachedAwaitable(Awaitable[T]):
        def __init__(self, awaitable: Awaitable[T]) -> None:
            self.awaitable = awaitable
            self.result: Future[T] | None = None
    
        def __await__(self) -> Generator[Any, None, T]:
            if self.result is None:
                fut = asyncio.get_event_loop().create_future()
                self.result = fut
                result = yield from self.awaitable.__await__()
                fut.set_result(result)
            if not self.result.done():
                yield from self.result
            return self.result.result()

    This implementation is problematic because it can only work with asyncio, not other async frameworks: asyncio.get_event_loop() and asyncio.Future are asyncio-specific APIs.

    @wouterdb commented:
    OK, I don't understand why, but it does confirm that this is way too subtle for me. Thanks!
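
    (Editorial note, not from the thread: if being tied to asyncio is acceptable, a simpler way to get the same concurrency safety is to wrap the coroutine in a Task once, since a Task can be awaited any number of times. The class name below is hypothetical.)

    import asyncio

    class TaskCachedAwaitable:
        def __init__(self, coro):
            self._coro = coro
            self._task = None

        def __await__(self):
            if self._task is None:
                # Schedule the coroutine exactly once; every awaiter then
                # shares the same Task, which caches the result internally.
                self._task = asyncio.ensure_future(self._coro)
            return self._task.__await__()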
