Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document that lru_cache uses hard references #88476

Closed
wouterdb mannequin opened this issue Jun 4, 2021 · 21 comments
Closed

Document that lru_cache uses hard references #88476

wouterdb mannequin opened this issue Jun 4, 2021 · 21 comments
Assignees
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@wouterdb
Copy link
Mannequin

wouterdb mannequin commented Jun 4, 2021

BPO 44310
Nosy @rhettinger, @serhiy-storchaka, @cryvate, @pablogsal, @miss-islington, @nanjekyejoannah, @@wouterdb, @Fidget-Spinner
PRs
  • bpo-44310: added docs to functools to warn for memory leaks #26528
  • bpo-44310: Note that lru_cache keep references to both arguments and results #26715
  • [3.10] bpo-44310: Note that lru_cache keep references to both arguments and results (GH-26715) #26716
  • bpo-44310: Add a FAQ entry for caching method calls #26731
  • [3.10] bpo-44310: Add a FAQ entry for caching method calls (GH-26731) #26777
  • [3.9] bpo-44310: Add a FAQ entry for caching method calls (GH-26731) #26778
  • bpo-44310: Remove dubious suggestion in cache example #26789
  • Superseder
  • bpo-19859: functools.lru_cache keeps objects alive forever
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2021-06-14.05:54:45.410>
    created_at = <Date 2021-06-04.11:45:58.109>
    labels = ['3.11', 'library', '3.9', '3.10', 'performance']
    title = 'Document that lru_cache uses hard references'
    updated_at = <Date 2021-06-20.15:11:01.892>
    user = 'https://github.com/wouterdb'

    bugs.python.org fields:

    activity = <Date 2021-06-20.15:11:01.892>
    actor = 'rhettinger'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2021-06-14.05:54:45.410>
    closer = 'rhettinger'
    components = ['Library (Lib)']
    creation = <Date 2021-06-04.11:45:58.109>
    creator = 'Wouter De Borger2'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 44310
    keywords = ['patch']
    message_count = 21.0
    messages = ['395075', '395087', '395088', '395125', '395144', '395149', '395152', '395157', '395158', '395336', '395774', '395796', '395887', '395912', '396016', '396017', '396018', '396019', '396020', '396021', '396178']
    nosy_count = 9.0
    nosy_names = ['rhettinger', 'python-dev', 'serhiy.storchaka', 'cryvate', 'pablogsal', 'miss-islington', 'nanjekyejoannah', 'Wouter De Borger2', 'kj']
    pr_nums = ['26528', '26715', '26716', '26731', '26777', '26778', '26789']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = '19859'
    type = 'resource usage'
    url = 'https://bugs.python.org/issue44310'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @wouterdb
    Copy link
    Mannequin Author

    wouterdb mannequin commented Jun 4, 2021

    # Problem

    the functools.lru_cache decorator locks all arguments to the function in memory (inclusing self), causing hard to find memory leaks.

    # Expected

    I had assumed that the lru_cache would keep weak-references and that when an object is garbage colected, all its cache entries expire as unreachable. This is not the case.

    # Solutions

    1. I think it is worth at least mentioning this behavior in de documentation.
    2. I also think it would be good if the LRU cache actually uses weak references.

    I will try to make a PR for this.

    @wouterdb wouterdb mannequin added 3.7 (EOL) end of life 3.8 only security fixes 3.10 only security fixes 3.11 only security fixes 3.9 only security fixes stdlib Python modules in the Lib dir performance Performance or resource usage labels Jun 4, 2021
    @pablogsal
    Copy link
    Member

    Using a weak dictionary is not a correct solution as the cache must take string ownership of the arguments and return value to do it's job properly. Moreover, there are many types in Python that don't support weak references so this will be a backwards incompatible change and limiting the cache quite a lot.

    @Fidget-Spinner
    Copy link
    Member

    @wouter
    Hmm, I thought most use cases of lru_cache benefit from strong references for predictable hit rates? I'm not an expert in this area, so I nosied-in someone else who is.

    However, I noticed that the current doc doesn't mention the strong reference behavior anywhere. So I think your suggestion to amend the docs is an improvement, thanks!

    @terryjreedy terryjreedy removed 3.7 (EOL) end of life 3.8 only security fixes labels Jun 4, 2021
    @rhettinger
    Copy link
    Contributor

    Also note that many important objects in Python are not weak referenceable, tuples for example.

    @rhettinger rhettinger changed the title lru_cache memory leak Document that lru_cache uses hard references Jun 4, 2021
    @rhettinger rhettinger changed the title lru_cache memory leak Document that lru_cache uses hard references Jun 4, 2021
    @rhettinger
    Copy link
    Contributor

    I'm thinking of a more minimal and targeted edit than what is in the PR.

    Per the dev guide, we usually word the docs in an affirmative and specific manner (here is what the tool does and an example of how to use it). Recounting a specific debugging case or misassumption usually isn't worthwhile unless it is a common misconception.

    For strong versus weak references, we've had no previous reports even though the lru_cache() has been around for a long time. Likely, that is because the standard library uses strong references everywhere unless specifically documented to the contrary. Otherwise, we would have to add a strong reference note to everything stateful object in the language.

    Another reason that it likely hasn't mattered to other users is that an lru cache automatically purges old entries. If an object is not longer used, it cycles out as new items are added to the cache. Arguably, a key feature of an LRU algorithm is that you don't have to think about the lifetime of objects.

    I'll think it a for a while and will propose an alternate edit that focuses on how the cache works with methods. The essential point is that the instance is included in the cache key (which is usually what people want). Discussing weak vs strong references is likely just a distractor.

    @pablogsal
    Copy link
    Member

    Agreed! I will let the PR to you :)

    @rhettinger
    Copy link
    Contributor

    It may useful to link back to @cached_property() for folks wanting method caching tied to the lifespan of an instance rather than actual LRU logic.

    @serhiy-storchaka
    Copy link
    Member

    This is a full duplicate of bpo-19859. Both ideas of using weak references and changing documentation were rejected.

    @nanjekyejoannah
    Copy link
    Member

    I saw the thread but the idea was rejected by @rhettinger who seems to suggest the changes in the documentation this time himself.

    Maybe he has changed his mind, in which case he can explain the circumstances of his decisions if he wants.

    @Cryvate
    Copy link
    Mannequin

    Cryvate mannequin commented Jun 8, 2021

    Reading this bug thread last week made me realize we had made the following error in our code:

    class SomethingView():
        @functools.lru_cache()
        def get_object(self):
            return self._object

    Now, as this class was instantiated for every (particular kind of) request to a webserver and this method called (a few times), the lru_cache just kept filling up and up. We had been having a memory leak we couldn't track down, and this was it.

    I think this is an easy mistake to make and it was rooted, not so much in hard references though (without that though, it would have not leaked memory) but because of the fact the cache lives on the class and not the object.

    @rhettinger
    Copy link
    Contributor

    New changeset fafcfff by Raymond Hettinger in branch 'main':
    bpo-44310: Note that lru_cache keep references to both arguments and results (GH-26715)
    fafcfff

    @rhettinger
    Copy link
    Contributor

    New changeset 809c3fa by Miss Islington (bot) in branch '3.10':
    bpo-44310: Note that lru_cache keep references to both arguments and results (GH-26715) (GH-26716)
    809c3fa

    @rhettinger
    Copy link
    Contributor

    See PR 26731 for a draft FAQ entry. Let me know what you think.

    @Cryvate
    Copy link
    Mannequin

    Cryvate mannequin commented Jun 16, 2021

    PR 26731 looks very good to me. My only comment, which I am not sure is worthy of adding/is a general lru_cache thing, that "instances
    are kept alive until they age out of the cache or until the cache is
    cleared", if you are creating instances and calling this method all the time, will lead to an infinite memory leak.

    Not sure whether that's too specific to the problem we encountered and we are all consenting adults and should infer this or it is helpful: leave it up to your/other people's judgement.

    P.S. In the programming.rst there is also the "Why are default values shared between objects?" section which actually uses default values to make its own poor version of a cache. It should probably at least mention lru_cache could be used (unless you particularly need callers to be able to pass their own cache).

    @rhettinger
    Copy link
    Contributor

    if you are creating instances and calling this method
    all the time, will lead to an infinite memory leak.

    Your words aren't making any sense to me. The default
    lru_cache will never hold more than maxsize entries.
    The default maxsize is 128. How is that "infinite"?

    @Cryvate
    Copy link
    Mannequin

    Cryvate mannequin commented Jun 17, 2021

    I clearly was missing some words there Raymond. I meant, if one has set maxsize=None.

    @Cryvate
    Copy link
    Mannequin

    Cryvate mannequin commented Jun 17, 2021

    (but consenting adults, setting max_size=None for "efficiency", you better be sure what you are doing in a long running process and making sure it cannot grow unbounded.)

    @rhettinger
    Copy link
    Contributor

    I meant, if one has set maxsize=None.

    The docs already say, "If maxsize is set to None, the LRU feature is disabled and the cache can grow without bound."

    @rhettinger
    Copy link
    Contributor

    New changeset 7f01f77 by Raymond Hettinger in branch 'main':
    bpo-44310: Add a FAQ entry for caching method calls (GH-26731)
    7f01f77

    @rhettinger
    Copy link
    Contributor

    New changeset 77eaf14 by Miss Islington (bot) in branch '3.10':
    bpo-44310: Add a FAQ entry for caching method calls (GH-26731) (GH-26777)
    77eaf14

    @rhettinger
    Copy link
    Contributor

    Adding a weak referencing recipe here just so I can find it in the future.

    --------------------------------------------------------------------------

    import functools
    import weakref
    
    def weak_lru(maxsize=128, typed=False):
        """LRU Cache decorator that keeps a weak reference to "self".
    Only provides benefit if the instances are so large that
    it is impractical to wait for them to age out of the cache.
    
    When the instance is freed, the cache entry still remains
    but will be unreachable.
    
    If new instances will be created that are equal to the ones
    retired by the weak reference, we lose all the benefits of
    having cached the previous call.  
    
    If the class defines __slots__, be sure to add '__weakref__'
    to make the instances weak referenceable.
    
    """
    
        def decorator(func):
    
            ref = weakref.ref
    
            @functools.lru_cache(maxsize, typed)
            def _func(_self, /, *args, **kwargs):
                return func(_self(), *args, **kwargs)
    
            @functools.wraps(func)
            def wrapper(self, /, *args, **kwargs):
                return _func(ref(self), *args, **kwargs)
    
            return wrapper
    
        return decorator

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes 3.10 only security fixes 3.11 only security fixes performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants