Document behavior of functools.lru_cache with concurrent access #141831

@narmstrong-tl

Description

Documentation

In #93179, the lru_cache documentation was updated to state that the wrapped function can be called from multiple threads. However, it does not make clear how return values are handled when multiple concurrent calls use the same arguments. It should state explicitly that, without external synchronization, concurrent calls with the same cache key can each invoke the wrapped function and return distinct instances.

The existing docs are just:

It is possible for the wrapped function to be called more than once if another thread makes an additional call before the initial call has been completed and cached.

The docs should be updated to specify something like:

In particular, if the function is called more than once, the return values from those calls may be distinct objects.

The docs could also explicitly call out that lru_cache should not be relied on for singleton instantiation.
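For readers who actually need one shared instance, a lock-guarded accessor is a safer pattern than @lru_cache(maxsize=1). A minimal sketch (the `get_config` helper and `Config` class are hypothetical names, not part of functools):

```python
import threading


class Config:
    """Stand-in for an expensive-to-build shared object."""


_config = None
_config_lock = threading.Lock()


def get_config() -> Config:
    # Holding the lock across both the check and the construction guarantees
    # at most one Config is ever created. lru_cache cannot make this
    # guarantee because it releases its lock while the wrapped function runs.
    global _config
    with _config_lock:
        if _config is None:
            _config = Config()
        return _config
```

Every caller, on any thread, gets the same object; the cost is that construction serializes concurrent first calls, which is exactly the behavior singleton users expect.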

Justification

I have seen many codebases use @lru_cache(maxsize=1) as a singleton-creation pattern, which is not thread-safe. A quick code search on GitHub shows 28.8k hits for code containing @lru_cache(maxsize=1).

Explanation of the problem

The CPython source contains, abbreviated, the following sequence for handling a cache miss.

with lock:
    update_datastructure()
# Note: the lock is NOT held while the user function runs, so a second
# thread can miss on the same key and call user_function concurrently.
result = user_function(*args, **kwargs)
with lock:
    if key in cache:
        # Getting here means that this same key was added to the
        # cache while the lock was released.  Since the link
        # update is already done, we need only return the
        # computed result and update the count of misses.
        pass
return result
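Because the lock is released around the user_function call, two threads can both miss on the same key and both execute the body. The race can even be forced deterministically with a threading.Barrier that only releases once both threads are inside the wrapped function (a sketch for illustration; `make_instance`, `entries`, and `worker` are names of my own):

```python
import threading
from functools import lru_cache

entries = []                    # records each entry into the function body
barrier = threading.Barrier(2)  # releases only once BOTH threads are inside


@lru_cache(maxsize=1)
def make_instance():
    entries.append(threading.get_ident())
    # The first caller blocks here *before* the cache is populated, so the
    # second caller is guaranteed to miss on the same key and enter too.
    barrier.wait()
    return object()


results = []


def worker():
    results.append(make_instance())


threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Both threads executed the body; the two cached "singletons" are distinct.
```

The barrier makes the outcome deterministic: the first caller cannot populate the cache until the second caller has already entered the function body, so `entries` always ends up with two entries and `results` with two distinct objects.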

This can result in very surprising behavior, illustrated by the following test:

from functools import lru_cache
from time import sleep
import threading


class Foo:
    pass


@lru_cache(maxsize=1)
def my_very_expensive_function() -> Foo:
    sleep(3)
    return Foo()


def test_threading_identities():
    results = [None, None, None]

    def call_expensive_and_store(idx):
        results[idx] = my_very_expensive_function()

    threads = [
        threading.Thread(target=call_expensive_and_store, args=(i,)) for i in range(3)
    ]

    for t in threads:
        t.start()
    for t in threads:
        t.join()

    foo_ids = [id(obj) for obj in results]
    # This assertion can fail: all three threads start before the first
    # call completes, so each may receive a distinct Foo instance.
    assert foo_ids[0] == foo_ids[1] == foo_ids[2]
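Until the docs (or behavior) change, callers who need one-instance semantics must add the synchronization themselves. One option is to serialize access around the cached function; a sketch with hypothetical names (`get_foo`, `_make_foo`):

```python
import threading
from functools import lru_cache

_guard = threading.Lock()


@lru_cache(maxsize=1)
def _make_foo():
    return object()  # stands in for the expensive Foo() above


def get_foo():
    # The external lock ensures the first call finishes and populates the
    # cache before any other thread can reach _make_foo, closing the window
    # in which lru_cache would run the wrapped function more than once.
    with _guard:
        return _make_foo()
```

With this wrapper, the test above passes: every thread that calls get_foo receives the same instance.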

Labels

docs: Documentation in the Doc dir
triaged: The issue has been accepted as valid by a triager.