C implementation of functools.lru_cache #58581
functools.lru_cache is optimized to the point that it may benefit from a C implementation.
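For context, this is the stdlib API under discussion; the decorator and its cache_info() counters are what any C version has to reproduce:

    from functools import lru_cache

    @lru_cache(maxsize=128)
    def fib(n):
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    fib(30)
    print(fib.cache_info())
    # CacheInfo(hits=28, misses=31, maxsize=128, currsize=31)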
Thank you for working on this. I look forward to reviewing it.
Updated patch to fix a crash if maxsize isn't given, and add a unit test for that. Possible issues:
I've just started looking at this. Nice job and good attention to detail on the error checking. Expect to have a few high-level suggestions and a ton of minor edits. Here are a couple of quick thoughts:
I've fixed the commenting and the cache_info use. I've left the element management in pure C, as it reduces memory use (56 bytes for a 4-element list vs. 16 for an lru_cache_elem) and avoids refcounting overhead (3 refs per link, plus GC). The difference might become quite marked for very large caches. There's also a nice invariant: the key is linked to the cache dict, and the result object to the lru_cache_elem. I'm happy to change this if it doesn't matter. My only concern now is the wrapping of the lru_cache object: in the Python version, @wraps allows the lru_cache to masquerade as the wrapped function wrt str/repr. The C version is wrapped, but str/repr remain unchanged. Not sure if this is a problem.
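For reference, a minimal sketch of the list-based links the pure Python version uses (field names follow Lib/functools.py: each link is a plain 4-element list, kept in a circular doubly linked list through a root sentinel):

    PREV, NEXT, KEY, RESULT = 0, 1, 2, 3    # field indices of a link

    root = []                               # sentinel of the circular list
    root[:] = [root, root, None, None]      # empty cache: root points at itself

    def append_link(key, result):
        # place a fresh link at the most-recently-used end, just before root
        last = root[PREV]
        link = [last, root, key, result]
        last[NEXT] = root[PREV] = link
        return link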
Matt, I'm attaching a pure Python version to serve as a better map for how to implement this in C.
Updated to show how to handle the kwd_mark and the try/except.
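Roughly, the key construction in the pure Python version looks like this (a simplified sketch; kwd_mark is a unique sentinel object so the positional and keyword parts of the key can never collide):

    kwd_mark = (object(),)    # unique sentinel separating args from kwargs

    def make_key(args, kwds, typed):
        key = args
        if kwds:
            key += kwd_mark
            for item in sorted(kwds.items()):
                key += item
        if typed:
            key += tuple(type(v) for v in args)
            if kwds:
                key += tuple(type(v) for _, v in sorted(kwds.items()))
        return key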
The existing C patch already does this.
The existing patch already did this.
I'm familiar with C. I've retained PyDict_GetItemWithError so as not to suppress type errors from keys constructed from bad user-supplied arguments.
The existing patch already relies on the GIL; it didn't have any locks.
The existing patch already had this.
I incorporated this.
So did the old code.
I've left this as is; the linked list in C is idiomatic, avoids a lot of branching and reference counting, and is also more readable than the equivalent C/Python.
I look forward to your feedback, Ezio.
I wonder if we should keep the original Python implementation alongside the new C version. If we do, it would be nice to see 100% coverage of the Python version and have the tests check both implementations.
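A sketch of how the tests can be run against both implementations, using the block/allow pattern from test.support (class and test names here are illustrative):

    import unittest
    from test.support import import_fresh_module

    py_functools = import_fresh_module('functools', blocked=['_functools'])
    c_functools = import_fresh_module('functools', fresh=['_functools'])

    class LRUTestMixin:
        module = None    # set by the concrete subclasses below

        def test_hits_and_misses(self):
            @self.module.lru_cache(maxsize=2)
            def f(x):
                return x * 10
            self.assertEqual([f(1), f(2), f(1)], [10, 20, 10])
            self.assertEqual(f.cache_info().hits, 1)

    class TestLRUPy(LRUTestMixin, unittest.TestCase):
        module = py_functools    # pure Python implementation

    class TestLRUC(LRUTestMixin, unittest.TestCase):
        module = c_functools     # C accelerated implementation

    if __name__ == '__main__':
        unittest.main()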
See also bpo-12428.
I have added a lot of comments in Rietveld. In general the patch looks good, but I have some style nitpicks and some stronger comments. Matt, please update the patch. Tests should test both the Python and C implementations. I doubt that getting rid of locking is right: hash calculation and comparison of arguments can call Python code, and this code can recursively use the wrapped function. Some benchmarks would be interesting. I think this acceleration should definitely be in 3.4.
Updated patch with the following points:
Thanks, Alexey. However, most of my comments were not addressed. I have repeated them and added some new comments.
Serhiy, thank you for the review. Working further on fixes.
Fixed my previous patch according to all comments from the review, except removing keyword arguments from the lru_cache_new function.
Added additional Py_DECREF(s) for key and value.
LGTM. Thank you, Matt and Alexey. Here is a simple benchmark. On my computer it shows a 10-25x speedup using the C accelerator.
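Serhiy's actual benchmark was attached to the tracker issue; a rough stand-in that times the cache-hit path of both implementations could look like this:

    import timeit
    from test.support import import_fresh_module

    py_functools = import_fresh_module('functools', blocked=['_functools'])
    c_functools = import_fresh_module('functools', fresh=['_functools'])

    def bench(mod, number=1000000):
        @mod.lru_cache(maxsize=128)
        def ident(x):
            return x
        ident(1)    # prime the cache so the loop measures the hit path
        return timeit.timeit(lambda: ident(1), number=number)

    print('pure Python: %.3fs' % bench(py_functools))
    print('C version:   %.3fs' % bench(c_functools))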
Antoine reminded me about a lock. In the Python implementation it is needed because linked-list modifications are not atomic. In the C implementation linked-list modifications are atomic. However, dict operations can call Python code and therefore they are not atomic. I don't know what bad things can happen with concurrent cache updating, but using a lock would be safer and cheap enough. Please add a lock field to lru_cache_object and use it as in the Python implementation, unless someone can prove that a lock is not needed.
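Not the real functools code, but a minimal runnable sketch of the locking discipline being asked for: hold the lock only around shared-state updates and keep the user function call outside it:

    from threading import RLock

    def locked_cache(user_function, maxsize=128):
        cache = {}
        lock = RLock()    # reentrant: hashing/comparing keys can call Python
                          # code that re-enters the wrapper

        def wrapper(*args):
            with lock:
                if args in cache:
                    return cache[args]
            result = user_function(*args)          # called outside the lock
            with lock:
                if len(cache) >= maxsize:
                    cache.pop(next(iter(cache)))   # crude eviction stand-in
                cache[args] = result
            return result
        return wrapper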
Just for the record, I've compiled Raymond's roadmap version in Cython (with only slight changes to make 'self.maxsize' a Py_ssize_t and using an external .pxd for typing) and ran Serhiy's benchmark over it (Ubuntu 12.10, 64bit). This is what I get in Py3.4:

    0.022  untyped_cy(i)
    0.143  untyped_py(i)

Looking at the factors, that's about the same speedup that the dedicated hand-tuned C implementation presented according to Serhiy's own runs (he reported 10-25x). Makes me wonder why we should have two entirely separate implementations for this. Here's the lru_cache_class.pxd file I used:

    cimport cython

    cdef make_key(tuple args, dict kwds, bint typed, tuple kwd_mark)

    @cython.final
    @cython.internal
    cdef class c_lru_cache:
        cdef dict cache
        cdef Py_ssize_t hits
        cdef Py_ssize_t misses
        cdef Py_ssize_t maxsize
        cdef bint typed
        cdef object user_function
        cdef object cache_info_type
        cdef tuple kwd_mark
        cdef list root
Hmm. Judging by the numbers for the Python version, my machine appears [...]. If I adjust the results for the machine differences, the C version [...]
I've managed to build the Cython version now. It's in fact between 4 and 6 [...]
Yep, I basically didn't do any optimisation; it's the plain Python code compiled, just with the class being converted into an extension type.
Thread-safe implementation for cache cleanup.
Unfortunately the patch caused a crash in test_ipaddress. Larry granted a special exclusion for adding this feature after the feature freeze, provided it is fixed before beta 2.
Here are the latest version of the patch and a minimal crash reproducer.
The problem is in the property descriptor getter: it uses the cached args tuple and can unexpectedly modify it. Opened bpo-24276 for fixing this bug. Another way is to remove the fast path in lru_cache_make_key() and always copy the args tuple.
Backed out the backout in cb30db9cc029.
This change caused a problem in Django's test suite (bisected to 0d0989359bbb0):

    File "/home/tim/code/django/django/db/models/options.py", line 709, in _populate_directed_relation_graph

The get_models() method is decorated with @lru_cache.lru_cache(maxsize=None)
Here is a patch that adds __get__ to the lru_cache object.
Thanks, that does resolve the issue.
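A hypothetical minimal reproduction modeled on the Django report; binding get_models as a method goes through the descriptor protocol, which is why the wrapper object needs __get__:

    import functools

    class Apps:
        @functools.lru_cache(maxsize=None)
        def get_models(self):
            return []

    apps = Apps()
    apps.get_models()    # attribute access must yield a bound method; before
                         # the patch the C wrapper lacked __get__ and this broke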
test_lru_cache_threaded fails randomly on the PPC64 PowerLinux buildbot: http://buildbot.python.org/all/builders/PPC64%20PowerLinux%203.x/builds/3573/steps/test/logs/stdio

    ======================================================================
    Traceback (most recent call last):
      File "/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/test/test_functools.py", line 1129, in test_lru_cache_threaded
        self.assertEqual(hits, 45)
    AssertionError: 43 != 45

The test looks incorrect; it is expected to fail when the cached function is called simultaneously from different threads. Unfortunately it is hard to reproduce the failure on other platforms. The proposed patch fixes the test and adds another threaded test, specifically for simultaneous calls.
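The nondeterminism is inherent: two threads can both miss on the same key before either of them stores the result, so only the total number of calls is stable, not the hit/miss split. A sketch (hypothetical test shape, not the committed one):

    import functools
    import threading

    @functools.lru_cache(maxsize=None)
    def square(x):
        return x * x

    def worker():
        for i in range(10):
            square(i)

    threads = [threading.Thread(target=worker) for _ in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    info = square.cache_info()
    print(info)    # hits + misses == 50, but the exact split can vary per run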
New changeset 19dbee688a30 by Serhiy Storchaka in branch '3.5':
New changeset da331f50aad4 by Serhiy Storchaka in branch 'default':
New changeset eff50d543c79 by Serhiy Storchaka in branch '3.5':
New changeset f4b785d14792 by Serhiy Storchaka in branch 'default':
New changeset c52f381fe674 by Serhiy Storchaka in branch '3.5':
New changeset 73acb8c187d4 by Serhiy Storchaka in branch 'default':
If the C version is to remain in Python 3.5, please make sure it provides all of the carefully designed features of the pure Python version:
A number of features of the lru_cache were designed for space savings over speed (lru is all about eviction to make space for a new entry), for thread safety, and to not fall apart during reentrancy. It also provides a guarantee that the hash function is not called more than once per element and is called *before* any of the lru structure updates or lookups (this makes reasoning about correctness *much* easier). If these capabilities can't be reliably reproduced for 3.5, I think the whole thing should be reverted and deferred to 3.6.
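For reference, the hash-called-once guarantee in the pure Python version comes from wrapping the key tuple so its hash is computed a single time, roughly as in Lib/functools.py:

    class _HashedSeq(list):
        # key wrapper that computes the tuple's hash only once
        __slots__ = 'hashvalue'

        def __init__(self, tup, hash=hash):
            self[:] = tup
            self.hashvalue = hash(tup)

        def __hash__(self):
            return self.hashvalue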
Will be satisfied by bpo-24483. I think all other capabilities are satisfied. The GIL is used to satisfy [...]
I'm witnessing a crash in the C implementation during garbage collection. Interestingly, it only shows up in the Py3.6 branch, not in Py3.5 (both latest). Here's the gdb session:

    Program received signal SIGSEGV, Segmentation fault.

IIUC, there is only one entry and the code doesn't expect that. An additional "is empty?" NULL check may or may not be the right fix. It might also be the linking itself that's incorrect here. The code seems to expect a cyclic data structure which is apparently not maintained that way.
self->root.next and self->root.prev should never be NULL. Could you please provide a minimal example of code that produces the crash?
It's not actually my own code using the lru_cache here. From a quick grep [...]. However, I ran test_functools.py of the same installation and it passes, so [...]
test_fnmatch.py also passes, BTW.
I've also seen a crash in lru_cache_tp_traverse, but with the 3.5.0b3 release build for the OS X 64-bit/32-bit installer. I just stumbled across the segfault by bringing up the interactive interpreter and typing "import ssl". After a lot of playing around, I reduced the failing case to:

1. have an "import pprint" in a startup file referred to by PYTHONSTARTUP, *and*
2. "import ssl" must be the very first command entered in the interactive interpreter.

Odd! Unfortunately, the release build is a non-debug build and, so far, I have not been able to reproduce the segfault with any other build, debug or non-debug. So, whatever the problem is, it's very build dependent. Here is the header of the OS X system traceback from the segfault (frame details elided):

    Path:           /Library/Frameworks/Python.framework/Versions/3.5.0b3_10_6/Resources/Python.app/Contents/MacOS/Python
    Date/Time:      2015-07-10 19:57:16.086 -0700
    Time Awake Since Boot: 800000 seconds
    Crashed Thread: 0  Dispatch queue: com.apple.main-thread
    Exception Type: EXC_BAD_ACCESS (SIGSEGV)
    VM Regions Near 0x18:
    Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
    Thread 0 crashed with X86 Thread State (64-bit):
Stefan, Ned, can you reproduce the same crash with the following patch?
Serhiy, I'll try making a 3.5.0b3+patch installer build in the same manner as the 3.5.0b3 installer build. But I may not get to it for several days.
Sorry about the delay in testing the patch. I just confirmed (1) that I am still able to produce a segfault on OS X as described above under the specific conditions with a 10.6-like installer built with the current 3.5 tip and (2) that, with clru_cache_new.patch applied to the same current 3.5 tip, I no longer see the segfault. So it looks like a fix.
New changeset 5345e5ce2eed by Serhiy Storchaka in branch '3.5':
New changeset 9c6d11d22801 by Serhiy Storchaka in branch 'default':
Thank you, Ned. Is anything left to do with this issue, or can it be closed?
I suspect this change is implicated in bpo-25447.