Add the @cached_property decorator #65344
Comments
cached properties are widely used in various prominent Python projects such as Django and pip (and many more). Possible benefits:
|
There's currently an example of a cached property decorator implementation in the wiki, although it doesn't leverage functools: |
It could make sense to add clean, working recipes to e.g. the functools documentation. The cached_property in the wiki uses a TTL; others, like Pyramid's reify decorator, make properties that ensure the fget function is called only once per instance, and there may be subtly different variants out there. I don't know if there's a universally useful variant that should be added to the stdlib right now. (I don't think a C implementation is needed.) On a related note, the Python docs about descriptors may be missing entry-level explanations, as described here: http://me.veekun.com/blog/2012/05/23/python-faq-descriptors/ |
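As a rough illustration of the distinction (a hedged sketch, not the wiki recipe or Pyramid's reify; the class name and cache-key convention are invented for this example), a TTL-style cached property has to re-check an expiry time on every access, so it cannot use the "compute once, then become a plain attribute" trick:

    import time
    from functools import update_wrapper

    class cached_property_ttl:
        # Hypothetical TTL variant: the cached value expires after ttl seconds.
        def __init__(self, ttl):
            self.ttl = ttl

        def __call__(self, func):
            self.func = func
            # Store under a different key so the descriptor keeps intercepting access.
            self.cache_key = "_ttl_cache_" + func.__name__
            update_wrapper(self, func)
            return self

        def __get__(self, instance, owner=None):
            if instance is None:
                return self
            value, timestamp = instance.__dict__.get(self.cache_key, (None, None))
            now = time.monotonic()
            if timestamp is None or now - timestamp >= self.ttl:
                value = self.func(instance)
                instance.__dict__[self.cache_key] = (value, now)
            return value

A reify-style decorator, by contrast, computes the value once per instance and then stores it as an ordinary attribute, which is the variant the rest of this discussion converges on.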
The default implementation should be simple.
|
I just checked and werkzeug uses the same implementation as Django & pip.
|
For what it's worth, I just released cached-property on PyPI and someone suggested I join the discussion here. Package: https://pypi.python.org/pypi/cached-property Notes:
|
Does this work for you?

>>> import functools
>>> class X:
...     @property
...     @functools.lru_cache(None)
...     def get_x(self):
...         print("computing")
...         return 5
...
>>> x = X()
>>> x.get_x
computing
5
>>> x.get_x
5
|
Will that implementation cause a memory leak? Won't the lru_cache keep a dict mapping {self: result}, meaning that instances are kept alive by the cache and never garbage-collected? |
Oh, you're right. Sorry for the noise. |
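A small sketch (not from the thread) showing the retention problem with the property-plus-unbounded-lru_cache approach above: the cache key includes self, so the instance stays reachable from the cache even after all other references are gone.

    import functools
    import weakref

    class X:
        @property
        @functools.lru_cache(None)   # unbounded cache keyed on self
        def get_x(self):
            return 5

    x = X()
    ref = weakref.ref(x)
    x.get_x                  # fills the cache with an entry whose key references x
    del x
    print(ref() is None)     # False: the lru_cache still holds the instance alive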
Can we make this happen for 3.6? |
Maybe, if somebody provides a patch. |
I like the idea of a helper to build a property on demand, but I dislike the TTL idea; it seems too specific. If you need a TTL, implement your own decorator, or use a regular property and implement your own logic there. |
Most implementations these days support TTL because they require it.
|
The TTL idea is completely outlandish in a general-purpose library. Let's keep things simple and not try to build a kitchen-sink decorator. |
In that case, most of the users won't use the standard library.
|
I'm sure many people don't need a TTL on a cached property. Please stop arguing about that. |
"Most implementations these days support TTL because they require it." I used this pattern a lot in my old Hachoir project, but I never needed the TTL thing. In my case, data come from the disk and are never invalidated. Example with the description property: |
I've used the cached_property pattern across many different projects, and never yet wanted a TTL. The simple "cache for the lifetime of the instance" behavior is easy to implement, easy to understand, and useful for a wide range of scenarios where instances are effectively immutable. +1 for adding this to the stdlib (not just as a docs recipe); I'll see about providing a patch. |
Attaching a patch with the simplest version of cached_property (the technique is not original; similar code is found in Django, Bottle, Flask, the cached_property lib, and many other places). |
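For readers who haven't seen it, the widely shared recipe the patch follows looks roughly like this (a sketch for illustration, not the attached patch itself):

    class cached_property:
        # Non-data descriptor: the first access stores the result in the instance
        # __dict__ under the same name, so later lookups never reach __get__ again.
        def __init__(self, func):
            self.func = func
            self.__doc__ = func.__doc__

        def __get__(self, instance, owner=None):
            if instance is None:
                return self
            value = instance.__dict__[self.func.__name__] = self.func(instance)
            return value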
The decorator should support classes with __slots__. |
How do you propose that slots should be supported? Part of the value of cached_property is that cached access is a normal Python attribute access with no function call overhead. I am not interested in adding support for slots if it loses that benefit. I would not use such an implementation myself.

I may be missing some option, but I can't see how to add slots support without losing that benefit, because it requires the ability to store an instance attribute under the same name as the descriptor, taking advantage of the fact that the instance dict overrides a non-data descriptor.

This implementation of cached_property has been in wide use in multiple very popular projects for many years. The fact that none of those implementations have ever needed to add slots support suggests that it isn't actually that important.

If you have an idea for how to support slots without making cached_property less valuable for the common case, please share it and I am willing to implement it. Otherwise, I propose that this implementation, which is already proven in wide usage, should be added to the stdlib; I can add a documentation note that objects with slots are not supported. If there is demand for cached_property_with_slots, it can be added separately. |
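To illustrate the behavior described here, using the cached_property recipe sketched a few comments above (the example class and values are invented): after the first access the value lives in the instance dict and shadows the non-data descriptor, so subsequent reads are plain attribute lookups, and setting or deleting the attribute simply manipulates the cached entry.

    class Page:
        @cached_property              # the recipe sketched above
        def content(self):
            print("rendering")
            return "<html>...</html>"

    page = Page()
    page.content                      # prints "rendering" and caches the result
    print(vars(page))                 # {'content': '<html>...</html>'}
    page.content                      # instance-dict hit, no function call
    del page.content                  # clears the cache; the next access recomputes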
Carl's patch looks good to me, but my one request in relation to the __slots__ situation would be to give it a custom error message better indicating that lazy initialization of pre-assigned instance slots isn't supported. Currently that case just lets the underlying AttributeError escape, which is going to be thoroughly cryptic for folks that try it and may look like an accidental oversight rather than a deliberate design decision:

>>> class NoDict:
...     __slots__ = ()
...
>>> NoDict().__dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoDict' object has no attribute '__dict__'

Suggested error message:

Essentially, this line:

    val = instance.__dict__[self.func.__name__] = self.func(instance)

would become:

    attrname = self.func.__name__
    try:
        cache = instance.__dict__
    except AttributeError:
        msg = f"No '__dict__' attribute on {type(instance).__name__!r} instance to cache {attrname!r} property."
        raise TypeError(msg) from None
    val = cache[attrname] = self.func(instance)

I believe a future C implementation could potentially be reworked to be __slots__ compatible, but I'd have to go read the source code to be sure, and I don't think that's necessary.

Note: the class machinery itself already detects actual name conflicts between slot and method definitions:

>>> class SlotConflict:
...     __slots__ = ("attr")
...     @property
...     def attr(self):
...         return 42
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 'attr' in __slots__ conflicts with class variable

The requested runtime check for the |
Makes sense, Nick, thanks. The current error message for that situation is pretty cryptic indeed. I'll update the patch with your suggestion. Do you think a documentation note about slots is also warranted? |
Please, no need to rush. We have 1.5 years to do this right. I believe supporting __slots__ is very important, but this is not an easy task; it seems to require a C implementation. We should also design the behavior in case of setting or deleting a cached property. How about just methods without arguments? Some APIs use methods without arguments instead of properties. Wouldn't it be better to design a decorator that memoizes both properties and methods without arguments? |
Uploaded a patch updated per Nick's comment. Not opposed to waiting to see if someone is motivated to implement a version in C that supports __slots__, but if that doesn't happen by the Python 3.7 feature deadline, I don't think it should block this proven version. It also occurred to me that we could probably support __slots__ in pure Python without harming the non-slots case by implementing a fallback cache in the descriptor itself, keyed by instance in a WeakKeyDictionary. I don't love having the behavior differ so much between the slots and non-slots case, but maybe it's better than not supporting slots at all. Re setting and deleting: under the current patch, if you set or delete a cached property, you set or delete the cached value. I think this is fine and useful behavior, but it could perhaps be added explicitly to the documentation. |
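A hedged sketch of that WeakKeyDictionary fallback idea (illustrative only, not part of the attached patch; the class name is invented, and __slots__ classes would need '__weakref__' in their slots for it to work):

    from weakref import WeakKeyDictionary

    class cached_property_with_fallback:
        def __init__(self, func):
            self.func = func
            self.__doc__ = func.__doc__
            self._fallback = WeakKeyDictionary()   # only used for __slots__ instances

        def __get__(self, instance, owner=None):
            if instance is None:
                return self
            try:
                cache = instance.__dict__
            except AttributeError:
                # __slots__ class without __dict__: every access keeps going
                # through the descriptor, so cache per-instance on the descriptor.
                try:
                    return self._fallback[instance]
                except KeyError:
                    value = self._fallback[instance] = self.func(instance)
                    return value
            # Normal case: store under the descriptor's own name so later lookups
            # are ordinary instance-dict hits with no __get__ call.
            value = cache[self.func.__name__] = self.func(instance)
            return value

As the next comment notes, the set/delete behavior would differ between the two paths.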
(We wouldn't be able to match the set/delete behavior in a slots-supporting fallback implemented in Python.) |
I'm delighted to see a patch submitted, but I'm concerned that it isn't thread safe. This was implemented in the cached-property package I maintain: |
Thanks, Danny. Uploaded a version of the patch that adds thread-safety (with a test). Unlike in your lib, I didn't make it a separate version of the decorator; since the lock is not checked on cached access, its slight overhead on the initial computation is probably not an issue, likely outweighed by the cost of the computation itself. If someone decides to pursue a C version with slots support, hopefully at least these tests are still useful :-) |
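Roughly the shape of that change (a sketch of the approach described, not the attached patch): the lock only guards the initial computation inside __get__; once the value is in the instance dict, reads bypass the descriptor entirely and never touch the lock.

    from threading import RLock

    class cached_property_threadsafe:    # hypothetical name for illustration
        def __init__(self, func):
            self.func = func
            self.__doc__ = func.__doc__
            self.lock = RLock()

        def __get__(self, instance, owner=None):
            if instance is None:
                return self
            name = self.func.__name__
            with self.lock:
                # Another thread may have stored the value while we waited for
                # the lock, so re-check the instance dict before computing.
                try:
                    return instance.__dict__[name]
                except KeyError:
                    value = instance.__dict__[name] = self.func(instance)
                    return value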
Speaking of this hypothetical C version, what's the current policy on shipping stdlib C code that can't be emulated in pure Python? (I'm thinking of e.g. PyPy). |
I realised that PEP-487's __set_name__ can be used to detect the problematic __slots__ case at class definition time:

    def __set_name__(self, owner, name):
        try:
            slots = owner.__slots__
        except AttributeError:
            return
        if "__dict__" not in slots:
            msg = f"'__dict__' attribute required on {owner.__name__!r} instances to cache {name!r} property."
            raise TypeError(msg)

It also occurred to me that at the expense of one level of indirection in the runtime lookup, PEP-487's __set_name__ hook and a specified naming convention already permit a basic, albeit inefficient, "cached_slot" implementation:

    from threading import RLock

    class cached_slot:
        def __init__(self, func):
            self.func = func
            self.cache_slot = func.__name__ + "_cache"
            self.__doc__ = func.__doc__
            self.lock = RLock()

        def __set_name__(self, owner, name):
            try:
                slots = owner.__slots__
            except AttributeError:
                msg = f"cached_slot requires '__slots__' on {owner!r}"
                raise TypeError(msg) from None
            if self.cache_slot not in slots:
                msg = f"cached_slot requires {self.cache_slot!r} slot on {owner!r}"
                raise TypeError(msg) from None

        def __get__(self, instance, cls=None):
            if instance is None:
                return self
            try:
                return getattr(instance, self.cache_slot)
            except AttributeError:
                # Cache not initialised yet, so proceed to double-checked locking
                pass
            with self.lock:
                # Check if another thread filled the cache while we awaited the lock
                try:
                    return getattr(instance, self.cache_slot)
                except AttributeError:
                    # Cache still not initialised yet, so initialise it
                    setattr(instance, self.cache_slot, self.func(instance))
                return getattr(instance, self.cache_slot)

        def __set__(self, instance, value):
            setattr(instance, self.cache_slot, value)

        def __delete__(self, instance):
            delattr(instance, self.cache_slot)

It can't be done as a data descriptor though (and can't be done efficiently in pure Python), so I don't think it makes sense to try to make cached_property itself work implicitly with both normal attributes and slot entries - instead, cached_property can handle the common case as simply and efficiently as possible, and the cached_slot case can be either handled separately or else not at all.

The "don't offer cached_slot at all" argument would be that, given slots are used for memory efficiency when handling large numbers of objects and lazy initialization is used to avoid unnecessary computations, a "lazily initialised slot" can be viewed as "64 bits of frequently wasted space", and hence we can expect demand for the feature to be incredibly low (and the community experience to date bears out that expectation). |
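For concreteness, usage of the cached_slot sketch above would look something like this (class and attribute names are invented; the cache slot follows the "<name>_cache" convention the descriptor assumes):

    class Vector:
        __slots__ = ("x", "y", "norm_cache")   # extra slot reserved for the cache

        def __init__(self, x, y):
            self.x = x
            self.y = y

        @cached_slot
        def norm(self):
            return (self.x ** 2 + self.y ** 2) ** 0.5

    v = Vector(3, 4)
    v.norm        # computed once, then stored in the 'norm_cache' slot
    v.norm        # every access still goes through __get__, unlike cached_property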
Having just said that I don't see much use for lazily initialized slots, it does occur to me that __weakref__ is essentially such a slot, and __dict__ itself could usefully be such a slot for types where instances mostly have a fixed set of attributes, but still allow for arbitrary additional attributes at runtime. However, I do think any such proposal should be considered as its own issue, rather than being treated as part of this one - the assumed absence of the instance dict creates too many differences in the design trade-offs involved and the potential use cases. |
So it sounds like the current approach here is good to move forward? If I update the patch during the PyCon sprints, we could merge it? |
I believe so.
|
FWIW, over the past decade, I've used variants of CachedProperty a number of times and have often had issues that later required messing with its internals (needing a way to invalidate or clear the cache, mock-patching the underlying function for testing, consistency between multiple cached properties cached in different threads, inability to run the method through a debugger, inadvertently killing logging or other instrumentation, moving the cached value from an instance variable to an external weakref dictionary, etc.).

I proposed the idea of a CachedProperty in descriptor tutorials over a decade ago. Since then, I've grown wary of the idea of making them available for general use. Instead, we're better off with a recipe that someone can use to build their understanding and then customize as necessary. The basic recipe is simple, so there isn't much of a value add by putting this in the standard library.

If we want to add another property() variant, the one I've had the best luck with is CommonProperty(), which lets you re-use the same getter and setter methods for multiple properties (the name of the property variable gets passed in as the first argument). |
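I believe the shape being described is roughly the following (a sketch, not Raymond's actual CommonProperty; passing the attribute name to the shared getter and setter is an assumption based on the description above):

    class CommonProperty:
        def __init__(self, fget, fset=None):
            self.fget = fget
            self.fset = fset

        def __set_name__(self, owner, name):
            self.name = name   # remember which attribute this descriptor manages

        def __get__(self, instance, owner=None):
            if instance is None:
                return self
            return self.fget(instance, self.name)

        def __set__(self, instance, value):
            if self.fset is None:
                raise AttributeError(f"can't set attribute {self.name!r}")
            self.fset(instance, self.name, value)

    class Record:
        def __init__(self):
            self._data = {}

        def _get_field(self, name):
            return self._data[name]

        def _set_field(self, name, value):
            self._data[name] = value

        first_name = CommonProperty(_get_field, _set_field)
        last_name = CommonProperty(_get_field, _set_field)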
This is already supported by the simple implementation in the patch; it's spelled
This is easy to do with the current implementation; you can replace the cached-property descriptor on the class with
The patch attached here is already thread-safe and will be consistent between threads.
If you
This would be a totally different descriptor that doesn't share much implementation with the one proposed here, so I don't see how providing the common version inhibits anyone from writing something different they need for their case.
It's simple once you understand what it does, but it's quite subtle in the way it relies on priority order of instance-dict attributes vs non-data descriptors. My experience over the past decade is different from yours; I've found that the simple |
I agree this would be a useful addition to the stdlib. The code might seem reasonably short, but implementing new descriptors is an advanced topic (I'd rather avoid it myself). |
Sent a PR with the patch. Nick, I tried your |
I think it would make sense to remove the exception wrapping from the __set_name__ calls - I don't think we're improving the ease of understanding the tracebacks by converting everything to a generic RuntimeError, and we're hurting the UX of descriptor validation cases like this one. [1] https://github.com/python/cpython/blob/master/Objects/typeobject.c#L7263 |
I filed https://bugs.python.org/issue33576 to cover removing the exception wrapping from __set_name__ errors. |
Makes sense to me. Sounds like a separate issue and PR; I filed bpo-33577 and will work on a patch. |
Oops, never mind; closed mine as dupe. |
This has now been merged. Thanks for the multiple iterations on the implementation Carl, and thanks for the original proposal, Omer! |
Thanks everyone for the thoughtful and careful reviews! Patch is much improved from where it started. And thanks Nick for merging. |
FYI there is discussion in bpo-34995 about the usage of @cached_property with abstract methods. |