-
-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose cache validity checking support in ABCMeta #61036
Comments
ABCMeta uses an internal counter to invalidate the negative caches when a register() call changes the virtual inheritance graph. (A global count of register() calls is stored on ABCMeta, which ABCMeta.__instancecheck__ and ABCMeta.__subclasscheck__ then compare to a copy cached on the individual ABC) To properly handle register() method calls on ABCs, generic function implementations also need a way to invalidate their own caches when that graph changes. It seems like the simplest way to handle this would be to expose a read-only "ABCMeta.cache_token" property. Generic function implementations could then work the same way as ABCMeta itself: store explicit implementation registrations and a cache of derived implementations separately, and if a saved copy of the cache token doesn't match the current value of ABCMeta.cache_token, clear the derived cache. |
This will be useful for PEP-443. From Nick's comment on python-dev (http://mail.python.org/pipermail/python-dev/2013-May/126535.html): "Given the global nature of the cache invalidation, it may be better as |
Rather than exposing the "cache token" (which looks like an implementation detail), you may allow third-party code to register a handler which will be called when an ABC's registrations are modified: def abc_handler(abc):
"""
Called when the concrete class registrations for ABC `abc`
are updated.
""" or even: def abc_handler(abc, added, removed):
"""
Called when the concrete class registrations for ABC `abc`
are updated. `added` is an iterable of concrete classes which
have been registered on the ABC, `removed` is an iterable of
concrete classes which have been unregistered.
""" |
I thought about that originally, but there's only ever one object graph for the process, and as soon as you break any one edge in that graph you pretty much invalidate anything based on caching traversal results. (More accurately: it's almost always going to be cheaper to blow away and rebuild your cache than it is to determine whether or not your cache was actually affected by the graph change, particularly given the fact that object graph changes will almost always happen while the caches are still cold during application startup). So it's not really an implementation detail, it's a promise to provide a unique token for the current state of the object graph that can be used reliably for cache invalidation. Thus, I favour exposing a cache token as the simplest thing that could possibly work - building a mechanism for "tell me when the object graph changes" is a complex solution when the only question we need to answer is "is my cache still valid?". So the recommended idiom with this design would be: new_token = abc.get_cache_token()
if new_token != cached_token:
cache.clear()
cached_token = new_token
else:
# Do something with the cache
# Handle a cache miss A callback based system is actually *harder* to use, because you can't simply ask the question "Is my cache still valid?" - you have to register a callback that sets a flag, or something similar. Your lookup code ends up being: if cache_invalidated:
cache.clear()
cache_invalidated = False
else:
# Do something with the cache
# Handle a cache miss However, now you have a potential problem, because your state update (setting the flag) is decoupled from checking the flag. That makes it harder to ensure correct sequencing than is the case with a simple inline check of a cache validity token. Choosing a solution that is harder to implement and harder to use is definitely not a good idea :) |
And when I say "originally" I mean "after I saw how ABCs already implemented this capability" :) |
Ah, I was forgetting that a ABC registration also affects the subclasses
Sounds fair.
Well, the point of a callback based system is for the callback to It's a basic cost/benefit compromise: the callback system makes Besides, a callback-based system can be more future-proof than exposing |
Ah, but that's the other trick: we *don't know* if we need to recalculate our whole cache when the object graph changes.
The cache invalidation design in the ABCs is a smart compromise, since it leaves recreating the cache entries to the last possible moment: the next time a type is looked up after the object graph has changed. It doesn't matter if there is one graph change or a thousand in that time, we only do the fresh lookup once. And if the type is never looked up again, we don't even need to do that much. In a typical application, the expected usage model for both ABCs and generic functions is the same:
The ABC design takes the view that the extra runtime check for cache validity in the absence of direct inheritance or registration is worth it to avoid repeated cache clearing during application startup for ABCs that may never even be used by the application. I believe exactly the same reasoning holds true for generic functions: triggering callbacks when the object graph changes means we may end up doing a lot of work clearing caches that the application isn't even using. Delaying the check to call time means that only the caches that matter will ever be touched. |
Yup, hence the "cost/benefit compromise" ;-) My assumption here was that ABC registrations change quite infrequently |
Review the patch, please. Committing in 10... 9... 8... |
New changeset 5b17d5ca2ff1 by Łukasz Langa in branch 'default': |
New changeset d9828c438889 by Łukasz Langa in branch 'default': |
Please expose this as an attribute of the class or module, not as a function. A function is orders of magnitude slower than attribute access, and the entire point of exposing this is to allow caches to be invalidated. In order to be useful for cache invalidation, the token has to be read on *every* access to a cache, thus adding unnecessary overhead to every cache access. |
-1. Exposing a function allows to modify the underlying implementation |
Trading correctness for speed is almost never a good idea. If people are worried about speed to that level, they can either bypass the public API and access the private attribute directly (after profiling their application to ensure the cache validity checks are the bottleneck), or else migrate to PyPy to eliminate the function call overhead (amongst many other speed improvements). |
Antoine Pitrou added the comment:
This doesn't make any sense. Once you've exposed an API that gives So there is actually zero improvement in encapsulation: it's purely a from abc import object_graph
if object_graph.version != old_version:
... Nick Coghlan added the comment:
How is "correctness" relevant to the question of whether a readable |
All that being said, I took out some time to get actual numbers, and And even on a relatively old machine, that 3 times slower amounts to |
Because it may be computed whenever the function call is done, rather
That's a possibility indeed. |
The reason I switched from suggesting an attribute/property to a module level function is because we don't really support properties for process global state. That's why sys has so many getters in it - to make them properties, you would have to add a class object to act as a holder for the property. In this case, we appear to have a singleton that could hold the property (ABCMeta), which is why my original suggestion was to use an ABCMeta attribute. However, what I realised yesterday is that because ABCMeta is a type subclass, it means the only way to make the attribute a property in the future would be to add a metametaclass to hold the property definition. Once I realised that... "if the implementation is hard to explain, it's a bad idea" tipped me over the edge into preferring a simple module level getter function that supported exactly the use cases we care about :) |
About future-proofing: do we want to document that the token is an opaque integer (as is done now) or just an opaque object that supports equality comparisons? |
The latter is probably better (it should just need a slight tweak to |
New changeset a9f73b44ea0e by R David Murray in branch 'default': |
I don't see that there's anything other than the doc change that needs done here, so I'm closing this. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: