-
-
Notifications
You must be signed in to change notification settings - Fork 31.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regression in memory use of shared key dictionaries for "compact dicts" #84297
Comments
The current implementation of dicts prevents keys from being shared when the order of attribute differs from the first instance created. Consider the class: class C:
opt = DEFAULT
def __init__(self, attr, optional=None):
if optional:
self.opt = optional
self.attr = attr This is a reasonable way to write a class, but has unpredictable memory use. The language specification says that the dicts maintain insertion order, but the wording implies that this only to explicit dictionaries, not instance attribute or other namespace dicts. Either we should allow key sharing in these cases, or clarify the documentation. |
But the following class should not lead to unlimited memory consumption when create new instances: class C:
count = 0
def __init__(self):
count = self.__class__.count
self.__class__.count = count + 1
setattr(self, f'a{count}', count) |
Indeed it shouldn't. |
Just to clarify. class AlwaysShared:
opt = DEFAULT
def __init__(self, attr, optional=None):
self.attr = attr
if optional:
self.opt = optional
class SometimesShared:
opt = DEFAULT
def __init__(self, attr, optional=None):
if optional:
self.opt = optional
self.attr = attr The class AlwaysShared always has shared keys, whereas the class SometimesShared is not shared if |
I think the current behavior is a guard against such pitfall. If you allow to add new keys when not all other keys are set, you can end with sharing growing set of keys. |
You can't end up with a growing set of keys. |
This can be mitigated, if not entirely fixed, by storing an ordering bit vector in the values. This way all instances of the class SometimesShared in the example above can share the keys. The keys might be ("optional", "attr") For any instances with only "attr" as an attibute, the values would be (NULL, value) and the order would be (1,) The downsides of this approach are:
Overall, I expect the improved sharing to more than compensate for the disadvantages. |
I expect the opposite. This makes all dicts pay a price (in space, initialization time, and complexity) for a micro-optimization of an uncommon case (the normal case is for __init__ to run and set all the keys in a consistent order). It is unlikely that the "benefits" to never be felt in real-word applications, but "disadvantages" would affect every Python program.
That is a quite liberal reading of the spec. I would object to making instance and namespace dicts behave differently. That would be a behavior regression and we would forever have to wrestle with the difference. |
On 22.09.2021 21:02, Raymond Hettinger wrote:
I agree. Keeping the insertion order is essential for many common I think for the case you mention, a documentation patch would be |
Raymond, Only split dicts need the extra field. Classes where many instances do not have exactly the same set of attributes may be more common than you think. PR 28520 actually makes the dictionary code a bit simpler, as we don't need to maintain the invariant that value arrays cannot have holes. Maintaining an order is simple and cheap: order = (order<<4) | insertion_index There are pros and cons to both schemes: PR 28520 and the current implementation. The problem with the current scheme is that it only works well for classes where all instances are initialized with exactly the same attributes, and in the same order. The PR 28520 scheme can handle those cases where order and key set differ a bit, but has a maximum size of 16 before the dict must be combined. We need as many instances as possible to have split dictionaries, to get faster-cpython/ideas#72 working well as it will make the cost of not sharing even greater, relatively. |
There are several common idioms which do not work well with shared dictionaries.
|
I found regression caused by #72706.
|
PyDict_Keys(), PyDict_Values(), and PyDict_Items() don't respect insertion order too. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: