Need way to expose incremental size of key sharing dicts #72694
Comments
In many Python programs, much of the memory utilization is due to having many instances of the same object. We have key-sharing dicts that reduce the cost by storing only the incremental values. It would be nice to have visibility into the savings. One possible way to do this is to have sys.getsizeof(d) report only the incremental space. That would let users make reasonable memory estimates of the form n_instances * sizeof(vars(inst)).
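The kind of estimate being proposed can be sketched as follows (a rough illustration; the `Point` class is made up for the example, and the actual byte counts reported by sys.getsizeof vary across CPython versions):

```python
import sys

class Point:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

instances = [Point(i, i + 1, i + 2) for i in range(1000)]

# Instances of the same class with the same attributes report the
# same dict size, so one representative measurement stands in for all.
per_instance = sys.getsizeof(vars(instances[0]))

# The proposed estimate: n_instances * sizeof(vars(inst)).
estimated_total = len(instances) * per_instance
print(per_instance, estimated_total)
```

If getsizeof reported only the incremental (per-instance) space for key-sharing dicts, this estimate would closely track real memory use instead of over-counting the shared keys table once per instance.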
Isn't this already implemented?

>>> import sys
>>> class C:
...     def __init__(self):
...         for i in range(682):
...             setattr(self, 'a%d' % i, None)
...
>>> sys.getsizeof(C().__dict__) / len(C().__dict__)
4.058651026392962
I have the same question. Can dict.__sizeof__ identify shared dicts?
No:

>>> class A:
...     pass
...
>>> d = dict.fromkeys('abcdefghi')
>>> a = A()
>>> a.__dict__.update(d)
>>> b = A()
>>> b.__dict__.update(d)
>>> import sys
>>> [sys.getsizeof(m) for m in [d, vars(a), vars(b)]]
[368, 648, 648]
>>> c = A()
>>> c.__dict__.update(d)
>>> [sys.getsizeof(m) for m in [d, vars(a), vars(b), vars(c)]]
[368, 648, 648, 648]

There is no benefit reported for key-sharing. Even if you make a thousand of these instances, the size reported is the same. Here is the relevant code:

Py_ssize_t
_PyDict_SizeOf(PyDictObject *mp)
{
    Py_ssize_t size, usable, res;

    size = DK_SIZE(mp->ma_keys);
    usable = USABLE_FRACTION(size);

    res = _PyObject_SIZE(Py_TYPE(mp));
    if (mp->ma_values)
        res += usable * sizeof(PyObject*);
    /* If the dictionary is split, the keys portion is accounted-for
       in the type object. */
    if (mp->ma_keys->dk_refcnt == 1)
        res += (sizeof(PyDictKeysObject)
                - Py_MEMBER_SIZE(PyDictKeysObject, dk_indices)
                + DK_IXSIZE(mp->ma_keys) * size
                + sizeof(PyDictKeyEntry) * usable);
    return res;
}

It looks like the fixed overhead is included for every instance of a split dictionary. Instead, it might make sense to take the fixed overhead and divide it by the number of instances sharing the keys (averaging the overhead across the multiple shared instances):

    res = _PyObject_SIZE(Py_TYPE(mp)) / num_instances;

Perhaps use ceiling division.
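The averaging idea can be illustrated in Python (a hypothetical sketch of the proposed accounting, not actual CPython code; `proposed_sizeof` and its parameters are invented for the example):

```python
import math

def proposed_sizeof(values_bytes, shared_keys_bytes, num_sharers):
    """Charge a split dict for its own values array plus an equal
    share (ceiling division) of the keys table it shares."""
    return values_bytes + math.ceil(shared_keys_bytes / num_sharers)

# Three instances sharing a 280-byte keys table: each dict is charged
# 280/3 rounded up (94 bytes) on top of its own 72 bytes of values.
print(proposed_sizeof(72, 280, 3))   # → 166
```

With num_sharers equal to 1 this degenerates to charging the dict for the full keys table, which matches the existing accounting for a combined (non-split) dict.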
Hmm, it seems none of the dicts here are shared-key dicts.
Yes, that seems to be the case. Apparently, calling update() on the instance dict causes it to be combined (it stops sharing keys).
>>> from sys import getsizeof
>>> class A:
...     def __init__(self, a, b, c, d, e, f):
...         self.a = a
...         self.b = b
...         self.c = c
...         self.d = d
...         self.e = e
...         self.f = f
...
>>> a = A(10, 20, 30, 40, 50, 60)
>>> b = A(10, 20, 30, 40, 50, 60)
>>> c = A(10, 20, 30, 40, 50, 60)
>>> d = A(10, 20, 30, 40, 50, 60)
>>> [getsizeof(vars(inst)) for inst in [a, b, c, d]]
[152, 152, 152, 152]
>>> [getsizeof(dict(vars(inst))) for inst in [a, b, c, d]]
[368, 368, 368, 368]
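The asymmetry above can be reproduced as a script (a sketch; exact byte counts depend on the CPython version, and key-sharing may not apply on every build):

```python
from sys import getsizeof

class Point:
    def __init__(self):
        # Attributes assigned directly in __init__ keep the instance
        # dict in key-sharing (split) form; copying it into a plain
        # dict gives that copy its own private keys table.
        self.x = 1
        self.y = 2
        self.z = 3

p = Point()
split_size = getsizeof(vars(p))           # key-sharing instance dict
combined_size = getsizeof(dict(vars(p)))  # plain dict, own keys table
print(split_size, combined_size)
```

The split dict should report no more than the combined copy, since the combined dict carries its own keys table while the split dict's keys live on the class.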
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.