Need way to expose incremental size of key sharing dicts #72694

Closed

rhettinger opened this issue Oct 22, 2016 · 8 comments
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@rhettinger
Contributor

BPO 28508
Nosy @rhettinger, @serhiy-storchaka, @zhangyangyu

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2016-10-22.23:13:20.420>
created_at = <Date 2016-10-22.17:35:42.643>
labels = ['interpreter-core', '3.7', 'invalid']
title = 'Need way to expose incremental size of key sharing dicts'
updated_at = <Date 2016-10-22.23:15:16.162>
user = 'https://github.com/rhettinger'

bugs.python.org fields:

activity = <Date 2016-10-22.23:15:16.162>
actor = 'rhettinger'
assignee = 'none'
closed = True
closed_date = <Date 2016-10-22.23:13:20.420>
closer = 'rhettinger'
components = ['Interpreter Core']
creation = <Date 2016-10-22.17:35:42.643>
creator = 'rhettinger'
dependencies = []
files = []
hgrepos = []
issue_num = 28508
keywords = []
message_count = 8.0
messages = ['279207', '279208', '279209', '279211', '279227', '279229', '279230', '279231']
nosy_count = 3.0
nosy_names = ['rhettinger', 'serhiy.storchaka', 'xiang.zhang']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue28508'
versions = ['Python 3.7']

@rhettinger
Contributor Author

In many Python programs, much of the memory utilization is due to having many instances of the same class. We have key-sharing dicts that reduce the cost by storing only the incremental values. It would be nice to have visibility into the savings.

One possible way to do this is to have sys.getsizeof(d) report only the incremental space. That would let users make reasonable memory estimates in the form of n_instances * sizeof(vars(inst)).
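
For illustration, a rough sketch of the kind of estimate this would enable (the Point class and instance count here are hypothetical, and the estimate is only meaningful if getsizeof() reports the incremental cost as proposed):

    import sys

    class Point:
        def __init__(self, x, y, z):
            self.x, self.y, self.z = x, y, z

    points = [Point(i, i, i) for i in range(10_000)]

    # Under the proposal, getsizeof() of a key-sharing dict would report only
    # the per-instance (incremental) cost, so the memory held by all the
    # instance dicts could be approximated as n_instances * sizeof(vars(inst)).
    n_instances = len(points)
    estimate = n_instances * sys.getsizeof(vars(points[0]))
    print(f"approximate instance-dict memory: {estimate} bytes")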

rhettinger added the 3.7 (EOL) end of life and interpreter-core (Objects, Python, Grammar, and Parser dirs) labels on Oct 22, 2016
@serhiy-storchaka
Member

Isn't this already implemented?

@serhiy-storchaka
Member

>>> import sys
>>> class C:
...     def __init__(self):
...         for i in range(682):
...             setattr(self, 'a%d' % i, None)
... 
>>> sys.getsizeof(C().__dict__) / len(C().__dict__)
4.058651026392962

@zhangyangyu
Member

> Isn't this already implemented?

I have the same question. dict.__sizeof__ can identify shared dicts.

@rhettinger
Contributor Author

> Isn't this already implemented?

No.

    >>> class A:
    ...     pass

    >>> d = dict.fromkeys('abcdefghi')
    >>> a = A()
    >>> a.__dict__.update(d)
    >>> b = A()
    >>> b.__dict__.update(d)
    >>> import sys
    >>> [sys.getsizeof(m) for m in [d, vars(a), vars(b)]]
    [368, 648, 648]
    >>> c = A()
    >>> c.__dict__.update(d)
    >>> [sys.getsizeof(m) for m in [d, vars(a), vars(b), vars(c)]]
    [368, 648, 648, 648]

There is no benefit reported for key-sharing. Even if you make a thousand of these instances, the size reported is the same. Here is the relevant code:

    Py_ssize_t
    _PyDict_SizeOf(PyDictObject *mp)
    {
        Py_ssize_t size, usable, res;
        size = DK_SIZE(mp->ma_keys);
        usable = USABLE_FRACTION(size);
        res = _PyObject_SIZE(Py_TYPE(mp));
        if (mp->ma_values)
            res += usable * sizeof(PyObject*);
        /* If the dictionary is split, the keys portion is accounted-for
           in the type object. */
        if (mp->ma_keys->dk_refcnt == 1)
            res += (sizeof(PyDictKeysObject)
                    - Py_MEMBER_SIZE(PyDictKeysObject, dk_indices)
                    + DK_IXSIZE(mp->ma_keys) * size
                    + sizeof(PyDictKeyEntry) * usable);
        return res;
    }

It looks like the fixed overhead is included for every instance of a split dictionary. Instead, it might make sense to take the fixed overhead and divide it by the number of instances sharing the keys (averaging the overhead across the sharing instances):

     res = _PyObject_SIZE(Py_TYPE(mp)) / num_instances;

Perhaps use ceiling division, so the shared cost is never under-reported (in C, where integer division truncates toward zero, the usual spelling is):

     res = (_PyObject_SIZE(Py_TYPE(mp)) + num_instances - 1) / num_instances;
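
As a rough model of that idea, here is a hypothetical Python sketch (not CPython's actual accounting; the helper name and byte counts are illustrative, and the number of sharing dicts is something the real code could derive from the dk_refcnt field the function above already consults):

    def amortized_split_dict_size(values_size, keys_size, n_sharing):
        """Report each split dict's own values array plus an equal share of
        the keys table it shares with the other instances, rounded up.
        -(-a // b) is ceiling division in Python, since // floors."""
        return values_size + -(-keys_size // n_sharing)

    # Illustrative numbers only: one dict pays for the whole keys table,
    # while a thousand sharing dicts each report only a sliver of it.
    print(amortized_split_dict_size(104, 368, 1))     # 472
    print(amortized_split_dict_size(104, 368, 1000))  # 105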

@serhiy-storchaka
Member

Hmm, it seems no dict here is a shared-key dict.

@rhettinger
Contributor Author

> Hmm, it seems no dict here is a shared-key dict.

Yes, that seems to be the case. Apparently, doing an update() on the instance dict causes it to recombine.
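
For illustration, a minimal sketch of that difference under the behavior observed in this thread (exact sizes, and the point at which a dict gives up key-sharing, vary across CPython versions):

    import sys

    class Shared:
        def __init__(self):
            # Attribute assignment goes through the type's cached keys,
            # so these instance dicts stay split (key-sharing).
            for name in 'abcdefghi':
                setattr(self, name, None)

    class Updated:
        pass

    d = dict.fromkeys('abcdefghi')

    s = Shared()
    u = Updated()
    # As observed above, bulk-updating the instance dict leaves behind an
    # ordinary combined dict that carries its own copy of the keys table.
    u.__dict__.update(d)

    # The split dict should report noticeably less than the combined one.
    print(sys.getsizeof(vars(s)), sys.getsizeof(vars(u)))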

@rhettinger
Contributor Author

>>> from sys import getsizeof
>>> class A:
...     def __init__(self, a, b, c, d, e, f):
...         self.a = a
...         self.b = b
...         self.c = c
...         self.d = d
...         self.e = e
...         self.f = f
... 
>>> a = A(10, 20, 30, 40, 50, 60)
>>> b = A(10, 20, 30, 40, 50, 60)
>>> c = A(10, 20, 30, 40, 50, 60)
>>> d = A(10, 20, 30, 40, 50, 60)
>>> [getsizeof(vars(inst)) for inst in [a, b, c, d]]
[152, 152, 152, 152]
>>> [getsizeof(dict(vars(inst))) for inst in [a, b, c, d]]
[368, 368, 368, 368]

ezio-melotti transferred this issue from another repository on Apr 10, 2022