Need way to expose incremental size of key sharing dicts #72694

Closed

rhettinger opened this issue Oct 22, 2016 · 8 comments
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@rhettinger
Contributor

BPO 28508
Nosy @rhettinger, @serhiy-storchaka, @zhangyangyu

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2016-10-22.23:13:20.420>
created_at = <Date 2016-10-22.17:35:42.643>
labels = ['interpreter-core', '3.7', 'invalid']
title = 'Need way to expose incremental size of key sharing dicts'
updated_at = <Date 2016-10-22.23:15:16.162>
user = 'https://github.com/rhettinger'

bugs.python.org fields:

activity = <Date 2016-10-22.23:15:16.162>
actor = 'rhettinger'
assignee = 'none'
closed = True
closed_date = <Date 2016-10-22.23:13:20.420>
closer = 'rhettinger'
components = ['Interpreter Core']
creation = <Date 2016-10-22.17:35:42.643>
creator = 'rhettinger'
dependencies = []
files = []
hgrepos = []
issue_num = 28508
keywords = []
message_count = 8.0
messages = ['279207', '279208', '279209', '279211', '279227', '279229', '279230', '279231']
nosy_count = 3.0
nosy_names = ['rhettinger', 'serhiy.storchaka', 'xiang.zhang']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue28508'
versions = ['Python 3.7']

@rhettinger
Contributor Author

In many Python programs, much of the memory utilization is due to having many instances of the same class. We have key-sharing dicts that reduce the cost by storing only the incremental values. It would be nice to have visibility into the savings.

One possible way to do this is to have sys.getsizeof(d) report only the incremental space. That would let users make reasonable memory estimates in the form of n_instances * sizeof(vars(inst)).
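
For illustration, a rough sketch of the kind of estimate this would enable (the Point class and instance count here are hypothetical, and the estimate is only meaningful if getsizeof() reports the incremental cost as proposed):

    import sys

    class Point:
        def __init__(self, x, y, z):
            self.x, self.y, self.z = x, y, z

    points = [Point(i, i, i) for i in range(10_000)]

    # Under the proposal, getsizeof() of a key-sharing dict would report only
    # the per-instance (incremental) cost, so the memory held by all the
    # instance dicts could be approximated as n_instances * sizeof(vars(inst)).
    n_instances = len(points)
    estimate = n_instances * sys.getsizeof(vars(points[0]))
    print(f"approximate instance-dict memory: {estimate} bytes")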

rhettinger added the 3.7 (EOL) end of life and interpreter-core (Objects, Python, Grammar, and Parser dirs) labels on Oct 22, 2016
@serhiy-storchaka
Member

Isn't this already implemented?

@serhiy-storchaka
Member

>>> import sys
>>> class C:
...     def __init__(self):
...         for i in range(682):
...             setattr(self, 'a%d' % i, None)
... 
>>> sys.getsizeof(C().__dict__) / len(C().__dict__)
4.058651026392962

@zhangyangyu
Member

> Isn't this already implemented?

I have the same question. dict.__sizeof__ can identify shared dicts.

@rhettinger
Contributor Author

> Isn't this already implemented?

No.

    >>> class A:
    ...     pass

    >>> d = dict.fromkeys('abcdefghi')
    >>> a = A()
    >>> a.__dict__.update(d)
    >>> b = A()
    >>> b.__dict__.update(d)
    >>> import sys
    >>> [sys.getsizeof(m) for m in [d, vars(a), vars(b)]]
    [368, 648, 648]
    >>> c = A()
    >>> c.__dict__.update(d)
    >>> [sys.getsizeof(m) for m in [d, vars(a), vars(b), vars(c)]]
    [368, 648, 648, 648]

There is no benefit reported for key-sharing. Even if you make a thousand of these instances, the size reported is the same. Here is the relevant code:

    Py_ssize_t
    _PyDict_SizeOf(PyDictObject *mp)
    {
        Py_ssize_t size, usable, res;
        size = DK_SIZE(mp->ma_keys);
        usable = USABLE_FRACTION(size);
        res = _PyObject_SIZE(Py_TYPE(mp));
        if (mp->ma_values)
            res += usable * sizeof(PyObject*);
        /* If the dictionary is split, the keys portion is accounted-for
           in the type object. */
        if (mp->ma_keys->dk_refcnt == 1)
            res += (sizeof(PyDictKeysObject)
                    - Py_MEMBER_SIZE(PyDictKeysObject, dk_indices)
                    + DK_IXSIZE(mp->ma_keys) * size
                    + sizeof(PyDictKeyEntry) * usable);
        return res;
    }

It looks like the fixed overhead is included for every instance of a split dictionary. Instead, it might make sense to take the fixed overhead and divide it by the number of instances sharing the keys (averaging the overhead across the sharing instances):

     res = _PyObject_SIZE(Py_TYPE(mp)) / num_instances;

Perhaps use ceiling division, so the shared cost is never under-reported (in C, where integer division truncates toward zero, the usual spelling is):

     res = (_PyObject_SIZE(Py_TYPE(mp)) + num_instances - 1) / num_instances;
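
As a rough model of that idea, here is a hypothetical Python sketch (not CPython's actual accounting; the helper name and byte counts are illustrative, and the number of sharing dicts is something the real code could derive from the dk_refcnt field the function above already consults):

    def amortized_split_dict_size(values_size, keys_size, n_sharing):
        """Report each split dict's own values array plus an equal share of
        the keys table it shares with the other instances, rounded up.
        -(-a // b) is ceiling division in Python, since // floors."""
        return values_size + -(-keys_size // n_sharing)

    # Illustrative numbers only: one dict pays for the whole keys table,
    # while a thousand sharing dicts each report only a sliver of it.
    print(amortized_split_dict_size(104, 368, 1))     # 472
    print(amortized_split_dict_size(104, 368, 1000))  # 105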

@serhiy-storchaka
Member

Hmm, it seems no dict here is a shared-key dict.

@rhettinger
Contributor Author

> Hmm, it seems no dict here is a shared-key dict.

Yes, that seems to be the case. Apparently, doing an update() on the instance dict causes it to recombine.
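
For illustration, a minimal sketch of that difference under the behavior observed in this thread (exact sizes, and the point at which a dict gives up key-sharing, vary across CPython versions):

    import sys

    class Shared:
        def __init__(self):
            # Attribute assignment goes through the type's cached keys,
            # so these instance dicts stay split (key-sharing).
            for name in 'abcdefghi':
                setattr(self, name, None)

    class Updated:
        pass

    d = dict.fromkeys('abcdefghi')

    s = Shared()
    u = Updated()
    # As observed above, bulk-updating the instance dict leaves behind an
    # ordinary combined dict that carries its own copy of the keys table.
    u.__dict__.update(d)

    # The split dict should report noticeably less than the combined one.
    print(sys.getsizeof(vars(s)), sys.getsizeof(vars(u)))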

@rhettinger
Contributor Author

>>> from sys import getsizeof
>>> class A:
...     def __init__(self, a, b, c, d, e, f):
...         self.a = a
...         self.b = b
...         self.c = c
...         self.d = d
...         self.e = e
...         self.f = f
... 
>>> a = A(10, 20, 30, 40, 50, 60)
>>> b = A(10, 20, 30, 40, 50, 60)
>>> c = A(10, 20, 30, 40, 50, 60)
>>> d = A(10, 20, 30, 40, 50, 60)
>>> [getsizeof(vars(inst)) for inst in [a, b, c, d]]
[152, 152, 152, 152]
>>> [getsizeof(dict(vars(inst))) for inst in [a, b, c, d]]
[368, 368, 368, 368]

ezio-melotti transferred this issue from another repository on Apr 10, 2022