-
-
Notifications
You must be signed in to change notification settings - Fork 31.6k
sizeof set after set_merge() is doubled from 3.5 #74135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
(original thread is https://mail.python.org/pipermail/python-list/2017-March/720391.html) this commit doubles sizeof set object created by set_merge(). $ /usr/bin/python3
Python 3.5.2+ (default, Sep 22 2016, 12:18:14)
[GCC 6.2.0 20160927] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> s = set(range(10))
>>> sys.getsizeof(frozenset(s))
736 $ python3
Python 3.6.0 (default, Dec 30 2016, 20:49:54)
[GCC 6.2.0 20161005] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> s = set(range(10))
>>> sys.getsizeof(frozenset(s))
1248 |
https://gist.github.com/methane/8faf12621cdb2166019bbcee65987e99 But I think it is still larger than idiomatic. See this code: code: When original minused is X, newsize will be about 2X ~ 4X. For set.add(), preserving extra space for further add() make sense. How do you think, Raymond? |
I'll look at this over the next week or two. I don't really like the proposed patch at all but will carefully think through the speed/space trade-offs. |
Hmm, I wonder why I'm not seeing the same sizes you are seeing. $ cat setsize.py
from sys import getsizeof
print( [getsizeof(frozenset(range(n))) for n in range(20)] )
$ python3.4 setsize.py
[224, 224, 224, 224, 224, 224, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736]
$ python3.5 setsize.py
[224, 224, 224, 224, 224, 224, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736]
$ python3.6 setsize.py
[224, 224, 224, 224, 224, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736, 736]
$ python3.4 --version
Python 3.4.4
$ python3.5 --version
Python 3.5.3
$ python3.6 --version
Python 3.6.1 |
See set_update_internal(). This happens only when iterable is set or dict. >>> import sys
>>> sys.getsizeof(set(range(10)))
736
>>> sys.getsizeof(set(set(range(10))))
1248
>>> sys.getsizeof(set(dict.fromkeys(range(10))))
1248 |
$ ./python -c 'N = 6000; from sys import getsizeof; s = [getsizeof(frozenset(range(n))) for n in range(N)]; print( [(n, s[n]) for n in range(N) if not n or s[n] != s[n-1]] )' 3.5: [(0, 112), (6, 368), (22, 1136), (86, 4208), (342, 16496), (1366, 65648), (5462, 262256)] $ ./python -c 'N = 6000; from sys import getsizeof; s = [getsizeof(frozenset(set(range(n)))) for n in range(N)]; print( [(n, s[n]) for n in range(N) if not n or s[n] != s[n-1]] )' 3.5: [(0, 112), (6, 240), (8, 368), (16, 624), (32, 1136), (64, 2160), (128, 4208), (256, 8304), (512, 16496), (1024, 32880), (2048, 65648), (4096, 131184)] frozenset(range(n)) is slightly larger in 3.7 than in 3.6. It is 4 times larger for about 10% of sizes. frozenset(set(range(n))) is 2 times larger in 3.6 than in 3.5 for all sizes >= 16. |
This is intensional: load factor is reduced from 66% to 60%. (-10%) |
I think the best thing to do is to undo the refactoring in 4897300 . It is was intended be neutral but did affect set_update_internal() for small sets. |
I agree. Before thinking about rebalance between size and speed, |
Do you want to prepare a PR for me? I not yet set-up with the ways of Github. Please limit the PR to just unwinding the refactoring in the simplest possible way.. If in the future you want to chat about speed/space trade-offs, we can do that offline. I've spent years thinking about this, interacting with users, discussing with other devs, speaking on the topic, and working through use cases for sets. The original reasons for the choices made largely are still true today. I would be happy to walk you through the history (the tracker isn't a good place to do this, IRC would serve us better). |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: