-
-
Notifications
You must be signed in to change notification settings - Fork 31.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve list() pre-sizing for inputs with known lengths #77415
Comments
The list() constructor isn't taking full advantage of known input lengths or length hints. Ideally, it should pre-size and not over-allocate when the input size is known or can be reasonably estimated. Python 3.8.0a0 (heads/master:091e95e900, Apr 5 2018, 09:48:33)
[Clang 9.1.0 (clang-902.0.39.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from sys import getsizeof
>>> getsizeof([0] * 10)
144
>>> getsizeof(list([0] * 10))
200
>>> getsizeof(list(range(10)))
200 |
Calling PyObject_LengthHint() adds an overhead. It is significant for short sequences. I work on a patch that reduces it. PR 6493 adds the second call of PyObject_LengthHint() and increases the overhead. As for this issue, in-place repeat overallocates too. >>> a = [0]; a *= 10; getsizeof(a)
200 I think it would be better to make it not preallocating. And maybe it would be worth to avoid overallocating if newsize > allocated + allocated/8 or something like. |
Microbenchmarked with @vstinner's perf module: python -m perf timeit "list([0]*10)" -o new.json checkout master and build python -m perf timeit "list([0]*10)" -o old.json |
See also http://bugs.python.org/issue26814#msg264003 and bpo-26828. |
Should we shrink the list of the length hint was way too big? For example, if the length hint was 100 but the final list of only 10. Should we shrink in that case? I'm asking because it seems like Pablo's PR doesn't shrink the list in that case. I wasn't sure so I checked, del shrinks the list if needed: >>> x=list(range(1000))
>>> import sys
>>> sys.getsizeof(x)
9112
>>> del x[1:]
>>> sys.getsizeof(x)
96 |
@serhiy.storchaka @rhettinger @vstinner Should we better make the pre-allocation if the length of the iterable is known (so we call PyObject_Length and not PyObject_LengthHint)? This will keep the logic simpler (so not shrinking if PyObject_LengthHint gives more than the real length) and it will not be as expensive as calling PyObject_LengthHint. |
Oh. Can you please document the optimization in the following doc as well? |
Sure! Opened #10200 to address that. :) |
Possible backward incompatibility caused by this issue: bpo-39829 |
See also bpo-14126: "Speed up list comprehensions by preallocating the list where possible". |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: