bpo-36229: Avoid unnecessary copies for list, set, and bytearray ops. #12226
brandtbucher wants to merge 5 commits into python:master
If a list, set, or bytearray object's refcount is exactly one, binary operations delegate to their in-place counterparts, rather than creating expensive copies.
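The effect described above can be approximated in pure Python. This is only a sketch of the observable behaviour, not the C implementation in the patch: `add_all` is a hypothetical helper that copies the first operand once and then extends that private copy in place, which is what a chained `a + b + c` would amount to if each intermediate result (which nothing else references) were reused instead of copied.

```python
def add_all(first, *rest):
    """Emulate `first + rest[0] + rest[1] + ...` using a single copy.

    Hypothetical helper for illustration only; it is not part of the patch.
    """
    result = list(first)   # the one unavoidable copy: `first` may be referenced elsewhere
    for item in rest:
        result += item     # in-place extend of our private intermediate
    return result


a, b, c = [1, 2], [3], [4, 5]
combined = add_all(a, b, c)
```

None of the operands is mutated, because only the private intermediate is ever extended. One difference from real `+`: `list +=` accepts any iterable, so this sketch is less strict about operand types than list concatenation is.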
This removes the need for separate declarations of in-place functions.
This still tests that the list constructor efficiently allocates space... it just might not be *exactly* the same size as the given example, as was implied before.
...feels pretty cool.
This will only work when there is no other reference to the object like Except for creating test cases or large lists of constants, does this happen a lot? I would think most of the time the operands come from arguments or a local variable and this optimization would not be used, and most of the time Did you see an improvement for real use cases?
@remilapeyre: beyond the first element, it's irrelevant whether the lists are bound to names or how many references they have. Currently, when adding
Only the second of these is actually returned; the first is used once and thrown away. The effect of this patch is to create at most one copy for any arbitrarily long list summation, as this intermediate result will just mutate in place for lists 3- Anyone who's been disappointed by the quadratic complexity of
/* It's tempting to use PyNumber_InPlaceAdd instead of
   PyNumber_Add here, to avoid quadratic running time
   when doing 'sum(list_of_lists, [])'. However, this
   would produce a change in behaviour: a snippet like

     empty = []
     sum([[x] for x in range(10)], empty)

   would change the value of empty. */
temp = PyNumber_Add(result, item);
With the patch,
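The behaviour change that the quoted C comment warns about can be reproduced from Python. The sketch below contrasts `sum`, which uses ordinary addition and leaves its start value alone, with an in-place reduction built from `functools.reduce` and `operator.iadd`; the two function names are hypothetical and chosen only for this illustration.

```python
import functools
import operator


def flatten_with_add(lists, start):
    # sum() uses ordinary addition, so `start` is never mutated.
    return sum(lists, start)


def flatten_with_iadd(lists, start):
    # In-place addition extends `start` itself and returns that same object.
    return functools.reduce(operator.iadd, lists, start)


empty = []
flat = flatten_with_add([[x] for x in range(10)], empty)
# `empty` is still [] here; with flatten_with_iadd it would now hold all ten items.
```

This is why an optimization along these lines has to be limited to intermediates that nothing else can observe: the caller's `empty` must keep its value.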
@brandtbucher I get it now, thanks for explaining. This seems like a very nice trick. Isn't the idiomatic way to do I made some measurements and with your improvement Without your change
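The idiom referred to in the comment above did not survive extraction. One commonly used linear-time way to flatten a list of lists, shown here purely as an assumption about the kind of alternative meant, is `itertools.chain.from_iterable`; `flatten` is a hypothetical name.

```python
from itertools import chain


def flatten(list_of_lists):
    # Builds the result in one pass and never creates intermediate lists.
    return list(chain.from_iterable(list_of_lists))


nested = [[x] for x in range(10)]
flat = flatten(nested)
```

It produces the same list as `sum(nested, [])` without depending on how list addition is implemented.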
The relevant changes are all in the first commit, which has a cleaner diff. The second commit reorganized the affected functions to avoid separate declarations... which sort of mangled it. The additions are actually quite small (about 30 lines of code, total).
https://bugs.python.org/issue36229