a suboptimal check in list_extend() #75330

orenmn · 2017-08-08T15:30:36Z

BPO	31147
Nosy	@rhettinger, @orenmn

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2017-08-13.05:12:20.432>
created_at = <Date 2017-08-08.15:30:36.122>
labels = ['interpreter-core', '3.7', 'performance']
title = 'a suboptimal check in list_extend()'
updated_at = <Date 2017-08-13.19:14:51.449>
user = 'https://github.com/orenmn'

bugs.python.org fields:

activity = <Date 2017-08-13.19:14:51.449>
actor = 'Oren Milman'
assignee = 'none'
closed = True
closed_date = <Date 2017-08-13.05:12:20.432>
closer = 'rhettinger'
components = ['Interpreter Core']
creation = <Date 2017-08-08.15:30:36.122>
creator = 'Oren Milman'
dependencies = []
files = []
hgrepos = []
issue_num = 31147
keywords = []
message_count = 4.0
messages = ['299933', '299967', '300211', '300229']
nosy_count = 2.0
nosy_names = ['rhettinger', 'Oren Milman']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue31147'
versions = ['Python 3.7']

orenmn · 2017-08-08T15:30:36Z

in listobject.c, in case list_extend() receives an 'iterable' which isn't 'self'
nor a tuple nor a list, we have the following (heavily edited for brevity):
mn = Py_SIZE(self) + PyObject_LengthHint(iterable);
list_resize(self, mn);

... // self is extended to also include the elements of 'iterable'.

    // (*)
    if (Py_SIZE(self) < self->allocated) {
        list_resize(self, Py_SIZE(self));
    }

IMHO, the condition in (*) is mostly useless, for two reasons:

the list_resize() which is called in () does nothing in case
(Py_SIZE(self) >= (self->allocated >> 1)) is true. In particular, this call
to list_resize() would have done nothing if it had been called while the
condition in () was false.
the condition in (*) is false only in the following cases:
- list_resize(self, mn) caused
  (self->allocated == Py_SIZE(self) + actual_length_of_iterable) to be
  true. e.g.:
  Py_SIZE(self) = 58 and PyObject_LengthHint(iterable) == 8 and
  actual_length_of_iterable == 22
  (because 66 + 66 // 8 + 6 == 80 == 58 + 22).
- list_resize(self, mn) caused
  (self->allocated < Py_SIZE(self) + actual_length_of_iterable), which
  sometime later caused list_extend() to call app1(), which called
  list_resize(), which caused
  (self->allocated == Py_SIZE(self) + actual_length_of_iterable) to be true.
  e.g.:
  Py_SIZE(self) == 58 and PyObject_LengthHint(iterable) == 8 and
  actual_length_of_iterable == 39
  (because 66 + 66 // 8 + 6 == 80 and
  81 + 81 // 8 + 6 == 97 == 58 + 39)
so ISTM that the condition is only rarely false, especially as
PyObject_LengthHint(iterable) >= actual_length_of_iterable is usually true
(i guess).

Thus, i believe we should either change the condition in (*) to
(Py_SIZE(self) < (self->allocated >> 1)), or remove it (and always call
list_resize()).

(note that i ignored error cases, as ISTM they are quite irrelevant to the
issue.)

rhettinger · 2017-08-09T04:09:30Z

-0 The test "Py_SIZE(self) < self->allocated" is very cheap so I don't mind leaving it in to handle the rare cases (or to prevent future bugs if the list_resize logic ever changes).

rhettinger · 2017-08-13T05:12:19Z

I'm concerned that is would provide near zero benefit but would set us up for future bugs by tightly coupling two ideas that I would like to be able to reason about separately.

The logic for list.extend() makes a potential overallocation, fills the list, and fixes-up the overallocation if needed (I want that logic should all stay together). And list_resize() should have its own separate logic as well.

For a course grained operation like extend(), I really don't mind having a code path that makes a relatively cheap test that only occasionally useful.

So, thank you for the suggestion, but I'm going to decline on the grounds that 1) the current code is sometimes useful, 2) it is always cheap, 3) that removing it is odds with the principles of loose coupling and high cohesion, 4) it creates a risk of introducing unnoticed errors in the future the list_resize logic were to change, and 5) no one else has stepped forward to advocate this patch.

orenmn · 2017-08-13T19:14:51Z

thank you for the elaborate reply :)

do you feel the same about changing the check to
(Py_SIZE(self) < (self->allocated >> 1)) ?

orenmn mannequin added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Aug 8, 2017

orenmn mannequin changed the title ~~a mostly useless check in list_extend()~~ a suboptimal check in list_extend() Aug 12, 2017

rhettinger closed this as completed Aug 13, 2017

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

a suboptimal check in list_extend() #75330

a suboptimal check in list_extend() #75330

orenmn mannequin commented Aug 8, 2017

orenmn mannequin commented Aug 8, 2017

rhettinger commented Aug 9, 2017

rhettinger commented Aug 13, 2017

orenmn mannequin commented Aug 13, 2017

a suboptimal check in list_extend() #75330

a suboptimal check in list_extend() #75330

Comments

orenmn mannequin commented Aug 8, 2017

orenmn mannequin commented Aug 8, 2017

rhettinger commented Aug 9, 2017

rhettinger commented Aug 13, 2017

orenmn mannequin commented Aug 13, 2017