Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

a suboptimal check in list_extend() #75330

Closed
orenmn mannequin opened this issue Aug 8, 2017 · 4 comments
Closed

a suboptimal check in list_extend() #75330

orenmn mannequin opened this issue Aug 8, 2017 · 4 comments
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@orenmn
Copy link
Mannequin

orenmn mannequin commented Aug 8, 2017

BPO 31147
Nosy @rhettinger, @orenmn

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2017-08-13.05:12:20.432>
created_at = <Date 2017-08-08.15:30:36.122>
labels = ['interpreter-core', '3.7', 'performance']
title = 'a suboptimal check in list_extend()'
updated_at = <Date 2017-08-13.19:14:51.449>
user = 'https://github.com/orenmn'

bugs.python.org fields:

activity = <Date 2017-08-13.19:14:51.449>
actor = 'Oren Milman'
assignee = 'none'
closed = True
closed_date = <Date 2017-08-13.05:12:20.432>
closer = 'rhettinger'
components = ['Interpreter Core']
creation = <Date 2017-08-08.15:30:36.122>
creator = 'Oren Milman'
dependencies = []
files = []
hgrepos = []
issue_num = 31147
keywords = []
message_count = 4.0
messages = ['299933', '299967', '300211', '300229']
nosy_count = 2.0
nosy_names = ['rhettinger', 'Oren Milman']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue31147'
versions = ['Python 3.7']

@orenmn
Copy link
Mannequin Author

orenmn mannequin commented Aug 8, 2017

in listobject.c, in case list_extend() receives an 'iterable' which isn't 'self'
nor a tuple nor a list, we have the following (heavily edited for brevity):
mn = Py_SIZE(self) + PyObject_LengthHint(iterable);
list_resize(self, mn);

... // self is extended to also include the elements of 'iterable'.
    // (*)
    if (Py_SIZE(self) < self->allocated) {
        list_resize(self, Py_SIZE(self));
    }

IMHO, the condition in (*) is mostly useless, for two reasons:

  1. the list_resize() which is called in () does nothing in case
    (Py_SIZE(self) >= (self->allocated >> 1)) is true. In particular, this call
    to list_resize() would have done nothing if it had been called while the
    condition in (
    ) was false.

  2. the condition in (*) is false only in the following cases:

    • list_resize(self, mn) caused
      (self->allocated == Py_SIZE(self) + actual_length_of_iterable) to be
      true. e.g.:
      Py_SIZE(self) = 58 and PyObject_LengthHint(iterable) == 8 and
      actual_length_of_iterable == 22
      (because 66 + 66 // 8 + 6 == 80 == 58 + 22).
    • list_resize(self, mn) caused
      (self->allocated < Py_SIZE(self) + actual_length_of_iterable), which
      sometime later caused list_extend() to call app1(), which called
      list_resize(), which caused
      (self->allocated == Py_SIZE(self) + actual_length_of_iterable) to be true.
      e.g.:
      Py_SIZE(self) == 58 and PyObject_LengthHint(iterable) == 8 and
      actual_length_of_iterable == 39
      (because 66 + 66 // 8 + 6 == 80 and
      81 + 81 // 8 + 6 == 97 == 58 + 39)

    so ISTM that the condition is only rarely false, especially as
    PyObject_LengthHint(iterable) >= actual_length_of_iterable is usually true
    (i guess).

Thus, i believe we should either change the condition in (*) to
(Py_SIZE(self) < (self->allocated >> 1)), or remove it (and always call
list_resize()).

(note that i ignored error cases, as ISTM they are quite irrelevant to the
issue.)

@orenmn orenmn mannequin added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Aug 8, 2017
@rhettinger
Copy link
Contributor

-0 The test "Py_SIZE(self) < self->allocated" is very cheap so I don't mind leaving it in to handle the rare cases (or to prevent future bugs if the list_resize logic ever changes).

@orenmn orenmn mannequin changed the title a mostly useless check in list_extend() a suboptimal check in list_extend() Aug 12, 2017
@rhettinger
Copy link
Contributor

I'm concerned that is would provide near zero benefit but would set us up for future bugs by tightly coupling two ideas that I would like to be able to reason about separately.

The logic for list.extend() makes a potential overallocation, fills the list, and fixes-up the overallocation if needed (I want that logic should all stay together). And list_resize() should have its own separate logic as well.

For a course grained operation like extend(), I really don't mind having a code path that makes a relatively cheap test that only occasionally useful.

So, thank you for the suggestion, but I'm going to decline on the grounds that 1) the current code is sometimes useful, 2) it is always cheap, 3) that removing it is odds with the principles of loose coupling and high cohesion, 4) it creates a risk of introducing unnoticed errors in the future the list_resize logic were to change, and 5) no one else has stepped forward to advocate this patch.

@orenmn
Copy link
Mannequin Author

orenmn mannequin commented Aug 13, 2017

thank you for the elaborate reply :)

do you feel the same about changing the check to
(Py_SIZE(self) < (self->allocated >> 1)) ?

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
Projects
None yet
Development

No branches or pull requests

1 participant