
Relax list.sort()'s notion of "descending" runs #116554

Closed
tim-one opened this issue Mar 10, 2024 · 7 comments
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@tim-one
Member

tim-one commented Mar 10, 2024

Feature or enhancement

Bug description:

A question on StackOverflow got me thinking about "the other end" of list.sort(): the start, with count_run().

https://stackoverflow.com/questions/78108792/

That descending runs have to be strictly descending was always annoying, but since we only have __lt__ to work with, it seemed too expensive to check for equality. As comments on the cited question say, if we could tell when adjacent elements are equal, then runs of equal elements encountered during a descending run could be reversed in-place on the fly, and then their original order would be restored by a reverse-the-whole-run at the end.
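A tiny demo of that stability argument (illustration only; the tuples just tag equal keys with their original positions):

# A "descending" run of keys c > b == b == b > a, position-tagged.
run = [('c', 0), ('b', 0), ('b', 1), ('b', 2), ('a', 0)]

run[1:4] = run[1:4][::-1]  # reverse the equal 'b' block on the fly
run.reverse()              # reverse the whole run at the end

print(run)  # [('a', 0), ('b', 0), ('b', 1), ('b', 2), ('c', 0)] - the b's end up in original order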

Another minor annoyance is that, e.g., [3, 2, 1, 3, 4, 5] gives up after finding the [3, 2, 1] prefix. But after a descending run is reversed, it's all but free to go on to see whether it can be extended by a luckily ascending run immediately following.

So the idea is to make count_run() smarter, and have it do all reversing by itself.

A Python prototype convinced me this can be done without slowing the processing of ascending runs (*), and even on randomized data it creates runs about 20% longer on average.

Which doesn't much matter on its own: on randomized data, all runs are almost certain to be artificially boosted to length MINRUN anyway. But starting with longer natural runs would still have a beneficial effect, by cutting the number of searches the binary insertion sort has to do.

Not a major thing regardless. The wins would be large on various new cases of high partial order, and there's some brain relief in not having to explain the difference between the definitions of "ascending" and "descending" by appeal to implementation details.

(*) Consider [100] * 1_000_000 + [2]. The current code does a million compares to detect that this starts with a million-element ascending run, followed by a drop. The whole thing would be a "descending run" under the new definition, but just not worth it if it required 2 million compares to detect the long run of equalities.

But, happily, the current code for that doesn't need to change at all. After the million compares, it knows

a[0] <= a[1] <= ... <= a[-2] > a[-1]

At that point, it only needs one more compare, "is a[0] < a[-2]?". If so, we're done: at some point (& we don't care where) the sequence increased, so this cannot be a descending run. But if not, the prefix both starts and ends with 100, so all elements in the prefix must be equal: this is a descending run. We reverse the first million elements in-place, extend the right end to include the trailing 2, then finally reverse the whole thing in-place.

Yes, that final result could be gotten with about half the stores using a suitable in-place array-rotation algorithm instead, but that's out of scope for this issue report. The approach sketched works no matter how many distinct blocks of all-equal runs are encountered.
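To make the whole scheme concrete, here is a minimal pure-Python model of the relaxed count_run() (an illustrative sketch only: the name and arguments mimic listobject.c's count_run(), but the body is mine, not the prototype or the eventual C code). It uses only <, pre-reverses each equal block met during a descending scan, applies the one-extra-compare all-equal test to a non-decreasing prefix, and finishes by trying to extend a reversed run with an adjacent ascending run:

def count_run(a, lo, hi):
    """Length of the run starting at a[lo] (a[lo:hi] must be non-empty).
    Descending runs are reversed in place here, so the caller always
    receives a non-decreasing prefix.  Only __lt__ is used, as in CPython.
    """
    i = lo + 1
    if i == hi:
        return 1
    if not (a[i] < a[i - 1]):
        # Non-decreasing scan, as the current code does it.
        i += 1
        while i < hi and not (a[i] < a[i - 1]):
            i += 1
        if i == hi or a[lo] < a[i - 1]:
            return i - lo  # rose somewhere (or ran out): an ascending run
        # The one extra compare: the prefix starts and ends with equal
        # values, so it's all equal, and a[i] drops below it.  Treat it
        # as a descending run in progress; pre-reverse the equal prefix
        # so the final whole-run reversal preserves stability.
        a[lo:i] = a[lo:i][::-1]
    i += 1
    # Descending scan: strict drops extend the run; blocks of equal
    # elements are absorbed and pre-reversed on the fly.
    while i < hi:
        if a[i] < a[i - 1]:
            i += 1
        elif a[i - 1] < a[i]:
            break  # a genuine rise ends the descending run
        else:
            start = i - 1  # a[i] equals a[i-1]: absorb the equal block
            i += 1
            while i < hi and not (a[i] < a[i - 1]) and not (a[i - 1] < a[i]):
                i += 1
            a[start:i] = a[start:i][::-1]
            if i < hi and a[i - 1] < a[i]:
                break
    a[lo:i] = a[lo:i][::-1]  # make the run ascending
    # All-but-free bonus: extend with an adjacent ascending run, if any.
    while i < hi and not (a[i] < a[i - 1]):
        i += 1
    return i - lo

For example, on the [3, 2, 1, 3, 4, 5] case above with an equal pair thrown in:

a = [3, 2, 2, 1, 3, 4, 5]
print(count_run(a, 0, len(a)), a)  # 7 [1, 2, 2, 3, 3, 4, 5] - one run covers it all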

CPython versions tested on:

3.12

Operating systems tested on:

No response

Linked PRs

@tim-one tim-one added type-feature A feature request or enhancement performance Performance or resource usage stdlib Python modules in the Lib dir labels Mar 10, 2024
@tim-one tim-one changed the title Relax lsit.sort()'s notion of "descending" runs Relax list.sort()'s notion of "descending" runs Mar 10, 2024
@tim-one
Member Author

tim-one commented Mar 10, 2024

But, happily, the current code for that doesn't need to change at all. After the million compares, it knows

a[0] <= a[1] <= ... <= a[-2] > a[-1]

At that point, it only needs one more compare, "is a[0] < a[-2]?".

Alas, "one more" is huge in this context. We currently get out of count_run()'s ascending-sequence search as soon as a drop is found. And on randomized data, we've usually done only 3 compares at that point (having found an ascending run of length just 2). So "only one more" is a large percentage increase. Without the extra compare, we can't know whether the prefix is, or isn't, all equal. We only know that it's non-decreasing.

Offhand, I don't see an elegant way to worm around that. However, a number of hacks come to mind ☹️

@pochmann3
Contributor

pochmann3 commented Mar 10, 2024

And on randomized data, we've usually done only 3 compares at that point (having found an ascending run of length just 2). So "only one more" is a large percentage increase.

Is it? If I'm not mistaken, random data currently takes ~120 comparisons for a minrun of 32, and ~300 comparisons for a minrun of 64. One more seems relatively insignificant.

Test code and results

Adapted from here.

import random

class Int:
    """Wrapper that counts every __lt__ call made on it."""
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        global comparisons
        comparisons += 1
        return self.value < other.value

for n in range(32, 65):
    a = random.sample(range(n), n)  # a random permutation of 0..n-1
    comparisons = 0
    sorted(map(Int, a))             # sort a fresh list of counting wrappers
    print(f'{n=}  =>  {comparisons} comparisons')


Sample output:

n=32  =>  121 comparisons
n=33  =>  127 comparisons
n=34  =>  134 comparisons
n=35  =>  131 comparisons
n=36  =>  140 comparisons
n=37  =>  145 comparisons
n=38  =>  155 comparisons
n=39  =>  154 comparisons
n=40  =>  159 comparisons
n=41  =>  162 comparisons
n=42  =>  174 comparisons
n=43  =>  176 comparisons
n=44  =>  185 comparisons
n=45  =>  189 comparisons
n=46  =>  195 comparisons
n=47  =>  200 comparisons
n=48  =>  207 comparisons
n=49  =>  210 comparisons
n=50  =>  214 comparisons
n=51  =>  220 comparisons
n=52  =>  228 comparisons
n=53  =>  235 comparisons
n=54  =>  245 comparisons
n=55  =>  249 comparisons
n=56  =>  256 comparisons
n=57  =>  254 comparisons
n=58  =>  264 comparisons
n=59  =>  265 comparisons
n=60  =>  274 comparisons
n=61  =>  283 comparisons
n=62  =>  288 comparisons
n=63  =>  298 comparisons
n=64  =>  307 comparisons

@tim-one
Member Author

tim-one commented Mar 10, 2024

I'm not sorting at all - I'm just counting comparisons done by my pure-Python count_run() prototypes. So the percentages I'm talking about are with respect to a fraction of the total comparisons done by a sort.

Back o' the envelope: assuming a pathetic average run length of 2, count_run() will be applied 16 times to a block of 32 elements, to find the 32/2 = 16 runs. That does about 16*2 = 32 compares (more generally, for a run of length r, r-1 comparisons to establish that the r elements are "in order", and another to establish that the run has ended). If I add another on each call, that's up to 48: a large percentage increase (50%), which, on the face of it, isn't delightful 😉.
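The same estimate in code (toy numbers only, to show the arithmetic):

# Assumed: a 32-element block and an average natural run length of 2.
block, run_len = 32, 2
calls = block // run_len   # 16 count_run() calls to cover the block
now = calls * run_len      # (r-1) + 1 = r compares per call -> 32
new = now + calls          # one extra compare per call -> 48
print(now, new, f"+{(new - now) / now:.0%}")  # 32 48 +50%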

In the full context of sorting, it may not matter. Or it may.

OTOH, it appears I'm getting about a 20% increase from the other gimmick all by itself: leave the definition of descending alone, and just go on to see whether a reversed descending run just happens to bump into a run that's ascending with respect to it.

That also seems surprisingly effective at increasing the average run length found. But about 40% of the time it's tried, the compare it does says "nope! the descending run you found here cannot be extended even after reversing it", and so that compare was wasted effort.

@tim-one
Member Author

tim-one commented Mar 10, 2024

OTOH ... when count_run() does find a short run (such as length 2), the main loop immediately forces it to length MINRUN, and count_run() will never be called on the elements absorbed by that. For randomized data, count_run() is essentially called ceiling(len(list) / MINRUN) times. Adding about len(list) / MINRUN extra compares in the worst case isn't really scary - but neither is it a priori attractive 😉.

@pochmann3
Contributor

That's what I meant. Was wondering why you looked at count_runs that wouldn't happen :-)

@tim-one
Member Author

tim-one commented Mar 10, 2024

I'm afraid it's a persistent blind spot for me: I never liked forcing runs to MINRUN - felt like a hack, a failure to find a more elegant approach - so I tend to forget that's what the code actually does 😄

tim-one added a commit to tim-one/cpython that referenced this issue Mar 10, 2024
@tim-one tim-one closed this as completed Mar 13, 2024
tim-one added a commit to tim-one/cpython that referenced this issue Mar 13, 2024
…hon#116578)

* pythonGH-116554: Relax list.sort()'s notion of "descending" run

Rewrote `count_run()` so that sub-runs of equal elements no longer end a descending run. Both ascending and descending runs can have arbitrarily many sub-runs of arbitrarily many equal elements now. This is tricky, because we only use ``<`` comparisons, so checking for equality doesn't come "for free". Surprisingly, it turned out there's a very cheap (one comparison) way to determine whether an ascending run consisted of all-equal elements. That sealed the deal.

In addition, after a descending run is reversed in-place, we now go on to see whether it can be extended by an ascending run that just happens to be adjacent. This succeeds in finding at least one additional element to append about half the time, and so appears to more than repay its cost (the savings come from getting to skip a binary search, when a short run is artificially forced to length MINRUN later, for each new element `count_run()` can add to the initial run).

While these have been in the back of my mind for years, a question on StackOverflow pushed it to action:

https://stackoverflow.com/questions/78108792/

They were wondering why it took about 4x longer to sort a list like:

[999_999, 999_999, ..., 2, 2, 1, 1, 0, 0]

than "similar" lists. Of course that runs very much faster after this patch.

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
@pochmann3
Contributor

pochmann3 commented Mar 13, 2024

The new code, in the descending phase, checks for equality eagerly: a failed IF_NEXT_SMALLER is immediately followed by IF_NEXT_LARGER. For example, ([1] + [0] * 10**6).sort() previously took 1000234 comparisons and now takes 1999999. Is that ok?

I haven't fully thought it through, but can we delay the equality check there as well? That is, again keep going as long as you have <=, and at the end compare the first (of that tail of potential equals) and the last. If first >= last, then reverse that run of equals and keep going. If first < last, then output both the descending run (after reversing it) and the ascending run (so count_run would have to output two runs, and that would need to be handled - or I guess it could merge them and just output one).
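A speculative pure-Python sketch of that lazy variant (my reading of the suggestion, nothing from the actual patch; the two-element return value is just one way the "output two runs" problem could be handled, and the ascending phase is left alone):

def count_run_lazy(a, lo, hi):
    """Sketch of the delayed equality check (not the patched code).
    Returns (n1, n2): a[lo:lo+n1] is left ascending in place; if n2 != 0,
    a second, already-ascending run of length n2 follows immediately.
    """
    i = lo + 1
    if i == hi:
        return 1, 0
    if not (a[i] < a[i - 1]):
        # Ascending phase, unchanged.
        i += 1
        while i < hi and not (a[i] < a[i - 1]):
            i += 1
        return i - lo, 0
    i += 1
    while i < hi:
        if a[i] < a[i - 1]:
            i += 1
            continue
        # Strict descent just failed.  Swallow the whole non-decreasing
        # tail *before* testing for equality, so a long run of equals
        # still costs about one compare per element.
        start = i - 1      # last element of the strict descent
        i += 1
        while i < hi and not (a[i] < a[i - 1]):
            i += 1
        if a[start] < a[i - 1]:
            # The tail truly rose somewhere: close the descending run at
            # a[start] and report the rest of the tail as a second run.
            a[lo:start + 1] = a[lo:start + 1][::-1]
            return start + 1 - lo, i - start - 1
        # One compare proved the tail is all equal: pre-reverse it (for
        # stability) and let the descending scan continue.
        a[start:i] = a[start:i][::-1]
    a[lo:i] = a[lo:i][::-1]
    return i - lo, 0

On the ([1] + [0] * 10**6) example above, this does about a million compares rather than roughly two million.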

vstinner pushed a commit to vstinner/cpython that referenced this issue Mar 20, 2024
adorilson pushed a commit to adorilson/cpython that referenced this issue Mar 25, 2024
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024