
Relax list.sort()'s notion of "descending" runs #116554

Closed
tim-one opened this issue Mar 10, 2024 · 7 comments
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@tim-one
Member

tim-one commented Mar 10, 2024

Feature or enhancement

Bug description:

A question on StackOverflow got me thinking about "the other end" of list.sort(): the start, with count_run().

https://stackoverflow.com/questions/78108792/

That descending runs have to be strictly descending was always annoying, but since we only have __lt__ to work with, it seemed too expensive to check for equality. As comments on the cited question say, if we could tell when adjacent elements are equal, then runs of equal elements encountered during a descending run could be reversed in-place on the fly, and then their original order would be restored by a reverse-the-whole-run at the end.
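A tiny demo of that stability argument (illustration only; the tuples just tag equal keys with their original positions):

# A "descending" run of keys c > b == b == b > a, position-tagged.
run = [('c', 0), ('b', 0), ('b', 1), ('b', 2), ('a', 0)]

run[1:4] = run[1:4][::-1]  # reverse the equal 'b' block on the fly
run.reverse()              # reverse the whole run at the end

print(run)  # [('a', 0), ('b', 0), ('b', 1), ('b', 2), ('c', 0)] - the b's end up in original order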

Another minor annoyance is that, e.g., [3, 2, 1, 3, 4, 5] gives up after finding the [3, 2, 1] prefix. But after a descending run is reversed, it's all but free to go on to see whether it can be extended by a luckily ascending run immediately following.

So the idea is to make count_run() smarter, and have it do all reversing by itself.

A Python prototype convinced me this can be done without slowing the processing of ascending runs (*), and even on randomized data it creates runs about 20% longer on average.

Which doesn't much matter on its own: on randomized data, all runs are almost certain to be artificially boosted to length MINRUN anyway. But starting with longer natural runs would still have a beneficial effect, by cutting the number of searches the binary insertion sort has to do.

Not a major thing regardless. The wins would be large on various new cases of high partial order, and there's some brain relief in not having to explain the difference between the definitions of "ascending" and "descending" by appeal to implementation details.

(*) Consider [100] * 1_000_000 + [2]. The current code does a million compares to detect that this starts with a million-element ascending run, followed by a drop. The whole thing would be a "descending run" under the new definition, but just not worth it if it required 2 million compares to detect the long run of equalities.

But, happily, the current code for that doesn't need to change at all. After the million compares, it knows

a[0] <= a[1] <= ... <= a[-2] > a[-1]

At that point, it only needs one more compare, "is a[0] < a[-2]?". If so, we're done: at some point (& we don't care where) the sequence increased, so this cannot be a descending run. But if not, the prefix both starts and ends with 100, so all elements in the prefix must be equal: this is a descending run. We reverse the first million elements in-place, extend the right end to include the trailing 2, then finally reverse the whole thing in-place.

Yes, that final result could be gotten with about half the stores using a suitable in-place array-rotation algorithm instead, but that's out of scope for this issue report. The approach sketched works no matter how many distinct blocks of all-equal runs are encountered.
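To make the whole scheme concrete, here is a minimal pure-Python model of the relaxed count_run() (an illustrative sketch only: the name and arguments mimic listobject.c's count_run(), but the body is mine, not the prototype or the eventual C code). It uses only <, pre-reverses each equal block met during a descending scan, applies the one-extra-compare all-equal test to a non-decreasing prefix, and finishes by trying to extend a reversed run with an adjacent ascending run:

def count_run(a, lo, hi):
    """Length of the run starting at a[lo] (a[lo:hi] must be non-empty).
    Descending runs are reversed in place here, so the caller always
    receives a non-decreasing prefix.  Only __lt__ is used, as in CPython.
    """
    i = lo + 1
    if i == hi:
        return 1
    if not (a[i] < a[i - 1]):
        # Non-decreasing scan, as the current code does it.
        i += 1
        while i < hi and not (a[i] < a[i - 1]):
            i += 1
        if i == hi or a[lo] < a[i - 1]:
            return i - lo  # rose somewhere (or ran out): an ascending run
        # The one extra compare: the prefix starts and ends with equal
        # values, so it's all equal, and a[i] drops below it.  Treat it
        # as a descending run in progress; pre-reverse the equal prefix
        # so the final whole-run reversal preserves stability.
        a[lo:i] = a[lo:i][::-1]
    i += 1
    # Descending scan: strict drops extend the run; blocks of equal
    # elements are absorbed and pre-reversed on the fly.
    while i < hi:
        if a[i] < a[i - 1]:
            i += 1
        elif a[i - 1] < a[i]:
            break  # a genuine rise ends the descending run
        else:
            start = i - 1  # a[i] equals a[i-1]: absorb the equal block
            i += 1
            while i < hi and not (a[i] < a[i - 1]) and not (a[i - 1] < a[i]):
                i += 1
            a[start:i] = a[start:i][::-1]
            if i < hi and a[i - 1] < a[i]:
                break
    a[lo:i] = a[lo:i][::-1]  # make the run ascending
    # All-but-free bonus: extend with an adjacent ascending run, if any.
    while i < hi and not (a[i] < a[i - 1]):
        i += 1
    return i - lo

For example, on the [3, 2, 1, 3, 4, 5] case above with an equal pair thrown in:

a = [3, 2, 2, 1, 3, 4, 5]
print(count_run(a, 0, len(a)), a)  # 7 [1, 2, 2, 3, 3, 4, 5] - one run covers it all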

CPython versions tested on:

3.12

Operating systems tested on:

No response

Linked PRs

@tim-one tim-one added type-feature A feature request or enhancement performance Performance or resource usage stdlib Python modules in the Lib dir labels Mar 10, 2024
@tim-one tim-one changed the title Relax lsit.sort()'s notion of "descending" runs Relax list.sort()'s notion of "descending" runs Mar 10, 2024
@tim-one
Member Author

tim-one commented Mar 10, 2024

But, happily, the current code for that doesn't need to change at all. After the million compares, it knows

a[0] <= a[1] <= ... <= a[-2] > a[-1]

At that point, it only needs one more compare, "is a[0] < a[-2]?".

Alas, "one more" is huge in this context. We currently get out of count_run()'s ascending-sequence search as soon as a drop is found. And on randomized data, we've usually done only 3 compares at that point (having found an ascending run of length just 2). So "only one more" is a large percentage increase. Without the extra compare, we can't know whether the prefix is, or isn't, all equal. We only know that it's non-decreasing.

Offhand, I don't see an elegant way to worm around that. However, a number of hacks come to mind ☹️

@pochmann3
Contributor

pochmann3 commented Mar 10, 2024

And on randomized data, we've usually done only 3 compares at that point (having found an ascending run of length just 2). So "only one more" is a large percentage increase.

Is it? If I'm not mistaken, random data currently takes ~120 comparisons for a minrun of 32, and ~300 comparisons for a minrun of 64. One more seems relatively insignificant.

Test code and results

Adapted from here.

import random

class Int:
    """Wrapper that counts every __lt__ call made on it."""
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        global comparisons
        comparisons += 1
        return self.value < other.value

for n in range(32, 65):
    a = random.sample(range(n), n)  # a random permutation of 0..n-1
    comparisons = 0
    sorted(map(Int, a))             # sort a fresh list of counting wrappers
    print(f'{n=}  =>  {comparisons} comparisons')


Sample output:

n=32  =>  121 comparisons
n=33  =>  127 comparisons
n=34  =>  134 comparisons
n=35  =>  131 comparisons
n=36  =>  140 comparisons
n=37  =>  145 comparisons
n=38  =>  155 comparisons
n=39  =>  154 comparisons
n=40  =>  159 comparisons
n=41  =>  162 comparisons
n=42  =>  174 comparisons
n=43  =>  176 comparisons
n=44  =>  185 comparisons
n=45  =>  189 comparisons
n=46  =>  195 comparisons
n=47  =>  200 comparisons
n=48  =>  207 comparisons
n=49  =>  210 comparisons
n=50  =>  214 comparisons
n=51  =>  220 comparisons
n=52  =>  228 comparisons
n=53  =>  235 comparisons
n=54  =>  245 comparisons
n=55  =>  249 comparisons
n=56  =>  256 comparisons
n=57  =>  254 comparisons
n=58  =>  264 comparisons
n=59  =>  265 comparisons
n=60  =>  274 comparisons
n=61  =>  283 comparisons
n=62  =>  288 comparisons
n=63  =>  298 comparisons
n=64  =>  307 comparisons

@tim-one
Member Author

tim-one commented Mar 10, 2024

I'm not sorting at all - I'm just counting comparisons done by my pure-Python count_run() prototypes. So the percentages I'm talking about are with respect to a fraction of the total comparisons done by a sort.

Back o' the envelope: assuming a pathetic average run length of 2, count_run() will be applied 16 times to a block of 32 elements, to find the 32/2 = 16 runs. That does about 16*2 = 32 compares (more generally, for a run of length r, r-1 comparisons to establish that the r elements are "in order", and another to establish that the run has ended). If I add another on each call, that's up to 48: a large percentage increase (50%), which, on the face of it, isn't delightful 😉.
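The same estimate in code (toy numbers only, to show the arithmetic):

# Assumed: a 32-element block and an average natural run length of 2.
block, run_len = 32, 2
calls = block // run_len   # 16 count_run() calls to cover the block
now = calls * run_len      # (r-1) + 1 = r compares per call -> 32
new = now + calls          # one extra compare per call -> 48
print(now, new, f"+{(new - now) / now:.0%}")  # 32 48 +50%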

In the full context of sorting, it may not matter. Or it may.

OTOH, it appears I'm getting about a 20% increase from the other gimmick all by itself: leave the definition of descending alone, and just go on to see whether a reversed descending run just happens to bump into a run that's ascending with respect to it.

That also seems surprisingly effective at increasing the average run length found. But about 40% of the time it's tried, the compare it does says "nope! the descending run you found here cannot be extended even after reversing it", and so that compare was wasted effort.

@tim-one
Member Author

tim-one commented Mar 10, 2024

OTOH ... when count_run() does find a short run (such as length 2), the main loop immediately forces it to length MINRUN, and count_run() will never be called on the elements absorbed by that. For randomized data, count_run() is essentially called ceiling(len(list) / MINRUN) times. Adding about len(list) / MINRUN extra compares in the worst case isn't really scary - but neither is it a priori attractive 😉.

@pochmann3
Contributor

That's what I meant. Was wondering why you looked at count_runs that wouldn't happen :-)

@tim-one
Member Author

tim-one commented Mar 10, 2024

I'm afraid it's a persistent blind spot for me: I never liked forcing runs to MINRUN - felt like a hack, a failure to find a more elegant approach - so I tend to forget that's what the code actually does 😄

tim-one added a commit to tim-one/cpython that referenced this issue Mar 10, 2024
@tim-one tim-one closed this as completed Mar 13, 2024
tim-one added a commit to tim-one/cpython that referenced this issue Mar 13, 2024
…hon#116578)

* pythonGH-116554: Relax list.sort()'s notion of "descending" run

Rewrote `count_run()` so that sub-runs of equal elements no longer end a descending run. Both ascending and descending runs can have arbitrarily many sub-runs of arbitrarily many equal elements now. This is tricky, because we only use ``<`` comparisons, so checking for equality doesn't come "for free". Surprisingly, it turned out there's a very cheap (one comparison) way to determine whether an ascending run consisted of all-equal elements. That sealed the deal.

In addition, after a descending run is reversed in-place, we now go on to see whether it can be extended by an ascending run that just happens to be adjacent. This succeeds in finding at least one additional element to append about half the time, and so appears to more than repay its cost (the savings come from getting to skip a binary search, when a short run is artificially forced to length MINRUN later, for each new element `count_run()` can add to the initial run).

While these have been in the back of my mind for years, a question on StackOverflow pushed it to action:

https://stackoverflow.com/questions/78108792/

They were wondering why it took about 4x longer to sort a list like:

[999_999, 999_999, ..., 2, 2, 1, 1, 0, 0]

than "similar" lists. Of course that runs very much faster after this patch.

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
@pochmann3
Contributor

pochmann3 commented Mar 13, 2024

The new code, in the descending phase, checks for equality eagerly: a failed IF_NEXT_SMALLER is immediately followed by IF_NEXT_LARGER. For example, ([1] + [0] * 10**6).sort() previously took 1000234 comparisons and now takes 1999999. Is that ok?

I haven't fully thought it through, but can we delay the equality check there as well? That is, again keep going as long as you have <=, and at the end compare the first (of that tail of potential equals) and the last. If first >= last, then reverse that run of equals and keep going. If first < last, then output both the descending run (after reversing it) and the ascending run (so count_run would have to output two runs, and that would need to be handled - or I guess it could merge them and just output one).
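A speculative pure-Python sketch of that lazy variant (my reading of the suggestion, nothing from the actual patch; the two-element return value is just one way the "output two runs" problem could be handled, and the ascending phase is left alone):

def count_run_lazy(a, lo, hi):
    """Sketch of the delayed equality check (not the patched code).
    Returns (n1, n2): a[lo:lo+n1] is left ascending in place; if n2 != 0,
    a second, already-ascending run of length n2 follows immediately.
    """
    i = lo + 1
    if i == hi:
        return 1, 0
    if not (a[i] < a[i - 1]):
        # Ascending phase, unchanged.
        i += 1
        while i < hi and not (a[i] < a[i - 1]):
            i += 1
        return i - lo, 0
    i += 1
    while i < hi:
        if a[i] < a[i - 1]:
            i += 1
            continue
        # Strict descent just failed.  Swallow the whole non-decreasing
        # tail *before* testing for equality, so a long run of equals
        # still costs about one compare per element.
        start = i - 1      # last element of the strict descent
        i += 1
        while i < hi and not (a[i] < a[i - 1]):
            i += 1
        if a[start] < a[i - 1]:
            # The tail truly rose somewhere: close the descending run at
            # a[start] and report the rest of the tail as a second run.
            a[lo:start + 1] = a[lo:start + 1][::-1]
            return start + 1 - lo, i - start - 1
        # One compare proved the tail is all equal: pre-reverse it (for
        # stability) and let the descending scan continue.
        a[start:i] = a[start:i][::-1]
    a[lo:i] = a[lo:i][::-1]
    return i - lo, 0

On the ([1] + [0] * 10**6) example above, this does about a million compares rather than roughly two million.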

vstinner pushed a commit to vstinner/cpython that referenced this issue Mar 20, 2024
adorilson pushed a commit to adorilson/cpython that referenced this issue Mar 25, 2024
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024