Relax list.sort()'s notion of "descending" runs #116554
Comments
Alas, "one more" is huge in this context. We currently get out of … Offhand, I don't see an elegant way to worm around that. However, a number of hacks come to mind.
Is it? If I'm not mistaken, random data currently takes ~120 comparisons for a minrun of 32, and ~300 comparisons for a minrun of 64. One more seems relatively insignificant.

Test code and results (adapted from here):

```python
import random

class Int:
    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        global comparisons
        comparisons += 1
        return self.value < other.value

for n in range(32, 65):
    a = random.sample(range(n), n)
    comparisons = 0
    sorted(map(Int, a))
    print(f'{n=} => {comparisons} comparisons')
```

Sample output:
I'm not sorting at all - I'm just counting comparisons done by my pure-Python …

Back o' the envelope: assuming a pathetic average run length of 2, … In the full context of sorting, it may not matter. Or it may.

OTOH, it appears I'm getting about a 20% increase from the other gimmick all by itself: leave the definition of descending alone, and just go on to see whether a reversed descending run just happens to bump into a run that's ascending with respect to it. That also seems surprisingly effective at increasing the average run length found. But about 40% of the time it's tried, the compare it does says "nope! the descending run you found here cannot be extended even after reversing it", and so such a compare was wasted effort.
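A rough pure-Python simulation gives a feel for that trade-off. This is my own sketch, not the code under discussion, and the exact percentages depend heavily on the data shape; it just finds the strictly descending run at the front of each small random list, reverses it, and spends one compare to see whether the next element happens to extend the now-ascending run:

```python
import random

random.seed(12345)  # arbitrary seed, for reproducibility only

tried = extended = 0
for _ in range(10_000):
    a = [random.random() for _ in range(8)]
    if not (a[1] < a[0]):
        continue                  # run starts ascending; gimmick not tried
    i = 1
    while i < len(a) and a[i] < a[i - 1]:
        i += 1                    # a[:i] is a strictly descending prefix
    if i == len(a):
        continue                  # nothing left to extend into
    a[:i] = a[:i][::-1]           # reverse: the run is now ascending
    tried += 1
    if not (a[i] < a[i - 1]):     # the one extra compare
        extended += 1

print(f"extension helped on {extended} of {tried} tries")
```

On uniform random floats the extension has to clear the run's maximum, so the success rate here is lower than on data with partial order; the point is only that one compare per attempt buys a nontrivial fraction of longer runs.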
OTOH ... when …
That's what I meant. Was wondering why you looked at …
I'm afraid it's a persistent blind spot for me: I never liked forcing runs to MINRUN - felt like a hack, a failure to find a more elegant approach - so I tend to forget that's what the code actually does 😄
…hon#116578) * pythonGH-116554: Relax list.sort()'s notion of "descending" run

Rewrote `count_run()` so that sub-runs of equal elements no longer end a descending run. Both ascending and descending runs can have arbitrarily many sub-runs of arbitrarily many equal elements now. This is tricky, because we only use `<` comparisons, so checking for equality doesn't come "for free". Surprisingly, it turned out there's a very cheap (one comparison) way to determine whether an ascending run consisted of all-equal elements. That sealed the deal.

In addition, after a descending run is reversed in-place, we now go on to see whether it can be extended by an ascending run that just happens to be adjacent. This succeeds in finding at least one additional element to append about half the time, and so appears to more than repay its cost (the savings come from getting to skip a binary search, when a short run is artificially forced to length MINRUN later, for each new element `count_run()` can add to the initial run).

While these have been in the back of my mind for years, a question on StackOverflow pushed it to action: https://stackoverflow.com/questions/78108792/

They were wondering why it took about 4x longer to sort a list like:

    [999_999, 999_999, ..., 2, 2, 1, 1, 0, 0]

than "similar" lists. Of course that runs very much faster after this patch.

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
The new code, in the descending phase, checks for equality eagerly: a failed IF_NEXT_SMALLER is immediately followed by IF_NEXT_LARGER. I haven't fully thought it through, but can we delay the equality check there as well? That is, again keep going as long as you have …
Feature or enhancement
Bug description:
A question on StackOverflow got me thinking about "the other end" of `list.sort()`: the start, with `count_run()`.

https://stackoverflow.com/questions/78108792/
That descending runs have to be strictly descending was always annoying, but since we only have `__lt__` to work with, it seemed too expensive to check for equality. As comments in the cited issue say, if we could tell when adjacent elements are equal, then runs of equal elements encountered during a descending run could be reversed in-place on the fly, and then their original order would be restored by a reverse-the-whole-run at the end.

Another minor annoyance is that, e.g., `[3, 2, 1, 3, 4, 5]` gives up after finding the `[3, 2, 1]` prefix. But after a descending run is reversed, it's all but free to go on to see whether it can be extended by a luckily ascending run immediately following.

So the idea is to make `count_run()` smarter, and have it do all reversing by itself. A Python prototype convinced me this can be done without slowing processing of ascending runs(*), and even on randomized data it creates runs about 20% longer on average.
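The prototype itself isn't shown in this issue, but the relaxed rule can be sketched in pure Python. The following is my own illustrative sketch, not the actual prototype or the C code: it uses only `<`, reverses equal sub-runs on the fly so the final whole-run reversal restores their original order (stability), and tries the "extend after reversing" gimmick. It makes no attempt to minimize compares, and it omits the one-compare all-equal-ascending-prefix trick described below:

```python
def count_run(a):
    """Return the length of the run at the start of list `a`, using only `<`.
    A descending run may contain sub-runs of equal elements; descending runs
    are reversed in place, so the caller always sees an ascending run.
    """
    n = len(a)
    if n < 2:
        return n
    i = 1
    if a[1] < a[0]:                          # "descending" run
        while i < n:
            if a[i] < a[i - 1]:              # still dropping
                i += 1
            elif a[i - 1] < a[i]:            # next element larger: run ends
                break
            else:                            # neither: an equal sub-run
                first = i - 1
                while i < n and not (a[i] < a[i - 1]) and not (a[i - 1] < a[i]):
                    i += 1
                # Reverse the equal block now; the whole-run reversal below
                # flips it back, so equal elements keep their original order.
                a[first:i] = a[first:i][::-1]
        a[:i] = a[:i][::-1]                  # make the run ascending
        # Bonus gimmick: try to extend with an adjacent ascending run.
        while i < n and not (a[i] < a[i - 1]):
            i += 1
    else:                                    # ascending run: a[i-1] <= a[i]
        while i < n and not (a[i] < a[i - 1]):
            i += 1
    return i

a = [3, 2, 1, 3, 4, 5]
print(count_run(a), a)   # 6 [1, 2, 3, 3, 4, 5]
```

On the `[3, 2, 1, 3, 4, 5]` example from above, the sketch reverses the `[3, 2, 1]` prefix and then swallows the rest of the list in the extension step.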
Which doesn't much matter on its own: on randomized data, all runs are almost certain to be artificially boosted to length MINRUN anyway. But starting with longer short runs would still have a beneficial effect, by cutting the number of searches the binary insertion sort has to do.

Not a major thing regardless. The wins would be large on various new cases of high partial order, and in the brain relief of not having to explain the difference between the definitions of "ascending" and "descending" by appeal to implementation details.
(*) Consider `[100] * 1_000_000 + [2]`. The current code does a million compares to detect that this starts with a million-element ascending run, followed by a drop. The whole thing would be a "descending run" under the new definition, but just not worth it if it required 2 million compares to detect the long run of equalities.

But, happily, the current code for that doesn't need to change at all. After the million compares, it knows

    a[0] <= a[1] <= ... <= a[-2] > a[-1]

At that point, it only needs one more compare, "is a[0] < a[-2]?". If so, we're done: at some point (& we don't care where) the sequence increased, so this cannot be a descending run. But if not, the prefix both starts and ends with 100, so all elements in the prefix must be equal: this is a descending run. We reverse the first million elements in-place, extend the right end to include the trailing 2, then finally reverse the whole thing in-place.
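That one-extra-compare observation is easy to check with a comparison-counting wrapper, in the spirit of the counting class earlier in the thread (the class name here is my own):

```python
class C:
    """Wraps a value and counts `<` calls."""
    count = 0
    def __init__(self, v):
        self.v = v
    def __lt__(self, other):
        C.count += 1
        return self.v < other.v

a = [C(100) for _ in range(1_000)] + [C(2)]

# Scan the ascending prefix; afterwards a[0] <= ... <= a[i-1] > a[i].
i = 1
while i < len(a) and not (a[i] < a[i - 1]):
    i += 1
before = C.count

# One more compare settles it: if "a[0] < a[i-1]" is false, the prefix
# starts and ends with equal values, and being ascending, every element
# in between must be equal too -- so this is a valid "descending" run.
all_equal = not (a[0] < a[i - 1])
print(all_equal, C.count - before)   # True 1
```

The scan itself costs one compare per element, exactly as today; classifying the all-equal prefix costs just the single extra compare.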
Yes, that final result could be gotten with about half the stores using a suitable in-place array-rotation algorithm instead, but that's out of scope for this issue report. The approach sketched works no matter how many distinct blocks of all-equal runs are encountered.
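As a tiny sanity check that the reverse-prefix-then-reverse-all dance preserves stability, tag equal keys with letters (only the number plays the role of the sort key; the tags are mine, purely for illustration):

```python
# An all-equal ascending prefix followed by a smaller trailing element.
a = [(100, 'a'), (100, 'b'), (100, 'c'), (2, 'z')]

a[:3] = a[:3][::-1]   # reverse the all-equal prefix in place
a.reverse()           # ...then reverse the whole (descending) run

print(a)   # [(2, 'z'), (100, 'a'), (100, 'b'), (100, 'c')]
```

The two reversals cancel on the equal block, so the 'a', 'b', 'c' order of the equal elements survives.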
CPython versions tested on:
3.12
Operating systems tested on:
No response
Linked PRs