Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sorting stable #5236

Closed
wants to merge 9 commits into from
Closed

Make sorting stable #5236

wants to merge 9 commits into from

Conversation

nikic
Copy link
Member

@nikic nikic commented Mar 4, 2020

RFC: https://wiki.php.net/rfc/stable_sorting

This makes the array sorting functions stable (mostly relevant for usort). This does not actually change the core sorting algorithm to be stable (it is still a quick sort). Instead, we store the position of the original elements and use that as a fallback comparison criterion. Due to the way hashtable sorting works, we can do this very cheaply, and without additional memory overhead.

i = ht->nNumUsed;
/* Store original order of elements in extra space to allow stable sorting. */
for (i = 0; i < ht->nNumUsed; i++) {
Z_EXTRA(ht->arData[i].val) = i;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. :)

@nikic nikic force-pushed the stable-sort branch 4 times, most recently from ec7000b to a19dabc Compare March 11, 2020 10:13
@mvorisek
Copy link
Contributor

@nikic Huge thumb up for addressing this! Did you also considered using more and more popular Timsort algorithm? It is stable by default with the same complexity. Now also used by Java and C#.

@nikic
Copy link
Member Author

nikic commented May 12, 2020

@mvorisek Switching to Timsort is a more intrusive change. It's a different algorithm (hybrid merge rather than hybrid quick) with different performance characteristics (and more importantly, memory usage characteristics).

If you want to evaluate Timsort usage in PHP, I'm definitely interested in results.

@nikic nikic added the RFC label May 12, 2020
@mvorisek
Copy link
Contributor

@mvorisek Switching to Timsort is a more intrusive change. It's a different algorithm (hybrid merge rather than hybrid quick) with different performance characteristics (and more importantly, memory usage characteristics).

If you want to evaluate Timsort usage in PHP, I'm definitely interested in results.

I would love to help, but my C skills are limited. I have done now more reseach on this:

Usage: default also in JS/V8, Python

Benchmarks:

Memory complexity is O(n), more exactly up too N/2 references/primitives where N is the number of elements.

@nikic
Copy link
Member Author

nikic commented May 12, 2020

@mvorisek I gave it a quick try here: #5559 As expected, different performance characteristics. It does worse than the current implementation on random data and does better on already sorted data. I don't think I will pursue this myself.

if (!ARRAYG(compare_deprecation_thrown)) {
php_error_docref(NULL, E_DEPRECATED,
"Returning bool from comparison function is deprecated, "
"return an integer less than, equal to, or greater than zero");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an integer less than, equal to, or greater than zero

Isn't that a little redundant? (as it basically includes all integers). Alternatively we could say "-1, 0 or 1".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is intentionally redundant to clarify meaning of the value. I used "-1, 0 or 1" here initially, but as @hikari-no-yume pointed out on the mailing list, there is no actual requirement to return one of those here, and some functions like strcmp() which are reasonable to use in this context don't return -1, 0, 1 specifically.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There could be a better way to word it, but there is a risk of the message being too long…

@php-pulls php-pulls closed this in e12b9df Jun 25, 2020
@carusogabriel carusogabriel added this to the PHP 8.0 milestone Jun 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants