Skip to content

Conversation

@jmarble
Copy link

@jmarble jmarble commented Oct 27, 2025

Problem

array_unique() with SORT_REGULAR was missing duplicates, and sort() was incorrectly ordering equal values due to a transitivity violation in string comparison.

When comparing numeric strings (like '5', '10') with non-numeric strings (like '3A'), the comparison function fell back to lexicographic comparison, which violated transitivity:

  • '5' < '10' (numeric: 5 < 10)
  • '10' < '3A' (lexicographic: '1' < '3')
  • '5' > '3A' (lexicographic: '5' > '3') Violates transitivity

This caused equal values to not be grouped together in sorted arrays.

Solution

Fixed php_array_data_compare_unstable_i() in ext/standard/array.c to consistently order non-numeric strings before numeric strings when one is numeric and one is not, ensuring transitivity for all SORT_REGULAR operations.

Important: This fix does not change the behavior of comparison operators like <=>, maintaining backward compatibility. The fix only affects sorting and array operations with SORT_REGULAR, similar to how enum comparison is handled specially for sorting. This fix also does not correct the underlying issue as it affects object and nested arrays.

Test Updates Required

The following tests need their expected output updated to reflect the new transitive sort order:

  • ext/standard/tests/array/sort/array_multisort_basic1.phpt
  • ext/standard/tests/array/sort/array_multisort_variation5.phpt
  • ext/standard/tests/array/sort/array_multisort_variation6.phpt

Tests marked as "(OK to fail as result is unpredictable)" (arsort_variation11, asort_variation11, sort_variation11, rsort_variation11) should maybe still be updated for consistency.

Fixed transitivity violation in SORT_REGULAR comparison that caused
sort() and array_unique() to produce incorrect results with mixed
numeric and non-numeric strings.

The fix adds special handling in php_array_data_compare_unstable_i() to
consistently order non-numeric strings before numeric strings when one
is numeric and one is not. This ensures transitivity for sorting
operations.

Importantly, this fix does NOT change the behavior of comparison
operators like <=>, maintaining backward compatibility. The fix only
affects sorting and array_unique() operations with SORT_REGULAR, similar
to how enum comparison is handled specially for sorting.
@@ -0,0 +1,99 @@
--TEST--
GH-20262: array_unique() with SORT_REGULAR fails to identify duplicates with mixed numeric/alphanumeric strings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good idea to verify this PR using naive impl. using 2 loops and == operator and random data.

Copy link
Author

@jmarble jmarble Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a simple test with a small amount of random data, looped 1k times:
https://gist.github.com/jmarble/c7debbadbca5f2dc21a160f0a48a65af

@jmarble
Copy link
Author

jmarble commented Oct 27, 2025

I tired to take this strategy one step further, to apply a similar fix for arrays and objects -- I was able to fix the transitivity bug with arrays but I could not find a solution for objects.

We could add a context parameter to zendi_smart_strcmp() to enable a "transitivity mode" but that seems a bit risky, and probably the reason php_array_data_compare_unstable_i() exists.

@jmarble
Copy link
Author

jmarble commented Oct 28, 2025

Closing in favor of #20315 which provides a complete solution that fixes transitivity for all mixed types (scalars, nested arrays, and objects).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants