Skip to content

Conversation

@jmarble
Copy link

@jmarble jmarble commented Oct 23, 2025

The Bug

array_unique() with SORT_REGULAR was failing to remove duplicate numeric strings when mixed with alphanumeric strings:

$units = ['5', '10', '3A', '5'];
array_unique($units, SORT_REGULAR);
// Before: ['5', '10', '3A', '5'] - duplicate '5' not removed 
// After:  ['5', '10', '3A']      - works correctly 

Root Cause

Non-transitive comparisons in SORT_REGULAR (where '5' == 5 and 5 != '5abc' but '5' < '5abc') broke the sort-based algorithm's assumption that sorting would group duplicates adjacently.

Solution

Implemented type-optimized hybrid approach for SORT_REGULAR:

Three-tier algorithm selection:

  1. Integer-only arrays: O(N) hash table for optimal performance
  2. Arrays/Objects present: O(N log N) sorting (works correctly for complex types)
  3. Simple mixed types: O(N) hash bucketing with dynamic sizing (64-16,384 buckets)

Key features:

  • Preserves full SORT_REGULAR type coercion semantics (1 == '1' == true)
  • Dynamic bucket sizing for memory efficiency
  • Comprehensive exception handling prevents memory leaks
  • Integer overflow protection on all allocation paths

Security Hardening

  • Eliminates Hash-DoS vulnerabilities through dynamic bucketing and type-based routing
  • Integer overflow checks prevent heap corruption
  • Exception-safe on all code paths
  • Proper resource handling with identity-based hashing

Performance

Improved performance across all data types compared to PHP 8.4.13 while fixing correctness issues.

Backward Compatibility

  • 100% BC compatible - preserves all existing coercion behavior
  • Uses identical comparison functions as current implementation
  • All existing tests pass (834/834)

Tests Added

  • gh20262.phpt - Minimal regression test for the bug
  • array_unique_variation_sort_regular.phpt - Comprehensive SORT_REGULAR behavior coverage (16 scenarios)

Closes #20262

@jmarble jmarble requested a review from bukka as a code owner October 23, 2025 23:06
@jmarble jmarble marked this pull request as draft October 23, 2025 23:35
@jmarble jmarble force-pushed the gh20262-array-unique-sort-regular branch from 4102fe6 to f60a45f Compare October 23, 2025 23:55
…h mixed strings

array_unique() with SORT_REGULAR was failing to remove duplicate numeric
strings when mixed with alphanumeric strings due to non-transitive
comparison issues in the sort-based algorithm.

Implemented hash-bucketing optimization for SORT_REGULAR that preserves
full type coercion semantics while improving performance from O(n²) to O(n).

Closes phpGH-20262
@jmarble jmarble force-pushed the gh20262-array-unique-sort-regular branch from f60a45f to c4fc6a9 Compare October 24, 2025 01:11
@jmarble jmarble marked this pull request as ready for review October 24, 2025 01:48
@jmarble jmarble marked this pull request as draft October 24, 2025 04:38
- Eliminate Hash-DoS and integer overflow vulnerabilities
- Add exception handling to prevent memory leaks
- Implement type-specific optimizations (hash/sort/bucket)
- Dynamic bucket sizing for memory efficiency
- Fix resource handling
Use ZVAL_DEREF macro to dereference values before comparing in the
array_unique function. Also add a new test case to verify the behavior.
@jmarble jmarble requested review from devnexen and nielsdos October 25, 2025 03:20
@jmarble jmarble requested a review from devnexen October 25, 2025 16:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

array_unique() with SORT_REGULAR returns duplicate values

3 participants