Description
The following code:
<?php
$units = ['5', '10', '5', '3A', '5', '5'];
$unique = array_unique($units, SORT_REGULAR);
print_r($unique);
Resulted in this output:
Array
(
[0] => 5
[1] => 10
[3] => 3A
[4] => 5
)
But I expected this output instead:
Array
(
[0] => 5
[1] => 10
[3] => 3A
)
Demonstrations:
- simple array of strings: https://3v4l.org/M5lcP
- array of objects: https://3v4l.org/5kr1t
- array of arrays: https://3v4l.org/0HVoV
PRs in progress:
- Fix GH-20262: array_unique() SORT_REGULAR fails to deduplicate with mixed strings #20273:
  This fixes the issue with an array of scalars and provides a performance boost to boot.
- Fix #20262: SORT_REGULAR transitivity violation with mixed numeric/non-numeric strings #20305:
  This PR is an attempt to get to the core of the issue, the transitivity violation, which conveniently resolves the knock-on effect SORT_REGULAR has on other functions like sort() (see https://3v4l.org/lYP9Q).
Root Cause
From analyzing PHP source code (ext/standard/array.c, Zend/zend_operators.c):
The algorithm:
- Sort the array using zendi_smart_strcmp(), which calls is_numeric_string_ex()
- Walk through the sorted array comparing only adjacent elements
- Delete duplicates from the original array
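The adjacent-comparison step can be sketched in userland PHP. This is only an illustration of the idea, not the C implementation: `dedupeAdjacent()` is a hypothetical helper, it assumes the caller has already sorted the input, and it uses the loose `!=` operator as a stand-in for the engine's SORT_REGULAR comparison.

```php
<?php
// Hypothetical helper: mimics only the "compare adjacent elements" step.
// Assumes $sorted was already sorted by the caller.
function dedupeAdjacent(array $sorted): array
{
    $kept = [];
    $hasLast = false;
    $last = null;
    foreach ($sorted as $value) {
        // Only the last kept value is consulted, never earlier ones.
        if (!$hasLast || $value != $last) {
            $kept[] = $value;
            $last = $value;
            $hasLast = true;
        }
    }
    return $kept;
}

// Equal values grouped together: every duplicate is removed.
$grouped = dedupeAdjacent(['1', '1', '2', '3', '3']);      // ['1', '2', '3']

// Equal values split into two groups: one "5" from each group survives.
$split = dedupeAdjacent(['5', '5', '10', '3A', '5', '5']); // ['5', '10', '3A', '5']
```

The walk is only correct when the preceding sort places all equal values next to each other.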
The bug:
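zendi_smart_strcmp() calls is_numeric_string_ex() on both operands. If both are numeric strings, they are compared as numbers; otherwise they are compared as plain strings. "3A" is not a numeric string, so the three values are compared under two different rules:
- "5" vs "10" → both numeric, compared as numbers: 5 < 10
- "10" vs "3A" → compared as strings: "10" < "3A"
- "3A" vs "5" → compared as strings: "3A" < "5"
Together this gives "5" < "10" < "3A" < "5", which is a cycle. The comparison is not transitive, so the sort is not guaranteed to place equal values next to each other. For this input the sort ends up with:
Sorted: ["5", "5", "10", "3A", "5", "5"]
The "3A" ends up after "10" but before the remaining "5" values, splitting the duplicate "5" values into two groups.
The deduplication walks through comparing adjacent elements:
lastkept = position_0; // "5"
position_1 "5" == "5" → delete
position_2 "10" != "5" → keep, lastkept = position_2
position_3 "3A" != "10" → keep, lastkept = position_3
position_4 "5" != "3A" → keep ← Bug! Never compared to position_0
position_5 "5" == "5" → delete
The flaw: The algorithm only compares with lastkept (the last unique value), not with all previous values. Position 4's "5" is never compared back to position 0's "5".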
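The transitivity violation named in the second PR can be checked directly with the spaceship operator. A minimal sketch, assuming PHP 8 comparison semantics; `sortsBefore()` is a hypothetical helper introduced only for this illustration.

```php
<?php
// Hypothetical helper: true when $a is ordered before $b by the
// default (SORT_REGULAR-style) comparison.
function sortsBefore(string $a, string $b): bool
{
    return ($a <=> $b) < 0;
}

var_dump(sortsBefore('5', '10'));  // both numeric strings: compared as numbers
var_dump(sortsBefore('10', '3A')); // "3A" is non-numeric: compared as strings
var_dump(sortsBefore('3A', '5'));  // compared as strings
```

All three calls return true, so "5" < "10" < "3A" < "5": the ordering loops back on itself.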
Source files:
- ext/standard/array.c: PHP_FUNCTION(array_unique)
- Zend/zend_operators.c: zendi_smart_strcmp(), is_numeric_string_ex()
Comparison with SORT_STRING
<?php
$units = ['5', '10', '5', '3A', '5', '5'];
echo count(array_unique($units, SORT_REGULAR)) . "\n"; // 4 ✗ Wrong
echo count(array_unique($units, SORT_STRING)) . "\n"; // 3 ✓ Correct
SORT_STRING compares every pair as plain strings, which gives one consistent ordering, so duplicates stay grouped.
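To see the grouping, the same input can be sorted with SORT_STRING. This is a small sketch for illustration only; the variable names are arbitrary.

```php
<?php
$units = ['5', '10', '5', '3A', '5', '5'];

// Sort a copy byte-wise as strings: "10" < "3A" < "5".
$sorted = $units;
sort($sorted, SORT_STRING);
print_r($sorted); // all four "5" values are adjacent

// Deduplicate with the same flag; keys of first occurrences are kept.
$unique = array_unique($units, SORT_STRING);
print_r($unique);
```

Because string comparison is a total order, every "5" lands in one contiguous run.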
Workaround
For simple arrays of scalar values, you can use array_unique() with the default SORT_STRING flag.
<?php
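$unique = array_unique($array, SORT_STRING);
For arrays of arrays or objects, compare each element against every value already kept:
$uniqueAddr = [];
foreach ($addresses as $addr) {
    // Loose comparison against all kept values, not just the last one
    if (! in_array($addr, $uniqueAddr)) {
        $uniqueAddr[] = $addr;
    }
}
PHP Version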
PHP 8.4.13 (cli) (built: Sep 26 2025 00:45:36) (NTS clang 15.0.0)
Copyright (c) The PHP Group
Built by Laravel Herd
Zend Engine v4.4.13, Copyright (c) Zend Technologies
with Zend OPcache v8.4.13, Copyright (c), by Zend Technologies