Laravel Version
12.35.1
PHP Version
8.4.13
Database Driver & Version
No response
Description
Collection::unique() and Collection::duplicates() produce incorrect results for string data containing numeric-looking values: unique() can drop distinct values or keep duplicates, and duplicates() can report false positives.
This is a correctness bug causing data loss in production applications. The workaround (uniqueStrict()/duplicatesStrict()) is correct but 50-200x slower, making it unusable for large datasets.
Real-world impact:
- CRM contact deduplication (phone numbers from multiple sources)
- Product SKU processing (CSV imports with numeric/variant SKUs)
- API request tracking (finding unique failed requests)
- Duplicate detection produces false positives
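Root cause: PHP's array_unique() with SORT_REGULAR compares numeric strings loosely ('+15015551234' == '15015551234' → true). It then sorts the values with an unstable sort and compares only adjacent elements. Because loose comparison is not transitive when numeric and non-numeric strings are mixed, the sort can separate equal values, and duplicates that end up apart are never compared.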
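To make the two failure modes concrete, here is a minimal plain-PHP sketch. The first part shows the loose equality on numeric strings. The second part is a simplified, hypothetical model of a sort-then-scan dedupe (`dedupe_adjacent()` is an illustration, not php-src's actual code): each value is compared only with the last value kept, so equal values that the sort did not place next to each other both survive. The input order is illustrative and is not claimed to be the exact order PHP's sort produces.

```php
<?php

// Loose comparison: both operands are numeric strings, so PHP 8 compares
// them as numbers and '+15015551234' equals '15015551234'.
var_dump('+15015551234' == '15015551234');  // bool(true)
var_dump('+15015551234' === '15015551234'); // bool(false)

/**
 * Hypothetical model of a sort-then-scan dedupe: after sorting, each value
 * is compared only with the last value kept (loosely, like SORT_REGULAR).
 *
 * @param list<string> $sorted values in the order a sort produced
 * @return list<string>
 */
function dedupe_adjacent(array $sorted): array
{
    $kept = [];
    foreach ($sorted as $value) {
        $last = end($kept); // false when nothing has been kept yet
        if ($last !== false && $last == $value) {
            continue; // adjacent duplicate: dropped
        }
        $kept[] = $value;
    }
    return $kept;
}

// Illustrative order in which the two '5's are not adjacent: both survive.
print_r(dedupe_adjacent(['5', '10', '3A', '5']));
```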
Related:
Previous PRs that tried to provide a performant solution were closed:
- [12.x] Add uniqueStrings() method to collections #57517
- [12.x] Add duplicateStrings() method to collections #57520
PHP Bug Report:
I've submitted a bug report and PR to correct the behavior of SORT_REGULAR, the root cause of the issue.
- array_unique() with SORT_REGULAR returns duplicate values php/php-src#20262
- Fix GH-20262: array_unique() SORT_REGULAR fails to deduplicate with mixed strings php/php-src#20273
Steps To Reproduce
Example 1: Phone Numbers (when collecting unique variations)
```php
use Illuminate\Support\Collection;

$phones = collect([
    '9495551234',   // Local format
    '19495551234',  // With country code
    '+19495551234', // International format
    '949-555-1234', // Formatted with dashes
    '9495551234',   // Duplicate of first
]);

$unique = $phones->unique();

echo "Expected: 4 unique phone numbers\n";
echo "Actual: " . $unique->count() . " items\n";
print_r($unique->values()->all());
```
Expected: 4 unique values ['9495551234', '19495551234', '+19495551234', '949-555-1234']
Actual: 4 items, but not the expected ones:
- Lost: '+19495551234' (incorrectly treated as a duplicate of '19495551234')
- Kept: '9495551234' appears twice (duplicate not removed)
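Since the root cause is array_unique() with SORT_REGULAR, a quick plain-PHP check is to run the same data through array_unique() with the SORT_STRING flag instead, which compares items as strings. This is a sketch for string-only data, not a drop-in replacement for Collection::unique() on mixed-type collections. Per the PHP manual, array_unique() keeps the first of any equal elements and preserves keys.

```php
<?php

$phones = [
    '9495551234',   // Local format
    '19495551234',  // With country code
    '+19495551234', // International format
    '949-555-1234', // Formatted with dashes
    '9495551234',   // Duplicate of first
];

// SORT_STRING compares items as strings, so '+19495551234' and
// '19495551234' stay distinct and the repeated '9495551234' is dropped.
$unique = array_unique($phones, SORT_STRING);

print_r(array_values($unique));
// values: '9495551234', '19495551234', '+19495551234', '949-555-1234'
```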
Example 2: Unit Numbers
```php
$units = collect(['5', '10', '5', '3A', '5', '5']);

$unique = $units->unique();

echo "Expected: 3 unique values ['5', '10', '3A']\n";
echo "Actual: " . $unique->count() . " items\n";
print_r($unique->values()->all());
```
Expected: 3 unique values ['5', '10', '3A']
Actual: 4 items ['5', '10', '3A', '5'] (duplicate '5' not removed)
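The values in this example are enough to show why the sort step cannot be trusted: under PHP 8 rules, two numeric strings are compared as numbers and any other pair of strings is compared byte by byte, which makes `<` cyclic for '5', '10' and '3A'. A minimal check:

```php
<?php

var_dump('5' < '10');  // bool(true):  both numeric, 5 < 10
var_dump('10' < '3A'); // bool(true):  '3A' is not numeric, '1' sorts before '3'
var_dump('3A' < '5');  // bool(true):  string comparison, '3' sorts before '5'

// No ordering satisfies all three, so a sort driven by this comparison has
// no consistent answer, and equal values such as the '5's can end up apart.
```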
Example 3: Duplicate Detection False Positives
```php
$contacts = collect([
    '5015551234',   // First entry
    '15015551234',  // Different format (unique)
    '+15015551234', // Different format (unique)
    '5015551234',   // Actual duplicate of first
]);

$duplicates = $contacts->duplicates();

echo "Expected: 1 duplicate at index 3\n";
echo "Actual: " . $duplicates->count() . " duplicates\n";
print_r($duplicates->all());
```
Expected: 1 duplicate [3 => '5015551234']
Actual: 2 items, including a false positive: [2 => '+15015551234', 3 => '5015551234']
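For comparison, a minimal sketch of exact string duplicate detection that yields the expected result in plain PHP. `duplicate_strings()` is a hypothetical helper written for illustration; it is not Laravel's duplicates() and not the code from #57520. It uses each value as an array key for constant-time lookups. PHP converts canonical integer strings such as '5015551234' to int keys, but two different strings never share a key, so the comparison stays exact.

```php
<?php

/**
 * Hypothetical helper: return every item whose exact string value already
 * appeared earlier, keyed by its original key. The first occurrence is not
 * reported, and '+15015551234' and '15015551234' are different values.
 *
 * @param array<array-key, string> $items
 * @return array<array-key, string>
 */
function duplicate_strings(array $items): array
{
    $seen = [];
    $duplicates = [];

    foreach ($items as $key => $value) {
        if (isset($seen[$value])) {
            $duplicates[$key] = $value; // exact repeat of an earlier string
        } else {
            $seen[$value] = true;
        }
    }

    return $duplicates;
}

$contacts = [
    '5015551234',   // First entry
    '15015551234',  // Different format (unique)
    '+15015551234', // Different format (unique)
    '5015551234',   // Actual duplicate of first
];

$duplicates = duplicate_strings($contacts); // [3 => '5015551234']
```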