Collection::unique() fails to provide all unique values, or incorrectly provides duplicates #57528

@jmarble

Laravel Version

12.35.1

PHP Version

8.4.13

Database Driver & Version

No response

Description

Collection::unique() and Collection::duplicates() produce incorrect results when working with string data containing numeric-looking values. The methods either lose unique values or fail to remove duplicates.

This is a correctness bug causing data loss in production applications. The workaround (uniqueStrict()/duplicatesStrict()) is correct but 50-200x slower, making it unusable for large datasets.

Real-world impact:

  • CRM contact deduplication (phone numbers from multiple sources)
  • Product SKU processing (CSV imports with numeric/variant SKUs)
  • API request tracking (finding unique failed requests)
  • Duplicate detection produces false positives

Root cause: PHP's array_unique() with SORT_REGULAR uses loose comparison for numeric strings ('+15015551234' == '15015551234' is true) and an unstable sort with adjacent-only comparison, so duplicates that end up non-adjacent after the sort are never compared with each other.
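
The loose-comparison half of the root cause can be checked in isolation, without Laravel. This is a minimal sketch assuming PHP 8 comparison semantics; the variable names are illustrative only.

```php
<?php
// Loose (==) vs strict (===) comparison of numeric-looking strings.

$a = '+15015551234';
$b = '15015551234';

// Both operands are numeric strings, so == compares them as numbers.
$looseEqual = ($a == $b);

// === compares the strings byte for byte, so the leading '+' matters.
$strictEqual = ($a === $b);

// A string with dashes is not numeric, so == falls back to string comparison.
$dashedEqual = ('949-555-1234' == '9495551234');

var_dump($looseEqual);   // bool(true)
var_dump($strictEqual);  // bool(false)
var_dump($dashedEqual);  // bool(false)
```

The sketch only covers the comparison rule. It does not reproduce the sort-order half of the problem, which depends on how array_unique() arranges the values internally.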

Related:
Previous PR attempts to provide a performant solution to the issue were closed:

PHP Bug Report:
I've submitted a bug report and PR to correct the behavior of SORT_REGULAR, the root cause of the issue.

Steps To Reproduce

Example 1: Phone Numbers (when collecting unique variations)

use Illuminate\Support\Collection;

$phones = collect([
    '9495551234',      // Local format
    '19495551234',     // With country code  
    '+19495551234',    // International format
    '949-555-1234',    // Formatted with dashes
    '9495551234',      // Duplicate of first
]);

$unique = $phones->unique();

echo "Expected: 4 unique phone numbers\n";
echo "Actual: " . $unique->count() . " items\n";
print_r($unique->values()->all());

Expected: 4 unique values ['9495551234', '19495551234', '+19495551234', '949-555-1234']

Actual: 4 items with:

  • Lost: '+19495551234' (incorrectly treated as duplicate of '19495551234')
  • Kept: '9495551234' appears twice (duplicate not removed)
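
For comparison, here is a minimal sketch of a strict, single-pass deduplication over the same data. uniqueStrings() is a hypothetical helper, not part of Laravel and not taken from the closed PRs; it assumes every value is a string and keeps the first occurrence under its original key.

```php
<?php
/**
 * Hypothetical helper: keep the first occurrence of each exact string,
 * preserving the original array keys.
 */
function uniqueStrings(array $items): array
{
    $seen = [];
    $result = [];

    foreach ($items as $key => $value) {
        if (!is_string($value)) {
            throw new InvalidArgumentException('uniqueStrings() expects only strings.');
        }

        // The prefix stops PHP from casting keys such as '5' to int 5.
        $hash = 's:' . $value;

        if (!isset($seen[$hash])) {
            $seen[$hash] = true;
            $result[$key] = $value;
        }
    }

    return $result;
}

$phones = [
    '9495551234',      // Local format
    '19495551234',     // With country code
    '+19495551234',    // International format
    '949-555-1234',    // Formatted with dashes
    '9495551234',      // Duplicate of first
];

$strictUnique = uniqueStrings($phones);

print_r(array_values($strictUnique));
```

Because it uses a lookup table rather than sorting, each value is compared by exact string identity and the work grows linearly with the input. With a Collection, the plain array could be obtained via $phones->all() before calling the helper.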

Example 2: Unit Numbers

$units = collect(['5', '10', '5', '3A', '5', '5']);
$unique = $units->unique();

echo "Expected: 3 unique values ['5', '10', '3A']\n";
echo "Actual: " . $unique->count() . " items\n";
print_r($unique->values()->all());

Expected: 3 unique values ['5', '10', '3A']

Actual: 4 items ['5', '10', '3A', '5'] (duplicate '5' not removed)


Example 3: Duplicate Detection False Positives

$contacts = collect([
    '5015551234',      // First entry
    '15015551234',     // Different format (unique)
    '+15015551234',    // Different format (unique)
    '5015551234',      // Actual duplicate of first
]);

$duplicates = $contacts->duplicates();

echo "Expected: 1 duplicate at index 3\n";
echo "Actual: " . $duplicates->count() . " duplicates\n";
print_r($duplicates->all());

Expected: 1 duplicate [3 => '5015551234']

Actual: 2 items with false positive [2 => '+15015551234', 3 => '5015551234']
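
The expected result can be reproduced with a strict pass that records only exact repeats. This is a sketch under the same assumptions as before: duplicateStrings() is a hypothetical helper, not a Laravel API, and it expects string values only.

```php
<?php
/**
 * Hypothetical helper: return every value that exactly repeats an earlier
 * one, keyed by the position of the repeat.
 */
function duplicateStrings(array $items): array
{
    $seen = [];
    $duplicates = [];

    foreach ($items as $key => $value) {
        if (!is_string($value)) {
            throw new InvalidArgumentException('duplicateStrings() expects only strings.');
        }

        // The prefix stops PHP from casting numeric-looking keys to integers.
        $hash = 's:' . $value;

        if (isset($seen[$hash])) {
            $duplicates[$key] = $value;
        } else {
            $seen[$hash] = true;
        }
    }

    return $duplicates;
}

$contacts = [
    '5015551234',      // First entry
    '15015551234',     // Different format (unique)
    '+15015551234',    // Different format (unique)
    '5015551234',      // Actual duplicate of first
];

$strictDuplicates = duplicateStrings($contacts);

print_r($strictDuplicates);
```

Only index 3 is reported, because '+15015551234' and '15015551234' differ as strings even though they are equal under loose comparison.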
