Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faker unique() not working properly #46287

Closed
binaryfire opened this issue Feb 27, 2023 · 7 comments
Closed

Faker unique() not working properly #46287

binaryfire opened this issue Feb 27, 2023 · 7 comments

Comments

@binaryfire
Copy link

binaryfire commented Feb 27, 2023

  • Laravel Version: 10.0.3
  • PHP Version: 8.2.3
  • Database Driver & Version: MySQL 8.032

Description:

fake()->unique() doesn't seem to be working properly, at least for the generators I've tried. Some examples:

numberBetween
domainName
domainWord
word

Some variations I've tried that have created dupes:

'invoice_number' => fake()->unique()->numberBetween($min = 1, $max = 999),
'invoice_number' => fake()->unique(true)->numberBetween($min = 1, $max = 999),
'domain' => fake()->unique()->domainName,
'domain' => fake()->unique()->word,

Example error msg:
SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'pollich.biz' for key.........

I'm only creating 200-300 records so numberBetween shouldn't be running out of unique values. But according to the Faker docs an exception should be thrown if a duplicate value is generated: https://fakerphp.github.io/#modifiers

A few others are having the same issue too: https://laracasts.com/discuss/channels/laravel/faker-unique-not-always-working

To reproduce:

  1. Do a migrate:fresh
  2. Create some factories using the generators I mentioned above
  3. Seed a few hundred records
  4. Rinse and repeat steps 1-3 a few times. At some point you'll end up with dupes. No exceptions will be thrown.
@timacdonald
Copy link
Member

@binaryfire the unique aspect of faker is only during a single invocation of the Framework.

For example:

# Invocation 1
php artisan migrate:fresh --seed

# Invocation 2
php artisan db:seed --class=Foo

# Invocation 3
php artisan tinker

Within each of these commands Faker should generate unique values, however it does not generate unique values across them. So unique values generated in Invocation 1 may also be generated in Invocation 2 and 3.

Each time PHP shuts down Faker will reset its unique cache and may then generate the same values again.

Within a single invocation:

Screen Shot 2023-02-28 at 8 22 15 am

Across multiple invocations:

Screen Shot 2023-02-28 at 8 24 21 am

Hope this helps, but let us know if we have missed something.

@timacdonald
Copy link
Member

timacdonald commented Feb 27, 2023

Looks like numberBetween is not respecting the unique constraint. Spoke to soon. Does seems to be respecting it.

Screen Shot 2023-02-28 at 8 41 48 am

@binaryfire
Copy link
Author

binaryfire commented Feb 27, 2023

@timacdonald Hey Tim. Yeah I’m aware of that. The problem occurs on a single invocation. What I meant by “rinse and repeat” is to “nuke the db and try again” if you can't reproduce the problem the first time. Hitting a dupe isn’t guaranteed even if unique() is broken- it’s a matter of probability.

Did you try seeding hundreds of records at once and attempt to reproduce the problem several times, starting with an empty db each time? Here's some dummy files to use:

Factory:

<?php

declare(strict_types=1);

namespace Database\Factories;

use App\Models\User;
use Illuminate\Support\Str;
use Illuminate\Database\Eloquent\Factories\Factory;

/**
 * @extends \Illuminate\Database\Eloquent\Factories\Factory<\App\Models\User>
 */
class UserFactory extends Factory
{
    /**
     * Define the model's default state.
     *
     * @return array<string, mixed>
     */
    public function definition(): array
    {
        return [
            'first_name' => fake()->unique()->firstName(),
            'last_name' => fake()->unique()->lastName(),
            'email' => fake()->unique()->word(),
            'password' => fake()->unique()->domainName(),
            'remember_token' => fake()->unique(true)->numberBetween($min = 1, $max = 999),
        ];
    }

    /**
     * Indicate that the model's email address should be unverified.
     */
    public function unverified(): static
    {
        return $this->state(fn (array $attributes) => [
            'email_verified_at' => null,
        ]);
    }
}

Seeder:

<?php

declare(strict_types=1);

namespace Database\Seeders;

use App\Models\User;
use Illuminate\Database\Seeder;
use Illuminate\Database\Console\Seeds\WithoutModelEvents;

class UserSeeder extends Seeder
{
    /**
     * Run the database seeds.
     */
    public function run(): void
    {

        // Create lots of users
        User::factory(500)->create();

    }
}

Example of the problem using the above 2 files:

image

@binaryfire
Copy link
Author

binaryfire commented Feb 28, 2023

@timacdonald Have done a bit more investigation. I'm using unique(true) for numberBetween because of this: fzaninotto/Faker#1512.

From the Faker docs:

// you can reset the unique modifier for all providers by passing true as first argument
$faker->unique($reset = true)->randomDigitNotNull(); // will not throw OverflowException since unique() was reset
// tip: unique() keeps one array of values per provider

But using that for a single generator breaks exception handling for all the other generators. Using unique(true) for everything does greatly improve the odds of avoiding dupes, but it'll still happen after 5 or 6 attempts.

'first_name' => fake()->unique(true)->firstName(),
'last_name' => fake()->unique(true)->lastName(),
'email' => fake()->unique(true)->safeEmail(),
'password' => fake()->unique(true)->domainName(),
'remember_token' => fake()->unique(true)->numberBetween($min = 1, $max = 999),

Is there anything that can be done to improve this? I can generate emails and domains using bothify() as a workaround, but I'm not sure there's a way of generating unique numbers in a range without using numberBetween. And numberBetween doesn't work without unique(true) when creating large numbers of records.

@binaryfire
Copy link
Author

binaryfire commented Feb 28, 2023

@timacdonald Calling faker directly works 100% of the time. This will work:

<?php

declare(strict_types=1);

namespace Database\Factories;

use App\Models\User;
use Faker\Factory as Faker;
use Illuminate\Support\Str;
use Illuminate\Database\Eloquent\Factories\Factory;

/**
 * @extends \Illuminate\Database\Eloquent\Factories\Factory<\App\Models\User>
 */
class UserFactory extends Factory
{
    /**
     * Define the model's default state.
     *
     * @return array<string, mixed>
     */
    public function definition(): array
    {

        $faker = Faker::create();

        return [
            'first_name' => $faker->unique()->firstName(),
            'last_name' => $faker->unique()->lastName(),
            'email' => $faker->unique()->email(),
            'password' => $faker->unique()->domainName(),
            'remember_token' => $faker->unique()->numberBetween($min = 1, $max = 999),
        ];
    }

    /**
     * Indicate that the model's email address should be unverified.
     */
    public function unverified(): static
    {
        return $this->state(fn (array $attributes) => [
            'email_verified_at' => null,
        ]);
    }
}

@timacdonald
Copy link
Member

The fake() global helper will always resolve a singleton. If you call fake()->unique(true) it will clear the singletons shared unique cache.

This is the expected behaviour.

If you need different functionality, I would recommend accessing an instance of Faker directory and not use the fake() helper.

@pxlrbt
Copy link
Contributor

pxlrbt commented Jan 12, 2024

@binaryfire I think the reset flag shouldn't be used inside factories ($this->faker->unique(true)) as it resets the unique cache on every call. That's why you are getting random collisions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants