Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Usernames are not random enough -- getting collisions in my specs #19

Closed
d11wtq opened this issue Jun 4, 2011 · 10 comments
Closed

Usernames are not random enough -- getting collisions in my specs #19

d11wtq opened this issue Jun 4, 2011 · 10 comments

Comments

@d11wtq
Copy link

d11wtq commented Jun 4, 2011

This happens very sporadically, but in tests I'm using ffaker to generate random usernames. I was previously just using ActiveSupport::SecureRandom.hex(16), but a colleague convinced me that ffaker was the way to go, so I switched and have had much success, besides this single issue.

In specs where I needs to create two or more users, sometimes, but not always (read, mostly not), ffaker generates the same username twice, which causes my specs to fail due to duplicate key conflicts in the database. Re-running the test suite always corrects the issue. My database is correctly cleaned between examples... this issue is localized to within a single example/test.

I think a possible solution (and maybe it actually exists but I don't see it), is for ffaker to strike out already used usernames, so that the same one cannot be generated twice, then provide some method to reset ffaker's dictionary so that all struck-out entries are released for re-use. This way one can just reset ffaker in their setup/teardown logic.

@EmmanuelOga
Copy link

Hello,

The reason you get duplicates is because ffaker works out of random indexes to arrays of data. Since we want the names to be random and not always generate the same data each time you start getting names, we use Kernel's rand method to get the index to the arrays, and so sometimes you might get duplicated values.

One easy 'fix' for this is to create a helper method to store names already generated. I've using sham.rb from an old version of the machinist gem to this purpose:

https://gist.github.com/1008143

require 'sham'
require 'ffaker'

Sham.define do
  email { Faker::Internet.email }
end

1.upto(10) do
  puts Sham.email
end

I'm thinking it might be useful to have Sham inside ffaker, wdyt?

@rstacruz
Copy link
Contributor

rstacruz commented Jun 4, 2011

+1 for a built-in Sham inside ffaker

@d11wtq
Copy link
Author

d11wtq commented Jun 5, 2011

+1 for having it built-in, I agree. I notice that Machinist 1 requires that the developer uses Sham, while Machinist 2 removes that requirement. I wonder how they get around it. I'm using DataMapper so I'm using Machinist 1, without Sham... and now I understand why Sham is needed, thanks ;) You could probably just have a Faker::Base.reset! call, or something of that nature to simplify things I guess. Might knock a few points off those performance numbers though ;)

@rstacruz
Copy link
Contributor

rstacruz commented Jun 5, 2011

By the way, I think if this was to be built-into FFaker, it has to be optional for the obvious performance hit just to answer a case that's probably only meaningful 5% of the time.

Maybe this can be done with something like:

  1. Faker::Internet.ensure_uniqueness! which will wrap all of the given generator's methods with a sham shim (heh), or
  2. Implement Faker::Unique (with a const_missing hook) so you can just use Faker::Unique::Internet.email

@chulkilee
Copy link

I don't think (f)faker are responsible for creating unique values.

If you need to create unique username in tests, you can do it easily with adding sequence (e.g. FactoryGirl sequence)

Generating not enough random values is quite different issue - that can be a problem, but I don't think your description is that case.

@deXterbed
Copy link

@chulkilee i don't think using sequence with names makes any sense, meaningful names is the reason anyone started using ffaker in the first place. Why not just maintain a list of used names to keep up with the uniqueness constraints?

@chulkilee
Copy link

I do think username with sequence is meaningful enough for testing.

It's not "meaningless" username as much as "user1", "user2". In the real world people are experiencing collision of usernames, aren't they? And websites suggest adding random unique number (not sequential) at the end in that case (e.g. google)

I don't think adding sham in ffaker is a good idea. Generally it's good to have the single responsibility for a library. For example I'm using factory_girl (which can be used in the place of sham).

@EmmanuelOga
Copy link

Sadly I don't use ffaker myself very much these days and putting some time aside to work on it has not been the first thing in my mind for a while.

But I always have in the back burner the idea of removing randomness from ffaker and given each unique ffaker permutation a unique sequence number. I think this would be a great feature. The API would look something like:

def (Faker::Name).name(permutation_id = random(max_name_permutations))
blah...
end

Rewriting each generator to accept a permutation id would be a lot of effort... Once done, if you needed unique names you would be able to pass an incrementing number to each method:

Faker::NameXX.fullname(1) # Tony Kamo
Faker::NameXX.fullname(2) # Anita Lahuerfanita
Faker::NameXX.fullname(3) # Carozo Ynarizota
Faker::NameXX.fullname(1) # Tony Kamo
# etc...

A checksum to int mod max_permutation_number could be used to map names to names in funny ways

Faker::NameUS.fullname("Emmanuel Oga".hash % NameUS::FULLNAME_COUNT) # John Doe

😄

One MAJOR advantage is that the test suite would be a lot cleaner and less flaky!

@gmile
Copy link

gmile commented Jun 13, 2014

@EmmanuelOga would be really useful! Just stumbled upon this "buggy" behaviour as well...

@toupeira
Copy link

This looks like an outdated issue, FFaker now supports unique results e.g. FFaker::Name.unique.name.

@Volosh1n Volosh1n closed this as completed Sep 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants