Usernames are not random enough -- getting collisions in my specs #19

d11wtq · 2011-06-04T14:40:06Z

This happens very sporadically, but in tests I'm using ffaker to generate random usernames. I was previously just using ActiveSupport::SecureRandom.hex(16), but a colleague convinced me that ffaker was the way to go, so I switched and have had much success, besides this single issue.

In specs where I needs to create two or more users, sometimes, but not always (read, mostly not), ffaker generates the same username twice, which causes my specs to fail due to duplicate key conflicts in the database. Re-running the test suite always corrects the issue. My database is correctly cleaned between examples... this issue is localized to within a single example/test.

I think a possible solution (and maybe it actually exists but I don't see it), is for ffaker to strike out already used usernames, so that the same one cannot be generated twice, then provide some method to reset ffaker's dictionary so that all struck-out entries are released for re-use. This way one can just reset ffaker in their setup/teardown logic.

The text was updated successfully, but these errors were encountered:

EmmanuelOga · 2011-06-04T18:07:59Z

Hello,

The reason you get duplicates is because ffaker works out of random indexes to arrays of data. Since we want the names to be random and not always generate the same data each time you start getting names, we use Kernel's rand method to get the index to the arrays, and so sometimes you might get duplicated values.

One easy 'fix' for this is to create a helper method to store names already generated. I've using sham.rb from an old version of the machinist gem to this purpose:

https://gist.github.com/1008143

require 'sham'
require 'ffaker'

Sham.define do
  email { Faker::Internet.email }
end

1.upto(10) do
  puts Sham.email
end

I'm thinking it might be useful to have Sham inside ffaker, wdyt?

rstacruz · 2011-06-04T18:16:35Z

+1 for a built-in Sham inside ffaker

d11wtq · 2011-06-05T02:09:06Z

+1 for having it built-in, I agree. I notice that Machinist 1 requires that the developer uses Sham, while Machinist 2 removes that requirement. I wonder how they get around it. I'm using DataMapper so I'm using Machinist 1, without Sham... and now I understand why Sham is needed, thanks ;) You could probably just have a Faker::Base.reset! call, or something of that nature to simplify things I guess. Might knock a few points off those performance numbers though ;)

rstacruz · 2011-06-05T12:23:00Z

By the way, I think if this was to be built-into FFaker, it has to be optional for the obvious performance hit just to answer a case that's probably only meaningful 5% of the time.

Maybe this can be done with something like:

Faker::Internet.ensure_uniqueness! which will wrap all of the given generator's methods with a sham shim (heh), or
Implement Faker::Unique (with a const_missing hook) so you can just use Faker::Unique::Internet.email

chulkilee · 2013-05-07T19:05:55Z

I don't think (f)faker are responsible for creating unique values.

If you need to create unique username in tests, you can do it easily with adding sequence (e.g. FactoryGirl sequence)

Generating not enough random values is quite different issue - that can be a problem, but I don't think your description is that case.

deXterbed · 2014-05-26T11:00:26Z

@chulkilee i don't think using sequence with names makes any sense, meaningful names is the reason anyone started using ffaker in the first place. Why not just maintain a list of used names to keep up with the uniqueness constraints?

chulkilee · 2014-05-27T17:22:24Z

I do think username with sequence is meaningful enough for testing.

It's not "meaningless" username as much as "user1", "user2". In the real world people are experiencing collision of usernames, aren't they? And websites suggest adding random unique number (not sequential) at the end in that case (e.g. google)

I don't think adding sham in ffaker is a good idea. Generally it's good to have the single responsibility for a library. For example I'm using factory_girl (which can be used in the place of sham).

EmmanuelOga · 2014-06-13T08:47:18Z

Sadly I don't use ffaker myself very much these days and putting some time aside to work on it has not been the first thing in my mind for a while.

But I always have in the back burner the idea of removing randomness from ffaker and given each unique ffaker permutation a unique sequence number. I think this would be a great feature. The API would look something like:

def (Faker::Name).name(permutation_id = random(max_name_permutations))
blah...
end

Rewriting each generator to accept a permutation id would be a lot of effort... Once done, if you needed unique names you would be able to pass an incrementing number to each method:

Faker::NameXX.fullname(1) # Tony Kamo
Faker::NameXX.fullname(2) # Anita Lahuerfanita
Faker::NameXX.fullname(3) # Carozo Ynarizota
Faker::NameXX.fullname(1) # Tony Kamo
# etc...

A checksum to int mod max_permutation_number could be used to map names to names in funny ways

Faker::NameUS.fullname("Emmanuel Oga".hash % NameUS::FULLNAME_COUNT) # John Doe

😄

One MAJOR advantage is that the test suite would be a lot cleaner and less flaky!

gmile · 2014-06-13T11:48:16Z

@EmmanuelOga would be really useful! Just stumbled upon this "buggy" behaviour as well...

toupeira · 2018-10-27T09:48:32Z

This looks like an outdated issue, FFaker now supports unique results e.g. FFaker::Name.unique.name.

EmmanuelOga mentioned this issue Oct 8, 2014

Make deterministic #153

Closed

xunker mentioned this issue Jul 16, 2016

Make deterministic / repeatedly random using seed #285

Merged

Volosh1n closed this as completed Sep 3, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Usernames are not random enough -- getting collisions in my specs #19

Usernames are not random enough -- getting collisions in my specs #19

d11wtq commented Jun 4, 2011

EmmanuelOga commented Jun 4, 2011

rstacruz commented Jun 4, 2011

d11wtq commented Jun 5, 2011

rstacruz commented Jun 5, 2011

chulkilee commented May 7, 2013

deXterbed commented May 26, 2014

chulkilee commented May 27, 2014

EmmanuelOga commented Jun 13, 2014

gmile commented Jun 13, 2014

toupeira commented Oct 27, 2018

Usernames are not random enough -- getting collisions in my specs #19

Usernames are not random enough -- getting collisions in my specs #19

Comments

d11wtq commented Jun 4, 2011

EmmanuelOga commented Jun 4, 2011

rstacruz commented Jun 4, 2011

d11wtq commented Jun 5, 2011

rstacruz commented Jun 5, 2011

chulkilee commented May 7, 2013

deXterbed commented May 26, 2014

chulkilee commented May 27, 2014

EmmanuelOga commented Jun 13, 2014

gmile commented Jun 13, 2014

toupeira commented Oct 27, 2018