Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shuffling algorithm produces biaised permutation #11269

Closed
mratsim opened this issue May 17, 2019 · 2 comments
Closed

Shuffling algorithm produces biaised permutation #11269

mratsim opened this issue May 17, 2019 · 2 comments
Assignees

Comments

@mratsim
Copy link
Collaborator

mratsim commented May 17, 2019

The shuffling algorithm in the standard library is biaised and does not produce evenly distributed permutations.

Nim/lib/pure/random.nim

Lines 567 to 580 in f1a8edc

proc shuffle*[T](r: var Rand; x: var openArray[T]) =
## Shuffles a sequence of elements in-place using the given state.
##
## See also:
## * `shuffle proc<#shuffle,openArray[T]>`_ that uses the default
## random number generator
runnableExamples:
var cards = ["Ace", "King", "Queen", "Jack", "Ten"]
var r = initRand(678)
r.shuffle(cards)
doAssert cards == ["King", "Ace", "Queen", "Ten", "Jack"]
for i in countdown(x.high, 1):
let j = r.rand(i)
swap(x[i], x[j])

It should implement an unbiaised pseudo-random permutation algorithm instead, here are some propositions:

The most well known is Fisher-Yates, also often called Knuth Shuffle.

See in-depth explanations with demos:

Here are implementations in Python of Fisher-Yates, Prime shuffling, Feistel Shuffling and Swap-or-not shuffling: https://github.com/ethereum/research/tree/260faf622ef5e8e84bda1258a9f98956baf72991/shuffling

@Araq
Copy link
Member

Araq commented May 20, 2019

I fail to see how the stdlib's implementation differs from what https://medium.com/@oldwestaction/randomness-is-hard-e085decbcbb2 describes

@narimiran
Copy link
Member

narimiran commented May 20, 2019

I did a quick test of Nim's shuffle frequencies based on the article @Araq linked:

For the first test ("300k tries on a 4-item set"):

expected: 12500 (one in 24)
smallest: 12339 (one in 24)
largest:  12694 (one in 23)

For the second test ("1 million tries on a 6-item set"):

expected: 1388 (one in 720)
smallest: 1269 (one in 788)
largest:  1491 (one in 670)

Both examples show quite uniform distribution. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants