New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: math/rand: add Shuffle #20480

Closed
josharian opened this Issue May 24, 2017 · 17 comments

Comments

Projects
None yet
6 participants
@josharian
Contributor

josharian commented May 24, 2017

CL 33909 re-implements Fisher-Yates shuffling--simple and easy but notoriously error prone. CL 41973 used rand.Perm to avoid re-implementing it, but at the cost of an alloc+copy.

Random shuffling of a slice is a very common need. Now that we have reflect.Swapper, however, we could implement a reasonably efficient Shuffle function in math/rand:

// Shuffle pseudo-randomizes the order of elements in slice.
func Shuffle(slice interface{})

Or ShuffleSlice?

@josharian josharian added this to the Proposal milestone May 24, 2017

@gopherbot gopherbot added the Proposal label May 24, 2017

@bradfitz

This comment has been minimized.

Show comment
Hide comment
@bradfitz

bradfitz May 24, 2017

Member

And crypto/rand.ShuffleSlice, and math/rand.(*Rand).ShuffleSlice?

Starting to get a little heavy.

Member

bradfitz commented May 24, 2017

And crypto/rand.ShuffleSlice, and math/rand.(*Rand).ShuffleSlice?

Starting to get a little heavy.

@josharian

This comment has been minimized.

Show comment
Hide comment
@josharian

josharian May 24, 2017

Contributor

I see no clear need for crypto/rand.Shuffle; there's no crypto/rand.Perm.

Yes, math/rand.(*Rand).Shuffle. The "all methods are duplicated" pattern is easy to notice as a user, and thus isn't really the same as doubling the API surface with new methods.

Contributor

josharian commented May 24, 2017

I see no clear need for crypto/rand.Shuffle; there's no crypto/rand.Perm.

Yes, math/rand.(*Rand).Shuffle. The "all methods are duplicated" pattern is easy to notice as a user, and thus isn't really the same as doubling the API surface with new methods.

@rsc

This comment has been minimized.

Show comment
Hide comment
@rsc

rsc Jun 5, 2017

Contributor

I was really hoping to keep the swapper experiment limited to sort. If we do this (if), Shuffle should take an interface that is the obvious subset of sort.Interface (no Less). It seems like then we'd also need ShuffleSlice for convenience, though.

But maybe instead of thinking about analogies to sort.Sort we should think about analogies to sort.Search. What about:

rand.Shuffle(n int, swap func(i, j int))

where you'd write

rand.Shuffle(len(x), func(i, j int) { x[i], x[j] = x[j], x[i] })

Maybe that's OK?

Contributor

rsc commented Jun 5, 2017

I was really hoping to keep the swapper experiment limited to sort. If we do this (if), Shuffle should take an interface that is the obvious subset of sort.Interface (no Less). It seems like then we'd also need ShuffleSlice for convenience, though.

But maybe instead of thinking about analogies to sort.Sort we should think about analogies to sort.Search. What about:

rand.Shuffle(n int, swap func(i, j int))

where you'd write

rand.Shuffle(len(x), func(i, j int) { x[i], x[j] = x[j], x[i] })

Maybe that's OK?

@rsc rsc changed the title from proposal: add math/rand.Shuffle to proposal: math/rand: add Shuffle Jun 5, 2017

@rsc

This comment has been minimized.

Show comment
Hide comment
@rsc

rsc Jun 12, 2017

Contributor

Any thoughts on comment above?

Contributor

rsc commented Jun 12, 2017

Any thoughts on comment above?

@cespare

This comment has been minimized.

Show comment
Hide comment
@cespare

cespare Jun 12, 2017

Contributor

I looked through my own code and the Go code at work and all the shuffling I can find operates on slices. So from a user perspective, the original proposal (ShuffleSlice which uses swapper) is more convenient since the swap function will always be the same func(i, j int) { x[i], x[j] = x[j], x[i] } boilerplate.

Any of these proposals seem reasonable and would be useful for me, though.

Contributor

cespare commented Jun 12, 2017

I looked through my own code and the Go code at work and all the shuffling I can find operates on slices. So from a user perspective, the original proposal (ShuffleSlice which uses swapper) is more convenient since the swap function will always be the same func(i, j int) { x[i], x[j] = x[j], x[i] } boilerplate.

Any of these proposals seem reasonable and would be useful for me, though.

@josharian

This comment has been minimized.

Show comment
Hide comment
@josharian

josharian Jun 13, 2017

Contributor

I agree with @cespare.

  • Though a rand.Shuffle that accepts a swap function is more general, I suspect that the "sort a slice" case so dominates usage that it'd be better to optimize for it at the cost of generality.
  • It is worth double-checking whether the two would be equivalent, or at least close, in terms of performance. If there's a big gap, I'd favor the more efficient one.
  • In the end, I'd be perfectly happy with either API.
Contributor

josharian commented Jun 13, 2017

I agree with @cespare.

  • Though a rand.Shuffle that accepts a swap function is more general, I suspect that the "sort a slice" case so dominates usage that it'd be better to optimize for it at the cost of generality.
  • It is worth double-checking whether the two would be equivalent, or at least close, in terms of performance. If there's a big gap, I'd favor the more efficient one.
  • In the end, I'd be perfectly happy with either API.
@btracey

This comment has been minimized.

Show comment
Hide comment
@btracey

btracey Jun 13, 2017

Contributor

I'm with @rsc, in general. I've often found that I either want to track the elements got shuffled, or I want to shuffle multiple different slices (say, the inputs and outputs). The more general function would also make it easy to shuffle rows or columns of a matrix (for me). This would be hard with just the slice formulation.

Contributor

btracey commented Jun 13, 2017

I'm with @rsc, in general. I've often found that I either want to track the elements got shuffled, or I want to shuffle multiple different slices (say, the inputs and outputs). The more general function would also make it easy to shuffle rows or columns of a matrix (for me). This would be hard with just the slice formulation.

@josharian

This comment has been minimized.

Show comment
Hide comment
@josharian

josharian Jun 13, 2017

Contributor

Thanks, @btracey. The "shuffle multiple slices" and row/column use cases are pretty compelling. I'm convinced.

Contributor

josharian commented Jun 13, 2017

Thanks, @btracey. The "shuffle multiple slices" and row/column use cases are pretty compelling. I'm convinced.

@rsc

This comment has been minimized.

Show comment
Hide comment
@rsc

rsc Jun 13, 2017

Contributor

Does anyone have any numbers on how often this even comes up? I mean, sorting data happens all the time; I can't remember the last time I needed to shuffle data.

Contributor

rsc commented Jun 13, 2017

Does anyone have any numbers on how often this even comes up? I mean, sorting data happens all the time; I can't remember the last time I needed to shuffle data.

@btracey

This comment has been minimized.

Show comment
Hide comment
@btracey

btracey Jun 13, 2017

Contributor

In Gonum, we currently have two uses of rand.Perm (well, one use that occurs twice), for Latin Hypercube sampling [0], where you want samples in random buckets. This could be implemented with shuffle instead of Perm. We also have a use in our testing suite for generating a matrix in a particular form [1](name not our fault, copied from the old fortran days).

Searching a large chunk of my personal research code over the past couple of years, I have about 10 distinct uses (more if you count mulitple implementations of similar functionality). Loosely, I've used it in machine learning for stochastic gradient descent variations, in cross validation for training/testing partitioning (probably would not reimplement with shuffle), for choosing a unique set of bits to flip in a bit string (not shuffle), and for dealing with algorithms that depend on data order, but you want to remove that bias.

I also have a file that appears to read in a csv, shuffle the order of the rows, and then write a new csv. I do not remember why I thought that was necessary.

[0] https://godoc.org/gonum.org/v1/gonum/stat/samplemv#LatinHypercube
[1] https://godoc.org/gonum.org/v1/gonum/lapack/testlapack#DgetriTest

Contributor

btracey commented Jun 13, 2017

In Gonum, we currently have two uses of rand.Perm (well, one use that occurs twice), for Latin Hypercube sampling [0], where you want samples in random buckets. This could be implemented with shuffle instead of Perm. We also have a use in our testing suite for generating a matrix in a particular form [1](name not our fault, copied from the old fortran days).

Searching a large chunk of my personal research code over the past couple of years, I have about 10 distinct uses (more if you count mulitple implementations of similar functionality). Loosely, I've used it in machine learning for stochastic gradient descent variations, in cross validation for training/testing partitioning (probably would not reimplement with shuffle), for choosing a unique set of bits to flip in a bit string (not shuffle), and for dealing with algorithms that depend on data order, but you want to remove that bias.

I also have a file that appears to read in a csv, shuffle the order of the rows, and then write a new csv. I do not remember why I thought that was necessary.

[0] https://godoc.org/gonum.org/v1/gonum/stat/samplemv#LatinHypercube
[1] https://godoc.org/gonum.org/v1/gonum/lapack/testlapack#DgetriTest

@rsc

This comment has been minimized.

Show comment
Hide comment
@rsc

rsc Jun 19, 2017

Contributor

Josh are you saying you're now convinced to do:

rand.Shuffle(n int, swap func(i, j int))

?

Contributor

rsc commented Jun 19, 2017

Josh are you saying you're now convinced to do:

rand.Shuffle(n int, swap func(i, j int))

?

@josharian

This comment has been minimized.

Show comment
Hide comment
@josharian

josharian Jun 19, 2017

Contributor

Yes. I've not had a chance yet to gather empirical data about demand.

Contributor

josharian commented Jun 19, 2017

Yes. I've not had a chance yet to gather empirical data about demand.

@josharian

This comment has been minimized.

Show comment
Hide comment
@josharian

josharian Jun 20, 2017

Contributor

I gathered some quick and dirty empirical data from GitHub searches, restricted with language:Go.

For a baseline:

Perm/shuffling:

  • rand.Perm: 9,502 results. Of the first two pages of results, 5 were implementing shuffling, 6 were simple demo/experimentation code, 2 were wrappers around math.Rand, 1 didn't call rand.Perm, 3 were used for random input for tests/benchmarks. Extrapolation suggests ~2,500 uses of rand.Perm to implement shuffling.
  • fisher-yates: 479 results
  • shuffle: 10,304 results

Off the topic of my head, in the standard library, in addition to the two cases above from the compiler, we also do a shuffle of cases in the select implementation, and might do one in the sort tests.

My pulse from this is that rand.Shuffle would be of low-to-medium popularity among math/rand APIs.

If we decide not to add rand.Shuffle, I would suggest considering having rand.Shuffle replace rand.Perm as the shuffling primitive in math/rand for Go 2.

Contributor

josharian commented Jun 20, 2017

I gathered some quick and dirty empirical data from GitHub searches, restricted with language:Go.

For a baseline:

Perm/shuffling:

  • rand.Perm: 9,502 results. Of the first two pages of results, 5 were implementing shuffling, 6 were simple demo/experimentation code, 2 were wrappers around math.Rand, 1 didn't call rand.Perm, 3 were used for random input for tests/benchmarks. Extrapolation suggests ~2,500 uses of rand.Perm to implement shuffling.
  • fisher-yates: 479 results
  • shuffle: 10,304 results

Off the topic of my head, in the standard library, in addition to the two cases above from the compiler, we also do a shuffle of cases in the select implementation, and might do one in the sort tests.

My pulse from this is that rand.Shuffle would be of low-to-medium popularity among math/rand APIs.

If we decide not to add rand.Shuffle, I would suggest considering having rand.Shuffle replace rand.Perm as the shuffling primitive in math/rand for Go 2.

@btracey

This comment has been minimized.

Show comment
Hide comment
@btracey

btracey Jun 20, 2017

Contributor

I don't think it'll affect your results too much, but is it easy to search for *rand.Rand types that call those methods? In gonum we make sure to generate explicit streams so we have reproducible tests.

Contributor

btracey commented Jun 20, 2017

I don't think it'll affect your results too much, but is it easy to search for *rand.Rand types that call those methods? In gonum we make sure to generate explicit streams so we have reproducible tests.

@rsc

This comment has been minimized.

Show comment
Hide comment
@rsc

rsc Jul 17, 2017

Contributor

Let's start with rand.Shuffle and see about rand.ShuffleSlice after that. There should be no crypto/rand.ShuffleSlice (mentioned above, but I can't see why).

Contributor

rsc commented Jul 17, 2017

Let's start with rand.Shuffle and see about rand.ShuffleSlice after that. There should be no crypto/rand.ShuffleSlice (mentioned above, but I can't see why).

@rsc rsc modified the milestones: Go1.10, Proposal Jul 17, 2017

@rsc

This comment has been minimized.

Show comment
Hide comment
@rsc

rsc Jul 17, 2017

Contributor

Assuming @josharian wants to do this, but if not speak up and anyone else can.

Contributor

rsc commented Jul 17, 2017

Assuming @josharian wants to do this, but if not speak up and anyone else can.

@josharian josharian self-assigned this Jul 21, 2017

@gopherbot

This comment has been minimized.

Show comment
Hide comment
@gopherbot

gopherbot Jul 29, 2017

Change https://golang.org/cl/51891 mentions this issue: math/rand: add Shuffle

gopherbot commented Jul 29, 2017

Change https://golang.org/cl/51891 mentions this issue: math/rand: add Shuffle

josharian added a commit to josharian/go that referenced this issue Aug 12, 2017

math/rand: add Shuffle
Shuffle uses the Fisher-Yates algorithm.

Since this is new API, it affords us the opportunity
to use a much faster Int31n implementation that mostly avoids division.
As a result, BenchmarkPerm30ViaShuffle is
about 30% faster than BenchmarkPerm30,
despite requiring a separate initialization loop
and using function calls to swap elements.

Fixes #20480
Updates #16213
Updates #21211

Change-Id: Ib8956c4bebed9d84f193eb98282ec16ee7c2b2d5

@gopherbot gopherbot closed this in a2dfe5d Sep 8, 2017

josharian added a commit to josharian/go that referenced this issue Nov 13, 2017

math/rand: add Shuffle
Shuffle uses the Fisher-Yates algorithm.

Since this is new API, it affords us the opportunity
to use a much faster Int31n implementation that mostly avoids division.
As a result, BenchmarkPerm30ViaShuffle is
about 30% faster than BenchmarkPerm30,
despite requiring a separate initialization loop
and using function calls to swap elements.

Fixes #20480
Updates #16213
Updates #21211

Change-Id: Ib8956c4bebed9d84f193eb98282ec16ee7c2b2d5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment