Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: use PCG Source in math/rand for Go 2 #21835

Open
robpike opened this issue Sep 11, 2017 · 21 comments

Comments

@robpike
Copy link
Contributor

commented Sep 11, 2017

The original math/rand package had a number of flaws. Some of the flaws in the API have been addressed, such as the addition of the Source64 interface, but the most important issue remains: the default Source is poor.

The Source has several problems.

  • It has no clear provenance (it came from Plan 9, but its history before that is unknown).
  • It is a linear feedback shift register, which is known to have flaws uncovered by various empirical tests.
  • It is difficult and expensive to seed.
  • It is enormous! The shift register itself comprises 607 64-bit words, a total of 4856 bytes.

This last point is not fatal if there is only one instance, but it couples poorly with another fundamental and perhaps unfixable problem: bad behavior when accessed concurrently. It must be protected with a mutex, but if it were much smaller (and cheaper to seed) it would be practical instead to have each client in the address space access a separate instance, seeded uniquely.

To put it another way, with a separate Source for each goroutine, no locking would be required, but the current implementation is impractically large for such an approach.

The proposal here is in several parts.

First, we create an alternate Source implementation. Our choice is the Permuted Congruential Generator (PCG) designed and thoroughly tested by O'Neill in:

PCG: A Family of Simple Fast Space-Efficient Statistically Good
Algorithms for Random Number Generation
Melissa E. O’Neill, Harvey Mudd College
http://www.pcg-random.org/pdf/toms-oneill-pcg-family-v1.02.pdf

A 128-bit PCG generator producing a 64-bit output is only two words of memory but provably has a period of 2**128, is cheap to seed, and efficient. (A 64-bit generator would be noticeably cheaper, but a period of only 2**64 is deemed too small for modern hardware). It is practical and reasonable for every goroutine to have a local Source (a simple seeding algorithm is to use the address of the Source as the seed).

An implementation of this generator, specifically PCG XSL RR 128/64 (LCG), is checked in to golang.org/x/exp/rand as its default Source.

Second, we work on the compiler, possibly through math/bits, to make it a little more efficient. The current implementation, in pure Go, is somewhat slower than math/rand, but could be comparably fast or even faster given compiler support for the 128-bit multiply inside.

The third part is the critical piece.

We propose to make this generator the default one in math/rand for Go 2. The Go 1 compatibility decree could allow it to happen today, but we have chosen in the past to take the guarantee more literally than is expressed by also requiring that the generated stream of bits will never change. This decision was made when a fix was done, but enough tests in production were discovered that depended on the exact stream that it was deemed simpler to lock down the implementation rather than to fix all the tests.

However, with Go 2 on the horizon, there is enough time and understanding and process to offer an opportunity to use rolling out an improved, API-compatible version of math/rand as an early test case for Go 2. Changing math/rand to use PCG is strictly incompatible, but not in a way that the Go 1 decree actually prohibits. A soft breakage at this level would be an excellent step on the way to understanding what it takes to roll out a breaking change for Go 2.

In essence, the proposal is to use the fixing of math/rand as a test case for introducing breaking changes in Go 2. And a much better random number generator would result, too: the breaking change is worth making.

The precise sequence of steps to roll it out needs discussion, but one possible outline is like this:

  • Add PCG as a secondary source to math/rand, but not as the default.
  • Make it available under an alternate name; PCGSource is the obvious.
  • Provide an alias for the current source, say LFSRSource.
  • Optional: Provide a gofix that updates existing code to use LFSRSource rather than Source. Since Source is usually hidden, accessed through Rand, this may be ineffectual unless we also provide PCG functions explicitly for Int32, etc. Or perhaps provide LFSR versions and rotate towards those in legacy code.
  • Make announcements of what's to come.
  • At some point, flip the alias so Source points to PCGSource and LFSRSource is no longer the default.
  • In parallel with this work, make math/bits support its operations more efficiently through the compiler.
  • Just prior to Go 2, delete LFSRSource.

At some point along the way, Source64 should become the default, under a similar sequence of events. Minor cleanups could occur too, tidying up the API and fixing some old-style names like Intn not IntN.

There is much detail that remains to be worked out; this proposal is only an outline.

@gopherbot gopherbot added this to the Proposal milestone Sep 11, 2017

@robpike robpike self-assigned this Sep 11, 2017

@robpike robpike changed the title proposal: fix math/rand for Go 2 proposal: use PCG source in math/rand for Go 2 Sep 11, 2017

@robpike robpike changed the title proposal: use PCG source in math/rand for Go 2 proposal: use PCG Source in math/rand for Go 2 Sep 11, 2017

@odeke-em

This comment has been minimized.

Copy link
Member

commented Sep 11, 2017

@MichaelTJones perhaps you might be interested in this issue, as you already have an implementation of Melissa O'Neill's PCG generator at https://github.com/MichaelTJones/pcg.

@MichaelTJones

This comment has been minimized.

Copy link
Contributor

commented Sep 12, 2017

I would very much support this change. In fact my careful PCG implementation is because of the weaknesses mentioned here. I long ago switched to this code and will not use the present library PRNG. In addition to being technically excellent, a new generator can be created in 2.72ns, makes new 64-bit values in 5.56ns, and has small state. Examine the tests in my implementation for details.

@dongweigogo

This comment has been minimized.

Copy link

commented Sep 12, 2017

why not bring it to Go 1

@mvdan

This comment has been minimized.

Copy link
Member

commented Sep 12, 2017

@dongweigogo please read the proposal text carefully, it is explained.

@btracey

This comment has been minimized.

Copy link
Contributor

commented Oct 24, 2017

To put it another way, with a separate Source for each goroutine, no locking would be required, but the current implementation is impractically large for such an approach.

For some random generators, generating a bunch of new streams is worse statistically for once stream. Section 4.3.2 of the paper suggests that this is not a problem with PCG if the seeding is done properly. Just wanted to comment for anyone curious.

@MichaelTJones

This comment has been minimized.

Copy link
Contributor

commented Oct 24, 2017

@lemire

This comment has been minimized.

Copy link

commented Jan 10, 2018

@MichaelTJones Notice how the bounded functions are significantly slower, that's because of the divisions. You can follow the new implementation at go/src/math/rand/rand.go and avoid divisions (with high probability) and get something faster...

MichaelTJones/pcg#2

Once Go gets something like a full 64-bit multiplication (outputting both the high and low 64-bit values), then we will be able to similarly speed up Bounded64.

@lemire

This comment has been minimized.

Copy link

commented Jan 10, 2018

@robpike The implementation of the bounded RNGs at https://github.com/golang/exp/blob/master/rand/rand.go implies one or two 64-bit division per call, assuming no magical tricks from the compiler. It costs dozens of cycles. The divisions probably cost more than the generation of the random number itself! The 64-bit division is a monster on x64 processors. Even the 32-bit division is surprisingly expensive. It is not very different on ARM processors.

A better alternative is to do a multiply followed by a shift...
https://github.com/golang/go/blob/master/src/math/rand/rand.go#L138-L160

@MichaelTJones

This comment has been minimized.

Copy link
Contributor

commented Jan 10, 2018

@josharian

This comment has been minimized.

Copy link
Contributor

commented Jan 11, 2018

There are at least three parts of math/rand: the source, the conversion/distribution routines, and the concurrency strategy. I think that it is worth revisiting all three for Go 2 and that all three should be considered together.

This proposal as written addresses the source.

The conversion routines changes include #16213 and a handful of other math/rand decisions that were made to preserve existing output, e.g. #8013, #6721, #12290, #16124, #8731, and probably others.

For the concurrency strategy, see #20387 and #21393.

@MichaelTJones

This comment has been minimized.

Copy link
Contributor

commented Jan 11, 2018

@josharian

This comment has been minimized.

Copy link
Contributor

commented May 2, 2018

If get per-P storage in Go 2, or if we are willing to let math/rand have privileged runtime access, I think we should consider adding a global Source that:

  • stores state in a P
  • is lock-free
  • is explicitly documented not to have Any reproducibility
  • is seeded at startup or on first use
  • panics if Seed is called

This would allow people not to have to manually thread PRNG state around through goroutines, which seems to be a significant complaint.

Maybe it should even be the default Source, and have people opt in to reproducibility by making their own Source.

@josharian

This comment has been minimized.

Copy link
Contributor

commented May 2, 2018

cc @cespare for that last comment

@lemire

This comment has been minimized.

Copy link

commented Jul 23, 2018

(@imneme) O'Neill (PCG's inventor) has a relevant blog post...

Efficiently Generating a Number in a Range
http://www.pcg-random.org/posts/bounded-rands.html

@josharian

This comment has been minimized.

Copy link
Contributor

commented Feb 21, 2019

Second, we work on the compiler, possibly through math/bits, to make it a little more efficient. The current implementation, in pure Go, is somewhat slower than math/rand, but could be comparably fast or even faster given compiler support for the 128-bit multiply inside.

Go 1.12 includes 64 bit arithmetic in math/bits. I happened to want the speed-up today, so I implemented it. Go 1.12 isn't out yet, so it doesn't make sense to mail a CL, but in case anyone wants it in the interim, here are drop-in replacements/additions:

const (
	mulHigh    = multiplier >> 64
	mulLow     = multiplier & maxUint64
)

func (pcg *PCGSource) add() {
	var carry uint64
	pcg.low, carry = bits.Add64(pcg.low, incLow, 0)
	pcg.high, _ = bits.Add64(pcg.high, incHigh, carry)
}

func (pcg *PCGSource) multiply() {
	hi, lo := bits.Mul64(pcg.low, mulLow)
	hi += pcg.high * mulLow
	hi += pcg.low * mulHigh
	pcg.low = lo
	pcg.high = hi
}

If no one beats me to it, I'll eventually mail a CL to upstream these.

@robpike

This comment has been minimized.

Copy link
Contributor Author

commented Feb 21, 2019

@gopherbot

This comment has been minimized.

Copy link

commented Feb 21, 2019

Change https://golang.org/cl/163258 mentions this issue: exp: use new operations from math/bits for faster generation

@robpike

This comment has been minimized.

Copy link
Contributor Author

commented Feb 21, 2019

Nice speedup

benchmark                        old ns/op     new ns/op     delta
BenchmarkPCGMultiply-4           3.48          2.31          -33.62%
BenchmarkSource-4                7.46          3.54          -52.55%
BenchmarkInt63Threadsafe-4       21.6          19.4          -10.19%
BenchmarkInt63Unthreadsafe-4     7.69          3.60          -53.19%
BenchmarkIntn1000-4              17.3          13.2          -23.70%
BenchmarkInt63n1000-4            17.7          13.2          -25.42%
BenchmarkInt31n1000-4            17.4          13.2          -24.14%
BenchmarkFloat32-4               13.5          8.57          -36.52%
BenchmarkFloat64-4               11.1          6.43          -42.07%
BenchmarkPerm3-4                 64.6          49.2          -23.84%
BenchmarkPerm30-4                614           430           -29.97%
BenchmarkPerm30ViaShuffle-4      556           450           -19.06%
BenchmarkShuffleOverhead-4       969           788           -18.68%
BenchmarkRead3-4                 11.7          10.1          -13.68%
BenchmarkRead64-4                127           83.0          -34.65%
BenchmarkRead1000-4              1820          1191          -34.56%
gopherbot pushed a commit to golang/exp that referenced this issue Feb 21, 2019
exp: use new operations from math/bits for faster generation
Code by josharian.

benchmark                        old ns/op     new ns/op     delta
BenchmarkPCGMultiply-4           3.48          2.31          -33.62%
BenchmarkSource-4                7.46          3.54          -52.55%
BenchmarkInt63Threadsafe-4       21.6          19.4          -10.19%
BenchmarkInt63Unthreadsafe-4     7.69          3.60          -53.19%
BenchmarkIntn1000-4              17.3          13.2          -23.70%
BenchmarkInt63n1000-4            17.7          13.2          -25.42%
BenchmarkInt31n1000-4            17.4          13.2          -24.14%
BenchmarkFloat32-4               13.5          8.57          -36.52%
BenchmarkFloat64-4               11.1          6.43          -42.07%
BenchmarkPerm3-4                 64.6          49.2          -23.84%
BenchmarkPerm30-4                614           430           -29.97%
BenchmarkPerm30ViaShuffle-4      556           450           -19.06%
BenchmarkShuffleOverhead-4       969           788           -18.68%
BenchmarkRead3-4                 11.7          10.1          -13.68%
BenchmarkRead64-4                127           83.0          -34.65%
BenchmarkRead1000-4              1820          1191          -34.56%

Also move the rotate out from under the 1.9 build tag, as 1.9 is old.
The new code goes under a 1.12 build tag.

Update golang/go#21835.

Change-Id: I66d5cac6578b8df2cdfc3b90ed097c5525adb6bc
Reviewed-on: https://go-review.googlesource.com/c/163258
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
danhhz added a commit to danhhz/cockroach that referenced this issue Mar 2, 2019
workload/tpcc: use a faster random number generator
Background: golang/go#21835

The current thinking for go2 is to switch to a Permuted Congruential
Generator (PCG), which is much faster than the current "math/rand"
generator. It's smaller in memory (128-bit vs 607 64-bit words) and
_much_ faster to seed (which is a common operation in workload, which
uses seeding to make deterministic results).

The linked issues claims "The current implementation, in pure Go, is
somewhat slower than math/rand, but could be comparably fast or even
faster given compiler support for the 128-bit multiply inside." but for
our needs it seems to already be faster.

Benchmark results:
- InitTPCC is only generating the data,
- ImportFixtureTPCC is loading it with
  COCKROACH_IMPORT_WORKLOAD_FASTER=true, which skips the CSV roundtrip.

    name                     old time/op    new time/op    delta
    InitTPCC/warehouses=1-8     2.25s ± 1%     2.06s ± 1%  -8.07%  (p=0.008 n=5+5)
    ImportFixtureTPCC-8         4.90s ± 3%     4.64s ± 2%  -5.44%  (p=0.008 n=5+5)

    name                     old speed      new speed      delta
    InitTPCC/warehouses=1-8  31.5MB/s ± 1%  34.3MB/s ± 1%  +8.80%  (p=0.008 n=5+5)
    ImportFixtureTPCC-8      18.0MB/s ± 3%  19.1MB/s ± 2%  +5.77%  (p=0.008 n=5+5)

    name                     old alloc/op   new alloc/op   delta
    InitTPCC/warehouses=1-8     246MB ± 0%     246MB ± 0%    ~     (p=0.095 n=5+5)
    ImportFixtureTPCC-8        3.61GB ± 0%    3.61GB ± 0%    ~     (p=0.548 n=5+5)

    name                     old allocs/op  new allocs/op  delta
    InitTPCC/warehouses=1-8     5.60M ± 0%     5.60M ± 0%  +0.06%  (p=0.008 n=5+5)
    ImportFixtureTPCC-8         32.2M ± 0%     32.3M ± 0%  +0.08%  (p=0.008 n=5+5)

Release note: None
danhhz added a commit to danhhz/cockroach that referenced this issue Mar 3, 2019
workload/tpcc: use a faster random number generator
Background: golang/go#21835

The current thinking for go2 is to switch to a Permuted Congruential
Generator (PCG), which is much faster than the current "math/rand"
generator. It's smaller in memory (128-bit vs 607 64-bit words) and
_much_ faster to seed (which is a common operation in workload, which
uses seeding to make deterministic results).

The linked issues claims "The current implementation, in pure Go, is
somewhat slower than math/rand, but could be comparably fast or even
faster given compiler support for the 128-bit multiply inside." but for
our needs it seems to already be faster.

Benchmark results:
- InitTPCC is only generating the data,
- ImportFixtureTPCC is loading it with
  COCKROACH_IMPORT_WORKLOAD_FASTER=true, which skips the CSV roundtrip.

    name                     old time/op    new time/op    delta
    InitTPCC/warehouses=1-8     2.25s ± 1%     2.06s ± 1%  -8.07%  (p=0.008 n=5+5)
    ImportFixtureTPCC-8         4.90s ± 3%     4.64s ± 2%  -5.44%  (p=0.008 n=5+5)

    name                     old speed      new speed      delta
    InitTPCC/warehouses=1-8  31.5MB/s ± 1%  34.3MB/s ± 1%  +8.80%  (p=0.008 n=5+5)
    ImportFixtureTPCC-8      18.0MB/s ± 3%  19.1MB/s ± 2%  +5.77%  (p=0.008 n=5+5)

    name                     old alloc/op   new alloc/op   delta
    InitTPCC/warehouses=1-8     246MB ± 0%     246MB ± 0%    ~     (p=0.095 n=5+5)
    ImportFixtureTPCC-8        3.61GB ± 0%    3.61GB ± 0%    ~     (p=0.548 n=5+5)

    name                     old allocs/op  new allocs/op  delta
    InitTPCC/warehouses=1-8     5.60M ± 0%     5.60M ± 0%  +0.06%  (p=0.008 n=5+5)
    ImportFixtureTPCC-8         32.2M ± 0%     32.3M ± 0%  +0.08%  (p=0.008 n=5+5)

Release note: None
danhhz added a commit to danhhz/cockroach that referenced this issue Mar 4, 2019
workload/tpcc: use a faster random number generator
Background: golang/go#21835

The current thinking for go2 is to switch to a Permuted Congruential
Generator (PCG), which is much faster than the current "math/rand"
generator. It's smaller in memory (128-bit vs 607 64-bit words) and
_much_ faster to seed (which is a common operation in workload, which
uses seeding to make deterministic results).

The linked issues claims "The current implementation, in pure Go, is
somewhat slower than math/rand, but could be comparably fast or even
faster given compiler support for the 128-bit multiply inside." but for
our needs it seems to already be faster.

Benchmark results:
- InitTPCC only measures generating initial data
- ImportFixtureTPCC measures loading initial data with
  COCKROACH_IMPORT_WORKLOAD_FASTER=true, which skips the CSV roundtrip.

    name                     old time/op    new time/op    delta
    InitTPCC/warehouses=1-8     2.25s ± 1%     2.06s ± 1%  -8.07%  (p=0.008 n=5+5)
    ImportFixtureTPCC-8         4.90s ± 3%     4.64s ± 2%  -5.44%  (p=0.008 n=5+5)

    name                     old speed      new speed      delta
    InitTPCC/warehouses=1-8  31.5MB/s ± 1%  34.3MB/s ± 1%  +8.80%  (p=0.008 n=5+5)
    ImportFixtureTPCC-8      18.0MB/s ± 3%  19.1MB/s ± 2%  +5.77%  (p=0.008 n=5+5)

    name                     old alloc/op   new alloc/op   delta
    InitTPCC/warehouses=1-8     246MB ± 0%     246MB ± 0%    ~     (p=0.095 n=5+5)
    ImportFixtureTPCC-8        3.61GB ± 0%    3.61GB ± 0%    ~     (p=0.548 n=5+5)

    name                     old allocs/op  new allocs/op  delta
    InitTPCC/warehouses=1-8     5.60M ± 0%     5.60M ± 0%  +0.06%  (p=0.008 n=5+5)
    ImportFixtureTPCC-8         32.2M ± 0%     32.3M ± 0%  +0.08%  (p=0.008 n=5+5)

Touches cockroachdb#34809

Release note: None
@db47h

This comment has been minimized.

Copy link

commented Jun 9, 2019

I understand this is PCG XSL RR 128/64 LCG. While it has decent performance on amd64, it drops dramatically compared to the current math/rand on 32bits platforms:

goos: linux
goarch: amd64
pkg: golang.org/x/exp/rand
BenchmarkSource-6       200000000                6.21 ns/op
BenchmarkSource_stdlib-6    200000000                7.07 ns/op
goos: linux
goarch: 386
pkg: golang.org/x/exp/rand
BenchmarkSource-6       30000000                45.8 ns/op
BenchmarkSource_stdlib-6    200000000                8.01 ns/op

BenchmarkSource_stdlib is the same as BenchmarkSource but using current math/rand. It might be worth including it in the benchmark suite.

Another thing to consider is licensing: PCG is under Apache 2.0 or MIT.

While I do not want to get into a debate about PRNGs or whether 32bits platforms are obsolete, there may be alternatives to consider like xoshiro256** and xoroshiro128** which are in the public domain and do not exhibit such dramatic performance drops on 32bits.

@lemire

This comment has been minimized.

Copy link

commented Jun 9, 2019

@db47h @imneme

there may be alternatives to consider (...) which are in the public domain

There may be good arguments for choosing something else than PCG... but this licensing issue is incorrect in my opinion. One PCG C++ library has been made available under "Apache 2.0 or MIT". If you want to use this particular library, then you should check the license, of course. However, if you implement a PCG algorithm in Go, you get to choose which license you want to apply. The reverse apply, I can pick any generator and apply a super evil license.

Furthermore, I'd like to point out that Go itself is not in the public domain.

A more relevant question is whether there is a patent for the PCG schemes. I just checked (it is easy to do) and there is not.

@db47h

This comment has been minimized.

Copy link

commented Jun 9, 2019

Furthermore, I'd like to point out that Go itself is not in the public domain.

Yes, but so far the Go standard library is under a single license. My "licensing concern" was to keep it that way.

A more relevant question is whether there is a patent for the PCG schemes. I just checked (it is easy to do) and there is not.

My bad, I assumed the license applied to the algorithms as well.

@mark-rushakoff mark-rushakoff referenced this issue Jun 14, 2019
2 of 2 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
You can’t perform that action at this time.