proposal: math/rand: reconsider lock-free source #49892

Open
oakad opened this issue Dec 1, 2021 · 12 comments

@oakad oakad commented Dec 1, 2021

Hereby, I would like to petition to reopen and salvage #18514.
There are several areas where having a global, performant and lock-free random source is of very high utility.

  1. Development/testing of concurrent apps/algorithms (and Go is all about concurrency). One of the most potent techniques in this area requires placing random delays all over the code to emulate uneven progress of concurrent goroutines. The present implementation of rand.Int(), with a global lock in its path, all but negates the usefulness of this approach.

  2. Zero-communication "unfair" sharing. Go is commonly used to implement various proxies, load balancers and other similar gadgets, which require "unfair" sharing of resources across multiple instances (example applications include A/B testing, canary deployments, various types of hot failovers and so on). This can be efficiently achieved by means of random selection with a custom probability distribution, and very efficient algorithms have been developed for this purpose (such as the Walker-Vose O(1) uneven sampling algorithm). Unfortunately, "unfair" selection can only be as performant as the underlying uniform random source is.

  3. "Locally distributed" data structures. A good, simple example of those is Java's java.util.concurrent.atomic.LongAccumulator and friends (however, the same technique can be applied to other similarly constructed objects, such as concurrent pools, multiple-producer queues, and so on). sync.Pool cunningly uses the private fastrand(); we, the users, want one too! :-)
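
To make item 2 concrete, here is a minimal sketch of Vose's variant of the alias method, which picks from a weighted set in O(1) per draw after O(n) setup. The names `AliasTable`, `NewAliasTable`, `Pick` and `sampleCounts` are hypothetical, the 90/9/1 split in `main` is made up, and the sketch assumes each goroutine passes in its own `*rand.Rand`. Each draw costs two uniform variates, which is why the speed of the uniform source dominates.

```go
package main

import (
	"fmt"
	"math/rand"
)

// AliasTable supports O(1) sampling from a fixed discrete distribution
// (Vose's variant of Walker's alias method). Hypothetical type for illustration.
type AliasTable struct {
	prob  []float64 // probability of keeping column i rather than its alias
	alias []int     // fallback outcome for column i
}

// NewAliasTable builds the table in O(n). Weights must be non-negative and
// sum to a positive value; they do not need to be normalized.
func NewAliasTable(weights []float64) *AliasTable {
	n := len(weights)
	sum := 0.0
	for _, w := range weights {
		sum += w
	}
	scaled := make([]float64, n)
	var small, large []int
	for i, w := range weights {
		scaled[i] = w * float64(n) / sum // mean of scaled values is 1
		if scaled[i] < 1 {
			small = append(small, i)
		} else {
			large = append(large, i)
		}
	}
	t := &AliasTable{prob: make([]float64, n), alias: make([]int, n)}
	for len(small) > 0 && len(large) > 0 {
		s, l := small[len(small)-1], large[len(large)-1]
		small, large = small[:len(small)-1], large[:len(large)-1]
		t.prob[s], t.alias[s] = scaled[s], l
		scaled[l] -= 1 - scaled[s] // l donates the mass that fills column s
		if scaled[l] < 1 {
			small = append(small, l)
		} else {
			large = append(large, l)
		}
	}
	// Whatever is left has scaled weight 1 up to rounding error.
	for _, i := range append(small, large...) {
		t.prob[i], t.alias[i] = 1, i
	}
	return t
}

// Pick draws one outcome using two uniform variates from r.
func (t *AliasTable) Pick(r *rand.Rand) int {
	i := r.Intn(len(t.prob))
	if r.Float64() < t.prob[i] {
		return i
	}
	return t.alias[i]
}

// sampleCounts draws n samples from a private source and tallies them.
func sampleCounts(weights []float64, n int, seed int64) []int {
	t := NewAliasTable(weights)
	r := rand.New(rand.NewSource(seed))
	counts := make([]int, len(weights))
	for k := 0; k < n; k++ {
		counts[t.Pick(r)]++
	}
	return counts
}

func main() {
	// Made-up 90/9/1 split, e.g. stable / canary / experiment backends.
	fmt.Println(sampleCounts([]float64{90, 9, 1}, 100000, 1))
}
```

The table itself is read-only after construction, so it can be shared freely; only the random source needs to be per-goroutine or otherwise contention-free.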

In most other languages these problems are resolved by means of thread-local PRNGs. In my opinion, Go should also expose one, or, rather, make its global PRNG instance behave like one. It does not even require any changes to the language, because at present users have no ability to affect the private rand.globalRand object in any way and thus can have no preference on what sort of pseudo-random sequence is returned from the global rand functions.

@gopherbot gopherbot added this to the Proposal milestone Dec 1, 2021
@mdlayher mdlayher changed the title Proposal: reopen and salvage #18514 (because fast, lock free rand.Int() is a really useful thing) Proposal: reconsider lock-free math/rand source Dec 1, 2021
@mdlayher mdlayher changed the title Proposal: reconsider lock-free math/rand source Proposal: math/rand: reconsider lock-free source Dec 1, 2021

@DeedleFake DeedleFake commented Dec 1, 2021

Go purposefully has no thread-local storage system, and it even purposefully avoids exposing anything that would make it easy to implement one, like IDs. One could be implemented for something like this, but it would be preferable for people to just instantiate their own rand.Rands and keep track of them themselves, I think.

That being said, as you point out it wouldn't exactly be terrible for the global rand.Rand instance to behave that way. It could be considered a breaking change, however, and especially since it can be relatively easily worked around, I don't think this is likely to get accepted. I could be wrong, though.
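
As a rough sketch of the workaround described above, each goroutine below creates and keeps its own `*rand.Rand`, so nothing is shared on the hot path. The helper name `sumPerWorker` and the `baseSeed + worker index` seeding scheme are assumptions made for illustration; real code would likely want better-separated seeds.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// sumPerWorker starts `workers` goroutines. Each one owns a private
// *rand.Rand and sums `draws` values in [0, 100).
func sumPerWorker(workers, draws int, baseSeed int64) []int64 {
	sums := make([]int64, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			// A *rand.Rand created with rand.New is not safe for concurrent
			// use, so it must stay inside this goroutine.
			r := rand.New(rand.NewSource(baseSeed + int64(w)))
			var s int64
			for i := 0; i < draws; i++ {
				s += int64(r.Intn(100))
			}
			sums[w] = s // each goroutine writes only its own slot
		}(w)
	}
	wg.Wait()
	return sums
}

func main() {
	fmt.Println(sumPerWorker(4, 1000, 42))
}
```

This works when the code is already organized around long-lived workers; it gets awkward when random numbers are needed deep inside library code that has no worker state to hang a generator on.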

Author

@oakad oakad commented Dec 1, 2021

Of course, I don't want this to be about thread local, it's a long and difficult topic (just mentioned it for the context).

All I personally need is an exported and inlineable runtime.Fastrand(), but it may be called rand.Uint32() just the same. Having a similar 64-bit version would be nice too.

The previous proposal was accepted. And no, it's not easy to achieve the same behavior with rand.Rand; you're welcome to try. It's not for the fainthearted though. :-)


@DeedleFake DeedleFake commented Dec 1, 2021

In that case, while I would certainly not be against exporting fastrand() in some way, it should be noted that you can use a //go:linkname directive to get direct access to it. Maybe the math/rand package can just do something like

import _ "unsafe" // go:linkname is only allowed in files that import unsafe

//go:linkname Runtime runtime.fastrand
func Runtime() uint32

And I very much know how difficult it is to deal with performance and concurrency with rand.Rand. I had to once and it caused me all sorts of problems. I wound up faking it, as it didn't really matter that much for what I was doing. Speaking of which, I still need to go clean that mess up... I haven't touched it in years.

Author

@oakad oakad commented Dec 1, 2021

I know about the go:linkname hack. Everybody who's doing the sort of thing I enumerated in my OP does. :-)

My point here is that the time has come to make it nice and official. In fact, this was the case even back in 2017; it is pretty unfortunate that the previous proposal was closed with the implementation almost ready and reviewed.

Contributor

@robpike robpike commented Dec 1, 2021

See #21835

The exp/rand package might help you today.

Contributor

@cespare cespare commented Dec 1, 2021

Using exp/rand makes it cheaper to create per-goroutine sources, but sometimes it is not convenient to structure the code this way.

I still think the best solution here is to implement #18802 in some form. Then people can create their own CPU-local RNGs easily enough.

Failing that, I think adding a CPU-local RNG in the stdlib would be good. It doesn't have the API concerns of a more general sharding mechanism: the per-CPU-ness isn't user-observable except insofar as the global functions don't see a huge slowdown in highly parallel contexts.

I have an experimental demonstration of one #18802 solution at github.com/cespare/percpu and it includes a wrapper around exp/rand as github.com/cespare/percpu/clrand, in case anyone would like to see how these ideas could look and perform. (But beware that there's a go:linkname hack under the hood.)

Author

@oakad oakad commented Dec 2, 2021

We can either make sharded values with fast random sources or we can make fast random sources with sharded values. Sort of like chickens and eggs. :-)
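
To illustrate the first half of that sentence, here is a minimal sketch of a striped counter in the spirit of Java's LongAccumulator: updates go to a randomly chosen cell and readers sum all cells. The names `StripedCounter`, `Add`, `Sum` and `countConcurrently` are hypothetical, the 64-byte cache line is an assumption about the target CPU, and the sketch sidesteps the missing fast global source by having every goroutine pass in its own `*rand.Rand`.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"sync/atomic"
)

// stripe pads each cell to 64 bytes (an assumed cache-line size) so that
// neighbouring cells do not share a line.
type stripe struct {
	n int64
	_ [56]byte
}

// StripedCounter spreads updates over several cells to reduce contention.
type StripedCounter struct {
	cells []stripe
}

func NewStripedCounter(stripes int) *StripedCounter {
	return &StripedCounter{cells: make([]stripe, stripes)}
}

// Add picks a cell at random using the caller's own source and updates it
// atomically.
func (c *StripedCounter) Add(r *rand.Rand, delta int64) {
	i := r.Intn(len(c.cells))
	atomic.AddInt64(&c.cells[i].n, delta)
}

// Sum adds up all cells. The result is exact only when no Add is running
// concurrently.
func (c *StripedCounter) Sum() int64 {
	var total int64
	for i := range c.cells {
		total += atomic.LoadInt64(&c.cells[i].n)
	}
	return total
}

// countConcurrently runs `workers` goroutines, each adding 1 to a shared
// counter `perWorker` times, and returns the final total.
func countConcurrently(workers, perWorker, stripes int) int64 {
	c := NewStripedCounter(stripes)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(seed int64) {
			defer wg.Done()
			r := rand.New(rand.NewSource(seed)) // private to this goroutine
			for i := 0; i < perWorker; i++ {
				c.Add(r, 1)
			}
		}(int64(w) + 1)
	}
	wg.Wait()
	return c.Sum()
}

func main() {
	fmt.Println(countConcurrently(8, 10000, 16)) // prints 80000
}
```

With a contention-free global source, `Add` would not need the `*rand.Rand` parameter at all, which is the API simplification the thread is asking for.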

@ianlancetaylor ianlancetaylor changed the title Proposal: math/rand: reconsider lock-free source proposal: math/rand: reconsider lock-free source Dec 2, 2021
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals Dec 2, 2021
Contributor

@ianlancetaylor ianlancetaylor commented Dec 2, 2021

There are lots of things we could share on a per-goroutine or per-thread or per-P basis (a P is the entity counted by GOMAXPROCS). For example, we've built magic into sync.Pool to shard it across P's (but that's not so bad since sync.Pool is also magic with respect to garbage collection). It's not clear to me that random numbers are important enough to shard in this way. Or, perhaps the desire to shard them suggests that there should be some general mechanism to support sharding. While there are natural concerns with sharding across goroutines, as that can easily lead to a complicated programming model, sharding across threads or P's is perhaps less bad; the lack of a persistent connection between a goroutine and a thread or P may sufficiently limit the effect on the programming model. Maybe.

Author

@oakad oakad commented Dec 2, 2021

What's wrong with defining user visible rand.Uint32() in terms of internal runtime.fastrand()? No fancy stuff, just that and call it a day.

This is the de-facto situation today with the linkname thing.

Contributor

@josharian josharian commented Dec 2, 2021

See also a slightly more general discussion of the interaction between concurrency and reproducibility guarantees at #26263.

Contributor

@ianlancetaylor ianlancetaylor commented Dec 2, 2021

@oakad I think that in effect you are suggesting that random numbers are important enough that we should shard them across P's. Why is that?

(The fact that the current implementation happens to have an existing mechanism for doing that isn't an argument for why it should be done in user space. We shouldn't in general reason from a particular implementation.)

Author

@oakad oakad commented Dec 3, 2021

(Original comment replaced with this. Sorry.)

I went on to see what people do on GitHub in general. Right now, there are several common techniques to obtain fast randoms:

  1. Linkname to runtime.fastrand. This generally appears to be the favorite approach, but it requires unsafe and is generally suboptimal, because fastrand is not inlined.
  2. sync.Pool of PRNGs seeded from something. This approach feels totally dubious to me, especially considering how sync.Pool is implemented. Sort of like always driving in reverse.
  3. Fake "thread local" by means of syscall.Gettid. Various approaches possible, but again, a strong feeling of "driving in reverse".
  4. Assembler (rdrand or rdtsc sometimes). We are at the mercy of the hardware platform, especially considering all the evils Intel has done to rdrand (on some CPUs it's actually very slow these days).
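
For reference, technique 2 from the list above usually looks something like the following sketch. The names `rngPool`, `pooledIntn` and `drawMany` are hypothetical, and the atomic counter used for seeding is a placeholder assumption; projects seed from various sources.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"sync/atomic"
)

// seedCounter is a placeholder seed source used only for this sketch.
var seedCounter int64

// rngPool hands out *rand.Rand values. sync.Pool may drop idle entries at
// any garbage collection, in which case New builds and seeds a fresh one.
var rngPool = sync.Pool{
	New: func() interface{} {
		seed := atomic.AddInt64(&seedCounter, 1)
		return rand.New(rand.NewSource(seed))
	},
}

// pooledIntn borrows a generator, draws one value in [0, n), and returns
// the generator to the pool.
func pooledIntn(n int) int {
	r := rngPool.Get().(*rand.Rand)
	v := r.Intn(n)
	rngPool.Put(r)
	return v
}

// drawMany calls pooledIntn from several goroutines and reports whether
// every result was within [0, n).
func drawMany(workers, perWorker, n int) bool {
	var bad int32
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				if v := pooledIntn(n); v < 0 || v >= n {
					atomic.StoreInt32(&bad, 1)
				}
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&bad) == 0
}

func main() {
	fmt.Println(drawMany(8, 1000, 10))
}
```

Every draw pays for a Get/Put pair, and the seeding cost recurs whenever the pool is emptied, which is part of why this pattern can feel backwards compared with a runtime-provided per-P generator.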
