
math/rand: use a sharded lock for the default shared Source #20387

Open
ianlancetaylor opened this issue May 17, 2017 · 9 comments

Comments

@ianlancetaylor (Contributor) commented May 17, 2017:

Benchmarks and performance-sensitive code that use the top-level functions in the math/rand package (rand.Intn and so forth) suffer hidden lock contention on the default shared Source. This can surprise people: they see unexpectedly poor performance without realizing that the problem is in their own code. If we had a sharded lock of some sort, this would be a good place to use it.

@ianlancetaylor ianlancetaylor added this to the Unplanned milestone May 17, 2017

@bradfitz (Member) commented May 17, 2017 (comment minimized; content not shown)

@josharian (Contributor) commented May 17, 2017:

Maybe a cheap intermediate fix would be to have lockedSource be (say)

type lockedSource struct {
	n    uint64
	lks  [64]sync.Mutex
	srcs [64]Source64
}

where n is updated with atomic.AddUint64 on each use and taken modulo 64 as an index into lks and srcs. There's only one of these, so the size increase shouldn't be too bad.

@bradfitz (Member) commented May 17, 2017:

Do we care about changing the generated numbers users see? I thought we did, but I might be thinking of something else.

@cespare (Contributor) commented May 17, 2017:

@bradfitz yeah I thought we did. #13215 is one example; I think there are others I can't find as well.

@josharian (Contributor) commented May 17, 2017:

There have been a slew of cases (#12290, #16124, #8731). I have been unable to discern a coherent explanation for when we can change it and when we can't. Discussion in #8013 and #14416 seems to indicate that none will be forthcoming soon. (I'd still like to see a "big break" release in which we fix a bunch of bugs and inefficiencies that have been blocked by this constraint, including using a much better PRNG, and publish a backwards-compat golang.org/x package usable by those that still really need the original stream.)

However, I think with some care and extra overhead I could write this such that the random stream is unaltered when not called concurrently. When called concurrently, all bets for reproducibility are off anyway, and that's the case in which we'd want to shard locks. I'll give it a try.

@gopherbot commented May 17, 2017:

CL https://golang.org/cl/43611 mentions this issue.

@josharian (Contributor) commented May 17, 2017:

With some brain-hurting/scary atomic fairy dust, I managed to get these performance numbers for "only shard locks when concurrent":

name                        old time/op  new time/op  delta
Int63Threadsafe             23.4ns ± 5%  35.9ns ± 9%   +52.92%  (p=0.000 n=10+10)
Int63Threadsafe-2           22.2ns ±13%  34.9ns ± 3%   +57.08%  (p=0.000 n=10+10)
Int63Threadsafe-4           20.9ns ±16%  34.9ns ± 2%   +67.26%  (p=0.000 n=10+10)
Int63Threadsafe-8           20.1ns ± 2%  34.9ns ± 1%   +74.10%  (p=0.000 n=8+8)
Int63Threadsafe-16          21.2ns ±18%  34.9ns ± 2%   +64.15%  (p=0.000 n=10+10)
Int63Threadsafe-32          20.9ns ± 2%  34.8ns ± 1%   +66.45%  (p=0.000 n=9+9)
Int63Threadsafe-64          22.1ns ±15%  34.6ns ± 1%   +56.38%  (p=0.000 n=10+9)
Int63ThreadsafeParallel     21.2ns ± 3%  35.2ns ± 1%   +65.65%  (p=0.000 n=10+8)
Int63ThreadsafeParallel-2   28.1ns ± 2%  66.3ns ± 4%  +135.54%  (p=0.000 n=10+10)
Int63ThreadsafeParallel-4   45.9ns ± 1%  43.9ns ± 1%    -4.31%  (p=0.000 n=9+10)
Int63ThreadsafeParallel-8   60.1ns ± 2%  34.1ns ± 4%   -43.23%  (p=0.000 n=9+10)
Int63ThreadsafeParallel-16  70.4ns ± 2%  33.9ns ± 3%   -51.75%  (p=0.000 n=9+10)
Int63ThreadsafeParallel-32  78.3ns ±17%  33.5ns ± 3%   -57.18%  (p=0.000 n=10+10)
Int63ThreadsafeParallel-64   105ns ± 5%    33ns ± 1%   -68.63%  (p=0.000 n=10+9)

I'm not sure that this is worth it. If others disagree, I'm happy to do a documentation pass over the CL to make it reviewable.

@valyala (Contributor) commented May 17, 2017:

IMHO, the current behaviour of the math/rand package shouldn't be changed, since certain users may depend on it. It would be better to add a new package, like math/fastrand, that is optimized for performance and scales across multiple CPU cores. See this package as an example.

@cespare (Contributor) commented May 17, 2017:

@valyala not changing the behavior for users that depend on it is exactly what @josharian's patch attempts to do.

josharian added a commit to josharian/go that referenced this issue May 23, 2017
math/rand: shard locks for global locked source
DO NOT REVIEW

[needs careful docs, not sure we want to do it]

demo for golang#20387

name                        old time/op  new time/op  delta
Int63Threadsafe             23.4ns ± 5%  35.9ns ± 9%   +52.92%  (p=0.000 n=10+10)
Int63Threadsafe-2           22.2ns ±13%  34.9ns ± 3%   +57.08%  (p=0.000 n=10+10)
Int63Threadsafe-4           20.9ns ±16%  34.9ns ± 2%   +67.26%  (p=0.000 n=10+10)
Int63Threadsafe-8           20.1ns ± 2%  34.9ns ± 1%   +74.10%  (p=0.000 n=8+8)
Int63Threadsafe-16          21.2ns ±18%  34.9ns ± 2%   +64.15%  (p=0.000 n=10+10)
Int63Threadsafe-32          20.9ns ± 2%  34.8ns ± 1%   +66.45%  (p=0.000 n=9+9)
Int63Threadsafe-64          22.1ns ±15%  34.6ns ± 1%   +56.38%  (p=0.000 n=10+9)
Int63ThreadsafeParallel     21.2ns ± 3%  35.2ns ± 1%   +65.65%  (p=0.000 n=10+8)
Int63ThreadsafeParallel-2   28.1ns ± 2%  66.3ns ± 4%  +135.54%  (p=0.000 n=10+10)
Int63ThreadsafeParallel-4   45.9ns ± 1%  43.9ns ± 1%    -4.31%  (p=0.000 n=9+10)
Int63ThreadsafeParallel-8   60.1ns ± 2%  34.1ns ± 4%   -43.23%  (p=0.000 n=9+10)
Int63ThreadsafeParallel-16  70.4ns ± 2%  33.9ns ± 3%   -51.75%  (p=0.000 n=9+10)
Int63ThreadsafeParallel-32  78.3ns ±17%  33.5ns ± 3%   -57.18%  (p=0.000 n=10+10)
Int63ThreadsafeParallel-64   105ns ± 5%    33ns ± 1%   -68.63%  (p=0.000 n=10+9)

Change-Id: I02f036c4c80e41df3065446be36840992b1c978e
josharian added a commit to josharian/go that referenced this issue May 8, 2018
math/rand: shard locks for global locked source
DO NOT REVIEW

[needs careful docs, not sure we want to do it]

demo for golang#20387

(benchmark results identical to the May 23, 2017 commit above)

Change-Id: I02f036c4c80e41df3065446be36840992b1c978e