New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: select is not fair #21806

Closed
bench opened this Issue Sep 8, 2017 · 33 comments

Comments

Projects
None yet
9 participants
@bench

bench commented Sep 8, 2017

What version of Go are you using (go version)?

$ go version
go version go1.9 linux/amd64

Does this issue reproduce with the latest release?

yes, on 1.9

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/bchenebault/DEV/git-oab.si.fr.intraorange/BU900102/ceo-be"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build913166031=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

What did you do?

A Go select statement with reflect package function reflect.Select(...) should block until at least one of the cases can proceed and make a uniform pseudo-random choice. It appears that the behavior has changed since returned values are no longer pseudo-random in specific cases only.

Our case is the following : 2 signaling channels preceding 2 "logic" channels
https://play.golang.org/p/0TiMtsRk03

But the same version with only one channel preceding the 2 others works
https://play.golang.org/p/sy4RjEEjdg

This case perfectly works on go1.8.3 and below.

What did you expect to see?

A pseudo random distribution

What did you see instead?

A strange distribution, surely not pseudo random

@bench bench changed the title from reflect.select() no longer pseudo-random choices to reflect.Select(...) no longer pseudo-random choices Sep 8, 2017

@bench bench changed the title from reflect.Select(...) no longer pseudo-random choices to reflect.Select(...) no longer pseudo-random selects Sep 8, 2017

@ianlancetaylor ianlancetaylor changed the title from reflect.Select(...) no longer pseudo-random selects to runtime: reflect.Select(...) no longer pseudo-random selects Sep 8, 2017

@ianlancetaylor ianlancetaylor changed the title from runtime: reflect.Select(...) no longer pseudo-random selects to runtime: select is not fair Sep 8, 2017

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Sep 8, 2017

This is not just using reflect. This similar program shows the same problem using an ordinary select statement.

package main

import (
	"fmt"
)

func main() {
	in1 := make(chan int, 15000)
	in2 := make(chan int, 15000)

	for i := 0; i < 15000; i++ {
		in1 <- 1
	}

	for i := 0; i < 15000; i++ {
		in2 <- 2
	}
	out := make(chan int)

	go func() {
		c1 := make(chan struct{})
		c2 := make(chan struct{})
		for {
			var v int
			select {
			case <-c1:
			case <-c2:
			case v = <-in2:
			case v = <-in1:
			}
			out <- v
		}
	}()
	
	var nb1, nb2 int = 0, 0

ConsumerLoop:
	for {
		select {
		case d := <-out:
			switch d {
			case 1:
				nb1++
				if nb1 > 5000 { // stop consume. At this point, messages should be the equidistribution
					break ConsumerLoop
				}
			case 2:
				nb2++
			default:
				panic(d)
			}
		}
	}

	print(nb1, "\n")
	print(nb2, "\n")

	if !areSameOrder(nb1, nb2) {
		fmt.Printf("Should have as many nb1 as nb2, got %d nb1 and %d nb2\n", nb1, nb2)
	}
}

func areSameOrder(a, b int) bool {
	ratio := float32(a) / float32(b)
	return ratio < 1.3 && ratio > 0.7
}

@ianlancetaylor ianlancetaylor added this to the Go1.9.1 milestone Sep 8, 2017

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Sep 8, 2017

It seems to be a problem with fastrandn. If I replace this line in runtime/select.go:

		j := fastrandn(uint32(i + 1))

with the line that was in 1.8:

		j := int(fastrand()) % (i + 1)

then the program runs correctly.

CC @josharian

@go101

This comment has been minimized.

go101 commented Sep 8, 2017

also not fair for

			select {
			case v = <-in2:
			case v = <-in1:
			case v = <-in2:
			case v = <-in1:
			}

the ratio is constantly larger than 1.08.

The ratio for the following is even worse, about 2.0

			select {
			case v = <-in1:
			case v = <-in1:
			case v = <-in1:
			case v = <-in1:
			case v = <-in1:
			case v = <-in1:
			case v = <-in1:
			case v = <-in1:
			case v = <-in1:
			case v = <-in1:
			case v = <-in1:
			case v = <-in1:
			case v = <-in1:
			case v = <-in1:
			case v = <-in1:
			case v = <-in1:
			case v = <-in2:
			case v = <-in2:
			case v = <-in2:
			case v = <-in2:
			case v = <-in2:
			case v = <-in2:
			case v = <-in2:
			case v = <-in2:
			case v = <-in2:
			case v = <-in2:
			case v = <-in2:
			case v = <-in2:
			case v = <-in2:
			case v = <-in2:
			case v = <-in2:
			case v = <-in2:
			}
@randall77

This comment has been minimized.

Contributor

randall77 commented Sep 8, 2017

The problem is one of correlation.

package main

import "fmt"

var state uint32 = 1

func fastrand() uint32 {
	fr := state
	mx := uint32(int32(fr)>>31) & 0xa8888eef
	fr = fr<<1 ^ mx
	state = fr
	return fr
}

func fastrandn(n uint32) uint32 {
	return uint32(uint64(fastrand()) * uint64(n) >> 32)
}

func main() {
	const N = 4
	var hist [N][N]int
	for i := 0; i < 1000000; i++ {
		x := fastrandn(N)
		y := fastrandn(N)
		hist[x][y]++
	}
	for i := 0; i < N; i++ {
		for j := 0; j < N; j++ {
			fmt.Printf("%6d ", hist[i][j])
		}
		fmt.Println()
	}
}

Outputs:

124831 124610      0      0 
     0      0 125374 125232 
     0      0 124870 124900 
125083 125100      0      0 

So if we get a 0 from fastrandn(4), the next output is guaranteed to be a 0 or a 1.
I think this is fundamentally a problem with this RNG when N is a power of two, because the values we return are highly predictive of whether we end up xoring in the feedback or not.

Maybe if we did the feedback in the other direction (shift right, xor based on the low bit), it would help. Or maybe we have to go back to using modulo so we can depend on the low bits.

@randall77

This comment has been minimized.

Contributor

randall77 commented Sep 8, 2017

@funny-falcon

This comment has been minimized.

Contributor

funny-falcon commented Sep 8, 2017

@randall77 looks like it is flaw of LFSR. Old version were not better:

var state uint32 = 1
func fastrand() uint32 {
    fr := state
    fr <<= 1
    fr ^= uint32(int32(fr)>>31) & 0x88888eef
    state = fr
    return fr
}                                                    
func fastrandn(n uint32) uint32 {
        return uint32(uint64(fastrand()) * uint64(n) >> 31)
}
125097 124561      0      0
     0      0 124738 125274
125113 125123      0      0
     0      0 124721 125373

I'm thinking what could be done with.

@rasky

This comment has been minimized.

Member

rasky commented Sep 8, 2017

What about using xorshift instead?

var state uint32 = 2463534242

func fastrand() uint32 {
	fr := state
	fr ^= fr<<13
	fr ^= fr>>17
	fr ^= fr<<5
	state = fr
	return fr
}

Outputs:

 62797  62420  62588  62303 
 62401  62408  61990  62399 
 62207  62601  62742  62515 
 62916  62716  62655  62342 

(https://play.golang.org/p/sD9jmKGWJy)

@funny-falcon

This comment has been minimized.

Contributor

funny-falcon commented Sep 8, 2017

@rasky , xorshift is slower. Otherwise it is good because it gives almost same randomness for high and low bits.

I have other variant for fastrandn:

func fastrandn(n uint32) uint32 {
    // some constants with random bit pattern
    // (this are from hash32.go)
    const m1 = 3168982561
    const m2 = 3339683297
    fr := fastrand()
    fr ^= m1
    fr *= m2
    return uint32(uint64(fr) * uint64(n) >> 32)
}

It also gives fair distribution

100192 100322  99857 100461
 99535 100347 100216 100209
 99533 100023  99899 100376
 99678  99796  99982  99574

And it is faster than xorshift.
But it uses multiplication, and doesn't fix cases when raw fastrand used.

https://play.golang.org/p/29n_4OfVQq

@funny-falcon

This comment has been minimized.

Contributor

funny-falcon commented Sep 8, 2017

Benchmark:

  • master
$ GOMAXPROCS=1 ../bin/go test runtime -run 'foo' -bench '.*astrand.*'
BenchmarkFastrand               1000000000               2.35 ns/op
BenchmarkFastrandHashiter       50000000                27.3 ns/op
BenchmarkFastrandn/2            1000000000               2.39 ns/op
BenchmarkFastrandn/3            1000000000               2.39 ns/op
BenchmarkFastrandn/4            1000000000               2.39 ns/op
BenchmarkFastrandn/5            1000000000               2.39 ns/op
  • xorshift
BenchmarkFastrand               500000000                3.16 ns/op
BenchmarkFastrandHashiter       50000000                33.9 ns/op
BenchmarkFastrandn/2            500000000                3.34 ns/op
BenchmarkFastrandn/3            500000000                3.34 ns/op
BenchmarkFastrandn/4            500000000                3.34 ns/op
BenchmarkFastrandn/5            500000000                3.34 ns/op
  • fastrand in this way:
func fastrand() uint32 {
    mp := getg().m
    fr := mp.fastrand
    mx := uint32(int32(fr)>>31) & 0xa8888eef
    fr = fr<<1 ^ mx
    mp.fastrand = fr
    fr ^= 3168982561
    fr *= 3339683297
    return fr ^ (fr >> 16)
}
BenchmarkFastrand               1000000000               2.34 ns/op
BenchmarkFastrandHashiter       50000000                34.3 ns/op
BenchmarkFastrandn/2            1000000000               2.56 ns/op
BenchmarkFastrandn/3            1000000000               2.56 ns/op
BenchmarkFastrandn/4            1000000000               2.56 ns/op
BenchmarkFastrandn/5            1000000000               2.56 ns/op
@rasky

This comment has been minimized.

Member

rasky commented Sep 8, 2017

Is 1ns slower important for fastrand? Does it ever use a noticeable amount of cpu in a hot loop? Xorshift is among the fastest with good properties; you can come up with endless number of faster prngs with worse properties, but I fear we might end up in a similar bug some day.

@funny-falcon

This comment has been minimized.

Contributor

funny-falcon commented Sep 8, 2017

My variant is definitely good for all of fastrand intended uses. And it certainly fixes fastrandn.
If you want, I'll pass it through dieharder, and compare with xorshift.

@funny-falcon

This comment has been minimized.

Contributor

funny-falcon commented Sep 8, 2017

I think, performance should be measured on all supported platforms. Something that is faster on intel, could be slower on Power or Arm.

@rasky

This comment has been minimized.

Member

rasky commented Sep 8, 2017

Yes, xorshift on ARM is basically 3 instructions thanks to barrel shift, hard to beat. I still think that 1ns in fastrand is not going to have any real world impact, though.

@rasky

This comment has been minimized.

Member

rasky commented Sep 9, 2017

By adapting xorshift+ to 32-bit integers, using a triplet picked up by the original xorshift paper, I tested this:

func fastrand_xorshiftplus() uint32 {
        // WARNING: untested for statistical properties
	s0, s1 := state1, state0
	s1 ^= s1 << 5
	s1 = s1 ^ s0 ^ (s1 >> 17) ^ (s0 >> 13)
	state0, state1 = s0, s1
	return s0 + s1
}

which is faster than the original fastrand on my i7 skylake:

BenchmarkOrig-4           	1000000000	         2.18 ns/op
BenchmarkXorShift-4       	1000000000	         2.89 ns/op
BenchmarkXorShiftPlus-4   	2000000000	         1.94 ns/op

Notice that the triplet choice can be totally wrong, I've just used it to test speed, but we should probably try BigCrush on it, and maybe select a different triplet.

@funny-falcon

This comment has been minimized.

Contributor

funny-falcon commented Sep 9, 2017

I supposed that 2*32bit version will be faster. But will fastrand increase it's state from 1*32bit to 2*32bit?
I hope, yes.
Why you didn't take triplet from xorshift pdf ? https://www.jstatsoft.org/article/view/v008i14/xorshift.pdf
For n = 64, a, b, c = [10, 13, 10], [8, 9, 22], [2, 7, 3], [23, 3, 24]

s1 ^= s1 << a
s1 ^= s0 ^ (s1 >> b) ^ (s0 >> c)
@funny-falcon

This comment has been minimized.

Contributor

funny-falcon commented Sep 9, 2017

Looks like tripple [10, 13, 10] doesn't pass diehard. Probably, it was mistaken. Triplet [10, 10, 13] passes, but could not be trusted.

@rasky

This comment has been minimized.

Member

rasky commented Sep 9, 2017

I did take it from xorshift's pdf: [5,17,13] is one of them, valid for 32-bit numbers. The period of this xorshift+ 32-bit version (with two 32-bit words of state) should be 2^64-1.

@funny-falcon

This comment has been minimized.

Contributor

funny-falcon commented Sep 9, 2017

5,17,13 - is for 1*32 bit generator.
Look at "3.1 Binary vector spaces of dimension n = 96, 128, 160 . . .." in that pdf

@funny-falcon

This comment has been minimized.

Contributor

funny-falcon commented Sep 9, 2017

shifts for 1*32bit version does not produce full period for 2*32bit generator. I know what I'm talking about, please believe me.

@rasky

This comment has been minimized.

Member

rasky commented Sep 9, 2017

Yes, I see, you're right.

@funny-falcon

This comment has been minimized.

Contributor

funny-falcon commented Sep 9, 2017

I had an experince with running xorshift framework to generate all correct triplets.
It is not too hard.
At least, pdf tripplets should be checked, cause [10,13,10] is cleanly wrong (it has quite short period with seed 1,0).

@rasky

This comment has been minimized.

Member

rasky commented Sep 9, 2017

I tried [23,3,24], it passes SmallCrush but not Crush:

       Test                          p-value
 ----------------------------------------------
  1  SerialOver, t = 2                eps
  2  SerialOver, t = 4                eps
  6  CollisionOver, t = 4           2.1e-25
  7  CollisionOver, t = 8           7.1e-63
 12  BirthdaySpacings, t = 3       1.2e-273
 13  BirthdaySpacings, t = 4        8.3e-26
 15  BirthdaySpacings, t = 7          eps
 16  BirthdaySpacings, t = 8          eps
 20  ClosePairs NP, t = 7            4.9e-7
 20  ClosePairs mNP, t = 7          1.9e-27
 20  ClosePairs mNP1, t = 7         7.1e-30
 20  ClosePairs mNP2, t = 7         1.1e-10
 20  ClosePairs NJumps, t = 7       3.1e-34
 36  Run of U01, r = 15               eps
 38  Permutation, r = 15              eps
 65  RandomWalk1 H (L = 90)         3.3e-16
 65  RandomWalk1 M (L = 90)          1.5e-5
 72  LinearComp, r = 29             1 - eps1
 87  HammingIndep, L = 300           2.1e-4
 ----------------------------------------------
 All other tests were passed

(full log)

@funny-falcon

This comment has been minimized.

Contributor

funny-falcon commented Sep 9, 2017

I was mistaken for [10,13,10] - i got error in my test program.
So triplet from pdf could be trusted.
I vote for [10,13,10] triplet.
All triplets for xorshift64 using 2*32bit state found by xorshift framework
You see: it contains all example triplets from pdf.
And I think, there is no need in xorshift+, because xorshift is already enough for fastrand use case.

@funny-falcon

This comment has been minimized.

Contributor

funny-falcon commented Sep 9, 2017

With xorshift64 (2*32) I got following (number differs from test above cause it is other computer):
master

goos: linux
goarch: amd64
pkg: runtime
BenchmarkFastrand         	500000000	         3.86 ns/op
BenchmarkFastrandHashiter 	30000000	        43.0 ns/op
BenchmarkFastrandn/2      	500000000	         3.72 ns/op
BenchmarkFastrandn/3      	500000000	         3.73 ns/op
BenchmarkFastrandn/4      	500000000	         3.72 ns/op
BenchmarkFastrandn/5      	500000000	         3.73 ns/op
PASS
ok  	runtime	12.628s

xorshift

goos: linux
goarch: amd64
pkg: runtime
BenchmarkFastrand         	500000000	         3.73 ns/op
BenchmarkFastrandHashiter 	30000000	        53.1 ns/op
BenchmarkFastrandn/2      	500000000	         3.89 ns/op
BenchmarkFastrandn/3      	500000000	         3.88 ns/op
BenchmarkFastrandn/4      	500000000	         3.89 ns/op
BenchmarkFastrandn/5      	500000000	         3.88 ns/op
PASS
ok  	runtime	13.250s

xorshift64+ is just a bit slower than xorshift64, but I don't think we need that "extra quality".

I leave priority to open changset to @rasky.

@gopherbot

This comment has been minimized.

gopherbot commented Sep 9, 2017

Change https://golang.org/cl/62530 mentions this issue: runtime: improve fastrand with a better generator

@gopherbot gopherbot closed this in e7e4a4f Sep 16, 2017

@mundaym

This comment has been minimized.

Member

mundaym commented Sep 16, 2017

Reopening for backport to 1.9.1.

@mundaym mundaym reopened this Sep 16, 2017

@rasky

This comment has been minimized.

Member

rasky commented Sep 16, 2017

@ianlancetaylor @randall77 should I backport the patch as-is or you prefer a simpler version (and if so, what would you simplify)?

@randall77

This comment has been minimized.

Contributor

randall77 commented Sep 16, 2017

I like the CL as is.

@gopherbot

This comment has been minimized.

gopherbot commented Sep 17, 2017

Change https://golang.org/cl/64193 mentions this issue: [release-branch.go1.9] runtime: improve fastrand with a better generator

@funny-falcon

This comment has been minimized.

Contributor

funny-falcon commented Oct 4, 2017

@rsc rsc modified the milestones: Go1.9.1, Go1.9.2 Oct 4, 2017

@gopherbot

This comment has been minimized.

gopherbot commented Oct 4, 2017

Change https://golang.org/cl/68050 mentions this issue: runtime: fix using fastrand in sema.go

gopherbot pushed a commit that referenced this issue Oct 4, 2017

runtime: fix using fastrand in sema.go
Before CL 62530 fastrand always returned non-zero value, and one
condition in sema.go depends on this behavior.

fastrand is used to generate random weight for treap of sudog, and
it is checked against zero to verify sudog were inserted into treap or
wait queue.

Since its precision is not very important for correctness, lets just
always set its lowest bit in this place.

Updates #22047
Updates #21806

Change-Id: Iba0b56d81054e6ef9c49ffd293fc5d92a6a31e9b
Reviewed-on: https://go-review.googlesource.com/68050
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@rsc

This comment has been minimized.

Contributor

rsc commented Oct 13, 2017

If we cherry-pick CL 64193 then we must also cherry-pick CL 68050, because CL 64193 breaks a different property of fastrand.

I'm pretty skeptical of wholesale-replacing the random number generator and then putting it into a point release when we just found a serious problem with it 9 days ago.

Maybe for Go 1.9.2 we should just fix fastrandn to use a % again. The move away from % got this performance win:

    name                    old time/op  new time/op  delta
    SelectUncontended-8     33.7ns ± 2%  33.9ns ± 2%  +0.70%  (p=0.000 n=49+50)
    SelectSyncContended-8   1.68µs ± 4%  1.65µs ± 4%  -1.54%  (p=0.000 n=50+45)
    SelectAsyncContended-8   282ns ± 1%   277ns ± 1%  -1.50%  (p=0.000 n=48+43)
    SelectNonblock-8        5.31ns ± 1%  5.32ns ± 1%    ~     (p=0.275 n=45+44)
    SelectProdCons-8         585ns ± 3%   577ns ± 2%  -1.35%  (p=0.000 n=50+50)
    GoroutineSelect-8       1.59ms ± 2%  1.59ms ± 1%    ~     (p=0.084 n=49+48)

I'm happy to give that back for Go 1.9.2. The uses of fastrandn elsewhere in the runtime also look like they would appreciate not having terrible correlations, so I don't think we should change only select's use.

@rsc

This comment has been minimized.

Contributor

rsc commented Oct 13, 2017

Closing this and remilestoning Go 1.10. The fix for Go 1.9.2 is #22253.

@rsc rsc closed this Oct 13, 2017

@rsc rsc modified the milestones: Go1.9.2, Go1.10 Oct 13, 2017

@rsc rsc removed the release-blocker label Oct 13, 2017

@golang golang locked and limited conversation to collaborators Oct 13, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.