Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: select: fatal error: index out of range #40397

Open
prestonvanloon opened this issue Jul 24, 2020 · 10 comments
Open

runtime: select: fatal error: index out of range #40397

prestonvanloon opened this issue Jul 24, 2020 · 10 comments
Milestone

Comments

@prestonvanloon
Copy link

@prestonvanloon prestonvanloon commented Jul 24, 2020

What version of Go are you using (go version)?

I'm using bazel rules_go release v0.22.6 which is using go 1.14.2.

Does this issue reproduce with the latest release?

I'm not sure, this issue is not consistently reproducible.

What operating system and processor architecture are you using (go env)?

linux_amd64

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/preston/.cache/go-build"
GOENV="/home/preston/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/preston/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/snap/go/6123"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/snap/go/6123/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build083098668=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I was running llvm libfuzzer tests and saw this testcase appear. Binaries compiled with gc_goopts=-d=libfuzzer and linked with address sanitizer.

What did you expect to see?

No panics from the core select logic.

What did you see instead?

An index out of range here

c0 := scases[o].c

Full stack error below

fatal error: index out of range
--
  |  
  | goroutine 83 [running]:
  | runtime.throw(0x28d934, 0x12)
  | GOROOT/src/runtime/panic.go:1116 +0x74 fp=0x10c00007b910 sp=0x10c00007b8e0 pc=0x883a74
  | runtime.panicCheck1(0x8951ed, 0x28d934, 0x12)
  | GOROOT/src/runtime/panic.go:34 +0xdd fp=0x10c00007b940 sp=0x10c00007b910 pc=0x88125d
  | runtime.goPanicIndex(0x3d48, 0x7)
  | GOROOT/src/runtime/panic.go:87 +0x46 fp=0x10c00007b988 sp=0x10c00007b940 pc=0x881326
  | runtime.sellock(0x10c00007be90, 0x7, 0x7, 0x7f60c6d2c876, 0x7, 0x7)
  | GOROOT/src/runtime/select.go:48 +0xad fp=0x10c00007b9b8 sp=0x10c00007b988 pc=0x8951ed
  | runtime.selectgo(0x10c00007be90, 0x10c00007bb64, 0x7, 0x5, 0x8fb701)
  | GOROOT/src/runtime/select.go:319 +0xcc2 fp=0x10c00007bae0 sp=0x10c00007b9b8 pc=0x896092
  | github.com/ethereum/go-ethereum/consensus/ethash.(*remoteSealer).loop(0x10c000012f00)
  | external/com_github_ethereum_go_ethereum/consensus/ethash/sealer.go:278 +0x25a fp=0x10c00007bfd8 sp=0x10c00007bae0 pc=0x16098fa
  | runtime.goexit()
  | src/runtime/asm_amd64.s:1373 +0x1 fp=0x10c00007bfe0 sp=0x10c00007bfd8 pc=0x8b4ff1
  | created by github.com/ethereum/go-ethereum/consensus/ethash.startRemoteSealer
  | external/com_github_ethereum_go_ethereum/consensus/ethash/sealer.go:262 +0x2b0

@ianlancetaylor ianlancetaylor changed the title Select: fatal error: index out of range runtime: select: fatal error: index out of range Jul 24, 2020
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jul 24, 2020

Can you run the tests under the race detector?

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jul 24, 2020

@ianlancetaylor ianlancetaylor added this to the Go1.16 milestone Jul 24, 2020
@prestonvanloon
Copy link
Author

@prestonvanloon prestonvanloon commented Jul 24, 2020

I tried recompiling these tests with race detection, but it failed to link due to duplicate symbols.
I'll fiddle around with it for a bit today.

Does race detection work with gc_goopts=-d=libfuzzer?

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jul 24, 2020

It should work to run go build -gcflags=-d=libfuzzer -race. But I don't know about Bazel.

Can you use the fuzzer to find a problematic case and then run that case without -d=libfuzzer?

@josharian
Copy link
Contributor

@josharian josharian commented Jul 24, 2020

In theory this could also be caused by the libfuzzer instrumentation. cc @mdempsky just in case

@mdempsky
Copy link
Member

@mdempsky mdempsky commented Jul 24, 2020

libfuzzer instrumentation is turned off when compiling the runtime, so I'd be really surprised if it was responsible for this.

go-ethereum is using reflect.SliceHeader incorrectly. For example, here: https://github.com/ethereum/go-ethereum/blob/master/consensus/ethash/algorithm.go#L154

It's only safe to use *reflect.SliceHeader as a pointer to an actual slice-typed variable. You should never have a value or variable of type reflect.SliceHeader.

Spot checking, I don't immediately see how this could be the fault, but I suggest fixing them, and then also using -d checkptr when fuzzing to enable runtime validation of unsafe.Pointer usage, and let us know if you still see the failure after that.

@mdempsky
Copy link
Member

@mdempsky mdempsky commented Jul 24, 2020

Why doesn't this code need to initialize pollorder[0] = 0?

go/src/runtime/select.go

Lines 158 to 163 in 074f2d8

// generate permuted order
for i := 1; i < ncases; i++ {
j := fastrandn(uint32(i + 1))
pollorder[i] = pollorder[j]
pollorder[j] = uint16(i)
}

If select is called in a loop and the compiler is able to reuse the pollorder array, isn't there a risk that pollorder[0] could be out of range? (Using select in a loop is hardly a rare usage, so I feel like I must be missing something obvious. Otherwise, surely this would fail super commonly?)

Edit: Nevermind, it's zero initialized by the compiler:

order := temp(types.NewArray(types.Types[TUINT16], 2*int64(n)))
r = nod(OAS, order, nil)

@prestonvanloon
Copy link
Author

@prestonvanloon prestonvanloon commented Jul 24, 2020

Running fuzzer with -d checkptr now. I'll circle back here in a few hours to see if any new testcases appear.

@prestonvanloon
Copy link
Author

@prestonvanloon prestonvanloon commented Jul 24, 2020

In an older build (without checkptr), I just saw this issue again.

goroutine 7 [running]:
--
  | runtime.throw(0x248753, 0x12)
  | GOROOT/src/runtime/panic.go:1116 +0x74 fp=0x10c000078cf8 sp=0x10c000078cc8 pc=0x48e174
  | runtime.panicCheck1(0x49f3ed, 0x248753, 0x12)
  | GOROOT/src/runtime/panic.go:34 +0xdd fp=0x10c000078d28 sp=0x10c000078cf8 pc=0x48b9cd
  | runtime.goPanicIndex(0x6170, 0x3)
  | GOROOT/src/runtime/panic.go:87 +0x46 fp=0x10c000078d70 sp=0x10c000078d28 pc=0x48ba96
  | runtime.sellock(0x10c000078f50, 0x3, 0x3, 0x10c000078f2a, 0x3, 0x3)
  | GOROOT/src/runtime/select.go:48 +0xad fp=0x10c000078da0 sp=0x10c000078d70 pc=0x49f3ed
  | runtime.selectgo(0x10c000078f50, 0x10c000078f24, 0x3, 0x10c000078f30, 0x7ffc34fdfc62)
  | GOROOT/src/runtime/select.go:319 +0xcc2 fp=0x10c000078ec8 sp=0x10c000078da0 pc=0x4a0242
  | github.com/dgraph-io/ristretto.(*Cache).processItems(0x10c000092720)
  | external/com_github_dgraph_io_ristretto/cache.go:299 +0x248 fp=0x10c000078fd8 sp=0x10c000078ec8 pc=0xa1b498
  | runtime.goexit()
  | src/runtime/asm_amd64.s:1373 +0x1 fp=0x10c000078fe0 sp=0x10c000078fd8 pc=0x4bd671
  | created by github.com/dgraph-io/ristretto.NewCache
  | external/com_github_dgraph_io_ristretto/cache.go:162 +0x2d7
@mdempsky
Copy link
Member

@mdempsky mdempsky commented Jul 27, 2020

Notably, both of these failures happen when the channel locks are re-acquired after sleeping. So the first call to sellock(scases, lockorder) succeeded, and only the second one is failing. We can also see from the traceback that len(scases) is correct in both cases. So it seems like the contents of lockorder must be getting corrupted while the goroutine is parked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.