runtime: fatal error: found bad pointer in Go heap #26243

Closed
wgliang opened this Issue Jul 6, 2018 · 21 comments

Contributor

wgliang commented Jul 6, 2018

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.10 linux/amd64

Does this issue reproduce with the latest release?

I'm not sure.

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/data/go"
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build690544410=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Our program crashes from time to time on one test machine; we have not seen the problem anywhere else. The test machine is a virtual machine.

runtime: pointer 0xc423d9fe30 to unallocated span idx=0x1ecf span.base()=0xc423d9e000 span.limit=0xc423da6000 span.state=3
runtime: found in object at *(0xc424cd7480+0x0)
object=0xc424cd7480 k=0x621266b s.base()=0xc424cd6000 s.limit=0xc424cd8000 s.spanclass=6 s.elemsize=32 s.state=_MSpanInUse
*(object+0) = 0xc423d9fe30 <==
*(object+8) = 0x1f
*(object+16) = 0x1f
*(object+24) = 0x0
fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?)

runtime stack:
runtime.throw(0xf4d6b0, 0x3e)
	/usr/local/go/src/runtime/panic.go:619 +0x81 fp=0x7ffce1cc2e88 sp=0x7ffce1cc2e68 pc=0x42cd81
runtime.heapBitsForObject(0xc423d9fe30, 0xc424cd7480, 0x0, 0xc41fd9945b, 0xc400000000, 0x7fdf0407c3f0, 0xc420047c70, 0xa4)
	/usr/local/go/src/runtime/mbitmap.go:425 +0x473 fp=0x7ffce1cc2ee0 sp=0x7ffce1cc2e88 pc=0x414293
runtime.scanobject(0xc424cd7480, 0xc420047c70)
	/usr/local/go/src/runtime/mgcmark.go:1209 +0x251 fp=0x7ffce1cc2f88 sp=0x7ffce1cc2ee0 pc=0x41f551
runtime.gcDrain(0xc420047c70, 0xd)
	/usr/local/go/src/runtime/mgcmark.go:965 +0x237 fp=0x7ffce1cc2fe0 sp=0x7ffce1cc2f88 pc=0x41ed37
runtime.gcBgMarkWorker.func2()
	/usr/local/go/src/runtime/mgc.go:1865 +0x187 fp=0x7ffce1cc3020 sp=0x7ffce1cc2fe0 pc=0x457107
runtime.systemstack(0x0)
	/usr/local/go/src/runtime/asm_amd64.s:409 +0x79 fp=0x7ffce1cc3028 sp=0x7ffce1cc3020 pc=0x459499
runtime.mstart()
	/usr/local/go/src/runtime/proc.go:1170 fp=0x7ffce1cc3030 sp=0x7ffce1cc3028 pc=0x431440

What did you expect to see?

What did you see instead?

Contributor

davecheney commented Jul 6, 2018

Which version of CentOS are you running? Which kernel version?

This looks like memory corruption. Have you tried running your program under the race detector? See https://blog.golang.org/race-detector .
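
To make the suggestion above concrete, here is a minimal sketch of the kind of bug the race detector targets. The `registry` type and the `fill` function are hypothetical and are not taken from the reporter's program. A slice value is several machine words (pointer, length, capacity), so unsynchronized concurrent writes to a shared slice can in principle leave a mismatched or stale pointer behind; guarding the slice with a mutex removes the race.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical shared structure; names is written by
// several goroutines, so every access goes through mu.
type registry struct {
	mu    sync.Mutex
	names []string
}

// add appends under the lock. Removing the Lock/Unlock pair turns this
// into a data race on the slice header, which `go run -race` can report.
func (r *registry) add(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.names = append(r.names, name)
}

// fill starts `workers` goroutines that each add perWorker entries and
// returns the final number of entries.
func fill(workers, perWorker int) int {
	var r registry
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				r.add("x")
			}
		}()
	}
	wg.Wait()
	return len(r.names)
}

func main() {
	fmt.Println(fill(8, 1000)) // prints 8000
}
```

Build or run with `-race` (for example `go run -race main.go`) to enable the detector. It only reports races that actually happen during that run, so a clean run does not prove a program is race-free.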

Contributor Author

wgliang commented Jul 6, 2018

@davecheney

CentOS Linux release 7.4.1708 (Core)

Linux version 3.10.0-693.21.1.el7.x86_64
I will run the program under the race detector.

wgliang closed this Jul 6, 2018

wgliang reopened this Jul 6, 2018

Contributor Author

wgliang commented Jul 6, 2018

When I run the program under the race detector, it reports nothing.

Member

bradfitz commented Jul 10, 2018

Are you using cgo or unsafe?

Contributor

ianlancetaylor commented Jul 11, 2018

Is there a way that we can reproduce the problem ourselves?

ianlancetaylor added this to the Go1.12 milestone Jul 11, 2018

Contributor Author

wgliang commented Jul 18, 2018

Yes, sometimes.

(0x102cea0,0xc420628720)
runtime: pointer 0xc420b41e30 to unused region of span idx=0x5a0 span.base()=0xc420924000 span.limit=0xc420927f00 span.state=1
runtime: found in object at *(0xc4203a8160+0x0)
object=0xc4203a8160 k=0x62101d4 s.base()=0xc4203a8000 s.limit=0xc4203aa000 s.spanclass=6 s.elemsize=32 s.state=_MSpanInUse
*(object+0) = 0xc420b41e30 <==
*(object+8) = 0x2
*(object+16) = 0x2
*(object+24) = 0x0
fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?)

runtime stack:
runtime.throw(0xf89780, 0x3e)
	/usr/local/go/src/runtime/panic.go:619 +0x81 fp=0x7f3dbbffed00 sp=0x7f3dbbffece0 pc=0x42cd81
runtime.heapBitsForObject(0xc420b41e30, 0xc4203a8160, 0x0, 0xc41ffe2bf4, 0xc400000000, 0x7f3dcd62f540, 0xc420047c70, 0xb)
	/usr/local/go/src/runtime/mbitmap.go:425 +0x473 fp=0x7f3dbbffed58 sp=0x7f3dbbffed00 pc=0x414293
runtime.scanobject(0xc4203a8160, 0xc420047c70)
	/usr/local/go/src/runtime/mgcmark.go:1209 +0x251 fp=0x7f3dbbffee00 sp=0x7f3dbbffed58 pc=0x41f551
runtime.gcDrain(0xc420047c70, 0xd)
	/usr/local/go/src/runtime/mgcmark.go:965 +0x237 fp=0x7f3dbbffee58 sp=0x7f3dbbffee00 pc=0x41ed37
runtime.gcBgMarkWorker.func2()
	/usr/local/go/src/runtime/mgc.go:1865 +0x187 fp=0x7f3dbbffee98 sp=0x7f3dbbffee58 pc=0x457107
runtime.systemstack(0x0)
	/usr/local/go/src/runtime/asm_amd64.s:409 +0x79 fp=0x7f3dbbffeea0 sp=0x7f3dbbffee98 pc=0x459499
runtime.mstart()
	/usr/local/go/src/runtime/proc.go:1170 fp=0x7f3dbbffeea8 sp=0x7f3dbbffeea0 pc=0x431440

goroutine 50 [GC worker (idle)]:
runtime.systemstack_switch()
	/usr/local/go/src/runtime/asm_amd64.s:363 fp=0xc4202f8748 sp=0xc4202f8740 pc=0x459410
runtime.gcBgMarkWorker(0xc420046a00)
	/usr/local/go/src/runtime/mgc.go:1829 +0x1ee fp=0xc4202f87d8 sp=0xc4202f8748 pc=0x41b38e
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:2361 +0x1 fp=0xc4202f87e0 sp=0xc4202f87d8 pc=0x45bfa1
created by runtime.gcBgMarkStartWorkers

Contributor Author

wgliang commented Jul 18, 2018

@davecheney @bradfitz @ianlancetaylor
All of the errors are related to /usr/local/go/src/runtime/mbitmap.go.

We are very distressed about it.

Contributor

davecheney commented Jul 18, 2018

@wgliang please respond to @ianlancetaylor's request, #26243 (comment)

Contributor Author

wgliang commented Jul 18, 2018

It's a large project. Even in the testing phase we can't run with the -race flag all the time; it is also a burden for us.

Contributor Author

wgliang commented Jul 18, 2018

@ianlancetaylor
The problem only happens occasionally; I am sorry that I can't reproduce it reliably.

wgliang changed the title from "runtime: pointer 0xc423d9fe30 to unallocated span on CentOS" to "found bad pointer in Go heap (incorrect use of unsafe or cgo?)" Jul 18, 2018

agnivade changed the title from "found bad pointer in Go heap (incorrect use of unsafe or cgo?)" to "runtime: fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?)" Jul 18, 2018

Contributor

ianlancetaylor commented Jul 18, 2018

I didn't see a clear answer to whether you use cgo or unsafe.

From the limited information we have the most natural guess would be that your program is somehow producing invalid pointer values, which most commonly happens due to a violation of the cgo pointer passing rules (https://golang.org/cmd/cgo/#hdr-Passing_pointers). Would it be possible for you to run your program with the environment variable GODEBUG=cgocheck=2? That will make it run slower but will do additional checks on the use of pointers with cgo.
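
For a program started from a shell, setting the variable on the command line (`GODEBUG=cgocheck=2 ./yourprogram`, with `yourprogram` as a placeholder) is enough. When the binary is started by a test harness instead, a small launcher can add the setting without discarding GODEBUG options that are already present. The sketch below assumes a hypothetical helper named `withGodebug`; it relies only on GODEBUG being a comma-separated list of name=value pairs, and it does not try to detect an existing `cgocheck` entry. Newer Go releases changed how this check is enabled, so treat the setting as specific to toolchains of this thread's era.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// withGodebug returns a copy of env in which GODEBUG also contains
// setting. Existing GODEBUG options are kept and the new one is appended.
// Hypothetical helper for illustration only.
func withGodebug(env []string, setting string) []string {
	out := make([]string, 0, len(env)+1)
	found := false
	for _, kv := range env {
		if strings.HasPrefix(kv, "GODEBUG=") {
			found = true
			if v := strings.TrimPrefix(kv, "GODEBUG="); v != "" {
				kv = "GODEBUG=" + v + "," + setting
			} else {
				kv = "GODEBUG=" + setting
			}
		}
		out = append(out, kv)
	}
	if !found {
		out = append(out, "GODEBUG="+setting)
	}
	return out
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: launcher <program> [args...]")
		return
	}
	// Run the target program with the stricter cgo pointer checks enabled.
	cmd := exec.Command(os.Args[1], os.Args[2:]...)
	cmd.Env = withGodebug(os.Environ(), "cgocheck=2")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```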

Contributor

kevinburke commented Dec 18, 2018

Contributor

ianlancetaylor commented Dec 18, 2018

CC @aclements See trybot link just above.

FiloSottile changed the title from "runtime: fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?)" to "runtime: fatal error: found bad pointer in Go heap" Jan 2, 2019

Member

bradfitz commented Jan 2, 2019

@aclements, feel free to delegate if you're swamped on other things, but assigning to you by default for runtime.

Contributor

mknyszek commented Jan 3, 2019

The error in the trybot failure is reminiscent of #24993 which was fixed 13 days ago. Since it happened on freebsd, it's also possible it's related to #28054.

The failure in this issue is from go1.10, however, so it's unlikely to be related to either of those or the trybot failure. @kevinburke the hash at the top of that trybot run isn't in any branch I receive from a git fetch, can you point to the CL/commit which created that trybot failure?

Member

bradfitz commented Jan 3, 2019

@mknyszek, you'd see it if you did:

$ git ls-remote https://go.googlesource.com/go | grep 18080916300fb1a035d03703f481224cb6bce9ca
18080916300fb1a035d03703f481224cb6bce9ca        refs/changes/23/154423/3

Or you can search like:

https://go-review.googlesource.com/q/18080916300fb1a035d03703f481224cb6bce9ca

Which redirects to:

https://go-review.googlesource.com/c/go/+/154423 (where it was PS3)
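
The ref name in the `git ls-remote` output already encodes the change and patch set: Gerrit change refs have the form `refs/changes/<last two digits of the change number>/<change number>/<patch set>`. As a small sketch, the hypothetical helper `parseChangeRef` below (not part of any Go or Gerrit tool) pulls those two numbers out of the ref shown above. It does not check that the two-digit shard matches the change number.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseChangeRef extracts the change (CL) number and patch set number
// from a Gerrit change ref such as "refs/changes/23/154423/3".
// Hypothetical helper for illustration only.
func parseChangeRef(ref string) (cl, patchSet int, ok bool) {
	parts := strings.Split(ref, "/")
	if len(parts) != 5 || parts[0] != "refs" || parts[1] != "changes" {
		return 0, 0, false
	}
	c, err := strconv.Atoi(parts[3])
	if err != nil {
		return 0, 0, false
	}
	p, err := strconv.Atoi(parts[4])
	if err != nil {
		return 0, 0, false
	}
	return c, p, true
}

func main() {
	cl, ps, ok := parseChangeRef("refs/changes/23/154423/3")
	if !ok {
		fmt.Println("not a change ref")
		return
	}
	fmt.Printf("CL %d, patch set %d\n", cl, ps) // prints "CL 154423, patch set 3"
}
```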

Member

FiloSottile commented Jan 7, 2019

@aclements Do you think this should be a release blocker? I am leaning towards no, since it’s extremely rare, does not feel like an easy fix, and has been an issue since 1.10.

Contributor

mknyszek commented Jan 7, 2019

@bradfitz thanks for the tip! I'll note that for the future.

@FiloSottile just to be precise, it's unclear at this point if this bug is related to anything but go1.10, or if it's a bug in the runtime at all. For the linked trybot failure, I think more information is needed before we can label it a release blocker, but I'll let @aclements have the final say. To summarize what others have asked for in this thread:

  1. Is this failure reproducible on go1.11 and on tip?
  2. What can we do to reproduce it ourselves?
  3. Does @wgliang make use of cgo or unsafe in their code? If yes, in what ways?

If I were to take a guess, and also assume that this is a bug in the runtime that's manifesting here, the most recent issue which has a similar-looking failure based on the stack traces provided above is #29362 whose fix has been backported to 1.10 already (#29567). The similarities in the stack traces between that issue and this one are a little subtle because in go1.10 there was heapBitsForObject where now findObject seems to fulfill a similar role.

Member

aclements commented Jan 7, 2019

In both of the provided traces, the pointer appears to be a legal Go heap pointer and we've already freed the span to which it points. In the first trace, we probably freed it recently since the pointer is still within the span bounds. In the second trace, the mspan has already been reused for another region of memory, since the bad pointer doesn't even fall into its bounds (so we picked up a stale span pointer from the spans array). Also in both cases, the object containing the bad pointer looks like it's probably a slice, and the pointer to its backing array is bad. If true, this is interesting because that pointer is largely hidden from user Go code.
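
As a rough illustration of why the dumped objects look like slices: in both traces the first word is a pointer and the next two words are equal small integers (0x1f and 0x1f, then 0x2 and 0x2), which is the pattern of a slice value's pointer, length, and capacity. The sketch below reads those three words from a live slice. The `sliceWords` struct is an assumption about how the gc toolchain lays out a slice value; it is not a public API and should only be used for inspection like this.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sliceWords mirrors the three machine words of a slice value as laid
// out by the gc toolchain (an assumption for illustration, not an API).
type sliceWords struct {
	data     unsafe.Pointer // pointer to the backing array
	length   int
	capacity int
}

// words copies the three header words out of the slice that s points to.
func words(s *[]byte) sliceWords {
	return *(*sliceWords)(unsafe.Pointer(s))
}

func main() {
	s := make([]byte, 0x1f)
	w := words(&s)
	// Same shape as the runtime's dump of the object in the first trace.
	fmt.Printf("*(object+0)  = %p\n", w.data)
	fmt.Printf("*(object+8)  = %#x\n", w.length)
	fmt.Printf("*(object+16) = %#x\n", w.capacity)
}
```

On a 64-bit platform the last two lines print 0x1f, matching offsets +8 and +16 in the first crash dump. The pointer at offset +0 is the only word user code normally never touches directly, which is why a bad value there points toward unsafe code, cgo, a data race, or a runtime bug.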

@wgliang, in addition to @mknyszek's question, could you check if your code uses reflect.SliceHeader anywhere?
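
For context on why reflect.SliceHeader is worth checking: the thread never established that the reporter's code uses it, but one well-known way to produce exactly this kind of stale backing-array pointer is to declare a SliceHeader as a plain struct. Its Data field is a uintptr, which the garbage collector does not treat as a reference. The sketch below shows that hazardous pattern in a comment only, next to a safer way to build the same view. `viewBytes` is a hypothetical helper, and `unsafe.Slice` requires Go 1.17 or later, so it was not available on the go1.10 toolchain from the report.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Hazardous pattern (do not copy): hdr is a plain struct, so hdr.Data
// does not keep the backing array alive and the memory may be freed
// while the resulting slice is still in use.
//
//	var hdr reflect.SliceHeader
//	hdr.Data = uintptr(unsafe.Pointer(p))
//	hdr.Len, hdr.Cap = n, n
//	s := *(*[]byte)(unsafe.Pointer(&hdr))

// viewBytes returns a []byte aliasing the n bytes that start at p.
// unsafe.Slice builds the slice directly from the pointer, so the
// address is never stored as a bare integer.
func viewBytes(p *byte, n int) []byte {
	return unsafe.Slice(p, n)
}

func main() {
	backing := [4]byte{'g', 'o', 'g', 'c'}
	b := viewBytes(&backing[0], len(backing))
	b[0] = 'G' // writes through to backing
	fmt.Println(string(b), string(backing[:])) // prints "Gogc Gogc"
}
```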

@FiloSottile, given that we need a lot more information to debug this, and the original report is from a fairly old version of Go, I'm going to drop release-blocker from this.

Contributor

randall77 commented Jan 7, 2019

above is #29362 whose fix has been backported to 1.10 already (#29567).

The fix for #29362 was not backported to 1.10. It was backported to 1.11 only.
(#29567 was closed as unnecessary.)


gopherbot commented Feb 7, 2019

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

gopherbot closed this Feb 7, 2019
