runtime: go microservice is crashing in runtime during garbage collection running #64682
Comments
In frame 14 I dumped the values of p and s. (dlv) print p (dlv) print s In findObject it is failing at this condition. Here p is 824644280320. What could be the reason p is set to s.limit?
On 12 Dec 2023, at 18:51, harsh8118 ***@***.***> wrote:
Go version
go 1.20
What operating system and processor architecture are you using (go env)?
go env
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='.cache/go-build'
GOENV='.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='off'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.21.4'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build4248553971=/tmp/go-build -gno-record-gcc-switches'
What did you do?
system was in idle state for 48 hours. observed this crash
What did you expect to see?
no crash
What did you see instead?
0 0x000055925bb1e532 in runtime.readgstatus
at runtime/proc.go:977
1 0x000055925bb1e532 in runtime.tracebackothers.func1
at runtime/traceback.go:1231
2 0x000055925baf8969 in runtime.forEachGRace
at runtime/proc.go:621
3 0x000055925bb1e4db in runtime.tracebackothers
at runtime/traceback.go:1230
4 0x000055925bb0b9ab in runtime.sighandler
at runtime/signal_unix.go:748
5 0x000055925bb0b0d0 in runtime.sigtrampgo
at runtime/signal_unix.go:490
6 0x000055925bb1e532 in runtime.tracebackothers.func1
at runtime/traceback.go:1231
7 0x000055925baf8969 in runtime.forEachGRace
at runtime/proc.go:621
8 0x000055925bb1e4db in runtime.tracebackothers
at runtime/traceback.go:1230
9 0x000055925baf59c6 in runtime.dopanic_m
at runtime/panic.go:1316
10 0x000055925baf53ed in runtime.fatalthrow.func1
at runtime/panic.go:1170
11 0x000055925baf5345 in runtime.fatalthrow
at runtime/panic.go:1163
12 0x000055925baf503e in runtime.throw
at runtime/panic.go:1077
13 0x000055925bad16a5 in runtime.badPointer
at runtime/mbitmap.go:321
14 0x000055925bad1826 in runtime.findObject
at runtime/mbitmap.go:364
15 0x000055925badd40c in runtime.scanobject
at runtime/mgcmark.go:1335
16 0x000055925badccda in runtime.gcDrain
at runtime/mgcmark.go:1103
17 0x000055925bad95cf in runtime.gcBgMarkWorker.func2
at runtime/mgc.go:1385
18 0x000055925bb29327 in runtime.systemstack
at runtime/asm_amd64.s:509
19 0x000055925bb292c8 in runtime.systemstack_switch
at runtime/asm_amd64.s:474
20 0x000055925bad9296 in runtime.gcBgMarkWorker
at runtime/mgc.go:1353
21 0x000055925bb2b2c1 in runtime.goexit
at runtime/asm_amd64.s:1650
(dlv)
It could be a problem with your code, if you have a data race or have incorrect uses of |
It's a very large application. The crash appears at random while the system is idle. I ran the program with the race detector several times, but I do not hit the issue when -race is enabled.
Thanks for that. The race detector will often catch problems well before they become a crash, so there's likely information in its lack of report even if you didn't reach the crash point with it on. Unfortunately, there's not a lot to go on here. The garbage collector is clearly finding a bad pointer, but that doesn't give us much information about where it found that pointer. If you could provide more of the bad pointer report, that would help. That report would be the lines starting with a phrase like "runtime: pointer 0xXXX to unused region of span" and ending with "fatal error: found bad pointer in Go heap". |
Hi, unfortunately I was not able to collect the report logs when the crash happened, but I have access to the core file, so I can dump all the values that would have been captured in the logs. p = 824644280320. I previously added a complete dump of s as well, so from there we can get s.limit, s.base and the other related parameters. Is there a way I can simulate the output of gcDumpObject() using dlv?
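For anyone attempting the by-hand route from a core file, here is a minimal sketch that formats something close to the first line of the bad pointer report. The message wording is an approximation based on the phrases quoted in this thread, not a copy of the runtime source, and `spanInfo` and `badPointerLine` are hypothetical helpers. Only p comes from this issue; the base and limit in `main` are placeholders to be replaced with the values dlv prints for s.

```go
package main

import "fmt"

// spanInfo holds the few span fields the report needs. Copy the values by
// hand from the core file (for example from dlv's "print s" in frame 14).
type spanInfo struct {
	base  uint64 // start address of the span
	limit uint64 // end of the span's data (exclusive)
	inUse bool   // true if the span state is mSpanInUse
}

// badPointerLine approximates the first line of the runtime's bad pointer
// report. The exact wording in the runtime may differ between Go versions.
func badPointerLine(p uint64, s spanInfo) string {
	kind := "to unused region of span"
	if !s.inUse {
		kind = "to unallocated span"
	}
	return fmt.Sprintf("runtime: pointer %#x %s span.base()=%#x span.limit=%#x",
		p, kind, s.base, s.limit)
}

func main() {
	p := uint64(824644280320) // value of p reported in this thread

	// Placeholder span: base and limit are NOT taken from the issue.
	s := spanInfo{base: p - 0x2000, limit: p, inUse: true}

	fmt.Println(badPointerLine(p, s))
}
```

This does not reproduce gcDumpObject itself, which also prints the words of the referring object; that part still needs the object's base address and contents from the core.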
I am also confused about these frames in the stack trace. In frame 5, runtime.sigtrampgo() is called. Is that due to the badPointer call in the same flow, or is it some other signal handler? 2 0x000055925baf8969 in runtime.forEachGRace
I don't think there's any easy way to do that, no. Other than simulating it by hand.
The signal handler during the backtrace does look odd. The runtime is trying to print the backtrace of a goroutine and it is failing. If I had to guess I'd say there was a nil
I'm having a bit of trouble making sense of your line numbers. The traceback.go line numbers should be 1240 and 1241, not 1230 and 1231, I think. Your environment variables say go1.21.4, but the line numbers look like they are from an earlier point release, maybe 1.21.1?
The Go version is 1.20; it looks like my go env output was wrong.
Those line numbers look completely wrong for 1.20. Can you double-check? You should be able to run It would be worth trying 1.21.5 or tip, just to see if it is a problem that has been fixed. |
Yeah, sorry, it is go1.21.1.
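Since the version went back and forth here, one way to settle which toolchain built the crashing binary is to read the version recorded in the binary itself. The sketch below uses the standard `debug/buildinfo` package (available since Go 1.18); `goVersionOf` is a hypothetical helper name, and the binary path is whatever you pass on the command line.

```go
package main

import (
	"debug/buildinfo"
	"fmt"
	"os"
)

// goVersionOf reports the Go toolchain version recorded in the binary at path.
func goVersionOf(path string) (string, error) {
	info, err := buildinfo.ReadFile(path)
	if err != nil {
		return "", err
	}
	return info.GoVersion, nil
}

func main() {
	// Inspect the binary named on the command line, or this program itself
	// if no argument is given.
	path, err := os.Executable()
	if len(os.Args) > 1 {
		path, err = os.Args[1], nil
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	v, err := goVersionOf(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(path, v)
}
```

The version embedded in the deployed binary is the one that matters for matching runtime line numbers; the toolchain installed on the build machine or reported by go env may differ.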
I don't see anything obviously related in the 1.21.1 -> 1.21.5 changes. Was there a Go version under which this code didn't crash? If so, or if we could find one, then we could do binary search to find the change that is likely responsible. I don't have any other ideas to figure out what the problem is here, sorry. |
We were hitting this crash on previous versions of Go as well. scanobject() computes p and passes it as a parameter to findObject(). Could there be a bug in how scanobject computes p, or could heap corruption be causing p to be computed incorrectly? The failing check is: if state := s.state.get(); state != mSpanInUse || p < s.base() || p >= s.limit {
If the heap is corrupted, arbitrarily bad things can happen. |
I hit another crash while the system was idle. This time the backtrace is different: it is crashing in pprof. We have pprof enabled for all profiles and collect data every 15 minutes. Is there a chance that pprof causes a race condition while the GC is running that could corrupt the heap? Sending output to pager...
Timed out in state WaitingForInfo. Closing. (I am just a bot, though. Please speak up if this is a mistake or you have the requested information.) |