Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: all thread is block in runtime.futex,service is hanging,may be deadlock #22398

Closed
wuwanjie opened this issue Oct 23, 2017 · 15 comments

Comments

Projects
None yet
6 participants
@wuwanjie
Copy link

commented Oct 23, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go1.8.4

Does this issue reproduce with the latest release?

may be,it is tested in production service

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/wuwanjie/devspace/itachi"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build857213602=/tmp/go-build"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

What did you do?

i use gdb and strace attach to process,get gouroutines stack

atal: morestack on g0
fatal: morestack on g0
SIGTRAP: trace trap
PC=0x460072 m=17 sigcode=128

goroutine 0 [idle]:
runtime.morestack()
/usr/local/go/src/runtime/asm_amd64.s:368 +0x22

goroutine 4142382 [running]:
runtime.systemstack_switch()
/usr/local/go/src/runtime/asm_amd64.s:281 fp=0xc4292551b0 sp=0xc4292551a8
runtime.gcStart(0x0, 0x200)
/usr/local/go/src/runtime/mgc.go:1014 +0x218 fp=0xc4292551e8 sp=0xc4292551b0
runtime.mallocgc(0x300, 0x0, 0x200, 0x10000c4263cb800)
/usr/local/go/src/runtime/malloc.go:785 +0x491 fp=0xc429255288 sp=0xc4292551e8
runtime.rawstring(0x2e0, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/local/go/src/runtime/string.go:237 +0x85 fp=0xc4292552b8 sp=0xc429255288
runtime.rawstringtmp(0x0, 0x2e0, 0xc429255348, 0x44e37f, 0xc4263cbde0, 0x20, 0xc4263cb800)
/usr/local/go/src/runtime/string.go:107 +0x78 fp=0xc4292552f8 sp=0xc4292552b8
runtime.slicebytetostring(0x0, 0xc4263cbb00, 0x2e0, 0x300, 0x300, 0x2e0)
/usr/local/go/src/runtime/string.go:89 +0x3e fp=0xc429255350 sp=0xc4292552f8
log/syslog.(*Writer).Write(0xc420016d20, 0xc4263cbb00, 0x2e0, 0x300, 0x2e0, 0x300, 0x2)
/usr/local/go/src/log/syslog/syslog.go:174 +0x59 fp=0xc4292553a0 sp=0xc429255350
git.pandatv.com/panda-web/gobase/log.(*syslogWriter).write(0xc42005caf0, 0xc4263cb502, 0xc4263cb500, 0x2d8)
/home/wuwanjie/devspace/itachi/src/git.pandatv.com/panda-web/gobase/log/syslog.go:27 +0x143 fp=0xc429255450 sp=0xc4292553a0
git.pandatv.com/panda-web/gobase/log.write(0xcafc02, 0xc4263cb500, 0x2d8)
/home/wuwanjie/devspace/itachi/src/git.pandatv.com/panda-web/gobase/log/logkit.go:246 +0x91 fp=0xc429255508 sp=0xc429255450
git.pandatv.com/panda-web/gobase/log.(*XLogger).Infof(0xc42001a230, 0xcafc29, 0x27, 0xc4292555c0, 0x4, 0x4)
/home/wuwanjie/devspace/itachi/src/git.pandatv.com/panda-web/gobase/log/logkit.go:338 +0x8d fp=0xc429255550 sp=0xc429255508
golang/util.LogInfof(0xcb572e, 0x3f, 0xc429255c28, 0x7, 0x7)
/home/wuwanjie/devspace/itachi/src/golang/util/logger.go:28 +0x1e4 fp=0xc429255610 sp=0xc429255550
main.v1_check(0x10d23e0, 0xc42c6cefc0, 0xc426331700)
/home/wuwanjie/devspace/itachi/src/golang/itachi/router.go:111 +0x14a5 fp=0xc429255ca8 sp=0xc429255610
net/http.HandlerFunc.ServeHTTP(0xcbc2c0, 0x10d23e0, 0xc42c6cefc0, 0xc426331700)
/usr/local/go/src/net/http/server.go:1942 +0x44 fp=0xc429255cd0 sp=0xc429255ca8
net/http.(*ServeMux).ServeHTTP(0x10f81c0, 0x10d23e0, 0xc42c6cefc0, 0xc426331700)
/usr/local/go/src/net/http/server.go:2238 +0x130 fp=0xc429255d10 sp=0xc429255cd0
net/http.serverHandler.ServeHTTP(0xc420091d90, 0x10d23e0, 0xc42c6cefc0, 0xc426331700)
/usr/local/go/src/net/http/server.go:2568 +0x92 fp=0xc429255d58 sp=0xc429255d10
net/http.(*conn).serve(0xc42dc07b80, 0x10d2ba0, 0xc439b9e080)
/usr/local/go/src/net/http/server.go:1825 +0x612 fp=0xc429255fc8 sp=0xc429255d58
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc429255fd0 sp=0xc429255fc8
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:2668 +0x2ce

goroutine 1 [IO wait]:
net.runtime_pollWait(0x7f8d2c6fbf08, 0x72, 0x10cf220)
/usr/local/go/src/runtime/netpoll.go:164 +0x59
net.(*pollDesc).wait(0xc422c74618, 0x72, 0x10cb728, 0xc435af4be0)
/usr/local/go/src/net/fd_poll_runtime.go:75 +0x38
net.(*pollDesc).waitRead(0xc422c74618, 0xffffffffffffffff, 0x0)
/usr/local/go/src/net/fd_poll_runtime.go:80 +0x34
net.(*netFD).accept(0xc422c745b0, 0x0, 0x10cdf20, 0xc435af4be0)
/usr/local/go/src/net/fd_unix.go:430 +0x1e5
net.(*TCPListener).accept(0xc42000e9a8, 0xc42dc07c00, 0xc1c940, 0xffffffffffffffff)
/usr/local/go/src/net/tcpsock_posix.go:136 +0x2e
net.(*TCPListener).AcceptTCP(0xc42000e9a8, 0xc42003ed98, 0xc42003eda0, 0xc42003ed90)
/usr/local/go/src/net/tcpsock.go:215 +0x49
net/http.tcpKeepAliveListener.Accept(0xc42000e9a8, 0xcbc4b8, 0xc42dc07b80, 0x10d2c60, 0xc436b917d0)
/usr/local/go/src/net/http/server.go:3044 +0x2f
net/http.(*Server).Serve(0xc420091d90, 0x10d2620, 0xc42000e9a8, 0x0, 0x0)
/usr/local/go/src/net/http/server.go:2643 +0x228
net/http.(*Server).ListenAndServe(0xc420091d90, 0xc420091d90, 0xc436f77940)
/usr/local/go/src/net/http/server.go:2585 +0xb0
net/http.ListenAndServe(0xc436f77940, 0x5, 0x0, 0x0, 0x4, 0xc436f77940)
/usr/local/go/src/net/http/server.go:2787 +0x7f
main.main()
/home/wuwanjie/devspace/itachi/src/golang/itachi/main.go:28 +0x1b2

goroutine 17 [syscall, 473 minutes, locked to thread]:
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:2197 +0x1

goroutine 5 [syscall, 473 minutes]:
os/signal.signal_recv(0x0)
/usr/local/go/src/runtime/sigqueue.go:116 +0x104
os/signal.loop()
/usr/local/go/src/os/signal/signal_unix.go:22 +0x22
created by os/signal.init.1
/usr/local/go/src/os/signal/signal_unix.go:28 +0x41

goroutine 15 [chan receive, 315 minutes]:
golang/spam.reloadBayesData.func1(0xcb006f, 0x28, 0x10f7330, 0x10f7340)
/home/wuwanjie/devspace/itachi/src/golang/spam/bayes.go:147 +0x1fc
created by golang/spam.reloadBayesData
/home/wuwanjie/devspace/itachi/src/golang/spam/bayes.go:156 +0x5d

goroutine 16 [chan receive, 315 minutes]:
golang/spam.reloadBayesData.func1(0xcb0047, 0x28, 0x10f7310, 0x10f7320)
/home/wuwanjie/devspace/itachi/src/golang/spam/bayes.go:147 +0x1fc
created by golang/spam.reloadBayesData
/home/wuwanjie/devspace/itachi/src/golang/spam/bayes.go:156 +0x5d

goroutine 34 [chan receive]:
golang/spam.listenFile.func1(0xc42e974f40, 0xcb001f, 0x28, 0xc42e968450, 0xc42000e1f8)
/home/wuwanjie/devspace/itachi/src/golang/spam/common.go:31 +0xb0
created by golang/spam.listenFile
/home/wuwanjie/devspace/itachi/src/golang/spam/common.go:42 +0xb3

goroutine 35 [chan receive]:
golang/spam.listenFile.func1(0xc42e985400, 0xcb0700, 0x29, 0xc42e9698a0, 0xc42000e380)
/home/wuwanjie/devspace/itachi/src/golang/spam/common.go:31 +0xb0
created by golang/spam.listenFile
/home/wuwanjie/devspace/itachi/src/golang/spam/common.go:42 +0xb3

goroutine 36 [chan receive]:
golang/spam.listenFile.func1(0xc42c166f40, 0xcae10b, 0x23, 0xc42c145ee0, 0xc42000e508)
/home/wuwanjie/devspace/itachi/src/golang/spam/common.go:31 +0xb0
created by golang/spam.listenFile
/home/wuwanjie/devspace/itachi/src/golang/spam/common.go:42 +0xb3

goroutine 37 [chan receive]:
golang/spam.listenFile.func1(0xc4359cb680, 0xcb3051, 0x31, 0xc4359cc370, 0xc42000e690)
/home/wuwanjie/devspace/itachi/src/golang/spam/common.go:31 +0xb0
created by golang/spam.listenFile
/home/wuwanjie/devspace/itachi/src/golang/spam/common.go:42 +0xb3

goroutine 38 [chan receive]:
golang/spam.listenFile.func1(0xc436f787c0, 0xcaf4ba, 0x26, 0xc436f76900, 0xc42000e818)
/home/wuwanjie/devspace/itachi/src/golang/spam/common.go:31 +0xb0
created by golang/spam.listenFile
/home/wuwanjie/devspace/itachi/src/golang/spam/common.go:42 +0xb3

goroutine 39 [chan receive]:
golang/spam.listenFile.func1(0xc437169d40, 0xcae84e, 0x24, 0xc436f778a0, 0xc42000e9a0)
/home/wuwanjie/devspace/itachi/src/golang/spam/common.go:31 +0xb0
created by golang/spam.listenFile
/home/wuwanjie/devspace/itachi/src/golang/spam/common.go:42 +0xb3

goroutine 4142197 [sleep]:
time.Sleep(0x2faf0800)
/usr/local/go/src/runtime/time.go:59 +0xf9
golang/spam.analyseStrV1.func1(0xc4283a8cb0)
/home/wuwanjie/devspace/itachi/src/golang/spam/spam.go:78 +0x2a
created by golang/spam.analyseStrV1
/home/wuwanjie/devspace/itachi/src/golang/spam/spam.go:80 +0x17a1

goroutine 4142281 [sleep]:
time.Sleep(0x2faf0800)
/usr/local/go/src/runtime/time.go:59 +0xf9
golang/spam.analyseStrV1.func1(0xc4358c5030)
/home/wuwanjie/devspace/itachi/src/golang/spam/spam.go:78 +0x2a
created by golang/spam.analyseStrV1
/home/wuwanjie/devspace/itachi/src/golang/spam/spam.go:80 +0x17a1

goroutine 4142223 [sleep]:
time.Sleep(0x2faf0800)
/usr/local/go/src/runtime/time.go:59 +0xf9
golang/spam.analyseStrV1.func1(0xc4283a8f50)
/home/wuwanjie/devspace/itachi/src/golang/spam/spam.go:78 +0x2a
created by golang/spam.analyseStrV1
/home/wuwanjie/devspace/itachi/src/golang/spam/spam.go:80 +0x17a1

goroutine 4142402 [runnable]:
net/http.setRequestCancel.func3(0x0, 0xc424af9ec0, 0xc439b9e600, 0xc43bc25e6c, 0xc42d4fa720)
/usr/local/go/src/net/http/client.go:320 +0x17c
created by net/http.setRequestCancel
/usr/local/go/src/net/http/client.go:330 +0x287

goroutine 4142337 [sleep]:
time.Sleep(0x2faf0800)
/usr/local/go/src/runtime/time.go:59 +0xf9
golang/spam.analyseStrV1.func1(0xc43a802150)
/home/wuwanjie/devspace/itachi/src/golang/spam/spam.go:78 +0x2a
created by golang/spam.analyseStrV1
/home/wuwanjie/devspace/itachi/src/golang/spam/spam.go:80 +0x17a1

goroutine 4142169 [sleep]:
time.Sleep(0x2faf0800)
/usr/local/go/src/runtime/time.go:59 +0xf9
golang/spam.analyseStrV1.func1(0xc4283a8a10)
/home/wuwanjie/devspace/itachi/src/golang/spam/spam.go:78 +0x2a
created by golang/spam.analyseStrV1
/home/wuwanjie/devspace/itachi/src/golang/spam/spam.go:80 +0x17a1

goroutine 4142253 [sleep]:
time.Sleep(0x2faf0800)
/usr/local/go/src/runtime/time.go:59 +0xf9
golang/spam.analyseStrV1.func1(0xc4283a95e0)
/home/wuwanjie/devspace/itachi/src/golang/spam/spam.go:78 +0x2a
created by golang/spam.analyseStrV1
/home/wuwanjie/devspace/itachi/src/golang/spam/spam.go:80 +0x17a1

goroutine 4142401 [sleep]:
time.Sleep(0x2faf0800)
/usr/local/go/src/runtime/time.go:59 +0xf9
golang/spam.analyseStrV1.func1(0xc4358c5570)
/home/wuwanjie/devspace/itachi/src/golang/spam/spam.go:78 +0x2a
created by golang/spam.analyseStrV1
/home/wuwanjie/devspace/itachi/src/golang/spam/spam.go:80 +0x17a1

goroutine 4142383 [IO wait]:
net.runtime_pollWait(0x7f8d1c90db80, 0x72, 0xa)
/usr/local/go/src/runtime/netpoll.go:164 +0x59
net.(*pollDesc).wait(0xc4358c54f8, 0x72, 0x10cf220, 0x10cb728)
/usr/local/go/src/net/fd_poll_runtime.go:75 +0x38
net.(*pollDesc).waitRead(0xc4358c54f8, 0xc439b9e0d1, 0x1)
/usr/local/go/src/net/fd_poll_runtime.go:80 +0x34
net.(*netFD).Read(0xc4358c5490, 0xc439b9e0d1, 0x1, 0x1, 0x0, 0x10cf220, 0x10cb728)
/usr/local/go/src/net/fd_unix.go:250 +0x1b7
net.(*conn).Read(0xc4255c7be0, 0xc439b9e0d1, 0x1, 0x1, 0x0, 0x0, 0x0)
/usr/local/go/src/net/net.go:181 +0x70
net/http.(*connReader).backgroundRead(0xc439b9e0c0)
/usr/local/go/src/net/http/server.go:656 +0x58
created by net/http.(*connReader).startBackgroundRead
/usr/local/go/src/net/http/server.go:652 +0xdf

goroutine 4142309 [sleep]:
time.Sleep(0x2faf0800)
/usr/local/go/src/runtime/time.go:59 +0xf9
golang/spam.analyseStrV1.func1(0xc4283a9ce0)
/home/wuwanjie/devspace/itachi/src/golang/spam/spam.go:78 +0x2a
created by golang/spam.analyseStrV1
/home/wuwanjie/devspace/itachi/src/golang/spam/spam.go:80 +0x17a1

goroutine 4142371 [sleep]:
time.Sleep(0x2faf0800)
/usr/local/go/src/runtime/time.go:59 +0xf9
golang/spam.analyseStrV1.func1(0xc4358c52d0)
/home/wuwanjie/devspace/itachi/src/golang/spam/spam.go:78 +0x2a
created by golang/spam.analyseStrV1
/home/wuwanjie/devspace/itachi/src/golang/spam/spam.go:80 +0x17a1

goroutine 4142273 [sleep]:
time.Sleep(0x2faf0800)
/usr/local/go/src/runtime/time.go:59 +0xf9
golang/spam.analyseStrV1.func1(0xc4283a9a40)
/home/wuwanjie/devspace/itachi/src/golang/spam/spam.go:78 +0x2a
created by golang/spam.analyseStrV1
/home/wuwanjie/devspace/itachi/src/golang/spam/spam.go:80 +0x17a1

rax 0x17
rbx 0xc42e68c000
rcx 0x463ab5
rdx 0x17
rdi 0x2
rsi 0xca86ad
rbp 0x7f8d121fbd38
rsp 0xc429255188
r8 0x0
r9 0x0
r10 0x7f8d121fbd28
r11 0x212
r12 0x0
r13 0xffffffee
r14 0xa
r15 0xa
rip 0x460072
rflags 0x212
cs 0x33
fs 0x0
gs 0x0
image

If possible, provide a recipe for reproducing the error.
A complete runnable program is good.
A link on play.golang.org is best.

What did you expect to see?

What did you see instead?

@wuwanjie

This comment has been minimized.

Copy link
Author

commented Oct 23, 2017

i and my colleague read runtime GC source,Why systemstack_switch not return?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Oct 23, 2017

systemstack_switch is not a real function. Nothing calls it. The systemstack function puts it at the bottom of the stack to indicate where a user stack switched to the system stack. See the docs for systemstack in runtime/stubs.go.

@ianlancetaylor ianlancetaylor changed the title All thread is block in runtime.futex,service is hanging,may be deadlock runtime: all thread is block in runtime.futex,service is hanging,may be deadlock Oct 23, 2017

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Oct 23, 2017

The stack trace suggests a runtime error, but in the issue title you say that there is a deadlock. Which is it?

How reproducible is this problem?

@wuwanjie

This comment has been minimized.

Copy link
Author

commented Oct 24, 2017

@ianlancetaylor I am sorry,it is problem in production env,it can not reproducible.
systemstack function's param is a function, it passed finishsweep_m(),there are for
loop in that function。So,it can not ret immediately。

@wuwanjie

This comment has been minimized.

Copy link
Author

commented Oct 24, 2017

@ianlancetaylor You can see in picture, all the thread is in futex function,there no M thread to do
GC,So i think this phenomenon is deadlock. Why there is a runtime error in morestack() function

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Oct 24, 2017

The crash is saying that some thread tried to extend the g0 stack, which is impossible and should never happen. Unfortunately for some reason there isn't enough information there to see what caused that to happen.

I don't understand why you say that this is a deadlock, when it looks like a crash.

@wuwanjie

This comment has been minimized.

Copy link
Author

commented Oct 24, 2017

@ianlancetaylor OK, why some thread tried to extend the g0 stack is impossible?
Another question,i find all thread is in futex,and the futex function‘s third param
ns is -1 or big int,Why ns is -1,you kown -1 means block forever,so thread can not
reused.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Oct 24, 2017

Sorry, when I said that extending the g0 stack is impossible, I didn't mean to imply that it wasn't happening here. What I mean is that the runtime is designed to never try to extend the g0 stack. It appears that something has gone wrong, but I don't know what.

Passing -1 as the timeout to futex doesn't mean block forever, it means block until something wakes up the futex.

It is entirely normal for most threads in a Go program to be blocked on a futex. There just has to be at least one thread running to keep the program going. And your program must have had at least one thread going, because that thread crashed.

That said, I don't know how you attached gdb to a crashed program, so there is certainly something I don't understand. In general I don't understand what you saw and I don't understand what you are reporting.

Please answer these two questions.

Did your program crash before you started using gdb?

Did your program get into a deadlock before you started using gdb?

If the answer to these questions are "no" and "yes", then the next question is: why do you think the deadlock is a bug in the Go runtime rather than in your program?

@wuwanjie

This comment has been minimized.

Copy link
Author

commented Oct 25, 2017

@ianlancetaylor
1、Did your program crash before you started using gdb?
no
2、id your program get into a deadlock before you started using gdb?
yes
3、why do you think the deadlock is a bug in the Go runtime rather than in your program?
when i use golang 1.6,never happen this phenomenon.

goroutine 4142382
you can see stack goroutine 4142382,it start gc,but never exit gc。
after i use gdb attach to that process,a lots seconds,it crashed.

I just want kown the reason cause this phenomenon。

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Oct 25, 2017

Have you run your program under the race detector? (https://blog.golang.org/race-detector).

The Go runtime does change in each release. The fact that your program worked on Go 1.6 doesn't necessarily mean that your program is correct.

There are many many different ways that a program can deadlock.

Looking at your stack trace I see a several goroutines sleeping for 0.8 seconds. That doesn't seem like a deadlock.

It's completely normal for all threads to be sleeping on a futex. That is exactly what I would expect to see if several goroutines are sleeping waiting for something to happen.

@wuwanjie

This comment has been minimized.

Copy link
Author

commented Oct 26, 2017

@ianlancetaylor thank you very much,i use � race detector to build new version。

@shenli

This comment has been minimized.

Copy link

commented Jan 8, 2018

@wuwanjie We meet the same problem. Have you found a solution? Thanks!

@davecheney

This comment has been minimized.

Copy link
Contributor

commented Jan 8, 2018

@shenli please follow the advice to test your application with the race detector, and if you have confirmed that there are no data races in your program, please open a new issue.

@davecheney

This comment has been minimized.

Copy link
Contributor

commented Jan 8, 2018

Duplicate of #23360

@davecheney davecheney marked this as a duplicate of #23360 Jan 8, 2018

@gopherbot

This comment has been minimized.

Copy link

commented Feb 8, 2018

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

@gopherbot gopherbot closed this Feb 8, 2018

@golang golang locked and limited conversation to collaborators Feb 8, 2019

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.