syscall: StartProcess blocked at acquire lock #26836

Closed
wushukai opened this Issue Aug 7, 2018 · 9 comments

wushukai commented Aug 7, 2018

What version of Go are you using (go version)?

go version go1.8.3 linux/amd64

Does this issue reproduce with the latest release?

We have not tested yet.

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/workspace"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build598025583=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

What did you do?

Our service starts subprocesses periodically.

We found some goroutines hung while competing for ForkLock.Lock(), and none of them ever acquires it. The stack of one blocked goroutine is listed below:

goroutine 1526643 [semacquire, 8274 minutes]:
sync.runtime_SemacquireMutex(0x1a2aba4)
/usr/local/go/src/runtime/sema.go:62 +0x34
sync.(*Mutex).Lock(0x1a2aba0)
/usr/local/go/src/sync/mutex.go:87 +0x9d
sync.(*RWMutex).Lock(0x1a2aba0)
/usr/local/go/src/sync/rwmutex.go:86 +0x2d
syscall.forkExec(0xce1c2b, 0x7, 0xc426b93410, 0x3, 0x3, 0xc42019f9e8, 0x0, 0x0, 0x0)
/usr/local/go/src/syscall/exec_unix.go:185 +0x1fd
syscall.StartProcess(0xce1c2b, 0x7, 0xc426b93410, 0x3, 0x3, 0xc42019f9e8, 0x2, 0x4, 0xc4244941e0, 0xc42019f9b8)
/usr/local/go/src/syscall/exec_unix.go:240 +0x64
os.startProcess(0xce1c2b, 0x7, 0xc426b93410, 0x3, 0x3, 0xc42019fb90, 0xc426b93440, 0x3, 0x3)
/usr/local/go/src/os/exec_posix.go:45 +0x1a3
os.StartProcess(0xce1c2b, 0x7, 0xc426b93410, 0x3, 0x3, 0xc42019fb90, 0x0, 0x0, 0x28)
/usr/local/go/src/os/exec.go:94 +0x64
os/exec.(*Cmd).Start(0xc426e70580, 0xc42019fc01, 0xc424fca850)
/usr/local/go/src/os/exec/exec.go:359 +0x3d2
os/exec.(*Cmd).Run(0xc426e70580, 0xc424fca850, 0xc426e70580)
/usr/local/go/src/os/exec/exec.go:277 +0x2b

Apart from the blocked goroutines, no other stack contains the function "forkExec".

The problem occurs occasionally in our production services, but I have not found a way to reproduce it reliably.

Please help with some clues for debugging this problem.
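
For anyone checking the same thing in their own service, here is a minimal sketch of how the "which stacks contain forkExec" check could be automated. It is not from the original report; `countStacksContaining` is a hypothetical helper name, and it only relies on `runtime/pprof`'s goroutine profile with `debug=2`, which prints stacks in the same form as the trace above.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"strings"
)

// countStacksContaining dumps every goroutine stack and counts how many of
// them mention the given frame name (for example "syscall.forkExec").
// Hypothetical helper for illustration only.
func countStacksContaining(frame string) int {
	var buf bytes.Buffer
	// debug=2 writes stacks in the same format as an unrecovered panic,
	// with one blank line between goroutines.
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 2); err != nil {
		return 0
	}
	n := 0
	for _, stack := range strings.Split(buf.String(), "\n\n") {
		if strings.Contains(stack, frame) {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println("goroutines in forkExec:", countStacksContaining("syscall.forkExec"))
}
```

If every goroutine that mentions `forkExec` is parked in `sync.(*RWMutex).Lock`, as described above, then whatever holds the lock is not itself inside `forkExec`.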

@tklauser tklauser changed the title syscall.StartProcess blocked at acquire lock syscall: StartProcess blocked at acquire lock Aug 7, 2018

Member

tklauser commented Aug 7, 2018

@ianlancetaylor ianlancetaylor added this to the Go1.12 milestone Aug 7, 2018

Contributor

ianlancetaylor commented Aug 7, 2018

What you are describing sounds like a clear bug, but I do not recall any similar reports. The fork lock is only held briefly and I do not know of any code path in which it could be left locked. Is there any way that we can reproduce the problem ourselves? Is it possible that the kernel is sometimes killing a single thread of your program?
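
For context, a rough sketch of how the read side of the fork lock is meant to be used, following the pattern described in the `syscall.ForkLock` documentation: take `RLock` while creating a descriptor and marking it close-on-exec, then release it right away. `pipeCloexec` is a hypothetical name, and this is not the standard library's own code.

```go
package main

import (
	"fmt"
	"syscall"
)

// pipeCloexec creates a pipe and marks both ends close-on-exec.
// Holding the read side of syscall.ForkLock keeps a concurrent fork from
// happening between creating the descriptors and marking them.
// Hypothetical helper for illustration only.
func pipeCloexec() (r, w int, err error) {
	var p [2]int
	syscall.ForkLock.RLock()
	defer syscall.ForkLock.RUnlock() // the lock is held only for these few calls
	if err = syscall.Pipe(p[:]); err != nil {
		return -1, -1, err
	}
	syscall.CloseOnExec(p[0])
	syscall.CloseOnExec(p[1])
	return p[0], p[1], nil
}

func main() {
	r, w, err := pipeCloexec()
	if err != nil {
		fmt.Println("pipe failed:", err)
		return
	}
	defer syscall.Close(r)
	defer syscall.Close(w)
	fmt.Println("created a close-on-exec pipe")
}
```

The write side (`ForkLock.Lock()`, which the stack trace shows `forkExec` waiting on) can only be acquired once every such read section has been released.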

Author

wushukai commented Aug 8, 2018

Our servers use a customized kernel based on Linux 4.1. I checked the kernel messages and found nothing that seems related to this problem.

I wrote a simple test program that randomly starts child processes, but have not reproduced the problem yet.

I have upgraded Go to 1.10.3 and deployed the new build to some of our production servers. I will check whether the problem reproduces with this version.
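
The test program itself was not posted. As a sketch only, a stress test of this kind might look like the following. `runStress` is a hypothetical name, the use of a `true` binary on `PATH` is an assumption, and a "hung" result here just means `Run` did not return within the timeout.

```go
package main

import (
	"fmt"
	"math/rand"
	"os/exec"
	"sync"
	"sync/atomic"
	"time"
)

// runStress starts `workers` goroutines that each launch `iterations` child
// processes with small random pauses in between. It returns how many launches
// returned and how many did not return within the timeout.
// Hypothetical helper for illustration only.
func runStress(workers, iterations, timeoutSeconds int, name string) (completed, hung int) {
	var done, stuck int64
	var wg sync.WaitGroup
	timeout := time.Duration(timeoutSeconds) * time.Second
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < iterations; j++ {
				time.Sleep(time.Duration(rand.Intn(5)) * time.Millisecond)
				finished := make(chan struct{})
				go func() {
					// The error is ignored on purpose: only hangs matter here.
					_ = exec.Command(name).Run()
					close(finished)
				}()
				select {
				case <-finished:
					atomic.AddInt64(&done, 1)
				case <-time.After(timeout):
					atomic.AddInt64(&stuck, 1)
				}
			}
		}()
	}
	wg.Wait()
	return int(done), int(stuck)
}

func main() {
	completed, hung := runStress(8, 100, 30, "true")
	fmt.Printf("completed=%d hung=%d\n", completed, hung)
}
```

A non-zero `hung` count would be the cue to dump goroutine stacks and look for callers parked in `syscall.forkExec`.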

Contributor

crvv commented Aug 8, 2018

ForkLock is a public variable, so any code can hold it.
Maybe this is not a bug in the standard library but in some other library.
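
To make that concrete, here is a small sketch of how code outside the standard library could wedge `forkExec` simply by holding the read side of the exported lock. `forkLockFree` and `leakReadLock` are hypothetical names, and `TryLock` needs Go 1.18 or later, which is newer than the versions discussed in this thread.

```go
package main

import (
	"fmt"
	"syscall"
)

// forkLockFree reports whether the write side of syscall.ForkLock, the side
// forkExec takes, could be acquired right now. Requires Go 1.18+ for TryLock.
// Hypothetical helper for illustration only.
func forkLockFree() bool {
	if !syscall.ForkLock.TryLock() {
		return false
	}
	syscall.ForkLock.Unlock()
	return true
}

// leakReadLock simulates a library that takes the read lock and returns
// without releasing it. The returned function releases the lock.
func leakReadLock() (release func()) {
	syscall.ForkLock.RLock()
	return syscall.ForkLock.RUnlock
}

func main() {
	fmt.Println("free before:", forkLockFree())
	release := leakReadLock()
	fmt.Println("free while read lock is held:", forkLockFree())
	release()
	fmt.Println("free after release:", forkLockFree())
}
```

While the read lock is leaked, any goroutine calling `os/exec.(*Cmd).Start` would block in `ForkLock.Lock()`, which matches the shape of the reported stack, although nothing in this thread confirms that this is what happened.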

Author

wushukai commented Aug 9, 2018

@crvv
I checked all the source code under GOPATH, and no other library uses this lock.

Member

odeke-em commented Jan 30, 2019

How's it going @wushukai? Has this bug come back? Have you been able to isolate a reproduction?

Author

wushukai commented Jan 31, 2019

@odeke-em The problem went away after we upgraded to 1.10.3. We ran some experiments in our test environment using version 1.8.3, but had no luck reproducing it.

Member

odeke-em commented Feb 1, 2019

> @odeke-em The problem went away after we upgraded to 1.10.3. We ran some experiments in our test environment using version 1.8.3, but had no luck reproducing it.

Thank you for the update @wushukai!

Given that this bug is elusive and has not been reproduced since the upgrade to Go 1.10, perhaps we could:
a) Move this issue to unplanned and out of the Go1.12 milestone
b) Close this issue as non-actionable

@ianlancetaylor what do you think we should do?

Contributor

ianlancetaylor commented Feb 1, 2019

It sounds like it can't be reproduced in newer versions of Go, so I think we can close this. Thanks.
