runtime/cgo: pthread_create failed: Resource temporarily unavailable #24484

Open
fiber opened this Issue Mar 22, 2018 · 13 comments

fiber commented Mar 22, 2018

What version of Go are you using (go version)?

go version go1.10 linux/amd64

Does this issue reproduce with the latest release?

go1.10 is latest

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOOS="linux"

What did you do?

My process starts around 500 child processes. The number of OS-level threads creeps up slowly until it reaches around 10k, at which point child processes start to die with the message below.

The process limits seem sufficiently high:
Limit                     Soft Limit           Hard Limit           Units
Max processes             257093               257093               processes

$ cat /proc/sys/kernel/threads-max
514187

What did you expect to see?

no crash ;)

What did you see instead?

runtime/cgo: pthread_create failed: Resource temporarily unavailable
SIGABRT: abort
PC=0x7f24685ab428 m=44 sigcode=18446744073709551610
goroutine 0 [idle]:
runtime: unknown pc 0x7f24685ab428
stack: frame={sp:0x7f2407ffea08, fp:0x0} stack=[0x7f24077ff2f0,0x7f2407ffeef0)
00007f2407ffe908: 00007f2468d84168 00007f2407ffea68
00007f2407ffe918: 00007f2468b67b1f 0000000000000002
00007f2407ffe928: 00007f2468d79a80 0000000000000005
00007f2407ffe938: 0000000000f021e0 00007f23d80008c0
00007f2407ffe948: 00000000000000f1 0000000000000011
00007f2407ffe958: 0000000000000000 0000000000c2597a
00007f2407ffe968: 00007f2468b6cac6 0000000000000005
00007f2407ffe978: 0000000000000000 0000000100000000
00007f2407ffe988: 00007f246857cde0 00007f2407ffeb20
00007f2407ffe998: 00007f2468b74923 000000ffffffffff
00007f2407ffe9a8: 0000000000000000 0000000000000000
00007f2407ffe9b8: 0000000000000000 2525252525252525
00007f2407ffe9c8: 2525252525252525 0000000000000000
00007f2407ffe9d8: 00007f246893b700 0000000000c2597a
00007f2407ffe9e8: 00007f23d80008c0 00000000000000f1
00007f2407ffe9f8: 0000000000000011 0000000000000000
00007f2407ffea08: <00007f24685ad02a 0000000000000020
00007f2407ffea18: 0000000000000000 0000000000000000
00007f2407ffea28: 0000000000000000 0000000000000000
00007f2407ffea38: 0000000000000000 0000000000000000
00007f2407ffea48: 0000000000000000 0000000000000000
00007f2407ffea58: 0000000000000000 0000000000000000
00007f2407ffea68: 0000000000000000 0000000000000000
00007f2407ffea78: 0000000000000000 0000000000000000
00007f2407ffea88: 0000000000000000 0000000000000000
00007f2407ffea98: 0000000000000000 0000000000000000
00007f2407ffeaa8: 00007f24685eebff 00007f246893b540
00007f2407ffeab8: 0000000000000001 00007f246893b5c3
00007f2407ffeac8: 00000000000000f1 0000000000000011
00007f2407ffead8: 00007f24685f0409 000000000000000a
00007f2407ffeae8: 00007f246866d2dd 000000000000000a
00007f2407ffeaf8: 00007f246893c770 0000000000000000
runtime: unknown pc 0x7f24685ab428
stack: frame={sp:0x7f2407ffea08, fp:0x0} stack=[0x7f24077ff2f0,0x7f2407ffeef0)
00007f2407ffe908: 00007f2468d84168 00007f2407ffea68
00007f2407ffe918: 00007f2468b67b1f 0000000000000002
00007f2407ffe928: 00007f2468d79a80 0000000000000005
00007f2407ffe938: 0000000000f021e0 00007f23d80008c0
00007f2407ffe948: 00000000000000f1 0000000000000011
00007f2407ffe958: 0000000000000000 0000000000c2597a
00007f2407ffe968: 00007f2468b6cac6 0000000000000005
00007f2407ffe978: 0000000000000000 0000000100000000
00007f2407ffe988: 00007f246857cde0 00007f2407ffeb20
00007f2407ffe998: 00007f2468b74923 000000ffffffffff
00007f2407ffe9a8: 0000000000000000 0000000000000000
00007f2407ffe9b8: 0000000000000000 2525252525252525
00007f2407ffe9c8: 2525252525252525 0000000000000000
00007f2407ffe9d8: 00007f246893b700 0000000000c2597a
00007f2407ffe9e8: 00007f23d80008c0 00000000000000f1
00007f2407ffe9f8: 0000000000000011 0000000000000000
00007f2407ffea08: <00007f24685ad02a 0000000000000020
00007f2407ffea18: 0000000000000000 0000000000000000
00007f2407ffea28: 0000000000000000 0000000000000000
00007f2407ffea38: 0000000000000000 0000000000000000
00007f2407ffea48: 0000000000000000 0000000000000000
00007f2407ffea58: 0000000000000000 0000000000000000
00007f2407ffea68: 0000000000000000 0000000000000000
00007f2407ffea78: 0000000000000000 0000000000000000
00007f2407ffea88: 0000000000000000 0000000000000000
00007f2407ffea98: 0000000000000000 0000000000000000
00007f2407ffeaa8: 00007f24685eebff 00007f246893b540
00007f2407ffeab8: 0000000000000001 00007f246893b5c3
00007f2407ffeac8: 00000000000000f1 0000000000000011
00007f2407ffead8: 00007f24685f0409 000000000000000a
00007f2407ffeae8: 00007f246866d2dd 000000000000000a
00007f2407ffeaf8: 00007f246893c770 0000000000000000
goroutine 632 [running]:
runtime.systemstack_switch()
/opt/go/1.10.0/go/src/runtime/asm_amd64.s:363 fp=0xc4204f6d50 sp=0xc4204f6d48 pc=0x457270
runtime.gcMarkTermination(0x3ff75e93c8506a48)
/opt/go/1.10.0/go/src/runtime/mgc.go:1647 +0x407 fp=0xc4204f6f20 sp=0xc4204f6d50 pc=0x41a907
runtime.gcMarkDone()
/opt/go/1.10.0/go/src/runtime/mgc.go:1513 +0x22c fp=0xc4204f6f48 sp=0xc4204f6f20 pc=0x41a49c
runtime.gcBgMarkWorker(0xc420048500)
/opt/go/1.10.0/go/src/runtime/mgc.go:1912 +0x2e7 fp=0xc4204f6fd8 sp=0xc4204f6f48 pc=0x41b417
runtime.goexit()
/opt/go/1.10.0/go/src/runtime/asm_amd64.s:2361 +0x1 fp=0xc4204f6fe0 sp=0xc4204f6fd8 pc=0x459de1
created by runtime.gcBgMarkStartWorkers
/opt/go/1.10.0/go/src/runtime/mgc.go:1723 +0x79

Contributor

ianlancetaylor commented Mar 22, 2018

There are many reasons why a program might leak threads. We need to know something about your programs. Ideally, you would give us code that we can use to recreate the problem. Thanks.

ianlancetaylor added this to the Go1.11 milestone Mar 22, 2018

fiber commented Mar 22, 2018

I'm afraid I can't share much detail publicly. The code and setup are fairly complex, so they would be difficult to share either way. The instance processes run in separate network namespaces and exchange UDP and ICMP traffic with approximately 100k peers in total. After startup, almost no new child processes are spawned. I don't see any goroutines leaking. The server has 50+ CPUs, and I believe I can reduce the thread growth substantially by setting a lower GOMAXPROCS for the instances. If you can share some reasons or areas to look at, I may be able to rule some of them out.
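Since the instances are child processes, one way to lower GOMAXPROCS for them without changing their code is the `GOMAXPROCS` environment variable, which the Go runtime reads at startup (`runtime.GOMAXPROCS(n)` does the same from inside the child). A minimal sketch, assuming the parent launches children with `os/exec`; `instanceCommand`, `./instance`, and the value 4 are hypothetical. Note that GOMAXPROCS bounds only the threads executing Go code at the same time; threads blocked in system calls or cgo calls do not count against it, so it can reduce, but not cap, the total thread count.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// instanceCommand builds, but does not start, a child command whose
// environment sets GOMAXPROCS. A Go child reads this variable at startup.
func instanceCommand(maxprocs int, name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	// Inherit the parent's environment and override GOMAXPROCS. When a key
	// appears more than once in Env, os/exec uses the last value.
	cmd.Env = append(os.Environ(), fmt.Sprintf("GOMAXPROCS=%d", maxprocs))
	return cmd
}

func main() {
	// "./instance" is a placeholder for the real child binary.
	cmd := instanceCommand(4, "./instance")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Start(); err != nil {
		fmt.Fprintln(os.Stderr, "start:", err)
		return
	}
	if err := cmd.Wait(); err != nil {
		fmt.Fprintln(os.Stderr, "wait:", err)
	}
}
```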

Contributor

ianlancetaylor commented Mar 22, 2018

The most common reason for a thread to be created is because all the existing threads are blocked in system calls or in calls to C code via cgo (the error message shows that your application uses cgo). cgo calls would be the first place to look. See if any of those calls do not return.

fiber commented Mar 22, 2018

I believe there is no cgo being used outside the standard library. I have compiled with CGO_ENABLED=0 now and will monitor the situation.

fiber commented Apr 11, 2018

Compiling with CGO_ENABLED=0 and setting a low GOMAXPROCS reduced the overall number of threads to about a third. Instead of peaking at 12k threads, I'm now down to 4.5k, but the count is still (very) slowly creeping up. Still investigating.

ianlancetaylor modified the milestones: Go1.11, Go1.12 Jun 28, 2018

Contributor

kolyshkin commented Oct 18, 2018

@fiber you are probably hitting the kernel.pid_max sysctl limit. But raising it is not a solution, as it might lead to the whole system getting stuck (unreachable via ssh, etc.).

The real problem, though, is why the Go runtime chooses to die upon receiving EAGAIN from pthread_create(). I have only started to look at it, but it appears that a trivial fork bomb run on the system can cause a Go app running on the same system to crash, even if there is no goroutine leak.
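Several kernel limits can make thread creation fail with `EAGAIN` on Linux: `kernel.pid_max` (every thread gets an ID from the same space as process IDs), `kernel.threads-max` (a system-wide thread cap), and `RLIMIT_NPROC`, shown as "Max processes" in `/proc/<pid>/limits`, which counts threads per real user ID. A pids cgroup limit, as used by container runtimes, can also produce `EAGAIN`. A sketch that prints the first three for the current process, assuming Linux procfs; `readProcValue` and `maxProcesses` are illustrative helpers.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// readProcValue returns the trimmed contents of a single-value procfs file
// such as /proc/sys/kernel/pid_max.
func readProcValue(path string) (string, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(b)), nil
}

// maxProcesses returns the soft and hard "Max processes" limits
// (RLIMIT_NPROC) of the current process, as strings because either may be
// "unlimited".
func maxProcesses() (soft, hard string, err error) {
	f, err := os.Open("/proc/self/limits")
	if err != nil {
		return "", "", err
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "Max processes") {
			fields := strings.Fields(strings.TrimPrefix(line, "Max processes"))
			if len(fields) >= 2 {
				return fields[0], fields[1], nil
			}
		}
	}
	if err := sc.Err(); err != nil {
		return "", "", err
	}
	return "", "", fmt.Errorf("no Max processes line in /proc/self/limits")
}

func main() {
	for _, p := range []string{"/proc/sys/kernel/pid_max", "/proc/sys/kernel/threads-max"} {
		v, err := readProcValue(p)
		if err != nil {
			fmt.Println(p, "error:", err)
			continue
		}
		fmt.Println(p, "=", v)
	}
	soft, hard, err := maxProcesses()
	if err != nil {
		fmt.Println("Max processes error:", err)
		return
	}
	fmt.Println("Max processes soft =", soft, "hard =", hard)
}
```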

Contributor

ianlancetaylor commented Oct 18, 2018

In a Go program that uses cgo, new threads are created using pthread_create. If pthread_create fails, the Go runtime will retry up to 20 times. The relevant code is at https://golang.org/src/runtime/cgo/gcc_libinit.c#L91 .

Contributor

kolyshkin commented Oct 19, 2018

If pthread_create fails, the Go runtime will retry up to 20 times.

...on EAGAIN only (which I guess is right), and if it still fails, it calls abort(). It is unfortunate that there's no way to handle this more gracefully. It does not matter how many threads the running program has, or whether there is a goroutine leak: the running program aborts if it can't create a new goroutine.

In my case it is the Docker daemon that gets aborted once the kernel.pid_max limit is hit (which is easy to reach by running many containers with many threads), and I don't see any practical way to avoid that.

Contributor

ianlancetaylor commented Oct 19, 2018

Goroutines are not threads. There are normally many many more goroutines than threads. A goroutine leak won't in itself lead to this problem. A thread leak will.

We can't fix this problem until we understand where the thread leak is coming from in the original program.

xianglinghui commented Oct 19, 2018

@ianlancetaylor Our service crashes with the same error:

runtime/cgo: pthread_create failed: Resource temporarily unavailable
SIGABRT: abort
PC=0x3927232625 m=39 sigcode=18446744073709551610

Our go version is 1.11.1.

We don't use cgo in our code, and the libraries we rely on don't seem to use it either. Does Go itself use cgo in some scenarios?

SjonHortensius commented Oct 19, 2018

@xianglinghui I have the same issue, but as @ianlancetaylor explained:

In a Go program that uses cgo, new threads are created using pthread_create

E.g., if you disable cgo, new threads are not created with pthread_create but through another mechanism.

For my application (which has a very small number of concurrent goroutines and no explicit cgo usage, but a lot of os/exec calls), disabling cgo fixed a lot of "Resource temporarily unavailable" crashes as well. I'm not sure what causes this or how expected it is, but I'm just disabling cgo from now on.

fiber commented Oct 19, 2018

Contributor

ianlancetaylor commented Oct 19, 2018

@xianglinghui I believe it's platform dependent. If you are running on Darwin, then the standard library will use cgo by default, for DNS requests, unless you build with CGO_ENABLED=0.

This bug is still waiting for a reproduction case. If you have a case where a program crashes by running out of threads for no clear reason, please do share the code if you can so that we can try to reproduce it ourselves. Don't forget to provide all the relevant system details.

@fiber Large numbers of concurrent DNS requests did previously cause large numbers of threads to be created, but we fixed that, at least partially, in #25694. Though we could perhaps extend that fix to also check RLIMIT_NPROC.

ianlancetaylor modified the milestones: Go1.12, Go1.13 Nov 28, 2018
