runtime: newosproc doesn't handle clone returning EAGAIN #49438

asuffield opened this issue Nov 8, 2021 · 1 comment


@asuffield asuffield commented Nov 8, 2021

What version of Go are you using (go version)?

$ go version
go version go1.16.7 linux/amd64

Does this issue reproduce with the latest release?

I haven't tried, but inspection of the code says it will.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env

(With apologies for pruning)

What did you do?

I don't have a reproduction case for this one - it's very sensitive to something I haven't pinned down yet - but I have uncovered the nature of the bug via inspection.

Running Go programs on sufficiently loaded systems sometimes crashes with "runtime: failed to create new OS thread (have 2 already; errno=11)". The really interesting thing here is errno=11, which is EAGAIN. The Linux manpages for fork/clone attribute EAGAIN to system limits; I have verified that no such limit is being hit in my scenario. At this point I said to myself (more than once): fork/clone aren't restartable syscalls, surely they can't actually return EAGAIN. Then I started doubting myself and went looking.

It turns out that Linux can and does return EAGAIN in some circumstances which are entirely undocumented in the manpages. The key code path ends up here:

And starts out over here:

Which eventually led me back to this thread:

It appears Linux has been willing to return EAGAIN to fork/clone for over a decade now, which means this code needs to handle that case somehow:


Lines 167 to 173 in a97c527

	if ret < 0 {
		print("runtime: failed to create new OS thread (have ", mcount(), " already; errno=", -ret, ")\n")
		if ret == -_EAGAIN {
			println("runtime: may need to increase max user processes (ulimit -u)")

It is super unfortunate that the rlimit scenario also returns EAGAIN, but I don't see any solution other than retrying a few times before panicking. Maybe there's something I haven't fully understood here; I'll admit I haven't pieced together exactly what's happening. The only thing I'm fully confident of is this: there is some way in which Go processes can crash with an EAGAIN returned from clone() that isn't caused by rlimits.

@ianlancetaylor ianlancetaylor changed the title newosproc doesn't handle clone() returning EAGAIN runtime: newosproc doesn't handle clone returning EAGAIN Nov 8, 2021
@ianlancetaylor ianlancetaylor commented Nov 8, 2021

Note: for the cgo case we use a loop with an increasing delay to handle pthread_create returning EAGAIN; see #18146 and We could certainly do the same thing for the non-cgo case, which is what you are describing. It would be nice to have a test case.

