Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: intermittent "failed to create new OS thread" on Windows since 2022-01-24 #52572

Open
bcmills opened this issue Apr 26, 2022 · 13 comments
Open
Labels
NeedsInvestigation okay-after-beta1 OS-Windows release-blocker
Milestone

Comments

@bcmills
Copy link
Member

@bcmills bcmills commented Apr 26, 2022

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
runtime: failed to create new OS thread (22)
FAIL	golang.org/x/tools/go/packages	25.671s

greplogs --dashboard -md -l -e '(?ms)\Awindows-.*runtime: failed to create new OS thread'

2022-04-26T02:30:39-5bb9a5e-17d7983/windows-amd64-race
2022-03-21T13:26:21-86b02b3-4aa1efe/windows-arm64-11
2022-02-04T14:02:15-25d2ab2-4afcc9f/windows-arm64-11
2022-01-24T16:55:59-f9df4ea/windows-amd64-2008

Curiously, I don't see any of these errors in the logs before 2022-01-24. 🤔

@bcmills bcmills added OS-Windows NeedsInvestigation labels Apr 26, 2022
@bcmills bcmills added this to the Go1.19 milestone Apr 26, 2022
@bcmills
Copy link
Member Author

@bcmills bcmills commented Apr 26, 2022

Since most of these failures are in x/tools, I wouldn't be entirely surprised if this is actual thread-exhaustion due to the tests running with high parallelism and spawning subprocesses that themselves run with high parallelism. (Some of these failures also correlate with out of memory errors from the same test run.)

However the “requested the Runtime to terminate it in an usual way" failure mode in particular looks like a Go runtime bug. Since windows/amd64 is a first-class port, I'm marking this as a release-blocker at least until we can determine whether it is a regression (attn @golang/runtime).

@heschi
Copy link
Contributor

@heschi heschi commented May 11, 2022

ping: this is a release blocker that hasn't been updated in a while.

@aclements
Copy link
Member

@aclements aclements commented May 11, 2022

Three out of the four also have an "out of memory" failure, which strongly suggests this is a real resource exhaustion on the host. That's consistent with most of the failures being on a particular subrepo.

The "requested the Runtime to terminate it in an usual way" message is not coming from the Go runtime, and must be coming from the C runtime. That's presumably printed by CreateThread failing, but it is odd that the Go runtime still gets to print "failed to create new OS thread" after the C runtime prints its message. (Maybe that's a buffer flushing thing?)

@prattmic
Copy link
Member

@prattmic prattmic commented May 23, 2022

I reproduced this 12 times over the weekend on gomotes, interestingly always in package runtime's TestCgoSignalDeadlock:

--- FAIL: TestCgoSignalDeadlock (44.53s)
    crash_test.go:124: running go build -o C:\Users\gopher\AppData\Local\Temp\1\go-build3458167679\testprogcgo.exe 
    crash_cgo_test.go:37: expected "OK\n", but got:
        OK
        
        This application has requested the Runtime to terminate it in an unusual way.
        Please contact the application's support team for more information.
        runtime: failed to create new OS thread (13)
FAIL
FAIL    runtime 418.665s

@prattmic
Copy link
Member

@prattmic prattmic commented May 23, 2022

I should add that most, but not all, failures include the "This application has requested ..." message, and that none of the failures include any other resource or other test failures in the same run.

@prattmic
Copy link
Member

@prattmic prattmic commented May 23, 2022

The tests did seem to take abnormally long to run. That said, the first run (which passed) on a new gomote took 12s to complete (mostly spent building the test binary), so these aren't that far off.

--- FAIL: TestCgoSignalDeadlock (4.41s)
--- FAIL: TestCgoSignalDeadlock (23.29s)
--- FAIL: TestCgoSignalDeadlock (26.25s)
--- FAIL: TestCgoSignalDeadlock (41.33s)
--- FAIL: TestCgoSignalDeadlock (43.05s)
--- FAIL: TestCgoSignalDeadlock (35.08s)
--- FAIL: TestCgoSignalDeadlock (27.79s)
--- FAIL: TestCgoSignalDeadlock (21.33s)
--- FAIL: TestCgoSignalDeadlock (37.15s)
--- FAIL: TestCgoSignalDeadlock (36.62s)
--- FAIL: TestCgoSignalDeadlock (5.74s)
--- FAIL: TestCgoSignalDeadlock (44.53s)

@prattmic
Copy link
Member

@prattmic prattmic commented May 24, 2022

Ah, well the "requested the Runtime to terminate it in an usual way" message is not surprising. The runtime message comes from https://cs.opensource.google/go/go/+/master:src/runtime/cgo/gcc_windows_amd64.c;l=31-32.

Per the Microsoft docs, "By default, the abort routine prints the message:

"This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information.""

I am now more confused why I sometimes don't get the message, and also why the runtime error is printed after the message, given that the docs say "abort does not flush stream buffers or do atexit/_onexit processing."

@prattmic
Copy link
Member

@prattmic prattmic commented May 24, 2022

"runtime: failed to create new OS thread (13)" is errno EACCES. _beginthread "may return EACCES if there are insufficient resources (such as memory)", which fits well with the out-of-memory errors.

@prattmic
Copy link
Member

@prattmic prattmic commented May 24, 2022

I am running with https://go.dev/cl/408216 to try to get a sense of whether the crashing process is using lots of memory, the overall system is low on memory, or this has nothing to do with memory.

(Hopefully there is enough memory to run the dump function 😅)

@gopherbot
Copy link

@gopherbot gopherbot commented May 24, 2022

Change https://go.dev/cl/408216 mentions this issue: WIP: runtime: dump memory stats on failure

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented May 24, 2022

@prattmic Have you tried extending the idea of https://go.dev/cl/33894 to Windows? The exact reason for an "insufficient resources" error isn't that interesting if retrying fixes the problem.

@prattmic
Copy link
Member

@prattmic prattmic commented May 24, 2022

I haven't, but it is not a bad idea, as _beginthread provides a very clear errno for this failure.

That said, while this is the only error in my repros, all of the failures in Bryan's original comment also have other fatal OOMs in the allocator. So while a retry might make thread creation succeed, the program may die anyways due to another OOM.

It seems like the ultimate problem here might end up being test/builder misconfiguration. i.e., we are running too many high-memory tests in parallel and exhausting all memory on the builder.

@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented May 25, 2022

Since there's a decent chance this is resource exhaustion on the builders I'm not sure this is going to impede users using the beta, so marking as okay-after-beta1.

@mknyszek mknyszek added the okay-after-beta1 label May 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation okay-after-beta1 OS-Windows release-blocker
Projects
Status: Todo
Development

No branches or pull requests

7 participants