runtime: make Windows VirtualAlloc failure more informative #70558

Open

adonovan opened this issue Nov 25, 2024 · 5 comments
Labels: compiler/runtime, help wanted, NeedsInvestigation, OS-Windows
Milestone: Backlog
@adonovan (Member)

When a Windows build fails due to OOM, the error log looks like this:
https://logs.chromium.org/logs/golang/buildbucket/cr-buildbucket/8730352753718672849/+/u/step/19/log/2
It contains a thread dump of the go test command taken just after VirtualAlloc failed, but I suspect the real culprits are the test child processes, among them the x/tools/go/ssa test, which is known to have a big appetite for memory. Unfortunately, that information is rather obscure in the log.

The task of this issue is to consider whether the Windows OOM crash could be made more informative, for example by including the committed size of the current process, or alternatively whether there are changes we could make to go test or the builders that would point the finger of blame more clearly.

@prattmic

gopherbot added the compiler/runtime label Nov 25, 2024
@prattmic (Member)

cc @golang/runtime @golang/windows

GoVeronicaGo added the NeedsInvestigation label Nov 25, 2024
@alexbrainman (Member)

> When a Windows build fails due to OOM, the error log looks like this:
> https://logs.chromium.org/logs/golang/buildbucket/cr-buildbucket/8730352753718672849/+/u/step/19/log/2

From the log:

```
runtime: VirtualAlloc of 8192 bytes failed with errno=1455
```

1455 is ERROR_COMMITMENT_LIMIT; see https://learn.microsoft.com/en-us/windows/win32/debug/system-error-codes--1300-1699-.
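
A minimal sketch of checking that code from Go, assuming golang.org/x/sys/windows (whose generated error table carries ERROR_COMMITMENT_LIMIT):

```go
//go:build windows

package main

import (
	"fmt"
	"syscall"

	"golang.org/x/sys/windows"
)

func main() {
	e := syscall.Errno(1455)
	// On Windows, Errno.Error calls FormatMessage, which yields
	// "The paging file is too small for this operation to complete."
	fmt.Printf("errno=%d: %v\n", uint32(e), e)
	fmt.Println(e == windows.ERROR_COMMITMENT_LIMIT) // true
}
```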

I suspect you need to increase the memory or the page file on your computer. See, for example, #33716 (comment), which hit the same error 1455.

> The task of this issue is to consider whether the Windows OOM crash could be made more informative, for example by including the committed size of the current process

I do not know what memory figures would be helpful here; Windows memory is complicated, and I am not an expert in this area. Perhaps others have better suggestions.

Alex

@mknyszek (Contributor)

See also #52546.

I became less motivated to improve this because mapping failures on Linux usually don't indicate a true OOM (thanks to demand paging and the OOM-killing policy) but something sillier, like VMA limits or address-space limits (for example, a 32-bit process under a 64-bit Linux kernel).

For Windows I think this is slightly more tractable because of the explicit commit step, but as @alexbrainman says, it's still somewhat hard, unfortunately, since a failure to commit memory can still come from hidden places.

(As an aside, it's a real shame that errors like this are so difficult to debug on modern systems. In theory they shouldn't be: the OS is doing all the accounting already, but it generally isn't exposed in a way that's easy to interpret. Linux's OOM killing makes this obnoxiously hard because it force-kills processes, destroying the very information you need.)

Lastly, there's a ton of information that could be relevant to dump, but it's hard to fetch and dump it when you can't even mmap (or VirtualAlloc) anymore. I suppose we could reserve some memory up front for, say, reading /proc/<pid>/status on Linux, and whatever the equivalent is on Windows.
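
To illustrate the Linux half of that idea, here is a user-space sketch of pulling the memory lines out of /proc/self/status; the runtime itself would need pre-reserved buffers and raw syscalls rather than the conveniences used here:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer f.Close()

	// Keep only the memory accounting lines: VmPeak, VmSize, VmRSS, VmSwap, ...
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if line := sc.Text(); strings.HasPrefix(line, "Vm") {
			fmt.Println(line)
		}
	}
}
```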

mknyszek added this to the Backlog milestone Nov 26, 2024
@prattmic (Member)

When Alan brought this up, I was thinking that we could call GetProcessMemoryInfo when throwing and print WorkingSetSize.

I don't know Windows well enough to say whether this is a good idea; it was just a thought, to give a hint as to whether the problem was this process using tons of memory or other processes doing so.
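
As a sketch of that suggestion (user-space code via golang.org/x/sys/windows, not the allocation-free form the runtime would need when throwing):

```go
//go:build windows

package main

import (
	"fmt"
	"unsafe"

	"golang.org/x/sys/windows"
)

func main() {
	var pmc windows.PROCESS_MEMORY_COUNTERS
	err := windows.GetProcessMemoryInfo(windows.CurrentProcess(), &pmc, uint32(unsafe.Sizeof(pmc)))
	if err != nil {
		fmt.Println("GetProcessMemoryInfo:", err)
		return
	}
	fmt.Printf("WorkingSetSize:     %d bytes\n", pmc.WorkingSetSize)
	fmt.Printf("PeakWorkingSetSize: %d bytes\n", pmc.PeakWorkingSetSize)
	// PagefileUsage is the commit charge, the figure errno 1455 is about.
	fmt.Printf("PagefileUsage:      %d bytes\n", pmc.PagefileUsage)
}
```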

@prattmic (Member)

We might even get close enough to that by just printing some of our own internal metrics, like live heap size.
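
For example, the public runtime/metrics package already exposes the kind of figures the runtime could print internally (a sketch of the idea, not the eventual throw-path code):

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

func main() {
	samples := []metrics.Sample{
		{Name: "/memory/classes/heap/objects:bytes"}, // heap occupied by objects (live + not yet swept)
		{Name: "/memory/classes/total:bytes"},        // everything mapped by the runtime
	}
	metrics.Read(samples)
	for _, s := range samples {
		fmt.Printf("%-40s %d\n", s.Name, s.Value.Uint64())
	}
}
```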
