runtime: TestG0StackOverflow timeout on windows/arm64 #63938
Comments
Ping. Can we get this fixed or skipped or something? |
Change https://go.dev/cl/541195 mentions this issue: |
Temporarily skip to make the builder happy. Will work on a fix.
Updates #63938.
Change-Id: Ic9db771342108430c29774b2c3e50043791189a6
Reviewed-on: https://go-review.googlesource.com/c/go/+/541195
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Change https://go.dev/cl/541997 mentions this issue: |
Tried to reenable this with some additional logging. When it fails, the log is like below. Specifically, it first prints the stack traces as expected; then, when it is about to crash the program at the end, it gets Exception 0xc00000fd (EXCEPTION_STACK_OVERFLOW) on the crash stack. The exception triggers another round of stack dumps, and somehow hangs... Using a static allocation for the crash stack (as CL https://go.dev/cl/541997 does) doesn't cause the exception. When the runtime finally crashes from a @qmuntal do you have any insight to share? Thanks! log
|
In general, Windows doesn't like to handle exceptions that lie outside the system stack. Other parts of the runtime already hit this limitation: for example, exceptions can only be resumed by functions within the system stack limits (workaround), and exceptions can't be unwound if a function in the call stack is not within the system stack limits (workaround). I couldn't find why we are seeing a |
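To make the "within the system stack limits" condition concrete, here is a minimal sketch of the kind of bounds check being described. The `stackBounds` type, its fields, and the addresses are hypothetical and chosen only for illustration; they are not the runtime's or Windows' actual data structures.

```go
package main

import "fmt"

// stackBounds is a hypothetical stand-in for a thread's stack limits,
// covering the half-open address range [lo, hi).
type stackBounds struct {
	lo, hi uintptr
}

// contains reports whether the stack pointer sp lies within the bounds.
func (b stackBounds) contains(sp uintptr) bool {
	return sp >= b.lo && sp < b.hi
}

func main() {
	// Made-up addresses, for illustration only.
	system := stackBounds{lo: 0x10000, hi: 0x20000}
	crashStackSP := uintptr(0x7f0000) // an SP on a separately allocated stack

	fmt.Println("SP on system stack:", system.contains(0x18000))
	fmt.Println("SP on crash stack: ", system.contains(crashStackSP))
}
```

An exception raised while the stack pointer is on a separately allocated crash stack would fail a check of this shape, which is the situation the comment above describes.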
@qmuntal Thanks! Do you know what's special about system allocated stacks? Is it possible to tell the system that some allocated memory is for system stack (maybe setting some flags on |
I don't know for sure, but my guess is that Windows is very strict about just using the system stack as a security hardening measure.
Nope. What you can do is modify the thread stack limits stored in the TEB, which will trick Windows into thinking that some allocated memory is the system stack. But that's not documented, and I would try to avoid doing that.
If the test passes, I don't have anything against CL 541997. |
@dagood I recall seeing some similar failures in our windows/amd64 CI hosts. When you have time, can you check if CL 541997 also fixes them? |
Yeah, I picked up on this too. I copied some logs to microsoft/go#1083. In that issue, the run includes retry logic that immediately runs the tests again on the same machine. The retries always seem to fail, which seems to point to this happening consistently on specific windows/amd64 machines and not on others. (Hitting the broader "Rerun failed jobs" button in Azure Pipelines randomly gets new agents, so this hasn't totally blocked anything yet.) I wasn't able to find any agent difference in the logs, but it's very possible that the VMs running our pipelines are hosted on machines with varying specs or configurations without our knowledge. Here's another run where I got rid of retries, where Unfortunately, If I add https://go.dev/cl/541997 as a patch and run that, it still times out and fails. It does print more info: Pasted log
In this run, all windows testing jobs failed, which could be up to chance, but seems unusual. I've kicked off a few more runs with the patch to get more data: |
Thanks for testing. The failure is similar to the one above #63938 (comment), Exception 0xc00000fd EXCEPTION_STACK_OVERFLOW. So I guess even static allocation doesn't always make Windows happy. One possibility may be that if we're crashing on a non-system allocated crash stack, after dumping stack traces, don't throw an exception, just exit. Maybe this will cause a different exit code, but maybe that is fine? Another possibility is disabling crash stack on Windows, so it works as before, just print a message (without stack trace) and abort. @qmuntal @dagood do you know why EXCEPTION_STACK_OVERFLOW will cause the program to hang? If it just throws an EXCEPTION_STACK_OVERFLOW but does not hang, I'd be okay with that. Thanks. |
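The first possibility above can be sketched as a small decision function. This is only an illustration of the proposal, not the runtime's code: `crashAction`, `raiseException`, `exitDirectly`, and `chooseCrashAction` are hypothetical names.

```go
package main

import "fmt"

// crashAction is a hypothetical enumeration of how the final crash is performed.
type crashAction int

const (
	raiseException crashAction = iota // crash by raising an exception, as usual
	exitDirectly                      // skip the exception and just exit
)

func (a crashAction) String() string {
	if a == exitDirectly {
		return "exit directly"
	}
	return "raise exception"
}

// chooseCrashAction sketches the proposal: after stack traces are dumped,
// only raise an exception when running on a system-allocated stack.
func chooseCrashAction(onSystemStack bool) crashAction {
	if onSystemStack {
		return raiseException
	}
	return exitDirectly
}

func main() {
	fmt.Println("system stack:", chooseCrashAction(true))
	fmt.Println("crash stack: ", chooseCrashAction(false))
}
```

As noted in the comment, taking the exit path may produce a different exit code than the exception path.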
Could you try calling SetThreadStackGuarantee with the expected signal stack size? That might help avoid the hang in the stack overflow exception handler. If this doesn't help and no one has another solution, then I propose to revert this functionality (on Windows) and investigate it more for Go 1.23, as I'm pretty sure it can be done. |
Change https://go.dev/cl/543996 mentions this issue: |
Thanks. Sent CL https://go.dev/cl/543996 to switch back to the old behavior on Windows for now. We'll try again in Go 1.23. |
TestG0StackOverflow has been timing out on windows/arm64 since CL https://go.dev/cl/538457.
I'll look into it.