Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: don't crash in exception handler on windows/arm64 #50877

Closed
TopperDEL opened this issue Jan 28, 2022 · 15 comments
Closed

runtime: don't crash in exception handler on windows/arm64 #50877

TopperDEL opened this issue Jan 28, 2022 · 15 comments
Labels
arch-arm64 compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows
Milestone

Comments

@TopperDEL
Copy link

I have a quite complex bug but together with @marler8997 we found a workaround.

I tried to get a go-library working on a Hololens 2 via .Net. So I build the library into a dll using the awesome zig-compiler, and then tried to PInvoke into it from C#/.Net. I did all this already successfull on Windows, Linux, MacOs, Android and iOs.

On Hololens, though, my app always crashed whenever the UWP (Universal Windows Platform) DLL came together with the go-runtime. We know found out that Go seems to try to handle an exception thrown from UWP and fails to do so leading to a "badsignal" and an application crash.

@marler8997 helped me with an many in-depth-sessions debugging assembly code and created that patch:
marler8997@f645ec0

May someone with knowledge of the win-arm64-build if this is a good solution or if there might be a better way of doing it? Thank you very much!

What version of Go are you using (go version)?

1.17.6

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

Windows, ARM64, Hololens 2

@ALTree ALTree changed the title Win/ARM64: don't crash in exception handler runtime: don't crash in exception handler on windows/arm64 Jan 28, 2022
@ALTree ALTree added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 28, 2022
@ALTree
Copy link
Member

ALTree commented Jan 28, 2022

cc @cherrymui @mknyszek (my guess)

@marler8997
Copy link

marler8997 commented Jan 28, 2022

This comment from @ianlancetaylor seems very relevant here: #12465 (comment)

This is the Go signal handler receiving a SIGCHLD signal. It's not a problem for getg to return nil in sigtrampgo. That just means that a signal occurred on a non-Go thread.

But this comment contradicts what the sigtramp code is doing. If getg returns nil, sigtramp calls badsignal2 which calls abort causing an AccessViolation. If Ian's comment is still correct today (it was made in 2015) then this is indeed a bug in sigtramp.

My workaround was jut to modify sigtramp to return 0 when getg returns nil (marler8997@f645ec0). On windows this tells the caller that the exception is not handled and to continue to the next handler, which may be the functionality we want here.

It's also interesting to note the naming of this function sigtramp. This is a vestigal to linuxy platforms where programs need to register a special "trampoline" function to make signals work. This is not the case on windows as the "trampoline" functionality is already provided by other means. Instead executables can register callback functions without setting up a trampoline and already has a mechanism to support multiple callbacks and a way to stop/continue through them. The simple solution of returning when getg returns nil may actually be correct on Windows.

Also note that I see arm32 does the same thing here, so if this fix is correct then it should probably go to arm32 as well.

@ianlancetaylor
Copy link
Contributor

SIGCHLD only happens on Unix systems. The Unix sigtramp does correctly handle getg returning nil.

I don't know enough about Windows exceptions off-hand to know whether the same problem occurs. It wouldn't be surprising. The windows-amd64 sigtramp function does appear to handle a nil g.

@marler8997
Copy link

Yes the same problem occurs on Windows. On windows rather than signals they call this "Vectored Exception Handling". It's a simple mechanism where the applications can register multiple callbacks which are then called by whatever mechanism Windows uses to capture interrupts and propagate them to userspace. The handler will continue to call each one or stop depending on the return value of the previous call.

What's happening in our case is another library is using exception handling as normal control flow. When the exception occurs it calls go's sigtramp on a thread that go's runtime doesn't know about and viola, here we are.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022
@seankhliao seankhliao added this to the Unplanned milestone Aug 20, 2022
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/442896 mentions this issue: runtime: ignore exceptions from non-Go threads on windows arm/arm64

@qmuntal
Copy link
Contributor

qmuntal commented Oct 14, 2022

Thanks for reporting this issue @TopperDEL @marler8997.

I've been investigating it and get to the same conclusion as you: we should ignore exceptions coming from non-Go threads instead of calling badsignal2, which is inline with what's happening on windows amd64 and 386.

@TopperDEL
Copy link
Author

Thanks for the update! The finding goes to @marler8997 and I'm still very thankfull for that!
As I don't know how the process is: do you know when that fix might be released?

@ALTree
Copy link
Member

ALTree commented Oct 17, 2022

@TopperDEL The change linked above is under review (with a comment from Alex). After that comment is addressed, it could be merged (unless other issues arise). If it is merged, it'll be included in the next Go release (Go 1.20), scheduled for February 2023.

Serious issues with no workarounds can be backported to a minor release. If that happens, the fix will also be available in the next 1.19 minor release (could be 1.19.3 or later). These are done roughly every month.

@TopperDEL
Copy link
Author

Thank you very much for that info!

@marler8997
Copy link

@qmuntal any reason why my commit (marler8997@f645ec0) was not included in the one that was merged (6fe3ccd)? This commit was the result of many hours of work that spanned the course of a year, but, now my authorship is now gone from the commit history because you took my changes and incorporated them into your own commit.

@qmuntal
Copy link
Contributor

qmuntal commented Oct 20, 2022

now my authorship is now gone from the commit history because you took my changes and incorporated them into your own commit.

To be clear, I did not incorporate your changes into my commit, I didn't even read the discussions in this issue.

I've recently gone through Windows sigtramp many times due to CL 431775 and CL 442216 and I already knew arm64 had this issue, and the solution is quite trivial once you understand Go assembly (which is nit trivial at all!). Yet, I have to admit I took the g_ok label name once I saw it, as it was cleaner than what I had (BNE 6(PC)).

Having said this, I hope you trust me when I said I did not act in bad faith. I could have done a better job by either asking you to submit your change or adding you as a co-author, but I didn't, and I'm sorry for that. Will take this a s a lesson.

@marler8997
Copy link

To be clear, I did not incorporate your changes into my commit, I didn't even read the discussions in this issue.

Ok fair enough. The changes looks almost exactly the same but I see there's a couple differences now that I look more closely.

I had submitted this change back in January https://go-review.googlesource.com/c/go/+/381775#message-d3c3fc570adf49bb93dd96a011a1d2d20b6fa21a

But, I remember there was a concern about the solution that I was going to try to look into but never came back to it. It was something about trying to tell whether the TLS was initialized or not when g was nil? You can find it in the linked discussion thread if you're interested.

Anyway, I normally wouldn't care about a 7 line change, but, this is a result of dozens of hours of work over the coarse of a year so I'm a little bummed I won't get any credit in the git commit history, but, sounds like you came up with the change on your own without my help so fair is fair.

@TopperDEL
Copy link
Author

I can only thank you both for fixing a bug in a world where I don't know anything about. :)

@qmuntal
Copy link
Contributor

qmuntal commented Oct 20, 2022

I had submitted this change back in January https://go-review.googlesource.com/c/go/+/381775#message-d3c3fc570adf49bb93dd96a011a1d2d20b6fa21a

That CL does not link to this issue but to 43800, probably by mistake, so gerritbot did not create the appropriate link and I couldn't know it was there. Unfortunate.

But, I remember there was a concern about the solution that I was going to try to look into but never came back to it. It was something about trying to tell whether the TLS was initialized or not when g was nil? You can find it in the linked discussion thread if you're interested.

I made a conscious decision to not special-case uninitialized TLS and I was even going to submit a CL removing that from amd64 and 386, as seemed to me that that branch is not reachable (see #8224 (comment) for more discussion on this). But given the conversations that you linked, I might be wrong and will have to reconsider.

Edit: sigtramp callback is set in initExceptionHandler, which is called from osinit and always after wintls. This means TLS can't be uninitialized when reaching sigtramp, unless there is a sever runtime bug. In any case, calling badsignal2 is not a good solution, as the problem wouldn't be a bad signal, but an error initialization the runtime.

romaindoumenc pushed a commit to TroutSoftware/go that referenced this issue Nov 3, 2022
If there is no current G while handling an exception it means
the exception was originated in a non-Go thread.

The best we can do is ignore the exception and let it flow
through other vectored and structured error handlers.

I've removed badsignal2 from sigtramp because we can't really know
if the signal is bad or not, it might be handled later in the chain.

Fixes golang#50877
Updates golang#56082

Change-Id: Ica159eb843629986d1fb5482f0b59a9c1ed91698
Reviewed-on: https://go-review.googlesource.com/c/go/+/442896
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Quim Muntal <quimmuntal@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/463116 mentions this issue: runtime: remove unused badsignal2 on windows

gopherbot pushed a commit that referenced this issue Jan 24, 2023
This CL removes badsignal2 function, as it is unused on Windows.

badsignal2 was originally intended to abort the process when
an exception was raised on a non-Go thread, following the same approach
as Linux and others.

Since it was added, back on https://golang.org/cl/5797068, it has caused
several issues on Windows, see #8224 and #50877. That's because we can't
know wether the signal is bad or not, as our trap might not be at the
end of the exception handler chain.

To fix those issues, https://golang.org/cl/104200046 and CL 442896
stopped calling badsignal2, and CL 458135 removed one last incorrect
call on amd64 and 386.

Change-Id: I5bd31ee2672118ae0f1a2c8b46a1bb0f4893a011
Reviewed-on: https://go-review.googlesource.com/c/go/+/463116
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Quim Muntal <quimmuntal@gmail.com>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@golang golang locked and limited conversation to collaborators Jan 23, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-arm64 compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows
Projects
None yet
Development

No branches or pull requests

7 participants