runtime: "morestack on g0" in x/perf/storage/app on windows/arm64 #47557

Open
bcmills opened this issue Aug 5, 2021 · 21 comments

Comments

@bcmills
Member

@bcmills bcmills commented Aug 5, 2021

$ greplogs --dashboard -l -md -e '(?ms)morestack on g0.*FAIL\s+golang\.org/x/perf/storage/app'
2021-02-20T03:31:36-40a54f1/6e73886/windows-arm64-10
2021-02-20T03:31:36-40a54f1/8a7ee4c/windows-arm64-10
2021-02-20T03:31:36-40a54f1/b8ca6e5/windows-arm64-10

fatal: morestack on g0
runtime: signal received on thread not created by Go.
…
FAIL	golang.org/x/perf/storage/app	0.274s
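The `greplogs` pattern above uses RE2 inline flags: `s` lets `.` match newlines, so `.*` can span every line between the fatal error and the `FAIL` line, while `m` only changes `^` and `$`, which the pattern doesn't use. A minimal Go sketch of the same match using the standard `regexp` package, assuming a builder log is available as one string (the `matchesFailure` helper and the abbreviated sample log are illustrative, not part of greplogs):

```go
package main

import (
	"fmt"
	"regexp"
)

// failurePattern is the expression passed to greplogs in this issue.
// (?s) lets '.' match newlines, so ".*" can span the lines between the
// fatal error and the FAIL line; (?m) only affects ^ and $, unused here.
var failurePattern = regexp.MustCompile(`(?ms)morestack on g0.*FAIL\s+golang\.org/x/perf/storage/app`)

// matchesFailure reports whether a builder log shows this failure mode.
// Hypothetical helper for illustration.
func matchesFailure(log string) bool {
	return failurePattern.MatchString(log)
}

func main() {
	// Abbreviated from the log excerpt above; the elided lines are not reproduced.
	sampleLog := "fatal: morestack on g0\n" +
		"runtime: signal received on thread not created by Go.\n" +
		"FAIL\tgolang.org/x/perf/storage/app\t0.274s\n"
	fmt.Println(matchesFailure(sampleLog))                           // true
	fmt.Println(matchesFailure("ok  golang.org/x/perf/storage/app")) // false
}
```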

CC @prattmic @cherrymui @ianlancetaylor

@bcmills bcmills added this to the Go1.17 milestone Aug 5, 2021
@mknyszek mknyszek removed this from the Go1.17 milestone Aug 18, 2021
@mknyszek mknyszek added this to the Backlog milestone Aug 18, 2021
@bcmills
Member Author

@bcmills bcmills commented Sep 22, 2021

This is a release-blocker via #11811.

windows/arm64 is not a first-class port, so in theory this can be addressed by either fixing the underlying bug or adding skips to the relevant test(s).

(However, it looks like a pretty severe runtime bug to me.)
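The "adding skips" option could look roughly like the following sketch. The helper name `knownBrokenPort` is hypothetical; the issue doesn't prescribe a specific mechanism.

```go
package main

import (
	"fmt"
	"runtime"
)

// knownBrokenPort reports whether the given platform is affected by this
// issue. Hypothetical helper: the name and shape are illustrative only.
func knownBrokenPort(goos, goarch string) bool {
	return goos == "windows" && goarch == "arm64"
}

// In a _test.go file the helper would be used like this:
//
//	func TestSomething(t *testing.T) {
//		if knownBrokenPort(runtime.GOOS, runtime.GOARCH) {
//			t.Skip("skipping on windows/arm64; see golang.org/issue/47557")
//		}
//		// ...
//	}

func main() {
	fmt.Println(knownBrokenPort(runtime.GOOS, runtime.GOARCH))
	fmt.Println(knownBrokenPort("windows", "arm64")) // true
	fmt.Println(knownBrokenPort("linux", "arm64"))   // false
}
```

One caveat from later in this thread: the binary turned out to crash during runtime initialization, before any test function runs, so a `t.Skip` inside a test would never be reached. Since the affected tests are already build-tagged with cgo, excluding those files with something like `//go:build cgo && !(windows && arm64)` would be the coarser alternative.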

@bcmills bcmills removed this from the Backlog milestone Sep 22, 2021
@bcmills bcmills added this to the Go1.18 milestone Sep 22, 2021
@heschi
Contributor

@heschi heschi commented Sep 29, 2021

@toothrot
Contributor

@toothrot toothrot commented Oct 13, 2021

This is delightfully reproducible and fails very reliably.

@toothrot
Contributor

@toothrot toothrot commented Oct 13, 2021

@mknyszek
Contributor

@mknyszek mknyszek commented Oct 13, 2021

I'll take a look.

@mknyszek
Contributor

@mknyszek mknyszek commented Oct 14, 2021

After a few hours of holding gomote wrong, I've reproduced. Now I'm trying to get it into a debuggable state, but go test -c is having issues...

The following runs fine:

gomote run -dir=perf user-mknyszek-windows-arm64-10-0 go/bin/go test ./storage/app

And the following fails:

$ gomote run -dir=perf user-mknyszek-windows-arm64-10-0 go/bin/go test -c ./storage/app
# golang.org/x/perf/storage/app.test
sym 3263: invalid relocation: R_DWARFSECREF .debug_info+0x136463
sym 3263: unsupported obj reloc R_DWARFSECREF/4 to go.info.net/http.(*ServeMux).HandleFunc$abstract
sym 3263: invalid relocation: R_DWARFSECREF .debug_info+0x136490
sym 3263: unsupported obj reloc R_DWARFSECREF/4 to go.info.net/http.(*ServeMux).HandleFunc$abstract
sym 3263: invalid relocation: R_DWARFSECREF .debug_info+0x13649e
sym 3263: unsupported obj reloc R_DWARFSECREF/4 to go.info.net/http.(*ServeMux).HandleFunc$abstract
sym 3263: invalid relocation: R_DWARFSECREF .debug_info+0x136463
sym 3263: unsupported obj reloc R_DWARFSECREF/4 to go.info.net/http.(*ServeMux).HandleFunc$abstract
sym 3263: invalid relocation: R_DWARFSECREF .debug_info+0x136490
sym 3263: unsupported obj reloc R_DWARFSECREF/4 to go.info.net/http.(*ServeMux).HandleFunc$abstract
sym 3263: invalid relocation: R_DWARFSECREF .debug_info+0x13649e
sym 3263: unsupported obj reloc R_DWARFSECREF/4 to go.info.net/http.(*ServeMux).HandleFunc$abstract
sym 3263: invalid relocation: R_DWARFSECREF .debug_info+0x136463
sym 3263: unsupported obj reloc R_DWARFSECREF/4 to go.info.net/http.(*ServeMux).HandleFunc$abstract
sym 3263: invalid relocation: R_DWARFSECREF .debug_info+0x136490
sym 3263: unsupported obj reloc R_DWARFSECREF/4 to go.info.net/http.(*ServeMux).HandleFunc$abstract
sym 3263: invalid relocation: R_DWARFSECREF .debug_info+0x13649e
sym 3263: unsupported obj reloc R_DWARFSECREF/4 to go.info.net/http.(*ServeMux).HandleFunc$abstract
sym 3263: invalid relocation: R_DWARFSECREF .debug_info+0x136463
sym 3263: unsupported obj reloc R_DWARFSECREF/4 to go.info.net/http.(*ServeMux).HandleFunc$abstract
sym 3263: invalid relocation: R_DWARFSECREF .debug_info+0x136490
C:\workdir\go\pkg\tool\windows_arm64\link.exe: too many errors
Error running run: exit status 2

CC @thanm maybe?

@bcmills
Member Author

@bcmills bcmills commented Oct 14, 2021

Oh, interesting! By coincidence, that difference between go test and go test -c came up recently in a code review
(https://go-review.googlesource.com/c/go/+/348991/8..15/src/cmd/go/testdata/script/build_issue48319.txt#b15)

@bcmills
Member Author

@bcmills bcmills commented Oct 14, 2021

I would say that you could add the -s and -w LDFLAGS manually, but I'm guessing you actually need that debug info “to get it into a debuggable state”. 😅

@thanm
Contributor

@thanm thanm commented Oct 14, 2021

Interesting problem. It looks like one of these checks is failing:

https://go.googlesource.com/go/+/24e798e2876f05d628f1e9a32ce8c7f4a3ed3268/src/cmd/link/internal/arm64/asm.go#610
https://go.googlesource.com/go/+/24e798e2876f05d628f1e9a32ce8c7f4a3ed3268/src/cmd/link/internal/arm64/asm.go#618

meaning that the relocation won't reach, but we can't find the linker-introduced label symbol. Why it is happening only with DWARF relocations is a mystery, though.

I think this would be better off as another bug. Do you want to file it or should I?
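thanm's explanation can be modeled roughly as follows. This is a sketch under stated assumptions, not the logic of the linked `asm.go`. Assume an external PE relocation can only encode an addend below some fixed `limit`, and that the linker compensates by placing label symbols at multiples of `limit` so a far target can be expressed as "nearest label + small addend". If the addend is too large and no label exists at the required offset, the relocation can't be emitted. The function name, the label map, and the 1 MiB limit are all hypothetical.

```go
package main

import "fmt"

// resolveReloc models rewriting a far relocation against a
// linker-introduced label. labels maps a label's offset (relative to the
// start of the target section) to its name. limit is assumed to be a
// power of two. Hypothetical model; not the actual cmd/link code.
func resolveReloc(addend, limit int64, labels map[int64]string) (string, int64, error) {
	if addend < 0 {
		return "", 0, fmt.Errorf("negative addend %#x not modeled", addend)
	}
	if addend < limit {
		// The addend fits directly; no label is needed.
		return "", addend, nil
	}
	// Round down to the nearest label boundary.
	base := addend &^ (limit - 1)
	name, ok := labels[base]
	if !ok {
		// The relocation doesn't reach and there is no label to rebase it on.
		return "", 0, fmt.Errorf("invalid relocation: no label at %#x for addend %#x", base, addend)
	}
	return name, addend - base, nil
}

func main() {
	const limit = 1 << 20 // assumed limit, for illustration only
	textLabels := map[int64]string{0x100000: "text+0x100000"}

	fmt.Println(resolveReloc(0x1234, limit, textLabels))
	fmt.Println(resolveReloc(0x136463, limit, textLabels))
	// A section that never received labels fails for large addends.
	fmt.Println(resolveReloc(0x136463, limit, nil))
}
```

Under this assumed 1 MiB limit, every `.debug_info` addend in the log above (0x136463 and up) would need a label, which would be consistent with only the DWARF relocations failing if labels are only generated for some sections. That part is speculation, not something established in this thread.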

@thanm
Contributor

@thanm thanm commented Oct 14, 2021

I am kind of curious about how you are going to debug the test once it's built properly with DWARF. Delve doesn't support windows+arm64, so I assume gdb... does the builder actually have a gdb that works?

@mknyszek
Contributor

@mknyszek mknyszek commented Oct 14, 2021

@thanm 🤦 yeah, you're right. I don't think it has gdb. It might have a Windows debugger. I guess it's just down to print debugging, anyway.

I'll file the bug.

@mknyszek
Contributor

@mknyszek mknyszek commented Oct 14, 2021

OK, actually, I know why we're getting a "morestack on g0" failure and why we see all these "signal received on thread not created by Go" messages.

Something causes a signal to land on a thread not created by Go the first time (or so the runtime thinks). This calls into badsignal2, which in turn calls runtime.abort. Unfortunately, runtime.abort just raises a signal. If we're not on a Go thread, we just fall back into badsignal2, over and over, until we exhaust the stack; hence the "morestack on g0" error that finally ends the process.

I'm not yet sure what causes the original signal, though.

@mknyszek
Contributor

@mknyszek mknyszek commented Oct 14, 2021

Coincidentally, I have a CL that fixes the recursive runtime.abort issue: https://go-review.googlesource.com/c/go/+/321789/. I should really land it, I totally forgot about it. After that, we should be able to get more info.

@gopherbot

@gopherbot gopherbot commented Oct 14, 2021

Change https://golang.org/cl/321789 mentions this issue: runtime: exit harder in badsignal2

@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Oct 14, 2021

That's very strange. I've confirmed that binaries built with https://golang.org/cl/321789 on windows/arm64 do actually have the right code in badsignal2, but I'm still getting the same failure mode. As far as I can tell, there's no other way such a message gets printed...

@mknyszek
Contributor

@mknyszek mknyszek commented Oct 14, 2021

Furthermore, the failure appears before any tests actually get executed. It turns out that having go test -c work here would be very helpful, since I'm not sure at what point it's failing now (in the compiler? early in the runtime for the test?)

@thanm
Contributor

@thanm thanm commented Oct 14, 2021

You might try working around the DWARF problem with

go test -ldflags=-w -c
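Because `-w` drops the DWARF tables, one way to check which kind of binary `go test -c` produced is to look for DWARF sections in the resulting `.exe`. A small sketch using the standard `debug/pe` package; the helper names and the assumption that the linker names these sections `.debug_*` (or `.zdebug_*` when compressed) are illustrative, not taken from this thread:

```go
package main

import (
	"debug/pe"
	"fmt"
	"os"
	"strings"
)

// hasDWARFSection reports whether any section name looks like DWARF data.
// The ".debug_" / ".zdebug_" prefixes are an assumption about how the
// linker names these sections in PE files.
func hasDWARFSection(names []string) bool {
	for _, n := range names {
		if strings.HasPrefix(n, ".debug_") || strings.HasPrefix(n, ".zdebug_") {
			return true
		}
	}
	return false
}

// sectionNames lists the section names of a PE (Windows) binary.
// debug/pe resolves long names through the COFF string table.
func sectionNames(path string) ([]string, error) {
	f, err := pe.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	var names []string
	for _, s := range f.Sections {
		names = append(names, s.Name)
	}
	return names, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: dwarfcheck app.test.exe")
		return
	}
	names, err := sectionNames(os.Args[1])
	if err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	fmt.Println("has DWARF:", hasDWARFSection(names))
}
```

Running it against test binaries built with and without `-ldflags=-w` should show the difference, assuming the section-name convention above holds.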

@mknyszek
Contributor

@mknyszek mknyszek commented Oct 14, 2021

Thanks @thanm! That worked. OK, it's definitely not the compiler crashing, it's the binary. But before any tests execute, I'm afraid.

@mknyszek
Contributor

@mknyszek mknyszek commented Oct 14, 2021

Looks like it's failing very early in runtime initialization. This explains the failure; there's no g set up yet or anything! Whatever the failure really is, it could be masked by this bad signal stuff, I think.

I've narrowed down the failure to this loop on the first module data encountered in moduledataverify.

@mknyszek
Contributor

@mknyszek mknyszek commented Oct 14, 2021

I've further confirmed that on the second iteration of that loop, this check passes, so there's already something wrong. However, the runtime then crashes on the following line, specifically at the datap.pclntable indexing.

This suggests to me that something is broken about the binary. It's worth noting that this is a cgo binary; all the tests in this package that produce the failing binary are build-tagged with cgo. I have a copy of the bad binary and also steps to reproduce; this isn't my area of expertise so any help would be appreciated.
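For readers unfamiliar with `moduledataverify`: assuming the loop above is the runtime's check that the module's function table is sorted by entry PC (which the `pclntable` indexing on the following line suggests), the failure sequence can be modeled in ordinary Go. The `ftabEntry` type and `verifyFtab` function are hypothetical simplifications, not the runtime's definitions. The model shows a corrupt table tripping the sortedness check, after which reading function metadata at a bad `pclntable` offset is the dangerous step. Here the offset is bounds-checked and reported instead of crashing.

```go
package main

import "fmt"

// ftabEntry is a simplified stand-in for one row of a function table:
// the function's entry PC offset and the offset of its metadata in
// pclntable. Hypothetical type, not the runtime's definition.
type ftabEntry struct {
	entry   uint32
	funcoff uint32
}

// verifyFtab checks that entries are sorted by entry PC. When a pair is
// out of order, it first checks that the metadata offsets needed to
// describe that pair lie inside pclntable.
func verifyFtab(ftab []ftabEntry, pclntable []byte) error {
	for i := 0; i+1 < len(ftab); i++ {
		if ftab[i].entry <= ftab[i+1].entry {
			continue // sorted so far
		}
		// Out of order: describing the offending functions requires
		// reading their metadata, so validate both offsets first.
		for j := i; j <= i+1; j++ {
			if int(ftab[j].funcoff) >= len(pclntable) {
				return fmt.Errorf("entry %d: funcoff %#x outside pclntable (len %#x)", j, ftab[j].funcoff, len(pclntable))
			}
		}
		return fmt.Errorf("function table not sorted by PC at entry %d: %#x > %#x", i, ftab[i].entry, ftab[i+1].entry)
	}
	return nil
}

func main() {
	pcln := make([]byte, 64)
	good := []ftabEntry{{0x00, 0}, {0x40, 8}, {0x80, 16}}
	bad := []ftabEntry{{0x00, 0}, {0x90, 0xdead}, {0x80, 16}}

	fmt.Println(verifyFtab(good, pcln)) // <nil>
	fmt.Println(verifyFtab(bad, pcln))  // entry 1: funcoff 0xdead outside pclntable (len 0x40)
}
```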

@TopperDEL

@TopperDEL TopperDEL commented Oct 19, 2021

I have a similar issue, though I'm not sure whether it's exactly the same problem. I compile a Go library that uses cgo into a DLL for Windows ARM64. It crashes during load/initialization with an "AccessViolation during read", followed by many "AccessViolation during write" errors, until the program quits with a "StackOverflow". The root error seems to be around "morestack", too, and the runtime seems to try to raise a badsignal. So it does seem to fit.

I can provide two DLLs: one that works and one that doesn't. If you add them to a UWP app and P/Invoke "uplink_internal_UniverseIsEmpty" on, e.g., a HoloLens 2, it crashes with the error chain described above.

storj_uplink_dlls.zip

This is the assembly code at the root cause:
[screenshot of the assembly code]

7 participants