runtime: SIGBUS / SIGSEGV during asmcgocall #46170
Marking as
This instruction is Either way, perhaps this is a regabi-related regression? cc @mknyszek
Ooh, neat! That yields two more matching logs, which both bear a strong similarity to #46080.
Yeah, I'd say these are all related. It's curious that the openbsd fault addresses are all page-aligned, while the solaris ones aren't.
I'm looking at the Solaris ones first, because those OpenBSD failures are on 386 and so are less likely to be affected by anything regabi related. I'm not totally convinced it's the same issue, as a result, though maybe it's correlated with some change in compiler behavior. As I'm looking at the code for this, I don't fully understand what value is supposed to be passed to
This is weird. The faulting address does appear to be a stack address, but how did it get there? We're clearly in the 'nosave' path of In the 1.17 release, My only thought is that the stack address somehow propagates from the caller, but Perhaps the trick here is that it shouldn't be going on the Oh, actually, yes! There IS a system stack switch that happens, because
@bcmills Out of curiosity, how far back do these failures go? What you posted above, is that all of them? It's possible this is some fun combination of https://golang.org/cl/288799 and a regabi-related CL, too, given the current timeline.
This is actually expected. This is how the Solaris port works. asmsysvicall6x is a C function. It is declared as a variable (!) and uses linkname to connect to the C function. Arguably it doesn't look nice.
@cherrymui Thanks, good to know. At least that part makes sense now.
The "nosave" path may not be wrong, either. We may already be on the system stack, so asmcgocall doesn't switch stacks. But the traceback stops at asmcgocall, because it doesn't know how to unwind through it (maybe the traceback code could be taught to handle that case, if it can tell we are on the "nosave" path).
Could the SIGBUS be an alignment fault? For example, if the C function called by asmcgocall uses an instruction that requires 16-byte alignment to load something from the stack, but the address is not aligned?
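For illustration, the alignment condition being discussed can be checked with a simple mask. This is only a minimal Go sketch: the helper name isAligned and the sample addresses are hypothetical (they are not taken from the failure logs), and it assumes power-of-two alignments, with 16 bytes standing in for an aligned vector load and 4096 bytes for a typical page size.

```go
package main

import "fmt"

// isAligned reports whether addr is a multiple of align.
// Assumption: align is a power of two (for example 16 or 4096).
func isAligned(addr, align uint64) bool {
	return addr&(align-1) == 0
}

func main() {
	// Hypothetical addresses chosen for illustration only.
	addrs := []uint64{0xc000046000, 0xc000046008, 0xc000046010}
	for _, a := range addrs {
		fmt.Printf("%#x: 16-byte aligned=%v, page aligned=%v\n",
			a, isAligned(a, 16), isAligned(a, 4096))
	}
}
```

Applying the same check to the fault addresses in a crash log would show whether a given address could have tripped an aligned load, or whether it sits on a page boundary.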
Perhaps I'm missing something. Do you have any idea where the stack switch happens? I thought that it must be Though, the goroutine that I'm interested in (number 378) is runnable, not in
Ah! Yeah, that's a good point. I think that might actually be it.
That's all of the recent ones I could find using
Oh, hey, that sounds familiar: see previously #17641.
If asmcgocall is called by goroutine 387 in syscall_sysvicall6, then it does look weird, both the stack switch and the G status. Maybe it is called from somewhere else?
If it is alignment, I would expect it to fail more consistently, instead of very rarely. Maybe it is something else. Maybe an OS bug...
Both OpenBSD failures have a stack like this
It might be related to #34988. |
That's a good point, and a significant difference compared to the Solaris failures.
One while running cmd/go: 2021-05-14T14:37:54-d137b74/solaris-amd64-oraclerel
Another while running cmd/vet: 2021-04-14T19:38:22-b161b57/solaris-amd64-oraclerel
To me that smells like a runtime or compiler bug, and since these are the only two in the logs it looks like a Go 1.17 regression.
CC @prattmic @cherrymui @randall77