runtime: frequent TestSegv failures since 2021-10-26 #49182
I think the windows-arm64-10 builder has been failing since https://go-review.googlesource.com/c/go/+/339990 / CL https://go-review.googlesource.com/c/go/+/339989.
Filed the |
Change https://golang.org/cl/359254 mentions this issue: |
CL 339990 made this test more strict, exposing pre-existing issues on these OSes. Skip for now until they can be resolved.

Updates #49182

Change-Id: I3ac400dcd21b801bf4ab4eeb630e23b5c66ba563
Reviewed-on: https://go-review.googlesource.com/c/go/+/359254
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Hmm, looks like relaxing the check didn't completely solve the problem:
2021-11-02T20:47:30-b246873/linux-arm-aws
Yeah, the "unknown PC" is in the kernel-provided CAS implementation. The problem may just be that we're entirely missing the vdsoSP protection around the VDSO cas call (presumably the same thing applies to |
Yes, I believe that is the case. I'll get those covered as well.
Change https://golang.org/cl/362796 mentions this issue: |
Change https://golang.org/cl/362795 mentions this issue: |
Change https://golang.org/cl/362977 mentions this issue: |
Mostly fixed. Not a Beta1 blocker, as this is not a new breakage, just a stricter test.
Looks like this is fixed on ARM and skipped on amd64, but still occasionally failing on some of the more exotic architectures.
2021-11-17T04:55:12-1d004fa/linux-mips64le-mengzhuo
Looks like both of those failures are in the |
https://storage.googleapis.com/go-build-log/3c4f5e79/linux-riscv64-jsing_db2bc678.log (A |
The netbsd and openbsd failures there appear to be #49209.
In all these cases it looks like the signal is landing in some part of the cgocall path (not all that surprising), though I'm not sure I understand why gentraceback has trouble producing a traceback in these cases.
oh oh, ok, actually for the linux/arm64 failure, the PC looks like the PC for the branch that @prattmic added in https://go.dev/cl/362977 (i.e. the top bits of the PC (in a 48-bit address space) are The other platforms' failures don't seem to look like this, however. |
We have a very complex process to make VDSO calls on ARM. Create a wrapper helper function which reduces duplication and allows for additional calls from other packages.

vdsoCall has a few differences from the original code in walltime/nanotime:

* It does not use R0-R3, as they are passed through as arguments to fn.
* It does not save g if g.m.gsignal.stack.lo is zero. This may occur if it is called at startup on g0 between assigning g0.m.gsignal and setting its stack.

For #49182

Change-Id: I51aca514b4835b71142011341d2f09125334d30f
Reviewed-on: https://go-review.googlesource.com/c/go/+/362795
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
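For readers following the vdsoSP discussion, here is a rough Go model of the save/publish/restore pattern that a wrapper like vdsoCall provides. This is only a sketch: the real code is ARM assembly inside the runtime, and the `m` struct, the function signatures, and `tracebackStart` below are hypothetical stand-ins, not the runtime's actual definitions. The idea is that a signal handler interrupting a VDSO call can start its traceback from the saved Go caller PC/SP instead of from a PC it cannot symbolize.

```go
package main

import "fmt"

// m is a hypothetical stand-in for the runtime's per-thread structure.
// Only the two fields relevant to VDSO calls are modelled here.
type m struct {
	vdsoPC uintptr // Go caller's PC; only meaningful while vdsoSP != 0
	vdsoSP uintptr // Go caller's SP; zero means "not inside a VDSO call"
}

// vdsoCall models the wrapper: remember the previous values, publish the
// caller's PC and SP, run fn, then restore the previous values so that a
// nested call (for example from a signal handler) does not clobber them.
func vdsoCall(mp *m, callerPC, callerSP uintptr, fn func()) {
	oldPC, oldSP := mp.vdsoPC, mp.vdsoSP
	mp.vdsoPC = callerPC
	mp.vdsoSP = callerSP
	defer func() {
		mp.vdsoSP = oldSP
		mp.vdsoPC = oldPC
	}()
	fn()
}

// tracebackStart models the choice a signal handler would make (an
// assumption of this sketch): if a VDSO call is in progress, unwind from
// the saved caller PC/SP rather than from the interrupted PC/SP.
func tracebackStart(mp *m, sigPC, sigSP uintptr) (pc, sp uintptr) {
	if mp.vdsoSP != 0 {
		return mp.vdsoPC, mp.vdsoSP
	}
	return sigPC, sigSP
}

func main() {
	mp := &m{}
	// All addresses below are arbitrary illustrative values.
	vdsoCall(mp, 0x1000, 0x2000, func() {
		// Pretend a signal arrives while we are inside the VDSO.
		pc, sp := tracebackStart(mp, 0x9000, 0x3000)
		fmt.Printf("inside call:  pc=%#x sp=%#x\n", pc, sp)
	})
	pc, sp := tracebackStart(mp, 0x4000, 0x5000)
	fmt.Printf("outside call: pc=%#x sp=%#x\n", pc, sp)
}
```

Without the wrapper (the situation described above for the kernel-provided CAS), the handler in this model would fall through to the interrupted PC, which is where the "unknown PC" traceback failure comes from.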
Change https://go.dev/cl/431975 mentions this issue: |
@golang/runtime, can this be re-triaged? The test still has a |
Change https://go.dev/cl/491095 mentions this issue: |
For #59912.
For #59913.
Updates #49182.

Change-Id: I3fcdfaca3a4f7120404e7a36b4fb5f0e57dd8114
Reviewed-on: https://go-review.googlesource.com/c/go/+/491095
TryBot-Bypass: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Hello from triage. :) I think we didn't get to this because our incoming queue was quite busy. @prattmic will take a look at breaking it up. @cherrymui is working on a rewrite of the test that will split out the failures a bit better.
There are various TestSegv issues that watchflakes is tracking, e.g. #59443. I think we can close this as a dup. If this happens again, watchflakes will post on or reopen an existing issue, or open a new one.
@cherrymui, this issue is more about the followup work needed to diagnose the existing skips, particularly on |
That is: relying on |
I think we know we are not always able to produce a traceback from asynchronous interrupts in libc calls. I'm not sure if there is anything we can do. Or do you mean we should push hard to make that work? The linux/386 VDSO case may be fixable (or already fixed). #50504 should track that.
I think that if we consider that to be normal operation, we should update the test to examine the output and confirm that it is consistent with what we would expect for that case (instead of calling |
Sounds good. Thanks.
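The suggestion above (examine the crash output and check that it matches the expected failure mode, rather than skipping) could look roughly like the following. This is a hypothetical sketch, not the actual TestSegv code: the `verdict` type, the `classify` function, the marker strings it searches for, and the sample outputs in `main` are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// verdict is a hypothetical classification of a crashing program's output.
type verdict int

const (
	cleanCrash       verdict = iota // SIGSEGV reported, no runtime complaints
	tracebackFailure                // SIGSEGV reported, but the traceback hit an unknown PC
	unexpected                      // anything else; the test should fail
)

// classify inspects the output instead of skipping the test outright.
// The marker strings are assumptions of this sketch.
func classify(out string) verdict {
	switch {
	case !strings.Contains(out, "SIGSEGV"):
		return unexpected
	case strings.Contains(out, "unknown pc"):
		return tracebackFailure
	case strings.Contains(out, "runtime: "):
		return unexpected
	default:
		return cleanCrash
	}
}

func main() {
	// Made-up sample outputs, not real logs.
	samples := []string{
		"[signal SIGSEGV: segmentation violation]\ngoroutine 1 [running]:",
		"[signal SIGSEGV: segmentation violation]\nruntime: unknown pc 0x1234",
		"exit status 2",
	}
	for _, s := range samples {
		fmt.Println(classify(s))
	}
}
```

A test built this way could accept `tracebackFailure` only on the platforms where that outcome is considered normal, and fail on `unexpected` everywhere.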
@prattmic, possibly related to CL 339989 / #47522?
greplogs --dashboard -md -l -e 'FAIL: TestSegv' --since=2021-10-01
2021-10-27T13:12:49-cfb5321/illumos-amd64
2021-10-27T08:50:27-bdefb77/illumos-amd64
2021-10-27T08:50:27-bdefb77/solaris-amd64-oraclerel
2021-10-27T05:33:58-ca5f65d/illumos-amd64
2021-10-26T23:12:17-13eccaa/illumos-amd64
2021-10-26T23:12:17-13eccaa/solaris-amd64-oraclerel
2021-10-26T23:06:10-e5c5125/illumos-amd64
2021-10-26T23:06:10-e5c5125/solaris-amd64-oraclerel
2021-10-26T22:05:53-80be4a4/illumos-amd64
2021-10-26T22:05:53-80be4a4/solaris-amd64-oraclerel
2021-10-26T21:32:57-86f6bf1/darwin-amd64-race
2021-10-26T21:32:57-86f6bf1/illumos-amd64
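To see at a glance which builders dominate a list like the one above, the log names can be split into timestamp, short commit, and builder. The sketch below assumes every entry has the form `<timestamp>-<commit>/<builder>`, as the lines above do; the `failure` type and both function names are hypothetical and are not part of greplogs.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// failure is one dashboard log name, e.g.
// "2021-10-27T13:12:49-cfb5321/illumos-amd64".
type failure struct {
	When    time.Time
	Commit  string
	Builder string
}

// parseFailure splits a log name of the form <timestamp>-<commit>/<builder>.
func parseFailure(line string) (failure, error) {
	rev, builder, ok := strings.Cut(strings.TrimSpace(line), "/")
	if !ok {
		return failure{}, fmt.Errorf("no builder in %q", line)
	}
	// The timestamp itself contains dashes, so split on the last one.
	i := strings.LastIndex(rev, "-")
	if i < 0 {
		return failure{}, fmt.Errorf("no commit in %q", line)
	}
	when, err := time.Parse("2006-01-02T15:04:05", rev[:i])
	if err != nil {
		return failure{}, err
	}
	return failure{When: when, Commit: rev[i+1:], Builder: builder}, nil
}

// countByBuilder tallies failures per builder name.
func countByBuilder(lines []string) (map[string]int, error) {
	counts := make(map[string]int)
	for _, l := range lines {
		f, err := parseFailure(l)
		if err != nil {
			return nil, err
		}
		counts[f.Builder]++
	}
	return counts, nil
}

func main() {
	// A subset of the entries listed above.
	lines := []string{
		"2021-10-27T13:12:49-cfb5321/illumos-amd64",
		"2021-10-27T08:50:27-bdefb77/solaris-amd64-oraclerel",
		"2021-10-26T21:32:57-86f6bf1/darwin-amd64-race",
		"2021-10-26T21:32:57-86f6bf1/illumos-amd64",
	}
	counts, err := countByBuilder(lines)
	if err != nil {
		panic(err)
	}
	builders := make([]string, 0, len(counts))
	for b := range counts {
		builders = append(builders, b)
	}
	sort.Strings(builders)
	for _, b := range builders {
		fmt.Printf("%s: %d\n", b, counts[b])
	}
}
```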