runtime: frequent TestSegv failures since 2021-10-26 #49182

Open
bcmills opened this issue Oct 27, 2021 · 14 comments

Comments

cuonglm (Member) commented Oct 27, 2021

I think this started with https://go-review.googlesource.com/c/go/+/339990.

CL https://go-review.googlesource.com/c/go/+/339989 causes the windows-arm64-10 builder to fail.

bcmills (Member Author) commented Oct 27, 2021

Filed the windows-arm64-10 issue separately as #49188.

gopherbot commented Oct 27, 2021

Change https://golang.org/cl/359254 mentions this issue: runtime: disable TestSegv on darwin, illumos, solaris

gopherbot pushed a commit that referenced this issue Oct 28, 2021

runtime: disable TestSegv on darwin, illumos, solaris

CL 339990 made this test more strict, exposing pre-existing issues on
these OSes. Skip for now until they can be resolved.

Updates #49182

Change-Id: I3ac400dcd21b801bf4ab4eeb630e23b5c66ba563
Reviewed-on: https://go-review.googlesource.com/c/go/+/359254
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
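
The commit above skips the test on the affected OSes. A minimal sketch of that kind of skip follows. The helper name `shouldSkipSegv` and the skip message are hypothetical, and the actual change in CL 359254 may be structured differently.

```go
package main

import (
	"fmt"
	"runtime"
)

// shouldSkipSegv is a hypothetical helper (not taken from the Go tree).
// It reports whether the stricter check should be skipped on the given
// GOOS, using the OS list named in the commit message.
func shouldSkipSegv(goos string) (bool, string) {
	switch goos {
	case "darwin", "illumos", "solaris":
		return true, "golang.org/issue/49182: stricter TestSegv check fails on " + goos
	}
	return false, ""
}

func main() {
	if skip, reason := shouldSkipSegv(runtime.GOOS); skip {
		fmt.Println("skip:", reason)
		return
	}
	fmt.Println("run TestSegv on", runtime.GOOS)
}
```

In a real test the same decision would normally be expressed with `t.Skipf` from the `testing` package rather than a printed message.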

bcmills (Member Author) commented Oct 28, 2021

Hmm, looks like relaxing the check didn't completely solve the problem:

greplogs --dashboard -md -l -e 'FAIL: TestSegv' --since=2021-10-28T16:54:00

2021-10-28T16:54:58-6bd0e7f/linux-arm-aws

bcmills (Member Author) commented Nov 2, 2021

linux-arm-aws is the only one that appears to still be flaky:

greplogs --dashboard -md -l -e '(?ms)TestSegv.*unexpectedly saw "runtime: "'

2021-11-02T20:47:30-b246873/linux-arm-aws
2021-11-02T17:01:14-62b29b0/linux-arm-aws
2021-10-28T16:54:58-6bd0e7f/linux-arm-aws
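
The query above depends on the `(?ms)` flags: with `s` set, `.` also matches newlines, so a single match can span from the test name to a later line of the log. A small sketch using Go's `regexp` package shows the difference. It assumes greplogs accepts the same RE2-style syntax, and the log excerpt is invented for illustration rather than copied from a builder.

```go
package main

import (
	"fmt"
	"regexp"
)

// withFlags is the pattern from the query: m = multi-line anchors,
// s = let "." match "\n".
var withFlags = regexp.MustCompile(`(?ms)TestSegv.*unexpectedly saw "runtime: "`)

// withoutFlags is the same pattern with the flags removed, so ".*"
// cannot cross a line boundary.
var withoutFlags = regexp.MustCompile(`TestSegv.*unexpectedly saw "runtime: "`)

func main() {
	// Invented log excerpt; the real builder output may look different.
	log := "--- FAIL: TestSegv (0.01s)\n" +
		"    some intermediate output\n" +
		"    unexpectedly saw \"runtime: \" in output\n"

	fmt.Println("with (?ms):   ", withFlags.MatchString(log))
	fmt.Println("without flags:", withoutFlags.MatchString(log))
}
```

Only the flagged pattern matches this excerpt, because the test name and the message sit on different lines.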

aclements (Member) commented Nov 3, 2021

This looks like the signal is landing while we're in the VDSO, presumably executing a kernel-provided atomic for casgstatus.

@prattmic, you looked at and fixed a few similar issues recently (e.g., 86f6bf1). Any insights on this one?

aclements (Member) commented Nov 3, 2021

Yeah, the "unknown PC" is in the kernel-provided CAS implementation. The problem may just be that we're entirely missing the vdsoSP protection around the VDSO cas call (presumably the same thing applies to memory_barrier<>, too).
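
To make the idea concrete, here is a toy model of the protection being discussed. It is not runtime code: the `thread` type, its methods, and the addresses are all invented for illustration, and it only assumes (as the comment suggests) that the caller's PC and SP are recorded before entering kernel-provided code so that a signal handler can start its traceback from the Go caller instead of an unknown PC.

```go
package main

import "fmt"

// thread is a toy stand-in for per-thread state. The field names echo
// vdsoPC/vdsoSP from the discussion, but this type is hypothetical.
type thread struct {
	vdsoPC, vdsoSP uintptr
}

// callHelper models wrapping a call into kernel-provided code: record
// the Go caller's PC/SP first, run the helper, then clear the marker.
func (t *thread) callHelper(callerPC, callerSP uintptr, helper func()) {
	t.vdsoPC, t.vdsoSP = callerPC, callerSP
	helper()
	t.vdsoSP = 0
}

// tracebackStart models a signal handler choosing where to begin
// unwinding: the saved caller frame if one is recorded, otherwise the
// interrupted PC/SP.
func (t *thread) tracebackStart(sigPC, sigSP uintptr) (pc, sp uintptr) {
	if t.vdsoSP != 0 {
		return t.vdsoPC, t.vdsoSP
	}
	return sigPC, sigSP
}

func main() {
	t := &thread{}
	// Made-up addresses: 0xf000 stands for a PC inside the kernel helper.
	t.callHelper(0x1000, 0x2000, func() {
		pc, sp := t.tracebackStart(0xf000, 0x1ff0)
		fmt.Printf("signal inside helper: start at pc=%#x sp=%#x\n", pc, sp)
	})
	pc, sp := t.tracebackStart(0x1234, 0x2000)
	fmt.Printf("signal in Go code:    start at pc=%#x sp=%#x\n", pc, sp)
}
```

Without the recorded frame, the first case would have to unwind from the helper's own PC, which is the "unknown PC" situation described above.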

prattmic (Member) commented Nov 3, 2021

Yes, I believe that is the case. I'll get those covered as well.

prattmic self-assigned this Nov 8, 2021

gopherbot commented Nov 9, 2021

Change https://golang.org/cl/362796 mentions this issue: runtime/internal/atomic: treat ARM kernel helpers as VDSO

gopherbot commented Nov 9, 2021

Change https://golang.org/cl/362795 mentions this issue: runtime: refactor ARM VDSO call setup to helper

gopherbot commented Nov 10, 2021

Change https://golang.org/cl/362977 mentions this issue: runtime: start ARM atomic kernel helper traceback in caller

jeremyfaller (Contributor) commented Nov 12, 2021

Mostly fixed. Not a Beta1 blocker as it's not a new breakage, just a stricter test.

gopherbot closed this in 3634594 Nov 12, 2021

bcmills (Member Author) commented Nov 17, 2021

Looks like this is fixed on ARM and skipped on amd64, but still occasionally failing on some of the more exotic architectures.
(It's not clear to me whether the remaining failures are arch-specific bugs.)

greplogs --dashboard -md -l -e 'FAIL: TestSegv ' --since=2021-11-13

2021-11-17T04:55:12-1d004fa/linux-mips64le-mengzhuo
2021-11-13T00:23:16-39bc666/linux-riscv64-unmatched

bcmills reopened this Nov 17, 2021

bcmills (Member Author) commented Nov 17, 2021

Looks like both of those failures are in the SegvInCgo subtest.

6 participants