runtime: "fatal: systemstack called from unexpected goroutine" on Android #51001

Open
bcmills opened this issue Feb 3, 2022 · 17 comments
Labels: compiler/runtime, NeedsInvestigation, OS-Android

@bcmills
Member

bcmills commented Feb 3, 2022

greplogs --dashboard -md -l -e '^fatal: systemstack called from unexpected goroutine' --since=2021-01-01

2022-02-02T21:12:39-53d6a72/android-amd64-emu

##### GOMAXPROCS=2 runtime -cpu=1,2,4 -quick
fatal: systemstack called from unexpected goroutineTrap 
exitcode=133
FAIL	runtime	59.879s

2021-10-08T16:26:20-59d4e92-99c1b24/android-amd64-emu

fatal: systemstack called from unexpected goroutineSegmentation fault 
exitcode=139FAIL	golang.org/x/net/publicsuffix	3.419s

I'll also note that badsystemstackMsg seems to be missing a final newline as of CL 93659 (CC @aclements @randall77). 😅

bcmills added the NeedsInvestigation and OS-Android labels on Feb 3, 2022
bcmills added this to the Backlog milestone on Feb 3, 2022
@bcmills
Member Author

bcmills commented May 3, 2022

This happened in a TryBot in https://storage.googleapis.com/go-build-log/f1e11825/android-amd64-emu_262486a5.log:

##### GOMAXPROCS=2 runtime -cpu=1,2,4 -quick
fatal: systemstack called from unexpected goroutineTrap 
exitcode=133
FAIL	runtime	16.177s
FAIL
2022/05/03 14:18:09 Failed: exit status 1
go tool dist: FAILED

Marking as release-blocker because this affects TryBot runs. Since android/amd64 is not a first-class port, either the underlying bug can be diagnosed and fixed, or the builder can be removed from the default TryBot set. (I'll leave that choice up to @golang/runtime to decide and implement.)

bcmills changed the milestone from Backlog to Go1.19 on May 3, 2022
@bcmills
Member Author

bcmills commented May 3, 2022

This may or may not be OS-specific. There is another failure in the builder logs since February, but on plan9 rather than android; it isn't obvious to me whether that is an independent bug.

greplogs -l -e 'fatal: systemstack called from unexpected goroutine' --since=2022-02-03
2022-03-05T21:20:16-e155b03-45f4544/plan9-amd64-0intro

@bcmills
Member Author

bcmills commented May 4, 2022

greplogs -l -e 'fatal: systemstack called from unexpected goroutine' --since=2022-03-06
2022-05-03T19:48:07-bccce90/android-arm64-corellium

bcmills changed the title from runtime: "fatal: systemstack called from unexpected goroutine" on android-amd64-emu to runtime: "fatal: systemstack called from unexpected goroutine" on android/amd64 on May 4, 2022
@mknyszek
Contributor

mknyszek commented May 18, 2022

@golang/runtime This is a second-class port, but because it's a TryBot, this is a release blocker. Should we consider removing it as a TryBot? Is it bringing us enough value?

@gopherbot

gopherbot commented May 20, 2022

Change https://go.dev/cl/407615 mentions this issue: dashboard: remove android-amd64-emu from main go repo's TryBot set

@dmitshur
Contributor

dmitshur commented May 20, 2022

I've mailed CL 407615 that makes android-amd64-emu a post-submit builder only (in the main repo) while investigation of this issue is underway. If submitted, this issue can be unmarked as a release-blocker for Go 1.19.

bcmills changed the title from runtime: "fatal: systemstack called from unexpected goroutine" on android/amd64 to runtime: "fatal: systemstack called from unexpected goroutine" on Android on May 23, 2022
@bcmills
Member Author

bcmills commented May 23, 2022

Curiously, this does not appear to be arch-specific: we've seen these failures on both amd64 and arm64.

greplogs -l -e 'fatal: systemstack called from unexpected goroutine' --since=2022-05-04
2022-05-20T22:30:37-2b0e457/android-arm64-corellium

prattmic added the okay-after-beta1 label on May 25, 2022
@prattmic
Member

prattmic commented Jun 1, 2022

The first failure shows exitcode=133. This is likely bash parlance for exiting with signal 5 (SIGTRAP). From man bash: The return value of a simple command is its exit status, or 128+n if the command is terminated by signal n.
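
As a rough sketch of the 128+n convention quoted above, the following Go snippet decodes the exit codes that appear in these logs. The function name `signalFromExitCode` is hypothetical, and the snippet assumes the exit code really was produced by a shell following that convention. On Linux, signal 5 is SIGTRAP and signal 11 is SIGSEGV.

```go
package main

import "fmt"

// signalFromExitCode (hypothetical helper) applies the shell convention
// that a command killed by signal n reports exit status 128+n.
// It returns the signal number and true when the code is in that range.
func signalFromExitCode(code int) (sig int, ok bool) {
	if code > 128 && code < 256 {
		return code - 128, true
	}
	return 0, false
}

func main() {
	// 133 and 139 are the exit codes seen in the logs; 1 is an ordinary failure.
	for _, code := range []int{133, 139, 1} {
		if sig, ok := signalFromExitCode(code); ok {
			fmt.Printf("exitcode=%d: terminated by signal %d\n", code, sig)
		} else {
			fmt.Printf("exitcode=%d: ordinary exit status\n", code)
		}
	}
}
```

Under that assumption, exitcode=133 decodes to signal 5 and exitcode=139 to signal 11, which matches the "Trap" and "Segmentation fault" text in the logs.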

If I recall correctly, Android applies a seccomp syscall filter to (all?) processes. I wonder if we are violating this filter on the throw path, resulting in truncation of the stack trace. seccomp with mode SECCOMP_RET_TRAP sends a SIGTRAP on violation.

@prattmic
Member

prattmic commented Jun 1, 2022

@golang/android do you know if the Android seccomp filters apply to processes on our builders, and if so which one?

@prattmic
Member

prattmic commented Jun 6, 2022

No repros of this on 25 gomotes all weekend. I did find #53250, plus several no-context SIGSEGVs in the runtime test, like:

##### GOMAXPROCS=2 runtime -cpu=1,2,4 -quick
Segmentation fault 
exitcode=139
FAIL	runtime	19.914s
FAIL
2022/06/05 22:34:10 Failed: exit status 1

(Some were in the standard runtime test rather than the -cpu variant.)

@aclements
Member

aclements commented Jun 7, 2022

This isn't a first-class port, so dropping release-blocker.

gopherbot removed the okay-after-beta1 label on Jun 10, 2022
@bcmills
Member Author

bcmills commented Jun 14, 2022

> This isn't a first-class port, so dropping release-blocker.

This port is still run as a default TryBot until/unless CL 407615 is merged. IMO known failures on TryBots should still block releases, since they still add testing noise for anyone who uses TryBots on a pending change.

@bcmills
Member Author

bcmills commented Jun 14, 2022

In the interest of decoupling this issue from the Android TryBots in general, I've filed #53377 (as a release-blocker) to decide whether to remove the TryBots or fix their known failure modes.

@bcmills
Member Author

bcmills commented Jun 14, 2022

Summarizing the known failures with this pattern on Android:

greplogs -l -e '(?ms)\Aandroid-.*^fatal: systemstack called from unexpected goroutine'
2022-05-20T22:30:37-2b0e457/android-arm64-corellium
2022-05-03T19:48:07-bccce90/android-arm64-corellium
2022-02-02T21:12:39-53d6a72/android-amd64-emu
2021-10-08T16:26:20-59d4e92-99c1b24/android-amd64-emu

So it looks like this bug was probably introduced sometime in 2021..?
(Or else, maybe the check itself was introduced then? 😅)
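
For readers unfamiliar with the flags in the pattern above, here is a small sketch using Go's `regexp` package. `(?s)` lets `.` span newlines, `(?m)` makes `^` match at the start of any line, and `\A` anchors to the start of the whole text. The sample log strings and their header lines are made up for illustration, and the sketch assumes the text being matched begins with the builder name. It does not claim to reproduce how greplogs itself feeds logs to the pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

// The same pattern as in the greplogs query above.
var androidSystemstack = regexp.MustCompile(
	`(?ms)\Aandroid-.*^fatal: systemstack called from unexpected goroutine`)

// matchesAndroidFailure reports whether the text starts with "android-"
// and later contains the fatal message at the start of a line.
func matchesAndroidFailure(log string) bool {
	return androidSystemstack.MatchString(log)
}

func main() {
	// Hypothetical log contents, shaped like the excerpts in this thread.
	androidLog := "android-amd64-emu (hypothetical header)\n" +
		"##### GOMAXPROCS=2 runtime -cpu=1,2,4 -quick\n" +
		"fatal: systemstack called from unexpected goroutineTrap\n"
	plan9Log := "plan9-amd64-0intro (hypothetical header)\n" +
		"fatal: systemstack called from unexpected goroutine\n"

	fmt.Println(matchesAndroidFailure(androidLog)) // true
	fmt.Println(matchesAndroidFailure(plan9Log))   // false
}
```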

@gopherbot

gopherbot commented Jun 14, 2022

Change https://go.dev/cl/412174 mentions this issue: dashboard: add known issues for android-*-emu

gopherbot pushed a commit to golang/build that referenced this issue Jun 14, 2022
Issue golang/go#42212 manifests as test timeouts, and is by far the most
frequent of these known issues.

Issue golang/go#51001 causes failures with "systemstack called from unexpected
goroutine". It seems to have been introduced sometime last year, but
it isn't clear to me whether it is a regression or an older (latent)
bug unearthed by some other change.

Issue golang/go#52724 appears to be a bug or race in the Android emulator
itself. It might require a builder image update and/or escalation to
the maintainers of the emulator proper.

Updates golang/go#53377.

Change-Id: I677915b1ff02dd02e0f14c63b0d25caf11e27a72
Reviewed-on: https://go-review.googlesource.com/c/build/+/412174
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Bryan Mills <bcmills@google.com>
@ianlancetaylor
Contributor

ianlancetaylor commented Jun 24, 2022

Rolling forward to 1.20.

@heschi
Contributor

heschi commented Aug 29, 2022
