#!watchflakes
post <- goos == "darwin" && log ~ `os/exec\.\(\*Cmd\)\.awaitGoroutines` && log ~ `internal/poll\.\(\*pollDesc\)\.waitRead` &&
!(builder ~ `(gotip|go1\.\d\d)-` && date < "2024-01-18")
We have been tracking a long-running bug on macOS (darwin) in #54461 (comment). That issue's title doesn't accurately capture the widespread nature of the problem, its defining symptom, or its suspected root cause, so I am filing a new issue for it.
On macOS, we see a pattern of test failures with the following characteristics:
- A goroutine is blocked on a `read` operation on a pipe.
- The pipe is known to be closed, because the process on the other end has either terminated (most often) or explicitly closed the pipe (less frequent, but sometimes seen in cases like net/http/cgi: TestCopyError failures due to unexpected child process #57369).
Many of these bugs have been worked around by explicitly canceling (or adding a timeout to) the stuck read.
The issues with this symptom have included:
- x/crypto/ssh/test: TestAcceptCloseTCP failures #60099
- x/tools/gopls: regtest flakes due to hanging go commands #54461
- x/tools/gopls/internal/regtest/modfile: TestSumUpdateFixesDiagnostics failures #61073
- x/tools/gopls/internal/lsp/cmd/test: TestDefinition failures #61128
A few other issues may be related but don't (or didn't) include goroutine dumps for the stuck process: they may or may not be due to this failure mode.
- x/build/internal/task: TestTagXRepos failures #56231
- net/http/cgi: TestCopyError failures due to unexpected child process #57369
- runtime: deadlock when running concurrent builds on MacOS #59657
- os/signal: TestTerminalSignal failures with `subprogram failed: exit status 1` after `Waiting for exit...` #61595
- os: (*Cmd).Wait hangs on macOS when setctty is used after commit b15c399a3 #61779
- x/build/internal/task: TestTagTelemetry failures #63258
- x/build/internal/task: TestTagSingleRepo failures with `context deadline exceeded` on `darwin` #63731
- x/build/internal/task: test failures with `signal: killed` on darwin #64110
- x/build/internal/task: TestSyncPrivate failures with `context deadline exceeded` on darwin #64269
- os/signal: hang in TestTerminalSignal on `darwin` #64700
- x/crypto/ssh/test: tests fail on macOS 14 #64959
Given the symptoms, it seems likely to me that this is either a bug in internal/poll, or a bug in the macOS platform that we have somehow started to trigger.
Note that #61779 was bisected specifically to https://go.dev/cl/420334.