Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: test timeouts without useful debugging output on darwin builders #38768

Open
bcmills opened this issue Apr 30, 2020 · 11 comments
Open

cmd/go: test timeouts without useful debugging output on darwin builders #38768

bcmills opened this issue Apr 30, 2020 · 11 comments

Comments

@bcmills
Copy link
Member

@bcmills bcmills commented Apr 30, 2020

There are numerous occasional test timeouts on the darwin-amd64-race builder in the logs.

I note that the windows-amd64-race builder sets GO_TEST_TIMEOUT_SCALE=2:
https://github.com/golang/build/blob/916311cec4e1738aae572fe49767cebdc0351144/dashboard/builders.go#L2067

I suspect that we may need to do the same for the darwin-amd64-race builder to reduce flakiness.

CC @andybons

2020-04-27T23:05:16-42c4899/darwin-amd64-race
2020-04-17T04:26:54-29d925d/darwin-amd64-race
2020-04-16T15:31:20-4eaf855/darwin-amd64-race
2020-04-10T18:45:24-37470c0/darwin-amd64-race
2019-10-11T16:34:01-ba108c9/darwin-amd64-race
2019-10-04T21:22:55-5f4aa5d/darwin-amd64-race
2019-09-24T04:19:31-e29d276/darwin-amd64-race
2019-09-16T22:14:49-115e4c9/darwin-amd64-race
2019-09-06T21:22:26-8a8cf5b/darwin-amd64-race
2019-06-29T13:17:15-623d653/darwin-amd64-race
2019-06-28T20:10:05-2e0cd2a/darwin-amd64-race
2019-06-28T20:09:48-91c385b/darwin-amd64-race
2019-06-28T20:00:59-3b040b7/darwin-amd64-race
2019-06-27T20:43:26-f205ae3/darwin-amd64-race
2019-06-26T20:17:35-9acd2d6/darwin-amd64-race
2019-06-26T18:38:59-06f709a/darwin-amd64-race
2019-06-26T16:35:14-60f14fd/darwin-amd64-race
2019-06-26T03:59:57-adf8883/darwin-amd64-race
2019-06-25T21:30:39-343b7fa/darwin-amd64-race
2019-06-25T17:51:25-a6ad626/darwin-amd64-race
2019-06-25T16:43:17-3f83c83/darwin-amd64-race
2019-06-24T21:26:46-5f7e9ce/darwin-amd64-race
2019-06-14T16:46:12-80f8913/darwin-amd64-race
2019-06-13T17:12:09-f83c44e/darwin-amd64-race
2019-06-13T14:14:02-55d31e1/darwin-amd64-race
2019-06-13T03:58:11-418c7d8/darwin-amd64-race
2019-06-12T22:17:38-c0c6cee/darwin-amd64-race
2019-06-12T18:03:46-f44404e/darwin-amd64-race
2019-06-11T17:54:27-9f76566/darwin-amd64-race
2019-06-10T23:23:39-ef84fa0/darwin-amd64-race
2019-06-10T20:32:36-5ce1819/darwin-amd64-race
2019-06-10T16:19:39-8f296f5/darwin-amd64-race
2019-06-05T17:53:53-e9a136d/darwin-amd64-race
2019-06-05T17:25:56-195e753/darwin-amd64-race
2019-06-05T00:53:14-ac921da/darwin-amd64-race
2019-05-24T21:34:53-385b2e0/darwin-amd64-race
2019-05-17T22:32:30-1ab063c/darwin-amd64-race
2019-05-13T17:01:25-db2bf15/darwin-amd64-race
2019-05-09T21:13:46-56c1cf3/darwin-amd64-race
2019-05-09T21:13:23-60f6763/darwin-amd64-race
2019-05-09T21:13:19-fe67ce2/darwin-amd64-race
2019-05-09T15:32:36-b6f59cb/darwin-amd64-race
2019-04-30T18:29:07-888bac1/darwin-amd64-race
2019-04-29T22:35:35-98c5a56/darwin-amd64-race
2019-04-29T04:05:29-d016330/darwin-amd64-race

@gopherbot gopherbot added this to the Unreleased milestone Apr 30, 2020
@andybons andybons self-assigned this Apr 30, 2020
@andybons andybons removed their assignment Aug 4, 2020
@bcmills bcmills changed the title x/build: darwin-amd64-race may need a GO_TEST_TIMEOUT_SCALE cmd/go: test timeouts on darwin builders Feb 19, 2021
@bcmills bcmills removed the Builders label Feb 19, 2021
@bcmills bcmills modified the milestones: Unreleased, Backlog Feb 19, 2021
@bcmills bcmills changed the title cmd/go: test timeouts on darwin builders cmd/go: test timeouts without useful debugging output on darwin builders Feb 19, 2021
@bcmills bcmills modified the milestones: Backlog, Go1.17 Feb 19, 2021
@gopherbot
Copy link

@gopherbot gopherbot commented Mar 15, 2021

Change https://golang.org/cl/302052 mentions this issue: cmd/go: bail out from script tests earlier when a timeout occurs

@bcmills
Copy link
Member Author

@bcmills bcmills commented Mar 16, 2021

CL 302052 “fixes” this issue in the sense that the timeouts will probably continue to occur, but should produce more useful debugging output when that happens. If it is successful in that regard, we can file a separate issue for whatever failure mode that uncovers.

@gopherbot gopherbot closed this in 72d98df Mar 16, 2021
@bcmills bcmills self-assigned this Mar 16, 2021
@bcmills
Copy link
Member Author

@bcmills bcmills commented Mar 17, 2021

Still not useful, but we're narrowing in on it, I think.
Looks like we also need to add useful timeouts for (*testgoData).run:

goroutine 117 [syscall, 5 minutes]:
syscall.syscall6(0x10d7940, 0x3e89, 0xc0006a7ad8, 0x0, 0xc000c881b0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/runtime/sys_darwin.go:41 +0x2e
syscall.wait4(0x3e89, 0xc0006a7ad8, 0x0, 0xc000c881b0, 0xc000c881b0, 0xc0006a7af8, 0x1050558)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/syscall/zsyscall_darwin_amd64.go:43 +0xa5
syscall.Wait4(0x3e89, 0xc0006a7b40, 0x0, 0xc000c881b0, 0x105026a, 0xc000666170, 0xc0005ae240)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/syscall/syscall_bsd.go:145 +0x66
os.(*Process).wait(0xc0001900f0, 0xc0007e41e0, 0xc0007e4210, 0xc0007e4160)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/os/exec_unix.go:44 +0x159
os.(*Process).Wait(...)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/os/exec.go:129
os/exec.(*Cmd).Wait(0xc0007e4160, 0x0, 0x0)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/os/exec/exec.go:507 +0x105
os/exec.(*Cmd).Run(0xc0007e4160, 0x81, 0xc0006680c0)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/os/exec/exec.go:341 +0x8a
cmd/go_test.(*testgoData).doRun(0xc00050e0b0, 0xc0006680c0, 0x4, 0x4, 0x40, 0xc0006680c0)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/cmd/go/go_test.go:411 +0x518
cmd/go_test.(*testgoData).run(0xc00050e0b0, 0xc0006680c0, 0x4, 0x4)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/cmd/go/go_test.go:427 +0xa6
cmd/go_test.TestCacheCoverage(0xc0004fe480)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/cmd/go/go_test.go:2318 +0x3c5
testing.tRunner(0xc0004fe480, 0x1b01cd0)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1236 +0x22e
created by testing.(*T).Run
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1281 +0x78c

https://build.golang.org/log/ad208dadaa44cdaade5cf4e5d39ab96f78ab183d

@bcmills bcmills reopened this Mar 17, 2021
@gopherbot
Copy link

@gopherbot gopherbot commented Mar 19, 2021

Change https://golang.org/cl/303330 mentions this issue: testing: allow parallel-subtest goroutines to exit when the subtest is complete

gopherbot pushed a commit that referenced this issue Mar 19, 2021
…s complete

Fixes #45127
Updates #38768

Change-Id: I7f41901d5bcc07741ac9f5f2a24d2b07ef633cb1
Reviewed-on: https://go-review.googlesource.com/c/go/+/303330
Trust: Bryan C. Mills <bcmills@google.com>
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@bcmills
Copy link
Member Author

@bcmills bcmills commented Mar 25, 2021

https://build.golang.org/log/ebdf7a853ae4dba70c0d76c53f5885501cd54668 has what seems to be a fairly useful trace; it seems to indicate that os.Interrupt is not succeeding in terminating a go subprocess.

@gopherbot
Copy link

@gopherbot gopherbot commented Apr 9, 2021

Change https://golang.org/cl/308809 mentions this issue: cmd/go: in TestScript, set GOTRACEBACK and use SIGQUIT to terminate hung subprocesses

gopherbot pushed a commit that referenced this issue Apr 9, 2021
…ung subprocesses

This should produce more useful outputs if a cmd/go invocation hangs
during a test.

(In some cases the outputs will be very verbose, but verbose is still
better than uninformative.)

For #36460
For #38768

Change-Id: Ibaf2d0fbf7387dfab1aad8981140ecb11901fc4b
Reviewed-on: https://go-review.googlesource.com/c/go/+/308809
Trust: Bryan C. Mills <bcmills@google.com>
Run-TryBot: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Jay Conrod <jayconrod@google.com>
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Apr 27, 2021

Are we going to do anything else here for 1.17? Thanks.

@bcmills
Copy link
Member Author

@bcmills bcmills commented Apr 27, 2021

I'm planning to investigate this further during the freeze. I've managed to pare down the failure output enough that most of what remains is goroutines stuck in os/exec and syscall, and I suspect that it boils down to a platform bug of some sort (and/or a bug in the os package on macOS).

@Thor-Bjorgvinsson
Copy link

@Thor-Bjorgvinsson Thor-Bjorgvinsson commented May 17, 2021

Could this halting command issue on mac be related? #38824

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants