x/tools/cmd/godoc: frequent failures in TestWeb/GOPATH on dragonfly-amd64 builder #50014

Open
bcmills opened this issue Dec 7, 2021 · 9 comments

Comments

@bcmills
Member

@bcmills bcmills commented Dec 7, 2021

--- FAIL: TestWeb (37.91s)
    --- FAIL: TestWeb/GOPATH (15.11s)
        godoc_test.go:86: server failed to respond in 15s
FAIL
FAIL	golang.org/x/tools/cmd/godoc	68.450s

greplogs --dashboard -md -l -e 'FAIL: TestWeb .*\n\s+.*\n\s+.*server failed to respond'

2021-12-06T14:58:33-c882a49-f8a8a73/dragonfly-amd64
2021-12-05T12:50:44-c882a49-ecf6b52/dragonfly-amd64
2021-12-04T04:50:55-c882a49-549cfef/openbsd-386-70-n2d
2021-12-04T04:41:31-c882a49-cd5f2cf/dragonfly-amd64
2021-12-04T01:07:28-c882a49-fa88ba1/openbsd-386-70-n2d
2021-12-04T01:07:10-c882a49-ba83aa7/dragonfly-amd64
2021-12-03T22:57:02-c882a49-9ae0b35/openbsd-386-70-n2d
2021-12-03T21:25:05-c882a49-d20a0bf/dragonfly-amd64
2021-12-03T21:23:11-c882a49-b3e1fbf/dragonfly-amd64
2021-12-03T18:55:11-c882a49-0f2d0d0/openbsd-386-70-n2d
2021-12-03T18:55:02-f64c0f4-c4a8550/dragonfly-amd64
2021-12-03T18:27:20-e212aff-c4a8550/dragonfly-amd64
2021-12-03T01:09:21-e212aff-0985990/openbsd-386-70-n2d
2021-12-03T00:46:20-e212aff-8da66a3/dragonfly-amd64
2021-12-02T16:48:07-e212aff-36be0be/openbsd-386-70-n2d
2021-12-01T15:05:46-615f9a6-b7651e5/openbsd-386-70-n2d
2021-11-30T22:42:17-1fd30d2-7ccbcc9/openbsd-386-70-n2d
2021-11-30T18:09:02-2c9b078-931d80e/openbsd-386-70-n2d
2021-11-30T16:43:31-2c9b078-682435d/openbsd-386-70-n2d
2021-11-23T17:30:32-1e71a25-00045b7/openbsd-386-68
2021-11-16T01:10:28-4adea50-9e13a88/openbsd-386-68
2021-11-13T00:50:04-49ce184-c893a85/openbsd-386-68
2021-11-12T18:14:22-fda06c1-8b66b3d/openbsd-386-68
2021-11-10T22:16:25-fc3ed20-4d06839/openbsd-386-68
2021-11-09T20:10:50-e900012-5430203/openbsd-386-68
2021-11-09T18:23:16-e900012-55e6e82/openbsd-386-68
2021-11-02T20:49:06-6561d8c-b246873/openbsd-386-68
2021-08-06T19:38:52-d529aec-fa6aa87/dragonfly-amd64
2021-08-04T15:26:45-309db04-6e73886/dragonfly-amd64
2021-08-04T13:50:16-309db04-7921829/dragonfly-amd64
2021-08-03T19:59:22-32c652e-16ab7e4/dragonfly-amd64
2021-08-03T16:06:16-594b3a2-8a7ee4c/dragonfly-amd64
2021-08-03T16:06:16-594b3a2-7921829/dragonfly-amd64
2021-08-02T19:06:04-a668498-8a7ee4c/dragonfly-amd64
2021-07-27T22:01:54-07bc1bf-7cd10c1/dragonfly-amd64
2021-07-26T23:46:13-07bc1bf-840e583/dragonfly-amd64
2021-07-26T20:36:31-07bc1bf-ed8cbbc/dragonfly-amd64
2021-07-26T20:36:31-07bc1bf-ecaa681/dragonfly-amd64
2021-07-22T17:13:07-251092d-c6d89db/dragonfly-amd64
2021-07-21T23:55:41-412ee17-3e48c03/dragonfly-amd64
2021-07-21T20:52:30-7f68387-3e48c03/dragonfly-amd64
2021-07-21T16:31:48-7aa8294-bc51e93/dragonfly-amd64
2021-07-13T13:36:40-d36a54b-c6d89db/dragonfly-amd64
2021-07-12T16:16:36-980829d-ab4085c/dragonfly-amd64
2021-07-12T16:16:36-980829d-3d1d066/dragonfly-amd64
2021-07-10T00:32:22-5b540d3-ab4085c/dragonfly-amd64
2021-07-09T19:50:55-e33c0f2-ab4085c/dragonfly-amd64
2021-07-09T17:21:04-8e32e9f-fb052db/dragonfly-amd64
2021-07-08T23:16:08-6994825-3d1d066/dragonfly-amd64
2021-07-08T19:56:07-71eae3a-3d1d066/dragonfly-amd64
2021-07-07T20:25:29-fd00574-f264879/dragonfly-amd64
2021-07-02T16:25:10-7edcfe5-6125d0c/dragonfly-amd64
2021-07-01T01:38:42-f0847e0-9d65578/dragonfly-amd64
2021-06-30T22:02:09-f0847e0-4711bf3/dragonfly-amd64
2021-06-22T16:01:58-d25f906-5bd09e5/dragonfly-amd64

@bcmills
Member Author

@bcmills bcmills commented Dec 7, 2021

At this failure rate, this looks to me like a release-blocker via #11811. The test should either be fixed or skipped on the affected builders.

Looking at the test, I see at least two problems with the timeout strategy:

  1. If the server actually wedges, its output logs (and, critically, its goroutine dump) are not logged anywhere. (The process is killed with cmd.Process.Kill instead of a signal that would give it the opportunity to dump goroutines.)

  2. The deadlines are hard-coded, inconsistently, without comments explaining why those deadlines should be fundamental. (One server gets only 15s, while the others get two minutes each.)

Probably for this issue we can just fix (2), although it would be nice to address (1) as well.

@bcmills bcmills removed this from the Unreleased milestone Dec 7, 2021
@bcmills bcmills added this to the Go1.18 milestone Dec 7, 2021
@bcmills bcmills changed the title x/tools/cmd/godoc: frequent failures in TestWeb on dragonfly-amd64 builder x/tools/cmd/godoc: frequent failures in TestWeb/GOPATH on dragonfly-amd64 builder Dec 7, 2021
@bcmills
Member Author

@bcmills bcmills commented Dec 7, 2021

Looking at this further, the failures are always in TestWeb/GOPATH. Perhaps we should just skip that subtest in -short mode, since at this point most users are in module mode anyway.

@bcmills
Member Author

@bcmills bcmills commented Dec 10, 2021

TestWeb/GOPATH takes less than a second on my workstation. I think this is uncovering a real bug in either godoc or the test itself.

@cherrymui
Contributor

@cherrymui cherrymui commented Dec 16, 2021

@gopherbot

@gopherbot gopherbot commented Dec 17, 2021

Change https://golang.org/cl/373005 mentions this issue: internal/moreexec: add a utility package for manipulating os/exec.Cmd

@gopherbot

@gopherbot gopherbot commented Jan 12, 2022

Change https://golang.org/cl/377835 mentions this issue: internal/testenv: add a Command function that replaces exec.Command

@gopherbot

@gopherbot gopherbot commented Jan 12, 2022

Change https://golang.org/cl/377836 mentions this issue: cmd/godoc: streamline subprocesses

@bcmills bcmills self-assigned this Jan 12, 2022
@bcmills
Member Author

@bcmills bcmills commented Jan 12, 2022

The test needs better diagnostics. Proposal #50436 is at the bottom of this yak stack, but in the meantime we can polyfill the needed API into an internal package.

@dmitshur dmitshur self-assigned this Jan 12, 2022
@bcmills
Member Author

@bcmills bcmills commented Jan 21, 2022

Intriguingly, the dragonfly-amd64 failures are no longer occurring after the builder was upgraded for #50538 (thanks, @tuxillo!).

We should still add better failure diagnostics to this test, but since it is no longer failing on the builders it won't mask other regressions, and doesn't need to be a release-blocker.

@bcmills bcmills removed this from the Go1.18 milestone Jan 21, 2022
@bcmills bcmills added this to the Backlog milestone Jan 21, 2022
4 participants