os/signal: test failures with "signal: interrupt" and no other output #41561

Open
bcmills opened this issue Sep 22, 2020 · 11 comments

Comments

Member

@bcmills bcmills commented Sep 22, 2020

2020-09-22T15:14:09-d42b32e/linux-386-longtest
2020-09-22T15:13:57-53c9b95/darwin-arm64-corellium

signal: interrupt
FAIL	os/signal	13.095s

It's a strange failure mode, and seems even stranger given that we've only seen two occurrences, on very different builders but at very nearly the same time. (CC @golang/osp-team in case this is due to a cmd/coordinator issue.)

@bcmills bcmills added this to the Backlog milestone Sep 22, 2020
Member Author

@bcmills bcmills commented Nov 9, 2020

2020-11-09T14:17:30-f858c22/solaris-amd64-oraclerel
2020-11-03T01:27:45-cc0930c/darwin-amd64-nocgo
2020-11-03T01:27:45-cc0930c/solaris-amd64-oraclerel
2020-11-02T00:46:44-e463c28/linux-amd64-clang
2020-10-30T15:25:49-e1faebe/illumos-amd64
2020-10-29T15:13:09-50af50d/netbsd-arm-bsiegert
2020-10-29T04:12:30-53efbdb/linux-amd64-jessie
2020-10-27T12:36:54-320cc79/netbsd-amd64-9_0
2020-10-27T12:00:35-333e904/netbsd-arm64-bsiegert
2020-10-26T18:29:24-a8b28eb/netbsd-arm64-bsiegert
2020-10-26T18:20:05-ae585ee/darwin-amd64-nocgo
2020-10-23T00:22:00-400581b/netbsd-amd64-9_0
2020-10-22T19:43:26-c92bfac/netbsd-amd64-9_0
2020-10-22T17:11:03-f8aecbb/solaris-amd64-oraclerel
2020-10-22T15:10:01-de74ea5/linux-arm64-aws
2020-10-22T01:20:16-4c7a18d/illumos-amd64
2020-10-21T14:34:44-15ead85/netbsd-amd64-9_0
2020-10-17T07:18:20-c8f6135/netbsd-amd64-9_0
2020-10-16T12:48:42-e981936/linux-amd64-nocgo
2020-10-16T04:13:45-af87480/netbsd-arm-bsiegert
2020-10-15T21:40:46-21e441c/netbsd-arm-bsiegert
2020-10-15T18:35:44-1bcf6be/illumos-amd64
2020-10-14T20:17:49-2ec71e5/solaris-amd64-oraclerel
2020-10-12T22:34:47-027367a/darwin-amd64-race
2020-10-12T18:31:36-8994607/illumos-amd64
2020-10-12T18:31:22-2f4368c/darwin-amd64-10_15
2020-10-09T15:13:57-eb67eab/darwin-amd64-race
2020-10-08T18:56:11-46ab0c0/netbsd-amd64-9_0
2020-10-07T15:57:48-ccf89be/darwin-amd64-10_14
2020-10-06T22:55:40-234de9e/netbsd-arm64-bsiegert
2020-10-06T07:37:55-8e20388/darwin-amd64-10_15
2020-10-05T17:31:26-b064eb7/linux-386-clang
2020-10-02T20:23:33-21eb3dc/linux-s390x-ibm
2020-10-02T18:57:36-d888f1d/darwin-arm64-corellium
2020-09-29T19:01:28-567ef8b/linux-amd64
2020-09-29T17:25:24-66770f4/solaris-amd64-oraclerel
2020-09-29T06:10:34-6fc094c/darwin-amd64-race
2020-09-24T20:41:14-23cc16c/aix-ppc64
2020-09-24T18:05:54-25a33da/netbsd-amd64-9_0
2020-09-24T16:21:59-83e8bf2/darwin-arm64-corellium
2020-09-23T17:10:35-bc320fc/aix-ppc64
2020-09-22T16:52:11-4e1d812/netbsd-amd64-9_0

Member Author

@bcmills bcmills commented Nov 9, 2020

This does not appear to be a cmd/coordinator issue, but does appear to be a regression in Go 1.16, so marking as release-blocker until we better understand the cause.

@bcmills bcmills modified the milestones: Backlog, Go1.16 Nov 9, 2020
Member Author

@bcmills bcmills commented Nov 9, 2020

(CC @ianlancetaylor @cherrymui for signal-related symptoms)

Contributor

@cherrymui cherrymui commented Nov 9, 2020

I think this is due to TestNotifyContextSimultaneousNotifications, which sends a lot of SIGINT signals to the running process. A signal could land after the test is done and we have unregistered the signal handler, at which point SIGINT gets its default action and terminates the process.

Maybe we should wait some time until all pending signals arrive, or run that particular test in a separate process. Or use a non-fatal signal.

Contributor

@cherrymui cherrymui commented Nov 9, 2020

TestNotifyContextSimultaneousNotifications is new in Go 1.16, so that explains the regression.

@bcmills bcmills added the Testing label Nov 10, 2020
Member Author

@bcmills bcmills commented Nov 10, 2020

CC @henvic

Member Author

@bcmills bcmills commented Nov 10, 2020

> Maybe we should wait some time until all pending signals arrive

That is exactly what the existing quiesce helper function attempts to do. It should probably be used in that test too.

> or run that particular test in a separate process.

I think that's probably the right long-term direction for the os/signal tests. (See also the discussion on CL 232378.)

> Or use a non-fatal signal.

Maybe, although I would worry that would just mask the timing issue and lead to crosstalk with the other tests.

Contributor

@henvic henvic commented Nov 10, 2020

Thanks @bcmills, I'm going to take a look.


@gopherbot gopherbot commented Nov 16, 2020

Change https://golang.org/cl/270198 mentions this issue: os/signal: fix flaky tests for NotifyContext.

Contributor

@henvic henvic commented Nov 16, 2020

Hey @bcmills,

Is something like this the direction you think tests should go?
https://go-review.googlesource.com/c/go/+/270198

Just wondering: is it on purpose that no -v (verbose) is used when running the tests on the CI? I ask because I don't know how to access further information about the failed tests (if they're stored somewhere).

Member Author

@bcmills bcmills commented Nov 16, 2020

That CL does look like the right general direction, although I would be inclined to use os.Args[0] or os.Executable to re-exec the test process itself, rather than invoking go run on a separate program.

(See TestDetectNohup and TestNohup for existing examples of that style in the os/signal tests.)
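
A minimal sketch of the re-exec style, assuming a standalone program rather than the real tests. The environment variable `SIGNAL_DEMO_CHILD` and the helper `childCommand` are hypothetical. The parent starts a second copy of its own binary, found via `os.Executable`, and the child recognizes its role from the environment variable, so any signal handling under test stays isolated in the child process.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// childEnv marks a process as the re-executed child (hypothetical name).
const childEnv = "SIGNAL_DEMO_CHILD"

// childCommand builds a command that runs exe with the child marker set.
func childCommand(exe string, args ...string) *exec.Cmd {
	cmd := exec.Command(exe, args...)
	cmd.Env = append(os.Environ(), childEnv+"=1")
	return cmd
}

func main() {
	if os.Getenv(childEnv) == "1" {
		// Child: the signal-sensitive work would run here.
		fmt.Println("running in child process")
		return
	}

	// Parent: re-exec this same binary as the child.
	exe, err := os.Executable()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot locate executable:", err)
		os.Exit(1)
	}
	out, err := childCommand(exe).CombinedOutput()
	if err != nil {
		fmt.Fprintln(os.Stderr, "child failed:", err)
		os.Exit(1)
	}
	fmt.Printf("child said: %s", out)
}
```

In a test binary, the same idea would typically pass a `-test.run=` argument naming the test to run in the child, and the test function itself would check the marker variable before doing the signal-sensitive work.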
