Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/exec: "broken pipe" on testlog.txt while running test #35469

Open
bcmills opened this issue Nov 8, 2019 · 11 comments
Open

os/exec: "broken pipe" on testlog.txt while running test #35469

bcmills opened this issue Nov 8, 2019 · 11 comments

Comments

@bcmills
Copy link
Member

@bcmills bcmills commented Nov 8, 2019

Several examples with this failure mode:

testing: can't write /workdir/tmp/go-build295821694/b061/testlog.txt: write /workdir/tmp/go-build295821694/b061/testlog.txt: broken pipe
FAIL	os/exec	0.627s

I suspect that this has something to do with exec'ing the test itself as a subprocess.

CC @bradfitz @ianlancetaylor

2019-11-08T20:05:25-7a5e0fe/linux-386-sid
2019-11-08T19:28:49-f6ff806/linux-386-sid
2019-11-07T08:25:32-7a2baa9/linux-mips-rtrk
2019-11-01T03:40:20-4a43a50/linux-386-sid

@bcmills bcmills added this to the Go1.14 milestone Nov 8, 2019
@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Nov 8, 2019

  -test.testlogfile file
        write test action log to file (for use only by cmd/go)

I don't recall what that is.

os/exec seems to only run subprocesses with a fresh argv, though, not inheriting that value from cmd/go.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 8, 2019

This likely means that the closeUnexpectedFds function in os/exec/exec_test.go accidentally closed the testlog file.

(The testlog file is used to log what the test does in order to see whether the test results can be cached.)

@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Nov 8, 2019

I tried to do something like,

diff --git a/src/os/exec/exec_test.go b/src/os/exec/exec_test.go
index 19d2111743..aa8bb5c97b 100644
--- a/src/os/exec/exec_test.go
+++ b/src/os/exec/exec_test.go
@@ -470,9 +470,14 @@ func basefds() uintptr {
        return n
 }
 
+func isTestLogFD(fd uintptr) bool {
+       target, _ := os.Readlink(fmt.Sprintf("/proc/self/fd/%v", fd))
+       return strings.HasSuffix(target, "testlog.txt")
+}
+
 func closeUnexpectedFds(t *testing.T, m string) {
        for fd := basefds(); fd <= 101; fd++ {
-               if poll.IsPollDescriptor(fd) {
+               if poll.IsPollDescriptor(fd) || isTestLogFD(fd) {
                        continue
                }
                err := os.NewFile(fd, "").Close()

But it never fires, because I then see we already have:

// basefds returns the number of expected file descriptors                                                                                        
// to be present in a process at start.                                                                                                           
// stdin, stdout, stderr, epoll/kqueue, epoll/kqueue pipe, maybe testlog                                                                          
func basefds() uintptr {
        n := os.Stderr.Fd() + 1
        // The poll (epoll/kqueue) descriptor can be numerically                                                                                  
        // either between stderr and the testlog-fd, or after                                                                                     
        // testlog-fd.                                                                                                                            
        for poll.IsPollDescriptor(n) {
                n++
        }
        for _, arg := range os.Args {
                if strings.HasPrefix(arg, "-test.testlogfile=") {
                        n++
                }
        }
        return n
}

So is IsPollDescriptor wrong? Seems unlikely.

I'd snapshot ls -l /proc/self/fd/ and t.Log it so we saw the state of the world before we closed anything, except the tests passes (PASS in output)... it's testing package that fails later (in writeProfiles from m.after), so we'd need to put debug stuff in there.

Perhaps we can do that conditional on being on the builders for now?

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 9, 2019

It's interesting that the write is getting an EPIPE error. The testlog file is always a file, created by testing.(*M).before. As far as I can tell it's never a pipe.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 9, 2019

This has been happening for a long time but except for one recent exception only on linux-386-sid.

2018-05-18T18:16:50-1771edb/linux-386-sid
2018-05-22T02:58:04-87a18c6/linux-386-sid
2018-05-22T14:15:46-837ed98/linux-386-sid
2018-05-22T15:55:01-840f25b/linux-386-sid
2018-05-22T20:27:52-402dd10/linux-386-sid
2018-05-23T20:46:02-1dc20e9/linux-386-sid
2018-05-29T17:45:36-db9341a/linux-386-sid
2018-05-30T17:23:57-b12e341/linux-386-sid
2018-05-31T17:36:45-57d40f1/linux-386-sid
2018-06-01T19:33:34-c7519f0/linux-386-sid
2018-06-06T17:59:08-11f6cb4/linux-386-sid
2018-06-06T18:18:20-9e4c344/linux-386-sid
2018-06-06T23:47:44-fe8a0d1/linux-386-sid
2018-06-08T21:07:22-96faeb0/linux-386-sid
2018-06-16T13:35:45-b459e00/linux-386-sid
2018-06-28T19:37:15-578b961/linux-386-sid
2018-07-16T23:37:14-274fde9/linux-386-sid
2018-07-17T14:17:23-5890e25/linux-386-sid
2018-07-18T02:08:58-5760ffc/linux-386-sid
2018-07-18T20:11:46-798f74b/linux-386-sid
2018-07-27T21:13:43-d970519/linux-386-sid
2018-07-28T01:15:05-a74cc47/linux-386-sid
2018-07-28T01:15:16-0090c13/linux-386-sid
2018-07-31T02:02:51-68170aa/linux-386-sid
2018-07-31T22:10:54-740e589/linux-386-sid
2018-08-01T18:44:46-acd30e9/linux-386-sid
2018-08-02T20:40:06-2f46dfb/linux-386-sid
2018-08-03T18:35:45-b800f20/linux-386-sid
2018-08-08T21:37:36-f2131f6/linux-386-sid
2018-08-09T23:56:40-89e13c8/linux-386-sid
2018-08-17T19:21:57-d46587c/linux-386-sid
2018-08-17T23:12:06-0a842d5/linux-386-sid
2018-08-21T02:26:17-97c7e0e/linux-386-sid
2018-08-22T03:38:38-90f2fa0/linux-386-sid
2018-08-24T07:06:58-c907a75/linux-386-sid
2018-08-24T17:31:42-38143ba/linux-386-sid
2018-08-24T18:40:16-05c0244/linux-386-sid
2018-08-28T02:37:18-3ca3e89/linux-386-sid
2018-08-28T14:10:31-2e23475/linux-386-sid
2018-08-30T19:02:30-0dac1e2/linux-386-sid
2018-09-04T20:29:33-1018a80/linux-386-sid
2018-09-05T14:26:11-81957dd/linux-386-sid
2018-09-11T20:28:15-023dbb1/linux-386-sid
2018-09-14T16:46:11-a708353/linux-386-sid
2018-09-15T14:58:35-fb061b5/linux-386-sid
2018-09-23T06:53:10-fdefaba/linux-386-sid
2018-09-27T02:52:14-d159405/linux-386-sid
2018-09-28T09:18:14-38861b5/linux-386-sid
2018-09-28T13:45:09-bf8e6b7/linux-386-sid
2018-10-04T04:08:08-62e5215/linux-386-sid
2018-10-06T12:04:57-963776e/linux-386-sid
2018-10-09T18:19:59-7d2f46d/linux-386-sid
2018-10-09T20:10:02-6c85693/linux-386-sid
2018-10-11T18:01:17-872a547/linux-386-sid
2018-10-15T04:19:20-85066ac/linux-386-sid
2018-10-15T16:53:03-7c96d87/linux-386-sid
2018-10-20T14:19:29-035f9e8/linux-386-sid
2018-10-23T06:42:56-3b091bf/linux-386-sid
2018-10-27T01:36:57-7ec3b5e/linux-386-sid
2018-11-02T17:56:07-745ec8b/linux-386-sid
2018-11-05T22:36:24-f1a9f1d/linux-386-sid
2018-11-08T17:26:22-be5f646/linux-386-sid
2018-11-12T20:46:48-c5718b6/linux-386-sid
2018-11-13T13:42:59-9d025bd/linux-386-sid
2018-11-14T21:47:50-0a40d45/linux-386-sid
2018-11-20T20:03:29-7780709/linux-386-sid
2018-11-23T22:03:37-1ac8499/linux-386-sid
2018-11-30T20:46:23-3ce9e5a/linux-386-sid
2018-12-03T06:01:35-1adbb2b/linux-386-sid
2018-12-10T23:31:21-f64385b/linux-386-sid
2018-12-11T16:41:35-ba60724/linux-386-sid
2018-12-12T02:36:44-ef780fb/linux-386-sid
2018-12-13T00:35:06-09da280/linux-386-sid
2018-12-14T18:23:41-bdc7d56/linux-386-sid
2018-12-14T21:58:38-37b7c99/linux-386-sid
2018-12-17T21:12:59-179acf4/linux-386-sid
2019-01-08T00:35:16-e3eb2ff/linux-386-sid
2019-01-17T16:59:49-79ac638/linux-386-sid
2019-01-30T17:54:01-3a91061/linux-386-sid
2019-02-01T20:35:09-e104ebf/linux-386-sid
2019-02-04T06:53:49-3fc276c/linux-386-sid
2019-02-05T18:11:45-10faf00/linux-386-sid
2019-02-11T23:22:35-7bc2aa6/linux-386-sid
2019-02-26T05:16:56-ac51237/linux-386-sid
2019-02-27T18:04:36-c33a951/linux-386-sid
2019-02-27T18:08:03-58bf401/linux-386-sid
2019-02-27T22:38:32-e32203f/linux-386-sid
2019-03-01T22:15:57-412f659/linux-386-sid
2019-03-05T23:08:13-7ac0a8b/linux-386-sid
2019-03-07T04:46:03-fd19bc6/linux-386-sid
2019-03-07T17:53:44-9a71015/linux-386-sid
2019-03-08T22:40:10-d3dd258/linux-386-sid
2019-03-12T07:27:56-14a58d6/linux-386-sid
2019-03-15T23:27:36-7765576/linux-386-sid
2019-03-17T17:00:42-746f405/linux-386-sid
2019-03-19T03:20:43-b3cfb0b/linux-386-sid
2019-03-21T01:34:26-ba96564/linux-386-sid
2019-03-22T15:40:37-b06d212/linux-386-sid
2019-03-22T17:26:03-d923309/linux-386-sid
2019-03-27T03:41:10-08ba9c0/linux-386-sid
2019-04-02T16:03:24-3aacfce/linux-386-sid
2019-04-02T17:40:15-578e281/linux-386-sid
2019-04-04T00:26:24-964fe4b/linux-386-sid
2019-04-15T19:30:44-6b7114b/linux-386-sid
2019-04-16T13:25:16-b39d0ea/linux-386-sid
2019-04-16T15:33:18-c4953a6/linux-386-sid
2019-04-16T18:49:50-6997671/linux-386-sid
2019-04-19T15:40:42-4aeac68/linux-386-sid
2019-04-22T23:02:46-ef2806e/linux-386-sid
2019-04-26T17:34:54-6c04c26/linux-386-sid
2019-04-30T01:32:48-d021dd6/linux-386-sid
2019-04-30T22:20:57-65b89c3/linux-386-sid
2019-05-01T13:25:02-7ee2213/linux-386-sid
2019-06-09T16:23:11-323212b/linux-386-sid
2019-07-15T23:42:01-7a853b6/linux-386-sid
2019-08-23T17:16:16-5c379a4/linux-386-sid
2019-08-29T15:37:46-647dc1a/linux-386-sid
2019-09-25T18:48:21-6c15c7c/linux-386-sid
2019-10-11T15:08:12-9036351/linux-386-sid
2019-10-15T16:45:31-11d7775/linux-386-sid
2019-10-17T21:06:37-82bc00d/linux-386-sid
2019-11-01T03:40:20-4a43a50/linux-386-sid
2019-11-07T08:25:32-7a2baa9/linux-mips-rtrk
2019-11-08T19:28:49-f6ff806/linux-386-sid
2019-11-08T20:05:25-7a5e0fe/linux-386-sid

@bcmills
Copy link
Member Author

@bcmills bcmills commented Nov 11, 2019

@gopherbot
Copy link

@gopherbot gopherbot commented Nov 13, 2019

Change https://golang.org/cl/206939 mentions this issue: os/exec: don't run TestExtraFiles if extra files were open for the test

gopherbot pushed a commit that referenced this issue Nov 13, 2019
Our attempts to close existing open files are flaky. They will fail if,
for example, file descriptor 3 is open when the test binary starts.
Instead, report any such cases, and skip TestExtraFiles.

Updates #35469

Change-Id: I7caec083f3f4a31579bf28fc9c82ae89b1bde49a
Reviewed-on: https://go-review.googlesource.com/c/go/+/206939
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
@gopherbot
Copy link

@gopherbot gopherbot commented Nov 13, 2019

Change https://golang.org/cl/207078 mentions this issue: os/exec: skip poll descriptors when checking for open descriptors

gopherbot pushed a commit that referenced this issue Nov 13, 2019
It turns out that there is a path that initializes netpoll and opens
file descriptors before running the os/exec init function: on some
systems, the uses of NewFile when setting os.Stdin and friends can
initialize netpoll which can open file descriptors. This in itself
is not a problem, but when we check whether the new files are open
using os.NewFile, a side-effect is to put them into non-blocking mode.
This can then break future uses of netpoll.

Updates #35469
Fixes #35566

Change-Id: I1b2e2c943695d1c2d29496b050abbce9ee710a00
Reviewed-on: https://go-review.googlesource.com/c/go/+/207078
Reviewed-by: Bryan C. Mills <bcmills@google.com>
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Nov 16, 2019

This is rare and has been happening for a long time. The CLs should give us more information if it happens again. Kicking to 1.15.

@ianlancetaylor ianlancetaylor modified the milestones: Go1.14, Go1.15 Nov 16, 2019
@odeke-em
Copy link
Member

@odeke-em odeke-em commented May 23, 2020

We haven't heard much in the past 6 months from bug reports. Moving this Go1.16 and perhaps then we can move it to Unreleased or decide how to proceed from then on.

@odeke-em odeke-em modified the milestones: Go1.15, Go1.16 May 23, 2020
@odeke-em
Copy link
Member

@odeke-em odeke-em commented Feb 5, 2021

Punting to Go1.17.

@odeke-em odeke-em modified the milestones: Go1.16, Go1.17 Feb 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants