Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/exec: TestExtraFiles closes random fds #5073

Open
bradfitz opened this issue Mar 18, 2013 · 17 comments

Comments

Projects
None yet
4 participants
@bradfitz
Copy link
Member

commented Mar 18, 2013

The new pollster should probably handle out of fds more gracefully.

Fixing an unrelated fd leak, I wrote a test to intentionally run out of file descriptors.

With the new scheduler, you get pages of this ~forever, EBADF from epoll_wait, I believe:

epollwait failed with epollwait failed with 99

epollwait failed with 9epollwait failed with 9
epollwait failed with
9
epollwait failed with epollwait failed with 9
9
epollwait failed with epollwait failed with 99

epollwait failed with epollwait failed with 9
9
epollwait failed with 9epollwait failed with 9

epollwait failed with epollwait failed with 99

epollwait failed with epollwait failed with 99

epollwait failed with epollwait failed with 99

epollwait failed with epollwait failed with 99

epollwait failed with epollwait failed with 99

epollwait failed with epollwait failed with 99

epollwait failed with 9epollwait failed with 9
epollwait failed with
epollwait failed with 99

epollwait failed with epollwait failed with 99

epollwait failed with epollwait failed with 99

epollwait failed with 9epollwait failed with
epollwait failed with 9
9
epollwait failed with 9epollwait failed with 9
epollwait failed with
epollwait failed with 9
9epollwait failed with
epollwait failed with 9
9
epollwait failed with epollwait failed with 99

epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9epollwait failed with 9
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 99
epollwait failed with
epollwait failed with 9
9epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 99
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 99

epollwait failed with epollwait failed with 99
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 99
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 99

epollwait failed with epollwait failed with 99
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
epollwait failed with
epollwait failed with 9
9epollwait failed with 9
@dvyukov

This comment has been minimized.

Copy link
Member

commented Mar 18, 2013

Comment 1:

Please share the test.
I am curious bad fd is passed to epoll and not rejected earlier...
@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Mar 18, 2013

Comment 2:

The test is in https://golang.org/cl/7799046/
Make sure that fix is not in your tree (if I submit it soon) or sync to 43eb97ed849a,
and then patch in that test  to os/exec/exec_test.go (after TestPipes), making the loop
count much more than 4.  Say, 50:
// issue #5071
func TestPipeLookPathLeak(t *testing.T) {
        fd0 := numOpenFDS(t)
        for i := 0; i < 50; i++ {
                cmd := Command("something-that-does-not-exist-binary")
            cmd.StdoutPipe()
                cmd.StderrPipe()
            cmd.StdinPipe()
                if err := cmd.Run(); err == nil {
                        t.Fatal("unexpected success")
            }
        }
        fdGrowth := numOpenFDS(t) - fd0
    if fdGrowth > 2 {
                t.Errorf("leaked %d fds; want ~0", fdGrowth)
        }
}
func numOpenFDS(t *testing.T) int {
        lsof, err := Command("lsof", "-n", "-p", strconv.Itoa(os.Getpid())).Output()
        if err != nil {
                t.Skip("skipping test; error finding or running lsof")
                return 0
        }
    return bytes.Count(lsof, []byte("\n"))
}
...
And then run "go test -v os/exec" (run all tests, not just the leaky one, because a
future one must use epoll_wait)
@dvyukov

This comment has been minimized.

Copy link
Member

commented Mar 22, 2013

Comment 3:

Can not reproduce.
I apply the following patch:
--- a/src/pkg/os/exec/exec.go   Thu Mar 21 22:37:13 2013 -0700
+++ b/src/pkg/os/exec/exec.go   Fri Mar 22 12:22:54 2013 +0400
@@ -235,8 +235,8 @@
 // Start starts the specified command but does not wait for it to complete.
 func (c *Cmd) Start() error {
    if c.err != nil {
-       c.closeDescriptors(c.closeAfterStart)
-       c.closeDescriptors(c.closeAfterWait)
+       // c.closeDescriptors(c.closeAfterStart)
+       // c.closeDescriptors(c.closeAfterWait)
        return c.err
    }
    if c.Process != nil {
diff -r 9ca85035f95a src/pkg/os/exec/exec_test.go
--- a/src/pkg/os/exec/exec_test.go  Thu Mar 21 22:37:13 2013 -0700
+++ b/src/pkg/os/exec/exec_test.go  Fri Mar 22 12:22:54 2013 +0400
@@ -154,7 +154,7 @@
 // issue #5071
 func TestPipeLookPathLeak(t *testing.T) {
    fd0 := numOpenFDS(t)
-   for i := 0; i < 4; i++ {
+   for i := 0; i < 100; i++ {
        cmd := Command("something-that-does-not-exist-binary")
        cmd.StdoutPipe()
        cmd.StderrPipe()
And then:
$ go test -cpu=1,2,4,8,16,32,1,2,4,8,16,32 os/exec
but it only says:
--- FAIL: TestPipeLookPathLeak (0.04 seconds)
    exec_test.go:168: leaked 6 fds; want ~0
--- FAIL: TestPipeLookPathLeak-2 (0.10 seconds)
    exec_test.go:168: leaked 600 fds; want ~0
--- FAIL: TestPipeLookPathLeak-4 (0.10 seconds)
    exec_test.go:168: leaked 600 fds; want ~0
--- FAIL: TestPipeLookPathLeak-8 (0.10 seconds)
    exec_test.go:168: leaked 600 fds; want ~0
--- FAIL: TestPipeLookPathLeak-16 (0.10 seconds)
    exec_test.go:168: leaked 600 fds; want ~0
--- FAIL: TestPipeLookPathLeak-32 (0.10 seconds)
    exec_test.go:168: leaked 600 fds; want ~0
--- FAIL: TestPipeLookPathLeak-2 (0.14 seconds)
    exec_test.go:168: leaked 600 fds; want ~0
--- FAIL: TestPipeLookPathLeak-4 (0.10 seconds)
    exec_test.go:168: leaked 600 fds; want ~0
--- FAIL: TestPipeLookPathLeak-8 (0.10 seconds)
    exec_test.go:168: leaked 600 fds; want ~0
--- FAIL: TestPipeLookPathLeak-32 (0.10 seconds)
    exec_test.go:168: leaked 600 fds; want ~0
@bradfitz

This comment has been minimized.

Copy link
Member Author

commented Mar 22, 2013

Comment 4:

I can still reproduce.  You just need to set the number high enough for the ulimit.
ante:src $ hg diff
diff -r ccdacf6833dc src/pkg/os/exec/exec.go
--- a/src/pkg/os/exec/exec.go   Sat Mar 23 02:17:01 2013 +1100
+++ b/src/pkg/os/exec/exec.go   Fri Mar 22 08:32:50 2013 -0700
@@ -235,8 +235,8 @@
 // Start starts the specified command but does not wait for it to complete.
 func (c *Cmd) Start() error {
    if c.err != nil {
-       c.closeDescriptors(c.closeAfterStart)
-       c.closeDescriptors(c.closeAfterWait)
+       //c.closeDescriptors(c.closeAfterStart)
+       //c.closeDescriptors(c.closeAfterWait)
        return c.err
    }
    if c.Process != nil {
diff -r ccdacf6833dc src/pkg/os/exec/exec_test.go
--- a/src/pkg/os/exec/exec_test.go  Sat Mar 23 02:17:01 2013 +1100
+++ b/src/pkg/os/exec/exec_test.go  Fri Mar 22 08:32:50 2013 -0700
@@ -154,7 +154,7 @@
 // issue #5071
 func TestPipeLookPathLeak(t *testing.T) {
    fd0 := numOpenFDS(t)
-   for i := 0; i < 4; i++ {
+   for i := 0; i < 2000; i++ {
        cmd := Command("something-that-does-not-exist-binary")
        cmd.StdoutPipe()
        cmd.StderrPipe()
ante:src $ hg id
ccdacf6833dc+ tip
$ go test -v os/exec
Interestingly, I see three outcomes:
1) sometimes it passes.
2) sometimes it fails with two goroutines racing on stdout with garbled output (like in
my original report above), and 
3) sometimes it's only one goroutine looping the errors:
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
epollwait failed with 9
@dvyukov

This comment has been minimized.

Copy link
Member

commented Mar 25, 2013

Comment 5:

Hit deadlock in this test:
https://golang.org/issue/5123
@dvyukov

This comment has been minimized.

Copy link
Member

commented Mar 25, 2013

Comment 6:

I see in strace that somebody actually closes epoll fd.
@dvyukov

This comment has been minimized.

Copy link
Member

commented Mar 26, 2013

Comment 7:

It is simply this bad guy:
--- FAIL: TestPipeLookPathLeak (0.21 seconds)
    exec_test.go:168: leaked 12 fds; want ~0
=== RUN TestExtraFiles
--- PASS: TestExtraFiles (0.02 seconds)
    exec_test.go:209: Something already leaked - closed fd 3
    exec_test.go:209: Something already leaked - closed fd 4
    exec_test.go:209: Something already leaked - closed fd 5
    exec_test.go:209: Something already leaked - closed fd 6
    exec_test.go:209: Something already leaked - closed fd 7
    exec_test.go:209: Something already leaked - closed fd 8
It's not particularly cool to close random fds.
I think the solution is to limit the error message storm in such cases and move this
test into a separate process.
@robpike

This comment has been minimized.

Copy link
Contributor

commented May 18, 2013

Comment 8:

Labels changed: added go1.2maybe, removed go1.1maybe.

@dvyukov

This comment has been minimized.

Copy link
Member

commented Jul 30, 2013

Comment 9:

The message storm was fixed by:
changeset:   16430:163d528e7436
user:        Dmitriy Vyukov <dvyukov@google.com>
date:        Tue Mar 26 20:25:43 2013 +0400
summary:     runtime: does not report duplicate errors in netpoll
I don't know what to do with these strange tests that close random fds.

Owner changed to ---.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Jul 30, 2013

Comment 10:

Labels changed: added feature.

@dvyukov

This comment has been minimized.

Copy link
Member

commented Aug 10, 2013

Comment 11:

Labels changed: removed feature.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Sep 10, 2013

Comment 12:

Labels changed: added testing.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2013

Comment 13:

Labels changed: added go1.3, removed go1.2maybe.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Dec 4, 2013

Comment 14:

Labels changed: added release-go1.3.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Dec 4, 2013

Comment 15:

Labels changed: removed go1.3.

@rsc

This comment has been minimized.

Copy link
Contributor

commented Dec 4, 2013

Comment 16:

Labels changed: added repo-main.

@rsc

This comment has been minimized.

Copy link
Contributor

commented May 11, 2014

Comment 17:

Labels changed: added release-none, suggested, removed release-go1.3.

@rsc rsc added this to the Unplanned milestone Apr 10, 2015

@rsc rsc removed release-none labels Apr 10, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.