cmd/cgo: Mac OS X 10.6 can leak fds to child processes #2603

Closed
bradfitz opened this Issue Dec 21, 2011 · 35 comments

Owner

bradfitz commented Dec 21, 2011

From rsc@:


The gobuilder binary imports net/http which imports
crypto/tls, which uses cgo to look up the TLS certificates.
I don't know why the builder would have done that,
but it does appear to create a Unix domain socket.
In fact it looks like all kinds of interesting stuff leaks
on Snow Leopard:

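package main

import (
       _ "crypto/tls"
       "fmt"
       "net/http"
       "os"
       "os/exec"
)

func main() {
       // Parent only: make an HTTPS request so the TLS certificate lookup
       // runs. The response is not closed, so its connection is still open
       // when lsof runs below.
       if len(os.Args) <= 1 {
               http.Get("https://www.google.com/")
       }
       // List the files this process currently has open.
       out, err := exec.Command("lsof", "-p", fmt.Sprint(os.Getpid())).CombinedOutput()
       fmt.Printf("%s\n%v\n", out, err)
       // Parent only: re-exec this binary with an argument so the child
       // lists whatever descriptors it inherited.
       if len(os.Args) <= 1 {
               out, err = exec.Command(os.Args[0], "child").CombinedOutput()
               fmt.Printf("%s\n%v\n", out, err)
       }
}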

$ go run x.go
COMMAND   PID USER   FD     TYPE             DEVICE  SIZE/OFF     NODE NAME
a.out   31270  rsc  cwd      DIR               14,2      5678 515126
/Users/rsc
a.out   31270  rsc  txt      REG               14,2   3478748 10616124
/private/var/folders/++/++-J9E++6+0++4RjPqRgNE++JGo/-Tmp-/go-build193370353/_/x/_obj/a.out
a.out   31270  rsc  txt      REG               14,2     51288 10616126
/private/var/folders/++/++-J9E++6+0++4RjPqRgNE++JGo/-Caches-/mds/mdsDirectory.db
a.out   31270  rsc  txt      REG               14,2     32768 10250465
/private/var/db/mds/messages/se_SecurityMessages
a.out   31270  rsc  txt      REG               14,2     58804  1306892
/Library/Keychains/System.keychain
a.out   31270  rsc  txt      REG               14,2    412424  2318973
/System/Library/Keychains/SystemRootCertificates.keychain
a.out   31270  rsc  txt      REG               14,2   1054960   463241
/usr/lib/dyld
a.out   31270  rsc  txt      REG               14,2 234414080 10250368
/private/var/db/dyld/dyld_shared_cache_x86_64
a.out   31270  rsc    0r     CHR                3,2       0t0      299 /dev/null
a.out   31270  rsc    1u     CHR                4,8  0t777746      323
/dev/ttyp8
a.out   31270  rsc    2u     CHR                4,8  0t777746      323
/dev/ttyp8
a.out   31270  rsc    3r     CHR                3,2       0t0      299 /dev/null
a.out   31270  rsc    4u    IPv4 0xffffff8024b4da08       0t0      TCP
helix.cam.corp.google.com:53403->74.125.226.112:https (ESTABLISHED)
a.out   31270  rsc    5u    unix 0xffffff8029897940       0t0
->0xffffff802435b840
a.out   31270  rsc    6     PIPE 0xffffff80290d5940     16384
->0xffffff801dd7b440
a.out   31270  rsc    7     PIPE 0xffffff801dd7b440     16384
->0xffffff80290d5940
a.out   31270  rsc    8u  KQUEUE
count=0, state=0x2
a.out   31270  rsc    9u  KQUEUE
count=0, state=0x2
a.out   31270  rsc   10r     CHR                9,1    0t4096      574
/dev/urandom
a.out   31270  rsc   12     PIPE 0xffffff801dd794a0     16384
->0xffffff80290d6650

<nil>
COMMAND   PID USER   FD   TYPE             DEVICE  SIZE/OFF     NODE NAME
a.out   31273  rsc  cwd    DIR               14,2      5678 515126 /Users/rsc
a.out   31273  rsc  txt    REG               14,2   3478748 10616124
/private/var/folders/++/++-J9E++6+0++4RjPqRgNE++JGo/-Tmp-/go-build193370353/_/x/_obj/a.out
a.out   31273  rsc  txt    REG               14,2   1054960   463241
/usr/lib/dyld
a.out   31273  rsc  txt    REG               14,2 234414080 10250368
/private/var/db/dyld/dyld_shared_cache_x86_64
a.out   31273  rsc    0r   CHR                3,2       0t0      299 /dev/null
a.out   31273  rsc    1   PIPE 0xffffff80290d6650     16384
->0xffffff801dd794a0
a.out   31273  rsc    2   PIPE 0xffffff80290d6650     16384
->0xffffff801dd794a0
a.out   31273  rsc    3r   CHR                3,2       0t0      299 /dev/null
a.out   31273  rsc    5u  unix 0xffffff8029897940       0t0
->0xffffff802435b840
a.out   31273  rsc    6   PIPE 0xffffff801dd7b390     16384
->0xffffff802458d230
a.out   31273  rsc   10r   CHR                9,1    0t4096      574
/dev/urandom
a.out   31273  rsc   11r   CHR                3,2       0t0      299 /dev/null

<nil>

And on Lion:


COMMAND  PID USER   FD     TYPE             DEVICE  SIZE/OFF    NODE NAME
a.out   1884  rsc  cwd      DIR               14,5     15878  411317 /Users/rsc
a.out   1884  rsc  txt      REG               14,5   3486932 3929759
/private/var/folders/mw/qfnx8hhd1_s9mm9wtbng0hw80000gn/T/go-build202841463/_/x/_obj/a.out
a.out   1884  rsc  txt      REG               14,5     51288 3929761
/private/var/folders/mw/qfnx8hhd1_s9mm9wtbng0hw80000gn/C/mds/mdsDirectory.db
a.out   1884  rsc  txt      REG               14,5     32768 2273792
/private/var/db/mds/messages/se_SecurityMessages
a.out   1884  rsc  txt      REG               14,5    599232 1143044
/usr/lib/dyld
a.out   1884  rsc  txt      REG               14,5 293486592 1247338
/private/var/db/dyld/dyld_shared_cache_x86_64
a.out   1884  rsc    0r     CHR                3,2       0t0     308 /dev/null
a.out   1884  rsc    1u     CHR                4,0 0t1342794     318 /dev/ttyp0
a.out   1884  rsc    2u     CHR                4,0 0t1342794     318 /dev/ttyp0
a.out   1884  rsc    3u   systm                          0t0
a.out   1884  rsc    4u    unix 0xffffff801047b900       0t0
->0xffffff8015222190
a.out   1884  rsc    5u    IPv4 0xffffff800e3a26c0       0t0     TCP
192.168.147.131:55231->lax04s08-in-f18.1e100.net:https (ESTABLISHED)
a.out   1884  rsc    6     PIPE 0xffffff800c0b3bd0     16384
->0xffffff800c0b3910
a.out   1884  rsc    7     PIPE 0xffffff800c0b3910     16384
->0xffffff800c0b3bd0
a.out   1884  rsc    8u  KQUEUE
count=0, state=0x2
a.out   1884  rsc    9u  KQUEUE
count=0, state=0x2
a.out   1884  rsc   10r     CHR               11,1    0t4096     585
/dev/urandom
a.out   1884  rsc   12     PIPE 0xffffff8010f277e0     16384
->0xffffff800dc50b00

<nil>
COMMAND  PID USER   FD    TYPE             DEVICE  SIZE/OFF    NODE NAME
a.out   1888  rsc  cwd     DIR               14,5     15878 411317 /Users/rsc
a.out   1888  rsc  txt     REG               14,5   3486932 3929759
/private/var/folders/mw/qfnx8hhd1_s9mm9wtbng0hw80000gn/T/go-build202841463/_/x/_obj/a.out
a.out   1888  rsc  txt     REG               14,5    599232 1143044
/usr/lib/dyld
a.out   1888  rsc  txt     REG               14,5 293486592 1247338
/private/var/db/dyld/dyld_shared_cache_x86_64
a.out   1888  rsc    0r    CHR                3,2       0t0     308 /dev/null
a.out   1888  rsc    1    PIPE 0xffffff800dc50b00     16384
->0xffffff8010f277e0
a.out   1888  rsc    2    PIPE 0xffffff800dc50b00     16384
->0xffffff8010f277e0
a.out   1888  rsc    3u  systm                          0t0
a.out   1888  rsc    4u   unix 0xffffff801047b900       0t0
->0xffffff8015222190
a.out   1888  rsc    6    PIPE 0xffffff800ffb8d60     16384
->0xffffff800bdadb80

<nil>
Owner

bradfitz commented Dec 22, 2011

Comment 1:

I can't reproduce this in a stand-alone test on either OS X 10.7 or 10.6.
Here's a patch that tries to:  http://golang.org/cl/5503063
Even with the case "darwin" part commented out, the test won't fail.
I added logging in crypto/tls/root_darwin.go to verify the C code was being run.
Owner

bradfitz commented Jan 10, 2012

Comment 2:

Andrew, which version of OS X do the builders run?
Owner

bradfitz commented Jan 12, 2012

Comment 3:

Andrew: ping.
Contributor

robpike commented Jan 13, 2012

Comment 4:

Labels changed: added priority-go1.

Contributor

robpike commented Jan 13, 2012

Comment 5:

Owner changed to builder@golang.org.

Contributor

rsc commented Jan 24, 2012

Comment 6:

Andrew, which version of OS X do the builders run?
Can we bump it up to Lion?
Owner

bradfitz commented Jan 24, 2012

Comment 7:

Andrew says 10.6.
Contributor

adg commented Jan 25, 2012

Comment 8:

I'll see about bumping it to 10.7.

Owner changed to @adg.

Contributor

rsc commented Feb 14, 2012

Comment 9:

I looked into this.  There are fd leaks even on 10.7 when using TLS or DNS from the Mac
libraries.
We should fix this more generally in exec.
stackoverflow.com/questions/899038 explains how to find the largest
in-use fd on a variety of systems (all different).
Updating the builders to 10.7 will not solve this problem.

Owner changed to @rsc.

Member

minux commented Mar 3, 2012

Comment 10:

FYI, CentOS 5/RHEL 5 also suffer from a similar bug.
Test log: http://pastebin.com/xF7cN5Xw
This os/exec test fails every time I run it, so I think we should solve it generally
in os/exec.
Contributor

rsc commented Mar 7, 2012

Comment 11:

On the systems that we care about, fixing this requires reading /dev/fd or
/proc/self/fd to find the highest open fds.  Or we could just close fds n through 100.
Both are kind of kludgy, and we're really just working around bugs in other software
that happens to be sharing the same address space.
This can wait until after Go 1.
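
A minimal sketch of the /dev/fd idea, assuming a system that exposes per-process descriptors as numbered entries in /dev/fd (or /proc/self/fd). The helper name openFDs is hypothetical and this is not what os/exec does; it only shows how the highest in-use fd could be discovered. Note that the listing also contains the descriptor used to read the directory, which is closed again before the function returns.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strconv"
)

// openFDs (hypothetical helper) returns the descriptor numbers listed in
// dir, for example "/dev/fd" or "/proc/self/fd", sorted ascending.
// The result includes the descriptor that was temporarily opened to read
// the directory itself.
func openFDs(dir string) ([]int, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var fds []int
	for _, e := range entries {
		n, err := strconv.Atoi(e.Name())
		if err != nil {
			continue // skip any entry that is not a descriptor number
		}
		fds = append(fds, n)
	}
	sort.Ints(fds)
	return fds, nil
}

func main() {
	fds, err := openFDs("/dev/fd")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot list descriptors:", err)
		os.Exit(1)
	}
	fmt.Println("open fds:", fds)
	if len(fds) > 0 {
		fmt.Println("highest:", fds[len(fds)-1])
	}
}
```

A caller could then mark every listed fd above its known threshold close-on-exec before starting a child, instead of guessing an upper bound such as 100.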

Labels changed: added priority-later, removed priority-go1.

Status changed to HelpWanted.

Comment 12 by lucio.dere:

I'd like to add that NetBSD suffers from this problem too, for what that's worth. 
Reading Russ' comment, I gather that we want to close any FDs that may be open above a
threshold: we know the threshold, but we don't know the upper limit, although we can
determine it in a possibly OS-dependent way.
If that's the case, I'm willing to look into it, although I'll only be able to test it
on a (very) limited number of platforms.
As for Russ' comment, my take is that a test ought to warn us when behaviour diverges
from the expected; it's a judgement call whether bad OS conditions should be treated as
divergence from the Go specification.  I guess there has to be reliable documentation to
cover special cases like this one.
Lucio.

Comment 13 by vbatts:

This is also the case on Red Hat RHEL 6. I had thought it was a product of our build
environment, because the os/exec test fails consistently there. We are using koji as our
build infrastructure. Here is the log output of the failed build:
http://pastebin.com/ZuQ3Zx2m
I have been unable to reproduce it on a RHEL6 workstation or virtual machine; it only
happens on this build server.
Owner

bradfitz commented Feb 27, 2013

Comment 14:

Re comment #13: I can't remember whether RHEL 6 is one of those kernels where O_CLOEXEC
isn't respected on one of the system calls.  It might be, but the failure you pasted
does look like it's your build system leaking fds:
exec_test.go:158:       Something already leaked - closed fd 3
exec_test.go:211:       CombinedOutput: exit status 1; output "leaked parent file. fd =
15; want 12
...
exec.test 7948 mockbuild   15r   REG    9,1  4220044  18579833
/tmp/go-build506154113/os/exec/_test/exec.test
Contributor

davecheney commented Feb 28, 2013

Comment 15:

RHEL6 is based on 2.6.32, so should support O_CLOEXEC natively. If
./all.bash passes on the RHEL6 host, we can be pretty confident that
O_CLOEXEC works.
Contributor

davecheney commented Jun 19, 2013

Comment 16:

I wonder if https://golang.org/issue/5714 was the root cause?
Member

minux commented Jun 19, 2013

Comment 17:

I don't think issue #5714 is the root cause.
It's because libc (libSystem) creates its own fds for certain operations,
but those fds aren't opened with O_CLOEXEC.
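
To illustrate what a missing close-on-exec flag means, here is a small sketch, assuming a Unix system where Go's syscall package provides SYS_FCNTL (Linux and Darwin do). It creates a pipe with syscall.Pipe, which does not set the flag (much as C library code might), then sets it with syscall.CloseOnExec. The helper names isCloseOnExec and rawPipeFlags are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// isCloseOnExec (hypothetical helper) reports whether FD_CLOEXEC is set on fd.
func isCloseOnExec(fd int) (bool, error) {
	flags, _, errno := syscall.Syscall(syscall.SYS_FCNTL, uintptr(fd), syscall.F_GETFD, 0)
	if errno != 0 {
		return false, errno
	}
	return flags&syscall.FD_CLOEXEC != 0, nil
}

// rawPipeFlags (hypothetical helper) opens a pipe without close-on-exec and
// reports the flag on its read end before and after syscall.CloseOnExec.
func rawPipeFlags() (before, after bool, err error) {
	var p [2]int
	if err = syscall.Pipe(p[:]); err != nil {
		return
	}
	defer syscall.Close(p[0])
	defer syscall.Close(p[1])

	if before, err = isCloseOnExec(p[0]); err != nil {
		return
	}
	syscall.CloseOnExec(p[0]) // a child started via exec no longer inherits p[0]
	after, err = isCloseOnExec(p[0])
	return
}

func main() {
	before, after, err := rawPipeFlags()
	if err != nil {
		fmt.Fprintln(os.Stderr, "fcntl failed:", err)
		os.Exit(1)
	}
	fmt.Printf("close-on-exec before=%v after=%v\n", before, after)
}
```

Descriptors that Go opens itself (for example through os.Open) are already marked close-on-exec; the leak described here comes from descriptors created by other code in the same process.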
Contributor

rsc commented Jun 21, 2013

Comment 18:

The root cause is fd leaks in the standard system libraries on OS X, which we must call
into for DNS resolution. It's not Go's fault at all, but Go should probably
(eventually) clean up the mess.
Contributor

rsc commented Jul 30, 2013

Comment 19:

Labels changed: added go1.2maybe.

Contributor

robpike commented Aug 16, 2013

Comment 20:

Deferring to Go 1.3.

Labels changed: added go1.3maybe, removed go1.2maybe.

Contributor

robpike commented Aug 20, 2013

Comment 21:

Labels changed: removed go1.3maybe.

Contributor

rsc commented Nov 27, 2013

Comment 22:

Labels changed: added go1.3maybe.

Contributor

rsc commented Dec 4, 2013

Comment 23:

Labels changed: added release-none, removed go1.3maybe.

Contributor

rsc commented Dec 4, 2013

Comment 24:

Labels changed: added repo-main.

@rsc rsc was assigned by bradfitz Dec 4, 2013

Contributor

jacobsa commented Mar 23, 2015

Any chance of this getting some attention? (Is anybody sure it still happens with 10.10?)

Contributor

davecheney commented Mar 23, 2015

It doesn't happen after 10.6. We can't fix this, it's a kernel bug in 10.6,
and 10.6 is not supported by Apple any more.

Owner

bradfitz commented Mar 23, 2015

And we're discussing elsewhere just dropping 10.6 support in Go 1.5, too. Especially since we can't run virtualized builders for it legally.

Contributor

jacobsa commented Mar 24, 2015

@davecheney: Are you sure? Russ's comment on 2012-02-15 says that there are leaks even on 10.7, and bumping the version to 10.7 will not help.

Contributor

jacobsa commented Mar 24, 2015

(Plus further comments that say this happens on other operating systems. Perhaps the issue should be re-titled to reduce confusion.)

Contributor

davecheney commented Mar 24, 2015

You are probably correct. All I know is that it doesn't happen on current
versions of OSX, where current == the version that you find on hardware you
can buy.

@rsc rsc added this to the Unplanned milestone Apr 10, 2015

Contributor

rsc commented Apr 28, 2015

Per #9511, we will not be making any further bug fixes specific to 10.6.

@rsc rsc closed this Apr 28, 2015

Contributor

jacobsa commented Apr 28, 2015

@rsc: This is not specific to 10.6. See your comment from 2012-02-15 and the last few comments in this thread.

Contributor

jacobsa commented Jun 10, 2015

@rsc: Ping. I think this thread should be re-opened; it is not only about OS X 10.6. Also, the link still remains in the public documentation for os/exec.

Contributor

ianlancetaylor commented Jun 10, 2015

This issue has gotten confusing. Russ said it fails with OS X 10.7, Dave Cheney says it works on current OS X. Various people have chimed in with issues on other systems, but there are no test cases. I think it would be better to open a new issue with an example program and clear information about how and where it fails.

Contributor

jacobsa commented Jun 10, 2015

Agreed, I first came here looking for clarification myself. :-) At the minimum though, the documentation should probably be updated to not point at this confusing, closed bug.

@minux minux added Unfortunate and removed HelpWanted labels Oct 5, 2015

@gopherbot gopherbot locked and limited conversation to collaborators Oct 4, 2016
