Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: manual instrumentation of KeepAlive is fraught #34810

Open
beoran opened this issue Oct 10, 2019 · 27 comments

Comments

@beoran
Copy link

commented Oct 10, 2019

What version of Go are you using (go version)?

$ go version
go version go1.12 linux/amd64

Does this issue reproduce with the latest release?

Likely, yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/bjorn/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/bjorn/src/go"
GOPROXY="http://claudette.applied-maths.local:13909"
GORACE=""
GOROOT="/home/bjorn/opt/go"
GOTMPDIR=""
GOTOOLDIR="/home/bjorn/opt/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/bjorn/work/src/i41healthapp/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build701296847=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I am currently writing a game library in Go that avoids cgo and calls the
OS directly, certainly on Linux thanks to it's stable kernel API. (See:
https://gitlab.com/beoran/galago https://gitlab.com/beoran/galago/blob/master/os/linux/input/input_linux.go
)

I have an input Device like this:

// Device models an input device
type Device struct {
    FileName string 
    *os.File
    // Cached device information
    Info struct { 
        Events          []SupportedEvent
        Keys            []SupportedKey
        Axes            []AbsoluteAxis
        Rolls           []RelativeAxis
        Name            string
        ID              string
    }
}

// Keeps the device's file from being garbage collected.
func (d * Device) KeepAlive() {
    runtime.KeepAlive(d.File)
}

// Icotl performs an ioctl on the given device
func (d * Device) Ioctl(code uint32, pointer unsafe.Pointer) error {
    fmt.Printf("ioctl: %d %d %d\n", uintptr(d.Fd()), uintptr(code),
uintptr(pointer))
    _, _, errno := syscall.Syscall(
        syscall.SYS_IOCTL,
        uintptr(d.Fd()),
        uintptr(code),
        uintptr(pointer))
    if (errno != 0) {
        return errno
    }
    d.KeepAlive()
    return nil
}

Notice the KeepAlive? If I leave that out the program crashes, because the device 's io.File gets garbage collected. This took me quite some time to figure out and it is not obvious that that would happen, and that runitme.KeepAlive() is needed here. It would be great if I didn't have to manually insert runtime.KeepAlive calls when using os.File.Fd when using system calls. Or if there was at least vet/lint tooling to suggest when I would probably need it.

I posted this as a new issue to split it off from #34684, where it was a tangential issue.

@acln0

This comment has been minimized.

Copy link
Contributor

commented Oct 10, 2019

Consider using https://golang.org/pkg/os/#File.SyscallConn over the the raw file descriptor returned by Fd. The SyscallConn API is perhaps a little more verbose, but it deals with lifetime issues correctly, and wrappers like the Ioctl method in your snippet can hide it from callers entirely.

@beoran

This comment has been minimized.

Copy link
Author

commented Oct 10, 2019

I took a look at SyscallConn and it looks like it will work, at the hopefully small overhead of having to use a closure.

But TIL that File.Fd() sets the file to blocking mode, which I explains why I was unable to do nonblocking IO on the devices. I feel that should have been in the documentation of File.Fd(). Furthermore, it would be a good idea to refer from the documentation of File.FD to that of File.SyscallConn().

@acln0

This comment has been minimized.

Copy link
Contributor

commented Oct 10, 2019

I feel that should have been in the documentation of File.Fd().

It is, sort of. See #22934 and 4153495.

@andybons andybons changed the title Garbage collector should keep os.File alive when using os.File.Fd runtime: garbage collector should keep os.File alive when using os.File.Fd Oct 10, 2019
@andybons andybons added this to the Unplanned milestone Oct 10, 2019
@andybons

This comment has been minimized.

Copy link
Member

commented Oct 10, 2019

@beoran is this still an issue for you or would you like to re-purpose this to a documentation fix?

@beoran

This comment has been minimized.

Copy link
Author

commented Oct 10, 2019

@andybons andybons changed the title runtime: garbage collector should keep os.File alive when using os.File.Fd os: document that File.Fd() sets the file to blocking mode Oct 10, 2019
@andybons

This comment has been minimized.

Copy link
Member

commented Oct 10, 2019

@ianlancetaylor in case there are objections.

@mdempsky

This comment has been minimized.

Copy link
Member

commented Oct 10, 2019

I think this issue should be kept open until we've discussed it further.

I'm not convinced the Go compiler / runtime necessarily need to do anything differently, but I do think there's opportunity for vet or lint or something should probably warn users when they need to use runtime.KeepAlive.

For example, I just happened to be looking at github.com/google/gopacket, and I noticed that its pcap library suffers from this same issue: they call file.Fd() but don't have any runtime.KeepAlive(file) call afterwards to ensure that file isn't GC'd.

@mdempsky mdempsky changed the title os: document that File.Fd() sets the file to blocking mode runtime: manual instrumentation of KeepAlive is fraught Oct 10, 2019
@beoran

This comment has been minimized.

Copy link
Author

commented Oct 10, 2019

https://gitlab.com/beoran/galago/blob/master/os/linux/input/input_linux.go now uses File.SyscallConn , and as a bonus, reading Linux input events has non-blocking as well. Yay ^_^. But, if @acln0 hadn't kindly informed me of the existence this function, then I would still be struggling.

So yes, both better documentation and a go vet or go lint check would be welcome to help others avoid this issue.

@andybons

This comment has been minimized.

Copy link
Member

commented Oct 10, 2019

Sounds good to me. Thanks for chiming in, @mdempsky.

@beoran

This comment has been minimized.

Copy link
Author

commented Oct 11, 2019

A very related problem I am having now, with my library is that the SyscallConn can't be used reliably in conjunction with unix.Poll(fds []PollFd, timeout int) (n int, err error). I would either have to make the closure passed to Control() call Control() recursively over all Files I want to poll on , or I have to cheat and store the FD somewhere and hope that it will stay valid for the file. I could of course use File.Fd() again, but that makes the file blocking, which is what I want to avoid in the first place. What should I do to poll on several files without using File.Fd()?

@eliasnaur

This comment has been minimized.

Copy link
Contributor

commented Oct 11, 2019

@beoran

This comment has been minimized.

Copy link
Author

commented Oct 11, 2019

That could work, but I'm not sure how to do that. File.Fd() sets a nonblocking flag to false in the underlying structure for os.File directly. AFAIK, `unix.SetNonblock(fd, true) has no effect on this, or am I mistaken?

@eliasnaur

This comment has been minimized.

Copy link
Contributor

commented Oct 11, 2019

@beoran

This comment has been minimized.

Copy link
Author

commented Oct 11, 2019

Well, yes, but that was exactly the problem that SyscallConn was supposed to solve. With it you can get access to the FD while the os.File() still works. Since this is platform specific code anyway, I might as well then open the file with unix.Openin stead, and just handle everything myself, although I fear this might interfere with the way goroutines are supposed to work.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Oct 11, 2019

Yes, if you want to manage I/O yourself, I recommend that you use unix.Open. That won't interfere with goroutines. It just means that I/O on those descriptors won't use the runtime poller, but it seems that you want to call unix.Poll yourself anyhow. There is no good way to both use your own poller and the runtime poller, and there doesn't seem to be much point to doing so.

@beoran

This comment has been minimized.

Copy link
Author

commented Oct 11, 2019

Well, i'm calling unix.Poll myself because I don't know how I could use the runtime poller in my situation. Would it even be possible, and how? If not I'll stick to managing everything myself.

The top issue remains though, this is all rather fraught and not well documented. Certainly x/sys/unix could do with more documentation and examples.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Oct 11, 2019

@beoran I don't really know what you are doing so I don't know why you can't use the runtime poller. In general it's fine to call ioctl on os.File or net.Conn values using SyscallConn and to also use the ordinary Read and Write methods which will use the runtime poller.

@mdempsky

This comment has been minimized.

Copy link
Member

commented Oct 11, 2019

It seems like this discussion is drifting a bit from the initial report, which was that os.File.Fd is dangerous because it (in general) requires using runtime.KeepAlive, and we don't have any tooling to help check for that.

Some questions:

  1. Is the transformation fn(file.Fd()); runtime.KeepAlive(file) => file.SyscallConn().Control(fn) (or maybe Read or Write instead of Control) safe?

  2. If so, should we just deprecate os.File.Fd and encourage users to switch to os.File.SyscallConn?

  3. If not (or maybe regardless), should we implement a lint/vet warning that file.Fd() should always be used in conjunction with runtime.KeepAlive(file)?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Oct 12, 2019

  1. Yes, that is safe provided that fn does not preserve its fd argument in any way.
  2. Well, no, because some people really do need to hang on to the file descriptor.
  3. I'm not sure. Assuming your program does not change os.Stdout there's nothing wrong with os.Stdout.Fd(), or in general calling Fd on a file descriptor that is stored in a global variable or otherwise preserved.
@mdempsky

This comment has been minimized.

Copy link
Member

commented Oct 12, 2019

  1. Well, no, because some people really do need to hang on to the file descriptor.

Do you have any examples of this?

For this to be safe, the user has to also keep the os.File alive. And if they already have to do that, it seems easier to just hang onto the *os.File or *syscall.RawConn, and convert it to uintptr via RawConn.Control as needed.

It's also an option to dup the FD, so they can manage its lifetime itself.

Assuming your program does not change os.Stdout there's nothing wrong with os.Stdout.Fd(), or in general calling Fd on a file descriptor that is stored in a global variable or otherwise preserved.

We have room to make a vet/lint check arbitrarily sophisticated to try to avoid false positives; but in general, I'm pretty skeptical of the ability to use it safely.

E.g., I just started a random audit of os.File.Fd call sites, and I already found that package os itself doesn't use it safely:

go/src/os/exec_posix.go

Lines 46 to 49 in 54abb5f

sysattr.Files = make([]uintptr, 0, len(attr.Files))
for _, f := range attr.Files {
sysattr.Files = append(sysattr.Files, f.Fd())
}

(Admittedly this case would be very awkward to rewrite using SyscallConn, but not impossible.)

After looking at about 100 call sites, cmd/compile is the only code that I found that uses os.File.Fd and then explicitly keeps the os.File alive. (And I'm not 100% confident that if I wrote that code today that I'd remember to have done that.)

I'd estimate I saw 10% of cases that were bad; 20% of cases that were okay (i.e., the os.File was obviously still alive after the Fd call); 20% of cases that used os.Std{in,out,err} (so also okay); but a huge chunk of remaining cases where the *os.File is dead after the Fd call (at least locally within the function).

It would be interesting to instrument os.File.Fd() with a call to runtime.GC() immediately before returning. I wouldn't be surprised if some programs start failing.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Oct 12, 2019

An example where you need the file descriptor is when you are passing a set of file descriptors to another process via syscall.UnixRights.

The use in package os that you mention is safe because the file descriptors only have to live as long as the fork, and the fork code will be using the SysProcAttr that points to the os.File values. Not that I would be opposed to adding some KeepAlive calls.

I agree that some sort of instrumentation would be interesting. It could perhaps be done via GODEBUG.

@mdempsky

This comment has been minimized.

Copy link
Member

commented Oct 12, 2019

An example where you need the file descriptor is when you are passing a set of file descriptors to another process via syscall.UnixRights.

Can't you do that inside the RawConn.Control?

the fork code will be using the SysProcAttr that points to the os.File values.

I assume you mean ProcAttr, and ProcAttr takes the files as uintptr, not os.File: https://golang.org/pkg/syscall/#ProcAttr

Am I missing something?

Edit: I don't think I'm missing anything: #34858

@beoran

This comment has been minimized.

Copy link
Author

commented Oct 12, 2019

@mdempsky

This comment has been minimized.

Copy link
Member

commented Oct 12, 2019

In situations where you need a set of fds, such as for ProcAttr, RawConn.Control is impractical because for N files you would have to call it N times it recursively from the closure.

I'll admit it's a bit tedious, but I don't think it's impractical. The recursion can be easily abstracted away behind a helper like so: (caveat: untested)

func MultiControl(conns []syscall.RawConn, f func(fds []uintptr)) error {
	if len(conns) == 0 {
		f(nil)
		return nil
	}

	var err error
	fds := make([]uintptr, len(conns))
	i := 0

	var do func(fd uintptr)
	do = func(fd uintptr) {
		fds[i] = fd
		i++
		if i == len(conns) {
			f(fds)
			return
		}
		err0 := conns[i].Control(do)
		if err == nil {
			err = err0
		}
	}
	err0 := conns[i].Control(do)
	if err == nil {
		err = err0
	}
	return err
}

However, while writing this, I did notice a somewhat thornier issue: Plan 9 doesn't support SyscallConn. It just returns EPLAN9.

@ericlagergren

This comment has been minimized.

Copy link
Contributor

commented Oct 13, 2019

To add another data point: at work we've a (rather large) package that wraps a certain DLL. Pretty much every constructed object needs to be closed in order to not leak memory, etc. As a safety feature, we've added finalizers. Just like os.File.

Most DLL functions require a handle, which is just a number. This leaves us open to the GC collecting the object out from under us if we forget to use runtime.KeepAlive. Just like os.(*File).Fd.

Even though we try to do due diligence, we've run into the issue a couple of times (thankfully just in testing). It's fairly easy to remember to use runtime.KeepAlive for os.File, because it's more or less the only part of the stdlib that you use with any frequency that has this problem.

I don't know what the valid criteria are for sniffing this out, but whenever I've run into this problem there's been two similarities:

(a) passing a uintptr (or a type whose underlying type is uintptr) to another function, and
(b) accessing that uintptr via some method call

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Oct 15, 2019

@mdempsky You're right, I was thinking of the ExtraFiles field in os/exec.Cmd. That is used to set the Files field in syscall.ProcAttr. The use of Files in os.ProcAttr does seem problematic. It seems to need a runtime.KeepAlive(attr).

@gopherbot

This comment has been minimized.

Copy link

commented Oct 15, 2019

Change https://golang.org/cl/201198 mentions this issue: os: keep attr.Files alive when calling StartProcess

gopherbot pushed a commit that referenced this issue Oct 15, 2019
Updates #34810
Fixes #34858

Change-Id: Ie934861e51eeafe8a7fd6653c4223a5f5d45efe8
Reviewed-on: https://go-review.googlesource.com/c/go/+/201198
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants
You can’t perform that action at this time.