Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: OpenFile may return EINTR on OS X #11180

Closed
jacobsa opened this issue Jun 12, 2015 · 10 comments
Closed

os: OpenFile may return EINTR on OS X #11180

jacobsa opened this issue Jun 12, 2015 · 10 comments
Milestone

Comments

@jacobsa
Copy link
Contributor

@jacobsa jacobsa commented Jun 12, 2015

runtime.setsig supports calling sigaction(2) with the SA_RESTART flag, which on OS X causes several system calls to be restarted when they would otherwise return EINTR. In particular, open(2) is one of these, but only for "slow devices" and not regular files. I don't have direct evidence, but I believe this is because on OS X an open of a regular file can't usually block in a way that can be interrupted.

I see that SA_RESTART is generally used for Go signal handlers: the default handlers all use it, and sigenable sets it. (The only exceptions appear to be SIGPIPE and SIGABRT). From this I conclude that it's intended that os.OpenFile not leak EINTR out to the caller. I've never seen a retry loop around os.OpenFile, so this would make sense.

However, it seems on OS X open(2) of a file can be interrupted if the file is on a fuse file system. This came up for me when using os/exec.Command in parallel with opening a file backed by fuse. When the command completes the thread calling open(2) would sometimes receive SIGCHLD, causing errors like this:

open /var/folders/00/0mb18000h01000cxqpysvccm002jc5/T/fs_test474101889/foo: interrupted system call

Am I correct in assuming that os.OpenFile is not intended to leak EINTR so that users don't need to call in a loop? If so, should it be modified to contain an internal retry loop?

@jacobsa

This comment has been minimized.

Copy link
Contributor Author

@jacobsa jacobsa commented Jun 12, 2015

I can reliably reproduce this on a recent Mac Pro running OS X 10.10.3 using a fuse file system that delays open(2) by sleeping and a program that opens files and runs sub-processes concurrently. Instructions:

  1. Install osxfuse.
  2. go get github.com/jacobsa/golang-issue-11180/slowfs
  3. go get github.com/jacobsa/golang-issue-11180/do_opens
  4. slowfs
  5. (In another terminal:) GOMAXPROCS=16 do_opens --dir <dir printed by slowfs>

This always results in output like:

11:25:15.334740 main.go:25: My PID: 19604
11:25:16.359473 main.go:43: open /var/folders/00/0mb18000h01000cxqpysvccm002jc5/T/slowfs424454822/foo: interrupted system call
@davecheney

This comment has been minimized.

Copy link
Contributor

@davecheney davecheney commented Jun 12, 2015

@jacobsa this sounds like a bug in fuse, Go is doing the right thing.

@jacobsa

This comment has been minimized.

Copy link
Contributor Author

@jacobsa jacobsa commented Jun 12, 2015

@davecheney: Could you expand on your reasoning?

I filed this bug thinking I'd probably hear that answer, but I would like to know why. The sigaction(2) man page doesn't explicitly guarantee that open(2) won't return EINTR. If os.OpenFile is indeed intended to never return that error, it can't rely on such a guarantee, right?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jun 12, 2015

If a signal is installed with SA_RESTART, as is always true of signals installed by the Go runtime, then only a limited set of calls should ever return EINTR. On a GNU/Linux system the complete set can be found in signal(7). I don't know about Darwin. open is not on the list for GNU/Linux. Can you find the list for Darwin? On GNU/Linux it would be a clear bug if open ever returns EINTR for a signal whose handler is installed with SA_RESTART.

We do not want to sprinkle EINTR loops through the os and net packages unless they are documented as being required. I think we would prefer not to change the core Go libraries to work around bugs in FUSE.

@davecheney

This comment has been minimized.

Copy link
Contributor

@davecheney davecheney commented Jun 12, 2015

I wonder if the inverse is true. Aaron, can you reproduce the issue under
linux and fuse ?

On Fri, Jun 12, 2015 at 4:16 PM, Ian Lance Taylor notifications@github.com
wrote:

If a signal is installed with SA_RESTART, as is always true of signals
installed by the Go runtime, then only a limited set of calls should ever
return EINTR. On a GNU/Linux system the complete set can be found in
signal(7). I don't know about Darwin. open is not on the list for
GNU/Linux. Can you find the list for Darwin? On GNU/Linux it would be a
clear bug if open ever returns EINTR for a signal whose handler is
installed with SA_RESTART.

We do not want to sprinkle EINTR loops through the os and net packages
unless they are documented as being required. I think we would prefer not
to change the core Go libraries to work around bugs in FUSE.


Reply to this email directly or view it on GitHub
#11180 (comment).

@jacobsa

This comment has been minimized.

Copy link
Contributor Author

@jacobsa jacobsa commented Jun 14, 2015

@ianlancetaylor: The OS X sigaction(2) man page says this:

Restart of pending calls is requested by setting the SA_RESTART bit in
sa_flags. The affected system calls include open(2), read(2), write(2),
sendto(2), recvfrom(2), sendmsg(2) and recvmsg(2) on a communications channel
or a slow device (such as a terminal, but not a regular file) and during a
wait(2) or ioctl(2).

The sentence structure is ambiguous, but I read that as saying that open(2) is not guaranteed to be restarted for regular files. Note that the Linux man page also unambiguously says that open(2) is only restarted when "it can block". Technically that is true of fuse files, but it is not true of regular files in general and so I wouldn't be too surprised if it didn't restart for any regular file.

@davecheney: I can't reproduce it on Linux with my minimized test case. The larger program where I found this issue deadlocks on Linux due to issue #11155, so I'm not sure about that one.

@minux

This comment has been minimized.

Copy link
Member

@minux minux commented Jun 15, 2015

@harshavardhana

This comment has been minimized.

Copy link
Contributor

@harshavardhana harshavardhana commented Jul 8, 2015

EINTR, EAGAIN and EBUSY are special cases in POSIX, interestingly defined vaguely and implemented vaguely from filesystem to filesystems, and of course operating systems to operating systems.

OSXFUSE sending EINTR is perfect in accordance with POSIX, it is the client application which is layered on top of OSXFUSE perhaps your own filesystem built out of OSXFUSE which could potentially handle this at POSIX level and perhaps even want to make sense of it - for simple example https://github.com/bazil/fuse/blob/master/fuse.go#L522

There are other good examples too, GlusterFS which is built on top of FUSE for example handles EINTR perfectly fine for all network operations, sockets, polling etc.

While writing this i stumbled upon python PEP 0475 - they thought convenient API's on top of regular syscall's should be python's responsibility - for example here http://bugs.python.org/issue23285, https://www.python.org/dev/peps/pep-0475/ - so python3.5 handles this automatically.

Potentially it can be handled gracefully to remove some burden perhaps of switch casing through os.PathError* - but given a choice personally as a user of os.OpenFile(), i would prefer it returns the EINTR up the chain and let me handle it.

Can be debated further .... :-)

@ianlancetaylor ianlancetaylor added this to the Go1.6 milestone Jul 11, 2015
@jacobsa

This comment has been minimized.

Copy link
Contributor Author

@jacobsa jacobsa commented Aug 13, 2015

@harshavardhana, addressing this:

OSXFUSE sending EINTR is perfect in accordance with POSIX, it is the client application which is layered on top of OSXFUSE perhaps your own filesystem built out of OSXFUSE which could potentially handle this at POSIX level and perhaps even want to make sense of it

To be clear, this is osxfuse returning EINTR to the user of the file system (i.e. a program opening a file in the directory at which the file system is mounted), not osxfuse returning EINTR when the file system server is reading from /dev/fuse. So there's no opportunity for my code to fix it; instead I'd have to update any Go program that may open a file. (Which is why I've filed this bug, because updating package os would have that effect.)

Thanks for pointing out PEP 0475; I agree with the logic and conclusion there. I do think it's funny that they call out Go as a language whose standard library retries on EINTR, though. :-)

I will file an issue with osxfuse about this to see what the author thinks. But I still do think that Go should handle this, given existing versions of osxfuse and given that the man page quoted above does not appear to exclude the possibility of open returning EINTR for a file.

jacobsa added a commit to jacobsa/go that referenced this issue Sep 10, 2015
The man page for sigaction(2) doesn't guarantee that SA_RESTART will
work for open(2) on regular files:

    The affected system calls include open(2), read(2), write(2),
    sendto(2), recvfrom(2), sendmsg(2) and recvmsg(2) on a
    communications channel or a slow device (such as a terminal, but not
    a regular file) and during a wait(2) or ioctl(2).

I've never observed EINTR from open(2) for a traditional file system
such as HFS+, but it's easy to observe with a fuse file system that is
slightly slow (cf. https://goo.gl/UxsVgB). After this change, the
problem can no longer be reproduced when calling os.OpenFile.

Fixes golang#11180.

Change-Id: I967247430e20a7d29a285b3d76bf3498dc4773db
@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Sep 11, 2015

CL https://golang.org/cl/14484 mentions this issue.

@golang golang locked and limited conversation to collaborators Sep 22, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants
You can’t perform that action at this time.