Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: EINTR on darwin from os.Process.Wait #36644

Open
crawshaw opened this issue Jan 19, 2020 · 14 comments
Open

os: EINTR on darwin from os.Process.Wait #36644

crawshaw opened this issue Jan 19, 2020 · 14 comments
Milestone

Comments

@crawshaw
Copy link
Contributor

@crawshaw crawshaw commented Jan 19, 2020

Background: our Go code is built with -buildmode=c-archive and linked into a macOS app built by xcode. We are not installing any signal handlers, but it is possible some Apple library is doing so behind our backs. We are using the os/exec package to periodically execute netstat -an.

Occasionally, the os/exec.Cmd.Wait method is returning EINTR. This EINTR comes directly from os.Process.Wait. Unfortunately the os/exec package does not expect EINTR, and so it treats it as if the fork process has exited and cleans up its state, so that Cmd.Wait cannot be called again.

The darwin man pages wait(2) and sigaction(2) suggest this should only happen if a signal handler does not have the SA_RESTART flag set. I have not yet determined if Apple is setting another handler for us, or if this is something unusual like Issue #11180.

Even if this is a core Apple library setting a handler (we use no third-party code), it may be worth handling EINTR for darwin in os.Process.Wait, to make the object useful in Go code linked with -buildmode=c-archive.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 21, 2020

Perhaps we do have to change the os package for this, but I'd like to clearly understand the root cause first. We have not historically tried to handle all possible EINTR errors due to a non-Go signal handler that does not use SA_RESTART.

@crawshaw

This comment has been minimized.

Copy link
Contributor Author

@crawshaw crawshaw commented Jan 21, 2020

I did a little investigating. I was unable to replicate this running a normal binary, but it happens reliably when run under lldb:

$ cat eintr.go 
package main

import "os/exec"

import "C"

func main() {}

//export Netstat
func Netstat() {
	cmd := exec.Command("netstat", "-na")
	if _, err := cmd.Output(); err != nil {
		panic(err)
	}
}
$ cat eintr.c 

extern void Netstat();

int main(void) {
	Netstat();
	return 0;
}
$ go build -buildmode=c-archive eintr.go && cc eintr.c eintr.a && lldb -o run ./a.out
(lldb) target create "./a.out"
Current executable set to './a.out' (x86_64).
(lldb) run
panic: wait: interrupted system call

goroutine 17 [running, locked to thread]:
main.Netstat()
	/Users/crawshaw/eintr.go:13 +0xae
main._cgoexpwrap_078bcfb61532_Netstat()
	_cgo_gotypes.go:45 +0x20
Process 41628 exited with status = 2 (0x00000002) 

Process 41628 launched: '/Users/crawshaw/a.out' (x86_64)
(lldb) 

This is the typical way xcode runs development binaries, which is why we see it so frequently.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 21, 2020

What happens with gdb?

@crawshaw

This comment has been minimized.

Copy link
Contributor Author

@crawshaw crawshaw commented Jan 21, 2020

(It is surprisingly hard to install gdb on macOS 10.15 with codesigning.)

Results are similar:

~ $ gdb ./a.out 
GNU gdb (GDB) 8.3.1
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin19.2.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./a.out...
Loading Go Runtime support.
(gdb) start
Temporary breakpoint 1 at 0x1000843b0: file <autogenerated>, line 1.
Starting program: /Users/crawshaw/a.out 
[New Thread 0x1803 of process 75954]
[New Thread 0x1b03 of process 75954]
[New Thread 0x2603 of process 75954]
warning: unhandled dyld version (16)
[New Thread 0x180f of process 75954]
[New Thread 0x1c03 of process 75954]
[New Thread 0x1d03 of process 75954]
[New Thread 0x2303 of process 75954]
[New Thread 0x2403 of process 75954]
[New Thread 0x2503 of process 75954]
panic: wait: interrupted system call

goroutine 17 [running, locked to thread]:
main.Netstat()
	/Users/crawshaw/eintr.go:13 +0xae
main._cgoexpwrap_078bcfb61532_Netstat()
	_cgo_gotypes.go:45 +0x20
[Inferior 1 (process 75954) exited with code 02]
(gdb) 
@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 21, 2020

Thanks. When I look at that gdb output the first thing that strikes me is: what signal is it? gdb normally reports every signal that a program receives (I don't know about lldb). Why is that not being reported here? When you run info handle, which signals are marked as No under Print?

@crawshaw

This comment has been minimized.

Copy link
Contributor Author

@crawshaw crawshaw commented Jan 21, 2020

Not printed, according to info handle: SIGALRM, SIGURG, SIGCHLD, SIGIO, SIGVTALRM, SIGPROF, SIGWINCH, SIGPOLL, SIGWAITING, SIGLWP, SIGPRIO, SIGCANCEL, SIGLIBRT.

I tried setting handle SIGCHLD print and got:

(gdb) handle SIGCHLD print
Signal        Stop	Print	Pass to program	Description
SIGCHLD       No	Yes	Yes		Child status changed
(gdb) run
Starting program: /Users/crawshaw/a.out 
[New Thread 0x2703 of process 77262]
[New Thread 0x2503 of process 77262]
[New Thread 0x1a03 of process 77262]
warning: unhandled dyld version (16)
[New Thread 0x1b03 of process 77262]
[New Thread 0x1c03 of process 77262]
[New Thread 0x1d03 of process 77262]
[New Thread 0x2303 of process 77262]
[New Thread 0x2403 of process 77262]
[New Thread 0x270f of process 77262]

Thread 3 received signal SIGCHLD, Child status changed.
panic: wait: interrupted system call

goroutine 17 [running, locked to thread]:
main.Netstat()
	/Users/crawshaw/eintr.go:13 +0xae
main._cgoexpwrap_078bcfb61532_Netstat()
	_cgo_gotypes.go:45 +0x20
[Inferior 1 (process 77262) exited with code 02]
(gdb) 

(The debugger freezes on run if I attempt to set all signals to print.)

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 21, 2020

Thanks. So it sounds like something is installing a SIGCHLD signal handler without setting SA_RESTART. Any idea what that is?

I'm trying to drill down on this because if that is what is happening, and we can't avoid it, then we need to add an EINTR loop to a dozen or more places in the standard library. You are just seeing it in wait because that is where your program is spending its time. We will still see the problem with lower frequency in read, write, sendmsg, etc.

@crawshaw

This comment has been minimized.

Copy link
Contributor Author

@crawshaw crawshaw commented Jan 21, 2020

I only see this behavior under lldb and gdb.

Googling around, here is a reference claiming that gdb installs a SIGCHLD handler on macOS: https://jmmv.dev/2008/10/boostprocess-and-sigchld.html (I notice Julio Merino is a Google employee, maybe jmmv@ will know more?)

Grepping through the GDB sources I don't see any macos-specific code for this, but I am not familiar with the codebase.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 21, 2020

I wonder if this is a side effect of using ptrace.

What happens if your Go c-archive does

    os.Notify(make(chan syscall.Signal, 1), syscall.SIGCHLD)

?

@crawshaw

This comment has been minimized.

Copy link
Contributor Author

@crawshaw crawshaw commented Jan 21, 2020

Calling signal.Notify captures the signal and lets the program exit cleanly.

@networkimprov

This comment has been minimized.

Copy link

@networkimprov networkimprov commented Jan 22, 2020

cc @jmmv since @crawshaw meant to

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 22, 2020

At least there is a workaround.

I don't know what else we can do short of adding EINTR loops throughout.

@bcmills

This comment has been minimized.

Copy link
Member

@bcmills bcmills commented Jan 22, 2020

I don't know what else we can do short of adding EINTR loops throughout.

That is #20400. There, you said (#20400 (comment)):

Unfortunately I agree with you that checking for EINTR in the standard library is the only realistic approach.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 22, 2020

The change can be simultaneously the only realistic approach and yet be undesirable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.