New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: signal handling: Go assumes SA_RESTART, but does not enforce it #20400

Open
bcmills opened this Issue May 17, 2017 · 4 comments

Comments

Projects
None yet
4 participants
@bcmills
Member

bcmills commented May 17, 2017

The runtime registers all of its signal handlers with the SA_RESTART and SA_ONSTACK flags. It enforces that other handlers are registered with SA_ONSTACK, but does not enforce SA_RESTART.

Since the Go runtime does not register very many handlers when using -buildmode=c-archive and -buildmode=c-shared, other libraries that register handlers will not see an existing handler using SA_RESTART and thus will not know to propagate it. So in those programs, it is fairly likely that some handler will be registered without SA_RESTART.

Unfortunately, it appears that the Go standard library is written based on the assumption that all handlers use SA_RESTART. System calls in the standard library do not consistently check for the EINTR error and do not document its possibility. For example, many users would be surprised to learn that (*os.Cmd).CombinedOutput can return an os.SyscallError wrapping EINTR, as illustrated by the program below:

bcmills:~/src$ go version
go version devel +b53acd89db Tue May 16 17:15:11 2017 +0000 linux/amd64

eintr/eintr.go:

package main

import "C"

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"syscall"
	"time"
)

//export go_annoy
func go_annoy(sig, seconds C.int) {
	annoy(syscall.Signal(sig), time.Duration(seconds)*time.Second)
}

func annoy(sig syscall.Signal, d time.Duration) {
	runtime.LockOSThread()
	pid := syscall.Getpid()
	tid := syscall.Gettid()
	exit := make(chan bool)
	go func() {
		for {
			select {
			case <-exit:
				return
			default:
			}
			if err := syscall.Tgkill(pid, tid, sig); err != nil {
				panic(err)
			}
		}
	}()

	started := time.Now()
	for time.Since(started) < d {
		cmd := exec.Command("/bin/echo", "Are we there yet?")
		_, err := cmd.CombinedOutput()
		if err != nil {
			fmt.Fprintln(os.Stderr, "exec.Cmd error: ", err)
			os.Exit(1)
		}
	}
	exit <- true
}

func main() {}

eintr/main/eintr.c:

#include <signal.h>
#include <stddef.h>
#include <string.h>

#include "eintr.h"

static void ignore_signal(int signo, siginfo_t *info, void *context) {
}

int main() {
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
  sa.sa_sigaction = ignore_signal;
  sigaction(SIGUSR1, &sa, NULL);

  go_annoy(SIGUSR1, 10);

  return 0;
}
bcmills:~/src$ go build -buildmode=c-archive eintr
bcmills:~/src$ $(go env CC) -pthread -static -I. eintr/main/eintr.c ./eintr.a -o eintr_main
bcmills:~/src$ ./eintr_main
exec.Cmd error:  fork/exec /bin/echo: interrupted system call
bcmills:~/src$ ./eintr_main
exec.Cmd error:  waitid: interrupted system call

In typical usage, the Go portion of the program would have only the exec.Command portion of the annoy function; the signal would originate externally (either from some other process, or perhaps from some other language runtime within the process).


I can think of three ways we could deal with this issue:

  1. Document that all signal handlers in a program containing a Go runtime must be registered with SA_RESTART and treat it the same way we do for SA_ONSTACK.
  2. Consistently check for EINTR throughout the standard library and transparently retry.
  3. Document that many Go functions can return EINTR and require end-users to check for it explicitly.

I believe that (3) is strictly worse than (2): if we can't handle EINTR correctly and consistently within the runtime, we can't reasonably expect users to do so.

I am not sure whether (1) is feasible. In particular, it might prevent the use of the Go runtime with many other languages (such languages that execute on a JVM).

That leaves us with (2). I am not familiar enough with syscall usage in the standard library to evaluate whether it is feasible.

(CC: @ianlancetaylor @mdempsky )

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented May 18, 2017

Testing for EINTR in the standard library is feasible, it's just annoying and hard to verify for correctness. There is a limited number of syscalls that can return EINTR. We would have to track down each one of them and one way or another add an EINTR loop.

On GNU/Linux systems, the list is read, readv, write, writev, ioctl, open, wait, wait3, wait4, waitid, waitpid, accept, connect, recv, recvfrom, recvmsg, send, sendto, sendmsg, flock, fcntl(F_SETLKW), futex, sem_wait, sem_timedwait, and the message queue functions.

We made a deliberate decision early on to not check for EINTR, but to instead assume that all modern Unix systems implemented SA_RESTART correctly. This turned out to be only approximately true:

But, close enough.

One possibility would be to modify the syscall functions themselves to loop on EINTR. But I don't think we can realistically do that as functions like recv intentionally return EINTR if a timeout has been set on the socket using setsockopt(SO_RCVTIMEO). So we don't want to always loop on EINTR for calls to syscall.Recv, even if we could plausibly always loop on EINTR for calls to UDPConn.ReadFrom. Also, of course, it seems a shame to introduce a loop in the syscall package when for many of the relevant functions we already have a loop on EAGAIN (in the internal/poll package) which could trivially check for EINTR.

Unfortunately I agree with you that checking for EINTR in the standard library is the only realistic approach.

@bcmills

This comment has been minimized.

Member

bcmills commented May 18, 2017

Even though this is arguably a bug, it seems too invasive to try to do during the 1.9 freeze (and I'm not sure who would have the time to do it anyway). If nobody else gets to it by the time the 1.10 window opens, I'll see what I can do then.

@mdempsky

This comment has been minimized.

Member

mdempsky commented May 23, 2017

But I don't think we can realistically do that as functions like recv intentionally return EINTR if a timeout has been set on the socket using setsockopt(SO_RCVTIMEO).

POSIX says SO_RCVTIMEO causes it to return EAGAIN or EWOULDBLOCK. Linux man pages also mention EINPROGRESS. I don't see any mention of EINTR.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented May 23, 2017

If a socket has a timeout set by SO_RCVTIMEO, and the recv times out, then it will return EAGAIN or EWOULDBLOCK. But if a socket has a timeout set, and the thread that called recv receives a signal, then the recv will return EINTR. The reason I called out this case is that if a recv with a timeout set is interrupted by a signal, then it returns EINTR whether or not the signal handler was installed using SA_RESTART. So changing syscall.Recv to loop on EINTR would be a change in behavior even for existing Go programs that use SA_RESTART as expected.

But thinking about it more I'm not sure this is worth worrying about. Go programs can not predict that thread that will receive a signal, so they can not predict whether syscall.Recv will return EINTR or not. So, never mind.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment