-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
os: write should handle EINTR #38033
Comments
Am I reading this correctly in that you are writing to files opened under |
@kolyshkin, does that imply that the Linux kernel is out-of-spec for those (But we should arguably handle |
Marking as release-blocker for 1.15. Once we decide what (if anything) to do here, we may want to backport to 1.14. |
@bcmills glibc does not retry |
I'm afraid so, yes. Seems like a regression, too, from where I stand. The alternative is to document that go1.14 may fail with EINTR from the standard library, not just |
For my own edification, can you explain when code would write to |
Sorry I haven't mentioned it earlier, this is actually cgroupfs (used to set various per-cgroup controller limits). In particular,
In our case, it happens during a container start (a "container" is Docker, Kubernetes or something similar). We can certainly workaround it in runc. I'm just not sure what other Linux pseudo-filesystems can behave like that, and so what other similar workarounds are needed. |
In my experience with weird kernel FSes like this, you can experience all sorts of weirdness, but all of it falls within the documented bounds of the relevant syscalls. |
@thockin In this case the reported behavior seems clearly not permitted. The signal handler is installed with |
The test code above doesn't trap any signals. If there's a signal being delivered - what signal? And if it is being trapped - who is installing that handler? |
And importantly - why does this appear in go-1.14 and not go-1.12 (which is what I had available)? I dug around /proc looking for some clue on the differences between the 1.14 compiled binary and the 1.12 binary. Nothing obvious. Both show the same Sig* fields in /proc/pid/status:
And yet the 1.14 fails reliably and 1.12 passes reliably. |
Throwing a signal handler into the test shows that 1.14 experiences SIGURG and 1.2 does not. The existence of the generic handler does not change the failure mode. Commenting out the bulk of the test, the SIGURG does not get triggered. Adding back just the |
@thockin The Go 1.14 runtime delivers |
The signal handler--all Go signal handlers--is installed by the Go runtime when a Go program starts. |
Well, I am seeing a SIGURG perfectly with syscall.Open() on 1.14 when I do not see that signal on 1.12. I don't know if that is a red herring, but the correlation is there. Reducing the test case, just to get a handle on SIGURG:
Again, maybe SIGURG is not the issue, but it is the most obvious visible difference between 12 and 14, and the error is related to signals... |
I am only pursuing this for fun, and I have run out of time for it right now. It's clearly reproducible, and it MAY be a problem in cgroupfs, but it didn't happen in Go 1.12 and it does in Go 1.14. Happy hunting. |
The What is weird is that it interrupts the |
Just to be clear, it's expected that my user program would see SIGURG ? If so, OK. It's not clear what is interrupting write() - I only see that SIGURG once, and importantly - no other signals. It's also clear that the signal handler is not happening synchronously to the main execution - putting a sleep in there, my test still dies almost immediately.
|
As of the Go 1.14 release, the Go runtime uses To me the important point here is that the Go runtime installs the I ran your test under
so the
so the signal is indeed interrupting the |
If I repeatedly send SIGURG to the go-1.12 binary, I have not been able to reproduce the problem. I can see in kernel code where cgroups' memory code returns -EINTR, but I can also see arch/x86 code that handles SA_RESTART. Weird |
@thockin, the runtime sends thread-targeted signals. Perhaps you are testing by sending a process-targeted |
Indeed! Sending a SIGURG to each thread of the process does
eventually trigger the failure.
I'm not sure what to do with that - I have no idea what the difference
is within the kernel that would be causing SA_RESTART to be ignored.
That is much more plausibly a kernel bug, though.
…On Wed, Mar 25, 2020 at 8:57 PM Bryan C. Mills ***@***.***> wrote:
@thockin, the runtime sends thread-targeted signals. Perhaps you are testing by sending a process-targeted SIGURG instead? The difference may be relevant.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
Change https://golang.org/cl/232862 mentions this issue: |
Historically we've assumed that we can install all signal handlers with the SA_RESTART flag set, and let the system restart slow functions if a signal is received. Therefore, we don't have to worry about EINTR. This is only partially true, and we've added EINTR checks already for connect, and open/read on Darwin, and sendfile on Solaris. Other cases have turned up in golang#36644, golang#38033, and golang#38836. Also, golang#20400 points out that when Go code is included in a C program, the C program may install its own signal handlers without SA_RESTART. In that case, Go code will see EINTR no matter what it does. So, go ahead and check for EINTR. We don't check in the syscall package; people using syscalls directly may want to check for EINTR themselves. But we do check for EINTR in the higher level APIs in os and net, and retry the system call if we see it. This change looks safe, but of course we may be missing some cases where we need to check for EINTR. As such cases turn up, we can add tests to runtime/testdata/testprogcgo/eintr.go, and fix the code. If there are any such cases, their handling after this change will be no worse than it is today. For golang#22838 Fixes golang#20400 Fixes golang#36644 Fixes golang#38033 Fixes golang#38836 Change-Id: I7e46ca8cafed0429c7a2386cc9edc9d9d47a6896 Reviewed-on: https://go-review.googlesource.com/c/go/+/232862 Run-TryBot: Ian Lance Taylor <iant@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Bryan C. Mills <bcmills@google.com> (cherry picked from commit 8c1db77)
After deploying a version of gvisor built with Go 1.14, we're seeing errors setting up cgroups (we manually run `runsc` via `runsc run`, which creates the cgroup). This turns out to be a known issue with Go: golang/go#38033. Given that the [fix won't be backported](golang/go#39026 (comment)), we should retry writes that may fail with EINTR. This is also what runc does: opencontainers/runc#2258 FUTURE_COPYBARA_INTEGRATE_REVIEW=#3102 from stripe:andrew/cgroup-eintr 079123b PiperOrigin-RevId: 320183771
After deploying a version of gvisor built with Go 1.14, we're seeing errors setting up cgroups (we manually run `runsc` via `runsc run`, which creates the cgroup). This turns out to be a known issue with Go: golang/go#38033. Given that the [fix won't be backported](golang/go#39026 (comment)), we should retry writes that may fail with EINTR. This is also what runc does: opencontainers/runc#2258 FUTURE_COPYBARA_INTEGRATE_REVIEW=#3102 from stripe:andrew/cgroup-eintr 079123b PiperOrigin-RevId: 323575152
Now that Go [1.15](Go://go.googlesource.com/go/+/refs/tags/go1.15) has been released, we should update the default Go version to 1.15. Since the [EINTR issue](golang/go#38033) was fixed in 1.15, we can simply move from 1.13 to 1.15. On the other hand, we should not add 1.14, as the [EINTR bug fix](https://go-review.googlesource.com/c/go/+/232862/) was not backported to 1.14.
Now that Go [1.15](Go://go.googlesource.com/go/+/refs/tags/go1.15) has been released, we should update the default Go version to 1.15. Since the [EINTR issue](golang/go#38033) was fixed in 1.15, we can simply move from 1.13 to 1.15. On the other hand, we should not add 1.14, as the [EINTR bug fix](https://go-review.googlesource.com/c/go/+/232862/) was not backported to 1.14.
Now that Go [1.15](Go://go.googlesource.com/go/+/refs/tags/go1.15) has been released, we should update the default Go version to 1.15. Since the [EINTR issue](golang/go#38033) was fixed in 1.15, we can simply move from 1.13 to 1.15. On the other hand, we should not add 1.14, as the [EINTR bug fix](https://go-review.googlesource.com/c/go/+/232862/) was not backported to 1.14.
Wrap some syscalls (lgetattr, lsetattr, fstatfs, statfs) to retry on EINTR. Related: - containers/storage#757 - https://go-review.googlesource.com/c/go/+/249178 - golang/go#38033 Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Wrap some syscalls (lgetattr, lsetattr, fstatfs, statfs) to retry on EINTR. Related: - containers/storage#757 - https://go-review.googlesource.com/c/go/+/249178 - golang/go#38033 Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Go 1.14 release notes (https://golang.org/doc/go1.14#runtime) say
This gives me an impression that unless I'm using
syscall
orgolang.org/x/sys/unix
directly, my code is fine.From the signal(7) man page on Linux:
Assuming that golang runtime always sets signal handlers with
SA_RESTART
, taken all the above into account, it seems right to conclude that using a high-level write function such asioutils.WriteFile
oros.Write
should be fine.Turns out it's not.
Originally reported in opencontainers/runc#2258
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
The code below should be compiled (
go test -c
) and run as root on a modern Linux system.What did you expect to see?
(the test passes with go 1.13.6 and probably all earlier versions)
What did you see instead?
The text was updated successfully, but these errors were encountered: