runtime, syscall: occasional syscall failures with EINTR within ZMQ when using Go1.14beta1 but not Go1.11 #36281

Closed
interviewQ opened this issue Dec 26, 2019 · 7 comments

@interviewQ interviewQ commented Dec 26, 2019

What version of Go are you using (go version)?

1.14beta1

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

Ubuntu 18.04

What did you do?

I am trying to upgrade our Go toolchain from 1.11 to 1.14. While trying out 1.14beta1 with our app, I occasionally see system calls within zmq fail with EINTR.

What did you expect to see?

Our app runs fine with Go 1.11. I expect it to run fine with Go 1.14beta1 as well.

What did you see instead?

The app makes several zmq calls to talk to various nodes in a cluster. After upgrading to 1.14beta1, I occasionally see system calls within zmq fail with EINTR. This error usually means the system call was interrupted by a signal.

Member

@odeke-em odeke-em commented Dec 26, 2019

Thank you for the report @interviewQ and welcome to the Go project!

It would be great if you could put together a minimal repro that anyone can run while we investigate what's going on. That could involve isolating code sections based on stack traces; core dumps would be useful too.

In the meantime I'll cc some runtime, signals and syscall folks so they are aware of this issue: @ianlancetaylor @randall77 @mknyszek @aclements

@odeke-em changed the title from "Occassionally system calls within zmq fail with EINTR" to "runtime, syscall: occassionally system calls within zmq fail with EINTR when using Go1.14beta1 but not Go1.11" Dec 26, 2019
@odeke-em changed the title from "runtime, syscall: occassionally system calls within zmq fail with EINTR when using Go1.14beta1 but not Go1.11" to "runtime, syscall: occassional system call failures with EINTR within ZMQ when using Go1.14beta1 but not Go1.11" Dec 26, 2019
@odeke-em changed the title from "runtime, syscall: occassional system call failures with EINTR within ZMQ when using Go1.14beta1 but not Go1.11" to "runtime, syscall: occasional system call failures with EINTR within ZMQ when using Go1.14beta1 but not Go1.11" Dec 26, 2019
@odeke-em changed the title from "runtime, syscall: occasional system call failures with EINTR within ZMQ when using Go1.14beta1 but not Go1.11" to "runtime, syscall: occasional syscall failures with EINTR within ZMQ when using Go1.14beta1 but not Go1.11" Dec 26, 2019

@networkimprov networkimprov commented Dec 26, 2019

I suspect a bug in the zmq binding (goczmq or zmq4), which should check for EINTR on error and retry the call. That error is documented at http://api.zeromq.org/master:zmq-msg-recv

Go 1.14 uses signals to preempt goroutines for scheduling.


@gopherbot gopherbot commented Dec 26, 2019

Change https://golang.org/cl/212657 mentions this issue: doc/go1.14: mention increased number of EINTR errors

Contributor

@ianlancetaylor ianlancetaylor commented Dec 27, 2019

I agree with @networkimprov that this is likely because Go 1.14 will use more signals, and I just sent a CL to add that to the release notes.

Can you point to some specific code that is failing with EINTR so we can verify that this is likely the problem? Thanks.

Author

@interviewQ interviewQ commented Dec 27, 2019

We are using https://github.com/pebbe/zmq4. I occasionally get an EINTR error from the following system calls: zmq_poll, zmq_msg_recv, zmq_send. As recommended above, I changed our code to retry on EINTR. With this change, the issue I was seeing is resolved.

The owner of https://github.com/pebbe/zmq4 should probably be notified of this.


@networkimprov networkimprov commented Dec 27, 2019

Discussed here, with solution for callers: pebbe/zmq4#17

gopherbot pushed a commit that referenced this issue Dec 27, 2019
Updates #36281

Change-Id: I3c4487caaf47566212dc62322b2e884e695ea7f1
Reviewed-on: https://go-review.googlesource.com/c/go/+/212657
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Contributor

@ianlancetaylor ianlancetaylor commented Dec 27, 2019

zmq_poll, zmq_msg_recv, and zmq_send are not system calls. But it's true that poll, as well as recvmsg and sendmsg on a socket that has a timeout, will fail with EINTR on receipt of a signal, even if the signal handler has been installed with SA_RESTART. So I think this was always a potential problem with this package, but that problem has become much more likely to occur in Go 1.14.

It seems to me that there is nothing to change in Go itself here, so I am going to close this issue. Please comment if you disagree.
