Race between Send()/Recv() and Close() in push/pull #185

Open
tjim opened this issue Feb 26, 2020 · 4 comments

tjim commented Feb 26, 2020

I am using a pull socket with multiple pushers. One pusher sends a message and immediately closes its socket. On the pull side, Recv() never returns the sent message.

When I observe the network traffic I can see that the pull-side network stack does receive the message before the TCP close.

If I introduce a sufficient delay on the push side before closing, then the message does come back as the result of the Recv(). The delay needs to be greater when the sender is closer, latency-wise, to the receiver. In my current deployment I am sleeping for 1s.

Because this is a pull socket with multiple pushers, Recv() never returns an error; it continues to return messages sent by other pushers. It's just that the one message is never Recv()'d, so this case is hard to detect and debug.

I'm not even sure whether this is a bug; in any case, I would appreciate best-practices documentation.
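
For reference, a minimal, self-contained sketch of the scenario described above, assuming the mangos v3 import paths and an arbitrary local TCP address (adjust for v2 as needed); the commented-out sleep is the workaround mentioned above:

```go
package main

import (
	"fmt"
	"time"

	"go.nanomsg.org/mangos/v3"
	"go.nanomsg.org/mangos/v3/protocol/pull"
	"go.nanomsg.org/mangos/v3/protocol/push"
	_ "go.nanomsg.org/mangos/v3/transport/tcp" // register the TCP transport
)

func main() {
	addr := "tcp://127.0.0.1:40899"

	puller, err := pull.NewSocket()
	if err != nil {
		panic(err)
	}
	defer puller.Close()
	if err := puller.Listen(addr); err != nil {
		panic(err)
	}

	go func() {
		pusher, err := push.NewSocket()
		if err != nil {
			panic(err)
		}
		if err := pusher.Dial(addr); err != nil {
			panic(err)
		}
		if err := pusher.Send([]byte("hello")); err != nil {
			panic(err)
		}
		// Closing immediately after Send() is where the message goes
		// missing; uncommenting the sleep (the workaround described
		// above) makes it arrive reliably.
		// time.Sleep(time.Second)
		pusher.Close()
	}()

	// With the immediate Close() above, this Recv() may never see the
	// pushed message and instead times out.
	puller.SetOption(mangos.OptionRecvDeadline, 5*time.Second)
	msg, err := puller.Recv()
	fmt.Println(string(msg), err)
}
```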

I should remark on this comment in issue #111:

So there is a bug here, which is that the send program is simply exiting too quickly, before the message goes out over the wire.

Here I have looked at the network traffic and verified that the message does go out over the wire, so that comment does not apply here. However, it may be that #111 is the same issue I am reporting here; I don't know whether the network traffic was examined in #111 as I have done here. My issue is definitely manifesting on the receive side.

gdamore commented Feb 26, 2020

I think this does sound like #111. Seeing the traffic over the wire doesn't truly settle it, because a hard close of the connection (via RST) can cause that data to get lost.

It may be that we need to take care to shutdown the connection more gracefully.

I've been contemplating wire protocol changes to provide a separate close indication so that we know when it's safe to close the actual connection. However, that isn't going to come in the immediate future, as it would break compatibility with existing implementations.

Short-lived connections are really a weak point in all of the nanomsg protocols.
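
Until such a wire-level close indication exists, a "safe to close" signal can be approximated at the application layer. The sketch below is not anything mangos provides: it assumes the puller also binds a rep socket and replies "ok" to a message ID only after it has actually Recv()'d the corresponding payload; the function name, arguments, and ack payloads are all hypothetical.

```go
package workaround

import (
	"time"

	"go.nanomsg.org/mangos/v3/protocol/push"
	"go.nanomsg.org/mangos/v3/protocol/req"
	_ "go.nanomsg.org/mangos/v3/transport/tcp" // register the TCP transport
)

// sendFinalAndClose pushes a final payload, then polls the puller's
// (assumed) rep endpoint until the puller confirms it has received the
// message identified by id, and only then closes the push socket.
func sendFinalAndClose(pushAddr, ackAddr, id string, payload []byte) error {
	pusher, err := push.NewSocket()
	if err != nil {
		return err
	}
	if err := pusher.Dial(pushAddr); err != nil {
		return err
	}
	if err := pusher.Send(payload); err != nil {
		return err
	}

	acker, err := req.NewSocket()
	if err != nil {
		return err
	}
	defer acker.Close()
	if err := acker.Dial(ackAddr); err != nil {
		return err
	}
	for {
		if err := acker.Send([]byte(id)); err != nil {
			return err
		}
		reply, err := acker.Recv()
		if err != nil {
			return err
		}
		if string(reply) == "ok" {
			break // the puller has the message; closing is now safe
		}
		time.Sleep(50 * time.Millisecond)
	}
	return pusher.Close()
}
```

Note that the acknowledgement travels over a separate connection, so this only helps if the puller replies after the message has been delivered to the application, not merely buffered somewhere along the way.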

tjim commented Feb 27, 2020

This is a normal close with no RST; it is FIN/ACK as usual. So I think we can say the problem is restricted to the receiving side, and I don't think your comment on #111 applies. #111 might be the same issue, but that would require more investigation. In other words, #111 might also have been a normal close, as here, in which case your comment does not actually apply to #111 either.

Short-lived connections are a bit of a red herring. This problem can come up in a long-lived connection as well. The issue is "not enough" time between the last message sent and the close.

gdamore commented Feb 27, 2020

Yes, well, technically you're correct that the problem is not enough time between the last send and the close. What is needed is formally referred to as "lingering", but we lack that, and the wire protocols lack the necessary bits to do this properly and portably.
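
In practice, "lingering" can today only be approximated in the application; a hypothetical helper (the name and the idea of a fixed, deployment-tuned delay are assumptions, echoing the workaround earlier in the thread):

```go
package workaround

import (
	"time"

	"go.nanomsg.org/mangos/v3"
)

// closeWithLinger is a hypothetical helper: mangos has no linger option,
// so the only recourse today is to wait long enough for the peer to drain
// the last message before closing the socket.
func closeWithLinger(sock mangos.Socket, linger time.Duration) error {
	time.Sleep(linger)
	return sock.Close()
}
```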

It's possible that we have a message in a receive buffer that we've taken from the wire but haven't delivered to the core part of the socket (so it could be either in a kernel buffer, which seems unlikely, or queued on a per-connection buffer in mangos itself), and that we discard it when we lose the pipe. I'll have a closer look.

gdamore commented Mar 29, 2020

I don't think I can fix this without a wire protocol change. I'm going to keep the ticket open, but don't expect anything soon. This is probably a mangos v4 deliverable. I'm considering a few different wire protocol changes, and this is just one more item on the list.
