Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

child_process descriptor passing doesn't pause underlying handle properly. #7905

Closed
samcday opened this issue Jul 7, 2014 · 6 comments
Closed

Comments

@samcday
Copy link

samcday commented Jul 7, 2014

(affects latest 0.10 and 0.11)


TL;DR version:

  • Sockets call _handle.readStart() as soon as connection is made.
  • This means that under load, some data can be pulled from the Socket before its handle is detached from current process and sent over IPC channel to another destination.
  • This means the other end will never see that initial data that was pulled.

First, an example:

https://gist.github.com/samcday/d20f91aae0e4dbad27e8

Here's the scenario I'm trying to demonstrate:

  1. Connection is accepted by a cluster worker.
  2. Connection is passed to the master.
  3. Master reads initial data from connection, writes to the connection, and then closes it.

Run server.js in one shell, then run client.js in another. Run it a few times if you don't see the issue at first. What you should discover is that often some of the 10 connections won't write back to the client. This is because the readable event is never fired. This is because no data ever came in once the master got the connection. This is because actually what happened is the worker shovelled some of the bytes out before the connection descriptor was fully detached and sent to the master.

The example demonstrates a somewhat contrived scenario, I know you would generally not want to be sending connections off to the master to process, this is what you have workers for! My actual use-case involves a worker passing a connection to the master to be directed over to another worker.

What I've determined is that Socket._read is actually calling readStart on the underlying handle after connection. Even though this isn't putting the socket into flowing mode, it's still going to grab some bytes from the underlying socket before it gets passed to another process.

I can currently work around this issue by doing something like this:

server.on("connection", function(c) {
    c._handle.readStop();
});

This is a bit of a hack, and only works because I know that c._handle.readStart() is called in Socket._read in a once handler on Socket connection. But, unless I'm missing something, this is what child_process.js should be doing in the handleConversion stuff.

I believe this is what is causing #7784

It's midnight here and I've spent the past 3 days bashing my head against the wall 'till the wee hours of the morning figuring out a problem that was finally traced down to this. So, I apologise if the description reads a little ranty or nonsensical :P

@indutny
Copy link
Member

indutny commented Aug 27, 2014

This is indeed a bug, sorry for a long reply. cc @trevnorris @tjfontaine , could be a major issue with a RR balancing in v0.11/v0.12.

@samcday
Copy link
Author

samcday commented Aug 28, 2014

@indutny no worries. Given that it affects both 0.10 and 0.11, I'm not inclined to think that this issue can be attributed to the new round-robin descriptor in 0.11.

My gut feel is this is a fundamental issue with how either libuv or node is handling passed descriptors.

@trevnorris
Copy link

Sockets call _handle.readStart() as soon as connection is made.

@indutny Why do we do this? A possible, hack-ish, solution would be to place a conditional check that runs readStart() in a process.nextTick() to see whether the process still has control of the handle or not. That way if cluster passes the handle and then removes the reference then readStart() will not run. Thoughts?

@cjihrig
Copy link

cjihrig commented Oct 8, 2014

@trevnorris what if Socket had methods to wrap the handle's readStart() and readStop()? You could pass an option to net.Server() which would create stopped sockets. You could do whatever with the socket, then start it once you're ready. The existing behavior seems to be fine in most common cases, while this proposed flag could be used when you want to pass the connection around before reading data from it.

@samcday
Copy link
Author

samcday commented Oct 22, 2014

@maxime-crunding not sure what we're specifically looking at in there that is relevant to this issue.

socketcluster is not attempting to pass connection file descriptors around via cluster messaging.

trevnorris pushed a commit that referenced this issue Oct 27, 2014
Currently when a server receives a new connection the underlying socket
handle begins reading data immediately. This causes problems when
sockets are passed between processes, as data can be read by the first
process and thus never read by the second process.

This commit allows sockets that are constructed with a handle to be
paused initially.

PR-URL: #8576
Fixes: #7905
Fixes: #7784
Reviewed-by: Trevor Norris <trev.norris@gmail.com>
@trevnorris
Copy link

Fixed by c2b4f48.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants