
Add the ability to wire-up listeners before starting a child process #38081

Closed
TheYarin opened this issue Apr 4, 2021 · 6 comments
Labels
child_process Issues and PRs related to the child_process subsystem. feature request Issues that request new features to be added to Node.js.

Comments

TheYarin commented Apr 4, 2021

Is your feature request related to a problem? Please describe.
When registering multiple listeners (callbacks) for the 'data' event of a child process's stdout, there's no way to make the child process wait until all the callbacks are registered before it starts. This means there is a window between registering the first listener and registering the second one in which the first listener might "pull" the first available chunk, so that by the time the second listener is registered, it never receives that chunk.

A thinned-down example:

// Expected behaviour scenario
const { exec } = require("child_process");
const p = exec("seq 1000"); // This command prints the numbers from 1 to 1000, each on its own line
let a1 = '';
let a2 = '';
p.stdout.on("data", (d) => a1 += d);
p.stdout.on("data", (d) => a2 += d);
// When the child process completes, a1 and a2 will both contain all the numbers from 1 to 1000

// Edge-case scenario
const { exec } = require("child_process");
const p = exec("seq 1000");
let a1 = '';
let a2 = '';
p.stdout.on("data", (d) => a1 += d);
setTimeout(() => {
  p.stdout.on("data", (d) => a2 += d);
}, 500);
// When the child process completes, a1 will contain all the numbers from 1 to 1000 while a2 will remain an empty string

From what I understand from reading the child_process documentation, when a child process is started, Node.js saves its output in a buffer until a listener is registered (either by binding directly to the 'data' event or by pipe()ing stdout to a writable stream). This behaviour creates two potential problems:

  1. A second listener might not get the same data as the first one.
  2. The child process might output more data than the buffer can contain before any data can be processed.

Describe the solution you'd like
The solution I propose is to allow wiring up all the listeners and pipes before starting the child process.
For backwards compatibility, I imagine the best way to achieve this is a new option (something like autostart, defaulting to true) in the options parameter of spawn, exec, etc. When disabled, those functions would return a ChildProcess instance that has not yet started, paired with a new start() method added to the ChildProcess class.

Describe alternatives you've considered
The alternatives as I see them are:

  1. Only register a single handler and pass the data around to your multiple destinations.
  2. Try to proxy the readable stream to a second one that is already wired up.
  3. Try your best to minimize that window and hope for the best.
Ayase-252 added the feature request and child_process labels on Apr 5, 2021
Trott (Member) commented Apr 6, 2021

If I'm understanding you correctly, there's no problem here. I think you might need to take some time to understand the event loop.

In the code that you provide, the listeners are wired up before the child process is able to emit any events:

// Expected behaviour scenario
const { exec } = require("child_process");
const p = exec("seq 1000"); // This command prints the numbers from 1 to 1000, each on its own line
let a1 = '';
let a2 = '';
p.stdout.on("data", (d) => a1 += d);
p.stdout.on("data", (d) => a2 += d);

// When the child process completes, a1 and a2 will both contain all the numbers from 1 to 1000

This will always exhibit the expected behavior because "data" events cannot fire until the current tick in the event loop is done. So both listeners will always receive the same events. No 'data' events can be emitted until all the code above runs. Only then will 'data' events be emitted. Events that might normally be fired while the code is running are queued up and emitted after.

On the other hand:

// Edge-case scenario
const { exec } = require("child_process");
const p = exec("seq 1000");
let a1 = '';
let a2 = '';
p.stdout.on("data", (d) => a1 += d);
setTimeout(() => {
  p.stdout.on("data", (d) => a2 += d);
}, 500);
// When the child process completes, a1 will contain all the numbers from 1 to 1000 while a2 will remain an empty string

Given this code, I would not expect a1 and a2 to have the same values. This is expected behavior and not an edge case. setTimeout() schedules its callback for later, and control returns to the event loop (because it's the last statement). So events start firing. At some point the timer callback runs, and only then does a2 start picking up events.

Trott (Member) commented Apr 6, 2021

Try your best to minimize that window and hope for the best.

At least in the sample code you provide, there is no window. No need to hope for the best. Even this works as expected:

// Expected behaviour scenario
const { exec } = require("child_process");
const p = exec("seq 1000"); // This command prints the numbers from 1 to 1000, each on its own line
let a1 = '';
let a2 = '';
let a3 = '';
let a4 = '';
let a5 = '';
let a6 = '';
let a7 = '';
let a8 = '';
let a9 = '';

p.stdout.on("data", (d) => a1 += d);
p.stdout.on("data", (d) => a2 += d);
p.stdout.on("data", (d) => a3 += d);
p.stdout.on("data", (d) => a4 += d);
p.stdout.on("data", (d) => a5 += d);
p.stdout.on("data", (d) => a6 += d);
p.stdout.on("data", (d) => a7 += d);
p.stdout.on("data", (d) => a8 += d);
p.stdout.on("data", (d) => a9 += d);

// When the child process completes, a1 through a9 will all contain the numbers from 1 to 1000

p.on('exit', () => { console.log(a1.length, a9.length); });

I'm going to go ahead and close this, but feel free to comment or re-open if I've misunderstood the issue here. Thanks for requesting a feature!

Trott closed this as completed Apr 6, 2021
TheYarin (Author) commented Apr 6, 2021

So you're saying the user should trust that events registered will not be fired until there's a context switch. Alright, if that's so inherent in nodejs programming then you're right, the edge case I described is not inherently a problem.

And still, I can't seem to shake the feeling that just spawning a process into the air without allowing any prior wireup is a bad idea.
What if there's a heavy operation to be done between the spawning and the wireup? In that case, problem 2 (the child process might output more data than the buffer can contain before any data can be processed) may still be an issue.
And what if I need some context changes between the spawning and the wireup? Of course you could say, "just move the wireup closer to the spawning!", but that might complicate the code's design by adding an additional unnecessary constraint (unnecessary given the solution I proposed).

This is just my hunch, I have no concrete examples here. What do you think?

Trott (Member) commented Apr 7, 2021

So you're saying the user should trust that events registered will not be fired until there's a context switch.

Yes, because if not, then the event loop is broken and nothing will work the way it is supposed to. This is similar to the way the user also needs to trust that the first line of the file is parsed and executed before the second line. That analogy is an exaggeration, but much less of one than one might think.

What if there's a heavy operation to be done between the spawning and the wireup?

Like this? Still works as expected.

// Expected behaviour scenario
const { exec } = require("child_process");
const p = exec("seq 1000"); // This command prints the numbers from 1 to 1000, each on its own line
let a1 = '';
let a2 = '';
let a3 = '';
let a4 = '';
let a5 = '';
let a6 = '';
let a7 = '';
let a8 = '';
let a9 = '';

// A heavy synchronous operation between the spawning and the wireup.
const arr = [];
console.log('starting loop');
for (let i = 0; i < 99999999; i++) {
  arr.push(Math.random());
}
console.log('ending loop');
console.log(arr.length);

p.stdout.on("data", (d) => a1 += d);
p.stdout.on("data", (d) => a2 += d);
p.stdout.on("data", (d) => a3 += d);
p.stdout.on("data", (d) => a4 += d);
p.stdout.on("data", (d) => a5 += d);
p.stdout.on("data", (d) => a6 += d);
p.stdout.on("data", (d) => a7 += d);
p.stdout.on("data", (d) => a8 += d);
p.stdout.on("data", (d) => a9 += d);

// When the child process completes, a1 through a9 will all contain the numbers from 1 to 1000

p.on('exit', () => { console.log(a1.length, a9.length); });

In that case problem 2 (the child process might output more data than the buffer can contain before any data can be processed) may still be an issue.

I don't believe that is correct. https://stackoverflow.com/a/35464318/436641

TheYarin (Author) commented Apr 8, 2021

Since your example didn't quite stress the default maxBuffer size (1024 * 1024 bytes), I thought of a better example to demonstrate problem 2:

const { exec } = require("child_process");

const p = exec("seq 100000", { maxBuffer: 100 }); // large output, far too small a buffer

const arr = [];
console.log("starting loop");
for (let i = 0; i < 99999999; i++) {
  arr.push(Math.random());
}
console.log("ending loop");

p.stdout.on("data", (d) => console.log(d));

But surprisingly, the buffer limit simply didn't kick in.

The docs say (regarding maxBuffer):

Largest amount of data in bytes allowed on stdout or stderr. If exceeded, the child process is terminated and any output is truncated.

But in reality, setting a low maxBuffer value and pumping more output than it should have been able to handle just worked.

So... now I'm just confused.

Trott (Member) commented Apr 8, 2021

But in reality, setting a low maxBuffer value and pumping more output than it should have been able to handle just worked.

So... now I'm just confused.

I imagine it's working because maxBuffer applies to the output that exec() buffers for its callback (in your case, via the ChildProcess object p) but does not directly cause the underlying 'data' events on p.stdout to be truncated. But just in case I'm glossing over things and there's a bug here or something: Hey, @nodejs/child_process, is that correct? It does seem a bit surprising, although perhaps not once you realize p.stdout is a stream?

Anyway, @TheYarin, this will get the truncated stdout you were expecting:

const { exec } = require("child_process");

const p = exec("seq 100000", { maxBuffer: 100 }, (err, stdout, stderr) => {
  console.log('ERR', err);
  console.log('STDOUT', stdout);
  console.log('STDERR', stderr);
});

Adding in the long for loop doesn't change the result other than making you wait a while to get it because the event loop is blocked.

const { exec } = require("child_process");

const p = exec("seq 100000", { maxBuffer: 100 }, (err, stdout, stderr) => {
  console.log('ERR', err);
  console.log('STDOUT', stdout);
  console.log('STDERR', stderr);
});

// Blocks the event loop but doesn't change the ultimate result of the command above.
const arr = [];
console.log("starting loop");
for (let i = 0; i < 99999999; i++) {
  arr.push(Math.random());
}
console.log("ending loop");
