
Add the ability to wire-up listeners before starting a child process #38081

Closed
TheYarin opened this issue Apr 4, 2021 · 6 comments
Labels
child_process Issues and PRs related to the child_process subsystem. feature request Issues that request new features to be added to Node.js.

Comments

TheYarin commented Apr 4, 2021

Is your feature request related to a problem? Please describe.
When registering multiple listeners (callbacks) for the 'data' event of a child process's stdout, there's no way to make the child process wait until all the callbacks are registered before it starts. This means there is a window between registering the first listener and registering the second one in which the first listener might "pull" the first available chunk, so that by the time the second listener is registered, it never receives that chunk.

A thinned-down example:

// Expected behaviour scenario
const { exec } = require("child_process");
const p = exec("seq 1000"); // This command prints the numbers from 1 to 1000, each on its own line
let a1 = '';
let a2 = '';
p.stdout.on("data", (d) => a1 += d);
p.stdout.on("data", (d) => a2 += d);
// When the child process completes, a1 and a2 will both contain all the numbers from 1 to 1000

// Edge-case scenario
const { exec } = require("child_process");
const p = exec("seq 1000");
let a1 = '';
let a2 = '';
p.stdout.on("data", (d) => a1 += d);
setTimeout(() => {
  p.stdout.on("data", (d) => a2 += d);
}, 500);
// When the child process completes, a1 will contain all the numbers from 1 to 1000 while a2 will remain an empty string

From what I understand from reading the child_process documentation, when a child process is started, Node.js saves its output in a buffer until a listener is registered (either by binding directly to the 'data' event or by pipe()ing stdout to a writable stream). This behaviour creates two potential problems:

  1. A second listener might not get the same data as the first one.
  2. The child process might output more data than the buffer can contain before any data can be processed.

Describe the solution you'd like
The solution I propose is to allow wiring up all the listeners and pipes before starting the child process.
For backwards compatibility, I imagine the best way to achieve this is a new option (something like autostart, defaulting to true) in the options parameter of spawn, exec, etc. When disabled, those functions would return a ChildProcess instance that has not yet started, paired with a new start() method added to the ChildProcess class.

Describe alternatives you've considered
The alternatives as I see them are:

  1. Only register a single handler and pass the data around to your multiple destinations.
  2. Try to proxy the readable stream to a second one that is already wired up.
  3. Try your best to minimize that window and hope for the best.
Ayase-252 added the feature request and child_process labels on Apr 5, 2021
Trott (Member) commented Apr 6, 2021

If I'm understanding you correctly, there's no problem here. I think you might need to take some time to understand the event loop.

In the code that you provide, the listeners are wired up before the child process is able to emit any events:

// Expected behaviour scenario
const { exec } = require("child_process");
const p = exec("seq 1000"); // This command prints the numbers from 1 to 1000, each on its own line
let a1 = '';
let a2 = '';
p.stdout.on("data", (d) => a1 += d);
p.stdout.on("data", (d) => a2 += d);

// When the child process completes, a1 and a2 will both contain all the numbers from 1 to 1000

This will always exhibit the expected behavior because "data" events cannot fire until the current tick in the event loop is done. So both listeners will always receive the same events. No 'data' events can be emitted until all the code above runs. Only then will 'data' events be emitted. Events that might normally be fired while the code is running are queued up and emitted after.

On the other hand:

// Edge-case scenario
const { exec } = require("child_process");
const p = exec("seq 1000");
let a1 = '';
let a2 = '';
p.stdout.on("data", (d) => a1 += d);
setTimeout(() => {
  p.stdout.on("data", (d) => a2 += d);
}, 500);
// When the child process completes, a1 will contain all the numbers from 1 to 1000 while a2 will remain an empty string

Given this code, I would not expect a1 and a2 to have the same values. This is expected behavior and not an edge case. setTimeout() schedules its callback for later, and control returns to the event loop (because it's the last statement). So events start firing. At some point the timer callback runs, and only then does a2 start picking up events.

Trott (Member) commented Apr 6, 2021

Try your best to minimize that window and hope for the best.

At least in the sample code you provide, there is no window. No need to hope for the best. Even this works as expected:

// Expected behaviour scenario
const { exec } = require("child_process");
const p = exec("seq 1000"); // This command prints the numbers from 1 to 1000, each on its own line
let a1 = '';
let a2 = '';
let a3 = '';
let a4 = '';
let a5 = '';
let a6 = '';
let a7 = '';
let a8 = '';
let a9 = '';

p.stdout.on("data", (d) => a1 += d);
p.stdout.on("data", (d) => a2 += d);
p.stdout.on("data", (d) => a3 += d);
p.stdout.on("data", (d) => a4 += d);
p.stdout.on("data", (d) => a5 += d);
p.stdout.on("data", (d) => a6 += d);
p.stdout.on("data", (d) => a7 += d);
p.stdout.on("data", (d) => a8 += d);
p.stdout.on("data", (d) => a9 += d);

// When the child process completes, a1 through a9 will all contain the numbers from 1 to 1000

p.on('exit', () => { console.log(a1.length, a9.length); });

I'm going to go ahead and close this, but feel free to comment or re-open if I've misunderstood the issue here. Thanks for requesting a feature!

Trott closed this as completed Apr 6, 2021
TheYarin (Author) commented Apr 6, 2021

So you're saying the user should trust that events registered will not be fired until there's a context switch. Alright, if that's so inherent in nodejs programming then you're right, the edge case I described is not inherently a problem.

And still, I can't seem to shake the feeling that just spawning a process into the air without allowing any prior wireup is a bad idea.
What if there's a heavy operation to be done between the spawning and the wireup? In that case, problem 2 (the child process might output more data than the buffer can contain before any data can be processed) may still be an issue.
And what if I need some context changes between the spawning and the wireup? Of course you could say, "just move the wireup closer to the spawning!", but that might complicate the code's design by adding an additional unnecessary constraint (unnecessary given the solution I proposed).

This is just my hunch, I have no concrete examples here. What do you think?

Trott (Member) commented Apr 7, 2021

So you're saying the user should trust that events registered will not be fired until there's a context switch.

Yes, because if not, then the event loop is broken and nothing will work the way it is supposed to. This is similar to the way the user also needs to trust that the first line of the file is parsed and executed before the second line. That analogy is an exaggeration, but much less of one than one might think.

What if there's a heavy operation to be done between the spawning and the wireup?

Like this? Still works as expected.

// Expected behaviour scenario
const { exec } = require("child_process");
const p = exec("seq 1000"); // This command prints the numbers from 1 to 1000, each on its own line
let a1 = '';
let a2 = '';
let a3 = '';
let a4 = '';
let a5 = '';
let a6 = '';
let a7 = '';
let a8 = '';
let a9 = '';

// A heavy synchronous operation between the spawning and the wireup.
const arr = [];
console.log('starting loop');
for (let i = 0; i < 99999999; i++) {
  arr.push(Math.random());
}
console.log('ending loop');
console.log(arr.length);

p.stdout.on("data", (d) => a1 += d);
p.stdout.on("data", (d) => a2 += d);
p.stdout.on("data", (d) => a3 += d);
p.stdout.on("data", (d) => a4 += d);
p.stdout.on("data", (d) => a5 += d);
p.stdout.on("data", (d) => a6 += d);
p.stdout.on("data", (d) => a7 += d);
p.stdout.on("data", (d) => a8 += d);
p.stdout.on("data", (d) => a9 += d);

// When the child process completes, a1 through a9 will all contain the numbers from 1 to 1000

p.on('exit', () => { console.log(a1.length, a9.length); });

In that case problem 2 (the child process might output more data than the buffer can contain before any data can be processed) may still be an issue.

I don't believe that is correct. https://stackoverflow.com/a/35464318/436641

TheYarin (Author) commented Apr 8, 2021

Since your example didn't quite stress the default maxBuffer size (1024 * 1024 bytes), I thought of a better example to demonstrate problem 2:

const { exec } = require("child_process");

const p = exec("seq 100000", { maxBuffer: 100 }); // large output, far too small a buffer

const arr = [];
console.log("starting loop");
for (let i = 0; i < 99999999; i++) {
  arr.push(Math.random());
}
console.log("ending loop");

p.stdout.on("data", (d) => console.log(d));

But surprisingly, the buffer limit simply didn't kick in.

The docs say (regarding maxBuffer):

Largest amount of data in bytes allowed on stdout or stderr. If exceeded, the child process is terminated and any output is truncated.

But in reality, setting a low maxBuffer value and pumping more output than it should have been able to handle just worked.

So... now I'm just confused.

Trott (Member) commented Apr 8, 2021

But in reality, setting a low maxBuffer value and pumping more output than it should have been able to handle just worked.

So... now I'm just confused.

I imagine it's working because maxBuffer applies to the output that exec() buffers for its callback (in your case, via the ChildProcess object p) but does not directly cause the underlying 'data' events on p.stdout to be truncated. But just in case I'm glossing over things and there's a bug here or something: Hey, @nodejs/child_process, is that correct? It does seem a bit surprising, although perhaps not once you realize p.stdout is a stream?

Anyway, @TheYarin, this will get the truncated stdout you were expecting:

const { exec } = require("child_process");

const p = exec("seq 100000", { maxBuffer: 100 }, (err, stdout, stderr) => {
  console.log('ERR', err);
  console.log('STDOUT', stdout);
  console.log('STDERR', stderr);
});

Adding in the long for loop doesn't change the result other than making you wait a while to get it because the event loop is blocked.

const { exec } = require("child_process");

const p = exec("seq 100000", { maxBuffer: 100 }, (err, stdout, stderr) => {
  console.log('ERR', err);
  console.log('STDOUT', stdout);
  console.log('STDERR', stderr);
});

// Blocks the event loop but doesn't change the ultimate result of the command above.
const arr = [];
console.log("starting loop");
for (let i = 0; i < 99999999; i++) {
  arr.push(Math.random());
}
console.log("ending loop");
