
Allow for adjustment of stdout buffer size (aka highWaterMark?) of child_process.spawn() #41611

Closed
AndrewJDR opened this issue Jan 20, 2022 · 9 comments
Labels
child_process Issues and PRs related to the child_process subsystem. feature request Issues that request new features to be added to Node.js. stale stdio Issues and PRs related to stdio.

Comments

@AndrewJDR

What is the problem this feature will solve?

It does not appear to be possible [1] to adjust the buffer size (highWaterMark) of the stdout stream from a child_process.spawn() call. This very likely [2] lowers throughput when dealing with large amounts of output from a child process, since more event loop iterations are required to consume it. Increasing the buffer size trades away event loop latency, but the latency vs. throughput tradeoff should ideally be in the hands of the JavaScript developer, as it is with most other read/write streams in Node.js.

  1. Please see the Stack Exchange answer from @mmomtchev - https://stackoverflow.com/a/70652730/9129863 - and the unanswered node help question: How to adjust the stdout buffer size (aka highWaterMark) of child_process.spawn()? help#3683
  2. I would like to provide performance benchmarks comparing small vs. large buffer sizes on large data, but since the setting is not easily tweakable, producing them is not straightforward.

What is the feature you are proposing to solve the problem?

Some way of adjusting the highWaterMark (or equivalent) of the stdout stream of a child_process.spawn() call.

What alternatives have you considered?

As far as I know, there isn't one. You just have to accept the default of 64KB.

@AndrewJDR AndrewJDR added the feature request Issues that request new features to be added to Node.js. label Jan 20, 2022
@mmomtchev
Contributor

mmomtchev commented Jan 20, 2022

Related to Node.js PR #39097 and issue #39092

I am currently trying to push through libuv/libuv#3429 & libuv/libuv#3422, which will greatly reduce the performance impact of having smaller buffers for blocking I/O.

The problem is that Node's I/O must remain fair and be able to advance a large number of streams with a limited number of threads - thus those streams need to be interruptible.

Currently the performance impact of this is very significant because of the latency of the wake-up mechanism between the worker thread and the event loop, which schedules the next reads sequentially - IMO this is the real problem that needs solving.

Reading 64 KB chunks in a tight loop is only marginally slower than reading big chunks (I clocked it at 75 ns per additional syscall) - the scheduling of the next chunk is the only problem.

@mmomtchev
Contributor

@AndrewJDR if you want to test the impact of the buffer size, you must rebuild Node, changing this line (it is in fact in libuv):

```c
stream->alloc_cb((uv_handle_t*)stream, 64 * 1024, &buf);
```

@VoltrexKeyva VoltrexKeyva added the child_process Issues and PRs related to the child_process subsystem. label Jan 20, 2022
@Mesteery Mesteery added the stdio Issues and PRs related to stdio. label Jan 20, 2022
@AndrewJDR
Author

@mmomtchev Thanks for all your help on this. Just to confirm I understand correctly: you're saying that if I increased the 64K setting in the libuv code you pointed out and ran some tests with large data from child_process.spawn()'s stdout, I would find minimal performance improvement, since many 64K chunks are actually being fed to my code by an (interruptible) tight loop inside the event loop, rather than my code being fed only one 64K chunk per event loop iteration? Thanks again, Andrew

@mmomtchev
Contributor

I just don't know - try it. For readFile it can reach 40% on very large files, but it is typically around 20%. For TCP streams, the effect is that the bandwidth falls sharply after reaching a certain threshold value that depends on what you are doing in the event loop.

@github-actions
Contributor

There has been no activity on this feature request for 5 months and it is unlikely to be implemented. It will be closed 6 months after the last non-automated comment.

For more information on how the project manages feature requests, please consult the feature request management document.

@github-actions github-actions bot added the stale label Jul 21, 2022
@github-actions
Contributor

There has been no activity on this feature request and it is being closed. If you feel closing this issue is not the right thing to do, please leave a comment.

For more information on how the project manages feature requests, please consult the feature request management document.

@realies

realies commented Sep 4, 2022

Don't close this issue, it's a needed feature.

@fabiospampinato

Can this issue be reopened? It doesn't look like it ever got addressed, and it's a real issue.

@leeoniya

leeoniya commented Sep 2, 2023

+1 to reopen this one.

My use case is a streaming CSV parser using fs.createReadStream().

You'll typically have a malformed (partially parsed) row at the end of each chunk, and it's easier to throw that partial parse away, pre-concat the partial line onto the next chunk, and run the parser on that.

With 64KB chunks, you're throwing away a lot more partial parses than you would with 2MB chunks. When I manually accumulate (string concat) the 64KB chunks in memory until my 2MB threshold and flush that to my parser, I see throughput increase from 94 MiB/s to 104 MiB/s.

It feels rather silly to have to do this in userland simply due to lack of control over this 64KB chunk size. I'm certain the memory usage would also be reduced if this was done in libuv instead of in JS.
