Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'data' event not emitted for every flush #33465

Closed
Almenon opened this issue May 19, 2020 · 4 comments
Closed

'data' event not emitted for every flush #33465

Almenon opened this issue May 19, 2020 · 4 comments

Comments

@Almenon
Copy link

Almenon commented May 19, 2020

  • Version: v12.14.1
  • Platform: Windows
  • Subsystem: 10

What steps will reproduce the bug?

  1. Create a python script that flushes twice in quick succession and then waits:
import sys;sys.stdout.write("a" * (8192 * 2 + 2));from time import sleep;sleep(3)

Python on windows (or at least on my machine) has a buffer size of 8912 bytes so the above write results in two flushes. You can check the buffer size by running: python -c "import io;print(io.DEFAULT_BUFFER_SIZE)". Also note that the above is a simplified one-liner that reproduces the issue, you can find a full python script with several examples here.

  1. Spawn the python script and listen for data
import { spawn } from "child_process"

const node_process = spawn("python", ["node_buffer_issue_python_script.py"])

node_process.on('error', err => console.error(err))

let numFlushes = 0
node_process.stdout.on('data', (buffer: Buffer) => {
  numFlushes += 1
  const str = buffer.toString()
  console.log(`flush number: ${numFlushes}\
    Buffer length: ${buffer.length}`)
  if (buffer.length < 20) console.log(str)
})

node_process.stdout.on('error', (err) => {
  console.error(err.toString())
})

node_process.stderr.on('data', (err) => {
  console.error(err.toString())
})

/*
Expected result: Many flushes, one for each 8192 bytes
Acutal result: 2 flushes. All of stdout is recieved at once. 4 seconds later comes the done log.

OUTPUT (with python printing 2 ** 29 bytes)
C:\dev\AREPL-backend\personal>ts-node test_node_buffer_python.ts
flush number: 1    Buffer length: 536870912
flush number: 2    Buffer length: 6
done

OUTPUT (with python printing 'a', flushing, printing 'aa', flushing)
C:\dev\AREPL-backend\personal>ts-node test_node_buffer_python.ts
flush number: 1    Buffer length: 3
aaa
flush number: 2    Buffer length: 6
done

OUTPUT (with python printing 'a', flushing, sleeping a tiniest amount, printing 'aa', flushing)
C:\dev\AREPL-backend\personal>ts-node test_node_buffer_python.ts
flush number: 1    Buffer length: 1
a
flush number: 2    Buffer length: 2
aa
flush number: 3    Buffer length: 6
done

*/

I'm not sure if it's related because I have no idea when node flushes stdout but I ran into the same issue when spawning a node child.

How often does it reproduce? Is there a required condition?

This only happens on Windows. On Linux it works as expected, as evidenced by https://repl.it/@almenon/testBufferSize

The other required condition is for flushes to happen within extremely quick succession, as in one flush right after another. Sleeping inbetween flushes results in the 'data' event coming through as expected.

Practically speaking this has resulted in unexpected behavior in AREPL. Presumably other projects with a python subprocess that writes a lot of data in one write run into this problem as well. Python on Windows 10 has a small buffer size of 8912 bytes so it doesn't take that much data to exceed the buffer and cause a flush.

What is the expected behavior?

The node docs for the 'data' event of net.Socket say that 'data' is "Emitted when data is received," in which case I would expect that data is emitted whenever a flush happens in the child process.

What do you see instead?

'data' event is emitted after being received

Additional information

@bzoz
Copy link
Contributor

bzoz commented May 19, 2020

There is no way for Node to know if/how the data was flushed. The stdio pipe is just a stream of bytes.

@Almenon
Copy link
Author

Almenon commented May 19, 2020

@bzoz when the flush happens bytes are written to the pipe, in which case node should know right? I suppose I'm not understanding stdio properly, can you clarify when node will emit a data event please?

@bzoz
Copy link
Contributor

bzoz commented May 19, 2020

TBH, the best answer would be "when it feels like it". Deep down Node looks how many bytes are there in the pipe and then reads them. If for whatever reason Node event loop gets delayed, you will get one data event regardless of how many writes there where. You should not depend on those, you should handle all of the child process output coming at once.

BTW, this has some info on how Node behaves if the stdio is either a pipe (like in this python example) or when it is a TTY on different systems : https://nodejs.org/api/process.html#process_a_note_on_process_i_o.

@bzoz
Copy link
Contributor

bzoz commented May 21, 2020

I don't think there is anything actionable here for Node, as it behaves as expected. Feel free to reopen this issue or create a new one if you find other problems with Node.

@bzoz bzoz closed this as completed May 21, 2020
Almenon added a commit to Almenon/AREPL-backend that referenced this issue May 23, 2020
Impossible to predict how much data arrives in stdout
So I need to split by newline to avoid JSON error

nodejs/node#33465
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants