
Streams blocking read and EOF handling for pipes #10171

@bukka

Description


The plain wrapper is usually used for files, but it is also used for pipes created by proc_open. Currently, EOF in the plain wrapper is set only after an empty read (0 bytes returned). There is a special case for files, memory and temp streams: if fewer bytes than requested are returned, another read is done, which sets EOF if no more data is present. For consistent behavior, it might be convenient to do the same for pipes to identify whether EOF has been reached. This is exactly what glibc fread does after opening a pipe descriptor using fdopen, so it might make sense to apply the same logic to PHP fread. However, in some cases returning 0 bytes does not mean that no more data will arrive in the pipe, especially when proc_open starts an interactive program or a program with delays in its processing. So it might actually make more sense to treat pipes like sockets: for a blocking read, apply a timeout, return what is in the buffer until then, and set EOF only after the timeout expires.
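To illustrate the special case for memory streams described above, here is a minimal sketch. It assumes a php://memory stream holding 20 bytes and requests 32, so the short read is followed internally by another read that hits the end of the data. The variable names are only for illustration.

```php
<?php
// A memory stream holding exactly 20 bytes.
$mem = fopen('php://memory', 'w+');
fwrite($mem, '01234567890123456789');
rewind($mem);

// Request more bytes than the stream holds. For files, memory and temp
// streams the short read triggers another internal read, which finds
// no more data and sets EOF.
$data = fread($mem, 32);
$eof = feof($mem);

printf("Read %d bytes, EOF: %d\n", strlen($data), $eof);
fclose($mem);
```

With a pipe from proc_open in place of the memory stream, the same single fread would leave EOF unset, which is the inconsistency this issue is about.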

In any case, something needs to be done about the current inconsistency between single and multiple reads that read the same total number of bytes. This is best explained with an example. Consider the following script called pipe_eof_test.php:

<?php
$descriptorspec=array(
	0 => STDIN,
	1 => array("pipe", "w"),
	2 => STDERR
);
$p = proc_open(['echo', '-n', '01234567890123456789'], $descriptorspec, $fd);

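$res = '';
// Requested read size, taken from the first command line argument.
$size = (int) $argv[1];
// Keep reading while each read returns exactly the requested size.
while (strlen($s = fread($fd[1], $size)) === $size) {
	$res .= $s;
}
// Append the final short (possibly empty) read.
$res .= $s;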

fprintf(STDERR, "Result: %s, EOF: %d\n", $res, feof($fd[1]));

It reads from a pipe containing 20 bytes (the output of echo). The result differs depending on the number of bytes requested: EOF is 0 if $size is over 20 but 1 if it is below 20, as can be seen when running the script:

$ php pipe_eof_test.php 32
Result: 01234567890123456789, EOF: 0
$ php pipe_eof_test.php 16
Result: 01234567890123456789, EOF: 1

The reason is that all available bytes are read internally into an 8k buffer on the first read. In the first case, a single read is enough: it returns fewer bytes than requested, so the loop ends. There is no 0-byte read, and EOF stays 0. In the second case, a second read is needed because the first read returns all 16 requested bytes, leaving only 4 bytes in the buffer. Because 16 bytes are requested in the second read, another read is attempted to fill in more bytes, which returns 0 bytes and sets EOF.
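The effect of the extra read can be made visible without the loop. The following is a minimal sketch, assuming the same 20-byte echo command as in the script above. It requests 32 bytes twice and records feof after each call. The first flag reflects the behavior reported in this issue and may differ between PHP versions; the second read is the one expected to return 0 bytes and set EOF.

```php
<?php
// Child process that writes exactly 20 bytes to its stdout and exits.
$proc = proc_open(
	['echo', '-n', '01234567890123456789'],
	[1 => ['pipe', 'w']],
	$pipes
);

// First read: more bytes requested than available, so one underlying
// read returns everything and no 0-byte read takes place.
$first = fread($pipes[1], 32);
$eofAfterFirst = feof($pipes[1]);

// Second read: the pipe is exhausted, so this read returns 0 bytes.
$second = fread($pipes[1], 32);
$secondLen = strlen((string) $second);
$eofAfterSecond = feof($pipes[1]);

fclose($pipes[1]);
proc_close($proc);

printf(
	"first: %d bytes, EOF: %d; second: %d bytes, EOF: %d\n",
	strlen($first), $eofAfterFirst, $secondLen, $eofAfterSecond
);
```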

If we used the same logic as for files, we would get EOF 1 in both cases, because in the first case it would still attempt to read more bytes to fill the buffer and would get a 0-byte read. The problem arises if the program is interactive and more bytes arrive in the pipe later. In that case the program hangs, which is actually what already happens when reading fewer bytes, as the 2nd read blocks. The blocking is also not limited by a timeout, which is inconvenient.

It seems, though, that using the same logic as for sockets is more convenient, as we would get EOF 0 in both cases. Only a further read would apply the timeout, and EOF would be set only after the timeout expires.
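Until something like this exists in the stream layer, a bounded wait can be approximated in userland. The following is only a sketch, not the proposed implementation: read_with_timeout is a hypothetical helper, it assumes a POSIX system where stream_select and stream_set_blocking work on proc_open pipes, and it reports a timeout as a separate flag instead of setting EOF on the stream.

```php
<?php
/**
 * Hypothetical helper: read up to $length bytes from $stream, waiting at
 * most $timeoutSec seconds in total. Returns the data read so far plus
 * flags saying whether EOF was seen or the timeout expired.
 */
function read_with_timeout($stream, int $length, float $timeoutSec): array
{
	$data = '';
	$timedOut = false;
	$deadline = microtime(true) + $timeoutSec;

	// Non-blocking mode, so fread never waits for more data on its own.
	// The stream is left in non-blocking mode afterwards.
	stream_set_blocking($stream, false);

	while (strlen($data) < $length) {
		$remaining = $deadline - microtime(true);
		if ($remaining <= 0) {
			$timedOut = true;
			break;
		}
		$read = [$stream];
		$write = null;
		$except = null;
		$sec = (int) floor($remaining);
		$usec = (int) (($remaining - $sec) * 1000000);

		// Wait until the pipe is readable (data or EOF) or time runs out.
		$ready = stream_select($read, $write, $except, $sec, $usec);
		if ($ready === false) {
			break; // select failed
		}
		if ($ready === 0) {
			$timedOut = true;
			break;
		}
		$chunk = fread($stream, $length - strlen($data));
		if ($chunk === false || ($chunk === '' && feof($stream))) {
			break; // read error or real end of the pipe
		}
		$data .= $chunk;
	}

	return ['data' => $data, 'eof' => feof($stream), 'timed_out' => $timedOut];
}

// Usage with the 20-byte echo command from the script above.
$proc = proc_open(
	['echo', '-n', '01234567890123456789'],
	[1 => ['pipe', 'w']],
	$pipes
);
$result = read_with_timeout($pipes[1], 32, 1.0);
fclose($pipes[1]);
proc_close($proc);

printf("Result: %s, timed out: %d\n", $result['data'], $result['timed_out']);
```

With an interactive child that keeps the pipe open, the same call would return whatever arrived within the timeout and set timed_out instead of hanging.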

This should be treated as a feature request because it is a subtle BC break, and changing the implementation to use the same logic for pipes as for sockets might be a somewhat larger and more involved change.
