Fix GH-13071: Copying large files using mmap-able source streams may exhaust available memory and fail

Commit 5cbe5a5 disabled chunking for all writes to streams. However,
userspace streams run a PHP callback on the written data, and that
callback is subject to the memory limit. Therefore, large writes or
stream_copy_to_stream/copy can easily hit the memory limit with
sufficiently large data.

To solve this, we reintroduce chunking for userspace streams.
Users have control over the chunk size, so they can raise it to
improve performance if chunking turns out to be a bottleneck.

In an ideal world, we would add an option to "ask" the stream whether
it "prefers" chunked writes, similar to how we have
php_stream_mmap_supported & friends. However, that cannot be done on
stable branches.

Closes GH-13136.
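The reintroduced chunking is observable from userland: a write through a userspace wrapper is split into pieces of at most the stream's chunk size, which `stream_set_chunk_size()` controls. A minimal sketch of that behavior — the `CountingStream` wrapper and the 4096-byte chunk size are illustrative assumptions, not part of the commit:

```php
<?php
// Illustrative only: a wrapper that counts how many times PHP invokes
// its stream_write() callback.
class CountingStream {
    public $context;
    public static int $writes = 0;

    public function stream_open($path, $mode, $options, &$opened_path) {
        return true;
    }

    public function stream_write($data) {
        self::$writes++;
        return strlen($data); // pretend the full chunk was written
    }

    public function stream_flush() { return true; }
    public function stream_close() { return true; }
    public function url_stat($path, $flags) { return false; }
}

stream_wrapper_register("count", CountingStream::class);

$f = fopen("count://dummy", "w");
stream_set_chunk_size($f, 4096);       // larger chunks => fewer callbacks
fwrite($f, str_repeat("x", 3 * 4096)); // 12 KiB in a single fwrite()
fclose($f);

// With the chunking fix, the 12 KiB write is split into 3 callbacks
// of 4096 bytes each.
echo CountingStream::$writes, "\n";
```

Raising the chunk size trades fewer callback invocations (less overhead) against larger per-call buffers, which is exactly the knob the commit message refers to.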
nielsdos committed Jan 16, 2024
1 parent b33e3eb commit 5e9e9c9
Showing 6 changed files with 80 additions and 8 deletions.
4 changes: 4 additions & 0 deletions NEWS
@@ -30,6 +30,10 @@ PHP NEWS
. Fixed bug GH-13138 (Randomizer::pickArrayKeys() does not detect broken
engines). (timwolla)

- Streams:
. Fixed bug GH-13071 (Copying large files using mmap-able source streams may
exhaust available memory and fail). (nielsdos)

18 Jan 2024, PHP 8.2.15

- Core:
55 changes: 55 additions & 0 deletions ext/standard/tests/file/gh13136.phpt
@@ -0,0 +1,55 @@
--TEST--
GH-13071 (Copying large files using mmap-able source streams may exhaust available memory and fail)
--FILE--
<?php

class CustomStream {
public $context;
protected $file;
protected $seekable;
public static int $writes = 0;

public function stream_open($path, $mode, $options, &$opened_path) {
$path = $this->trim_path($path);
$this->file = fopen($path, $mode);
return true;
}

public function stream_close() {
fclose($this->file);
return true;
}

public function stream_write($data) {
self::$writes++;
return fwrite($this->file, $data);
}

public function url_stat($path, $flags) {
return false;
}

private function trim_path(string $path): string {
return substr($path, strlen("up://"));
}
}

file_put_contents(__DIR__ . "/gh13071.tmp", str_repeat("a", 1024 * 1024 * 8));

stream_wrapper_register("up", CustomStream::class, STREAM_IS_URL);

$old_limit = ini_get("memory_limit");
ini_set("memory_limit", memory_get_usage(true) + 5 * 1024 * 1024);
copy(__DIR__ . "/gh13071.tmp", "up://" . __DIR__ . "/gh13071.out.tmp");
ini_set("memory_limit", $old_limit);

echo "Done ", CustomStream::$writes, " writes\n";

?>
--CLEAN--
<?php
@unlink(__DIR__ . "/gh13071.tmp");
@unlink(__DIR__ . "/gh13071.out.tmp");
?>
--EXPECT--
Done 1024 writes
3 changes: 2 additions & 1 deletion ext/standard/tests/file/userstreams_006.phpt
@@ -34,5 +34,6 @@ bool(true)
option: 3, 2, 50
int(-1)
int(8192)
-size: 70
+size: 42
+size: 28
int(70)
1 change: 1 addition & 0 deletions ext/standard/tests/streams/set_file_buffer.phpt
@@ -39,4 +39,5 @@ option: %d, %d, %d
int(%i)
int(%d)
size: %d
+size: 28
int(%d)
16 changes: 10 additions & 6 deletions ext/standard/tests/streams/stream_set_chunk_size.phpt
@@ -35,7 +35,7 @@ echo "should return previous chunk size (8192)\n";
var_dump(stream_set_chunk_size($f, 1));
echo "should be read without buffer (\$count == 10000)\n";
var_dump(strlen(fread($f, 10000)));
-echo "should have no effect on writes\n";
+echo "should elicit 3 writes\n";
var_dump(fwrite($f, str_repeat('b', 3)));

echo "should return previous chunk size (1)\n";
@@ -46,7 +46,7 @@ echo "should elicit one read of size 100 (chunk size)\n";
var_dump(strlen(fread($f, 50)));
echo "should elicit no read because there is sufficient cached data\n";
var_dump(strlen(fread($f, 50)));
-echo "should have no effect on writes\n";
+echo "should elicit 3 writes\n";
var_dump(strlen(fwrite($f, str_repeat('b', 250))));

echo "\nerror conditions\n";
@@ -68,8 +68,10 @@ int(8192)
should be read without buffer ($count == 10000)
read with size: 10000
int(10000)
-should have no effect on writes
-write with size: 3
+should elicit 3 writes
+write with size: 1
+write with size: 1
+write with size: 1
int(3)
should return previous chunk size (1)
int(1)
@@ -81,8 +83,10 @@ read with size: 100
int(50)
should elicit no read because there is sufficient cached data
int(50)
-should have no effect on writes
-write with size: 250
+should elicit 3 writes
+write with size: 100
+write with size: 100
+write with size: 50
int(3)

error conditions
9 changes: 8 additions & 1 deletion main/streams/streams.c
@@ -1133,8 +1133,15 @@ static ssize_t _php_stream_write_buffer(php_stream *stream, const char *buf, siz
 		stream->ops->seek(stream, stream->position, SEEK_SET, &stream->position);
 	}
 
+	/* See GH-13071: userspace stream is subject to the memory limit. */
+	size_t chunk_size = count;
+	if (php_stream_is(stream, PHP_STREAM_IS_USERSPACE)) {
+		/* If the stream is unbuffered, we can only write one byte at a time. */
+		chunk_size = stream->chunk_size;
+	}
+
 	while (count > 0) {
-		ssize_t justwrote = stream->ops->write(stream, buf, count);
+		ssize_t justwrote = stream->ops->write(stream, buf, MIN(chunk_size, count));
 		if (justwrote <= 0) {
 			/* If we already successfully wrote some bytes and a write error occurred
 			 * later, report the successfully written bytes. */
