Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Copying large files using mmap-able source streams may exhaust available memory and fail #13071

Closed
alexcriss opened this issue Jan 4, 2024 · 2 comments

Comments

@alexcriss
Copy link

Description

The following script:

  • Registers a simple stream for the up protocol which does fwrite under the hood.
  • Runs with 256 MB of memory available.
  • Performs a copy operation of a source file into a destination file using the defined stream.
<?php

class UP {
	/**
	 * Default protocol
	 */
	const DEFAULT_PROTOCOL = 'up';
	const ALLOWED_MODES = [ 'r', 'w', 'a', 'x' ];

	public $context;
	protected $mode;
	protected $file;
	protected $path;
	protected $uri;

	
	protected $seekable;
	private $protocol;

	private $localpath;

	/**
	 * constructor.
	 *
	 */
	public function __construct() {
		$this->protocol = self::DEFAULT_PROTOCOL;
	}
	
	/**
	 *  Register the Stream.
	 *
	 * Will unregister stream first if it's already registered
	 *
	 * @since   1.0.0
	 * @access  public
	 *
	 * @return  bool    true if success, false if failure
	 */
	public function register() {
		$this->localpath = getcwd();
		return stream_wrapper_register( $this->protocol, get_called_class(), STREAM_IS_URL );
	}

	public function stream_open( $path, $mode, $options, &$opened_path ) {
		$path = sprintf( "%s%s", $this->localpath, $this->trim_path( $path ) );
		$mode = rtrim( $mode, 'bt+' );

		$file = fopen( $path, $mode );
		$this->file = $file;
		$this->path = $path;
		$this->mode = $mode;
		return true;
	}

	public function stream_close() {
		if ( ! $this->file ) {
			return true;
		}

		fclose( $this->file );
		$this->file = null;
		$this->path = null;
		$this->mode = null;
		return true;
	}

	public function stream_eof() {
		return feof( $this->file );
	}

	public function stream_read( $count ) {
		$string = fread( $this->file, $count );
		if ( false === $string ) {
			return '';
		}

		return $string;
	}

	public function stream_flush() {
		return fflush( $this->file );
	}

	public function stream_seek( $offset, $whence ) {
		if ( ! $this->seekable ) {
			return false;
		}
		
		$result = fseek( $this->file, $offset, $whence );
		return true;
	}

	public function stream_write( $data ) {
		if ( 'r' === $this->mode ) {
			return false;
		}

		$length = fwrite( $this->file, $data );

		if ( false === $length ) {
			return false;
		}

		return $length;
	}

	public function unlink( $path ) {
		return true;
	}

	public function stream_stat() {
		return fstat( $this->file );
	}

	public function url_stat( $path, $flags ) {
		$path = $this->trim_path( $path );
		$path = sprintf( "%s%s", $this->localpath, $this->trim_path( $path ) );
		$file = @fopen( $path, "r" );
		if ( ! $file ) {
			return false;
		}
		return fstat( $file );
		fclose( $file );
	}

	public function stream_tell() {
		return $this->file ? ftell( $this->file ) : false;
	}

	public function rename( $path_from, $path_to ) {
		return false;
	}

	public function mkdir( $path, $mode, $options ) {
		$path = $this->trim_path( $path );
		$path = sprintf( "%s%s", $this->localpath, $this->trim_path( $path ) );
		return mkdir( $path, 0777, true );
	}

	
	public function stream_metadata( $path, $option, $value ) {
		$path = sprintf( "%s%s", $this->localpath, $this->trim_path( $path ) );

		switch ( $option ) {
			case STREAM_META_TOUCH:
				if ( false === file_exists( $path ) ) {
					$file = fopen( $path, 'w' );
					if ( is_resource( $file ) ) {
						$result = fflush( $file );
						return fclose( $file ) && $result;
					}

					return false;
				}

				return true;

			default:
				return false;
		}
	}

	public function stream_cast( $cast_as ) {
		if ( ! is_null( $this->file ) ) {
			return $this->file;
		}

		return false;
	}


	protected function string_to_resource( $data, $mode ) {
		// Create a temporary file
		$tmp_handler = tmpfile();

		switch ( $mode ) {
			case 'a':
				// Make sure pointer is at end of file for appends
				fseek( $tmp_handler, 0, SEEK_END );
				break;
			default:
				// Need to rewind file pointer as fwrite moves it to EOF
				rewind( $tmp_handler );
		}

		return $tmp_handler;
	}

	protected function trim_path( $path ) {
		return ltrim( $path, 'up:/\\' );
	}
}

ini_set( 'memory_limit', 256 * 1024 * 102 );
$up = new UP();
$up->register();

copy( $argv[1], sprintf( "up://%s", $argv[2] ) );
<?php

Ideally, this stream wrapper should be able to copy any file, but if the file is large enough it dies because memory gets exhausted.

Example to reproduce

dd if=/dev/zero of=big bs=1M count=300
php up.php big big-copy

Resulted in this output:

Fatal error: Allowed memory size of 26738688 bytes exhausted (tried to allocate 314572832 bytes) in /Users/alessandro/Ongoing/php-uploads/up.php on line 199

While no output and a successful copy was expected.

This seems to happen after 5cbe5a5, where chunking was disabled for streams.

copy passes a buffer to _php_stream_write_buffer that is as large as PHP_STREAM_MMAP_MAX, defined at 512MB. This means that files smaller than 512MB require at least their size in terms of memory to be copied, and files larger than 512MB requires at least 512MB of memory available.

This has been tested with PHP 8.0, 8.1, and now lastly with

PHP 8.3.1 (cli) (built: Dec 20 2023 12:44:38) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.3.1, Copyright (c) Zend Technologies
    with Zend OPcache v8.3.1, Copyright (c), by Zend Technologies

PHP Version

PHP 8.3.1

Operating System

No response

@nielsdos
Copy link
Member

nielsdos commented Jan 6, 2024

This clearly is a regression as it used to work in previous versions, but broke since the commit you linked.

The obvious solution is to use chunking with userspace stream writes, that would look something like this: https://gist.github.com/nielsdos/d4cc1fbc3db1abe88f517919b2ad61e8
And users have control over the chunk size too which is neat because they can improve the performance by setting the chunk size if that turns out to be a bottleneck.
In the PoC patch I use the php_stream_is function, but I would actually prefer if we could "ask" the stream whether it "prefers" chunks for writing. E.g. an option implemented similarly to php_stream_mmap_supported & friends.

Out of curiosity I tried an alternative fix by fixing this at the stream_copy_to_stream_ex side, in the mmap code.
That would look like this: https://gist.github.com/nielsdos/53e0a8dee060c12854cf6fd706f2e4f0
However, when comparing copy performance, this solution is slower. A way to improve this patch would be to still map a large range but do the copy call in a loop.
This solution is not really nice because it creates an inconsistency between calls like fwrite and stream_copy_to_stream, which would be weird.

So all in all, I prefer the first patch but wrapped in a nicer API.
@bukka What do you think?

@bukka
Copy link
Member

bukka commented Jan 12, 2024

I have had a proper look and agree that the first solution is better - fwrite and stream_copy_to_stream should be consistent. I think write chunking makes sense for user wrappers as there's extra buffering in callbacks (variables) which is what causes issue here. And as you say chunking can be controlled by users so if anyone wants to increase it, they can (although they need to open stream first so not that useful for copy but works for stream_copy_to_stream).

Thinking about it, it might be also interesting to see how those big writes work for other streams like TLS and if it results in bigger memory usage there as well (this won't be PHP allocated so users might not see it but might increase the process / system memory usage). Might not be an issue but would be probably good to do some testing so I will add this to my TODO list.

nielsdos added a commit to nielsdos/php-src that referenced this issue Jan 12, 2024
…ay exhaust available memory and fail

Commit 5cbe5a5 disabled chunking for all writes to streams. However,
user streams have a callback where code is executed on data that is
subject to the memory limit. Therefore, when using large writes or
stream_copy_to_stream/copy the memory limit can easily be hit with large
enough data.

To solve this, we reintroduce chunking for userspace streams.
Users have control over the chunk size, which is neat because
they can improve the performance by setting the chunk size if
that turns out to be a bottleneck.

In an ideal world, we add an option so we can "ask" the stream whether
it "prefers" chunked writes, similar to how we have
php_stream_mmap_supported & friends. However, that cannot be done on
stable branches.
nielsdos added a commit to nielsdos/php-src that referenced this issue Jan 12, 2024
…ay exhaust available memory and fail

Commit 5cbe5a5 disabled chunking for all writes to streams. However,
user streams have a callback where code is executed on data that is
subject to the memory limit. Therefore, when using large writes or
stream_copy_to_stream/copy the memory limit can easily be hit with large
enough data.

To solve this, we reintroduce chunking for userspace streams.
Users have control over the chunk size, which is neat because
they can improve the performance by setting the chunk size if
that turns out to be a bottleneck.

In an ideal world, we add an option so we can "ask" the stream whether
it "prefers" chunked writes, similar to how we have
php_stream_mmap_supported & friends. However, that cannot be done on
stable branches.
nielsdos added a commit that referenced this issue Jan 16, 2024
* PHP-8.2:
  Fix GH-13071: Copying large files using mmap-able source streams may exhaust available memory and fail
nielsdos added a commit that referenced this issue Jan 16, 2024
* PHP-8.3:
  Fix GH-13071: Copying large files using mmap-able source streams may exhaust available memory and fail
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants