
Significantly improve file download speed by enabling mmap based… #119

Merged: 3 commits from optimize-download-speed into sabre-io:master on Oct 8, 2019

Conversation

vfreex (Contributor) commented Apr 14, 2019

…am_copy_to_stream

Background

I run a Nextcloud server on a VM (1 CPU core, 1 GB RAM, 10 Gbps virtio network, PHP 7.2.10) on SSD storage. Directly reading a large file (say, 10 GiB) on that server yields a read speed of around 540-550 MiB/s. However, the download speed over WebDAV is only about 240-260 MiB/s.

I tried replacing the stream_copy_to_stream function call in Sapi.php with fpassthru and got a significant performance boost (440-470 MiB/s). It seems to me that the performance bottleneck is stream_copy_to_stream.
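The diagnostic swap described above can be sketched roughly as follows. This is a hedged illustration, not the actual lib/Sapi.php change; the file path and variable names are placeholders:

```php
<?php
// Illustrative sketch (not the actual lib/Sapi.php patch): serving a file
// body with fpassthru() instead of a single stream_copy_to_stream() call.
// The path below is a placeholder.
$source = fopen('/path/to/large-file.bin', 'rb');

// Original approach: one big copy, which bypasses mmap for files > 4 MiB.
// stream_copy_to_stream($source, fopen('php://output', 'wb'));

// Diagnostic replacement: stream the rest of the file straight to output.
fpassthru($source);
fclose($source);
```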

Root Cause

I started investigating this problem by digging into the source code of the PHP internal functions and finally figured out what was going wrong.

stream_copy_to_stream tries to mmap the source file. If that fails, it falls back to a manual copy:
https://github.com/php/php-src/blob/00d2b7a3d28d29383450fc2607c7d4fb91d57a4a/main/streams/streams.c#L1467.
But PHP refuses to mmap a file larger than 4 MiB, according to <https://github.com/php/php-src/blob/623911f993f39ebbe75abe2771fc89faf6b15b9b/main/streams/mmap.c#L34>.

Solution (workaround?)

So I started to modify the code: instead of asking stream_copy_to_stream to copy the whole to-be-downloaded file in one call, I call stream_copy_to_stream in a loop, once for each 4 MiB chunk of the file.

After applying this PR, I got a download speed almost the same as reading the file locally (even faster than fpassthru).

I couldn't wait to share my patch because I am really happy with the optimization! Please review it and tell me whether this is the right approach.
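A minimal sketch of the chunked-copy idea, under stated assumptions: paths and variable names are illustrative, and the actual patch in lib/Sapi.php also handles Content-Length and range responses.

```php
<?php
// Minimal sketch of the chunked copy: each stream_copy_to_stream() call
// moves at most 4 MiB, so php-src can serve it via mmap instead of
// falling back to the 8 KiB read/write loop. Paths are placeholders.
$chunkSize = 4 * 1024 * 1024; // php-src refuses to mmap more than 4 MiB

$source = fopen('/path/to/large-file.bin', 'rb');
$output = fopen('php://output', 'wb');

while (!feof($source)) {
    $copied = stream_copy_to_stream($source, $output, $chunkSize);
    if (false === $copied || 0 === $copied) {
        break; // stop on failure instead of spinning forever
    }
}
```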

staabm (Member) left a comment

Thanks for the PR. This looks like a well-investigated patch.
Maybe someone else can confirm this improves performance on other setups.


vfreex commented Apr 14, 2019

P.S. In case you don't have fast storage (an SSD), you can simulate it by creating a sparse file on a slow disk; Linux can read a sparse file without doing actual I/O:

truncate -s 10G 10G.bin


staabm commented Apr 14, 2019

After re-reading the patch a few times, I wonder if this is something that should be optimized at the php-src level. Couldn't PHP internally use a properly sized buffer so that mmap works no matter how big a file is passed in from userland, @nikic?

vfreex added a commit to vfreex/kube-nextcloud that referenced this pull request Apr 14, 2019

codecov bot commented Apr 14, 2019

Codecov Report

Merging #119 into master will increase coverage by 0.08%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #119      +/-   ##
============================================
+ Coverage     93.52%   93.61%   +0.08%     
- Complexity      251      257       +6     
============================================
  Files            15       15              
  Lines           819      830      +11     
============================================
+ Hits            766      777      +11     
  Misses           53       53
Impacted Files | Coverage Δ | Complexity Δ
lib/Sapi.php | 96.19% <100%> (+0.44%) | 40 <0> (+6) ⬆️



vfreex commented Apr 14, 2019

@staabm Thanks for the review.

Doing the loop at the php-src level is a possible approach, but I think the challenge is that there would be too many mmap/munmap cycles if the chunk size is too small. Maybe we can open a discussion there.


staabm commented Apr 15, 2019

Don't get me wrong.

I would like to have your patch in sabre-io/http; I am only suggesting to also double-check the C level, because I guess this is a problem that might be fixable in php-src and would therefore improve performance in many more libraries/use cases than ours.

No matter how/when/why it gets fixed in php-src, we should land this fix here (as long as it doesn't regress other things).


vfreex commented Apr 15, 2019

@staabm Thanks, I got your point.

I am also revisiting the PHP core code and looking into whether something can be done there. After investigating, I will open a bug report/pull request/RFC against PHP core to discuss it.


vfreex commented Apr 15, 2019

For partial content responses, I additionally make the start position of the stream copy a multiple of the page size.

Things are becoming more complex; we should definitely look into whether we can optimize things on the PHP C function side.
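The page-size alignment for ranged responses could look roughly like this. A hedged sketch only: the page size constant, the start offset, and the variable names are assumptions for illustration, not the patch's actual code.

```php
<?php
// Hedged sketch: for a partial (ranged) response starting at an arbitrary
// offset, first copy the bytes up to the next page boundary, then continue
// in aligned 4 MiB chunks so the source offset stays mmap-friendly.
// The page size and start offset below are assumptions for illustration.
$pageSize  = 4096;
$chunkSize = 4 * 1024 * 1024;
$start     = 10000; // e.g. parsed from the Content-Range header

$source = fopen('/path/to/large-file.bin', 'rb');
$output = fopen('php://output', 'wb');
fseek($source, $start);

// Flush the unaligned head so the next copy starts on a page boundary.
$misaligned = $start % $pageSize;
if ($misaligned > 0) {
    stream_copy_to_stream($source, $output, $pageSize - $misaligned);
}

while (!feof($source)) {
    if (stream_copy_to_stream($source, $output, $chunkSize) < 1) {
        break; // stop on failure or end of file
    }
}
```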


evert commented Apr 15, 2019

Is it worth writing a couple of unit tests for this? There are a few more branches in the new code. Are they all being hit?


vfreex commented Apr 18, 2019

I will take a look at the unit test part soon.

$chunk_size = 4 * 1024 * 1024;
stream_set_chunk_size($output, $chunk_size);
// If this is a partial response, flush the beginning bytes until the first position that is a multiple of the page size.
$contentRange = $response->getHeader('Content-Range');
staabm (Member) commented Apr 18, 2019

How common are range requests here? Is it sufficient for your reported use case to use this hack in the simpler form you submitted initially?

Maybe something along these lines would also be good enough (pseudocode):

if (range-request) {
  // old way, which doesn't support mmap
} else {
  // your optimized version, without range request support
}

vfreex (Contributor, Author) commented Apr 19, 2019

It is not common for downloading a file, but it is for streaming a video. Regarding performance, I found it is also beneficial for downloading a file from an HDD: from 70-80 MB/s to 110-120 MB/s. But I think that might be because this PR uses a larger chunk size (4 MiB), while the manual copy implementation in stream_copy_to_stream only uses an 8 KiB chunk (https://github.com/php/php-src/blob/00d2b7a3d28d29383450fc2607c7d4fb91d57a4a/main/streams/streams.c#L1469 and https://github.com/php/php-src/blob/00d2b7a3d28d29383450fc2607c7d4fb91d57a4a/main/streams/php_streams_int.h#L49).

I personally prefer to use this "hack" for both range and non-range responses and to add unit tests to cover it.

@vfreex vfreex force-pushed the optimize-download-speed branch 2 times, most recently from 0792109 to a780756 Compare April 19, 2019 12:42

staabm commented Apr 23, 2019

Side note: I reported the initial problem upstream to php-src, so we can get the discussion going at that level as well:

https://bugs.php.net/bug.php?id=77930


vfreex commented Oct 8, 2019

@staabm Hi, I haven't been following up on this issue for a while. This PR is definitely a workaround until php-src eventually fixes this. Can this PR be merged? Is there anything I can help with?


staabm commented Oct 8, 2019

Hey @vfreex .

I had totally forgotten about this PR. I will merge now after a green build.

@staabm staabm changed the title Significantly improve file download speed by enabling mmap based stre… Significantly improve file download speed by enabling mmap based… Oct 8, 2019
@staabm staabm merged commit a4c7a16 into sabre-io:master Oct 8, 2019

staabm commented Oct 8, 2019

It will be in 5.0.3: #129


vfreex commented Oct 8, 2019

@staabm Thanks so much!

@vfreex vfreex deleted the optimize-download-speed branch October 8, 2019 11:35

staabm commented Oct 30, 2019

@vfreex do you have some time to provide some in-detail feedback for the upstream php-src bug report?

https://bugs.php.net/bug.php?id=77930


divinity76 commented Nov 22, 2019

In any case, Nginx/Apache would do a significantly better job than PHP anyway; Nextcloud (or anything, really) should use X-Accel-Redirect / X-Sendfile for large files, not PHP.


staabm commented Nov 26, 2019

@ho4ho could you investigate the infinite loop?


vfreex commented Nov 26, 2019

@vfreex do you have some time to provide some in-detail feedback for the upstream php-src bug report?

https://bugs.php.net/bug.php?id=77930

Sorry, I missed the message because I have so many GitHub notifications. I will try to follow up within a few days.

In any case, Nginx/Apache would do a significantly better job than PHP anyway; Nextcloud (or anything, really) should use X-Accel-Redirect / X-Sendfile for large files, not PHP.

Using nginx/apache to send files is nice to have, but I think it is better to pursue that option in separate PRs (it requires collaboration on both the sabre and Nextcloud projects).

This patch might cause problems/loops: owncloud/core#36470, owncloud/files_primary_s3#270, owncloud/files_primary_s3#274 (related to #133?)

I'll also take a look and see if the infinite loop is related to this PR (or #133).

phil-davis (Contributor) commented:

I'll also take a look and see if the infinite loop is related to this PR (or #133).

@vfreex yes, there might be an edge case somewhere in this stuff.
I actually do not really know whether there is an infinite CPU loop or some other thing that results in the effect of "hanging". I haven't run the related test scenario(s) locally, only on Drone in the cloud, so I do not have real monitoring feedback about what is actually happening in the PHP server and the backend S3 storage it is talking to.


vfreex commented Nov 26, 2019

@phil-davis I created #137 to break the copy-stream loop on failure, although I am not quite sure if it is the root cause.
