Increase parallelism impact on streaming downloads #105

onionjake · 2022-05-25T06:25:47Z

What

Experiment & benchmark doing downloads of large files with and without allocating memory for segments to buffer into when uplink outputs to stdout
Determine if the additional allocations result in significant improvements
Make these memory allocated buffers the default

Context

When using uplink cp with stdout (-) as the destination, uplink behaves differently than when it is writing to a file. Files support random access so uplink writes each segment directly to the appropriate spots in the file. When writing to stdout, however, the writes must be serialized and uplink does not make use of the greater parallelism to increase throughput. It will allow you to specify higher parallelism levels, but each segment establishes the connections, does the first Read, then pauses until the segment ahead of it finishes. This behavior was intended to reduce stalling by having the connections ready to go. By allocating buffers for each parallel segment they can make progress on downloading to the buffers.

Why

Customers are downloading and extracting large tar files by piping the output of uplink to tar (i.e. uplink cp <obj> - | tar x. They do this because they do not have enough disk space to download the complete tar file first then extract it. These customers want this way of using uplink to go as fast as possible.

Acceptance Criteria

uplink cp <obj> - throughput is improved when using higher levels of parallelism
OR during experimentation it was determined that there would not be significant improvement. Document in this issue the results that led to this conclusion

The text was updated successfully, but these errors were encountered:

kaloyan-raev · 2022-05-31T09:03:19Z

It would be best if uplink can calculate the memory buffer automatically based on the level of parallelism. So, if the output is set to stdout (-) and the parallelism to X then the memory buffer should be X * 64 MiB. Then we won't require users to calculate it on their own. They just need to be aware (through docs) that they need such an amount of free memory for this use case.

storjBuildBot · 2022-06-01T08:49:47Z

Change https://review.dev.storj.io/c/storj/storj/+/7687 mentions this issue.

mniewrzal · 2022-06-01T11:39:15Z

Initial results after using simple buffering for stdout write
uplink-no-buffering - binary built with current main branch
uplink-with-buffering - binary with simple buffering added to current main branch

10GiB file

storj@playground-dom:~$ time ./uplink-no-buffering cp sj://michal/experiment/10gb_tmpfile - --access $GRANT --parallelism 2 > /mnt/data/download/10gb_tmpfile

real	6m5.760s
user	7m59.248s
sys	2m29.045s

storj@playground-dom:~$ time ./uplink-no-buffering cp sj://michal/experiment/10gb_tmpfile - --access $GRANT --parallelism 10 > /mnt/data/download/10gb_tmpfile 
real	6m11.868s
user	8m0.175s
sys	2m27.395s

storj@playground-dom:~$ time ./uplink-with-buffering cp sj://michal/experiment/10gb_tmpfile - --access $GRANT --parallelism 10 > /mnt/data/download/10gb_tmpfile

real	1m8.031s
user	8m25.649s
sys	2m12.046s

storj@playground-dom:~$ time ./uplink-with-buffering cp sj://michal/experiment/10gb_tmpfile - --access $GRANT --parallelism 20 > /mnt/data/download/10gb_tmpfile

real	0m54.660s
user	8m30.846s
sys	2m24.523s

storj@playground-dom:~$ time ./uplink-with-buffering cp sj://michal/experiment/10gb_tmpfile - --access $GRANT --parallelism 15 > /mnt/data/download/10gb_tmpfile

real	0m55.040s
user	8m25.048s
sys	2m26.972s

100GiB file

storj@playground-dom:~$ time ./uplink-no-buffering cp sj://michal/experiment/100gb_tmpfile - --access $GRANT --parallelism 2 > /mnt/data/download/100gb_tmpfile

real	61m26.403s
user	80m34.256s
sys	24m36.996s

storj@playground-dom:~$ time ./uplink-with-buffering cp sj://michal/experiment/100gb_tmpfile - --access $GRANT --parallelism 4 > /mnt/data/download/100gb_tmpfile
        
real	21m29.559s
user	77m31.274s
sys	19m23.959s

storj@playground-dom:~$ time ./uplink-with-buffering cp sj://michal/experiment/100gb_tmpfile - --access $GRANT --parallelism 16 > /mnt/data/download/100gb_tmpfile

real	6m52.906s
user	84m59.461s
sys	23m40.474s

storj@playground-dom:~$ time ./uplink-with-buffering cp sj://michal/experiment/100gb_tmpfile - --access $GRANT --parallelism 32 > /mnt/data/download/100gb_tmpfile

real	5m33.669s
user	79m32.768s
sys	20m58.538s

It looks that high parallelism is beneficial up to some point and later brings no visible speed up. Of course is can be still botleneck in used code. One way or another results are very promising.

Current pipelining to stdout is synchronous so we don't have any advantage from using --parallelism flag. This change adds buffer while writing to stdout. Each part is first read into the buffer and flushed only when all data was read from this part. storj/uplink#105 Change-Id: I07bec0f4864dc4fccb42224e450d85d4d196f2ee

kaloyan-raev · 2022-06-09T16:29:07Z

https://review.dev.storj.io/c/storj/storj/+/7687 has been reviewed and merged.

Current pipelining to stdout is synchronous so we don't have any advantage from using --parallelism flag. This change adds buffer while writing to stdout. Each part is first read into the buffer and flushed only when all data was read from this part. storj/uplink#105 Change-Id: I07bec0f4864dc4fccb42224e450d85d4d196f2ee

mniewrzal · 2022-07-20T09:52:49Z

I'm closing this ticket. Next improvements should have separate tickets.

Current pipelining to stdout is synchronous so we don't have any advantage from using --parallelism flag. This change adds buffer while writing to stdout. Each part is first read into the buffer and flushed only when all data was read from this part. storj/uplink#105 Change-Id: I07bec0f4864dc4fccb42224e450d85d4d196f2ee

mniewrzal added Needs Estimation and removed Needs Estimation labels May 25, 2022

mniewrzal self-assigned this Jun 1, 2022

kaloyan-raev self-assigned this Jun 3, 2022

mniewrzal closed this as completed Jul 20, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Increase parallelism impact on streaming downloads #105

Increase parallelism impact on streaming downloads #105

onionjake commented May 25, 2022

kaloyan-raev commented May 31, 2022

storjBuildBot commented Jun 1, 2022

mniewrzal commented Jun 1, 2022

kaloyan-raev commented Jun 9, 2022

mniewrzal commented Jul 20, 2022

Increase parallelism impact on streaming downloads #105

Increase parallelism impact on streaming downloads #105

Comments

onionjake commented May 25, 2022

What

Context

Why

Acceptance Criteria

kaloyan-raev commented May 31, 2022

storjBuildBot commented Jun 1, 2022

mniewrzal commented Jun 1, 2022

kaloyan-raev commented Jun 9, 2022

mniewrzal commented Jul 20, 2022