Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase parallelism impact on streaming downloads #105

Closed
onionjake opened this issue May 25, 2022 · 5 comments
Closed

Increase parallelism impact on streaming downloads #105

onionjake opened this issue May 25, 2022 · 5 comments
Assignees

Comments

@onionjake
Copy link

What

  • Experiment & benchmark doing downloads of large files with and without allocating memory for segments to buffer into when uplink outputs to stdout
  • Determine if the additional allocations result in significant improvements
  • Make these memory allocated buffers the default

Context

When using uplink cp with stdout (-) as the destination, uplink behaves differently than when it is writing to a file. Files support random access so uplink writes each segment directly to the appropriate spots in the file. When writing to stdout, however, the writes must be serialized and uplink does not make use of the greater parallelism to increase throughput. It will allow you to specify higher parallelism levels, but each segment establishes the connections, does the first Read, then pauses until the segment ahead of it finishes. This behavior was intended to reduce stalling by having the connections ready to go. By allocating buffers for each parallel segment they can make progress on downloading to the buffers.

Why

Customers are downloading and extracting large tar files by piping the output of uplink to tar (i.e. uplink cp <obj> - | tar x. They do this because they do not have enough disk space to download the complete tar file first then extract it. These customers want this way of using uplink to go as fast as possible.

Acceptance Criteria

  • uplink cp <obj> - throughput is improved when using higher levels of parallelism
  • OR during experimentation it was determined that there would not be significant improvement. Document in this issue the results that led to this conclusion
@kaloyan-raev
Copy link
Member

It would be best if uplink can calculate the memory buffer automatically based on the level of parallelism. So, if the output is set to stdout (-) and the parallelism to X then the memory buffer should be X * 64 MiB. Then we won't require users to calculate it on their own. They just need to be aware (through docs) that they need such an amount of free memory for this use case.

@mniewrzal mniewrzal self-assigned this Jun 1, 2022
@storjBuildBot
Copy link
Collaborator

Change https://review.dev.storj.io/c/storj/storj/+/7687 mentions this issue.

@mniewrzal
Copy link
Contributor

Initial results after using simple buffering for stdout write
uplink-no-buffering - binary built with current main branch
uplink-with-buffering - binary with simple buffering added to current main branch

10GiB file

storj@playground-dom:~$ time ./uplink-no-buffering cp sj://michal/experiment/10gb_tmpfile - --access $GRANT --parallelism 2 > /mnt/data/download/10gb_tmpfile

real	6m5.760s
user	7m59.248s
sys	2m29.045s

storj@playground-dom:~$ time ./uplink-no-buffering cp sj://michal/experiment/10gb_tmpfile - --access $GRANT --parallelism 10 > /mnt/data/download/10gb_tmpfile 
real	6m11.868s
user	8m0.175s
sys	2m27.395s

storj@playground-dom:~$ time ./uplink-with-buffering cp sj://michal/experiment/10gb_tmpfile - --access $GRANT --parallelism 10 > /mnt/data/download/10gb_tmpfile

real	1m8.031s
user	8m25.649s
sys	2m12.046s

storj@playground-dom:~$ time ./uplink-with-buffering cp sj://michal/experiment/10gb_tmpfile - --access $GRANT --parallelism 20 > /mnt/data/download/10gb_tmpfile

real	0m54.660s
user	8m30.846s
sys	2m24.523s

storj@playground-dom:~$ time ./uplink-with-buffering cp sj://michal/experiment/10gb_tmpfile - --access $GRANT --parallelism 15 > /mnt/data/download/10gb_tmpfile

real	0m55.040s
user	8m25.048s
sys	2m26.972s

100GiB file

storj@playground-dom:~$ time ./uplink-no-buffering cp sj://michal/experiment/100gb_tmpfile - --access $GRANT --parallelism 2 > /mnt/data/download/100gb_tmpfile

real	61m26.403s
user	80m34.256s
sys	24m36.996s

storj@playground-dom:~$ time ./uplink-with-buffering cp sj://michal/experiment/100gb_tmpfile - --access $GRANT --parallelism 4 > /mnt/data/download/100gb_tmpfile
        
real	21m29.559s
user	77m31.274s
sys	19m23.959s

storj@playground-dom:~$ time ./uplink-with-buffering cp sj://michal/experiment/100gb_tmpfile - --access $GRANT --parallelism 16 > /mnt/data/download/100gb_tmpfile

real	6m52.906s
user	84m59.461s
sys	23m40.474s

storj@playground-dom:~$ time ./uplink-with-buffering cp sj://michal/experiment/100gb_tmpfile - --access $GRANT --parallelism 32 > /mnt/data/download/100gb_tmpfile

real	5m33.669s
user	79m32.768s
sys	20m58.538s

It looks that high parallelism is beneficial up to some point and later brings no visible speed up. Of course is can be still botleneck in used code. One way or another results are very promising.

@kaloyan-raev kaloyan-raev self-assigned this Jun 3, 2022
storjBuildBot pushed a commit to storj/storj that referenced this issue Jun 9, 2022
Current pipelining to stdout is synchronous so we don't have any
advantage from using --parallelism flag. This change adds buffer
while writing to stdout. Each part is first read into the buffer
and flushed only when all data was read from this part.

storj/uplink#105

Change-Id: I07bec0f4864dc4fccb42224e450d85d4d196f2ee
@kaloyan-raev
Copy link
Member

https://review.dev.storj.io/c/storj/storj/+/7687 has been reviewed and merged.

andriikotko pushed a commit to storj/storj that referenced this issue Jun 13, 2022
Current pipelining to stdout is synchronous so we don't have any
advantage from using --parallelism flag. This change adds buffer
while writing to stdout. Each part is first read into the buffer
and flushed only when all data was read from this part.

storj/uplink#105

Change-Id: I07bec0f4864dc4fccb42224e450d85d4d196f2ee
andriikotko pushed a commit to storj/storj that referenced this issue Jun 13, 2022
Current pipelining to stdout is synchronous so we don't have any
advantage from using --parallelism flag. This change adds buffer
while writing to stdout. Each part is first read into the buffer
and flushed only when all data was read from this part.

storj/uplink#105

Change-Id: I07bec0f4864dc4fccb42224e450d85d4d196f2ee
@mniewrzal
Copy link
Contributor

I'm closing this ticket. Next improvements should have separate tickets.

dlamarmorgan pushed a commit to storj/storj that referenced this issue Oct 18, 2022
Current pipelining to stdout is synchronous so we don't have any
advantage from using --parallelism flag. This change adds buffer
while writing to stdout. Each part is first read into the buffer
and flushed only when all data was read from this part.

storj/uplink#105

Change-Id: I07bec0f4864dc4fccb42224e450d85d4d196f2ee
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

4 participants