Increase parallelism impact on streaming downloads #105
It would be best if uplink could calculate the memory buffer size automatically based on the level of parallelism. So, if the output is set to stdout (
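The comment above proposes deriving the buffer memory from the parallelism level. A minimal sketch of what such a calculation might look like, assuming one in-memory buffer per part downloaded in parallel and a memory cap. The helper bufferPlan, the cap, and the part size are hypothetical illustrations, not uplink functions or flags.

```go
package main

import "fmt"

// bufferPlan is a hypothetical helper, not an uplink function. It assumes one
// in-memory buffer per part being downloaded in parallel and caps the number
// of buffers so that they fit under memCap, always keeping at least one.
func bufferPlan(parallelism int, partSize, memCap int64) (buffers int, total int64) {
	buffers = parallelism
	if partSize > 0 {
		if fit := memCap / partSize; fit < int64(buffers) {
			buffers = int(fit)
		}
	}
	if buffers < 1 {
		buffers = 1 // one buffer is always needed to make progress
	}
	return buffers, int64(buffers) * partSize
}

func main() {
	const MiB = int64(1 << 20)
	// 8 parallel parts of 64 MiB would need 512 MiB; a 256 MiB cap allows 4.
	n, total := bufferPlan(8, 64*MiB, 256*MiB)
	fmt.Println(n, total/MiB) // prints "4 256"
}
```

Capping the buffer count, rather than rejecting the requested parallelism, keeps the download working on machines with little memory while still using as much parallelism as the cap allows.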
Change https://review.dev.storj.io/c/storj/storj/+/7687 mentions this issue.
Initial results after using simple buffering for stdout writes:
[Benchmark results: 10GiB file, 100GiB file]
It looks like higher parallelism is beneficial up to a point and beyond that brings no visible speedup. Of course, there may still be a bottleneck in the code being used. Either way, the results are very promising.
Current pipelining to stdout is synchronous, so we get no advantage from using the --parallelism flag. This change adds a buffer when writing to stdout. Each part is first read into the buffer and flushed only after all data has been read from that part. storj/uplink#105 Change-Id: I07bec0f4864dc4fccb42224e450d85d4d196f2ee
https://review.dev.storj.io/c/storj/storj/+/7687 has been reviewed and merged.
I'm closing this ticket. Further improvements should get separate tickets.
What
uplink outputs to stdout

Context
When using uplink cp with stdout (-) as the destination, uplink behaves differently than when it is writing to a file. Files support random access, so uplink writes each segment directly to the appropriate spot in the file. When writing to stdout, however, the writes must be serialized, and uplink does not make use of the greater parallelism to increase throughput. It will allow you to specify higher parallelism levels, but each segment establishes its connections, does the first Read, then pauses until the segment ahead of it finishes. This behavior was intended to reduce stalling by having the connections ready to go. Allocating a buffer for each parallel segment would let those segments keep making progress by downloading into their buffers instead of waiting.

Why
Customers are downloading and extracting large tar files by piping the output of uplink to tar (i.e. uplink cp <obj> - | tar x). They do this because they do not have enough disk space to download the complete tar file first and then extract it. These customers want this way of using uplink to be as fast as possible.

Acceptance Criteria
uplink cp <obj> - throughput is improved when using higher levels of parallelism.