perf/storage: start streaming zip/index uploads, parallel directory upload #3287

Merged
GuillaumeGomez merged 2 commits into rust-lang:main from syphar:stream-upload
Apr 8, 2026

Conversation

@syphar
Member

@syphar syphar commented Apr 8, 2026

I wanted to do streaming uploads for quite some time, and hoped it could be as cool as the streaming downloads.

But sadly no :) All S3 APIs need a known content-length before you start the upload. This makes it hard, for example, to have a local buffer and just compress it while uploading.

The only way around that is multipart uploads, but these are complex, and a part has (I think) a minimum size of 5 MiB anyway, which we would have to buffer. Since most files are smaller than that, we can just as well buffer the whole compressed file and then upload it.
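A minimal sketch of the "buffer, then upload" idea, not the code from this PR: the output size of a compressor is only known once all input has been processed, so the body is collected in memory first and the length is read off the buffer. `compress_into`, `PutRequest`, and `buffer_then_put` are hypothetical names, and the run-length encoder is just a stand-in for a real compression algorithm.

```rust
use std::io::{self, Read, Write};

/// Stand-in for a real compressor (assumption): a trivial run-length encoder
/// that writes `(run_length, byte)` pairs. Only the property "output size is
/// unknown until the input is consumed" matters here.
fn compress_into<R: Read, W: Write>(mut input: R, mut out: W) -> io::Result<()> {
    let mut data = Vec::new();
    input.read_to_end(&mut data)?;
    let mut i = 0;
    while i < data.len() {
        let byte = data[i];
        let mut run = 1usize;
        while i + run < data.len() && data[i + run] == byte && run < 255 {
            run += 1;
        }
        out.write_all(&[run as u8, byte])?;
        i += run;
    }
    Ok(())
}

/// Hypothetical upload request: the length must be filled in before sending.
struct PutRequest {
    key: String,
    content_length: u64,
    body: Vec<u8>,
}

/// Compress into an in-memory buffer, then derive the content length from it.
fn buffer_then_put<R: Read>(key: &str, input: R) -> io::Result<PutRequest> {
    let mut body = Vec::new();
    compress_into(input, &mut body)?;
    Ok(PutRequest {
        key: key.to_string(),
        content_length: body.len() as u64,
        body,
    })
}
```

For files well below the 5 MiB part size mentioned above, this costs no more memory than a multipart upload would need for its first part.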

Where the stream works well is when you have a local file. I updated our zip&index method to use tempfiles and stream these to S3.
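Why a local file helps can be sketched like this, using only the standard library: the file system already knows the size, so the length can be announced up front and the body read in small chunks. `put_streaming` and `upload_file` are hypothetical names, and the `Vec<u8>` sink merely stands in for the S3 request body; the actual implementation in this PR is not shown here.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical stand-in for an S3 PUT: it is told the length up front and
/// then consumes the body chunk by chunk, never holding more than one chunk.
fn put_streaming<R: Read>(content_length: u64, mut body: R, sink: &mut Vec<u8>) -> io::Result<()> {
    let mut chunk = [0u8; 8192];
    let mut sent = 0u64;
    loop {
        let n = body.read(&mut chunk)?;
        if n == 0 {
            break;
        }
        sink.extend_from_slice(&chunk[..n]);
        sent += n as u64;
    }
    // The announced length and the streamed bytes have to agree.
    if sent != content_length {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "body length mismatch"));
    }
    Ok(())
}

/// Take the content length from file metadata, rewind, and stream the file.
fn upload_file(file: &mut File, sink: &mut Vec<u8>) -> io::Result<u64> {
    let len = file.metadata()?.len();
    file.seek(SeekFrom::Start(0))?;
    put_streaming(len, &mut *file, sink)?;
    Ok(len)
}
```

The same shape would apply to a temporary file that a zip or index was just written to: write it, rewind, then stream it with the length from its metadata.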

With the new API it was also easy to optimize store_all (upload all files from a directory) to compress & upload directly, and in parallel, where before we first loaded all files into memory, compressed them, and then uploaded them.
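The difference to the old approach can be sketched with plain threads: each worker picks the next path, reads, compresses, and "uploads" that one file, so at most `workers` files are in memory at a time. This is an illustration under assumptions, not the PR's code: `store_all` here is a simplified stand-in that shares only the name, `compress` is a placeholder that returns its input unchanged, the upload is reduced to recording the body size, and the real implementation may well use async tasks instead of threads.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Placeholder (assumption): a real compressor would go here.
fn compress(data: &[u8]) -> Vec<u8> {
    data.to_vec()
}

/// Process the files with a small pool of scoped worker threads.
/// Returns `(path, uploaded_size)` pairs sorted by path.
fn store_all(paths: &[PathBuf], workers: usize) -> io::Result<Vec<(PathBuf, usize)>> {
    let next = AtomicUsize::new(0); // index of the next unprocessed path
    let results: Mutex<Vec<io::Result<(PathBuf, usize)>>> = Mutex::new(Vec::new());

    thread::scope(|s| {
        for _ in 0..workers.max(1) {
            s.spawn(|| loop {
                let i = next.fetch_add(1, Ordering::Relaxed);
                let Some(path) = paths.get(i) else { break };
                let outcome = fs::read(path).map(|data| {
                    let body = compress(&data);
                    // A real implementation would upload `body` here.
                    (path.clone(), body.len())
                });
                results.lock().unwrap().push(outcome);
            });
        }
    });

    // Fail if any file failed; otherwise return the results in a stable order.
    let mut done = results
        .into_inner()
        .unwrap()
        .into_iter()
        .collect::<io::Result<Vec<_>>>()?;
    done.sort();
    Ok(done)
}
```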

Let's see if new places come up

@syphar syphar self-assigned this Apr 8, 2026
@github-actions github-actions bot added the S-waiting-on-review (Status: This pull request has been implemented and needs to be reviewed) label Apr 8, 2026
@syphar syphar changed the title storage: start streaming zip/index uploads, parallel directory upload perf/storage: start streaming zip/index uploads, parallel directory upload Apr 8, 2026
@syphar syphar marked this pull request as ready for review April 8, 2026 13:54
@syphar syphar requested a review from a team as a code owner April 8, 2026 13:54
Comment thread crates/bin/docs_rs_admin/src/main.rs Outdated
@syphar syphar requested a review from GuillaumeGomez April 8, 2026 14:13
@rustbot
Collaborator

rustbot commented Apr 8, 2026

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@GuillaumeGomez
Member

Thanks!

@GuillaumeGomez GuillaumeGomez enabled auto-merge (rebase) April 8, 2026 15:22
@GuillaumeGomez GuillaumeGomez merged commit 319dafe into rust-lang:main Apr 8, 2026
11 checks passed
@syphar syphar deleted the stream-upload branch April 8, 2026 15:23
@github-actions github-actions bot added S-waiting-on-deploy (This PR is ready to be merged, but is waiting for an admin to have time to deploy it) and removed S-waiting-on-review (Status: This pull request has been implemented and needs to be reviewed) labels Apr 8, 2026
@syphar syphar mentioned this pull request Apr 8, 2026
@syphar syphar removed the S-waiting-on-deploy (This PR is ready to be merged, but is waiting for an admin to have time to deploy it) label Apr 9, 2026
@Mark-Simulacrum
Member

You may want to talk to @Kobzol, he's played around with various S3 upload strategies in rustc-perf recently. I think we settled on https://docs.rs/object_store/ for it. That seems like it has some support for multi-part uploads, though I don't know if we're using them in rustc-perf.

Their docs do seem to confirm some of the multi-part limitations you're suggesting:

Most stores require that all parts excluding the last are at least 5 MiB, and some further require that all parts excluding the last be the same size, e.g. R2. Clients wanting to maximise compatibility should therefore perform writes in fixed size blocks larger than 5 MiB.

https://docs.rs/object_store/latest/object_store/trait.MultipartUpload.html#tymethod.put_part
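To make the quoted constraint concrete, here is a small sketch of splitting a buffer of known length into fixed-size parts, where every part except the last has the same size of at least 5 MiB. `plan_parts` and `MIN_PART_SIZE` are hypothetical names for illustration and are not part of object_store or any AWS crate.

```rust
use std::ops::Range;

/// Minimum size for all parts except the last, as stated in the quote above.
const MIN_PART_SIZE: usize = 5 * 1024 * 1024;

/// Split `total_len` bytes into byte ranges of exactly `part_size` bytes each,
/// with a shorter final range if the length is not a multiple of `part_size`.
fn plan_parts(total_len: usize, part_size: usize) -> Vec<Range<usize>> {
    assert!(part_size >= MIN_PART_SIZE, "part size below the 5 MiB minimum");
    let mut parts = Vec::new();
    let mut start = 0;
    while start < total_len {
        let end = (start + part_size).min(total_len);
        parts.push(start..end);
        start = end;
    }
    parts
}
```

With this scheme, anything smaller than one part ends up as a single range, which matches the observation earlier in the thread that small files gain nothing from multipart uploads.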

@Kobzol
Member

Kobzol commented Apr 9, 2026

In rustc-perf we pre-compress the files before uploading them to S3, so no direct streaming is involved there (though I try to hide latency by compressing multiple files concurrently on a background tokio blocking worker thread pool). The object_store crate seems to work fine for the uploads, and it doesn't have as many dependencies as the official AWS crates. We are not using multi-part uploads in rustc-perf at the moment, or at least not explicitly.
