
blob/s3: High memory usage during multi-part uploads #2807

Closed
richardartoul opened this issue Jun 1, 2020 · 5 comments · Fixed by #3248

Comments

@richardartoul
Contributor

Describe the bug

High memory usage when doing many parallel multi-part uploads.

To Reproduce

Spin up a bunch of goroutines doing multi-part uploads with a large part size.
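
For concreteness, a rough reproduction sketch using the gocloud.dev/blob API (the bucket URL, goroutine count, and sizes are illustrative, not from the report):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"sync"

	"gocloud.dev/blob"
	_ "gocloud.dev/blob/s3blob" // registers the s3:// URL scheme
)

// Many goroutines each stream a large payload through a blob writer.
// Each writer gets its own multi-part uploader and buffer pool, which
// is what drives memory usage up.
func main() {
	ctx := context.Background()
	bucket, err := blob.OpenBucket(ctx, "s3://example-bucket")
	if err != nil {
		log.Fatal(err)
	}
	defer bucket.Close()

	payload := make([]byte, 64*1024*1024) // 64 MiB per object
	var wg sync.WaitGroup
	for i := 0; i < 32; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			w, err := bucket.NewWriter(ctx, fmt.Sprintf("obj-%d", i), &blob.WriterOptions{
				BufferSize: 16 * 1024 * 1024, // large part size
			})
			if err != nil {
				log.Print(err)
				return
			}
			defer w.Close()
			if _, err := w.Write(payload); err != nil {
				log.Print(err)
			}
		}(i)
	}
	wg.Wait()
}
```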

Expected behavior

Low memory usage.

Version

Latest

Additional context

The issue seems to stem from the S3 implementation creating a new multi-part uploader for each writer: https://github.com/google/go-cloud/blob/master/blob/s3blob/s3blob.go#L626

The multipart uploader maintains a pool of byte slices, so when there are many parallel uploads you end up with a separate pool of parts sitting in memory for each upload until the GC has time to clean them up.

I think this issue could be solved by caching uploaders based on the uploader options and then reusing them (since they're already thread-safe). I can open a PR for this change if that's acceptable.
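
A minimal sketch of the caching idea, assuming aws-sdk-go v1's s3manager; the cache type and key fields here are hypothetical illustrations of the proposal, not the eventual fix:

```go
package s3blob

import (
	"sync"

	"github.com/aws/aws-sdk-go/service/s3/s3iface"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// uploaderKey captures the options that affect uploader behavior;
// uploaders with identical options can safely share an instance.
type uploaderKey struct {
	partSize    int64
	concurrency int
}

// uploaderCache memoizes s3manager.Uploader instances by their
// configuration so that concurrent writers share one buffer pool
// instead of allocating a fresh pool per upload.
type uploaderCache struct {
	mu        sync.Mutex
	uploaders map[uploaderKey]*s3manager.Uploader
}

func (c *uploaderCache) get(client s3iface.S3API, key uploaderKey) *s3manager.Uploader {
	c.mu.Lock()
	defer c.mu.Unlock()
	if u, ok := c.uploaders[key]; ok {
		return u // Uploader is safe for concurrent use, so reuse it.
	}
	u := s3manager.NewUploaderWithClient(client, func(u *s3manager.Uploader) {
		u.PartSize = key.partSize
		u.Concurrency = key.concurrency
	})
	if c.uploaders == nil {
		c.uploaders = make(map[uploaderKey]*s3manager.Uploader)
	}
	c.uploaders[key] = u
	return u
}
```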

@vangent
Contributor

vangent commented Jun 1, 2020

until the GC has time to clean them up

Is this a real problem? The GC will clean them up when there's a need for them.

If you're doing a bunch of parallel uploads and make the change to reuse the same Uploader, it seems like it will either:

  • Expand its pool of byte slices to be the superset of all the uploads, which will likely use about the same amount of memory as the current approach;

OR

  • Have contention for the byte slices, slowing down uploads.

Maybe you could experiment with your proposed change to see if it makes a difference.

@vangent
Contributor

vangent commented Sep 30, 2020

I saw the PR. Can you answer the questions above? It's adding a fair amount of complexity, and I want to be convinced that it's necessary.

@richardartoul
Contributor Author

@vangent Yeah, that's why I left it in draft mode. I'm working on testing its impact and will post results here when I have them (it's a bit difficult for me to upgrade this dependency in our monorepo, so it'll take me a bit of time).

@segevfiner
Contributor

Actually, it might be possible to use the same uploader and still override options per upload, since the Upload methods take func(*Uploader) arguments that allow for it.
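
For illustration: in aws-sdk-go v1, UploadWithContext applies such option functions to a per-call copy of the Uploader, so a shared instance keeps its defaults for other callers (bucket and key below are placeholders):

```go
package main

import (
	"context"
	"io"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// uploadWithPartSize shows how one shared Uploader can still honor
// per-upload settings: the option function mutates a per-call copy,
// leaving the shared Uploader's defaults untouched.
func uploadWithPartSize(ctx context.Context, shared *s3manager.Uploader, body io.Reader) error {
	_, err := shared.UploadWithContext(ctx, &s3manager.UploadInput{
		Bucket: aws.String("example-bucket"),
		Key:    aws.String("example-key"),
		Body:   body,
	}, func(u *s3manager.Uploader) {
		u.PartSize = 16 * 1024 * 1024 // applies to this upload only
	})
	return err
}
```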

@pgavlin

pgavlin commented May 5, 2023

I do think that the allocation volume is a problem, especially at scale. I don't think that reusing Uploaders is the solution, however. The buffer pool is sized for the number of concurrent uploads for that uploader, and I don't think that it will accept additional slices in its pool beyond its capacity.

Instead, I think the solution is to hand Upload an input that conforms to the readerAtSeeker interface, which the uploader checks for. If its input is a readerAtSeeker, UploadWithContext will not allocate any temporary buffers.
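
A sketch of what that looks like from the caller's side: an *os.File already implements io.ReaderAt and io.Seeker, so handing it to the uploader should take the buffer-free path (bucket, key, and path are placeholders):

```go
package main

import (
	"context"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// uploadFile passes an *os.File as the upload body. Because the file
// supports ReadAt and Seek, the uploader can read each part directly
// from it instead of copying parts into pooled buffers.
func uploadFile(ctx context.Context, uploader *s3manager.Uploader) error {
	f, err := os.Open("/path/to/large-file")
	if err != nil {
		return err
	}
	defer f.Close()

	_, err = uploader.UploadWithContext(ctx, &s3manager.UploadInput{
		Bucket: aws.String("example-bucket"),
		Key:    aws.String("example-key"),
		Body:   f, // readerAtSeeker path: no temporary part buffers
	})
	return err
}
```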

I've opened #3245 to track an API-level solution here.

pgavlin added a commit to pgavlin/go-cloud that referenced this issue May 8, 2023
Add Upload and Download methods to blob.Bucket and optional
Uploader and Downloader interfaces to driver. If a driver.Bucket
implements those interfaces, Upload and Download will call through
to them. This allows backends that can implement pull-style uploads
and push-style downloads more efficiently than the default push-style
uploads and pull-style downloads to do so.

These changes include an implementation of driver.Uploader for
s3blob.Bucket. That implementation allows s3blob.Bucket to avoid
allocating temporary buffers if the input is an io.ReaderAt and an
io.Seeker. This can dramatically reduce allocation volume for services
that upload large amounts of data to S3.

Fixes google#3245
Fixes google#2807
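
A usage sketch of the API this commit describes, assuming an Upload(ctx, key, io.Reader, *WriterOptions) signature on blob.Bucket; passing an *os.File should let the s3blob driver avoid temporary buffers (URL, key, and path are placeholders):

```go
package main

import (
	"context"
	"log"
	"os"

	"gocloud.dev/blob"
	_ "gocloud.dev/blob/s3blob" // registers the s3:// URL scheme
)

func main() {
	ctx := context.Background()
	bucket, err := blob.OpenBucket(ctx, "s3://example-bucket")
	if err != nil {
		log.Fatal(err)
	}
	defer bucket.Close()

	// An *os.File implements io.ReaderAt and io.Seeker, so a driver
	// that implements driver.Uploader can pull parts directly from it.
	f, err := os.Open("/path/to/large-file")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := bucket.Upload(ctx, "example-key", f, nil); err != nil {
		log.Fatal(err)
	}
}
```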