
blob/s3: High memory usage during multi-part uploads #2807

Closed
richardartoul opened this issue Jun 1, 2020 · 5 comments · Fixed by #3248

Comments

@richardartoul
Contributor

Describe the bug

High memory usage when doing many parallel multi-part uploads.

To Reproduce

Spin up a bunch of goroutines doing multi-part uploads with a large part size.
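
For concreteness, a rough reproduction sketch using the gocloud.dev/blob API (the bucket URL, goroutine count, and sizes are illustrative, not from the report):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"sync"

	"gocloud.dev/blob"
	_ "gocloud.dev/blob/s3blob" // registers the s3:// URL scheme
)

// Many goroutines each stream a large payload through a blob writer.
// Each writer gets its own multi-part uploader and buffer pool, which
// is what drives memory usage up.
func main() {
	ctx := context.Background()
	bucket, err := blob.OpenBucket(ctx, "s3://example-bucket")
	if err != nil {
		log.Fatal(err)
	}
	defer bucket.Close()

	payload := make([]byte, 64*1024*1024) // 64 MiB per object
	var wg sync.WaitGroup
	for i := 0; i < 32; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			w, err := bucket.NewWriter(ctx, fmt.Sprintf("obj-%d", i), &blob.WriterOptions{
				BufferSize: 16 * 1024 * 1024, // large part size
			})
			if err != nil {
				log.Print(err)
				return
			}
			defer w.Close()
			if _, err := w.Write(payload); err != nil {
				log.Print(err)
			}
		}(i)
	}
	wg.Wait()
}
```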

Expected behavior

Low memory usage.

Version

Latest

Additional context

The issue seems to stem from the S3 implementation creating a new multi-part uploader for each writer: https://github.com/google/go-cloud/blob/master/blob/s3blob/s3blob.go#L626

The multipart uploader maintains a pool of byte slices, so when there are many parallel uploads you end up with a separate pool of parts sitting in memory for each upload until the GC has time to clean them up.

I think this issue could be solved by caching uploaders based on the uploader options and then reusing them (since they're already thread-safe). I can open a PR for this change if that's acceptable.
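
A minimal sketch of the caching idea, assuming aws-sdk-go v1's s3manager; the cache type and key fields here are hypothetical illustrations of the proposal, not the eventual fix:

```go
package s3blob

import (
	"sync"

	"github.com/aws/aws-sdk-go/service/s3/s3iface"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// uploaderKey captures the options that affect uploader behavior;
// uploaders with identical options can safely share an instance.
type uploaderKey struct {
	partSize    int64
	concurrency int
}

// uploaderCache memoizes s3manager.Uploader instances by their
// configuration so that concurrent writers share one buffer pool
// instead of allocating a fresh pool per upload.
type uploaderCache struct {
	mu        sync.Mutex
	uploaders map[uploaderKey]*s3manager.Uploader
}

func (c *uploaderCache) get(client s3iface.S3API, key uploaderKey) *s3manager.Uploader {
	c.mu.Lock()
	defer c.mu.Unlock()
	if u, ok := c.uploaders[key]; ok {
		return u // Uploader is safe for concurrent use, so reuse it.
	}
	u := s3manager.NewUploaderWithClient(client, func(u *s3manager.Uploader) {
		u.PartSize = key.partSize
		u.Concurrency = key.concurrency
	})
	if c.uploaders == nil {
		c.uploaders = make(map[uploaderKey]*s3manager.Uploader)
	}
	c.uploaders[key] = u
	return u
}
```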

@vangent
Contributor

vangent commented Jun 1, 2020

until the GC has time to clean them up

Is this a real problem? The GC will clean them up when there's a need for them.

If you're doing a bunch of parallel uploads and make the change to reuse the same Uploader, it seems like it will either:

  • Expand its pool of byte slices to be the superset of all the uploads, which will likely use about the same amount of memory as the current approach;

OR

  • Have contention for the byte slices, slowing down uploads.

Maybe you could experiment with your proposed change to see if it makes a difference.

@vangent
Contributor

vangent commented Sep 30, 2020

I saw the PR. Can you answer the questions above? It's adding a fair amount of complexity, and I want to be convinced that it's necessary.

@richardartoul
Contributor Author

@vangent Yeah, that's why I left it in draft mode. I'm working on testing its impact and will post results here when I have them (it's a bit difficult for me to upgrade this dependency in our monorepo, so it'll take me a bit of time).

@segevfiner
Contributor

Actually, it might be possible to use the same uploader and still override options per upload, since the Upload methods take func(*Uploader) arguments that allow for it.
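
For illustration: in aws-sdk-go v1, UploadWithContext applies such option functions to a per-call copy of the Uploader, so a shared instance keeps its defaults for other callers (bucket and key below are placeholders):

```go
package main

import (
	"context"
	"io"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// uploadWithPartSize shows how one shared Uploader can still honor
// per-upload settings: the option function mutates a per-call copy,
// leaving the shared Uploader's defaults untouched.
func uploadWithPartSize(ctx context.Context, shared *s3manager.Uploader, body io.Reader) error {
	_, err := shared.UploadWithContext(ctx, &s3manager.UploadInput{
		Bucket: aws.String("example-bucket"),
		Key:    aws.String("example-key"),
		Body:   body,
	}, func(u *s3manager.Uploader) {
		u.PartSize = 16 * 1024 * 1024 // applies to this upload only
	})
	return err
}
```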

@pgavlin

pgavlin commented May 5, 2023

I do think that the allocation volume is a problem, especially at scale. I don't think that reusing Uploaders is the solution, however. The buffer pool is sized for the number of concurrent uploads for that uploader, and I don't think that it will accept additional slices in its pool beyond its capacity.

Instead, I think the solution is to hand Upload an input that conforms to the readerAtSeeker interface, which the uploader checks for. If its input is a readerAtSeeker, UploadWithContext will not allocate any temporary buffers.
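
A sketch of what that looks like from the caller's side: an *os.File already implements io.ReaderAt and io.Seeker, so handing it to the uploader should take the buffer-free path (bucket, key, and path are placeholders):

```go
package main

import (
	"context"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// uploadFile passes an *os.File as the upload body. Because the file
// supports ReadAt and Seek, the uploader can read each part directly
// from it instead of copying parts into pooled buffers.
func uploadFile(ctx context.Context, uploader *s3manager.Uploader) error {
	f, err := os.Open("/path/to/large-file")
	if err != nil {
		return err
	}
	defer f.Close()

	_, err = uploader.UploadWithContext(ctx, &s3manager.UploadInput{
		Bucket: aws.String("example-bucket"),
		Key:    aws.String("example-key"),
		Body:   f, // readerAtSeeker path: no temporary part buffers
	})
	return err
}
```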

I've opened #3245 to track an API-level solution here.

pgavlin added a commit to pgavlin/go-cloud that referenced this issue May 8, 2023
Add Upload and Download methods to blob.Bucket and optional
Uploader and Downloader interfaces to driver. If a driver.Bucket
implements those interfaces, Upload and Download will call through
to them. This allows backends that can implement pull-style uploads
and push-style downloads more efficiently than the default push-style
uploads and pull-style downloads to do so.

These changes include an implementation of driver.Uploader for
s3blob.Bucket. That implementation allows s3blob.Bucket to avoid
allocating temporary buffers if the input is an io.ReaderAt and an
io.Seeker. This can dramatically reduce allocation volume for services
that upload large amounts of data to S3.

Fixes google#3245
Fixes google#2807
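
A usage sketch of the API this commit describes, assuming an Upload(ctx, key, io.Reader, *WriterOptions) signature on blob.Bucket; passing an *os.File should let the s3blob driver avoid temporary buffers (URL, key, and path are placeholders):

```go
package main

import (
	"context"
	"log"
	"os"

	"gocloud.dev/blob"
	_ "gocloud.dev/blob/s3blob" // registers the s3:// URL scheme
)

func main() {
	ctx := context.Background()
	bucket, err := blob.OpenBucket(ctx, "s3://example-bucket")
	if err != nil {
		log.Fatal(err)
	}
	defer bucket.Close()

	// An *os.File implements io.ReaderAt and io.Seeker, so a driver
	// that implements driver.Uploader can pull parts directly from it.
	f, err := os.Open("/path/to/large-file")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := bucket.Upload(ctx, "example-key", f, nil); err != nil {
		log.Fatal(err)
	}
}
```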