blob/s3: High memory usage during multi-part uploads #2807
Comments
Is this a real problem? The GC will clean them up when there's a need for them. If you're doing a bunch of parallel uploads, and make the change to re-use the same
OR
Maybe you could experiment with your proposed change to see if it makes a difference.

I saw the PR. Can you answer the questions above? It's adding a fair amount of complexity and I want convincing that it is necessary.
@vangent yeah, that's why I left it in draft mode. I'm working on testing its impact soon and I'll post results here when I have them (it's a bit difficult for me to upgrade this dependency in our monorepo, so it'll take me a bit of time).

Actually, it might be possible to use the same uploader but override options per upload by virtue of the
I do think that the allocation volume is a problem, especially at scale. I don't think that reusing … Instead, I think that the solution is to hand … I've opened #3245 to track an API-level solution here.
Add Upload and Download methods to blob.Bucket, and optional Uploader and Downloader interfaces to driver. If a driver.Bucket implements those interfaces, Upload and Download will call through them. This allows backends that can implement pull-style uploads and push-style downloads more efficiently than push-style uploads and pull-style downloads to do so.

These changes include an implementation of driver.Uploader for s3blob.Bucket. That implementation allows s3blob.Bucket to avoid allocating temporary buffers when the input is an io.ReaderAt and an io.Seeker. This can dramatically reduce allocation volume for services that upload large amounts of data to S3.

Fixes google#3245
Fixes google#2807
Describe the bug
High memory usage when doing many parallel multi-part uploads.
To Reproduce
Spin up a bunch of goroutines doing multi-part uploads with a large part size.
Expected behavior
Low memory usage.
Version
Latest
Additional context
The issue seems to be caused by the fact that the S3 implementation creates a new multi-part uploader for each writer: https://github.com/google/go-cloud/blob/master/blob/s3blob/s3blob.go#L626
The multipart uploader maintains a pool of byte slices, and as a result, when there are a lot of parallel uploads you end up with a pool of parts sitting in memory for each upload until the GC has time to clean them up.
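To see why a per-writer buffer pool hurts at scale, note that the worst-case buffered memory multiplies across uploads. The numbers below are hypothetical, chosen only to illustrate the scaling, not measurements from the issue:

```go
package main

import "fmt"

func main() {
	// Hypothetical illustrative values, not measured from s3blob.
	const (
		partSize    = 16 << 20 // 16 MiB per part buffer
		concurrency = 5        // parts buffered concurrently per uploader
		uploads     = 100      // parallel uploads, each with its own uploader
	)
	// Because each writer gets its own uploader (and thus its own pool),
	// the worst-case buffered memory is multiplicative across all three.
	worstCase := uploads * concurrency * partSize
	fmt.Printf("worst case: %d MiB\n", worstCase>>20) // worst case: 8000 MiB
}
```

With shared uploaders (or no pooling at all), the `uploads` factor drops out of the worst case.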
I think this issue could be solved by caching uploaders based on the uploader options and then reusing them (since they're already thread-safe). I can make a PR for this change if it's acceptable.
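A minimal sketch of the proposed options-keyed cache, using only the standard library. The `uploader` and `uploaderOptions` types are hypothetical stand-ins for `*s3manager.Uploader` and the subset of its options that affect pooling; in s3blob the constructor call marked below would be `s3manager.NewUploader`:

```go
package main

import (
	"fmt"
	"sync"
)

// uploaderOptions stands in for the options that affect buffer pooling;
// it must be a comparable type so it can serve as a map key.
type uploaderOptions struct {
	PartSize    int64
	Concurrency int
}

// uploader is a hypothetical stand-in for *s3manager.Uploader, which is
// safe for concurrent use and so can be shared across writers.
type uploader struct{ opts uploaderOptions }

type uploaderCache struct {
	mu    sync.Mutex
	cache map[uploaderOptions]*uploader
}

// get returns a shared uploader for the given options, creating it at
// most once per distinct option set.
func (c *uploaderCache) get(opts uploaderOptions) *uploader {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.cache == nil {
		c.cache = map[uploaderOptions]*uploader{}
	}
	u, ok := c.cache[opts]
	if !ok {
		u = &uploader{opts: opts} // in s3blob: s3manager.NewUploader(...)
		c.cache[opts] = u
	}
	return u
}

func main() {
	var c uploaderCache
	a := c.get(uploaderOptions{PartSize: 16 << 20, Concurrency: 5})
	b := c.get(uploaderOptions{PartSize: 16 << 20, Concurrency: 5})
	d := c.get(uploaderOptions{PartSize: 8 << 20, Concurrency: 5})
	fmt.Println(a == b, a == d) // true false
}
```

Writers with identical options then share one buffer pool instead of each holding their own.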