blob: allow driver.Bucket to upload data by pulling from an io.Reader instead of pushing to an io.Writer #3245

Comments
Thanks for the report. I will see what I can do for this; I like the general idea and think it's feasible. If I send a PR, will you be able to verify the performance gain? I can verify correctness fairly well, but not the underlying memory churn you're seeing under load.
Add `Upload` and `Download` methods to `blob.Bucket` and optional `Uploader` and `Downloader` interfaces to `driver`. If a `driver.Bucket` implements those interfaces, `Upload` and `Download` will call through those interfaces. This allows backends that can implement pull-style uploads and push-style downloads more efficiently than push-style uploads and pull-style downloads to do so.

These changes include an implementation of `driver.Uploader` for `s3blob.Bucket`. That implementation allows `s3blob.Bucket` to avoid allocating temporary buffers if the input is an `io.ReaderAt` and an `io.Seeker`. This can dramatically reduce allocation volume for services that upload large amounts of data to S3.

Fixes google#3245
Fixes google#2807
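A minimal sketch of the optional-interface dispatch described above. The field name `b.b` for the wrapped driver bucket and the `toDriver` options helper are assumptions for illustration; the merged change's exact names and option plumbing may differ:

```go
// In package driver: backends may optionally implement Uploader
// (Downloader is the mirror image for reads).
type Uploader interface {
	Upload(ctx context.Context, key string, r io.Reader, opts *WriterOptions) error
}

// In package blob: Upload prefers the driver's pull-style path and
// otherwise falls back to the existing push-style writer.
func (b *Bucket) Upload(ctx context.Context, key string, r io.Reader, opts *WriterOptions) error {
	if u, ok := b.b.(driver.Uploader); ok { // b.b: wrapped driver.Bucket (assumed field)
		return u.Upload(ctx, key, r, opts.toDriver()) // toDriver: hypothetical conversion helper
	}
	w, err := b.NewWriter(ctx, key, opts)
	if err != nil {
		return err
	}
	if _, err := io.Copy(w, r); err != nil {
		w.Close() // best-effort cleanup; Close's error is secondary here
		return err
	}
	return w.Close()
}
```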
Definitely. FWIW, I just pushed #3247 with some ideas--happy to take that further or to review and validate your changes!

Ping @pgavlin

@pgavlin ....? Bueller?

Hey, sorry for not responding sooner--I've been pretty slammed with other work. I'll give #3248 a go tomorrow morning.
Okay, finally got to this. Yes, I think that #3248 would work for us. We have a benchmark that I've used to measure the relative allocation overhead of our current implementation, which calls `s3manager.Uploader` directly. If you're interested, I can isolate the benchmark from our internal wrappers and toss it in a gist.
**Is your feature request related to a problem? Please describe.**

Unless I am mistaken, the only way to upload data to a `driver.Bucket` is using `driver.Bucket.NewTypedWriter`. While this is fine for most scenarios--users can just `io.Copy` an `io.Reader` into the `driver.Writer`--it can cause performance problems in cases where the API of the underlying storage works in terms of `io.Reader`. In those cases, the driver must use something like `io.Pipe` to bridge between the `driver.Writer` interface and the `io.Reader` expected by the underlying storage API. For an example of the problems this can cause, see #2807, which details allocation volume issues for `s3blob`.
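To make the bridging cost concrete, here is a minimal sketch of the push-to-pull shim a driver ends up building today when the backend SDK wants an `io.Reader`, using `s3manager` as the example. Function and variable names are illustrative, not go-cloud internals:

```go
package main

import (
	"io"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// pipeUpload returns a WriteCloser that a push-style writer can push
// into, while s3manager pulls from the other end of an io.Pipe. Every
// byte crosses the pipe and is then re-buffered by s3manager into
// part-sized chunks: this is the temporary-buffer churn at issue.
func pipeUpload(sess *session.Session, bucket, key string) (io.WriteCloser, <-chan error) {
	pr, pw := io.Pipe()
	errc := make(chan error, 1)
	go func() {
		_, err := s3manager.NewUploader(sess).Upload(&s3manager.UploadInput{
			Bucket: aws.String(bucket),
			Key:    aws.String(key),
			Body:   pr, // the SDK pulls from the pipe
		})
		pr.CloseWithError(err)
		errc <- err
	}()
	return pw, errc
}
```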
**Describe the solution you'd like**

We really need a method whose semantics are "upload the data from this reader" s.t. we can pass the reader directly to the underlying storage. Maybe we could add `Upload` methods to `blob.Bucket` and `driver.Bucket`:
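Sketched signatures (illustrative; the option types and exact shapes are assumptions rather than a final API):

```go
// blob: read r to EOF and store the result at key. Falls back to
// NewWriter + io.Copy when the driver has no pull-style path.
func (b *Bucket) Upload(ctx context.Context, key string, r io.Reader, opts *WriterOptions) error

// driver: the backend consumes r itself, e.g. by handing it directly
// to the underlying storage SDK (WriterOptions here is driver's type).
func (b *Bucket) Upload(ctx context.Context, key string, r io.Reader, opts *WriterOptions) error
```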
For parity, we could also add `Download` methods:
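Mirroring the upload side (again, signatures are illustrative):

```go
// blob: write the contents of the blob at key into w. Drivers that can
// push directly to w (e.g. via their SDK's download manager) avoid an
// intermediate pipe; others fall back to NewReader + io.Copy.
func (b *Bucket) Download(ctx context.Context, key string, w io.Writer, opts *ReaderOptions) error
```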
**Describe alternatives you've considered**

`blob.Bucket.Writer` already exposes `ReadFrom`. My first thought was that we could change the implementation of that method to check whether or not the underlying writer is an `io.ReaderFrom` and, if so, call its `ReadFrom` method. Unfortunately, I don't think that really helps, as I think the semantics of `blob.ReadFrom` are such that it can be called repeatedly.
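For concreteness, the rejected delegation would look something like this. The field name `w.w` for the underlying driver writer is an assumption, not blob's actual internals:

```go
// Considered and rejected: delegate ReadFrom to the driver writer when
// it can pull. The catch, per above: ReadFrom may be called repeatedly
// on one Writer, while a driver that hands r to its SDK consumes the
// reader in a single shot.
func (w *Writer) ReadFrom(r io.Reader) (int64, error) {
	if rf, ok := w.w.(io.ReaderFrom); ok { // w.w: underlying driver writer (assumed)
		return rf.ReadFrom(r)
	}
	// Wrap w so io.Copy doesn't find this ReadFrom and recurse.
	return io.Copy(struct{ io.Writer }{w}, r)
}
```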
**Additional context**

In order to work around this issue, we've specialized some of our internal code to use `s3manager.Uploader` directly instead of `bucket.NewWriter`, because profiles indicated that we were spending a disproportionate amount of time zeroing and collecting temporary buffers due to #2807. The workaround isn't too annoying, but we were pretty shocked when we discovered that we were hitting #2807 and that it was costing us so much CPU time.
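The workaround, sketched with illustrative names. An `*os.File` stands in for any body implementing `io.ReaderAt` and `io.Seeker`, which lets the SDK read parts via `io.SectionReader` instead of copying into temporary buffers:

```go
package main

import (
	"context"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// uploadDirect bypasses bucket.NewWriter and hands the file to
// s3manager itself, avoiding the io.Pipe bridge and its buffer churn.
func uploadDirect(ctx context.Context, sess *session.Session, bucket, key string, f *os.File) error {
	_, err := s3manager.NewUploader(sess).UploadWithContext(ctx, &s3manager.UploadInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
		Body:   f, // ReaderAt+Seeker: parts are sliced, not copied
	})
	return err
}
```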