feat(core): Add storage upload to move away from unified upload protocol #11508
Conversation
Generally looking good, nice progress here! A few things, mostly questions.
```ruby
request_header = header.dup
request_header[CONTENT_RANGE_HEADER] = sprintf('bytes %d-%d/%d', @offset, @offset + current_chunk_size - 1, upload_io.size)
request_header[CONTENT_LENGTH_HEADER] = current_chunk_size
body = upload_io.read(current_chunk_size)
```
What happens if a chunk upload fails and the input has already been read? Is the upload_io able to rewind? I don't really see how this happens in the code (but my ruby isn't very good!).
In case there's a failure, I'm manually updating the position of the content to the previous offset.
Okay, gotcha -- so the rewinding happens in L153 I see
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually rewinding happens at L156
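A minimal sketch of the retry behavior being discussed (hypothetical names, not the PR's actual code): on a failed chunk, seek the source IO back to the previous offset so the same chunk can be re-read.

```ruby
require "stringio"

# Hypothetical sketch: position at the chunk start, attempt the send,
# and rewind to the previous offset if the send raises.
def send_chunk_with_rewind(upload_io, offset, chunk_size)
  upload_io.pos = offset            # position at the start of the chunk
  chunk = upload_io.read(chunk_size)
  begin
    yield chunk                     # stand-in for the actual PUT request
    offset + chunk.bytesize         # advance the offset only on success
  rescue StandardError
    upload_io.pos = offset          # rewind so the caller can retry
    raise
  end
end

io = StringIO.new("Hello world")
new_offset = send_chunk_with_rewind(io, 0, 5) { |_chunk| :ok }
puts new_offset # 5
```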
Force-pushed from a757458 to 7e672a7
@tritone Integration tests are already there in google-cloud-ruby. I tested them against this code change and they ran successfully. The tests look thorough enough to me and cover all cases. So these tests, along with the conformance tests, should be enough for our testing, I believe?
LGTM, pending approval from a Ruby reviewer and confirmation that the google-cloud-ruby storage integration tests pass.
```ruby
# This is specifically for storage because we are moving to a new upload protocol.
# Ref: https://cloud.google.com/storage/docs/performing-resumable-uploads
#
# @param [symbol] method
```
`Symbol` is a class and should be capitalized.
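For context (a quick language check, not code from the PR): YARD type annotations name classes, and `Symbol` is the class of symbol literals, so the tag should read `@param [Symbol] method`.

```ruby
# Symbol is the class of Ruby symbol literals; lowercase "symbol" is not
# a constant the interpreter resolves, so YARD tags should capitalize it.
puts :post.class          # Symbol
puts :post.is_a?(Symbol)  # true
```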
```ruby
CONTENT_LENGTH_HEADER = 'Content-Length'
CONTENT_TYPE_HEADER = 'Content-Type'
UPLOAD_CONTENT_TYPE_HEADER = 'X-Upload-Content-Type'
LOCATION_HEADER = 'Location'
```
I know we're not running rubocop on this repo yet, but let's start using our style for new code (e.g. double quotes for all strings, omit parens in method calls further down, etc.)
Noted
```ruby
self.upload_io = File.new(upload_source, 'r')
if self.upload_content_type.nil?
  type = MiniMime.lookup_by_filename(upload_source)
  self.upload_content_type = type && type.content_type
```
I think `type&.content_type` is a bit more clear.
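A small standalone illustration (with a hypothetical struct standing in for MiniMime's return value) of why `type&.content_type` and `type && type.content_type` behave the same here, since the lookup returns either an info object or `nil`:

```ruby
# Stand-in for MiniMime.lookup_by_filename, which returns an info object
# or nil when the extension is unknown.
MimeInfo = Struct.new(:content_type)

known   = MimeInfo.new("image/png")
unknown = nil

# Both forms yield the content type when the lookup succeeded...
puts known && known.content_type  # image/png
puts known&.content_type          # image/png

# ...and nil (rather than raising NoMethodError) when it did not.
p unknown && unknown.content_type # nil
p unknown&.content_type           # nil
```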
```ruby
# Check to see if the upload is complete or needs to be resumed.
#
# @param [Fixnum] status
```
Fixnum is defunct as of Ruby 2.4. Use Integer.
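For context (a fact about the language, not PR code): `Fixnum` and `Bignum` were unified into `Integer` in Ruby 2.4, and the old constants were removed entirely in Ruby 3.2.

```ruby
# Since Ruby 2.4 every integer is an instance of Integer; the old
# Fixnum/Bignum split no longer exists.
status = 308                # e.g. an HTTP "Resume Incomplete" status code
puts status.class           # Integer
puts status.is_a?(Integer)  # true
```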
```ruby
  end
  @close_io_on_finish = true
else
  fail Google::Apis::ClientError, 'Invalid upload source'
```
It would help to know which type is not supported.
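One possible shape for that error message (a sketch only; `ArgumentError` stands in for `Google::Apis::ClientError`, which isn't loadable outside the gem):

```ruby
require "stringio"

# Sketch: report the class of the rejected source so callers can see
# what they actually passed in.
def check_upload_source(upload_source)
  return upload_source if upload_source.is_a?(String) || upload_source.respond_to?(:read)
  raise ArgumentError,
        "Invalid upload source: #{upload_source.class} (expected a file path or an IO)"
end

check_upload_source(StringIO.new("ok"))  # accepted: responds to #read
begin
  check_upload_source(12_345)
rescue ArgumentError => e
  puts e.message  # Invalid upload source: Integer (expected a file path or an IO)
end
```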
```ruby
  fail Google::Apis::ClientError, 'Invalid upload source'
end
if self.upload_content_type.nil? || self.upload_content_type.empty?
  self.upload_content_type = 'application/octet-stream'
```
GCS handles the default value for users; `X-Goog-Upload-Content-Type` is optional unless that's changed recently.
```ruby
current_chunk_size = remaining_content_size < CHUNK_SIZE ? remaining_content_size : CHUNK_SIZE

request_header = header.dup
request_header[CONTENT_RANGE_HEADER] = sprintf('bytes %d-%d/%d', @offset, @offset + current_chunk_size - 1, upload_io.size)
```
IIUC, this code accepts a streamable type such as `STDIN`, which may not know `upload_io.size`. You can fall back to unknown-size resumable uploads.

Source: https://cloud.google.com/storage/docs/streaming#rest-apis
Discussed offline. We'll implement this in a separate PR. Created b/246693802 to track it.
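For reference, the fallback discussed above (per the linked streaming docs) sends `*` as the total size in `Content-Range` until the final chunk. A hypothetical helper sketching the three header forms:

```ruby
# Sketch of Content-Range values for resumable uploads: known total,
# unknown total ("*"), and the final chunk of a stream (total becomes
# known once EOF is reached).
def content_range(offset, chunk_size, total: nil, final: false)
  last = offset + chunk_size - 1
  if final
    format("bytes %d-%d/%d", offset, last, last + 1)  # total now known
  elsif total
    format("bytes %d-%d/%d", offset, last, total)
  else
    format("bytes %d-%d/*", offset, last)             # total still unknown
  end
end

puts content_range(0, 262_144)                 # bytes 0-262143/*
puts content_range(0, 262_144, total: 500_000) # bytes 0-262143/500000
puts content_range(262_144, 100, final: true)  # bytes 262144-262243/262244
```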
```ruby
let(:command) do
  command = Google::Apis::Core::StorageUploadCommand.new(:post, 'https://www.googleapis.com/zoo/animals')
  command.upload_source = StringIO.new('Hello world')
```
Please include IO type which does not have a known size in your tests.
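One way to build such a source in specs (a sketch, not the repo's actual test code): the reader end of a pipe responds to `#read` but has no total size known up front.

```ruby
# The reader end of a pipe is an IO whose total length is unknown until
# EOF -- exactly the kind of source a size-less upload test should cover.
reader, writer = IO.pipe
writer.write("Hello world")
writer.close

chunk = reader.read(5)  # partial reads still work without a known size
puts chunk              # Hello
puts reader.read        #  world  (read to EOF once the writer is closed)
```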
```ruby
request_header[CONTENT_LENGTH_HEADER] = current_chunk_size
body = upload_io.read(current_chunk_size)

response = client.put(@upload_url, body: body, header: request_header, follow_redirect: true)
```
Small nit here, but I'd name the `body` local variable something like `chunk_body` instead, to distinguish it from `self.body`, which is a method on the base class (and is treated as such in line 120). This will protect you from a bug if you ever move these lines around and forget that the name is overloaded.
Makes sense. Thanks for pointing it out.
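A tiny illustration of the shadowing hazard (hypothetical classes, not the PR's code): inside a method, a local named `body` silently hides an inherited `#body` method.

```ruby
require "stringio"

class BaseCommand
  def body
    "request body from the base class"
  end
end

class UploadCommand < BaseCommand
  def send_chunk(upload_io, size)
    chunk_body = upload_io.read(size) # renamed local: cannot shadow #body
    [chunk_body, body]                # both stay unambiguous in this method
  end
end

chunk, base = UploadCommand.new.send_chunk(StringIO.new("Hello"), 2)
puts chunk # He
puts base  # request body from the base class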
The change in #11508 introduced a significant performance regression. It caused the HTTP client to read 8 MB chunks from the upload source and send each chunk in separate PUT requests with a String object instead of an IO object. This significantly slowed upload performance since the SSL socket sends 16K at a time and resizes the String buffer with each iteration. This caused CPU to skyrocket since it requires allocating a new buffer and copying existing data into that string. The garbage collector would have to work hard to keep up. To eliminate this unnecessary String resizing, wrap the read chunk in a `StringIO` object. This enables `OpenSSL::Buffering` to read and write a full 16K buffer. Relates to #13212
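The fix described above amounts to one extra wrap at the call site. A minimal sketch (variable names assumed from the surrounding discussion, sizes scaled down):

```ruby
require "stringio"

# Sketch of the fix: hand the HTTP client an IO, not a String. An IO is
# streamed in fixed-size slices, whereas a String buffer gets resized
# (reallocated and copied) after each 16K SSL write.
upload_io = StringIO.new("x" * (1024 * 1024)) # stand-in upload source
chunk = upload_io.read(256 * 1024)            # read one chunk as a String
chunk_body = StringIO.new(chunk)              # wrap it so it streams as an IO

puts chunk_body.size                  # 262144
puts chunk_body.read(16_384).bytesize # 16384
```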