Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(core): Add storage upload to move away from unified upload protocol #11508

Merged
merged 9 commits into from
Sep 16, 2022

Conversation

bajajneha27
Copy link
Contributor

No description provided.

Copy link

@tritone tritone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looking good, nice progress here! A few things, mostly questions.

google-apis-core/lib/google/apis/core/base_service.rb Outdated Show resolved Hide resolved
request_header = header.dup
request_header[CONTENT_RANGE_HEADER] = sprintf('bytes %d-%d/%d', @offset, @offset+current_chunk_size-1, upload_io.size)
request_header[CONTENT_LENGTH_HEADER] = current_chunk_size
body = upload_io.read(current_chunk_size)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if a chunk upload fails and the input has already been read? Is the upload_io able to rewind? I don't really see how this happens in the code (but my ruby isn't very good!).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case there's a failure, I'm manually updating the position of the content to the previous offset.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, gotcha -- so the rewinding happens in L153 I see

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually rewinding happens at L156

@bajajneha27
Copy link
Contributor Author

@tritone Integration tests are already there in google-cloud-ruby. I tested them against this code change and they ran successfully. The tests look thorough enough to me and covers all cases. So these tests along with conformance tests should be enough for our testing, I believe?

@bajajneha27 bajajneha27 marked this pull request as ready for review September 5, 2022 18:14
@bajajneha27 bajajneha27 requested a review from a team as a code owner September 5, 2022 18:14
Copy link

@tritone tritone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, pending approval from a Ruby reviewer and confirmation that the google-cloud-ruby storage integration tests pass.

request_header = header.dup
request_header[CONTENT_RANGE_HEADER] = sprintf('bytes %d-%d/%d', @offset, @offset+current_chunk_size-1, upload_io.size)
request_header[CONTENT_LENGTH_HEADER] = current_chunk_size
body = upload_io.read(current_chunk_size)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, gotcha -- so the rewinding happens in L153 I see

# This is specifically for storage because we are moving to a new upload protocol.
# Ref: https://cloud.google.com/storage/docs/performing-resumable-uploads
#
# @param [symbol] method
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Symbol is a class and should be capitalized

CONTENT_LENGTH_HEADER = 'Content-Length'
CONTENT_TYPE_HEADER = 'Content-Type'
UPLOAD_CONTENT_TYPE_HEADER = 'X-Upload-Content-Type'
LOCATION_HEADER = 'Location'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we're not running rubocop on this repo yet, but let's start using our style for new code (e.g. double quotes for all strings, omit parens in method calls further down, etc.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted

self.upload_io = File.new(upload_source, 'r')
if self.upload_content_type.nil?
type = MiniMime.lookup_by_filename(upload_source)
self.upload_content_type = type && type.content_type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think type&.content_type is a bit more clear.


# Check the to see if the upload is complete or needs to be resumed.
#
# @param [Fixnum] status
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixnum is defunct as of Ruby 2.4. Use Integer.

end
@close_io_on_finish = true
else
fail Google::Apis::ClientError, 'Invalid upload source'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would help to know which type is not supported.

fail Google::Apis::ClientError, 'Invalid upload source'
end
if self.upload_content_type.nil? || self.upload_content_type.empty?
self.upload_content_type = 'application/octet-stream'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GCS handles the default value for users; X-Goog-Upload-Content-Type is optional unless that's changed recently.

current_chunk_size = remaining_content_size < CHUNK_SIZE ? remaining_content_size : CHUNK_SIZE

request_header = header.dup
request_header[CONTENT_RANGE_HEADER] = sprintf('bytes %d-%d/%d', @offset, @offset+current_chunk_size-1, upload_io.size)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, this code accepts a streamable type such as STDIN which may know the upload_io.size.

You can fall back to unknown sized resumable uploads:
image.

source: https://cloud.google.com/storage/docs/streaming#rest-apis

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline. We'll implement this in a separate PR. Created b/246693802 to track it.


let(:command) do
command = Google::Apis::Core::StorageUploadCommand.new(:post, 'https://www.googleapis.com/zoo/animals')
command.upload_source = StringIO.new('Hello world')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please include IO type which does not have a known size in your tests.

request_header[CONTENT_LENGTH_HEADER] = current_chunk_size
body = upload_io.read(current_chunk_size)

response = client.put(@upload_url, body: body, header: request_header, follow_redirect: true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit here, but I'd name the body local variable something like chunk_body instead, to distinguish it from self.body which is a method on the base class (and is treated as such in line 120). This will protect you from a bug if you ever move these lines around and forget that the name is overloaded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Thanks for pointing it out.

@bajajneha27 bajajneha27 merged commit d9f8a13 into googleapis:main Sep 16, 2022
stanhu added a commit to stanhu/google-api-ruby-client that referenced this pull request Jan 12, 2023
The change in
googleapis#11508 caused
the HTTP client to send a String buffer instead of an IO buffer, which
significantly slowed upload performance since the SSL socket attempts
to resize the String buffer after sending 16K blocks. To avoid this
unnecessary String resizing, wrap the read chunk in a `StringIO`
object.

Relates to googleapis#13212
stanhu added a commit to stanhu/google-api-ruby-client that referenced this pull request Jan 12, 2023
The change in
googleapis#11508 caused
the HTTP client to send a String buffer instead of an IO buffer, which
significantly slowed upload performance since the SSL socket attempts
to resize the String buffer after sending 16K blocks. To avoid this
unnecessary String resizing, wrap the read chunk in a `StringIO`
object.

Relates to googleapis#13212
stanhu added a commit to stanhu/google-api-ruby-client that referenced this pull request Jan 12, 2023
The change in googleapis#11508 introduced a significant performance
regression. It caused the HTTP client to read 8 MB chunks from the
upload source and send each chunk in separate PUT requests. This
significantly slowed upload performance since the SSL socket sends 16K
at a time and resizes the buffer each time. This caused CPU to
skyrocket since it requires allocating a new buffer and copying
existing data into that string. The garbage collector would have to
work hard to keep up.

To eliminate this unnecessary String resizing, wrap the read chunk in
a `StringIO` object. This enables `OpenSSL::Buffering` to read and
write a full 16K buffer.

Relates to googleapis#13212
stanhu added a commit to stanhu/google-api-ruby-client that referenced this pull request Jan 12, 2023
The change in googleapis#11508 introduced a significant performance
regression. It caused the HTTP client to read 8 MB chunks from the
upload source and send each chunk in separate PUT requests with a
String object instead of an IO object. This significantly slowed
upload performance since the SSL socket sends 16K at a time and
resizes the String buffer with each iteration. This caused CPU to
skyrocket since it requires allocating a new buffer and copying
existing data into that string. The garbage collector would have to
work hard to keep up.

To eliminate this unnecessary String resizing, wrap the read chunk in
a `StringIO` object. This enables `OpenSSL::Buffering` to read and
write a full 16K buffer.

Relates to googleapis#13212
dazuma pushed a commit that referenced this pull request Jan 12, 2023
The change in #11508 introduced a significant performance
regression. It caused the HTTP client to read 8 MB chunks from the
upload source and send each chunk in separate PUT requests with a
String object instead of an IO object. This significantly slowed
upload performance since the SSL socket sends 16K at a time and
resizes the String buffer with each iteration. This caused CPU to
skyrocket since it requires allocating a new buffer and copying
existing data into that string. The garbage collector would have to
work hard to keep up.

To eliminate this unnecessary String resizing, wrap the read chunk in
a `StringIO` object. This enables `OpenSSL::Buffering` to read and
write a full 16K buffer.

Relates to #13212
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants