feat(core): Add storage upload to move away from unified upload protocol #11508

bajajneha27 · 2022-09-01T08:35:52Z

No description provided.

tritone

Generally looking good, nice progress here! A few things, mostly questions.

google-apis-core/spec/google/apis/core/storage_upload_spec.rb

google-apis-core/lib/google/apis/core/base_service.rb

google-apis-core/lib/google/apis/core/storage_upload.rb

tritone · 2022-09-01T20:40:17Z

google-apis-core/lib/google/apis/core/storage_upload.rb

+          request_header = header.dup
+          request_header[CONTENT_RANGE_HEADER] = sprintf('bytes %d-%d/%d', @offset, @offset+current_chunk_size-1, upload_io.size)
+          request_header[CONTENT_LENGTH_HEADER] = current_chunk_size
+          body = upload_io.read(current_chunk_size)


What happens if a chunk upload fails and the input has already been read? Is the upload_io able to rewind? I don't really see how this happens in the code (but my ruby isn't very good!).

In case there's a failure, I'm manually updating the position of the content to the previous offset.

Okay, gotcha -- so the rewinding happens in L153 I see

Actually rewinding happens at L156
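The rewind-on-failure behaviour discussed in this thread can be sketched in plain Ruby. This is a hypothetical, self-contained illustration, not the code in `storage_upload.rb`. The method name `upload_in_chunks`, the constant `DEMO_CHUNK_SIZE`, the retry limit, and the block that stands in for `client.put` are all assumptions. It also differs in one respect: the IO is repositioned to the last acknowledged offset before every attempt, instead of only in a failure handler. The `chunk_body` name follows the naming suggestion made later in the review.

```ruby
require "stringio"

# Hypothetical chunk size, kept tiny so the example is easy to trace.
DEMO_CHUNK_SIZE = 4

# Sketch: upload `io` in fixed-size chunks. Before every attempt the IO is
# repositioned to the start of the current chunk, so a failed chunk is re-read
# from the last acknowledged offset. The block stands in for the HTTP PUT.
def upload_in_chunks(io, chunk_size: DEMO_CHUNK_SIZE, max_attempts: 3)
  total = io.size
  offset = 0
  while offset < total
    current_chunk_size = [total - offset, chunk_size].min
    request_header = {
      "Content-Range" => format("bytes %d-%d/%d", offset, offset + current_chunk_size - 1, total),
      "Content-Length" => current_chunk_size.to_s
    }
    attempts = 0
    begin
      attempts += 1
      io.pos = offset # rewind to the last acknowledged offset
      chunk_body = io.read(current_chunk_size)
      yield chunk_body, request_header
    rescue IOError
      retry if attempts < max_attempts
      raise
    end
    offset += current_chunk_size
  end
  offset
end
```

With `StringIO.new("Hello world")` and a block that raises `IOError` once on the second chunk, the sketch re-reads `"o wo"` from offset 4 and ends up sending the ranges `bytes 0-3/11`, `bytes 4-7/11`, and `bytes 8-10/11`.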

google-apis-generator/lib/google/apis/generator/templates/_method.tmpl

bajajneha27 · 2022-09-05T18:13:08Z

@tritone Integration tests already exist in google-cloud-ruby. I tested them against this code change and they ran successfully. The tests look thorough enough to me and cover all cases. So I believe these tests, along with the conformance tests, should be enough for our testing?

tritone

LGTM, pending approval from a Ruby reviewer and confirmation that the google-cloud-ruby storage integration tests pass.

tritone · 2022-09-06T17:12:34Z


dazuma · 2022-09-07T06:13:17Z

google-apis-core/lib/google/apis/core/base_service.rb

+        # This is specifically for storage because we are moving to a new upload protocol.
+        # Ref: https://cloud.google.com/storage/docs/performing-resumable-uploads
+        #
+        # @param [symbol] method


Symbol is a class and should be capitalized

dazuma · 2022-09-07T06:19:52Z

google-apis-core/lib/google/apis/core/storage_upload.rb

+        CONTENT_LENGTH_HEADER = 'Content-Length'
+        CONTENT_TYPE_HEADER = 'Content-Type'
+        UPLOAD_CONTENT_TYPE_HEADER = 'X-Upload-Content-Type'
+        LOCATION_HEADER = 'Location'


I know we're not running rubocop on this repo yet, but let's start using our style for new code (e.g. double quotes for all strings, omit parens in method calls further down, etc.)

dazuma · 2022-09-07T06:28:18Z

google-apis-core/lib/google/apis/core/storage_upload.rb

+            self.upload_io = File.new(upload_source, 'r')
+            if self.upload_content_type.nil?
+              type = MiniMime.lookup_by_filename(upload_source)
+              self.upload_content_type = type && type.content_type


I think type&.content_type is a bit more clear.
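A small illustration of the suggestion: the safe navigation operator (Ruby 2.3+) returns `nil` when the receiver is `nil`, which matches `type && type.content_type` for a lookup result that is either an object or `nil`. `MimeInfo` is a hypothetical stand-in for the object a MIME lookup returns, so the example runs without MiniMime.

```ruby
# Hypothetical stand-in for a MIME lookup result (not MiniMime's own class).
MimeInfo = Struct.new(:content_type)

# Safe navigation: returns nil when `type` is nil instead of raising NoMethodError.
def content_type_for(type)
  type&.content_type
end

# The original spelling; equivalent when `type` is either an object or nil.
def content_type_for_explicit(type)
  type && type.content_type
end
```

The two spellings only diverge if the value could be `false`, because `&.` skips the call for `nil` alone.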

dazuma · 2022-09-07T06:31:32Z

google-apis-core/lib/google/apis/core/storage_upload.rb

+
+        # Check to see if the upload is complete or needs to be resumed.
+        #
+        # @param [Fixnum] status


Fixnum is defunct as of Ruby 2.4. Use Integer.
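With the type documented as `Integer`, a status check for a GCS resumable upload might look like the sketch below. The method name `upload_state` and its return symbols are hypothetical. It relies on GCS answering 200 or 201 when the upload is complete and 308 when more bytes are expected; everything else is lumped into `:error` for brevity.

```ruby
# Hypothetical sketch of a resumable-upload status check.
#
# @param [Integer] status HTTP status code of the last response
# @return [Symbol] :complete, :resume, or :error
def upload_state(status)
  raise ArgumentError, "status must be an Integer" unless status.is_a?(Integer)

  case status
  when 200, 201 then :complete # object has been created
  when 308 then :resume        # GCS expects more bytes
  else :error                  # simplification: any other code is a failure
  end
end
```

Since Ruby 2.4, literals such as `200` are plain `Integer` instances, so `Integer` is the accurate type for the YARD tag.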

frankyn · 2022-09-07T14:16:35Z

google-apis-core/lib/google/apis/core/storage_upload.rb

+            end
+            @close_io_on_finish = true
+          else
+            fail Google::Apis::ClientError, 'Invalid upload source'


It would help to know which type is not supported.
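One way to address this is to interpolate the offending class into the message. The sketch below is hypothetical. `open_upload_source` is an illustrative name, the accepted types (file path, `IO`, `StringIO`) are assumed from the surrounding snippet, and `ClientError` is a local stand-in for `Google::Apis::ClientError` so the example runs without the gem.

```ruby
require "stringio"

# Local stand-in for Google::Apis::ClientError (assumption, not the gem's class).
class ClientError < StandardError; end

# Hypothetical helper: open an upload source, naming the unsupported type on failure.
def open_upload_source(upload_source)
  case upload_source
  when String
    File.new(upload_source, "r") # treat strings as file paths
  when IO, StringIO              # StringIO is not an IO subclass, so list both
    upload_source
  else
    raise ClientError,
          "Invalid upload source: #{upload_source.class} (expected a file path, IO, or StringIO)"
  end
end
```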

frankyn · 2022-09-07T14:19:51Z

google-apis-core/lib/google/apis/core/storage_upload.rb

+            fail Google::Apis::ClientError, 'Invalid upload source'
+          end
+          if self.upload_content_type.nil? || self.upload_content_type.empty?
+            self.upload_content_type = 'application/octet-stream'


GCS handles the default value for users; X-Goog-Upload-Content-Type is optional unless that's changed recently.
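Following this comment, the client could send the content-type header only when the caller supplied a value and otherwise leave the default to GCS. This is a hypothetical sketch: `initiation_headers` is an illustrative name, and the header constant is the one defined in the snippet earlier in this review.

```ruby
UPLOAD_CONTENT_TYPE_HEADER = "X-Upload-Content-Type"

# Hypothetical sketch: include the content type only when one was provided,
# instead of hard-coding "application/octet-stream".
def initiation_headers(upload_content_type)
  headers = {}
  unless upload_content_type.nil? || upload_content_type.empty?
    headers[UPLOAD_CONTENT_TYPE_HEADER] = upload_content_type
  end
  headers
end
```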

frankyn · 2022-09-07T14:24:44Z

google-apis-core/lib/google/apis/core/storage_upload.rb

+          current_chunk_size = remaining_content_size < CHUNK_SIZE ? remaining_content_size : CHUNK_SIZE
+
+          request_header = header.dup
+          request_header[CONTENT_RANGE_HEADER] = sprintf('bytes %d-%d/%d', @offset, @offset+current_chunk_size-1, upload_io.size)


IIUC, this code accepts a streamable type such as STDIN, which may not know the upload_io.size.

You can fall back to resumable uploads of unknown size.

source: https://cloud.google.com/storage/docs/streaming#rest-apis

Discussed offline. We'll implement this in a separate PR. Created b/246693802 to track it.
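For the follow-up work, the key difference for unknown-size streaming is the `Content-Range` value. Intermediate chunks use `*` for the total, and the final chunk reports the real total. GCS also requires every chunk except the last to be a multiple of 256 KiB. The helpers below are a hypothetical sketch of that header logic only; the names are illustrative.

```ruby
# GCS requires every chunk except the last to be a multiple of 256 KiB.
CHUNK_GRANULARITY = 256 * 1024

# Hypothetical helper: build the Content-Range value for one chunk.
# Pass total_size: nil while the total is still unknown (e.g. reading STDIN);
# "*" tells GCS that more data follows. The final chunk passes the real total.
def content_range(offset, chunk_size, total_size: nil)
  raise ArgumentError, "chunk_size must be positive" unless chunk_size.positive?

  total = total_size.nil? ? "*" : total_size.to_s
  "bytes #{offset}-#{offset + chunk_size - 1}/#{total}"
end

# Intermediate chunks must be a non-zero multiple of CHUNK_GRANULARITY.
def valid_intermediate_chunk?(chunk_size)
  chunk_size.positive? && (chunk_size % CHUNK_GRANULARITY).zero?
end
```

For example, a first 256 KiB chunk of a stream gets `bytes 0-262143/*`, and a 100-byte final chunk that follows it gets `bytes 262144-262243/262244`.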

frankyn · 2022-09-07T14:29:43Z

google-apis-core/spec/google/apis/core/storage_upload_spec.rb

+
+  let(:command) do
+    command = Google::Apis::Core::StorageUploadCommand.new(:post, 'https://www.googleapis.com/zoo/animals')
+    command.upload_source = StringIO.new('Hello world')


Please include an IO type that does not have a known size in your tests.
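A pipe is a convenient test source for this, because, like STDIN, it does not respond to `#size`, while `StringIO` and `File` do. The helpers below are hypothetical. In the spec, the pipe reader could, for example, be assigned to `command.upload_source` next to the existing `StringIO` case.

```ruby
require "stringio"

# Hypothetical helper: the source size if the IO can report it, otherwise nil.
# StringIO and File respond to #size; a pipe (like STDIN) does not.
def known_size(io)
  io.respond_to?(:size) ? io.size : nil
end

# Hypothetical helper: an IO whose size is not known up front.
def unknown_size_source(content)
  reader, writer = IO.pipe
  writer.write(content)
  writer.close # close the write end so the reader sees EOF
  reader
end
```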

dazuma · 2022-09-14T22:24:45Z

google-apis-core/lib/google/apis/core/storage_upload.rb

+          request_header[CONTENT_LENGTH_HEADER] = current_chunk_size
+          body = upload_io.read(current_chunk_size)
+
+          response = client.put(@upload_url, body: body, header: request_header, follow_redirect: true)


Small nit here, but I'd name the body local variable something like chunk_body instead, to distinguish it from self.body which is a method on the base class (and is treated as such in line 120). This will protect you from a bug if you ever move these lines around and forget that the name is overloaded.

Makes sense. Thanks for pointing it out.

The change in #11508 introduced a significant performance regression. It caused the HTTP client to read 8 MB chunks from the upload source and send each chunk in a separate PUT request as a String object instead of an IO object. This significantly slowed uploads because the SSL socket sends 16K at a time and resizes the String buffer on each iteration. CPU usage skyrocketed, since every resize allocates a new buffer and copies the existing data into it, and the garbage collector had to work hard to keep up. To eliminate this unnecessary String resizing, wrap the read chunk in a `StringIO` object. This enables `OpenSSL::Buffering` to read and write a full 16K buffer. Relates to #13212
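The fix described above amounts to handing the HTTP client an IO-like body rather than a raw String, for example as the `body:` argument of the `client.put(...)` call quoted earlier in this review. The sketch below shows only the wrapping step. `next_chunk_body` is a hypothetical name, and `CHUNK_SIZE` is assumed to be 8 MiB to match the "8 MB chunks" in the description.

```ruby
require "stringio"

CHUNK_SIZE = 8 * 1024 * 1024 # assumed 8 MiB, matching the description above

# Hypothetical sketch: read the next chunk and wrap it in a StringIO so the
# HTTP/SSL layer can read it in fixed-size blocks instead of repeatedly
# slicing and reallocating a String. Returns nil once the source is exhausted.
def next_chunk_body(upload_io, chunk_size = CHUNK_SIZE)
  data = upload_io.read(chunk_size)
  data && StringIO.new(data)
end
```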

tritone reviewed Sep 1, 2022

bajajneha27 added 4 commits September 5, 2022 13:12

feat(core): Add storage upload to move away from unified upload protocol

9c5fee2

make_storage_upload_command for storage object insert method

24bce78

reset content pos in case of failure

ee4bcd3

address review comments

7e672a7

bajajneha27 force-pushed the storage/retry/232351963 branch from a757458 to 7e672a7 September 5, 2022 09:46

bug fix: Return with object when successful

16da256

bajajneha27 marked this pull request as ready for review September 5, 2022 18:14

bajajneha27 requested a review from a team as a code owner September 5, 2022 18:14

tritone approved these changes Sep 6, 2022

dazuma reviewed Sep 7, 2022

bajajneha27 added 2 commits September 7, 2022 18:09

add review comments

2e3958c

fix test case

55fec8f

frankyn suggested changes Sep 7, 2022

chore(core): address review comments

567edaf

dazuma reviewed Sep 14, 2022

chore(core): address review comments

69686f9

frankyn approved these changes Sep 15, 2022

dazuma approved these changes Sep 16, 2022

bajajneha27 merged commit d9f8a13 into googleapis:main Sep 16, 2022

This was referenced Sep 16, 2022

chore(main): release google-apis-core 0.8.0 #11661

Merged

chore(main): release google-apis-generator 0.10.0 #11662

Merged

This was referenced Jan 11, 2023

storage: Extremely slow object upload performance (regression in google-apis-core 0.8.0) #13212

Closed

fix: Improve upload performance #13213

Merged
