Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add blob.open() for file-like I/O #385

Merged
merged 9 commits into from Mar 24, 2021
Merged

feat: add blob.open() for file-like I/O #385

merged 9 commits into from Mar 24, 2021

Conversation

@andrewsg
Copy link
Contributor

@andrewsg andrewsg commented Feb 22, 2021

Fixes #29

@andrewsg andrewsg requested a review from Feb 22, 2021
@andrewsg andrewsg requested a review from as a code owner Feb 22, 2021
@google-cla google-cla bot added the cla: yes label Feb 22, 2021
@andrewsg
Copy link
Contributor Author

@andrewsg andrewsg commented Feb 22, 2021

Docstrings are WIP and system tests should be expanded before merging.

Copy link
Member

@frankyn frankyn left a comment

Added some feedback that I want to talk through in implementation. Otherwise this is looking good. I'll review the design document as well.

google/cloud/storage/blob.py Show resolved Hide resolved

# Upload chunks. The SlidingBuffer class will manage seek position.
for _ in range(num_chunks):
upload.transmit_next_chunk(transport)
Copy link
Member

@frankyn frankyn Feb 24, 2021

Do you know if partial success occurs does transmit_next_chunk handle the retry from the remote offset?

Copy link
Contributor Author

@andrewsg andrewsg Mar 20, 2021

Define "partial success"?

Copy link
Member

@frankyn frankyn Mar 22, 2021

GCS may receive a subset of bytes from client upload, leaving the client and server next byte offset unaligned. A request to get current GCS offset can be made per Step Checking the status of a resumable upload.

If the client doesn't attempt to recover from this state, the inconsistent byte alignment will cause a 400 error which is not recoverable.

Copy link
Contributor Author

@andrewsg andrewsg Mar 22, 2021

Got it. With the changes in this recent update, sliding buffer implements seek() so this feature is as supported as it's going to be in python-storage. If we want further support we need to flesh it out in resumable media.

Copy link
Member

@frankyn frankyn Mar 22, 2021

Let's do that for recovering in a mismatch (address it in resumable media repo). Could you file a tracking issue in that repo?

google/cloud/storage/fileio.py Outdated Show resolved Hide resolved
@andrewsg
Copy link
Contributor Author

@andrewsg andrewsg commented Mar 20, 2021

Added docstrings, seek functionality in the sliding buffer, and other features. PTAL. Remaining questions: can we set up a system test to test end-to-end the error-on-chunk-upload-and-retry behavior? And, even though blob.download methods do not populate the blob generation, can we fetch that generation information from download metadata somehow in order to implement generation lock? (alternative: force blob.reload() before d/l if generation is not populated - race condition possible).

@frankyn
Copy link
Member

@frankyn frankyn commented Mar 22, 2021

Remaining questions:

  1. Can we set up a system test to test end-to-end the error-on-chunk-upload-and-retry behavior?
  • Not without GCS emulator. You'd be better off mocking out the API request at this time.
  1. Even though blob.download methods do not populate the blob generation, can we fetch that generation information from download metadata somehow in order to implement generation lock? (alternative: force blob.reload() before d/l if generation is not populated - race condition possible).
  • You can get generation information from response headers. A similar issue came up in Node.js (for checksums) and here's example of responses specifically the header is named X-goog-generation:.

Copy link
Member

@frankyn frankyn left a comment

Over LGTM, I have a few nits.

Also retrying on mismatched offset may be better handled / addressed in resumable media than in this wrapper so we could table it for now and open tracking issues in resumable media repo. WDYT?

google/cloud/storage/blob.py Outdated Show resolved Hide resolved
google/cloud/storage/blob.py Show resolved Hide resolved
google/cloud/storage/blob.py Outdated Show resolved Hide resolved
google/cloud/storage/blob.py Show resolved Hide resolved
Copy link
Member

@frankyn frankyn left a comment

LGTM! 🚢

@gcf-merge-on-green gcf-merge-on-green bot merged commit 440a0a4 into master Mar 24, 2021
4 checks passed
@gcf-merge-on-green gcf-merge-on-green bot deleted the fileio branch Mar 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

2 participants