Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add configurable checksumming for blob uploads and downloads #246

Merged
merged 14 commits into from Aug 26, 2020
Merged

Conversation

@andrewsg
Copy link
Contributor

@andrewsg andrewsg commented Aug 14, 2020

No description provided.

@andrewsg andrewsg requested review from crwilcox and frankyn Aug 14, 2020
@google-cla google-cla bot added the cla: yes label Aug 14, 2020
response = upload.transmit_next_chunk(transport, timeout=timeout)
except resumable_media.DataCorruption:
# Attempt to delete the corrupted object.
self.delete()
Copy link
Contributor Author

@andrewsg andrewsg Aug 14, 2020

If this fails for any reason, both exceptions will be reported, using the Python exception pattern "During handling of the above exception, another exception occurred"

Copy link
Contributor

@tseaver tseaver Aug 17, 2020

What occurs here if a valid previous generation of the blob exists?

Copy link
Contributor Author

@andrewsg andrewsg Aug 21, 2020

In the default case, if versioning is off, there is no way to recover the older version. Unfortunately this can only be resolved by backend changes to make server-side checksumming possible for resumable uploads.

In the case where versioning is on, the previous generation still exists, but is not live. This is consistent with behavior in nodejs, which I modeled this code after. We could potentially try to roll back and make the previous version live, but we should also modify nodejs and other languages to be consistent across languages. What do you think?

Copy link
Member

@frankyn frankyn Aug 21, 2020

I think this is more of a documentation issue until there's a better fix in the backend to support checksums better.

I'd prefer consistency here with clear acknowledgment that these cases can occur when using checksumming. It's not done automatically so it doesn't change behavior for uploads.

:type checksum: str
:param checksum:
(Optional) The type of checksum to compute to verify
the integrity of the object. If the upload is completed in a single
Copy link
Contributor Author

@andrewsg andrewsg Aug 14, 2020

This "if the upload is completed in a single request" / "if the upload is too large" is undeniably ugly and hopefully we can implement server-side handling of resumable upload checksums to resolve that soon

@andrewsg andrewsg changed the title Add configurable checksumming for blob uploads and downloads feat: add configurable checksumming for blob uploads and downloads Aug 14, 2020
@andrewsg
Copy link
Contributor Author

@andrewsg andrewsg commented Aug 14, 2020

Recommend hold off on merging until google-resumable-media releases to 1.0 (and bump up the dependency version number in setup.py here)

tests/system/test_system.py Outdated Show resolved Hide resolved
tests/system/test_system.py Show resolved Hide resolved
response = upload.transmit_next_chunk(transport, timeout=timeout)
except resumable_media.DataCorruption:
# Attempt to delete the corrupted object.
self.delete()
Copy link
Member

@frankyn frankyn Aug 21, 2020

I think this is more of a documentation issue until there's a better fix in the backend to support checksums better.

I'd prefer consistency here with clear acknowledgment that these cases can occur when using checksumming. It's not done automatically so it doesn't change behavior for uploads.

@tseaver
Copy link
Contributor

@tseaver tseaver commented Aug 24, 2020

googleapis/google-resumable-media-python#142 is the PR for bumping GRMP to 1.0.

@andrewsg
Copy link
Contributor Author

@andrewsg andrewsg commented Aug 24, 2020

@tseaver @frankyn PTAL

@tseaver Has the PR you linked made it to release yet? If it's released I'll use this PR to also update setup.py to use the new version.

@andrewsg
Copy link
Contributor Author

@andrewsg andrewsg commented Aug 25, 2020

Bumped version number. This should be good to merge as soon as final review comes through. Thanks all

Copy link
Member

@frankyn frankyn left a comment

LGTM, thanks @andrewsg!

@andrewsg
Copy link
Contributor Author

@andrewsg andrewsg commented Aug 26, 2020

@tseaver PTAL soon if you are available, I've been asked to release. Thanks!

@tseaver
Copy link
Contributor

@tseaver tseaver commented Aug 26, 2020

@tseaver tseaver merged commit 23b7d1c into master Aug 26, 2020
4 checks passed
@tseaver tseaver deleted the crc32c branch Aug 26, 2020
cojenco added a commit to cojenco/python-storage that referenced this issue Oct 13, 2021
…oogleapis#246)

Co-authored-by: Tres Seaver <tseaver@palladion.com>
Co-authored-by: Frank Natividad <frankyn@users.noreply.github.com>
cojenco added a commit to cojenco/python-storage that referenced this issue Oct 13, 2021
…oogleapis#246)

Co-authored-by: Tres Seaver <tseaver@palladion.com>
Co-authored-by: Frank Natividad <frankyn@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants