feat: Add multiprocessing and chunked downloading to transfer manager #1002

Merged: 14 commits, Mar 21, 2023

andrewsg
Contributor

@andrewsg andrewsg commented Mar 7, 2023

No description provided.

@andrewsg andrewsg requested review from a team as code owners March 7, 2023 17:59
@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. api: storage Issues related to the googleapis/python-storage API. labels Mar 7, 2023
@andrewsg andrewsg changed the title Add multiprocessing and chunked downloading to transfer manager feat: Add multiprocessing and chunked downloading to transfer manager Mar 7, 2023

@danielduhh danielduhh left a comment

Nice! Just had a couple of thoughts.

google/cloud/storage/transfer_manager.py (resolved)
google/cloud/storage/transfer_manager.py (outdated, resolved)
google/cloud/storage/transfer_manager.py (resolved)
Contributor

@cojenco cojenco left a comment

It's awesome to see both process and thread workers being supported 🎉 Still making my way through the PR :) A few questions

google/cloud/storage/transfer_manager.py (resolved)
google/cloud/storage/transfer_manager.py (outdated, resolved)
google/cloud/storage/transfer_manager.py (outdated, resolved)
@andrewsg
Contributor Author

System test is deadlocking due to grpc/grpc#31885, investigating workaround

Contributor

@cojenco cojenco left a comment

A few questions after a second pass; otherwise LGTM, thanks!

google/cloud/storage/transfer_manager.py (outdated, resolved)
behavior. The default is therefore to use processes instead of threads.

Checksumming (md5 or crc32c) is not supported for chunked operations. Any
`checksum` parameter passed in to download_kwargs will be ignored.
Contributor

Good callout. Where are we ignoring the checksum parameter?

Contributor Author

Explained this offline, but for posterity, checksum is not sent by the server for ranged reads, and chunked downloads are all ranged reads, so none of the responses include a checksum. As a result, the resumable media code silently ignores the checksum parameter.
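
For later readers, a minimal sketch of the point above, using only the standard library. It does not call the storage client or the resumable media code: `ranged_read` and `base64_md5` are hypothetical helpers, and the in-memory `payload` stands in for a remote object. It illustrates why a whole-object MD5 cannot be checked against any single ranged response, and how a caller could still verify integrity themselves once the chunks are reassembled.

```python
import base64
import hashlib


def base64_md5(data: bytes) -> str:
    """Return the base64-encoded MD5 digest of data (hypothetical helper)."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def ranged_read(payload: bytes, start: int, end: int) -> bytes:
    """Hypothetical stand-in for a ranged read; `end` is inclusive."""
    return payload[start : end + 1]


# Assumption: this byte string plays the role of the remote object, and
# `expected` plays the role of the whole-object hash stored with it.
payload = b"0123456789A"
expected = base64_md5(payload)

# Fetch the object in fixed-size chunks, as a chunked download would.
chunk_size = 5
chunks = [
    ranged_read(payload, start, min(start + chunk_size, len(payload)) - 1)
    for start in range(0, len(payload), chunk_size)
]

# No individual chunk hashes to the whole-object value...
per_chunk_matches = [base64_md5(chunk) == expected for chunk in chunks]

# ...but the reassembled bytes do, so verification is possible after the fact.
reassembled = b"".join(chunks)
whole_matches = base64_md5(reassembled) == expected
```

In practice the post-download check would hash the file on disk rather than an in-memory buffer.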

google/cloud/storage/transfer_manager.py (resolved)
with open(full_filename, "rb") as file_obj:
    assert _base64_md5hash(file_obj) == source_file["hash"]

# Now test for case where last chunk is exactly 1 byte.
Contributor

Thanks for the thorough tests! I especially like this one; really helped check the cursor moving through the last byte.
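
To make the edge case concrete, here is a small sketch of how inclusive byte ranges might be computed for a chunked download. This is not the PR's implementation; `chunk_ranges` is a hypothetical helper, and the sizes are chosen only so that the final chunk is exactly 1 byte.

```python
from typing import List, Tuple


def chunk_ranges(total_size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split [0, total_size) into inclusive (start, end) byte ranges.

    Hypothetical helper for illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ranges = []
    start = 0
    while start < total_size:
        # Clamp the end so the last range never runs past the object.
        end = min(start + chunk_size, total_size) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


# An 11-byte object with 5-byte chunks leaves a final chunk of exactly 1 byte.
ranges = chunk_ranges(11, 5)
last_start, last_end = ranges[-1]
last_chunk_length = last_end - last_start + 1
```

An off-by-one in the `end` calculation would either drop that last byte or request a range past the end of the object, which is what a test like the one above guards against.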

@andrewsg
Contributor Author

@cojenco @danielduhh responded to feedback, PTAL. Thanks!

Contributor

@ddelgrosso1 ddelgrosso1 left a comment

From a functionality standpoint this LGTM.

Labels
api: storage Issues related to the googleapis/python-storage API. size: xl Pull request size is extra large.
4 participants