Storage: download_to_filename hangs if network disconnects in the middle #5909

Closed
logmackenzie opened this issue Sep 10, 2018 · 6 comments
Labels
api: storage (Issues related to the Cloud Storage API.)
help wanted (We'd love to have community involvement on this issue.)
type: question (Request for information or clarification. Not an issue.)

Comments

@logmackenzie

I am downloading large files (a few GB) from Google Cloud Storage using the download_to_filename method. My application needs to transition into and out of offline mode gracefully, so during testing I disconnected from the network. I found that if I disconnect in the middle of download_to_filename, the application hangs and the function never returns.

Example code:

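from google.cloud import storage

# 'project-id' and 'bucket-name' are placeholders.
client = storage.Client('project-id')
bucket = client.get_bucket('bucket-name')
blob = storage.blob.Blob('Large-file.txt', bucket)
# Hangs indefinitely if the network drops after the download has started.
blob.download_to_filename(blob.name)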

If I disconnect from the network just before calling blob.download_to_filename, I get a ConnectionError from requests, which is what I would expect. But if I disconnect after blob.download_to_filename starts, the function just hangs: no timeout occurs and no exception is raised.
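
One plausible explanation for the difference, offered as an assumption rather than a confirmed diagnosis of the library: a blocking socket read with no timeout waits indefinitely when the peer goes silent without closing the connection, whereas a read timeout turns that silence into an exception. The stdlib-only sketch below uses a hypothetical local server that accepts a connection and never sends data; `read_from_stalled_server` is an illustrative name, not part of any library.

```python
import socket
import threading


def read_from_stalled_server(read_timeout):
    """Connect to a local server that never sends data and try to read.

    Returns "timed out" if the read gives up after read_timeout seconds.
    """
    server = socket.socket()
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    accepted = []

    def accept_and_stall():
        # Accept the connection, keep it open, and send nothing.
        conn, _ = server.accept()
        accepted.append(conn)

    acceptor = threading.Thread(target=accept_and_stall, daemon=True)
    acceptor.start()

    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    # Without this line (timeout of None), recv() below would block forever.
    client.settimeout(read_timeout)
    try:
        client.recv(1024)
        return "data"
    except socket.timeout:
        return "timed out"
    finally:
        client.close()
        acceptor.join(1)
        for conn in accepted:
            conn.close()
        server.close()
```

Calling `read_from_stalled_server(0.5)` gives up after roughly half a second instead of waiting on the silent peer.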

I am using Python 3.6.1 on Windows 10 with these package versions:

google-api-core==1.3.0
google-auth==1.5.1
google-cloud-core==0.28.1
google-cloud-storage==1.10.0
google-resumable-media==0.3.1
googleapis-common-protos==1.5.3

I have not been able to find a good workaround for this, but in my opinion it would make sense for download_to_filename to take a timeout argument and raise an exception if the timeout expires.

I realize that the download resumes nicely once the network connection is re-established, but in my application the state of those files can change while offline, so that does not help me. A timeout that defaults to None would maintain the existing behavior while providing the option to force the function to return within a reasonable amount of time.
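
In the absence of a timeout argument, one possible workaround is to run the blocking call in a daemon thread and stop waiting after a deadline. This is only a sketch: `call_with_timeout` is a hypothetical helper, not part of google-cloud-storage, and the abandoned thread keeps running in the background (Python cannot kill it), so a partially written file may be left behind.

```python
import threading


def call_with_timeout(func, timeout, *args, **kwargs):
    """Run func(*args, **kwargs) in a daemon thread.

    Raises TimeoutError if the call has not finished within `timeout`
    seconds. The worker thread is NOT stopped; it is merely abandoned.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = func(*args, **kwargs)
        except BaseException as exc:  # re-raised in the calling thread
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)

    if thread.is_alive():
        raise TimeoutError(
            "call did not finish within {} seconds".format(timeout)
        )
    if "error" in outcome:
        raise outcome["error"]
    return outcome.get("value")
```

Under those assumptions, the download from the example above could be wrapped as `call_with_timeout(blob.download_to_filename, 300, blob.name)`, where 300 seconds is an arbitrary choice.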

I also posted a question to StackOverflow about a possible solution (https://stackoverflow.com/questions/52239860/download-to-filename-hangs-if-network-disconnects-in-the-middle), but this is more of a request for an improvement to the API.

Also, I am not sure whether this issue applies to upload_from_filename as well, but if it does, that method might require a similar modification.

tseaver added the type: question and api: storage labels on Sep 10, 2018
@tseaver
Contributor

tseaver commented Sep 10, 2018

/cc @frankyn

@pulltab

pulltab commented Sep 26, 2018

+1, I am experiencing a similar issue when uploading files as well.

Being able to set a timeout on such operations would help me recover more gracefully.

@tseaver
Contributor

tseaver commented Oct 18, 2018

We might need to handle this in google-resumable-media. @dhermes WDYT?

@dhermes
Contributor

dhermes commented Oct 18, 2018

@tseaver Sounds reasonable. Any good ideas on reproducing this?

@tseaver
Contributor

tseaver commented Oct 22, 2018

@dhermes maybe monkey-patch requests.request with something which doesn't return?
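
A rough sketch of that reproduction idea, using only the standard library: `FakeSession` is a hypothetical stand-in for whatever HTTP session object the library actually calls, and `blocking_request` simulates a request that never returns. Which attribute would need patching in requests or google-resumable-media is not established here.

```python
import threading
from unittest import mock


class FakeSession:
    """Hypothetical stand-in for the HTTP session used by the library."""

    def request(self, method, url, **kwargs):
        return "response"


def reproduce_hang(wait_seconds=0.1):
    """Patch FakeSession.request with a call that blocks, then report
    whether a caller is still stuck after wait_seconds."""
    release = threading.Event()

    def blocking_request(self, method, url, **kwargs):
        # Simulates a stalled connection: blocks until released below.
        release.wait()
        return "late response"

    session = FakeSession()
    with mock.patch.object(FakeSession, "request", blocking_request):
        stalled_call = session.request  # bound while the patch is active
        worker = threading.Thread(
            target=stalled_call,
            args=("GET", "https://example.invalid/object"),
            daemon=True,
        )
        worker.start()
        worker.join(wait_seconds)
        still_hanging = worker.is_alive()

    release.set()  # let the worker finish so the thread does not linger
    worker.join()
    return still_hanging
```

A test built this way can assert that the caller is still blocked after the wait, and later assert that a fix raises an exception instead.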

tseaver added this to To do in 2018 Q4 Fixit on Dec 4, 2018
crwilcox added the help wanted label on Feb 15, 2019
@fgrzadkowski

@tseaver @dhermes We experience the same issue when uploading: our thread hangs forever. In our setup, we upload a relatively small file (below 100 kB) to a bucket every few seconds, and we depend on those files existing.

What's the recommended workaround for this issue? As it stands, this problem keeps our code from being production-grade.
