Storage: Timeout when uploading a file using google.cloud.storage.Blob.upload_from_filename() #74
Comments
You state you ran the snippet from the issue. I created a repro, but unfortunately this is working well for me.
For good measure, I tried the same thing with a 4 GB file as well. This was done from a new virtual environment and Python 3.8.
I'm also getting the same error on both Ubuntu 18.04 with Python 3.6.9 and Windows 10 with Python 3.8.0, both using google-cloud-storage 1.26.0.
@ElectricSwan, what is the code you are running? The sample code I provided runs well beyond 60 seconds. Simplified, it is:
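The code block from that comment was not preserved in this copy of the thread; a minimal upload along those lines, with placeholder bucket, object, and local file names, would look roughly like this:

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")        # placeholder bucket name
blob = bucket.blob("path/on/storage")      # destination object name

# upload_from_filename() takes only the local path; the destination
# comes from the blob name above.
blob.upload_from_filename("path/of/big/file/on/local")
```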
@crwilcox, I just ran your simplified code in a virtual env with Python 3.6.9 on Ubuntu 18.04, and I get the timeout at 60 seconds. I get the exact same timeout error with your simplified code on my Windows 10 PC running Python 3.8.0. Here's my environment:
The only potentially relevant differences that I can see between my Windows and Ubuntu boxes are that my Windows box has slightly older versions of some packages,
but the result is the same on both platforms: a timeout after 60 seconds. Here is my stack trace on the Ubuntu PC, which is almost identical to the stack trace submitted by @vfa-minhtv.
I also just tried installing
Hi @crwilcox! So sorry. I gave a wrong example. My actual code is:
The exact same thing happens for me. My internet is also limited, so the 60-second timeout is insufficient to finish uploading.
@vfa-minhtv, I have been experiencing similar timeout issues on my macOS and Windows platforms with google-cloud-storage==1.26.0. However, the timeout issues are inconsistent and apparently dependent on the network speed. As already mentioned in this thread, it typically fails on very slow upload links. I checked the code and found that any data stream of 8 MB or larger goes through _do_resumable_upload(...), which sends the data in chunks (which absolutely makes sense for supporting slow network connectivity):
However, the chunk size is not set in the initialization call and therefore will be set to some predefined default value:
This default value is set to 100 MB:
So you must have an upload speed of roughly 13 Mbps to complete each request within 1 minute, which is apparently the default timeout (see http://www.meridianoutpost.com/resources/etools/calculators/calculator-file-download-time.php for quick upload-time calculations). I ran a test and reduced the _DEFAULT_CHUNKSIZE value to 10 MB, which solved my issues.
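As a quick sanity check of those figures (this is just arithmetic, not library code):

```python
# Sustained upload speed needed to push one default-sized chunk through
# before the per-request timeout expires.
chunk_size_mb = 100   # _DEFAULT_CHUNKSIZE discussed above, in MB
timeout_s = 60        # per-request timeout discussed in this thread

required_mbps = chunk_size_mb * 8 / timeout_s
print(f"~{required_mbps:.1f} Mbit/s required")   # ~13.3 Mbit/s
```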
Thank you @aborzin for sharing your excellent investigation. I've edited lines 108 and 109 of my local copy of the library accordingly.
With my 800 kbps upload speed, the maximum workable chunk size was 6 MB, so I chose 5 MB to provide some margin.
@ElectricSwan, my problem with the workaround I proposed earlier is that it was not portable (because it changes the code in the local version of google-cloud-storage lib). So I decided to override the chunk size of the blob object after it is created:
Even though it is bad practice to access a "private" variable of a class, it seems to be a reasonable solution for now.
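The snippet from that comment is not preserved here; a sketch of that kind of per-blob override, using placeholder names and the 5 MB value chosen later in the thread, might look like:

```python
from google.cloud import storage

client = storage.Client()
blob = client.bucket("my-bucket").blob("path/on/storage")   # placeholders

# Shrink the "private" chunk-size attribute so each resumable-upload request
# carries a smaller chunk and can finish within the 60-second timeout.
# The value must be a multiple of 256 KB.
blob._chunk_size = 5 * 1024 * 1024

blob.upload_from_filename("path/of/big/file/on/local")
```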
@aborzin, I agree wholeheartedly that it is not a good idea to change code in a library, and I do prefer your 2nd solution, but unfortunately it doesn't work for me because of the test at line 1191: in my case (800 kbps upload), I am unable to upload 8 MB within 60 seconds. That is why I also have to change the value of _MAX_MULTIPART_SIZE.
@ElectricSwan, I agree that my second solution works only if you set the chunk size to 8 MB or larger because of the _MAX_MULTIPART_SIZE threshold. However, I think you can override it from your code as well:
I debugged this option, and the threshold was set correctly to 5 MB. Of course, you can do it once, after the google.cloud.storage package is loaded (and not do it for each and every call to upload a file).
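The code block from that comment is likewise not preserved; a sketch of the module-level override it describes (the 5 MB values are examples, and both names are private constants of the library) could be:

```python
# Monkey-patch the module-level constants once, right after the package is
# imported and before any uploads start.
from google.cloud.storage import blob as gcs_blob

gcs_blob._MAX_MULTIPART_SIZE = 5 * 1024 * 1024   # files above 5 MB use resumable uploads
gcs_blob._DEFAULT_CHUNKSIZE = 5 * 1024 * 1024    # 5 MB per chunked request
```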
@aborzin, I was working on the same solution, and was just about to post it when I got your notification. I've done:
@aborzin, I found that there is a setter for chunk size on the blob object, so I've replaced the module-level override of _DEFAULT_CHUNKSIZE with the public setter. This also means that, for anyone with an upload speed of at least 1.1 Mbps [1], no change needs to be made to the library, and only the public setter needs to be used. For anyone whose upload speed is less than 1.1 Mbps, the module-level _MAX_MULTIPART_SIZE still needs to be overridden.
[1] 1.1 Mbps is the minimum required to upload 8 MB within the 60-second timeout.
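A sketch of the public-setter approach, again with placeholder names and 5 MB as the example value:

```python
from google.cloud import storage

client = storage.Client()
blob = client.bucket("my-bucket").blob("path/on/storage")   # placeholders

# Public setter; the value must be a multiple of 256 KB. As noted above,
# files below the 8 MB _MAX_MULTIPART_SIZE threshold are still sent as a
# single multipart request, so very slow links may also need the
# module-level override.
blob.chunk_size = 5 * 1024 * 1024

blob.upload_from_filename("path/of/big/file/on/local")
```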
PR #185 added an explicit timeout argument to the blob methods. Now users can pass a longer timeout to resolve this issue. Feel free to reopen if this issue appears again.
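With a release that includes that change installed, passing a longer timeout would look something like this (the 300-second value is just an example):

```python
# Raise the per-request timeout instead of shrinking the chunk size.
blob.upload_from_filename(
    "path/of/big/file/on/local",
    timeout=300,  # seconds; the previous hard-coded value was 60
)
```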
I am constantly having the exact same issue while uploading a 15 MB file on a 2.4 Mbps upload connection.
GCS has a default chunk size of 100 MB and a default timeout per chunk of 60 seconds, so an upload speed of 13.3 Mbps is required to be able to log an artifact over 100 MB with the default settings. This is discussed at length in this issue on the python-storage module of googleapis: googleapis/python-storage#74. I propose we increase the default timeout from 1 minute to 10, thereby allowing a minimum upload speed of 1.3 Mbps to complete a 100 MB upload in the allotted time. At the very least, I think MLflow should accept a user override for this parameter. Thanks for reading! Signed-off-by: MacKinley Smith <smit1625@msu.edu>
Environment details
OS: MacOS 10.15.1
Python: Python 3.7.4
Google-cloud version:
Steps to reproduce
blob.upload_from_filename("path/on/storage", "path/of/big/file/on/local")
Stack trace
Expected result
No timeout error
Actual result
The upload times out after 1 minute