`upload_from_file` attempts to detect the size of the file by executing `os.fstat(file_obj.fileno())` (`google/cloud/storage/blob.py:929`, https://github.com/GoogleCloudPlatform/google-cloud-python/blob/master/storage/google/cloud/storage/blob.py#L929). For a `subprocess.Popen` pipe, this returns a stat structure with `st_size` set to 0. By some miracle that I didn't track down, this appears to cause the library to upload the output in a single chunk rather than uploading a zero-length file, which is a good failure mode to have!

However, I would expect this to fail with the usual "total bytes could not be determined" error, and to use a chunked upload once I fix that. Additionally, the streaming code relies on the file implementing `file.tell()`, which Popen pipes do not: they raise `IOError: Illegal seek`. The code seems to use `tell()` only for the current position, so this could be fixed by having the code maintain the position itself.

I worked around this bug by wrapping the Popen object in my own object that implements `read` and `tell`, and it seems to work.

To reproduce:

Expected result: a `ValueError` instructing the user to specify a chunk size.

Actual result: the entire file gets uploaded in a single go (I think).

System details:

- OS: Mac OS X 10.11.6
- Python 2.7.10
- google-cloud-storage 1.1.1
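For illustration, a minimal version of such a wrapper (the class name is my own invention) might look like the following. It maintains the byte offset itself instead of delegating to the pipe's `tell()`, which raises `Illegal seek`:

```python
import subprocess
import sys


class TellableReader:
    """Hypothetical wrapper giving a pipe the read()/tell() pair the
    streaming upload code expects, by counting the bytes handed out."""

    def __init__(self, raw):
        self._raw = raw
        self._pos = 0

    def read(self, size=-1):
        data = self._raw.read(size)
        self._pos += len(data)  # maintain the position ourselves
        return data

    def tell(self):
        # Pipes raise IOError("Illegal seek") on tell(); report our count.
        return self._pos


proc = subprocess.Popen(
    [sys.executable, "-c", "print('12345')"], stdout=subprocess.PIPE
)
reader = TellableReader(proc.stdout)
chunk = reader.read(4)  # tell() now reflects the bytes consumed so far
```

The wrapped object can then be passed wherever the library expects a readable file with a working `tell()`.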
The line you linked to (929) only applies to files opened directly. That `st_size` check is present in the latest released version, but it has already been removed at HEAD (not released yet, though).
RE: `.tell()`: `upload_from_file` explicitly requires an `IO[bytes]` object, and a `subprocess.Popen` does not fit that interface (as you noted). Adapting the implementation not to use `tell()` is not likely to happen; instead, your fix is the right approach for such types. However, if you want to upload the output of a command, you should write it to disk and then use `upload_from_filename`.
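A sketch of that suggestion: spool the command's output to a temporary file, then hand the path to `upload_from_filename`. The bucket and object names below are made up, and the upload call itself is commented out since it needs real credentials:

```python
import subprocess
import sys
import tempfile

# Write the command's output to disk first...
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    subprocess.check_call(
        [sys.executable, "-c", "print('payload')"], stdout=tmp
    )
    path = tmp.name

# ...then upload by path, letting the library stat the file normally:
# from google.cloud import storage
# bucket = storage.Client().bucket("my-bucket")   # hypothetical bucket
# bucket.blob("my-object").upload_from_filename(path)
```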
Ah oops, sorry for wasting your time: I was looking at the wrong thing. I'm going to close it, since it looks like this will work correctly with the next release.
One minor suggestion: the doc comment for `upload_from_file` should probably be updated to indicate that if you want a streaming upload, you must set `chunk_size` in the `Blob` constructor. That isn't at all obvious if you just read the docs. I would typically assume that an API that takes a file object works in chunks by default, but it looks like this one attempts to read the entire thing into memory in one go, which is what I'm trying to avoid (I'm generating fairly large output).
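Something along these lines in the docs would have made it clear (a sketch: the bucket and object names are invented, and I believe `chunk_size` must be a multiple of 256 KiB):

```python
# Streaming upload sketch: pass chunk_size to the Blob constructor so
# upload_from_file performs a chunked (resumable) upload instead of
# reading the whole stream into memory at once.
CHUNK_SIZE = 10 * 256 * 1024  # 2.5 MiB, a multiple of 256 KiB

# from google.cloud import storage
# bucket = storage.Client().bucket("my-bucket")              # hypothetical
# blob = storage.Blob("my-object", bucket, chunk_size=CHUNK_SIZE)
# with open("large-output", "rb") as fh:
#     blob.upload_from_file(fh)

assert CHUNK_SIZE % (256 * 1024) == 0
```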
@evanj Thanks for the suggestion! I recently dug into the very old implementation and was less than happy about the amount of things going on under the covers.
If you'd like a more explicit interface, I recommend checking out what we use under the covers (source and docs).