storage: upload_from_file does not correctly stream Popen pipes #3429

evanj · 2017-05-16T21:17:02Z

upload_from_file attempts to detect the size of the file by executing os.fstat(file_obj.fileno())
(google/cloud/storage/blob.py:929 https://github.com/GoogleCloudPlatform/google-cloud-python/blob/master/storage/google/cloud/storage/blob.py#L929)

For a subprocess.Popen pipe, this returns a stat structure with st_size set to 0. By some miracle that I didn't track down, this appears to cause the library to upload the output in a single chunk rather than uploading a zero-length file, which is a good failure mode to have!

However, I would expect this to fail with the usual "total bytes could not be determined" error, and to use a chunked upload once I fix that. Additionally, the streaming code relies on the file object implementing tell(), which Popen pipes do not support: calling it raises IOError: Illegal seek. The code seems to use tell() only to get the current position, so this could be fixed by having the code track the position itself.

I worked around this bug by wrapping the Popen object in my own object that implements read and tell and it seems to work.
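
A minimal sketch of that kind of wrapper, assuming the upload code only calls read() and tell() on the object. The class name PositionTrackingReader is hypothetical and this is not the exact code used for the workaround; it just counts the bytes it hands out so that tell() never touches the underlying pipe.

```python
import subprocess
import sys


class PositionTrackingReader(object):
    """Hypothetical wrapper exposing read() and tell() over a non-seekable stream."""

    def __init__(self, raw):
        self._raw = raw
        self._pos = 0  # number of bytes returned so far

    def read(self, size=-1):
        data = self._raw.read(size)
        self._pos += len(data)
        return data

    def tell(self):
        # Report our own byte count instead of asking the pipe to seek.
        return self._pos


# Demonstration with a child process that prints one line.
proc = subprocess.Popen(
    [sys.executable, "-c", "print('hello')"], stdout=subprocess.PIPE
)
reader = PositionTrackingReader(proc.stdout)
first = reader.read(2)
rest = reader.read()
proc.stdout.close()
proc.wait()
```

In the reproduction case, the wrapped object (rather than p.stdout itself) would be what gets passed to upload_from_file.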

To reproduce:

import google.cloud.storage
import subprocess
client = google.cloud.storage.Client()
blob = client.bucket('bucket').blob('blob')
p = subprocess.Popen(['ls'], stdout=subprocess.PIPE)
blob.upload_from_file(p.stdout)

Expected result:

A ValueError instructing the user to specify a chunk size.

Actual result:

The entire file gets uploaded in a single go (I think).

System Details:

OS: Mac OS X 10.11.6
Python: Python 2.7.10
google-cloud-storage version 1.1.1

dhermes · 2017-05-16T21:49:22Z

@evanj Thanks for reporting!

The line you linked to (929) only applies when opening directly from files. That st_size check is present in the latest released version but has been removed in HEAD (not released yet though).
RE: .tell(), upload_from_file explicitly requires an IO[Bytes] object, and a subprocess.Popen pipe does not fit that interface (as you noted). Adapting the implementation not to use tell() is not likely to happen; instead, your fix is the right approach for such types. However, if you want to upload the output of a command, you should write it to disk and then use upload_from_filename.
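
A rough sketch of the write-to-disk approach, assuming blob is any object with an upload_from_filename(path) method (as Blob has). The helper name upload_command_output is made up for illustration and is not part of the library.

```python
import os
import subprocess
import tempfile


def upload_command_output(blob, argv):
    """Run argv, spool its stdout to a temporary file, then upload by filename."""
    tmp = tempfile.NamedTemporaryFile(delete=False)
    try:
        with tmp:
            # The child process writes straight into the temp file.
            subprocess.check_call(argv, stdout=tmp)
        blob.upload_from_filename(tmp.name)
    finally:
        # Clean up the temp file whether or not the upload succeeded.
        os.remove(tmp.name)
```

This trades disk space for memory: the output never has to be held in memory by the caller, and the upload sees a regular file with a known size.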

evanj · 2017-05-16T22:01:36Z

Ah oops, sorry for wasting your time: I was looking at the wrong thing. I'm going to close it, since it looks like this will work correctly with the next release.

One minor suggestion: The doc comment for upload_from_file maybe should be updated to indicate that if you want a streaming upload, you must set chunk_size in the Blob constructor. That isn't super obvious if you just read the docs. I would typically assume that an API that takes a file object will do something "in chunks" by default, but it looks like this will attempt to read the entire thing in one go into memory, which is what I'm trying to avoid (I'm generating a fairly large output).
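
For illustration, a small sketch of setting chunk_size when creating the blob. It assumes bucket is a Bucket-like object whose blob() method accepts a chunk_size keyword, and that chunk sizes must be a multiple of 256 KB as the library documents. The helper name blob_with_chunk_size is hypothetical, and whether this alone gives a streaming upload depends on the library version.

```python
CHUNK_MULTIPLE = 256 * 1024  # chunk sizes must be a multiple of 256 KB


def blob_with_chunk_size(bucket, name, approx_bytes):
    """Create a blob whose chunk_size is approx_bytes rounded up to a valid multiple."""
    chunks = max(1, -(-approx_bytes // CHUNK_MULTIPLE))  # ceiling division
    return bucket.blob(name, chunk_size=chunks * CHUNK_MULTIPLE)
```

With a real client this would be called as blob_with_chunk_size(client.bucket('bucket'), 'blob', 1024 * 1024) before calling upload_from_file on the result.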

dhermes · 2017-05-16T22:22:05Z

@evanj Thanks for the suggestion! I recently dug into the very old implementation and was less than happy about the amount of things going on under the covers.

If you'd like a more explicit interface, I recommend checking out what we use under the covers (source and docs).

dhermes added the api: storage Issues related to the Cloud Storage API. label May 16, 2017

evanj closed this as completed May 16, 2017

JustinBeckwith assigned evanj Feb 1, 2021

Gwali-1 mentioned this issue Jan 6, 2023

put method for storage throws error after authenticating server as service account nhorvath/Pyrebase4#58

Open
