
Resumable upload errors if (optional) name is omitted #147

Closed

JacobHayes opened this issue Mar 14, 2022 · 6 comments


@JacobHayes
Contributor

Thanks for the great package!

Describe the bug
When initiating a resumable upload, the name query parameter is optional:

This is not required if you included a name in the object metadata file in Step 2.

However, gcp-storage-emulator expects to get some data on the first request (request.data.get("name")), causing:

  File ".../gcp_storage_emulator/handlers/objects.py", line 188, in _create_resumable_upload
    object_id = request.data.get("name")
AttributeError: 'NoneType' object has no attribute 'get'
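
For what it's worth, a guard like this would avoid the AttributeError itself (just a sketch of the idea, not a proposed fix; request.data comes from the traceback above, the empty-dict fallback is my assumption):

data = request.data or {}  # hypothetical guard: treat a missing body as empty metadata
object_id = data.get("name")  # now None instead of raising when the body is absent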

To Reproduce
A minimal, complete example that reproduces the behavior:

import os

import gcp_storage_emulator.server
import gcsfs

server = gcp_storage_emulator.server.create_server("localhost", 0, in_memory=True)
server.start()
host, port = server._api._httpd.socket.getsockname()
os.environ["STORAGE_EMULATOR_HOST"] = f"http://{host}:{port}"

try:
    gcs_fs = gcsfs.GCSFileSystem()
    gcs_fs.mkdir("test_bucket")
    gcs_fs.touch("test_bucket/file")  # Triggers a resumable upload w/o a name on the first request and errors
finally:
    server.stop()

Or, with curl against a running instance with an existing bucket:

curl -i -X 'POST' -H 'content-type: application/json' 'http://127.0.0.1:9023/upload/storage/v1/b/test_bucket/o?uploadType=resumable'
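
For comparison, the same request goes through when the optional name query parameter is supplied (assuming the same running instance and bucket):

curl -i -X 'POST' -H 'content-type: application/json' 'http://127.0.0.1:9023/upload/storage/v1/b/test_bucket/o?uploadType=resumable&name=file'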

Expected behavior
No error; instead, a redirect to a unique, generated upload ID.

System (please complete the following information):

  • OS version: macOS Monterey
  • Python version: 3.9
  • gcp-storage-emulator version: 2022.2.17


@oittaa
Owner

oittaa commented Mar 14, 2022

Thanks for the great test case! Unfortunately files.pythonhosted.org is currently broken (at least for me) and I can't test properly with gcsfs.

$ pip install gcsfs
...
ERROR: Could not install packages due to an OSError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Max retries exceeded with url: /packages/6f/ad/46e12b207068000bfa76ab5e07154d975b0e1977cc0786299344e6c90925/gcsfs-2022.2.0-py2.py3-none-any.whl (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:1129)')))
...

I made a quick fix in #148. Could you test with it and see whether gcsfs works as expected with the change?

If you don't want to, or can't, build from source, the latest Docker image follows the main branch.

docker pull oittaa/gcp-storage-emulator
docker run -d --rm \
  -e PORT=9023 \
  -p 9023:9023 \
  --name gcp-storage-emulator \
  oittaa/gcp-storage-emulator

@oittaa
Owner

oittaa commented Mar 14, 2022

It must be a CDN problem or something: GitHub doesn't have any issues fetching the packages, but I get a self-signed certificate error every time. pypa/pip#10963

@JacobHayes
Contributor Author

JacobHayes commented Mar 14, 2022

Thanks for the quick look (and very odd on PyPI 🙃)! Now I get an UnboundLocalError: local variable 'object_id' referenced before assignment (and apparently a non-JSON request.data) with:

import json

import gcsfs

fs = gcsfs.GCSFileSystem()
fs.mkdir("test")

gcsfs.GCSFileSystem().touch("test/a")  # I think this one just doesn't have the name in either the body or the query param
# UnboundLocalError: local variable 'object_id' referenced before assignment

with fs.open("test/b", "w") as file:
    json.dump(5, file)  # this just sends the request data as b"5"
# AttributeError: 'bytes' object has no attribute 'get'

For the optional file name, I'm wondering whether GCS just generates a UUID (or similar) upload_id and gets the name from a subsequent request? It looks like we use the object name from the first request as the upload_id (which seems very reasonable), but maybe it should generate a UUID or something when there's no name... and, I guess, store a mapping of {uuid: upload session}?
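
Something like this sketch is what I mean (all names here are hypothetical, just to illustrate the shape, not the emulator's actual internals):

import uuid

# Hypothetical session store: maps generated upload IDs to pending
# resumable-upload state until the object name is known.
resumable_sessions = {}

def create_upload_id(object_name=None):
    # Use the client-supplied name when present; otherwise generate a
    # random ID and resolve the name from a later request.
    upload_id = object_name or uuid.uuid4().hex
    resumable_sessions[upload_id] = {"name": object_name, "data": b""}
    return upload_id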

@oittaa
Owner

oittaa commented Mar 14, 2022

Hmm... From the API documentation I understood that you have to pass either the query parameter or the metadata, but you shouldn't omit both. I need to take a closer look at how gcsfs works when the CDN starts working again.

  1. Optionally, create a JSON file that contains the metadata you want to set on the object that you're uploading.

...

OBJECT_NAME is the name you want to give your object. For example, pets/dog.png. This is not required if you included a name in the object metadata file in Step 2.

@oittaa
Owner

oittaa commented Mar 14, 2022

OK, I've tried to figure out what's going on. I'm about 99% sure that gcsfs uses the wrong method: it should use PUT to write the data (https://cloud.google.com/storage/docs/performing-resumable-uploads#single-chunk-upload), but it's using POST. For some reason the official Google API accepts it anyway if the request includes the upload_id parameter, so I added a workaround for the undocumented Google API behavior in #149.
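
Roughly, the workaround does something like this (a sketch of the idea only, not the actual code in #149; method and query_params are made-up names):

# Hypothetical dispatch tweak: a POST to the upload endpoint that
# carries an upload_id parameter is handled like the documented PUT
# that writes data to an existing resumable session.
if method == "POST" and "upload_id" in query_params:
    method = "PUT"  # the official API accepts this undocumented behavior too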

This should work now.

import os

import gcp_storage_emulator.server
import gcsfs
from google.cloud import storage

server = gcp_storage_emulator.server.create_server("localhost", 9023, in_memory=True)
server.start()
host, port = server._api._httpd.socket.getsockname()
os.environ["STORAGE_EMULATOR_HOST"] = f"http://{host}:{port}"

try:
    gcs_fs = gcsfs.GCSFileSystem()
    gcs_fs.mkdir("test_bucket")
    gcs_fs.touch("test_bucket/file")
    with gcs_fs.open("test_bucket/file", "w") as file:
        file.write("testing")

    client = storage.Client()
    bucket = client.bucket("test_bucket")
    for blob in bucket.list_blobs():
        content = blob.download_as_bytes()
        print(f"Blob [{blob.name}]: {content}")
finally:
    server.stop()

@JacobHayes
Contributor Author

Wow, things look good on my end too, so I'll close out the issue! Thank you for the super quick fixes! 🙌
