New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to set blob metadata #1185

Closed
remcohaszing opened this Issue Oct 15, 2015 · 19 comments

Comments

Projects
7 participants
@remcohaszing

remcohaszing commented Oct 15, 2015

I am trying to set custom metadata on a blob. However, this does not work. I'm using the following script:

#!/usr/bin/env python3
import os

from gcloud import storage

client = storage.Client('super secret id')
bucket = client.get_bucket('super secret bucket')

blob = bucket.blob('kirby.png')
blob.metadata = dict(Color='Pink')
with open(os.path.expanduser('~/Pictures/kirby.png'), 'rb') as img_data:
    blob.upload_from_file(img_data)

When retrieving the blob or inspecing it in the developer console, the metadata is still unset.

@tseaver

This comment has been minimized.

Contributor

tseaver commented Oct 15, 2015

Setting the metadata locally does not cause the instance to make the API call to patch it on the server, and we don't send metadata along with the upload_from_file() call. You need to add a call to blob.patch() to persist the changes to blob.metadata.

@remcohaszing

This comment has been minimized.

remcohaszing commented Oct 15, 2015

I modified the script:

#!/usr/bin/env python3
import os

from gcloud import storage

client = storage.Client('dc-app-api')
bucket = client.get_bucket('appsemble-dev')

blob = bucket.blob('kirby.png')
with open(os.path.expanduser('~/Pictures/kirby.png'), 'rb') as img_data:
    blob.upload_from_file(img_data)
blob.metadata = {'Color': 'Pink', 'Spam': 'Cheese'}
blob.metadata['Foo'] = 'Bar'
blob.patch()

This sets the Color and Spam metadata fields, but not the Foo field. There is no warning about this whatsoever.

Also there is no documentation on the Blob class on how to update metadata.

@dhermes

This comment has been minimized.

Contributor

dhermes commented Oct 15, 2015

Thanks for reporting. Looking into it now.

@dhermes

This comment has been minimized.

Contributor

dhermes commented Oct 15, 2015

Ahhh. This is because the Blob.metadata getter returns a copy.deepcopy of the dictionary stored, so doing

blob.metadata['Foo'] = 'Bar'

updates the copy, not the dictionary actually stored on the Blob. We should fix this.

@tseaver WDYT?

@tseaver

This comment has been minimized.

Contributor

tseaver commented Oct 15, 2015

To allow mutation of the metadata in-place (as opposed to assigning to it wholesale), we would need a way to detect those mutations, in order to mark it as changed (for a subsequent call to blob.patch() to pick up).

@dhermes

This comment has been minimized.

Contributor

dhermes commented Oct 15, 2015

Right. Custom dict subclass?

@tseaver

This comment has been minimized.

Contributor

tseaver commented Oct 15, 2015

Right, overriding:

  • __delitem__
  • __setitem__
  • clear
  • update
  • setdefault
  • pop
  • popitem
@remcohaszing

This comment has been minimized.

remcohaszing commented Oct 15, 2015

A custom dict class could work for the examples I provided. This would still not make a dict behave as expected tough. Since dicts are mutable, I would expect the following to work:

metadata = {'Holy': 'Grail'}
blob.metadata = metadata
metadata['Ministry'] = 'Silly Walks'
blob.patch()

If the setter of blob.metadata would return this new class, this would no longer refer to the original dict that has been set.

@remcohaszing

This comment has been minimized.

remcohaszing commented Oct 15, 2015

I think this subclass could work if no setter is defined. An empty metadata object could be set in the Blob constructor. This exposes the metadata as if it is a regular dict. This would break with the current behaviour, where setting metadata is allowed.

@dhermes

This comment has been minimized.

Contributor

dhermes commented Oct 15, 2015

The fact that the getter returns a copy is not set in stone. We did it to "protect" the integrity of the actual data own by the class, but didn't think through situations like this.

We don't need a getter, but do need a setter to track changes on the object. Can't have a setter without a getter, so that's just how it's gotta be. (Also, all properties are stored on a non-public attribute called _properties, another reason for a getter.)

@remcohaszing

This comment has been minimized.

remcohaszing commented Oct 15, 2015

I suppose it could look something like this:

class Metadata(dict):
    def __init__(self, blob):
        self._blob = blob

    def __setitem__(self, key, value):
        super(self, Metadata).__setitem__(key, value)
        self._blob._patch_property('metadata', self)

    # XXX Repeat pattern for other modifier functions


class Blob(_PropertyMixin):
    ...

    def __init__(self, ...)
        ...
        self._properties['metadata'] = Metadata(self)

    @property
    def metadata(self):
        return self._properties['metadata']

This marks the metadata as modified if it gets edited and ensures the metadata can't be overridden using a setter.

A side effect is that the metadata property is never None. I personally like that better anyway.

@saxenanurag

This comment has been minimized.

saxenanurag commented Dec 16, 2015

blob.upload_from_file()
blob.metadata = {'luke': 'skywalker'}
blob.patch()

worked for me. I was able to upload a file and apply the meta data after the file upload.

@tseaver

This comment has been minimized.

Contributor

tseaver commented Dec 16, 2015

@saxenanurag I'm in the middle of reviewing how we are using the ex-apitools code (now in gcloud.streaming), and can see how we should be able to send the patched metadata along with the upload request.

@dhermes

This comment has been minimized.

Contributor

dhermes commented Dec 17, 2015

👍 What he said 😀

@saxenanurag

This comment has been minimized.

saxenanurag commented Dec 17, 2015

Thanks.

@pdknsk

This comment has been minimized.

pdknsk commented Mar 24, 2016

I think this is a duplicate of #754. The use of metadata is a bit ambiguous as metadata refers to both the metadata of an object (such as Cache-Control) and the metadata property, meant for custom metadata.

Anyway, to the issue. Any update? The problem with patch is that the object is online briefly with wrong metadata, or less briefly in case of a connection failure between two requests. And, it increases metageneration.

>>> text = bucket.blob('file.txt')
>>> text.cache_control = 'no-cache'
>>> text.upload_from_string('text')
>>> text.reload()
>>> text.cache_control
>>> text.metageneration
1
>>> text.cache_control = 'no-cache'
>>> text.patch()
>>> text.reload()
>>> text.cache_control
u'no-cache'
>>> text.metageneration
2
@pdknsk

This comment has been minimized.

pdknsk commented Mar 19, 2017

I've updated a previous patch, hack perhaps, which makes it work. It's not tested extensively.

--- a/blob.py
+++ b/blob.py
@@ -494,6 +494,11 @@
 
         # Configure the upload request parameters.
         request = Request(upload_url, 'POST', headers)
+
+        if self._properties:
+            headers['content-type'] = 'application/json'
+            request.body = json.dumps(self._properties)
+
         upload.configure_request(upload_config, request, url_builder)
 
         # Configure final URL

The file google/cloud/storage/blob.py is located in your Python site-packages directory.

>>> text = bucket.blob('file.txt')
>>> text.cache_control = 'no-cache'
>>> text.metadata = {'custom': 'metadata'}
>>> text.upload_from_string('text')
>>> text.reload()
>>> text.cache_control
u'no-cache'
>>> text.metadata
{u'custom': u'metadata'}
>>> text.metageneration
1
>>> text.download_as_string()
'text'
@tseaver

This comment has been minimized.

Contributor

tseaver commented Jun 5, 2017

#3362 passes along all non-ACL blob properties with the (initial or only) upload request.

@tseaver tseaver closed this Jun 5, 2017

@gjkemnitz

This comment has been minimized.

gjkemnitz commented Oct 20, 2018

You can work around this bug by creating a completely new dictionary, assigning it to blob.metadata, and doing blob.patch():

newdict={}
for x in blob.metadata:
    newdict[x] = blob.metadata[x]
    newdict['mynewproperty'] = 'awesome'
blob.metadata=newdict
blob.patch()

Your blob will now have the {'mynewproperty':'awesome'} metadata property in its stored metadata list on GCS.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment