Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow nested dictionaries in the Zarr backend (#3517) #3526

Closed
wants to merge 1 commit into from

Conversation

eddienko
Copy link

@eddienko eddienko commented Nov 13, 2019

Closes #3517.

I have tried to touch the minimum code as possible while leaving the defaults as they are now. The code works now by redefining to value to save as a list containing the dictionary. However when writing to disk the the dictionary is saved correctly, without the list (I do not understand 100% how it works but it does so someone double check!).

$ cat test.zarr/.zattrs
{
    "foo": {
        "bar": {
            "val": 3
        }
    }
}

@eddienko eddienko force-pushed the master branch 2 times, most recently from 170ba35 to 1ad46e7 Compare November 14, 2019 10:29
@@ -198,6 +198,9 @@ def check_attr(name, value):
"serialization to netCDF files".format(name)
)

if isinstance(value, dict) and ("zarr" in backend):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. This looks like we are inventing a convention to store nested dictionaries; is that right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the same, but it is not the case. The nested dictionary is stored correctly in the .zattrs file, and by correctly I mean in the same way that using Zarr would store them. If we use Zarr directly to read the attributes I get a nested dictionary.

@rabernat
Copy link
Contributor

I am slammed right now with a proposal and can't really spend time on this, so I'll tag @jakirkham and @alimanfoo who can provide a zarr perspective on what is going on here.

@jakirkham
Copy link

Yeah this probably works as these are just JSON files. That said, IDK that we are making any attempt to ensure this works. IOW I don't think this is tested or in the spec.

Additionally IDK that we do the same decoding on nested dictionaries as would be done on a flat dictionary. Meaning non-JSON values like datetime64/timedelta64 might not be handled correctly in this case.

Could be wrong about these things. Those are just my immediate thoughts.

@alimanfoo
Copy link
Contributor

FWIW in the Zarr Python implementation I don't think we do any special encoding or decoding of attribute values. Whatever value is given then gets serialised using the built-in json.dumps. This means I believe that if someone provides a dict as an attribute value then that will get serialised as a JSON object, and get deserialised back to a dict, although this is not something we test for currently.

From the zarr v2 spec point of view I think anything goes in the .zattrs file, as long as .zattrs is a JSON object at the root.

Hth.

@dcherian
Copy link
Contributor

Whatever value is given then gets serialised using the built-in json.dumps. This means I believe that if someone provides a dict as an attribute value then that will get serialised as a JSON object,

@eddienko I think this means we can avoid converting to a list?

@eddienko
Copy link
Author

mmm, I am not sure I understand. Currently

ds = xr.Dataset()
ds.attrs['foo'] = {'bar': 1}
ds.to_zarr('foo.zarr', mode='w')

does not work. This is because dict is not allowed in the list of types that the attributes are validated against. My feeling is this should not produce an error.

This PR just addresses that so the code above works. The following works as well:

ds = xr.open_zarr('foo.zarr')
ds.attrs['foo']
{'bar': 1}

@jhamman
Copy link
Member

jhamman commented Jan 31, 2023

Bring back this old issue since I think there is value in getting something like this in. My suggestion would be to write a zarr-specific _validate_attrs function and use that, rather than stubbing in a special case for zarr in the generic one. @eddienko , I know this has sat for a long time but would you like to try to finish this up?

@jhamman
Copy link
Member

jhamman commented Sep 14, 2023

closing as stale. Folks should feel free to open a new PR to address the issue referenced above.

@jhamman jhamman closed this Sep 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-backends topic-zarr Related to zarr storage library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for nested dictionaries in atttributes for Zarr
6 participants