```python
import xarray as xr

da = xr.DataArray(['foo'])
ds = da.to_dataset(name='da')
ds.to_zarr('ds')  # no special encoding specified

ds = xr.open_zarr('ds')
print(ds.da.values)
```
The code above prints `['foo']` (string type). The encoding chosen by zarr is `"dtype": "|S3"`, which corresponds to bytes, but it is decoded back to a string, which is what we want.

The problem is that if I want to append to the zarr archive, like so:
It prints `['foo' 'bar']`: the encoding was kept as `"dtype": "|S3"`, which fits a 3-character string but silently truncates the appended 6-character one. If I want to specify the encoding with the maximum length, e.g.:
It solves the length problem, but now my strings are kept as bytes: `[b'foo' b'barbar']`. If I specify a Unicode encoding instead, it is not taken into account: the zarr encoding is still `"dtype": "|S3"` and I am back to my length problem, `['foo' 'bar']`.
The solution with `'dtype': '|S6'` is acceptable, but I then need to encode my strings to bytes when indexing, which is annoying.