.to_zarr with datetime64[ns] #2265

Closed
NickMortimer opened this issue Jul 3, 2018 · 4 comments

NickMortimer commented Jul 3, 2018

Hi, I've noticed a possible inconsistency in how datetimes are stored.

t=xr.open_dataset(files[0])
t['JULD_LOCATION'][0]
<xarray.DataArray 'JULD_LOCATION' ()>
array('2008-07-29T20:20:58.000000000', dtype='datetime64[ns]')
Attributes:
    long_name:    Julian day (UTC) of the location relative to REFERENCE_DATE...
    conventions:  Relative julian days with decimal part (as parts of day)
    resolution:   0.0

t.to_zarr(r'D:\argo\argo2.zarr',mode='w')

za =zarr.open(r'D:\argo\argo2.zarr',mode='w+')
za['JULD_LOCATION'].info
Out[442]: 
Name               : /JULD_LOCATION
Type               : zarr.core.Array
Data type          : float64
Shape              : (197,)
Chunk shape        : (197,)
Order              : C
Read-only          : False
Compressor         : Zlib(level=1)
Store type         : zarr.storage.DirectoryStore
No. bytes          : 1576 (1.5K)
No. bytes stored   : 2000 (2.0K)
Storage ratio      : 0.8
Chunks initialized : 1/1

If I instead assign the DataArray directly through zarr:

za['JULD_LOCATION1'] =t['JULD_LOCATION']
za['JULD_LOCATION1'].info
Out[444]: 
Name               : /JULD_LOCATION1
Type               : zarr.core.Array
Data type          : datetime64[ns]
Shape              : (197,)
Chunk shape        : (197,)
Order              : C
Read-only          : False
Compressor         : Zlib(level=1)
Store type         : zarr.storage.DirectoryStore
No. bytes          : 1576 (1.5K)
No. bytes stored   : 1742 (1.7K)
Storage ratio      : 0.9
Chunks initialized : 1/1

The actual values stored also differ between the two methods:

pd.to_datetime(za['JULD_LOCATION'][0])
Timestamp('1970-01-01 00:00:00.000021394')

pd.to_datetime(za['JULD_LOCATION1'][0])
Timestamp('2008-07-29 20:20:58')

I think this is because the reference date time is not being applied?

t1['REFERENCE_DATE_TIME']
Out[459]: 
<xarray.DataArray 'REFERENCE_DATE_TIME' ()>
array(b'19500101000000', dtype='|S14')
Attributes:
    long_name:    Date of reference for Julian days
    conventions:  YYYYMMDDHHMISS

I hope this makes sense.

@spencerkclark
Member

When writing datetime objects to disk, xarray encodes them following CF conventions: each value is converted to a number counting some unit of time (for example, days) since a given reference date. This is needed because some other backends cannot write datetime64 objects directly to disk.
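
As a rough illustration of that encoding (a sketch only, not xarray's actual implementation), the timestamp from the question can be expressed as fractional days since a reference date. The helper name `encode_days_since` is hypothetical, and the 1950-01-01 reference is an assumption taken from the REFERENCE_DATE_TIME value shown in the question.

```python
import numpy as np


def encode_days_since(value, reference):
    """Hypothetical helper: encode a timestamp as fractional days since `reference`."""
    delta = np.datetime64(value, "ns") - np.datetime64(reference, "ns")
    # Dividing two timedelta64 values yields a plain float (here, a number of days).
    return delta / np.timedelta64(1, "D")


# Timestamp from the question; the reference date is assumed from REFERENCE_DATE_TIME.
encoded = encode_days_since("2008-07-29T20:20:58", "1950-01-01T00:00:00")
print(int(encoded))  # whole days since the reference: 21394
```

The integer part, 21394, matches the digits in the Timestamp that pd.to_datetime returned for the raw value in the question, which is what you get when a bare number is read as nanoseconds since 1970-01-01 instead of days since 1950-01-01.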

For accurate roundtripping of Datasets, xarray also includes logic to automatically decode datetimes stored following CF conventions. For that reason, instead of loading in the raw zarr store using zarr's open function, I recommend using xarray's open_zarr function, which will automatically decode the CF-encoded values to datetime64 objects.

See the following example:

In [1]: import numpy as np

In [2]: import xarray as xr

In [3]: da = xr.DataArray(np.datetime64('2000-01-01'), name='date')

In [4]: da
Out[4]:
<xarray.DataArray 'date' ()>
array('2000-01-01T00:00:00.000000000', dtype='datetime64[ns]')

In [5]: da.to_dataset().to_zarr('example.zarr')
Out[5]: <xarray.backends.zarr.ZarrStore at 0x1109ca190>

In [6]: ds = xr.open_zarr('example.zarr')

In [7]: ds.date
Out[7]:
<xarray.DataArray 'date' ()>
array('2000-01-01T00:00:00.000000000', dtype='datetime64[ns]')

Note that if we open the zarr store directly, we find that the date was encoded as the integer 0, with units and calendar attributes that describe how to decode it:

In [1]: import zarr

In [2]: z = zarr.open('example.zarr')

In [3]: z['date'][...]
Out[3]: array(0)

In [4]: z['date'].attrs['units']
Out[4]: u'days since 2000-01-01 00:00:00'

In [5]: z['date'].attrs['calendar']
Out[5]: u'proleptic_gregorian'

Unlike xarray, zarr does not include logic for automatically decoding CF-encoded datetimes. I hope that helps.
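
If the raw store does have to be read with zarr, the decoding can be done by hand. The following is a minimal sketch, assuming the units attribute has the form 'days since <reference>' as in the example above. `decode_days_since` is a hypothetical helper, not part of zarr or xarray, and it ignores the calendar attribute.

```python
import numpy as np

NS_PER_DAY = 86_400e9  # nanoseconds in one day


def decode_days_since(raw, units):
    """Hypothetical helper: decode numbers stored as 'days since <reference>'."""
    prefix = "days since "
    if not units.startswith(prefix):
        raise ValueError(f"unsupported units string: {units!r}")
    # Turn 'YYYY-MM-DD HH:MM:SS' into ISO form before handing it to numpy.
    reference = np.datetime64(units[len(prefix):].strip().replace(" ", "T"), "ns")
    # Convert (possibly fractional) days to whole nanoseconds.
    offset_ns = np.round(np.asarray(raw, dtype="float64") * NS_PER_DAY).astype("int64")
    return reference + offset_ns.astype("timedelta64[ns]")


# Value and units as stored in example.zarr above.
print(decode_days_since(0, "days since 2000-01-01 00:00:00"))

# Whole-day value from the question, assuming a 1950-01-01 reference date.
print(decode_days_since(21394, "days since 1950-01-01 00:00:00"))
```

In practice xr.open_zarr remains the simpler route, since it reads the units and calendar attributes and applies the decoding automatically.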

Author

NickMortimer commented Jul 3, 2018

@spencerkclark Yes, that helps very much, and it is a great example of how to answer a question! I'm learning so much from this group. Is there a way of appending an xarray dataset onto an existing zarr array? That's why I've been accessing the store directly through zarr: I'm trying to build a zarr store of all the Argo float profiles and add new ones as they arrive.

Member

jhamman commented Jul 3, 2018

Is there a way of appending an xarray dataset onto an existing zarr array?

@NickMortimer - No, not yet, though it has already been proposed in #2022.

@NickMortimer
Author

@jhamman Thanks, I'll add to the discussion there and close this issue.
