time encoding fails for subdaily frequencies and days since #8271

Closed
4 tasks done
larsbuntemeyer opened this issue Oct 4, 2023 · 3 comments · Fixed by #8272

Comments

@larsbuntemeyer

larsbuntemeyer commented Oct 4, 2023

What happened?

The following example has not worked since v2023.09.0:

import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("1970-01-01", "1970-01-31", freq="6h")
ds = xr.Dataset(coords=dict(time=time))

units = "days since 1960-01-01 00:00:00"
calendar = "gregorian"
encoding = dict(time=dict(units=units, calendar=calendar, dtype=np.dtype("float64")))

ds.to_netcdf("test.nc", encoding=encoding)

! ncdump -v time test.nc

This gives the following output:

netcdf test {
dimensions:
	time = 121 ;
variables:
	double time(time) ;
		time:_FillValue = NaN ;
		time:units = "days since 1960-01-01" ;
		time:calendar = "gregorian" ;
data:

 time = 3653, 3653, 3653, 3653, 3654, 3654, 3654, 3654, 3655, 3655, 3655, 
    3655, 3656, 3656, 3656, 3656, 3657, 3657, 3657, 3657, 3658, 3658, 3658, 
    3658, 3659, 3659, 3659, 3659, 3660, 3660, 3660, 3660, 3661, 3661, 3661, 
    3661, 3662, 3662, 3662, 3662, 3663, 3663, 3663, 3663, 3664, 3664, 3664, 
    3664, 3665, 3665, 3665, 3665, 3666, 3666, 3666, 3666, 3667, 3667, 3667, 
    3667, 3668, 3668, 3668, 3668, 3669, 3669, 3669, 3669, 3670, 3670, 3670, 
    3670, 3671, 3671, 3671, 3671, 3672, 3672, 3672, 3672, 3673, 3673, 3673, 
    3673, 3674, 3674, 3674, 3674, 3675, 3675, 3675, 3675, 3676, 3676, 3676, 
    3676, 3677, 3677, 3677, 3677, 3678, 3678, 3678, 3678, 3679, 3679, 3679, 
    3679, 3680, 3680, 3680, 3680, 3681, 3681, 3681, 3681, 3682, 3682, 3682, 
    3682, 3683 ;
}

It seems that the subdaily fraction is truncated. Note that this does not happen if I set the units to the start of the time range, e.g., units = "days since 1970-01-01 00:00:00". That correctly results in

netcdf test {
dimensions:
	time = 121 ;
variables:
	double time(time) ;
		time:_FillValue = NaN ;
		time:units = "days since 1970-01-01" ;
		time:calendar = "gregorian" ;
data:

 time = 0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 2.75, 3, 3.25, 
    3.5, 3.75, 4, 4.25, 4.5, 4.75, 5, 5.25, 5.5, 5.75, 6, 6.25, 6.5, 6.75, 7, 
    7.25, 7.5, 7.75, 8, 8.25, 8.5, 8.75, 9, 9.25, 9.5, 9.75, 10, 10.25, 10.5, 
    10.75, 11, 11.25, 11.5, 11.75, 12, 12.25, 12.5, 12.75, 13, 13.25, 13.5, 
    13.75, 14, 14.25, 14.5, 14.75, 15, 15.25, 15.5, 15.75, 16, 16.25, 16.5, 
    16.75, 17, 17.25, 17.5, 17.75, 18, 18.25, 18.5, 18.75, 19, 19.25, 19.5, 
    19.75, 20, 20.25, 20.5, 20.75, 21, 21.25, 21.5, 21.75, 22, 22.25, 22.5, 
    22.75, 23, 23.25, 23.5, 23.75, 24, 24.25, 24.5, 24.75, 25, 25.25, 25.5, 
    25.75, 26, 26.25, 26.5, 26.75, 27, 27.25, 27.5, 27.75, 28, 28.25, 28.5, 
    28.75, 29, 29.25, 29.5, 29.75, 30 ;
}
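The two outputs differ only in whether the fractional part of a day survives. As a minimal sketch of that symptom, using only the Python standard library and making no claim about xarray's internal code path, the following compares true division against floor division of the offset from the 1960 reference date. The helper names `days_since` and `truncated_days_since` are hypothetical and exist only for this illustration.

```python
from datetime import datetime, timedelta

ONE_DAY = timedelta(days=1)


def days_since(times, reference):
    """Fractional days between each time and the reference (true division)."""
    return [(t - reference) / ONE_DAY for t in times]


def truncated_days_since(times, reference):
    """Whole days only (floor division), mimicking the observed truncation."""
    return [float((t - reference) // ONE_DAY) for t in times]


# Same axis as in the report: 6-hourly steps from 1970-01-01 to 1970-01-31.
start = datetime(1970, 1, 1)
times = [start + timedelta(hours=6 * i) for i in range(121)]
reference = datetime(1960, 1, 1)

expected = days_since(times, reference)
truncated = truncated_days_since(times, reference)

print(expected[:5])   # fractions preserved
print(truncated[:5])  # fractions lost
```

The offset of 3653 days is 10 years of 365 days plus the three leap days of 1960, 1964, and 1968, which is why the values start at 3653 rather than 3650.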

What did you expect to happen?

I expect subdaily frequencies to be encoded correctly even if the reference date in the units differs from the start date of the time axis. For example, v2023.08.0 correctly preserves the fractions:

netcdf test {
dimensions:
	time = 121 ;
variables:
	double time(time) ;
		time:_FillValue = NaN ;
		time:units = "days since 1960-01-01" ;
		time:calendar = "gregorian" ;
data:

 time = 3653, 3653.25, 3653.5, 3653.75, 3654, 3654.25, 3654.5, 3654.75, 3655, 
    3655.25, 3655.5, 3655.75, 3656, 3656.25, 3656.5, 3656.75, 3657, 3657.25, 
    3657.5, 3657.75, 3658, 3658.25, 3658.5, 3658.75, 3659, 3659.25, 3659.5, 
    3659.75, 3660, 3660.25, 3660.5, 3660.75, 3661, 3661.25, 3661.5, 3661.75, 
    3662, 3662.25, 3662.5, 3662.75, 3663, 3663.25, 3663.5, 3663.75, 3664, 
    3664.25, 3664.5, 3664.75, 3665, 3665.25, 3665.5, 3665.75, 3666, 3666.25, 
    3666.5, 3666.75, 3667, 3667.25, 3667.5, 3667.75, 3668, 3668.25, 3668.5, 
    3668.75, 3669, 3669.25, 3669.5, 3669.75, 3670, 3670.25, 3670.5, 3670.75, 
    3671, 3671.25, 3671.5, 3671.75, 3672, 3672.25, 3672.5, 3672.75, 3673, 
    3673.25, 3673.5, 3673.75, 3674, 3674.25, 3674.5, 3674.75, 3675, 3675.25, 
    3675.5, 3675.75, 3676, 3676.25, 3676.5, 3676.75, 3677, 3677.25, 3677.5, 
    3677.75, 3678, 3678.25, 3678.5, 3678.75, 3679, 3679.25, 3679.5, 3679.75, 
    3680, 3680.25, 3680.5, 3680.75, 3681, 3681.25, 3681.5, 3681.75, 3682, 
    3682.25, 3682.5, 3682.75, 3683 ;
}
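Rather than comparing ncdump listings by eye, the encoded values can be checked programmatically by decoding them and comparing against the original axis. The sketch below is a standard-library illustration, not part of xarray; `decode_days` and `is_lossless` are hypothetical helper names. In practice the raw values could be read back with something like `xr.open_dataset("test.nc", decode_times=False)`, but here they are written out by hand so the block stays self-contained.

```python
from datetime import datetime, timedelta


def decode_days(values, reference):
    """Turn 'days since reference' numbers back into datetimes."""
    return [reference + timedelta(days=v) for v in values]


def is_lossless(values, times, reference):
    """True if decoding the encoded values reproduces the original times."""
    return decode_days(values, reference) == list(times)


reference = datetime(1960, 1, 1)
start = datetime(1970, 1, 1)
times = [start + timedelta(hours=6 * i) for i in range(5)]

# First five values as written by v2023.08.0 (fractions kept) ...
good = [3653.0, 3653.25, 3653.5, 3653.75, 3654.0]
# ... and as reported for v2023.09.0 (fractions truncated).
bad = [3653.0, 3653.0, 3653.0, 3653.0, 3654.0]

print(is_lossless(good, times, reference))
print(is_lossless(bad, times, reference))
```

A check of this kind only covers values that are exactly representable as floats, such as quarter days. Other frequencies would likely need a tolerance.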

Minimal Complete Verifiable Example

import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("1970-01-01", "1970-01-31", freq="6h")
ds = xr.Dataset(coords=dict(time=time))

units = "days since 1960-01-01 00:00:00"
calendar = "gregorian"
encoding = dict(time=dict(units=units, calendar=calendar, dtype=np.dtype("float64")))

ds.to_netcdf("test.nc", encoding=encoding)

! ncdump -v time test.nc

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

This is still an issue in the current main.

Environment

INSTALLED VERSIONS

commit: None
python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:39:40) [Clang 15.0.7 ]
python-bits: 64
OS: Darwin
OS-release: 22.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.2

xarray: 2023.9.1.dev12+gd5f17858
pandas: 2.1.1
numpy: 1.24.4
scipy: 1.11.3
netCDF4: 1.6.4
pydap: installed
h5netcdf: 1.2.0
h5py: 3.9.0
Nio: None
zarr: 2.16.1
cftime: 1.6.2
nc_time_axis: 1.4.1
PseudoNetCDF: 3.2.2
iris: 3.7.0
bottleneck: 1.3.7
dask: 2023.9.3
distributed: 2023.9.3
matplotlib: 3.8.0
cartopy: 0.22.0
seaborn: 0.13.0
numbagg: 0.2.2
fsspec: 2023.9.2
cupy: None
pint: 0.20.1
sparse: 0.14.0
flox: 0.7.2
numpy_groupies: 0.10.2
setuptools: 68.2.2
pip: 23.2.1
conda: None
pytest: 7.4.2
mypy: None
IPython: 8.16.1
sphinx: None

@spencerkclark
Member

Yikes! Sorry I did not catch this in reviewing #7827 and #8201. See #8272 for what I think should be a fix. Thanks for the report.

@kmuehlbauer
Contributor

Thanks @larsbuntemeyer for catching and reporting this issue.

@larsbuntemeyer
Author

Wow, thanks a lot for the quick fix, @spencerkclark!

@dcherian dcherian removed the needs triage Issue that has not been reviewed by xarray team member label Oct 4, 2023