
ValueError when trying to encode time variable in a NetCDF file with CF conventions #3739

Closed
avatar101 opened this issue Jan 31, 2020 · 7 comments


@avatar101

# Imports
import numpy as np
import xarray as xr
import pandas as pd
from glob import glob

# files to be concatenated
path = '/path/to/data/'  # placeholder for the data directory
yr = 1988
files = sorted(glob(path + str(yr) + '/V250*'))
# corrected dates
dates = pd.date_range(start=str(yr), end=str(yr+1), freq='6H', closed='left')

ds_test = xr.open_mfdataset(files[:10], combine='nested', concat_dim='time', decode_cf=False)
# correcting time
ds_test.time.values=dates[:10]
# fixing encoding
ds_test.time.attrs['units'] = "Seconds since 1970-01-01 00:00:00"

# preview of the time variable
print(ds_test.time)

> <xarray.DataArray 'time' (time: 10)>
array(['1988-01-01T00:00:00.000000000', '1988-01-01T06:00:00.000000000',
       '1988-01-01T12:00:00.000000000', '1988-01-01T18:00:00.000000000',
       '1988-01-02T00:00:00.000000000', '1988-01-02T06:00:00.000000000',
       '1988-01-02T12:00:00.000000000', '1988-01-02T18:00:00.000000000',
       '1988-01-03T00:00:00.000000000', '1988-01-03T06:00:00.000000000'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 1988-01-01 ... 1988-01-03T06:00:00
Attributes:
    calendar:       proleptic_gregorian
    standard_name:  time
    units:          Seconds since 1970-01-01 00:00:00

ds_test.to_netcdf(path+'test.nc')

>ValueError: failed to prevent overwriting existing key units in attrs on variable 'time'.
 This is probably an encoding field used by xarray to describe how a variable is serialized. 
To proceed, remove this key from the variable's attributes manually.




Expected Output

The time variable should be encoded correctly, so that the file is saved with the time values converted according to the reference units. I have the flexibility to drop the CF conventions as long as the time values are correct, but a solution that keeps the CF conventions intact would also be nice.

Problem Description

I'm trying to concatenate NetCDF files that declare CF conventions in their global attributes. These files have an incorrect time dimension, which I try to fix with the code above. It seems that some existing encoding prevents the files from being written back, but when I print the encoding it doesn't show any such clashing units. I'm not sure whether this is a bug or a usage error, so any help on how to encode time correctly, such that the time values are converted according to the reference units when the file is saved, is much appreciated.

# More diagnostics on the encoding
print(ds_test.encoding)
>{'unlimited_dims': {'time'},
 'source': '/file/to/path/V250_19880101_00'}

# checking any existing time
print(ds_test.time.encoding)
>{}

# another try on setting time encoding
ds_test.time.encoding['units'] = "Seconds since 1970-01-01 00:00:00"
# writing the file gives the same ValueError as above
ds_test.to_netcdf(path+'test.nc')

# ncdump output of one of the files
>netcdf V250_19880101_06 {
dimensions:
	lon = 720 ;
	lat = 361 ;
	lev = 1 ;
	time = UNLIMITED ; // (1 currently)
variables:
	float lon(lon) ;
		lon:long_name = "longitude" ;
		lon:units = "degrees_east" ;
		lon:standard_name = "longitude" ;
		lon:axis = "X" ;
	float lat(lat) ;
		lat:long_name = "latitude" ;
		lat:units = "degrees_north" ;
		lat:standard_name = "latitude" ;
		lat:axis = "Y" ;
	float lev(lev) ;
		lev:long_name = "hybrid level at layer midpoints" ;
		lev:units = "level" ;
		lev:standard_name = "hybrid_sigma_pressure" ;
		lev:positive = "down" ;
		lev:formula = "hyam hybm (mlev=hyam+hybm*aps)" ;
		lev:formula_terms = "ap: hyam b: hybm ps: aps" ;
	float time(time) ;
		time:units = "hours since 1988-01-01 06:00:00" ;
		time:calendar = "proleptic_gregorian" ;
		time:standard_name = "time" ;
	float V(time, lev, lat, lon) ;
		V:long_name = "unknown (please add with NCO)" ;
		V:units = "unknown (please add with NCO)" ;
		V:_FillValue = -999.99f ;

// global attributes:
		:Conventions = "CF" ;
		:constants_file_name = "P19880101_06" ;
		:institution = "IACETH" ;
		:lonmin = -180.f ;
		:lonmax = 179.5f ;
		:latmin = -90.f ;
		:latmax = 90.f ;
		:levmin = 250.f ;
		:levmax = 250.f ;
		:history = "Fri Sep  6 15:59:17 2019: ncatted -a units,time,o,c,hours since 1988-01-01 06:00:00 -a standard_name,time,o,c,time V250_19880101_06" ;
		:NCO = "4.7.2" ;
data:

 time = 6 ;
}

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.0.0-23-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.1

xarray: 0.13.0
pandas: 0.25.3
numpy: 1.18.1
scipy: 1.3.2
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.9.2
distributed: 2.9.3
matplotlib: 3.1.0
cartopy: 0.17.0
seaborn: 0.9.0
numbagg: None
setuptools: 44.0.0.post20200106
pip: 19.3.1
conda: None
pytest: None
IPython: 7.11.1
sphinx: None

@rabernat
Contributor

Hi @avatar101 - thanks for your issue!

I couldn't fully reproduce your example, since it references files that I don't have access to. In the future, please consider creating a "Minimal, Complete and Verifiable Example" (MCVE): http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

In the meantime, I do have a suggestion you could try. Instead of

ds_test.time.attrs['units'] = "Seconds since 1970-01-01 00:00:00"

try

ds_test.time.encoding['units'] = "Seconds since 1970-01-01 00:00:00"

The reason is that you created your time coordinate with pd.date_range, which returns a datetime64 dtype, so xarray treats that variable as already decoded and will CF-encode it itself when writing; the target units therefore belong in encoding rather than attrs.
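
As a self-contained illustration of that distinction (a sketch with made-up file and variable names): for a datetime64 coordinate, a units string placed in .encoding is used to serialize the values on write, whereas the same string placed in .attrs clashes with the key xarray fills in itself during encoding.

import numpy as np
import pandas as pd
import xarray as xr

# small dataset with an already-decoded (datetime64) time coordinate
times = pd.date_range("1988-01-01", periods=4, freq="6H")
ds = xr.Dataset({"V": ("time", np.arange(4.0))}, coords={"time": times})

# serialization instructions go into encoding; xarray converts the
# datetime64 values to numbers in these units when writing the file
ds.time.encoding["units"] = "seconds since 1970-01-01 00:00:00"
ds.to_netcdf("encoding_example.nc")

# setting ds.time.attrs["units"] instead would trigger the
# "failed to prevent overwriting existing key units in attrs" ValueError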

@avatar101
Author

Hi Ryan, thanks for your reply. Apologies for not creating a reproducible example earlier, as the files weren't created by an xarray routine. Please find my attempt at reproducing the problem below:

Minimum steps to reproduce the error

import numpy as np
import xarray as xr
import pandas as pd

data1 = np.ones(shape=(1, 181, 360))
lats=np.arange(-90,91, 1)
lons=np.arange(-180,180,1)
time1 = np.array([0])

# creating the first dataset
da_1 = xr.DataArray(data1, coords=[time1, lats, lons], dims=['time', 'lats', 'lons'])
da_1.time.attrs['units'] = "hours since 1988-01-01 00:00:00"
da_1.time.attrs['calendar'] = "proleptic_gregorian"
da_1.time.attrs['standard_name'] = "time"
ds_1 = xr.Dataset({'V':da_1})
ds_1.attrs['Conventions'] = 'CF'
ds_1.to_netcdf('ds_1.nc', encoding=None)

# creating second test dataset
time2=np.array([6])  # wrong time value
da_2 = xr.DataArray(data1, coords=[time2, lats, lons], dims=['time', 'lats', 'lons'])
da_2.time.attrs['units'] = "hours since 1988-01-01 06:00:00"
da_2.time.attrs['calendar'] = "proleptic_gregorian"
da_2.time.attrs['standard_name'] = "time"

ds_2 = xr.Dataset({'V':da_2})
ds_2.attrs['Conventions'] = 'CF'
# saving it with wrong time value
ds_2.to_netcdf('ds_2.nc', encoding=None)


# Reading the 2 files and concatenating them
files = ['/path/to/ds_1.nc', '/path/to/ds_2.nc']

ds_test = xr.open_mfdataset(files, combine='nested', concat_dim='time', decode_cf=False)
yr = 1988 # year
dates = pd.date_range(start=(yr), end=str(yr+1), freq='6H', closed='left')
ds_test.time.values=dates[:2] # fixing the time values 
ds_test.time.attrs['units'] = "Seconds since 1970-01-01 00:00:00"  #required encoding
ds_test.to_netcdf('ds_1_2.nc')  # gives the same error

ValueError: failed to prevent overwriting existing key units in attrs on variable 'time'. This is probably an encoding field used by xarray to describe how a variable is serialized. To proceed, remove this key from the variable's attributes manually.

I had also mentioned your suggestion in the original post; it gives the same error message. Please find reproducible steps incorporating your suggestion below.

Trying time encoding solution

# Reading the files 
files = ['/path/to/ds_1.nc', '/path/to/ds_2.nc']

ds_test = xr.open_mfdataset(files, combine='nested', concat_dim='time', decode_cf=False)
yr = 1988 # year
dates = pd.date_range(start=(yr), end=str(yr+1), freq='6H', closed='left')
ds_test.time.values=dates[:2] # fixing the time values 
# encoding try
ds_test.time.encoding['units'] = "Seconds since 1970-01-01 00:00:00"
ds_test.to_netcdf('ds_1_2.nc')  # gives same error

ValueError: failed to prevent overwriting existing key calendar in attrs on variable 'time'. This is probably an encoding field used by xarray to describe how a variable is serialized. To proceed, remove this key from the variable's attributes manually.

@Chan-Jer

Hey @avatar101, thanks for your examples. I'm wondering whether this problem has been figured out or not; I have run into the same problem. I would appreciate any help.

@mathause
Collaborator

The following works (using assign_coords() instead of assigning to time.values). I think you also forgot a str() in date_range.

import numpy as np
import xarray as xr
import pandas as pd

files = ['ds_1.nc', 'ds_2.nc']

ds_test = xr.open_mfdataset(files, combine='nested', concat_dim='time', decode_cf=False)
yr = 1988 # year
dates = pd.date_range(start=str(yr), end=str(yr+1), freq='6H', closed='left')
ds_test = ds_test.assign_coords(time=dates[:2])
ds_test.time.encoding['units'] = "seconds since 1970-01-01 00:00:00"
ds_test.time.encoding['calendar'] = "proleptic_gregorian"
ds_test.to_netcdf('ds_1_2.nc')
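
To confirm the result, re-opening the written file without decoding (an untested sketch, reusing the file name from above) should show the time values stored as numbers relative to the requested reference, with the units and calendar written as attributes in the file:

ds_check = xr.open_dataset('ds_1_2.nc', decode_cf=False)
print(ds_check.time.values)                                   # raw seconds since 1970-01-01
print(ds_check.time.attrs['units'], ds_check.time.attrs['calendar'])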

The following works as well:

ds_test = xr.open_mfdataset(files, combine='nested', concat_dim='time')
ds_test.time.encoding['units'] = "seconds since 1970-01-01 00:00:00"
ds_test.to_netcdf('ds_1_2.nc')

However, what does indeed not work is the following

ds_test = xr.open_mfdataset(files, combine='nested', concat_dim='time')
ds_test.time.attrs['units'] = "seconds since 1970-01-01 00:00:00"
ds_test.to_netcdf('ds_1_2.nc')

which I don't entirely understand, because ds_test.time.encoding is empty. So maybe there is an encoding hidden somewhere, but I couldn't find it.
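
A stripped-down case with no files involved seems to reproduce that last failure (a sketch, names invented here), which suggests the clash happens at write time, when xarray computes a units string for the datetime64 data and refuses to overwrite the one already sitting in attrs:

import pandas as pd
import xarray as xr

# a bare datetime64 coordinate, nothing read from disk
times = pd.date_range("2000-01-01", periods=3, freq="6H")
ds_min = xr.Dataset(coords={"time": times})
print(ds_min.time.encoding)             # {} -- nothing hidden here either
ds_min.time.attrs["units"] = "seconds since 1970-01-01 00:00:00"
ds_min.to_netcdf("attrs_clash.nc")      # same ValueError about key 'units'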

@avatar101
Author

@mathause

Thanks for your suggestions. Your first solution works fine for correcting the time data stored in the array. I also don't understand why ds_test.time.encoding is empty and yet is the reason for an error while saving. Maybe it's a bug?

@Chan-Jer Another workaround I used was to set the correct time value using CDO's settime function.

@bonnland

I got a similar error message when opening a Zarr store with datetime64 time values, where I tried to set the "calendar" attribute on the time axis (the attribute was unset in the original store). I've found some xarray code that appears to treat the "calendar" and "units" time attributes as special, and it essentially prevents users from setting or changing these values, even when those values are not present:

From xarray/coding/times.py:

class CFDatetimeCoder(VariableCoder):
    def __init__(self, use_cftime=None):
        self.use_cftime = use_cftime

    def encode(self, variable, name=None):
        dims, data, attrs, encoding = unpack_for_encoding(variable)
        if np.issubdtype(data.dtype, np.datetime64) or contains_cftime_datetimes(
            variable
        ):
            (data, units, calendar) = encode_cf_datetime(
                data, encoding.pop("units", None), encoding.pop("calendar", None)
            )
            safe_setitem(attrs, "units", units, name=name)
            safe_setitem(attrs, "calendar", calendar, name=name)

        return Variable(dims, data, attrs, encoding)
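
Based on that, a sketch of what the encoding-based route would look like for the Zarr case (store names here are placeholders, untested): the calendar goes into .encoding, so safe_setitem never sees a pre-existing key in attrs.

import xarray as xr

ds = xr.open_zarr("input_store")                          # hypothetical store
ds.time.encoding["calendar"] = "proleptic_gregorian"      # instead of ds.time.attrs["calendar"]
ds.to_zarr("output_store", mode="w")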

@kmuehlbauer
Contributor

The initial issue seems to be sorted out by setting the needed keys in encoding instead of attrs. If you still have issues with this, please open a new issue with an MCVE. Thanks!
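
For anyone finding this later, a condensed, untested sketch of that resolution applied to the example files from this thread (paths, year, and output name are placeholders):

import pandas as pd
import xarray as xr

files = ['ds_1.nc', 'ds_2.nc']     # the small test files created above
ds = xr.open_mfdataset(files, combine='nested', concat_dim='time', decode_cf=False)

# replace the wrong time values with the intended ones
dates = pd.date_range(start='1988-01-01', periods=ds.sizes['time'], freq='6H')
ds = ds.assign_coords(time=dates)

# defensively clear any units/calendar strings still sitting in attrs,
# then describe the desired on-disk form via encoding
for key in ('units', 'calendar'):
    ds.time.attrs.pop(key, None)
ds.time.encoding['units'] = 'seconds since 1970-01-01 00:00:00'
ds.time.encoding['calendar'] = 'proleptic_gregorian'

ds.to_netcdf('ds_1_2_fixed.nc')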
