
unlimited_dims generates 0-length dimensions named as letters of unlimited dimension #2134

Closed
dnowacki-usgs opened this issue May 15, 2018 · 5 comments

@dnowacki-usgs
Contributor

dnowacki-usgs commented May 15, 2018

I'm not sure I understand how the unlimited_dims option to to_netcdf() is supposed to work. Consider the following:

import pandas as pd
import xarray as xr

ds = xr.Dataset()
ds['time'] = xr.DataArray(pd.date_range('2000-01-01', '2000-01-10'), dims='time')
ds.to_netcdf('timedim.cdf', unlimited_dims='time')

This results in a file that looks like this:

$ ncdump timedim.cdf
netcdf timedim {
dimensions:
	t = UNLIMITED ; // (0 currently)
	i = UNLIMITED ; // (0 currently)
	m = UNLIMITED ; // (0 currently)
	e = UNLIMITED ; // (0 currently)
	time = UNLIMITED ; // (10 currently)
variables:
	int64 time(time) ;
		time:units = "days since 2000-01-01 00:00:00" ;
		time:calendar = "proleptic_gregorian" ;
data:

 time = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ;
}

Note the dimensions named t, i, m, e all with zero length. The time dimension (which is the only one that should exist) is properly set to UNLIMITED but we shouldn't have the four extra dimensions. What's going on here? The same behavior occurs when setting via ds.encoding['unlimited_dims'] = 'time'. Everything is as expected without the unlimited_dims option (but the time dimension is not UNLIMITED, of course).

I thought it could be related to the variable and dimension having the same name, but this also happens when they are different.

Expected Output

There should be no extra 0-length dimensions.

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

xarray: 0.10.3
pandas: 0.22.0
numpy: 1.14.3
scipy: 1.0.0
netCDF4: 1.3.1
h5netcdf: 0.5.0
h5py: 2.7.1
Nio: None
zarr: 2.2.0
bottleneck: 1.2.1
cyordereddict: None
dask: 0.16.1
distributed: 1.20.2
matplotlib: 2.2.2
cartopy: 0.16.0
seaborn: None
setuptools: 36.5.0.post20170921
pip: 9.0.1
conda: 4.5.3
pytest: None
IPython: 6.3.1
sphinx: 1.7.1

@rabernat
Contributor

What if you do unlimited_dims=['time']? It might be expecting a list and then incorrectly parsing the string as a sequence.
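This guess can be checked without netCDF at all. A Python `str` is itself a sequence of one-character strings, and `in` on a `str` is a substring test. The function below is a toy model, not xarray's actual code: the hypothetical `plan_dimensions` assumes the writer first registers every requested unlimited name with no length, then adds the dimensions the variables actually use, and marks a dimension unlimited whenever `name in unlimited_dims`. Under those assumptions it reproduces the shape of the ncdump output above.

```python
def plan_dimensions(variable_dims, unlimited_dims):
    """Toy model (hypothetical, not xarray's code) of a netCDF writer.

    variable_dims: dict mapping dimension name -> length used by variables.
    unlimited_dims: names the caller asked to make unlimited.
    Returns an ordered list of (name, length, is_unlimited).
    """
    dims = {}
    # Requested unlimited dims are registered first, with no length (None).
    # If unlimited_dims is a str, this loop yields single characters.
    for name in unlimited_dims:
        dims[name] = None
    # Then record the real dimensions and sizes used by the variables.
    dims.update(variable_dims)
    # On a str, `in` is a substring test, so 'time' in 'time' is True
    # and the real dimension is still (correctly) marked unlimited.
    return [(name, length, name in unlimited_dims) for name, length in dims.items()]
```

Here `plan_dimensions({"time": 10}, "time")` yields `t`, `i`, `m` and `e` with no length and marked unlimited, followed by `("time", 10, True)`, which matches the four empty dimensions in the ncdump output. Passing `["time"]` instead yields only `[("time", 10, True)]`.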

@dnowacki-usgs
Contributor Author

Yep that does it, thanks! 👍

I guess I could have read the "sequence of str" description in the docs more closely. Maybe it would make sense to accept a single string in addition to a sequence of strings?

@rabernat
Contributor

I still think this is a bug. I really don't know the best way to check that an object is a sequence but not a string, but this must already be solved elsewhere in xarray.
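One standard-library way to express "a sequence, but not a string" uses `collections.abc.Sequence`. The catch is that `str` and `bytes` are registered as `Sequence` too, so they have to be excluded explicitly. The helper name `is_non_string_sequence` is hypothetical; this is a sketch, not something xarray is known to provide.

```python
from collections.abc import Sequence


def is_non_string_sequence(obj):
    """Return True for sequences such as list or tuple, but not str/bytes."""
    # str and bytes are themselves registered as Sequence, so a plain
    # isinstance(obj, Sequence) check would accept 'time'; exclude them.
    return isinstance(obj, Sequence) and not isinstance(obj, (str, bytes))
```

A design consequence worth noting: a `set` is not a `Sequence`, so this check would also reject `{'time'}`. Whether that is desirable depends on how strictly an argument documented as "sequence of str" should be enforced.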

@shoyer
Member

shoyer commented May 16, 2018

We usually write something like:

# basestring is Python 2's common base of str/unicode; on Python 3 alone, check str
if isinstance(unlimited_dims, basestring):
    unlimited_dims = [unlimited_dims]

(This does come up quite commonly, but the work-around is short enough that we haven't written a utility function for it.)
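Wrapped as a utility, the same idea might look like the Python 3 sketch below, using `str` where Python 2 code would use `basestring`. The name `to_name_list` is hypothetical, and treating `None` as "no names" is an assumption about how an optional argument such as `unlimited_dims` might default, not something stated in this thread.

```python
def to_name_list(names):
    """Hypothetical helper: normalize None, a single str, or an iterable
    of str into a list of names."""
    if names is None:
        # Assumption: a missing argument means "no names".
        return []
    # A lone string is one name, not a sequence of one-letter names.
    if isinstance(names, str):
        return [names]
    return list(names)
```

With this, `to_name_list('time')` and `to_name_list(['time'])` both give `['time']`, so the call in the original report would create only the `time` dimension.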

@shoyer shoyer added the bug label May 16, 2018
@jhamman jhamman mentioned this issue May 17, 2018
@jhamman
Member

jhamman commented May 17, 2018

A fix for this is in #2154.
