Cannot store data after group_by #2847

Open
volkerjaenisch opened this issue Mar 23, 2019 · 6 comments

@volkerjaenisch

Hi Xarray!

I really like your library, but now I am completely stuck.

Code Sample, a copy-pastable example if possible

import numpy as np
import xarray as xr

data = [1,2,3,4,5,6,7,8,9,10]
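bins = np.array(range(5)) * 2  # array([0, 2, 4, 6, 8])
xr_data = xr.Dataset({'data': data})
out = xr_data.groupby_bins('data', bins).mean()  # 'data_bins' coordinate is a pandas IntervalIndex
out.to_netcdf('/tmp/test')  # raises the ValueError shown below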

Problem description

The call to to_netcdf fails with this error:
Traceback (most recent call last):
  File "/home/volker/workspace/pycharm-community-2018.1.2/helpers/pydev/pydevd.py", line 1664, in <module>
    main()
  File "/home/volker/workspace/pycharm-community-2018.1.2/helpers/pydev/pydevd.py", line 1658, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/home/volker/workspace/pycharm-community-2018.1.2/helpers/pydev/pydevd.py", line 1068, in run
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "/home/volker/workspace/pycharm-community-2018.1.2/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/volker/workspace/eprofile_wind/eprofile/src/eprofile/sandbox/test_xarray.py", line 12, in <module>
    out.to_netcdf('/tmp/test')
  File "/home/volker/workspace/eprofile_wind-CRxNsezQ/lib/python3.5/site-packages/xarray/core/dataset.py", line 1232, in to_netcdf
    compute=compute)
  File "/home/volker/workspace/eprofile_wind-CRxNsezQ/lib/python3.5/site-packages/xarray/backends/api.py", line 747, in to_netcdf
    unlimited_dims=unlimited_dims)
  File "/home/volker/workspace/eprofile_wind-CRxNsezQ/lib/python3.5/site-packages/xarray/backends/api.py", line 790, in dump_to_store
    unlimited_dims=unlimited_dims)
  File "/home/volker/workspace/eprofile_wind-CRxNsezQ/lib/python3.5/site-packages/xarray/backends/common.py", line 261, in store
    variables, attributes = self.encode(variables, attributes)
  File "/home/volker/workspace/eprofile_wind-CRxNsezQ/lib/python3.5/site-packages/xarray/backends/common.py", line 347, in encode
    variables, attributes = cf_encoder(variables, attributes)
  File "/home/volker/workspace/eprofile_wind-CRxNsezQ/lib/python3.5/site-packages/xarray/conventions.py", line 605, in cf_encoder
    for k, v in iteritems(variables))
  File "/home/volker/workspace/eprofile_wind-CRxNsezQ/lib/python3.5/site-packages/xarray/conventions.py", line 605, in <genexpr>
    for k, v in iteritems(variables))
  File "/home/volker/workspace/eprofile_wind-CRxNsezQ/lib/python3.5/site-packages/xarray/conventions.py", line 241, in encode_cf_variable
    var = ensure_dtype_not_object(var, name=name)
  File "/home/volker/workspace/eprofile_wind-CRxNsezQ/lib/python3.5/site-packages/xarray/conventions.py", line 201, in ensure_dtype_not_object
    data = _copy_with_dtype(data, dtype=_infer_dtype(data, name))
  File "/home/volker/workspace/eprofile_wind-CRxNsezQ/lib/python3.5/site-packages/xarray/conventions.py", line 139, in _infer_dtype
    .format(name))
ValueError: unable to infer dtype on variable 'data_bins'; xarray cannot serialize arbitrary Python objects

Expected Output

The Dataset should be written to a netCDF file.

Output of xr.show_versions()

>>> xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.3 (default, Sep 27 2018, 17:25:39)
[GCC 6.3.0 20170516]
python-bits: 64
OS: Linux
OS-release: 4.9.0-8-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: de_DE.UTF-8
libhdf5: 1.10.2
libnetcdf: 4.4.1.1

xarray: 0.11.3
pandas: 0.24.1
numpy: 1.16.1
scipy: None
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.2.1
PseudonetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
cyordereddict: None

@dcherian
Contributor

Try data = np.array(...)

@spencerkclark
Member

Thanks for the issue. I think the main problem is that we currently do not have a way of saving an IntervalIndex, which groupby_bins produces, to a netCDF file:

In [7]: out.indexes['data_bins']
Out[7]:
IntervalIndex([(0, 2], (2, 4], (4, 6], (6, 8]],
              closed='right',
              dtype='interval[int64]')

One way to work around this in the meantime is to redefine the bins coordinate before saving things to a file. See @jhamman's answer to a related StackOverflow question for an example.
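A minimal sketch of that kind of workaround, assuming you are happy to label each bin by its midpoint and keep the edges as plain numeric coordinates. This is not necessarily what the linked StackOverflow answer does, and the names `value`, `height`, `x`, `height_bins_lower` and `height_bins_upper` are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dataset: a variable "value" along dimension "x",
# binned by a coordinate "height" (names chosen for illustration only).
ds = xr.Dataset(
    {"value": ("x", np.arange(10.0))},
    coords={"height": ("x", np.arange(1, 11))},
)
out = ds.groupby_bins("height", np.arange(5) * 2).mean()

# The "height_bins" coordinate holds pandas.Interval objects, which the
# netCDF encoder cannot store. Rebuild an IntervalIndex to read off the edges.
intervals = pd.IntervalIndex(out["height_bins"].values)

# Replace the intervals with their midpoints (float) ...
fixed = out.assign_coords(height_bins=intervals.mid.values)
# ... and keep the bin edges as ordinary numeric coordinates on the same dim.
fixed = fixed.assign_coords(
    height_bins_lower=("height_bins", intervals.left.values),
    height_bins_upper=("height_bins", intervals.right.values),
)

# fixed.to_netcdf("/tmp/test.nc") should now be possible, given an
# installed netCDF backend (netCDF4, h5netcdf or scipy).
```

Another option is to pass numeric `labels=` to `groupby_bins` so that no intervals are created at all, at the cost of losing the bin edges unless you store them separately.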

@volkerjaenisch
Author

Thank you @spencerkclark for the fast response.
I did exactly as you advised and it works fine.
A hint in the documentation may hinder others to fall into this pit.
Also it would be nice to have intervals serialized into netCDF since they are quite common structures.

Cheers,
Volker

@fmaussion
Member

> A hint in the documentation may hinder others to fall into this pit.

Agreed. Would you like to submit a pull request?

> Also it would be nice to have intervals serialized into netCDF since they are quite common structures.

There are ways to deal with intervals in the CF conventions, but what we really need is a way for xarray to truly understand intervals, which is a much bigger endeavor.

@rabernat
Contributor

rabernat commented Mar 26, 2019

In the longer term, we could move towards supporting IntervalIndex as an in-memory representation of CF-conventions' cell bounds concept. In addition to many other benefits, this would allow us to encode and decode such indices from netCDF files.

xref #1475
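To make the cell-bounds idea concrete, here is a rough user-side sketch of such a round trip, not an existing xarray API. In the CF conventions a coordinate variable names its boundary variable through a `bounds` attribute, and the boundary variable has an extra trailing dimension of size 2 holding each cell's edges. The helper names `encode_interval_coord` and `decode_interval_coord`, the `{dim}_bounds` variable name and the `bnds` dimension are hypothetical. CF bounds do not say which end of a cell is closed, so the decoder has to be told; it assumes right-closed intervals, matching the `groupby_bins` default.

```python
import numpy as np
import pandas as pd
import xarray as xr


def encode_interval_coord(ds, dim, bounds_dim="bnds"):
    """Hypothetical helper: replace an interval-valued coordinate with its
    midpoints plus a CF-style (n, 2) bounds variable."""
    intervals = pd.IntervalIndex(ds[dim].values)
    bounds_name = f"{dim}_bounds"
    # Midpoints become the coordinate; the "bounds" attribute links to the edges.
    out = ds.assign_coords({dim: (dim, intervals.mid.values, {"bounds": bounds_name})})
    # Column 0 holds each cell's lower edge, column 1 its upper edge.
    edges = np.stack([intervals.left.values, intervals.right.values], axis=-1)
    out[bounds_name] = ((dim, bounds_dim), edges)
    return out


def decode_interval_coord(ds, dim, closed="right"):
    """Hypothetical helper: rebuild interval labels from the bounds variable.
    The closed side is an assumption because CF bounds do not record it."""
    bounds_name = ds[dim].attrs["bounds"]
    edges = ds[bounds_name].values
    intervals = pd.IntervalIndex.from_arrays(edges[:, 0], edges[:, 1], closed=closed)
    return ds.drop_vars(bounds_name).assign_coords({dim: intervals})


# Usage on a groupby_bins result similar to the one in this issue
# (variable and coordinate names are illustrative).
ds = xr.Dataset(
    {"value": ("x", np.arange(10.0))},
    coords={"height": ("x", np.arange(1, 11))},
)
out = ds.groupby_bins("height", np.arange(5) * 2).mean()
encoded = encode_interval_coord(out, "height_bins")  # only numeric variables
restored = decode_interval_coord(encoded, "height_bins")  # intervals again
```

Keeping midpoints as the coordinate values means the encoded file still has a sensible numeric axis for tools that ignore the `bounds` attribute.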

@rabernat
Contributor

Also xref #2844

StefanBrand added a commit to StefanBrand/masterdatacube that referenced this issue May 29, 2020
IntervalIndex is not allowed as coords. More info:
pydata/xarray#2847

5 participants