Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong list of coordinate when a singleton coordinate exists #6196

Open
lanougue opened this issue Jan 26, 2022 · 5 comments
Open

Wrong list of coordinate when a singleton coordinate exists #6196

lanougue opened this issue Jan 26, 2022 · 5 comments

Comments

@lanougue
Copy link

lanougue commented Jan 26, 2022

What happened?

Here is some simple code:

a = xr.DataArray(np.arange(5), dims='x', coords={'x':np.arange(5)})
a = a.assign_coords({'y':1})

Now calling a['x'] or a['x'].coords shows y as a coordinate of x, which is unexpected for me

What did you expect to happen?

I expect that a singleton coordinate of a dataset not to be a coordinate of other coordinates present in the dataset

Minimal Complete Verifiable Example

import xarray as xr
import numpy as np
a = xr.DataArray(np.arange(5), dims='x', coords={'x':np.arange(5)})
a = a.assign_coords({'y':1})
print(a['x'].coords)

returns

Coordinates:
  * x        (x) int64 0 1 2 3 4
    y        int64 1

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:57:06)
[GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 3.12.53-60.30-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 0.19.0
pandas: 1.3.5
numpy: 1.20.3
scipy: 1.6.3
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.10.0
distributed: 2021.10.0
matplotlib: 3.2.2
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 60.5.0
pip: 21.3.1
conda: None
pytest: None
IPython: 7.31.0
sphinx: None

@lanougue lanougue added bug needs triage Issue that has not been reviewed by xarray team member labels Jan 26, 2022
@lanougue lanougue changed the title [Bug]: xr.coords returns wrong list of coords when a singleton coords exists [Bug]: Wrong list of coordinate when a singleton coordinate exists Jan 26, 2022
@TomNicholas
Copy link
Contributor

TomNicholas commented Jan 27, 2022

Thanks for raising this suggestion @lanougue .

I expect that a singleton coordinate of a dataset not to be a coordinate of other coordinates present in the dataset

What do you think it should return? When selecting a coordinate there seem to be only 3 options to me:

  1. Coordinate variables retain all coordinates from the parent object.

This is what it currently does. It's a bit counter-intuitive, but it prioritises keeping coordinates available for use by users. It also means you can pull out a coordinate variable and then use other coordinates to index into it.

  1. Coordinate variables retain no coordinates at all.

We could drop all of them, which makes some sense (why should a coordinate have coordinates?). We can try that out naively by adding a drop_vars call to DataArray._getitem_coord:

def _getitem_coord(self, key):
    from .dataset import _get_virtual_variable

    try:
        var = self._coords[key]
        coord_da = self._replace_maybe_drop_dims(var, name=key)
        return coord_da.drop_vars(coord for coord in coord_da.coords)

which makes your example return

Coordinates:
    *empty*

but this breaks a fair number of tests - 25 failed just in test_dataarray.py (although mostly parametrizations of one test):

FAILED xarray/tests/test_dataarray.py::TestDataArray::test_isel_fancy - AssertionError: Left and right DataArray objects are not identical
FAILED xarray/tests/test_dataarray.py::TestDataArray::test_sel_dataarray - ValueError: One or more of the specified variables cannot be found in this dataset
FAILED xarray/tests/test_dataarray.py::TestDataArray::test_coord_coords - AssertionError: Left and right DataArray objects are not identical
FAILED xarray/tests/test_dataarray.py::TestDataArray::test_coords_non_string - AssertionError: Left and right DataArray objects are not identical
FAILED xarray/tests/test_dataarray.py::TestDataArray::test_coordinate_diff - AssertionError: Left and right DataArray objects are not equal
FAILED xarray/tests/test_dataarray.py::TestReduce1D::test_idxmin[False-x0-5-2-None] - KeyError: "'x' is not a coordinate variable."
FAILED xarray/tests/test_dataarray.py::TestReduce1D::test_idxmin[False-x1-5-2-None] - KeyError: "'x' is not a coordinate variable."
FAILED xarray/tests/test_dataarray.py::TestReduce1D::test_idxmin[False-x2-5-2-1] - KeyError: "'x' is not a coordinate variable."
FAILED xarray/tests/test_dataarray.py::TestReduce1D::test_idxmin[False-x3-5-2-1] - KeyError: "'x' is not a coordinate variable."
FAILED xarray/tests/test_dataarray.py::TestReduce1D::test_idxmin[False-x4-nan-nan-0] - KeyError: "'x' is not a coordinate variable."
FAILED xarray/tests/test_dataarray.py::TestReduce1D::test_idxmin[False-x5-0-1-None] - KeyError: "'x' is not a coordinate variable."
FAILED xarray/tests/test_dataarray.py::TestReduce1D::test_idxmax[False-x0-5-2-None] - KeyError: "'x' is not a coordinate variable."
FAILED xarray/tests/test_dataarray.py::TestReduce1D::test_idxmax[False-x1-5-2-None] - KeyError: "'x' is not a coordinate variable."
FAILED xarray/tests/test_dataarray.py::TestReduce1D::test_idxmax[False-x2-5-2-1] - KeyError: "'x' is not a coordinate variable."
FAILED xarray/tests/test_dataarray.py::TestReduce1D::test_idxmax[False-x3-5-2-1] - KeyError: "'x' is not a coordinate variable."
FAILED xarray/tests/test_dataarray.py::TestReduce1D::test_idxmax[False-x4-nan-nan-0] - KeyError: "'x' is not a coordinate variable."
FAILED xarray/tests/test_dataarray.py::TestReduce1D::test_idxmax[False-x5-0-1-None] - KeyError: "'x' is not a coordinate variable."
FAILED xarray/tests/test_dataarray.py::TestReduce2D::test_idxmin[False-x0-minindex0-maxindex0-nanindex0] - KeyError: "'x' is not a coordinate variable."
FAILED xarray/tests/test_dataarray.py::TestReduce2D::test_idxmin[False-x1-minindex1-maxindex1-nanindex1] - KeyError: "'x' is not a coordinate variable."
FAILED xarray/tests/test_dataarray.py::TestReduce2D::test_idxmin[False-x2-minindex2-maxindex2-nanindex2] - KeyError: "'x' is not a coordinate variable."
FAILED xarray/tests/test_dataarray.py::TestReduce2D::test_idxmin[False-x3-minindex3-maxindex3-nanindex3] - KeyError: "'x' is not a coordinate variable."
FAILED xarray/tests/test_dataarray.py::TestReduce2D::test_idxmax[False-x0-minindex0-maxindex0-nanindex0] - KeyError: "'x' is not a coordinate variable."
FAILED xarray/tests/test_dataarray.py::TestReduce2D::test_idxmax[False-x1-minindex1-maxindex1-nanindex1] - KeyError: "'x' is not a coordinate variable."
FAILED xarray/tests/test_dataarray.py::TestReduce2D::test_idxmax[False-x2-minindex2-maxindex2-nanindex2] - KeyError: "'x' is not a coordinate variable."
FAILED xarray/tests/test_dataarray.py::TestReduce2D::test_idxmax[False-x3-minindex3-maxindex3-nanindex3] - KeyError: "'x' is not a coordinate variable."
  1. *Coordinate variables retain only coordinates with which they share some dimensions

You might say "well in my example 'y' doesn't depend on dimension 'x', so why is it still there? We should keep the 'x' coord and drop the 'y' one." But I don't really think this option makes sense if you think of all coordinates as labels: the 'y' coord is a label for all values in the DataArray, that just happens to be independent of 'x'. Why drop one and not the other?


I agree this is a little counterintuitive, but it is actually causing you problems? Because I think there is a rationale for the current behaviour, and changing it would be a breaking change. In other words, it's not obvious to me that this is a bug rather than an implicit feature.

(One might also argue that if xr.Variable were first-class public API (as in our roadmap), then da.coords[var] should return a Variable, which has no coordinates. At the moment da._coords[key] returns either a Variable or IndexVariable, which then gets wrapped back up into a new DataArray.)

Interested to hear what you think though.

@TomNicholas TomNicholas added API design and removed needs triage Issue that has not been reviewed by xarray team member labels Jan 27, 2022
@lanougue
Copy link
Author

lanougue commented Feb 1, 2022

Thanks for the enlightening.

Actually, this coordinates dependency with singleton dimension caused me a problem when using the to_netcdf() function. No problem playing whith the xr.Dataset but I get some error when trying to write on disk using to_netcdf().
For now, I wasn't able to reproduce a minimalist example because the error disappears with minimalist example. I wasn't able to find the fundamental difference between the dataset causing the error and the minimalist one. Printing them are exactly the same. I have to do deeper inspection.

Concerning the philosophy of what a coordinate should be: For me the "label" idea is understandable at a dataset level. A singleton dimension become a (shared) "label' for the whole dataset. This is ok for me. However, I do not understand why it should also be a "label" of the other coordinates of the dataset. A singleton dimension should not be "more important" than the other (not singleton) dimensions. Why the singleton dimension should become a "label" of another dimension while the other dimensions are not. This do not seem logical to me.

@TomNicholas
Copy link
Contributor

For now, I wasn't able to reproduce a minimalist example because the error disappears with minimalist example. I wasn't able to find the fundamental difference between the dataset causing the error and the minimalist one. Printing them are exactly the same.

Can you perhaps use ncdump to show the differences?

However, I do not understand why it should also be a "label" of the other coordinates of the dataset. A singleton dimension should not be "more important" than the other (not singleton) dimensions. Why the singleton dimension should become a "label" of another dimension while the other dimensions are not. This do not seem logical to me.

I'm not quite sure I understand what you mean - currently aren't all coordinates reattached to any coordinate extracted from a DataArray? (so long as they have compatible dimensions)

@lanougue
Copy link
Author

lanougue commented Mar 1, 2023

Hello @TomNicholas ,

Reopening this issue 1 year later ! To answer your last question, singleton dimension seems to have, indeed, a unique behavior since they are reattached systematically to other coordinates (even if they naturally do not share any dimension with other coordinates).
These singleton dimensions introduce some strange behavior. This is another example:

a = xr.DataArray(np.random.rand(2,3,2), dims=('x','y','z'), coords={'x':[1,2], 'y':[3,4,5],'z':['0','1']})
b = xr.DataArray(np.random.rand(2,3,2), dims=('x','y','z'), coords={'x':[1,2], 'y':[3,4,5],'z':['0','1']})
res1 = a.sel(z='0')/b
res2 = a.sel(z='0').expand_dims('z')/b

res1 and res2 do not have the same size on dimension "z". In res1, dimension "z" is not considered anymore as a dimension at all !

@dcherian
Copy link
Contributor

dcherian commented Mar 1, 2023

Yes when you index out a single location z becomes a scalar (similar with numpy). if you want to keep the dimension use .sel(z=['0']).

I expect that a singleton coordinate of a dataset not to be a coordinate of other coordinates present in the dataset

Xarray only checks if a coordinate variable's dimensions are a subset of the extracted DataArray dimensions. for a scalar, .dims is an empty set, and an empty set is a subset of every set.

@dcherian dcherian changed the title [Bug]: Wrong list of coordinate when a singleton coordinate exists Wrong list of coordinate when a singleton coordinate exists Mar 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants