Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dataset.dims should return a set, not a dict of sizes #8496

Open
Tracked by #217
TomNicholas opened this issue Nov 30, 2023 · 9 comments
Open
Tracked by #217

Dataset.dims should return a set, not a dict of sizes #8496

TomNicholas opened this issue Nov 30, 2023 · 9 comments

Comments

@TomNicholas
Copy link
Member

TomNicholas commented Nov 30, 2023

What is your issue?

This is inconsistent:

In [25]: ds
Out[25]: 
<xarray.Dataset>
Dimensions:  (x: 1, y: 2)
Dimensions without coordinates: x, y
Data variables:
    a        (x, y) int64 0 1

In [26]: ds['a'].dims
Out[26]: ('x', 'y')

In [27]: ds['a'].sizes
Out[27]: Frozen({'x': 1, 'y': 2})

In [28]: ds.dims
Out[28]: Frozen({'x': 1, 'y': 2})

In [29]: ds.sizes
Out[29]: Frozen({'x': 1, 'y': 2})

Surely ds.dims should return something like a Frozenset({'x', 'y'})? (because dimension order is meaningless when you have multiple arrays underneath - see #8498)

@TomNicholas TomNicholas added the needs triage Issue that has not been reviewed by xarray team member label Nov 30, 2023
@TomNicholas TomNicholas added API design and removed needs triage Issue that has not been reviewed by xarray team member labels Nov 30, 2023
@max-sixty
Copy link
Collaborator

max-sixty commented Nov 30, 2023

I agree — and given there is a clear alternative to ds.dims — ds.sizes, it would be reasonable to put a FutureWarning on it and then change it in a few versions.

(I thought there might be an issue for this but couldn't find one...)

@TomNicholas
Copy link
Member Author

TomNicholas commented Dec 1, 2023

This is actually more than 5 minutes of effort: once I add the FutureWarning, then a huge number of warnings will be raised by internals. Some of those warnings then break tests that assert_no_warnings.

To remove these warnings I have to find every location where .dims was called but a dict was intended (but not places where .dims was called and only the keys used). I can't simply grep for that...

@TomNicholas
Copy link
Member Author

TomNicholas commented Dec 1, 2023

Also relevant previous issues (very old): #921 and #1076

@TomNicholas
Copy link
Member Author

TomNicholas commented Dec 1, 2023 via email

@dcherian
Copy link
Contributor

dcherian commented Dec 1, 2023

Yeah that struck me too. It seems like a custom Mapping object would be the way to do. Raise warnings on __getitem__ but not on __iter__.

@TomNicholas
Copy link
Member Author

Oh that's clever! I did not think of that.

@dcherian
Copy link
Contributor

dcherian commented Dec 1, 2023

then a huge number of warnings will be raised by internals.

One insight I've learned form Stephan on this repo is to treat warnings in the test suite as a metric for the impact on users. This one seems pretty big! =)

@TomNicholas
Copy link
Member Author

This one seems pretty big! =)

I hope that's a good thing 😅

@dcherian I think that ds.groupby.dims should also be changed for consistency too FYI.

@TomNicholas
Copy link
Member Author

Are we ready to complete this deprecation cycle? I ask because it would be nice to release DataTree with DataTree.dims able to be start with the new intended behaviour...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants