Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow some notion of ordering in Dataset dims #8498

Closed
max-sixty opened this issue Nov 30, 2023 · 5 comments
Closed

Allow some notion of ordering in Dataset dims #8498

max-sixty opened this issue Nov 30, 2023 · 5 comments
Labels
API design plan to close May be closeable, needs more eyeballs

Comments

@max-sixty
Copy link
Collaborator

max-sixty commented Nov 30, 2023

What is your issue?

Currently a DataArray's dims are ordered, while a Dataset's are not.

Do we gain anything from have unordered dims in a Dataset? Could we have an ordering without enforcing it on every variable?

Here's one proposal, with fairly wide error-bars:

  • Datasets have a dim order, which is set at construction time or through .transpose
    • Currently .transpose changes the order of each variable's dims, but not the dataset's
    • If dims aren't supplied, we can just use the first variable's
  • Variables don't have to conform to that order — .assign(foo=differently_ordered) maintains the differently ordered dims. So this doesn't limit any current functionality.
  • When there are transformations which change dim ordering, Xarray is "allowed" to transpose variables to the dataset's ordering. Currently Xarray is "allowed" to change dim order arbitrarily — for example to put a core dim last. IIUC, we'd prefer to set a non-arbitrary order, but we don't have one to reference.
    • This would remove a bunch of boilerplate from methods that save the ordering, run .apply_ufunc and then reorder in the original order1

What do folks think?

Footnotes

  1. though also we could do this in .apply_ufunc

@max-sixty max-sixty added needs triage Issue that has not been reviewed by xarray team member API design and removed needs triage Issue that has not been reviewed by xarray team member labels Nov 30, 2023
@TomNicholas
Copy link
Member

My understanding of why we don't have ordered dims on Dataset is that you can't guarantee consistency with all the ordered dims of stored variables. e.g. if 3 variables a, b and c have ordered dims

"a": ('x', 'y')
"b": ('y', 'z')
"c": ('z', 'x')

then how can you decide what the order of the dimensions ('x', 'y', 'z') on the containing dataset should be?

You could make a choice, but it will always be arbitrary, and either inconsistent with one of the variables or require changing the order of one of the variables. If that choice lives "above" the dataarrays, then it's sort of weird silent extra information.

I also think we might want to think of this in terms of preserving some properties, e.g.

ds === xr.merge(var.transpose(...) for var in ds.variables)

I'm not sure if that is addressed by your proposal but it's my mental model of why we don't do this.

Do we gain anything from have unordered dims in a Dataset?

Do we gain anything by having ordered dims?

@max-sixty
Copy link
Collaborator Author

Do we gain anything by having ordered dims?

Good question!

The main thing thing is having some canonical ordering:

When there are transformations which change dim ordering, Xarray is "allowed" to transpose variables to the dataset's ordering. Currently Xarray is "allowed" to change dim order arbitrarily — for example to put a core dim last. IIUC, we'd prefer to set a non-arbitrary order, but we don't have one to reference.

For example:

DataArray.dims & Dataset.dims have different types — and it wouldn't be possible to unify them without a change like this

Some methods that need a dim_order kwarg, because this can't be encoded in a dataset:

We have this sort of boilerplate littered around, because .apply_ufunc doesn't want to enforce dim order


You could make a choice, but it will always be arbitrary, and either inconsistent with one of the variables or require changing the order of one of the variables. If that choice lives "above" the dataarrays, then it's sort of weird silent extra information.

I agree — "sort of weird silent extra information." is exactly right. Though at the moment, arguably we have this extra information — <xarray.Dataset> Dimensions: (lat: 25, time: 2920, lon: 53) — it's just not possible to change. The proposal is basically: "allow people to change this, and using this order if we need one"...

@TomNicholas
Copy link
Member

Some methods that need a dim_order kwarg, because this can't be encoded in a dataset:

We have this sort of boilerplate littered around, because .apply_ufunc doesn't want to enforce dim order

I don't really think this is boilerplate, I think it's the user entering new information (or an arbitrary choice being made for them) at the exact moment it's needed. The examples you've shown here are basically the only times we have to step outside the xarray.Dataset model of unordered dims: (1) converting to an ordered type (dataframes) and (2) losing distinction between dimensions via name because we're dealing with a type with unnamed dims (numpy arrays).

I agree — "sort of weird silent extra information." is exactly right. Though at the moment, arguably we have this extra information — <xarray.Dataset> Dimensions: (lat: 25, time: 2920, lon: 53) — it's just not possible to change. The proposal is basically: "allow people to change this, and using this order if we need one"...

This is clearer to me, thanks. I'm not sure I agree that we already have this extra information though - Dataset._dims stores a dict, which though technically ordered since python 3.7, certainly isn't intended there to represent an ordered Mapping. (Or are you saying that this information is stored in some other way? By being deterministically generatable from the stored variables?)

@max-sixty
Copy link
Collaborator Author

Yeah — it sounds like your perspective is:

  • this is really not that useful, the arguments in favor are somewhat tertiary
  • It would introduce this "sort of weird silent extra information", which is a legible downside

I think that's quite reasonable. I had weighed them differently, but I agree with the map of arguments.

Unless someone wants to arrive like Joan of Arc in favor of ordering, we can close...

@max-sixty max-sixty added the plan to close May be closeable, needs more eyeballs label Dec 5, 2023
@max-sixty
Copy link
Collaborator Author

Closing

(but not forgetting! One day... 😁 )

@max-sixty max-sixty closed this as not planned Won't fix, can't repro, duplicate, stale Dec 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API design plan to close May be closeable, needs more eyeballs
Projects
None yet
Development

No branches or pull requests

2 participants