xarray.DataArray.expand_dims() can only expand dimension for a point coordinate #2710

Closed
pletchm opened this issue Jan 25, 2019 · 14 comments

@pletchm
Contributor

pletchm commented Jan 25, 2019

Current expand_dims functionality

Apparently, expand_dims can only create a dimension for a point coordinate, i.e. it promotes a scalar coordinate into a 1D coordinate. Here is an example:

>>> coords = {"b": range(5), "c": range(3)}
>>> da = xr.DataArray(np.ones([5, 3]), coords=coords, dims=list(coords.keys()))
>>> da
<xarray.DataArray (b: 5, c: 3)>
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
Coordinates:
  * b        (b) int64 0 1 2 3 4
  * c        (c) int64 0 1 2
>>> da["a"] = 0  # create a point coordinate
>>> da
<xarray.DataArray (b: 5, c: 3)>
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
Coordinates:
  * b        (b) int64 0 1 2 3 4
  * c        (c) int64 0 1 2
    a        int64 0
>>> da.expand_dims("a")  # create a new dimension "a" for the point coordinate
<xarray.DataArray (a: 1, b: 5, c: 3)>
array([[[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]]])
Coordinates:
  * b        (b) int64 0 1 2 3 4
  * c        (c) int64 0 1 2
  * a        (a) int64 0
>>>

Problem description

I want to be able to do 2 more things with expand_dims or maybe a related/similar method:

  1. broadcast the data across 1 or more new dimensions
  2. expand an existing dimension to include 1 or more new coordinates

Here is the code I currently use to accomplish this:

from collections import OrderedDict

import numpy as np
import xarray as xr


def expand_dimensions(data, fill_value=np.nan, **new_coords):
    """Expand (or add if it doesn't yet exist) the data array to fill in new
    coordinates across multiple dimensions.

    If a dimension doesn't exist in the data array yet, then the result will
    be `data` broadcast across this dimension.

    >>> da = xr.DataArray([1, 2, 3], dims="a", coords=[[0, 1, 2]])
    >>> expand_dimensions(da, b=[1, 2, 3, 4, 5])
    <xarray.DataArray (a: 3, b: 5)>
    array([[ 1.,  1.,  1.,  1.,  1.],
           [ 2.,  2.,  2.,  2.,  2.],
           [ 3.,  3.,  3.,  3.,  3.]])
    Coordinates:
      * a        (a) int64 0 1 2
      * b        (b) int64 1 2 3 4 5

    Or, if a keyword in `new_coords` is already a dimension in `data`, then
    any new coordinate values that are not yet in that dimension will be added,
    and the values corresponding to those new coordinates will be `fill_value`.

    >>> da = xr.DataArray([1, 2, 3], dims="a", coords=[[0, 1, 2]])
    >>> expand_dimensions(da, a=[1, 2, 3, 4, 5])
    <xarray.DataArray (a: 6)>
    array([ 1.,  2.,  3., nan, nan, nan])
    Coordinates:
      * a        (a) int64 0 1 2 3 4 5

    Args:
        data (xarray.DataArray):
            Data that needs dimensions expanded.
        fill_value (scalar, xarray.DataArray, optional):
            Value assigned to the entries created for new coordinates.
            Defaults to `np.nan`.
        **new_coords (list[int | str]):
            The keywords are arbitrary dimensions and the values are
            coordinates of those dimensions that the data will include after it
            has been expanded.
    Returns:
        xarray.DataArray:
            Data that had its dimensions expanded to include the new
            coordinates.
    """
    ordered_coord_dict = OrderedDict(new_coords)
    shape_da = xr.DataArray(
        np.zeros(list(map(len, ordered_coord_dict.values()))),
        coords=ordered_coord_dict,
        dims=ordered_coord_dict.keys())
    expanded_data = xr.broadcast(data, shape_da)[0].fillna(fill_value)
    return expanded_data

Here's an example of broadcasting data across a new dimension:

>>> coords = {"b": range(5), "c": range(3)}
>>> da = xr.DataArray(np.ones([5, 3]), coords=coords, dims=list(coords.keys()))
>>> expand_dimensions(da, a=[0, 1, 2])
<xarray.DataArray (b: 5, c: 3, a: 3)>
array([[[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]]])
Coordinates:
  * b        (b) int64 0 1 2 3 4
  * c        (c) int64 0 1 2
  * a        (a) int64 0 1 2

Here's an example of expanding an existing dimension to include new coordinates:

>>> expand_dimensions(da, b=[5, 6])
<xarray.DataArray (b: 7, c: 3)>
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [nan, nan, nan],
       [nan, nan, nan]])
Coordinates:
  * b        (b) int64 0 1 2 3 4 5 6
  * c        (c) int64 0 1 2

Final Note

If no one else is already working on this, and if it seems like a useful addition to xarray, then I would be more than happy to work on this. Please let me know.

Thank you,
Martin

@shoyer
Member

shoyer commented Jan 26, 2019

broadcast the data across 1 or more new dimensions

Yes, this feels in scope for expand_dims(). But I think there are two separate features here:

  1. Support inserting/broadcasting dimensions with size > 1.
  2. Specify the size of the new dimension implicitly, by providing coordinate labels.

I think we would want both to be supported -- you should not be required to supply coordinate labels in order to expand to a dimension of size > 1. We can imagine the first being spelled like da.expand_dims({'a': 3}) or da.expand_dims(a=3).

expand an existing dimension to include 1 or more new coordinates

This feels a little different from expand_dims to me. Here the fundamental operation is alignment/reindexing, not broadcasting across a new dimension. The result also looks different, because you get all the NaN values.

I would probably write this with reindex, e.g.,

In [12]: da.reindex(b=list(da.b.values)+[5, 6])
Out[12]:
<xarray.DataArray (b: 7, c: 3)>
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [nan, nan, nan],
       [nan, nan, nan]])
Coordinates:
  * b        (b) int64 0 1 2 3 4 5 6
  * c        (c) int64 0 1 2
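
As a side note on the reindex approach, here is a minimal sketch of controlling the value placed at the new labels. It assumes an xarray version whose `reindex` accepts a `fill_value` argument, and the input array with a pre-existing NaN is invented for illustration. Unlike a `fillna`-based helper, this should leave NaNs that were already in the data untouched.

```python
import numpy as np
import xarray as xr

# Illustrative data (not from the thread): one value is already missing.
da = xr.DataArray([1.0, np.nan, 3.0], dims="b", coords={"b": [0, 1, 2]})

# Append two labels that do not exist yet along "b".
new_labels = list(da.b.values) + [5, 6]

# Only the positions created for labels 5 and 6 receive the fill value;
# the NaN that was already present at b=1 is kept as-is.
extended = da.reindex(b=new_labels, fill_value=0.0)

print(extended)
```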

@pletchm
Contributor Author

pletchm commented Jan 29, 2019

Hi,
Thanks for replying. I see what you mean about the 2 separate features.

Would it be alright if I opened a PR sometime soon that upgraded expand_dims to support inserting/broadcasting dimensions with size > 1 (the first feature)?

I would use your suggested API, i.e. not requiring explicit coordinate names -- that makes sense. However, I think the dimension kwargs (i.e. the new dimension or dimensions) should accept either implicit or explicit coordinates, in case the user doesn't want 0-based integer coordinates for the new dimension. For example,

da.expand_dims(a=3)

is equivalent to

da.expand_dims(a=[0, 1, 2])

but this will also work

da.expand_dims(a=['w', 'x', 'y', 'z'])

where da is

>>> coords = {"b": range(5), "c": range(3)}
>>> da = xr.DataArray(np.ones([5, 3]), coords=coords, dims=list(coords.keys()))
>>> da
<xarray.DataArray (b: 5, c: 3)>
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
Coordinates:
  * b        (b) int64 0 1 2 3 4
  * c        (c) int64 0 1 2

Does that make sense?

Thank you!
Martin

@dcherian
Contributor

da.expand_dims(a=3) should not be equivalent to da.expand_dims(a=[0, 1, 2]) because the latter will also create a co-ordinate a. Am I understanding this right?

@pletchm
Contributor Author

pletchm commented Jan 29, 2019

Those would be equivalent, I think, assuming they both operate on the same da object. I meant them as separate calls, not sequential ones; but even if they were sequential, expand_dims doesn't and wouldn't alter da. It returns a new xarray object instead. I edited my above post to clarify what da is.

@dcherian
Contributor

Well then I think they should be different.

Currently, da.expand_dims('a') gives

<xarray.DataArray (a: 1, b: 5, c: 3)>
array([[[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]]])
Coordinates:
  * b        (b) int64 0 1 2 3 4
  * c        (c) int64 0 1 2
Dimensions without coordinates: a

da.expand_dims(a=3) should give

<xarray.DataArray (a: 3, b: 5, c: 3)>
...
Coordinates:
  * b        (b) int64 0 1 2 3 4
  * c        (c) int64 0 1 2
Dimensions without coordinates: a

da.expand_dims(a=[9, 10, 11]) should give

<xarray.DataArray (a: 3, b: 5, c: 3)>
...
Coordinates:
  * a        (a) int64 9 10 11
  * b        (b) int64 0 1 2 3 4
  * c        (c) int64 0 1 2

i.e. in this last case, the user has specified co-ordinate labels and so the returned DataArray has a new co-ordinate a.
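
The three cases above can be checked with a short script. This is a sketch that assumes an xarray release in which the proposal discussed here has been implemented (the thread later reports that the PR was merged); the variable names are illustrative.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.ones((5, 3)),
    coords={"b": np.arange(5), "c": np.arange(3)},
    dims=["b", "c"],
)

# An integer gives only the size: "a" becomes a dimension without a coordinate.
by_size = da.expand_dims(a=3)

# The same request spelled with a mapping instead of a keyword argument.
by_size_dict = da.expand_dims({"a": 3})

# A sequence gives labels: "a" becomes a dimension with a coordinate.
by_labels = da.expand_dims(a=[9, 10, 11])

print(by_size.dims, "a" in by_size.coords)
print(by_labels.dims, by_labels["a"].values)
```

Keeping the integer form free of coordinates matches how `da.expand_dims('a')` already behaves for a dimension of size 1.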

@pletchm
Contributor Author

pletchm commented Jan 29, 2019

Oh I see what you're saying. Yeah, that makes sense.

To get the equivalent of da.expand_dims(a=[9, 10, 11]), you'd do

>>> new = da.expand_dims(a=3)
>>> new
<xarray.DataArray (a: 3, b: 5, c: 3)>
...
Coordinates:
  * b        (b) int64 0 1 2 3 4
  * c        (c) int64 0 1 2
Dimensions without coordinates: a
>>> new["a"] = [9, 10, 11]

@shoyer
Member

shoyer commented Jan 29, 2019

Would it be alright if I opened a PR sometime soon that upgraded expand_dims to support inserting/broadcasting dimensions with size > 1 (the first feature)?

Yes, that sounds welcome to me!

I think much of the underlying logic should already exist on the Variable.set_dims() method. See also the either_dict_or_kwargs utility in xarray.core.utils.
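
To illustrate the idea of accepting either a dict or keyword arguments, here is a small pure-Python sketch. It is not xarray's `either_dict_or_kwargs`; the helper `dict_or_kwargs` and the function `expand_spec` are hypothetical names written only to show the pattern.

```python
from collections.abc import Mapping


def dict_or_kwargs(positional, keyword, func_name):
    """Hypothetical helper: return one dict from a positional mapping or kwargs."""
    if positional is None:
        return dict(keyword)
    if not isinstance(positional, Mapping):
        raise TypeError(f"the first argument to {func_name} must be a mapping")
    if keyword:
        raise ValueError(
            f"cannot pass both a mapping and keyword arguments to {func_name}"
        )
    return dict(positional)


def expand_spec(dim=None, **dim_kwargs):
    """Hypothetical front end: normalise both call styles to {name: size_or_labels}."""
    return dict_or_kwargs(dim, dim_kwargs, "expand_spec")


print(expand_spec({"a": 3}))
print(expand_spec(a=3))
```

Rejecting a call that mixes both styles keeps the result unambiguous, at the cost of a slightly stricter interface.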

@barkls

barkls commented Apr 22, 2019

Unfortunately this most recent change has broken my workflow. I was using expand_dims to add a named dimension back onto a DataArray, when the dimension had been previously removed with the sel method. I realize this may not be the best way of doing things, but I wanted to point out that there is a loss of functionality here.

import xarray as xr
da = xr.DataArray([0,1,2], dims=['dim1'], coords={'dim1':['a','b','c']})
print(da.dims) # returns ('dim1',)
da = da.sel({'dim1':'a'})
print(da.dims) # returns ()
da = da.expand_dims(da.coords) # fails in 0.12.1
print(da.dims) # returns ('dim1',) in 0.12.0

@shoyer
Member

shoyer commented Apr 22, 2019

@barkls I think da.expand_dims(list(da.coords)) should work for this use-case.

Previously, we only used the argument to expand_dims() as a sequence, but now we distinguish between mappings and other iterables.

I don't know what the best resolution would be here, but this seems to be a hazard of duck-typing. I did not anticipate that some users would already be iterating over mappings like .coords.
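
The hazard can be reproduced without xarray. The sketch below uses a hypothetical `normalize_dims` function (not xarray's code) and a plain dict standing in for `.coords`. It shows how the same dict-like argument yields different results depending on whether it is iterated as a sequence of names or interpreted as a mapping.

```python
from collections.abc import Mapping


def normalize_dims(dim):
    """Hypothetical dispatcher: return {dimension name: size or labels}."""
    if isinstance(dim, Mapping):
        # Mapping: values are taken as sizes or coordinate labels.
        return dict(dim)
    if isinstance(dim, str):
        # A single name means one new dimension of size 1.
        return {dim: 1}
    # Any other iterable is a collection of names, each of size 1.
    return {name: 1 for name in dim}


# Stand-in for a coords-like mapping after selecting a single label.
coords_like = {"dim1": "a"}

# Iterating a mapping yields its keys, so the old sequence-only
# interpretation saw just the dimension names.
old_style = {name: 1 for name in coords_like}

# With mapping dispatch, the values are now part of the request.
new_style = normalize_dims(coords_like)

# Converting to a list first restores the name-only interpretation.
workaround = normalize_dims(list(coords_like))

print(old_style, new_style, workaround)
```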

@dcherian
Contributor

dcherian commented Apr 22, 2019

Another solution could be adding support for da.sel(dim1='a', squeeze=False) to avoid losing the dim1 dimension/coordinate in the first place

@pletchm
Contributor Author

pletchm commented Apr 22, 2019

Another solution could be adding support for da.sel(dim1='a', squeeze=False) to avoid losing the dim1 dimension/coordinate in the first place

Or equivalently, you could just do

da.sel(dim1=['a'])
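
A short sketch comparing the options mentioned in this exchange, using the array from the report above. The variable names are illustrative, and the `expand_dims(list(...))` line follows the workaround suggested earlier in the thread.

```python
import xarray as xr

da = xr.DataArray([0, 1, 2], dims=["dim1"], coords={"dim1": ["a", "b", "c"]})

# Selecting with a scalar label drops the dimension.
scalar_sel = da.sel(dim1="a")

# Selecting with a one-element list keeps it as a dimension of size 1.
list_sel = da.sel(dim1=["a"])

# Alternatively, promote the leftover scalar coordinate back to a dimension.
restored = scalar_sel.expand_dims(list(scalar_sel.coords))

print(scalar_sel.dims, list_sel.dims, restored.dims)
```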

@barkls

barkls commented Apr 22, 2019

@pletchm that is the solution I found as well. Thanks all for the suggestions!

@TomNicholas
Contributor

@pletchm was this issue closed by #2757?

@pletchm
Contributor Author

pletchm commented Feb 20, 2020

Yes, @TomNicholas. My PR got merged but I forgot to close the issue -- closing it now. Thanks for checking.

@pletchm pletchm closed this as completed Feb 20, 2020

5 participants