xarray.DataArray.expand_dims() can only expand dimension for a point coordinate #2710

pletchm · 2019-01-25T20:46:05Z

Current `expand_dims` functionality

Apparently, expand_dims can only create a dimension for a point coordinate, i.e. it promotes a scalar coordinate into a 1D coordinate. Here is an example:

>>> coords = {"b": range(5), "c": range(3)}
>>> da = xr.DataArray(np.ones([5, 3]), coords=coords, dims=list(coords.keys()))
>>> da
<xarray.DataArray (b: 5, c: 3)>
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
Coordinates:
  * b        (b) int64 0 1 2 3 4
  * c        (c) int64 0 1 2
>>> da["a"] = 0  # create a point coordinate
>>> da
<xarray.DataArray (b: 5, c: 3)>
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
Coordinates:
  * b        (b) int64 0 1 2 3 4
  * c        (c) int64 0 1 2
    a        int64 0
>>> da.expand_dims("a")  # create a new dimension "a" for the point coordinate
<xarray.DataArray (a: 1, b: 5, c: 3)>
array([[[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]]])
Coordinates:
  * b        (b) int64 0 1 2 3 4
  * c        (c) int64 0 1 2
  * a        (a) int64 0
>>>

Problem description

I want to be able to do 2 more things with expand_dims or maybe a related/similar method:

1. broadcast the data across 1 or more new dimensions
2. expand an existing dimension to include 1 or more new coordinates

Here is the code I currently use to accomplish this

from collections import OrderedDict

import numpy as np
import xarray as xr


def expand_dimensions(data, fill_value=np.nan, **new_coords):
    """Expand (or add if it doesn't yet exist) the data array to fill in new
    coordinates across multiple dimensions.

    If a dimension doesn't exist in the data array yet, then the result will be
    `data`, broadcast across this dimension.

    >>> da = xr.DataArray([1, 2, 3], dims="a", coords=[[0, 1, 2]])
    >>> expand_dimensions(da, b=[1, 2, 3, 4, 5])
    <xarray.DataArray (a: 3, b: 5)>
    array([[ 1.,  1.,  1.,  1.,  1.],
           [ 2.,  2.,  2.,  2.,  2.],
           [ 3.,  3.,  3.,  3.,  3.]])
    Coordinates:
      * a        (a) int64 0 1 2
      * b        (b) int64 1 2 3 4 5

    Or, if `dim` is already a dimension in `data`, then any new coordinate
    values in `new_coords` that are not yet in `data[dim]` will be added,
    and the values corresponding to those new coordinates will be `fill_value`.

    >>> da = xr.DataArray([1, 2, 3], dims="a", coords=[[0, 1, 2]])
    >>> expand_dimensions(da, fill_value=0, a=[1, 2, 3, 4, 5])
    <xarray.DataArray (a: 6)>
    array([ 1.,  2.,  3.,  0.,  0.,  0.])
    Coordinates:
      * a        (a) int64 0 1 2 3 4 5

    Args:
        data (xarray.DataArray):
            Data that needs dimensions expanded.
        fill_value (scalar, xarray.DataArray, optional):
            If expanding new coords this is the value of the new datum.
            Defaults to `np.nan`.
        **new_coords (list[int | str]):
            The keywords are arbitrary dimensions and the values are
            coordinates of those dimensions that the data will include after it
            has been expanded.
    Returns:
        xarray.DataArray:
            Data that had its dimensions expanded to include the new
            coordinates.
    """
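    ordered_coord_dict = OrderedDict(new_coords)
    # Template array whose only job is to carry the requested coordinates.
    shape_da = xr.DataArray(
        np.zeros(list(map(len, ordered_coord_dict.values()))),
        coords=ordered_coord_dict,
        dims=list(ordered_coord_dict))
    # xr.broadcast outer-joins the indexes, leaving NaN at the new labels;
    # fillna then replaces every NaN (including any already in `data`).
    expanded_data = xr.broadcast(data, shape_da)[0].fillna(fill_value)
    return expanded_data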

Here's an example of broadcasting data across a new dimension:

>>> coords = {"b": range(5), "c": range(3)}
>>> da = xr.DataArray(np.ones([5, 3]), coords=coords, dims=list(coords.keys()))
>>> expand_dimensions(da, a=[0, 1, 2])
<xarray.DataArray (b: 5, c: 3, a: 3)>
array([[[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]]])
Coordinates:
  * b        (b) int64 0 1 2 3 4
  * c        (c) int64 0 1 2
  * a        (a) int64 0 1 2

Here's an example of expanding an existing dimension to include new coordinates:

>>> expand_dimensions(da, b=[5, 6])
<xarray.DataArray (b: 7, c: 3)>
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [nan, nan, nan],
       [nan, nan, nan]])
Coordinates:
  * b        (b) int64 0 1 2 3 4 5 6
  * c        (c) int64 0 1 2

Final Note

If no one else is already working on this, and if it seems like a useful addition to xarray, then I would be more than happy to work on it. Please let me know.

Thank you,
Martin

shoyer · 2019-01-26T03:55:39Z

> broadcast the data across 1 or more new dimensions

Yes, this feels in scope for expand_dims(). But I think there are two separate features here:

1. Support inserting/broadcasting dimensions with size > 1.
2. Specify the size of the new dimension implicitly, by providing coordinate labels.

I think we would want both to be supported -- you should not be required to supply coordinate labels in order to expand to a dimension of size > 1. We can imagine the first being spelled like da.expand_dims({'a': 3}) or da.expand_dims(a=3).
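A minimal sketch of the two spellings suggested here, assuming an xarray release that already includes the change later merged from #2757. The array mirrors the `da` from the original post; the names `by_dict` and `by_kwarg` are only illustrative.

```python
import numpy as np
import xarray as xr

coords = {"b": range(5), "c": range(3)}
da = xr.DataArray(np.ones([5, 3]), coords=coords, dims=list(coords))

# Dict spelling and keyword spelling of the same request:
# a new dimension "a" of size 3, with no coordinate labels attached.
by_dict = da.expand_dims({"a": 3})
by_kwarg = da.expand_dims(a=3)

print(by_dict.dims, by_dict.shape)
print(by_dict.identical(by_kwarg))
```

The dict form is handy when the dimension name is not a valid Python identifier or is held in a variable.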

> expand an existing dimension to include 1 or more new coordinates

This feels a little different from expand_dims to me. Here the fundamental operation is alignment/reindexing, not broadcasting across a new dimension. The result also looks different, because you get all the NaN values.

I would probably write this with reindex, e.g.,

In [12]: da.reindex(b=list(da.b.values)+[5, 6])
Out[12]:
<xarray.DataArray (b: 7, c: 3)>
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [nan, nan, nan],
       [nan, nan, nan]])
Coordinates:
  * b        (b) int64 0 1 2 3 4 5 6
  * c        (c) int64 0 1 2
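As a hedged variation on the reindex call above: taking the union of the existing and requested labels avoids duplicates when some labels overlap, and `fill_value` replaces the default NaN. This assumes an xarray version whose `reindex()` accepts `fill_value`; the names `new_b`, `with_nan` and `with_zero` are illustrative.

```python
import numpy as np
import xarray as xr

coords = {"b": range(5), "c": range(3)}
da = xr.DataArray(np.ones([5, 3]), coords=coords, dims=list(coords))

# Union of the existing labels and the requested ones, so nothing is
# dropped and overlapping labels are not duplicated.
new_b = da.indexes["b"].union([5, 6])

# Default behaviour: labels that were not present are filled with NaN.
with_nan = da.reindex(b=new_b)

# Assumption: this xarray version supports `fill_value` in reindex().
with_zero = da.reindex(b=new_b, fill_value=0.0)

print(with_nan.sizes["b"], with_zero.sizes["b"])
```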

pletchm · 2019-01-29T16:32:36Z

Hi,
Thanks for replying. I see what you mean about the 2 separate features.

Would it be alright if I opened a PR sometime soon that upgraded expand_dims to support inserting/broadcasting dimensions with size > 1 (the first feature)?

I would use your suggested API, i.e. not requiring explicit coordinate names -- that makes sense. However, it feels like the dimension kwargs (i.e. the new dimension/dimensions), should be allowed to be given implicit or explicit coordinates, in case the user doesn't want 0-based integer coordinates for the new dimension. For example,

da.expand_dims(a=3)

is equivalent to

da.expand_dims(a=[0, 1, 2])

but this will also work

da.expand_dims(a=['w', 'x', 'y', 'z'])

where da is

>>> coords = {"b": range(5), "c": range(3)}
>>> da = xr.DataArray(np.ones([5, 3]), coords=coords, dims=list(coords.keys()))
>>> da
<xarray.DataArray (b: 5, c: 3)>
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
Coordinates:
  * b        (b) int64 0 1 2 3 4
  * c        (c) int64 0 1 2

Does that make sense?

Thank you!
Martin

dcherian · 2019-01-29T17:25:26Z

da.expand_dims(a=3) should not be equivalent to da.expand_dims(a=[0, 1, 2]) because the latter will also create a co-ordinate a. Am I understanding this right?

pletchm · 2019-01-29T17:49:55Z

Those would be equivalent, I think, assuming they're both manipulating the same da object. I meant for them to be separate calls, not sequential; but even if they were sequential, expand_dims doesn't and wouldn't alter da. It returns a new xarray object instead. I edited my above post to clarify what da is.

dcherian · 2019-01-29T21:42:08Z

Well then I think they should be different.

Currently, da.expand_dims('a') gives

<xarray.DataArray (a: 1, b: 5, c: 3)>
array([[[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]]])
Coordinates:
  * b        (b) int64 0 1 2 3 4
  * c        (c) int64 0 1 2
Dimensions without coordinates: a

da.expand_dims(a=3) should give

<xarray.DataArray (a: 3, b: 5, c: 3)>
...
Coordinates:
  * b        (b) int64 0 1 2 3 4
  * c        (c) int64 0 1 2
Dimensions without coordinates: a

da.expand_dims(a=[9, 10, 11]) should give

<xarray.DataArray (a: 3, b: 5, c: 3)>
...
Coordinates:
  * a        (a) int64 9 10 11
  * b        (b) int64 0 1 2 3 4
  * c        (c) int64 0 1 2

i.e. in this last case, the user has specified co-ordinate labels and so the returned DataArray has a new co-ordinate a.
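The three cases can be checked side by side. This is a sketch assuming an xarray release in which the behaviour described in this comment has been implemented; the variable names are illustrative.

```python
import numpy as np
import xarray as xr

coords = {"b": range(5), "c": range(3)}
da = xr.DataArray(np.ones([5, 3]), coords=coords, dims=list(coords))

length_one = da.expand_dims("a")          # name only: size-1 dim, no coordinate
sized = da.expand_dims(a=3)               # integer: size-3 dim, no coordinate
labelled = da.expand_dims(a=[9, 10, 11])  # labels: size-3 dim plus coordinate "a"

# Report the size of "a" and whether a coordinate named "a" exists.
for kind, result in [("name", length_one), ("int", sized), ("labels", labelled)]:
    print(kind, result.sizes["a"], "a" in result.coords)
```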

pletchm · 2019-01-29T22:19:58Z

Oh I see what you're saying. Yeah, that makes sense.

To get the equivalent of da.expand_dims(a=[9, 10, 11]), you'd do

>>> new = da.expand_dims(a=3)
>>> new
<xarray.DataArray (a: 3, b: 5, c: 3)>
...
Coordinates:
  * b        (b) int64 0 1 2 3 4
  * c        (c) int64 0 1 2
Dimensions without coordinates: a
>>> new["a"] = [9, 10, 11]
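A small check that the two routes agree, again assuming a release that includes the size > 1 support. `direct` is an illustrative name.

```python
import numpy as np
import xarray as xr

coords = {"b": range(5), "c": range(3)}
da = xr.DataArray(np.ones([5, 3]), coords=coords, dims=list(coords))

# Route 1: size-only expansion, then attach labels to the new dimension.
new = da.expand_dims(a=3)
new["a"] = [9, 10, 11]

# Route 2: pass the labels to expand_dims directly.
direct = da.expand_dims(a=[9, 10, 11])

# equals() compares dimensions, values and coordinates, but not names or attrs.
print(new.equals(direct))
```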

shoyer · 2019-01-29T23:13:33Z

> Would it be alright if I opened a PR sometime soon that upgraded expand_dims to support the inserting/broadcasting dimensions with size > 1 (the first feature)?

Yes, that sounds welcome to me!

I think much of the underlying logic should already exist on the Variable.set_dims() method. See also the either_dict_or_kwargs utility in xarray.core.utils.
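For orientation, a rough sketch of what `Variable.set_dims()` already does at the variable level. The argument is passed positionally because its keyword name may differ between xarray versions; `size_one` and `size_three` are illustrative names.

```python
import numpy as np
import xarray as xr

var = xr.Variable(("b", "c"), np.ones((5, 3)))

# Passing a sequence of names inserts any missing dimension with length 1.
size_one = var.set_dims(["a", "b", "c"])

# Passing a mapping of name -> size broadcasts the data to the sizes given;
# existing dimensions must be listed too, with their current sizes.
size_three = var.set_dims({"a": 3, "b": 5, "c": 3})

print(size_one.dims, size_one.shape)
print(size_three.dims, size_three.shape)
```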

barkls · 2019-04-22T19:29:56Z

Unfortunately this most recent change has broken my workflow. I was using expand_dims to add a named dimension back onto a DataArray, when the dimension had been previously removed with the sel method. I realize this may not be the best way of doing things, but I wanted to point out that there is a loss of functionality here.

import xarray as xr
da = xr.DataArray([0,1,2], dims=['dim1'], coords={'dim1':['a','b','c']})
print(da.dims) # returns ('dim1',)
da = da.sel({'dim1':'a'})
print(da.dims) # returns ()
da = da.expand_dims(da.coords) # fails in 0.12.1
print(da.dims) # returns ('dim1',) in 0.12.0

shoyer · 2019-04-22T20:26:03Z

@barkls I think da.expand_dims(list(da.coords)) should work for this use-case.

Previously, we only used the argument to expand_dims() as a sequence, but now we distinguish between mappings and other sequences.

I don't know what the best resolution would be here, but this seems to be a hazard of duck-typing. I did not anticipate that some users would already be iterating over mappings like .coords.
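A sketch of the suggested workaround applied to the example above. It only exercises the list form; what happens when the `.coords` mapping itself is passed depends on the xarray version and is not shown. `point` and `restored` are illustrative names.

```python
import xarray as xr

da = xr.DataArray([0, 1, 2], dims=["dim1"], coords={"dim1": ["a", "b", "c"]})

# Scalar selection drops the dimension but keeps "dim1" as a scalar coordinate.
point = da.sel(dim1="a")

# A list of names is treated as a sequence of dimensions to insert, so the
# scalar coordinate is promoted back to a length-1 dimension.
restored = point.expand_dims(list(point.coords))

print(point.dims, restored.dims)
```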

dcherian · 2019-04-22T20:29:47Z

Another solution could be adding support for da.sel(dim1='a', squeeze=False) to avoid losing the dim1 dimension/coordinate in the first place

pletchm · 2019-04-22T20:39:17Z

> Another solution could be adding support for da.sel(dim1='a', squeeze=False) to avoid losing the dim1 dimension/coordinate in the first place

Or equivalently, you could just do

da.sel(dim1=['a'])
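The difference between the scalar and the list form of the selection, as a short sketch (`scalar_sel` and `list_sel` are illustrative names):

```python
import xarray as xr

da = xr.DataArray([0, 1, 2], dims=["dim1"], coords={"dim1": ["a", "b", "c"]})

scalar_sel = da.sel(dim1="a")    # scalar label: "dim1" is dropped as a dimension
list_sel = da.sel(dim1=["a"])    # one-element list: "dim1" survives with length 1

print(scalar_sel.dims, list_sel.dims)
```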

barkls · 2019-04-22T20:44:26Z

@pletchm that is the solution I found as well. Thanks all for the suggestions!

TomNicholas · 2020-02-19T10:09:26Z

@pletchm was this issue closed by #2757?

pletchm · 2020-02-20T15:35:22Z

Yes, @TomNicholas. My PR got merged but I forgot to close the issue -- closing it now. Thanks for checking.

shoyer added the API design label Jan 26, 2019

pletchm mentioned this issue Feb 8, 2019: Allow expand_dims() method to support inserting/broadcasting dimensions with size>1 #2757 (merged)

shoyer mentioned this issue Apr 22, 2019: Behavior of da.expand_dims(da.coords) changed in 0.12.1 #2914 (closed)

pletchm closed this as completed Feb 20, 2020
