# GroupBy: Split-Apply-Combine

xarray supports “group by” operations with the same API as pandas to implement the split-apply-combine strategy:

- Split your data into multiple independent groups.
- Apply some function to each group.
- Combine your groups back into a single data object.
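
As a minimal sketch of these three steps, the grouped mean below uses small hand-picked values (not the random data used in the rest of this notebook) so the result is easy to check by hand. The name `ds_demo` is illustrative, and the reduction dimension is passed explicitly as `dim="x"` so the behavior does not depend on version-specific defaults.

```python
import numpy as np
import xarray as xr

# Illustrative values (an assumption for this sketch, not notebook data).
ds_demo = xr.Dataset(
    {"foo": (("x", "y"), np.array([[1.0, 2.0, 3.0],
                                   [10.0, 20.0, 30.0],
                                   [30.0, 40.0, 50.0],
                                   [3.0, 4.0, 5.0]]))},
    coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
)

grouped = ds_demo.groupby("letters")  # split: one group per unique letter
means = grouped.mean(dim="x")         # apply the mean per group, then combine

print(means["foo"].sel(letters="a").values)
print(means["foo"].sel(letters="b").values)
```

The combined result is indexed by the group labels, so individual groups can be selected with `.sel(letters=...)`.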

In [2]:
import xarray as xr
import pandas as pd
import numpy as np

## Split

In [3]:
ds = xr.Dataset({'foo': (('x', 'y'), np.random.rand(4, 3))},
                coords={'x': [10, 20, 30, 40],
                        'letters': ('x', list('abba'))})



In [4]:
arr = ds['foo']
arr

<xarray.DataArray 'foo' (x: 4, y: 3)>
array([[0.71539 , 0.196173, 0.488009],
       [0.302064, 0.435861, 0.656879],
       [0.831296, 0.41056 , 0.067434],
       [0.993389, 0.14018 , 0.450688]])
Coordinates:
  * x        (x) int64 10 20 30 40
    letters  (x) <U1 'a' 'b' 'b' 'a'
Dimensions without coordinates: y

In [5]:
ds

<xarray.Dataset>
Dimensions:  (x: 4, y: 3)
Coordinates:
  * x        (x) int64 10 20 30 40
    letters  (x) <U1 'a' 'b' 'b' 'a'
Dimensions without coordinates: y
Data variables:
    foo      (x, y) float64 0.7154 0.1962 0.488 0.3021 ... 0.9934 0.1402 0.4507

----------------

If we group by the name of a variable or coordinate in a dataset (we can also use a DataArray directly), we get back a GroupBy object:

In [7]:
ds.groupby('letters')

<xarray.core.groupby.DatasetGroupBy at 0x30a4f6a90>

This object works very similarly to a pandas GroupBy object. You can view the group indices with the groups attribute:

In [8]:
ds.groupby('letters').groups

{'a': [0, 3], 'b': [1, 2]}

You can also iterate over groups in (label, group) pairs:

In [9]:
list(ds.groupby('letters'))

[('a', <xarray.Dataset>
  Dimensions:  (x: 2, y: 3)
  Coordinates:
    * x        (x) int64 10 40
      letters  (x) <U1 'a' 'a'
  Dimensions without coordinates: y
  Data variables:
      foo      (x, y) float64 0.7154 0.1962 0.488 0.9934 0.1402 0.4507),
 ('b', <xarray.Dataset>
  Dimensions:  (x: 2, y: 3)
  Coordinates:
    * x        (x) int64 20 30
      letters  (x) <U1 'b' 'b'
  Dimensions without coordinates: y
  Data variables:
      foo      (x, y) float64 0.3021 0.4359 0.6569 0.8313 0.4106 0.06743)]
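
Iterating over the groups is often enough for quick inspection. The sketch below (with an illustrative dataset `ds_demo`, since the notebook data is random) collects the `x` values that belong to each label:

```python
import numpy as np
import xarray as xr

# Illustrative dataset with the same layout as the one used above.
ds_demo = xr.Dataset(
    {"foo": (("x", "y"), np.arange(12.0).reshape(4, 3))},
    coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
)

# Each iteration yields a (label, sub-dataset) pair.
x_by_letter = {}
for label, group in ds_demo.groupby("letters"):
    x_by_letter[str(label)] = group["x"].values.tolist()

print(x_by_letter)
```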

------------------

# Binning

Sometimes you don’t want to use all the unique values to determine the groups but instead want to “bin” the data into coarser groups. You could always create a customized coordinate, but xarray facilitates this via the groupby_bins() method.

In [10]:
x_bins = [0,10,20]
x_bins

[0, 10, 20]

In [11]:
ds

<xarray.Dataset>
Dimensions:  (x: 4, y: 3)
Coordinates:
  * x        (x) int64 10 20 30 40
    letters  (x) <U1 'a' 'b' 'b' 'a'
Dimensions without coordinates: y
Data variables:
    foo      (x, y) float64 0.7154 0.1962 0.488 0.3021 ... 0.9934 0.1402 0.4507

In [12]:
ds.groupby_bins('x', x_bins).groups

{Interval(0, 10, closed='right'): [0], Interval(10, 20, closed='right'): [1]}
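
Only two of the four `x` values appear in the groups above. Since the binning relies on `pandas.cut`, calling it directly on the same values is one way to see why: values outside every bin get no category. This is a sketch for inspection only, not how xarray must be used.

```python
import pandas as pd

x_values = [10, 20, 30, 40]
binned = pd.cut(x_values, bins=[0, 10, 20])

# codes holds the bin number for each value; -1 marks "falls in no bin".
for value, code in zip(x_values, binned.codes):
    bin_label = "no bin" if code == -1 else binned.categories[code]
    print(value, "->", bin_label)
```

Here 30 and 40 lie beyond the last bin edge (20), so they are dropped from the grouping.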

In [13]:
ds.values

<bound method Mapping.values of <xarray.Dataset>
Dimensions:  (x: 4, y: 3)
Coordinates:
  * x        (x) int64 10 20 30 40
    letters  (x) <U1 'a' 'b' 'b' 'a'
Dimensions without coordinates: y
Data variables:
    foo      (x, y) float64 0.7154 0.1962 0.488 0.3021 ... 0.9934 0.1402 0.4507>

The binning is implemented via pandas.cut, whose documentation details how the bins are assigned. As seen in the example above, by default each bin is labeled with a pandas Interval that identifies the bin limits precisely (here closed on the right), and x values that fall outside every bin, such as 30 and 40, are left out of the groups. To override the default labels, you can specify the bin labels explicitly. Here we choose float labels, one per bin (bin centers are a natural choice; the values used below are arbitrary):

In [14]:
x_bin_labels = [6.5,8.5]
x_bin_labels


[6.5, 8.5]

In [18]:
ds.groupby_bins('x', x_bins, labels=x_bin_labels).groups

{6.5: [0], 8.5: [1]}
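
If the labels are meant to be bin centers, they can be derived from the bin edges rather than typed by hand. A small sketch, assuming the same edges `[0, 10, 20]` (the names `edges` and `centers` are illustrative):

```python
import pandas as pd

edges = [0, 10, 20]

# One label per bin: the midpoint of each pair of neighboring edges.
centers = [(lo + hi) / 2 for lo, hi in zip(edges[:-1], edges[1:])]
print(centers)

# pandas.cut needs exactly len(edges) - 1 labels.
labeled = pd.cut([10, 20], bins=edges, labels=centers)
print(list(labeled))
```

The same `centers` list could then be passed as `labels=` to `groupby_bins`.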

In [19]:
ds.groupby_bins('x', x_bins).groups

{Interval(0, 10, closed='right'): [0], Interval(10, 20, closed='right'): [1]}

In [20]:
data = np.arange(1,51,1)

In [21]:
data


array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
       35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50])

In [22]:
data1 = data.reshape(5,10)
data1

array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
       [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
       [41, 42, 43, 44, 45, 46, 47, 48, 49, 50]])

In [23]:
s = pd.DataFrame(data=data1)

In [24]:
s

    0   1   2   3   4   5   6   7   8   9
0   1   2   3   4   5   6   7   8   9  10
1  11  12  13  14  15  16  17  18  19  20
2  21  22  23  24  25  26  27  28  29  30
3  31  32  33  34  35  36  37  38  39  40
4  41  42  43  44  45  46  47  48  49  50


In [25]:
chunk1 = s[0:10]
chunk1

    0   1   2   3   4   5   6   7   8   9
0   1   2   3   4   5   6   7   8   9  10
1  11  12  13  14  15  16  17  18  19  20
2  21  22  23  24  25  26  27  28  29  30
3  31  32  33  34  35  36  37  38  39  40
4  41  42  43  44  45  46  47  48  49  50


In [26]:
chunk2 = s[11:21]
chunk2

Empty DataFrame
Columns: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Index: []


In [27]:
chunk3 = s[22:32]
chunk3

Empty DataFrame
Columns: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Index: []


In [28]:
chunk4 = s[33:43]
chunk4


Empty DataFrame
Columns: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Index: []


In [29]:
chunk5 = s[44:]
chunk5

Empty DataFrame
Columns: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Index: []


In [30]:
data = [chunk1.mean(),chunk2.mean(),chunk3.mean(),chunk4.mean(),chunk5.mean()]


In [31]:
data

[0    21.0
 1    22.0
 2    23.0
 3    24.0
 4    25.0
 5    26.0
 6    27.0
 7    28.0
 8    29.0
 9    30.0
 dtype: float64, 0   NaN
 1   NaN
 2   NaN
 3   NaN
 4   NaN
 5   NaN
 6   NaN
 7   NaN
 8   NaN
 9   NaN
 dtype: float64, 0   NaN
 1   NaN
 2   NaN
 3   NaN
 4   NaN
 5   NaN
 6   NaN
 7   NaN
 8   NaN
 9   NaN
 dtype: float64, 0   NaN
 1   NaN
 2   NaN
 3   NaN
 4   NaN
 5   NaN
 6   NaN
 7   NaN
 8   NaN
 9   NaN
 dtype: float64, 0   NaN
 1   NaN
 2   NaN
 3   NaN
 4   NaN
 5   NaN
 6   NaN
 7   NaN
 8   NaN
 9   NaN
 dtype: float64]
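
The NaN entries come from how the slices were taken: `s[11:21]` on a DataFrame selects rows by position, and `s` has only five rows, so every chunk after the first is empty and its mean is NaN. The first "chunk" is the whole frame, which is why its column means run from 21 to 30. If the goal was one mean per block of ten consecutive values (an assumption about intent), each row already is such a block:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(1, 51).reshape(5, 10))

# Row slices past the last row are empty, so their column means are NaN.
empty_chunk = frame[11:21]
print(empty_chunk.empty, empty_chunk.mean().isna().all())

# One mean per row, i.e. per block of ten consecutive values.
row_means = frame.mean(axis=1)
print(row_means.tolist())
```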

In [32]:
s = pd.Series(data)

In [33]:
s

0    0    21.0
1    22.0
2    23.0
3    24.0
4    2...
1    0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   Na...
2    0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   Na...
3    0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   Na...
4    0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   Na...
dtype: object

In [34]:
fda = xr.DataArray(s)
fda

<xarray.DataArray (dim_0: 5)>
array([0    21.0
1    22.0
2    23.0
3    24.0
4    25.0
5    26.0
6    27.0
7    28.0
8    29.0
9    30.0
dtype: float64,
       0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   NaN
6   NaN
7   NaN
8   NaN
9   NaN
dtype: float64,
       0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   NaN
6   NaN
7   NaN
8   NaN
9   NaN
dtype: float64,
       0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   NaN
6   NaN
7   NaN
8   NaN
9   NaN
dtype: float64,
       0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   NaN
6   NaN
7   NaN
8   NaN
9   NaN
dtype: float64], dtype=object)
Coordinates:
  * dim_0    (dim_0) int64 0 1 2 3 4
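
Wrapping a Series whose elements are themselves Series produces a one-dimensional array of `dtype=object`, which cannot be grouped or reduced numerically. A sketch of building a numeric two-dimensional DataArray straight from the reshaped values instead (the name `x2_demo` and the explicit dimension names are choices made for this example):

```python
import numpy as np
import xarray as xr

data1 = np.arange(1, 51).reshape(5, 10)

# Name the dimensions explicitly and attach simple integer coordinates.
x2_demo = xr.DataArray(
    data1,
    dims=("dim_0", "dim_1"),
    coords={"dim_0": np.arange(5), "dim_1": np.arange(10)},
)

print(x2_demo.dtype, x2_demo.shape)
print(x2_demo.mean(dim="dim_0").values)  # column means
```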

In [35]:
2d


SyntaxError: invalid syntax (<ipython-input-35-6fec5e983169>, line 1)

In [36]:
s

0    0    21.0
1    22.0
2    23.0
3    24.0
4    2...
1    0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   Na...
2    0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   Na...
3    0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   Na...
4    0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   Na...
dtype: object

In [37]:
ds = xr.Dataset({'foo': (('x', 'y'), np.random.rand(4, 3))},
                coords={'x': [10, 20, 30, 40],
                        'letters': ('x', list('abbc'))})

In [38]:
ds

<xarray.Dataset>
Dimensions:  (x: 4, y: 3)
Coordinates:
  * x        (x) int64 10 20 30 40
    letters  (x) <U1 'a' 'b' 'b' 'c'
Dimensions without coordinates: y
Data variables:
    foo      (x, y) float64 0.1 0.5364 0.9973 0.463 ... 0.6888 0.04219 0.2054

In [39]:
ds['letters'] = 'c'

In [40]:
ds.groupby('letters').groups

ValueError: not enough values to unpack (expected 2, got 0)

In [41]:
list(ds.groupby('letters'))

ValueError: not enough values to unpack (expected 2, got 0)
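
The errors above follow from `ds['letters'] = 'c'`, which replaces the per-`x` coordinate with a single scalar value (a later display of `ds` shows `letters  <U1 'c'` with no dimension), so there is nothing along `x` left to group. If the intent was to label every point `'c'`, one option is to assign a full-length coordinate. A sketch under that assumption, with an illustrative `ds_demo`:

```python
import numpy as np
import xarray as xr

ds_demo = xr.Dataset(
    {"foo": (("x", "y"), np.arange(12.0).reshape(4, 3))},
    coords={"x": [10, 20, 30, 40], "letters": ("x", list("abbc"))},
)

# Keep 'letters' one-dimensional along x instead of collapsing it to a scalar.
n_x = ds_demo.sizes["x"]
ds_demo = ds_demo.assign_coords(letters=("x", ["c"] * n_x))

x_by_letter = {
    str(label): group["x"].values.tolist()
    for label, group in ds_demo.groupby("letters")
}
print(x_by_letter)
```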

In [42]:
x_bins = [0,25,50]

In [43]:
x_bin_labels = ['chunk1','chunk2','chunk3','chunk4','chunk5']

In [44]:
s

0    0    21.0
1    22.0
2    23.0
3    24.0
4    2...
1    0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   Na...
2    0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   Na...
3    0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   Na...
4    0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   Na...
dtype: object

In [45]:
x = xr.Dataset(s)

TypeError: 'Series' objects are mutable, thus they cannot be hashed

In [46]:
x

NameError: name 'x' is not defined

In [47]:
x.groupby_bins(x, x_bins).groups

NameError: name 'x' is not defined

In [79]:
x2 = xr.DataArray(s)

In [80]:
x2

<xarray.DataArray (dim_0: 5)>
array([0    21.0
1    22.0
2    23.0
3    24.0
4    25.0
5    26.0
6    27.0
7    28.0
8    29.0
9    30.0
dtype: float64,
       0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   NaN
6   NaN
7   NaN
8   NaN
9   NaN
dtype: float64,
       0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   NaN
6   NaN
7   NaN
8   NaN
9   NaN
dtype: float64,
       0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   NaN
6   NaN
7   NaN
8   NaN
9   NaN
dtype: float64,
       0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   NaN
6   NaN
7   NaN
8   NaN
9   NaN
dtype: float64], dtype=object)
Coordinates:
  * dim_0    (dim_0) int64 0 1 2 3 4

In [50]:
x2.groupby_bins('dim_1',bins = x_bins, labels=x_bin_labels).groups

NameError: name 'x2' is not defined

In [51]:
x2['dim_1']

NameError: name 'x2' is not defined

In [52]:
x2

NameError: name 'x2' is not defined

In [53]:
x2[3,3]

NameError: name 'x2' is not defined

In [54]:
x2[4,9]

NameError: name 'x2' is not defined

In [55]:
x2[0,7]

NameError: name 'x2' is not defined

In [56]:
x2[:1]

NameError: name 'x2' is not defined

In [57]:
tbg = x2[:]

NameError: name 'x2' is not defined

In [58]:
x2.groupby_bins('tbg', x_bins).groups

NameError: name 'x2' is not defined

In [59]:
ds

<xarray.Dataset>
Dimensions:  (x: 4, y: 3)
Coordinates:
  * x        (x) int64 10 20 30 40
    letters  <U1 'c'
Dimensions without coordinates: y
Data variables:
    foo      (x, y) float64 0.1 0.5364 0.9973 0.463 ... 0.6888 0.04219 0.2054

In [60]:
ds.groupby_bins('foo', x_bins).groups

{Interval(0, 25, closed='right'): [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]}

In [61]:
ds['foo']

<xarray.DataArray 'foo' (x: 4, y: 3)>
array([[0.100046, 0.536364, 0.997336],
       [0.463025, 0.893551, 0.498319],
       [0.116993, 0.307118, 0.781454],
       [0.688785, 0.042186, 0.20541 ]])
Coordinates:
  * x        (x) int64 10 20 30 40
    letters  <U1 'c'
Dimensions without coordinates: y

In [62]:
x_bins

[0, 25, 50]

In [63]:
x_bins2 = [0,7,11]

In [64]:
ds.groupby_bins('foo', x_bins2).groups

{Interval(0, 7, closed='right'): [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]}
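
Both attempts put all twelve values in the first bin because `foo` holds random numbers between 0 and 1, while the bin edges are integers far above that range. The sketch below applies `pandas.cut` to the `foo` values displayed above, flattened row by row, first with the original edges and then with edges that span the data (the edges `[0, 0.5, 1.0]` are an illustrative choice):

```python
import numpy as np
import pandas as pd

# Values copied from the ds['foo'] display above.
foo = np.array([[0.100046, 0.536364, 0.997336],
                [0.463025, 0.893551, 0.498319],
                [0.116993, 0.307118, 0.781454],
                [0.688785, 0.042186, 0.20541]])
flat = foo.ravel()

coarse = pd.cut(flat, bins=[0, 25, 50]).codes   # every value lands in bin 0
fine = pd.cut(flat, bins=[0, 0.5, 1.0]).codes   # values split across two bins

print(list(coarse))
print(list(fine))
```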

In [65]:
ds['foo'][0][0]

<xarray.DataArray 'foo' ()>
array(0.100046)
Coordinates:
    x        int64 10
    letters  <U1 'c'

In [66]:
arr.groupby('x',''y').mean()

SyntaxError: invalid syntax (<ipython-input-66-ed3f12689b42>, line 1)

In [67]:
da = xr.DataArray([[0,1],[2,3]],
    coords={'lon': (['ny','nx'], [[30,40],[50,50]] ),
            'lat': (['ny','nx'], [[10,10],[20,20]] ),},
    dims=['ny','nx'])

In [68]:
da

<xarray.DataArray (ny: 2, nx: 2)>
array([[0, 1],
       [2, 3]])
Coordinates:
    lon      (ny, nx) int64 30 40 50 50
    lat      (ny, nx) int64 10 10 20 20
Dimensions without coordinates: ny, nx

In [69]:
da['lon']

<xarray.DataArray 'lon' (ny: 2, nx: 2)>
array([[30, 40],
       [50, 50]])
Coordinates:
    lon      (ny, nx) int64 30 40 50 50
    lat      (ny, nx) int64 10 10 20 20
Dimensions without coordinates: ny, nx

In [70]:
da.groupby('lon').groups

{30: [0], 40: [1], 50: [2, 3]}

In [71]:

da.all

<bound method ImplementsArrayReduce._reduce_method.<locals>.wrapped_func of <xarray.DataArray (ny: 2, nx: 2)>
array([[0, 1],
       [2, 3]])
Coordinates:
    lon      (ny, nx) int64 30 40 50 50
    lat      (ny, nx) int64 10 10 20 20
Dimensions without coordinates: ny, nx>

In [72]:
da.expand_dims?

In [73]:
da.groupby('lon').

SyntaxError: invalid syntax (<ipython-input-73-df34c733f9eb>, line 1)

In [192]:
da.groupby_bins('lon', [0,45,50])

<xarray.core.groupby.DataArrayGroupBy at 0x315cfb0f0>

In [193]:
da.groupby_bins('lon', [0,45,50]).groups

{Interval(0, 45, closed='right'): [0, 1],
 Interval(45, 50, closed='right'): [2, 3]}
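
The `groups` dictionaries only show which flattened positions fall in each group. To finish the apply and combine steps on a two-dimensional coordinate, a reduction can be called on the GroupBy object. A sketch using the same small array (named `da_demo` here to avoid clashing with the notebook's `da`):

```python
import xarray as xr

da_demo = xr.DataArray(
    [[0, 1], [2, 3]],
    coords={"lon": (["ny", "nx"], [[30, 40], [50, 50]]),
            "lat": (["ny", "nx"], [[10, 10], [20, 20]])},
    dims=["ny", "nx"],
)

# Group by each unique longitude, then sum the members of each group.
sum_by_lon = da_demo.groupby("lon").sum()

# Group by longitude bins (0, 45] and (45, 50], then sum.
sum_by_bin = da_demo.groupby_bins("lon", [0, 45, 50]).sum()

print(sum_by_lon.values)
print(sum_by_bin.values)
```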

In [194]:
data

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
       35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50])

In [195]:
x2

<xarray.DataArray (dim_0: 5, dim_1: 10)>
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
       [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
       [41, 42, 43, 44, 45, 46, 47, 48, 49, 50]])
Coordinates:
  * dim_0    (dim_0) int64 0 1 2 3 4
  * dim_1    (dim_1) int64 0 1 2 3 4 5 6 7 8 9

In [200]:
da = xr.DataArray(x2)

In [201]:
da

<xarray.DataArray (dim_0: 5, dim_1: 10)>
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
       [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
       [41, 42, 43, 44, 45, 46, 47, 48, 49, 50]])
Coordinates:
  * dim_0    (dim_0) int64 0 1 2 3 4
  * dim_1    (dim_1) int64 0 1 2 3 4 5 6 7 8 9

In [209]:
da.groupby_bins('dim_1',[0,6,9],labels=['low','high']).groups

{'low': [1, 2, 3, 4, 5, 6], 'high': [7, 8, 9]}
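
Position 0 is missing from the `'low'` group because the bins are open on the left by default, so the first bin is (0, 6] and excludes 0 itself. `pandas.cut` has an `include_lowest` option for this case, and `groupby_bins` accepts an argument of the same name. The sketch below shows the effect with `pandas.cut` directly:

```python
import numpy as np
import pandas as pd

positions = np.arange(10)
edges = [0, 6, 9]
names = ["low", "high"]

default = pd.cut(positions, bins=edges, labels=names)
with_lowest = pd.cut(positions, bins=edges, labels=names, include_lowest=True)

# A code of -1 means the value was not assigned to any bin.
print(list(default.codes))
print(list(with_lowest.codes))
```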

In [210]:
test = da.groupby_bins('dim_1',[0,6,9],labels=['low','high']).groups

In [226]:
da[test['low'][0]][1]

<xarray.DataArray ()>
array(12)
Coordinates:
    dim_0    int64 1
    dim_1    int64 1

In [223]:
da['dim_1'].load_data()

AttributeError: 'DataArray' object has no attribute 'load_data'

In [225]:
xr.Coordinate('dim_1',da)

  """Entry point for launching an IPython kernel.


ValueError: dimensions ('dim_1',) must have the same length as the number of data dimensions, ndim=2

In [232]:
da.data[test['low'][0]][9]

20

In [238]:
da.chunk(5,5)

<xarray.DataArray (dim_0: 5, dim_1: 10)>
dask.array<shape=(5, 10), dtype=int64, chunksize=(5, 5)>
Coordinates:
  * dim_0    (dim_0) int64 0 1 2 3 4
  * dim_1    (dim_1) int64 0 1 2 3 4 5 6 7 8 9

In [241]:
da.data

array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
       [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
       [41, 42, 43, 44, 45, 46, 47, 48, 49, 50]])

In [242]:
da = xr.DataArray(np.arange(12).reshape((3, 4)), dims=['x', 'y'],
                  coords={'x': [0, 1, 2], 'y': ['a', 'b', 'c', 'd']})

In [243]:
da

<xarray.DataArray (x: 3, y: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) <U1 'a' 'b' 'c' 'd'

In [245]:
da[1,1]

<xarray.DataArray ()>
array(5)
Coordinates:
    x        int64 1
    y        <U1 'b'

In [249]:
da['y'] = 'a'

ValueError: dimension 'y' already exists as a scalar variable

In [254]:
da

<xarray.DataArray (x: 3, y: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
Coordinates:
  * x        (x) int64 0 1 2
  * y        (y) <U1 'a' 'b' 'c' 'd'

In [256]:
x2

<xarray.DataArray (dim_0: 5, dim_1: 10)>
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
       [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
       [41, 42, 43, 44, 45, 46, 47, 48, 49, 50]])
Coordinates:
  * dim_0    (dim_0) int64 0 1 2 3 4
  * dim_1    (dim_1) int64 0 1 2 3 4 5 6 7 8 9

In [257]:
x2[0,4]

<xarray.DataArray ()>
array(5)
Coordinates:
    dim_0    int64 0
    dim_1    int64 4

In [259]:
x2[0,0]

<xarray.DataArray ()>
array(1)
Coordinates:
    dim_0    int64 0
    dim_1    int64 0

In [260]:
x2[4,9]

<xarray.DataArray ()>
array(50)
Coordinates:
    dim_0    int64 4
    dim_1    int64 9

In [261]:
x2[4,0]

<xarray.DataArray ()>
array(41)
Coordinates:
    dim_0    int64 4
    dim_1    int64 0

In [267]:
s3=(x2[4,0]+x2[4,1]+x2[4,2]+x2[4,3]+x2[4,4]+x2[4,5]+x2[4,6]+x2[4,7]+x2[4,8]+x2[4,9])

In [271]:
s3.sum()

<xarray.DataArray ()>
array(455)
Coordinates:
    dim_0    int64 4
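
Adding the ten elements one by one works, but the same total can be obtained with a reduction along `dim_1`. A sketch, rebuilding an array shaped like `x2` under the illustrative name `x2_demo`:

```python
import numpy as np
import xarray as xr

x2_demo = xr.DataArray(
    np.arange(1, 51).reshape(5, 10),
    dims=("dim_0", "dim_1"),
    coords={"dim_0": np.arange(5), "dim_1": np.arange(10)},
)

# Sum of the last row only.
last_row_sum = x2_demo.sel(dim_0=4).sum()

# Sum of every row at once.
row_sums = x2_demo.sum(dim="dim_1")

print(int(last_row_sum))
print(row_sums.values)
```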

In [272]:
x2

<xarray.DataArray (dim_0: 5, dim_1: 10)>
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
       [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
       [41, 42, 43, 44, 45, 46, 47, 48, 49, 50]])
Coordinates:
  * dim_0    (dim_0) int64 0 1 2 3 4
  * dim_1    (dim_1) int64 0 1 2 3 4 5 6 7 8 9

In [273]:
x_bins_range = [(0,0),(0,9),(1,0),(1,9),(2,0),(2,9),(3,0),(3,9),(4,0),(4,9)]

In [275]:
alt

<xarray.Dataset>
Dimensions:  (z: 1)
Coordinates:
  * z        (z) int64 10
    lat      int64 0
    lon      int64 0
Data variables:
    *empty*

In [277]:
temp = 15 + 8 * np.random.randn(2, 2, 3)

precip = 10 * np.random.rand(2, 2, 3)

lon = [[-99.83, -99.32], [-99.79, -99.23]]

lat = [[42.25, 42.21], [42.63, 42.59]]

In [281]:
ds

<xarray.Dataset>
Dimensions:  (x: 4, y: 3)
Coordinates:
  * x        (x) int64 10 20 30 40
    letters  (x) <U1 'a' 'b' 'b' 'a'
Dimensions without coordinates: y
Data variables:
    foo      (x, y) float64 0.8716 0.4991 0.5633 0.6713 ... 0.4379 0.6857 0.4121

In [282]:
ds.reset_coords()

<xarray.Dataset>
Dimensions:  (x: 4, y: 3)
Coordinates:
  * x        (x) int64 10 20 30 40
Dimensions without coordinates: y
Data variables:
    foo      (x, y) float64 0.8716 0.4991 0.5633 0.6713 ... 0.4379 0.6857 0.4121
    letters  (x) <U1 'a' 'b' 'b' 'a'

In [284]:
ds.set_coords(['foo'])

<xarray.Dataset>
Dimensions:  (x: 4, y: 3)
Coordinates:
    foo      (x, y) float64 0.8716 0.4991 0.5633 0.6713 ... 0.4379 0.6857 0.4121
  * x        (x) int64 10 20 30 40
    letters  (x) <U1 'a' 'b' 'b' 'a'
Dimensions without coordinates: y
Data variables:
    *empty*

In [289]:
ds['letters'].load()

<xarray.DataArray 'letters' (x: 4)>
array(['a', 'b', 'b', 'a'], dtype='<U1')
Coordinates:
  * x        (x) int64 10 20 30 40
    letters  (x) <U1 'a' 'b' 'b' 'a'

In [290]:
x2

<xarray.DataArray (dim_0: 5, dim_1: 10)>
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
       [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
       [41, 42, 43, 44, 45, 46, 47, 48, 49, 50]])
Coordinates:
  * dim_0    (dim_0) int64 0 1 2 3 4
  * dim_1    (dim_1) int64 0 1 2 3 4 5 6 7 8 9

In [291]:
ds1 = xr.Dataset({'foo': (('x', 'y'), np.random.randn(2, 3))},
                     coords={'x': [10, 20], 'y': ['a', 'b', 'c'],
                             'along_x': ('x', np.random.randn(2)),
                             'scalar': 123})

In [292]:
ds1

<xarray.Dataset>
Dimensions:  (x: 2, y: 3)
Coordinates:
  * x        (x) int64 10 20
  * y        (y) <U1 'a' 'b' 'c'
    along_x  (x) float64 -0.9478 0.5777
    scalar   int64 123
Data variables:
    foo      (x, y) float64 -0.1044 -0.4326 -0.13 -0.6268 -2.644 1.49

In [78]:
x2

NameError: name 'x2' is not defined

In [77]:
dsa = xr.Dataset({'TestA': (('x', 'y'), x2)},
                     coords={'x': [0,1,2,3,4], 'y': [0,1,2,3,4,5,6,7,8,9]})

NameError: name 'x2' is not defined

In [297]:
dsa

<xarray.Dataset>
Dimensions:  (x: 5, y: 10)
Coordinates:
  * x        (x) int64 0 1 2 3 4
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9
Data variables:
    TestA    (x, y) int64 1 2 3 4 5 6 7 8 9 10 ... 41 42 43 44 45 46 47 48 49 50

In [298]:
dsa.data_vars

Data variables:
    TestA    (x, y) int64 1 2 3 4 5 6 7 8 9 10 ... 41 42 43 44 45 46 47 48 49 50

In [309]:
dsar = dsa.set_coords(['TestA'])

In [305]:
dsa.load()

<xarray.Dataset>
Dimensions:  (x: 5, y: 10)
Coordinates:
  * x        (x) int64 0 1 2 3 4
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9
Data variables:
    TestA    (x, y) int64 1 2 3 4 5 6 7 8 9 10 ... 41 42 43 44 45 46 47 48 49 50

In [310]:
dsar.load()

<xarray.Dataset>
Dimensions:  (x: 5, y: 10)
Coordinates:
    TestA    (x, y) int64 1 2 3 4 5 6 7 8 9 10 ... 41 42 43 44 45 46 47 48 49 50
  * x        (x) int64 0 1 2 3 4
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9
Data variables:
    *empty*

In [307]:
x_bins_range2 = [0,10,20,30,40,50]

In [308]:
x_bin_labels2 = ['chunk1','chunk2','chunk3','chunk4','chunk5']

In [311]:
dsar.groupby_bins('TestA', x_bins_range2, labels=x_bin_labels2).groups

{'chunk1': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 'chunk2': [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 'chunk3': [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 'chunk4': [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 'chunk5': [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]}
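
With the bins in place, the per-chunk statistic can be computed directly from the GroupBy object instead of from the index lists. The sketch below assumes a DataArray whose values are also attached as a two-dimensional coordinate named `TestA` (mirroring the `set_coords` step above); `da_demo` and `chunk_means` are illustrative names.

```python
import numpy as np
import xarray as xr

values = np.arange(1, 51).reshape(5, 10)

# The data doubles as the coordinate that defines the bins.
da_demo = xr.DataArray(
    values,
    dims=("x", "y"),
    coords={"TestA": (("x", "y"), values)},
)

bin_edges = [0, 10, 20, 30, 40, 50]
bin_labels = ["chunk1", "chunk2", "chunk3", "chunk4", "chunk5"]

# Split by bin, take the mean of each bin, and combine into one array.
chunk_means = da_demo.groupby_bins("TestA", bin_edges, labels=bin_labels).mean()
print(chunk_means.values)
```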

In [312]:
grouped = dsar.groupby_bins('TestA', x_bins_range2, labels=x_bin_labels2).groups

In [313]:
grouped

{'chunk1': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 'chunk2': [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 'chunk3': [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 'chunk4': [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 'chunk5': [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]}

In [1]:
grouped['chunk5']

NameError: name 'grouped' is not defined

In [318]:
grpda = xr.DataArray(grouped)

In [319]:
grpda

<xarray.DataArray ()>
array(<built-in method values of dict object at 0x321f59d80>, dtype=object)
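
Passing the `groups` dictionary to `xr.DataArray` does not produce a usable array, as the output above shows. One alternative, sketched here as an assumption about what was intended, is to invert the dictionary into a pandas Series that maps each flattened position to its chunk label:

```python
import pandas as pd

# Same structure as the `grouped` dictionary above, rebuilt for this sketch.
grouped_demo = {
    f"chunk{i + 1}": list(range(10 * i, 10 * i + 10)) for i in range(5)
}

# Invert {label: [positions]} into {position: label}, ordered by position.
label_by_index = pd.Series(
    {idx: name for name, idxs in grouped_demo.items() for idx in idxs}
).sort_index()

print(label_by_index.iloc[0], label_by_index.iloc[49])
print(label_by_index.value_counts().sort_index().tolist())
```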

In [None]:
grpse = pd.Ser