xarray 2025.4.0: issues with grouping using external arrays #10356

@abiasiol

What is your issue?

The latest version of xarray (2025.4.0) breaks a pattern I relied on extensively: grouping by an "external" array (one not included in the data itself).

I understand that, when using Dask arrays, groupby now requires specifying the labels in advance. However, I would expect the "Second test" below to be sufficient to obtain the grouped results.

I am using flox 0.10.3, but I don't believe it matters here.

Example

import numpy as np
import pandas as pd
import xarray as xr

data = np.random.randn(100, 30)
times = pd.date_range(start="2020-01-01", periods=100, freq="1min")
da = xr.DataArray(data, dims=["t", "y"], coords={"t": times, "y": range(30)})

groups = [0] * 20 + [1] * 30 + [0] * 30 + [1] * 20
grouper = xr.DataArray(
    groups, dims=["t"], coords={"t": times}, name="grouper"
).chunk({"t": 20})

da_grouped = da.groupby(grouper).mean()
da_grouped

xarray 2025.3.1

<xarray.DataArray (grouper: 2, y: 30)> Size: 480B
array([[ 0.01227056, -0.05314055, -0.0197765 ,  0.01933484, -0.24461523,
         0.03464199, -0.04162751,  0.100811  ,  0.07839175, -0.17579131,
        -0.19086663, -0.03855792, -0.11889067, -0.10432551, -0.08128964,
         0.18822824,  0.06078923, -0.09421675, -0.05877735, -0.04124922,
        -0.05763315, -0.15810688, -0.06592451,  0.2640506 , -0.0817198 ,
        -0.14321519, -0.04152522, -0.00993262,  0.03271208,  0.08647186],
       [ 0.06181896, -0.03559387,  0.0649796 ,  0.11243374, -0.04737599,
        -0.25380454,  0.21691169,  0.04980174,  0.19123282,  0.12596733,
         0.01971407, -0.1727822 , -0.16086587,  0.03812562, -0.02516585,
        -0.11980421, -0.06404743, -0.14069857,  0.01676893,  0.20732787,
         0.13062032, -0.0732914 ,  0.21038181,  0.16341856, -0.05329621,
         0.1948512 , -0.02858808, -0.11468347, -0.03775833,  0.17974125]])
Coordinates:
  * y        (y) int64 240B 0 1 2 3 4 5 6 7 8 9 ... 21 22 23 24 25 26 27 28 29
  * grouper  (grouper) int64 16B 0 1
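
As a sanity check that is independent of xarray's groupby, the expected per-label means can be sketched with plain NumPy masks. The names `labels`, `unique_labels` and `expected` are illustrative and not part of the original report, and a seeded generator is used only so the sketch is reproducible.

```python
import numpy as np

# Same layout as in the report: 100 time steps, 30 columns.
rng = np.random.default_rng(0)
data = rng.standard_normal((100, 30))

# External labels, one per time step (not stored alongside the data).
labels = np.array([0] * 20 + [1] * 30 + [0] * 30 + [1] * 20)

# One row of means per unique label, averaging over the time axis.
unique_labels = np.unique(labels)
expected = np.stack([data[labels == lab].mean(axis=0) for lab in unique_labels])

print(expected.shape)  # (2, 30)
```

This is the shape and content that `da.groupby(grouper).mean()` returned under 2025.3.1: one row per label, one column per `y`.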

xarray 2025.4.0

First test: code as above

----> 1 da_grouped = da.groupby(grouper).mean()

File ~/.../site-packages/xarray/util/deprecation_helpers.py:118, in _deprecate_positional_args.<locals>._decorator.<locals>.inner(*args, **kwargs)
    114     kwargs.update(zip_args)
    116     return func(*args[:-n_extra_args], **kwargs)
--> 118 return func(*args, **kwargs)

File ~/.../site-packages/xarray/core/dataarray.py:6940, in DataArray.groupby(self, group, squeeze, restore_coord_dims, eagerly_compute_group, **groupers)
   6933 from xarray.core.groupby import (
   6934     DataArrayGroupBy,
   6935     _parse_group_and_groupers,
   6936     _validate_groupby_squeeze,
   6937 )
   6939 _validate_groupby_squeeze(squeeze)
-> 6940 rgroupers = _parse_group_and_groupers(
   6941     self, group, groupers, eagerly_compute_group=eagerly_compute_group
   6942 )
   6943 return DataArrayGroupBy(self, rgroupers, restore_coord_dims=restore_coord_dims)

File ~/.../site-packages/xarray/core/groupby.py:410, in _parse_group_and_groupers(obj, group, groupers, eagerly_compute_group)
    407 rgroupers: tuple[ResolvedGrouper, ...]
...
    352     if isinstance(self.grouper, BinGrouper) and isinstance(
    353         self.grouper.bins, int
    354     ):
ValueError: Please pass `labels` to UniqueGrouper when grouping by a chunked array.
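
A possible stopgap, assuming the grouper is small enough to hold in memory, is to load it before grouping so that xarray no longer sees a chunked array. This is only a sketch of that idea, not a confirmed fix for 2025.4.0, and it gives up lazy evaluation of the labels; `eager_grouper` is an illustrative name.

```python
import numpy as np
import pandas as pd
import xarray as xr

data = np.random.randn(100, 30)
times = pd.date_range(start="2020-01-01", periods=100, freq="1min")
da = xr.DataArray(data, dims=["t", "y"], coords={"t": times, "y": range(30)})

groups = [0] * 20 + [1] * 30 + [0] * 30 + [1] * 20
grouper = xr.DataArray(
    groups, dims=["t"], coords={"t": times}, name="grouper"
).chunk({"t": 20})

# Assumption: the labels fit in memory, so loading them eagerly is acceptable.
eager_grouper = grouper.compute()

# Grouping by a NumPy-backed array lets xarray discover the unique labels itself.
da_grouped = da.groupby(eager_grouper).mean()
```

The trade-off is that the labels are materialized up front, which may not be acceptable when the grouper itself is large or expensive to compute.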

Second test: use a UniqueGrouper

from xarray.groupers import UniqueGrouper

unique_grouper = UniqueGrouper(grouper, labels=np.unique(grouper))

da_grouped = da.groupby(unique_grouper).mean()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[8], line 1
----> 1 da_grouped = da.groupby(unique_grouper1).mean()
      2 da_grouped

File ~/.../util/deprecation_helpers.py:118, in ... 
    114     kwargs.update(zip_args)
    116     return func(*args[:-n_extra_args], **kwargs)
--> 118 return func(*args, **kwargs)

File ~/.../core/dataarray.py:6940, in ...
   6933 ...
   6939 ...
-> 6940 rgroupers = _parse_group_and_groupers(
   6941     self, group, groupers, eagerly_compute_group=eagerly_compute_group
   6942 )
   6943 return DataArrayGroupBy(self, rgroupers, restore_coord_dims=restore_coord_dims)

File ~/.../core/groupby.py:421, in ...
    417         assert isinstance(group, str | Sequence)
    418     group_iter: Sequence[Hashable] = (
    419         (group,) if isinstance(group, str) else group
    420     )
--> 421     grouper_mapping = {g: UniqueGrouper() for g in group_iter}
    422 elif groupers:
    423     grouper_mapping = cast("Mapping[Hashable, Grouper]", groupers)

TypeError: 'UniqueGrouper' object is not iterable

Third test: specify a grouping dimension (t)

da_grouped = da.groupby(t=unique_grouper).mean()

This runs without error, but the result is filled entirely with NaN:

<xarray.DataArray (t: 2, y: 30)> Size: 480B
array([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
        nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
        nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
        nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
        nan, nan, nan, nan]])
Coordinates:
  * y        (y) int64 240B 0 1 2 3 4 5 6 7 8 9 ... 21 22 23 24 25 26 27 28 29
  * t        (t) int64 16B 0 1

Fourth test: add grouper to da

This method works, but it requires me to always keep track of or recompute the group name and the UniqueGrouper.

da = da.assign_coords({"grouper": ("t", unique_grouper.group_as_index.data)})
da_grouped = da.groupby({"grouper": unique_grouper}).mean()
