What is your issue?
The latest version of xarray (2025.4.0) breaks a pattern I relied on extensively: grouping by an "external" array (one not included in the data itself).
I understand that, when using Dask arrays, groupby now requires specifying the labels in advance. However, I would expect the "Second test" below to be sufficient to obtain the grouped results.
I'm using flox == 0.10.3, but I don't believe it matters here.
Example
import numpy as np
import pandas as pd
import xarray as xr
data = np.random.randn(100, 30)
times = pd.date_range(start="2020-01-01", periods=100, freq="1min")
da = xr.DataArray(data, dims=["t", "y"], coords={"t": times, "y": range(30)})
# "external" grouping array: it shares the t coordinate with da but is not attached to it
groups = [0] * 20 + [1] * 30 + [0] * 30 + [1] * 20
grouper = xr.DataArray(
    groups, dims=["t"], coords={"t": times}, name="grouper"
).chunk({"t": 20})
da_grouped = da.groupby(grouper).mean()
da_grouped
xarray 2025.3.1
<xarray.DataArray (grouper: 2, y: 30)> Size: 480B
array([[ 0.01227056, -0.05314055, -0.0197765 , 0.01933484, -0.24461523,
0.03464199, -0.04162751, 0.100811 , 0.07839175, -0.17579131,
-0.19086663, -0.03855792, -0.11889067, -0.10432551, -0.08128964,
0.18822824, 0.06078923, -0.09421675, -0.05877735, -0.04124922,
-0.05763315, -0.15810688, -0.06592451, 0.2640506 , -0.0817198 ,
-0.14321519, -0.04152522, -0.00993262, 0.03271208, 0.08647186],
[ 0.06181896, -0.03559387, 0.0649796 , 0.11243374, -0.04737599,
-0.25380454, 0.21691169, 0.04980174, 0.19123282, 0.12596733,
0.01971407, -0.1727822 , -0.16086587, 0.03812562, -0.02516585,
-0.11980421, -0.06404743, -0.14069857, 0.01676893, 0.20732787,
0.13062032, -0.0732914 , 0.21038181, 0.16341856, -0.05329621,
0.1948512 , -0.02858808, -0.11468347, -0.03775833, 0.17974125]])
Coordinates:
* y (y) int64 240B 0 1 2 3 4 5 6 7 8 9 ... 21 22 23 24 25 26 27 28 29
* grouper (grouper) int64 16B 0 1
xarray 2025.4.0
First test: code as above
----> 1 da_grouped = da.groupby(grouper).mean()
File ~/.../site-packages/xarray/util/deprecation_helpers.py:118, in _deprecate_positional_args.<locals>._decorator.<locals>.inner(*args, **kwargs)
114 kwargs.update(zip_args)
116 return func(*args[:-n_extra_args], **kwargs)
--> 118 return func(*args, **kwargs)
File ~/.../site-packages/xarray/core/dataarray.py:6940, in DataArray.groupby(self, group, squeeze, restore_coord_dims, eagerly_compute_group, **groupers)
6933 from xarray.core.groupby import (
6934 DataArrayGroupBy,
6935 _parse_group_and_groupers,
6936 _validate_groupby_squeeze,
6937 )
6939 _validate_groupby_squeeze(squeeze)
-> 6940 rgroupers = _parse_group_and_groupers(
6941 self, group, groupers, eagerly_compute_group=eagerly_compute_group
6942 )
6943 return DataArrayGroupBy(self, rgroupers, restore_coord_dims=restore_coord_dims)
File ~/.../site-packages/xarray/core/groupby.py:410, in _parse_group_and_groupers(obj, group, groupers, eagerly_compute_group)
407 rgroupers: tuple[ResolvedGrouper, ...]
...
352 if isinstance(self.grouper, BinGrouper) and isinstance(
353 self.grouper.bins, int
354 ):
ValueError: Please pass `labels` to UniqueGrouper when grouping by a chunked array.
Second test: use a UniqueGrouper
from xarray.groupers import UniqueGrouper
unique_grouper = UniqueGrouper(grouper, labels=np.unique(grouper))
da_grouped = da.groupby(unique_grouper).mean()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[8], line 1
----> 1 da_grouped = da.groupby(unique_grouper).mean()
2 da_grouped
File ~/.../util/deprecation_helpers.py:118, in ...
114 kwargs.update(zip_args)
116 return func(*args[:-n_extra_args], **kwargs)
--> 118 return func(*args, **kwargs)
File ~/.../core/dataarray.py:6940, in ...
6933 ...
6939 ...
-> 6940 rgroupers = _parse_group_and_groupers(
6941 self, group, groupers, eagerly_compute_group=eagerly_compute_group
6942 )
6943 return DataArrayGroupBy(self, rgroupers, restore_coord_dims=restore_coord_dims)
File ~/.../core/groupby.py:421, in ...
417 assert isinstance(group, str | Sequence)
418 group_iter: Sequence[Hashable] = (
419 (group,) if isinstance(group, str) else group
420 )
--> 421 grouper_mapping = {g: UniqueGrouper() for g in group_iter}
422 elif groupers:
423 grouper_mapping = cast("Mapping[Hashable, Grouper]", groupers)
TypeError: 'UniqueGrouper' object is not iterable
Third test: specify a grouping dimension (t)
da_grouped = da.groupby(t=unique_grouper).mean()
The result is an array of all NaNs rather than the expected grouped means:
<xarray.DataArray (t: 2, y: 30)> Size: 480B
array([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan],
[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan]])
Coordinates:
* y (y) int64 240B 0 1 2 3 4 5 6 7 8 9 ... 21 22 23 24 25 26 27 28 29
* t (t) int64 16B 0 1
Fourth test: add the grouper as a coordinate on da
This method works, but it requires me to always keep track of or recompute the group name and the UniqueGrouper.
da = da.assign_coords({"grouper": ("t", unique_grouper.group_as_index.data)})
da_grouped = da.groupby({"grouper": unique_grouper}).mean()
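As a simpler alternative, here is a minimal sketch of another possible workaround, assuming the external grouping array is small enough to load into memory (the error above is only raised for chunked group arrays, so an eagerly computed grouper should, as far as I can tell, still go through the old code path; eager_grouper is just an illustrative name, while da and grouper are the objects from the example at the top):
# Possible workaround sketch: load the external grouping array into memory so
# that groupby no longer sees a chunked (Dask) group array. Assumes `grouper`
# fits in memory; `eager_grouper` is an illustrative name.
eager_grouper = grouper.compute()
da_grouped_eager = da.groupby(eager_grouper).mean()
This obviously defeats the purpose of keeping the grouper lazy, but it avoids having to track the group name and the UniqueGrouper separately.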