Description
What happened?
Since ea9f02b and #10571, the following snippet hangs indefinitely: the files are opened successfully, but xr.open_mfdataset never completes, and I end up having to kill the process. Reverting to the commit just before this change works fine.
open_mfdataset_params = {
    "engine": engine,
    "parallel": True,
    "preprocess": partial_preprocess,
    "data_vars": "all",
    "concat_characters": True,
    "mask_and_scale": True,
    "decode_cf": True,
    "decode_times": self.time_coder,
    "decode_coords": True,
    "compat": "override",
    "coords": "minimal",
    "drop_variables": drop_vars_list,
}
ds = xr.open_mfdataset(batch_files, **open_mfdataset_params)
Context
- Engine: h5netcdf
- Scheduler: Dask running on a Coiled cluster
- Input: batch_files is a list of open file objects for NetCDF files stored in S3 (fileset = [s3_fs.open(file) for file in s3_paths])
- preprocess runs successfully on all files, but open_mfdataset itself gets stuck
What I’ve checked
- I’m aware of Regression: "h5py objects cannot be pickled" with cloudpickle #10712, and like @rabernat, that change broke my code. I actually caught it in my own integration tests, whereas the xarray unit tests didn’t expose it at the time. More generally, I think some of these regressions could be avoided if there were higher-level integration tests in addition to the existing unit tests. That particular issue was fixed about two weeks ago and resolved the pickle error I was seeing. However, I’m still encountering the hang I describe here.
- Rolling back to the commit before ea9f02b avoids the problem. In my case, 938e186 is the latest working commit.
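A further check that could narrow this down is capturing where the client process is blocked when the hang occurs. This is only a sketch and was not run for this report. The helper name run_with_watchdog is hypothetical and not part of xarray or Dask; it uses only the standard-library faulthandler, tempfile and threading modules. It shows client-side Python stacks only, so a hang that lives on the Coiled workers would not appear in the dump.

```python
import faulthandler
import tempfile
import threading


def run_with_watchdog(fn, timeout_s):
    """Run fn() in a daemon thread and report whether it finished in time.

    Hypothetical helper (not an xarray or Dask API).
    Returns (finished, result, stack_dump); stack_dump is "" when fn finished.
    """
    outcome = {}

    def target():
        try:
            outcome["result"] = fn()
        except BaseException as exc:  # hand the error back to the caller
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)

    if not worker.is_alive():
        if "error" in outcome:
            raise outcome["error"]
        return True, outcome["result"], ""

    # Still running after the timeout: dump the stacks of all threads.
    # faulthandler writes to a file descriptor, so a real temporary file is used.
    with tempfile.TemporaryFile(mode="w+") as dump:
        faulthandler.dump_traceback(file=dump, all_threads=True)
        dump.seek(0)
        return False, None, dump.read()


def stuck():
    # Stand-in for the hanging call: blocks forever.
    threading.Event().wait()


finished, _, stacks = run_with_watchdog(stuck, timeout_s=0.5)
if not finished:
    print(stacks)

# With the call from this report, assuming the same names as in the snippet:
# finished, ds, stacks = run_with_watchdog(
#     lambda: xr.open_mfdataset(batch_files, **open_mfdataset_params),
#     timeout_s=600,
# )
```

The worker is a daemon thread so that a call that never returns does not keep the interpreter alive once the dump has been collected.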
What did you expect to happen?
xr.open_mfdataset should complete successfully, as it does on earlier commits.
Minimal Complete Verifiable Example
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "xarray[complete]@git+https://github.com/pydata/xarray.git@main",
# ]
# ///
#
# This script automatically imports the development branch of xarray to check for issues.
# Please delete this header if you have _not_ tested this script with `uv run`!
import xarray as xr
xr.show_versions()
# your reproducer code ...
Steps to reproduce
No response
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.12.11 | packaged by conda-forge | (main, Jun 4 2025, 14:45:31) [GCC 13.3.0]
python-bits: 64
OS: Linux
OS-release: 6.8.0-39-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: en_IE.UTF-8
LOCALE: ('C', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.3-development
xarray: 2025.7.2.dev31+gea9f02bbe
pandas: 2.3.3
numpy: 1.26.4
scipy: 1.16.2
netCDF4: 1.6.5
pydap: 3.5.7
h5netcdf: 1.6.4
h5py: 3.11.0
zarr: 2.18.7
cftime: 1.6.4.post1
nc_time_axis: 1.4.1
iris: None
bottleneck: 1.6.0
dask: 2025.9.1
distributed: 2025.9.1
matplotlib: 3.10.6
cartopy: 0.25.0
seaborn: 0.13.2
numbagg: 0.9.3
fsspec: 2025.5.1
cupy: None
pint: None
sparse: 0.17.0
flox: 0.10.7
numpy_groupies: 0.11.3
setuptools: 80.9.0
pip: 25.2
conda: None
pytest: 8.4.2
mypy: None
IPython: 7.34.0
sphinx: 8.2.3