Change default netCDF engine to use h5netcdf and add netcdf_engine_order #10755

shoyer · 2025-09-16T02:43:12Z

The default `engine` when reading/writing netCDF files is now h5netcdf or scipy, which are typically faster than the prior default of netCDF4-python. You can control this default behavior explicitly via the new `netcdf_engine_order` parameter in `set_options()`, e.g., `xr.set_options(netcdf_engine_order=['netcdf4', 'scipy', 'h5netcdf'])` to restore the prior defaults.

I've also updated the documentation page which misled @lesserwhirls about Xarray supporting invalid netCDF files without `invalid_netcdf=True`.

Closes Should Xarray prefer h5netcdf and scipy to netCDF4? #10657
Tests added
User visible changes (including notable bug fixes) are documented in whats-new.rst
New functions/methods are listed in api.rst
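
To make the idea of an ordered engine preference concrete, here is a minimal sketch of how a default could be resolved from such a list. It is not xarray's implementation: `pick_default_engine`, `is_installed` and `ENGINE_MODULES` are hypothetical names, and the sketch assumes an engine counts as available when its backing module can be found.

```python
from importlib.util import find_spec

# Import names of the three netCDF engines discussed in this PR.
ENGINE_MODULES = {"netcdf4": "netCDF4", "h5netcdf": "h5netcdf", "scipy": "scipy"}


def is_installed(engine):
    """Return True if the module backing `engine` can be found."""
    return find_spec(ENGINE_MODULES[engine]) is not None


def pick_default_engine(order, available=is_installed):
    """Return the first engine in `order` that is available.

    Hypothetical helper for illustration; not xarray's actual code.
    """
    unknown = [engine for engine in order if engine not in ENGINE_MODULES]
    if unknown:
        raise ValueError(f"unknown netCDF engines: {unknown}")
    for engine in order:
        if available(engine):
            return engine
    raise ValueError(f"none of the engines in {list(order)} are installed")
```

With the order from the description, `pick_default_engine(['netcdf4', 'scipy', 'h5netcdf'])` would return `'netcdf4'` whenever netCDF4-python is installed, and fall through to the next entry otherwise.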

shoyer · 2025-09-16T21:31:02Z

Judging by the test failures, we previously supported writing NCZarr with `ds.to_netcdf(f"file://{filename}#mode=nczarr")`. Now we also require passing `engine='netcdf4'` explicitly.

Should we try to auto-detect URLs like this and use netcdf4 as the backend? Or is it better to encourage users to make an explicit choice?
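
For reference, auto-detection along these lines could be sketched as below. This is only an illustration under assumptions: `looks_like_nczarr` and `choose_write_engine` are hypothetical helpers, and the sketch assumes the mode is given in the URL fragment as `mode=` with comma-separated values, with `&` separating fragment keys.

```python
from urllib.parse import urlsplit


def looks_like_nczarr(path):
    """Return True if `path` carries an NCZarr mode in its URL fragment.

    Hypothetical helper; an actual check in xarray may differ.
    """
    fragment = urlsplit(path).fragment
    for item in fragment.split("&"):
        key, _, value = item.partition("=")
        if key == "mode" and "nczarr" in value.split(","):
            return True
    return False


def choose_write_engine(path, default_engine):
    """Use netcdf4 for NCZarr-style URLs, otherwise keep the default."""
    return "netcdf4" if looks_like_nczarr(path) else default_engine
```

A plain path such as `/tmp/out.nc` has no fragment, so it keeps whatever default engine was passed in.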

dcherian · 2025-09-16T22:42:24Z

in general I'm pro "explicit choice", but this would be a breaking change.

@malmans2 how common is nczarr use? I haven't really seen it.

shoyer · 2025-09-17T00:13:49Z

I went ahead and added automatic support for writing nczarr. This wasn't hard to check.

malmans2 · 2025-09-17T06:45:53Z

> in general I'm pro "explicit choice", but this would be a breaking change.
>
> @malmans2 how common is nczarr use? I haven't really seen it.

I've never seen it actually used in Python applications either. From a quick search on GitHub, it looks like the few packages that write to nczarr directly use netcdf4-python rather than xarray.

shoyer · 2025-09-17T07:05:38Z

I added supports_groups to BackendEntrypoint. Otherwise, we have no way to check if a backend supports open_datatree() short of calling the open_datatree() method.

This turned up because scipy is now used in preference to netcdf4 when opening netcdf v3 files, but scipy doesn't support opening groups.

In principle we could add support for reading groups to the SciPy backend (netcdf3 files arguably contain a single group, at the root node), but in any case this will also come up for custom backends.
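
As a rough illustration of why a declared flag helps, the sketch below picks a backend for tree-style opening without calling any backend method. The classes are hypothetical stand-ins rather than xarray's `BackendEntrypoint`; only the `supports_groups` attribute name is taken from this PR.

```python
class SketchBackend:
    """Hypothetical stand-in for a backend entrypoint (not xarray's class)."""

    name = "base"
    supports_groups = False  # subclasses opt in by overriding this flag

    def open_datatree(self, path):
        raise NotImplementedError(f"{self.name} cannot open groups")


class FlatBackend(SketchBackend):
    name = "flat"  # e.g. a reader for files without groups


class GroupedBackend(SketchBackend):
    name = "grouped"
    supports_groups = True

    def open_datatree(self, path):
        # Placeholder result: a mapping from group path to contents.
        return {"/": f"tree from {path} via {self.name}"}


def pick_datatree_backend(backends):
    """Return the first backend that declares group support."""
    for backend in backends:
        if backend.supports_groups:
            return backend
    raise ValueError("no backend in the list supports groups")
```

Given `[FlatBackend(), GroupedBackend()]`, the flat backend is skipped even though it comes first in the preference order, which mirrors the scipy-before-netcdf4 situation described above.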

shoyer · 2025-09-23T21:25:25Z

I would love to get this in before the next release, to avoid needing repeated breaking changes.

kmuehlbauer

LGTM, Stephan. Nice to be able to parametrize this.

mraspaud · 2025-10-01T07:57:52Z

I know I'm late to the party, but I just wanted to mention that the (local) netCDF files we use in our community (Earth Observation Satellite processing) are generally read faster with netcdf4 than with h5netcdf; h5netcdf takes about twice as long.
No harm done, since we now have the possibility to change the engine preference, but I just thought I'd let you know for reference.

djhoese · 2025-10-03T13:40:15Z

Sorry if I missed this documentation somewhere else, but I didn't see it mentioned here or in the related issue. Does anyone know of any benchmarks done between the engines with recent versions of other dependency libraries (e.g. numpy, pandas, dask)? I have the same use cases as @mraspaud above, but I'll admit it's been a while since I've compared the netcdf4 and h5netcdf engines. Since there are so many ways to access files (local, S3 URI, open file-like object, parallel or single-threaded, etc.) and so many different types of files (array size, on-disk chunking, etc.), I'm wondering if anyone has done the work and documented what they've found for performance in some of these cases.

It seems there are ongoing (or at least wished-for) optimizations for h5netcdf (see h5netcdf/h5netcdf#195) that would be interesting to compare against any existing numbers.
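
For anyone who wants to collect such numbers on their own files, a minimal timing harness might look like the sketch below. `time_engines` is a hypothetical helper, and the `read` callable is supplied by the caller, so nothing here is tied to a particular library version or access pattern.

```python
import time


def time_engines(read, engines, repeats=3):
    """Return the best wall-clock time in seconds for each engine.

    `read` is any callable taking an engine name; it is supplied by the caller.
    """
    results = {}
    for engine in engines:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            read(engine)
            timings.append(time.perf_counter() - start)
        # Keep the minimum to reduce noise from other processes.
        results[engine] = min(timings)
    return results
```

With xarray installed, `read` could be something like `lambda engine: xr.open_dataset(path, engine=engine).load()`, where `path` is a placeholder for a local file. Results will depend on file layout, chunking and OS caching, so repeated runs on representative files are worth more than any single number.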

shoyer · 2025-10-03T18:37:05Z

@djhoese Let's discuss this back in #10657

I am thinking that perhaps the change to the default ordering here was premature.

github-actions bot added the topic-backends, topic-DataTree and io labels Sep 16, 2025

shoyer changed the title ~~Add option for netcdf_engine_order~~ Change default netCDF engine to use h5netcdf and add netcdf_engine_order Sep 16, 2025

shoyer mentioned this pull request Sep 16, 2025

Should Xarray prefer h5netcdf and scipy to netCDF4? #10657

Open

shoyer added 2 commits September 15, 2025 20:02

Merge branch 'main' into netcdf_engine_order

ea8ef94

Fix test failures

c6eb82d

Merge branch 'main' into netcdf_engine_order

6d425db

Automatically support NCZarr

18fe84f

shoyer added 6 commits September 16, 2025 17:14

Revert "Automatically support NCZarr"

4131449

This reverts commit 18fe84f.

Reapply "Automatically support NCZarr"

913cded

This reverts commit 4131449.

Fix mypy

a397f1f

spelling

6a86d3b

Improve typing for _normalize_path()

e48ab59

hard code engine="netcdf4" for test_encoding_enum__no_fill_value

fdc7efb

shoyer added 2 commits September 16, 2025 23:57

Fix reading netcdf3 files with open_datatree

db8ec22

Set engine in test_encoding_enum__multiple_variable_with_enum

68bc5f8

shoyer added 2 commits September 17, 2025 00:12

set yet another test to only use netcdf4

b5ac76d

Merge branch 'main' into netcdf_engine_order

12e1fcd

kmuehlbauer approved these changes Sep 24, 2025

shoyer merged commit 4722bf1 into pydata:main Sep 24, 2025
37 checks passed

shoyer deleted the netcdf_engine_order branch September 24, 2025 18:40

rajeeja mentioned this pull request Sep 26, 2025

ESMF netCDF4 engine issue found with upstream CI failure UXARRAY/uxarray#1381

Open

rajeeja added a commit to UXARRAY/uxarray that referenced this pull request Sep 29, 2025

o Try to get it to work with fallback internal API and use default ne…

de22fb9

…tcdf engine first for xarray loading - caused due to pydata/xarray#10755

sjperkins mentioned this pull request Oct 2, 2025

Zarr backend isn't configured with supports_groups=True #10808

Closed

5 tasks
