178 changes: 178 additions & 0 deletions doc/user-guide/io.rst
@@ -112,6 +112,182 @@ You can learn more about using and developing backends in the
linkStyle default font-size:18pt,stroke-width:4


.. _io.backend_resolution:

Backend Selection
-----------------

When opening a file or URL without explicitly specifying the ``engine`` parameter,
xarray automatically selects an appropriate backend based on the file path or URL.
The backends are tried in order: **netcdf4 → h5netcdf → scipy → pydap → zarr**.

.. note::
   You can customize the order in which netCDF backends are tried using the
   ``netcdf_engine_order`` option in :py:func:`~xarray.set_options`:

   .. code-block:: python

       # Prefer h5netcdf over netcdf4
       xr.set_options(netcdf_engine_order=['h5netcdf', 'netcdf4', 'scipy'])

   See :ref:`options` for more details on configuration options.

The following tables show which backend will be selected for different types of URLs and files.

.. important::
   ✅ means the backend will **guess it can open** the URL or file based on its path, extension,
   or magic number, but this doesn't guarantee success. For example, not all Zarr stores are
   xarray-compatible.

   ❌ means the backend will not attempt to open it.

Remote URL Resolution
~~~~~~~~~~~~~~~~~~~~~

.. list-table::
   :header-rows: 1
   :widths: 50 10 10 10 10 10

   * - URL
     - :ref:`netcdf4 <io.netcdf>`
     - :ref:`h5netcdf <io.hdf5>`
     - :ref:`scipy <io.netcdf>`
     - :ref:`pydap <io.opendap>`
     - :ref:`zarr <io.zarr>`
   * - ``https://example.com/store.zarr``
     - ❌
     - ❌
     - ❌
     - ❌
     - ✅
   * - ``https://example.com/data.nc``
     - ✅
     - ✅
     - ❌
     - ❌
     - ❌
   * - ``http://example.com/data.nc?var=temp``
     - ✅
     - ❌
     - ❌
     - ❌
     - ❌
   * - ``http://example.com/dap4/data.nc?var=x``
     - ✅
     - ❌
     - ❌
     - ✅
     - ❌
   * - ``dap2://opendap.nasa.gov/dataset``
     - ❌
     - ❌
     - ❌
     - ✅
     - ❌
   * - ``https://example.com/DAP4/data``
     - ❌
     - ❌
     - ❌
     - ✅
     - ❌
   * - ``http://test.opendap.org/dap4/file.nc4``
     - ✅
     - ✅
     - ❌
     - ✅
     - ❌
   * - ``https://example.com/DAP4/data.nc``
     - ✅
     - ✅
     - ❌
     - ✅
     - ❌
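
The rows above can be reproduced with a small standalone sketch of the URL checks. This is not xarray's implementation: ``looks_remote``, ``strip_params``, and ``remote_claims`` are hypothetical names, the ``scheme://`` test is an assumed stand-in for ``is_remote_uri``, the h5netcdf extension set is assumed to match the netcdf4 one, and zarr is left out because its detection logic is not part of this change.

```python
import os
import re

# Assumption: both netCDF backends accept the same extensions.
NETCDF_EXTS = {".nc", ".nc4", ".cdf"}


def looks_remote(path: str) -> bool:
    # Assumption: a path counts as remote when it starts with "scheme://".
    return re.match(r"^[a-z][a-z0-9]*://", path) is not None


def strip_params(url: str) -> str:
    # Drop the query string and fragment: "data.nc?var=x#a" -> "data.nc".
    return url.split("?", 1)[0].split("#", 1)[0]


def remote_claims(url: str) -> dict[str, bool]:
    """Return which of the sketched backends would claim a remote URL."""
    lower = url.lower()
    # netcdf4: extension check after removing query parameters and fragments.
    netcdf4 = os.path.splitext(strip_params(url.rstrip("/")))[1] in NETCDF_EXTS
    # h5netcdf: extension check on the raw URL, so "data.nc?var=x" is rejected.
    h5netcdf = os.path.splitext(url)[1] in NETCDF_EXTS
    # pydap: explicit DAP scheme, or /dap2/ or /dap4/ somewhere in the URL.
    pydap = lower.startswith(("dap2://", "dap4://")) or (
        looks_remote(url) and ("/dap2/" in lower or "/dap4/" in lower)
    )
    # scipy only handles local files, so it never claims a remote URL.
    return {"netcdf4": netcdf4, "h5netcdf": h5netcdf, "scipy": False, "pydap": pydap}


for url in [
    "https://example.com/data.nc",
    "http://example.com/data.nc?var=temp",
    "dap2://opendap.nasa.gov/dataset",
]:
    print(url, remote_claims(url))
```

Because several backends can claim the same URL, the one that is actually used is the first claimant in the engine order described at the top of this section.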

Local File Resolution
~~~~~~~~~~~~~~~~~~~~~

For local files, each backend first tries to read the file's **magic number** (its first few bytes).
If the magic number **cannot be read** (e.g., the file doesn't exist or isn't readable), the backend
falls back to checking the file **extension**. If the magic number is readable but does not match a
format the backend supports, the backend returns ``False`` and does not fall back to the extension.
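
The magic-number-then-extension rule can be sketched as follows for a netcdf4-style check. This is a simplified illustration, not xarray's code: ``read_magic`` and ``guess_netcdf_local`` are hypothetical helpers, and treating every ``OSError`` as "cannot be read" is an assumption.

```python
import os

NETCDF_EXTS = {".nc", ".nc4", ".cdf"}
# netCDF3 files start with "CDF"; HDF5/netCDF4 files start with the HDF5 signature.
NETCDF_MAGIC = (b"CDF", b"\x89HDF\r\n\x1a\n")


def read_magic(path, count=8):
    """Return the first bytes of a file, or None if it cannot be read."""
    try:
        with open(path, "rb") as f:
            return f.read(count)
    except OSError:
        # Missing file, directory, or insufficient permissions.
        return None


def guess_netcdf_local(path) -> bool:
    magic = read_magic(path)
    if magic is not None:
        # Readable file: the magic number decides and the extension is ignored.
        return magic.startswith(NETCDF_MAGIC)
    # Unreadable: fall back to the extension.
    ext = os.path.splitext(str(path).rstrip("/"))[1]
    return ext in NETCDF_EXTS
```

Under this sketch a readable ``file.xyz`` that starts with ``CDF`` is accepted, while a readable ``file.nc`` with unrelated content is rejected, which matches the table rows below.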

.. list-table::
   :header-rows: 1
   :widths: 40 20 10 10 10 10

   * - File Path
     - Magic Number
     - :ref:`netcdf4 <io.netcdf>`
     - :ref:`h5netcdf <io.hdf5>`
     - :ref:`scipy <io.netcdf>`
     - :ref:`zarr <io.zarr>`
   * - ``/path/to/file.nc``
     - ``CDF\x01`` (netCDF3)
     - ✅
     - ❌
     - ✅
     - ❌
   * - ``/path/to/file.nc4``
     - ``\x89HDF\r\n\x1a\n`` (HDF5/netCDF4)
     - ✅
     - ✅
     - ❌
     - ❌
   * - ``/path/to/file.nc.gz``
     - ``\x1f\x8b`` + ``CDF`` inside
     - ❌
     - ❌
     - ✅
     - ❌
   * - ``/path/to/store.zarr/``
     - (directory)
     - ❌
     - ❌
     - ❌
     - ✅
   * - ``/path/to/file.nc``
     - *(no magic number)*
     - ✅
     - ✅
     - ✅
     - ❌
   * - ``/path/to/file.xyz``
     - ``CDF\x01`` (netCDF3)
     - ✅
     - ❌
     - ✅
     - ❌
   * - ``/path/to/file.xyz``
     - ``\x89HDF\r\n\x1a\n`` (HDF5/netCDF4)
     - ✅
     - ✅
     - ❌
     - ❌
   * - ``/path/to/file.xyz``
     - *(no magic number)*
     - ❌
     - ❌
     - ❌
     - ❌

.. note::
   Remote URLs ending in ``.nc`` are **ambiguous**:

   - They could be netCDF files stored on a remote HTTP server (readable by ``netcdf4`` or ``h5netcdf``)
   - They could be OPeNDAP/DAP endpoints (readable by ``netcdf4`` with DAP support or ``pydap``)

   These interpretations are fundamentally incompatible. If xarray's automatic
   selection chooses the wrong backend, you must explicitly specify the ``engine`` parameter:

   .. code-block:: python

       # Force interpretation as a DAP endpoint
       ds = xr.open_dataset("http://example.com/data.nc", engine="pydap")

       # Force interpretation as a remote netCDF file
       ds = xr.open_dataset("https://example.com/data.nc", engine="netcdf4")


.. _io.netcdf:

netCDF
@@ -1213,6 +1389,8 @@ See for example : `ncdata usage examples`_
.. _Ncdata: https://ncdata.readthedocs.io/en/latest/index.html
.. _ncdata usage examples: https://github.com/pp-mo/ncdata/tree/v0.1.2?tab=readme-ov-file#correct-a-miscoded-attribute-in-iris-input

.. _io.opendap:

OPeNDAP
-------

2 changes: 1 addition & 1 deletion doc/user-guide/options.rst
@@ -18,7 +18,7 @@ Xarray offers a small number of configuration options through :py:func:`set_opti

 2. Control behaviour during operations: ``arithmetic_join``, ``keep_attrs``, ``use_bottleneck``.
 3. Control colormaps for plots: ``cmap_divergent``, ``cmap_sequential``.
-4. Aspects of file reading: ``file_cache_maxsize``, ``warn_on_unclosed_files``.
+4. Aspects of file reading: ``file_cache_maxsize``, ``netcdf_engine_order``, ``warn_on_unclosed_files``.


 You can set these options either globally
8 changes: 7 additions & 1 deletion doc/whats-new.rst
@@ -2,6 +2,7 @@

.. _whats-new:


What's New
==========

@@ -28,6 +29,11 @@ Deprecations
Bug Fixes
~~~~~~~~~

- ``netcdf4`` and ``pydap`` backends now use stricter URL detection to avoid incorrectly claiming
  remote URLs. The ``pydap`` backend now only claims URLs with explicit DAP protocol indicators
  (``dap2://`` or ``dap4://`` schemes, or ``/dap2/`` or ``/dap4/`` in the URL path). This prevents
  both backends from claiming remote Zarr stores and other non-DAP URLs without an explicit
  ``engine=`` argument. (:pull:`10804`). By `Ian Hunt-Isaak <https://github.com/ianhi>`_.

Documentation
~~~~~~~~~~~~~
@@ -63,12 +69,12 @@ New features

Bug fixes
~~~~~~~~~

- Fix error raised when writing scalar variables to Zarr with ``region={}``
  (:pull:`10796`).
  By `Stephan Hoyer <https://github.com/shoyer>`_.



.. _whats-new.2025.09.1:

v2025.09.1 (September 29, 2025)
12 changes: 9 additions & 3 deletions xarray/backends/h5netcdf_.py
@@ -462,10 +462,16 @@ class H5netcdfBackendEntrypoint(BackendEntrypoint):
     supports_groups = True

     def guess_can_open(self, filename_or_obj: T_PathFileOrDataStore) -> bool:
+        from xarray.core.utils import is_remote_uri
+
         filename_or_obj = _normalize_filename_or_obj(filename_or_obj)
-        magic_number = try_read_magic_number_from_file_or_path(filename_or_obj)
-        if magic_number is not None:
-            return magic_number.startswith(b"\211HDF\r\n\032\n")
+
+        # Try to read magic number for local files only
+        is_remote = isinstance(filename_or_obj, str) and is_remote_uri(filename_or_obj)
+        if not is_remote:
+            magic_number = try_read_magic_number_from_file_or_path(filename_or_obj)
+            if magic_number is not None:
+                return magic_number.startswith(b"\211HDF\r\n\032\n")

         if isinstance(filename_or_obj, str | os.PathLike):
             _, ext = os.path.splitext(filename_or_obj)
Review comment (contributor author): I am intentionally not stripping any query parameters that might be present in a DAP query, so that h5netcdf does not claim it can open such a URL; my understanding is that it cannot.

39 changes: 27 additions & 12 deletions xarray/backends/netCDF4_.py
@@ -701,21 +701,36 @@ class NetCDF4BackendEntrypoint(BackendEntrypoint):
     supports_groups = True

     def guess_can_open(self, filename_or_obj: T_PathFileOrDataStore) -> bool:
-        if isinstance(filename_or_obj, str) and is_remote_uri(filename_or_obj):
-            return True
+        # Helper to check if magic number is netCDF or HDF5
+        def _is_netcdf_magic(magic: bytes) -> bool:
+            return magic.startswith((b"CDF", b"\211HDF\r\n\032\n"))
+
+        # Helper to check if extension is netCDF
+        def _has_netcdf_ext(path: str | os.PathLike, is_remote: bool = False) -> bool:
+            from xarray.core.utils import strip_uri_params
+
+            path = str(path).rstrip("/")
+            # For remote URIs, strip query parameters and fragments
+            if is_remote:
+                path = strip_uri_params(path)
+            _, ext = os.path.splitext(path)
+            return ext in {".nc", ".nc4", ".cdf"}

-        magic_number = (
-            bytes(filename_or_obj[:8])
-            if isinstance(filename_or_obj, bytes | memoryview)
-            else try_read_magic_number_from_path(filename_or_obj)
-        )
-        if magic_number is not None:
-            # netcdf 3 or HDF5
-            return magic_number.startswith((b"CDF", b"\211HDF\r\n\032\n"))
+        if isinstance(filename_or_obj, str) and is_remote_uri(filename_or_obj):
+            # For remote URIs, check extension (accounting for query params/fragments)
+            # Remote netcdf-c can handle both regular URLs and DAP URLs
+            return _has_netcdf_ext(filename_or_obj, is_remote=True)

         if isinstance(filename_or_obj, str | os.PathLike):
-            _, ext = os.path.splitext(filename_or_obj)
-            return ext in {".nc", ".nc4", ".cdf"}
+            # For local paths, check magic number first, then extension
+            magic_number = try_read_magic_number_from_path(filename_or_obj)
+            if magic_number is not None:
+                return _is_netcdf_magic(magic_number)
+            # No magic number available, fallback to extension
+            return _has_netcdf_ext(filename_or_obj)
+
+        if isinstance(filename_or_obj, bytes | memoryview):
+            return _is_netcdf_magic(bytes(filename_or_obj[:8]))

         return False

19 changes: 18 additions & 1 deletion xarray/backends/pydap_.py
@@ -1,5 +1,6 @@
 from __future__ import annotations

+import os
 from collections.abc import Iterable
 from typing import TYPE_CHECKING, Any

@@ -209,7 +210,23 @@ class PydapBackendEntrypoint(BackendEntrypoint):
     url = "https://docs.xarray.dev/en/stable/generated/xarray.backends.PydapBackendEntrypoint.html"

     def guess_can_open(self, filename_or_obj: T_PathFileOrDataStore) -> bool:
-        return isinstance(filename_or_obj, str) and is_remote_uri(filename_or_obj)
+        if not isinstance(filename_or_obj, str):
+            return False
+
+        # Check for explicit DAP protocol indicators:
+        # 1. DAP scheme: dap2:// or dap4:// (case-insensitive, may not be recognized by is_remote_uri)
+        # 2. Remote URI with /dap2/ or /dap4/ in URL path (case-insensitive)
+        # Note: We intentionally do NOT check for .dap suffix as that would match
+        # file extensions like .dap which trigger downloads of binary data
+        url_lower = filename_or_obj.lower()
+        if url_lower.startswith(("dap2://", "dap4://")):

Review comment (contributor author): @Mikejmnez is it ok that this will accept both DAP2:// and dap2://?

Reply (contributor): Yes, I tried it and it works with pydap. The same holds for DAP4 vs. dap4.

+            return True
+
+        # For standard remote URIs, check for DAP indicators in path
+        if is_remote_uri(filename_or_obj):
+            return "/dap2/" in url_lower or "/dap4/" in url_lower
+
+        return False

     def open_dataset(
         self,
19 changes: 14 additions & 5 deletions xarray/backends/scipy_.py
@@ -330,12 +330,12 @@ class ScipyBackendEntrypoint(BackendEntrypoint):
     """
     Backend for netCDF files based on the scipy package.

-    It can open ".nc", ".nc4", ".cdf" and ".gz" files but will only be
+    It can open ".nc", ".cdf", and ".nc.gz" files but will only be
     selected as the default if the "netcdf4" and "h5netcdf" engines are
     not available. It has the advantage that it is a lightweight engine
     that has no system requirements (unlike netcdf4 and h5netcdf).

-    Additionally it can open gizp compressed (".gz") files.
+    Additionally it can open gzip compressed (".gz") files.

     For more information about the underlying library, visit:
     https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.netcdf_file.html
@@ -347,14 +347,21 @@ class ScipyBackendEntrypoint(BackendEntrypoint):
     backends.H5netcdfBackendEntrypoint
     """

-    description = "Open netCDF files (.nc, .nc4, .cdf and .gz) using scipy in Xarray"
+    description = "Open netCDF files (.nc, .cdf and .nc.gz) using scipy in Xarray"
     url = "https://docs.xarray.dev/en/stable/generated/xarray.backends.ScipyBackendEntrypoint.html"

     def guess_can_open(
         self,
         filename_or_obj: T_PathFileOrDataStore,
     ) -> bool:
+        from xarray.core.utils import is_remote_uri
+
         filename_or_obj = _normalize_filename_or_obj(filename_or_obj)
+
+        # scipy can only handle local files - check this before trying to read magic number
+        if isinstance(filename_or_obj, str) and is_remote_uri(filename_or_obj):
+            return False
+
         magic_number = try_read_magic_number_from_file_or_path(filename_or_obj)
         if magic_number is not None and magic_number.startswith(b"\x1f\x8b"):
             with gzip.open(filename_or_obj) as f:  # type: ignore[arg-type]
@@ -363,8 +370,10 @@ def guess_can_open(
             return magic_number.startswith(b"CDF")

         if isinstance(filename_or_obj, str | os.PathLike):
-            _, ext = os.path.splitext(filename_or_obj)
-            return ext in {".nc", ".nc4", ".cdf", ".gz"}
+            from pathlib import Path
+
+            suffix = "".join(Path(filename_or_obj).suffixes)
+            return suffix in {".nc", ".cdf", ".nc.gz"}

         return False
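
The effect of switching the scipy extension fallback from ``os.path.splitext`` to joined ``Path.suffixes`` can be seen in isolation with the sketch below. ``ext_splitext`` and ``ext_suffixes`` are hypothetical helper names used only for this comparison; they are not part of xarray.

```python
import os
from pathlib import Path


def ext_splitext(path: str) -> str:
    # Only the last suffix is returned: "file.nc.gz" -> ".gz".
    return os.path.splitext(path)[1]


def ext_suffixes(path: str) -> str:
    # All suffixes of the final path component are joined: "file.nc.gz" -> ".nc.gz".
    return "".join(Path(path).suffixes)


for name in ["file.nc", "file.nc.gz", "archive.tar.gz", "run.v2.nc"]:
    print(f"{name}: splitext={ext_splitext(name)!r}, suffixes={ext_suffixes(name)!r}")
```

With the joined suffix, an unrelated ``archive.tar.gz`` no longer looks like a gzipped netCDF file. A possible side effect, if the fallback works as sketched here, is that a name with extra dots such as ``run.v2.nc`` yields ``.v2.nc`` and would not match the extension set; this only matters when the magic number cannot be read.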
