
netcdf4 backend claims **all** remote files - preventing reading zarr #10801

@ianhi

Description


What happened?

If you point xr.open_datatree at a remote URL while netcdf4 is installed, the netcdf4 backend claims the URL and tries to open it, even when the URL is a Zarr store.

What did you expect to happen?

I expected it to be opened with the zarr backend rather than the netcdf4 backend (in this case I expect the zarr backend to fail).
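When no engine is given, xarray picks one by asking each installed backend, in order, whether it can open the input, and it uses the first backend that answers yes. The sketch below is a simplified model of that first-match loop, assuming (as the traceback shows happening) that netcdf4 is asked before zarr. `Netcdf4Like`, `ZarrLike`, `is_remote` and `guess_engine_sketch` are hypothetical stand-ins, not xarray's API; `is_remote` only checks http(s), whereas xarray's `is_remote_uri` matches more schemes.

```python
from urllib.parse import urlparse


def is_remote(path: str) -> bool:
    # Hypothetical, simplified stand-in for xarray's is_remote_uri.
    return urlparse(path).scheme in ("http", "https")


class Netcdf4Like:
    """Hypothetical backend mirroring the reported check: any remote URL is claimed."""

    def guess_can_open(self, path: str) -> bool:
        if is_remote(path):
            return True
        return path.endswith((".nc", ".nc4", ".cdf"))


class ZarrLike:
    """Hypothetical backend that recognises Zarr stores by their suffix."""

    def guess_can_open(self, path: str) -> bool:
        return path.rstrip("/").endswith(".zarr")


def guess_engine_sketch(path, engines):
    # The first backend whose guess_can_open returns True wins;
    # backends later in the order are never consulted.
    for name, backend in engines.items():
        if backend.guess_can_open(path):
            return name
    raise ValueError(f"no engine found for {path!r}")


# Dicts keep insertion order, so netcdf4 is asked first, as in the report.
engines = {"netcdf4": Netcdf4Like(), "zarr": ZarrLike()}
url = "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.5/idr0062A/6001240_labels.zarr"
print(guess_engine_sketch(url, engines))  # netcdf4
```

Because the remote check comes first and returns True unconditionally, the zarr backend never gets a chance to claim the .zarr URL.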

Minimal Complete Verifiable Example

```python
# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "xarray[complete]@git+https://github.com/pydata/xarray.git@main",
#   "zarr>=2.18.0",
#   "numpy>=1.24.0",
# ]
# ///
#
# This script automatically imports the development branch of xarray to check for issues.
# Please delete this header if you have _not_ tested this script with `uv run`!

"""Download and test loading OME-Zarr example data."""

import xarray as xr

xr.show_versions()

url = "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.5/idr0062A/6001240_labels.zarr"
remote_dt = xr.open_datatree(url)
print(remote_dt)
```

Steps to reproduce

Run the script above with uv run.

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

syntax error, unexpected WORD_WORD, expecting SCAN_ATTR or SCAN_DATASET or SCAN_ERROR
context: <?xml^ version="1.0" encoding="UTF-8"?><Error><Code>NoSuchKey</Code><BucketName>idr</BucketName><RequestId>tx0000000000000131f743f-0068dbfc97-7518e06c-default</RequestId><HostId>7518e06c-default-default</HostId></Error>
Traceback (most recent call last):
  File "/Users/ian/Documents/dev/xarray/xarray/backends/file_manager.py", line 219, in _acquire_with_cache_info
    file = self._cache[self._key]
           ~~~~~~~~~~~^^^^^^^^^^^
  File "/Users/ian/Documents/dev/xarray/xarray/backends/lru_cache.py", line 56, in __getitem__
    value = self._cache[key]
            ~~~~~~~~~~~^^^^^
KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.5/idr0062A/6001240_labels.zarr',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False)), '6a4d8a9f-b9b0-44b1-8fed-a0d8f5bd69bb']

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/ian/Documents/dev/xarray/repro.py", line 22, in <module>
    remote_dt = xr.open_datatree(url)
  File "/Users/ian/Documents/dev/xarray/xarray/backends/api.py", line 1066, in open_datatree
    backend_tree = backend.open_datatree(
        filename_or_obj,
    ...<2 lines>...
        **kwargs,
    )
  File "/Users/ian/Documents/dev/xarray/xarray/backends/netCDF4_.py", line 792, in open_datatree
    groups_dict = self.open_groups_as_dict(
        filename_or_obj,
    ...<15 lines>...
        **kwargs,
    )
  File "/Users/ian/Documents/dev/xarray/xarray/backends/netCDF4_.py", line 839, in open_groups_as_dict
    store = NetCDF4DataStore.open(
        filename_or_obj,
    ...<7 lines>...
        autoclose=autoclose,
    )
  File "/Users/ian/Documents/dev/xarray/xarray/backends/netCDF4_.py", line 524, in open
    return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
  File "/Users/ian/Documents/dev/xarray/xarray/backends/netCDF4_.py", line 428, in __init__
    self.format = self.ds.data_model
                  ^^^^^^^
  File "/Users/ian/Documents/dev/xarray/xarray/backends/netCDF4_.py", line 533, in ds
    return self._acquire()
           ~~~~~~~~~~~~~^^
  File "/Users/ian/Documents/dev/xarray/xarray/backends/netCDF4_.py", line 527, in _acquire
    with self._manager.acquire_context(needs_lock) as root:
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
  File "/Users/ian/.local/share/uv/python/cpython-3.13.0-macos-aarch64-none/lib/python3.13/contextlib.py", line 141, in __enter__
    return next(self.gen)
  File "/Users/ian/Documents/dev/xarray/xarray/backends/file_manager.py", line 207, in acquire_context
    file, cached = self._acquire_with_cache_info(needs_lock)
                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
  File "/Users/ian/Documents/dev/xarray/xarray/backends/file_manager.py", line 225, in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
  File "src/netCDF4/_netCDF4.pyx", line 2521, in netCDF4._netCDF4.Dataset.__init__
  File "src/netCDF4/_netCDF4.pyx", line 2158, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -90] NetCDF: file not found: 'https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.5/idr0062A/6001240_labels.zarr'

Anything else we need to know?

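This seems to be caused by these lines in NetCDF4BackendEntrypoint.guess_can_open (xarray/backends/netCDF4_.py), which return True for any remote URI string:

```python
def guess_can_open(self, filename_or_obj: T_PathFileOrDataStore) -> bool:
    if isinstance(filename_or_obj, str) and is_remote_uri(filename_or_obj):
        return True
```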
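One possible direction, offered only as a sketch and not as xarray's actual implementation: keep claiming remote URLs (so OPeNDAP endpoints without a file suffix still work), but decline when the URL's suffix clearly belongs to another backend, so later backends such as zarr get asked. `netcdf4_guess_can_open_sketch`, `NETCDF_EXTS` and `FOREIGN_EXTS` are hypothetical names, and treating only http(s) as remote is a simplifying assumption.

```python
import os
from urllib.parse import urlparse

# Suffixes the sketch treats as netCDF for local paths.
NETCDF_EXTS = {".nc", ".nc4", ".cdf"}
# Hypothetical: suffixes that clearly belong to another backend.
FOREIGN_EXTS = {".zarr"}


def netcdf4_guess_can_open_sketch(filename_or_obj) -> bool:
    """Hypothetical narrower check; not xarray's actual implementation."""
    if not isinstance(filename_or_obj, str):
        return False
    parsed = urlparse(filename_or_obj)
    # Inspect only the path component so query strings don't hide the suffix.
    path = parsed.path if parsed.scheme else filename_or_obj
    _, ext = os.path.splitext(path.rstrip("/"))
    if parsed.scheme in ("http", "https"):
        # Still claim remote URLs (e.g. OPeNDAP endpoints), unless the
        # suffix says the target is a format another backend handles.
        return ext not in FOREIGN_EXTS
    return ext in NETCDF_EXTS


zarr_url = "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.5/idr0062A/6001240_labels.zarr"
print(netcdf4_guess_can_open_sketch(zarr_url))  # False
```

The trade-off is that a deny-list only helps for suffixes someone has thought to add; a remote Zarr store without a .zarr suffix would still be claimed by netcdf4.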

Environment
