TypeError: xarray.backends.api.open_dataset() got multiple values for keyword argument 'engine' #17

@weiji14

Opening a STAC asset whose xarray:open_kwargs field already sets engine results in the engine argument being passed twice, which raises a TypeError.

Here's a minimal working example using one of the CIL Global Downscaled Projections for Climate Impacts Research Zarr datasets on Planetary Computer, adapted from https://discourse.pangeo.io/t/stac-and-earth-systems-datasets/1472/24.

import pystac_client
import xarray as xr

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1/",
)
search: pystac_client.ItemSearch = catalog.search(
    collections=["cil-gdpcir-cc-by"],
    query={"cmip6:source_id": {"eq": "NESM3"}, "cmip6:experiment_id": {"eq": "ssp585"}},
)

asset = search.item_collection()[0].assets["tasmax"]
print(asset.extra_fields)
# {
#     "cmip6:grid": "T63",
#     "msft:https-url": "https://rhgeuwest.blob.core.windows.net/cil-gdpcir/ScenarioMIP/NUIST/NESM3/ssp585/r1i1p1f1/day/tasmax/v1.1.zarr",
#     "cmip6:grid_label": "gn",
#     "cmip6:tracking_id": "hdl:21.14100/01f58d9c-1317-467e-813d-a2358cdf7954\\nhdl:21.14100/e027cda7-ac28-4e00-b03f-53ad2e6d30b4\\nhdl:21.14100/3898bc79-5174-4fcc-9637-4573914ad548",
#     "cmip6:variable_id": "tasmax",
#     "xarray:open_kwargs": {
#         "chunks": {},
#         "engine": "zarr",
#         "consolidated": True,
#         "storage_options": {"account_name": "rhgeuwest"},
#     },
#     "cmip6:creation_date": "2019-08-11T09:50:24Z",
# }

ds = xr.open_dataset(filename_or_obj=asset)

This results in the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[73], line 1
----> 1 ds = xr.open_dataset(filename_or_obj=asset)

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/xarray/backends/api.py:495, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    483 decoders = _resolve_decoders_kwargs(
    484     decode_cf,
    485     open_backend_dataset_parameters=backend.open_dataset_parameters,
   (...)
    491     decode_coords=decode_coords,
    492 )
    494 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 495 backend_ds = backend.open_dataset(
    496     filename_or_obj,
    497     drop_variables=drop_variables,
    498     **decoders,
    499     **kwargs,
    500 )
    501 ds = _dataset_from_backend_dataset(
    502     backend_ds,
    503     filename_or_obj,
   (...)
    510     **kwargs,
    511 )
    512 return ds

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/xpystac/xarray_plugin.py:17, in STACBackend.open_dataset(self, obj, drop_variables, **kwargs)
     10 def open_dataset(
     11     self,
     12     obj,
   (...)
     15     **kwargs,
     16 ):
---> 17     return to_xarray(obj, drop_variables=drop_variables, **kwargs)

File ~/mambaforge/envs/zen3geo/lib/python3.10/functools.py:889, in singledispatch.<locals>.wrapper(*args, **kw)
    885 if not args:
    886     raise TypeError(f'{funcname} requires at least '
    887                     '1 positional argument')
--> 889 return dispatch(args[0].__class__)(*args, **kw)

File ~/mambaforge/envs/zen3geo/lib/python3.10/site-packages/xpystac/core.py:67, in _(obj, **kwargs)
     64 else:
     65     default_kwargs = {}
---> 67 ds = xarray.open_dataset(obj.href, **default_kwargs, **open_kwargs, **kwargs)
     68 return ds

TypeError: xarray.backends.api.open_dataset() got multiple values for keyword argument 'engine'
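
The same failure can be reproduced without network access, since it comes down to unpacking two dicts that share a key into a single call. Below is a minimal sketch: `fake_open_dataset` is a hypothetical stand-in for `xarray.open_dataset`, and the contents of `default_kwargs` are an assumption about what xpystac builds for a Zarr asset, not a copy of its code.

```python
def fake_open_dataset(href, engine=None, **kwargs):
    """Hypothetical stand-in for xarray.open_dataset; it only echoes its arguments."""
    return {"href": href, "engine": engine, **kwargs}


# Assumption: xpystac sets a default engine for Zarr assets.
default_kwargs = {"engine": "zarr"}
# Mirrors the asset's "xarray:open_kwargs" field shown above.
open_kwargs = {"chunks": {}, "engine": "zarr", "consolidated": True}

error_message = None
try:
    # Both dicts contain "engine", so Python refuses to build the call.
    fake_open_dataset("v1.1.zarr", **default_kwargs, **open_kwargs)
except TypeError as err:
    error_message = str(err)
    print(error_message)
```

The error is raised by Python while assembling the keyword arguments, before the called function runs, so it occurs even when both dicts agree on the value of `engine`.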

A suggested fix is to check that engine is not already set in open_kwargs before setting it in default_kwargs here:

https://github.com/jsignell/xpystac/blob/051d0ac15b42a60dfc330b6cacf5e8653d4edbd9/xpystac/core.py#L33-L68
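
One possible way to implement that check is to merge the dicts before the call, so that each key is passed only once. This is a rough sketch, not necessarily how xpystac will resolve it: `merge_open_kwargs` is a hypothetical helper, and the precedence (defaults, then the asset's `xarray:open_kwargs`, then the caller's kwargs) is an assumption.

```python
def merge_open_kwargs(default_kwargs, open_kwargs, user_kwargs):
    """Hypothetical helper: later dicts override earlier ones, so no key is passed twice."""
    return {**default_kwargs, **open_kwargs, **user_kwargs}


# Same inputs that trigger the TypeError when unpacked separately.
merged = merge_open_kwargs(
    {"engine": "zarr"},
    {"chunks": {}, "engine": "zarr", "consolidated": True},
    {},
)
print(merged)

# A caller-supplied value takes precedence over both other sources.
overridden = merge_open_kwargs(
    {"engine": "zarr"},
    {"engine": "zarr"},
    {"engine": "h5netcdf"},
)
print(overridden)
```

The merged dict could then be unpacked once, as in `xarray.open_dataset(obj.href, **merged)`.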

This was run using xpystac=0.0.1. I'll spend some time to submit a quick bugfix 😃
