Dataset.to_zarr append_dim behaviour is inconsistent with xr.concat for new dimensions. #9892

@owenlittlejohns

Description

What happened?

This issue relates to #9858, and captures the Dataset.to_zarr append_dim behaviour noted by @TomNicholas in this comment.

What did you expect to happen?

Dataset.to_zarr with append_dim should be consistent with xr.concat, which does not raise an error when the concatenation dimension does not already exist. For empty Dataset objects, this gives the result in the example Tom provided, with no time dimension, because no variables use that time dimension. For Dataset objects with variables, appending to the store along a new dimension should introduce a new, single-element dimension on the contained variables, and the output variables should be concatenated along that new dimension.
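For reference, this is the xr.concat behaviour that to_zarr's append_dim is expected to match (a minimal sketch using the same two datasets as the example below):

```python
import numpy as np
import xarray as xr

# Two datasets sharing lat/lon, with no existing "time" dimension.
ds_one = xr.Dataset(
    data_vars={"temp": (["lat", "lon"], np.array([[270, 271, 270], [273, 272, 272]]))},
    coords={"lat": [10, 20], "lon": [-20, -10, 0]},
)
ds_two = xr.Dataset(
    data_vars={"temp": (["lat", "lon"], np.array([[271, 272, 271], [274, 273, 273]]))},
    coords={"lat": [10, 20], "lon": [-20, -10, 0]},
)

# xr.concat creates the new "time" dimension (length 2) rather than raising,
# which is the behaviour append_dim is argued to be inconsistent with.
combined = xr.concat([ds_one, ds_two], dim="time")
print(combined.sizes)
```

Here `combined["temp"]` gains a leading time dimension of length 2, with no error raised.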

Minimal Complete Verifiable Example

import numpy as np
import xarray as xr

# Create Datasets
ds_one = xr.Dataset(
    data_vars={"temp": (["lat", "lon"], np.array([[270, 271, 270], [273, 272, 272]]))},
    coords={"lat": [10, 20], "lon": [-20, -10, 0]},
)

ds_two = xr.Dataset(
    data_vars={"temp": (["lat", "lon"], np.array([[271, 272, 271], [274, 273, 273]]))},
    coords={"lat": [10, 20], "lon": [-20, -10, 0]},
)

ds_one.to_zarr("ds.zarr")
ds_two.to_zarr("ds.zarr", append_dim="time")

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

File ~/Documents/git/pydata/xarray/xarray/core/dataset.py:2622, in Dataset.to_zarr(self, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version, zarr_format, write_empty_chunks, chunkmanager_store_kwargs)
   2454 """Write dataset contents to a zarr group.
   2455
   2456 Zarr chunks are determined in the following way:
   (...)
   2618     The I/O user guide, with more details and examples.
   2619 """
   2620 from xarray.backends.api import to_zarr
-> 2622 return to_zarr(  # type: ignore[call-overload,misc]
   2623     self,
   2624     store=store,
   2625     chunk_store=chunk_store,
   2626     storage_options=storage_options,
   2627     mode=mode,
   2628     synchronizer=synchronizer,
   2629     group=group,
   2630     encoding=encoding,
   2631     compute=compute,
   2632     consolidated=consolidated,
   2633     append_dim=append_dim,
   2634     region=region,
   2635     safe_chunks=safe_chunks,
   2636     zarr_version=zarr_version,
   2637     zarr_format=zarr_format,
   2638     write_empty_chunks=write_empty_chunks,
   2639     chunkmanager_store_kwargs=chunkmanager_store_kwargs,
   2640 )

File ~/Documents/git/pydata/xarray/xarray/backends/api.py:2184, in to_zarr(dataset, store, chunk_store, mode, synchronizer, group, encoding, compute, consolidated, append_dim, region, safe_chunks, storage_options, zarr_version, zarr_format, write_empty_chunks, chunkmanager_store_kwargs)
   2182 writer = ArrayWriter()
   2183 # TODO: figure out how to properly handle unlimited_dims
-> 2184 dump_to_store(dataset, zstore, writer, encoding=encoding)
   2185 writes = writer.sync(
   2186     compute=compute, chunkmanager_store_kwargs=chunkmanager_store_kwargs
   2187 )
   2189 if compute:

File ~/Documents/git/pydata/xarray/xarray/backends/api.py:1920, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1917 if encoder:
   1918     variables, attrs = encoder(variables, attrs)
-> 1920 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File ~/Documents/git/pydata/xarray/xarray/backends/zarr.py:907, in ZarrStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    905     existing_dims = self.get_dimensions()
    906     if self._append_dim not in existing_dims:
--> 907         raise ValueError(
    908             f"append_dim={self._append_dim!r} does not match any existing "
    909             f"dataset dimensions {existing_dims}"
    910         )
    912 variables_encoded, attributes = self.encode(
    913     {vn: variables[vn] for vn in new_variable_names}, attributes
    914 )
    916 if existing_variable_names:
    917     # We make sure that values to be appended are encoded *exactly*
    918     # as the current values in the store.
    919     # To do so, we decode variables directly to access the proper encoding,
    920     # without going via xarray.Dataset to avoid needing to load
    921     # index variables into memory.

ValueError: append_dim='time' does not match any existing dataset dimensions {'lat': 2, 'lon': 3}

Anything else we need to know?

No response

Environment

(The output of xr.show_versions() was not captured; the original report pasted the show_versions function object rather than the result of calling it.)
