Commit

Merge branch 'main' into optimize-zarr-appends
dcherian committed Mar 28, 2024
2 parents e0a3e10 + cf36559 commit 0dc71d4
Showing 36 changed files with 563 additions and 452 deletions.
6 changes: 4 additions & 2 deletions doc/user-guide/indexing.rst
@@ -748,7 +748,7 @@ Whether array indexing returns a view or a copy of the underlying
 data depends on the nature of the labels.

 For positional (integer)
-indexing, xarray follows the same rules as NumPy:
+indexing, xarray follows the same `rules`_ as NumPy:

 * Positional indexing with only integers and slices returns a view.
 * Positional indexing with arrays or lists returns a copy.
@@ -765,8 +765,10 @@ Whether data is a copy or a view is more predictable in xarray than in pandas, s
 unlike pandas, xarray does not produce `SettingWithCopy warnings`_. However, you
 should still avoid assignment with chained indexing.

-.. _SettingWithCopy warnings: https://pandas.pydata.org/pandas-docs/stable/indexing.html#returning-a-view-versus-a-copy
+Note that other operations (such as :py:meth:`~xarray.DataArray.values`) may also return views rather than copies.
+
+.. _SettingWithCopy warnings: https://pandas.pydata.org/pandas-docs/stable/indexing.html#returning-a-view-versus-a-copy
+.. _rules: https://numpy.org/doc/stable/user/basics.copies.html

 .. _multi-level indexing:

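The view-versus-copy rules that this documentation change links to can be checked directly in NumPy. The following is a minimal sketch, assuming only that NumPy is installed; the variable names are illustrative and do not come from the xarray documentation.

```python
import numpy as np

base = np.arange(6)

# Basic indexing (integers and slices) returns a view onto the same buffer.
sliced = base[1:4]
# Indexing with a list or array of positions returns a fresh copy.
picked = base[[1, 2, 3]]

sliced[0] = 100  # writes through to `base`
picked[1] = -1   # only changes the copy

shares_sliced = np.shares_memory(base, sliced)
shares_picked = np.shares_memory(base, picked)
```

Writing through `sliced` changes `base`, while writing to `picked` leaves it untouched, which is why assignment through chained indexing can silently have no effect.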
22 changes: 16 additions & 6 deletions doc/whats-new.rst
@@ -23,21 +23,26 @@ v2024.03.0 (unreleased)
 New Features
 ~~~~~~~~~~~~

+- Grouped and resampling quantile calculations now use the vectorized algorithm in ``flox>=0.9.4`` if present.
+  By `Deepak Cherian <https://github.com/dcherian>`_.
 - Do not broadcast in arithmetic operations when global option ``arithmetic_broadcast=False``
   (:issue:`6806`, :pull:`8784`).
   By `Etienne Schalk <https://github.com/etienneschalk>`_ and `Deepak Cherian <https://github.com/dcherian>`_.
 - Add the ``.oindex`` property to Explicitly Indexed Arrays for orthogonal indexing functionality. (:issue:`8238`, :pull:`8750`)
   By `Anderson Banihirwe <https://github.com/andersy005>`_.
-
 - Add the ``.vindex`` property to Explicitly Indexed Arrays for vectorized indexing functionality. (:issue:`8238`, :pull:`8780`)
   By `Anderson Banihirwe <https://github.com/andersy005>`_.
-
+- Expand use of ``.oindex`` and ``.vindex`` properties. (:pull: `8790`)
+  By `Anderson Banihirwe <https://github.com/andersy005>`_ and `Deepak Cherian <https://github.com/dcherian>`_.
 - Allow creating :py:class:`xr.Coordinates` objects with no indexes (:pull:`8711`)
   By `Benoit Bovy <https://github.com/benbovy>`_ and `Tom Nicholas
   <https://github.com/TomNicholas>`_.

 Breaking changes
 ~~~~~~~~~~~~~~~~

+- Don't allow overwriting index variables with ``to_zarr`` region writes. (:issue:`8589`, :pull:`8876`).
+  By `Deepak Cherian <https://github.com/dcherian>`_.
+
 Deprecations
 ~~~~~~~~~~~~
@@ -57,15 +62,17 @@ Bug fixes
   `CFMaskCoder`/`CFScaleOffsetCoder` (:issue:`2304`, :issue:`5597`,
   :issue:`7691`, :pull:`8713`, see also discussion in :pull:`7654`).
   By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.
-- do not cast `_FillValue`/`missing_value` in `CFMaskCoder` if `_Unsigned` is provided
+- Do not cast `_FillValue`/`missing_value` in `CFMaskCoder` if `_Unsigned` is provided
   (:issue:`8844`, :pull:`8852`).
 - Adapt handling of copy keyword argument for numpy >= 2.0dev
-  (:issue:`8844`, :pull:`8851`, :pull:`8865``).
+  (:issue:`8844`, :pull:`8851`, :pull:`8865`).
   By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.
-- import trapz/trapezoid depending on numpy version.
+- Import trapz/trapezoid depending on numpy version
   (:issue:`8844`, :pull:`8865`).
   By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.
-
+- Warn and return bytes undecoded in case of UnicodeDecodeError in h5netcdf-backend
+  (:issue:`5563`, :pull:`8874`).
+  By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.


 Documentation
@@ -77,6 +84,9 @@ Internal Changes
 - Migrates ``treenode`` functionality into ``xarray/core`` (:pull:`8757`)
   By `Matt Savoie <https://github.com/flamingbear>`_ and `Tom Nicholas
   <https://github.com/TomNicholas>`_.
+- Migrates ``datatree`` functionality into ``xarray/core``. (:pull: `8789`)
+  By `Owen Littlejohns <https://github.com/owenlittlejohns>`_, `Matt Savoie
+  <https://github.com/flamingbear>`_ and `Tom Nicholas <https://github.com/TomNicholas>`_.


 .. _whats-new.2024.02.0:
20 changes: 6 additions & 14 deletions xarray/backends/api.py
@@ -69,7 +69,7 @@
     T_NetcdfTypes = Literal[
         "NETCDF4", "NETCDF4_CLASSIC", "NETCDF3_64BIT", "NETCDF3_CLASSIC"
     ]
-    from xarray.datatree_.datatree import DataTree
+    from xarray.core.datatree import DataTree

 DATAARRAY_NAME = "__xarray_dataarray_name__"
 DATAARRAY_VARIABLE = "__xarray_dataarray_variable__"
@@ -1562,24 +1562,19 @@ def _auto_detect_regions(ds, region, open_kwargs):
     return region


-def _validate_and_autodetect_region(
-    ds, region, mode, open_kwargs
-) -> tuple[dict[str, slice], bool]:
+def _validate_and_autodetect_region(ds, region, mode, open_kwargs) -> dict[str, slice]:
     if region == "auto":
         region = {dim: "auto" for dim in ds.dims}

     if not isinstance(region, dict):
         raise TypeError(f"``region`` must be a dict, got {type(region)}")

     if any(v == "auto" for v in region.values()):
-        region_was_autodetected = True
         if mode != "r+":
             raise ValueError(
                 f"``mode`` must be 'r+' when using ``region='auto'``, got {mode}"
             )
         region = _auto_detect_regions(ds, region, open_kwargs)
-    else:
-        region_was_autodetected = False

     for k, v in region.items():
         if k not in ds.dims:
@@ -1612,7 +1607,7 @@ def _validate_and_autodetect_region(
             f".drop_vars({non_matching_vars!r})"
         )

-    return region, region_was_autodetected
+    return region


 def _validate_datatypes_for_zarr_append(zstore, dataset):
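The simplified control flow of `_validate_and_autodetect_region` after this change can be sketched without xarray or zarr. This is only an illustration under stated assumptions. `validate_region` and its `detect` callback are hypothetical stand-ins (the callback replaces `_auto_detect_regions`, which needs an open store), and the error raised for unknown dimensions is assumed because that part of the hunk is cut off.

```python
def validate_region(dims, region, mode, detect=None):
    """Hypothetical, store-free stand-in for the validation steps in the hunk."""
    if region == "auto":
        region = {dim: "auto" for dim in dims}

    if not isinstance(region, dict):
        raise TypeError(f"``region`` must be a dict, got {type(region)}")

    if any(v == "auto" for v in region.values()):
        if mode != "r+":
            raise ValueError(
                f"``mode`` must be 'r+' when using ``region='auto'``, got {mode}"
            )
        # Assumption: `detect` maps a dimension name to a slice, standing in
        # for the lookup against the existing store.
        region = {k: detect(k) if v == "auto" else v for k, v in region.items()}

    for k in region:
        if k not in dims:
            # Assumption: the exception type and message are illustrative.
            raise ValueError(f"region key {k!r} is not a dataset dimension")

    # Only the region is returned; no "was autodetected" flag is needed.
    return region


resolved = validate_region(
    ("x", "y"),
    {"x": slice(0, 2), "y": "auto"},
    mode="r+",
    detect=lambda dim: slice(0, 3),
)
```

Because the caller now drops index variables for every region write, it no longer needs to know whether the region was autodetected, which is why the boolean disappears from the return value.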
@@ -1784,12 +1779,9 @@ def to_zarr(
             storage_options=storage_options,
             zarr_version=zarr_version,
         )
-        region, region_was_autodetected = _validate_and_autodetect_region(
-            dataset, region, mode, open_kwargs
-        )
-        # drop indices to avoid potential race condition with auto region
-        if region_was_autodetected:
-            dataset = dataset.drop_vars(dataset.indexes)
+        region = _validate_and_autodetect_region(dataset, region, mode, open_kwargs)
+        # can't modify indexed with region writes
+        dataset = dataset.drop_vars(dataset.indexes)
         if append_dim is not None and append_dim in region:
             raise ValueError(
                 f"cannot list the same dimension in both ``append_dim`` and "
4 changes: 2 additions & 2 deletions xarray/backends/common.py
@@ -23,8 +23,8 @@
     from netCDF4 import Dataset as ncDataset

     from xarray.core.dataset import Dataset
+    from xarray.core.datatree import DataTree
     from xarray.core.types import NestedSequence
-    from xarray.datatree_.datatree import DataTree

 # Create a logger object, but don't add any handlers. Leave that to user code.
 logger = logging.getLogger(__name__)
@@ -137,8 +137,8 @@ def _open_datatree_netcdf(
     **kwargs,
 ) -> DataTree:
     from xarray.backends.api import open_dataset
+    from xarray.core.datatree import DataTree
     from xarray.core.treenode import NodePath
-    from xarray.datatree_.datatree import DataTree

     ds = open_dataset(filename_or_obj, **kwargs)
     tree_root = DataTree.from_dict({"/": ds})
21 changes: 12 additions & 9 deletions xarray/backends/h5netcdf_.py
@@ -28,6 +28,7 @@
 from xarray.core import indexing
 from xarray.core.utils import (
     FrozenDict,
+    emit_user_level_warning,
     is_remote_uri,
     read_magic_number_from_file,
     try_read_magic_number_from_file_or_path,
@@ -39,7 +40,7 @@

     from xarray.backends.common import AbstractDataStore
     from xarray.core.dataset import Dataset
-    from xarray.datatree_.datatree import DataTree
+    from xarray.core.datatree import DataTree


 class H5NetCDFArrayWrapper(BaseNetCDF4Array):
@@ -58,21 +59,23 @@ def _getitem(self, key):
             return array[key]


-def maybe_decode_bytes(txt):
-    if isinstance(txt, bytes):
-        return txt.decode("utf-8")
-    else:
-        return txt
-
-
 def _read_attributes(h5netcdf_var):
     # GH451
     # to ensure conventions decoding works properly on Python 3, decode all
     # bytes attributes to strings
     attrs = {}
     for k, v in h5netcdf_var.attrs.items():
         if k not in ["_FillValue", "missing_value"]:
-            v = maybe_decode_bytes(v)
+            if isinstance(v, bytes):
+                try:
+                    v = v.decode("utf-8")
+                except UnicodeDecodeError:
+                    emit_user_level_warning(
+                        f"'utf-8' codec can't decode bytes for attribute "
+                        f"{k!r} of h5netcdf object {h5netcdf_var.name!r}, "
+                        f"returning bytes undecoded.",
+                        UnicodeWarning,
+                    )
         attrs[k] = v
     return attrs

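The new fallback in `_read_attributes` can be exercised without h5netcdf. The sketch below is a standalone approximation, not the xarray implementation. `read_attributes` and `owner_name` are hypothetical names, a plain dict stands in for `h5netcdf_var.attrs`, and the standard-library `warnings.warn` replaces xarray's `emit_user_level_warning`.

```python
import warnings


def read_attributes(raw_attrs, owner_name="/"):
    """Decode bytes attributes to str, keeping undecodable values as bytes."""
    attrs = {}
    for k, v in raw_attrs.items():
        # Fill-value attributes are passed through unchanged, as in the hunk.
        if k not in ["_FillValue", "missing_value"] and isinstance(v, bytes):
            try:
                v = v.decode("utf-8")
            except UnicodeDecodeError:
                # Stand-in for emit_user_level_warning: warn, keep the bytes.
                warnings.warn(
                    f"'utf-8' codec can't decode bytes for attribute "
                    f"{k!r} of h5netcdf object {owner_name!r}, "
                    f"returning bytes undecoded.",
                    UnicodeWarning,
                    stacklevel=2,
                )
        attrs[k] = v
    return attrs


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = read_attributes(
        {"units": b"m", "bad": b"\xff\xfe", "_FillValue": b"\x00"}
    )
```

Before this change, an attribute holding invalid UTF-8 raised `UnicodeDecodeError` from `maybe_decode_bytes`; with the fallback, the value comes back as bytes and the caller sees a `UnicodeWarning`.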
2 changes: 1 addition & 1 deletion xarray/backends/netCDF4_.py
@@ -45,7 +45,7 @@

     from xarray.backends.common import AbstractDataStore
     from xarray.core.dataset import Dataset
-    from xarray.datatree_.datatree import DataTree
+    from xarray.core.datatree import DataTree

 # This lookup table maps from dtype.byteorder to a readable endian
 # string used by netCDF4.
4 changes: 2 additions & 2 deletions xarray/backends/zarr.py
@@ -34,7 +34,7 @@

     from xarray.backends.common import AbstractDataStore
     from xarray.core.dataset import Dataset
-    from xarray.datatree_.datatree import DataTree
+    from xarray.core.datatree import DataTree


 # need some special secret attributes to tell us the dimensions
@@ -1053,8 +1053,8 @@ def open_datatree(
         import zarr

         from xarray.backends.api import open_dataset
+        from xarray.core.datatree import DataTree
         from xarray.core.treenode import NodePath
-        from xarray.datatree_.datatree import DataTree

         zds = zarr.open_group(filename_or_obj, mode="r")
         ds = open_dataset(filename_or_obj, engine="zarr", **kwargs)