Hypothesis strategy for generating Variable objects #8404

Merged: merged 176 commits into … from … on Dec 5, 2023
Commits
587ebb8 copied files defining strategies over to this branch (TomNicholas, Aug 11, 2022)
acbfa69 placed testing functions in their own directory (TomNicholas, Aug 11, 2022)
73d763f moved hypothesis strategies into new testing directory (TomNicholas, Aug 11, 2022)
db2deff begin type hinting strategies (TomNicholas, Aug 11, 2022)
746cfc8 renamed strategies for consistency with hypothesis conventions (TomNicholas, Aug 11, 2022)
03cd9de added strategies to public API (with experimental warning) (TomNicholas, Aug 11, 2022)
2fe3583 strategies for chunking patterns (TomNicholas, Aug 11, 2022)
4db3629 rewrote variables strategy to have same signature as Variable constru… (TomNicholas, Aug 13, 2022)
14d11aa test variables strategy (TomNicholas, Aug 13, 2022)
418a359 fixed most tests (TomNicholas, Aug 13, 2022)
c8a7d0e added helpers so far to API docs (TomNicholas, Aug 13, 2022)
d48aceb add hypothesis to docs CI env (TomNicholas, Aug 13, 2022)
a20e341 add todo about attrs (TomNicholas, Aug 13, 2022)
3a4816f draft of new user guide page on testing (TomNicholas, Aug 13, 2022)
d0406a2 types for dataarrays strategy (TomNicholas, Aug 13, 2022)
65a222d draft for chained chunking example (TomNicholas, Aug 13, 2022)
e1d718a [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Aug 13, 2022)
57d0f5b only accept strategy objects (TomNicholas, Aug 14, 2022)
82c734c fixed failure with passing in two custom strategies that must be comp… (TomNicholas, Aug 14, 2022)
029f19a syntax error in example (TomNicholas, Aug 14, 2022)
46895fe allow sizes dict as argument to variables (TomNicholas, Aug 15, 2022)
50c62e9 copied subsequences_of strategy (TomNicholas, Aug 15, 2022)
e21555a coordinate_variables generates non-dimensional coords (TomNicholas, Aug 15, 2022)
1688779 dataarrays strategy given nothing working! (TomNicholas, Aug 15, 2022)
0a29d32 improved docstrings (TomNicholas, Aug 15, 2022)
3259849 datasets strategy works (given nothing) (TomNicholas, Aug 15, 2022)
717fabe Merge branch 'hypothesis-strategies' of https://github.com/TomNichola… (TomNicholas, Aug 15, 2022)
d76e5b6 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Aug 15, 2022)
c25940c pass dims or data to dataarrays() strategy (TomNicholas, Aug 16, 2022)
cd7b065 importorskip hypothesis in tests (TomNicholas, Aug 16, 2022)
742b18c Merge branch 'hypothesis-strategies' of https://github.com/TomNichola… (TomNicholas, Aug 16, 2022)
8e548b1 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Aug 16, 2022)
d1487d4 added warning about inefficient example generation (TomNicholas, Aug 16, 2022)
c8b53f2 Merge branch 'hypothesis-strategies' of https://github.com/TomNichola… (TomNicholas, Aug 16, 2022)
8bac610 remove TODO about deterministic examples in docs (TomNicholas, Aug 17, 2022)
cf3beb5 un-restrict names strategy (TomNicholas, Aug 17, 2022)
d991357 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Aug 17, 2022)
a6405cf removed convert kwarg (TomNicholas, Aug 17, 2022)
400ae3e removed convert kwarg (TomNicholas, Aug 17, 2022)
3609a34 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Aug 17, 2022)
63ad529 avoid using subsequences_of (TomNicholas, Aug 17, 2022)
4ffbcbd refactored into separate function for unique subset of dims (TomNicholas, Aug 17, 2022)
469482d removed subsequences_of (TomNicholas, Aug 17, 2022)
472de00 Merge branch 'hypothesis-strategies' of https://github.com/TomNichola… (TomNicholas, Aug 17, 2022)
ced1a9f [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Aug 17, 2022)
a3c9ad0 fix draw(st.booleans()) (TomNicholas, Aug 17, 2022)
b387304 Merge branch 'hypothesis-strategies' of https://github.com/TomNichola… (TomNicholas, Aug 17, 2022)
404111d remove all references to chunking until chunks strategy merged upstre… (TomNicholas, Aug 23, 2022)
3764a7b added example of complicated strategy for dims dict (TomNicholas, Aug 23, 2022)
9723e45 remove superfluous utils file (TomNicholas, Aug 30, 2022)
2e44860 removed elements strategy (TomNicholas, Aug 30, 2022)
1cc073b removed np_arrays strategy from public API (TomNicholas, Aug 30, 2022)
603e6bb min_ndims -> min_dims (TomNicholas, Aug 30, 2022)
63bb362 forbid non-matching dims and data completely (TomNicholas, Aug 31, 2022)
69ec230 simple test for data_variables strategy (TomNicholas, Aug 31, 2022)
e5c7e23 passing arguments to datasets strategy (TomNicholas, Sep 2, 2022)
fd3d357 Merge branch 'main' into hypothesis-strategies (TomNicholas, Sep 2, 2022)
52f2490 whatsnew (TomNicholas, Sep 2, 2022)
9b96470 add attrs strategy (TomNicholas, Sep 2, 2022)
41fe0b4 autogenerate attrs for all objects (TomNicholas, Sep 2, 2022)
0e53aa1 attempt to make attrs strategy quicker (TomNicholas, Sep 2, 2022)
f659b4b extend deadline (TomNicholas, Sep 2, 2022)
d1be3ee attempt to speed up attrs strategy (TomNicholas, Sep 6, 2022)
e88f5f0 promote all strategies to be functions (TomNicholas, Sep 7, 2022)
4b88887 valid_dtypes -> numeric_dtypes (TomNicholas, Sep 7, 2022)
2a1dc66 changed hypothesis error type (TomNicholas, Sep 7, 2022)
9bddcec make all strategies keyword-arg only (TomNicholas, Sep 7, 2022)
b2887d4 min_length -> min_side (TomNicholas, Sep 7, 2022)
3b8e8ae correct error type (TomNicholas, Sep 7, 2022)
0980061 remove coords kwarg (TomNicholas, Sep 7, 2022)
0313b3e test different types of coordinates are sometimes generated (TomNicholas, Sep 7, 2022)
e6ebb1f zip dict (TomNicholas, Sep 7, 2022)
4da8772 add dim_names kwarg to dimension_sizes strategy (TomNicholas, Sep 7, 2022)
e6d7a34 return a dict from _alignable_variables (TomNicholas, Sep 7, 2022)
5197d1b Merge branch 'hypothesis-strategies' of https://github.com/TomNichola… (TomNicholas, Sep 7, 2022)
15812fd add coord_names arg to coordinate_variables strategy (TomNicholas, Sep 7, 2022)
3dc9c7b Merge branch 'main' into hypothesis-strategies (TomNicholas, Sep 7, 2022)
4374681 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Sep 7, 2022)
0f0c4fb change typing of dims arg (TomNicholas, Sep 7, 2022)
6a30af5 support dims as list to datasets strat when data not given (TomNicholas, Sep 7, 2022)
cac46dc Merge branch 'hypothesis-strategies' of https://github.com/TomNichola… (TomNicholas, Sep 7, 2022)
177d908 put coord and data var generation in optional branch to try to improv… (TomNicholas, Sep 7, 2022)
5424e37 improve simple test example (TomNicholas, Sep 7, 2022)
c871273 add documentation on creating duck arrays (TomNicholas, Sep 7, 2022)
7730a27 okexcept for sparse examples (TomNicholas, Sep 7, 2022)
24549bc fix sparse dataarrays example (TomNicholas, Sep 7, 2022)
3082a09 todo about building a duck array dataset (TomNicholas, Sep 7, 2022)
5df60dc fix imports and cross-links (TomNicholas, Sep 7, 2022)
01078de [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Sep 7, 2022)
53290e2 add hypothesis library to intersphinx mapping (TomNicholas, Sep 8, 2022)
bd2cb6e fix many links (TomNicholas, Sep 8, 2022)
c5e83c2 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Sep 8, 2022)
de26b2f fixed all local mypy errors (TomNicholas, Sep 9, 2022)
f81e14f move numpy strategies import (TomNicholas, Sep 9, 2022)
129e2c3 Merge branch 'hypothesis-strategies' of https://github.com/TomNichola… (TomNicholas, Sep 9, 2022)
601d9e2 Merge branch 'main' into hypothesis-strategies (TomNicholas, Sep 10, 2022)
af24af5 reduce sizes (TomNicholas, Sep 10, 2022)
9777c2a [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Sep 10, 2022)
4dcbc60 Merge branch 'main' into hypothesis-strategies (TomNicholas, Jul 20, 2023)
7841dd5 fix some api links in docs (TomNicholas, Jul 24, 2023)
968ee72 Merge branch 'main' into hypothesis-strategies (TomNicholas, Nov 2, 2023)
fd6aa06 remove every strategy beyond variables (TomNicholas, Nov 2, 2023)
df3341e variable strategy now accepts callable generating array strategies (TomNicholas, Nov 2, 2023)
de4de8f use only readable unicode characters in names (TomNicholas, Nov 2, 2023)
af14dc2 examples (TomNicholas, Nov 2, 2023)
d001dbb only use unicode characters that docs can deal with (TomNicholas, Nov 2, 2023)
d4c9cb5 docs: dataarrays -> variables (TomNicholas, Nov 2, 2023)
7983e34 update tests for variables strategy (TomNicholas, Nov 2, 2023)
2ad7bb0 test values in attrs dict (TomNicholas, Nov 2, 2023)
a9f7cd5 duck array type examples (TomNicholas, Nov 2, 2023)
49a1c64 altered whatsnew (TomNicholas, Nov 2, 2023)
c1f1974 maybe fix mypy (TomNicholas, Nov 2, 2023)
6482ad3 fix some mypy errors (TomNicholas, Nov 2, 2023)
95cab79 more typing changes (TomNicholas, Nov 2, 2023)
839c4f0 fix import (TomNicholas, Nov 2, 2023)
ded711a skip doctests in docstrings (TomNicholas, Nov 2, 2023)
f3c80ed fix link to duckarrays page (TomNicholas, Nov 2, 2023)
010f28c don't actually try to run cupy in docs env (TomNicholas, Nov 2, 2023)
4b07992 missed a skip (TomNicholas, Nov 2, 2023)
ba99afa okwarning (TomNicholas, Nov 2, 2023)
700d652 just remove the cupy example (TomNicholas, Nov 2, 2023)
0e01d76 ensure shape is always passed to array_strategy_fn (TomNicholas, Nov 2, 2023)
79f40f0 test using make_strategies_namespace (TomNicholas, Nov 3, 2023)
4ff57ec test catching array_strategy_fn that returns different dtype (TomNicholas, Nov 3, 2023)
959222e test catching array_strategy_fn that returns different shape (TomNicholas, Nov 3, 2023)
78825c4 generalise test of attrs strategy (TomNicholas, Nov 3, 2023)
2418a61 remove misguided comments (TomNicholas, Nov 3, 2023)
331f521 Merge branch 'main' into hypothesis-strategies-variable (TomNicholas, Nov 3, 2023)
adca1d2 save working version of test_mean (TomNicholas, Nov 4, 2023)
14998c1 expose unique_subset_of (TomNicholas, Nov 4, 2023)
71f01f9 generalize unique_subset_of to handle iterables (TomNicholas, Nov 4, 2023)
9c10895 type hint unique_subset_of using overloads (TomNicholas, Nov 4, 2023)
2833f01 use iterables in test_mean example (TomNicholas, Nov 4, 2023)
1ddc515 test_mean example in docs now uses iterable of dimension_names (TomNicholas, Nov 4, 2023)
618bfea fix some warnings in docs build (TomNicholas, Nov 4, 2023)
fe1ff1a example of passing list to unique_subset_of (TomNicholas, Nov 4, 2023)
2e038ea fix import in docs page (TomNicholas, Nov 4, 2023)
04c3dc1 try to satisfy sphinx (TomNicholas, Nov 4, 2023)
cf35fb9 Minor corrections to docs (TomNicholas, Nov 4, 2023)
4811e8a Add supported_dtypes to list of public strategies in docs (TomNicholas, Nov 5, 2023)
a036253 Generate number of dimensions in test_given_arbitrary_dims_list (TomNicholas, Nov 5, 2023)
054a0dc Update minimum version of hypothesis (TomNicholas, Nov 5, 2023)
ececa07 fix incorrect indentation in autosummary (TomNicholas, Nov 10, 2023)
0fa090d link to docs page on testing (TomNicholas, Nov 11, 2023)
a9ac6f1 use warning imperative for array API non-compliant dtypes (TomNicholas, Nov 11, 2023)
43831ce fix bugs in sparse examples (TomNicholas, Nov 11, 2023)
62dbe88 add tag for array API standard info (TomNicholas, Nov 11, 2023)
af5eb25 move no-dependencies-on-other-values-inputs to given decorator (TomNicholas, Nov 11, 2023)
dc78254 generate everything that can be generated (TomNicholas, Nov 11, 2023)
5822390 fix internal link to page on strategies (TomNicholas, Nov 11, 2023)
eeb6b32 split up TypeError messages for each arg (TomNicholas, Nov 11, 2023)
e13c6ac use hypothesis.errors.InvalidArgument (TomNicholas, Nov 11, 2023)
a169e1f generalize tests for generating specific number of dimensions (TomNicholas, Nov 11, 2023)
46b36b9 fix some typing errors (TomNicholas, Nov 11, 2023)
00ed3d6 test that reduction example in docs actually works (TomNicholas, Nov 12, 2023)
d265ddb fix typing errors (TomNicholas, Nov 12, 2023)
0e872a8 simply generation of sparse arrays in example (TomNicholas, Nov 12, 2023)
a941e60 Merge branch 'main' into hypothesis-strategies-variable (TomNicholas, Nov 12, 2023)
3d43ed6 fix impot in docs example (TomNicholas, Nov 13, 2023)
6c912d2 Merge branch 'main' into hypothesis-strategies-variable (TomNicholas, Nov 13, 2023)
bdf3aed correct type hints in sparse example (TomNicholas, Nov 13, 2023)
afd526d [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Nov 13, 2023)
6bbd13b Use .copy in convert_to_sparse (TomNicholas, Nov 13, 2023)
29ecd7d Use st.builds in sparse example (TomNicholas, Nov 13, 2023)
631e810 correct intersphinx link in whatsnew (TomNicholas, Nov 13, 2023)
c613027 Merge branch 'hypothesis-strategies-variable' of https://github.com/T… (TomNicholas, Nov 13, 2023)
4412d98 rename module containing assertion functions (TomNicholas, Nov 13, 2023)
1ea0dcf clarify sentence (TomNicholas, Nov 13, 2023)
cf1a45e add general ImportError if hypothesis not installed (TomNicholas, Nov 13, 2023)
ea738cd add See Also link to strategies docs page from docstring of every str… (TomNicholas, Nov 14, 2023)
79b0094 typo in ImportError message (TomNicholas, Nov 14, 2023)
c6d43ca Merge branch 'main' into hypothesis-strategies-variable (TomNicholas, Nov 15, 2023)
00079bd Merge branch 'main' into hypothesis-strategies-variable (TomNicholas, Dec 4, 2023)
cbcd486 remove extra blank lines in examples (TomNicholas, Dec 5, 2023)
69ddd08 remove smallish_arrays (TomNicholas, Dec 5, 2023)
ea90162 Merge branch 'main' into hypothesis-strategies-variable (TomNicholas, Dec 5, 2023)
1 change: 1 addition & 0 deletions ci/requirements/doc.yml
@@ -9,6 +9,7 @@ dependencies:
- cartopy
- cfgrib
- dask-core>=2022.1
- hypothesis>=6.75.8
- h5netcdf>=0.13
- ipykernel
- ipywidgets # silence nbsphinx warning
21 changes: 21 additions & 0 deletions doc/api.rst
@@ -1071,6 +1071,27 @@ Testing
   testing.assert_allclose
   testing.assert_chunks_equal

Hypothesis Testing Strategies
=============================

.. currentmodule:: xarray

See the :ref:`documentation page on testing <testing.hypothesis>` for a guide on how to use these strategies.

.. warning::
   These strategies should be considered highly experimental, and liable to change at any time.

.. autosummary::
   :toctree: generated/

   testing.strategies.supported_dtypes
   testing.strategies.names
   testing.strategies.dimension_names
   testing.strategies.dimension_sizes
   testing.strategies.attrs
   testing.strategies.variables
   testing.strategies.unique_subset_of

Exceptions
==========

1 change: 1 addition & 0 deletions doc/conf.py
@@ -325,6 +325,7 @@
    "dask": ("https://docs.dask.org/en/latest", None),
    "cftime": ("https://unidata.github.io/cftime", None),
    "sparse": ("https://sparse.pydata.org/en/latest/", None),
    "hypothesis": ("https://hypothesis.readthedocs.io/en/latest/", None),
    "cubed": ("https://tom-e-white.com/cubed/", None),
    "datatree": ("https://xarray-datatree.readthedocs.io/en/latest/", None),
    "xarray-tutorial": ("https://tutorial.xarray.dev/", None),
2 changes: 2 additions & 0 deletions doc/internals/duck-arrays-integration.rst
@@ -31,6 +31,8 @@ property needs to obey `numpy's broadcasting rules <https://numpy.org/doc/stable
(see also the `Python Array API standard's explanation <https://data-apis.org/array-api/latest/API_specification/broadcasting.html>`_
of these same rules).

.. _internals.duckarrays.array_api_standard:

Python Array API standard support
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1 change: 1 addition & 0 deletions doc/user-guide/index.rst
@@ -25,4 +25,5 @@ examples that describe many common tasks that you can accomplish with xarray.
   dask
   plotting
   options
   testing
   duckarrays
304 changes: 304 additions & 0 deletions doc/user-guide/testing.rst
@@ -0,0 +1,304 @@
.. _testing:

Testing your code
=================
Review thread on lines +3 to +4:

Contributor Author: The intention of creating this page is that material on using xarray.testing.assert_equal etc. should go here, before the material on hypothesis.

Contributor Author: Perhaps this whole page should go under the xarray internals section of the docs instead of the user guide? Because this realistically is only going to be used by other library developers, not most users.

Collaborator: Not sure. It is true that the page has a different target audience than the other pages in the user guide, but then again applications can also be tested. And, so far the "internals" section describes implementation details or extension mechanisms that affect the internals.

Contributor Author: I think we can work this out later. This location seems fine for now, and changing it isn't a backwards-compatibility issue.


.. ipython:: python
    :suppress:

    import numpy as np
    import pandas as pd
    import xarray as xr

    np.random.seed(123456)

.. _testing.hypothesis:

Hypothesis testing
------------------

.. note::

    Testing with hypothesis is a fairly advanced topic. Before reading this section, we recommend that you look
    at our guide to xarray's :ref:`data structures`, familiarize yourself with conventional unit testing in
    `pytest <https://docs.pytest.org/>`_, and skim the
    `hypothesis library documentation <https://hypothesis.readthedocs.io/>`_.

`The hypothesis library <https://hypothesis.readthedocs.io/>`_ is a powerful tool for property-based testing.
Instead of writing tests for one example at a time, it allows you to write tests parameterized by a source of many
dynamically generated examples. For example, you might write a test that you want to be parameterized by the set
of all possible integers, via :py:func:`hypothesis.strategies.integers()`.
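
A minimal sketch of such a test, using only hypothesis itself (the test name and the property it checks are illustrative choices, not part of xarray):

```python
from hypothesis import given
import hypothesis.strategies as st


# hypothesis calls this test many times, each time with a different generated integer ``n``
@given(st.integers())
def test_abs_is_non_negative(n):
    # the property under test: abs() never returns a negative number
    assert abs(n) >= 0
```

Hypothesis runs the test body many times (100 examples by default) and, if an assertion fails, shrinks the input towards a minimal failing example.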

Property-based testing is extremely powerful, because (unlike more conventional example-based testing) it can find bugs
that you did not even think to look for!

Strategies
~~~~~~~~~~

Each source of examples is called a "strategy", and xarray provides a range of custom strategies which produce xarray
data structures containing arbitrary data. You can use these to efficiently test downstream code,
quickly ensuring that your code can handle xarray objects of all possible structures and contents.

These strategies are accessible in the :py:mod:`xarray.testing.strategies` module, which provides:

.. currentmodule:: xarray

.. autosummary::

    testing.strategies.supported_dtypes
    testing.strategies.names
    testing.strategies.dimension_names
    testing.strategies.dimension_sizes
    testing.strategies.attrs
    testing.strategies.variables
    testing.strategies.unique_subset_of

These build upon the numpy and array API strategies offered in :py:mod:`hypothesis.extra.numpy` and :py:mod:`hypothesis.extra.array_api`:

.. ipython:: python

    import hypothesis.extra.numpy as npst
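
As a rough illustration of those underlying building blocks, the sketch below draws plain numpy arrays with :py:func:`hypothesis.extra.numpy.arrays`; the float64 dtype and the shape bounds are arbitrary choices made for this example:

```python
import numpy as np
import hypothesis.extra.numpy as npst
from hypothesis import given


# npst.arrays draws numpy arrays of the given dtype; the shape is itself drawn from
# npst.array_shapes, here limited to 1 or 2 dimensions with sides of length 1 to 5
@given(npst.arrays(dtype=np.float64, shape=npst.array_shapes(max_dims=2, max_side=5)))
def test_generated_arrays_have_requested_dtype(arr):
    assert arr.dtype == np.float64
    assert 1 <= arr.ndim <= 2
```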

Generating Examples
~~~~~~~~~~~~~~~~~~~

To see an example of what each of these strategies might produce, you can call one followed by the ``.example()`` method,
which is a general hypothesis method valid for all strategies.

.. ipython:: python

    import xarray.testing.strategies as xrst

    xrst.variables().example()
    xrst.variables().example()
    xrst.variables().example()

You can see that calling ``.example()`` multiple times will generate different examples, giving you an idea of the wide
range of data that the xarray strategies can generate.

In your tests, however, you should not use ``.example()``; instead, parameterize your tests with the
:py:func:`hypothesis.given` decorator:

.. ipython:: python

    from hypothesis import given

.. ipython:: python

    @given(xrst.variables())
    def test_function_that_acts_on_variables(var):
        # ``func`` is a placeholder for the function you want to test
        assert func(var) == ...

Chaining Strategies
~~~~~~~~~~~~~~~~~~~

Xarray's strategies can accept other strategies as arguments, allowing you to customise the contents of the generated
examples.

.. ipython:: python

    # generate a Variable containing an array with a complex number dtype, but all other details still arbitrary
    from hypothesis.extra.numpy import complex_number_dtypes

    xrst.variables(dtype=complex_number_dtypes()).example()

This also works with custom strategies, or strategies defined in other packages.
For example you could imagine creating a ``chunks`` strategy to specify particular chunking patterns for a dask-backed array.

Fixing Arguments
~~~~~~~~~~~~~~~~

If you want to fix one aspect of the data structure, whilst allowing variation in the generated examples
over all other aspects, then use :py:func:`hypothesis.strategies.just()`.

.. ipython:: python

    import hypothesis.strategies as st

    # Generates only variable objects with dimensions ["x", "y"]
    xrst.variables(dims=st.just(["x", "y"])).example()

(This is technically another example of chaining strategies - :py:func:`hypothesis.strategies.just()` is simply a
special strategy that just contains a single example.)

To fix the length of dimensions you can instead pass ``dims`` as a mapping of dimension names to lengths
(i.e. following xarray objects' ``.sizes`` property), e.g.

.. ipython:: python

    # Generates only variables with dimensions ["x", "y"], of lengths 2 & 3 respectively
    xrst.variables(dims=st.just({"x": 2, "y": 3})).example()

You can also use this to specify that you want examples which are missing some part of the data structure, for instance:

.. ipython:: python

    # Generates a Variable with no attributes
    xrst.variables(attrs=st.just({})).example()

Through a combination of chaining strategies and fixing arguments, you can specify quite complicated requirements on the
objects your chained strategy will generate.

.. ipython:: python

    fixed_x_variable_y_maybe_z = st.fixed_dictionaries(
        {"x": st.just(2), "y": st.integers(3, 4)}, optional={"z": st.just(2)}
    )
Review comment on lines +145 to +147:

Contributor: This feels like a great place to introduce the shared() strategy, which is the idiomatic way to meet a common need like "variables might have a z-dimension, which can vary, but if present must be the same size in all variables". It's a complication, but one which I expect many readers will benefit from.

    fixed_x_variable_y_maybe_z.example()

    special_variables = xrst.variables(dims=fixed_x_variable_y_maybe_z)

    special_variables.example()
    special_variables.example()

Here we have used one of hypothesis' built-in strategies, :py:func:`hypothesis.strategies.fixed_dictionaries`, to create a
strategy which generates mappings of dimension names to lengths (i.e. the ``sizes`` of the xarray object we want).
This particular strategy will always generate an ``x`` dimension of length 2, and a ``y`` dimension of
length either 3 or 4, and will sometimes also generate a ``z`` dimension of length 2.
By feeding this strategy for dictionaries into the ``dims`` argument of xarray's :py:func:`~xarray.testing.strategies.variables` strategy,
we can generate arbitrary :py:class:`~xarray.Variable` objects whose dimensions will always match these specifications.

Generating Duck-type Arrays
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Xarray objects don't have to wrap numpy arrays; in fact, they can wrap any array type which presents the same API as a
numpy array (so-called "duck array wrapping", see :ref:`wrapping numpy-like arrays <internals.duckarrays>`).

Imagine we want to write a strategy which generates arbitrary ``Variable`` objects, each of which wraps a
:py:class:`sparse.COO` array instead of a ``numpy.ndarray``. How could we do that? There are two ways:

1. Create an xarray object with numpy data and use hypothesis' ``.map()`` method to convert the underlying array to a
   different type:

.. ipython:: python

    import sparse

.. ipython:: python

    def convert_to_sparse(var):
        # return a copy wrapping a sparse.COO array, rather than mutating the drawn example in place
        return var.copy(data=sparse.COO.from_numpy(var.to_numpy()))

.. ipython:: python

    sparse_variables = xrst.variables(dims=xrst.dimension_names(min_dims=1)).map(
        convert_to_sparse
    )

    sparse_variables.example()
    sparse_variables.example()

2. Pass a function which returns a strategy which generates the duck-typed arrays directly to the ``array_strategy_fn`` argument of the xarray strategies:

.. ipython:: python

    @st.composite
    def sparse_random_arrays(draw, shape: tuple[int, ...]) -> sparse.COO:
        """Strategy which generates random sparse.COO arrays"""
        if shape is None:
            shape = draw(npst.array_shapes())
        # density is the fraction of non-zero elements, so draw a float in [0, 1]
        density = draw(st.floats(min_value=0, max_value=1))
        # note sparse.random does not accept a dtype kwarg
        return sparse.random(shape=shape, density=density)


    def sparse_random_arrays_fn(
        *, shape: tuple[int, ...], dtype: np.dtype
    ) -> st.SearchStrategy[sparse.COO]:
        return sparse_random_arrays(shape=shape)


.. ipython:: python

    sparse_random_variables = xrst.variables(
        array_strategy_fn=sparse_random_arrays_fn, dtype=st.just(np.dtype("float64"))
    )
    sparse_random_variables.example()

Either approach is fine, but one may be more convenient than the other depending on the type of the duck array which you
want to wrap.

Compatibility with the Python Array API Standard
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Xarray aims to be compatible with any duck-array type that conforms to the `Python Array API Standard <https://data-apis.org/array-api/latest/>`_
(see our :ref:`docs on Array API Standard support <internals.duckarrays.array_api_standard>`).

.. warning::

    The strategies defined in :py:mod:`testing.strategies` are **not** guaranteed to use array API standard-compliant
    dtypes by default.
    For example, arrays with the dtype ``np.dtype('float16')`` may be generated by :py:func:`testing.strategies.variables`
    (assuming the ``dtype`` kwarg was not explicitly passed), despite ``np.dtype('float16')`` not being in the
    array API standard.

If the array type you want to generate has an array API-compliant top-level namespace
(e.g. that which is conventionally imported as ``xp`` or similar),
you can use this neat trick:

.. ipython:: python
    :okwarning:

    # numpy.array_api is an experimental module, available in NumPy 1.22 to 1.26
    # (it was removed in NumPy 2.0)
    from numpy import array_api as xp

    from hypothesis.extra.array_api import make_strategies_namespace

    xps = make_strategies_namespace(xp)

    xp_variables = xrst.variables(
        array_strategy_fn=xps.arrays,
        dtype=xps.scalar_dtypes(),
    )
    xp_variables.example()

Another array API-compliant duck array library would replace the import, e.g. ``import cupy as cp`` instead.

Testing over Subsets of Dimensions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A common task when testing xarray user code is checking that your function works for all valid input dimensions.
We can chain strategies to achieve this, for which the helper strategy :py:func:`~testing.strategies.unique_subset_of`
is useful.

It works for lists of dimension names:

.. ipython:: python

    dims = ["x", "y", "z"]
    xrst.unique_subset_of(dims).example()
    xrst.unique_subset_of(dims).example()

as well as for mappings of dimension names to sizes:

.. ipython:: python

    dim_sizes = {"x": 2, "y": 3, "z": 4}
    xrst.unique_subset_of(dim_sizes).example()
    xrst.unique_subset_of(dim_sizes).example()

This is useful because operations like reductions can be performed over any subset of the xarray object's dimensions.
For example, we can write a pytest test which checks that a reduction gives the expected result when applied
along any valid subset of the Variable's dimensions.

.. code-block:: python

    import numpy as np
    import numpy.testing as npt
    import hypothesis.strategies as st
    from hypothesis import given

    import xarray.testing.strategies as xrst


    @given(st.data(), xrst.variables(dims=xrst.dimension_names(min_dims=1)))
    def test_mean(data, var):
        """Test that the mean of an xarray Variable is always equal to the mean of the underlying array."""

        # specify arbitrary reduction along at least one dimension
        reduction_dims = data.draw(xrst.unique_subset_of(var.dims, min_size=1))

        # create expected result (using nanmean because arrays with NaNs will be generated)
        reduction_axes = tuple(var.get_axis_num(dim) for dim in reduction_dims)
        expected = np.nanmean(var.data, axis=reduction_axes)

        # assert property is always satisfied
        result = var.mean(dim=reduction_dims).data
        npt.assert_equal(expected, result)
5 changes: 5 additions & 0 deletions doc/whats-new.rst
@@ -22,6 +22,11 @@ v2023.10.2 (unreleased)
New Features
~~~~~~~~~~~~

- Added hypothesis strategies for generating :py:class:`xarray.Variable` objects containing arbitrary data, useful for parametrizing downstream tests.
  Accessible under :py:mod:`testing.strategies`, and documented in a new page on testing in the User Guide.
  (:issue:`6911`, :pull:`8404`)
  By `Tom Nicholas <https://github.com/TomNicholas>`_.

- Use `opt_einsum <https://optimized-einsum.readthedocs.io/en/stable/>`_ for :py:func:`xarray.dot` by default if installed.
By `Deepak Cherian <https://github.com/dcherian>`_. (:issue:`7764`, :pull:`8373`).
- Add ``DataArray.dt.total_seconds()`` method to match the Pandas API. (:pull:`8435`).
3 changes: 2 additions & 1 deletion xarray/core/types.py
@@ -173,7 +173,8 @@ def copy(

# Temporary placeholder for indicating an array api compliant type.
# hopefully in the future we can narrow this down more:
T_DuckArray = TypeVar("T_DuckArray", bound=Any)
T_DuckArray = TypeVar("T_DuckArray", bound=Any, covariant=True)


ScalarOrArray = Union["ArrayLike", np.generic, np.ndarray, "DaskArray"]
VarCompatible = Union["Variable", "ScalarOrArray"]