Hypothesis strategies in xarray.testing.strategies #6908

Open
wants to merge 107 commits into
base: main
Changes from 89 commits
Commits
587ebb8
copied files defining strategies over to this branch
TomNicholas Aug 11, 2022
acbfa69
placed testing functions in their own directory
TomNicholas Aug 11, 2022
73d763f
moved hypothesis strategies into new testing directory
TomNicholas Aug 11, 2022
db2deff
begin type hinting strategies
TomNicholas Aug 11, 2022
746cfc8
renamed strategies for consistency with hypothesis conventions
TomNicholas Aug 11, 2022
03cd9de
added strategies to public API (with experimental warning)
TomNicholas Aug 11, 2022
2fe3583
strategies for chunking patterns
TomNicholas Aug 11, 2022
4db3629
rewrote variables strategy to have same signature as Variable constru…
TomNicholas Aug 13, 2022
14d11aa
test variables strategy
TomNicholas Aug 13, 2022
418a359
fixed most tests
TomNicholas Aug 13, 2022
c8a7d0e
added helpers so far to API docs
TomNicholas Aug 13, 2022
d48aceb
add hypothesis to docs CI env
TomNicholas Aug 13, 2022
a20e341
add todo about attrs
TomNicholas Aug 13, 2022
3a4816f
draft of new user guide page on testing
TomNicholas Aug 13, 2022
d0406a2
types for dataarrays strategy
TomNicholas Aug 13, 2022
65a222d
draft for chained chunking example
TomNicholas Aug 13, 2022
e1d718a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 13, 2022
57d0f5b
only accept strategy objects
TomNicholas Aug 14, 2022
82c734c
fixed failure with passing in two custom strategies that must be comp…
TomNicholas Aug 14, 2022
029f19a
syntax error in example
TomNicholas Aug 14, 2022
46895fe
allow sizes dict as argument to variables
TomNicholas Aug 15, 2022
50c62e9
copied subsequences_of strategy
TomNicholas Aug 15, 2022
e21555a
coordinate_variables generates non-dimensional coords
TomNicholas Aug 15, 2022
1688779
dataarrays strategy given nothing working!
TomNicholas Aug 15, 2022
0a29d32
improved docstrings
TomNicholas Aug 15, 2022
3259849
datasets strategy works (given nothing)
TomNicholas Aug 15, 2022
717fabe
Merge branch 'hypothesis-strategies' of https://github.com/TomNichola…
TomNicholas Aug 15, 2022
d76e5b6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 15, 2022
c25940c
pass dims or data to dataarrays() strategy
TomNicholas Aug 16, 2022
cd7b065
importorskip hypothesis in tests
TomNicholas Aug 16, 2022
742b18c
Merge branch 'hypothesis-strategies' of https://github.com/TomNichola…
TomNicholas Aug 16, 2022
8e548b1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 16, 2022
d1487d4
added warning about inefficient example generation
TomNicholas Aug 16, 2022
c8b53f2
Merge branch 'hypothesis-strategies' of https://github.com/TomNichola…
TomNicholas Aug 16, 2022
8bac610
remove TODO about deterministic examples in docs
TomNicholas Aug 17, 2022
cf3beb5
un-restrict names strategy
TomNicholas Aug 17, 2022
d991357
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 17, 2022
a6405cf
removed convert kwarg
TomNicholas Aug 17, 2022
400ae3e
removed convert kwarg
TomNicholas Aug 17, 2022
3609a34
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 17, 2022
63ad529
avoid using subsequences_of
TomNicholas Aug 17, 2022
4ffbcbd
refactored into separate function for unique subset of dims
TomNicholas Aug 17, 2022
469482d
removed subsequences_of
TomNicholas Aug 17, 2022
472de00
Merge branch 'hypothesis-strategies' of https://github.com/TomNichola…
TomNicholas Aug 17, 2022
ced1a9f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 17, 2022
a3c9ad0
fix draw(st.booleans())
TomNicholas Aug 17, 2022
b387304
Merge branch 'hypothesis-strategies' of https://github.com/TomNichola…
TomNicholas Aug 17, 2022
404111d
remove all references to chunking until chunks strategy merged upstre…
TomNicholas Aug 23, 2022
3764a7b
added example of complicated strategy for dims dict
TomNicholas Aug 23, 2022
9723e45
remove superfluous utils file
TomNicholas Aug 30, 2022
2e44860
removed elements strategy
TomNicholas Aug 30, 2022
1cc073b
removed np_arrays strategy from public API
TomNicholas Aug 30, 2022
603e6bb
min_ndims -> min_dims
TomNicholas Aug 30, 2022
63bb362
forbid non-matching dims and data completely
TomNicholas Aug 31, 2022
69ec230
simple test for data_variables strategy
TomNicholas Aug 31, 2022
e5c7e23
passing arguments to datasets strategy
TomNicholas Sep 2, 2022
fd3d357
Merge branch 'main' into hypothesis-strategies
TomNicholas Sep 2, 2022
52f2490
whatsnew
TomNicholas Sep 2, 2022
9b96470
add attrs strategy
TomNicholas Sep 2, 2022
41fe0b4
autogenerate attrs for all objects
TomNicholas Sep 2, 2022
0e53aa1
attempt to make attrs strategy quicker
TomNicholas Sep 2, 2022
f659b4b
extend deadline
TomNicholas Sep 2, 2022
d1be3ee
attempt to speed up attrs strategy
TomNicholas Sep 6, 2022
e88f5f0
promote all strategies to be functions
TomNicholas Sep 7, 2022
4b88887
valid_dtypes -> numeric_dtypes
TomNicholas Sep 7, 2022
2a1dc66
changed hypothesis error type
TomNicholas Sep 7, 2022
9bddcec
make all strategies keyword-arg only
TomNicholas Sep 7, 2022
b2887d4
min_length -> min_side
TomNicholas Sep 7, 2022
3b8e8ae
correct error type
TomNicholas Sep 7, 2022
0980061
remove coords kwarg
TomNicholas Sep 7, 2022
0313b3e
test different types of coordinates are sometimes generated
TomNicholas Sep 7, 2022
e6ebb1f
zip dict
TomNicholas Sep 7, 2022
4da8772
add dim_names kwarg to dimension_sizes strategy
TomNicholas Sep 7, 2022
e6d7a34
return a dict from _alignable_variables
TomNicholas Sep 7, 2022
5197d1b
Merge branch 'hypothesis-strategies' of https://github.com/TomNichola…
TomNicholas Sep 7, 2022
15812fd
add coord_names arg to coordinate_variables strategy
TomNicholas Sep 7, 2022
3dc9c7b
Merge branch 'main' into hypothesis-strategies
TomNicholas Sep 7, 2022
4374681
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 7, 2022
0f0c4fb
change typing of dims arg
TomNicholas Sep 7, 2022
6a30af5
support dims as list to datasets strat when data not given
TomNicholas Sep 7, 2022
cac46dc
Merge branch 'hypothesis-strategies' of https://github.com/TomNichola…
TomNicholas Sep 7, 2022
177d908
put coord and data var generation in optional branch to try to improv…
TomNicholas Sep 7, 2022
5424e37
improve simple test example
TomNicholas Sep 7, 2022
c871273
add documentation on creating duck arrays
TomNicholas Sep 7, 2022
7730a27
okexcept for sparse examples
TomNicholas Sep 7, 2022
24549bc
fix sparse dataarrays example
TomNicholas Sep 7, 2022
3082a09
todo about building a duck array dataset
TomNicholas Sep 7, 2022
5df60dc
fix imports and cross-links
TomNicholas Sep 7, 2022
01078de
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 7, 2022
53290e2
add hypothesis library to intersphinx mapping
TomNicholas Sep 8, 2022
bd2cb6e
fix many links
TomNicholas Sep 8, 2022
c5e83c2
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 8, 2022
de26b2f
fixed all local mypy errors
TomNicholas Sep 9, 2022
f81e14f
move numpy strategies import
TomNicholas Sep 9, 2022
129e2c3
Merge branch 'hypothesis-strategies' of https://github.com/TomNichola…
TomNicholas Sep 9, 2022
601d9e2
Merge branch 'main' into hypothesis-strategies
TomNicholas Sep 10, 2022
af24af5
reduce sizes
TomNicholas Sep 10, 2022
9777c2a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 10, 2022
4dcbc60
Merge branch 'main' into hypothesis-strategies
TomNicholas Jul 20, 2023
7841dd5
fix some api links in docs
TomNicholas Jul 24, 2023
968ee72
Merge branch 'main' into hypothesis-strategies
TomNicholas Nov 2, 2023
a6fc063
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 2, 2023
6a4a403
Merge branch 'main' into hypothesis-strategies
TomNicholas Apr 1, 2024
0b13771
remove np_arrays strategy
TomNicholas Apr 1, 2024
b44a4a2
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 1, 2024
cdcfbf4
fix bad merge of whatsnew
TomNicholas Apr 1, 2024
0aab116
fix bad merge in strategies
TomNicholas Apr 1, 2024
1 change: 1 addition & 0 deletions ci/requirements/doc.yml
@@ -10,6 +10,7 @@ dependencies:
- cfgrib>=0.9
- dask-core>=2.30
- h5netcdf>=0.7.4
- hypothesis
- ipykernel
- ipython
- iris>=2.3
22 changes: 22 additions & 0 deletions doc/api.rst
@@ -1060,6 +1060,28 @@ Testing
testing.assert_allclose
testing.assert_chunks_equal

Hypothesis Testing Strategies
=============================

.. currentmodule:: xarray

.. warning::
These strategies should be considered highly experimental, and liable to change at any time.

.. autosummary::
:toctree: generated/

testing.strategies.numeric_dtypes
testing.strategies.names
testing.strategies.dimension_names
testing.strategies.dimension_sizes
testing.strategies.attrs
testing.strategies.variables
testing.strategies.coordinate_variables
testing.strategies.dataarrays
testing.strategies.data_variables
testing.strategies.datasets

Exceptions
==========

1 change: 1 addition & 0 deletions doc/user-guide/index.rst
@@ -25,4 +25,5 @@ examples that describe many common tasks that you can accomplish with xarray.
dask
plotting
options
testing
duckarrays
272 changes: 272 additions & 0 deletions doc/user-guide/testing.rst
@@ -0,0 +1,272 @@
.. _testing:

Testing your code
=================

.. ipython:: python
:suppress:

import numpy as np
import pandas as pd
import xarray as xr

np.random.seed(123456)

.. _hypothesis:

Hypothesis testing
------------------

.. note::

Testing with hypothesis is a fairly advanced topic. Before reading this section, we recommend that you read our guide to
xarray's :ref:`data structures`, are familiar with conventional unit testing in pytest, and have looked through
the hypothesis library documentation.

``Hypothesis`` is a powerful library for property-based testing.
Instead of writing tests for one example at a time, it allows you to write tests parameterized by a source of many
dynamically generated examples. For example you might have written a test which you wish to be parameterized by the set
of all possible ``integers()``.

Property-based testing is extremely powerful, because (unlike more conventional example-based testing) it can find bugs
that you did not even think to look for!
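
As a minimal sketch of the idea, the test below uses only hypothesis's built-in ``integers()`` strategy. The property and the function name are illustrative and are not part of xarray.

```python
from hypothesis import given
import hypothesis.strategies as st


@given(st.integers(), st.integers())
def test_addition_commutes(a, b):
    # Hypothesis calls this body many times with different integer pairs.
    assert a + b == b + a


# Calling the decorated function directly runs the whole property-based search.
test_addition_commutes()
```

Under pytest the final call is unnecessary, because the decorated function is collected and run like any other test.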

Strategies
~~~~~~~~~~

Each source of examples is called a "strategy", and xarray provides a range of custom strategies which produce xarray
data structures containing arbitrary data. You can use these to efficiently test downstream code,
quickly ensuring that your code can handle xarray objects of all possible structures and contents.

These strategies are accessible in the :py:mod:`xarray.testing.strategies` module, which provides

.. currentmodule:: xarray

.. autosummary::

testing.strategies.numeric_dtypes
testing.strategies.np_arrays
Collaborator

all np_arrays does is wrap around hypothesis.extra.numpy.arrays, so it's probably better not to expose this as public API?

Contributor Author

The purpose of that is because xarray only accepts certain dtypes right? I don't have strong feelings about this though, except that users should have all the tools they need to build their own valid xarray strategies.

Collaborator

I think xarray would be fine with almost every dtype (except maybe the structured dtypes), but sparse in particular is very restricted.

Contributor

I'd suggest leaving out np_arrays and valid_dtypes, in favor of documenting how to use hypothesis.extra.numpy strategies for Xarray. Users will need to do that anyway for nontrivial tests, and IMO the benefits of a consistent API outweigh the convenience factor for beginning users.

testing.strategies.names
testing.strategies.dimension_names
testing.strategies.dimension_sizes
testing.strategies.attrs
testing.strategies.variables
testing.strategies.coordinate_variables
testing.strategies.dataarrays
testing.strategies.data_variables
testing.strategies.datasets

Generating Examples
~~~~~~~~~~~~~~~~~~~

To see an example of what each of these strategies might produce, you can call one followed by the ``.example()`` method,
which is a general hypothesis method valid for all strategies.

.. ipython:: python

import xarray.testing.strategies as xrst

xrst.dataarrays().example()
xrst.dataarrays().example()
xrst.dataarrays().example()

You can see that calling ``.example()`` multiple times will generate different examples, giving you an idea of the wide
range of data that the xarray strategies can generate.

In your tests, however, you should not use ``.example()``. Instead, parameterize your tests with the
``hypothesis.given`` decorator:

.. ipython:: python

from hypothesis import given

.. ipython:: python

@given(xrst.dataarrays())
def test_function_that_acts_on_dataarrays(da):
    assert func(da) == ...
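
The placeholder ``func`` above stands for your own code. As a concrete sketch that does not depend on the experimental xarray strategies, the following builds ``DataArray`` objects with ``hypothesis.extra.numpy`` and checks that transposing twice is a no-op. The name ``small_dataarrays`` is hypothetical.

```python
import numpy as np
import xarray as xr
import hypothesis.extra.numpy as npst
from hypothesis import given

# Hypothetical stand-in strategy: small integer arrays wrapped in DataArrays.
small_dataarrays = npst.arrays(
    dtype=np.int64, shape=npst.array_shapes(max_dims=3, max_side=4)
).map(xr.DataArray)


@given(small_dataarrays)
def test_double_transpose_is_identity(da):
    # transpose() with no arguments reverses the dimension order,
    # so applying it twice should give back an identical object.
    xr.testing.assert_identical(da.transpose().transpose(), da)


test_double_transpose_is_identity()
```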


Chaining Strategies
~~~~~~~~~~~~~~~~~~~

Xarray's strategies can accept other strategies as arguments, allowing you to customise the contents of the generated
examples.

.. ipython:: python

# generate a DataArray with shape (3, 4), but all other details still arbitrary
xrst.dataarrays(
    data=xrst.np_arrays(shape=(3, 4), dtype=np.dtype("int32"))
).example()

This also works with custom strategies, or strategies defined in other packages.
For example you could create a ``chunks`` strategy to specify particular chunking patterns for a dask-backed array.
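
Hypothesis's own combinators chain in the same way. The sketch below assumes only ``hypothesis.extra.numpy`` and the public ``xr.DataArray`` constructor; the name ``fixed_shape_dataarrays`` is illustrative.

```python
import numpy as np
import xarray as xr
import hypothesis.strategies as st
import hypothesis.extra.numpy as npst
from hypothesis import given

# st.builds draws one value from each argument strategy and passes
# the results to the xr.DataArray constructor.
fixed_shape_dataarrays = st.builds(
    xr.DataArray,
    npst.arrays(dtype=np.dtype("int32"), shape=(3, 4)),
    dims=st.just(["x", "y"]),
)


@given(fixed_shape_dataarrays)
def test_shape_and_dtype_are_fixed(da):
    # Only the array values vary between examples.
    assert da.shape == (3, 4)
    assert da.dtype == np.dtype("int32")
    assert da.dims == ("x", "y")


test_shape_and_dtype_are_fixed()
```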

.. warning::
When passing multiple different strategies to the same constructor the drawn examples must be mutually compatible.

In order to construct a valid xarray object to return, our strategies must check that the
variables / dimensions / coordinates are mutually compatible. If you pass multiple custom strategies to a strategy
constructor which are not compatible in all cases, an error will be raised, *even if they are still compatible in
other cases*. For example

.. code-block:: python

@given(st.data())
def test_something_else_inefficiently(data):
    # generates arrays of any shape
    arrs = npst.arrays(dtype=xrst.numeric_dtypes(), shape=npst.array_shapes())
    dims = xrst.dimension_names()  # generates lists of any number of dimensions

    # Drawing examples from this strategy will raise a hypothesis.errors.InvalidArgument error.
    var = data.draw(xrst.variables(data=arrs, dims=dims))

    assert ...

Here we have passed custom strategies which won't often be compatible: only rarely will the array's ``ndim``
correspond to the number of dimensions drawn. We forbid arguments that are only *sometimes* compatible in order to
avoid extremely poor example generation performance (as generating invalid examples and rejecting them is
potentially unboundedly inefficient).
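
One way to avoid the problem is to make the second draw depend on the first, so that every example is valid by construction. The sketch below assumes a hypothetical composite strategy, ``compatible_variables``, built only from public hypothesis and xarray calls; it is not how the xarray strategies are implemented.

```python
import numpy as np
import xarray as xr
import hypothesis.strategies as st
import hypothesis.extra.numpy as npst
from hypothesis import given


@st.composite
def compatible_variables(draw):
    # Draw the array first...
    arr = draw(
        npst.arrays(dtype=np.float64, shape=npst.array_shapes(max_dims=3, max_side=4))
    )
    # ...then draw exactly as many unique dimension names as the array has axes.
    dims = draw(
        st.lists(
            st.sampled_from(["x", "y", "z", "t"]),
            min_size=arr.ndim,
            max_size=arr.ndim,
            unique=True,
        )
    )
    return xr.Variable(dims=dims, data=arr)


@given(compatible_variables())
def test_dims_always_match_data(var):
    assert len(var.dims) == var.ndim


test_dims_always_match_data()
```

Because nothing is generated and then rejected, no examples are wasted on invalid combinations.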


Fixing Arguments
~~~~~~~~~~~~~~~~

If you want to fix one aspect of the data structure, whilst allowing variation in the generated examples
over all other aspects, then use ``hypothesis.strategies.just()``.

.. ipython:: python

import hypothesis.strategies as st

# Generates only dataarrays with dimensions ["x", "y"]
xrst.dataarrays(dims=st.just(["x", "y"])).example()

(This is technically another example of chaining strategies - ``hypothesis.strategies.just`` is simply a special
strategy that just contains a single example.)

To fix the length of dimensions you can instead pass ``dims`` as a mapping of dimension names to lengths
(i.e. following xarray objects' ``.sizes`` property), e.g.

.. ipython:: python

# Generates only dataarrays with dimensions ["x", "y"], of lengths 2 & 3 respectively
xrst.dataarrays(dims=st.just({"x": 2, "y": 3})).example()
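
For reference, ``.sizes`` on an existing object is exactly such a mapping. A short check using only public xarray API:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.zeros((2, 3)), dims=["x", "y"])

# ``sizes`` is a read-only mapping from dimension name to length.
sizes = dict(da.sizes)
print(sizes)  # {'x': 2, 'y': 3}
```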

You can also use this to specify that you want examples which are missing some part of the data structure, for instance

.. ipython:: python

# Generates only datasets with no data variables
xrst.datasets(data_vars=st.just({})).example()

Through a combination of chaining strategies and fixing arguments, you can specify quite complicated requirements on the
objects your chained strategy will generate.

.. ipython:: python

fixed_x_variable_y_maybe_z = st.fixed_dictionaries(
    {"x": st.just(2), "y": st.integers(3, 4)}, optional={"z": st.just(2)}
)

fixed_x_variable_y_maybe_z.example()

special_dataarrays = xrst.dataarrays(dims=fixed_x_variable_y_maybe_z)

special_dataarrays.example()
special_dataarrays.example()

Here we have used one of hypothesis' built-in strategies, ``fixed_dictionaries``, to create a strategy which generates
mappings of dimension names to lengths (i.e. the ``sizes`` of the xarray object we want).
This particular strategy will always generate an ``x`` dimension of length 2 and a ``y`` dimension of
length either 3 or 4, and will sometimes also generate a ``z`` dimension of length 2.
By feeding this strategy for dictionaries into the ``dims`` argument of xarray's ``dataarrays`` strategy, we can generate
arbitrary ``DataArray`` objects whose dimensions will always match these specifications.
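
The guarantees of the dictionary strategy can be checked directly, without the xarray strategies. The sketch below redefines the same ``fixed_dictionaries`` strategy so that it is self-contained; the test name is illustrative.

```python
import numpy as np
import xarray as xr
import hypothesis.strategies as st
from hypothesis import given

fixed_x_variable_y_maybe_z = st.fixed_dictionaries(
    {"x": st.just(2), "y": st.integers(3, 4)}, optional={"z": st.just(2)}
)


@given(fixed_x_variable_y_maybe_z)
def test_sizes_follow_the_specification(sizes):
    assert sizes["x"] == 2
    assert sizes["y"] in (3, 4)
    assert sizes.get("z", 2) == 2
    assert set(sizes) <= {"x", "y", "z"}

    # Any array built from these sizes reports them back via ``.sizes``.
    da = xr.DataArray(np.zeros(tuple(sizes.values())), dims=list(sizes))
    assert dict(da.sizes) == sizes


test_sizes_follow_the_specification()
```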


Creating Duck-type Arrays
~~~~~~~~~~~~~~~~~~~~~~~~~

Xarray objects don't have to wrap numpy arrays; in fact, they can wrap any array type which presents the same API as a
numpy array (so-called "duck array wrapping", see :ref:`internals.duck_arrays`).

Imagine we want to write a strategy which generates arbitrary ``DataArray`` objects, each of which wraps a
``sparse.COO`` array instead of a ``numpy.ndarray``. How could we do that? There are two ways:

1. Create an xarray object with numpy data and use ``.map()`` to convert the underlying array to a
different type:

.. ipython:: python
:okexcept:

import sparse
import hypothesis.extra.numpy as npst

.. ipython:: python
:okexcept:

def convert_to_sparse(da):
    if da.ndim == 0:
        return da
    else:
        da.data = sparse.COO.from_numpy(da.values)
        return da

.. ipython:: python
:okexcept:

sparse_dataarrays = xrst.dataarrays().map(convert_to_sparse)

sparse_dataarrays.example()
sparse_dataarrays.example()

2. Pass a strategy which generates the duck-typed arrays directly to the ``data`` argument of the xarray
strategies:

.. ipython:: python
:okexcept:

@st.composite
def sparse_arrays(draw) -> sparse.COO:
    """Strategy which generates random sparse.COO arrays"""
    shape = draw(npst.array_shapes())
    # density is a fraction between 0 and 1, so draw a float rather than an integer
    density = draw(st.floats(min_value=0, max_value=1))
    return sparse.random(shape, density=density)

.. ipython:: python
:okexcept:

sparse_dataarrays = xrst.dataarrays(data=sparse_arrays())

sparse_dataarrays.example()
sparse_dataarrays.example()

Either approach is fine, but one may be more convenient than the other depending on the type of the duck array which you
want to wrap.

Creating datasets can be a little more involved. Using method (1) is simple:

.. ipython:: python
:okexcept:

def convert_ds_to_sparse(ds):
    return ds.map(convert_to_sparse)

.. ipython:: python
:okexcept:

sparse_datasets = xrst.datasets().map(convert_ds_to_sparse)

sparse_datasets.example()

but building a dataset from scratch (i.e. method (2)) requires building the dataset object in such a way that all of
the data variables have compatible dimensions. You can build up a dictionary of the form ``{var_name: data_variable}``
yourself, or you can pass a ``data_variables`` strategy to the ``data_vars`` argument of the ``datasets`` strategy (TODO):

.. ipython:: python
:okexcept:

sparse_data_vars = xrst.data_variables(data=sparse_arrays())
sparse_datasets = xrst.datasets(data_vars=sparse_data_vars)

sparse_datasets.example()
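
Building the ``{var_name: data_variable}`` dictionary yourself can be sketched as follows. This is an illustration under stated assumptions, not the implementation in this PR: ``alignable_datasets`` and its ``array_strategy_fn`` parameter are hypothetical names, and numpy arrays from ``hypothesis.extra.numpy`` stand in for the duck array type. The key step is drawing one shared mapping of dimension sizes before any array is generated.

```python
import numpy as np
import xarray as xr
import hypothesis.strategies as st
import hypothesis.extra.numpy as npst
from hypothesis import given


@st.composite
def alignable_datasets(draw, array_strategy_fn=npst.arrays):
    # One shared mapping of dimension name -> length for the whole dataset.
    dim_sizes = draw(
        st.dictionaries(
            st.sampled_from(["x", "y", "z"]), st.integers(1, 3), min_size=1, max_size=3
        )
    )
    var_names = draw(
        st.lists(st.sampled_from(["a", "b", "c"]), min_size=1, max_size=3, unique=True)
    )

    data_vars = {}
    for name in var_names:
        # Each variable uses a unique subset of the shared dimensions.
        dims = draw(
            st.lists(
                st.sampled_from(sorted(dim_sizes)),
                min_size=1,
                max_size=len(dim_sizes),
                unique=True,
            )
        )
        shape = tuple(dim_sizes[d] for d in dims)
        # The callable receives the required shape, so the array always fits.
        data = draw(array_strategy_fn(dtype=np.dtype("float64"), shape=shape))
        data_vars[name] = (dims, data)

    return xr.Dataset(data_vars)


@given(alignable_datasets())
def test_all_variables_share_dimension_sizes(ds):
    for var in ds.data_vars.values():
        for dim, length in var.sizes.items():
            assert ds.sizes[dim] == length


test_all_variables_share_dimension_sizes()
```

A duck array library could be swapped in by passing a different callable that accepts ``dtype`` and ``shape`` keyword arguments and returns a strategy.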
Contributor Author

@keewis do you have any thoughts on this section? Given that half the point of this PR is to facilitate testing the duck array wrapping.

I'm worried that currently whilst it's easy to generate a DataArray with duck-typed data, it's actually kind of hard to generate a Dataset with duck-typed data. I'm trying to think what the minimum set of extra kwargs I would need to add to data_variables/datasets to make this easier would be.

The issue is that you can't just pass data=sparse_arrays() to datasets, because sparse_arrays() will then generate un-alignable variables. For this to work you would have to pass something like a callable which accepts a shape and produces a sparse array of that size, so that I can use dim_sizes to create only alignable duck arrays...

Collaborator

@keewis Sep 19, 2022

Passing a callable is what I tried with the create class method in #4972, but I'm not sure that is actually the best solution in this case (creating the final strategy becomes a bit complicated). Instead, I wonder if we would actually need some kind of strategy object that can be specialized after having been instantiated. For example, with this:

@st.composite
def pint_arrays(draw, *, shape=None, dtype=None, units=None):
    if shape is None:
        shape = shapes()

    if dtype is None:
        dtype = dtypes()

    if units is None:
        units = units()

    arrays = npst.arrays(shape, dtype)

    return pint.Quantity(draw(arrays), draw(units))

we would be able to "pin" the shape:

strategy = pint_arrays()

dim_sizes = ...

specialized_strategy = strategy.pin(shape=dim_sizes)

In other words, instead of calling .filter to drop examples that don't fit we'd modify the input parameters of the composite strategy (looking at the code of CompositeStrategy this actually does not seem too complicated to implement).

@Zac-HD, what do you think? Does that make sense to you, or would you recommend to solve this problem differently?

Contributor

@Zac-HD Sep 20, 2022

I'd be strongly in favor of accepting a callable taking *, shape, dtype, ... kwargs and returning a st.SearchStrategy[DuckArray]. We've got remarkably good support for the Array API standard which will make this increasingly easy to handle (pass array_strategy_fn=make_strategies_namespace(duckarray_module).arrays). Or, of course, array_strategy_fn=pint_arrays as per the example above - note the lack of parens!

Then internally, you pass in the shape, dtype, and maybe elements arguments, and then draw() from the resulting strategy. This is a little awkward, but it's an awkward situation and I've found that focussing on principled composability is really valuable when designing strategies.

The .pin() solution is much harder than it sounds - users might try to supply strategies created with st.builds() or .map() or something else, and then you've broken the uniformity of the API: if a strategy is required, any strategy that produces reasonable values ought to be acceptable. Also, bluntly, I expect that the Hypothesis internals would break this for you at some point and then you're in a nasty bind.

Collaborator

I had intended to push .pin in some form upstream, but I of course forgot about the other types of strategies so I can see why that would not be desirable.

Putting the code into the definition of the composite strategy is much better than what I had before (constructing the examples using data.draw directly in the test), so that would be fine with me.

Do you know if it is possible to use make_strategies_namespace with additional parameters to the array's constructor, like units for pint or chunks for dask? I guess if we use the pint_arrays function from above we could use partial for this (and anyway, pint does not implement __array_namespace__ at the moment).

Contributor

Do you know if it is possible to use make_strategies_namespace with additional parameters to the array's constructor, like units for pint or chunks for dask? I guess if we use the pint_arrays function from above we could use partial for this (and anyway, pint does not implement __array_namespace__ at the moment).

make_strategies_namespace() does not currently support additional parameters, largely because the array creation functions don't support them. I think there would be some forward-compatibility issues with accepting **kwargs too, though perhaps not prohibitive. I can see why that would be desirable though, and I'd be very happy to support it if standardized or just written into a draft.

For Pint, I'd write something like the following:

def pint_arrays(draw, *, shape, dtype, units=units(), array_strategy_fn=npst.arrays):
    return st.builds(pint.Quantity, array_strategy_fn(shape=shape, dtype=dtype), units=units)

and then, as you say, use array_strategy_fn=partial(pint_arrays, units=...) to customize.

Where possible, I prefer return st.builds(...) to @st.composite because the latter has to re-run any conditionals or argument validation every time it generates an example. This is usually just an aesthetic preference though.

4 changes: 4 additions & 0 deletions doc/whats-new.rst
@@ -22,6 +22,10 @@ v2022.07.0 (unreleased)
New Features
~~~~~~~~~~~~

- Added a suite of hypothesis strategies for generating xarray objects containing arbitrary data, useful for testing.
Accessible under :py:mod:`testing.strategies`, and documented in a new page on testing in the User Guide.
(:issue:`6911`, :pull:`6908`)
By `Tom Nicholas <https://github.com/TomNicholas>`_.

Breaking changes
~~~~~~~~~~~~~~~~
23 changes: 23 additions & 0 deletions xarray/testing/__init__.py
@@ -0,0 +1,23 @@
from .testing import (  # noqa: F401
    _assert_dataarray_invariants,
    _assert_dataset_invariants,
    _assert_indexes_invariants_checks,
    _assert_internal_invariants,
    _assert_variable_invariants,
    _data_allclose_or_equiv,
    assert_allclose,
    assert_chunks_equal,
    assert_duckarray_allclose,
    assert_duckarray_equal,
    assert_equal,
    assert_identical,
)

__all__ = [
    "assert_allclose",
    "assert_chunks_equal",
    "assert_duckarray_equal",
    "assert_duckarray_allclose",
    "assert_equal",
    "assert_identical",
]