Hypothesis strategies in xarray.testing.strategies #6908

TomNicholas · 2022-08-11T15:20:56Z

Adds a whole suite of hypothesis strategies for generating xarray objects, inspired by and separated out from the new hypothesis strategies in #4972. They are placed into the namespace xarray.testing.strategies, and publicly mentioned in the API docs, but with a big warning message. There is also a new testing page in the user guide documenting how to use these strategies.

Closes Public hypothesis strategies for generating xarray data #6911
Tests added
User visible changes (including notable bug fixes) are documented in whats-new.rst
New functions/methods are listed in api.rst

EDIT: A variables strategy and user-facing documentation were shipped in #8404

TomNicholas · 2022-08-11T15:33:52Z

I also added my chunking strategy from HypothesisWorks/hypothesis#3433
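The chunking strategy itself is not reproduced in this thread. As a rough idea of what such a strategy has to do, the sketch below draws dask-style chunks for a given shape: for each axis, a tuple of positive block lengths that sum to the axis length, with `(0,)` for an empty axis. The name `chunks_for_shape` is hypothetical, and this is not the implementation proposed in HypothesisWorks/hypothesis#3433.

```python
from hypothesis import strategies as st


@st.composite
def chunks_for_shape(draw, shape):
    """Hypothetical strategy: dask-style chunks, one tuple of block lengths per axis."""
    per_axis = []
    for size in shape:
        if size == 0:
            # A zero-length axis is represented by a single empty block.
            per_axis.append((0,))
            continue
        # Choose distinct interior cut points; no cuts means one block spanning the axis.
        cuts = draw(st.lists(st.integers(1, size - 1), unique=True)) if size > 1 else []
        bounds = [0, *sorted(cuts), size]
        per_axis.append(tuple(hi - lo for lo, hi in zip(bounds, bounds[1:])))
    return tuple(per_axis)
```

Because the cut points come from a list, Hypothesis can shrink a failing example towards the empty list, i.e. towards a single chunk per axis.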

…ctor

for more information, see https://pre-commit.ci

…atible

…s/xarray into hypothesis-strategies

for more information, see https://pre-commit.ci

TomNicholas · 2022-09-19T14:58:29Z

@Zac-HD if I could request one more review please! The two remaining problems for me are:

How should we alter the API of datasets to make it easier to construct Dataset objects containing duck-typed arrays (see Hypothesis strategies in xarray.testing.strategies #6908 (comment))
Why does the example generation performance seem to have gotten worse? 😕 I added what I thought were small refactors (e.g. _sizes_from_dim_names) which may somehow be related, but I'm not sure.

Zac-HD · 2022-09-20T06:59:10Z

@Zac-HD if I could request one more review please! The two remaining problems for me are:

Absolutely! Some quick comments this evening; I would also like to do a full review again before merge but that might be next week or weekend - I'm out for a conference from early Thursday.

How should we alter the API of datasets to make it easier to construct Dataset objects containing duck-typed arrays (see Hypothesis strategies in xarray.testing.strategies #6908 (comment))

Replied in the thread above.

Why does the example generation performance seem to have gotten worse? 😕 I added what I thought were small refactors (e.g. _sizes_from_dim_names) which may somehow be related, but I'm not sure.

I'd be pretty surprised if that was related: st.fixed_dictionaries() is internally basically just that zip() trick anyway. I'd guess that this is mostly "as you implement the last few complicated data-gen options, they start taking nonzero time", but I'm not confident in that without reading closely and probably measuring some perf things.
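To make the comparison concrete, a helper like `_sizes_from_dim_names` (whose code is not shown here) can either draw one size per name and `zip()` them into a dict, or hand a dict of per-name strategies to `st.fixed_dictionaries()`. Both functions below are hypothetical reconstructions, and the `max_size=5` bound is an arbitrary assumption; they generate the same kind of mapping from dimension name to size.

```python
from hypothesis import strategies as st


def sizes_via_zip(dim_names, max_size=5):
    # Draw a list with exactly one size per name, then pair names with sizes.
    return st.lists(
        st.integers(1, max_size), min_size=len(dim_names), max_size=len(dim_names)
    ).map(lambda sizes: dict(zip(dim_names, sizes)))


def sizes_via_fixed_dictionaries(dim_names, max_size=5):
    # One independent integer strategy per dimension name.
    return st.fixed_dictionaries({name: st.integers(1, max_size) for name in dim_names})
```

Either version shrinks each size towards 1 independently, so neither should be noticeably more expensive than the other.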

keewis · 2022-09-20T09:26:22Z

xarray/testing/strategies.py

+    if draw(
+        st.booleans()
+    ):  # Allow for no coordinate variables - explicit possibility not to helps with shrinking


Suggested change

-    if draw(
-        st.booleans()
-    ):  # Allow for no coordinate variables - explicit possibility not to helps with shrinking
+    # Allow for no coordinate variables - explicit possibility not to helps with shrinking
+    if draw(st.booleans()):
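The suggestion keeps the `draw(st.booleans())` gate and only moves the comment onto its own line. The gate matters for shrinking because Hypothesis shrinks booleans towards `False`, so an example that does not need coordinates can shrink all the way to having none. A standalone sketch of the pattern, using a hypothetical `optional_coords` strategy that returns plain lists instead of xarray coordinate variables:

```python
from hypothesis import strategies as st


@st.composite
def optional_coords(draw, sizes):
    """Hypothetical strategy: maybe one coordinate (a list of unique ints) per dimension."""
    coords = {}
    # Allow for no coordinate variables; the explicit "none" branch helps with shrinking
    if draw(st.booleans()):
        for dim, size in sizes.items():
            coords[dim] = draw(
                st.lists(st.integers(), min_size=size, max_size=size, unique=True)
            )
    return coords
```

Without the boolean, every example from this sketch would carry a coordinate for every dimension, and no amount of shrinking could remove them.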

keewis · 2022-09-20T12:38:09Z

doc/user-guide/testing.rst

+but building a dataset from scratch (i.e. method (2)) requires building the dataset object in such a way that all of
+the data variables have compatible dimensions. You can build up a dictionary of the form ``{var_name: data_variable}``
+yourself, or you can generate it with the ``data_variables`` strategy and pass that as the ``data_vars`` argument to the ``datasets`` strategy (TODO):
+
+.. ipython:: python
+    :okexcept:
+
+    sparse_data_vars = xrst.data_variables(data=sparse_arrays())
+    sparse_datasets = xrst.datasets(data_vars=sparse_data_vars)
+
+    sparse_datasets.example()
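One way to guarantee compatible dimensions when building the `{var_name: data_variable}` dictionary by hand is to draw a single mapping from dimension name to size up front and have every variable use a subset of those dimensions. The sketch below does this with plain NumPy arrays. `compatible_data_vars` is a hypothetical name, not part of `xrst`; the `(dims, array)` tuples it returns are a form that `xr.Dataset(data_vars=...)` accepts.

```python
import numpy as np
from hypothesis import strategies as st
from hypothesis.extra import numpy as npst


@st.composite
def compatible_data_vars(draw, max_vars=3):
    """Hypothetical strategy: {name: (dims, array)} with consistent dimension sizes."""
    # Draw the shared dimension sizes once, so every variable agrees on them.
    sizes = draw(
        st.dictionaries(st.sampled_from(["x", "y", "z"]), st.integers(1, 4), max_size=3)
    )
    names = draw(st.lists(st.text("abc", min_size=1), unique=True, max_size=max_vars))
    data_vars = {}
    for name in names:
        # Each variable uses an ordered subset of the shared dimensions.
        dims = tuple(draw(st.lists(st.sampled_from(sorted(sizes)), unique=True))) if sizes else ()
        shape = tuple(sizes[d] for d in dims)
        data_vars[name] = (dims, draw(npst.arrays(np.float64, shape)))
    return data_vars
```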


I had intended to push .pin in some form upstream, but of course I forgot about the other types of strategies, so I can see why that would not be desirable.

Putting the code into the definition of the composite strategy is much better than what I had before (constructing the examples using data.draw directly in the test), so that would be fine with me.
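For comparison, the two styles look roughly like this: drawing inside the test body with `st.data()`, versus moving the same logic into an `@st.composite` strategy that the test simply requests. Both tests and the `sized_lists` helper are hypothetical and check the same trivial property.

```python
from hypothesis import given, strategies as st


# Style 1: interactive draws inside the test via st.data().
@given(st.data())
def test_with_data_draw(data):
    n = data.draw(st.integers(1, 5), label="n")
    values = data.draw(st.lists(st.integers(), min_size=n, max_size=n), label="values")
    assert len(values) == n


# Style 2: the same generation logic packaged as a reusable composite strategy.
@st.composite
def sized_lists(draw):
    n = draw(st.integers(1, 5))
    return n, draw(st.lists(st.integers(), min_size=n, max_size=n))


@given(sized_lists())
def test_with_composite(pair):
    n, values = pair
    assert len(values) == n
```

The composite version can be shared between tests and combined with other strategies, whereas the `st.data()` draws are tied to a single test body.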

Do you know if it is possible to use make_strategies_namespace with additional parameters to the array's constructor, like units for pint or chunks for dask? I guess if we use the pint_arrays function from above we could use partial for this (and anyway, pint does not implement __array_namespace__ at the moment).
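On the `partial` idea: if a strategy-building function takes extra keyword arguments such as units or chunks, `functools.partial` can bind them so that the result only needs `shape` and `dtype`. The sketch below uses a hypothetical `Quantity` dataclass as a stand-in for a pint quantity (so it runs without pint) and a hypothetical `quantity_arrays` function; neither is pint's or xarray's API.

```python
from dataclasses import dataclass
from functools import partial

import numpy as np
from hypothesis import strategies as st
from hypothesis.extra import numpy as npst


@dataclass
class Quantity:
    """Hypothetical stand-in for a pint Quantity: an array plus a units string."""

    magnitude: np.ndarray
    units: str


def quantity_arrays(*, shape, dtype, units) -> st.SearchStrategy:
    # Extra constructor parameter (units) on top of the usual shape/dtype.
    return npst.arrays(dtype, shape).map(lambda arr: Quantity(arr, units))


# Bind units up front; the result only needs shape= and dtype= keyword arguments.
metre_arrays = partial(quantity_arrays, units="m")
```

Assuming a consumer that calls its array function as `fn(shape=..., dtype=...)`, `metre_arrays` can be passed wherever such a function is expected; the same trick would bind `chunks=` for a dask-backed variant.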

for more information, see https://pre-commit.ci

dcherian · 2024-04-01T02:26:00Z

How do we move this forward? Even Xarray objects with just numpy arrays would be quite useful.

Zac-HD · 2024-04-01T04:29:44Z

I think #8404 made a lot of progress on this, including shipping the user-facing documentation. If you wanted to open a PR rebasing this set of changes on main, I think that might be most of the remaining work.

for more information, see https://pre-commit.ci

TomNicholas · 2024-04-01T14:58:13Z

So I just did a monster merge of main into this branch (probably should still rebase). It won't work yet because we still need to propagate all the array_strategy_fn stuff that went through with #8404 into the signatures of the new strategies in this PR.

How do we move this forward?

It's mostly just dealing with the above and also making sure we can generate sets of variables with alignable dimensions efficiently. We should also probably think about what we want the signatures of the more complicated strategies to be: e.g. do we want to pass variables to datasets, or array_strategy_fn to datasets?
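As a rough illustration of the second option (passing `array_strategy_fn` through), a strategy can draw dimension names and sizes itself and then call the user-supplied function with `shape` and `dtype` keyword arguments. Everything here is hypothetical: `variables_like` returns `(dims, array)` tuples rather than xarray objects, and the default `numpy_array_fn` simply wraps `hypothesis.extra.numpy.arrays`.

```python
import numpy as np
from hypothesis import strategies as st
from hypothesis.extra import numpy as npst


def numpy_array_fn(*, shape, dtype):
    # Default array strategy: plain NumPy arrays.
    return npst.arrays(dtype, shape)


@st.composite
def variables_like(draw, *, array_strategy_fn=numpy_array_fn, dtype=np.dtype("int32")):
    """Hypothetical strategy: draw dims and sizes, then delegate array creation."""
    sizes = draw(
        st.dictionaries(st.sampled_from(["a", "b", "c", "d"]), st.integers(0, 3), max_size=3)
    )
    dims = tuple(sizes)
    shape = tuple(sizes[d] for d in dims)
    # The only contract with the caller: accept shape= and dtype=, return a strategy.
    data = draw(array_strategy_fn(shape=shape, dtype=dtype))
    return dims, data
```

Because this strategy owns the sizes, a `datasets`-like strategy built the same way could reuse one sizes mapping for every variable, which is harder to guarantee if it is handed already-built variables.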

Even Xarray objects with just numpy arrays would be quite useful

A lot of the work that went into #8404 was working out how to make it general enough to handle non-numpy arrays.

doc/whats-new.rst

xarray/testing/strategies.py

TomNicholas added 7 commits August 11, 2022 03:11

copied files defining strategies over to this branch

587ebb8

placed testing functions in their own directory

acbfa69

moved hypothesis strategies into new testing directory

73d763f

begin type hinting strategies

db2deff

renamed strategies for consistency with hypothesis conventions

746cfc8

added strategies to public API (with experimental warning)

03cd9de

strategies for chunking patterns

2fe3583

TomNicholas mentioned this pull request Aug 11, 2022

Automatic duck array testing - reductions #4972

Draft

4 tasks

This was referenced Aug 11, 2022

Strategy for chunking arrays HypothesisWorks/hypothesis#3433

Closed

Public hypothesis strategies for generating xarray data #6911

Open

TomNicholas added the topic-hypothesis Strategies or tests using the hypothesis library label Aug 12, 2022

TomNicholas added 4 commits August 12, 2022 21:31

rewrote variables strategy to have same signature as Variable constructor

4db3629

test variables strategy

14d11aa

fixed most tests

418a359

added helpers so far to API docs

c8a7d0e

github-actions bot added topic-testing documentation and removed topic-hypothesis Strategies or tests using the hypothesis library labels Aug 13, 2022

TomNicholas added 2 commits August 13, 2022 00:11

add hypothesis to docs CI env

d48aceb

add todo about attrs

a20e341

github-actions bot added CI Continuous Integration tools dependencies Pull requests that update a dependency file labels Aug 13, 2022

TomNicholas and others added 7 commits August 13, 2022 12:24

draft of new user guide page on testing

3a4816f

types for dataarrays strategy

d0406a2

draft for chained chunking example

65a222d

[pre-commit.ci] auto fixes from pre-commit.com hooks

e1d718a

for more information, see https://pre-commit.ci

only accept strategy objects

57d0f5b

fixed failure with passing in two custom strategies that must be compatible

82c734c

syntax error in example

029f19a

TomNicholas and others added 6 commits September 9, 2022 16:02

fixed all local mypy errors

de26b2f

move numpy strategies import

f81e14f

Merge branch 'hypothesis-strategies' of https://github.com/TomNicholas/xarray into hypothesis-strategies

129e2c3

Merge branch 'main' into hypothesis-strategies

601d9e2

reduce sizes

af24af5

[pre-commit.ci] auto fixes from pre-commit.com hooks

9777c2a

for more information, see https://pre-commit.ci

keewis reviewed Sep 20, 2022

View reviewed changes

This was referenced Sep 20, 2022

Array objects of arbitrary rank are infeasible - require a reasonable range of ranks instead data-apis/array-api#479

Closed

Warn on bool(st.booleans()) HypothesisWorks/hypothesis#3463

Closed

TomNicholas mentioned this pull request Oct 20, 2022

combine_by_coords allows one overlapping coordinate value, but not more than one #7189

Open

4 tasks

TomNicholas mentioned this pull request Mar 8, 2023

Generalize handling of chunked array types #7019

Merged

15 tasks

TomNicholas added 2 commits July 19, 2023 22:45

Merge branch 'main' into hypothesis-strategies

4dcbc60

fix some api links in docs

7841dd5

TomNicholas mentioned this pull request Jul 24, 2023

Test suite cubed-dev/cubed-xarray#6

Open

TomNicholas and others added 2 commits November 2, 2023 07:31

Merge branch 'main' into hypothesis-strategies

968ee72

[pre-commit.ci] auto fixes from pre-commit.com hooks

a6fc063

for more information, see https://pre-commit.ci

TomNicholas mentioned this pull request Nov 2, 2023

Hypothesis strategy for generating Variable objects #8404

Merged

4 tasks

TomNicholas and others added 3 commits April 1, 2024 10:46

Merge branch 'main' into hypothesis-strategies

6a4a403

remove np_arrays strategy

0b13771

[pre-commit.ci] auto fixes from pre-commit.com hooks

b44a4a2

for more information, see https://pre-commit.ci

dcherian reviewed Apr 1, 2024

View reviewed changes

doc/whats-new.rst (resolved)

dcherian reviewed Apr 1, 2024

View reviewed changes

xarray/testing/strategies.py (outdated, resolved)

TomNicholas added 2 commits April 1, 2024 12:00

fix bad merge of whatsnew

cdcfbf4

fix bad merge in strategies

0aab116

Zac-HD mentioned this pull request Apr 4, 2024

Create hypothesis.extra.xarray HypothesisWorks/hypothesis#3948

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hypothesis strategies in xarray.testing.strategies #6908

Hypothesis strategies in xarray.testing.strategies #6908

TomNicholas commented Aug 11, 2022 •

edited

TomNicholas commented Aug 11, 2022

TomNicholas commented Sep 19, 2022 •

edited

Zac-HD commented Sep 20, 2022

keewis Sep 20, 2022 •

edited

keewis Sep 20, 2022

dcherian commented Apr 1, 2024

Zac-HD commented Apr 1, 2024

TomNicholas commented Apr 1, 2024

Hypothesis strategies in xarray.testing.strategies #6908

Are you sure you want to change the base?

Hypothesis strategies in xarray.testing.strategies #6908

Conversation

TomNicholas commented Aug 11, 2022 • edited

TomNicholas commented Aug 11, 2022

TomNicholas commented Sep 19, 2022 • edited

Zac-HD commented Sep 20, 2022

keewis Sep 20, 2022 • edited

Choose a reason for hiding this comment

keewis Sep 20, 2022

Choose a reason for hiding this comment

dcherian commented Apr 1, 2024

Zac-HD commented Apr 1, 2024

TomNicholas commented Apr 1, 2024

TomNicholas commented Aug 11, 2022 •

edited

TomNicholas commented Sep 19, 2022 •

edited

keewis Sep 20, 2022 •

edited