Hypothesis strategy for generating Variable objects #8404

Merged
TomNicholas merged 176 commits into pydata:main on Dec 5, 2023

Conversation

TomNicholas
Contributor

@TomNicholas TomNicholas commented Nov 2, 2023

Breaks out just the part of #6908 needed for generating arbitrary xarray.Variable objects (so ignore the ginormous number of commits).

EDIT: Check out this test which performs a mean on any subset of any Variable object!

In [36]: from xarray.testing.strategies import variables

In [37]: variables().example()
<xarray.Variable (ĭ: 3)>
array([-2.22507386e-313-6.62447795e+016j,
                    nan-6.46207519e+185j,
       -2.22507386e-309+3.33333333e-001j])

@andersy005 @maxrjones @jhamman I thought this might be useful for the NamedArray testing. (xref #8370 and #8244)

@keewis and @Zac-HD sorry for letting that PR languish for literally a year 😅 This PR addresses your feedback about accepting a callable that returns a strategy generating arrays. That suggestion makes some things a bit more complex in user code but actually allows me to simplify the internals of the variables strategy significantly. I'm actually really happy with this PR - I think it solves what we were discussing, and is a sensible checkpoint to merge before going back to making strategies for generating composite objects like DataArrays/Datasets work.

TomNicholas and others added 30 commits August 11, 2022 03:11
Collaborator

@keewis keewis left a comment

I didn't have time to check the tests yet, but here are a few comments

Comment on lines +3 to +4
Testing your code
=================
Collaborator

Not sure. It is true that the page has a different target audience than the other pages in the user guide, but then again applications can also be tested. And, so far the "internals" section describes implementation details or extension mechanisms that affect the internals.

Comment on lines +49 to +54
return (
npst.integer_dtypes()
| npst.unsigned_integer_dtypes()
| npst.floating_dtypes()
| npst.complex_number_dtypes()
)
Collaborator

we do support string dtypes, but only for a subset of operations. Is this worth mentioning?

Contributor Author

This is not meant to be an exhaustive list (yet). It doesn't include datetimes either.

Collaborator

agreed, but most operations don't make sense on string or datetime dtypes so it might be better to make a separate list of dtypes for those?

Contributor Author

Sure - I'm just saying let's defer detailed discussions of which types to test until another issue / PR. The point of this PR is to provide a framework flexible enough to easily test xarray functions with any type we want, which this achieves.
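
As a sketch of the "separate list of dtypes" idea from this thread, dtype strategies from `hypothesis.extra.numpy` can be combined with `|`. The name `string_and_datetime_dtypes` is hypothetical and is not something this PR defines.

```python
import hypothesis.extra.numpy as npst
from hypothesis import given

# Numeric dtypes, mirroring the union quoted in the review comment above.
numeric_dtypes = (
    npst.integer_dtypes()
    | npst.unsigned_integer_dtypes()
    | npst.floating_dtypes()
    | npst.complex_number_dtypes()
)

# Hypothetical separate list for dtypes that only support a subset of operations.
string_and_datetime_dtypes = npst.unicode_string_dtypes() | npst.datetime64_dtypes()


@given(numeric_dtypes)
def test_numeric_kinds(dtype):
    # NumPy kind codes: signed int, unsigned int, float, complex.
    assert dtype.kind in "iufc"


@given(string_and_datetime_dtypes)
def test_non_numeric_kinds(dtype):
    # NumPy kind codes: unicode string, datetime64.
    assert dtype.kind in "UM"
```

Keeping the two unions apart would let a test opt in to string or datetime dtypes only where the operation under test supports them.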

Contributor

Zac-HD commented Nov 13, 2023

> The only final thing is that the docs don't build because of one weird warning (our docs are set to fail on any warnings):
>
> xarray/xarray/testing/strategies.py:docstring of xarray.testing.strategies.accept.<locals>.variables:47:
>     WARNING: Block quote ends without a blank line; unexpected unindent.
>
> Given that I don't define any local variables called accept, but hypothesis apparently does, I guess this must be hypothesis' fault somehow?

My guess is that this is an existing docstring, the location of which is being misreported due to the various wrappers that Hypothesis inserts. I'd be very surprised if Hypothesis is modifying docstrings somehow, but I guess trimming trailing whitespace is the kind of thing that could happen somewhere in the stack.

No direct insight, but getting the full text of the docstring it's complaining about should help?
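
One way to get the full text of a docstring with line numbers, using only the standard library, is sketched below. `numbered_docstring` and `example` are hypothetical names; note that the line number in a Sphinx warning may be offset from these numbers if the docstring is preprocessed before rendering.

```python
import inspect


def numbered_docstring(obj):
    """Return the cleaned docstring of ``obj`` as a list of numbered lines."""
    doc = inspect.getdoc(obj) or ""
    return [f"{n:>3}: {line}" for n, line in enumerate(doc.splitlines(), start=1)]


def example():
    """Summary line.

    Examples
    --------
    >>> 1 + 1
    2
    """


# Print the docstring with line numbers so a reported line can be located.
for line in numbered_docstring(example):
    print(line)
```

The same helper could be pointed at `xarray.testing.strategies.variables` to inspect the region around the reported line.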

Contributor Author

TomNicholas commented Dec 5, 2023

I got the docs build to pass! The warning was due to extra lines in the examples of the variables strategy docstring. I only managed to find it by trial and error 🙄

@keewis do you want to review the tests before I merge it? (The test failures now are something groupby-related, and are also happening in #8521, so definitely not my fault!)

Collaborator

@keewis keewis left a comment

I didn't spot anything that we wouldn't be able to change after merging / releasing, so I'd say let's merge and see how well it works in practice.

)


def smallish_arrays(
Collaborator

Is the only reason we have this function to provide the default strategy for shape (and maybe some additional typing)? If so, we might be able to use functools.partial on npst.arrays, unless you meant to expose this as public API (it's not in the API reference).

Contributor Author

That is the only reason. I did not think of using functools.partial - that's a good idea, I can try that out before merging.

Contributor Author

That actually won't work because we do need to be able to pass shape and dtype to the array_strategy_fn.

But I tried removing smallish_arrays completely and the tests still seem to complete in a reasonable amount of time, so I've actually just taken it out for now.

@TomNicholas TomNicholas merged commit ab6a255 into pydata:main Dec 5, 2023
18 of 28 checks passed
Variable refactor automation moved this from In progress to Done Dec 5, 2023
TomNicholas added a commit to TomNicholas/datatree that referenced this pull request Dec 10, 2023
TomNicholas added a commit to xarray-contrib/datatree that referenced this pull request Dec 10, 2023
* fix import of xarray.testing internals that was changed by pydata/xarray#8404

* bump minimum required version of xarray

* linting
dcherian added a commit to dcherian/xarray that referenced this pull request Dec 18, 2023
* main: (26 commits)
  Filter null values before plotting (pydata#8535)
  Update concat.py (pydata#8538)
  Add getitem to array protocol (pydata#8406)
  Added option to specify weights in xr.corr() and xr.cov() (pydata#8527)
  Filter out doctest warning (pydata#8539)
  Bump actions/setup-python from 4 to 5 (pydata#8540)
  Point users to where in their code they should make mods for Dataset.dims (pydata#8534)
  Add Cumulative aggregation (pydata#8512)
  dev whats-new
  Whats-new for 2023.12.0 (pydata#8532)
  explicitly skip using `__array_namespace__` for `numpy.ndarray` (pydata#8526)
  Add `eval` method to Dataset (pydata#7163)
  Deprecate ds.dims returning dict (pydata#8500)
  test and fix empty xindexes repr (pydata#8521)
  Remove PR labeler bot (pydata#8525)
  Hypothesis strategy for generating Variable objects (pydata#8404)
  Use numbagg for `rolling` methods (pydata#8493)
  Bump pypa/gh-action-pypi-publish from 1.8.10 to 1.8.11 (pydata#8514)
  fix RTD docs build (pydata#8519)
  Fix type of `.assign_coords` (pydata#8495)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Dec 20, 2023
* main: (58 commits)
  Adapt map_blocks to use new Coordinates API (pydata#8560)
  add xeofs to ecosystem.rst (pydata#8561)
  Offer a fixture for unifying DataArray & Dataset tests (pydata#8533)
  Generalize cumulative reduction (scan) to non-dask types (pydata#8019)
  Filter null values before plotting (pydata#8535)
  Update concat.py (pydata#8538)
  Add getitem to array protocol (pydata#8406)
  Added option to specify weights in xr.corr() and xr.cov() (pydata#8527)
  Filter out doctest warning (pydata#8539)
  Bump actions/setup-python from 4 to 5 (pydata#8540)
  Point users to where in their code they should make mods for Dataset.dims (pydata#8534)
  Add Cumulative aggregation (pydata#8512)
  dev whats-new
  Whats-new for 2023.12.0 (pydata#8532)
  explicitly skip using `__array_namespace__` for `numpy.ndarray` (pydata#8526)
  Add `eval` method to Dataset (pydata#7163)
  Deprecate ds.dims returning dict (pydata#8500)
  test and fix empty xindexes repr (pydata#8521)
  Remove PR labeler bot (pydata#8525)
  Hypothesis strategy for generating Variable objects (pydata#8404)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Jan 4, 2024
flamingbear pushed a commit to flamingbear/rewritten-datatree that referenced this pull request Jan 19, 2024
* fix import of xarray.testing internals that was changed by pydata/xarray#8404

* bump minimum required version of xarray

* linting
Labels
CI Continuous Integration tools dependencies Pull requests that update a dependency file topic-testing topic-typing
4 participants