
Add initial property-based tests using Hypothesis #22280

Merged
merged 6 commits into pandas-dev:master from Zac-HD:hypothesis on Aug 25, 2018

Conversation

@Zac-HD
Contributor

commented Aug 11, 2018

This pull request has no features or bugfixes for Pandas itself (though it does have a whatsnew entry for contributors); instead it adds support for Hypothesis and a few initial tests that use it. The new section of the contributing guide below should explain what this means and why it's desirable. There are basically three contributions here:

  1. Contributor documentation and all the dependency setup to support Hypothesis tests in CI.
  2. A pleasantly concise pair of passing tests in test_ticks.py.
  3. Three failing tests in test_offsets_properties.py, based on #18761 by @jbrockmendel. I've made these idiomatic from the Hypothesis side, but while it seems likely that at least test_on_offset_implementations is finding real bugs, fixing them is beyond me at the moment. I thought them worth including anyway, marked with an xfail decorator.

Future work on Hypothesis tests could focus on such fruitful areas as round-tripping data via any of the serialisation formats Pandas offers (I tried to write a demo of this and couldn't get it to pass), or the melt and merge tests that @TomAugspurger mentioned.
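For illustration, a minimal sketch of what such a round-trip property test could look like, using hypothesis.extra.pandas; the column names and dtypes are hypothetical, and a real test would need care with dtype and precision edge cases:

from io import StringIO

import pandas as pd
from hypothesis import given
from hypothesis.extra.pandas import column, data_frames


@given(df=data_frames(columns=[column("a", dtype=int),
                               column("b", dtype=float)]))
def test_csv_roundtrip(df):
    # Writing to CSV and reading back should preserve the values;
    # dtype checking is relaxed because read_csv re-infers dtypes.
    result = pd.read_csv(StringIO(df.to_csv(index=False)))
    pd.testing.assert_frame_equal(result, df, check_dtype=False)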

Closes #17978, after which more specific issues could be opened with suggested tests.

@Zac-HD Zac-HD force-pushed the Zac-HD:hypothesis branch 2 times, most recently from cff98d8 to e13decd Aug 11, 2018

@codecov


commented Aug 11, 2018

Codecov Report

Merging #22280 into master will decrease coverage by <.01%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #22280      +/-   ##
==========================================
- Coverage   92.04%   92.03%   -0.01%     
==========================================
  Files         169      169              
  Lines       50776    50780       +4     
==========================================
  Hits        46737    46737              
- Misses       4039     4043       +4
Flag Coverage Δ
#multiple 90.44% <0%> (-0.01%) ⬇️
#single 42.22% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/util/_tester.py 23.8% <0%> (-5.61%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 55d176d...779b49a. Read the comment docs.

Behavioral based tests for offsets and date_range.
This file is adapted from https://github.com/pandas-dev/pandas/pull/18761 -
which was more


jbrockmendel Aug 11, 2018

Member

you stopped talking in the middle of a

@jbrockmendel

Member

commented Aug 11, 2018

This looks really nice, definitely prettier than #18761.

Do the xfailed tests actually fail or is that just being defensive? Ideally we'd like to have strict=True in xfails.
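(For reference, a strict xfail looks something like the sketch below; the test name and reason string are illustrative only.)

import pytest


# strict=True turns an unexpected pass (XPASS) into a failure, so we
# notice as soon as the underlying bug is actually fixed.
@pytest.mark.xfail(strict=True, reason="known bug, see discussion above")
def test_known_bug():
    assert 1 + 1 == 3  # placeholder for the real failing assertion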

@Zac-HD Zac-HD force-pushed the Zac-HD:hypothesis branch from e13decd to 9a72a1a Aug 12, 2018

@Zac-HD

Contributor Author

commented Aug 12, 2018

Thanks! I have an unfair advantage in Hypothesis idioms, but I couldn't have written anything like this without your example to follow on the Pandas side 😄

The tests really do fail - I've added strict=True to the xfails and a note to one which is inconsistent. Hopefully someone who knows more about Pandas internals than I do can either improve the tests or actually fix the bugs!

@Zac-HD Zac-HD force-pushed the Zac-HD:hypothesis branch from 9a72a1a to cdc4c4c Aug 12, 2018

@TomAugspurger

Contributor

commented Aug 12, 2018

Agreed this looks quite nice, thanks @Zac-HD .

For better or worse, our test dependencies are all optional other than pytest, and I think hypothesis should follow that pattern. If others agree with that policy, I suppose the easiest way to do that would be to keep tests using hypothesis in a separate file and use pytest.importorskip('hypothesis'), or to add a helper in pandas.util.testing.
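A minimal sketch of that pattern, assuming the Hypothesis tests live in their own module:

# pandas/tests/tseries/offsets/test_offsets_properties.py (hypothetical layout)
import pytest

# Skip the whole module if hypothesis is not installed, so the rest of
# the suite still runs with only pytest available.
hypothesis = pytest.importorskip("hypothesis")
st = pytest.importorskip("hypothesis.strategies")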

I'll look at those xfailing tests later.

@Zac-HD

Contributor Author

commented Aug 13, 2018

In principle I don't have any particular objections to making Hypothesis optional and keeping it in particular files - that's how it works for Xarray, and creating a new sub-directory tests/hypothesis/ isn't hard.

The only catch is that they should be executed as part of every test run in CI! Otherwise you miss all the platform-specific issues that it could turn up, including the thing where test_on_offset_implementations has to be marked strict=False because it only fails on some operating systems.

If you feel that property-based tests are going to be valuable - and there are obvious applications to testing e.g. round-tripping serialization that I couldn't get to pass - personally I'd just make Hypothesis a lightweight required dependency for development. Happy to let you decide either way though!

@gfyoung gfyoung added the Testing label Aug 13, 2018

@gfyoung

Member

commented Aug 13, 2018

@TomAugspurger : I think hypothesis could be quite valuable for testing purposes. However, I would be conservative at first and make this optional as you suggested. If it proves itself to be very important for testing purposes, then we could consider making it required later on.

@Zac-HD

Contributor Author

commented Aug 13, 2018

OK, optional it is then! Do I just list the dependency in ci/requirements-optional-conda.txt? Or is there something more involved?

And just to make this more fun, it seems like the conda-forge package doesn't work for Python3 under OSX... cc @jochym for help please 😨

@TomAugspurger

Contributor

commented Aug 13, 2018

@Zac-HD namespace collision I think :/ Seems to be hypothesis-python, at least on conda-forge.

Our CI setup is a bit messy, but I think

  1. Include hypothesis-python in ci/environment-dev.yaml
  2. Include a rename in scripts/convert_deps.py from hypothesis-python to hypothesis
  3. Run python scripts/convert_deps.py and stage / commit the files
  4. Update some of the CI files to include hypothesis-python
    a. circle-27-compat.yaml
    b. circle-36-locale.yaml
    c. travis-27.yaml
    d. travis-35-osx.yaml
    e. travis-37.yaml
    f. appveyor-27.yaml
    g. appveyor-36.yaml
@Zac-HD

Contributor Author

commented Aug 13, 2018

Hmm, I've just checked and I don't think the name is the issue after all (and I can't find a package called hypothesis-python on conda-forge at all).

And it's currently listed in ci/environment-dev.yaml - do you mean moving it to ci/requirements-optional-conda.txt?

@TomAugspurger

Contributor

commented Aug 13, 2018

Hmm, I've just checked and I don't think the name is the issue after all

Ah, sorry I misread the travis output. Ignore that.

And it's currently listed in ci/environment-dev.yaml

That's perfect. Could you also run python scripts/convert_deps.py and include the output (for our contributors using pip)?

@@ -795,6 +795,44 @@ Tests that we have ``parametrized`` are now accessible via the test name, for ex
test_cool_feature.py::test_series[int8] PASSED


jreback Aug 13, 2018

Contributor

add a sub-section ref


Zac-HD Aug 13, 2018

Author Contributor

Done.

import pandas as pd

from pandas.tseries.offsets import (
Hour, Minute, Second, Milli, Micro, Nano,


jreback Aug 13, 2018

Contributor

these already exist as fixtures; I'd like to avoid specifically importing things like this and use the fixtures / a generating fixture instead


Zac-HD Aug 13, 2018

Author Contributor

This is actually a non-trivial trade-off! Because we can't call fixtures outside of a test, we'll still need to import these types to register the strategy for each of them in conftest.py (see next comment).

In the test file, we have two options:

  1. Import them, and let Hypothesis decide which to try.
  2. Use a fixture which creates a test for each, and have Hypothesis try cases for each independently.

(1) is faster at runtime because Hypothesis is running one instead of n tests, while (2) is more explicit if a little less elegant. Your call.
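Concretely, the two options look roughly like the sketch below (a hypothetical subset of classes and a placeholder assertion):

import pytest
from hypothesis import given, strategies as st
from pandas.tseries.offsets import Hour, Minute, Second

tick_classes = [Hour, Minute, Second]


# Option 1: a single test; Hypothesis chooses the class as well as n.
@given(cls=st.sampled_from(tick_classes), n=st.integers(-999, 999))
def test_tick_identity_sampled(cls, n):
    assert cls(n) == cls(n)


# Option 2: pytest creates one test per class; Hypothesis only
# generates the remaining arguments.
@pytest.mark.parametrize("cls", tick_classes)
@given(n=st.integers(-999, 999))
def test_tick_identity_parametrized(cls, n):
    assert cls(n) == cls(n)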

QuarterBegin, QuarterEnd, BQuarterBegin, BQuarterEnd,
YearBegin, YearEnd, BYearBegin, BYearEnd]

# ----------------------------------------------------------------


jreback Aug 13, 2018

Contributor

This is the problem I had with the original PR: there is a lot of code to create the things that need testing (e.g. the case-generating statements).

Putting all of this inside a specific test file makes it very hard to re-use / document. Is there a standard location, like a conftest.py, where these typically live? We can create infrastructure to do this, but we want to a) share as much of this code as possible, and b) put it in standard, well-defined, easy-to-discover locations.


Zac-HD Aug 13, 2018

Author Contributor
  • Global setup using register_type_strategy could easily be moved to conftest.py (heuristic: "equivalent to defining fixtures"), with a comment at each end explaining where the other part is (see the sketch after this list).

  • For the gen_* variables, I'd actually leave them right where they are - they look fairly specialised to these tests, and strategies would be idiomatically constructed within the call to @given if they were shorter or unique.
    Common bits can be extracted out as they get identified, but TBH we'd love to supply anything useful downstream in hypothesis.extra.pandas so some of that will hopefully vanish again 😉
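A minimal sketch of that conftest.py registration (the offset class and strategy bounds here are illustrative):

# conftest.py - global Hypothesis setup, roughly "equivalent to defining
# fixtures"; mirrored by a comment in the test file that uses it.
from hypothesis import strategies as st

from pandas.tseries.offsets import Minute

st.register_type_strategy(Minute, st.builds(Minute, n=st.integers(-999, 999)))
# Tests can then draw instances with st.from_type(Minute).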

@@ -35,6 +36,39 @@ def test_delta_to_tick():
assert (tick == offsets.Day(3))


@given(cls=st.sampled_from(tick_classes),


jreback Aug 13, 2018

Contributor

you should be able to remove tests which this duplicates.


Zac-HD Aug 13, 2018

Author Contributor

Sure; I'll convert them to explicit inputs with the @example(...) decorator 😄
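Something like the sketch below, keeping the old hand-written values as explicit examples (mirroring the decorators quoted later in this thread; values are illustrative):

import pytest
from hypothesis import example, given, strategies as st
from pandas.tseries.offsets import Hour, Micro, Milli, Minute, Nano, Second

tick_classes = [Hour, Minute, Second, Milli, Micro, Nano]


@pytest.mark.parametrize('cls', tick_classes)
@example(n=2, m=3)
@example(n=800, m=300)
@given(n=st.integers(-999, 999), m=st.integers(-999, 999))
def test_tick_add_sub(cls, n, m):
    # each @example case always runs in addition to the generated ones
    assert cls(n) + cls(m) == cls(n + m)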

assert left - right == expected


@given(cls=st.sampled_from(tick_classes),


jreback Aug 13, 2018

Contributor

do these generate individual tests like pytest params?


Zac-HD Aug 13, 2018

Author Contributor

No, it's only one test at runtime - @given(...) creates a wrapper which handles the details of repeatedly calling the inner test. It also shrinks each error it sees to the minimal example, and will report multiple errors if it saw more than one (distinguished by exception type and line of code).

@jbrockmendel

Member

commented Aug 13, 2018

@Zac-HD Is there anything else from #18761 worth trying to salvage?

@Zac-HD Zac-HD force-pushed the Zac-HD:hypothesis branch from 16a3bd9 to f94cf92 Aug 14, 2018

@Zac-HD

Contributor Author

commented Aug 14, 2018

@jbrockmendel - it looks like you had some neat ideas for further tests of e.g. timedeltas, but I thought it would be better for me to stick to the groundwork and leave them for a follow-up PR that could do them justice.

If you or anyone else would like to work on that, I'd be happy to advise on Hypothesis idioms (a standing offer for any open source project).

@Zac-HD

Contributor Author

commented Aug 15, 2018

@jreback - I've implemented all the changes from your review. Is there anything else I need to do before this can be merged?

@Zac-HD

Contributor Author

commented Aug 18, 2018

Ping @TomAugspurger & @jreback for a final review. It would be great to iterate on this at the PyCon AU sprints if it can be merged by next weekend, so if there's anything left to do I would really like to hear about it soon!

# ----------------------------------------------------------------
# Global setup for tests using Hypothesis

from hypothesis import strategies as st


TomAugspurger Aug 18, 2018

Contributor

This will put a hard dependency on hypothesis for testing. Are we OK with that? After some thought, I think it's fine. It's a well-maintained project, and working around it in the test suite seems silly.

If we're ok with that, then @Zac-HD could you update

  • pandas/util/_tester.py to have a nice message if either pytest or hypothesis is missing? (see the sketch after this list)
  • pandas/ci/check_imports.py to ensure hypothesis is not imported with the main import pandas?
  • doc/source/whatsnew/0.24.0.txt with a small subsection saying hypothesis is required for running the tests (with a link to the hypothesis docs :)
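A sketch of the first bullet, mirroring the existing pytest guard in pandas/util/_tester.py (the exact message wording is illustrative):

def test(extra_args=None):
    try:
        import pytest
    except ImportError:
        raise ImportError("Need pytest>=3.0 to run tests")
    try:
        import hypothesis  # noqa
    except ImportError:
        raise ImportError("Need hypothesis to run tests")
    ...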


Zac-HD Aug 19, 2018

Author Contributor

My read of the reviews so far is that @jreback was in favor of a mandatory dependency (also my recommendation), and you're now in favor too.

I've therefore made the relevant changes and it's all ready to go 🎉

(though one build on Travis has errored out, the tests passed until the timeout)


jreback Aug 20, 2018

Contributor

so, I still think we need to a) remove hypothesis from one build (the same one we removed moto from is good) and b) use pytest.importorskip('hypothesis'). The reason is not really for our CI, but so that when a user runs pd.test() it doesn't fail; it will just skip those tests.


Zac-HD Aug 21, 2018

Author Contributor

@jreback there are two problems with making Hypothesis optional for pd.test():

  1. It makes adding further Hypothesis tests - e.g. for serialisation round-trips, timedeltas, or reshaping logic - much harder. They'd have to be in separate files, guard any global setup and configuration, handle import-or-skips, etc.
  2. It forces us to choose between duplicating tests or skipping them at runtime.

That doesn't make it completely unreasonable, but I'd prefer to just have the dependency - and I've been using Pandas for much longer than Hypothesis!

TLDR - what's wrong with putting Hypothesis in the same category as pytest?


TomAugspurger Aug 21, 2018

Contributor

I think moto is a bit different, since it's relatively unimportant to mainline pandas, and so is easy to work around.

IMO, hypothesis should be treated the same as pytest.

@Zac-HD Zac-HD force-pushed the Zac-HD:hypothesis branch from f94cf92 to 95ec029 Aug 19, 2018

@TomAugspurger

Contributor

commented Aug 20, 2018

Merged master. The travis timeout should be fixed now.

@jreback

Contributor

commented Aug 20, 2018

let me have a look


@@ -12,6 +12,10 @@ def test(extra_args=None):
import pytest
except ImportError:
raise ImportError("Need pytest>=3.0 to run tests")
try:


jreback Aug 20, 2018

Contributor

remove this here

@jreback jreback referenced this pull request Aug 20, 2018

Closed

TST: add hypothesis-based tests #20590


@jreback jreback added this to the 0.24.0 milestone Aug 20, 2018

@Zac-HD

Contributor Author

commented Aug 23, 2018

@jreback my and Tom's view is that we're best off with Hypothesis as a hard dependency to run tests. Can you live with this and give an approving review? If not, what problems do you see?

(if possible I'd really like to merge this Friday or Saturday so I can talk about it at PyCon Australia and get some follow-up work done at the sprints)

@jreback

Contributor

commented Aug 23, 2018

it can still be a hard dependency for our testing

but as I said, users' code will immediately fail if they only have pytest

for user code that's all that we can / should require

I am not sure why this is a problem;
you simply put a skip in the conftest itself
if it's not installed

@TomAugspurger

Contributor

commented Aug 23, 2018

What's the issue with having it as a required test dependency? Having to write tests around that constraint seems unnecessarily burdensome on contributors / maintainers.

@jreback

Contributor

commented Aug 23, 2018

I guess it's OK to add as a dep for testing. Can you update install.rst as well?

@Zac-HD Zac-HD force-pushed the Zac-HD:hypothesis branch 2 times, most recently from 7c86fac to 835f352 Aug 24, 2018

@Zac-HD

Contributor Author

commented Aug 24, 2018

@TomAugspurger - I'm quite certain these build issues are unrelated to the pull. Any idea on how to make it green?

@TomAugspurger

Contributor

commented Aug 24, 2018

Zac-HD and others added some commits Aug 11, 2018

TST: Add Hypothesis tests for ticks, offsets
These tests are derived from GH18761, by jbrockmendel

Co-authored-by: jbrockmendel <jbrockmendel@users.noreply.github.com>

TST: Improve integration of Hypothesis
Responding to review from jreback on GH22280.

@Zac-HD Zac-HD force-pushed the Zac-HD:hypothesis branch from 70510be to 779b49a Aug 24, 2018

@Zac-HD

Contributor Author

commented Aug 24, 2018

🎉 - it's working! @TomAugspurger, if you (or anyone else!) can hit 'merge' by, uh, 4am UTC that would be in time for my talk at PyCon Australia 😅

@TomAugspurger TomAugspurger merged commit fa47b8d into pandas-dev:master Aug 25, 2018

6 checks passed

ci/circleci: py27_compat Your tests passed on CircleCI!
ci/circleci: py35_ascii Your tests passed on CircleCI!
ci/circleci: py36_locale Your tests passed on CircleCI!
ci/circleci: py36_locale_slow Your tests passed on CircleCI!
continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed
@TomAugspurger

Contributor

commented Aug 25, 2018

Thanks!

@h-vetinari

Contributor

commented Aug 27, 2018

After rebasing #22236, I'm getting a failure related to this PR in the circleci/py36_locale job (https://circleci.com/gh/pandas-dev/pandas/18093):

=================================== FAILURES ===================================
___________________________ test_tick_add_sub[Nano] ____________________________

cls = <class 'pandas.tseries.offsets.Nano'>

    @pytest.mark.parametrize('cls', tick_classes)
>   @example(n=2, m=3)
    @example(n=800, m=300)
    @example(n=1000, m=5)
    @given(n=st.integers(-999, 999), m=st.integers(-999, 999))
    def test_tick_add_sub(cls, n, m):
E   hypothesis.errors.FailedHealthCheck: Data generation is extremely slow: Only produced 2 valid examples in 1.09 seconds (0 invalid ones and 0 exceeded maximum size). Try decreasing size of the data you're generating (with e.g.max_size or max_leaves parameters).
E   See https://hypothesis.readthedocs.io/en/latest/healthchecks.html for more information about this. If you want to disable just this health check, add HealthCheck.too_slow to the suppress_health_check settings for this test.

pandas/tests/tseries/offsets/test_ticks.py:40: FailedHealthCheck
---------------------------------- Hypothesis ----------------------------------
You can add @seed(183585162026757523236170973153487462479) to this test or run pytest with --hypothesis-seed=183585162026757523236170973153487462479 to reproduce this failure.
@h-vetinari

Contributor

commented Aug 27, 2018

And again, this time in another test, but same build job (https://circleci.com/gh/pandas-dev/pandas/18150):

=================================== FAILURES ===================================
___________________________ test_tick_add_sub[Milli] ___________________________

cls = <class 'pandas.tseries.offsets.Milli'>

    @pytest.mark.parametrize('cls', tick_classes)
>   @example(n=2, m=3)
    @example(n=800, m=300)
    @example(n=1000, m=5)
    @given(n=st.integers(-999, 999), m=st.integers(-999, 999))
    def test_tick_add_sub(cls, n, m):
E   hypothesis.errors.FailedHealthCheck: Data generation is extremely slow: Only produced 2 valid examples in 1.14 seconds (0 invalid ones and 0 exceeded maximum size). Try decreasing size of the data you're generating (with e.g.max_size or max_leaves parameters).
E   See https://hypothesis.readthedocs.io/en/latest/healthchecks.html for more information about this. If you want to disable just this health check, add HealthCheck.too_slow to the suppress_health_check settings for this test.

pandas/tests/tseries/offsets/test_ticks.py:40: FailedHealthCheck
---------------------------------- Hypothesis ----------------------------------
You can add @seed(188896409658629737425924686489321258929) to this test or run pytest with --hypothesis-seed=188896409658629737425924686489321258929 to reproduce this failure.

I think this health check is too restrictive. If the server is busy during the single second in which the samples are generated, it can already be triggered... The error message already mentions how to disable it; maybe this should be considered as a follow-up?

@Zac-HD

Contributor Author

commented Aug 28, 2018

Ah, looks like a CI clock issue. I'll open a follow-up PR in the next day with the following in conftest.py:

from hypothesis import settings, HealthCheck

# settings objects are immutable, so register and load a profile rather
# than assigning to settings.default
settings.register_profile("ci", suppress_health_check=(HealthCheck.too_slow,))
settings.load_profile("ci")
# pass suppress_health_check=list(HealthCheck) to disable all health checks

@topper-123 topper-123 referenced this pull request Sep 8, 2018

Merged

ENH: better MultiIndex.__repr__ #22511


Sup3rGeo added a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

Add initial property-based tests using Hypothesis (pandas-dev#22280)
* BLD: Add Hypothesis to build system

* TST: Add Hypothesis tests for ticks, offsets

These tests are derived from GH18761, by jbrockmendel

Co-authored-by: jbrockmendel <jbrockmendel@users.noreply.github.com>

* DOC: Explain Hypothesis in contributing guide

* TST: remove pointless loop

* TST: Improve integration of Hypothesis

Responding to review from jreback on GH22280.

* Final review fixes