
Hypothesis #71

Merged: 25 commits merged into master from hypothesis on Dec 27, 2020

Conversation

@orsinium (Member) commented Dec 2, 2020

Better compatibility with Hypothesis. Based on the official Hypothesis integration with dpcontracts.

@orsinium (Member, Author) commented Dec 2, 2020

@Zac-HD review, please :)

@orsinium (Member, Author) commented Dec 2, 2020

I've had this on my list for a long time, and your email reminded me about it. I don't want to drop deal.cases because of its simplicity (you're right, it is based on hypothesis-auto, which has the same motivation). So, let's provide a new function that works like fulfill from the Hypothesis integration with dpcontracts: internally it uses the same logic to reject invalid inputs by checking them against the contracts, and it also suppresses allowed exceptions. So internally it does the same thing as deal.cases.
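As a rough sketch of the idea (the wrapper name and structure here are illustrative, not the actual PR code; only deal.PreContractError and hypothesis.reject() are existing APIs):

import deal
import hypothesis


def fulfill(func):
    """Sketch: reject generated inputs that violate the function's preconditions."""
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except deal.PreContractError:
            # The input violates a precondition, so ask Hypothesis to discard
            # it and generate another one instead of failing the test.
            hypothesis.reject()
    return wrapper

As described above, the actual helper also suppresses the allowed exceptions.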

@Zac-HD commented Dec 3, 2020

I know that deal.cases is simple, but I really, really don't like it sorry 😭

  • It doesn't, and can't, support replaying failing examples on subsequent runs - and because it's also nondeterministic, that means every test is flaky 😬
  • It also doesn't, and can't, support shrinking examples - so failures will be way more complicated and difficult to debug than they need to be 😢

Unfortunately this is inherent in the "generate a list of inputs" design; Hypothesis works by executing each input and seeing what happens. For example, if Hypothesis generates an input of type Callable[[int, int], float], you can only call it within the original @given-decorated test function: there isn't actually a strong distinction between "generating inputs" and "running the test"!

I'll open a PR soon with an API I'd recommend:

@deal.raises(ZeroDivisionError)
def div(a: int, b: int) -> float:
    assert a == 1
    return a / b

# BEFORE
for case in deal.cases(div, kwargs=dict(a=1)):
    case()

# AFTER
div.get_test(a=1)()           # passes
div.get_test(a=st.just(1))()  # passes
div.get_test()()              # fails with a=0

That's implemented along the lines of

from hypothesis import given, strategies as st

div = ...
def get_test(**kwargs):
    # wrap plain values in st.just() so that every argument is a strategy
    kwargs = {k: v if isinstance(v, st.SearchStrategy) else st.just(v) for k, v in kwargs.items()}

    @given(st.fixed_dictionaries(kwargs))
    def test(kw):
        deal.hypothesis(div)(**kw)  # as already in this PR

    # add some magic to make the Hypothesis database work nicely here
    return test
div.get_test = get_test

HypothesisWorks/hypothesis#2679 has some more details about a recent conversation with the icontract developers; this is pretty much the same idea but adapted slightly to the Deal API. I also have some ideas about how to make pytest integration easy-to-automatic, but more on that once the basics are working 🙂

@orsinium (Member, Author) commented Dec 3, 2020

Hey, thank you for helping me there!

I know about the limitations of the current implementation of deal.cases, and I designed it with those tradeoffs in mind. The deal.cases function is designed API-first rather than Hypothesis-first. So I'm open to any changes as long as they don't change the API. The API was designed together with students, based on what works best from a cognitive perspective: iteration is much easier to understand than a decorator's ability to completely rewrite your function and call it n times when you call it once (which is how Hypothesis does it).

Unfortunately this is inherent in the "generate a list of inputs" design

I'm not sure the deal.cases API is that limiting. It provides not a list but an iterator. So you could still communicate failures back and do shrinking, generating more test cases based on the results of executing the previous ones. I made a few failed attempts to do so, and the issue I ran into is that I can't extend Hypothesis for that purpose, since it hides most of the logic in functions nested inside functions rather than in classes with small methods.

As for reproducing failures, that definitely doesn't depend on how the test cases are generated. It's on my list to provide a seed argument and feed it into Hypothesis.

To summarize, I understand the limitations, and I see the current API as the simplest one to understand, which I value more than slightly nicer failure reports.

On the other hand, I want to provide extensibility and room to use the full power of Hypothesis when needed. This is why I'm happy to provide the fulfill-like function as in this PR. Is anything wrong with this implementation of the wrapper?

@Zac-HD commented Dec 4, 2020

Your implementation of deal.hypothesis() works perfectly! I'd personally try to find a more specific name, but either way once it's merged into Deal I'll update the Hypothesis docs for our dpcontracts extra to point over this way as a recommended alternative 😃


I really like your API-first approach, and working with students to design it is great! I also think you're right about the conceptual challenges of decorators... we do need them in general, but it's great to have simpler interfaces where we can.

That's why I thought that a deal.get_hypothesis_test(func) function could be a nice API: there's no decorator involved. It's actually pretty similar to deal.cases(), except that instead of returning an iterable of callables, it returns a single callable which has the loop internally. (OK, and the test is a function instead of a NamedTuple)

This is also important because deal.cases() doesn't actually work in all cases - for example:

import deal
from typing import Callable

@deal.pre(lambda callback: callable(callback))
def do_call(callback: Callable[[], int]) -> int:
    return callback()

# This fails, because `case()` is called outside the Hypothesis data-generation
# scope, and the generated `callback` function is therefore unable to generate
# a return value!
for case in deal.cases(do_call):
    case()

# But returning a Hypothesis-wrapped test function, which users can call like
# they would each case, does work:
test_do_call = deal.get_hypothesis_test(do_call)
test_do_call()  # or let pytest handle it ;-)

I've pushed an implementation you can run to hypothesis...Zac-HD:hypothesis


That patch also makes deal.cases() deterministic, so you don't need to thread a seed value through (though you still can if you want to). Replaying from the database is in general a lot faster than rediscovering and re-shrinking failures each time.


Unfortunately the iterator-based approach is not feasible for us... I tried something very similar for HypoFuzz as a way to interleave examples from multiple tests, but changing that much of the engine would make it unmaintainable. It would also conflict with our current requirement that wrapped test functions may only return None (not e.g. a generator), which continues to catch user errors dealing with async tests and certain test runners 😞

@orsinium (Member, Author) commented Dec 4, 2020

Those are interesting points, thank you. I need a bit of time to think about them. It's always hard for me to change something when it means breaking the API :)

Could you tell me a bit more about your example, please? Sorry, I didn't get it. Why can't Hypothesis generate Callable[[], int]?

@orsinium (Member, Author) commented Dec 6, 2020

Please have a look at the new implementation. I've made it possible to use deal.cases as a decorator which returns a proper Hypothesis test function:

@deal.cases(div2)
def test_decorator_div2_smoke(case):
    case()

TODO:

  • Make it possible to pass other fixtures into the wrapped function.
  • Update the docs, describe the decorator as the preferred way.

@Zac-HD commented Dec 7, 2020

This looks good! Some quick improvements:

  • Ensure that the derandomize=True setting is used with the iterable API, to avoid flakiness - the @seed() decorator overrides this if both are used (see the sketch after this list).
  • Rename the fuzz property to fuzz_one_input - the former implies that there is an internal loop, whereas the latter means "BYO loop".
  • Please leave the phases setting out of your customised defaults! I'd even suggest using Hypothesis' actual default settings - we have them that way for good reasons - but the phases setting is especially prone to creating surprising and confusing UX issues.
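As a quick sketch of the first two points (the test function and strategy below are placeholders, not part of the PR):

from hypothesis import given, settings, strategies as st

@settings(derandomize=True)   # deterministic example generation, as suggested for the iterable API
@given(st.integers())
def test_div(b):
    ...

# "BYO loop": fuzz_one_input runs exactly one example per call, decoded from a byte buffer
test_div.hypothesis.fuzz_one_input(b"\x00\x01\x02")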

I think a simpler API is possible though: why not have deal.make_hypothesis_test(func, **kwargs) return the @given-wrapped test function? The docs could then present

  1. You can use deal.make_hypothesis_test(func) to get a pre-defined test function, which is implemented with Hypothesis. Assign it to a variable like test_func to run it via pytest, or just call it yourself!
  2. If you want to define your own test function, usually because you can make stronger assertions about a subset of allowed inputs, you can use func = deal.hypothesis_assume(func) to get a wrapped function which tells Hypothesis to skip inputs which failed preconditions or raised allowed exceptions (a better version of hypothesis.extra.dpcontracts.fulfill).
  3. The deal.cases(func) API is deprecated (and perhaps undocumented), and returns [deal.make_hypothesis_test(func)].
# current
@deal.cases(div2)
def test_decorator_div2_smoke(case):
    case()

# proposed
test_decorator_div2_smoke = deal.make_hypothesis_test(div2)

This reduces the API surface to learn, because all of the usual Hypothesis tools work in the obvious ways - you can test = settings(...)(test) or test = seed(0)(test) or even atheris.Setup(sys.argv, test.hypothesis.fuzz_one_input). Returning a function from a function is a conceptual step up from novice-level, but I don't think it's harder to grasp than the status quo of returning an iterable of callable objects.
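For example, hooking the proposed test up to Atheris might look roughly like this (deal.make_hypothesis_test is the proposed name from this comment, not an existing function, and div2 is just an illustrative contracted function):

import sys

import atheris
import deal

@deal.raises(ZeroDivisionError)
def div2(a: int, b: int) -> float:
    return a / b

test = deal.make_hypothesis_test(div2)

# Atheris strips the libFuzzer arguments out of sys.argv, then drives the
# Hypothesis test one byte-buffer input at a time via fuzz_one_input.
atheris.Setup(sys.argv, test.hypothesis.fuzz_one_input)
atheris.Fuzz()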

@orsinium (Member, Author) commented Dec 7, 2020

Thank you for the review! I really appreciate your help :)

Ensure that the derandomize=True setting is used with the iterable API

I'm not sure. I expect it to generate new values each time as QuickTest does. Probably, we can change it later, in a breaking release.

Rename the fuzz property to fuzz_one_input

I'm even thinking about putting it into the same __call__ as a single dispatch. Then you could use it like fuzzer.fuzz(deal.cases(div)). I'm not sure whether it's too much magic, but it is definitely fun 🙃

And the same with .run, but this is more driven by the possibility of an error: test_div = deal.cases(div).run works, but if you forget .run (which seems easy to do) it won't run. I think having it all under one single dispatch will make it polymorphic, more stable, and definitely shorter.
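A rough illustration of how a single-dispatch __call__ could fold those entry points together (the class layout, branch order, and placeholder bodies are all assumptions, not the final implementation):

class cases:
    """Sketch only: one object that is iterable, callable as a test, and usable as a decorator."""

    def __init__(self, func, **options):
        self.func = func
        self.options = options

    def __iter__(self):
        yield from ()                             # placeholder: yield individual test cases, as today

    def __call__(self, *args):
        if not args:                              # test_div = deal.cases(div); test_div()
            for case in self:                     # run every generated case, like a normal test
                case()
            return None
        arg = args[0]
        if callable(arg):                         # @deal.cases(div) used as a decorator on a test
            return arg                            # placeholder: wrap `arg` into a Hypothesis test
        if isinstance(arg, (bytes, bytearray)):   # raw fuzzer input: run exactly one example
            return None                           # placeholder: feed `arg` into fuzz_one_input
        raise TypeError(f"unsupported argument: {arg!r}")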

Please leave the phases setting out of your customised defaults!

That's a good idea. Yes, I think we can do it now. Let me see if we can keep it for the iteration API but still make it overridable via the settings argument.

you can use func = deal.hypothesis_assume(func) to get a wrapped function

Do we need such an API if we have a decorator? I'm not sure. You can already define custom strategies and custom settings, wrap it in more decorators from Hypothesis, add additional operations, wrap the call to case(), and provide additional checks. Am I missing something?

The deal.cases(func) API is deprecated (and perhaps undocumented), and returns

I'll update the documentation and move the iterator into the "advanced" documentation, but I won't break it in any way, at least for now. Maybe I don't have many users, but I see no reason to break the API or public behavior. Sorry, the iteration protocol stays and returns the actual cases. Also, for a while it stays in python3 -m deal test, because it requires careful changes to the behavior to keep measuring coverage only while the actual case is running; I'll do that a bit later.

This reduces the API surface to learn because all of the usual Hypothesis tools work in the obvious ways - you can test = settings(...)(test) or test = seed(0)(test) or even atheris.Setup(sys.argv, test.hypothesis.fuzz_one_input)

I believe it will all work after I make it a single dispatch:

test = deal.cases(div)

# This one is tricky because Hypothesis doesn't merge settings.
# This is why I provide `settings` argument.
test = settings(...)(test)

test = seed(0)(test) 
atheris.Setup(sys.argv, test)

I think I see the picture. Let me take care of the single dispatch, support for passing fixtures into the wrapped function, proper phases for the decorator, and the decorator-centric documentation, and we're ready to go :)

@Zac-HD commented Dec 7, 2020

Oooh, yeah, the single-dispatch approach is lovely. Definitely do that!

Keeping iterable cases also makes sense - I totally respect backwards compatibility (enough to do this); and making it non-default is enough. Generating new values each time also makes sense, but I'd suggest using a random @seed(x) internally and then printing the seed value if any of the cases fail - that way it's still possible to reproduce failing cases even without the database.
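A small sketch of that suggestion (the failing test below is a stand-in; only hypothesis.seed and the reporting pattern are the point):

import random

from hypothesis import given, seed, strategies as st

@given(st.integers())
def test(x):                        # stand-in for a test generated by deal.cases
    assert x >= 0                   # deliberately falsifiable, to show the seed report

seed_value = random.getrandbits(64)
test = seed(seed_value)(test)       # pin the Hypothesis seed for this run
try:
    test()
except Exception:
    # Report the seed so the failing run can be replayed with @seed(...),
    # even without the example database.
    print(f"Reproduce with: @seed({seed_value})")
    raise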

Overall, thanks for all your work on this - I feel like I've just dumped piles of not-always-positive feedback, and you've come to a fantastic design and API at the end. Very very nice work 😄

@Zac-HD commented Dec 7, 2020

Do we need such an API if we have a decorator? I'm not sure. You can already define custom strategies and custom settings, wrap it in more decorators from Hypothesis, add additional operations, wrap the call to case(), and provide additional checks. Am I missing something?

No, we don't need it; even @reproduce_failure will work with your version I think 😄

@orsinium (Member, Author) commented Dec 7, 2020

but I'd suggest using a random @seed(x) internally and then printing the seed value if any of the cases fail

That's a great idea! Let me think about how to make it work well, and not only for the test CLI command. Maybe I can try to plug it into the exception message :)

I feel like I've just dumped piles of not-always-positive feedback

It's OK, I'm the person here who is least welcoming of changes. I said right at the beginning that I am stubborn when it comes to breaking the API. I'm happy that we managed to make it friendly and compatible.

even @reproduce_failure will work with your version I think

Right, reproduce_failure is important; we have to be sure that it works. I'll try to write a test for it.

@Zac-HD left a comment

Some minor comments below, but overall this looks fantastic! I look forward to recommending Deal to anyone who wants a property-testable contracts library 😃

func: typing.Callable, *,
count: int = 50,
kwargs: typing.Dict[str, typing.Any] = None,
check_types: bool = True,

I realise there are backwards compatibility issues, but I'd rename check_types to check_return_type and make the former a deprecated alias. As-is, the name implies that argument types are also checked (or that =False disables such a check).
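A sketch of that rename-with-a-deprecated-alias pattern; the signature mirrors the excerpt above, but the handling is illustrative rather than the actual deal code:

import typing
import warnings

def cases(
    func: typing.Callable, *,
    count: int = 50,
    kwargs: typing.Dict[str, typing.Any] = None,
    check_return_type: bool = True,
    check_types: typing.Optional[bool] = None,    # deprecated alias
):
    if check_types is not None:
        warnings.warn(
            "`check_types` is deprecated, use `check_return_type` instead",
            DeprecationWarning,
            stacklevel=2,
        )
        check_return_type = check_types
    ...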

"""
Iterate over `items`, disabling the tracer on every iteration step.
This is a trick to avoid using our coverage tracer when calling hypothesis machinery.
Without it, testing is about 3 times slower.
"""
iterator = iter(items)
default_trace = sys.gettrace()

I'd move this inside the loop body, in case the user has e.g. started a debugger between your first invocation of this function and the end of iteration. It's a little slower, but avoids a weird-if-rare edge case.
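A sketch of what that change might look like (the wrapper's name and overall shape are assumed from the excerpt above, not copied from the PR):

import sys
import typing

def iterate_without_tracer(items: typing.Iterable) -> typing.Iterator:
    """Iterate over `items`, disabling any coverage tracer while the Hypothesis machinery runs."""
    iterator = iter(items)
    while True:
        # Look up the current tracer on every step, in case the user attached a
        # debugger (or another tracer) between iterations.
        default_trace = sys.gettrace()
        sys.settrace(None)
        try:
            item = next(iterator)
        except StopIteration:
            return
        finally:
            sys.settrace(default_trace)
        yield item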


## Fuzzing

[Fuzzing](https://en.wikipedia.org/wiki/Fuzzing) is when an external tool or library (a fuzzer) generates a bunch of random data in the hope of breaking your program. That means fuzzing requires a lot of resources and is performance-critical. This is why most fuzzers are written in C. However, there are a few Python wrappers for existing fuzzers, with awful interface but working:
Contributor:

with awful interface but working

Maybe we can make this part a bit more friendly?

@orsinium (Member, Author):

You're right, let's be more friendly :) The reason I say "awful" is that I still can't figure out why sys.argv needs to be passed into atheris. I tried searching the issues and the README, experimenting with it, and passing --help inside. No effect. The only thing I managed to do through the atheris API is run the test function with defaults.

@Zac-HD commented Dec 24, 2020

This is a good point, and something I've thought of before. They are actually separate for a reason though: atheris.Setup() might modify the input argument list, by removing arguments handled by libFuzzer. This allows other argument parsing libraries to subsequently be called and not think there are a lot of excess arguments.

We use this feature inside of Google. We call atheris.Setup() to remove libFuzzer arguments, then pass sys.argv to our argument-parsing code to initialize Google infrastructure, then finally call atheris.Fuzz().

Originally posted by @IanPudney in google/atheris#8 (comment)

I do agree that the Python-level user experience of current fuzzers could be improved, and hence https://hypofuzz.com/ exists, but that's more a question of "who are current fuzzers designed for" than "are they any good" 🙂

Contributor:

@Zac-HD, is HypoFuzz that project you told me about in the summer? This looks really interesting!

I would love to test this out whenever you need my help.

@Zac-HD:

That's the one! I'll be in touch in the new year 😁

@TheShiftedBit:

With Atheris, try passing -help instead of --help. It will print out all of the libFuzzer args. LibFuzzer takes arguments that start with a single dash. Atheris then removes those arguments from sys.argv, so that they don't get mixed up with any arguments your program itself has.

@orsinium (Member, Author):

With Atheris, try passing -help instead of --help. It will print out all of the libFuzzer args

Thank you! -help=1 works. I'll put it in the usage example.

@orsinium merged commit 6b17398 into master on Dec 27, 2020
@orsinium deleted the hypothesis branch on December 27, 2020, 11:42
@Zac-HD commented Dec 27, 2020

This looks fantastic - congratulations @orsinium! I'll add a link to Hypothesis' docs shortly 😃

@orsinium (Member, Author):

Thank you!
