
Hypothesis #71

Merged: 25 commits merged into master from hypothesis on Dec 27, 2020

Conversation

@orsinium (Member) commented Dec 2, 2020

Better compatibility with Hypothesis. Based on the official Hypothesis integration with dpcontracts.

@orsinium (Member, Author) commented Dec 2, 2020

@Zac-HD review, please :)

@orsinium (Member, Author) commented Dec 2, 2020

I've had this on my list for a long time, and your email reminded me about it. I don't want to drop deal.cases because of its simplicity (you're right, it is based on hypothesis-auto, which has the same motivation). So, let's provide a new function that works like fulfill from the Hypothesis integration with dpcontracts: internally it uses the same logic to reject invalid inputs by checking them against the contracts, and it also suppresses allowed exceptions. So internally it does the same thing as deal.cases.
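As a rough sketch of the idea (the wrapper name and structure here are illustrative, not the actual PR code; only deal.PreContractError and hypothesis.reject() are existing APIs):

import deal
import hypothesis


def fulfill(func):
    """Sketch: reject generated inputs that violate the function's preconditions."""
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except deal.PreContractError:
            # The input violates a precondition, so ask Hypothesis to discard
            # it and generate another one instead of failing the test.
            hypothesis.reject()
    return wrapper

As described above, the actual helper also suppresses the allowed exceptions.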

@Zac-HD commented Dec 3, 2020

I know that deal.cases is simple, but I really, really don't like it sorry 😭

  • It doesn't, and can't, support replaying failing examples on subsequent runs - and because it's also nondeterministic, that means every test is flaky 😬
  • It also doesn't, and can't, support shrinking examples - so failures will be way more complicated and difficult to debug than they need to be 😢

Unfortunately this is inherent in the "generate a list of inputs" design; Hypothesis works by executing each input and seeing what happens. For example, if Hypothesis generates an input of type Callable[[int, int], float], you can only call it within the original @given-decorated test function: there isn't actually a strong distinction between "generating inputs" and "running the test"!

I'll open a PR soon with an API I'd recommend:

@deal.raises(ZeroDivisionError)
def div(a: int, b: int) -> float:
    assert a == 1
    return a / b

# BEFORE
for case in deal.cases(div, kwargs=dict(a=1)):
    case()

# AFTER
div.get_test(a=1)()           # passes
div.get_test(a=st.just(1))()  # passes
div.get_test()()              # fails with a=0

That's implemented along the lines of

from hypothesis import given, strategies as st

div = ...
def get_test(**kwargs):
    # wrap plain values in st.just() so that every argument is a strategy
    kwargs = {k: v if isinstance(v, st.SearchStrategy) else st.just(v) for k, v in kwargs.items()}

    @given(st.fixed_dictionaries(kwargs))
    def test(kw):
        deal.hypothesis(div)(**kw)  # as already in this PR

    # add some magic to make the Hypothesis database work nicely here
    return test
div.get_test = get_test

HypothesisWorks/hypothesis#2679 has some more details about a recent conversation with the icontract developers; this is pretty much the same idea but adapted slightly to the Deal API. I also have some ideas about how to make pytest integration easy-to-automatic, but more on that once the basics are working 🙂

@orsinium (Member, Author) commented Dec 3, 2020

Hey, thank you for helping me there!

I know about the limitations of the current implementation of deal.cases, and I designed it with those tradeoffs in mind. The deal.cases function is designed API-first rather than Hypothesis-first. So I'm open to any changes as long as they don't change the API. The API was designed together with students, based on what works best from a cognitive perspective: iteration is much easier to understand than a decorator's ability to completely rewrite your function and call it n times when you call it once (which is how Hypothesis does it).

Unfortunately this is inherent in the "generate a list of inputs" design

I'm not sure the deal.cases API is that limiting. It provides not a list but an iterator. So you could still communicate failures back and do shrinking, generating more test cases based on the results of executing the previous ones. I made a few failed attempts to do so, and the issue I ran into is that I can't extend Hypothesis for that purpose, since it hides most of the logic in functions nested inside functions rather than in classes with small methods.

As for reproducing failures, that definitely doesn't depend on how the test cases are generated. It's on my list to provide a seed argument and feed it into Hypothesis.

To summarize, I understand the limitations, and I see the current API as the simplest one to understand, which I value more than slightly nicer failure reports.

On the other hand, I want to provide extensibility and room to use the full power of Hypothesis when needed. This is why I'm happy to provide the fulfill-like function as in this PR. Is anything wrong with this implementation of the wrapper?

@Zac-HD commented Dec 4, 2020

Your implementation of deal.hypothesis() works perfectly! I'd personally try to find a more specific name, but either way once it's merged into Deal I'll update the Hypothesis docs for our dpcontracts extra to point over this way as a recommended alternative 😃


I really like your API-first approach, and working with students to design it is great! I also think you're right about the conceptual challenges of decorators... we do need them in general, but it's great to have simpler interfaces where we can.

That's why I thought that a deal.get_hypothesis_test(func) function could be a nice API: there's no decorator involved. It's actually pretty similar to deal.cases(), except that instead of returning an iterable of callables, it returns a single callable which has the loop internally. (OK, and the test is a function instead of a NamedTuple)

This is also important because deal.cases() doesn't actually work in all cases - for example:

import deal
from typing import Callable

@deal.pre(lambda callback: callable(callback))
def do_call(callback: Callable[[], int]) -> int:
    return callback()

# This fails, because `case()` is called outside the Hypothesis data-generation
# scope, and the generated `callback` function is therefore unable to generate
# a return value!
for case in deal.cases(do_call):
    case()

# But returning a Hypothesis-wrapped test function, which users can call like
# they would each case, does work:
test_do_call = deal.get_hypothesis_test(do_call)
test_do_call()  # or let pytest handle it ;-)

I've pushed an implementation you can run to hypothesis...Zac-HD:hypothesis


That patch also makes deal.cases() deterministic, so you don't need to thread a seed value through (though you still can if you want to). Replaying from the database is in general a lot faster than rediscovering and re-shrinking failures each time.


Unfortunately the iterator-based approach is not feasible for us... I tried something very similar for HypoFuzz as a way to interleave examples from multiple tests, but changing that much of the engine would make it unmaintainable. It would also conflict with our current requirement that wrapped test functions may only return None (not e.g. a generator), which continues to catch user errors dealing with async tests and certain test runners 😞

@orsinium (Member, Author) commented Dec 4, 2020

Those are interesting points, thank you. I need a bit of time to think about them. It's always hard for me to change something when it means breaking the API :)

Could you tell me a bit more about your example, please? Sorry, I didn't get it. Why can't Hypothesis generate Callable[[], int]?

@orsinium (Member, Author) commented Dec 6, 2020

Please have a look at the new implementation. I've made it possible to use deal.cases as a decorator which returns a proper Hypothesis test function:

@deal.cases(div2)
def test_decorator_div2_smoke(case):
    case()

TODO:

  • Make it possible to pass other fixtures into the wrapped function.
  • Update the docs, describe the decorator as the preferred way.

@Zac-HD commented Dec 7, 2020

This looks good! Some quick improvements:

  • Ensure that the derandomize=True setting is used with the iterable API, to avoid flakiness - the @seed() decorator overrides this if both are used (see the sketch after this list).
  • Rename the fuzz property to fuzz_one_input - the former implies that there is an internal loop, whereas the latter means "BYO loop".
  • Please leave the phases setting out of your customised defaults! I'd even suggest using Hypothesis' actual default settings - we have them that way for good reasons - but the phases setting is especially prone to creating surprising and confusing UX issues.
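As a quick sketch of the first two points (the test function and strategy below are placeholders, not part of the PR):

from hypothesis import given, settings, strategies as st

@settings(derandomize=True)   # deterministic example generation, as suggested for the iterable API
@given(st.integers())
def test_div(b):
    ...

# "BYO loop": fuzz_one_input runs exactly one example per call, decoded from a byte buffer
test_div.hypothesis.fuzz_one_input(b"\x00\x01\x02")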

I think a simpler API is possible though: why not have deal.make_hypothesis_test(func, **kwargs) return the @given-wrapped test function? The docs could then present

  1. You can use deal.make_hypothesis_test(func) to get a pre-defined test function, which is implemented with Hypothesis. Assign it to a variable like test_func to run it via pytest, or just call it yourself!
  2. If you want to define your own test function, usually because you can make stronger assertions about a subset of allowed inputs, you can use func = deal.hypothesis_assume(func) to get a wrapped function which tells Hypothesis to skip inputs which failed preconditions or raised allowed exceptions (a better version of hypothesis.extra.dpcontracts.fulfill).
  3. The deal.cases(func) API is deprecated (and perhaps undocumented), and returns [deal.make_hypothesis_test(func)].
# current
@deal.cases(div2)
def test_decorator_div2_smoke(case):
    case()

# proposed
test_decorator_div2_smoke = deal.make_hypothesis_test(div2)

This reduces the API surface to learn, because all of the usual Hypothesis tools work in the obvious ways - you can test = settings(...)(test) or test = seed(0)(test) or even atheris.Setup(sys.argv, test.hypothesis.fuzz_one_input). Returning a function from a function is a conceptual step up from novice-level, but I don't think it's harder to grasp than the status quo of returning an iterable of callable objects.
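For example, hooking the proposed test up to Atheris might look roughly like this (deal.make_hypothesis_test is the proposed name from this comment, not an existing function, and div2 is just an illustrative contracted function):

import sys

import atheris
import deal

@deal.raises(ZeroDivisionError)
def div2(a: int, b: int) -> float:
    return a / b

test = deal.make_hypothesis_test(div2)

# Atheris strips the libFuzzer arguments out of sys.argv, then drives the
# Hypothesis test one byte-buffer input at a time via fuzz_one_input.
atheris.Setup(sys.argv, test.hypothesis.fuzz_one_input)
atheris.Fuzz()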

@orsinium (Member, Author) commented Dec 7, 2020

Thank you for the review! I really appreciate your help :)

Ensure that the derandomize=True setting is used with the iterable API

I'm not sure. I expect it to generate new values each time as QuickTest does. Probably, we can change it later, in a breaking release.

Rename the fuzz property to fuzz_one_input

I'm even thinking about putting it into the same __call__ as a single dispatch. Then you could use it like fuzzer.fuzz(deal.cases(div)). I'm not sure whether it's too much magic, but it is definitely fun 🙃

And the same with .run, but this is more driven by the possibility of an error: test_div = deal.cases(div).run works, but if you forget .run (which seems easy to do) it won't run. I think having it all under one single dispatch will make it polymorphic, more stable, and definitely shorter.
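A rough illustration of how a single-dispatch __call__ could fold those entry points together (the class layout, branch order, and placeholder bodies are all assumptions, not the final implementation):

class cases:
    """Sketch only: one object that is iterable, callable as a test, and usable as a decorator."""

    def __init__(self, func, **options):
        self.func = func
        self.options = options

    def __iter__(self):
        yield from ()                             # placeholder: yield individual test cases, as today

    def __call__(self, *args):
        if not args:                              # test_div = deal.cases(div); test_div()
            for case in self:                     # run every generated case, like a normal test
                case()
            return None
        arg = args[0]
        if callable(arg):                         # @deal.cases(div) used as a decorator on a test
            return arg                            # placeholder: wrap `arg` into a Hypothesis test
        if isinstance(arg, (bytes, bytearray)):   # raw fuzzer input: run exactly one example
            return None                           # placeholder: feed `arg` into fuzz_one_input
        raise TypeError(f"unsupported argument: {arg!r}")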

Please leave the phases setting out of your customised defaults!

That's a good idea. Yes, I think we can do it now. Let me see if we can keep it for the iteration API but still make it overridable via the settings argument.

you can use func = deal.hypothesis_assume(func) to get a wrapped function

Do we need such an API if we have a decorator? I'm not sure. You can already define custom strategies and custom settings, wrap it in more decorators from Hypothesis, add additional operations, wrap the call to case(), and provide additional checks. Am I missing something?

The deal.cases(func) API is deprecated (and perhaps undocumented), and returns

I'll update the documentation and move the iterator into the "advanced" documentation, but I won't break it in any way, at least for now. Maybe I don't have many users, but I see no reason to break the API or public behavior. Sorry, the iteration protocol stays and returns the actual cases. Also, for a while it stays in python3 -m deal test, because it requires careful changes to the behavior to keep measuring coverage only while the actual case is running; I'll do that a bit later.

This reduces the API surface to learn because all of the usual Hypothesis tools work in the obvious ways - you can test = settings(...)(test) or test = seed(0)(test) or even atheris.Setup(sys.argv, test.hypothesis.fuzz_one_input)

I believe it will all work after I make it a single dispatch:

test = deal.cases(div)

# This one is tricky because Hypothesis doesn't merge settings.
# This is why I provide `settings` argument.
test = settings(...)(test)

test = seed(0)(test) 
atheris.Setup(sys.argv, test)

I think I see the picture. Let me take care of the single dispatch, support for passing fixtures into the wrapped function, proper phases for the decorator, and the decorator-centric documentation, and we're ready to go :)

@Zac-HD commented Dec 7, 2020

Oooh, yeah, the single-dispatch approach is lovely. Definitely do that!

Keeping iterable cases also makes sense - I totally respect backwards compatibility (enough to do this); and making it non-default is enough. Generating new values each time also makes sense, but I'd suggest using a random @seed(x) internally and then printing the seed value if any of the cases fail - that way it's still possible to reproduce failing cases even without the database.
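A small sketch of that suggestion (the failing test below is a stand-in; only hypothesis.seed and the reporting pattern are the point):

import random

from hypothesis import given, seed, strategies as st

@given(st.integers())
def test(x):                        # stand-in for a test generated by deal.cases
    assert x >= 0                   # deliberately falsifiable, to show the seed report

seed_value = random.getrandbits(64)
test = seed(seed_value)(test)       # pin the Hypothesis seed for this run
try:
    test()
except Exception:
    # Report the seed so the failing run can be replayed with @seed(...),
    # even without the example database.
    print(f"Reproduce with: @seed({seed_value})")
    raise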

Overall, thanks for all your work on this - I feel like I've just dumped piles of not-always-positive feedback, and you've come to a fantastic design and API at the end. Very very nice work 😄

@Zac-HD commented Dec 7, 2020

Do we need such an API if we have a decorator? I'm not sure. You can already define custom strategies and custom settings, wrap it in more decorators from Hypothesis, add additional operations, wrap the call to case(), and provide additional checks. Am I missing something?

No, we don't need it; even @reproduce_failure will work with your version I think 😄

@orsinium (Member, Author) commented Dec 7, 2020

but I'd suggest using a random @seed(x) internally and then printing the seed value if any of the cases fail

That's a great idea! Let me think about how to make it work well, and not only for the test CLI command. Maybe I can try to plug it into the exception message :)

I feel like I've just dumped piles of not-always-positive feedback

It's OK, I'm the person here who is least welcoming of changes. I said right at the beginning that I am stubborn when it comes to breaking the API. I'm happy that we managed to make it friendly and compatible.

even @reproduce_failure will work with your version I think

Right, reproduce_failure is important; we have to be sure that it works. I'll try to write a test for it.

@Zac-HD left a comment

Some minor comments below, but overall this looks fantastic! I look forward to recommending Deal to anyone who wants a property-testable contracts library 😃

func: typing.Callable, *,
count: int = 50,
kwargs: typing.Dict[str, typing.Any] = None,
check_types: bool = True,

I realise there are backwards compatibility issues, but I'd rename check_types to check_return_type and make the former a deprecated alias. As-is, the name implies that argument types are also checked (or that =False disables such a check).
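A sketch of that rename-with-a-deprecated-alias pattern; the signature mirrors the excerpt above, but the handling is illustrative rather than the actual deal code:

import typing
import warnings

def cases(
    func: typing.Callable, *,
    count: int = 50,
    kwargs: typing.Dict[str, typing.Any] = None,
    check_return_type: bool = True,
    check_types: typing.Optional[bool] = None,    # deprecated alias
):
    if check_types is not None:
        warnings.warn(
            "`check_types` is deprecated, use `check_return_type` instead",
            DeprecationWarning,
            stacklevel=2,
        )
        check_return_type = check_types
    ...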

"""
Iterate over `items`, disabling the tracer on every iteration step.
This is a trick to avoid using our coverage tracer when calling hypothesis machinery.
Without it, testing is about 3 times slower.
"""
iterator = iter(items)
default_trace = sys.gettrace()

I'd move this inside the loop body, in case the user has e.g. started a debugger between your first invocation of this function and the end of iteration. It's a little slower, but avoids a weird-if-rare edge case.
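A sketch of what that change might look like (the wrapper's name and overall shape are assumed from the excerpt above, not copied from the PR):

import sys
import typing

def iterate_without_tracer(items: typing.Iterable) -> typing.Iterator:
    """Iterate over `items`, disabling any coverage tracer while the Hypothesis machinery runs."""
    iterator = iter(items)
    while True:
        # Look up the current tracer on every step, in case the user attached a
        # debugger (or another tracer) between iterations.
        default_trace = sys.gettrace()
        sys.settrace(None)
        try:
            item = next(iterator)
        except StopIteration:
            return
        finally:
            sys.settrace(default_trace)
        yield item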


## Fuzzing

[Fuzzing](https://en.wikipedia.org/wiki/Fuzzing) is when an external tool or library (a fuzzer) generates a bunch of random data in the hope of breaking your program. That means fuzzing requires a lot of resources and is performance-critical. This is why most fuzzers are written in C. However, there are a few Python wrappers for existing fuzzers, with awful interface but working:
Contributor:

with awful interface but working

Maybe we can make this part a bit more friendly?

@orsinium (Member, Author):

You're right, let's be more friendly :) The reason I say "awful" is that I still can't figure out why sys.argv needs to be passed into atheris. I tried searching the issues and the README, experimenting with it, and passing --help inside. No effect. The only thing I managed to do through the atheris API is run the test function with defaults.

@Zac-HD commented Dec 24, 2020

This is a good point, and something I've thought of before. They are actually separate for a reason though: atheris.Setup() might modify the input argument list, by removing arguments handled by libFuzzer. This allows other argument parsing libraries to subsequently be called and not think there are a lot of excess arguments.

We use this feature inside of Google. We call atheris.Setup() to remove libFuzzer arguments, then pass sys.argv to our argument-parsing code to initialize Google infrastructure, then finally call atheris.Fuzz().

Originally posted by @IanPudney in google/atheris#8 (comment)

I do agree that the Python-level user experience of current fuzzers could be improved, and hence https://hypofuzz.com/ exists, but that's more a question of "who are current fuzzers designed for" than "are they any good" 🙂

Contributor:

@Zac-HD, is HypoFuzz that project you told me about in the summer? This looks really interesting!

I would love to test this out whenever you need my help.

@Zac-HD:

That's the one! I'll be in touch in the new year 😁

@TheShiftedBit:

With Atheris, try passing -help instead of --help. It will print out all of the libFuzzer args. LibFuzzer takes arguments that start with a single dash. Atheris then removes those arguments from sys.argv, so that they don't get mixed up with any arguments your program itself has.

@orsinium (Member, Author):

With Atheris, try passing -help instead of --help. It will print out all of the libFuzzer args

Thank you! -help=1 works. I'll put it in the usage example.

@orsinium merged commit 6b17398 into master on Dec 27, 2020
@orsinium deleted the hypothesis branch on December 27, 2020, 11:42
@Zac-HD commented Dec 27, 2020

This looks fantastic - congratulations @orsinium! I'll add a link to Hypothesis' docs shortly 😃

@orsinium (Member, Author):

Thank you!
