
Hypothesis testing #20914

Open
asmeurer opened this issue Feb 5, 2021 · 8 comments
Labels
Testing Related to the test runner. Do not use for test failures unless it relates to the test runner itself

Comments

@asmeurer
Member

asmeurer commented Feb 5, 2021

I'd like to use this issue to discuss the idea of using hypothesis in SymPy.

For those who don't know, hypothesis is a library that lets you do property-based testing. You tell it what the input to your function should look like and assert what properties should always hold, and it tries to find inputs that falsify them. Here's an example to show what a hypothesis test looks like:

from sympy import Mul, factorint, isprime
from hypothesis import given
from hypothesis.strategies import integers

@given(integers())
def test_factorint(x):
    f = factorint(x)
    assert Mul(*[b**e for b, e in f.items()]) == x
    for b in f:
        assert abs(b) in [0, 1] or isprime(b)

(note that this test as written will fail because it takes too long for some inputs, but it's just to give the idea of what a property-based test looks like)
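As a hedged sketch of one way to keep that example fast: bound the integers strategy so factorint never sees an input it can't factor quickly (the bound of 10**6 and the example count here are arbitrary choices for illustration, not from the discussion):

```python
from hypothesis import given, settings
from hypothesis.strategies import integers
from sympy import Mul, factorint, isprime

# Bounding the strategy keeps factorint fast; unbounded integers can
# take arbitrarily long to factor.
@given(integers(min_value=-10**6, max_value=10**6))
@settings(max_examples=200)
def test_factorint_bounded(x):
    f = factorint(x)
    # The product of the prime powers reconstructs the input.
    assert Mul(*[b**e for b, e in f.items()]) == x
    # Every base is 0, +/-1, or prime.
    for b in f:
        assert abs(b) in (0, 1) or isprime(b)

test_factorint_bounded()
```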

Hypothesis is an extremely powerful tool. It is very good at finding examples that fail your tests, things which you would never think to test yourself. However, it's also very picky. As soon as it finds a failing test, it remembers it and always reports it. So it's only useful to add it to SymPy in a place where either the code currently works, or we are willing to fix any bugs that it finds.

There has been a lot of discussion on this in the past. See #17190 and #20906.

My idea with hypothesis is to start small. Much smaller than what was proposed in #17190 (although that approach can still have some merits as something to run independently to see if anything interesting pops up). For example, the example test I wrote in #20906 passes on that branch; I just ran over 20000 examples. Even that test, though, is too much to start with, IMO, because the "correct" strategy to get it to find interesting examples is nontrivial. Random polynomials do not factor or have interesting roots. So you need to generate things in a way that matches what you are looking for.

I'll have to think a bit on where a good place to start would be. Ideally it would be something that is easy to generate with the builtin hypothesis strategies, something that doesn't blow up in terms of performance on certain inputs, and something where interesting inputs aren't difficult to find from the naive way of generating them.

The hardest part of writing good hypothesis tests is writing good strategies. But fortunately, strategies are reusable, so, e.g., if we created a good strategy for generating interesting expressions, then contributors would not need to worry about that to use it. Writing the test itself is generally straightforward. You just think of as many things as you want to be true of your function and assert them. It can be complicated, but not too hard once you get the hang of it. Actually, simply writing the test forces you to think about what you actually want to be true about your function (it is, in some loose sense, a "spec" for your function). So insofar as writing a property-based test is hard, that's a good thing.
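As a loose illustration of what such a reusable expression strategy might look like (the names and structure here are hypothetical, not an agreed design), hypothesis's `st.recursive` can build nested expressions out of simple leaves:

```python
from hypothesis import given, strategies as st
from sympy import Add, Mul, Symbol, Integer, sin, Basic

# Hypothetical sketch: leaves are a couple of symbols or small integers.
x, y = Symbol('x'), Symbol('y')
leaves = st.sampled_from([x, y]) | st.integers(-10, 10).map(Integer)

# st.recursive repeatedly wraps leaves in Add, Mul, or sin to build
# nested expressions, bounded by max_leaves.
exprs = st.recursive(
    leaves,
    lambda sub: st.builds(Add, sub, sub) | st.builds(Mul, sub, sub) | sub.map(sin),
    max_leaves=10,
)

# Once such a strategy exists, any test can simply reuse it:
@given(exprs)
def test_is_sympy_expr(e):
    assert isinstance(e, Basic)

test_is_sympy_expr()
```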

Here are some slides for an presentation I gave internally to some colleagues about hypothesis. You can also take a look at the test suite for ndindex, a library that I wrote, if you want to see what hypothesis tests look like in practice (see the docs for a high level description of how the tests work).

@oscarbenjamin
Contributor

I think that at the current time it would be most useful to apply hypothesis testing to something like the polys domains rather than Expr.
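To illustrate that suggestion with a minimal sketch (not a proposed design): domain elements have a much simpler input space than Expr, so even the built-in integers strategy can exercise them, e.g. checking that ZZ arithmetic agrees with Python's ints:

```python
from hypothesis import given
from hypothesis.strategies import integers
from sympy import ZZ

# Sketch: arithmetic on ZZ domain elements should agree with Python
# int arithmetic, and conversion should round-trip.
@given(integers(), integers())
def test_ZZ_matches_int(a, b):
    p, q = ZZ(a), ZZ(b)
    assert p + q == ZZ(a + b)
    assert p * q == ZZ(a * b)
    assert p - q == ZZ(a - b)
    assert int(p) == a

test_ZZ_matches_int()
```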

@oscarbenjamin oscarbenjamin added the Testing Related to the test runner. Do not use for test failures unless it relates to the test runner itself label Feb 10, 2021
@asmeurer
Member Author

Hypothesis can also be useful for finding performance issues. Take something like #20914. A hypothesis test that just tries to generate expressions but does nothing with them would have found the issue, because simply constructing Min(*symbols('x:50')) is slow (or was slow; that issue may have already been fixed).

The key problem is that when the slow code in the Min constructor was written, no one bothered to test it with a large number of symbolic inputs. With hypothesis, we don't have to rely on the person who writes the tests being disciplined enough to check their code with "large" inputs to avoid glaring performance problems. Hypothesis checks corner and pathological cases automatically.
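A minimal sketch of how this could work, using hypothesis's per-example deadline (the specific deadline and bounds here are illustrative assumptions):

```python
from hypothesis import given, settings
from hypothesis.strategies import integers
from sympy import Min, symbols

# Sketch: settings(deadline=...) makes hypothesis fail any example
# that exceeds the per-example time budget, so a pathologically slow
# constructor shows up as a test failure rather than going unnoticed.
@given(integers(min_value=2, max_value=50))
@settings(deadline=2000, max_examples=15)  # 2000 ms per example
def test_min_constructs_quickly(n):
    Min(*symbols('x:%d' % n))

test_min_constructs_quickly()
```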

@asmeurer
Member Author

asmeurer commented Apr 7, 2021

Where I would start would be simply testing that SymPy expressions can be constructed. That would catch issues like #20914. The next simplest thing would be to create a test for test_args (expr.func(*expr.args) == expr).

The hardest part for both of these is writing a hypothesis strategy to generate SymPy expressions. But the hard work for that was already mostly done in #17190, and we can reuse some of the stuff in test_args as well. Basically every SymPy class needs to have a hypothesis strategy that specifies what sorts of inputs it can accept.
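The shape of such a func-args test is short; here is a sketch over a deliberately tiny, hand-picked set of expressions (a real version would draw from a generic expression strategy instead):

```python
from hypothesis import given, strategies as st
from sympy import Symbol, sin

# Illustrative only: a few fixed expressions stand in for a proper
# expression-generating strategy.
x, y = Symbol('x'), Symbol('y')
exprs = st.sampled_from([x + y, 2*x*y, x**2 + 1, sin(x) + y])

@given(exprs)
def test_func_args(e):
    # Rebuilding an expression from its own args gives an equal one.
    assert e.func(*e.args) == e

test_func_args()
```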

@oscarbenjamin
Contributor

I think that would be good. The actual hardest part is fixing the code though. If you want to find examples breaking the func-args invariant then that's not hard. It's a bunch of work to do anything about it though!

@asmeurer
Member Author

asmeurer commented Apr 8, 2021

Here's a good real world example where a hypothesis test was very simple to write (took me about a minute), and it found a bug in a PR that probably would have gone unnoticed #21259 (comment).

@smichr
Member

smichr commented Feb 24, 2022

But the hard work for that was already mostly done in #17190, and

If anyone reading this is interested in just generating some random expressions I gave a simple generator here.

@asmeurer
Member Author

I've added this as a GSoC idea. I think playing around with hypothesis and seeing how far we can get with it would be a great GSoC project. At the very least we should find a decent number of bugs (which we will ideally also fix). I would warn anyone interested in this project however that it may be harder than it appears, and I would very highly recommend getting some experience using hypothesis first if you don't already have some.

@dianetc
Contributor

dianetc commented Jul 26, 2023

Introducing hypothesis to sympy via #25428
