
Hypothesis testing #20914

Open
asmeurer opened this issue Feb 5, 2021 · 8 comments
Labels
Testing Related to the test runner. Do not use for test failures unless it relates to the test runner itself

Comments

@asmeurer
Member

asmeurer commented Feb 5, 2021

I'd like to use this issue to discuss the idea of using hypothesis in SymPy.

For those who don't know, hypothesis is a library that lets you do property-based testing. You tell it what the input to your function should look like and assert what properties should always hold, and it tries to find inputs that falsify them. Here's an example to show what a hypothesis test looks like:

from sympy import Mul, factorint, isprime
from hypothesis import given
from hypothesis.strategies import integers

@given(integers())
def test_factorint(x):
    f = factorint(x)
    assert Mul(*[b**e for b, e in f.items()]) == x
    for b in f:
        assert abs(b) in [0, 1] or isprime(b)

(note that this test as written will fail because it takes too long for some inputs, but it's just to give the idea of what a property-based test looks like)
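As a hedged sketch of one way to keep that example fast: bound the integers strategy so factorint never sees an input it can't factor quickly (the bound of 10**6 and the example count here are arbitrary choices for illustration, not from the discussion):

```python
from hypothesis import given, settings
from hypothesis.strategies import integers
from sympy import Mul, factorint, isprime

# Bounding the strategy keeps factorint fast; unbounded integers can
# take arbitrarily long to factor.
@given(integers(min_value=-10**6, max_value=10**6))
@settings(max_examples=200)
def test_factorint_bounded(x):
    f = factorint(x)
    # The product of the prime powers reconstructs the input.
    assert Mul(*[b**e for b, e in f.items()]) == x
    # Every base is 0, +/-1, or prime.
    for b in f:
        assert abs(b) in (0, 1) or isprime(b)

test_factorint_bounded()
```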

Hypothesis is an extremely powerful tool. It is very good at finding examples that fail your tests, things which you would never think to test yourself. However, it's also very picky. As soon as it finds a failing test, it remembers it and always reports it. So it's only useful to add it to SymPy in a place where either the code currently works, or we are willing to fix any bugs that it finds.

There has been a lot of discussion on this in the past. See #17190 and #20906.

My idea with hypothesis is to start small. Much smaller than what was proposed in #17190 (although that approach can still have some merits as something to run independently to see if anything interesting pops up). For example, the example test I wrote in #20906 passes on that branch; I just ran over 20000 examples. Even that test, though, is too much to start with, IMO, because the "correct" strategy to get it to find interesting examples is nontrivial. Random polynomials do not factor or have interesting roots. So you need to generate things in a way that matches what you are looking for.

I'll have to think a bit on where a good place to start would be. Ideally it would be something that is easy to generate with the builtin hypothesis strategies, something that doesn't blow up in terms of performance on certain inputs, and something where interesting inputs aren't difficult to find from the naive way of generating them.

The hardest part of writing good hypothesis tests is writing good strategies. But fortunately, strategies are reusable, so, e.g., if we created a good strategy for generating interesting expressions, then contributors would not need to worry about that to use it. Writing the test itself is generally straightforward. You just think of as many things as you want to be true of your function and assert them. It can be complicated, but not too hard once you get the hang of it. Actually, simply writing the test forces you to think about what you actually want to be true about your function (it is, in some loose sense, a "spec" for your function). So insofar as writing a property-based test is hard, that's a good thing.
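As a loose illustration of what such a reusable expression strategy might look like (the names and structure here are hypothetical, not an agreed design), hypothesis's `st.recursive` can build nested expressions out of simple leaves:

```python
from hypothesis import given, strategies as st
from sympy import Add, Mul, Symbol, Integer, sin, Basic

# Hypothetical sketch: leaves are a couple of symbols or small integers.
x, y = Symbol('x'), Symbol('y')
leaves = st.sampled_from([x, y]) | st.integers(-10, 10).map(Integer)

# st.recursive repeatedly wraps leaves in Add, Mul, or sin to build
# nested expressions, bounded by max_leaves.
exprs = st.recursive(
    leaves,
    lambda sub: st.builds(Add, sub, sub) | st.builds(Mul, sub, sub) | sub.map(sin),
    max_leaves=10,
)

# Once such a strategy exists, any test can simply reuse it:
@given(exprs)
def test_is_sympy_expr(e):
    assert isinstance(e, Basic)

test_is_sympy_expr()
```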

Here are some slides for an presentation I gave internally to some colleagues about hypothesis. You can also take a look at the test suite for ndindex, a library that I wrote, if you want to see what hypothesis tests look like in practice (see the docs for a high level description of how the tests work).

@oscarbenjamin
Contributor

I think that at the current time it would be most useful to apply hypothesis testing to something like the polys domains rather than Expr.
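To illustrate that suggestion with a minimal sketch (not a proposed design): domain elements have a much simpler input space than Expr, so even the built-in integers strategy can exercise them, e.g. checking that ZZ arithmetic agrees with Python's ints:

```python
from hypothesis import given
from hypothesis.strategies import integers
from sympy import ZZ

# Sketch: arithmetic on ZZ domain elements should agree with Python
# int arithmetic, and conversion should round-trip.
@given(integers(), integers())
def test_ZZ_matches_int(a, b):
    p, q = ZZ(a), ZZ(b)
    assert p + q == ZZ(a + b)
    assert p * q == ZZ(a * b)
    assert p - q == ZZ(a - b)
    assert int(p) == a

test_ZZ_matches_int()
```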

@oscarbenjamin oscarbenjamin added the Testing Related to the test runner. Do not use for test failures unless it relates to the test runner itself label Feb 10, 2021
@asmeurer
Member Author

Hypothesis can also be useful for finding performance issues. Take something like #20914. A hypothesis test that just tries to generate expressions but does nothing with them would have found the issue, because simply constructing Min(*symbols('x:50')) is slow (or was slow; that issue may have already been fixed).

The key problem is that when the slow code in the Min constructor was written, no one bothered to test it with a large number of symbolic inputs. With hypothesis, we don't have to rely on the person who writes the tests being disciplined enough to check their code with "large" inputs to avoid glaring performance problems. Hypothesis checks corner and pathological cases automatically.
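A minimal sketch of how this could work, using hypothesis's per-example deadline (the specific deadline and bounds here are illustrative assumptions):

```python
from hypothesis import given, settings
from hypothesis.strategies import integers
from sympy import Min, symbols

# Sketch: settings(deadline=...) makes hypothesis fail any example
# that exceeds the per-example time budget, so a pathologically slow
# constructor shows up as a test failure rather than going unnoticed.
@given(integers(min_value=2, max_value=50))
@settings(deadline=2000, max_examples=15)  # 2000 ms per example
def test_min_constructs_quickly(n):
    Min(*symbols('x:%d' % n))

test_min_constructs_quickly()
```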

@asmeurer
Member Author

asmeurer commented Apr 7, 2021

Where I would start would be simply testing that SymPy expressions can be constructed. That would catch issues like #20914. The next simplest thing would be to create a test for test_args (expr.func(*expr.args) == expr).

The hardest part for both of these is writing a hypothesis strategy to generate SymPy expressions. But the hard work for that was already mostly done in #17190, and we can reuse some of the stuff in test_args as well. Basically every SymPy class needs to have a hypothesis strategy that specifies what sorts of inputs it can accept.
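The shape of such a func-args test is short; here is a sketch over a deliberately tiny, hand-picked set of expressions (a real version would draw from a generic expression strategy instead):

```python
from hypothesis import given, strategies as st
from sympy import Symbol, sin

# Illustrative only: a few fixed expressions stand in for a proper
# expression-generating strategy.
x, y = Symbol('x'), Symbol('y')
exprs = st.sampled_from([x + y, 2*x*y, x**2 + 1, sin(x) + y])

@given(exprs)
def test_func_args(e):
    # Rebuilding an expression from its own args gives an equal one.
    assert e.func(*e.args) == e

test_func_args()
```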

@oscarbenjamin
Contributor

I think that would be good. The actual hardest part is fixing the code though. If you want to find examples breaking the func-args invariant then that's not hard. It's a bunch of work to do anything about it though!

@asmeurer
Member Author

asmeurer commented Apr 8, 2021

Here's a good real world example where a hypothesis test was very simple to write (took me about a minute), and it found a bug in a PR that probably would have gone unnoticed #21259 (comment).

@smichr
Member

smichr commented Feb 24, 2022

But the hard work for that was already mostly done in #17190, and

If anyone reading this is interested in just generating some random expressions I gave a simple generator here.

@asmeurer
Member Author

I've added this as a GSoC idea. I think playing around with hypothesis and seeing how far we can get with it would be a great GSoC project. At the very least we should find a decent number of bugs (which we will ideally also fix). I would warn anyone interested in this project however that it may be harder than it appears, and I would very highly recommend getting some experience using hypothesis first if you don't already have some.

@dianetc
Contributor

dianetc commented Jul 26, 2023

Introducing hypothesis to sympy via #25428
