
BUG: fix moment method to support arrays and lists #12197

Merged: 29 commits merged into scipy:master (from the fix-moment branch) on Aug 27, 2021

Conversation

tirthasheshpatel (Member, author):

Reference issue

fixes #12192

What does this implement/fix?

This PR adds list and array support for the moment method of every distribution.

mdhaber (Contributor) left a comment

I'm surprised that's all it took! Could you add a test? Take a look at the existing tests that run for every distribution. It would be good to check the behavior for each to catch bugs like gh-11746.

scipy/stats/_distn_infrastructure.py (outdated review thread, resolved)
AtsushiSakai added the scipy.stats and enhancement labels on May 23, 2020
mdhaber (Contributor) commented May 23, 2020

There is a section in the contributor guide that answers your question.

@@ -1186,8 +1186,14 @@ def moment(self, n, *args, **kwds):

        """
        args, loc, scale = self._parse_args(*args, **kwds)
        if not (self._argcheck(*args) and (scale > 0)):
            return nan
        arrs = tuple(map(np.asarray, [*args, loc, scale]))
Contributor:

This first assignment appears to be unused?

tirthasheshpatel (Member, author):

*args? I think I did it because the loc and scale parameters need to be the same shape as the inputs in args; otherwise the output isn't consistent with the input shapes...

Contributor:

The first line is not needed.

arrs = tuple(map(np.asarray, [*args, loc, scale]))
arrs = tuple(map(np.atleast_1d, [*args, loc, scale]))

Doing this causes the return value to always be an array if I'm understanding the code correctly.

Member:

atleast_1d receives a list of arrays already, so the tuple(map(...)) dance can be avoided.
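
For illustration only (made-up values, not the PR's code):

import numpy as np

# np.atleast_1d accepts several array_likes at once and returns a list of
# arrays, so the tuple(map(np.atleast_1d, ...)) wrapper is redundant.
shape_arg, loc, scale = np.atleast_1d([2.5, 3.0], 0.0, 1.0)
print(shape_arg, loc, scale)  # [2.5 3. ] [0.] [1.]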

tirthasheshpatel (Member, author):

Done. Thanks for the review!

Contributor:

Do you even still need the arrs = tuple(map(np.asarray, [*args, loc, scale]))?
arrs = np.broadcast_arrays(*(*args, loc, scale)) accepts inputs that are not already arrays.
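
Purely as an illustration of that suggestion (made-up values, not the PR's code):

import numpy as np

# np.broadcast_arrays takes scalars and lists directly, so a separate asarray
# pass is unnecessary; all outputs share one broadcast shape.
shape_arg, loc, scale = np.broadcast_arrays([1.5, 2.0], 0.0, [[1.0], [2.0]])
print(shape_arg.shape, loc.shape, scale.shape)  # (2, 2) (2, 2) (2, 2)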


result += comb(n, k, exact=True)*(fac**k) * valk
result += fac**n * val
return result * loc**n
cond1 = (loc == 0)
Contributor:

This appears to do all the additional work even for loc=0. You should be able to use argsreduce to restrict the computation to the values that actually need it, e.g.

cond = (loc == loc) & (scale == scale)
if np.any(cond):
    goodargs = argsreduce(cond, loc, scale)
    ...
    place(result, cond, ...)
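
For readers unfamiliar with that idiom, here is a minimal NumPy-only sketch of the same mask-then-fill idea (the values and the placeholder "moment formula" are made up; SciPy's internal argsreduce helper and NumPy's place are used the same way in _distn_infrastructure):

import numpy as np

loc = np.array([0.0, 1.0, 2.0, np.nan])
scale = np.array([1.0, 1.0, -1.0, 1.0])
result = np.full(np.broadcast(loc, scale).shape, np.nan)

cond = np.isfinite(loc) & (scale > 0)      # only these entries need any work
if np.any(cond):
    good_loc = loc[cond]                   # argsreduce does this masking in SciPy
    good_scale = scale[cond]
    computed = good_loc + good_scale       # stand-in for the real moment computation
    np.place(result, cond, computed)       # write results back into the valid slots
print(result)                              # [ 1.  2. nan nan]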

tirthasheshpatel (Member, author):

I have tried to use _lazywhere to fill in nan and circumvent the redundant calculations. Is that ok, or should I try this suggestion instead?

Contributor:

I think it still has to do some sort of calculation there, though. You use _lazywhere to fill fac with nans, but as far as I can tell, that doesn't really circumvent the redundant calculations, right? All sorts of operations are done on fac, and for the nans those operations aren't doing anything useful. I do think it would be a good idea to try implementing this suggestion.

mdhaber (Contributor) commented Jul 30, 2020

I forgot about this one, @tirthasheshpatel. I can start reviewing this one too, but it looks like there are some comments already.

tirthasheshpatel (Member, author):

@mdhaber Sorry for stalling this. I will try to resume my work on this in a couple of days!

mdhaber (Contributor) commented Sep 8, 2020

Hi @tirthasheshpatel, just thought I should check in. Are you still interested in continuing with this and gh-12585? We really appreciate it!

tirthasheshpatel (Member, author):

@mdhaber I am very sorry for delaying my work on this PR. I won't be able to continue working on it while working on #12585. I will open a PR for #12585 as soon as I get it working. I hope you don't mind. Again, sorry for wasting so much time!

mdhaber (Contributor) commented Sep 9, 2020

@tirthasheshpatel no problem; we'll get back to this after merging gh-12585

mdhaber (Contributor) commented Nov 26, 2020

@tirthasheshpatel I resolved the merge conflict. Please check that the changes from the first six commits and the changes from all commits are basically identical (I think I accidentally added a PEP8 newline).

I think gh-13103 is actually going to take a little while to figure out. In the meantime, if you'd like to come back to this, I'd be happy to review.

tirthasheshpatel (Member, author) commented Nov 27, 2020

Thanks for resolving the conflicts, @mdhaber! I have pushed some new changes which should work fine, but some redundant calculations still exist. I will try to figure out a way to work around those too. Meanwhile, I would appreciate a review.

mdhaber (Contributor) commented Nov 29, 2020

Sure, I started taking a look. Can you add those tests?

Commit: remove redundant computations and fix the behaviour for array inputs
tirthasheshpatel (Member, author):

Thanks very much for the review, @tupui! I have applied all your suggestions now.

tupui (Member) left a comment

I had a bit of time to go over it again. LGTM; the logic looks correct, just a minor thing I missed last time, and you added quite a few tests (I tested this manually and it checks out in terms of shape and values). Thanks for the great work. @mdhaber feel free to merge (not sure if you want to squash or merge; you are more involved).

scipy/stats/_distn_infrastructure.py (outdated review thread, resolved)
Co-authored-by: Pamphile ROY <roy.pamphile@gmail.com>
mdhaber (Contributor) commented Jul 16, 2021

@Zac-HD
This (and other methods of SciPy stats distributions) could be a good example of where Hypothesis would be useful in SciPy.

We'd like to check that all methods of all distributions are vectorized properly. This PR is just about the moment method, so let's take the moment method of the normal distribution as an example.

norm.moment(n, loc=loc, scale=scale)

should behave the same as

ref(n, loc=loc, scale=scale)

where

ref = np.vectorize(norm.moment, otypes=[float])

and n is an integer (the order of the moment), loc (the location parameter of the distribution) is a real or array-like of reals, and scale (the scale parameter of the distribution) is a real or array-like of reals. We'd want to check the shape of the output, the data type, and assert_allclose on the data itself.

One minor complication is that some distributions have additional shape parameters, which have varying valid ranges, and we'd want to make sure that at least some combinations of inputs are valid. (When there is an invalid parameter, there will be a nan in the output array.)

Would you be able to help us set up something - just for the moment method - like you did for NumPy in numpy/numpy#15189? (Good to start small, as there will undoubtedly be tons of failures.)

With a concrete example like this, we could ask the mailing list whether they agree that this sort of functionality is worth the additional test-time dependency.

Zac-HD (Contributor) commented Jul 16, 2021

I'd be delighted to help out! Here's a quick prototype:

import numpy as np
import hypothesis.extra.numpy as npst
from hypothesis import given, strategies as st

from scipy.stats import norm

# A strategy to generate the shape of a one-dimensional array of length [0..7]
# Subsequent draws will use the same shape to ensure compatibility.  For fancier
# tricks see `npst.broadcastable_shapes()` or `mutually_broadcastable_shapes()`.
shared_shapes = st.shared(st.tuples(st.integers(0, 7)))


def scalar_or_array(**kwargs):
    return st.floats(**kwargs) | npst.arrays(
        dtype=float, shape=shared_shapes, elements=kwargs
    )


@given(
    # Our moment `n`, between one and four (inclusive)
    n=st.integers(1, 4),
    # Our locations and scale are each either a scalar (Python-native) float,
    # or an ndarray of floats.  If both are arrays, they will have the same shape.
    loc=scalar_or_array(min_value=0, max_value=1000),  # what should the bounds be?
    scale=scalar_or_array(min_value=1, max_value=1000),
)
def test_moments(n, loc, scale):
    got = norm.moment(n, loc=loc, scale=scale)
    ref = np.vectorize(norm.moment, otypes=[float])(n, loc=loc, scale=scale)

    assert type(got) == type(ref)  # fails; vectorize returns zero-dim arrays
    if isinstance(got, np.ndarray):
        assert got.shape == ref.shape
        assert got.dtype == ref.dtype
    np.testing.assert_allclose(got, ref)

On my machine with scipy == 1.7.0, this fails with the errors you'd expect of a version that doesn't support array inputs yet:

________________________________ test_moments _________________________________
Traceback (most recent call last):
  File "C:\Users\Zac\Documents\GitHub\hypothesis\t.py", line 20, in test_moments
    # Our moment `n`, between one and four (inclusive)
  File "c:\users\zac\documents\github\hypothesis\hypothesis-python\src\hypothesis\core.py", line 1180, in wrapped_test
    raise the_error_hypothesis_found
hypothesis.errors.MultipleFailures: Hypothesis found 4 distinct failures.
--------------------------------- Hypothesis ----------------------------------
Falsifying example: test_moments(
    n=1, loc=array([0., 0.]), scale=1.0,
)
Traceback (most recent call last):
  File "C:\Users\Zac\Documents\GitHub\hypothesis\t.py", line 28, in test_moments
    got = norm.moment(n, loc=loc, scale=scale)
  File "c:\users\zac\anaconda3\envs\hydev\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 1291, in moment
    if loc == 0:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Falsifying example: test_moments(
    n=1, loc=0.0, scale=array([1., 1.]),
)
Traceback (most recent call last):
  File "C:\Users\Zac\Documents\GitHub\hypothesis\t.py", line 28, in test_moments
    got = norm.moment(n, loc=loc, scale=scale)
  File "c:\users\zac\anaconda3\envs\hydev\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 1274, in moment
    if not (self._argcheck(*args) and (scale > 0)):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Falsifying example: test_moments(
    n=1, loc=array([], dtype=float64), scale=1.0,
)
Traceback (most recent call last):
  File "C:\Users\Zac\Documents\GitHub\hypothesis\t.py", line 28, in test_moments
    got = norm.moment(n, loc=loc, scale=scale)
  File "c:\users\zac\anaconda3\envs\hydev\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 1295, in moment
    fac = float(scale) / float(loc)
TypeError: only size-1 arrays can be converted to Python scalars

Falsifying example: test_moments(
    n=1, loc=array([], dtype=float64), scale=array([], dtype=float64),
)
Traceback (most recent call last):
  File "C:\Users\Zac\Documents\GitHub\hypothesis\t.py", line 31, in test_moments
    assert type(got) == type(ref)
AssertionError: assert <class 'float'> == <class 'numpy.ndarray'>
 +  where <class 'float'> = type(nan)
 +  and   <class 'numpy.ndarray'> = type(array([], dtype=float64))

Hopefully this is a useful starting point? Further elaboration will probably require deeper knowledge (or judgement) about scipy edge cases than I have, but I'd be up for a call or pairing session too if that would help 😁

tirthasheshpatel (Member, author) commented Jul 16, 2021

Hi @Zac-HD

I'd be delighted to help out!

Thanks!

Here's a quick prototype

I just had a quick look and it looks really good! I have never used hypothesis so I will need to read some docs today and can start cooking up my own version in some time. If you think you have something useful (which you possibly would before I learn to use hypothesis), feel free to just create a PR on my fork. I will merge it and we can then review it here.

    loc=scalar_or_array(min_value=0, max_value=1000),  # what should the bounds be?
    scale=scalar_or_array(min_value=1, max_value=1000),

I would throw in some invalid values too just to make it a bit more comprehensive and probably catch some edge cases. Something like:

    loc=scalar_or_array(min_value=0, max_value=1000),  # what should the bounds be?
    scale=scalar_or_array(min_value=-1, max_value=1000),

Hopefully this is a useful starting point? Further elaboration will probably require deeper knowledge (or judgement) about scipy edge cases than I have, but I'd be up for a call or pairing session too if that would help

Sounds very helpful, thanks.

tirthasheshpatel (Member, author) commented Aug 3, 2021

@Zac-HD @mdhaber I have added a basic test using hypothesis using @Zac-HD's prototype as a starting point. I removed the line:

    assert type(got) == type(ref)  # fails; vectorize returns zero-dim arrays

because np.vectorize returns an np.ndarray even for scalar inputs, while stats.moment returns Python floats. We could also explicitly check for the case when a scalar is expected and assert accordingly (a rough sketch of that is shown below). The other thing I have changed is the range of scale: instead of (1, 1000), I have used (-1, 1000) so that invalid values are sometimes present. All tests pass locally.

Also added hypothesis to the GitHub workflows CI (not Azure yet).

Let me know what you think.
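
For what it's worth, a hypothetical sketch of the explicit scalar-vs-array check mentioned above (assuming a SciPy build that already contains this fix, e.g. the 1.8.0 milestone; not the assertion the PR actually added):

import numpy as np
from scipy.stats import norm

def check_moment_output(n, loc, scale):
    got = norm.moment(n, loc=loc, scale=scale)
    if np.isscalar(loc) and np.isscalar(scale):
        assert np.isscalar(got)                 # scalar parameters -> scalar result
    else:
        expected = np.broadcast(np.asarray(loc), np.asarray(scale)).shape
        assert isinstance(got, np.ndarray)      # array parameters -> broadcast-shaped array
        assert got.shape == expected

check_moment_output(2, 0.0, 1.0)
check_moment_output(2, [0.0, 1.0], 1.0)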

tupui (Member) left a comment

I have nothing against adding hypothesis (although it adds yet another layer of complexity: pytest, asv, linters, and now this), but it might be worth an email to the mailing list to at least present it. Also, could you add it to https://scipy.github.io/devdocs/dev/toolchain.html? There might be other places too, such as environment.yml and the testing guide.

mdhaber (Contributor) commented Aug 3, 2021

might be worth an email to the mailing list to at least present it

Yes, that's the plan. We just wanted to have a concrete example of how hypothesis can be used to improve tests.

# A strategy to generate the shape of a one-dimensional array of length [0..7]
# Subsequent draws will use the same shape to ensure compatibility. For fancier
# tricks see `npst.broadcastable_shapes()` or `mutually_broadcastable_shapes()`.
shared_shapes = st.shared(st.tuples(st.integers(0, 7)))
mdhaber (Contributor):

Ideally, to show the advantage of Hypothesis, I think we want this test to go beyond what the tests above can do: show that this works for any distribution and any broadcastable loc, scale, and shape parameters. (I think it would also be good to nudge Hypothesis toward occasionally testing with invalid parameters. I'm not sure how Hypothesis chooses values under the hood, but if we give it such a wide range for scale, is it likely to include an invalid value < 0 in one of the tests?) @tirthasheshpatel @Zac-HD if you are willing to go a little deeper into this, terrific; if not, I understand and I can get to this at some point.

Zac-HD (Contributor):

I'd recommend having separate tests for always-valid and might-be-invalid inputs; otherwise you eventually and without realising write a test which never checks the valid case 😅

For a deeper test, you're going to want the mutually_broadcastable_shapes() strategy - there's an example in Numpy for np.clip() here. Where signatures are consistent you could use @pytest.mark.parametrize to supply the moment function - using that to supply some arguments and @hypothesis.given() for others works fine - but where signatures differ it's probably easier to write separate tests.

(This is getting pretty far from the original intent to fix a bug though, and I've seen projects take quite a while to settle on using Hypothesis. Might be worth merging this PR, and then tackling the test changes in a new one?)

tirthasheshpatel (Member, author):

Thanks for the feedback @Zac-HD and @mdhaber! I agree we should use Hypothesis to its fullest and add a deeper test. I have some time this and next week so I can go a little deeper and add a more comprehensive test.

This is getting pretty far from the original intent to fix a bug though, and I've seen projects take quite a while to settle on using Hypothesis. Might be worth merging this PR, and then tackling the test changes in a new one?

I also think this would be better. Though I will try to push something or at least confirm locally using Hypothesis that the bug-fix works. I can then propose a PR and we can discuss Hypothesis in detail there. What do you think, @mdhaber?

mdhaber (Contributor) commented Aug 4, 2021

Yes, a local test with Hypothesis is fine. This PR has been tricky to get right, so a more comprehensive test would give additional confidence that the bug fix is ready to be merged.
And yes, it is actually preferable for the addition of Hypothesis to go in a separate PR.

mdhaber (Contributor) commented Aug 26, 2021

@tupui: @tirthasheshpatel and I discussed in gh-13581 that we should revert to 7b69ef6 and merge this PR as-is. We will add more comprehensive tests using Hypothesis in a separate PR. Does that still sound good to you? Would you be interested in pressing the big green button after the revert, as I wrote a fair amount of this?

tirthasheshpatel (Member, author):

we should revert to 7b69ef6

Reverted.

mdhaber (Contributor) commented Aug 27, 2021

No need to rebase. Squash will take care of it.

tupui (Member) commented Aug 27, 2021

Do I understand correctly that you prefer a squash here, @mdhaber?

mdhaber (Contributor) commented Aug 27, 2021

Yes, because of the messy history.

tupui merged commit 9e08b05 into scipy:master on Aug 27, 2021.
tirthasheshpatel deleted the fix-moment branch on Aug 27, 2021.
tupui added this to the 1.8.0 milestone on Aug 27, 2021.
tirthasheshpatel (Member, author):

Thanks @mdhaber for your help on this one! Thanks @tupui for the reviews! And thanks @Zac-HD for your help with Hypothesis. I will test these changes out with Hypothesis soon and report what I find.

Zac-HD (Contributor) commented Aug 27, 2021

Ping me when you do, and I'll be happy to help review 😊

tirthasheshpatel (Member, author) commented Oct 23, 2021

@mdhaber @Zac-HD

I tried out Hypothesis to test the moment method of all the distributions with all possible combinations of inputs. Here's the test:

import pytest  # needed for the parametrize/skip calls below
import numpy as np
import hypothesis
from hypothesis import strategies as st
from hypothesis.extra import numpy as npst
from scipy import stats
from scipy.integrate import IntegrationWarning  # filtered inside the test
from scipy.stats._distr_params import distcont

# Writing more comprehensive tests with Hypothesis.
# The below test checks that the moment method works for each distribution
# for:
#      * Mix of valid and invalid scalar loc/scale
#      * Mix of valid and invalid array loc/scale with same shape
#      * Mix of valid and invalid array loc/scale with broadcastable shape

@hypothesis.given(data=st.data())
@hypothesis.seed(123)
@hypothesis.settings(max_examples=100, deadline=None)
@pytest.mark.parametrize("distname, distargs", distcont)
def test_moment_hypothesis(distname, distargs, data):
    slow_dists = {'kappa4', 'genexpon', 'ksone', 'kstwo', 'recipinvgauss',
                  'studentized_range', 'argus', 'exponpow', 'exponweib',
                  'genhalflogistic', 'halfgennorm', 'gompertz', 'invgamma',
                  'johnsonsb', 'johnsonsu', 'kstwobign', 'nct', 'pareto',
                  'powerlognorm', 'powernorm', 'skewnorm', 'vonmises',
                  'vonmises_line'}
    if distname in slow_dists:
        pytest.skip(f"slow dist: {distname}")
    shapes = data.draw(npst.mutually_broadcastable_shapes(num_shapes=2,
                                                          max_side=5))
    loc = data.draw(npst.arrays(dtype=float, shape=shapes.input_shapes[0]))
    scale = data.draw(npst.arrays(dtype=float, shape=shapes.input_shapes[1],
                                  elements={'min_value': -1,
                                            'max_value': 10}))
    n = data.draw(st.integers(0, 4))
    if isinstance(distname, str):
        dist = getattr(stats, distname)
    else:
        dist = distname

    with np.errstate(all='ignore'):
        with np.testing.suppress_warnings() as sup:
            sup.filter(IntegrationWarning)
            moment_vectorized = np.vectorize(dist.moment, otypes=[float])
            res_expected = moment_vectorized(n, *distargs, loc=loc,
                                             scale=scale)
            res = dist.moment(n, *distargs, loc=loc, scale=scale)
            if isinstance(res, np.ndarray):
                assert res.shape == res_expected.shape
                assert res.dtype == res_expected.dtype
            np.testing.assert_allclose(res, res_expected)
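
Assuming the test is added to scipy/stats/tests/test_continuous_basic.py (as the failure output below suggests), it can be run on its own with something like:

python -m pytest scipy/stats/tests/test_continuous_basic.py -k test_moment_hypothesis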

This has already helped discover a few genuine bugs. The test fails for quite a few distributions with broadcasting and vectorizing errors:

FAILED scipy/stats/tests/test_continuous_basic.py::test_moment_hypothesis[burr-distargs7] - ValueError: operands could not be broadcast together with shapes (4,1) (2,1)
FAILED scipy/stats/tests/test_continuous_basic.py::test_moment_hypothesis[fisk-distargs23] - ValueError: operands could not be broadcast together with shapes (4,1) (2,1)
FAILED scipy/stats/tests/test_continuous_basic.py::test_moment_hypothesis[genextreme-distargs29] - ValueError: operands could not be broadcast together with shapes (4,) (2,2)
FAILED scipy/stats/tests/test_continuous_basic.py::test_moment_hypothesis[kappa3-distargs56] - hypothesis.errors.MultipleFailures: Hypothesis found 2 distinct failures.
FAILED scipy/stats/tests/test_continuous_basic.py::test_moment_hypothesis[levy_stable-distargs64] - hypothesis.errors.MultipleFailures: Hypothesis found 2 distinct failures.
FAILED scipy/stats/tests/test_continuous_basic.py::test_moment_hypothesis[truncexpon-distargs97] - TypeError: only size-1 arrays can be converted to Python scalars
FAILED scipy/stats/tests/test_continuous_basic.py::test_moment_hypothesis[truncnorm-distargs98] - ValueError: cannot call `vectorize` on size 0 inputs unless `otypes` is set
FAILED scipy/stats/tests/test_continuous_basic.py::test_moment_hypothesis[truncnorm-distargs99] - ValueError: cannot call `vectorize` on size 0 inputs unless `otypes` is set

Minimal reproducible examples:

from scipy.stats import kappa3, levy_stable, truncnorm, burr, fisk, genextreme, truncexpon

# kappa3 failure
try:
    kappa3.moment(1, 1, loc=0.0, scale=[2, 3])
except Exception as e:
    print(f'kappa3 failed with error {e.args[0]}')

# levy_stable failure
try:
    levy_stable.moment(1, 1.8, -0.5, loc=0.1, scale=[2, 3])
except Exception as e:
    print(f'levy_stable failed with error {e.args[0]}')

# truncnorm failure
try:
    truncnorm.moment(2, 1, 2, loc=[], scale=[])
except Exception as e:
    print(f'truncnorm failed with error {e.args[0]}')

# burr failure
try:
    burr.moment(1, 10.5, 4.3, loc=0.0, scale=[[1],[2]])
except Exception as e:
    print(f'burr failed with error {e.args[0]}')

# fisk failure
try:
    fisk.moment(1, 3.085754862225318, loc=0.0, scale=[[1],[2]])
except Exception as e:
    print(f'fisk failed with error {e.args[0]}')

# genextreme failure
try:
    genextreme.moment(1, -0.1, loc=[[0,0],[0,0]], scale=2.2)
except Exception as e:
    print(f'genextreme failed with error {e.args[0]}')

# truncexpon failure
try:
    truncexpon.moment(3, 4.69, loc=0.0, scale=[0.0, 0.0])
except Exception as e:
    print(f'truncexpon failed with error {e.args[0]}')

It is also easy to generalize the tests to check vectorization of many other methods of all the distributions:

# Writing more comprehensive tests with Hypothesis.
# The below test checks that several methods (pdf, cdf, ppf, sf, isf, moment)
# of each distribution work for:
#      * Mix of valid and invalid scalar loc/scale
#      * Mix of valid and invalid array loc/scale with same shape
#      * Mix of valid and invalid array loc/scale with broadcastable shape

@hypothesis.given(data=st.data())
@hypothesis.seed(123)
@hypothesis.settings(max_examples=20, deadline=None)
@pytest.mark.parametrize("distname, distargs", distcont)
@pytest.mark.parametrize("method", ['pdf', 'cdf', 'ppf', 'sf', 'isf', 'moment'])
def test_moment_hypothesis(distname, distargs, method, data):
    if method == 'moment':
        slow_dists = {
            'kappa4', 'genexpon', 'ksone', 'kstwo', 'recipinvgauss',
            'studentized_range', 'argus', 'exponpow', 'exponweib',
            'genhalflogistic', 'halfgennorm', 'gompertz', 'invgamma',
            'johnsonsb', 'johnsonsu', 'kstwobign', 'nct', 'pareto',
            'powerlognorm', 'powernorm', 'skewnorm', 'vonmises',
            'vonmises_line'
        }
        if distname in slow_dists:
            pytest.skip(f"slow dist: {distname}")
    shapes = data.draw(npst.mutually_broadcastable_shapes(num_shapes=2,
                                                          max_side=5))
    loc = data.draw(npst.arrays(dtype=float, shape=shapes.input_shapes[0]))
    scale = data.draw(npst.arrays(dtype=float, shape=shapes.input_shapes[1],
                                  elements={'min_value': -1,
                                            'max_value': 10}))

    if isinstance(distname, str):
        dist = getattr(stats, distname)
    else:
        dist = distname

    if method == 'moment':
        n = data.draw(st.integers(0, 4))
    else:
        shape = data.draw(npst.broadcastable_shapes(shape=shapes.result_shape))
        n = data.draw(npst.arrays(dtype=float, shape=shape,
                                  elements={'min_value': -100,
                                            'max_value': 100}))

    method = getattr(dist, method)

    with np.errstate(all='ignore'):
        with np.testing.suppress_warnings() as sup:
            sup.filter(IntegrationWarning)
            sup.filter(RuntimeWarning)
            method_vectorized = np.vectorize(method, otypes=[float])
            res_expected = method_vectorized(n, *distargs, loc=loc,
                                             scale=scale)
            res = method(n, *distargs, loc=loc, scale=scale)
            if isinstance(res, np.ndarray):
                assert res.shape == res_expected.shape
                assert res.dtype == res_expected.dtype
            np.testing.assert_allclose(res, res_expected)

The main problem with Hypothesis tests is that they usually take much longer than normal tests. This might be because I am still learning how to use it correctly :) Nevertheless, these tests are very powerful and can help discover lots of such bugs in stats and other modules in SciPy. I can start a proposal on the mailing list if something like this would be desired in other SciPy modules too.

Zac-HD (Contributor) commented Oct 23, 2021

I tried out Hypothesis to test the moment method of all the distributions with all possible combinations of inputs. ... This has already helped discover a few genuine bugs [mostly] broadcasting and vectorizing errors.

Very nice!

One suggestion about settings though: instead of using the @settings() and @seed() decorators, I'd suggest using settings profiles and the derandomize=True setting. That makes it easy to run in quick-ish deterministic mode by default, but also have a long-running random mode so that you can search for bugs overnight if desired.
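
A minimal sketch of what that could look like (the profile names here are made up):

from hypothesis import settings

# Quick, deterministic profile for ordinary CI runs ...
settings.register_profile("fast", max_examples=20, derandomize=True, deadline=None)
# ... and a long-running randomized profile for overnight bug hunts.
settings.register_profile("nightly", max_examples=5000, derandomize=False, deadline=None)

# Typically selected in conftest.py, e.g. based on an environment variable.
settings.load_profile("fast")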

The main problem with Hypothesis tests is they usually take much longer than normal tests. This might be because I am still learning how to use it correctly :)

I don't think there's any secret that you're missing; Hypothesis itself isn't slow, but if you multiply out the number of distributions * 100 examples each * test size, doing that much work just takes a while.

I certainly wouldn't suggest making every test use Hypothesis - I've had projects with 80% property-based tests, and others with just one. For SciPy I'd guess that using Hypothesis for just a handful of high-level tests is probably the best compromise between speed, bug-finding, and other workflow constraints.

Nevertheless, these tests are very powerful and can help discover lots of such bugs in stats and other modules in SciPy. I can start a proposal in the mailing lists if something like this would be desired in other modules in SciPy too.

As above, I remain happy to review tests as I did for Numpy and Astropy. Just let me know how I can help 💖

Labels: defect, scipy.stats

Successfully merging this pull request may close: scipy.stats.rv_continuous.moment does not accept array input (gh-12192)