BUG: fix moments method to support arrays and list #12197
Conversation
I'm surprised that's all it took! Could you add a test? Take a look at the existing tests that run for every distribution. It would be good to check the behavior for each to catch bugs like gh-11746.
There is a section in the contributor guide that answers your question.
scipy/stats/_distn_infrastructure.py (Outdated)

```python
@@ -1186,8 +1186,14 @@ def moment(self, n, *args, **kwds):
        """
        args, loc, scale = self._parse_args(*args, **kwds)
        if not (self._argcheck(*args) and (scale > 0)):
            return nan
        arrs = tuple(map(np.asarray, [*args, loc, scale]))
```
This first assignment appears to be unused?
`*args`? I think I did it because the `loc` and `scale` parameters need to be the same shape as the inputs present in `args`; otherwise the output isn't consistent with the input shapes...
The first line is not needed:

```diff
-arrs = tuple(map(np.asarray, [*args, loc, scale]))
+arrs = tuple(map(np.atleast_1d, [*args, loc, scale]))
```

Doing this causes the return value to always be an array, if I'm understanding the code correctly.
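As a quick illustration of why this changes the return type (a sketch of standard NumPy behavior, not code from this PR): `np.asarray` leaves a Python scalar as a 0-d array, while `np.atleast_1d` promotes it to shape `(1,)`, so downstream results become 1-d arrays even for scalar inputs.

```python
import numpy as np

# np.asarray keeps a Python scalar as a 0-d array ...
a = np.asarray(3.0)
# ... while np.atleast_1d promotes it to a 1-d array of shape (1,)
b = np.atleast_1d(3.0)

print(a.shape)  # ()
print(b.shape)  # (1,)
```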
`np.atleast_1d` receives a list of arrays already, so the `tuple(map(...))` dance can be avoided.
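For reference (standard NumPy behavior, not code from this PR): when `np.atleast_1d` is given several inputs at once it returns one array per input, so the mapping call can be collapsed into a single call. The values below are made up for illustration.

```python
import numpy as np

# Passing several inputs at once returns a list of arrays, one per input,
# so tuple(map(np.atleast_1d, ...)) can be replaced by a single call.
loc, scale = 0.5, [1.0, 2.0]
arrs = np.atleast_1d(loc, scale)

print([a.shape for a in arrs])  # [(1,), (2,)]
```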
Done. Thanks for the review!
Do you still need the `arrs = tuple(map(np.asarray, [*args, loc, scale]))` at all? `arrs = np.broadcast_arrays(*(*args, loc, scale))` accepts inputs that are not already arrays.
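A small sketch of that suggestion: `np.broadcast_arrays` converts and broadcasts in one step, accepting plain scalars and lists directly. The shape-parameter value here is hypothetical.

```python
import numpy as np

args = (2.0,)                    # e.g. a shape parameter (made-up value)
loc, scale = 0.0, [1.0, 2.0, 3.0]

# Converts non-array inputs and broadcasts everything to a common shape.
arrs = np.broadcast_arrays(*(*args, loc, scale))
print([a.shape for a in arrs])  # [(3,), (3,), (3,)]
```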
scipy/stats/_distn_infrastructure.py (Outdated)

```python
result += comb(n, k, exact=True)*(fac**k) * valk
result += fac**n * val
return result * loc**n
cond1 = (loc == 0)
```
This appears to do all the additional work even for `loc=0`. You should be able to use `argsreduce` to restrict to the values that need computing. E.g.

```python
cond = (loc == loc) & (scale == scale)
if np.any(cond):
    goodargs = argsreduce(cond, loc, scale)
    ...
    place(result, cond, ...)
```
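`argsreduce` is a private scipy helper, so the pattern above can be sketched with plain NumPy: compute only where the mask holds, then scatter results back with `np.place`. The arrays and the addition below are stand-ins, not the actual moment formula.

```python
import numpy as np

loc = np.array([0.0, 1.0, np.nan])
scale = np.array([1.0, 2.0, 3.0])

result = np.full(loc.shape, np.nan)
cond = (loc == loc) & (scale == scale)  # (x == x) is False exactly for NaN
if np.any(cond):
    # restrict to entries that need computing (argsreduce does this in scipy)
    good_loc, good_scale = loc[cond], scale[cond]
    np.place(result, cond, good_loc + good_scale)  # stand-in computation

print(result)  # [ 1.  3. nan]
```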
I have tried to use `_lazywhere` to fill `nan` and circumvent redundant calculations. Is it OK, or should I try this fix?
I think it still has to do some sort of calculation there, though. You use `_lazywhere` to fill `fac` with `nan`s, but as far as I can tell, that doesn't really circumvent redundant calculations, right? All sorts of operations are done on `fac`. For the `nan`s, these operations aren't doing anything useful, right? I do think it would be a good idea to try implementing this suggestion.
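`_lazywhere` is private to scipy, but the idea being discussed - evaluate a function only where a condition holds and fill everywhere else - can be sketched in plain NumPy. The helper name and sample values here are hypothetical, not scipy's actual implementation.

```python
import numpy as np

def lazy_where(cond, arrays, f, fillvalue=np.nan):
    """Evaluate f only on entries where cond is True; fill the rest."""
    cond = np.asarray(cond)
    out = np.full(cond.shape, fillvalue, dtype=float)
    if np.any(cond):
        # pull out just the entries that need computing
        reduced = [np.extract(cond, np.broadcast_to(a, cond.shape))
                   for a in arrays]
        np.place(out, cond, f(*reduced))
    return out

loc = np.array([0.0, 1.0, 2.0])
scale = np.array([1.0, -1.0, 3.0])   # negative scale is invalid
res = lazy_where(scale > 0, (loc, scale), lambda l, s: l + s)
print(res)  # [ 1. nan  5.]
```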
I forgot about this one, @tirthasheshpatel. I can start reviewing this one too, but it looks like there are some comments already.

@mdhaber Sorry for stalling this. I will try to resume my work on this in a couple of days!

Hi @tirthasheshpatel, just thought I should check in. Are you still interested in continuing with this and gh-12585? We really appreciate it!

@tirthasheshpatel no problem; we'll get back to this after merging gh-12585

@tirthasheshpatel I resolved the merge conflict. Please check that the changes from the first six commits and the changes from all commits are basically identical (I think I accidentally added a PEP8 newline). I think gh-13103 is actually going to take a little while to figure out. In the meantime, if you'd like to come back to this, I'd be happy to review.

Thanks for resolving the conflicts, @mdhaber! I have pushed some new changes which should work fine, but some redundant calculations still exist. I will try to figure out a way to work around those too. Meanwhile, I would appreciate a review.

Sure, I started taking a look. Can you add those tests?
remove redundant computations and fix the behaviour for array inputs
Thanks very much for the review, @tupui! I have applied all your suggestions now.
I had a bit of time to go over it again. LGTM - the logic looks correct, just a minor thing I missed last time - and you added quite some tests (I tested this manually and it checks out in terms of shape and values). Thanks for the great work. @mdhaber feel free to merge (not sure if you want to squash or merge; you are more involved).
Co-authored-by: Pamphile ROY <roy.pamphile@gmail.com>
@Zac-HD We'd like to check that all methods of all distributions are vectorized properly. This PR is just about the `moment` method: `norm.moment(n, loc=loc, scale=scale)` should behave the same as `ref(n, loc=loc, scale=scale)` where `ref = np.vectorize(norm.moment, otypes=[float])`.

One minor complication is that some distributions have additional shape parameters, which have varying valid ranges, and we'd want to make sure that at least some combinations of inputs are valid. (When there is an invalid parameter, the result will be `nan`.)

Would you be able to help us set up something - just for the `moment` method of `norm`, say? With a concrete example like this, we could ask the mailing list whether they agree that this sort of functionality is worth the additional test-time dependency.
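The `np.vectorize` trick gives a brute-force reference oracle: wrap a scalar-only implementation and NumPy applies it elementwise with broadcasting. A toy illustration - the function below is made up for the sketch, not a scipy method:

```python
import numpy as np

def scalar_only(n, loc=0.0, scale=1.0):
    # pretend implementation that only handles scalars (uses float())
    return float(loc) + float(scale) * n

# otypes pins the output dtype so empty inputs still produce float arrays
ref = np.vectorize(scalar_only, otypes=[float])
out = ref(2, loc=np.array([0.0, 1.0]), scale=np.array([1.0, 10.0]))
print(out)  # [ 2. 21.]
```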
I'd be delighted to help out! Here's a quick prototype:

```python
import numpy as np
import hypothesis.extra.numpy as npst
from hypothesis import given, strategies as st
from scipy.stats import norm

# A strategy to generate the shape of a one-dimensional array of length [0..7]
# Subsequent draws will use the same shape to ensure compatibility. For fancier
# tricks see `npst.broadcastable_shapes()` or `mutually_broadcastable_shapes()`.
shared_shapes = st.shared(st.tuples(st.integers(0, 7)))


def scalar_or_array(**kwargs):
    return st.floats(**kwargs) | npst.arrays(
        dtype=float, shape=shared_shapes, elements=kwargs
    )


@given(
    # Our moment `n`, between one and four (inclusive)
    n=st.integers(1, 4),
    # Our locations and scale are each either a scalar (Python-native) float,
    # or an ndarray of floats. If both are arrays, they will have the same shape.
    loc=scalar_or_array(min_value=0, max_value=1000),  # what should the bounds be?
    scale=scalar_or_array(min_value=1, max_value=1000),
)
def test_moments(n, loc, scale):
    got = norm.moment(n, loc=loc, scale=scale)
    ref = np.vectorize(norm.moment, otypes=[float])(n, loc=loc, scale=scale)
    assert type(got) == type(ref)  # fails; vectorize returns zero-dim arrays
    if isinstance(got, np.ndarray):
        assert got.shape == ref.shape
        assert got.dtype == ref.dtype
    np.testing.assert_allclose(got, ref)
```

On my machine, this fails with:

```
________________________________ test_moments _________________________________
Traceback (most recent call last):
  File "C:\Users\Zac\Documents\GitHub\hypothesis\t.py", line 20, in test_moments
    # Our moment `n`, between one and four (inclusive)
  File "c:\users\zac\documents\github\hypothesis\hypothesis-python\src\hypothesis\core.py", line 1180, in wrapped_test
    raise the_error_hypothesis_found
hypothesis.errors.MultipleFailures: Hypothesis found 4 distinct failures.
--------------------------------- Hypothesis ----------------------------------
Falsifying example: test_moments(
    n=1, loc=array([0., 0.]), scale=1.0,
)
Traceback (most recent call last):
  File "C:\Users\Zac\Documents\GitHub\hypothesis\t.py", line 28, in test_moments
    got = norm.moment(n, loc=loc, scale=scale)
  File "c:\users\zac\anaconda3\envs\hydev\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 1291, in moment
    if loc == 0:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Falsifying example: test_moments(
    n=1, loc=0.0, scale=array([1., 1.]),
)
Traceback (most recent call last):
  File "C:\Users\Zac\Documents\GitHub\hypothesis\t.py", line 28, in test_moments
    got = norm.moment(n, loc=loc, scale=scale)
  File "c:\users\zac\anaconda3\envs\hydev\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 1274, in moment
    if not (self._argcheck(*args) and (scale > 0)):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Falsifying example: test_moments(
    n=1, loc=array([], dtype=float64), scale=1.0,
)
Traceback (most recent call last):
  File "C:\Users\Zac\Documents\GitHub\hypothesis\t.py", line 28, in test_moments
    got = norm.moment(n, loc=loc, scale=scale)
  File "c:\users\zac\anaconda3\envs\hydev\lib\site-packages\scipy\stats\_distn_infrastructure.py", line 1295, in moment
    fac = float(scale) / float(loc)
TypeError: only size-1 arrays can be converted to Python scalars

Falsifying example: test_moments(
    n=1, loc=array([], dtype=float64), scale=array([], dtype=float64),
)
Traceback (most recent call last):
  File "C:\Users\Zac\Documents\GitHub\hypothesis\t.py", line 31, in test_moments
    assert type(got) == type(ref)
AssertionError: assert <class 'float'> == <class 'numpy.ndarray'>
 +  where <class 'float'> = type(nan)
 +  and <class 'numpy.ndarray'> = type(array([], dtype=float64))
```

Hopefully this is a useful starting point? Further elaboration will probably require deeper knowledge (or judgement) about scipy edge cases than I have, but I'd be up for a call or pairing session too if that would help 😁
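The first two failures above come from NumPy's ambiguous-truth-value rule: `if loc == 0:` works for scalars, but raises once `loc` is an array with more than one element. A minimal reproduction of the error, with the explicit reduction the vectorized code needs instead:

```python
import numpy as np

loc = np.array([0.0, 0.0])
try:
    if loc == 0:  # ambiguous: which element should decide the branch?
        pass
except ValueError as e:
    msg = str(e)
    print(msg)  # "The truth value of an array ... is ambiguous ..."

# vectorized code needs an explicit reduction instead:
all_zero = bool(np.all(loc == 0))
```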
Hi @Zac-HD, thanks!

I just had a quick look and it looks really good! I have never used Hypothesis, so I will need to read some docs today and can start cooking up my own version in some time. If you think you have something useful (which you possibly will before I learn to use Hypothesis), feel free to just create a PR on my fork. I will merge it and we can then review it here.
I would throw in some invalid values too, just to make it a bit more comprehensive and probably catch some edge cases. Something like:

```python
loc=scalar_or_array(min_value=0, max_value=1000),  # what should the bounds be?
scale=scalar_or_array(min_value=-1, max_value=1000),
```
Sounds very helpful, thanks.
@Zac-HD @mdhaber I have added a basic test using Hypothesis, with @Zac-HD's prototype as a starting point. I removed the line

```python
assert type(got) == type(ref)  # fails; vectorize returns zero-dim arrays
```

because `np.vectorize` returns zero-dim arrays. Let me know what you think.
I have nothing against adding `hypothesis` (although it adds yet another layer of complexity: pytest, asv, linters, now this), but it might be worth an email to the mailing list to at least present it. Also, could you add it in https://scipy.github.io/devdocs/dev/toolchain.html? And there might be other places, like `environment.yml` and the testing guide.
Yes, that's the plan. We just wanted to have a concrete example of how Hypothesis can be used to improve tests.
```python
# A strategy to generate the shape of a one-dimensional array of length [0..7]
# Subsequent draws will use the same shape to ensure compatibility. For fancier
# tricks see `npst.broadcastable_shapes()` or `mutually_broadcastable_shapes()`.
shared_shapes = st.shared(st.tuples(st.integers(0, 7)))
```
Ideally, to show the advantage of Hypothesis, I think we want this test to go beyond what the tests above can do - show that this works for any distribution and any broadcastable `loc`, `scale`, and shape parameters. (I think it would also be good to nudge Hypothesis toward occasionally testing with invalid parameters. I'm not sure how Hypothesis chooses values under the hood, but if we give it such a wide range for scale, is it likely to include an invalid value < 0 in one of the tests?) @tirthasheshpatel @Zac-HD if you are willing to go a little deeper into this, terrific; if not, I understand and I can get to it at some point.
I'd recommend having separate tests for always-valid and might-be-invalid inputs; otherwise you eventually and without realising write a test which never checks the valid case 😅
For a deeper test, you're going to want the `mutually_broadcastable_shapes()` strategy - there's an example in Numpy for `np.clip()` here. Where signatures are consistent, you could use `@pytest.mark.parametrize` to supply the moment function - using that to supply some arguments and `@hypothesis.given()` for others works fine - but where signatures differ it's probably easier to write separate tests.
(This is getting pretty far from the original intent to fix a bug though, and I've seen projects take quite a while to settle on using Hypothesis. Might be worth merging this PR, and then tackling the test changes in a new one?)
Thanks for the feedback @Zac-HD and @mdhaber! I agree we should use Hypothesis to its fullest and add a deeper test. I have some time this and next week so I can go a little deeper and add a more comprehensive test.
> This is getting pretty far from the original intent to fix a bug though, and I've seen projects take quite a while to settle on using Hypothesis. Might be worth merging this PR, and then tackling the test changes in a new one?

I also think this would be better. Though I will try to push something, or at least confirm locally using Hypothesis that the bug-fix works. I can then propose a PR and we can discuss Hypothesis in detail there. What do you think, @mdhaber?
Yes, a local test with Hypothesis is fine. This PR has been tricky to get right, so a more comprehensive test would give additional confidence that the bug fix is ready to be merged.
And yes, it is actually preferable for the addition of Hypothesis to go in a separate PR.
@tupui: @tirthasheshpatel and I discussed in gh-13581 that we should revert to 7b69ef6 and merge this PR as-is. We will add more comprehensive tests using Hypothesis in a separate PR. Does that still sound good to you? Would you be interested in pressing the big green button after the revert, as I wrote a fair amount of this?

Reverted.

No need to rebase. Squash will take care of it.

Do I understand that you prefer a squash here, @mdhaber?

Yes, because of the messy history.

Ping me when you do, and I'll be happy to help review 😊
I tried out Hypothesis to test the `moment` method of all the continuous distributions:

```python
import pytest

import numpy as np

import hypothesis
from hypothesis import strategies as st
from hypothesis.extra import numpy as npst

from scipy import stats
from scipy.integrate import IntegrationWarning
from scipy.stats._distr_params import distcont


# Writing more comprehensive tests with Hypothesis.
# The below test checks if the moment function works for each distribution
# for:
#   * a mix of valid and invalid scalar loc/scale
#   * a mix of valid and invalid array loc/scale with the same shape
#   * a mix of valid and invalid array loc/scale with broadcastable shapes
@hypothesis.given(data=st.data())
@hypothesis.seed(123)
@hypothesis.settings(max_examples=100, deadline=None)
@pytest.mark.parametrize("distname, distargs", distcont)
def test_moment_hypothesis(distname, distargs, data):
    slow_dists = {'kappa4', 'genexpon', 'ksone', 'kstwo', 'recipinvgauss',
                  'studentized_range', 'argus', 'exponpow', 'exponweib',
                  'genhalflogistic', 'halfgennorm', 'gompertz', 'invgamma',
                  'johnsonsb', 'johnsonsu', 'kstwobign', 'nct', 'pareto',
                  'powerlognorm', 'powernorm', 'skewnorm', 'vonmises',
                  'vonmises_line'}
    if distname in slow_dists:
        pytest.skip(f"slow dist: {distname}")
    shapes = data.draw(npst.mutually_broadcastable_shapes(num_shapes=2,
                                                          max_side=5))
    loc = data.draw(npst.arrays(dtype=float, shape=shapes.input_shapes[0]))
    scale = data.draw(npst.arrays(dtype=float, shape=shapes.input_shapes[1],
                                  elements={'min_value': -1,
                                            'max_value': 10}))
    n = data.draw(st.integers(0, 4))
    if isinstance(distname, str):
        dist = getattr(stats, distname)
    else:
        dist = distname
    with np.errstate(all='ignore'):
        with np.testing.suppress_warnings() as sup:
            sup.filter(IntegrationWarning)
            moment_vectorized = np.vectorize(dist.moment, otypes=[float])
            res_expected = moment_vectorized(n, *distargs, loc=loc,
                                             scale=scale)
            res = dist.moment(n, *distargs, loc=loc, scale=scale)
    if isinstance(res, np.ndarray):
        assert res.shape == res_expected.shape
        assert res.dtype == res_expected.dtype
    np.testing.assert_allclose(res, res_expected)
```

This has already helped discover a few genuine bugs. The test fails for quite a few distributions with broadcasting and vectorizing errors:
Minimum reproducible examples:

```python
from scipy.stats import (kappa3, levy_stable, truncnorm, burr, fisk,
                         genextreme, truncexpon)

# kappa3 failure
try:
    kappa3.moment(1, 1, loc=0.0, scale=[2, 3])
except Exception as e:
    print(f'kappa3 failed with error {e.args[0]}')

# levy_stable failure
try:
    levy_stable.moment(1, 1.8, -0.5, loc=0.1, scale=[2, 3])
except Exception as e:
    print(f'levy_stable failed with error {e.args[0]}')

# truncnorm failure
try:
    truncnorm.moment(2, 1, 2, loc=[], scale=[])
except Exception as e:
    print(f'truncnorm failed with error {e.args[0]}')

# burr failure
try:
    burr.moment(1, 10.5, 4.3, loc=0.0, scale=[[1], [2]])
except Exception as e:
    print(f'burr failed with error {e.args[0]}')

# fisk failure
try:
    fisk.moment(1, 3.085754862225318, loc=0.0, scale=[[1], [2]])
except Exception as e:
    print(f'fisk failed with error {e.args[0]}')

# genextreme failure
try:
    genextreme.moment(1, -0.1, loc=[[0, 0], [0, 0]], scale=2.2)
except Exception as e:
    print(f'genextreme failed with error {e.args[0]}')

# truncexpon failure
try:
    truncexpon.moment(3, 4.69, loc=0.0, scale=[0.0, 0.0])
except Exception as e:
    print(f'truncexpon failed with error {e.args[0]}')
```

It is also easy to generalize the tests to check vectorization of many other methods of all the distributions:

```python
# Writing more comprehensive tests with Hypothesis.
# The below test checks if each method works for each distribution
# for:
#   * a mix of valid and invalid scalar loc/scale
#   * a mix of valid and invalid array loc/scale with the same shape
#   * a mix of valid and invalid array loc/scale with broadcastable shapes
@hypothesis.given(data=st.data())
@hypothesis.seed(123)
@hypothesis.settings(max_examples=20, deadline=None)
@pytest.mark.parametrize("distname, distargs", distcont)
@pytest.mark.parametrize("method", ['pdf', 'cdf', 'ppf', 'sf', 'isf', 'moment'])
def test_moment_hypothesis(distname, distargs, method, data):
    if method == 'moment':
        slow_dists = {
            'kappa4', 'genexpon', 'ksone', 'kstwo', 'recipinvgauss',
            'studentized_range', 'argus', 'exponpow', 'exponweib',
            'genhalflogistic', 'halfgennorm', 'gompertz', 'invgamma',
            'johnsonsb', 'johnsonsu', 'kstwobign', 'nct', 'pareto',
            'powerlognorm', 'powernorm', 'skewnorm', 'vonmises',
            'vonmises_line'
        }
        if distname in slow_dists:
            pytest.skip(f"slow dist: {distname}")
    shapes = data.draw(npst.mutually_broadcastable_shapes(num_shapes=2,
                                                          max_side=5))
    loc = data.draw(npst.arrays(dtype=float, shape=shapes.input_shapes[0]))
    scale = data.draw(npst.arrays(dtype=float, shape=shapes.input_shapes[1],
                                  elements={'min_value': -1,
                                            'max_value': 10}))
    if isinstance(distname, str):
        dist = getattr(stats, distname)
    else:
        dist = distname
    if method == 'moment':
        n = data.draw(st.integers(0, 4))
    else:
        shape = data.draw(npst.broadcastable_shapes(shape=shapes.result_shape))
        n = data.draw(npst.arrays(dtype=float, shape=shape,
                                  elements={'min_value': -100,
                                            'max_value': 100}))
    method = getattr(dist, method)
    with np.errstate(all='ignore'):
        with np.testing.suppress_warnings() as sup:
            sup.filter(IntegrationWarning)
            sup.filter(RuntimeWarning)
            method_vectorized = np.vectorize(method, otypes=[float])
            res_expected = method_vectorized(n, *distargs, loc=loc,
                                             scale=scale)
            res = method(n, *distargs, loc=loc, scale=scale)
    if isinstance(res, np.ndarray):
        assert res.shape == res_expected.shape
        assert res.dtype == res_expected.dtype
    np.testing.assert_allclose(res, res_expected)
```

The main problem with Hypothesis tests is that they usually take much longer than normal tests. This might be because I am still learning how to use it correctly :) Nevertheless, these tests are very powerful and can help discover lots of such bugs in `scipy.stats`.
Very nice! One suggestion about settings though: instead of using the
I don't think there's any secret that you're missing - Hypothesis itself isn't slow, but if you multiply out the number of distributions * 100 examples each * test size, doing that much work just takes a while. I certainly wouldn't suggest making every test use Hypothesis - I've had projects with 80% property-based tests, and others with just one. For SciPy I'd guess that using Hypothesis for just a handful of high-level tests is probably the best compromise between speed, bug-finding, and other workflow constraints.

As above, I remain happy to review tests as I did for Numpy and Astropy. Just let me know how I can help 💖
Reference issue

fixes #12192

What does this implement/fix?

This PR adds list and array support for the `moment` method of every distribution.