Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: fix moments method to support arrays and list #12197

Merged
merged 29 commits into from Aug 27, 2021
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
2f1beae
BUG: fix moments method to support arrays and list
tirthasheshpatel May 23, 2020
2f0a903
FIXUP: fix the moment method to perform operations correctly
tirthasheshpatel May 23, 2020
8359f4a
FIUP: fix the int32 error in moment method
tirthasheshpatel May 23, 2020
65e359a
FIXUP: fix int64 error in moment
tirthasheshpatel May 23, 2020
5391f86
FIXUP: fix failing tests
tirthasheshpatel May 24, 2020
5ab67bf
FIXUP: filter warnings in test_moment
tirthasheshpatel May 24, 2020
e3fd4ef
Merge remote-tracking branch 'upstream/master' into fix-moment
mdhaber Nov 26, 2020
06c5162
BUG: stats: remove some redundant computation
tirthasheshpatel Nov 27, 2020
b7206d4
BUG: stats: fix an inplace operation
tirthasheshpatel Nov 27, 2020
60c517a
BUG: stats: remove all redundant computations
tirthasheshpatel Nov 30, 2020
78e3fe9
BUG, TST: stats: add tests and resolve the bug
tirthasheshpatel Nov 30, 2020
8ca99eb
MAINT: stats: refactor moment function for array input
mdhaber Dec 1, 2020
3ef2889
Merge pull request #1 from mdhaber/fix-moment
tirthasheshpatel Dec 2, 2020
ec282dd
TST: stats: remove rtol=1e-20 and assert equality
tirthasheshpatel Dec 3, 2020
4989c3c
BUG: stats: fix the TypeError bug in _moment_from_stats
tirthasheshpatel Dec 4, 2020
559bd3d
MAINT: stats: merge remote tracking branch 'master' and resolve confl…
tirthasheshpatel Dec 15, 2020
3dbf027
TST: stats: change assert_equal to assert_allclose
tirthasheshpatel Dec 28, 2020
a6f4788
Merge remote-tracking branch 'upstream/master' into fix-moment
mdhaber May 26, 2021
cd90f3e
TST: stats: improve tests of rv_continuous.moment with array arguments
mdhaber May 26, 2021
112b588
MAINT: stats: use [()] instead of .item()
tirthasheshpatel May 27, 2021
43858ec
BUG: stats: call argsreduce for all non-none moments
tirthasheshpatel May 27, 2021
2cebce8
MAINT: stats: apply suggestions from code review
tirthasheshpatel Jun 27, 2021
7b69ef6
MAINT: stats: add a suggestion from code review
tirthasheshpatel Jul 5, 2021
a13a73b
TST, CI: stats: add hypothesis to CI and add tests
tirthasheshpatel Aug 3, 2021
acf4c1b
CI: stats: add hypothesis to macOS job
tirthasheshpatel Aug 3, 2021
5b10a45
Revert "CI: stats: add hypothesis to macOS job"
tirthasheshpatel Aug 27, 2021
35387ff
Revert "TST, CI: stats: add hypothesis to CI and add tests"
tirthasheshpatel Aug 27, 2021
d6a9b11
STY: stats: fix lint failures
tirthasheshpatel Aug 27, 2021
7ed600d
STY: stats: fix E501 (line too long) PEP-8 failure
tirthasheshpatel Aug 27, 2021
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
4 changes: 2 additions & 2 deletions .github/workflows/linux.yml
Expand Up @@ -36,7 +36,7 @@ jobs:
run: |
python3.7-dbg -c 'import sys; print("Python debug build:", hasattr(sys, "gettotalrefcount"))'
python3.7-dbg -m pip install --upgrade pip setuptools wheel
python3.7-dbg -m pip install --upgrade numpy cython pytest pytest-xdist pybind11
python3.7-dbg -m pip install --upgrade numpy cython pytest hypothesis pytest-xdist pybind11
python3.7-dbg -m pip install --upgrade mpmath gmpy2 pythran
python3.7-dbg -m pip uninstall -y nose
cd ..
Expand Down Expand Up @@ -81,7 +81,7 @@ jobs:
- name: Install packages
run: |
pip install --user git+https://github.com/numpy/numpy.git
python3.10 -m pip install --user setuptools wheel cython pytest pybind11 pytest-xdist
python3.10 -m pip install --user setuptools wheel cython pytest hypothesis pybind11 pytest-xdist
pip install --user git+https://github.com/serge-sans-paille/pythran.git
python3.10 -m pip install -r mypy_requirements.txt

Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/macos.yml
Expand Up @@ -77,7 +77,7 @@ jobs:
- name: Install packages
run: |
pip install ${{ matrix.numpy-version }}
pip install setuptools wheel cython pytest pytest-xdist pybind11 pytest-xdist mpmath gmpy2 pythran
pip install setuptools wheel cython pytest hypothesis pytest-xdist pybind11 pytest-xdist mpmath gmpy2 pythran

- name: Test SciPy
run: |
Expand Down
4 changes: 4 additions & 0 deletions .gitignore
Expand Up @@ -93,6 +93,10 @@ pip-wheel-metadata
.cache/*
.pytest_cache/*

# hypothesis cache #
####################
.hypothesis/*

# mypy cache #
##############
.mypy_cache/*
Expand Down
6 changes: 3 additions & 3 deletions scipy/stats/_distn_infrastructure.py
Expand Up @@ -1272,14 +1272,14 @@ def moment(self, n, *args, **kwds):
"""
shapes, loc, scale = self._parse_args(*args, **kwds)
args = np.broadcast_arrays(*(*shapes, loc, scale))
shapes, loc, scale = args[:-2], args[-2], args[-1]
*shapes, loc, scale = args

i0 = np.logical_and(self._argcheck(*shapes), scale > 0)
i1 = np.logical_and(i0, loc == 0)
i2 = np.logical_and(i0, loc != 0)

args = argsreduce(i0, *shapes, loc, scale)
shapes, loc, scale = args[:-2], args[-2], args[-1]
*shapes, loc, scale = args

if (floor(n) != n):
raise ValueError("Moment must be an integer.")
Expand Down Expand Up @@ -1316,7 +1316,7 @@ def moment(self, n, *args, **kwds):
j+=1
mu, mu2, g1, g2 = mom
args = argsreduce(loc != 0, *shapes, loc, scale, val)
shapes, loc, scale, val = args[:-3], args[-3], args[-2], args[-1]
*shapes, loc, scale, val = args

res2 = zeros(loc.shape, dtype='d')
fac = scale / loc
Expand Down
43 changes: 38 additions & 5 deletions scipy/stats/tests/test_continuous_basic.py
Expand Up @@ -2,6 +2,8 @@
import numpy as np
import numpy.testing as npt
import pytest
import hypothesis.extra.numpy as npst
from hypothesis import given, strategies as st
from pytest import raises as assert_raises
from scipy.integrate import IntegrationWarning

Expand Down Expand Up @@ -741,7 +743,7 @@ def test_moments_with_array_gh12192_regression():
vals3 = stats.norm.moment(n=2, loc=0, scale=-4)
expected3 = np.nan
npt.assert_equal(vals3, expected3)
npt.assert_(isinstance(vals3, expected3.__class__))
assert isinstance(vals3, expected3.__class__)

# array loc with 0 entries and scale with invalid entries
vals4 = stats.norm.moment(n=2, loc=[1, 0, 2], scale=[3, -4, -5])
Expand All @@ -762,7 +764,7 @@ def test_moments_with_array_gh12192_regression():
vals7 = stats.chi.moment(n=2, df=1, loc=0, scale=0)
expected7 = np.nan
npt.assert_equal(vals7, expected7)
npt.assert_(isinstance(vals7, expected7.__class__))
assert isinstance(vals7, expected7.__class__)

# array args, scalar loc, and scalar scale
vals8 = stats.chi.moment(n=2, df=[1, 2, 3], loc=0, scale=0)
Expand Down Expand Up @@ -808,23 +810,54 @@ def test_broadcasting_in_moments_gh12192_regression():
vals0 = stats.norm.moment(n=1, loc=np.array([1, 2, 3]), scale=[[1]])
expected0 = np.array([[1., 2., 3.]])
npt.assert_equal(vals0, expected0)
npt.assert_(vals0.shape == expected0.shape)
assert vals0.shape == expected0.shape

vals1 = stats.norm.moment(n=1, loc=np.array([[1], [2], [3]]),
scale=[1, 2, 3])
expected1 = np.array([[1., 1., 1.], [2., 2., 2.], [3., 3., 3.]])
npt.assert_equal(vals1, expected1)
npt.assert_(vals1.shape == expected1.shape)
assert vals1.shape == expected1.shape

vals2 = stats.chi.moment(n=1, df=[1., 2., 3.], loc=0., scale=1.)
expected2 = np.array([0.79788456, 1.25331414, 1.59576912])
npt.assert_allclose(vals2, expected2, rtol=1e-8)
npt.assert_(vals2.shape == expected2.shape)
assert vals2.shape == expected2.shape

vals3 = stats.chi.moment(n=1, df=[[1.], [2.], [3.]], loc=[0., 1., 2.],
scale=[-1., 0., 3.])
expected3 = np.array([[np.nan, np.nan, 4.39365368],
[np.nan, np.nan, 5.75994241],
[np.nan, np.nan, 6.78730736]])
npt.assert_allclose(vals3, expected3, rtol=1e-8)
assert vals3.shape == expected3.shape
npt.assert_(vals3.shape == expected3.shape)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
npt.assert_(vals3.shape == expected3.shape)
assert vals3.shape == expected3.shape

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this was added but the original was not removed.



# A strategy to generate the shape of a one-dimensional array of length [0..7]
# Subsequent draws will use the same shape to ensure compatibility. For fancier
# tricks see `npst.broadcastable_shapes()` or `mutually_broadcastable_shapes()`.
shared_shapes = st.shared(st.tuples(st.integers(0, 7)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, to show the advantage of Hypothesis, I think we want this test to go beyond what the tests above can do - show that this works for any distribution and any broadcastable loc, scale, and shape parameters. (I think it would also be good to nudge hypothesis toward occasionally testing with invalid parameters. I'm not sure how Hypothesis chooses values under the hood, but if we give it such a wide range for scale, is it likely to include in invalid value < 0 in one of the tests?). @tirthasheshpatel @Zac-HD if you are willing to go a little deeper into this, terrific; if not, I understand and I can get to this at some point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend having separate tests for always-valid and might-be-invalid inputs; otherwise you eventually and without realising write a test which never checks the valid case 😅

For a deeper test, you're going to want the mutually_broadcastable_shapes() strategy - there's an example in Numpy for np.clip() here. Where signatures are consistent you could use @pytest.mark.parametrize to supply the moment function - using that to supply some arguments and @hypothesis.given() for others works fine - but where signatures differ it's probably easier to write separate tests.

(This is getting pretty far from the original intent to fix a bug though, and I've seen projects take quite a while to settle on using Hypothesis. Might be worth merging this PR, and then tackling the test changes in a new one?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback @Zac-HD and @mdhaber! I agree we should use Hypothesis to its fullest and add a deeper test. I have some time this and next week so I can go a little deeper and add a more comprehensive test.

This is getting pretty far from the original intent to fix a bug though, and I've seen projects take quite a while to settle on using Hypothesis. Might be worth merging this PR, and then tackling the test changes in a new one?

I also think this would be better. Though I will try to push something or at least confirm locally using Hypothesis that the bug-fix works. I can then propose a PR and we can discuss Hypothesis in detail there. What do you think, @mdhaber?

Copy link
Contributor

@mdhaber mdhaber Aug 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a local test with Hypothesis is fine. This PR has been tricky to get right, so a more comprehensive test would give additional confidence that the bug fix is ready to be merged.
And yes, it is actually preferable for the addition of Hypothesis to go in a separate PR.



def scalar_or_array(**kwargs):
return st.floats(**kwargs) | npst.arrays(
dtype=float, shape=shared_shapes, elements=kwargs
)


@given(
# Our moment `n`, between one and four (inclusive)
n=st.integers(1, 4),
# Our locations and scale are each either a scalar (Python-native) float,
# or an ndarray of floats. If both are arrays, they will have the same shape.
loc=scalar_or_array(min_value=0, max_value=1000), # what should the bounds be?
scale=scalar_or_array(min_value=-1, max_value=1000),
)
def test_moments_gh12192_regression(n, loc, scale):
got = stats.norm.moment(n, loc=loc, scale=scale)
ref = np.vectorize(stats.norm.moment, otypes=[float])(n, loc=loc, scale=scale)

if isinstance(got, np.ndarray):
assert got.shape == ref.shape
assert got.dtype == ref.dtype
np.testing.assert_allclose(got, ref)