Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: fix sparse random int handling #9486

Merged
merged 1 commit into from Nov 21, 2018

Conversation

tylerjereddy
Copy link
Contributor

SciPy wheel builds have been failing for the last ten weeks, which has caused a number of issues to fly under the radar until my attempt to start the wheel build process for 1.2.0rc1 as described in MacPython/scipy-wheels#37.

This PR targets perhaps the most worrisome of the failures, in scipy.sparse.tests.test_construct::TestConstructUtils::test_random_sampling.

I was able to reproduce one of the several failing wheel build matrix entries locally by using numpy==1.8.2, Cython==0.29.0, Python==2.7, and SciPy master branch.

The changes made in this PR were sufficient to patch that specific test failure locally, but that doesn't mean we actually want to fix it this way, and I'm not sure how imposing this fix is after the branch has happened--maybe not too cumbersome yet without a tag having been pushed.

The original compatibility layer addition was quite recent -- #9048 by @jor-.

@tylerjereddy tylerjereddy added defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.sparse backport-candidate This fix should be ported by a maintainer to previous SciPy versions. labels Nov 16, 2018
@rgommers
Copy link
Member

Adding the test failure for clarity:

___________________ TestConstructUtils.test_random_sampling ____________________
[gw1] linux -- Python 3.4.3 /venv/bin/python

self = <scipy.sparse.tests.test_construct.TestConstructUtils object at 0x7f21ec1dd5f8>

    def test_random_sampling(self):
        # Simple sanity checks for sparse random sampling.
        for f in sprand, _sprandn:
            for t in [np.float32, np.float64, np.longdouble,
                      np.int32, np.int64, np.complex64, np.complex128]:
>               x = f(5, 10, density=0.1, dtype=t)

f          = <function rand at 0x7f22047bf9d8>
self       = <scipy.sparse.tests.test_construct.TestConstructUtils object at 0x7f21ec1dd5f8>
t          = <class 'numpy.int32'>
x          = <5x10 sparse matrix of type '<class 'numpy.float128'>'
	with 5 stored elements in COOrdinate format>

/venv/lib/python3.4/site-packages/scipy/sparse/tests/test_construct.py:428: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/venv/lib/python3.4/site-packages/scipy/sparse/construct.py:842: in rand
    return random(m, n, density, format, dtype, random_state)
/venv/lib/python3.4/site-packages/scipy/sparse/construct.py:792: in random
    vals = data_rvs(k).astype(dtype, copy=False)
/venv/lib/python3.4/site-packages/scipy/sparse/construct.py:781: in data_rvs
    n, dtype=dtype)
/venv/lib/python3.4/site-packages/scipy/_lib/_numpy_compat.py:120: in randint_patched
    integers = random_state.randint(low, high=high, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   ValueError: low >= high

Lock       = <built-in function allocate_lock>
RandomState = <class 'mtrand.RandomState'>
__builtins__ = <builtins>
__doc__    = None
__file__   = '/venv/lib/python3.4/site-packages/numpy/random/mtrand.cpython-34m.so'
__loader__ = <_frozen_importlib.ExtensionFileLoader object at 0x7f22052cbd30>
__name__   = 'numpy.random.mtrand'
__package__ = 'numpy.random'
__spec__   = ModuleSpec(name='numpy.random.mtrand', loader=<_frozen_importlib.ExtensionFileLoader object at 0x7f22052cbd30>, origin='/venv/lib/python3.4/site-packages/numpy/random/mtrand.cpython-34m.so')
__test__   = {'RandomState.binomial (line 3546)': '\n        binomial(n, p, size=None)\n\n        Draw samples from a binomial dist...0.3])\n        array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'],\n              dtype='|S11')\n\n        ", ...}
_rand      = <mtrand.RandomState object at 0x7f22052d8c18>
_shape_from_size = <built-in function _shape_from_size>
beta       = <built-in method beta of mtrand.RandomState object at 0x7f22052d8c18>
binomial   = <built-in method binomial of mtrand.RandomState object at 0x7f22052d8c18>
... <etc>
zipf       = <built-in method zipf of mtrand.RandomState object at 0x7f22052d8c18>

mtrand.pyx:941: ValueError
=========================== short test summary info ============================
FAIL sparse/tests/test_construct.py::TestConstructUtils::test_random_sampling

@rgommers
Copy link
Member

I can't say I get the logic of either the original code or the fix, it's unclear to me what the options are for *args, **kwargs. Deserves some more comments.

Note that if this takes a while to resolve, you could also consider reverting that PR in 1.2.x. The problem seems to be for numpy <1.11.0 only, which we're dropping in master very soon.

@tylerjereddy
Copy link
Contributor Author

tylerjereddy commented Nov 17, 2018

randint didn't have the dtype argument for the older NumPy versions with a compat layer here.

--- a/scipy/_lib/_numpy_compat.py
+++ b/scipy/_lib/_numpy_compat.py
@@ -95,6 +95,11 @@ else:
     # method of RandomState does only work with int32 values.
     def get_randint(random_state):
         def randint_patched(*args, **kwargs):
+            # args should be: low, high, size
+            # kwargs should only really contain
+            # dtype argument not available in older
+            # NumPy versions, which couldn't handle
+            # int64
             try:
                 low = args[0]
                 size = args[2]

The patched function is probably a bit awkward though, and may be important to check which of its codepaths are actually tested locally.

The top of _numpy_compat.py has a docstring with Functions copypasted from newer versions of numpy, but I think the random stuff is in Cython so we'd likely need to add in a compiled compatibility layer if we don't already have one (and only if it doesn't branch out too much to other parts of NumPy, which it might). I suspect @rkern would agree this is asking for trouble / maintenance issues, which is probably why we can't just copy/paste here.

@perimosocordiae
Copy link
Member

This shim is only used in one place right now, in scipy/sparse/construct.py:

        if np.issubdtype(dtype, np.integer):
            randint = get_randint(random_state)

            def data_rvs(n):
                return randint(np.iinfo(dtype).min, np.iinfo(dtype).max,
                               n, dtype=dtype)

Since the shim will be obsolete once the minimum numpy version is increased, I think it's sufficient to patch it here to get tests running, without worrying too much about having all the corner cases checked. In particular, we know that both low and high are not None, and that high doesn't show up in the **kwargs, so a lot of the shim's logic isn't exercised.

@tylerjereddy
Copy link
Contributor Author

Revised based on revisions requested from CJ, though I don't have immediate access to the local test environment originally used for reproduction since I'm at BIDS at the moment.

* TestConstructUtils::test_random_sampling fails
in some SciPy wheel build scenarios because of
incorrect handling of np.int32/64 limits in the
compatibility layer; this commit attempts to fix
that behavior
@tylerjereddy
Copy link
Contributor Author

I reproduced the original test failure locally on the latest master branch (04c0e5f) using above-described dependencies & then confirmed that the test passes on the revised version of this PR branch (f9ad4bd) in the same environment.

CI is all-green too. If there's still too much concern about the shim after the simplifications above then I should probably consider the PR reversion.

@tylerjereddy tylerjereddy changed the title WIP, BUG: fix sparse random int handling BUG: fix sparse random int handling Nov 20, 2018
@rgommers rgommers added this to the 1.2.0 milestone Nov 21, 2018
@rgommers rgommers merged commit b635563 into scipy:master Nov 21, 2018
@rgommers
Copy link
Member

LGTM now, merged. Thanks @tylerjereddy and @perimosocordiae

@tylerjereddy tylerjereddy deleted the fix_sparse_int_compat branch November 21, 2018 18:09
@tylerjereddy
Copy link
Contributor Author

Unfortunately we're still getting a related wheel build failure with OverflowError: Python int too large to convert to C long.

It looks like we really do have to ensure that all arguments to randint are within np.int32 bounds for the old NumPy version. Do we want to try attacking that with another PR, reverting the functionality, or xfail for old NumPy?

@rgommers
Copy link
Member

Out of all distributions available, it's only randint that fails right, all the other distributions do work? If so, I'd suggest to either skip testing for randint or xfail the test on 1.2.x.

@jor-
Copy link
Contributor

jor- commented Nov 28, 2018

#9550 might fix the problem with randint and old NumPy versions.

@tylerjereddy tylerjereddy removed the backport-candidate This fix should be ported by a maintainer to previous SciPy versions. label Feb 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.sparse
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants