[BUG] NumPy 1.24: test regression ValueError: cannot convert float NaN to integer #423

Closed
mgorny opened this issue Dec 22, 2022 · 6 comments · Fixed by #425

mgorny commented Dec 22, 2022

Describe the bug
When NumPy is upgraded to 1.24, the following test fails:

________________________________________________________ test_move[move_rank] _________________________________________________________

func = <built-in function move_rank>

    @pytest.mark.parametrize("func", bn.get_functions("move"), ids=lambda x: x.__name__)
    def test_move(func):
        """Test that bn.xxx gives the same output as a reference function."""
        fmt = (
            "\nfunc %s | window %d | min_count %s | input %s (%s) | shape %s | "
            "axis %s | order %s\n"
        )
        fmt += "\nInput array:\n%s\n"
        aaae = assert_array_almost_equal
        func_name = func.__name__
        func0 = eval("bn.slow.%s" % func_name)
        if func_name == "move_var":
            decimal = 3
        else:
            decimal = 5
        for i, a in enumerate(arrays(func_name)):
            axes = range(-1, a.ndim)
            for axis in axes:
                windows = range(1, a.shape[axis])
                for window in windows:
                    min_counts = list(range(1, window + 1)) + [None]
                    for min_count in min_counts:
                        actual = func(a, window, min_count, axis=axis)
>                       desired = func0(a, window, min_count, axis=axis)

a          = array([], shape=(2, 0), dtype=int64)
aaae       = <function assert_array_almost_equal at 0x7faf70006ca0>
actual     = array([], shape=(2, 0), dtype=float64)
axes       = range(-1, 2)
axis       = 0
da         = dtype('float32')
dd         = dtype('float32')
decimal    = 5
desired    = array([], shape=(1, 0, 2), dtype=float32)
err_msg    = '\nfunc move_rank | window 1 | min_count None | input a53 (float32) | shape (1, 0, 2) | axis 2 | order C,F\n\nInput array:\n[]\n\n dtype mismatch %s %s'
fmt        = '\nfunc %s | window %d | min_count %s | input %s (%s) | shape %s | axis %s | order %s\n\nInput array:\n%s\n'
func       = <built-in function move_rank>
func0      = <function move_rank at 0x7faf70ab9d00>
func_name  = 'move_rank'
i          = 57
min_count  = 1
min_counts = [1, None]
tup        = ('move_rank', 1, 'None', 'a53', 'float32', '(1, 0, 2)', ...)
window     = 1
windows    = range(1, 2)

bottleneck/.venv/lib/python3.11/site-packages/bottleneck/tests/move_test.py:33: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
bottleneck/.venv/lib/python3.11/site-packages/bottleneck/slow/move.py:110: in move_rank
    return move_func(lastrank, a, window, min_count, axis=axis)
        a          = array([], shape=(2, 0), dtype=int64)
        axis       = 0
        min_count  = 1
        window     = 1
bottleneck/.venv/lib/python3.11/site-packages/bottleneck/slow/move.py:148: in move_func
    y[tuple(idx2)] = func(a[tuple(idx1)], axis=axis, **kwargs)
        a          = array([], shape=(2, 0), dtype=int64)
        axis       = 0
        func       = <function lastrank at 0x7faf70ab9ee0>
        i          = 0
        idx1       = [slice(0, 1, None), slice(None, None, None)]
        idx2       = [0, slice(None, None, None)]
        kwargs     = {}
        mc         = 1
        min_count  = 1
        win        = 1
        window     = 1
        y          = array([], shape=(2, 0), dtype=float64)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

a = array([], shape=(1, 0), dtype=int64), axis = 0

    def lastrank(a, axis=-1):
        """
        The ranking of the last element along the axis, ignoring NaNs.
    
        The ranking is normalized to be between -1 and 1 instead of the more
        common 1 and N. The results are adjusted for ties.
    
        Parameters
        ----------
        a : ndarray
            Input array. If `a` is not an array, a conversion is attempted.
        axis : int, optional
            The axis over which to rank. By default (axis=-1) the ranking
            (and reducing) is performed over the last axis.
    
        Returns
        -------
        d : array
            In the case of, for example, a 2d array of shape (n, m) and
            axis=1, the output will contain the rank (normalized to be between
            -1 and 1 and adjusted for ties) of the the last element of each row.
            The output in this example will have shape (n,).
    
        Examples
        --------
        Create an array:
    
        >>> y1 = larry([1, 2, 3])
    
        What is the rank of the last element (the value 3 in this example)?
        It is the largest element so the rank is 1.0:
    
        >>> import numpy as np
        >>> from la.afunc import lastrank
        >>> x1 = np.array([1, 2, 3])
        >>> lastrank(x1)
        1.0
    
        Now let's try an example where the last element has the smallest
        value:
    
        >>> x2 = np.array([3, 2, 1])
        >>> lastrank(x2)
        -1.0
    
        Here's an example where the last element is not the minimum or maximum
        value:
    
        >>> x3 = np.array([1, 3, 4, 5, 2])
        >>> lastrank(x3)
        -0.5
    
        """
        a = np.array(a, copy=False)
        ndim = a.ndim
        if a.size == 0:
            # At least one dimension has length 0
            shape = list(a.shape)
            shape.pop(axis)
            r = np.empty(shape, dtype=a.dtype)
>           r.fill(np.nan)
E           ValueError: cannot convert float NaN to integer

a          = array([], shape=(1, 0), dtype=int64)
axis       = 0
ndim       = 2
r          = array([], dtype=int64)
shape      = [0]

bottleneck/.venv/lib/python3.11/site-packages/bottleneck/slow/move.py:236: ValueError

Downgrading to 1.23.5 makes tests pass.
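
The failing statement can be isolated from the test suite. The following is a minimal sketch, not part of the original report: `fill_nan_raises` is a hypothetical helper that repeats the `r.fill(np.nan)` call from the traceback on an empty array of a given dtype and reports whether NumPy raises `ValueError`. Based on the traceback, the integer case is expected to raise on NumPy 1.24, while older releases such as 1.23.5 are expected not to.

```python
import numpy as np


def fill_nan_raises(dtype):
    """Return True if filling an empty array of `dtype` with NaN raises ValueError."""
    # Same kind of array as `r` in the traceback: empty, 1-d.
    r = np.empty((0,), dtype=dtype)
    try:
        r.fill(np.nan)
    except ValueError:
        return True
    return False


print("NumPy", np.__version__)
# Integer dtypes cannot represent NaN; the traceback shows this raising on 1.24.
print("int64 raises:  ", fill_nan_raises(np.int64))
# Float dtypes can hold NaN, so this call is not expected to raise.
print("float64 raises:", fill_nan_raises(np.float64))
```

Note that the array is empty, so no element is ever written; going by the traceback, the error comes from converting the fill value itself.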

To Reproduce
To assist in reproducing the bug, please include the following:

  1. git clone https://github.com/pydata/bottleneck
  2. pip install bottleneck pytest
  3. python -c "import bottleneck;bottleneck.test()"

Expected behavior
Tests passing ;-).

Additional context
Tested on a825e20.

@mgorny mgorny added the bug label Dec 22, 2022
bmwiedemann pushed a commit to bmwiedemann/openSUSE that referenced this issue Dec 25, 2022
https://build.opensuse.org/request/show/1045128
by user mcepl + dimstar_suse
Forwarded request #1045057 from bnavigator

- Skip a failing test -- gh#pydata/bottleneck#423
  - Add rpmlintrc

nileshpatra commented Jan 14, 2023

This is now also reported as a bug in the corresponding Debian package, which needs to be addressed soon. A hacky way of getting this moving would be to create the `r` array with a float dtype and cast it back to the input dtype afterwards.

--- a/bottleneck/slow/move.py
+++ b/bottleneck/slow/move.py
@@ -232,8 +232,9 @@
         # At least one dimension has length 0
         shape = list(a.shape)
         shape.pop(axis)
-        r = np.empty(shape, dtype=a.dtype)
+        r = np.empty(shape, dtype=float)
         r.fill(np.nan)
+        r = r.astype(a.dtype)
         if (r.ndim == 0) and (r.size == 1):
             r = np.nan
         return r

I know this looks ugly, but it is just a suggestion that seems to work for the few combinations I tried.
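
To try the suggested change without installing a patched bottleneck, the empty-input branch can be copied into a stand-alone function. This is a sketch under assumptions: `empty_lastrank_result` is a hypothetical name, it only covers the `a.size == 0` branch shown in the diff, and it uses `np.asarray` in place of `np.array(a, copy=False)`.

```python
import numpy as np


def empty_lastrank_result(a, axis=-1):
    """Hypothetical stand-alone copy of the patched empty-input branch."""
    a = np.asarray(a)
    if a.size != 0:
        raise ValueError("this sketch only covers the empty-input branch")
    # At least one dimension has length 0
    shape = list(a.shape)
    shape.pop(axis)
    r = np.empty(shape, dtype=float)  # float can hold NaN
    r.fill(np.nan)
    r = r.astype(a.dtype)  # cast back to the input dtype, as in the diff
    if (r.ndim == 0) and (r.size == 1):
        r = np.nan
    return r


# Input taken from the traceback: shape (1, 0), dtype int64, axis=0.
a = np.empty((1, 0), dtype=np.int64)
r = empty_lastrank_result(a, axis=0)
print(r.shape, r.dtype)  # (0,) int64
```

In this case the result is itself empty, so the cast back to `int64` never touches a NaN. If the popped shape is non-empty (for example an input of shape `(0, 3)` with `axis=0`), `r` does contain NaN values and the integer cast is not well defined, so that combination would be worth checking separately.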

@qwhelan qwhelan removed their assignment Jan 14, 2023

rdbisme commented Jan 16, 2023

@nileshpatra Can you check the performance impact of your fix? You can run the benchmark with and without the fix on the previous NumPy version and see how it affects the results.

If the impact is negligible, I think we can go ahead with the PR.

@nileshpatra

Hi @rdbisme, I ran the benchmark with the current HEAD of bottleneck (master branch) and NumPy 1.23.5 (where the code works fine). The benchmark numbers are not consistent across runs (there are deltas of +/- 15%, and I do not have the time/energy to do more runs or analysis on this), but the runs with and without the patch do not differ much. Let me know what you think.

Without the patch:

Bottleneck performance benchmark
    Bottleneck 1.3.5.post0.dev3; Numpy 1.23.5
    Speed is NumPy time divided by Bottleneck time
    NaN means approx one-fifth NaNs; float64 used

              no NaN     no NaN      NaN       no NaN      NaN    
               (100,)  (1000,1000)(1000,1000)(1000,1000)(1000,1000)
               axis=0     axis=0     axis=0     axis=1     axis=1  
nansum         38.9        1.1        1.8        1.5        2.1
nanmean       112.0        1.7        2.1        2.3        2.4
nanstd        200.1        2.1        2.2        2.7        2.5
nanvar        184.7        2.2        2.1        2.7        2.4
nanmin         22.7        0.4        0.4        0.5        0.6
nanmax         23.9        0.4        0.4        0.5        0.6
median        101.1        1.2        5.8        1.0        5.7
nanmedian     106.7        4.3        4.9        4.1        4.8
ss             10.6        0.9        0.8        1.2        1.2
nanargmin      80.7        2.6        5.4        2.0        6.1
nanargmax      77.1        2.4        5.2        1.9        6.3
anynan          8.7        0.4       40.8        0.8       34.4
allnan         12.5      129.8       94.0      113.4       73.4
rankdata       49.2        2.0        2.1        2.4        2.3
nanrankdata    52.0        2.2        2.0        2.4        2.4
partition       2.4        1.2        1.6        1.0        1.5
argpartition    3.0        1.0        1.3        1.0        1.5
replace         9.8        1.5        1.6        1.5        1.6
push         1565.2       10.8       10.2       23.9       14.3
move_sum     2650.8       72.9      164.6      268.8      230.0
move_mean    6062.1      113.7      189.2      267.3      250.3
move_std     8331.0      134.8      235.1      177.6      293.9
move_var    10452.3      174.0      292.3      281.0      368.3
move_min     1131.9        7.7        7.3       13.0       16.9
move_max     1071.0        7.9        7.4       13.3       17.5
move_argmin  3073.9       42.6       80.1       47.2       89.0
move_argmax  3031.7       43.7       79.2       53.2       86.1
move_median  1646.1      133.7      133.7      144.1      147.7
move_rank     569.3        1.5        1.6        2.0        2.0

With the patch:

>>> bn.bench()
Bottleneck performance benchmark
    Bottleneck 1.3.5.post0.dev4; Numpy 1.23.5
    Speed is NumPy time divided by Bottleneck time
    NaN means approx one-fifth NaNs; float64 used

              no NaN     no NaN      NaN       no NaN      NaN    
               (100,)  (1000,1000)(1000,1000)(1000,1000)(1000,1000)
               axis=0     axis=0     axis=0     axis=1     axis=1  
nansum         42.3        1.1        1.8        1.5        2.1
nanmean       125.9        1.8        2.2        2.5        2.4
nanstd        217.9        2.2        2.1        2.8        2.4
nanvar        207.4        2.1        2.1        2.7        2.5
nanmin         22.5        0.4        0.4        0.5        0.6
nanmax         22.8        0.4        0.4        0.5        0.6
median        111.5        1.3        5.8        1.0        5.6
nanmedian     115.6        4.7        5.6        4.4        5.2
ss              9.1        0.8        0.8        1.2        1.2
nanargmin      72.5        2.6        5.5        1.9        7.0
nanargmax      82.6        2.4        5.4        1.9        6.3
anynan          9.5        0.4       40.8        0.8       35.3
allnan         13.2      139.5       90.0      110.3       72.3
rankdata       48.8        2.1        2.1        2.3        2.3
nanrankdata    55.7        2.5        2.2        2.5        2.5
partition       2.8        1.2        1.7        1.0        1.5
argpartition    3.5        1.0        1.3        1.0        1.5
replace         9.6        1.6        1.6        1.5        1.5
push         1617.2       11.0        9.5       22.4       14.5
move_sum     2810.9       73.2      166.9      269.3      228.4
move_mean    6681.1      120.2      195.3      272.4      254.2
move_std     8567.2      141.3      243.7      167.2      294.5
move_var    10914.3      204.9      293.9      287.0      384.6
move_min     1121.8        7.0        7.2       13.1       17.7
move_max     1074.7        7.3        8.2       14.0       17.6
move_argmin  3146.1       42.5       78.9       44.7       85.8
move_argmax  2974.9       42.5       83.5       44.7       87.1
move_median  1817.4      158.1      152.6      145.9      146.2
move_rank     568.9        1.5        1.4        2.1        1.9
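
To put the two tables side by side for the function the patch touches, the relative change per column can be computed directly. This is a small sketch using only the `move_rank` rows copied from the two benchmark outputs above; `relative_change_percent` is a hypothetical helper, not part of bottleneck.

```python
# move_rank speed ratios copied from the two benchmark tables above,
# in column order: (100,) axis=0, then the four (1000,1000) columns.
without_patch = [569.3, 1.5, 1.6, 2.0, 2.0]
with_patch = [568.9, 1.5, 1.4, 2.1, 1.9]


def relative_change_percent(before, after):
    """Per-column change from `before` to `after`, in percent of `before`."""
    return [100.0 * (a - b) / b for b, a in zip(before, after)]


changes = relative_change_percent(without_patch, with_patch)
for change in changes:
    print(f"{change:+.1f}%")

max_abs = max(abs(change) for change in changes)
print(f"largest absolute change: {max_abs:.1f}%")
```

The largest change for `move_rank` is about 12.5%, which falls inside the run-to-run variation of roughly +/- 15% reported with these numbers, so this single pair of runs does not show a measurable slowdown.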

@nileshpatra

@rdbisme We need to fix this in the Debian package soon, since it affects some key packages. Could you please comment on whether the above report looks OK to you, or whether you'd like something more?


rdbisme commented Jan 18, 2023

@nileshpatra Sounds good. Feel free to open a PR and we'll get it merged fast! :)

nileshpatra added a commit to nileshpatra/bottleneck that referenced this issue Jan 19, 2023
@nileshpatra

@rdbisme Done, please take a look.
