BUG: Better report integer division overflow #21507

ganesh-k13 · 2022-05-14T13:47:13Z

Changes

Handle overflow cases
Testcases for same

Related: #21506
Finishes: #19260

numpy/core/tests/test_umath.py

ganesh-k13 · 2022-05-26T15:51:18Z

This took longer than expected and got caught up with a bunch of stuff, apologies for the delay, will fix the small difference in behavior mentioned above soon.

numpy/core/tests/test_umath.py

ganesh-k13 · 2022-05-30T13:50:30Z

Thanks again @seberg for the detailed help in this one. It's ready for review. Will just wait out the CI for windows in particular for good measure.

seberg

Thanks, looks good to me. I made a lot of silly comments, the only important points are:

1.I think should check for the 0 result for fmod/remainder.
2. The test failure is real, but not an actual problem. There is a #pragma on @scalar@_richcompare (or similar), that we probably need to copy to make things pass here as well.

Otherwise this looks ready to go in to me. I have to admit, I did not double check whether we may have other SIMD paths that can trigger the issue. I assume not, and if we do the tests should weed them out.
(The only thing I am wondering about is backporting tests if we are not sure whether they might fail.)

.gitignore

seberg · 2022-05-31T16:51:08Z

numpy/core/tests/test_umath.py

+            # then, result will be a larger type than dividend and will not
+            # result in an overflow for `divmod` and `floor_divide`.
+            if np.dtype(dividend_dtype).itemsize >= np.dtype(
+                    divisor_dtype).itemsize and operation in (


Maybe just dividend_type.itemsize, just to make it a bit shorter.

dividend_dtype is type yeah?

>>> np.int8.itemsize <attribute 'itemsize' of 'numpy.generic' objects> >>> np.int8.itemsize < np.int16.itemsize Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: '<' not supported between instances of 'getset_descriptor' and 'getset_descriptor' >>> np.dtype(np.int8).itemsize 1

numpy/core/tests/test_umath.py

numpy/core/src/umath/loops_modulo.dispatch.c.src

ganesh-k13 · 2022-06-04T17:36:38Z

Oh I forgot the #pragma change :(, will do

ganesh-k13 · 2022-06-05T06:26:18Z

Seems like a rebase fixed the issue due to some other change. I think this is ready for review @seberg

seberg · 2022-06-06T14:55:01Z

numpy/core/src/umath/scalarmath.c.src

+            *out = 0;
        }
-        return 0;
+        return NPY_FPE_DIVIDEBYZERO;


Shouldn't the above be:

*out = 0; if (b == 0) { return NPY_FPE_DIVIDEBYZERO; } return 0;

Since this is a remainder? I am surprised the test wouldn't catch it though, so maybe I am misreading.

Other than this question, I think this is ready. Thanks for working on fixing this long-standing issue!

numpy/core/tests/test_umath.py

ganesh-k13 · 2022-06-09T14:42:42Z

numpy/core/src/umath/scalarmath.c.src

@@ -167,7 +174,7 @@ static NPY_INLINE int
    }
 #if @neg@
    else if (b == -1 && a == NPY_MIN_@NAME@) {
-        *out = a / b;
+        *out = -NPY_MIN_@NAME@;


#21658 (comment)

Ok I'm just confused at this point as in arrays we do set it to 0

numpy/numpy/core/src/umath/loops_arithmetic.dispatch.c.src

Lines 293 to 295 in 5447192

if (b == 0 || (a == NPY_MIN_INT@len@ && b == -1)) {

npy_set_floatstatus_divbyzero();

*dst1 = 0;

But how are the tests passing for me locally, I need to see.

Did we always do this? It seems like all of this is a bit of a mess, NPY_MIN_@NAME@ (in this particular case), does seem a bit more sensible to me, since it is the "correct" overflowing result. OTOH, I don't particularly want to change it if it was always different.

Yeah, it seems we should have taken a short break and map out the exact behavior a bit clearer earlier...

The compiler warns on

numpy/core/src/umath/scalarmath.c.src:177:16: warning: integer overflow in expression ‘-2147483648’ of type ‘int’ results in ‘-2147483648’ [-Woverflow] 177 | *out = -NPY_MIN_@NAME@; | ^

ganesh-k13 · 2022-06-09T14:43:44Z

numpy/core/tests/test_umath.py

+                        TestDivisionOverflows.operator.floordiv):
+            # FIXME: Raise the same error in case of scalars and arrays
+            with pytest.raises(
+                    RuntimeWarning if operation is


Not very happy with this difference in behavior, but thought of following up as it might end up a bigger refactor? Ok with fixing now as well, let me know what's best.

Subtleties... Which errors are we giving? int_min / -1 (or all other operations), should probably give overflow warnings (rather than divide-by-zero), and I think this is what you see here?

But I am not sure if not all operations here should be overflow warnings/errors (which we should set to raise in the decorator, since we are testing them)?

I think RuntimeWarning for underflow is the best error.

Now I think the reason for so much of different behaviors is because floor_div in SIMD is not handled in all loops. Your guess on

check whether we may have other SIMD paths that can trigger the issue. I assume not, and if we do the tests should weed them out.

is accurate. I'll fix all of them here and set the result to 0 making it uniform across.

I think we all can agree -MIN is definitely an overflow and lets change it to 0

I think we all can agree -MIN is definitely an overflow and lets change it to 0

But the correct "rollover" result of -MIN overflowing is -MIN? So I think it is the better result, especially for -MIN itself (and why would MIN/-1 be different?). MIN is much more of a warning that something is wrong and it seems also like the "well defined" behavior (if the C-standard would define 2-complement signed integer overflow as defined rather than undefined)

Ah ok that does make sense. I forgot we do have a return value to indicate any errors as well. I'll go ahead and set it to -MIN (in a way the compiler does not treat it as a warning)

Yeah, we don't have it really, I guess. But I do feel it may be useful to users to get MIN; 0 is just not a "weird" value :).

Haha yeah agree on that. Yeah let's set it to MIN and raise an Overflow.

Just to summarize how I current see things:

For the "overflow" paths, I would argue for MIN as the correct result and would be happy if we change it if there are currently paths that return 0.

For the "division by zero" paths, I don't have a clear opinion. To some degree, I think MIN might be the better "error" value, but I think we had been consistent about returning 0, so I tend to think we should just keep doing that.

+1 to the proposal, I'll make the changes following this. It would be nice If we can officially standardize it somewhere :)

ganesh-k13 · 2022-06-12T13:38:07Z

numpy/core/src/umath/scalarmath.c.src

@@ -167,7 +174,7 @@ static NPY_INLINE int
    }
 #if @neg@
    else if (b == -1 && a == NPY_MIN_@NAME@) {
-        *out = a / b;
+        *out = NPY_MIN_@NAME@;


Ok it seems like its not covered here as well which is weird, let me check. ref: #21648 (comment)

I can confirm it's hitting in my machine.

In which test? Maybe somehow it is skipped in CI?

numpy/core/tests/test_umath.py::TestDivisionOverflows::test_overflows[floordiv-int8-int8] and similar

https://github.com/numpy/numpy/pull/21507/files#diff-ddb5d5f0dc988a0ab77107bb335499fd1ae477f73343f244e3c4d039bdca5166R786 Line to be precise

rafaelcfsousa · 2022-06-14T02:38:33Z

numpy/core/src/umath/loops_modulo.dispatch.c.src

+                      // Divide by zero
+                      quo = npyv_select_@sfx@(bzero, vzero, quo);
+                      // Overflow
+                      quo = npyv_select_@sfx@(overflow, vmin, quo);
+        npyv_store_@sfx@(dst1, quo);
        npyv_store_@sfx@(dst2, npyv_and_@sfx@(cvtozero, rem));


Hi @ganesh-k13 , see below one change needed for VSX/Power as the variable/vector cvtozero was deleted in one of the commits.

One more thing: your tests are passing when I execute them on Power/VSX.

Thank you for applying the changes for Power/VSX as well :-) 👍 .

P.S.: I realized that the other cvtozero that is used in the other function that implements the unsigned integer types could also be removed to reduce 2 SIMD instructions into just one (select+and --> select).

Suggested change

// Divide by zero

quo = npyv_select_@sfx@(bzero, vzero, quo);

// Overflow

quo = npyv_select_@sfx@(overflow, vmin, quo);

npyv_store_@sfx@(dst1, quo);

npyv_store_@sfx@(dst2, npyv_and_@sfx@(cvtozero, rem));

// Divide by zero

quo = npyv_select_@sfx@(bzero, vzero, quo);

rem = npyv_select_@sfx@(bzero, vzero, rem);

// Overflow

quo = npyv_select_@sfx@(overflow, vmin, quo);

rem = npyv_select_@sfx@(overflow, vzero, rem);

npyv_store_@sfx@(dst1, quo);

npyv_store_@sfx@(dst2, rem);

Thanks a lot @rafaelcfsousa! I have applied the changes to both signed and unsinged, can you take a look once please?

@ganesh-k13 :

Thank you for applying the changes!! 👍

There are two other small changes for Power/VSX as shown below.

We also need the addition of the line

rem = npyv_select_@sfx@(bzero, vzero, rem);

after the line below as shown in the suggested change box in my previous message (above).

numpy/numpy/core/src/umath/loops_modulo.dispatch.c.src

Line 347 in 4f428f0

quo = npyv_select_@sfx@(bzero, vzero, quo);

It is required as we compute the rem of the operation divmod using the equation below:

rem = a - (b * quo)

which means that we have to zero rem for b equal to zero, otherwise rem results in a.

About the variable cvtozero, the changes worked perfectly for both signed and unsigned types. But I forgot to say that one other variable (vneg_one) would become unused after removing cvtozero, see below:

numpy/core/src/umath/loops_modulo.dispatch.c.src:184:19: warning: unused variable ‘vneg_one’ [-Wunused-variable] 184 | const npyv_@sfx@ vneg_one = npyv_setall_@sfx@(-1);

You can remove the variable vneg_one defined at numpy/core/src/umath/loops_modulo.dispatch.c.src:184 (see below).

numpy/numpy/core/src/umath/loops_modulo.dispatch.c.src

Line 184 in 4f428f0

const npyv_@sfx@ vneg_one = npyv_setall_@sfx@(-1);

That's all, thanks again for applying the changes for VSX4/Power10. :-) 👍

Ah ok, so we perform the combined action of cvtozero using the two selects. Thanks a lot Rafael! Really shooting in the dark here without the hardware.

seberg · 2022-07-06T20:36:24Z

Thanks, I took the liberty to add two pretty extensive tests (maybe we can use the pattern elsewhere too), but keep the old tests around. The reason is that the scalar path I think is covered (coverage is wrong, first time I saw that). But the other two paths should actually be covered (and should be now).

However, I don't trust the vsx contig_scalar paths. @rafaelcfsousa could you have a look into it? I suppose as long as they don't crash it would already be pretty good, but maybe it is not too hard to get the warnings right. (I suspect we need special loops for the two special scalar values 0 and -1).

rafaelcfsousa · 2022-07-06T22:05:28Z

However, I don't trust the vsx contig_scalar paths. @rafaelcfsousa could you have a look into it? I suppose as long as they don't crash it would already be pretty good, but maybe it is not too hard to get the warnings right. (I suspect we need special loops for the two special scalar values 0 and -1).

Hi @seberg, both build and tests (including the ones you just added in ac5a846) are working correctly on a Power10/VSX4 machine.

python runtests.py -v -t numpy/core/tests/test_umath.py -vvv
========= 594 passed, 2 xpassed in 4.32s ===============

About the special cases in the VSX code:

for scalar = 0, as the execution goes thru the scalar path (ie code without vectorization), we do not have checks to identify div_by_zero in the contig_scalar
for scalar = -1, we have checks to identify overflow.

seberg · 2022-07-07T00:14:46Z

Thanks for the quick check, had a small problem with the test, but I don't think that was the problem :). What I missed is that the contig+scalar path is disabled early on for the // 0 case and I thought the tests may just miss parts here.

seberg · 2022-07-07T00:36:31Z

OK, the tests coverage only claims the scalar path is not hit (which does not add up). I am planning to merge this soon unless anyone has comments on my test additions.

mattip · 2022-07-07T09:28:03Z

Does the change affect benchmarks or is the DIVIDEBYZERO_OVERFLOW_CHECK not as expensive as it looks?

seberg · 2022-07-07T13:46:50Z

Hmmm, good point, it is a necessary fix for remainder (and maybe divmod) to fix a bug. The scalar paths, I would like the overflow warning, but if adding these 1-2 integer ops has noticeable slowdown, we could try removing the check (and warning) from the the other array paths. We don't typically give overflow warnings anyway.

ganesh-k13 · 2022-07-09T11:15:09Z

Thanks a lot for the tests @seberg and @rafaelcfsousa! Let me check the benchmarks to see if we have any slowdown.

ganesh-k13 · 2022-07-09T11:47:54Z

Not sure if this covers what we need?

bench_ufunc.UFunc

(np-test) ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
~/os/numpy (bug_div_overflow) » python3 runtests.py --bench-compare main bench_ufunc.UFunc                                                                                                                130 ↵ ganesh@ganesh-MS-7B86
· Creating environments
· Discovering benchmarks
· Running 12 total benchmarks (2 commits * 1 environments * 6 benchmarks)
[  0.00%] · For numpy commit b89a1d21 <main> (round 1/2):
[  0.00%] ·· Building for virtualenv-py3.10-Cython-setuptools59.2.0.
[  0.00%] ·· Benchmarking virtualenv-py3.10-Cython-setuptools59.2.0
[  0.00%] ··· Importing benchmark suite produced output:
[  0.00%] ···· NumPy CPU features: SSE SSE2 SSE3 SSSE3* SSE41* POPCNT* SSE42* AVX* F16C* FMA3* AVX2* AVX512F? AVX512CD? AVX512_KNL? AVX512_KNM? AVX512_SKX? AVX512_CLX? AVX512_CNL? AVX512_ICL?
[  4.17%] ··· Running (bench_ufunc.UFunc.time_ufunc_types--)......
[ 25.00%] · For numpy commit f918491c <bug_div_overflow> (round 1/2):
[ 25.00%] ·· Building for virtualenv-py3.10-Cython-setuptools59.2.0.
[ 25.00%] ·· Benchmarking virtualenv-py3.10-Cython-setuptools59.2.0
[ 29.17%] ··· Running (bench_ufunc.UFunc.time_ufunc_types--)......
[ 50.00%] · For numpy commit f918491c <bug_div_overflow> (round 2/2):
[ 50.00%] ·· Benchmarking virtualenv-py3.10-Cython-setuptools59.2.0
[ 54.17%] ··· bench_ufunc.UFunc.time_ufunc_types                                                                                                                                                                                    ok
[ 54.17%] ··· =============== =============
                   ufunc
              --------------- -------------
                    abs          588±2μs
                  absolute       588±2μs
                    add          254±2μs
                   arccos      5.79±0.01ms
                  arccosh      5.04±0.01ms
                   arcsin      5.86±0.02ms
                  arcsinh      4.80±0.01ms
                   arctan      2.64±0.01ms
                  arctan2      1.43±0.01ms
                  arctanh      3.28±0.01ms
                bitwise_and     16.5±0.5μs
                bitwise_not    10.9±0.05μs
                 bitwise_or     16.9±0.6μs
                bitwise_xor     17.4±0.1μs
                    cbrt         1.31±0ms
                    ceil        133±0.2μs
                    conj        132±0.6μs
                 conjugate       134±2μs
                  copysign      146±0.3μs
                    cos        7.03±0.01ms
                    cosh       5.94±0.01ms
                  deg2rad        184±9μs
                  degrees        196±10μs
                   divide        442±2μs
                   divmod        733±10μs
                   equal         192±2μs
                    exp          4.35±0ms
                    exp2       4.41±0.02ms
                   expm1       7.27±0.03ms
                    fabs         188±6μs
                float_power    9.93±0.01ms
                   floor        136±0.4μs
                floor_divide     615±1μs
                    fmax         287±1μs
                    fmin         287±3μs
                    fmod         423±2μs
                   frexp        273±0.4μs
                    gcd         113±0.5μs
                  greater       219±0.6μs
               greater_equal     224±5μs
                 heaviside       273±5μs
                   hypot         610±2μs
                   invert       11.7±0.1μs
                  isfinite      116±0.2μs
                   isinf        106±0.8μs
                   isnan        96.7±0.4μs
                   isnat         262±2ns
                    lcm         212±0.09μs
                   ldexp        153±0.6μs
                 left_shift     77.5±0.1μs
                    less         219±2μs
                 less_equal      223±2μs
                    log          2.39±0ms
                   log10       2.71±0.01ms
                   log1p       2.77±0.02ms
                    log2       2.54±0.01ms
                 logaddexp       247±7μs
                 logaddexp2      259±5μs
                logical_and     221±0.5μs
                logical_not     129±0.4μs
                 logical_or     222±0.5μs
                logical_xor     241±0.8μs
                   matmul       27.3±0.1ms
                  maximum       275±0.5μs
                  minimum        275±2μs
                    mod          513±1μs
                    modf         351±5μs
                  multiply      252±0.5μs
                  negative       152±3μs
                 nextafter      309±0.2μs
                 not_equal      203±0.5μs
                  positive       164±2μs
                   power       10.1±0.01ms
                  rad2deg        194±8μs
                  radians        199±1μs
                 reciprocal      344±2μs
                 remainder       513±1μs
                right_shift     78.6±0.8μs
                    rint         235±2μs
                    sign         182±3μs
                  signbit       106±0.3μs
                    sin        6.89±0.01ms
                    sinh         6.84±0ms
                  spacing       307±0.9μs
                    sqrt       1.28±0.01ms
                   square        184±1μs
                  subtract       261±3μs
                    tan        7.49±0.01ms
                    tanh       5.63±0.03ms
                true_divide      447±2μs
                   trunc         146±8μs
              =============== =============

[ 58.33%] ··· bench_ufunc.UFuncSmall.time_ufunc_numpy_scalar                                                                                                                                                                        ok
[ 58.33%] ··· ======= ==========
               ufunc
              ------- ----------
                abs    817±10ns
                sqrt   818±10ns
                cos    817±9ns
              ======= ==========

[ 62.50%] ··· bench_ufunc.UFuncSmall.time_ufunc_python_float                                                                                                                                                                        ok
[ 62.50%] ··· ======= =========
               ufunc
              ------- ---------
                abs    795±7ns
                sqrt   795±8ns
                cos    796±8ns
              ======= =========

[ 66.67%] ··· bench_ufunc.UFuncSmall.time_ufunc_small_array                                                                                                                                                                         ok
[ 66.67%] ··· ======= ==========
               ufunc
              ------- ----------
                abs    533±3ns
                sqrt   523±2ns
                cos    562±10ns
              ======= ==========

[ 70.83%] ··· bench_ufunc.UFuncSmall.time_ufunc_small_array_inplace                                                                                                                                                                 ok
[ 70.83%] ··· ======= =========
               ufunc
              ------- ---------
                abs    462±9ns
                sqrt   452±7ns
                cos    473±3ns
              ======= =========

[ 75.00%] ··· bench_ufunc.UFuncSmall.time_ufunc_small_int_array                                                                                                                                                                     ok
[ 75.00%] ··· ======= =========
               ufunc
              ------- ---------
                abs    506±8ns
                sqrt   960±8ns
                cos    967±9ns
              ======= =========

[ 75.00%] · For numpy commit b89a1d21 <main> (round 2/2):
[ 75.00%] ·· Building for virtualenv-py3.10-Cython-setuptools59.2.0.
[ 75.00%] ·· Benchmarking virtualenv-py3.10-Cython-setuptools59.2.0
[ 79.17%] ··· bench_ufunc.UFunc.time_ufunc_types                                                                                                                                                                                    ok
[ 79.17%] ··· =============== =============
                   ufunc
              --------------- -------------
                    abs          580±1μs
                  absolute       583±2μs
                    add          255±1μs
                   arccos      5.69±0.02ms
                  arccosh      4.99±0.04ms
                   arcsin      5.76±0.01ms
                  arcsinh      4.83±0.02ms
                   arctan      2.60±0.01ms
                  arctan2        1.41±0ms
                  arctanh      3.28±0.01ms
                bitwise_and    15.9±0.03μs
                bitwise_not    10.8±0.05μs
                 bitwise_or     16.2±0.2μs
                bitwise_xor     17.4±0.5μs
                    cbrt       1.31±0.01ms
                    ceil        136±0.8μs
                    conj        136±0.9μs
                 conjugate      135±0.8μs
                  copysign       149±1μs
                    cos        7.86±0.01ms
                    cosh       6.10±0.01ms
                  deg2rad       182±0.3μs
                  degrees        183±1μs
                   divide        444±2μs
                   divmod        725±4μs
                   equal        192±0.1μs
                    exp        4.33±0.01ms
                    exp2       4.40±0.03ms
                   expm1       7.18±0.05ms
                    fabs        175±0.4μs
                float_power    9.72±0.01ms
                   floor        136±0.2μs
                floor_divide    609±0.4μs
                    fmax         286±5μs
                    fmin         285±1μs
                    fmod        424±0.9μs
                   frexp        274±0.2μs
                    gcd         112±0.1μs
                  greater       221±0.5μs
               greater_equal     220±1μs
                 heaviside       280±2μs
                   hypot         612±1μs
                   invert       10.8±0.3μs
                  isfinite       109±1μs
                   isinf         108±2μs
                   isnan        98.7±0.2μs
                   isnat         270±2ns
                    lcm          212±1μs
                   ldexp         151±1μs
                 left_shift    76.8±0.06μs
                    less         218±1μs
                 less_equal      222±3μs
                    log        2.38±0.01ms
                   log10       2.72±0.02ms
                   log1p       2.76±0.02ms
                    log2         2.51±0ms
                 logaddexp      252±0.5μs
                 logaddexp2     253±0.7μs
                logical_and     219±0.9μs
                logical_not     127±0.6μs
                 logical_or     216±0.2μs
                logical_xor      237±1μs
                   matmul      27.9±0.06ms
                  maximum       270±0.3μs
                  minimum        273±2μs
                    mod          516±2μs
                    modf         348±1μs
                  multiply       256±3μs
                  negative       152±1μs
                 nextafter      322±0.9μs
                 not_equal       202±1μs
                  positive       174±3μs
                   power         10.0±0ms
                  rad2deg       183±0.9μs
                  radians       183±0.9μs
                 reciprocal      346±3μs
                 remainder       515±2μs
                right_shift     76.3±0.9μs
                    rint        233±0.7μs
                    sign         176±1μs
                  signbit       109±0.4μs
                    sin          7.02±0ms
                    sinh         6.34±0ms
                  spacing        305±1μs
                    sqrt       1.25±0.01ms
                   square       179±0.8μs
                  subtract       255±2μs
                    tan          7.23±0ms
                    tanh       5.67±0.04ms
                true_divide      443±2μs
                   trunc         144±8μs
              =============== =============

[ 83.33%] ··· bench_ufunc.UFuncSmall.time_ufunc_numpy_scalar                                                                                                                                                                        ok
[ 83.33%] ··· ======= =========
               ufunc
              ------- ---------
                abs    812±9ns
                sqrt   809±8ns
                cos    804±7ns
              ======= =========

[ 87.50%] ··· bench_ufunc.UFuncSmall.time_ufunc_python_float                                                                                                                                                                        ok
[ 87.50%] ··· ======= =========
               ufunc
              ------- ---------
                abs    791±6ns
                sqrt   771±6ns
                cos    779±3ns
              ======= =========

[ 91.67%] ··· bench_ufunc.UFuncSmall.time_ufunc_small_array                                                                                                                                                                         ok
[ 91.67%] ··· ======= =========
               ufunc
              ------- ---------
                abs    525±4ns
                sqrt   516±4ns
                cos    556±3ns
              ======= =========

[ 95.83%] ··· bench_ufunc.UFuncSmall.time_ufunc_small_array_inplace                                                                                                                                                                 ok
[ 95.83%] ··· ======= =========
               ufunc
              ------- ---------
                abs    469±3ns
                sqrt   451±3ns
                cos    467±4ns
              ======= =========

[100.00%] ··· bench_ufunc.UFuncSmall.time_ufunc_small_int_array                                                                                                                                                                     ok
[100.00%] ··· ======= ==========
               ufunc
              ------- ----------
                abs    503±4ns
                sqrt   950±4ns
                cos    964±20ns
              ======= ==========

       before           after         ratio
     [b89a1d21]       [f918491c]
     <main>           <bug_div_overflow>
+       183±0.9μs          199±1μs     1.09  bench_ufunc.UFunc.time_ufunc_types('radians')
+        6.34±0ms         6.84±0ms     1.08  bench_ufunc.UFunc.time_ufunc_types('sinh')
+       175±0.4μs          188±6μs     1.07  bench_ufunc.UFunc.time_ufunc_types('fabs')
+         109±1μs        116±0.2μs     1.06  bench_ufunc.UFunc.time_ufunc_types('isfinite')
-         174±3μs          164±2μs     0.94  bench_ufunc.UFunc.time_ufunc_types('positive')
-     7.86±0.01ms      7.03±0.01ms     0.89  bench_ufunc.UFunc.time_ufunc_types('cos')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

mattip · 2022-07-09T18:40:28Z

It seems the benchmark changes to the division ufuncs are not significant.

seberg · 2022-07-09T19:31:48Z

Having a quick look, I am not sure the benchmark actually check the integer functions well (focusing on float64).

ganesh-k13 · 2022-07-10T12:46:04Z

I thought of adding an integer path. But man is it complicated to understand the setup :).

I wrote a custom piece: [EDIT] Added dtype=dtype

 class UFuncInt(Benchmark):
     params = [np.sctypes['int']]
     param_names = ['dtype']
     timeout = 10
 
     def setup(self, dtype):
         np.seterr(all='ignore')
         low, high = np.iinfo(dtype).min, np.iinfo(dtype).max
         self.arrays = [np.random.randint(low, high, size=i, dtype=dtype) for i in range(1, 10000)]
         self.divisor = np.array([np.random.randint(low, high)], dtype=dtype)
 
     def time_ufunc_int_divide(self, dtype):
         for a in self.arrays:
             np.floor_divide(a, self.divisor)
             np.divmod(a, self.divisor)
             np.fmod(a, self.divisor)

and got:

~/os/numpy (bug_div_overflow*) » python3 runtests.py --bench-compare main bench_ufunc.UFuncInt                                                                              ganesh@ganesh-MS-7B86
********************************************************************************
WARNING: you have uncommitted changes --- these will NOT be benchmarked!
********************************************************************************
· Creating environments
· Discovering benchmarks
·· Uninstalling from virtualenv-py3.10-Cython-setuptools59.2.0
·· Installing f918491c <bug_div_overflow> into virtualenv-py3.10-Cython-setuptools59.2.0.
· Running 2 total benchmarks (2 commits * 1 environments * 1 benchmarks)
[  0.00%] · For numpy commit b89a1d21 <main> (round 1/2):
[  0.00%] ·· Building for virtualenv-py3.10-Cython-setuptools59.2.0.
[  0.00%] ·· Benchmarking virtualenv-py3.10-Cython-setuptools59.2.0
[  0.00%] ··· Importing benchmark suite produced output:
[  0.00%] ···· NumPy CPU features: SSE SSE2 SSE3 SSSE3* SSE41* POPCNT* SSE42* AVX* F16C* FMA3* AVX2* AVX512F? AVX512CD? AVX512_KNL? AVX512_KNM? AVX512_SKX? AVX512_CLX? AVX512_CNL? AVX512_ICL?
[ 25.00%] ··· Running (bench_ufunc.UFuncInt.time_ufunc_int_divide--).
[ 25.00%] · For numpy commit f918491c <bug_div_overflow> (round 1/2):
[ 25.00%] ·· Building for virtualenv-py3.10-Cython-setuptools59.2.0.
[ 25.00%] ·· Benchmarking virtualenv-py3.10-Cython-setuptools59.2.0
[ 50.00%] ··· Running (bench_ufunc.UFuncInt.time_ufunc_int_divide--).
[ 50.00%] · For numpy commit f918491c <bug_div_overflow> (round 2/2):
[ 50.00%] ·· Benchmarking virtualenv-py3.10-Cython-setuptools59.2.0
[ 75.00%] ··· bench_ufunc.UFuncInt.time_ufunc_int_divide                                                                                                                                        ok
[ 75.00%] ··· ============= =========
                  dtype              
              ------------- ---------
                numpy.int8   509±7ms 
               numpy.int16   511±5ms 
               numpy.int32   496±3ms 
               numpy.int64   481±5ms 
              ============= =========

[ 75.00%] · For numpy commit b89a1d21 <main> (round 2/2):
[ 75.00%] ·· Building for virtualenv-py3.10-Cython-setuptools59.2.0.
[ 75.00%] ·· Benchmarking virtualenv-py3.10-Cython-setuptools59.2.0
[100.00%] ··· bench_ufunc.UFuncInt.time_ufunc_int_divide                                                                                                                                        ok
[100.00%] ··· ============= =========
                  dtype              
              ------------- ---------
                numpy.int8   495±3ms 
               numpy.int16   498±4ms 
               numpy.int32   493±4ms 
               numpy.int64   481±3ms 
              ============= =========


BENCHMARKS NOT SIGNIFICANTLY CHANGED.

But this seems definitely wrong though yeah? There has to be some sort of impact? I changed low to 1 in order to avoid potential 0's that may short the operations, but no difference.

seberg · 2022-07-10T15:07:52Z

Probably best to split up the test by function and then limit it to a single run with a large array (10000 elements should be fine). Small arrays are not too interesting (from a dtype perspective), since ufunc overhead will dominate those.

Do we have special paths for contiguous arrays and contig+scalar at all (except for VSX4?). If we do not have those, maybe the code is just not optimized enough that a difference can show up.
But the difference could just be very small. Division is a very slow instruction and the added check is a single instruction in most cases.

ganesh-k13 · 2022-07-10T16:24:50Z

Probably best to split up the test by function and then limit it to a single run with a large array (10000 elements should be fine).

I tried this:

 class UFuncInt(Benchmark):
     params = [np.sctypes['int']]
     param_names = ['dtype']
     timeout = 10
 
     def setup(self, dtype):
         np.seterr(all='ignore')
         low, high = np.iinfo(dtype).min, np.iinfo(dtype).max
         self.array = np.random.randint(1, high, size=10000, dtype=dtype)
         self.divisor = np.array([np.random.randint(1, high)], dtype=dtype)
 
     def time_ufunc_int_divide(self, dtype):
         np.floor_divide(self.array, self.divisor)
 
     def time_ufunc_int_divmod(self, dtype):
         np.divmod(self.array, self.divisor)
 
     def time_ufunc_int_fmod(self, dtype):
         np.fmod(self.array, self.divisor)

And got the same BENCHMARKS NOT SIGNIFICANTLY CHANGED.

But the difference could just be very small. Division is a very slow instruction and the added check is a single instruction in most cases.

Yeah, I think this is a very good explanation. During our SIMD migration from libdivide(#18766), we implemented this paper. Now this gets us to about at least 20 instruction miminimum, barring other overheads. Now adding this if should not tilt the ratio a lot I guess.

mattip · 2022-08-10T13:11:52Z

Does this PR need more review?

ganesh-k13 · 2022-08-24T16:48:57Z

Hey @seberg / @mattip, do let me know if any additional things to take care here, post-merge, I'll work on documenting all of this.

seberg · 2022-09-08T08:31:02Z

Thanks Ganesh, sorry for keeping this hanging so long. I just browesed through once more, but there was not anything to be done beyond checking that we don't have a big performance degradation.

We should try to backport this and the related PR, and we discussed yesterday, that we should not worry about the tiny changes in behavior since this is extremely niche (niche enough that the bug was not noticed in practice).

I will try to create backport PR with both very soon (maybe its all just applying this anyway). We should ping remember to ping @rafaelcfsousa on the backport as well.

ganesh-k13 · 2022-09-08T12:52:53Z

Thanks a lot @seberg! I'll raise a backport PR right away, no worries.

seberg · 2022-09-08T12:54:36Z

Thanks, if you have time for it that is great. But do not feel compelled!

github-actions bot added the 00 - Bug label May 14, 2022

ganesh-k13 force-pushed the bug_div_overflow branch from 956f56b to 254dbd0 Compare May 14, 2022 13:50

ganesh-k13 commented May 14, 2022

View reviewed changes

numpy/core/tests/test_umath.py Outdated Show resolved Hide resolved

ganesh-k13 force-pushed the bug_div_overflow branch from 254dbd0 to bacf182 Compare May 26, 2022 15:48

ganesh-k13 commented May 30, 2022

View reviewed changes

numpy/core/tests/test_umath.py Outdated Show resolved Hide resolved

seberg reviewed May 30, 2022

View reviewed changes

numpy/core/tests/test_umath.py Outdated Show resolved Hide resolved

ganesh-k13 force-pushed the bug_div_overflow branch from bacf182 to e57c99b Compare May 30, 2022 13:48

seberg reviewed May 31, 2022

View reviewed changes

ganesh-k13 force-pushed the bug_div_overflow branch 2 times, most recently from c662b9b to 197d7c0 Compare June 4, 2022 17:19

ganesh-k13 requested a review from seberg June 4, 2022 17:21

ganesh-k13 mentioned this pull request Jun 5, 2022

ENH: issue overflow warning when using abs on np.int8(-128) #21648

Merged

seberg reviewed Jun 6, 2022

View reviewed changes

numpy/core/tests/test_umath.py Outdated Show resolved Hide resolved

seberg added this to the 1.23.1 release milestone Jun 9, 2022

ganesh-k13 mentioned this pull request Jun 9, 2022

MAINT: improved overflow check to avoid undefined behavior #21658

Merged

ganesh-k13 commented Jun 9, 2022

View reviewed changes

ganesh-k13 force-pushed the bug_div_overflow branch 2 times, most recently from 163f7a5 to b9c27d5 Compare June 9, 2022 14:48

ganesh-k13 mentioned this pull request Jun 10, 2022

MAINT, SIMD: Handle overflow gracefully in integer division #21727

Merged

ganesh-k13 force-pushed the bug_div_overflow branch 3 times, most recently from abbc022 to 85f1b09 Compare June 12, 2022 05:28

ganesh-k13 commented Jun 12, 2022

View reviewed changes

rafaelcfsousa reviewed Jun 14, 2022

View reviewed changes

TST: Add unsigned integers to integer divide-by-zero test

f918491

charris modified the milestones: 1.23.1 release, 1.23.2 Jul 8, 2022

charris added the 09 - Backport-Candidate PRs tagged should be backported label Aug 13, 2022

charris modified the milestones: 1.23.2 release, 1.23.3 release Aug 13, 2022

charris added the triage review Issue/PR to be discussed at the next triage meeting label Sep 7, 2022

seberg changed the title ~~BUG: Division overflow~~ BUG: Better report integer division overflow Sep 8, 2022

seberg merged commit e9295bf into numpy:main Sep 8, 2022

seberg mentioned this pull request Sep 8, 2022

BUG: Integer overflow in remainder for Windows #19260

Closed

2 tasks

ganesh-k13 mentioned this pull request Sep 8, 2022

BUG: Better report integer division overflow (backport) #22230

Merged

charris removed the 09 - Backport-Candidate PRs tagged should be backported label Sep 8, 2022

	if (b == 0 \|\| (a == NPY_MIN_INT@len@ && b == -1)) {
	npy_set_floatstatus_divbyzero();
	*dst1 = 0;

BUG: Better report integer division overflow #21507

BUG: Better report integer division overflow #21507

Conversation

ganesh-k13 commented May 14, 2022 • edited

Changes

ganesh-k13 commented May 26, 2022

ganesh-k13 commented May 30, 2022

seberg left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ganesh-k13 commented Jun 4, 2022

ganesh-k13 commented Jun 5, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ganesh-k13 Jun 13, 2022 • edited

Choose a reason for hiding this comment

rafaelcfsousa Jun 14, 2022 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rafaelcfsousa Jun 18, 2022 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seberg commented Jul 6, 2022

rafaelcfsousa commented Jul 6, 2022

seberg commented Jul 7, 2022

seberg commented Jul 7, 2022

mattip commented Jul 7, 2022

seberg commented Jul 7, 2022

ganesh-k13 commented Jul 9, 2022

ganesh-k13 commented Jul 9, 2022 • edited

mattip commented Jul 9, 2022

seberg commented Jul 9, 2022

ganesh-k13 commented Jul 10, 2022 • edited

seberg commented Jul 10, 2022

ganesh-k13 commented Jul 10, 2022 • edited

mattip commented Aug 10, 2022

ganesh-k13 commented Aug 24, 2022

seberg commented Sep 8, 2022

ganesh-k13 commented Sep 8, 2022

seberg commented Sep 8, 2022

ganesh-k13 commented May 14, 2022 •

edited

ganesh-k13 Jun 13, 2022 •

edited

rafaelcfsousa Jun 14, 2022 •

edited

rafaelcfsousa Jun 18, 2022 •

edited

ganesh-k13 commented Jul 9, 2022 •

edited

ganesh-k13 commented Jul 10, 2022 •

edited

ganesh-k13 commented Jul 10, 2022 •

edited