Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Better report integer division overflow #21507

Merged
merged 8 commits into from Sep 8, 2022

Conversation

ganesh-k13
Copy link
Member

@ganesh-k13 ganesh-k13 commented May 14, 2022

Changes

  • Handle overflow cases
  • Testcases for same

Related: #21506
Finishes: #19260

@ganesh-k13
Copy link
Member Author

This took longer than expected and got caught up with a bunch of stuff, apologies for the delay, will fix the small difference in behavior mentioned above soon.

@ganesh-k13
Copy link
Member Author

Thanks again @seberg for the detailed help in this one. It's ready for review. Will just wait out the CI for windows in particular for good measure.

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good to me. I made a lot of silly comments, the only important points are:

1.I think should check for the 0 result for fmod/remainder.
2. The test failure is real, but not an actual problem. There is a #pragma on @scalar@_richcompare (or similar), that we probably need to copy to make things pass here as well.

Otherwise this looks ready to go in to me. I have to admit, I did not double check whether we may have other SIMD paths that can trigger the issue. I assume not, and if we do the tests should weed them out.
(The only thing I am wondering about is backporting tests if we are not sure whether they might fail.)

.gitignore Outdated Show resolved Hide resolved
# then, result will be a larger type than dividend and will not
# result in an overflow for `divmod` and `floor_divide`.
if np.dtype(dividend_dtype).itemsize >= np.dtype(
divisor_dtype).itemsize and operation in (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just dividend_type.itemsize, just to make it a bit shorter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dividend_dtype is type yeah?

>>> np.int8.itemsize
<attribute 'itemsize' of 'numpy.generic' objects>
>>> np.int8.itemsize < np.int16.itemsize
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'getset_descriptor' and 'getset_descriptor'
>>> np.dtype(np.int8).itemsize
1

numpy/core/tests/test_umath.py Show resolved Hide resolved
numpy/core/tests/test_umath.py Show resolved Hide resolved
numpy/core/tests/test_umath.py Outdated Show resolved Hide resolved
numpy/core/tests/test_umath.py Outdated Show resolved Hide resolved
numpy/core/src/umath/loops_modulo.dispatch.c.src Outdated Show resolved Hide resolved
@ganesh-k13 ganesh-k13 force-pushed the bug_div_overflow branch 2 times, most recently from c662b9b to 197d7c0 Compare June 4, 2022 17:19
@ganesh-k13 ganesh-k13 requested a review from seberg June 4, 2022 17:21
@ganesh-k13
Copy link
Member Author

Oh I forgot the #pragma change :(, will do

@ganesh-k13
Copy link
Member Author

Seems like a rebase fixed the issue due to some other change. I think this is ready for review @seberg

Comment on lines 204 to 206
*out = 0;
}
return 0;
return NPY_FPE_DIVIDEBYZERO;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the above be:

*out = 0;
if (b == 0) {
    return NPY_FPE_DIVIDEBYZERO;
}
return 0;

Since this is a remainder? I am surprised the test wouldn't catch it though, so maybe I am misreading.

Other than this question, I think this is ready. Thanks for working on fixing this long-standing issue!

@@ -167,7 +174,7 @@ static NPY_INLINE int
}
#if @neg@
else if (b == -1 && a == NPY_MIN_@NAME@) {
*out = a / b;
*out = -NPY_MIN_@NAME@;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I'm just confused at this point as in arrays we do set it to 0

if (b == 0 || (a == NPY_MIN_INT@len@ && b == -1)) {
npy_set_floatstatus_divbyzero();
*dst1 = 0;

But how are the tests passing for me locally, I need to see.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we always do this? It seems like all of this is a bit of a mess, NPY_MIN_@NAME@ (in this particular case), does seem a bit more sensible to me, since it is the "correct" overflowing result. OTOH, I don't particularly want to change it if it was always different.

Yeah, it seems we should have taken a short break and map out the exact behavior a bit clearer earlier...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler warns on

numpy/core/src/umath/scalarmath.c.src:177:16: warning: integer overflow in expression ‘-2147483648’ of type ‘int’ results in ‘-2147483648’ [-Woverflow]
  177 |         *out = -NPY_MIN_@NAME@;
      |                ^

TestDivisionOverflows.operator.floordiv):
# FIXME: Raise the same error in case of scalars and arrays
with pytest.raises(
RuntimeWarning if operation is
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not very happy with this difference in behavior, but thought of following up as it might end up a bigger refactor? Ok with fixing now as well, let me know what's best.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subtleties... Which errors are we giving? int_min / -1 (or all other operations), should probably give overflow warnings (rather than divide-by-zero), and I think this is what you see here?

But I am not sure if not all operations here should be overflow warnings/errors (which we should set to raise in the decorator, since we are testing them)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think RuntimeWarning for underflow is the best error.

Now I think the reason for so much of different behaviors is because floor_div in SIMD is not handled in all loops. Your guess on

check whether we may have other SIMD paths that can trigger the issue. I assume not, and if we do the tests should weed them out.

is accurate. I'll fix all of them here and set the result to 0 making it uniform across.

I think we all can agree -MIN is definitely an overflow and lets change it to 0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we all can agree -MIN is definitely an overflow and lets change it to 0

But the correct "rollover" result of -MIN overflowing is -MIN? So I think it is the better result, especially for -MIN itself (and why would MIN/-1 be different?). MIN is much more of a warning that something is wrong and it seems also like the "well defined" behavior (if the C-standard would define 2-complement signed integer overflow as defined rather than undefined)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok that does make sense. I forgot we do have a return value to indicate any errors as well. I'll go ahead and set it to -MIN (in a way the compiler does not treat it as a warning)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we don't have it really, I guess. But I do feel it may be useful to users to get MIN; 0 is just not a "weird" value :).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha yeah agree on that. Yeah let's set it to MIN and raise an Overflow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to summarize how I current see things:

  • For the "overflow" paths, I would argue for MIN as the correct result and would be happy if we change it if there are currently paths that return 0.
  • For the "division by zero" paths, I don't have a clear opinion. To some degree, I think MIN might be the better "error" value, but I think we had been consistent about returning 0, so I tend to think we should just keep doing that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to the proposal, I'll make the changes following this. It would be nice If we can officially standardize it somewhere :)

@@ -167,7 +174,7 @@ static NPY_INLINE int
}
#if @neg@
else if (b == -1 && a == NPY_MIN_@NAME@) {
*out = a / b;
*out = NPY_MIN_@NAME@;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok it seems like its not covered here as well which is weird, let me check. ref: #21648 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm it's hitting in my machine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which test? Maybe somehow it is skipped in CI?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numpy/core/tests/test_umath.py::TestDivisionOverflows::test_overflows[floordiv-int8-int8] and similar

https://github.com/numpy/numpy/pull/21507/files#diff-ddb5d5f0dc988a0ab77107bb335499fd1ae477f73343f244e3c4d039bdca5166R786 Line to be precise

Comment on lines 346 to 351
// Divide by zero
quo = npyv_select_@sfx@(bzero, vzero, quo);
// Overflow
quo = npyv_select_@sfx@(overflow, vmin, quo);
npyv_store_@sfx@(dst1, quo);
npyv_store_@sfx@(dst2, npyv_and_@sfx@(cvtozero, rem));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ganesh-k13 , see below one change needed for VSX/Power as the variable/vector cvtozero was deleted in one of the commits.

One more thing: your tests are passing when I execute them on Power/VSX.

Thank you for applying the changes for Power/VSX as well :-) 👍 .

P.S.: I realized that the other cvtozero that is used in the other function that implements the unsigned integer types could also be removed to reduce 2 SIMD instructions into just one (select+and --> select).

Suggested change
// Divide by zero
quo = npyv_select_@sfx@(bzero, vzero, quo);
// Overflow
quo = npyv_select_@sfx@(overflow, vmin, quo);
npyv_store_@sfx@(dst1, quo);
npyv_store_@sfx@(dst2, npyv_and_@sfx@(cvtozero, rem));
// Divide by zero
quo = npyv_select_@sfx@(bzero, vzero, quo);
rem = npyv_select_@sfx@(bzero, vzero, rem);
// Overflow
quo = npyv_select_@sfx@(overflow, vmin, quo);
rem = npyv_select_@sfx@(overflow, vzero, rem);
npyv_store_@sfx@(dst1, quo);
npyv_store_@sfx@(dst2, rem);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @rafaelcfsousa! I have applied the changes to both signed and unsinged, can you take a look once please?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ganesh-k13 :

Thank you for applying the changes!! 👍

There are two other small changes for Power/VSX as shown below.


  1. We also need the addition of the line
rem = npyv_select_@sfx@(bzero, vzero, rem);

after the line below as shown in the suggested change box in my previous message (above).

quo = npyv_select_@sfx@(bzero, vzero, quo);

It is required as we compute the rem of the operation divmod using the equation below:

rem = a - (b * quo)

which means that we have to zero rem for b equal to zero, otherwise rem results in a.


  1. About the variable cvtozero, the changes worked perfectly for both signed and unsigned types. But I forgot to say that one other variable (vneg_one) would become unused after removing cvtozero, see below:
numpy/core/src/umath/loops_modulo.dispatch.c.src:184:19: warning: unused variable ‘vneg_one’ [-Wunused-variable]
  184 |     const npyv_@sfx@ vneg_one = npyv_setall_@sfx@(-1);

You can remove the variable vneg_one defined at numpy/core/src/umath/loops_modulo.dispatch.c.src:184 (see below).

const npyv_@sfx@ vneg_one = npyv_setall_@sfx@(-1);


That's all, thanks again for applying the changes for VSX4/Power10. :-) 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, so we perform the combined action of cvtozero using the two selects. Thanks a lot Rafael! Really shooting in the dark here without the hardware.

@seberg
Copy link
Member

seberg commented Jul 6, 2022

Thanks, I took the liberty to add two pretty extensive tests (maybe we can use the pattern elsewhere too), but keep the old tests around. The reason is that the scalar path I think is covered (coverage is wrong, first time I saw that). But the other two paths should actually be covered (and should be now).

However, I don't trust the vsx contig_scalar paths. @rafaelcfsousa could you have a look into it? I suppose as long as they don't crash it would already be pretty good, but maybe it is not too hard to get the warnings right. (I suspect we need special loops for the two special scalar values 0 and -1).

@rafaelcfsousa
Copy link
Contributor

However, I don't trust the vsx contig_scalar paths. @rafaelcfsousa could you have a look into it? I suppose as long as they don't crash it would already be pretty good, but maybe it is not too hard to get the warnings right. (I suspect we need special loops for the two special scalar values 0 and -1).

Hi @seberg, both build and tests (including the ones you just added in ac5a846) are working correctly on a Power10/VSX4 machine.

python runtests.py -v -t numpy/core/tests/test_umath.py -vvv
========= 594 passed, 2 xpassed in 4.32s ===============

About the special cases in the VSX code:

  • for scalar = 0, as the execution goes thru the scalar path (ie code without vectorization), we do not have checks to identify div_by_zero in the contig_scalar
  • for scalar = -1, we have checks to identify overflow.

@seberg
Copy link
Member

seberg commented Jul 7, 2022

Thanks for the quick check, had a small problem with the test, but I don't think that was the problem :). What I missed is that the contig+scalar path is disabled early on for the // 0 case and I thought the tests may just miss parts here.

@seberg
Copy link
Member

seberg commented Jul 7, 2022

OK, the tests coverage only claims the scalar path is not hit (which does not add up). I am planning to merge this soon unless anyone has comments on my test additions.

@mattip
Copy link
Member

mattip commented Jul 7, 2022

Does the change affect benchmarks or is the DIVIDEBYZERO_OVERFLOW_CHECK not as expensive as it looks?

@seberg
Copy link
Member

seberg commented Jul 7, 2022

Hmmm, good point, it is a necessary fix for remainder (and maybe divmod) to fix a bug. The scalar paths, I would like the overflow warning, but if adding these 1-2 integer ops has noticeable slowdown, we could try removing the check (and warning) from the the other array paths. We don't typically give overflow warnings anyway.

@charris charris modified the milestones: 1.23.1 release, 1.23.2 Jul 8, 2022
@ganesh-k13
Copy link
Member Author

Thanks a lot for the tests @seberg and @rafaelcfsousa! Let me check the benchmarks to see if we have any slowdown.

@ganesh-k13
Copy link
Member Author

ganesh-k13 commented Jul 9, 2022

Not sure if this covers what we need?

bench_ufunc.UFunc
(np-test) ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
~/os/numpy (bug_div_overflow) » python3 runtests.py --bench-compare main bench_ufunc.UFunc                                                                                                                130 ↵ ganesh@ganesh-MS-7B86
· Creating environments
· Discovering benchmarks
· Running 12 total benchmarks (2 commits * 1 environments * 6 benchmarks)
[  0.00%] · For numpy commit b89a1d21 <main> (round 1/2):
[  0.00%] ·· Building for virtualenv-py3.10-Cython-setuptools59.2.0.
[  0.00%] ·· Benchmarking virtualenv-py3.10-Cython-setuptools59.2.0
[  0.00%] ··· Importing benchmark suite produced output:
[  0.00%] ···· NumPy CPU features: SSE SSE2 SSE3 SSSE3* SSE41* POPCNT* SSE42* AVX* F16C* FMA3* AVX2* AVX512F? AVX512CD? AVX512_KNL? AVX512_KNM? AVX512_SKX? AVX512_CLX? AVX512_CNL? AVX512_ICL?
[  4.17%] ··· Running (bench_ufunc.UFunc.time_ufunc_types--)......
[ 25.00%] · For numpy commit f918491c <bug_div_overflow> (round 1/2):
[ 25.00%] ·· Building for virtualenv-py3.10-Cython-setuptools59.2.0.
[ 25.00%] ·· Benchmarking virtualenv-py3.10-Cython-setuptools59.2.0
[ 29.17%] ··· Running (bench_ufunc.UFunc.time_ufunc_types--)......
[ 50.00%] · For numpy commit f918491c <bug_div_overflow> (round 2/2):
[ 50.00%] ·· Benchmarking virtualenv-py3.10-Cython-setuptools59.2.0
[ 54.17%] ··· bench_ufunc.UFunc.time_ufunc_types                                                                                                                                                                                    ok
[ 54.17%] ··· =============== =============
                   ufunc
              --------------- -------------
                    abs          588±2μs
                  absolute       588±2μs
                    add          254±2μs
                   arccos      5.79±0.01ms
                  arccosh      5.04±0.01ms
                   arcsin      5.86±0.02ms
                  arcsinh      4.80±0.01ms
                   arctan      2.64±0.01ms
                  arctan2      1.43±0.01ms
                  arctanh      3.28±0.01ms
                bitwise_and     16.5±0.5μs
                bitwise_not    10.9±0.05μs
                 bitwise_or     16.9±0.6μs
                bitwise_xor     17.4±0.1μs
                    cbrt         1.31±0ms
                    ceil        133±0.2μs
                    conj        132±0.6μs
                 conjugate       134±2μs
                  copysign      146±0.3μs
                    cos        7.03±0.01ms
                    cosh       5.94±0.01ms
                  deg2rad        184±9μs
                  degrees        196±10μs
                   divide        442±2μs
                   divmod        733±10μs
                   equal         192±2μs
                    exp          4.35±0ms
                    exp2       4.41±0.02ms
                   expm1       7.27±0.03ms
                    fabs         188±6μs
                float_power    9.93±0.01ms
                   floor        136±0.4μs
                floor_divide     615±1μs
                    fmax         287±1μs
                    fmin         287±3μs
                    fmod         423±2μs
                   frexp        273±0.4μs
                    gcd         113±0.5μs
                  greater       219±0.6μs
               greater_equal     224±5μs
                 heaviside       273±5μs
                   hypot         610±2μs
                   invert       11.7±0.1μs
                  isfinite      116±0.2μs
                   isinf        106±0.8μs
                   isnan        96.7±0.4μs
                   isnat         262±2ns
                    lcm         212±0.09μs
                   ldexp        153±0.6μs
                 left_shift     77.5±0.1μs
                    less         219±2μs
                 less_equal      223±2μs
                    log          2.39±0ms
                   log10       2.71±0.01ms
                   log1p       2.77±0.02ms
                    log2       2.54±0.01ms
                 logaddexp       247±7μs
                 logaddexp2      259±5μs
                logical_and     221±0.5μs
                logical_not     129±0.4μs
                 logical_or     222±0.5μs
                logical_xor     241±0.8μs
                   matmul       27.3±0.1ms
                  maximum       275±0.5μs
                  minimum        275±2μs
                    mod          513±1μs
                    modf         351±5μs
                  multiply      252±0.5μs
                  negative       152±3μs
                 nextafter      309±0.2μs
                 not_equal      203±0.5μs
                  positive       164±2μs
                   power       10.1±0.01ms
                  rad2deg        194±8μs
                  radians        199±1μs
                 reciprocal      344±2μs
                 remainder       513±1μs
                right_shift     78.6±0.8μs
                    rint         235±2μs
                    sign         182±3μs
                  signbit       106±0.3μs
                    sin        6.89±0.01ms
                    sinh         6.84±0ms
                  spacing       307±0.9μs
                    sqrt       1.28±0.01ms
                   square        184±1μs
                  subtract       261±3μs
                    tan        7.49±0.01ms
                    tanh       5.63±0.03ms
                true_divide      447±2μs
                   trunc         146±8μs
              =============== =============

[ 58.33%] ··· bench_ufunc.UFuncSmall.time_ufunc_numpy_scalar                                                                                                                                                                        ok
[ 58.33%] ··· ======= ==========
               ufunc
              ------- ----------
                abs    817±10ns
                sqrt   818±10ns
                cos    817±9ns
              ======= ==========

[ 62.50%] ··· bench_ufunc.UFuncSmall.time_ufunc_python_float                                                                                                                                                                        ok
[ 62.50%] ··· ======= =========
               ufunc
              ------- ---------
                abs    795±7ns
                sqrt   795±8ns
                cos    796±8ns
              ======= =========

[ 66.67%] ··· bench_ufunc.UFuncSmall.time_ufunc_small_array                                                                                                                                                                         ok
[ 66.67%] ··· ======= ==========
               ufunc
              ------- ----------
                abs    533±3ns
                sqrt   523±2ns
                cos    562±10ns
              ======= ==========

[ 70.83%] ··· bench_ufunc.UFuncSmall.time_ufunc_small_array_inplace                                                                                                                                                                 ok
[ 70.83%] ··· ======= =========
               ufunc
              ------- ---------
                abs    462±9ns
                sqrt   452±7ns
                cos    473±3ns
              ======= =========

[ 75.00%] ··· bench_ufunc.UFuncSmall.time_ufunc_small_int_array                                                                                                                                                                     ok
[ 75.00%] ··· ======= =========
               ufunc
              ------- ---------
                abs    506±8ns
                sqrt   960±8ns
                cos    967±9ns
              ======= =========

[ 75.00%] · For numpy commit b89a1d21 <main> (round 2/2):
[ 75.00%] ·· Building for virtualenv-py3.10-Cython-setuptools59.2.0.
[ 75.00%] ·· Benchmarking virtualenv-py3.10-Cython-setuptools59.2.0
[ 79.17%] ··· bench_ufunc.UFunc.time_ufunc_types                                                                                                                                                                                    ok
[ 79.17%] ··· =============== =============
                   ufunc
              --------------- -------------
                    abs          580±1μs
                  absolute       583±2μs
                    add          255±1μs
                   arccos      5.69±0.02ms
                  arccosh      4.99±0.04ms
                   arcsin      5.76±0.01ms
                  arcsinh      4.83±0.02ms
                   arctan      2.60±0.01ms
                  arctan2        1.41±0ms
                  arctanh      3.28±0.01ms
                bitwise_and    15.9±0.03μs
                bitwise_not    10.8±0.05μs
                 bitwise_or     16.2±0.2μs
                bitwise_xor     17.4±0.5μs
                    cbrt       1.31±0.01ms
                    ceil        136±0.8μs
                    conj        136±0.9μs
                 conjugate      135±0.8μs
                  copysign       149±1μs
                    cos        7.86±0.01ms
                    cosh       6.10±0.01ms
                  deg2rad       182±0.3μs
                  degrees        183±1μs
                   divide        444±2μs
                   divmod        725±4μs
                   equal        192±0.1μs
                    exp        4.33±0.01ms
                    exp2       4.40±0.03ms
                   expm1       7.18±0.05ms
                    fabs        175±0.4μs
                float_power    9.72±0.01ms
                   floor        136±0.2μs
                floor_divide    609±0.4μs
                    fmax         286±5μs
                    fmin         285±1μs
                    fmod        424±0.9μs
                   frexp        274±0.2μs
                    gcd         112±0.1μs
                  greater       221±0.5μs
               greater_equal     220±1μs
                 heaviside       280±2μs
                   hypot         612±1μs
                   invert       10.8±0.3μs
                  isfinite       109±1μs
                   isinf         108±2μs
                   isnan        98.7±0.2μs
                   isnat         270±2ns
                    lcm          212±1μs
                   ldexp         151±1μs
                 left_shift    76.8±0.06μs
                    less         218±1μs
                 less_equal      222±3μs
                    log        2.38±0.01ms
                   log10       2.72±0.02ms
                   log1p       2.76±0.02ms
                    log2         2.51±0ms
                 logaddexp      252±0.5μs
                 logaddexp2     253±0.7μs
                logical_and     219±0.9μs
                logical_not     127±0.6μs
                 logical_or     216±0.2μs
                logical_xor      237±1μs
                   matmul      27.9±0.06ms
                  maximum       270±0.3μs
                  minimum        273±2μs
                    mod          516±2μs
                    modf         348±1μs
                  multiply       256±3μs
                  negative       152±1μs
                 nextafter      322±0.9μs
                 not_equal       202±1μs
                  positive       174±3μs
                   power         10.0±0ms
                  rad2deg       183±0.9μs
                  radians       183±0.9μs
                 reciprocal      346±3μs
                 remainder       515±2μs
                right_shift     76.3±0.9μs
                    rint        233±0.7μs
                    sign         176±1μs
                  signbit       109±0.4μs
                    sin          7.02±0ms
                    sinh         6.34±0ms
                  spacing        305±1μs
                    sqrt       1.25±0.01ms
                   square       179±0.8μs
                  subtract       255±2μs
                    tan          7.23±0ms
                    tanh       5.67±0.04ms
                true_divide      443±2μs
                   trunc         144±8μs
              =============== =============

[ 83.33%] ··· bench_ufunc.UFuncSmall.time_ufunc_numpy_scalar                                                                                                                                                                        ok
[ 83.33%] ··· ======= =========
               ufunc
              ------- ---------
                abs    812±9ns
                sqrt   809±8ns
                cos    804±7ns
              ======= =========

[ 87.50%] ··· bench_ufunc.UFuncSmall.time_ufunc_python_float                                                                                                                                                                        ok
[ 87.50%] ··· ======= =========
               ufunc
              ------- ---------
                abs    791±6ns
                sqrt   771±6ns
                cos    779±3ns
              ======= =========

[ 91.67%] ··· bench_ufunc.UFuncSmall.time_ufunc_small_array                                                                                                                                                                         ok
[ 91.67%] ··· ======= =========
               ufunc
              ------- ---------
                abs    525±4ns
                sqrt   516±4ns
                cos    556±3ns
              ======= =========

[ 95.83%] ··· bench_ufunc.UFuncSmall.time_ufunc_small_array_inplace                                                                                                                                                                 ok
[ 95.83%] ··· ======= =========
               ufunc
              ------- ---------
                abs    469±3ns
                sqrt   451±3ns
                cos    467±4ns
              ======= =========

[100.00%] ··· bench_ufunc.UFuncSmall.time_ufunc_small_int_array                                                                                                                                                                     ok
[100.00%] ··· ======= ==========
               ufunc
              ------- ----------
                abs    503±4ns
                sqrt   950±4ns
                cos    964±20ns
              ======= ==========

       before           after         ratio
     [b89a1d21]       [f918491c]
     <main>           <bug_div_overflow>
+       183±0.9μs          199±1μs     1.09  bench_ufunc.UFunc.time_ufunc_types('radians')
+        6.34±0ms         6.84±0ms     1.08  bench_ufunc.UFunc.time_ufunc_types('sinh')
+       175±0.4μs          188±6μs     1.07  bench_ufunc.UFunc.time_ufunc_types('fabs')
+         109±1μs        116±0.2μs     1.06  bench_ufunc.UFunc.time_ufunc_types('isfinite')
-         174±3μs          164±2μs     0.94  bench_ufunc.UFunc.time_ufunc_types('positive')
-     7.86±0.01ms      7.03±0.01ms     0.89  bench_ufunc.UFunc.time_ufunc_types('cos')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

@mattip
Copy link
Member

mattip commented Jul 9, 2022

It seems the benchmark changes to the division ufuncs are not significant.

@seberg
Copy link
Member

seberg commented Jul 9, 2022

Having a quick look, I am not sure the benchmark actually check the integer functions well (focusing on float64).

@ganesh-k13
Copy link
Member Author

ganesh-k13 commented Jul 10, 2022

I thought of adding an integer path. But man is it complicated to understand the setup :).

I wrote a custom piece: [EDIT] Added dtype=dtype

 class UFuncInt(Benchmark):
     params = [np.sctypes['int']]
     param_names = ['dtype']
     timeout = 10
 
     def setup(self, dtype):
         np.seterr(all='ignore')
         low, high = np.iinfo(dtype).min, np.iinfo(dtype).max
         self.arrays = [np.random.randint(low, high, size=i, dtype=dtype) for i in range(1, 10000)]
         self.divisor = np.array([np.random.randint(low, high)], dtype=dtype)
 
     def time_ufunc_int_divide(self, dtype):
         for a in self.arrays:
             np.floor_divide(a, self.divisor)
             np.divmod(a, self.divisor)
             np.fmod(a, self.divisor)

and got:

~/os/numpy (bug_div_overflow*) » python3 runtests.py --bench-compare main bench_ufunc.UFuncInt                                                                              ganesh@ganesh-MS-7B86
********************************************************************************
WARNING: you have uncommitted changes --- these will NOT be benchmarked!
********************************************************************************
· Creating environments
· Discovering benchmarks
·· Uninstalling from virtualenv-py3.10-Cython-setuptools59.2.0
·· Installing f918491c <bug_div_overflow> into virtualenv-py3.10-Cython-setuptools59.2.0.
· Running 2 total benchmarks (2 commits * 1 environments * 1 benchmarks)
[  0.00%] · For numpy commit b89a1d21 <main> (round 1/2):
[  0.00%] ·· Building for virtualenv-py3.10-Cython-setuptools59.2.0.
[  0.00%] ·· Benchmarking virtualenv-py3.10-Cython-setuptools59.2.0
[  0.00%] ··· Importing benchmark suite produced output:
[  0.00%] ···· NumPy CPU features: SSE SSE2 SSE3 SSSE3* SSE41* POPCNT* SSE42* AVX* F16C* FMA3* AVX2* AVX512F? AVX512CD? AVX512_KNL? AVX512_KNM? AVX512_SKX? AVX512_CLX? AVX512_CNL? AVX512_ICL?
[ 25.00%] ··· Running (bench_ufunc.UFuncInt.time_ufunc_int_divide--).
[ 25.00%] · For numpy commit f918491c <bug_div_overflow> (round 1/2):
[ 25.00%] ·· Building for virtualenv-py3.10-Cython-setuptools59.2.0.
[ 25.00%] ·· Benchmarking virtualenv-py3.10-Cython-setuptools59.2.0
[ 50.00%] ··· Running (bench_ufunc.UFuncInt.time_ufunc_int_divide--).
[ 50.00%] · For numpy commit f918491c <bug_div_overflow> (round 2/2):
[ 50.00%] ·· Benchmarking virtualenv-py3.10-Cython-setuptools59.2.0
[ 75.00%] ··· bench_ufunc.UFuncInt.time_ufunc_int_divide                                                                                                                                        ok
[ 75.00%] ··· ============= =========
                  dtype              
              ------------- ---------
                numpy.int8   509±7ms 
               numpy.int16   511±5ms 
               numpy.int32   496±3ms 
               numpy.int64   481±5ms 
              ============= =========

[ 75.00%] · For numpy commit b89a1d21 <main> (round 2/2):
[ 75.00%] ·· Building for virtualenv-py3.10-Cython-setuptools59.2.0.
[ 75.00%] ·· Benchmarking virtualenv-py3.10-Cython-setuptools59.2.0
[100.00%] ··· bench_ufunc.UFuncInt.time_ufunc_int_divide                                                                                                                                        ok
[100.00%] ··· ============= =========
                  dtype              
              ------------- ---------
                numpy.int8   495±3ms 
               numpy.int16   498±4ms 
               numpy.int32   493±4ms 
               numpy.int64   481±3ms 
              ============= =========


BENCHMARKS NOT SIGNIFICANTLY CHANGED.

But this seems definitely wrong though yeah? There has to be some sort of impact? I changed low to 1 in order to avoid potential 0's that may short the operations, but no difference.

@seberg
Copy link
Member

seberg commented Jul 10, 2022

Probably best to split up the test by function and then limit it to a single run with a large array (10000 elements should be fine). Small arrays are not too interesting (from a dtype perspective), since ufunc overhead will dominate those.

Do we have special paths for contiguous arrays and contig+scalar at all (except for VSX4?). If we do not have those, maybe the code is just not optimized enough that a difference can show up.
But the difference could just be very small. Division is a very slow instruction and the added check is a single instruction in most cases.

@ganesh-k13
Copy link
Member Author

ganesh-k13 commented Jul 10, 2022

Probably best to split up the test by function and then limit it to a single run with a large array (10000 elements should be fine).

I tried this:

 class UFuncInt(Benchmark):
     params = [np.sctypes['int']]
     param_names = ['dtype']
     timeout = 10
 
     def setup(self, dtype):
         np.seterr(all='ignore')
         low, high = np.iinfo(dtype).min, np.iinfo(dtype).max
         self.array = np.random.randint(1, high, size=10000, dtype=dtype)
         self.divisor = np.array([np.random.randint(1, high)], dtype=dtype)
 
     def time_ufunc_int_divide(self, dtype):
         np.floor_divide(self.array, self.divisor)
 
     def time_ufunc_int_divmod(self, dtype):
         np.divmod(self.array, self.divisor)
 
     def time_ufunc_int_fmod(self, dtype):
         np.fmod(self.array, self.divisor)

And got the same BENCHMARKS NOT SIGNIFICANTLY CHANGED.

But the difference could just be very small. Division is a very slow instruction and the added check is a single instruction in most cases.

Yeah, I think this is a very good explanation. During our SIMD migration from libdivide(#18766), we implemented this paper. Now this gets us to about at least 20 instruction miminimum, barring other overheads. Now adding this if should not tilt the ratio a lot I guess.

@mattip
Copy link
Member

mattip commented Aug 10, 2022

Does this PR need more review?

@charris charris added the 09 - Backport-Candidate PRs tagged should be backported label Aug 13, 2022
@ganesh-k13
Copy link
Member Author

Hey @seberg / @mattip, do let me know if any additional things to take care here, post-merge, I'll work on documenting all of this.

@charris charris added the triage review Issue/PR to be discussed at the next triage meeting label Sep 7, 2022
@seberg
Copy link
Member

seberg commented Sep 8, 2022

Thanks Ganesh, sorry for keeping this hanging so long. I just browesed through once more, but there was not anything to be done beyond checking that we don't have a big performance degradation.

We should try to backport this and the related PR, and we discussed yesterday, that we should not worry about the tiny changes in behavior since this is extremely niche (niche enough that the bug was not noticed in practice).

I will try to create backport PR with both very soon (maybe its all just applying this anyway). We should ping remember to ping @rafaelcfsousa on the backport as well.

@seberg seberg changed the title BUG: Division overflow BUG: Better report integer division overflow Sep 8, 2022
@seberg seberg merged commit e9295bf into numpy:main Sep 8, 2022
@ganesh-k13
Copy link
Member Author

Thanks a lot @seberg! I'll raise a backport PR right away, no worries.

@seberg
Copy link
Member

seberg commented Sep 8, 2022

Thanks, if you have time for it that is great. But do not feel compelled!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
00 - Bug triage review Issue/PR to be discussed at the next triage meeting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants