
abs() is slow for complex, add abs2() #3994

Open
Nodd opened this issue Oct 29, 2013 · 19 comments
@Nodd
Contributor

Nodd commented Oct 29, 2013

See the following code (run on Ubuntu 64bits):

In [1]: import numpy as np

In [2]: b = np.random.rand(500, 500) + 1j * np.random.rand(500, 500)

In [3]: %timeit np.sqrt(b.real**2 + b.imag**2)
100 loops, best of 3: 4.15 ms per loop

In [4]: %timeit np.abs(b)
100 loops, best of 3: 6.06 ms per loop

abs() is slow compared to the manual formula. Is there any drawback to using np.sqrt(b.real**2 + b.imag**2), like possible overflow?

Also, it could be useful to add an abs2() function which would return the same value as abs()**2 but should be much faster for complex input:

In [5]: %timeit b.real**2 + b.imag**2
1000 loops, best of 3: 1.4 ms per loop

It is commonly used to compute the energy of a signal or to sort complex numbers by norm value, for example.
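A minimal sketch of such a helper follows. Note that `abs2` is the proposed name, not an existing NumPy function, and the handling of real input is an assumption:

```python
import numpy as np

def abs2(z):
    """Squared magnitude |z|**2, skipping the sqrt.

    Hypothetical helper sketched for this proposal, not an
    existing NumPy API.
    """
    z = np.asarray(z)
    if np.iscomplexobj(z):
        # real**2 + imag**2 avoids both the sqrt and the
        # overflow-safe hypot machinery used by np.abs for complex
        return z.real**2 + z.imag**2
    return z * z
```

For real input this simply squares the values, so sorting by `abs2` gives the same order as sorting by `abs` while doing less work per element.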

@rkern
Member

rkern commented Oct 29, 2013

I get the opposite results (numpy 1.7.1, MacBook Pro):

[~]
|5> %timeit np.sqrt(b.real**2 + b.imag**2)
100 loops, best of 3: 2.38 ms per loop

[~]
|6> %timeit np.abs(b)
1000 loops, best of 3: 1.56 ms per loop

[~]
|9> %timeit b.real**2 + b.imag**2
1000 loops, best of 3: 1.71 ms per loop

np.abs() is using the efficient hypot() function underneath and should be the most efficient way to get this result. I don't know why it is slower on your machine. Yes, overflow is a concern with the explicit formula, but hypot() should also just be faster and use less temporary memory.
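The overflow concern is easy to demonstrate: for large components, the squares in the explicit formula overflow to inf even though the true magnitude is representable in float64 (a small sketch):

```python
import numpy as np

big = np.array([1e200 + 1e200j])

with np.errstate(over="ignore"):
    # (1e200)**2 = 1e400 overflows float64, so the naive formula
    # returns inf even though |big| is about 1.414e200
    naive = np.sqrt(big.real**2 + big.imag**2)

# hypot-based np.abs rescales internally and stays finite
safe = np.abs(big)
```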

@juliantaylor
Contributor

hypot is not efficient compared to the naive approach; it must use a different, slower algorithm in order to avoid overflow and underflow (see the fallback implementation in npy_math.c). glibc's implementation is even slower, as it has a rounding error of 1 ulp.

The only way I can explain it being faster on Mac OS is that their implementation does not care about the correctness of the result and just uses the naive approach. Or it is very well optimized, which I rather doubt for such a niche function; can someone disassemble it?

@ewmoore
Contributor

ewmoore commented Oct 29, 2013

I'm not so sure that hypot should necessarily be slower. The naive approach loops over the array 3 times and calls pow twice, and pow certainly has checks for inf, nan, etc. similar to those found in hypot. If speed is desired, you probably want b.real*b.real anyway, even if small integer powers are special-cased in pow.
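For reference, the two spellings produce the same values; whether `**2` actually reaches `pow` depends on the NumPy version, since small integer exponents are special-cased in `ndarray.__pow__`. A quick check (nothing here is version-guaranteed):

```python
import numpy as np

rng = np.random.default_rng(0)
b = rng.random(1000) + 1j * rng.random(1000)

# ** 2 may be dispatched to np.square internally
via_pow = b.real**2 + b.imag**2
# explicit products guarantee no pow call
via_mul = b.real * b.real + b.imag * b.imag

assert np.allclose(via_pow, via_mul)
```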

@juliantaylor
Contributor

The naive approach calls square, not pow; also, the square root and addition are vectorized in numpy 1.8.

@rkern
Member

rkern commented Oct 29, 2013

I won't copy the assembly here (the object code is in /usr/lib/system/libsystem_m.dylib on 10.8 if anyone wants to look), but it looks like it does a few checks, but the main path multiplies each argument by itself, adds them together, and then sqrts them, as in the "naive approach" as I have called it.

I would not be averse to a fast_hypot() ufunc that does not care about overflow.
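A real win would require implementing this as a C-level ufunc, but the intended semantics are easy to sketch in Python. `fast_abs` below is a hypothetical name; the in-place operations just limit the number of temporary arrays:

```python
import numpy as np

def fast_abs(z):
    """Naive sqrt(re*re + im*im), trading overflow safety for speed.

    Sketch of the proposed ufunc's behavior, not an actual NumPy API.
    """
    out = z.real * z.real        # one temporary for the result
    out += z.imag * z.imag       # accumulate in place
    return np.sqrt(out, out=out) # sqrt in place, no extra allocation
```

Unlike np.abs on complex input, this overflows for components larger than about 1e154, which is exactly the trade-off being proposed.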

@juliantaylor
Contributor

Just for reference, the gist of the glibc implementation is:

 *      1. if x > 2y  use
 *              x1*x1+(y*y+(x2*(x+x1))) for x*x+y*y
 *      where x1 = x with lower 32 bits cleared, x2 = x-x1; else
 *      2. if x <= 2y use
 *              t1*y1+((x-y)*(x-y)+(t1*y2+t2*y))
 *      where t1 = 2x with lower 32 bits cleared, t2 = 2x-t1,
 *      y1= y with lower 32 bits chopped, y2 = y-y1.

in sysdeps/ieee754/dbl-64/e_hypot.c
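The splitting trick can be illustrated in Python by viewing the float64 bits directly. This is a toy re-derivation of case 1 only, assuming x > 2y with both finite and positive; the function names are made up for illustration:

```python
import numpy as np

def split(x):
    """Return (x1, x2) with x1 = x with its lower 32 bits cleared,
    and x2 = x - x1, so that x == x1 + x2 exactly."""
    bits = np.array(x, dtype=np.float64).view(np.uint64)
    x1 = (bits & np.uint64(0xFFFFFFFF00000000)).view(np.float64)[()]
    return float(x1), float(x - x1)

def sumsq_case1(x, y):
    """x*x + y*y computed as x1*x1 + (y*y + (x2*(x + x1))).

    Since x = x1 + x2, expanding x2*(x + x1) gives
    2*x1*x2 + x2*x2, so the terms reconstruct x*x exactly
    while each product stays exactly representable.
    """
    x1, x2 = split(x)
    return x1 * x1 + (y * y + (x2 * (x + x1)))
```

The extra splitting and re-association is where the speed goes: several multiplies and adds replace the single fused `x*x + y*y` of the naive path, in exchange for a correctly rounded sum of squares.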

Maybe some sort of precision state would be useful, e.g. under np.precisionstate("fast") numpy would choose faster but not always correct functions. Could also be useful for complex division.

@Nodd
Contributor Author

Nodd commented Oct 29, 2013

I checked for hypot too but forgot to add it. It was roughly the same time as abs. I did not see any notable difference between x**2 and x*x.

@rkern
Member

rkern commented Oct 29, 2013

Yeah, I think the difference between your timings and mine is that OS X's implementation of hypot(x,y) is the less-safe sqrt(x*x+y*y), so naturally the ufunc that wraps that will win over doing it in Python with full temporary arrays. On your platform, hypot() has a more involved implementation with more arithmetic to avoid overflow.

@itdaniher

In [1]: import numpy as np

In [2]: b = np.random.rand(500, 500) + 1j * np.random.rand(500, 500)

In [3]: %timeit np.sqrt(b.real**2 + b.imag**2)
10 loops, best of 3: 58.2 ms per loop

In [4]: %timeit np.abs(b)
10 loops, best of 3: 121 ms per loop

Similar results to the OP on Python 3.5.2 / Linux 4.10.0 / ARMv7.

This issue still seems relevant.

@dlindelof

Hi,

As I mentioned in #19321, I currently have some bandwidth and would like to contribute to NumPy. Is this issue something that needs to be done?

@lostanlen

lostanlen commented Jun 26, 2021

Hello @dlindelof,
Please see #13179.
According to @rkern, this abs2 ufunc should belong in scipy.special, not numpy.emath.

@dlindelof

Ok I'll give it a shot then, thanks.

@mhvk
Contributor

mhvk commented Jun 26, 2021

Note that the discussion has not quite converged, but the code would be the same whether it ends up in numpy or scipy.special anyway. It is probably a function where clever use of vectorization could have a fairly large impact.

@zerothi
Contributor

zerothi commented Dec 15, 2021

I would like this to get attention again.

The abs2 function is extremely heavily used, and I think it really deserves to be present in the numpy ecosystem.

What are the thoughts here?

@rkern
Member

rkern commented Dec 15, 2021

Same as in #13179, more or less. If you make a PR adding abs2 to scipy.special, I am confident it will be approved quite quickly.

To get it into numpy proper, we will probably want you to convince the Python Array API standard to add it (see here for making proposals). We'll add any ufuncs in that standard and will be very unlikely to add more ufuncs not in that standard.

@dlindelof

I'd like to take a shot at implementing this. I began work on it in June that was put on hold; I'll let you know when I have something that works in scipy.special.

@mhvk
Contributor

mhvk commented Dec 15, 2021

For the array API standard, see also data-apis/array-api#153 and maybe especially the discussion following data-apis/array-api#153 (comment) - complex values are just used infrequently enough to somewhat slip through the cracks.

Though adding to #13179 may be useful too - others may be missing different functions than I was!

dlindelof pushed a commit to dlindelof/scipy that referenced this issue Jan 9, 2022
Implement the absolute square function in scipy.special as suggested
in numpy/numpy#3994. See also
numpy/numpy#13179.
@dlindelof

I've implemented a first version of abs2() in scipy/scipy#15390.

@kgryte

kgryte commented Jun 23, 2022

A proposal has been made for adding abs2 to the array API specification. If this proposal is accepted, it will likely affect the outcome of this issue and scipy/scipy#15390.
