not helping the np.sign/np.abs/np.power #4190

Open
isaac-you opened this issue Jun 17, 2019 · 2 comments
Labels: bug, performance (performance related issue)

Comments


isaac-you commented Jun 17, 2019

from numba import njit
import numpy as np

def signedpower(x, a):
    '''
    x: the vector
    a: the power
    '''
    signedV = np.sign(x)    # sign vector: 1 -1 1 -1 ...
    absV = np.abs(x)
    powerV = np.power(absV, a)
    return signedV * powerV

signedpower_ = njit(signedpower)

v1 = np.arange(-500,500)
%timeit signedpower(v1,10)   
8.53 µs ± 28.7 ns per loop
%timeit signedpower_(v1,10)
75.6 µs ± 1.14 µs per loop

njit is actually slowing down the function.

stuartarchibald (Contributor)

NOTE: edited code to include imports and updated markdown.

stuartarchibald (Contributor)

Thanks for the report, I can reproduce. I think that what is observed is due to a number of issues, including:

  1. There's not a huge amount of data or work in the function, so dispatch will have some cost.
  2. Numba doesn't by default do multi-statement shortcut deforestation, so each of those ufunc calls creates a new temporary array. See Performance hit with local temporary variables #3980 for discussion (a manually fused workaround is sketched after this list).
  3. Anaconda's NumPy ufuncs are heavily optimised and backed by the Intel MKL VML library, which is very fast.
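
As an aside (not part of the original report): one way to sidestep the temporaries from point 2 is to write the loop explicitly inside the jitted function so the whole computation happens in a single pass. A minimal sketch, where signedpower_fused is a hypothetical name:

from numba import njit
import numpy as np

@njit
def signedpower_fused(x, a):
    # Manually fused variant (illustrative only): a single pass with no
    # temporary arrays for the intermediate sign/abs/power results.
    # Note it always produces float64 output, unlike the original.
    out = np.empty(x.shape[0], dtype=np.float64)
    for i in range(x.shape[0]):
        v = x[i]
        s = 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)
        out[i] = s * float(abs(v)) ** a
    return out

Whether this actually closes the gap on this particular example would still need measuring.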

However, I'm going to mark this as a bug needing more investigation because even throwing all the optimisations at the function, including developer-only ones, NumPy is still winning by a suspicious amount:

from numba import njit
import numpy as np
from IPython import get_ipython
ipython = get_ipython()

from numba import parfor
parfor.sequential_parfor_lowering = True

@njit(error_model="numpy", parallel=True, fastmath=True)
def signedpower(x, a):
    '''
    x: the vector
    a: the power
    '''
    signedV = np.sign(x)    # sign vector: 1 -1 1 -1 ...
    absV = np.abs(x)
    powerV = np.power(absV, a)
    ret = signedV * powerV
    return ret

v1 = np.arange(-50000, 50000)
p = 10
signedpower(v1, p)

print("numpy : %s" % ipython.magic("timeit -o -q signedpower.py_func(v1, p)").best)
print("numba : %s" % ipython.magic("timeit -o -q signedpower(v1, p)").best)


signedpower.parallel_diagnostics(level=3)

gives:

numpy : 0.0029059834200052138
numba : 0.01115928922999955

If, however, the dtype of v1 is changed to float64, Numba wins by a considerable amount:

numpy : 0.012640262770000845
numba : 0.009875944289997279

The fact that this is faster than the int64 variant suggests there is perhaps something unusual going on.
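
For reference, one way (my assumption, not shown above) to produce the float64 variant is to cast the input and re-run the same benchmark:

v1 = np.arange(-50000, 50000).astype(np.float64)
signedpower(v1, p)  # warm-up call so the float64 signature is compiled before timing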

stuartarchibald added the bug and performance (performance related issue) labels on Jun 17, 2019