In [1]:
import numpy
import math

Let's define a function that operates on two scalar inputs.

In [2]:
def trig(a, b):
    return math.sin(a**2) * math.exp(b)

In [3]:
trig(1, 1)

2.2873552871788423

Seems reasonable.  However, the `math` library only works on scalars.  If we try to pass in arrays, we'll get an error.

In [4]:
a = numpy.ones((5,5))
b = numpy.ones((5,5))

In [5]:
trig(a, b)

TypeError: only length-1 arrays can be converted to Python scalars
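
One way around the error is to loop over the elements by hand. The sketch below is only an illustration of what "elementwise" means here; `trig_loop` is a hypothetical helper, not part of the original notebook, and a pure-Python loop like this is slow on large arrays.

```python
import math
import numpy

def trig(a, b):
    return math.sin(a**2) * math.exp(b)

def trig_loop(a, b):
    # Hypothetical helper: apply the scalar function one element at a time.
    # Assumes a and b have the same shape.
    out = numpy.empty_like(a, dtype=numpy.float64)
    for idx in numpy.ndindex(a.shape):
        out[idx] = trig(a[idx], b[idx])
    return out

a = numpy.ones((5, 5))
b = numpy.ones((5, 5))

# Passing whole arrays to the math-based function fails...
try:
    trig(a, b)
    raised = False
except TypeError:
    raised = True

# ...while the explicit loop calls trig on one scalar pair at a time.
looped = trig_loop(a, b)
```

Every entry of `looped` equals the scalar result `trig(1, 1)`, since both inputs are arrays of ones.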

Numba's `vectorize` can compile the scalar function into one that is applied elementwise to arrays.

In [2]:
from numba import vectorize

In [8]:
vec_trig = vectorize()(trig)

In [9]:
vec_trig(a, b)

array([[ 2.28735529,  2.28735529,  2.28735529,  2.28735529,  2.28735529],
       [ 2.28735529,  2.28735529,  2.28735529,  2.28735529,  2.28735529],
       [ 2.28735529,  2.28735529,  2.28735529,  2.28735529,  2.28735529],
       [ 2.28735529,  2.28735529,  2.28735529,  2.28735529,  2.28735529],
       [ 2.28735529,  2.28735529,  2.28735529,  2.28735529,  2.28735529]])

And just like that, the scalar function `trig` is now a NumPy `ufunc` called `vec_trig`.

Note that this is a "dynamic ufunc": no signature was given, so Numba compiles a version for the input types the first time the function is called with them.

How does it compare to just using NumPy?  Let's check.

In [22]:
def numpy_trig(a, b):
    return numpy.sin(a**2) * numpy.exp(b)

In [23]:
a = numpy.random.random((5000, 5000))
b = numpy.random.random((5000, 5000))

In [24]:
%%timeit
numpy_trig(a, b)

1 loop, best of 3: 809 ms per loop


In [25]:
%%timeit
vec_trig(a, b)

1 loop, best of 3: 765 ms per loop


What happens if we do specify a signature?  Is there a speed boost?

In [32]:
vec_trig = vectorize(["float64(float64, float64)"])(trig)

In [33]:
%%timeit
vec_trig(a, b)

1 loop, best of 3: 775 ms per loop


No, not really.  But(!), if we have a signature, then we can add the `target` keyword argument.

In [34]:
vec_trig = vectorize(["float64(float64, float64)"], target='parallel')(trig)

In [35]:
%%timeit
vec_trig(a, b)

1 loop, best of 3: 179 ms per loop


# Clipping an array

In [13]:
def truncate(a, amin, amax):
    if a < amin:
        a = amin
    elif a > amax:
        a = amax
    return a
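
As a sanity check, this scalar `truncate` should match what `numpy.clip` does to each element. The comparison below is a sketch in plain Python and NumPy; the sample array `values` and the list-comprehension loop are illustrative and not part of the original notebook:

```python
import numpy

def truncate(a, amin, amax):
    if a < amin:
        a = amin
    elif a > amax:
        a = amax
    return a

rng = numpy.random.default_rng(0)
values = rng.random(1000)
amin, amax = 0.2, 0.6

# Apply the scalar function one element at a time.
looped = numpy.array([truncate(v, amin, amax) for v in values])

# NumPy's built-in elementwise clipping, used as the reference.
clipped = numpy.clip(values, amin, amax)

print(numpy.array_equal(looped, clipped))
```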

In [15]:
vec_truncate_serial = vectorize(['float64(float64, float64, float64)'])(truncate)
vec_truncate_par = vectorize(['float64(float64, float64, float64)'], target='parallel')(truncate)

In [26]:
a = numpy.random.random((5000))

In [27]:
amin = .2
amax = .6

In [28]:
%%timeit
vec_truncate_serial(a, amin, amax)

The slowest run took 26.45 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.17 µs per loop


In [29]:
%%timeit
vec_truncate_par(a, amin, amax)

The slowest run took 312.11 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 109 µs per loop


In [30]:
a = numpy.random.random((50000))

In [31]:
%%timeit
vec_truncate_serial(a, amin, amax)

1000 loops, best of 3: 251 µs per loop


In [32]:
%%timeit
vec_truncate_par(a, amin, amax)

The slowest run took 248.91 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 136 µs per loop
