# Exercise 1.2 - The vectorize decorator

## Our first vectorized function

Let's define an elementwise function for computing the relative difference of two numbers:

In [None]:
from numba import vectorize

@vectorize
def rel_diff(x, y):
    return 2 * (x - y) / (x + y)

We have written our `rel_diff` function in terms of scalars, but because we used the `vectorize` decorator, we can apply it to arrays. For example:

In [5]:
import numpy as np

a = np.arange(10, dtype=np.float64)
b = a * 2 + 1

diff = rel_diff(a, b)
print(diff)

[-2.         -1.         -0.85714286 -0.8        -0.76923077 -0.75
 -0.73684211 -0.72727273 -0.72       -0.71428571]


We can see from the output that the function has been applied element-wise.

Inspect the dtype of the result:

In [6]:
diff.dtype

dtype('float64')

This dtype was inferred based on the dtypes of the input arrays.

## Numba’s choice of types

Try running the following code:

In [7]:
@vectorize
def prod(x, y):
    return x * y

a_int = np.arange(1000, dtype=np.int32)
b_int = a_int * 2

result = prod(a_int, b_int)

result.dtype

dtype('int64')

This time, the dtype of the result is not the same as the input arrays' dtype! Instead, Numba has decided that the output array should be of dtype `int64`. This is because the product of two `int32` values could overflow an `int32` output, and Numba tries to avoid this scenario.
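To see why a wider output type matters, here is a minimal sketch in plain NumPy (not Numba) that multiplies two `int32` arrays whose product does not fit in 32 bits. The value `100_000` and the names `big`, `wrapped` and `widened` are chosen purely for illustration.

```python
import numpy as np

# Illustrative value: 100000 * 100000 = 10**10, which exceeds the int32 maximum.
big = np.array([100_000], dtype=np.int32)

# NumPy keeps int32 for int32 * int32, so the product wraps around.
wrapped = big * big

# Widening the inputs to int64 first gives the true product.
widened = big.astype(np.int64) * big.astype(np.int64)

print(wrapped.dtype, wrapped[0])
print(widened.dtype, widened[0])
```

Producing an `int64` result, as the vectorized `prod` does, avoids the wrapped value in cases like this.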

To continue the example, try calling the function on some arrays of other types:

In [8]:
a_f64 = np.arange(1000, dtype=np.float64)
b_f64 = a_f64 * 2
result2a = prod(a_f64, b_f64)

a_f32 = np.arange(1000, dtype=np.float32)
b_f32 = a_f32 * 2
result2b = prod(a_f32, b_f32)

Now let's check the dtypes of these two results:

In [9]:
result2a.dtype

dtype('float64')

In [10]:
result2b.dtype

dtype('float64')

The dtypes of both results are the same. In this case, however, that is because the vectorized function was first executed on arguments with `float64` dtypes, which created a compiled version of the `prod` function with the signature `float64(float64, float64)`.

When the function is later called with `float32` arguments, Numba finds the existing `float64` version, casts the `float32` inputs to `float64`, and reuses the previously compiled code.

In order to see what types a vectorized function has been compiled for, you can use its `ufunc.types` member:

In [11]:
prod.ufunc.types

['ii->l', 'dd->d']

In the output notation, `i` means `int32`, `l` means `int64`, and `d` means `float64`. For example, the notation `ii->l` describes the mapping of two `int32` values to an `int64` value. You should see a version that accepts `int32` parameters and another version that accepts `float64` parameters.
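The one-character codes are NumPy's dtype character codes, so they can be looked up with `np.dtype`. The sketch below uses only NumPy. Note that `l` corresponds to the C `long` type, so its width can differ between platforms. The name `codes` is illustrative.

```python
import numpy as np

# Map each one-character code from the ufunc.types output to a NumPy dtype name.
codes = {c: np.dtype(c).name for c in "ild"}
for code, name in codes.items():
    print(code, "->", name)

# NumPy's built-in ufuncs describe their loops with the same notation.
print("dd->d" in np.multiply.types)
```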

## Ensuring the use of specific types

Although it is often convenient and acceptable to let Numba determine the signatures for vectorized functions, it is sometimes desirable to specify the types exactly, to avoid situations like those above.

The vectorize decorator will accept a list of signatures, in order of precedence. Continuing the above example:

In [13]:
from numba import int32, int64, float32, float64

@vectorize([int32(int32, int32),
            int64(int64, int64),
            float32(float32, float32),
            float64(float64, float64)])
def prod2(x, y):
    return x * y

Numba will check each signature in order to see whether it matches the arguments of a function call. It is important that more specific types appear higher in the list than less specific ones. For example, `float32` must come before `float64`, because `float32` arguments can always be cast to `float64` and would therefore match a `float64` signature listed first.

Verify that the `prod2` function behaves as you expect by calling it with arguments that are of the `int32` and `float32` dtypes:

In [14]:
a_i32 = np.arange(1000, dtype=np.int32)
b_i32 = np.arange(1000, dtype=np.int32)
prod2(a_i32, b_i32).dtype

dtype('int32')

In [16]:
a_f32 = np.arange(1000, dtype=np.float32)
b_f32 = np.arange(1000, dtype=np.float32)
prod2(a_f32, b_f32).dtype

dtype('float32')

## Ufunc methods

Why not just write functions using the `jit` decorator that include a for-loop over the input arguments? One answer is that creating ufuncs also provides additional methods with no extra work. The following code demonstrates the `reduce` and `accumulate` methods:

In [17]:
a = np.arange(12).reshape(3,4)
prod2.reduce(a, axis=0)

array([  0,  45, 120, 231])

In [18]:
prod2.reduce(a, axis=1)

array([   0,  840, 7920])

In [19]:
prod2.accumulate(a)

array([[  0,   1,   2,   3],
       [  0,   5,  12,  21],
       [  0,  45, 120, 231]])

If you are not familiar with ufuncs, run the above code and examine the output to understand what is happening in these functions. For full documentation of the ufunc methods, refer to http://docs.scipy.org/doc/numpy/reference/ufuncs.html#methods
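As a point of comparison, the same methods exist on NumPy's built-in `np.multiply` ufunc, used here as a stand-in for `prod2`. The sketch below relates them to the more familiar `prod` and `cumprod`. The result names are illustrative.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# reduce folds the binary operation along one axis;
# for multiplication this is a product along that axis.
col_products = np.multiply.reduce(a, axis=0)
row_products = np.multiply.reduce(a, axis=1)

# accumulate keeps every intermediate result;
# for multiplication this is a running product (axis=0 by default).
running = np.multiply.accumulate(a)

print(np.array_equal(col_products, a.prod(axis=0)))
print(np.array_equal(row_products, a.prod(axis=1)))
print(np.array_equal(running, np.cumprod(a, axis=0)))
```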

## Ufunc performance

Why not combine NumPy’s built-in ufuncs in order to achieve the same result as using the vectorize decorator on a function? In many cases (especially those with complex behaviour or control flow) it is much easier to write a scalar function than to try to write the entire computation as array-wide expressions.

Even in those cases where it is straightforward to write array-wide expressions, the performance of a vectorize-decorated function soon overtakes that of the pure NumPy implementation. Try running the following code:

In [20]:
x = np.arange(100000, dtype=np.float64) + 1
y = np.arange(100000, dtype=np.float64) + 1.1

%timeit 2 * (x - y) / (x + y)

1000 loops, best of 3: 770 µs per loop


In [21]:
%timeit rel_diff(x, y)

10000 loops, best of 3: 131 µs per loop


The results will vary depending on your machine: the run above took about 770 µs for NumPy and 131 µs for the Numba-compiled function, while on another machine the times were 2.1 ms and 1.25 ms. There are two main reasons why the Numba version is faster:

- Memory allocation: NumPy creates a temporary array for each intermediate step in the computation.
- Cache thrashing: because each operation is performed on a whole temporary array at a time, data is repeatedly evicted from and reloaded into the CPU cache, which reduces performance.

The Numba-compiled function performs all computations on a single element at a time, which sidesteps these issues.
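The two evaluation strategies can be written out explicitly. This sketch uses only NumPy and a pure Python loop. The loop is slow and serves only to illustrate the single-pass order of operations, not the speed of the compiled code. The names `t1`, `t2`, `t3`, `by_steps` and `fused` are illustrative.

```python
import numpy as np

x = np.arange(1000, dtype=np.float64) + 1
y = np.arange(1000, dtype=np.float64) + 1.1

# Array expression 2 * (x - y) / (x + y), step by step:
# each step allocates and fills a full-size array.
t1 = x - y
t2 = 2 * t1
t3 = x + y
by_steps = t2 / t3

# Conceptual per-element evaluation: one pass over the data,
# with no intermediate arrays.
fused = np.empty_like(x)
for i in range(x.size):
    fused[i] = 2 * (x[i] - y[i]) / (x[i] + y[i])

print(np.allclose(by_steps, fused))
```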

## Summary

- The vectorize decorator is used to turn a scalar function into one which can be applied to all elements of an array.
- Type inference is performed on the arguments to determine the output types.
- However, a previously compiled version may be used if the arguments can be coerced into a suitable type.
- In order to control the coercion and any casting, input and output types can be specified as arguments to the vectorize decorator.
- Vectorized functions also get additional methods, such as `reduce`, for "free".
- The performance of a vectorized function is generally higher than that of the equivalent NumPy array expression.