# Computation on NumPy Arrays: Universal Functions

Computation on NumPy arrays can be very fast, or it can be very slow.

The key to making it fast is to use *vectorized* operations, generally implemented through NumPy's __universal functions (ufuncs)__.

Functions in Python

In [None]:
def greet(name):
  print("Hello " + name + "!")

greet("Hongbo")

Hello Hongbo!


In [None]:
def my_function(x):
  return x * 2

%timeit my_function(10)

The slowest run took 5.85 times longer than the fastest. This could mean that an intermediate result is being cached.
1.32 µs ± 751 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


# The Slowness of Loops

Python's default implementation (CPython) is dynamic and interpreted, which means that sequences of operations cannot be compiled down to efficient machine code the way they can in languages like C or Fortran.

This sluggishness shows up in situations where many small operations are repeated,

- e.g., looping over an array to operate on each element.

In [1]:
import numpy as np
np.random.seed(0)

def compute_reciprocals(values):

    output = np.empty(len(values))

    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

values = np.random.randint(1, 10, size=5)
print(values)
scratch = np.empty(len(values))  # np.empty allocates without initializing, so the contents are arbitrary
print(scratch)
compute_reciprocals(values)

[6 1 4 4 8]
[1. 1. 1. 1. 1.]


array([0.16666667, 1.        , 0.25      , 0.25      , 0.125     ])

In [5]:
np.empty?

In [6]:
big_array = np.random.randint(1, 100, size=1000000)
%timeit compute_reciprocals(big_array)

2.02 s ± 389 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


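That seems slow for one million operations: roughly two microseconds per element.

The culprit is not the division itself but the type checking and function dispatching that Python must do at every iteration of the loop.

If we were working with compiled code, the type would be known before the code runs and would not have to be checked for each item, so the computation could be much more efficient.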
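To make the per-element overhead concrete, here is a minimal sketch contrasting a Python list, where every element carries its own type, with a NumPy array, where one dtype describes the whole buffer. The names `mixed`, `element_types`, and `arr` are illustrative, and the dtype is fixed explicitly so the result does not depend on the platform.

```python
import numpy as np

# A Python list may mix types, so the interpreter has to inspect each
# element before it can decide how to carry out an operation on it.
mixed = [1, 2.5, "three"]
element_types = [type(v).__name__ for v in mixed]
print(element_types)  # ['int', 'float', 'str']

# A NumPy array stores a single dtype for the whole buffer, so a compiled
# loop can skip the per-element type checks.
arr = np.array([1, 2, 3], dtype=np.int64)  # dtype chosen explicitly for this sketch
print(arr.dtype)  # int64
```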

# Introducing UFuncs

### Universal function

A ufunc, or universal function, is a NumPy function that operates on ndarrays in an element-by-element fashion.

NumPy provides a convenient interface to statically typed, compiled routines.

- This is known as a *vectorized* operation.
- The operation is applied to the array, and NumPy in turn applies it to *each element*.
- This pushes the loop into the compiled layer underlying NumPy, making execution much faster.

In [3]:
print(compute_reciprocals(values))
print(1.0 / values)

[0.16666667 1.         0.25       0.25       0.125     ]
[0.16666667 1.         0.25       0.25       0.125     ]


In [4]:
%timeit (1.0 / big_array)

2.34 ms ± 474 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
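`%timeit` is an IPython magic, so it is not available in a plain Python script. As a rough sketch, the same comparison can be made with the standard-library `timeit` module. The function name `loop_reciprocals`, the array `sample`, and its size are assumptions made for this example (a smaller array than `big_array` keeps the run short); the timings will vary from machine to machine.

```python
import timeit

import numpy as np


def loop_reciprocals(values):
    """Element-by-element reciprocal using an explicit Python loop."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output


rng = np.random.default_rng(0)
sample = rng.integers(1, 100, size=10_000)  # hypothetical test data

# Time five runs of each approach; the absolute numbers depend on the machine.
loop_time = timeit.timeit(lambda: loop_reciprocals(sample), number=5)
ufunc_time = timeit.timeit(lambda: 1.0 / sample, number=5)

print(f"loop:  {loop_time:.4f} s for 5 runs")
print(f"ufunc: {ufunc_time:.4f} s for 5 runs")
```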


In [9]:
# ufuncs can operate between two arrays
print(np.arange(5))
print(np.arange(1, 6))
np.arange(5) / np.arange(1, 6)

[0 1 2 3 4]
[1 2 3 4 5]


array([0.        , 0.5       , 0.66666667, 0.75      , 0.8       ])

In [10]:
x = np.arange(9).reshape((3, 3))
# ufuncs can be applied to multidimensional arrays
2 ** x

array([[  1,   2,   4],
       [  8,  16,  32],
       [ 64, 128, 256]])

# Exploring NumPy's UFuncs

Ufuncs exist in two flavors:
- *unary ufuncs*, which operate on a single input
- *binary ufuncs*, which operate on two inputs
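One way to check which flavor a ufunc belongs to is its `nin` attribute, which reports the number of inputs it takes. A small sketch, using `np.negative` and `np.add` as representative examples:

```python
import numpy as np

# Every ufunc is an instance of np.ufunc and records how many inputs it takes.
print(isinstance(np.add, np.ufunc))  # True
print(np.negative.nin)  # 1, so a unary ufunc
print(np.add.nin)  # 2, so a binary ufunc
```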

## Array arithmetic

NumPy's ufuncs feel very natural to use because they make use of Python's native arithmetic operators:

In [11]:
x = np.arange(4)
print("x     =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2)  # floor division

x     = [0 1 2 3]
x + 5 = [5 6 7 8]
x - 5 = [-5 -4 -3 -2]
x * 2 = [0 2 4 6]
x / 2 = [0.  0.5 1.  1.5]
x // 2 = [0 0 1 1]


In [12]:
print("-x     = ", -x)  # negation
print("x ** 2 = ", x ** 2)  # exponentiation
print("x % 2  = ", x % 2)  # modulus

-x     =  [ 0 -1 -2 -3]
x ** 2 =  [0 1 4 9]
x % 2  =  [0 1 0 1]


In [13]:
# Operators can also be combined
-(0.5*x + 1) ** 2

array([-1.  , -2.25, -4.  , -6.25])

In [14]:
y = np.arange(1000).reshape(10,100)
y

array([[  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,
         13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,
         26,  27,  28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,
         39,  40,  41,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,
         52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,  64,
         65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,  77,
         78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,  90,
         91,  92,  93,  94,  95,  96,  97,  98,  99],
       [100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112,
        113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125,
        126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138,
        139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151,
        152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
        165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, ...

In [15]:
%timeit y ** 2

1.46 µs ± 441 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


Each of the previous operations is a *wrapper* around a specific NumPy function:

- e.g., the `+` operator is a wrapper for the `add` function.

In [16]:
np.add(x, 2)

array([2, 3, 4, 5])
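The same holds for the other operators used above. The sketch below pairs each operator with the NumPy function it maps to and checks that both spellings give identical results; the dictionary `pairs` and the flag `all_match` are names made up for this example.

```python
import numpy as np

x = np.arange(4)

# Each entry pairs an operator expression with the equivalent ufunc call.
pairs = {
    "+": (x + 5, np.add(x, 5)),
    "-": (x - 5, np.subtract(x, 5)),
    "unary -": (-x, np.negative(x)),
    "*": (x * 2, np.multiply(x, 2)),
    "/": (x / 2, np.divide(x, 2)),
    "//": (x // 2, np.floor_divide(x, 2)),
    "**": (x ** 2, np.power(x, 2)),
    "%": (x % 2, np.mod(x, 2)),
}

all_match = all(np.array_equal(a, b) for a, b in pairs.values())
print(all_match)  # True
```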

## Absolute value

NumPy also understands Python's built-in arithmetic functions:

- e.g., Python's built-in absolute value function, `abs`, whose corresponding ufunc is `np.absolute` (also available under the alias `np.abs`).

In [17]:
x = np.array([-2, -1, 0, 1, 2])
abs(x)

array([2, 1, 0, 1, 2])

In [18]:
np.abs(x)

array([2, 1, 0, 1, 2])
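`np.abs` also handles complex data, in which case it returns the magnitude of each element. A short sketch with made-up values chosen so the magnitudes are easy to verify by hand:

```python
import numpy as np

# For complex input, the absolute value is the magnitude sqrt(re**2 + im**2).
z = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])
magnitudes = np.abs(z)
print(magnitudes)
```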

## Trigonometric functions

NumPy provides a large number of useful ufuncs, and some of the most useful for the data scientist are the trigonometric functions.

In [19]:
theta = np.linspace(0, np.pi, 3)

In [20]:
print("theta      = ", theta)
print("sin(theta) = ", np.sin(theta))
print("cos(theta) = ", np.cos(theta))
print("tan(theta) = ", np.tan(theta))

theta      =  [0.         1.57079633 3.14159265]
sin(theta) =  [0.0000000e+00 1.0000000e+00 1.2246468e-16]
cos(theta) =  [ 1.000000e+00  6.123234e-17 -1.000000e+00]
tan(theta) =  [ 0.00000000e+00  1.63312394e+16 -1.22464680e-16]
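These values are computed to within machine precision, which is why entries that should be exactly zero come out as tiny numbers such as `1.2246468e-16`. When checking such results, comparing with a tolerance is usually safer than testing for exact equality. A minimal sketch using `np.isclose` with its default tolerances:

```python
import numpy as np

theta = np.linspace(0, np.pi, 3)
expected = np.array([0.0, 1.0, 0.0])

# Exact comparison fails for sin(pi), which is tiny but not exactly zero.
exact = np.sin(theta) == expected
print(exact)  # [ True  True False]

# Comparison within a tolerance treats the result as equal to zero.
close = np.isclose(np.sin(theta), expected)
print(close)  # [ True  True  True]
```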


In [None]:
np.linspace?

In [None]:
x = [-1, 0, 1]
print("x         = ", x)
print("arcsin(x) = ", np.arcsin(x))
print("arccos(x) = ", np.arccos(x))
print("arctan(x) = ", np.arctan(x))

## Exponents and logarithms

Another common family of operations available as NumPy ufuncs is the exponentials:

In [21]:
x = [1, 2, 3]
print("x     =", x)
print("e^x   =", np.exp(x))
print("2^x   =", np.exp2(x))
print("3^x   =", np.power(3, x))

x     = [1, 2, 3]
e^x   = [ 2.71828183  7.3890561  20.08553692]
2^x   = [2. 4. 8.]
3^x   = [ 3  9 27]


In [22]:
x = [1, 2, 4, 10]
print("x        =", x)
print("ln(x)    =", np.log(x))
print("log2(x)  =", np.log2(x))
print("log10(x) =", np.log10(x))

x        = [1, 2, 4, 10]
ln(x)    = [0.         0.69314718 1.38629436 2.30258509]
log2(x)  = [0.         1.         2.         3.32192809]
log10(x) = [0.         0.30103    0.60205999 1.        ]
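For very small inputs, NumPy also offers `np.expm1` (computing `exp(x) - 1`) and `np.log1p` (computing `log(1 + x)`), which keep more precision than the naive expressions. The sketch below compares the two approaches for one small value; `x_small`, `naive`, and `accurate` are illustrative names, and the exact digits printed may differ slightly between platforms.

```python
import numpy as np

x_small = 1e-10

# Forming 1 + x first rounds away most of the information carried by x.
naive = np.log(1 + x_small)

# log1p works from x directly and avoids that rounding step.
accurate = np.log1p(x_small)

# For tiny x, log(1 + x) is very nearly x itself, so x serves as a reference.
print("naive error:   ", abs(naive - x_small))
print("accurate error:", abs(accurate - x_small))

# expm1 is the matching inverse: expm1(log1p(x)) recovers x.
roundtrip = np.expm1(np.log1p(x_small))
```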


## Advanced ufunc features

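Specifying output: the `out` argument lets a ufunc write its result directly into an existing array rather than allocating a new one.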

In [34]:
x = np.arange(5)
print(x)

y = np.empty(5)
print(y)

np.multiply(x, 10, out=y)
print(y)

[0 1 2 3 4]
[0.0e+000 4.9e-324 9.9e-324 1.5e-323 2.0e-323]
[ 0. 10. 20. 30. 40.]


In [35]:
y = np.zeros(10)
print(y)
np.power(2, x, out=y[::2])
print(y)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 1.  0.  2.  0.  4.  0.  8.  0. 16.  0.]
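A small sketch of what `out` buys you: the ufunc returns the very array it was given, and writing into a view such as `z[::2]` fills the target in place. The alternative assignment `z_copy[::2] = 2 ** x` produces the same values, but it first builds a temporary array for `2 ** x` and then copies it. The names `result`, `z`, and `z_copy` are made up for this example.

```python
import numpy as np

x = np.arange(5)
y = np.empty(5)

# With out=, the ufunc writes into y and returns that same object.
result = np.multiply(x, 10, out=y)
print(result is y)  # True

# Writing straight into every other element of z through a view.
z = np.zeros(10)
np.power(2, x, out=z[::2])

# Same values, but 2 ** x is materialized as a temporary array first.
z_copy = np.zeros(10)
z_copy[::2] = 2 ** x

print(np.array_equal(z, z_copy))  # True
```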


## Aggregates

For binary ufuncs, there are some interesting aggregates that can be computed directly from the ufunc object.

- e.g., `reduce` repeatedly applies a given operation to the elements of an array until only a single result remains.

In [27]:
x = np.arange(1, 6)
print(x)
np.add.reduce(x)

[1 2 3 4 5]


15

In [26]:
np.multiply.reduce(x)

120

In [28]:
# If we'd like to store all the intermediate results of the computation, we can instead use accumulate
np.add.accumulate(x)

array([ 1,  3,  6, 10, 15])

In [None]:
np.multiply.accumulate(x)
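For these particular cases NumPy also provides dedicated functions (`np.sum`, `np.prod`, `np.cumsum`, `np.cumprod`) that compute the same results. A quick sketch checking the correspondence on the array used above; the variable names are illustrative.

```python
import numpy as np

x = np.arange(1, 6)

# reduce collapses the array to a single value.
total = np.add.reduce(x)
product = np.multiply.reduce(x)

# accumulate keeps every intermediate result.
running_sum = np.add.accumulate(x)
running_product = np.multiply.accumulate(x)

print(total == np.sum(x), product == np.prod(x))
print(np.array_equal(running_sum, np.cumsum(x)))
print(np.array_equal(running_product, np.cumprod(x)))
```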

## Outer products

Finally, any binary ufunc can compute the output of all pairs of two different inputs using the `outer` method.

In [30]:
x = np.arange(1, 6)
print(x)
np.multiply.outer(x, x)
# consider the first row and first column

[1 2 3 4 5]


array([[ 1,  2,  3,  4,  5],
       [ 2,  4,  6,  8, 10],
       [ 3,  6,  9, 12, 15],
       [ 4,  8, 12, 16, 20],
       [ 5, 10, 15, 20, 25]])
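The two inputs do not need to be the same array or even the same length, and `outer` is not specific to multiplication. As a sketch, here is `np.add.outer` on two small made-up arrays `a` and `b`: entry `[i, j]` of the result is `a[i] + b[j]`.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20])

# One row per element of a, one column per element of b.
table = np.add.outer(a, b)
print(table.shape)  # (3, 2)
print(table)
```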

# Summary: ufuncs

- Speed up computation significantly by pushing loops into compiled code.

- Are useful for array arithmetic, applying an operation to every value.

- Are also useful for aggregates, through methods such as `reduce`, `accumulate`, and `outer`.

- __N.B.__ If you get stuck, don't forget IPython's built-in help: append `?` to a function name.

In [31]:
np.multiply.outer?