## Vectorization


Suppose that we have three-dimensional vectors $ \mathbf u $ and $ \mathbf v $,
stored as NumPy arrays, and would like to take their inner (dot) product. Here's
the most obvious way of doing this in Python:

```python
import numpy as np

dimension = 3
u = np.array([1, 2, 3])
v = np.array([-1, 0, 1])

product = 0
for i in range(dimension):
    product += u[i] * v[i]
print(product)
```

```
2
```


The limitation of this approach is that it performs the required arithmetic
operations *in sequence*, with the Python interpreter executing every
multiplication and addition as a separate step. That is, we first compute the
product $ 1 \cdot (-1) $ and add it to our cumulative total; then we add
$ 2 \cdot 0 $; and finally we add $ 3 \cdot 1 $ to arrive at the result.
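In symbols, the loop evaluates the defining sum of the inner product one term at a time (the math convention indexes coordinates from $ 1 $ to $ 3 $, while Python's `range(dimension)` runs from `0` to `2`):

$$
\mathbf u \cdot \mathbf v = \sum_{i=1}^{3} u_i v_i = 1 \cdot (-1) + 2 \cdot 0 + 3 \cdot 1 = 2.
$$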

We can speed up computations of this kind significantly by processing the
entire arrays, or at least large chunks of them, at once. This technique is
called **vectorization**. It relies on the optimized implementations of vector
and matrix operations provided by libraries such as NumPy, which run the
underlying loops in compiled code and, whenever possible, can exploit SIMD
instructions and multi-core CPUs to execute work *in parallel*. Some related
libraries go further and run array operations on GPUs (Graphics Processing Units).

The vectorized version of the previous example would be:

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([-1, 0, 1])

product = np.dot(u, v)  # Compute the dot product of u and v.
print(product)
```

```
2
```
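NumPy offers a few equivalent spellings of this operation. In the small sketch below, for 1-D arrays, the `@` matrix-multiplication operator (Python 3.5+) and the array method `u.dot(v)` return the same inner product as `np.dot`:

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([-1, 0, 1])

# Three ways to take the inner product of two 1-D arrays.
print(np.dot(u, v))  # function form
print(u @ v)         # matrix-multiplication operator
print(u.dot(v))      # method form
```

For arrays with more than two dimensions these spellings are not always interchangeable (`@` follows the rules of `np.matmul`, which differ from those of `np.dot`), so it is worth checking the documentation before swapping one for another.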


This version is certainly more concise and legible. However, because the vectors
have only $ 3 $ dimensions, the performance boost is not noticeable. To better
appreciate the power of vectorization, let's consider the task of computing
the dot product of two vectors having $ 10 $ million dimensions. Here is the
vectorized version:

```python
import time  # Module that will allow us to time the computations

import numpy as np

# Generate two large vectors with random coordinates in [0, 1):
dimension = 10**7
u = np.random.rand(dimension)
v = np.random.rand(dimension)

tic = time.time()
prod = np.dot(u, v)
toc = time.time()
vect_runtime = toc - tic

print(f"The inner product is {prod}.")
print(f"Runtime for vectorized version: {1000 * vect_runtime} ms.")
```

```
The inner product is 2499843.426373514.
Runtime for vectorized version: 4.890680313110352 ms.
```


📝 The function `time.time()` used above returns a floating-point number that
represents the time in seconds since January 1st, 1970, 00:00:00 (UTC). The
duration of the computation was measured by recording the time before (`tic`)
and after (`toc`) it, and then taking the difference.
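A single `time.time()` measurement is sensitive to whatever else the machine happens to be doing at that moment. A common refinement, sketched below, is to repeat the measurement with the standard-library `timeit` module and keep the fastest trial. By default `timeit` uses `time.perf_counter()`, a clock better suited to short intervals than `time.time()`. The vector size of $ 10^6 $ and the trial counts are arbitrary choices made here to keep the example quick.

```python
import timeit

import numpy as np

dimension = 10**6  # smaller than above, an arbitrary choice for speed
u = np.random.rand(dimension)
v = np.random.rand(dimension)

# Run np.dot(u, v) `number` times per trial, for 5 trials.
number = 10
trials = timeit.repeat(lambda: np.dot(u, v), repeat=5, number=number)

# Each entry of `trials` is the total time (in seconds) of one trial;
# the fastest trial is the one least disturbed by background activity.
best_per_call = min(trials) / number
print(f"Best runtime per call: {1000 * best_per_call:.3f} ms")
```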

Considering the size of the vectors, the computation was pretty fast. Let's
now compare its performance with that of its non-vectorized counterpart.

```python
import time  # A module that will allow us to time the computations

# Reuses u, v, and dimension from the previous cell.
tic = time.time()
prod = 0
for i in range(dimension):
    prod += u[i] * v[i]
toc = time.time()
non_vect_runtime = toc - tic
print(f"The inner product is {prod}.")
print(f"Runtime for non-vectorized version: {non_vect_runtime} s.")
print(f"Or, in milliseconds: {1000 * non_vect_runtime}")
```

```
The inner product is 2499843.4263739116.
Runtime for non-vectorized version: 1.4287347793579102 s.
Or, in milliseconds: 1428.7347793579102
```


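📝 The slight discrepancy between the two values of the dot product is due to
rounding. Floating-point addition is not associative, so adding the same
$ 10^7 $ terms in a different order can give a slightly different result. The
loop adds the terms strictly one after another, whereas `np.dot` typically
delegates to an optimized linear-algebra routine that accumulates them in a
different order (for instance, in several partial sums). The long sequential
sum tends to accumulate more rounding error.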
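The non-associativity is easy to observe directly. The sketch below first adds three decimal constants (none of which is exactly representable in binary) in two different groupings. It then compares a sequential sum and `np.dot` against `math.fsum`, which returns a correctly rounded sum of its inputs and so can serve as a reference. The vector size of $ 10^5 $ and the random seed are arbitrary choices for illustration.

```python
import math

import numpy as np

# Floating-point addition is not associative:
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False

# Compare two summation orders against a correctly rounded reference.
rng = np.random.default_rng(0)
u = rng.random(10**5)
v = rng.random(10**5)

terms = (u * v).tolist()        # the products u_i * v_i as Python floats
sequential = 0.0
for t in terms:                 # strictly left-to-right accumulation
    sequential += t
vectorized = float(np.dot(u, v))
reference = math.fsum(terms)    # correctly rounded sum of the same products

print(f"sequential error: {abs(sequential - reference):.3e}")
print(f"np.dot error:     {abs(vectorized - reference):.3e}")
```

The exact error values depend on the data and on the linear-algebra library NumPy was built with, so they are not shown here.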

**Exercise:** Referring to the example above, compute the speedup factor of
the vectorized implementation, that is, the ratio of the non-vectorized runtime
to the vectorized one. Why does this value change every time you compute it?

This discussion shows that vectorization provides a simple way to speed up our
code, often by two or more orders of magnitude. This is especially crucial in
fields such as machine learning and computer graphics, which frequently involve
vast datasets and complex numerical computations. In such cases, code that does
not make use of vectorization is often too slow to be practical.

To summarize: *When performing operations on arrays, avoid explicit Python
loops whenever possible; instead, use the built-in vectorized implementations
provided by NumPy.*

## Vector- and matrix-valued functions

To illustrate another aspect of vectorization, let $ \mathbf u = (1, 2, 3) $ and
suppose that we need to apply the exponential function to each of the
coordinates of $ \mathbf u $. That is, suppose that we wish to compute
$ \exp(\mathbf u) = (e^1, e^2, e^3) $. We could simply use a for-loop:

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.zeros(3)  # will hold the result
for i in range(3):
    v[i] = np.exp(u[i])
print(v)
```

```
[ 2.71828183  7.3890561  20.08553692]
```


However, there is an alternative that is both simpler and more efficient,
namely vectorization. NumPy's `np.exp` is a *universal function* (ufunc): it
applies the exponential to every coordinate of the vector $ u $ in a single call:

```python
v = np.exp(u)
print(v)
```

```
[ 2.71828183  7.3890561  20.08553692]
```
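The section title also mentions matrices: the same call works unchanged on a 2-D array, acting on each entry separately. Here is a minimal sketch with an arbitrarily chosen $ 2 \times 2 $ matrix `A`:

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [2.0, 3.0]])

# np.exp acts entry by entry, so the result has the same shape as A.
E = np.exp(A)
print(E.shape)  # (2, 2)
print(E)
```

This entrywise exponential is *not* the matrix exponential $ e^A = \sum_{k=0}^{\infty} A^k / k! $ from linear algebra; if you need the latter, SciPy provides it as `scipy.linalg.expm`.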


The same idea applies to Python's arithmetic operators, which NumPy overloads to act entry by entry on arrays. For example:

```python
squared_u = u**2
print(squared_u)
```

```
[1 4 9]
```


```python
reciprocal_u = 1 / u
print(reciprocal_u)
```

```
[1.         0.5        0.33333333]
```
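Operators also combine two arrays of the same shape entry by entry, and a scalar operand is applied to every entry. The small sketch below uses a second vector `w`, chosen only for illustration, and also connects back to the inner product: summing the entrywise product reproduces `np.dot`.

```python
import numpy as np

u = np.array([1, 2, 3])
w = np.array([-1, 0, 1])  # illustrative second vector

print(u + w)   # entrywise sum: [0 2 4]
print(u * w)   # entrywise product: [-1  0  3]
print(u + 10)  # scalar added to each entry: [11 12 13]

# Summing the entrywise product gives the inner product.
print(np.sum(u * w), np.dot(u, w))  # 2 2
```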


__Exercise:__ As we did for the inner product, compare the performance of the vectorized and non-vectorized
approaches to computing $ e^u $, where $ u $ is a random vector with a large number of dimensions.
Approximately what is the speedup factor?

__Exercise:__ Let $ u = (1, 2, 3) $. Compute:

(a) $ \log(u) $ (that is, the vector whose coordinates are the natural logarithms of the coordinates of $ u $). *Hint:* Use the `np.log` function.

(b) $ \vert{u}\vert $ (that is, the vector whose coordinates are the absolute values of the coordinates of $ u $). *Hint:* Use the `np.abs` function.

(c) $ \max\{u, 3\} $. *Hint:* Use the `np.maximum` function (note that `np.max` instead returns the largest entry of a single array).

(d) $ \sin\big(\tfrac{\pi u}{2}\big) $. *Hint:* Use the `np.sin` function.

How did you interpret the expressions in items (c) and (d)?