In [1]:
%%HTML
<style>
.container { width:100% }
</style>

## Vectorization is Fast!

This short notebook demonstrates that working with <tt>NumPy</tt> arrays is much faster than working with *Python* lists.

In [2]:
import numpy as np

We begin by defining two <tt>NumPy</tt> arrays `a` and `b` that are each filled with a million random numbers.

In [3]:
a = np.random.rand(1000000)
b = np.random.rand(1000000)

Next, we compute the <em style="color:blue;">dot product</em> of `a` and `b`.  Mathematically, this is defined as follows:
$$ \textbf{a} \cdot \textbf{b} = \sum\limits_{i=1}^n \textbf{a}[i] \cdot \textbf{b}[i], $$
where $n$ is the dimension of `a` and `b`.  In *Python*, the operator `@` computes the *dot product* of two <tt>NumPy</tt> arrays.

In [4]:
%%time 
a @ b

CPU times: user 4.61 ms, sys: 2.93 ms, total: 7.54 ms
Wall time: 4.35 ms


250086.59728420374
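As a small sanity check, the following sketch verifies on a hand-checkable example that `@` agrees with the summation formula given above.  The arrays `u` and `v` and the helper `dot_by_formula` are illustrative names that do not appear elsewhere in this notebook.

```python
import numpy as np

def dot_by_formula(x, y):
    """Compute the dot product directly from the summation formula."""
    total = 0.0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total

# Two short vectors whose dot product is easy to check by hand:
# 1*4 + 2*5 + 3*6 = 32
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

print(u @ v)                 # operator form
print(np.dot(u, v))          # equivalent function call for 1-D arrays
print(dot_by_formula(u, v))  # explicit loop over the indices
```

All three expressions compute the same number; only the first two delegate the loop to <tt>NumPy</tt>.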

To compare this with the time needed when `a` and `b` are stored as lists instead, we convert `a` and `b` to ordinary *Python* lists.

In [5]:
la = list(a)
lb = list(b)

Next, we compute the <em style="color:blue;">dot product</em> of `a` and `b` using these lists.

In [6]:
%%time
total = 0  # named `total` because `sum` would shadow the built-in function
for i in range(len(la)):
    total += la[i] * lb[i]

CPU times: user 304 ms, sys: 1.98 ms, total: 306 ms
Wall time: 305 ms
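Since `%%time` reports a single run, the measured values fluctuate between executions.  The sketch below shows one possible way to repeat the comparison with the standard `timeit` module and to take the best of several runs.  The array size of 100,000, the seed, and the names `x`, `y`, `lx`, `ly`, and `dot_list` are assumptions made for this illustration.  Here the lists are built with `tolist()`, so their elements are plain *Python* floats.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)   # seeded generator (assumption, for reproducible data)
x = rng.random(100_000)
y = rng.random(100_000)
lx = x.tolist()                  # tolist() converts the elements to plain Python floats
ly = y.tolist()

def dot_list(p, q):
    """Dot product of two equally long lists using the built-in sum."""
    return sum(s * t for s, t in zip(p, q))

# Run each variant 10 times per measurement, repeat 5 times, keep the best.
t_numpy = min(timeit.repeat(lambda: x @ y, number=10, repeat=5)) / 10
t_list = min(timeit.repeat(lambda: dot_list(lx, ly), number=10, repeat=5)) / 10

print(f"NumPy: {t_numpy * 1e3:.3f} ms per call")
print(f"list:  {t_list * 1e3:.3f} ms per call")
print(f"ratio: {t_list / t_numpy:.0f}")
```

The absolute numbers depend on the machine and the array size, so the printed ratio should be read as an indication rather than a fixed constant.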


We notice that the <tt>NumPy</tt>-based computation is much faster than the list-based computation: in the run above it needed about 4 ms of wall time compared to about 305 ms, a speed-up of roughly a factor of 70.  Similar observations can be made when a function is applied to all elements of an array.  For big arrays, using the vectorized functions offered by <tt>NumPy</tt> is usually much faster than applying the function to each element of a list in a loop.

In [7]:
import math

In [9]:
%%time
for i, x in enumerate(la):
    lb[i] = math.sin(x)

CPU times: user 272 ms, sys: 2.08 ms, total: 274 ms
Wall time: 273 ms


In [10]:
%%time
b = np.sin(a)

CPU times: user 11.9 ms, sys: 2.58 ms, total: 14.5 ms
Wall time: 13.2 ms
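In the two runs above, the loop with `math.sin` took 273 ms of wall time, while `np.sin` took 13.2 ms.  The following sketch checks on a few sample points that both approaches yield the same values, so the speed-up does not come at the cost of different results.  The names `xs`, `loop_result`, and `vec_result` are chosen for this illustration only.

```python
import math
import numpy as np

xs = np.linspace(0.0, math.pi, 5)          # five sample points between 0 and pi

loop_result = [math.sin(x) for x in xs]    # one call to math.sin per element
vec_result = np.sin(xs)                    # a single vectorized call for the whole array

# Compare the two results up to floating-point tolerance.
print(np.allclose(loop_result, vec_result))
```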
