## NumPy and Computational Efficiency

This notebook illustrates how much faster linear algebra runs when you use the proper tools, such as NumPy.

Let's compute an array dot product in pure Python:

In [None]:
import numpy as np

In [None]:
def array_dot_product(v1, v2):
    dot_product = 0
    
    for v1_i, v2_i in zip(v1, v2):
        dot_product += v1_i * v2_i
    
    return dot_product

v1 = list(range(100))
v2 = list(range(100, 200))

print("v1 = %s\n" % v1)
print("v2 = %s\n" % v2)

result = array_dot_product(v1, v2)
print("v1 dot v2 = %d" % result)

Okay, it works, but how long does it take?

In [None]:
%timeit array_dot_product(v1, v2)

Now let's try NumPy. It stores data in typed, contiguous arrays, like C does, and runs mathematical operations in optimized compiled code, without the overhead of the Python interpreter.

In [None]:
v1_np = np.arange(100)
v2_np = np.arange(100, 200)
print("v1: %s\n" % v1_np)
print("v2: %s\n" % v2_np)

result = v1_np.dot(v2_np)
print("v1 dot v2 = %d" % result)

Nice, aligned formatting. Now let's check the running time.

In [None]:
%timeit v1_np.dot(v2_np)
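To make "data structures like in C" a bit more concrete, here is a minimal sketch comparing how a NumPy array and a Python list hold the same 100 integers. The names `py_list`, `np_array` and `list_bytes` are only illustrative, and the list size is a rough estimate based on `sys.getsizeof`.

```python
import sys
import numpy as np

py_list = list(range(100))
np_array = np.arange(100, dtype=np.int64)

# A NumPy array keeps fixed-size raw values back to back in one buffer.
print("dtype:", np_array.dtype)
print("bytes per element:", np_array.itemsize)
print("total data bytes:", np_array.nbytes)
print("C-contiguous:", np_array.flags["C_CONTIGUOUS"])

# A list keeps pointers to separate Python int objects, each with its own header.
# This is an approximation: list container size plus the size of every element.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print("approx. list bytes:", list_bytes)
```

The compact, uniform layout is what lets the compiled routines loop over the data without inspecting a Python object at every step.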

What about matrices?

In [None]:
def matrix_dot_product(m1, m2_t):
    # m2_t is the transpose of the second matrix, so each of its rows is a column of m2
    num_rows = len(m1)
    num_columns = len(m2_t)
    internal_dim = len(m1[0])
    result = []
    
    for i in range(num_rows):
        new_row = []
        for j in range(num_columns):
            total = 0
            for k in range(internal_dim):
                total += m1[i][k] * m2_t[j][k]
            new_row.append(total)
        result.append(new_row)
    
    return result

In [None]:
m1 = np.random.rand(100, 200)
m2 = np.random.rand(200, 300)

m2_t = m2.T
m1_list = m1.tolist()
m2_t_list = m2_t.tolist()
result_list = matrix_dot_product(m1_list, m2_t_list)
result_numpy = m1.dot(m2)

Checking the results...

In [None]:
result_list = np.array(result_list)
result_list == result_numpy

Different? How much?

In [None]:
result_list - result_numpy

In [None]:
np.abs(result_list - result_numpy).sum()
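Floating-point sums can differ in the last few bits when the terms are added in a different order, so exact `==` is usually too strict for this kind of check. A minimal sketch of a tolerance-based comparison, using `np.einsum` as a stand-in for a second implementation (the seed and the variable names are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
a = rng.random((100, 200))
b = rng.random((200, 300))

# The same matrix product computed by two different code paths.
dot_result = a.dot(b)
einsum_result = np.einsum("ik,kj->ij", a, b)

# Exact equality may or may not hold, depending on the summation order.
print("exactly equal everywhere:", np.array_equal(dot_result, einsum_result))

# The largest element-wise gap shows how small any discrepancy is.
max_diff = np.abs(dot_result - einsum_result).max()
print("largest absolute difference:", max_diff)

# allclose compares within a relative and absolute tolerance.
print("equal within tolerance:", np.allclose(dot_result, einsum_result))
```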

Okay. Now let's time it again.

In [None]:
%timeit matrix_dot_product(m1_list, m2_t_list)

In [None]:
%timeit m1.dot(m2)

In [None]:
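# Timings in seconds, copied by hand from one run of the two cells above; yours will differ
time1 = 647e-3  # pure Python
time2 = 215e-6  # NumPy
print('NumPy is ~{:.0f}x faster than pure Python'.format(time1 / time2))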
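Hard-coded timings only describe one run on one machine. As a rough sketch, the standard `timeit` module can measure both versions and compute the ratio directly. The vector length, `n_runs` and the compact `array_dot_product` below are assumptions made for this example, not the values used above.

```python
import timeit
import numpy as np

def array_dot_product(v1, v2):
    # Pure-Python dot product, one multiplication per loop iteration.
    return sum(a * b for a, b in zip(v1, v2))

v1 = list(range(1000))
v2 = list(range(1000, 2000))
v1_np = np.array(v1)
v2_np = np.array(v2)

n_runs = 200  # arbitrary; more runs give steadier averages

# timeit.timeit returns the total time for n_runs calls, so divide to get one call.
time_python = timeit.timeit(lambda: array_dot_product(v1, v2), number=n_runs) / n_runs
time_numpy = timeit.timeit(lambda: v1_np.dot(v2_np), number=n_runs) / n_runs

speedup = time_python / time_numpy
print("pure Python: {:.2e} s per call".format(time_python))
print("NumPy:       {:.2e} s per call".format(time_numpy))
print("NumPy is ~{:.0f}x faster here".format(speedup))
```

The exact ratio depends on the hardware, the array size and the NumPy build, so treat the number as an order-of-magnitude estimate.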

## Enter PyTorch

In [None]:
import torch

In [None]:
m1_pt = torch.from_numpy(m1)
m2_pt = torch.from_numpy(m2)

In [None]:
%timeit m1_pt @ m2_pt

Seems about the same... Now let's try to use a GPU:

In [None]:
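m1_pt = m1_pt.to('cuda')
m2_pt = m2_pt.to('cuda')
# CUDA kernels are launched asynchronously, so wait for the GPU before stopping the clock
%timeit m1_pt @ m2_pt; torch.cuda.synchronize()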
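GPU timings are easy to get wrong: a CUDA matmul returns as soon as the kernel is queued, so a naive timer can end up measuring little more than the launch. Below is a minimal sketch of a manual timing loop that waits for the device before reading the clock. The helper `timed_matmul`, the `n_runs` value and the CPU fallback are assumptions added for this example.

```python
import time
import torch

# Fall back to the CPU so the sketch still runs without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.rand(100, 200, device=device)
b = torch.rand(200, 300, device=device)

def timed_matmul(x, y, n_runs=100):
    """Average seconds per matmul (hypothetical helper for illustration)."""
    x @ y  # warm-up call, so one-time setup cost is not counted
    if x.is_cuda:
        torch.cuda.synchronize()  # make sure the warm-up has finished
    start = time.perf_counter()
    for _ in range(n_runs):
        x @ y
    if x.is_cuda:
        torch.cuda.synchronize()  # wait for all queued kernels to finish
    return (time.perf_counter() - start) / n_runs

seconds = timed_matmul(a, b)
print("{}: {:.2e} s per matmul".format(device, seconds))
```

For matrices this small, the GPU may not be faster than the CPU at all, since launch overhead can dominate the computation.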

Can we make things more efficient? Enter **JIT**.

In [None]:
@torch.jit.script
def jit_mm(m1, m2):
    return m1 @ m2

%timeit jit_mm(m1_pt, m2_pt)

In [None]:
%timeit jit_mm(m1_pt, m2_pt)

In [None]:
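# Trace with small example inputs; the recorded operations are then replayed on the real matrices
traced_mm = torch.jit.trace(jit_mm, (torch.rand(2, 2), torch.rand(2, 2)))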

%timeit traced_mm(m1_pt, m2_pt)

In [None]:
%timeit traced_mm(m1_pt, m2_pt)
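Before comparing speeds, it is worth confirming that the scripted and traced versions compute the same thing as plain eager code. A small sketch on the CPU, where `eager_mm` and the test matrices are names introduced only for this check:

```python
import torch

def eager_mm(m1, m2):
    # Plain eager-mode matrix multiplication, used as the reference.
    return m1 @ m2

# Scripting compiles the function from its source code;
# tracing records the operations executed on the example inputs.
scripted_mm = torch.jit.script(eager_mm)
traced_mm = torch.jit.trace(eager_mm, (torch.rand(2, 2), torch.rand(2, 2)))

a = torch.rand(100, 200)
b = torch.rand(200, 300)
expected = eager_mm(a, b)

scripted_ok = torch.allclose(scripted_mm(a, b), expected)
traced_ok = torch.allclose(traced_mm(a, b), expected)
print("scripted matches eager:", scripted_ok)
print("traced matches eager:", traced_ok)
```

A single matmul leaves very little for the JIT to optimize, so similar timings across the three versions would not be surprising.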