# NumPy

## Vectorization
= having the CPU apply one operation to multiple data elements at the same time

Natively not possible in Python because of “memory fragmentation”:
Lists contain pointers to objects, not the objects themselves. And the “BUS” can only move contiguous chunks of data to the cache at once.
But with pointers, the objects themselves are scattered all over RAM.

Luckily, there is NumPy, which both stores its data contiguously and knows how to leverage the CPU for vectorization.
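The contiguous layout can be inspected directly. The following is a minimal sketch, not part of the original notebook; the variable names are made up for illustration, and the `int64` dtype is chosen explicitly so that the byte counts are the same on every platform.

```python
import numpy as np

# A Python list holds pointers to separately allocated int objects.
values = list(range(5))

# A NumPy array holds the raw values back to back in a single buffer.
arr = np.arange(5, dtype=np.int64)

is_contiguous = arr.flags["C_CONTIGUOUS"]  # one unbroken block of memory
step_bytes = arr.strides[0]                # bytes from one element to the next
total_bytes = arr.nbytes                   # itemsize * number of elements

print(is_contiguous, step_bytes, arr.itemsize, total_bytes)
```

A stride equal to the item size means there are no gaps between neighbouring elements, which is what lets whole chunks of the array be moved to the cache at once.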


In [1]:
import numpy as np

In [2]:
def dot_list_comprehension(x, y):
    return sum([x[i] * y[i] for i in range(len(x))])

In [3]:
def dot_numpy(x, y):
    return np.dot(x, y)

In [4]:
x = list(range(1000))
y = list(range(1000))

In [5]:
%timeit dot_list_comprehension(x, y)

97.4 µs ± 2.26 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [16]:
x_array = np.arange(1000)
y_array = np.arange(1000)

In [7]:
%timeit dot_numpy(x_array, y_array)

2.08 µs ± 60.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [9]:
assert dot_list_comprehension(x, y) == dot_numpy(x_array, y_array)

### Note!
There is, however, an overhead in creating NumPy arrays from lists, e.g. this is actually slower than the list comprehension:

In [10]:
%timeit dot_numpy(x, y)

150 µs ± 3.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
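One way to avoid paying that conversion cost on every call is to convert the lists once and reuse the arrays. The sketch below assumes the same `x` and `y` as above; the helper names `dot_from_lists` and `dot_from_arrays` are hypothetical, and it uses the standard `timeit` module instead of the `%timeit` magic so it also runs outside a notebook. Actual timings depend on the machine, so none are shown.

```python
import timeit
import numpy as np

x = list(range(1000))
y = list(range(1000))

# Convert once, outside the hot path.
x_array = np.asarray(x)
y_array = np.asarray(y)

def dot_from_lists():
    # np.dot has to turn both lists into arrays on every call
    return np.dot(x, y)

def dot_from_arrays():
    # no conversion needed, the data is already in array form
    return np.dot(x_array, y_array)

t_lists = timeit.timeit(dot_from_lists, number=1000)
t_arrays = timeit.timeit(dot_from_arrays, number=1000)
print(f"from lists:  {t_lists:.4f} s for 1000 calls")
print(f"from arrays: {t_arrays:.4f} s for 1000 calls")
```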


## In-place operations

In-place operations alter the object itself.

Non-in-place operations create a new object in memory, which can be expensive for large arrays.

In [17]:
z_array = np.arange(1000)

In [45]:
print(f"id of z {id(z_array):>44}")
z_array += y_array
print(f"id of z after in-place addition {id(z_array):>20}")
z_array = z_array + y_array
print(f"id of z after non-in-place addition  {id(z_array):>10}")

id of z                              140330476183152
id of z after in-place addition      140330476183152
id of z after non-in-place addition  140330476182864


In [46]:
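def add_not_in_place(x, y):
    # allocates a new array and rebinds the local name x; the caller's array is untouched
    x = x + y
    return x

def add_in_place(x, y):
    # writes the result into the caller's array, so repeated calls keep changing it
    x += y
    return x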

In [47]:
%timeit add_not_in_place(x_array, y_array)

978 ns ± 23.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [48]:
%timeit add_in_place(x_array, y_array)

1.18 µs ± 34.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
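Note that the in-place version is not faster in the timings above, so for arrays this small the extra allocation does not seem to dominate. The two functions also differ in a way the timings hide: `add_in_place` modifies the array that was passed in, so `x_array` changes on every iteration of the timing loop. The sketch below illustrates this on small throwaway arrays (`a` and `b` are hypothetical names, not from the notebook) and also shows `np.add` with `out=` as an explicit spelling of the in-place form.

```python
import numpy as np

def add_not_in_place(x, y):
    x = x + y
    return x

def add_in_place(x, y):
    x += y
    return x

a = np.arange(5)
b = np.ones(5, dtype=a.dtype)

# Non-in-place: the result is a new array and a keeps its values.
result_new = add_not_in_place(a, b)
caller_unchanged = np.array_equal(a, np.arange(5))

# In-place: the returned array is a itself, and a has changed.
result_same = add_in_place(a, b)
caller_changed = np.array_equal(a, np.arange(5) + 1)
same_object = result_same is a

# Explicit in-place add: write the sum into the existing buffer of a.
np.add(a, b, out=a)

print(caller_unchanged, caller_changed, same_object)
```

When the original values are still needed, pass a copy (`a.copy()`) to the in-place function.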
