# Jupyter Notebook

Jupyter wraps Python code in an interactive, cell-based interface and comes with a lot of convenience tools. There are two main modes you can be in:

- Edit mode: typing code or markdown into a cell
- Command mode: selecting cells and running them

Every Jupyter session starts with a fresh kernel: nothing from a previous session is kept in memory, so you have to re-run the cells when you open the notebook. You can run each cell individually by pressing Shift+Enter on the highlighted cell, or run all cells at once. The Cell menu at the top gives you these options. Explore that menu: there are not many options in it, but you will use them often.

# Numpy

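NumPy is the core of fast array and matrix computing in Python. Python lists are too slow for heavy numerical work: each element is a separate Python object, and every operation goes through the interpreter. NumPy stores data in a single contiguous block of typed memory and runs its operations in compiled C code, which makes them much faster. As an example we will use Jorge's problem of finding the closest pair of points between two point sets.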
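A small sketch of what "contiguous memory" means in practice: a NumPy array is one typed buffer, and `strides` tells you how many bytes to step to move along each axis. The 2×3 `float64` array here is an arbitrary choice for illustration.

```python
import numpy as np

# A 2x3 array of 64-bit floats stored in one contiguous block (row-major / C order)
a = np.arange(6, dtype=np.float64).reshape(2, 3)

print(a.flags['C_CONTIGUOUS'])  # True: rows are laid out back to back
print(a.itemsize)               # 8 bytes per float64
print(a.strides)                # (24, 8): next row is 3*8 bytes away, next column 8 bytes

# A Python list of floats stores pointers to separate float objects,
# so each multiplication and addition here runs through the interpreter.
lst = [float(x) for x in range(6)]
print(sum(x * 2 for x in lst) == (a * 2).sum())  # True: same arithmetic, different machinery
```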

In [54]:
#Here we want to go into a file and munge out the data
#We can do calculations in raw python first as well and time it.
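Following the comment in the cell above, here is a minimal pure-Python baseline for the closest-pair problem, using nested loops over lists of points. The function name `closest_pair_python` and the seeded test data are hypothetical; the point is only to have something to time against the NumPy version (for example with `%timeit closest_pair_python(ps1.tolist(), ps2.tolist())` once `ps1` and `ps2` exist).

```python
import math
import random

def closest_pair_python(pts1, pts2):
    """Brute-force closest pair between two point lists, pure Python.

    Returns (i, j, d): index into pts1, index into pts2, Euclidean distance.
    """
    best = (None, None, math.inf)
    for i, p in enumerate(pts1):
        for j, q in enumerate(pts2):
            d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
            if d < best[2]:
                best = (i, j, d)
    return best

# Hypothetical data with the same shapes as ps1 (40x3) and ps2 (30x3)
random.seed(0)
pts1 = [[random.random() for _ in range(3)] for _ in range(40)]
pts2 = [[random.random() for _ in range(3)] for _ in range(30)]
print(closest_pair_python(pts1, pts2))
```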



In [1]:
import numpy as np
ps1 = np.random.random((40,3))
ps2 = np.random.random((30,3))

ps1

array([[0.21353496, 0.3834453 , 0.56432614],
       [0.69772295, 0.39672954, 0.2747583 ],
       [0.3034489 , 0.41751568, 0.69772927],
       [0.30413305, 0.67913954, 0.37630028],
       [0.15937635, 0.40533564, 0.97341271],
       [0.69822766, 0.95237304, 0.49655389],
       [0.52424429, 0.52050969, 0.15214705],
       [0.64268822, 0.77106205, 0.23334826],
       [0.76984718, 0.90613019, 0.66798594],
       [0.02097098, 0.10248472, 0.35366696],
       [0.30289593, 0.68625763, 0.98727945],
       [0.69270558, 0.50318029, 0.40569417],
       [0.57050612, 0.01199752, 0.73033142],
       [0.40833829, 0.96199326, 0.9340937 ],
       [0.77476349, 0.92064619, 0.30979368],
       [0.71643566, 0.13264324, 0.93598299],
       [0.89097785, 0.9849063 , 0.55294764],
       [0.28109038, 0.37466919, 0.00901289],
       [0.62930505, 0.18521589, 0.29459521],
       [0.75078031, 0.88621428, 0.01533064],
       [0.25550922, 0.30569632, 0.68970241],
       [0.58248162, 0.11384877, 0.67044518],
       ...

In [56]:
ps1 + ps2 #shapes (40,3) and (30,3) don't line up, so this fails

ValueError: operands could not be broadcast together with shapes (40,3) (30,3) 

In [2]:
ps1[0:10, 0] #first coordinate of the first 10 points

array([0.21353496, 0.69772295, 0.3034489 , 0.30413305, 0.15937635,
       0.69822766, 0.52424429, 0.64268822, 0.76984718, 0.02097098])

In [3]:
ps1[:,:].shape #':' selects every index along an axis, so this is just the whole array

(40, 3)

In [4]:
ps1[:, None, :].shape, ps2[None,:,:].shape #Insert a dimension of size one into the arrays

((40, 1, 3), (1, 30, 3))

In [5]:
disp_vec_mat = ps1[:, None, :] - ps2[None,:,:] #broadcasting: (40,1,3) - (1,30,3) -> (40,30,3), every pairwise difference
disp_vec_mat.shape

(40, 30, 3)
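To convince yourself the broadcast subtraction does the right thing, you can compare its entries with the direct difference of the two points they should correspond to, and with an explicit double loop. This sketch regenerates point sets of the same shapes (with an arbitrary seed) so it runs on its own; `None` in an index means the same as `np.newaxis`.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility
ps1 = rng.random((40, 3))
ps2 = rng.random((30, 3))

# (40,1,3) - (1,30,3) broadcasts to (40,30,3)
disp = ps1[:, np.newaxis, :] - ps2[np.newaxis, :, :]

# disp[i, j] should be exactly ps1[i] - ps2[j]
i, j = 7, 12
print(np.array_equal(disp[i, j], ps1[i] - ps2[j]))  # True

# The same computation with an explicit double loop, for comparison
loop = np.empty((len(ps1), len(ps2), 3))
for a in range(len(ps1)):
    for b in range(len(ps2)):
        loop[a, b] = ps1[a] - ps2[b]
print(np.array_equal(disp, loop))  # True
```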

In [6]:
dist_mat = np.sqrt((disp_vec_mat ** 2).sum(axis=-1)) #Euclidean length of each displacement vector
dist_mat.shape

(40, 30)
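The square root of the summed squares is the Euclidean norm along the last axis, so `np.linalg.norm` with `axis=-1` should give the same matrix up to floating-point rounding. A short check on freshly generated points:

```python
import numpy as np

rng = np.random.default_rng(1)
ps1 = rng.random((40, 3))
ps2 = rng.random((30, 3))

disp = ps1[:, None, :] - ps2[None, :, :]          # (40, 30, 3)
dist_manual = np.sqrt((disp ** 2).sum(axis=-1))   # (40, 30), as in the cell above
dist_norm = np.linalg.norm(disp, axis=-1)         # same quantity via the norm helper

print(dist_norm.shape)                      # (40, 30)
print(np.allclose(dist_manual, dist_norm))  # True
```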

In [7]:
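#minimum distance: argmin() with no axis indexes the flattened matrix,
#unravel_index turns that into (row index into ps1, column index into ps2)
i, j = np.unravel_index(dist_mat.argmin(), dist_mat.shape)
print(i, j, dist_mat[i, j])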

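It helps to spell out what `argmin` returns with and without an axis, since they answer different questions. With no axis, `argmin` indexes the flattened array, and `np.unravel_index` turns that flat index back into a (row, column) pair. With `axis=1` you get, for every row point, the index of its nearest column point; `axis=0` gives the reverse. The tiny 2×3 matrix below is made up for illustration.

```python
import numpy as np

# Made-up distance matrix: 2 points in set A (rows) vs 3 points in set B (columns)
dist = np.array([[0.9, 0.4, 0.7],
                 [0.3, 0.8, 0.2]])

flat = dist.argmin()                          # 5: position in the flattened array
row, col = np.unravel_index(flat, dist.shape)
print(row, col, dist[row, col])               # 1 2 0.2 -> closest pair overall

print(dist.argmin(axis=1))  # [1 2]: nearest B point for each A point
print(dist.argmin(axis=0))  # [1 0 1]: nearest A point for each B point
```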

# Broadcasting

In [8]:
arr = np.arange(16).reshape(8,2)
arr.shape

(8, 2)

In [9]:
const = np.array([2])
const.shape

(1,)

In [10]:
(arr + const).shape

(8, 2)

In [11]:
vec = np.array([3,4])
vec.shape

(2,)

In [12]:
arr + vec #vec (shape (2,)) is added to every row of arr (shape (8,2))

array([[ 3,  5],
       [ 5,  7],
       [ 7,  9],
       [ 9, 11],
       [11, 13],
       [13, 15],
       [15, 17],
       [17, 19]])

In [13]:
arr + np.array([1,2,3]) #3 elements can't match the trailing dimension of size 2

ValueError: operands could not be broadcast together with shapes (8,2) (3,) 
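The rule behind which of these work: NumPy lines the shapes up from the right, and each pair of dimensions must either be equal or have one of them equal to 1 (a missing leading dimension counts as 1). So `(8,2)` with `(2,)` works, while `(8,2)` with `(3,)` fails because 2 ≠ 3, and `ps1 + ps2` failed because 40 ≠ 30. Below is a hand-written sketch of that rule; `broadcast_shape` is a hypothetical helper, checked against NumPy's own `np.broadcast_shapes` (available in NumPy 1.20 and later).

```python
import numpy as np

def broadcast_shape(s1, s2):
    """Sketch of NumPy's broadcasting rule for two shapes.

    Shapes are aligned from the right; each dimension pair must be equal,
    or one of them must be 1. Missing leading dimensions count as 1.
    """
    n = max(len(s1), len(s2))
    a = (1,) * (n - len(s1)) + tuple(s1)   # left-pad with 1s
    b = (1,) * (n - len(s2)) + tuple(s2)
    out = []
    for x, y in zip(a, b):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(f"cannot broadcast {tuple(s1)} with {tuple(s2)}")
    return tuple(out)

print(broadcast_shape((8, 2), (1,)))                # (8, 2)
print(broadcast_shape((8, 2), (2,)))                # (8, 2)
print(broadcast_shape((40, 1, 3), (1, 30, 3)))      # (40, 30, 3)
print(np.broadcast_shapes((40, 1, 3), (1, 30, 3)))  # (40, 30, 3): NumPy agrees

try:
    broadcast_shape((8, 2), (3,))
except ValueError as e:
    print(e)  # cannot broadcast (8, 2) with (3,)
```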

In [14]:
def broacast_op(arr1, arr2):
    return np.sqrt( ((arr1[:,None,:] - arr2[None,:,:])**2).sum(axis=-1) )

%timeit -n10 broacast_op(ps1, ps2)

10 loops, best of 3: 62.8 µs per loop
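One thing to keep in mind with the broadcast approach: it builds the full `(n, m, 3)` displacement array before summing, so memory grows with the product of the two set sizes. A common alternative (not what this notebook uses) expands the squared distance as |a - b|² = |a|² + |b|² - 2 a·b, which only needs an `(n, m)` intermediate from one matrix product. A sketch, with `pairwise_dist_dot` as a hypothetical name; rounding can push tiny squared distances slightly below zero, hence the clip at 0. The `broadcast_op` here is a local copy of the broadcast function above so the block runs on its own.

```python
import numpy as np

def pairwise_dist_dot(arr1, arr2):
    """Pairwise Euclidean distances via |a|^2 + |b|^2 - 2 a.b (no (n, m, 3) temporary)."""
    sq1 = (arr1 ** 2).sum(axis=1)[:, None]   # (n, 1) squared norms
    sq2 = (arr2 ** 2).sum(axis=1)[None, :]   # (1, m) squared norms
    sq = sq1 + sq2 - 2.0 * arr1 @ arr2.T     # (n, m) squared distances
    return np.sqrt(np.maximum(sq, 0.0))      # clip rounding noise below zero

def broadcast_op(arr1, arr2):
    # Local copy of the broadcast version above, for comparison
    return np.sqrt(((arr1[:, None, :] - arr2[None, :, :]) ** 2).sum(axis=-1))

rng = np.random.default_rng(2)
ps1 = rng.random((40, 3))
ps2 = rng.random((30, 3))
print(np.allclose(pairwise_dist_dot(ps1, ps2), broadcast_op(ps1, ps2)))
```

The trade-off is accuracy for very close points: the subtraction can lose precision when two points nearly coincide, which the direct broadcast version avoids.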


In [15]:
import numba
broadcast_op_numb = numba.jit(broacast_op)

%timeit -n10 broadcast_op_numb(ps1, ps2)

The slowest run took 2172.88 times longer than the fastest. This could mean that an intermediate result is being cached.
10 loops, best of 3: 72.7 µs per loop
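A likely reading of these timings: the very slow run is the first call, where Numba compiles the function, and after that there is no speedup because `broacast_op` is already a handful of vectorized NumPy calls whose inner loops run in compiled code. Numba tends to pay off when you write the explicit loops yourself, which also avoids the `(40, 30, 3)` temporary. The sketch below does that; `closest_pair_loops` is a hypothetical name, and the block falls back to plain Python if Numba is not installed so it still runs (just slowly).

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch runs without Numba: njit becomes a no-op decorator
    def njit(func):
        return func

@njit
def closest_pair_loops(arr1, arr2):
    """Return (i, j, dist) of the closest pair between an (n, 3) and an (m, 3) array."""
    best_i, best_j = 0, 0
    best_sq = np.inf
    for i in range(arr1.shape[0]):
        for j in range(arr2.shape[0]):
            sq = 0.0
            for k in range(arr1.shape[1]):
                diff = arr1[i, k] - arr2[j, k]
                sq += diff * diff
            if sq < best_sq:  # compare squared distances; take the sqrt once at the end
                best_sq = sq
                best_i, best_j = i, j
    return best_i, best_j, np.sqrt(best_sq)

rng = np.random.default_rng(3)
ps1 = rng.random((40, 3))
ps2 = rng.random((30, 3))
print(closest_pair_loops(ps1, ps2))
```

When timing this with `%timeit`, call it once beforehand so the compilation cost is not mixed into the measurement.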
