## Lecture 3 - making code go fast feat. NumPy

### Intro to Numpy

Numpy is built around the **ndarray**, which you can think of as a matrix of arbitrary dimension.

![ndarray instantiation](https://miro.medium.com/v2/resize:fit:1200/1*sxnhgeSptW8Jfol8XUyP-Q.png)

In [None]:
import numpy as np

Most common ways of instantiating numpy arrays:
- np.array(list)
- np.zeros(shape), where shape is a tuple such as (3, 4)
- np.arange(start, stop, step)
    
Properties of arrays:
- array.shape: a tuple giving the length of the array along each dimension. E.g. generate_binary_numbers(5).shape = (32, 5)
- array.ndim: the number of dimensions, i.e. len(array.shape).
- array.size: the total number of elements, i.e. np.prod(array.shape)
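
As a quick sanity check of the constructors and properties listed above, here is a minimal sketch. The variable names are made up for illustration.

```python
import numpy as np

from_list = np.array([[1, 2, 3], [4, 5, 6]])  # from a nested list
zeros = np.zeros((2, 3, 4))                    # shape is passed as a tuple
evens = np.arange(0, 10, 2)                    # start, stop (exclusive), step

for arr in (from_list, zeros, evens):
    # ndim is the length of shape; size is the product of its entries
    assert arr.ndim == len(arr.shape)
    assert arr.size == np.prod(arr.shape)
    print(arr.shape, arr.ndim, arr.size)
```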

#### Accessing elements of arrays: array slicing

Basic slicing works the same way as on Python lists, just across multiple dimensions, potentially. For example:

In [None]:
a = np.arange(25).reshape(5,5)
print(a)

In [None]:
a[1:4, 0:2]

###### mini-exercises (3 hands)

What is the output of a[3:, :3]?

How about a[2:4, 1::2]?

#### advanced indexing

Numpy is much cooler than base Python. Specifically, you can index a numpy array with *a numpy array*. This is called "advanced indexing". A simple example:

In [None]:
zip_code_index = np.array([6,0,6,3,7])
b = np.arange(30)**2
print(b)

What do I get here? (3 hands)

In [None]:
b[zip_code_index]

This can get somewhat wild - what do I get as the output of these two cells?

In [None]:
b[a]

In [None]:
a[b[:3]]

Note that even though b is a one-dimensional array, we can index it with a two-dimensional array, and the result takes the shape of the index array!!
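
To check the shapes from the two cells above, here is a small self-contained sketch that rebuilds `a` and `b` as defined earlier. The names `picked` and `rows` are just for illustration.

```python
import numpy as np

a = np.arange(25).reshape(5, 5)
b = np.arange(30) ** 2

picked = b[a]      # the index array has shape (5, 5), so the result does too
rows = a[b[:3]]    # b[:3] is [0, 1, 4], so this selects rows 0, 1 and 4 of a

print(picked.shape)
print(rows.shape)
```

Each entry of `picked` is `b` evaluated at the corresponding entry of `a`, so `picked[i, j] == a[i, j] ** 2`.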

##### Advanced indexing: use case

Boolean arrays can be easily generated in numpy:

In [None]:
a > 12

and can be used to index arrays (most commonly the array itself):

In [None]:
a[a > 12] = 100
a

#### Operations on arrays

We just saw one example - we set some values in an array to an integer. Other examples:

In [None]:
c = np.repeat(np.arange(5),5).reshape(5,5)
c

In [None]:
a

In [None]:
a * c

In [None]:
a + c

Operations are *all* element-wise unless otherwise specified (e.g. for "normal" matrix multiplication, use @). Because things are elementwise, arrays of the same shape can be operated on as you would expect. But what about something like

In [None]:
a + c[:, 0]

What happened here?

#### Broadcasting

Broadcasting is numpy's process of attempting to "morph" two arrays into having the same shape so that element-wise operations can be applied. (I thought this was more black magic and I just sort of... tried transposing arrays, reshaping things, etc. until something worked, until about two months ago. Now I more or less understand broadcasting, and it's actually pretty simple.)

Broadcasting works as follows:

Numpy _prepends_ dimensions of size 1 to the shape of the lower-dimensional array until both arrays have the same number of dimensions. It then compares the shapes dimension by dimension, starting from the rightmost entry of each shape tuple, and deems the two arrays compatible if, for each dimension, either:
1. both arrays have the same size, or
2. one (or both) of the arrays has size 1.

For every dimension where one array has size 1, numpy then "stretches" that array along the dimension to match the other array's size before doing the operation. Here is a picture depicting this process:

![ndarray broadcasting](https://numpy.org/doc/stable/_images/broadcasting_2.png)

As a numerical example, if you have matrices d with d.shape = (8,3,1,8) and e with e.shape = (3,5,1), d + e will not throw an error and will have shape (8,3,5,8).

So, from the earlier code, a + c[:, 0] worked because a has shape (5,5) and c[:, 0] has shape (5,), which numpy treats as (1,5) and then stretches to (5,5). Entry j of c[:, 0] is therefore added to every element of column j of a.
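
The rules above can be checked directly. This is a minimal sketch, assuming NumPy 1.20 or newer for `np.broadcast_shapes`. The names `col`, `row_wise` and `col_wise` are made up for illustration.

```python
import numpy as np

d = np.zeros((8, 3, 1, 8))
e = np.zeros((3, 5, 1))
print((d + e).shape)
print(np.broadcast_shapes(d.shape, e.shape))  # same answer without doing the addition

a = np.arange(25).reshape(5, 5)
col = np.arange(5)                   # shape (5,), like c[:, 0]
row_wise = a + col                   # (5,) -> (1, 5): col[j] is added to column j
col_wise = a + col[:, np.newaxis]    # shape (5, 1): col[i] is added to row i
```

Adding an explicit axis with `np.newaxis` is one way to control which direction a 1D array gets stretched in.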

###### mini exercises:

For each of the following, determine whether the two arrays have compatible dimensions, and if they do, what the dimensions of the resulting array after a binary operation are.

1. f.shape = (5,1,3,2), g.shape = (1,3)
2. f.shape = (5,1,3,2), h.shape = (1,3,1)
3. f.shape = (5,1,3,2), i.shape = (1,3,1,1)
4. f.shape = (5,1,3,2), j.shape = (1,3,1,1,1)
5. f.shape = (5,1,3,2), k.shape = (1,3,1,1,1,1)

Everything else in numpy is just functions. Numpy (+ scipy) has functions for everything you could ever want, seriously. As an example, I was calculating p-values by fitting points to a null distribution and using the definition of a p-value as 1-cdf. Some of my p-values were very, very small, so they were being returned as 0, which made their log -inf, etc.

Turns out every (continuous) distribution in scipy.stats has a `logsf` method that returns log(1-cdf) with more precision than manually computing the log of 1 minus the cdf. wild.
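
Here is a small sketch of that precision problem, using a standard normal as a stand-in null distribution. The choice of distribution and of `x = 40` are assumptions for illustration, not the original analysis.

```python
import numpy as np
from scipy import stats

x = 40.0  # far in the upper tail of a standard normal

with np.errstate(divide="ignore"):
    # 1 - cdf rounds to exactly 0 in floating point, so the log is -inf
    naive = np.log(1 - stats.norm.cdf(x))

# logsf computes log(1 - cdf) directly and stays finite
direct = stats.norm.logsf(x)

print(naive)
print(direct)
```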

Lastly, the axis keyword is important and a little confusing - basically applies a numpy function along a "direction":

In [None]:
d = np.arange(15).reshape(3,5)
print(d)
print(np.sum(d))
print(np.sum(d, axis=0))
print(np.sum(d, axis=1))

Oftentimes, you know the shape of the result you want but not which axis that corresponds to. For example, you want to average something over time within 100 different experiments: is that axis=0? axis=1?

My preferred way to remember this is that axis=i deletes the ith entry from the shape. For d.shape = (3,5), axis=0 gives shape (5,) and axis=1 gives shape (3,).
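
Applied to the experiments example, the "delete the ith entry" rule looks like this. It is a sketch with made-up data, assuming experiments are stored along rows and time points along columns.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 100 experiments (rows), 50 time points each (columns)
measurements = rng.random((100, 50))

per_experiment = measurements.mean(axis=1)   # drops the size-50 axis -> shape (100,)
per_timepoint = measurements.mean(axis=0)    # drops the size-100 axis -> shape (50,)

# keepdims=True leaves a size-1 axis behind instead of deleting it: shape (100, 1)
kept = measurements.mean(axis=1, keepdims=True)
```

The `keepdims=True` form is handy when the result needs to broadcast back against the original array, e.g. `measurements - kept`.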

<center><h1>how to make the code fast</h1></center>

<br/>

### The golden rule of code optimization

<br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/>






# DON'T DO IT


<br/><br/>
<br/><br/>
<br/><br/>
<br/><br/>
<br/><br/>



## yet





### make sure your code does _exactly_ what you want and does so _correctly_ before even thinking about making it fast.

### Then:
- vectorize
- reduce function calls + other cleverness
- present an offering to our savior, Numba, and hope they smile favorably upon your code

**working example**: an extension to HW 1 problem 3 - adapted from Maryn Carlson's code

In [None]:
import numpy as np
def binary_digits(n):
    if n == 1:
        return [[0], [1]]
    return [[*row, i] for row in binary_digits(n-1) for i in range(2)]

In [None]:
bin_mat_small = binary_digits(3)
bin_mat_small

In [None]:
bin_mat_10 = binary_digits(10)

Problem: for every pair of rows i and j of a binary matrix, count the number of positions where both rows equal 1.

In [None]:
def create_counts_matrix_purepython(binMat):
    nstates = len(binMat)
    n = len(binMat[0])
    counts = []

    for i in range (nstates):
        counts.append([])
        for j in range(nstates):
            cell_val = 0
            for row_idx in range(n):
                if binMat[i][row_idx] == 1 and binMat[j][row_idx] == 1:
                    cell_val += 1
            counts[i].append(cell_val)
    return counts

In [None]:
create_counts_matrix_purepython(bin_mat_small)

In [None]:
test_result = create_counts_matrix_purepython(bin_mat_10)

### profiling - "this took a while, but how long, exactly?"

In [None]:
%%timeit -r 3 -n 3
create_counts_matrix_purepython(bin_mat_10)

##### tqdm aside:

In [None]:
from tqdm.notebook import tqdm
def create_counts_matrix_tqdm(binMat):
    nstates = len(binMat)
    n = len(binMat[0])
    counts = []

    for i in tqdm(range(nstates)):
        counts.append([])
        for j in range(nstates):
            cell_val = 0
            for row_idx in range(n):
                if binMat[i][row_idx] == 1 and binMat[j][row_idx] == 1:
                    cell_val += 1
            counts[i].append(cell_val)
    return counts

In [None]:
_ = create_counts_matrix_tqdm(bin_mat_10)

In [None]:
%load_ext line_profiler

In [None]:
%lprun -f create_counts_matrix_purepython create_counts_matrix_purepython(bin_mat_10)

alternatives: cProfile + snakeviz (*show plot*), scalene (I can't read it but it's apparently good)
https://coderzcolumn.com/tutorials/python/snakeviz-visualize-profiling-results-in-python is a good intro to snakeviz
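
Outside a notebook, cProfile can also be driven from plain Python. Here is a minimal sketch using only the standard library. `slow_pairwise_count` is a hypothetical toy stand-in for the counting function, not the code from above.

```python
import cProfile
import io
import pstats


def slow_pairwise_count(rows):
    """Toy stand-in: count shared 1s for every pair of rows (hypothetical helper)."""
    return [[sum(x == 1 and y == 1 for x, y in zip(r1, r2)) for r2 in rows]
            for r1 in rows]


# All 6-bit binary numbers, one per row
rows = [[(i >> k) & 1 for k in range(6)] for i in range(64)]

profiler = cProfile.Profile()
profiler.enable()
result = slow_pairwise_count(rows)
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)
report = buffer.getvalue()
print(report)
```

cProfile reports time per function rather than per line, so it complements line_profiler rather than replacing it.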

### level 1: vectorization

Vectorization just means converting for loops to numpy operations. As a quick example, the formula for the allele frequency in the n+1st generation under selection is $$p' = p + sp(1-p)/2$$. Two ways to compute a vector of allele frequencies in generation n+1 given a vector of allele frequencies in generation n and a selection coefficient are:

In [None]:
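import numpy as np
p = np.random.random(size=100)  # allele frequencies in generation n
s = 0.01                        # selection coefficient

# 1. explicit for loop
p_prime = np.zeros_like(p)
for i in range(p.shape[0]):
    p_prime[i] = p[i] + s*p[i]*(1-p[i])/2

# 2. vectorized
p_prime = p + s*p*(1-p)/2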

The second is faster, easier to understand, and closer to the formula above in appearance.
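
To see both claims (same answer, less time) on your own machine, here is a rough sketch using the standard-library `timeit` module. The function names are made up, and the exact timings will vary.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
p = rng.random(10_000)
s = 0.01


def next_gen_loop(p, s):
    # one Python-level iteration per allele frequency
    out = np.zeros_like(p)
    for i in range(p.shape[0]):
        out[i] = p[i] + s * p[i] * (1 - p[i]) / 2
    return out


def next_gen_vectorized(p, s):
    # the same formula applied to the whole array at once
    return p + s * p * (1 - p) / 2


assert np.allclose(next_gen_loop(p, s), next_gen_vectorized(p, s))
print("loop:      ", timeit.timeit(lambda: next_gen_loop(p, s), number=3))
print("vectorized:", timeit.timeit(lambda: next_gen_vectorized(p, s), number=3))
```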

Now, let's apply this to create_counts_matrix:

In [None]:
def create_counts_matrix_purepython(binMat):
    nstates = len(binMat)
    n = len(binMat[0])
    counts = []

    for i in range (nstates):
        counts.append([])
        for j in range(nstates):
            cell_val = 0
            for row_idx in range(n):
                if binMat[i][row_idx] == 1 and binMat[j][row_idx] == 1:
                    cell_val += 1
            counts[i].append(cell_val)
    return counts

In [None]:
def create_counts_matrix_level1(binMat):
    nstates, n = binMat.shape
    counts = np.zeros((nstates, nstates), dtype=np.int8)

    for i in range(nstates):
        for j in range(nstates):
            prod = binMat[i,:]*binMat[j,:]  # 1 only where both rows are 1
            n_shared = np.sum(prod == 1)
            counts[i, j] = n_shared
    return counts

In [None]:
bin_arr_10 = np.array(bin_mat_10)

In [None]:
%%timeit -r 2 -n 1
create_counts_matrix_level1(bin_arr_10)

In [None]:
%lprun -f create_counts_matrix_level1 create_counts_matrix_level1(bin_arr_10)

### level 2: reduce calls + cleverness

how do we deal with that gosh darn for loop??

In [None]:
def create_counts_matrix_level2(binMat):
    nstates, n = binMat.shape
    counts = np.zeros((nstates, nstates), dtype=np.int8)

    for i in range (nstates):
        temp_prod = np.zeros_like(binMat)
        for j in range(nstates):
            temp_prod[j, :] = binMat[i,:]*binMat[j,:]
        n_shared = np.sum(temp_prod == 1, axis=1)  # one np.sum call per i instead of per (i, j)
        counts[i, :] = n_shared
    return counts

In [None]:
%%timeit -r 3 -n 2
create_counts_matrix_level2(bin_arr_10)

In [None]:
%lprun -f create_counts_matrix_level2 create_counts_matrix_level2(bin_arr_10)

In [None]:
def create_counts_matrix_level2_2(binMat):
    nstates, n = binMat.shape
    counts = np.zeros((nstates, nstates), dtype=np.int8)

    for i in range (nstates) :
        prod = binMat[i,:]*binMat  # broadcasting!! row i times every row at once
        n_shared = np.sum(prod == 1, axis=1)
        counts[i, :] = n_shared
    return counts

In [None]:
%%timeit -n 10
create_counts_matrix_level2_2(bin_arr_10)

In [None]:
%lprun -f create_counts_matrix_level2_2 create_counts_matrix_level2_2(bin_arr_10)

In [None]:
def create_counts_matrix_level2_3(binMat):
    return np.einsum("ij, kj -> ik", binMat, binMat)

In [None]:
np.all(create_counts_matrix_level2_3(bin_arr_10) == create_counts_matrix_level2_2(bin_arr_10))

In [None]:
%%timeit -n 50
create_counts_matrix_level2_3(bin_arr_10)
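
The einsum version works because, for 0/1 entries, the dot product of two rows counts the positions where both are 1. Two equivalent ways to write the same thing are sketched below. The function names and the tiny test matrix are made up for illustration.

```python
import numpy as np


def counts_via_matmul(binMat):
    # For 0/1 entries, (row i) . (row k) counts the shared 1s
    return binMat @ binMat.T


def counts_via_broadcast(binMat):
    # (n, 1, m) * (1, n, m) -> (n, n, m), then sum over the last axis
    return (binMat[:, np.newaxis, :] * binMat[np.newaxis, :, :]).sum(axis=2)


small = np.array([[0, 0, 1],
                  [0, 1, 1],
                  [1, 1, 1]])
expected = np.einsum("ij,kj->ik", small, small)

print(counts_via_matmul(small))
print(counts_via_broadcast(small))
```

The broadcast version builds an intermediate array of shape (n, n, m), so it uses far more memory than the matrix product for large inputs.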

### level 3: praise Numba

numba is straight up black magic

like all good black magic, it's a bit finicky:
- don't use numba for certain stuff (non-numerical data, randomness)
- doesn't produce even remotely useful error messages.

In [None]:
from numba import njit

In [None]:
def create_counts_matrix_purepython(binMat):
    nstates, n = binMat.shape
    counts = []
    for i in range (nstates):
        counts.append([])
        for j in range(nstates):
            cell_val = 0
            for row_idx in range(n):
                if binMat[i, row_idx] == 1 and binMat[j, row_idx] == 1:
                    cell_val += 1
            counts[i].append(cell_val)
    return counts

In [None]:
@njit(cache=True)
def create_counts_matrix_numba(binMat):
    nstates, n = binMat.shape
    counts = np.zeros((nstates, nstates), dtype=np.int8)

    for i in range (nstates) :
        for j in range(nstates):
            cell_val = 0
            for row_idx in range(n):
                if binMat[i, row_idx] == 1 and binMat[j, row_idx] == 1:
                    cell_val += 1
            counts[i, j] = cell_val
    return counts

In [None]:
_ = create_counts_matrix_numba(bin_arr_10)

In [None]:
%%timeit -n 50
create_counts_matrix_numba(bin_arr_10)

In [None]:
%lprun -f create_counts_matrix_numba create_counts_matrix_numba(bin_arr_10)

In [None]:
@njit(cache=True)
def create_counts_matrix_numba_2(binMat):
    nstates, n = binMat.shape
    counts = np.zeros((nstates, nstates), dtype=np.int8)
    for i in range (nstates) :
        for j in range(i, nstates):
            cell_val = 0
            for row_idx in range(n):
                if binMat[i, row_idx] == 1 and binMat[j, row_idx] == 1:
                    cell_val += 1
            counts[j, i] = counts[i, j] = cell_val
    return counts

In [None]:
_ = create_counts_matrix_numba_2(bin_arr_10)

In [None]:
%%timeit -n 50
create_counts_matrix_numba_2(bin_arr_10)

In [None]:
@njit(cache=True)
def create_counts_matrix_numba_3(binMat):
    nstates, n = binMat.shape
    counts = np.zeros((nstates, nstates), dtype=np.int8)
    for i in range (nstates) :
        for j in range(i, nstates):
            cell_val = 0
            for row_idx in range(n):
                if binMat[i, row_idx]*binMat[j, row_idx] == 1:
                    cell_val += 1
            counts[j, i] = counts[i, j] = cell_val
    return counts

In [None]:
_ = create_counts_matrix_numba_3(bin_arr_10)

In [None]:
%%timeit -n 50
create_counts_matrix_numba_3(bin_arr_10)

#### what have we learned

- <h4>write code that works first!</h4>
- if it's slow, use a profiler (e.g. line_profiler, cProfile + snakeviz) to figure out how slow it is and where it's slow
- vectorize, rewrite to minimize function calls, @njit