## Lecture 3 - making code go fast feat. NumPy

### Intro to Numpy

Numpy is built around the **ndarray**, which you can think of as a matrix of arbitrary dimension.

![ndarray instantiation](https://miro.medium.com/v2/resize:fit:1200/1*sxnhgeSptW8Jfol8XUyP-Q.png)

In [99]:
import numpy as np

Most common ways of instantiating numpy arrays:
- np.array(list)
- np.zeros(shape), where shape is an int or a tuple of ints
- np.arange(start, stop, step)

Properties of arrays:
- array.shape: returns a tuple with the size of each dimension. E.g. np.array(binary_digits(5)).shape = (32, 5), using the binary_digits function defined later in this lecture
- array.ndim: len(array.shape)
- array.size: np.prod(array.shape)
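A quick sketch of the constructors and properties above. The array contents and variable names are made up for illustration:

```python
import numpy as np

from_list = np.array([[1, 2, 3], [4, 5, 6]])  # from a (nested) Python list
zeros = np.zeros((2, 3))                       # shape passed as a tuple; dtype defaults to float64
evens = np.arange(0, 10, 2)                    # start, stop (exclusive), step

print(from_list.shape)  # (2, 3)
print(from_list.ndim)   # 2, i.e. len(from_list.shape)
print(from_list.size)   # 6, i.e. the product of the entries of shape
print(evens.tolist())   # [0, 2, 4, 6, 8]
```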

#### Accessing elements of arrays: array slicing

Basic slicing works the same way as on Python lists, just across multiple dimensions, potentially. For example:

In [100]:
a = np.arange(25).reshape(5,5)
print(a)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]


In [101]:
a[1:4, 0:2]

array([[ 5,  6],
       [10, 11],
       [15, 16]])

In [103]:
a[2:4, 1::2]

array([[11, 13],
       [16, 18]])

###### mini-exercises (3 hands)

What is the output of a[3:, :3]?

How about a[2:4, 1::2]?

#### advanced indexing

Numpy is much cooler than base Python. Specifically, you can index a numpy array with *a numpy array*. This is called "advanced indexing". A simple example:

In [104]:
zip_code_index = np.array([6,0,6,3,7])
b = np.arange(30)**2
print(b)

[  0   1   4   9  16  25  36  49  64  81 100 121 144 169 196 225 256 289
 324 361 400 441 484 529 576 625 676 729 784 841]


What do I get here? (3 hands)

In [105]:
b[zip_code_index]

array([36,  0, 36,  9, 49])

This can get somewhat wild - what do I get as the output of these two cells?

In [106]:
b[a]

array([[  0,   1,   4,   9,  16],
       [ 25,  36,  49,  64,  81],
       [100, 121, 144, 169, 196],
       [225, 256, 289, 324, 361],
       [400, 441, 484, 529, 576]])

In [107]:
a[b[:3]]

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [20, 21, 22, 23, 24]])

Note that even though b is a one-dimensional array, we can index it with a two-dimensional array!! The result takes the shape of the index array, not the shape of b.
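Here is a small sketch of that shape rule, plus one thing that tends to bite people: indexing with an integer array gives you a copy, while basic slicing gives you a view. The names `squares`, `idx`, `picked`, and `view` are just for this example:

```python
import numpy as np

squares = np.arange(30) ** 2          # 1-d, shape (30,)
idx = np.array([[0, 1], [2, 29]])     # 2-d integer index array, shape (2, 2)

picked = squares[idx]                 # result has the shape of idx
print(picked.shape)                   # (2, 2)
print(picked.tolist())                # [[0, 1], [4, 841]]

# Advanced indexing returns a copy: changing it leaves the original alone.
picked[0, 0] = -1
print(squares[0])                     # 0

# Basic slicing returns a view: changing it changes the original.
view = squares[:3]
view[0] = -1
print(squares[0])                     # -1
```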

##### Advanced slicing: use case

Boolean arrays can be easily generated in numpy:

In [108]:
a > 12

array([[False, False, False, False, False],
       [False, False, False, False, False],
       [False, False, False,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True]])

and can be used to index arrays (most commonly the array itself):

In [109]:
a[a > 12] = 100
a

array([[  0,   1,   2,   3,   4],
       [  5,   6,   7,   8,   9],
       [ 10,  11,  12, 100, 100],
       [100, 100, 100, 100, 100],
       [100, 100, 100, 100, 100]])

#### Operations on arrays

We just saw one example - we set some values in an array to an integer. Other examples:

In [110]:
c = np.repeat(np.arange(5),5).reshape(5,5)
c

array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4]])

In [111]:
a

array([[  0,   1,   2,   3,   4],
       [  5,   6,   7,   8,   9],
       [ 10,  11,  12, 100, 100],
       [100, 100, 100, 100, 100],
       [100, 100, 100, 100, 100]])

In [114]:
a * c

array([[  0,   0,   0,   0,   0],
       [  5,   6,   7,   8,   9],
       [ 20,  22,  24, 200, 200],
       [300, 300, 300, 300, 300],
       [400, 400, 400, 400, 400]])

In [115]:
a + c

array([[  0,   1,   2,   3,   4],
       [  6,   7,   8,   9,  10],
       [ 12,  13,  14, 102, 102],
       [103, 103, 103, 103, 103],
       [104, 104, 104, 104, 104]])

Operations are *all* element-wise unless otherwise specified (e.g. for "normal" matrix multiplication, use @). Because operations are element-wise, arrays of the same shape can be combined as you would expect. But what about something like this?

In [116]:
a + c[:, 0]

array([[  0,   2,   4,   6,   8],
       [  5,   7,   9,  11,  13],
       [ 10,  12,  14, 103, 104],
       [100, 101, 102, 103, 104],
       [100, 101, 102, 103, 104]])

What happened here?

#### Broadcasting

Broadcasting is numpy's process of attempting to "morph" two arrays into having the same shape so that element-wise operations can be applied. Until about two years ago I thought this was black magic, and I just sort of... tried transposing arrays, reshaping things, etc. until something worked. Now I more or less understand broadcasting, and it's actually pretty simple!

###### Broadcasting intuition: 0-d and 1-d.
Scalars (0-dimensional arrays) can broadcast with anything.
![scalar broadcasting](https://numpy.org/doc/stable/_images/broadcasting_1.png)

In [117]:
a + 5 #works

array([[  5,   6,   7,   8,   9],
       [ 10,  11,  12,  13,  14],
       [ 15,  16,  17, 105, 105],
       [105, 105, 105, 105, 105],
       [105, 105, 105, 105, 105]])

In [118]:
a[0, :] + 5 #works

array([5, 6, 7, 8, 9])

In [125]:
[0,1,2,3] + [1,2,3,4]

[0, 1, 2, 3, 1, 2, 3, 4]

In [128]:
np.array([[0],[1],[3]]).shape

(3, 1)

What about 1-d arrays? If array f has shape (m,) and array g has shape (n,), when will f + g be a valid operation in NumPy?




NOTE: what about m|n or n|m (one length dividing the other)? This is, for good reason, NOT valid! There is no unambiguous way to know what to do. For example, if f = [1,2,3] and g = [1,2,3,4,5,6], should f + g be broadcast as

[1,2,3,1,2,3]
\+
[1,2,3,4,5,6]

or

[1,1,2,2,3,3]
\+
[1,2,3,4,5,6]?
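To check the 1-d case by hand: NumPy accepts f + g when the lengths are equal or when one of them is 1, and rejects everything else, including the "one length divides the other" case. A minimal sketch using the f and g from the note above (`tiled` and `repeated` are names made up here):

```python
import numpy as np

f = np.array([1, 2, 3])               # shape (3,)
g = np.array([1, 2, 3, 4, 5, 6])      # shape (6,)

print(f + np.array([10, 20, 30]))     # equal lengths: [11 22 33]
print(g + np.array([10]))             # length 1 stretches: [11 12 13 14 15 16]

try:
    f + g                             # 3 divides 6, but NumPy still refuses
except ValueError as err:
    print("broadcast error:", err)

# If you want one of the two readings, you have to say which one explicitly.
tiled = np.tile(f, 2) + g             # [1,2,3,1,2,3] + g
repeated = np.repeat(f, 2) + g        # [1,1,2,2,3,3] + g
print(tiled.tolist())                 # [2, 4, 6, 5, 7, 9]
print(repeated.tolist())              # [2, 3, 5, 6, 8, 9]
```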

###### General broadcasting rules

For two ndarrays, broadcasting more or less follows the 1-d rules for each dimension. How can you tell if two arrays are broadcastable against each other?

1. Right-align the shapes of both arrays (missing leading dimensions count as size 1).
2. For each dimension, check that the two sizes match or that one of them equals 1.

So:

![ndarray example](numpy_code_snippet.png)

For every dimension where one array has size 1, numpy then "stretches" that array along the dimension to match the other array's size before doing the operation. Here is a picture depicting this process:

![ndarray broadcasting](https://numpy.org/doc/stable/_images/broadcasting_2.png)

As a numerical example, if you have arrays d with d.shape = (8,3,1,8) and e with e.shape = (3,5,1), d + e will not throw an error and will have shape (8,3,5,8).

So, from the earlier code, a + c[:, 0] worked because a has shape (5,5) and c[:, 0] has shape (5,) -> (1,5), which can be broadcast to (5,5).
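If you don't trust your own right-aligning, you can ask NumPy. The sketch below uses np.broadcast_shapes (available in NumPy 1.20 and later) on the two examples above. It also spells out the two rules as a hand-rolled helper, `can_broadcast`, which is a made-up name and not part of NumPy:

```python
import numpy as np

# Ask NumPy what two shapes broadcast to (raises ValueError if they are incompatible).
print(np.broadcast_shapes((8, 3, 1, 8), (3, 5, 1)))   # (8, 3, 5, 8)
print(np.broadcast_shapes((5, 5), (5,)))              # (5, 5)

def can_broadcast(shape_a, shape_b):
    """Right-align the shapes, then require each pair of sizes to match or contain a 1."""
    for m, n in zip(reversed(shape_a), reversed(shape_b)):
        if m != n and m != 1 and n != 1:
            return False
    return True  # leftover leading dimensions are paired with an implicit 1

print(can_broadcast((8, 3, 1, 8), (3, 5, 1)))   # True
print(can_broadcast((3,), (6,)))                # False

# A 1-d array lines up with the LAST axis. To line it up with the first axis
# instead, give it a trailing axis of size 1 so its shape becomes (5, 1).
grid = np.zeros((5, 5), dtype=int)
row_offsets = np.arange(5)            # shape (5,): the added value depends on the column
col_offsets = np.arange(5)[:, None]   # shape (5, 1): the added value depends on the row
print((grid + row_offsets)[0])        # [0 1 2 3 4]
print((grid + col_offsets)[:, 0])     # [0 1 2 3 4]
```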

###### mini exercises:

For each of the following, determine whether the two arrays have compatible dimensions, and if they do, what the dimensions of the resulting array after a binary operation are.

1. f.shape = (5,1,3,2), g.shape = (1,3)
2. f.shape = (5,1,3,2), h.shape = (1,3,1)
3. f.shape = (5,1,3,2), i.shape = (1,3,1,1)
4. f.shape = (5,1,3,2), j.shape = (1,3,1,1,1)
5. f.shape = (5,1,3,2), k.shape = (1,3,1,1,1,1)

Everything else in numpy is just functions. Numpy (+ scipy) has functions for almost everything you could ever want, seriously. As an example, I was calculating p-values by fitting points to a null distribution, and using the definition of a p-value as 1-cdf. Some of my p-values were so small that they were being returned as exactly 0, so their log came out as -inf, etc.

Turns out every (continuous) distribution in scipy.stats can return log(1-cdf) directly (the logsf method), with more precision than manually computing the log of 1 minus the cdf. wild.
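The precision problem is easy to reproduce. This is a minimal sketch with a standard normal and an arbitrary point far in the upper tail (x = 10 is just an illustrative value, not from my actual analysis). `logsf` is the scipy.stats method for the log of the survival function, i.e. log(1 - cdf):

```python
import numpy as np
from scipy import stats

x = 10.0  # illustrative point far in the upper tail of a standard normal

# Manual route: cdf(x) rounds to exactly 1.0 in floating point, so 1 - cdf is 0.
with np.errstate(divide="ignore"):    # silence the log(0) warning for the demo
    naive = np.log(1 - stats.norm.cdf(x))

# Direct route: scipy computes the log of the tail probability without forming 1 - cdf.
direct = stats.norm.logsf(x)

print(naive)    # -inf
print(direct)   # a finite number, roughly -53.2
```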

Lastly, the axis keyword is important and a little confusing - basically applies a numpy function along a "direction":

In [132]:
d = np.arange(15).reshape(3,5)
print(d)
print(np.sum(d))
print(np.sum(d, axis=0))
print(np.sum(d, axis=1))

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
105
[15 18 21 24 27]
[10 35 60]


Oftentimes, you know what the shape of the resulting array you want is but not what axis that corresponds to - for example, you know you want to average something over time within 100 different experiments, is that axis=0? 1?
My preferred way to remember this is that axis=i will delete the ith value from the shape.
d.shape = (3,5) -> axis=0 makes the shape (5,), axis=1 makes the shape (3,)
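Applying that rule to the "average over time within 100 experiments" question, as a sketch: the data below is random and the layout (experiments along axis 0, 50 time points along axis 1) is an assumption made for the example.

```python
import numpy as np

# Hypothetical data: 100 experiments (axis 0), 50 time points each (axis 1).
rng = np.random.default_rng(0)
measurements = rng.normal(size=(100, 50))

# We want one number per experiment, so the result should have shape (100,).
# That means the 50 has to disappear from the shape, so axis=1.
mean_over_time = measurements.mean(axis=1)
print(mean_over_time.shape)           # (100,)

# axis=0 deletes the 100 instead: one number per time point.
mean_over_experiments = measurements.mean(axis=0)
print(mean_over_experiments.shape)    # (50,)

# keepdims=True leaves the reduced axis in place with size 1, which makes the
# result broadcast cleanly against the original array.
centered = measurements - measurements.mean(axis=1, keepdims=True)
print(centered.shape)                 # (100, 50)
```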

<center><h1>how to make the code fast</h1></center>

<br/>

### The golden rule of code optimization

<br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/>






# DON'T DO IT


<br/><br/>
<br/><br/>
<br/><br/>
<br/><br/>
<br/><br/>



## yet





### make sure your code does _exactly_ what you want and does so _correctly_ before even thinking about making it fast.
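One low-effort way to hold yourself to this is to keep the slow, obviously-correct version around and check every faster version against it on a small input. This is a sketch with a toy problem (a running total); both function names are made up for the example:

```python
import numpy as np

def running_total_reference(values):
    """Slow but easy to verify by eye: accumulate with an explicit loop."""
    total, out = 0, []
    for v in values:
        total += v
        out.append(total)
    return out

def running_total_fast(values):
    """Candidate fast version: let NumPy do the accumulation."""
    return np.cumsum(values)

small_input = [3, 1, 4, 1, 5, 9, 2, 6]

# Only trust (and only time) the fast version once this passes.
assert np.array_equal(running_total_reference(small_input), running_total_fast(small_input))
print("fast version matches the reference")
```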

### Then:
- vectorize
- reduce function calls + other cleverness
- present an offering to our savior, Numba, and hope they smile favorably upon your code

**working example**: Fun with binary sequences of length n - adapted from Maryn Carlson's code

In [133]:
import numpy as np
def binary_digits(n):
    if n == 1:
        return [[0], [1]]
    return [[*row, i] for row in binary_digits(n-1) for i in range(2)]

In [134]:
bin_mat_small = binary_digits(3)
bin_mat_small

[[0, 0, 0],
 [0, 0, 1],
 [0, 1, 0],
 [0, 1, 1],
 [1, 0, 0],
 [1, 0, 1],
 [1, 1, 0],
 [1, 1, 1]]

In [135]:
bin_mat_11 = binary_digits(11)

Problem: for every pair of rows i and j of the binary matrix binary_digits(n), count the number of positions where both rows have a 1.

Expected output: B := a (2^n, 2^n) matrix where B_ij = # of values equal to 1 in both row i and row j of binary_digits(n)

In [138]:
def create_counts_matrix_purepython(binMat):
    nstates = len(binMat)
    n = len(binMat[0])
    counts = []

    for i in range (nstates):
        counts.append([])
        for j in range(nstates):
            cell_val = 0
            for row_idx in range(n):
                if binMat[i][row_idx] == 1 and binMat[j][row_idx] == 1:
                    cell_val += 1
            counts[i].append(cell_val)
    return counts

In [140]:
create_counts_matrix_purepython(bin_mat_small)

[[0, 0, 0, 0, 0, 0, 0, 0],
 [0, 1, 0, 1, 0, 1, 0, 1],
 [0, 0, 1, 1, 0, 0, 1, 1],
 [0, 1, 1, 2, 0, 1, 1, 2],
 [0, 0, 0, 0, 1, 1, 1, 1],
 [0, 1, 0, 1, 1, 2, 1, 2],
 [0, 0, 1, 1, 1, 1, 2, 2],
 [0, 1, 1, 2, 1, 2, 2, 3]]

In [142]:
test_result = create_counts_matrix_purepython(bin_mat_11)

### profiling - "this took a while, but how long, exactly?"

In [143]:
%%timeit -r 3 -n 1
create_counts_matrix_purepython(bin_mat_11)

2.21 s ± 5.23 ms per loop (mean ± std. dev. of 3 runs, 1 loop each)


##### tqdm aside:

In [144]:
from tqdm.notebook import tqdm
def create_counts_matrix_tqdm(binMat):
    nstates = len(binMat)
    n = len(binMat[0])
    counts = []

    for i in tqdm(range(nstates)):
        counts.append([])
        for j in range(nstates):
            cell_val = 0
            for row_idx in range(n):
                if binMat[i][row_idx] == 1 and binMat[j][row_idx] == 1:
                    cell_val += 1
            counts[i].append(cell_val)
    return counts

In [145]:
_ = create_counts_matrix_tqdm(bin_mat_11)

  0%|          | 0/2048 [00:00<?, ?it/s]

In [146]:
%load_ext line_profiler

The line_profiler extension is already loaded. To reload it, use:
  %reload_ext line_profiler


In [147]:
%lprun -f create_counts_matrix_purepython create_counts_matrix_purepython(bin_mat_11)

Timer unit: 1e-09 s

Total time: 23.6361 s
File: /var/folders/fr/xd48pkns7m1dqpzds0bm2vl80000gn/T/ipykernel_10640/2795158026.py
Function: create_counts_matrix_purepython at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
     1                                           def create_counts_matrix_purepython(binMat):
     2         1       1000.0   1000.0      0.0      nstates = len(binMat)
     3         1       2000.0   2000.0      0.0      n = len(binMat[0])
     4         1          0.0      0.0      0.0      counts = []
     5                                           
     6      2049     476000.0    232.3      0.0      for i in range (nstates):
     7      2048     436000.0    212.9      0.0          counts.append([])
     8   4196352  857233000.0    204.3      3.6          for j in range(nstates):
     9   4194304  799222000.0    190.5      3.4              cell_val = 0
    10  50331648 9655212000.0    191.8     40.8              for row_idx in range(n):
    

alternatives: cProfile + snakeviz, scalene (I can't read it but it's apparently good)

https://coderzcolumn.com/tutorials/python/snakeviz-visualize-profiling-results-in-python is a good intro to snakeviz

https://codesolid.com/how-do-i-profile-python-code/ is a good all-around article
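For reference, here is roughly what the cProfile route looks like outside of a notebook. It is a sketch using only the standard library; `slow_sum_of_squares` is a stand-in function invented for the example. cProfile reports time per function rather than per line, so it is most useful for finding which function to point line_profiler at.

```python
import cProfile
import io
import pstats

def slow_sum_of_squares(n):
    """Stand-in for whatever function you suspect is slow."""
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
result = slow_sum_of_squares(100_000)
profiler.disable()

# Collect the report into a string, sorted by cumulative time, top 5 entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

To look at the same data in snakeviz, save it with profiler.dump_stats("some_file.prof") and open that file with the snakeviz command.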

### level 1: vectorization

Vectorization just means converting for loops to numpy operations. As a quick example, the formula for the allele frequency in the n+1st generation under selection is $$p' = p + sp(1-p)/2$$. Two ways to compute a vector of allele frequencies in generation n+1 given a vector of allele frequencies in generation n and a selection coefficient are:

In [None]:
import numpy as np
p = np.random.default_rng(5).uniform(size=100)
s = 0.01
p_prime = np.zeros_like(p)
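# version 1: explicit Python loop over the elements
for i in range(p.shape[0]):
    p_prime[i] = p[i] + s*p[i]*(1-p[i])/2

# version 2: vectorized, one expression applied to the whole array
p_prime = p + s*p*(1-p)/2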

The second is faster, easier to understand, and closer to the formula above in appearance.
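If you want to see the difference rather than take my word for it, here is a sketch that wraps both versions in functions (the names `next_gen_loop` and `next_gen_vectorized` are made up here), checks that they agree, and times them with the standard-library timeit module. The array size is arbitrary, and the timings will depend on your machine.

```python
import timeit

import numpy as np

def next_gen_loop(p, s):
    """One generation of selection, computed element by element."""
    p_prime = np.zeros_like(p)
    for i in range(p.shape[0]):
        p_prime[i] = p[i] + s * p[i] * (1 - p[i]) / 2
    return p_prime

def next_gen_vectorized(p, s):
    """Same formula applied to the whole array at once."""
    return p + s * p * (1 - p) / 2

p = np.random.default_rng(5).uniform(size=10_000)
s = 0.01

# Correctness first: the two versions must agree before timing means anything.
assert np.allclose(next_gen_loop(p, s), next_gen_vectorized(p, s))

loop_time = timeit.timeit(lambda: next_gen_loop(p, s), number=5)
vec_time = timeit.timeit(lambda: next_gen_vectorized(p, s), number=5)
print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.4f} s")
```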

Now, let's apply this to create_counts_matrix:

In [148]:
def create_counts_matrix_purepython(binMat):
    nstates = len(binMat)
    n = len(binMat[0])
    counts = []

    for i in range (nstates):
        counts.append([])
        for j in range(nstates):
            cell_val = 0
            for row_idx in range(n):
                if binMat[i][row_idx] == 1 and binMat[j][row_idx] == 1:
                    cell_val += 1
            counts[i].append(cell_val)
    return counts

In [149]:
def create_counts_matrix_level1(binMat):
    nstates, n = binMat.shape
    counts = np.zeros((nstates, nstates), dtype=np.int8)

    for i in range(nstates):
        for j in range(nstates):
            prod = binMat[i,:]*binMat[j,:]
            num_double_ones  = np.sum (prod == 1)
            counts[i, j] = num_double_ones
    return counts

In [150]:
bin_arr_11 = np.array(bin_mat_11)

In [151]:
%%timeit -r 2 -n 1
create_counts_matrix_level1(bin_arr_11)

13 s ± 48 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)


In [152]:
%lprun -f create_counts_matrix_level1 create_counts_matrix_level1(bin_arr_11)

Timer unit: 1e-09 s

Total time: 28.5326 s
File: /var/folders/fr/xd48pkns7m1dqpzds0bm2vl80000gn/T/ipykernel_10640/377914284.py
Function: create_counts_matrix_level1 at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
     1                                           def create_counts_matrix_level1(binMat):
     2         1       1000.0   1000.0      0.0      nstates, n = binMat.shape
     3         1     315000.0 315000.0      0.0      counts = np.zeros((nstates, nstates), dtype=np.int8)
     4                                           
     5      2049     414000.0    202.0      0.0      for i in range(nstates):
     6   4196352  855929000.0    204.0      3.0          for j in range(nstates):
     7   4194304 2923494000.0    697.0     10.2              prod = binMat[i,:]*binMat[j,:]
     8   4194304     2.36e+10   5617.7     82.6              num_double_ones  = np.sum (prod == 1)
     9   4194304 1190303000.0    283.8      4.2              counts[i, j] = num_doubl

### level 2: reduce calls + cleverness

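Calling numpy functions is VERY expensive relative to the tiny amount of work each call does here!! In the profile above, the np.sum line costs about 5.6 microseconds per hit on an 11-element row, most of which is per-call overhead rather than arithmetic.
We are calling a numpy function 2048*2048 times - this is not good!
How do we reduce the number of function calls? What if we could move that line out of the inner for loop?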

In [153]:
def create_counts_matrix_level2(binMat):
    nstates, n = binMat.shape
    counts = np.zeros((nstates, nstates), dtype=np.int8)

    for i in range (nstates):
        temp_prod = np.zeros_like(binMat)
        for j in range(nstates):
            temp_prod[j, :] = binMat[i,:]*binMat[j,:]
        num_double_ones  = np.sum(temp_prod == 1, axis=1)
        counts[i, :] = num_double_ones
    return counts

In [154]:
%%timeit -r 3 -n 2
create_counts_matrix_level2(bin_arr_11)

2.56 s ± 9 ms per loop (mean ± std. dev. of 3 runs, 2 loops each)


In [155]:
%lprun -f create_counts_matrix_level2 create_counts_matrix_level2(bin_arr_11)

Timer unit: 1e-09 s

Total time: 4.33593 s
File: /var/folders/fr/xd48pkns7m1dqpzds0bm2vl80000gn/T/ipykernel_10640/4177989341.py
Function: create_counts_matrix_level2 at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
     1                                           def create_counts_matrix_level2(binMat):
     2         1       1000.0   1000.0      0.0      nstates, n = binMat.shape
     3         1     427000.0 427000.0      0.0      counts = np.zeros((nstates, nstates), dtype=np.int8)
     4                                           
     5      2049     427000.0    208.4      0.0      for i in range (nstates):
     6      2048   15541000.0   7588.4      0.4          temp_prod = np.zeros_like(binMat)
     7   4196352  844666000.0    201.3     19.5          for j in range(nstates):
     8   4194304 3398529000.0    810.3     78.4              temp_prod[j, :] = binMat[i,:]*binMat[j,:]
     9      2048   74562000.0  36407.2      1.7          num_double_ones  = np.s

In [156]:
def create_counts_matrix_level2_2(binMat):
    nstates, n = binMat.shape
    counts = np.zeros((nstates, nstates), dtype=np.int8)

    for i in range (nstates) :
        plus = binMat[i,:]*binMat #broadcasting!!
        num_double_ones  = np.sum(plus == 1, axis=1)
        counts[i, :] = num_double_ones
    return counts

In [157]:
%%timeit -n 10
create_counts_matrix_level2_2(bin_arr_11)

89.6 ms ± 16.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [158]:
%lprun -f create_counts_matrix_level2_2 create_counts_matrix_level2_2(bin_arr_11)

Timer unit: 1e-09 s

Total time: 0.128524 s
File: /var/folders/fr/xd48pkns7m1dqpzds0bm2vl80000gn/T/ipykernel_10640/2143965526.py
Function: create_counts_matrix_level2_2 at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
     1                                           def create_counts_matrix_level2_2(binMat):
     2         1       1000.0   1000.0      0.0      nstates, n = binMat.shape
     3         1     206000.0 206000.0      0.2      counts = np.zeros((nstates, nstates), dtype=np.int8)
     4                                           
     5      2049     421000.0    205.5      0.3      for i in range (nstates) :
     6      2048   29427000.0  14368.7     22.9          plus = binMat[i,:]*binMat #broadcasting!!
     7      2048   96750000.0  47241.2     75.3          num_double_ones  = np.sum(plus == 1, axis=1)
     8      2048    1719000.0    839.4      1.3          counts[i, :] = num_double_ones
     9         1          0.0      0.0      0.0      return c

In [159]:
def create_counts_matrix_level2_3(binMat):
    return np.einsum("ij, kj -> ik", binMat, binMat)

In [160]:
np.all(create_counts_matrix_level2_3(bin_arr_11) == create_counts_matrix_level2_2(bin_arr_11))

True

In [161]:
%%timeit -n 50
create_counts_matrix_level2_3(bin_arr_11)

20.4 ms ± 388 μs per loop (mean ± std. dev. of 7 runs, 50 loops each)
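In case the einsum string looks like line noise: "ij, kj -> ik" means "for each i and k, sum binMat[i, j] * binMat[k, j] over j". For a 0/1 matrix that product is 1 exactly when both entries are 1, so the sum is the count we want, and the whole thing is the same as multiplying the matrix by its own transpose. Here is a sketch on the n = 3 case; `binary_rows` is a helper invented here as a stand-in for np.array(binary_digits(n)):

```python
import numpy as np

def binary_rows(n):
    """All 2**n binary sequences of length n as a (2**n, n) array of 0s and 1s."""
    shifts = np.arange(n - 1, -1, -1)               # most significant bit first
    return (np.arange(2 ** n)[:, None] >> shifts) & 1

bin_arr = binary_rows(3)

via_einsum = np.einsum("ij,kj->ik", bin_arr, bin_arr)
via_matmul = bin_arr @ bin_arr.T
# Fully broadcast version of the inner loop: shape (8, 1, 3) & (1, 8, 3) -> (8, 8, 3)
via_broadcast = (bin_arr[:, None, :] & bin_arr[None, :, :]).sum(axis=-1)

print(via_matmul[7].tolist())   # [0, 1, 1, 2, 1, 2, 2, 3]
print(np.array_equal(via_einsum, via_matmul), np.array_equal(via_matmul, via_broadcast))
```

The broadcast version builds a (2^n, 2^n, n) intermediate array, so it uses far more memory than the other two as n grows.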


In [162]:
%lprun -f create_counts_matrix_level2_3 create_counts_matrix_level2_3(bin_arr_11)

Timer unit: 1e-09 s

Total time: 0.021041 s
File: /var/folders/fr/xd48pkns7m1dqpzds0bm2vl80000gn/T/ipykernel_10640/4215511792.py
Function: create_counts_matrix_level2_3 at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
     1                                           def create_counts_matrix_level2_3(binMat):
     2         1   21041000.0  2.1e+07    100.0      return np.einsum("ij, kj -> ik", binMat, binMat)

### level 3: praise Numba

numba is straight up black magic

like all good black magic, it's a bit finicky:
- it doesn't work well for certain stuff (non-numerical data, randomness)
- it doesn't produce even remotely useful error messages.

In [163]:
from numba import njit

In [164]:
def create_counts_matrix_purepython(binMat):
    nstates, n = binMat.shape
    counts = []
    for i in range (nstates):
        counts.append([])
        for j in range(nstates):
            cell_val = 0
            for row_idx in range(n):
                if binMat[i, row_idx] == 1 and binMat[j, row_idx] == 1:
                    cell_val += 1
            counts[i].append(cell_val)
    return counts

In [165]:
@njit(cache=True)
def create_counts_matrix_numba(binMat):
    nstates, n = binMat.shape
    counts = np.zeros((nstates, nstates), dtype=np.int8)

    for i in range (nstates) :
        for j in range(nstates):
            cell_val = 0
            for row_idx in range(n):
                if binMat[i, row_idx] == 1 and binMat[j, row_idx] == 1:
                    cell_val += 1
            counts[i, j] = cell_val
    return counts

In [166]:
_ = create_counts_matrix_numba(bin_arr_11)

In [167]:
%%timeit -n 50
create_counts_matrix_numba(bin_arr_11)

28.4 ms ± 306 μs per loop (mean ± std. dev. of 7 runs, 50 loops each)


In [168]:
%lprun -f create_counts_matrix_numba create_counts_matrix_numba(bin_arr_11)

  profile = LineProfiler(*funcs)


Timer unit: 1e-09 s

Total time: 0 s
File: /var/folders/fr/xd48pkns7m1dqpzds0bm2vl80000gn/T/ipykernel_10640/1808351567.py
Function: create_counts_matrix_numba at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
     1                                           @njit(cache=True)
     2                                           def create_counts_matrix_numba(binMat):
     3                                               nstates, n = binMat.shape
     4                                               counts = np.zeros((nstates, nstates), dtype=np.int8)
     5                                           
     6                                               for i in range (nstates) :
     7                                                   for j in range(nstates):
     8                                                       cell_val = 0
     9                                                       for row_idx in range(n):
    10                                                

In [169]:
@njit(cache=True)
def create_counts_matrix_numba_2(binMat):
    nstates, n = binMat.shape
    counts = np.zeros((nstates, nstates), dtype=np.int8)
    for i in range (nstates) :
        for j in range(i, nstates):
            cell_val = 0
            for row_idx in range(n):
                if binMat[i, row_idx] == 1 and binMat[j, row_idx] == 1:
                    cell_val += 1
            counts[j, i] = counts[i, j] = cell_val
    return counts

In [170]:
_ = create_counts_matrix_numba_2(bin_arr_11)

In [171]:
%%timeit -n 50
create_counts_matrix_numba_2(bin_arr_11)

15.2 ms ± 234 μs per loop (mean ± std. dev. of 7 runs, 50 loops each)


In [172]:
@njit(cache=True)
def create_counts_matrix_numba_3(binMat):
    nstates, n = binMat.shape
    counts = np.zeros((nstates, nstates), dtype=np.int8)
    for i in range (nstates) :
        for j in range(i, nstates):
            cell_val = 0
            for row_idx in range(n):
                if binMat[i, row_idx]*binMat[j, row_idx] == 1:
                    cell_val += 1
            counts[j, i] = counts[i, j] = cell_val
    return counts

In [175]:
create_counts_matrix_numba_3(bin_arr_11)

array([[ 0,  0,  0, ...,  0,  0,  0],
       [ 0,  1,  0, ...,  1,  0,  1],
       [ 0,  0,  1, ...,  0,  1,  1],
       ...,
       [ 0,  1,  0, ..., 10,  9, 10],
       [ 0,  0,  1, ...,  9, 10, 10],
       [ 0,  1,  1, ..., 10, 10, 11]], dtype=int8)

In [174]:
%%timeit -n 50
create_counts_matrix_numba_3(bin_arr_11)

7.61 ms ± 193 μs per loop (mean ± std. dev. of 7 runs, 50 loops each)


#### what have we learned

- <h4>write code that works first!</h4>
- if it's slow, use a profiler (e.g. line_profiler, cProfile + snakeviz) to figure out how slow it is and where it's slow
- vectorize, rewrite to minimize function calls, @njit