# Numpy Tutorial
Joshua Stough, 202-

[Numpy](https://numpy.org/) is a powerful toolkit for the handling of large N-dimensional arrays or matrices, conveniently wrapped in Python. Given the relative inefficiencies of handling large lists of numbers in native Python, Numpy [`ndarray`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html)s have become the backbone of scientific computation in Python. In this notebook we'll first show some time comparisons between basic operations on `ndarray`s and on Python lists, then explore some of the very useful Numpy methods. Some of the later material implicitly uses Matplotlib to visualize the multidimensional array data that we're manipulating.

1. [**Speed Comparisons**](#speedup)
1. [**Numpy Essentials**](#essentials)

## Imports

In [18]:
%matplotlib widget
import matplotlib.pyplot as plt
import numpy as np
import sys
import dis
# Alternative to using the timeit magic command:
# from timeit import timeit

<a id='speedup'></a>
## Speed Comparisons
We know Python lists as a really easy-to-use dynamic list implementation that includes powerful object-oriented and functional interactions, where elements of a list can be of any type. This flexibility of the list type is accomplished through the use of pointers, in a way that is hidden from the programmer (under the hood). Numpy on the other hand constrains all elements of an `ndarray` to be of the same type; under the hood, the values of a numpy array can therefore be packed directly into contiguous memory, with constant access time and no pointer to follow for each element.

A tremendous speedup results from executing numpy's compiled library code on contiguous memory, versus Python's native implementations that require dealing with pointer indirection. 
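As a small illustration of the contiguous, single-type layout described above, the following sketch inspects a tiny array. The variable names and the sample values are made up for this example; the attributes used (`itemsize`, `strides`, `nbytes`, `flags`) are standard `ndarray` attributes.

```python
import numpy as np

# Hypothetical sample data, just for illustration.
values = [3, 1, 4, 1, 5, 9, 2, 6]
arr = np.array(values, dtype=np.int64)

# Every element has the same type, so every element has the same size.
print(arr.dtype, arr.itemsize)       # element type and bytes per element

# The elements sit back to back in a single buffer:
print(arr.flags['C_CONTIGUOUS'])     # True for a freshly created array
print(arr.strides)                   # byte step from one element to the next
print(arr.nbytes == arr.itemsize * arr.size)
```

Because the stride equals the element size, element `i` lives at a fixed offset of `i * itemsize` bytes from the start of the buffer, with no per-element pointer to follow.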

To demonstrate this, we'll make a Python list of random integers using numpy's [randint](https://numpy.org/doc/stable/reference/random/generated/numpy.random.randint.html), a (uniformly distributed) random number generator. We'll also make a numpy `ndarray` copy of that list. This copy will be in memory that is independent of the list, which we are just copying the values from. 

Then we can test certain functionality on the `list` and `ndarray` collections separately, using IPython's [`timeit` magic command](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit).

In [26]:
# Make a list of random integers in [0,10M)
lyst = [np.random.randint(0,10000000) for x in range(1000)]
np_lyst = np.array(lyst)

In [27]:
%%timeit
min(lyst)

10.3 µs ± 8.53 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


The expression we're timing is `min(lyst)`; `min` is the Python built-in that returns the minimum of any iterable of comparable items. The default output of `timeit` tells us with some degree of certainty how much wall time calculating `min(lyst)` is likely to take, here reported in microseconds ($\mu s$). `timeit` computes this by taking several runs (default 7), each time executing the expression potentially thousands of times to improve the accuracy of the measurement.
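Outside of IPython, roughly the same measurement can be sketched with the standard library's `timeit` module. This is only an approximation of what the magic command reports: the seed, the fixed loop count of 1000, and the variable names below are assumptions for this example, whereas the magic command picks the loop count automatically.

```python
import statistics
import timeit

import numpy as np

# Rebuild a comparable list of 1000 random integers (seed chosen arbitrarily).
rng = np.random.default_rng(0)
lyst = rng.integers(0, 10_000_000, size=1000).tolist()

loops = 1000
# 7 runs, each executing min(lyst) `loops` times; each entry is a run's total time.
run_totals = timeit.repeat(lambda: min(lyst), repeat=7, number=loops)
per_loop = [total / loops for total in run_totals]

mean_us = statistics.mean(per_loop) * 1e6
std_us = statistics.stdev(per_loop) * 1e6
print(f"{mean_us:.2f} µs ± {std_us:.2f} µs per loop (7 runs, {loops} loops each)")
```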

In [28]:
%%timeit
np.min(np_lyst)

3.03 µs ± 16.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


You should notice a significant speedup in the `np.min` result over the Python `min` result. On my machine I observe ~3x speedup on a list of length 1000 (1K). Because every call carries some fixed overhead from the Python interpreter (as opposed to the compiled library code doing the actual work), you may notice that the speedup improves as the list gets larger. I observed a ~12x speedup on a list size of 10K, and a 15-20x speedup beyond 100K.
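To see how the speedup changes with list size on your own machine, a rough sketch along these lines may help. The sizes, repeat counts, and names are arbitrary choices for this example, and the numbers it prints will differ from machine to machine.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
speedups = {}

for n in (1_000, 10_000, 100_000):
    np_data = rng.integers(0, 10_000_000, size=n)
    data = np_data.tolist()  # plain Python list with the same values

    # Take the best of 3 runs, each executing the call 20 times.
    t_list = min(timeit.repeat(lambda: min(data), repeat=3, number=20))
    t_np = min(timeit.repeat(lambda: np.min(np_data), repeat=3, number=20))

    speedups[n] = t_list / t_np
    print(f"n = {n:>7}: np.min is about {speedups[n]:.1f}x faster than min")
```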

In [5]:
np_lyst.dtype

dtype('int64')

In [6]:
# Using getsizeof to ask Python the size in bytes of a Python variable.
sys.getsizeof(lyst[0])

28

In [7]:
type(lyst[0])

int

One significant note here is that there is a big reduction in range: the *integer* type in Python holds integral values of arbitrary magnitude and requires 28 bytes here (on 64-bit CPython), while the implicitly determined `np.int64` is limited to 64 bits and requires only 8 bytes. In conjunction with the direct packing (instead of pointer packing) in memory, the `ndarray` variable `np_lyst` requires much less memory than the Python variable `lyst`, and likely enjoys much better cache locality (read: lower average memory access time). Read more about it [here](https://webcourses.ucf.edu/courses/1249560/pages/python-lists-vs-numpy-arrays-what-is-the-difference) or [here](https://medium.com/@gough.cory/performance-of-numpy-array-vs-python-list-194c8e283b65). Additionally, it is possible to write your own efficient C code implementations that can be called from Python: read (much) more [here](https://medium.com/analytics-vidhya/beating-numpy-performance-by-extending-python-with-c-c9b644ee2ca8).
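The memory difference can be estimated with a short sketch like the one below. The byte counts from `sys.getsizeof` are specific to CPython, and summing per-element sizes is only an estimate (it would double-count any integer objects that appear in the list more than once). The variable names mirror the ones used above but are rebuilt here so the example stands alone.

```python
import sys

import numpy as np

rng = np.random.default_rng(0)
np_lyst = rng.integers(0, 10_000_000, size=1000, dtype=np.int64)
lyst = np_lyst.tolist()

# The list stores pointers; each int object is stored separately.
list_bytes = sys.getsizeof(lyst) + sum(sys.getsizeof(v) for v in lyst)
# The array stores the 8-byte values directly in its buffer.
array_bytes = np_lyst.nbytes

print(f"list (estimate): {list_bytes} bytes")
print(f"ndarray buffer:  {array_bytes} bytes")

# The price of the compact representation is a fixed range:
int64_max = np.iinfo(np.int64).max
print(int64_max)   # largest value an np.int64 can hold
print(2**100)      # a Python int has no such limit
```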

Computing the minimum of a collection is a straightforward linear complexity problem, $O(n)$. We'll now try another example, this time of sorting a `list/ndarray`, which you'll remember generally has complexity $O(n\log{}n)$.

In [11]:
%%timeit
sorted(lyst)

885 µs ± 197 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [12]:
%%timeit
np.sort(np_lyst)

362 µs ± 33 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)


Both the [`sorted`](https://docs.python.org/3.7/howto/sorting.html#sortinghowto) and [`np.sort`](https://numpy.org/doc/stable/reference/generated/numpy.sort.html) methods are functions in the sense that they do not modify the argument collection (whether `list` or `ndarray`), but rather return a sorted copy of that collection. Alternative object-oriented in-place `sort()` and [`ndarray.sort()`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.sort.html#numpy.ndarray.sort) methods are also available. However, `timeit` executes the statement many times, so an in-place sort would be sorting an already sorted collection on every repetition after the first, which is pretty uninteresting for the purposes of timing. In the sorting experiment above, I observed ~2x speedup using the Numpy equivalent on list size 1K.
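The difference between the copying functions and the in-place methods can be seen in a tiny example. The four-element data here is made up for illustration.

```python
import numpy as np

lyst = [5, 2, 9, 1]
np_lyst = np.array(lyst)

# Functional style: a sorted copy comes back, the argument is untouched.
sorted_copy = sorted(lyst)
np_sorted_copy = np.sort(np_lyst)
list_unchanged = (lyst == [5, 2, 9, 1])
array_unchanged = (np_lyst.tolist() == [5, 2, 9, 1])
print(sorted_copy, list_unchanged)
print(np_sorted_copy, array_unchanged)

# Object-oriented style: sorts in place and returns None.
list_result = lyst.sort()
array_result = np_lyst.sort()
print(lyst, list_result)
print(np_lyst, array_result)
```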

In [29]:
%%timeit
[4*x for x in lyst]

29 µs ± 9.66 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [30]:
%%timeit
list(map(lambda x: 4*x, lyst))

51.8 µs ± 43.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [31]:
%%timeit
4*np_lyst

730 ns ± 2.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


The final timing examples above compare Python list comprehensions with the equivalent numpy expression. The above list comprehension (or `list(map(...))`) creates a new list object where every element of the original list has been scaled by a constant. Such Pythonic expressions execute in interpreted bytecode, which can be quite a bit slower than comparable compiled C. Using smart [operator overloading](https://docs.python.org/3/reference/datamodel.html#special-method-names), the creators of Numpy encoded equivalent functionality that executes almost entirely in C, resulting in significant speedup (e.g., ~70x on list size 10K in my observations). We can actually observe this efficiency indirectly just in the amount of bytecode involved in the competing approaches, using Python's [disassembler module](https://docs.python.org/3/library/dis.html).
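To make the operator overloading a little more concrete, here is a minimal sketch of what happens when Python evaluates `4 * arr`. The three-element array is a made-up example; the point is only that the multiplication is handed to the array's special method, which does the elementwise work in compiled code.

```python
import numpy as np

arr = np.array([1, 2, 3])

# Python first asks the int; it does not know how to multiply by an ndarray.
int_attempt = (4).__mul__(arr)
print(int_attempt is NotImplemented)

# Python then falls back to the array's reflected-multiply special method.
via_operator = 4 * arr
via_special_method = arr.__rmul__(4)
# The same elementwise operation is available as an explicit function call.
via_function = np.multiply(4, arr)

print(via_operator, via_special_method, via_function)
```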

In [39]:
dis.dis('[4*x for x in lyst]')

  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x7f241af05540, file "<dis>", line 1>)
              2 LOAD_CONST               1 ('<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_NAME                0 (lyst)
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x7f241af05540, file "<dis>", line 1>:
  1           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                12 (to 18)
              6 STORE_FAST               1 (x)
              8 LOAD_CONST               0 (4)
             10 LOAD_FAST                1 (x)
             12 BINARY_MULTIPLY
             14 LIST_APPEND              2
             16 JUMP_ABSOLUTE            4
        >>   18 RETURN_VALUE


In [38]:
dis.dis('4*np_lyst')

  1           0 LOAD_CONST               0 (4)
              2 LOAD_NAME                0 (np_lyst)
              4 BINARY_MULTIPLY
              6 RETURN_VALUE


As an aside, `ndarray`s are static in that they cannot change size easily (without $O(n)$ cost). While this is a deficiency relative to Python's dynamic list implementation (with amortized $O(1)$ cost to append), this is not much of a problem usually.
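A short sketch of what "cannot change size easily" means in practice: `np.append` does not grow the array in place but returns a new, larger array holding a copy of the data. A common workaround, shown here as one possible pattern rather than the only one, is to allocate the full array up front and fill it in. The sizes and names are chosen just for this example.

```python
import numpy as np

arr = np.arange(5)

# np.append allocates a new array and copies every element: O(n) per call.
grown = np.append(arr, 99)
print(arr.size, grown.size)
print(np.shares_memory(arr, grown))  # the original buffer was not extended

# When the final size is known, preallocate once and fill in place.
n = 5
squares = np.empty(n, dtype=np.int64)
for i in range(n):
    squares[i] = i * i
print(squares)
```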

Across computational workflows much more complicated than the simple experiments above, people have found Numpy to be up to 100 times faster than native Python. To be clear, `ndarray`s cannot easily replace all Python lists in our lives: we are after all programming in Python! But in computationally intensive endeavors like image processing, where dealing with millions of pixels is an every-second kind of thing, we will have to leverage Numpy's massive speedups.

<a id='essentials'></a>
## Numpy Essentials

Lots of material may go here: argmin/argmax, swapping dimensions, where/filter, [histogram](https://numpy.org/doc/stable/reference/generated/numpy.histogram.html).
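As a first taste of the functions named above, here is a hedged sketch on a tiny made-up 2x3 array. The array contents, the threshold of 5, and the choice of 3 histogram bins over the range 0 to 9 are all arbitrary choices for this example.

```python
import numpy as np

a = np.array([[7, 2, 9],
              [4, 8, 1]])

# argmin/argmax return positions rather than values.
flat_min = np.argmin(a)                          # index into the flattened array
row_col = np.unravel_index(flat_min, a.shape)    # the same position as (row, col)
col_max = np.argmax(a, axis=0)                   # row index of the max in each column

# Swapping dimensions: the transpose has shape (3, 2).
swapped = a.T

# where/filter: locate or extract the elements that satisfy a condition.
rows, cols = np.where(a > 5)
big = a[a > 5]

# histogram: count the values falling into equal-width bins.
counts, edges = np.histogram(a, bins=3, range=(0, 9))

print(flat_min, row_col, col_max)
print(swapped.shape)
print(rows, cols, big)
print(counts, edges)
```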

In [20]:
ll = map(lambda x: 4*x, lyst)

In [21]:
ll

<map at 0x7f241ac099e8>

In [41]:
np.__version__

'1.19.1'