<center><img src=img/MScAI_brand.png width=70%></center>

# Numpy Philosophy

Short video/notebook with notes on: why Numpy?

* Speed
* Abstraction
* A library of pre-written functions

Numpy ("Numerical Python", pronounced NUM-pie, not NUM-pee) is a library used in Python for numerical computing. Most scientific computing work in Python relies on Numpy as a base.

But we can already do numerical calculations in Python, so why does Numpy exist? 

1. **Speed**. Numpy makes many numerical calculations much faster.

2. **Abstraction**. It is convenient to write an equation once for a whole array, e.g. $y = \beta x$, rather than once per element as $y_0 = \beta x_0$, $y_1 = \beta x_1$, etc., even though both forms mean the same thing.

3. **A library**. Numpy provides many common functions. "Batteries included."
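
The abstraction point can be sketched in a few lines. This is only an illustration: the value of `beta` and the contents of `x_list` are made-up numbers, not data from the course.

```python
import numpy as np

beta = 2.0                  # illustrative coefficient (assumed value)
x_list = [0.5, 1.0, 1.5]    # illustrative data (assumed values)

# Element-by-element view: y_0 = beta * x_0, y_1 = beta * x_1, ...
y_list = [beta * x_i for x_i in x_list]

# Whole-array view: y = beta * x
x = np.array(x_list)
y = beta * x

print(y_list)  # [1.0, 2.0, 3.0]
print(y)       # [1. 2. 3.]
```

Both versions compute the same numbers; the array version simply reads like the equation.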

### Numpy array

"NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers" - https://numpy.org/devdocs/user/quickstart.html

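As a small sketch of what the quote means in practice, the following builds a 2-dimensional array and inspects it. The values in `a` and `b` are arbitrary and chosen only for illustration.

```python
import numpy as np

# A 2 x 3 table of floats (arbitrary example values)
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

print(a.ndim)     # 2, the number of dimensions
print(a.shape)    # (2, 3), i.e. 2 rows and 3 columns
print(a.dtype)    # float64, one type shared by every element
print(a[1, 2])    # 6.0, indexed by the tuple (row 1, column 2)

# Mixing int and float in the input still gives a single common type
b = np.array([1, 2.5])
print(b.dtype)    # float64
```
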
### Vectorisation

Instead of writing a `for`-loop to process each item of a Python `list`, we write a single line to process all elements of a Numpy `array`, "all at once". 

This is called *vectorisation*. The same concept is essential to good performance in Matlab and R.


Compare:

```python
L = [1, 2, 3, 4]
s = 0
for x in L:
    s += x
```


And:

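```python
import numpy as np

x = np.array(L)  # convert the list L from the loop above into an array
s = x.sum()      # the loop over the elements now runs inside Numpy
```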


From the point of view of *abstraction*, the for-loop is hidden. From the point of view of *speed*, the for-loop is moved from pure Python into an underlying function written in C or Fortran.

When dealing with large data, pure Python can be slow. If we have a list of 10 numbers and we calculate the mean, the result appears instantaneous. But if we have 10 million numbers, the delay becomes noticeable. The reason for this is Python's *flexibility*. Python allows a list to contain any type of value, e.g. we can have a mixed list of `int`s, `float`s, `str`s, other `list`s, and so on. Python has to check what type each value is before deciding how to add it (or whether it even *can* add it).
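
To make the flexibility concrete, here is a minimal sketch with a deliberately mixed list. The contents of `mixed` are invented for illustration; the point is that Python only discovers at run time, element by element, whether an addition is possible.

```python
# A list holding four different types (illustrative values)
mixed = [1, 2.5, "three", [4, 5]]
types_seen = [type(v).__name__ for v in mixed]
print(types_seen)  # ['int', 'float', 'str', 'list']

total = 0
failed_on = None
for v in mixed:
    try:
        total += v        # Python inspects the type of v on every iteration
    except TypeError:
        failed_on = v     # adding a str to a number is not allowed
        break

print(total)      # 3.5, the sum of the values before the failure
print(failed_on)  # three
```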

In a Numpy array, all elements are of the same type, e.g. all `float`. Thus there is no need to check the type of each value, and the loop over the elements can run in compiled code. The speed-up can be around a factor of 100, though it depends heavily on the workload and the machine.
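
One rough way to check the speed difference on your own machine is the standard-library `timeit` module. This is a sketch, not a rigorous benchmark: the size `n`, the repeat count, and the helper `loop_sum` are all choices made here for illustration, and the measured ratio will vary between machines.

```python
import timeit
import numpy as np

n = 1_000_000                        # assumed size, kept small so it runs quickly
values = list(range(n))              # plain Python list
arr = np.arange(n, dtype=np.int64)   # Numpy array holding the same numbers

def loop_sum(items):
    """Sum the items with an explicit pure-Python loop (hypothetical helper)."""
    s = 0
    for v in items:
        s += v
    return s

# Both approaches must agree before timing them
assert loop_sum(values) == int(arr.sum())

t_loop = timeit.timeit(lambda: loop_sum(values), number=5)
t_numpy = timeit.timeit(lambda: arr.sum(), number=5)
print(f"pure Python loop: {t_loop:.4f} s")
print(f"Numpy sum:        {t_numpy:.4f} s")
```

The printed times are whatever your machine produces; no particular ratio is guaranteed.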

### Further reading

* Great introduction: https://jalammar.github.io/visual-numpy/
* A cheat sheet: https://www.dataquest.io/blog/numpy-cheat-sheet/
* Exercises (many quite difficult): https://github.com/rougier/numpy-100
* Nice textbook reference on Numpy with several longer worked examples: http://www.labri.fr/perso/nrougier/from-python-to-numpy/
* If you're already good at Matlab: https://www.numpy.org/devdocs/user/numpy-for-matlab-users.html
* If you're already good at R: http://mathesaurus.sourceforge.net/r-numpy.html