In [1]:
import numpy as np

# NumPy basics

So, you've just started a machine learning course. Perhaps you've never programmed before, or perhaps you've programmed in some other language (like MATLAB... yuck). Or maybe you're just interested in trying out some ML in Python. Either way, `NumPy` is an invaluable tool in any practitioner's arsenal.

## So what is `NumPy` anyways?
`NumPy` (short for **Num**erical **Py**thon) is a Python library used to perform linear algebra operations quickly and concisely.

A *library* is a piece of code written by someone else that you can use in your program. Libraries are usually used by calling *functions*, which take parameters and return values based on the values of the parameters. One advantage of this is so-called *abstraction*, where we don't need to worry about how the library works, so long as it works correctly.

`NumPy` works on a central object - the **array**. An array can be thought of as a list of fixed size that contains data of a particular *type*. For our purposes, we will only consider numerical types, i.e. `int` and `float`. `NumPy` also provides functions for efficient vector and matrix operations on these arrays. We don't need to know _why_ they're efficient; it suffices to know that they are. However, if you're interested in the why, please see the bonus sections at the end of the notebook.

Throughout this notebook, there will be examples. The examples aren't comprehensive, and it's not necessary to memorise all the examples. Sometimes several examples will show how to do the same thing but in a slightly different way each time. What's important is to understand why these steps are being performed. If it's unclear, please shoot me an email or raise a GitHub issue. 

### Documentation
It is important to stress throughout this that precious few people will know everything there is to know about `NumPy`, or indeed any library. As such, the developers helpfully provide documentation, which can be found [here](https://numpy.org/doc/stable/). The API reference can be found [here](https://numpy.org/doc/stable/reference/index.html#reference) and will be more useful if you have a specific question you need answering. The ability to read documentation well and to quickly locate relevant information is a skill well worth developing, and I encourage you to use the docs to hone this.

If you don't know something or have a specific question, your best bet is to search the documentation.

## Initialising arrays
Okay, let's take a look at some simple `NumPy` procedures. We start by initialising some arrays. The minimum amount of information needed to initialise an array is its size. 

This code initialises an array containing only zeroes. This array has a fixed length of 5.

In [2]:
xs = np.zeros(5)
xs

array([0., 0., 0., 0., 0.])

We can do the same with ones.

In [9]:
xs = np.ones(5)
xs

array([1., 1., 1., 1., 1.])

We can even just initialise the array without filling it with data. This is more efficient than the previous methods, but the array will initially contain whatever happened to be in that memory, so its values are arbitrary until we overwrite them.

In [10]:
xs = np.empty(5)
xs

array([1., 1., 1., 1., 1.])

In this case, it contains the contents of the previous array initialisation. This is **not** a coincidence - those interested in the why can read the first bonus section at the end of the notebook.

Arrays are also zero-indexed (just like lists). As such, we can access positions 0-4, but not 5.
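To make the indexing rule concrete, here is a small sketch using an array of length 5. The variable names (`first`, `last`, `out_of_bounds`) are just for illustration. Reading or writing positions 0 to 4 works, while position 5 raises an `IndexError`.

```python
import numpy as np

xs = np.zeros(5)

# Valid positions run from 0 to len(xs) - 1, i.e. 0 to 4 here.
xs[0] = 3.0
xs[4] = 7.0
first = xs[0]
last = xs[4]

# Position 5 is one past the end, so NumPy raises an IndexError.
try:
    xs[5]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True

print(first, last, out_of_bounds)  # 3.0 7.0 True
```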

Remember, arrays are of a fixed length. As such, if we know the data _a priori_ (beforehand), we can create the array directly from it, as follows:

In [3]:
xs = [1,4,3,0,7]
ys = np.array(xs)
ys

array([1, 4, 3, 0, 7])
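Recall that an array holds data of one particular type. The sketch below (variable names are my own) shows how `np.array` picks that type from the data it is given, and what happens when we store a `float` in an integer array. The exact integer type can vary by platform, so the example only checks that it is *some* integer type.

```python
import numpy as np

ints = np.array([1, 4, 3, 0, 7])        # all ints, so an integer array
floats = np.array([1.0, 4, 3, 0, 7])    # one float makes the whole array float

print(np.issubdtype(ints.dtype, np.integer))  # True
print(floats.dtype)                           # float64

# The type is fixed once the array exists: a float stored in an
# integer array loses its fractional part.
ints[0] = 2.9
print(ints[0])  # 2
```

If you need fractional values, start from floats (or pass `dtype=float` to `np.array`) rather than relying on the conversion.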

## Using arrays as glorified lists
We can use arrays in much the same way as lists. In general, you will want to avoid doing this, but as with anything in programming, you will sometimes need this fine grained control.

Let's see how to access elements

In [11]:
xs = [1, 4, 3, 0, 7]
ys = np.array(xs)
ys[0], ys[1], ys[2], ys[3], ys[4]

(1, 4, 3, 0, 7)

We now know we can access arrays by element. However, if we want to be able to iterate over arrays by position, we need to know when to stop.

To do this, `NumPy` arrays have several attributes:

- `ndim` gives the number of dimensions of the array
- `size` gives the total number of elements of the array
- `shape` returns a tuple containing the length of each dimension of the array

Let's demonstrate this below

In [13]:
xs = np.zeros([5,3,2])
assert 3 == xs.ndim, "Wrong dimension: %d" % xs.ndim
assert 5*3*2 == xs.size, "Wrong size: %d" % xs.size
assert (5, 3, 2) == xs.shape, "Wrong shape: %s" % str(xs.shape)

The above code also makes use of a useful new construct: **assertions**. An assertion does nothing if the expression to the left of the comma evaluates to `True`. If that expression evaluates to `False`, the assertion raises an `AssertionError` containing the message on the right hand side.
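Since none of the assertions above fail, here is a small sketch of what a failing one looks like. The wrong shape `(5, 3)` is deliberate, and the `try`/`except` is only there so the example can capture the message instead of stopping the notebook.

```python
import numpy as np

xs = np.zeros([5, 3, 2])

# A true condition: nothing happens.
assert xs.ndim == 3, "Wrong dimension: %d" % xs.ndim

# A false condition: Python raises an AssertionError carrying the message.
try:
    assert xs.shape == (5, 3), "Wrong shape: %s" % str(xs.shape)
    message = None
except AssertionError as err:
    message = str(err)

print(message)  # Wrong shape: (5, 3, 2)
```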

## Where does Linear Algebra come in?
Despite what it looks like, you've already seen some examples of vectors and matrices. `NumPy` uses a one-dimensional array to represent a vector, and a two-dimensional array to represent a matrix. We can perform operations on them as follows.

In [4]:
xs = np.linspace(0, 10, 5, endpoint=True)
ys = np.array([1,3,4,2,5])

xs@ys

92.5
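The `@` operator above computed a dot product of two vectors. The same operator works with two-dimensional arrays, so here is a short sketch with a small matrix. The names `A` and `v` are just illustrative.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])   # a 2x2 matrix
v = np.array([1, 0])     # a vector of length 2

Av = A @ v   # matrix-vector product
AA = A @ A   # matrix-matrix product
At = A.T     # transpose

print(Av)  # [1 3]
print(AA)  # [[ 7 10]
           #  [15 22]]
print(At)  # [[1 3]
           #  [2 4]]
```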

## Bonus 1 - why are arrays blazing quick?
Okay, strap yourself in for a quick computer architecture lesson.

We need to break this question down into two parts:

- Quick in comparison to what?
- Why are they quick in comparison to this other structure?

We will deal with each point in turn

### What are `NumPy` arrays quick in comparison to?

To the first point, `NumPy` arrays are quick in comparison to Python lists. They're probably quicker than most other Python data structures too. However, lists have no fixed size and can hold objects of several different types. This flexibility makes lists very useful in a language like Python, and they will be acceptable for most cases the average programmer comes across. However, when it comes to computationally intensive calculations, such as those found in machine learning, Python's list doesn't cut it any more.
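To illustrate the difference in behaviour (not yet speed), here is a small sketch. The variable names are my own. Note how `+` means concatenation for lists but element-wise addition for arrays.

```python
import numpy as np

# A list can mix types and grow after it has been created.
things = [1, "two", 3.0]
things.append([4])
n_things = len(things)

# `+` joins two lists end to end...
a = [1, 2, 3]
b = [10, 20, 30]
list_sum = a + b

# ...but adds two arrays element by element.
array_sum = np.array(a) + np.array(b)

print(n_things)   # 4
print(list_sum)   # [1, 2, 3, 10, 20, 30]
print(array_sum)  # [11 22 33]
```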


### Bonus bonus - how much quicker are `NumPy` arrays?
Let's do some practical tests
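One possible test, as a minimal sketch: sum the squares of the first 100,000 integers using a list and then using an array, timing both with the standard library's `timeit` module. The function names and the choice of task are my own assumptions. The timings will differ from machine to machine, so no expected numbers are shown.

```python
import timeit

import numpy as np

n = 100_000
values_list = list(range(n))
values_array = np.arange(n, dtype=np.int64)  # int64 avoids overflow on any platform

def sum_squares_list():
    # Pure Python: one multiplication and one addition per loop step.
    return sum(x * x for x in values_list)

def sum_squares_array():
    # NumPy: the dot product of the array with itself, done in compiled code.
    return int(values_array @ values_array)

# Run each function 10 times and record the total time in seconds.
list_time = timeit.timeit(sum_squares_list, number=10)
array_time = timeit.timeit(sum_squares_array, number=10)

print("list: ", list_time)
print("array:", array_time)
```

Both functions return the same answer, so the comparison is like for like; only the time taken should differ.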