# Intro to NumPy

Hello and welcome to your NumPy primer as part of NCI's Parallel Python data science course.

This notebook briefly covers the fundamentals of `NumPy` to give you a basis for using it in your data analytics workflow.


In [None]:
import os
# The jupyter notebook is launched from your $HOME directory.
# Change the working directory to the folder
# which was created in your username directory under /scratch/vp91

#TODO 
os.chdir(os.path.expandvars("/scratch/vp91/$USER/Data-Analytics/"))


## Intro

What is NumPy? Put simply, NumPy is the fundamental package for numerical computing in Python.
It provides a key capability that plain Python lacks: array computing! I'll explain what I mean by array computing later.

NumPy provides the basic building blocks for the numerical parts of the PyData stack and is the underlying data representation in a whole host of packages (think pandas, Dask, ...).

Alright, let's jump in!

### Adding two vectors **without** NumPy

We are going to first explore what the world would look like without NumPy by trying to add two vectors of integers together. 

In vector notation this could be written as:

$A + B = X$

where $A$, $B$ and $X$ are vectors.

To do this in pure Python we would use two lists:

In [None]:
# create our two vectors
A = [0,1,2,3,4,5,6,7,8,9,10,11]
B = [10,11,12,13,14,15,16,17,18,19,20,21]

In [None]:
# loop over the elements using a numerical index and a for loop

# check the vectors are the same length
assert len(A) == len(B)
n_elem = len(A)
X = []
for i in range(n_elem):
    X.append(A[i] + B[i])
    
X 

Okay, this worked, but it was a little verbose and clunky. If we were going to do things this way, we might as well use a compiled language.
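
For comparison, a more idiomatic pure-Python version pairs up the elements with `zip` inside a list comprehension. This is a sketch with small illustrative lists (the names `a_list`, `b_list` and `x_list` are just for this example). It is shorter, but it is still a loop run one element at a time by the Python interpreter, so it does not solve the speed problem.

```python
# Two small example vectors (plain Python lists)
a_list = [0, 1, 2, 3]
b_list = [10, 11, 12, 13]

assert len(a_list) == len(b_list)
# zip pairs up corresponding elements, so no numerical index is needed
x_list = [a + b for a, b in zip(a_list, b_list)]
print(x_list)  # [10, 12, 14, 16]
```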

It is also slow once our lists get bigger. Let's do some basic timings for a 100,000-element list.

In [None]:
# construct our lists
N = 100000
A = []
B = []
for i in range(N):
    A.append(i)
    B.append(i)

Now let's time adding them together using the `%%timeit` Jupyter cell magic, which runs the cell several times and measures its execution time. Very handy!

In [None]:
%%timeit
X = []
for i in range(N):
    X.append(A[i] + B[i])
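
`%%timeit` only works inside Jupyter/IPython. In a plain Python script, the standard-library `timeit` module does a similar job. A minimal sketch, assuming the same 100,000-element lists; the function name `add_lists` and the choice of 10 runs are arbitrary, for illustration only:

```python
import timeit

N = 100_000
a_list = list(range(N))
b_list = list(range(N))

def add_lists():
    # the same index-based loop as the cell above
    x = []
    for i in range(N):
        x.append(a_list[i] + b_list[i])
    return x

# call add_lists 10 times and report the average time per call
total_seconds = timeit.timeit(add_lists, number=10)
print(f"{total_seconds / 10 * 1e3:.1f} ms per loop")
```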

On my laptop this took about 17 ms. That may sound fast, but this operation is very common and can be orders of magnitude faster in compiled languages like C or Fortran.

**We can do better**


### Adding two vectors with NumPy

NumPy is primarily a library for array computing, among other things. The Wikipedia definition of array programming is:

"In computer science, array programming refers to solutions which allow the application of operations to an entire set of values at once. Such solutions are commonly used in scientific and engineering settings."

What does this mean for us?

We can do away with the loop over the elements and add our two vectors together directly. Let's see this in action.
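
To make the contrast concrete: the `+` operator means something different for lists and for arrays. On lists it concatenates; on NumPy arrays it adds corresponding elements, applied to the whole array at once. A small sketch with illustrative values:

```python
import numpy as np

a_list = [1, 2, 3]
b_list = [10, 20, 30]

# + on Python lists joins them end to end
print(a_list + b_list)  # [1, 2, 3, 10, 20, 30]

# + on NumPy arrays adds corresponding elements
a_arr = np.array(a_list)
b_arr = np.array(b_list)
print(a_arr + b_arr)  # [11 22 33]
```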

In [None]:
# import numpy
import numpy as np

First, let's make arrays out of the 100,000-element lists we used before.

In [None]:
A_numpy = np.array(A)
B_numpy = np.array(B)
A_numpy

We will go more into the attributes of arrays later, but arrays have a handy attribute called `shape`

In [None]:
A_numpy.shape

We can see the array is one-dimensional, with a `shape` of `(100000,)`: 100,000 elements along its single dimension.
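
Besides `shape`, a few other array attributes are worth knowing: `ndim` (number of dimensions), `size` (total number of elements) and `dtype` (the type of every element). A short sketch on a small illustrative array; note that the shape of a 1D array is a one-element tuple, written with a trailing comma:

```python
import numpy as np

arr = np.array([5, 6, 7, 8])
print(arr.shape)  # (4,)
print(arr.ndim)   # 1
print(arr.size)   # 4
# an integer dtype; its exact width (e.g. int64) depends on platform and NumPy version
print(arr.dtype)
```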

Now let's add our arrays together using array computing!

In [None]:
X_numpy = A_numpy + B_numpy

Wow, that was a lot easier than our janky for loop, but is it faster?

In [None]:
%%timeit

X_numpy = A_numpy + B_numpy

On my laptop this is roughly 300x faster!

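But how? To cut a long story short, the numerically intensive routines in NumPy are written in C/C++ (sometimes Cython), and an array holds raw numbers of a single fixed type in one block of memory rather than a list of separate Python objects. The loop over the elements therefore runs in compiled code instead of the Python interpreter, giving us the speed of a compiled language with the ease of writing Python.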
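
One way to see this: `A_numpy + B_numpy` is shorthand for the NumPy function `np.add`, a compiled "universal function" (ufunc) that loops over the array's fixed-type elements. A small sketch with made-up values:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# the + operator on arrays dispatches to the np.add ufunc
print(np.add(a, b))  # [5 7 9]
assert np.array_equal(a + b, np.add(a, b))

# every element shares one fixed-size dtype, which is what lets
# the compiled loop run without inspecting individual Python objects
print(a.dtype, a.itemsize, "bytes per element")
```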

### Challenge

Multiply the vectors A and B element-wise to form an array C using NumPy.

In [None]:
# compute C = A*B

<details><summary><b>Solution</b></summary>
   <pre>
    <br> C_numpy = A_numpy * B_numpy
   </pre>
</details>

What else can array computation with NumPy give us?

NumPy also provides vectorised implementations (applied to a whole array at once, element by element) of many mathematical functions, as well as reductions such as the mean. For example:



In [None]:
sin_B = np.sin(B_numpy)
sin_B

In [None]:
mean_A = np.mean(A_numpy)
mean_A
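
As a sanity check, the vectorised results match what you would get element by element with the standard-library `math` module, just without writing the loop yourself. A sketch on a small illustrative array:

```python
import math
import numpy as np

values = np.array([0.0, 0.5, 1.0, 2.0])

# np.sin applies sine to every element in one call
vectorised = np.sin(values)
# the equivalent Python-level loop
looped = [math.sin(v) for v in values]
assert np.allclose(vectorised, looped)

# np.mean is the sum divided by the number of elements
assert math.isclose(np.mean(values), values.sum() / values.size)
print(np.mean(values))  # 0.875
```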

### Slicing and multi-dimensional arrays

Arrays can be sliced just like a list using `start:stop:step` notation

In [None]:
# construct a range of values using arange
r = np.arange(20)
r

In [None]:
# grab every element 
r[:]

In [None]:
# grab the first ten elements
r[:10]

In [None]:
# grab the elements at indices 10 to 14 (the stop index 15 is excluded)
r[10:15]

In [None]:
# grab the last 10 elements 
r[-10:]

In [None]:
# grab every second element 
r[::2]
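
Two details of slicing are worth knowing. First, the `stop` index is excluded, so `r[10:15]` has five elements. Second, unlike slicing a list, slicing a NumPy array returns a *view* onto the same data rather than a copy, so writing into the slice changes the original array; use `.copy()` when you need an independent array. A sketch using a separate illustrative array `vals` so that `r` is left untouched:

```python
import numpy as np

vals = np.arange(20)

# the stop index is excluded: indices 10, 11, 12, 13, 14
print(vals[10:15])       # [10 11 12 13 14]
print(len(vals[10:15]))  # 5

# a slice is a view: modifying it modifies vals too
window = vals[:3]
window[0] = 99
print(vals[:5].tolist()) # [99, 1, 2, 3, 4]

# .copy() gives an independent array
independent = vals[:3].copy()
independent[0] = -1
print(vals[0])           # still 99
```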

Arrays can have multiple dimensions! This is the real power of NumPy, as you will hopefully see in the Parallel Python course.

In [None]:
# make a range and then reshape it into a two-dimensional array with N rows and 3 columns
N = 20
two_dimensional = np.arange(3*N).reshape(N,3)
two_dimensional

In [None]:
two_dimensional.shape

We can see that our array is two-dimensional with shape `(20, 3)`. Slicing works the same way for multi-dimensional arrays, with the indices ordered (row, column).


In [None]:
# slice to get the first row
two_dimensional[0,:]


In [None]:
# slice to get the second column (index 1)
two_dimensional[:,1]


In [None]:
# slice to get an individual value  (the first one)
two_dimensional[0,0]
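
Row and column slices can be combined to pull out a rectangular block. A sketch on a smaller 4 x 3 array built the same way (the name `grid` and its size are chosen for illustration):

```python
import numpy as np

grid = np.arange(12).reshape(4, 3)
# grid is:
# [[ 0  1  2]
#  [ 3  4  5]
#  [ 6  7  8]
#  [ 9 10 11]]

# rows 1 and 2 (stop excluded), columns 1 onwards
block = grid[1:3, 1:]
print(block.tolist())  # [[4, 5], [7, 8]]
print(block.shape)     # (2, 2)

# grid[i, j] gives the same value as grid[i][j], without
# creating the intermediate row array
assert grid[2, 0] == grid[2][0] == 6
```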

### Using NumPy functions on multidimensional arrays

If you only want to compute a function or property along a single dimension, pass that dimension's index with the `axis` keyword (e.g. `axis=0`).

In [None]:
# mean over all elements
np.mean(two_dimensional)

In [None]:
# mean along the first dimension (axis 0): one value per column
np.mean(two_dimensional, axis=0)

In [None]:
# mean along the second dimension (axis 1): one value per row
np.mean(two_dimensional, axis=1)
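
It helps to check the shape of the result: `axis=0` collapses the rows, leaving one mean per column, while `axis=1` collapses the columns, leaving one mean per row. A sketch on a small 2 x 3 array (the name `small` is illustrative):

```python
import numpy as np

small = np.arange(6).reshape(2, 3)
# small is:
# [[0 1 2]
#  [3 4 5]]

print(np.mean(small))          # 2.5 (all six elements)
print(np.mean(small, axis=0))  # [1.5 2.5 3.5]  one value per column
print(np.mean(small, axis=1))  # [1. 4.]        one value per row
```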

### More complicated operations

NumPy also supports a wide range of linear algebra operations.
The full scope of these is beyond this course, but as an example, let's compute the square of the matrix $Z$

$$
Z = \begin{pmatrix}
1 & 2 & 3\\
4 & 5 & 6\\
7 & 8 & 9
\end{pmatrix}
$$


$Z^2 = ZZ$
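
Each entry of $Z^2$ is the dot product of a row of $Z$ with a column of $Z$. Worked through for the top-left entry:

$$
(Z^2)_{ij} = \sum_{k=1}^{3} Z_{ik} Z_{kj}, \qquad (Z^2)_{11} = 1\cdot 1 + 2\cdot 4 + 3\cdot 7 = 30
$$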


In [None]:
# build the matrix Z from above (entries 1 to 9)
Z = np.arange(1, 10).reshape(3, 3)

z_squared = np.matmul(Z, Z)
z_squared

The same result can be obtained with the matrix multiplication operator `@`:

In [None]:
z_squared = Z @ Z
z_squared
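
A common pitfall: `*` on arrays is element-wise multiplication, not the matrix product. A sketch using the same matrix, built here as `M` with entries 1 to 9 (the name is illustrative); `np.linalg.matrix_power` is another way to get the matrix square:

```python
import numpy as np

# the matrix from the text, with entries 1 to 9
M = np.arange(1, 10).reshape(3, 3)

# matrix product: rows of M dotted with columns of M
print((M @ M).tolist())  # [[30, 36, 42], [66, 81, 96], [102, 126, 150]]

# element-wise product: each entry squared
print((M * M).tolist())  # [[1, 4, 9], [16, 25, 36], [49, 64, 81]]

# matrix_power(M, 2) computes the same thing as M @ M
assert np.array_equal(np.linalg.matrix_power(M, 2), M @ M)
```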

### More than two dimensions

NumPy arrays can have many dimensions (up to 32 in NumPy 1.x, raised to 64 in NumPy 2.x). Let's make two 3D arrays X and Y and find their tensor product $X \otimes Y$.

In [None]:
N = 20
X = np.arange(N*3*3).reshape(N,3,3)
Y = np.arange(N*3*3).reshape(N,3,3)
X.shape


In [None]:
res = np.tensordot(X, Y, axes=0)


In [None]:
res.shape
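
With `axes=0`, `np.tensordot` contracts no axes, so it forms the outer (tensor) product: every element of `X` multiplied by every element of `Y`, giving a result whose shape is `X.shape + Y.shape`, here `(20, 3, 3, 20, 3, 3)`. The ufunc method `np.multiply.outer` computes the same thing. A smaller sketch to check this (the arrays `P` and `Q` are illustrative):

```python
import numpy as np

P = np.arange(6).reshape(2, 3)
Q = np.arange(4).reshape(2, 2)

outer = np.tensordot(P, Q, axes=0)
print(outer.shape)  # (2, 3, 2, 2)

# same result as the outer-product ufunc method
assert np.array_equal(outer, np.multiply.outer(P, Q))

# each entry is one element of P times one element of Q
assert outer[1, 2, 1, 0] == P[1, 2] * Q[1, 0]
```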

### Summary

The maths we have done here is not really important. In fact, I am terrible at linear algebra myself.

What is important is that you can see the immense power of array computation using NumPy and have a feel for how to work with arrays rather than lists or individual numbers.

The power of NumPy is incredible, and if you have a numerical problem, there is a good chance it can be made simpler with NumPy!

Hopefully this has given you a small taste of what you can do with NumPy and helps you with the rest of the course.