# NumPy

NumPy is a first-rate library for numerical programming

* Widely used in academia, finance and industry
* Mature, fast, stable and under continuous development


## Introduction

The essential problem that NumPy solves is fast array processing

For example, suppose we want to create an array of 1 million random draws from a uniform distribution and compute the mean

If we did this in pure Python it would be orders of magnitude slower than C or Fortran

In NumPy this would be dealt with like this:

In [1]:
import numpy as np
x = np.random.uniform(0, 1, size=1000000)
x.mean()

0.49955965776662287
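To make the speed gap concrete, the sketch below times a pure Python loop against the NumPy call. The use of `time.perf_counter` and the variable names are illustrative choices rather than part of the example above, and the timings will vary from machine to machine:

```python
import random
import time

import numpy as np

n = 1_000_000

# Pure Python: draw and accumulate one number at a time
start = time.perf_counter()
total = 0.0
for _ in range(n):
    total += random.uniform(0, 1)
python_mean = total / n
python_seconds = time.perf_counter() - start

# NumPy: draw all numbers in one call and average them in compiled code
start = time.perf_counter()
numpy_mean = np.random.uniform(0, 1, size=n).mean()
numpy_seconds = time.perf_counter() - start

print(f"pure Python: mean={python_mean:.4f}, {python_seconds:.3f}s")
print(f"NumPy:       mean={numpy_mean:.4f}, {numpy_seconds:.3f}s")
```

Both means should be close to 0.5; only the elapsed time should differ noticeably.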

#### A Comment on Vectorization

NumPy is great for operations that are naturally vectorized

Vectorized operations are precompiled routines that can be applied to whole batches of data at once, such as

* matrix multiplication and other linear algebra routines
* generating a vector of random numbers
* applying a fixed transformation (e.g., sine or cosine) to an entire array
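As a rough illustration, each of the three kinds of operation listed above can be written as a single NumPy call. The array sizes and names here are arbitrary choices made for the sketch:

```python
import numpy as np

# Matrix multiplication, handled by a compiled linear algebra routine
A = np.ones((3, 3))
B = np.identity(3)
product = A @ B

# A whole vector of random numbers from a single call
draws = np.random.uniform(0, 1, size=5)

# A fixed transformation applied to every element at once
grid = np.linspace(0, np.pi, 5)
sines = np.sin(grid)

print(product.shape, draws.shape, sines.shape)
```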


## NumPy Arrays

The most important thing that NumPy defines is an array data type formally called a `numpy.ndarray`

To create a NumPy array of only zeros:

In [4]:
z = np.zeros(3)

In [5]:
type(z)

numpy.ndarray

In [8]:
z

array([0., 0., 0.])

NumPy arrays are like Python lists, except:

* The data must be homogeneous: all elements must be of the same type
* The types must be one of the types (`dtypes`) provided by NumPy

The most important of the `dtypes` are:

* int64
* float64
* bool

There are other dtypes representing complex numbers, unsigned integers, etc.; see the NumPy documentation for the full list
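A minimal sketch of how homogeneity and the other dtypes show up in practice. The variable names are illustrative only:

```python
import numpy as np

# Mixing integers and floats yields a single common dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)                 # float64

# Other dtypes can be requested explicitly
c = np.zeros(2, dtype=complex)     # complex numbers (complex128)
u = np.zeros(2, dtype=np.uint8)    # unsigned 8-bit integers
print(c.dtype, u.dtype)
```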

The default dtype for arrays is `float64`

In [7]:
type(z[0])

numpy.float64

If we want to use, for instance, integers instead of floats:

In [10]:
a = np.zeros(3, dtype=int)

In [11]:
type(a[0])

numpy.int64

### Shape and Dimension

In [16]:
z = np.zeros(10)

In [17]:
z

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

Here `z` is a flat, one-dimensional array; it is neither a row vector nor a column vector

We can use the attribute `shape` to see the dimensions:

In [18]:
z.shape

(10,)

Here the tuple has only one element, 10, which is the length of the array

To give it a second dimension, we can change the `shape` attribute

In [19]:
z.shape = (10,1)

In [20]:
z

array([[0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.]])

In [21]:
z.shape

(10, 1)

So now it has 10 rows, and 1 column

In [22]:
z.shape = (10,2)

ValueError: cannot reshape array of size 10 into shape (10,2)

So we can't give it a shape that doesn't make sense. Trying to make it a $10\times2$ matrix fails as there are only a total of 10 elements, not 20

In [24]:
z.shape= (2,5)

In [25]:
z

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [27]:
z = np.zeros(4)
z.shape = (2,2)

In [28]:
z

array([[0., 0.],
       [0., 0.]])

We could also have used the following, so that we don't have to reshape it:

In [29]:
z = np.zeros((2,2))

In [30]:
z

array([[0., 0.],
       [0., 0.]])
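Besides assigning to `shape`, the `reshape` method offers another way to do the same thing. The sketch below is an illustration and not part of the original notes; it also shows `-1`, which asks NumPy to infer one dimension from the total size:

```python
import numpy as np

z = np.zeros(10)

# reshape returns a reshaped array and leaves z.shape untouched
col = z.reshape(10, 1)
print(z.shape, col.shape)      # (10,) (10, 1)

# -1 asks NumPy to infer that dimension from the total number of elements
grid = z.reshape(2, -1)
print(grid.shape)              # (2, 5)

# An incompatible shape raises a ValueError, just like assigning to z.shape
try:
    z.reshape(10, 2)
except ValueError as err:
    print("ValueError:", err)
```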

### Creating Arrays

We can also create ones and the identity matrix:

In [32]:
np.ones((2,2))

array([[1., 1.],
       [1., 1.]])

In [33]:
np.identity(2)

array([[1., 0.],
       [0., 1.]])

For identity, the parameter $n$ creates an $n\times n$ identity matrix

We can create an empty array in memory that can be populated later:

In [35]:
z = np.empty(3)

In [36]:
z

array([1.28822975e-231, 3.11109762e+231, 1.28822975e-231])

The values are garbage: they are whatever happened to be in the allocated memory locations

We can also create a grid of evenly spaced numbers:

In [37]:
np.linspace(2,4,5)

array([2. , 2.5, 3. , 3.5, 4. ])

So that creates an array starting at 2, ending at 4 that has a total of 5 values evenly spaced
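To see where the spacing comes from, `np.linspace` can also return the step size via its `retstep` argument. A small sketch, with illustrative variable names:

```python
import numpy as np

# retstep=True also returns the spacing between consecutive points
grid, step = np.linspace(2, 4, 5, retstep=True)
print(grid.tolist())   # [2.0, 2.5, 3.0, 3.5, 4.0]
print(step)            # 0.5

# Both endpoints are included, so the spacing is (stop - start) / (num - 1)
print((4 - 2) / (5 - 1))   # 0.5
```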

We can also make arrays from Python data types (provided they can be mapped to `dtypes`)

In [39]:
z = np.array([10,20])
z

array([10, 20])

In [40]:
type(z)

numpy.ndarray

We can also specify the type:

In [44]:
z = np.array([10,20], dtype=float)   # float is equivalent to np.float64
z

array([10., 20.])

Multiple dimensions:

In [45]:
z = np.array([[1,2],[3,4]])
z

array([[1, 2],
       [3, 4]])

In [46]:
z.shape

(2, 2)

Unlike `np.array`, the function `np.asarray` does not make a distinct copy of data that is already in a NumPy array

In [49]:
na = np.linspace(2,4,5)
np.array(na)

array([2. , 2.5, 3. , 3.5, 4. ])

To check if it's copied or not:

In [50]:
na is np.array(na)

False

So a new copy was made.

In [51]:
na is np.asarray(na)

True

So it did not make a new copy
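The practical consequence is easiest to see by modifying the results. In this sketch (the names `same`, `fresh`, `lst` and `arr` are illustrative), a change made through `np.asarray` shows up in the original array, while a change made through `np.array` does not:

```python
import numpy as np

na = np.linspace(2, 4, 5)

same = np.asarray(na)    # already an ndarray, so no copy is made
same[0] = 99.0
print(na[0])             # 99.0, because same and na are one object

fresh = np.array(na)     # np.array copies by default
fresh[1] = -1.0
print(na[1])             # 2.5, the original is untouched

# A Python list has to be converted, so asarray builds a new array here
lst = [1, 2, 3]
arr = np.asarray(lst)
arr[0] = 100
print(lst)               # [1, 2, 3]
```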

### Array Indexing

You can access elements in a flat array just as you would in a Python list

In [52]:
z = np.linspace(1,2,5)
z

array([1.  , 1.25, 1.5 , 1.75, 2.  ])

In [53]:
z[0]

1.0

In [54]:
z[0:2]

array([1.  , 1.25])

In [55]:
z[-1]

2.0

For 2D arrays it's similar:

In [56]:
z = np.array([[1,2],[3,4]])
z

array([[1, 2],
       [3, 4]])

In [58]:
z[0,0]

1

In [59]:
z[0,1]

2

So it's `array[row,column]`, and slicing can be applied as well

In [60]:
z[0,:]  # return the first row

array([1, 2])

In [61]:
z[1,:]

array([3, 4])

In [62]:
z[:,0]   # return the first column as an array 

array([1, 3])

In [64]:
z[:,:]  # obviously all the rows and columns

array([[1, 2],
       [3, 4]])

In [65]:
z[-1,:]   # last row

array([3, 4])

And so on
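Row and column slices can also be combined to pull out a sub-block. A short sketch on a slightly larger array, chosen here only for illustration:

```python
import numpy as np

m = np.arange(12).reshape(3, 4)   # rows: 0..3, 4..7, 8..11

block = m[0:2, 1:3]               # rows 0-1, columns 1-2
print(block.tolist())             # [[1, 2], [5, 6]]

every_other = m[:, ::2]           # all rows, columns 0 and 2
print(every_other.tolist())       # [[0, 2], [4, 6], [8, 10]]
```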

NumPy arrays of integers can also be used to extract elements:

In [68]:
z = np.linspace(1,2,5)
i = np.array([0,1,3])
z[i]   # returns the 1st, 2nd and 4th elements

array([1.  , 1.25, 1.75])

An array of `dtype` `bool` can also be used:

In [70]:
d = np.array([0,1,1,0,1], dtype=bool)
d

array([False,  True,  True, False,  True])

In [72]:
z[d]   # the 2nd, 3rd and 5th elements

array([1.25, 1.5 , 2.  ])

All elements of an array can be set to the same value using slicing:

In [74]:
z = np.empty(5)
z[:] = 42

In [75]:
z

array([42., 42., 42., 42., 42.])

In [76]:
z[2:3] = 21

In [77]:
z

array([42., 42., 21., 42., 42.])
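Assignment is not limited to slices. As a sketch, the integer-array and boolean indexing shown earlier can also appear on the left-hand side (the values below are arbitrary):

```python
import numpy as np

z = np.linspace(1, 2, 5)          # 1.0, 1.25, 1.5, 1.75, 2.0

z[np.array([0, 4])] = 0           # overwrite the first and last elements
z[z > 1.4] = -1                   # overwrite every remaining element above 1.4
print(z.tolist())                 # [0.0, 1.25, -1.0, -1.0, 0.0]
```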

### Array Methods

In [79]:
a = np.array([4,3,2,1])

In [81]:
a.__dir__()

['__repr__',
 '__hash__',
 '__str__',
 '__lt__',
 '__le__',
 '__eq__',
 '__ne__',
 '__gt__',
 '__ge__',
 '__iter__',
 '__add__',
 '__radd__',
 '__sub__',
 '__rsub__',
 '__mul__',
 '__rmul__',
 '__mod__',
 '__rmod__',
 '__divmod__',
 '__rdivmod__',
 '__pow__',
 '__rpow__',
 '__neg__',
 '__pos__',
 '__abs__',
 '__bool__',
 '__invert__',
 '__lshift__',
 '__rlshift__',
 '__rshift__',
 '__rrshift__',
 '__and__',
 '__rand__',
 '__xor__',
 '__rxor__',
 '__or__',
 '__ror__',
 '__int__',
 '__float__',
 '__iadd__',
 '__isub__',
 '__imul__',
 '__imod__',
 '__ipow__',
 '__ilshift__',
 '__irshift__',
 '__iand__',
 '__ixor__',
 '__ior__',
 '__floordiv__',
 '__rfloordiv__',
 '__truediv__',
 '__rtruediv__',
 '__ifloordiv__',
 '__itruediv__',
 '__index__',
 '__matmul__',
 '__rmatmul__',
 '__imatmul__',
 '__len__',
 '__getitem__',
 '__setitem__',
 '__delitem__',
 '__contains__',
 '__new__',
 '__array__',
 '__array_prepare__',
 '__array_wrap__',
 '__array_ufunc__',
 '__sizeof__',
 '__copy__',
 '__deepcop

In [84]:
a.sort()   # In-place sorting
a

array([1, 2, 3, 4])

In [86]:
a.sum()     # Sum elements

10

In [87]:
a.mean()

2.5

In [88]:
a.max()

4

In [90]:
a.argmax()   # Returns index of maximal element

3

In [91]:
a.cumsum()   # Cumulative sum

array([ 1,  3,  6, 10])

In [92]:
a.cumprod()   # Cumulative product

array([ 1,  2,  6, 24])

In [93]:
a.var()   # Variance

1.25

In [94]:
a.std()   # Standard deviation

1.118033988749895

In [95]:
a.shape=(2,2)
a

array([[1, 2],
       [3, 4]])

In [96]:
a.T    # Transpose, equivalent to a.transpose()

array([[1, 3],
       [2, 4]])

In [100]:
a.transpose()

array([[1, 3],
       [2, 4]])

If `z` is a nondecreasing array, then `z.searchsorted(a)` returns the index of the first element of `z` that is `>= a`

In [103]:
z = np.linspace(2,4,5)
z

array([2. , 2.5, 3. , 3.5, 4. ])

In [106]:
z.searchsorted(2.2)   # 2.5 is the first element that is >= 2.2

1
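A slightly fuller sketch of `searchsorted`: it also accepts several values at once, and its `side` argument switches the comparison from `>=` to `>`. The lookup values are arbitrary:

```python
import numpy as np

z = np.linspace(2, 4, 5)                 # 2.0, 2.5, 3.0, 3.5, 4.0

# Several values can be looked up at once
idx = z.searchsorted([2.2, 3.0, 5.0])
print(idx)                               # [1 2 5]

# side='right' returns the index of the first element strictly greater than 3.0
idx_right = z.searchsorted(3.0, side='right')
print(idx_right)                         # 3
```

A value larger than every element, such as 5.0 here, maps to `len(z)`.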

Many of these methods have equivalent functions in the NumPy namespace:

In [107]:
a = np.array([4,3,2,1])
a

array([4, 3, 2, 1])

In [108]:
np.sort(a)     # not done in place

array([1, 2, 3, 4])

In [109]:
a

array([4, 3, 2, 1])

In [110]:
np.sum(a)

10

In [111]:
np.std(a)

1.118033988749895

and so on
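For 2D arrays, both the method and the function form accept an `axis` argument that restricts the reduction to rows or columns. A brief sketch:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])

print(a.sum())            # 10, all elements
print(a.sum(axis=0))      # [4 6], one sum per column
print(np.sum(a, axis=1))  # [3 7], one sum per row
```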

### Operations on arrays

#### Arithmetic operations

The arithmetic operators `+`, `-`, `/`, `*` and `**` act *element-wise* on arrays

In [3]:
a = np.array([1,2,3,4])
b = np.array([5,6,7,8])

In [4]:
a + b

array([ 6,  8, 10, 12])

In [5]:
a * b

array([ 5, 12, 21, 32])

In [6]:
a + 10

array([11, 12, 13, 14])

In [7]:
a * 10

array([10, 20, 30, 40])

2D arrays work in a similar way

In [10]:
a = np.ones((2,2))
b = np.ones((2,2))

In [11]:
a + b

array([[2., 2.],
       [2., 2.]])

In [12]:
a - b

array([[0., 0.],
       [0., 0.]])

In [13]:
a * b

array([[1., 1.],
       [1., 1.]])

Note that `a * b` is not the matrix product; it is the element-wise product

#### Matrix Multiplication

In [14]:
A = np.ones((2,2))
B = np.ones((2,2))

In [15]:
A @ B

array([[2., 2.],
       [2., 2.]])

That was the matrix product ${A}\cdot {B}$

In older versions of Python (before 3.5, which introduced the `@` operator), the `np.dot` function should be used instead

In [17]:
np.dot(A,B)

array([[2., 2.],
       [2., 2.]])
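With matrices of ones, the element-wise and matrix products are hard to tell apart. The following sketch uses two arbitrary matrices for which the results differ:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

elementwise = A * B    # multiplies matching entries
matrix = A @ B         # rows of A times columns of B

print(elementwise.tolist())   # [[0, 2], [3, 0]]
print(matrix.tolist())        # [[2, 1], [4, 3]]
```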

We can also use `@` to take the dot product of two flat arrays

In [18]:
a = np.array([1,2,3])
b = np.array([4,5,6])
a @ b   # should be 1*4 + 2*5 + 3*6 = 32

32

We can also use `@` when one operand is a Python list or tuple

In [29]:
A = np.array([[1,2],[3,4]])
A.shape

(2, 2)

In [30]:
b = A @ (0,1)
b

array([2, 4])

In [31]:
b.shape

(2,)

Since we are postmultiplying, the tuple is treated as a column vector. However, the result is not a column vector; it is a flat array. To get a proper column vector, we need to multiply two 2D arrays, as in $(n \times m) \cdot (m \times 1) = (n \times 1)$

That would be:

In [26]:
A = np.array([[1,2],[3,4],[4,5]])   # 3 x 2
A.shape

(3, 2)

In [24]:
B = np.array([0,1])
B.shape = (2,1)  # 2 x 1
B    

array([[0],
       [1]])

In [27]:
c = A @ B   # 3 x 1
c

array([[2],
       [4],
       [5]])

In [28]:
c.shape

(3, 1)
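A flat array can also be turned into a column on the fly instead of assigning to `shape`. This sketch shows two common ways, `reshape(-1, 1)` and `np.newaxis`, which are not used elsewhere in these notes:

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [4, 5]])   # 3 x 2
b = np.array([0, 1])                     # flat, shape (2,)

flat_result = A @ b                      # shape (3,)
col_result = A @ b.reshape(-1, 1)        # shape (3, 1)
also_col = A @ b[:, np.newaxis]          # np.newaxis adds the same extra axis

print(flat_result.shape, col_result.shape, also_col.shape)
```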

### Mutability and copying Arrays

NumPy arrays are mutable data types, like Python lists

That is, their contents can be altered in memory after initialization


In [32]:
a = [1,2,3]
b = a
b[1] = 3
a


[1, 3, 3]

So we changed `a` by changing `b`. Let's see whether the same thing happens with NumPy arrays

In [34]:
a = np.array([2,4])
a

array([2, 4])

In [35]:
a[-1] = 0
a

array([2, 0])

In [36]:
a = np.random.randn(3)
a

array([0.14761356, 0.51651361, 0.99145772])

In [37]:
b = a
b[0] = 0
a

array([0.        , 0.51651361, 0.99145772])

So, once again, we changed `a` by changing `b`

What's going on here? 

The assignment `b = a` does not copy anything: the name `b` simply becomes another reference to the same array

This means that we pass around pointers to data rather than making copies, so `a` and `b` both refer to the same array in memory
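One way to check whether two names refer to the same data is sketched below, using Python's `is` operator and `np.shares_memory`. The array `c` is an illustrative counterexample built with `np.array`, which copies:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = a                      # second name for the same array
c = np.array(a)            # separate array with equal contents

print(b is a)                     # True
print(c is a)                     # False
print(np.shares_memory(a, b))     # True
print(np.shares_memory(a, c))     # False
```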



#### Making copies

It is possible to make an independent copy of `a` if required

One way to do this is with the `np.copyto` function, which copies values into an existing array

In [38]:
a = np.random.randn(3)
a

array([1.17716924, 1.37563437, 0.18292832])

In [39]:
b = np.empty_like(a)   # make an empty array that looks like a
np.copyto(b,a)
b

array([1.17716924, 1.37563437, 0.18292832])

Now `b` is an independent (deep) copy of `a`

In [40]:
b[0] = 0
b

array([0.        , 1.37563437, 0.18292832])

In [41]:
a

array([1.17716924, 1.37563437, 0.18292832])
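As an alternative to `np.copyto`, the `copy` method and the `np.copy` function allocate and fill the new array in one step. A minimal sketch with arbitrary values:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])

b = a.copy()        # method form
c = np.copy(a)      # function form

b[0] = 0.0
c[1] = 0.0
print(a)            # [1. 2. 3.], unchanged by the edits to b and c
```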

## Additional Functionality

### Vectorized Functions

NumPy provides vectorized versions of the standard functions `log`, `exp`, `sin`, etc. that act *element-wise* on arrays

In [42]:
z = np.array([1,2,3])
np.sin(z)

array([0.84147098, 0.90929743, 0.14112001])

This means we don't need to write an explicit loop like this to apply a function element-wise:

In [43]:
n = len(z)
y = np.empty(n)
for i in range(n):
    y[i] = np.sin(z[i])

In [44]:
y

array([0.84147098, 0.90929743, 0.14112001])

Because they act element-wise on arrays, these functions are called *vectorized functions*

In NumPy-speak, they are also called *ufuncs*, which stands for “universal functions”

As we saw earlier, the arithmetic operators also work element-wise, so combining them with ufuncs gives a very rich set of fast element-wise functions

In [45]:
z

array([1, 2, 3])

In [46]:
(1 /np.sqrt(2 * np.pi)) * np.exp(-0.5 * z**2)

array([0.24197072, 0.05399097, 0.00443185])

So we applied ${1 \over \sqrt {2 \pi}} e ^ {-{1 \over 2} {z_i}^2}$ to each element ${z_i}$ of the array $z$
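As a sanity check, the same expression can be wrapped in a function and compared with a scalar-by-scalar evaluation. The function name `standard_normal_pdf` is a hypothetical choice for this sketch:

```python
import math

import numpy as np

def standard_normal_pdf(z):
    """Standard normal density, evaluated element-wise on an array."""
    return (1 / np.sqrt(2 * np.pi)) * np.exp(-0.5 * z**2)

z = np.array([1, 2, 3])
vectorized = standard_normal_pdf(z)

# Same formula applied one scalar at a time with the math module
looped = [math.exp(-0.5 * zi**2) / math.sqrt(2 * math.pi) for zi in z]

print(np.allclose(vectorized, looped))   # True
```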

Not all user-defined functions will act element-wise. For example, the function below will fail if it is passed an array

In [56]:
def f(x):
    return 1 if x>0 else 0

In [57]:
f(z)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

The function `np.where` provides an alternative:

In [58]:
z = np.random.randn(4)
z

array([-0.05874416, -1.47498184,  0.47025275,  1.3347185 ])

In [59]:
np.where(z >0, 1, 0)   # Insert 1 if element > 0, otherwise 0

array([0, 0, 1, 1])

You can also use `np.vectorize` to vectorize a user-defined function. Be aware, though, that it is provided for convenience rather than performance: it is essentially a Python loop, so it is usually slower than a natively vectorized alternative such as `np.where`

In [60]:
f = np.vectorize(f)
f(z)

array([0, 0, 1, 1])
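The approaches can be compared side by side. This sketch uses fixed input values instead of random draws so the result is reproducible, and adds a third option, casting the boolean comparison to integers, which is not covered above:

```python
import numpy as np

def f(x):
    return 1 if x > 0 else 0

z = np.array([-0.5, -1.5, 0.5, 1.5])

via_where = np.where(z > 0, 1, 0)
via_cast = (z > 0).astype(int)       # True -> 1, False -> 0
via_vectorize = np.vectorize(f)(z)   # convenient, but loops in Python

print(via_where, via_cast, via_vectorize)
```

All three produce the same array; they differ only in how the work is done.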

### Comparisons

As a rule, comparisons are done element-wise

In [61]:
x = np.array([2,3])
y = np.array([2,3])
x == y

array([ True,  True])

In [62]:
y[0] = 1
x == y

array([False,  True])

In [63]:
x != y

array([ True, False])

The situation is similar for `>`, `<`, `>=` and `<=`

In [64]:
z = np.linspace(0,10,5)
z

array([ 0. ,  2.5,  5. ,  7.5, 10. ])

In [65]:
z > 3

array([False, False,  True,  True,  True])

This is useful for conditional extraction 

In [67]:
b = z > 3
b

array([False, False,  True,  True,  True])

In [68]:
z[b]

array([ 5. ,  7.5, 10. ])

This can also be done in one step:

In [69]:
z[z > 3]

array([ 5. ,  7.5, 10. ])
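Conditions can also be combined. As a sketch, the element-wise operators `&` (and) and `|` (or) work on boolean arrays, with each comparison wrapped in parentheses:

```python
import numpy as np

z = np.linspace(0, 10, 5)            # 0.0, 2.5, 5.0, 7.5, 10.0

middle = z[(z > 3) & (z < 10)]       # both conditions must hold
print(middle.tolist())               # [5.0, 7.5]

extremes = z[(z < 1) | (z > 9)]      # either condition may hold
print(extremes.tolist())             # [0.0, 10.0]

print((z > 3).sum())                 # 3, since True counts as 1
```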

## Exercises