# NumPy

For Python, the de facto standard library for linear algebra is NumPy. Most other libraries that rely on linear algebra, such as scikit-learn and pandas, use NumPy under the hood to do the heavy lifting. NumPy is commonly imported as `np`. In this notebook we look at some of the most basic NumPy operations to get more familiar with it.

In [1]:
import pandas as pd
import numpy as np

# Array

The standard datatype in NumPy is the array. Arrays are much like Python lists in that they store values at consecutive indices, but all values in an array share a single datatype, whereas Python lists can mix types. An array can be given the datatype `object`, so that it can hold any Python value, but this largely removes NumPy's main advantage: very fast computation.
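As a small illustration of the point about datatypes, the sketch below shows how a single datatype is chosen for each array. The variable names are made up for this example and are not used elsewhere in the notebook.

```python
import numpy as np

ints = np.array([1, 2, 3])                      # all integers -> integer dtype
floats = np.array([1, 2.5, 3])                  # integers are upcast to float
mixed = np.array([1, "a", 2.5], dtype=object)   # explicit object array

print(ints.dtype.kind)   # 'i' (the exact integer width depends on the platform)
print(floats.dtype)      # float64
print(mixed.dtype)       # object
```

The object array keeps the original Python values, so operations on it fall back to ordinary Python code instead of fast compiled loops.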

Arrays are fixed in size and can be multi-dimensional. To extend an array, stack arrays horizontally or vertically, which creates a new array. The other dimensions must match for this to work.
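A minimal sketch of stacking, using `np.vstack` and `np.hstack`. The arrays `base`, `extra_row` and `extra_col` are illustrative names chosen for this example.

```python
import numpy as np

base = np.array([[1, 2], [3, 4]])     # shape (2, 2)
extra_row = np.array([[5, 6]])        # shape (1, 2)
extra_col = np.array([[7], [8]])      # shape (2, 1)

taller = np.vstack([base, extra_row])   # shape (3, 2): column counts must match
wider = np.hstack([base, extra_col])    # shape (2, 3): row counts must match

print(taller)
print(wider)
```

Both calls return a new array; `base` itself is left unchanged.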

In [2]:
l1 = [1, 2, 3, 4, 5]
a = np.array(l1)
a

array([1, 2, 3, 4, 5])

In [3]:
l2 = [[1,2,3], [5,4,1], [3,6,7]]   # two dimensional, rows * columns
b = np.array(l2)
b

array([[1, 2, 3],
       [5, 4, 1],
       [3, 6, 7]])

In [8]:
b[1,1:]

array([4, 1])

There are a few special constructors, such as `ones` and `zeros`, that create a new array of the given shape filled with ones or zeros, respectively. For a multi-dimensional array the shape is given as a tuple. The `eye` constructor returns a square identity matrix.

In [4]:
np.ones((3,2))

array([[1., 1.],
       [1., 1.],
       [1., 1.]])

In [5]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
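The constructors above return floating-point arrays by default. As a short sketch, the optional `dtype` argument can be used to request another type. The variable names are illustrative.

```python
import numpy as np

zeros_float = np.zeros((2, 3))            # default dtype is float64
zeros_int = np.zeros((2, 3), dtype=int)   # request integers explicitly
identity = np.eye(2, dtype=int)           # 2 x 2 integer identity matrix

print(zeros_float)
print(zeros_int)
print(identity)
```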

We can also create arrays from a range of numbers with `arange` and `linspace`, or from random numbers with the functions in `np.random`. `arange` and `linspace` return a vector, which we can reshape into any shape with the same number of elements.

In [6]:
np.arange(2, 10)

array([2, 3, 4, 5, 6, 7, 8, 9])

In [7]:
np.linspace(2, 4, 5)   # interpolates 5 values in the range 2-4, upper bound is inclusive

array([2. , 2.5, 3. , 3.5, 4. ])

In [8]:
np.linspace(2, 3, 6).reshape((2,3))

array([[2. , 2.2, 2.4],
       [2.6, 2.8, 3. ]])

In [9]:
np.random.rand(2,3)   # will sample from a uniform distribution between 0 and 1

array([[0.86998989, 0.87826231, 0.56217193],
       [0.36173954, 0.88065615, 0.16797243]])

In [10]:
np.random.randn(3,2)  # will sample from a Gaussian (Normal) distribution with mean 0 sd 1 

array([[-1.20611432, -2.93472094],
       [ 0.76638185,  1.43963728],
       [ 2.36718577,  0.60732877]])

# Indexing

Indexing a 1-dimensional array is similar to indexing a Python list. For multi-dimensional arrays, indexing works slightly differently: inside the indexing operator `[]` we can give a comma-separated list of indices or slices, one per dimension.

In [11]:
t = np.arange(1, 10).reshape((3,3))
t

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [12]:
t[1]   # row with index 1

array([4, 5, 6])

In [13]:
t[:, 1]  # column with index 1, returned as a flat 1-D array

array([2, 5, 8])

In [14]:
t[:, 1:2] # column with index 1, however now as a column vector

array([[2],
       [5],
       [8]])
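A few more indexing patterns, sketched on the same 3 x 3 array (rebuilt here so the block runs on its own). The names `element`, `block` and `last_row` are illustrative.

```python
import numpy as np

t = np.arange(1, 10).reshape((3, 3))

element = t[1, 2]     # single value at row 1, column 2
block = t[:2, 1:]     # rows 0-1 and columns 1-2 as a 2 x 2 array
last_row = t[-1]      # negative indices count from the end

print(element)    # 6
print(block)
print(last_row)
```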

# Conditional operations and logical selections

NumPy arrays can be used directly in expressions. Arithmetic operations and comparisons are performed elementwise.

In [15]:
s = np.arange(9,0,-1).reshape((3,3))
s

array([[9, 8, 7],
       [6, 5, 4],
       [3, 2, 1]])

In [16]:
t + s

array([[10, 10, 10],
       [10, 10, 10],
       [10, 10, 10]])

In [17]:
t > s

array([[False, False, False],
       [False, False,  True],
       [ True,  True,  True]])

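We can combine comparisons with AND (`&`) and OR (`|`), but each comparison must be wrapped in parentheses, because `&` and `|` bind more tightly than the comparison operators.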

In [18]:
(t * t <= 10) & (t > 1)

array([[False,  True,  True],
       [False, False, False],
       [False, False, False]])
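A boolean array like the one above can also be used as an index to select the matching values. The sketch below shows this, and what happens when the parentheses are left out. The names `mask`, `selected` and `raised` are illustrative.

```python
import numpy as np

t = np.arange(1, 10).reshape((3, 3))

mask = (t % 2 == 0) | (t > 7)   # even values, or values greater than 7
selected = t[mask]              # 1-D array with the values where mask is True
print(selected)                 # [2 4 6 8 9]

# Without parentheses, `1 & t` is evaluated first and the result becomes a
# chained comparison, which NumPy cannot reduce to a single True or False.
raised = False
try:
    t > 1 & t < 5
except ValueError:
    raised = True
print(raised)                   # True
```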

# Broadcasting

When the shapes of the arrays in an expression do not match, NumPy automatically broadcasts the values, as if the smaller array were replicated along the missing or length-1 dimensions. Broadcasting only works when the shapes are compatible: compared from the last dimension backwards, each pair of dimensions must either be equal or one of them must be 1.

In [19]:
t  > s[2]   # the row s[2] is broadcasted to compare with the three rows in t

array([[False, False,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

In [20]:
t <= 5     # the value 5 is broadcasted to match the size of t

array([[ True,  True,  True],
       [ True,  True, False],
       [False, False, False]])
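To make the shape matching concrete, here is a small sketch. NumPy compares shapes from the last dimension backwards, and two dimensions are compatible when they are equal or when one of them is 1. The arrays `row` and `col` are illustrative.

```python
import numpy as np

t = np.arange(1, 10).reshape((3, 3))     # shape (3, 3)
row = np.array([10, 20, 30])             # shape (3,), treated as (1, 3)
col = np.array([[100], [200], [300]])    # shape (3, 1)

by_row = t + row     # row is added to every row of t
by_col = t + col     # col is added to every column of t
outer = row + col    # (3,) and (3, 1) broadcast to (3, 3)

failed = False
try:
    t + np.array([1, 2])   # shape (2,) against (3, 3): 2 != 3 and neither is 1
except ValueError:
    failed = True

print(by_row)
print(by_col)
print(outer)
print(failed)   # True
```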

# Statistics and functions

We can compute simple statistics using `min`, `max`, `sum` and `mean`. For multi-dimensional arrays, the `axis` argument selects the dimension that is collapsed: `axis=0` aggregates over the rows, giving one value per column, and `axis=1` aggregates over the columns, giving one value per row.

NumPy also supports common mathematical functions such as sine `sin`, cosine `cos`, square root `sqrt` and absolute value `abs`, as well as the standard deviation `std`.

In [21]:
t.sum()

45

In [22]:
t.sum(axis=0)

array([12, 15, 18])

In [23]:
t.mean(axis=1)

array([2., 5., 8.])
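The `axis` argument works the same way for the other reductions. As a sketch, the block below also uses `keepdims=True`, which keeps the collapsed dimension with length 1 so that the result can be combined with the original array again. The variable names are illustrative.

```python
import numpy as np

t = np.arange(1, 10).reshape((3, 3))

col_max = t.max(axis=0)                     # one value per column
row_min = t.min(axis=1)                     # one value per row
row_mean = t.mean(axis=1, keepdims=True)    # shape (3, 1) instead of (3,)
centered = t - row_mean                     # subtract each row's own mean

print(col_max)    # [7 8 9]
print(row_min)    # [1 4 7]
print(centered)
```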

In [24]:
np.sqrt(np.sum(t * t))

16.881943016134134

In [25]:
(t - t.mean()) / t.std()   # normalize t to a zero mean and standard deviation of 1

array([[-1.54919334, -1.161895  , -0.77459667],
       [-0.38729833,  0.        ,  0.38729833],
       [ 0.77459667,  1.161895  ,  1.54919334]])

# Vector algebra

Arrays also support linear algebra operators. Most operators work elementwise, but the `@` operator computes the dot product of two vectors and the matrix product of matrices. The `np.linalg` module also provides functions to compute the determinant and inverse of a matrix and to solve systems of linear equations.

In [26]:
t[0] @ t.T   # multiply the first row of t by the transpose of t

array([14, 32, 50])
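A short sketch of the determinant and of solving a linear system with `np.linalg.det` and `np.linalg.solve`. The matrix `A` and vector `b` are made-up values for illustration.

```python
import numpy as np

# The system  2x +  y = 3
#              x + 3y = 5
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

det = np.linalg.det(A)       # 2*3 - 1*1 = 5, up to rounding
x = np.linalg.solve(A, b)    # solves A @ x = b without forming the inverse

print(det)
print(x)        # approximately [0.8 1.4]
print(A @ x)    # approximately equal to b
```

Using `solve` is usually preferable to multiplying by `np.linalg.inv(A)`, since it avoids computing the inverse explicitly.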

Multiplying a matrix by its inverse gives approximately an identity matrix; the small deviations are floating-point rounding errors.

In [27]:
r = np.random.rand(3,3)
np.linalg.inv(r) @ r

array([[ 1.00000000e+00,  1.98257416e-16,  1.32006405e-16],
       [ 1.97712792e-17,  1.00000000e+00, -4.09815630e-17],
       [-3.93535148e-18, -3.60701135e-17,  1.00000000e+00]])
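Because of these rounding errors, comparing the product with `==` is unreliable. One way to check the result, sketched here with a fixed example matrix `m` (an assumption, chosen so the run is repeatable), is `np.allclose`, which compares within a small tolerance.

```python
import numpy as np

m = np.array([[4.0, 7.0, 2.0],
              [3.0, 6.0, 1.0],
              [2.0, 5.0, 3.0]])    # determinant is 9, so m is invertible

product = np.linalg.inv(m) @ m
is_identity = np.allclose(product, np.eye(3))   # tolerant comparison

print(is_identity)   # True
```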