In [1]:
import numpy as np

## Introduction

Numpy is Python's answer to Matlab -- a package dedicated to matrix algebra and the manipulation of ndarray objects. It is also extremely influential -- most of the Python scientific stack is either built directly on top of numpy (scipy, numba, scikit-learn) or makes use of numpy-style notation (jax, tensorflow, pytorch and other more specialized tensor packages).

# Questions

## Define column vectors

In [2]:
x = np.array([-1, 0, 1, 4 ,9, 2, 1, 4.5, 1.1, -0.9])
y = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, np.nan])

Numpy arrays are neither column nor row vectors by default, because they have only one dimension. You can add another one using the `.reshape` method, `np.expand_dims`, or indexing with `None`.

In [4]:
print(x.reshape(-1, 1).shape)
print(np.expand_dims(x, -1).shape)
print(x[:, None].shape)

(10, 1)
(10, 1)
(10, 1)


## Check shapes

Check the shapes with the `.shape` property (already seen above)

In [5]:
x.shape == y.shape

True

## Logical operations

Like matlab, these perform element-wise computation, and return arrays of the same size.

In [28]:
print(f'x > y:\t\t {x > y}')
print(f'x < 0:\t\t {x < 0}')

# Arithmetic operators (+ - / * **) happen before logicals
# see https://docs.python.org/3/reference/expressions.html#operator-precedence
print(f'x + 3 >= 0:\t {x + 3 >= 0}')
print(f'y < 0:\t\t {y < 0}')

x > y:		 [False False False  True  True False False  True False False]
x < 0:		 [ True False False False False False False False False  True]
x + 3 >= 0:	 [ True  True  True  True  True  True  True  True  True  True]
y < 0:		 [False False False False False False False False False False]


## Multiple boolean comparisons -- `np.all`

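Check that a condition evaluates to true for every element in an array using `np.all`. The results are scalar booleans, so they can be combined with Python's logical `and` / `or` or with the element-wise operators `&` / `|`. For arrays, only `&` and `|` work element-wise.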

In [32]:
# And comparison
print(np.all(x + 3 >= 0) and np.all(y > 0))
# Equivalent in this case (because the boolean is scalar)
print(np.all(x + 3 >= 0) & np.all(y > 0))

# Or comparison
print(np.all(x + 3 >= 0) or np.all(y > 0))

# Equivalent in this case (because the boolean is scalar)
print(np.all(x + 3 >= 0) | np.all(y > 0))

False
False
True
True
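
The difference between `and` and `&` only shows up once the operands are arrays rather than scalars. Here is a minimal sketch of that, using two made-up boolean arrays `a` and `b` (they are not part of the exercise):

```python
import numpy as np

# Hypothetical example arrays, invented for illustration
a = np.array([True, True, False])
b = np.array([True, False, False])

# & and | work element-wise and return arrays
elementwise_and = a & b
elementwise_or = a | b
print(elementwise_and)
print(elementwise_or)

# `and` needs a single truth value, which a multi-element array cannot give
try:
    a and b
    raised = False
except ValueError:
    raised = True
print('`a and b` raised ValueError:', raised)

# np.logical_and is the function form of & for boolean arrays
same_as_function = np.array_equal(elementwise_and, np.logical_and(a, b))
```

So `and` / `or` are fine after reducing with `np.all` or `np.any`, but for masks you want `&` / `|`.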


## Multiple boolean comparisons -- `np.any`

Likewise, we can check if any element of an array is True using `np.any` 

In [33]:
np.any(y > 0)

True

## Math with arrays

### Element-wise operators

Basic math operations `+ - / * **` (plus bitwise operators) broadcast element-wise across arrays

In [37]:
print(f'x + y:\t{x + y}')
print(f'x * y:\t{x * y}')
print(f'x / y:\t{x / y}')

x + y:	[ 0.   1.   3.   6.  12.   5.   5.   8.5  6.1  nan]
x * y:	[-1.   0.   2.   8.  27.   6.   4.  18.   5.5  nan]
x / y:	[-1.          0.          0.5         2.          3.          0.66666667
  0.25        1.125       0.22               nan]


### Linear Algebra

The `@` operator signals matrix multiplication. We need to add a 2nd dimension to `x` and `y` to use it flexibly. For 1d-arrays, `x @ y` is the inner product, `(x * y).sum()`

In [41]:
# inner product two ways
print(np.inner(x, y))
print(x @ y)

# outer product, two ways
print(np.outer(x, y))
print(x[:, None] @ y[None])

nan
nan
[[-1.  -1.  -2.  -2.  -3.  -3.  -4.  -4.  -5.   nan]
 [ 0.   0.   0.   0.   0.   0.   0.   0.   0.   nan]
 [ 1.   1.   2.   2.   3.   3.   4.   4.   5.   nan]
 [ 4.   4.   8.   8.  12.  12.  16.  16.  20.   nan]
 [ 9.   9.  18.  18.  27.  27.  36.  36.  45.   nan]
 [ 2.   2.   4.   4.   6.   6.   8.   8.  10.   nan]
 [ 1.   1.   2.   2.   3.   3.   4.   4.   5.   nan]
 [ 4.5  4.5  9.   9.  13.5 13.5 18.  18.  22.5  nan]
 [ 1.1  1.1  2.2  2.2  3.3  3.3  4.4  4.4  5.5  nan]
 [-0.9 -0.9 -1.8 -1.8 -2.7 -2.7 -3.6 -3.6 -4.5  nan]]
[[-1.  -1.  -2.  -2.  -3.  -3.  -4.  -4.  -5.   nan]
 [ 0.   0.   0.   0.   0.   0.   0.   0.   0.   nan]
 [ 1.   1.   2.   2.   3.   3.   4.   4.   5.   nan]
 [ 4.   4.   8.   8.  12.  12.  16.  16.  20.   nan]
 [ 9.   9.  18.  18.  27.  27.  36.  36.  45.   nan]
 [ 2.   2.   4.   4.   6.   6.   8.   8.  10.   nan]
 [ 1.   1.   2.   2.   3.   3.   4.   4.   5.   nan]
 [ 4.5  4.5  9.   9.  13.5 13.5 18.  18.  22.5  nan]
 [ 1.1  1.1  2.2  2.2  3.3  3.3  4.4  4.4  5.5  nan]
 [-0.9 -0.9 -1.8 -1.8 -2.7 -2.7 -3.6 -3.6 -4.5  nan]]

### Solve

For matlab's `x / y`, we need `np.linalg.solve` or `np.linalg.inv`. These aren't defined on vectors, though.
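
To make this concrete, here is a small sketch with a square matrix. The matrix `A` and vector `b` are invented for illustration. Matlab's `A \ b` solves `A x = b`, while `b / A` solves `x A = b`, which can be rewritten as a standard solve on the transpose.

```python
import numpy as np

# Made-up, invertible, non-symmetric matrix and right-hand side
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
b = np.array([3.0, 5.0])

# Matlab: A \ b  -- solves A @ x = b
x_left = np.linalg.solve(A, b)

# Matlab: b / A  -- solves x @ A = b, i.e. A.T @ x = b
x_right = np.linalg.solve(A.T, b)

print(np.allclose(A @ x_left, b))
print(np.allclose(x_right @ A, b))
```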

## More Element-wise operations

Basically any function we want will operate element-wise on arrays.

By default, numpy will emit warnings about out-of-domain inputs to functions like `np.log`. In the following codeblock I use a context manager to silence these warnings temporarily. You can see it evaluates negative inputs to `np.nan`, and 0 to `-np.inf`.

I also used the `np.printoptions` context manager to force numpy to show me fewer decimal places (I wanted everything on one line).

In [52]:

with np.errstate(all='ignore'), np.printoptions(linewidth=1000, precision=3, suppress=True):
    # Log is the natural log, np.log10 is the base-10 log
    print(np.log(x))
    print(np.exp(x))

[  nan  -inf 0.    1.386 2.197 0.693 0.    1.504 0.095   nan]
[   0.368    1.       2.718   54.598 8103.084    7.389    2.718   90.017    3.004    0.407]


## Combining functions and boolean logic

No problem, it works as expected. Note that any comparison involving `nan` evaluates to `False` (except `!=`), including comparing `nan` with itself.

In [65]:
with np.errstate(all='ignore'), np.printoptions(linewidth=1000, precision=3):
    print(np.sqrt(x))
    print(np.any(np.sqrt(x) >= 2))

print('\n')
print('np.nan equals itself?: ', np.nan == np.nan)

[  nan 0.    1.    2.    3.    1.414 1.    2.121 1.049   nan]
True


np.nan equals itself?:  False
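
Because `nan == nan` is `False`, equality tests cannot be used to find missing values. A short sketch of the usual workaround, `np.isnan`, on a made-up array `v`:

```python
import numpy as np

v = np.array([1.0, np.nan, 3.0])  # hypothetical data

eq_self = v == v        # False at the nan position
ne_self = v != v        # True only at the nan position
nan_mask = np.isnan(v)  # the readable way to locate nans

print(eq_self)
print(ne_self)
print(nan_mask)
```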


## Sums

You can make sums using the `.sum()` method.

In [72]:
print('x.sum():', x.sum())

# By default, anything + np.nan is np.nan
print('(y ** 2).sum():', (y ** 2).sum())

# Use np.nansum to ignore the nans when computing the sum
print('np.nansum(y ** 2):', np.nansum(y ** 2))

x.sum(): 20.700000000000003
(y ** 2).sum(): nan
np.nansum(y ** 2): 85.0


## More compositions

Expressions are evaluated inside to outside, so we can do many element-wise operations inside `np.sum` or `np.nansum`

In [73]:
np.nansum(x * y ** 2)

233.5

## Counting `True` elements

In Python, `bool` is a subclass of `int`, which means you can do things like add booleans together. In this case, `True` is treated as `1`, and `False` as `0`; numpy does the same when summing boolean arrays. The number of `True` elements in a boolean mask is thus just the sum of the mask.

In [74]:
(x > 0).sum()

7
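
A couple of equivalent ways to count, sketched on a made-up array `w`. Taking the mean of a mask gives the share of `True` elements, which is often handy.

```python
import numpy as np

w = np.array([1.5, -2.0, 0.0, 4.0])  # hypothetical data
mask = w > 0

count_sum = mask.sum()                # booleans summed as 0/1
count_nonzero = np.count_nonzero(mask)
share_true = mask.mean()              # fraction of True elements

print(count_sum, count_nonzero, share_true)
```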

## Broadcasting

Certain element-wise operations can be broadcast together, even when the shapes are not identical. `f(x, y)`, where x is `(n, 1)` and y is `(1, m)`, for example, will result in a `(n, m)` matrix where the `i,j`th element is `f(x[i, 0], y[0, j])`.

In this first example, `f` is multiplication:

In [77]:
# Note the double list in np.array to make a row vector
y[:, None] * np.array([[-1, 1]])

array([[-1.,  1.],
       [-1.,  1.],
       [-2.,  2.],
       [-2.,  2.],
       [-3.,  3.],
       [-3.,  3.],
       [-4.,  4.],
       [-4.,  4.],
       [-5.,  5.],
       [nan, nan]])

Here we see addition

In [78]:
x[:, None] + np.array([[-1, 0, 1]])

array([[-2. , -1. ,  0. ],
       [-1. ,  0. ,  1. ],
       [ 0. ,  1. ,  2. ],
       [ 3. ,  4. ,  5. ],
       [ 8. ,  9. , 10. ],
       [ 1. ,  2. ,  3. ],
       [ 0. ,  1. ,  2. ],
       [ 3.5,  4.5,  5.5],
       [ 0.1,  1.1,  2.1],
       [-1.9, -0.9,  0.1]])
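
To check the `f(x[i, 0], y[0, j])` rule, here is a small sketch that compares a broadcast sum against an explicit double loop. The arrays `col` and `row` are made up for the example.

```python
import numpy as np

col = np.array([1.0, 2.0, 3.0])[:, None]  # shape (3, 1)
row = np.array([[10.0, 20.0]])            # shape (1, 2)

broadcasted = col + row                   # shape (3, 2)

# Same result built element by element
looped = np.empty((3, 2))
for i in range(3):
    for j in range(2):
        looped[i, j] = col[i, 0] + row[0, j]

print(broadcasted.shape)
print(np.array_equal(broadcasted, looped))

# np.broadcast reports the resulting shape without doing the arithmetic
result_shape = np.broadcast(col, row).shape
```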

## Dimension reduction
For ndarrays, summation can be done over a specified axis using the `axis` argument

In [79]:
np.nansum(y[:, None] * np.array([[-1, 1]]), axis=1)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
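
A sketch of how `axis` picks the dimension that gets collapsed, on a small made-up matrix `T`. `keepdims=True` is an optional extra that keeps the reduced axis as length 1, which helps when the result is broadcast back against the original.

```python
import numpy as np

T = np.array([[1, 2, 3],
              [4, 5, 6]])  # hypothetical 2x3 matrix

col_sums = T.sum(axis=0)   # collapse rows    -> one value per column
row_sums = T.sum(axis=1)   # collapse columns -> one value per row
row_sums_2d = T.sum(axis=1, keepdims=True)  # shape (2, 1)

print(col_sums)
print(row_sums)
print(row_sums_2d.shape)
```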

## Working with 2d arrays

Also known as matrices.

In [80]:
X = np.array([[1, 4, 7],
              [2, 5, 8],
              [3, 6, 9]])

Transpose with the `.T` attribute

In [81]:
print(X.T)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


Shape is still the `.shape` property

In [82]:
X.shape

(3, 3)

Determinant is in `np.linalg.det`

In [83]:
np.linalg.det(X)

0.0

## Computing the trace in 3 ways

Level 1 - Use `np.trace`

In [93]:
np.trace(X)

15

Level 2 - Use `np.diag` to extract the diagonal, then take the sum

In [94]:
np.diag(X).sum()

15

Level 3 - Use Einstein notation

In [95]:
np.einsum('nn', X)

15
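
In case the `'nn'` string looks cryptic: a repeated index walks along the diagonal, and any index missing from the output gets summed over. Here is a sketch of a few related subscripts, using a matrix `W` defined only for this example:

```python
import numpy as np

W = np.array([[1, 4, 7],
              [2, 5, 8],
              [3, 6, 9]])

trace = np.einsum('ii->', W)           # sum over the diagonal
diagonal = np.einsum('ii->i', W)       # keep index i: the diagonal itself
product = np.einsum('ij,jk->ik', W, W) # sum over j: matrix multiplication

print(trace)
print(diagonal)
print(np.array_equal(product, W @ W))
```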

## Set diagonal elements

Numpy offers a suite of index functions to quickly set parts of matrices. In this case, we want `np.diag_indices_from`

In [97]:
X[np.diag_indices_from(X)] = [7, 8, 9]
X

array([[7, 4, 7],
       [2, 8, 8],
       [3, 6, 9]])

## Eigenvalues

We can get the eigenvalues of a matrix with `np.linalg.eigvals`. 

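A symmetric matrix is positive definite if all eigenvalues are positive, so we will also check the signs and print something if the condition holds. (`X` here is not symmetric, so read the check below as a test of the eigenvalue signs only.)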

In [100]:
eigs = np.linalg.eigvals(X)
if np.all(eigs > 0):
    print('Matrix is positive definite')
elif np.all(eigs < 0):
    print('Matrix is negative definite')

Matrix is positive definite
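
The eigenvalue criterion is normally stated for symmetric matrices. Here is a hedged sketch of a stricter check on a made-up symmetric matrix `S`. It uses `np.linalg.eigvalsh`, which assumes symmetric input, and an attempted Cholesky factorization as a second opinion.

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # hypothetical symmetric matrix

is_symmetric = np.allclose(S, S.T)
sym_eigs = np.linalg.eigvalsh(S)  # real eigenvalues, ascending order
is_pd_eig = is_symmetric and bool(np.all(sym_eigs > 0))

# Cholesky only succeeds for (symmetric) positive definite matrices
try:
    np.linalg.cholesky(S)
    is_pd_chol = True
except np.linalg.LinAlgError:
    is_pd_chol = False

print(sym_eigs, is_pd_eig, is_pd_chol)
```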


## Matrix inversion

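You can use `np.linalg.inv`, or equivalently `np.linalg.solve(X, np.eye(X.shape[0]))`. If what you really want is the solution of a linear system `X @ b = c`, calling `np.linalg.solve(X, c)` directly is typically recommended over forming the inverse, for numerical stability reasons.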

In [103]:
X_inv1 = np.linalg.inv(X)
X_inv2 = np.linalg.solve(X, np.eye(X.shape[0]))

# np.allclose checks if two arrays are equivalent up to requested accuracy
np.allclose(X_inv1, X_inv2)

True

## Matrix-Vector multiplication

It also uses `@`

In [104]:
a = np.array([1, 3, 2]).reshape(-1, 1) # column vector
print(a.T @ X)

# This will broadcast a across the rows of X
print(a.T * X)

print(X @ a)

[[19 40 49]]
[[ 7 12 14]
 [ 2 24 16]
 [ 3 18 18]]
[[33]
 [42]
 [39]]


## Chained matrix multiplication

`@` can be used like any other arithmetic operator, but numpy also offers `np.linalg.multi_dot`, which does some optimizations to speed things up. In this case though, `multi_dot` is actually slower.

In [110]:
%timeit a.T @ X @ a

1.22 µs ± 8.61 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [111]:
%timeit np.linalg.multi_dot([a.T, X, a])

2.64 µs ± 22.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


In [112]:
B1 = a.T @ X @ a
B2 = np.linalg.multi_dot([a.T, X, a])
np.allclose(B1, B2)

True
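
`multi_dot` earns its keep when the matrices have very different shapes, because the order of multiplication then changes how big the intermediate results are. Here is a sketch with random matrices whose shapes are picked arbitrarily for illustration. It only checks that the results agree, since timings will vary by machine.

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(size=(200, 5))
Q = rng.normal(size=(5, 300))
R = rng.normal(size=(300, 2))

left_to_right = (P @ Q) @ R  # builds a 200x300 intermediate
right_to_left = P @ (Q @ R)  # builds a 5x2 intermediate
auto_order = np.linalg.multi_dot([P, Q, R])  # picks an order for us

print(np.allclose(left_to_right, right_to_left))
print(np.allclose(left_to_right, auto_order))
```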

## Defining Matrices

There are helper functions `np.hstack` and `np.vstack` that stack matrices horizontally and vertically, respectively.

In [115]:
I = np.eye(3)
A = np.arange(1, 10).reshape(3, 3)
Y = np.hstack([A, I])
Z = np.vstack([A, I])

print(Y)
print(Z)

[[1. 2. 3. 1. 0. 0.]
 [4. 5. 6. 0. 1. 0.]
 [7. 8. 9. 0. 0. 1.]]
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]
 [1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


## Making vectors

In [122]:
# arange starts from 0 by default and is right-exclusive, so this is 1, ..., 9
x1 = np.arange(1, 10)

# np.tile concatenates an array to itself N times
x2 = np.tile([0, 1], 5)

# np.ones returns an array of the requested size filled with 1's
x3 = np.ones(10)

# Tile can also be done using Python's built in multiplication overload on lists
x4 = np.array([-1, 1] * 5)

# np.arange can start and stop from anywhere
x5 = np.arange(1980, 2011)

# For the last one we can just use broadcast division
x6 = np.arange(101) / 100

In [123]:
# Print the first 5 elements of each array
# (the loop variable is named arr so it doesn't overwrite x from earlier)
for arr in [x1, x2, x3, x4, x5, x6]:
    print(arr[:5])

[1 2 3 4 5]
[0 1 0 1 0]
[1. 1. 1. 1. 1.]
[-1  1 -1  1 -1]
[1980 1981 1982 1983 1984]
[0.   0.01 0.02 0.03 0.04]


## Linspace

Numpy copied Matlab's linspace as `np.linspace`.

In [125]:
np.linspace(-np.pi, np.pi, 500)[:10]

array([-3.14159265, -3.1290011 , -3.11640955, -3.10381799, -3.09122644,
       -3.07863488, -3.06604333, -3.05345178, -3.04086022, -3.02826867])
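
One difference from `np.arange` worth seeing side by side: `np.linspace` takes a number of points and includes the stop value by default, while `np.arange` takes a step and excludes it. A small sketch:

```python
import numpy as np

closed_grid = np.linspace(0, 1, 5)                 # 5 points, stop included
open_grid = np.linspace(0, 1, 5, endpoint=False)   # 5 points, stop excluded
stepped = np.arange(0, 1, 0.25)                    # step of 0.25, stop excluded

print(closed_grid)
print(open_grid)
print(stepped)
```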

## Question 25

This question is Matlab-specific: it is about the order of computation for the `:` range operator. In Python the order is explicit, because there is no handy operator for making range arrays; we call `np.arange`, and the parentheses decide what happens first.

In [126]:
print(np.arange(1, 11 + 1))
print(np.arange(1, 11) + 1)

[ 1  2  3  4  5  6  7  8  9 10 11]
[ 2  3  4  5  6  7  8  9 10 11]


## Defining more vectors

In [128]:
# Note the double list to make a row vector
x = np.array([[1, 1.1, 9, 8, 1, 4, 4, 1]])
y = np.array([[1, 2, 3, 4, 4, 3, 2, np.nan]]).T

# Use the `dtype` argument in the array constructor to get True/False from 1/0
z = np.array([[1, 1, 0, 0, 1, 0, 0, 0]], dtype='bool').T

## Slicing

Slicing in numpy works just like Matlab, EXCEPT that indices start from `0` and the stop index is excluded.

In [150]:
print(x[0, 1:6]) #x(2:6)
print(x[0, 3:-2]) #x(4:end-2) -- end is given by -1 in numpy
print(x[0, [0, 4, 7]]) #x([1, 5, 8])
print(x[0, np.arange(1, 4).repeat(4)])

# Boolean masking works as expected
print(y[z]) # No need to explicitly index the 0th column, because z is a 2d array
print(y[~z])

# TODO: Check matlab output for this
print(y[x.squeeze() > 2])

# TODO: Check matlab output for this
print(y[x.squeeze() == 1])

# TODO: Check matlab output for this
print(x[0, ~np.isnan(y.ravel())])

print(y[~np.isnan(y)])

[1.1 9.  8.  1.  4. ]
[8. 1. 4.]
[1. 1. 1.]
[1.1 1.1 1.1 1.1 9.  9.  9.  9.  8.  8.  8.  8. ]
[1. 2. 4.]
[ 3.  4.  3.  2. nan]
[[3.]
 [4.]
 [3.]
 [2.]]
[[ 1.]
 [ 4.]
 [nan]]
[1.  1.1 9.  8.  1.  4.  4. ]
[1. 2. 3. 4. 4. 3. 2.]


## Setting array elements via index

Works as expected, with the caveats above.

To get all indices corresponding to a condition, use `np.where`

In [153]:
x2 = x.copy() # Without .copy(), x2 would be another name for the same array, so edits to x2 would also change x

x2[np.where(x2 == 4)] = -4
x2

array([[ 1. ,  1.1,  9. ,  8. ,  1. , -4. , -4. ,  1. ]])
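
Here is a sketch of the "unexpected results" that `.copy()` guards against. Plain assignment just gives the same array a second name, and basic slices are views into the original. The arrays below are made up for the demonstration.

```python
import numpy as np

original = np.array([1.0, 4.0, 4.0])

alias = original           # same object, no data copied
alias[0] = -1.0            # also changes `original`

view = original[1:]        # a slice is a view on the same memory
view[0] = 99.0             # changes original[1]

independent = original.copy()
independent[2] = 0.0       # `original` is untouched

print(original)
print(independent)
```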

## Setting array elements, continued

In [154]:
x2[np.where(x == 1)] = np.nan
x2

array([[ nan,  1.1,  9. ,  8. ,  nan, -4. , -4. ,  nan]])

## Setting array elements, cont cont

In [155]:
# TODO: Check what this does in matlab, it's not obviously valid numpy code
#x2[z] = []
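
For reference, in Matlab `x(z) = []` deletes the elements selected by `z`. Numpy arrays have a fixed size, so the closest equivalent builds a new, shorter array. Here is a hedged sketch on flat, made-up copies of the data (`v` and `drop` are names chosen for this example):

```python
import numpy as np

v = np.array([1.0, 1.1, 9.0, 8.0, 1.0, 4.0, 4.0, 1.0])
drop = np.array([1, 1, 0, 0, 1, 0, 0, 0], dtype=bool)

# Option 1: keep everything that is NOT selected
kept_mask = v[~drop]

# Option 2: np.delete with the integer positions to remove
kept_delete = np.delete(v, np.where(drop)[0])

print(kept_mask)
print(np.array_equal(kept_mask, kept_delete))
```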

## Matrices via hstack and reshape

In [166]:
# Reshape works row-wise, so we have to reshape to the transposed matrix first
M = np.hstack([np.arange(1, 13), np.arange(12, 0, -1)]).reshape(6, 4).T
M

array([[ 1,  5,  9, 12,  8,  4],
       [ 2,  6, 10, 11,  7,  3],
       [ 3,  7, 11, 10,  6,  2],
       [ 4,  8, 12,  9,  5,  1]])

## More matrix slicing and masking

In [170]:
print(M[0, 2]) # M(1,3)
print(M[:, 4]) # M(:, 5)
print(M[1, :]) # M(2, :)

# the 2nd element of the slice doesn't change relative to matlab syntax because
# python is right exclusive and matlab is right inclusive
print(M[1:3, 2:4]) # M(2:3, 3:4)
print(M[1:4, 3]) # M(2:4, 4)

#TODO: Check these, I'm sure fancy indexing in numpy is different to matlab
print(M[np.where(M > 5)])
print(M[:, np.where(M[0, :] <= 5)])
print(M[np.where(M[:, 1] > 6), :])
print(M[np.where(M[:, 1] > 6), 3:6])

9
[8 7 6 5]
[ 2  6 10 11  7  3]
[[10 11]
 [11 10]]
[11 10  9]
[ 9 12  8  6 10 11  7  7 11 10  6  8 12  9]
[[[1 5 4]]

 [[2 6 3]]

 [[3 7 2]]

 [[4 8 1]]]
[[[ 3  7 11 10  6  2]
  [ 4  8 12  9  5  1]]]
[[[10  6  2]
  [ 9  5  1]]]
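
The extra bracket level in the last three outputs comes from `np.where` returning a tuple of index arrays. When that tuple is used inside a larger index, it is treated as a 2d index array. Here is a sketch on a small made-up matrix `G`, showing that a boolean mask (or the first element of the tuple) avoids the extra dimension:

```python
import numpy as np

G = np.array([[1, 5, 9],
              [2, 6, 10],
              [3, 7, 11]])  # hypothetical matrix

cond = G[0, :] <= 5

via_where_tuple = G[:, np.where(cond)]   # tuple used as an index: extra axis
via_where_first = G[:, np.where(cond)[0]]
via_mask = G[:, cond]                    # boolean mask, usually the simplest

print(via_where_tuple.shape)
print(via_where_first.shape)
print(via_mask)
```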


## Conditional slicing

Print rows of M where column 5 is at least 3 times larger than column 6

In [175]:
print(M[np.where(M[:, 4] >= 3 * M[:, 5])].squeeze())

[[ 3  7 11 10  6  2]
 [ 4  8 12  9  5  1]]


## Element-wise conditional counts, 1

Count the number of elements of M that are larger than 7

In [176]:
(M > 7).sum()

10

## Element-wise conditional counts, 2

Count the number of elements of M in row 2 that are smaller than their neighbors in row 1.

In [179]:
(M[1] < M[0]).sum()

3

## Element-wise conditional counts, 3

Count the number of elements of M that are larger than their left neighbor

In [186]:
# Use a column shift. The 0th column has no left neighbor, and the last column 
# is nobody's left neighbor, so compare like this:
(M[:, 1:] > M[:, :-1]).sum()

10
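
An alternative way to write the same comparison is `np.diff`, which subtracts each element's left neighbor along the chosen axis. Here is a sketch that rebuilds `M` so the block runs on its own:

```python
import numpy as np

M = np.hstack([np.arange(1, 13), np.arange(12, 0, -1)]).reshape(6, 4).T

# Column shift, as above
count_shift = (M[:, 1:] > M[:, :-1]).sum()

# np.diff along axis=1 gives M[:, 1:] - M[:, :-1]
count_diff = (np.diff(M, axis=1) > 0).sum()

print(count_shift, count_diff)
```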