# Numpy - multidimensional data arrays

## Introduction

Numpy is not part of the "standard library", but it might as well be for engineers. Numpy is Python's answer to Matlab: the "back end" is implemented in C, so array operations are fast (comparable to Matlab).

In [1]:
import numpy as np

## Creating `numpy` arrays

There are a number of ways to initialize new numpy arrays, for example from

* Python lists or tuples
* using functions that are dedicated to generating numpy arrays, such as `arange`, `linspace`, etc.
* reading data from files

In [None]:
# a vector: the argument to the array function is a Python list
v = np.array([1,2,3,4])

print(v)

# a matrix: the argument to the array function is a nested Python list
M = np.array([[1, 2], [3, 4]])

print(M)

type(v), type(M)

The difference between the `v` and `M` arrays is only their shapes. We can get information about the shape and size of an array by using the `shape` and `size` properties.

In [None]:
print(v.shape)
print(M.shape)
print(v.size)
print(M.size)

Arrays are similar to lists, but all of their elements must have the same type:

In [None]:
M[0,0] = "hello" # raises ValueError: a string cannot be stored in an integer array
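As a small sketch of what the single-type rule means in practice (the values below are made up for illustration), we can look at the array's `dtype` and see how values of another type are handled on assignment. The exact integer width reported by `dtype` depends on the platform.

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])
print(M.dtype)  # an integer dtype; the exact width is platform dependent

# a float assigned into an integer array is truncated, not stored as a float
M[0, 0] = 7.9
print(M[0, 0])

# a string that cannot be converted to an integer raises ValueError
raised = False
try:
    M[0, 0] = "hello"
except ValueError as err:
    raised = True
    print("ValueError:", err)
```

The array never changes its type after creation; values are either converted to the existing type or rejected.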

If we want, we can explicitly define the type of the array data when we create it, using the `dtype` keyword argument: 

In [None]:
M = np.array([[1, 2], [3, 4]], dtype=complex)

M

### Creating arrays with functions

It is often more efficient to generate large arrays instead of creating them from lists. There are a few useful functions for this in numpy:

* `np.arange` - create a range with a specified step size (endpoints not included)
* `np.linspace` - create a range with a specified number of points (endpoints *are* included)
* `np.logspace` - create a range with a specified number of points in log space (endpoints *are* included)
* `np.mgrid` - create points on a multi-dimensional grid (similar to meshgrid in MATLAB)
* `np.random.rand` - create random number matrix from a uniform distribution
* `np.random.randn` - create random number matrix from a standard normal distribution
* `np.zeros` - create a matrix of zeros
* `np.ones` - create a matrix of ones
* `np.eye` - create identity matrix

In [None]:
x = np.arange(0, 10, 0.5) # arguments: start, stop, step
print(x)
x = np.linspace(0,10,15)
print(x)
x = np.logspace(0,3,10,base=10)
print(x)
print([np.log10(xi) for xi in x])
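The endpoint behaviour is the easiest thing to get wrong here, so a minimal sketch with small, hand-picked arguments may help: `arange` stops before the upper limit, while `linspace` and `logspace` include it by default.

```python
import numpy as np

a = np.arange(0, 1, 0.25)   # step of 0.25, upper limit 1 is excluded
l = np.linspace(0, 1, 5)    # 5 points, upper limit 1 is included
g = np.logspace(0, 3, 4)    # 4 points from 10**0 to 10**3, both included

print(a)
print(l)
print(g)
```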

In [None]:
x, y = np.mgrid[0:5, 0:5] # similar to meshgrid in MATLAB
print(x)
print(y)
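To make the comparison with meshgrid concrete, here is a sketch that builds the same grid with `np.mgrid` and with `np.meshgrid`. Note that `np.meshgrid` needs `indexing="ij"` to match `mgrid`; its default ordering is different. The grid sizes are arbitrary.

```python
import numpy as np

x, y = np.mgrid[0:3, 0:4]
xi, yi = np.meshgrid(np.arange(3), np.arange(4), indexing="ij")

print(x.shape)                 # both grids have one row per x value
print(np.array_equal(x, xi))   # x varies along the first axis
print(np.array_equal(y, yi))   # y varies along the second axis
```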

In [None]:
# uniform random numbers in [0,1)
rand_uniform = np.random.rand(3,3)
print(rand_uniform)
# standard normal distributed random numbers
rand_normal = np.random.randn(3,3)
print(rand_normal)
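Random arrays differ on every run, which can make results hard to reproduce. One possible approach, sketched here assuming a Numpy version that provides `np.random.default_rng` (1.17 or later), is to create a generator with a fixed seed. The seed value 42 is arbitrary.

```python
import numpy as np

# two generators created with the same seed produce the same numbers
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

u_a = rng_a.random((3, 3))           # uniform in [0, 1)
u_b = rng_b.random((3, 3))
n = rng_a.standard_normal((3, 3))    # standard normal

print(np.array_equal(u_a, u_b))
print(n.shape)
```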

In [None]:
z = np.zeros((3,3)) #note that these take 1 tuple argument instead of multiple integers
one = np.ones((3,3))
I = np.eye(3,3) #but not this one... this is an annoying inconsistency.
print(z)
print(one)
print(I)

## File I/O

* Numpy has built-in functionality for reading/writing CSV or TSV (tab-separated value) files

Consider the following example:

In [2]:
# print the first few lines of the file
# (a portable alternative to the shell command `head`, which is not available on Windows)
with open('stockholm_td_adj.dat') as f:
    for _ in range(10):
        print(f.readline(), end='')


In [3]:
data = np.genfromtxt('stockholm_td_adj.dat')
print(data.shape)

(77431, 7)


Numpy can also write `csv` files from arrays:

In [None]:
M = np.random.rand(6,6)
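np.savetxt("random-matrix.csv", M, delimiter=",") # the default delimiter is a space, not a comma
M1 = np.genfromtxt("random-matrix.csv", delimiter=",")
print(np.allclose(M1, M)) # compare floats with a tolerance instead of ==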
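The text format used when writing decides how much precision survives the round trip. The following sketch writes a small random matrix with only three decimals via the `fmt` argument and reads it back; the file name is hypothetical and a temporary directory is used so that nothing is left in the working directory.

```python
import os
import tempfile
import numpy as np

M = np.random.rand(4, 4)
path = os.path.join(tempfile.mkdtemp(), "rounded-matrix.csv")  # hypothetical file name

# write comma-separated values with three decimals, then read them back
np.savetxt(path, M, delimiter=",", fmt="%.3f")
M1 = np.genfromtxt(path, delimiter=",")

print(M1.shape)
print(np.allclose(M1, M, atol=1e-3))  # equal only up to the rounding in the file
```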

## Manipulating arrays

Once we generate `numpy` arrays, we need to interact with them. This involves a few operations:

* indexing - accessing certain elements
* index "slicing" - accessing certain subsets of elements
* fancy indexing - using an array or list of indices in place of a single index

This is not very different from Matlab, except that Numpy indices start at 0 and use square brackets.

We can index elements in an array using square brackets and indices:

In [None]:
# v is a vector, and has only one dimension, taking one index
print(v[0])
# M is a matrix, or a 2 dimensional array, taking two indices 
print(M[1,1])
# If an index is omitted then the whole row is returned
print(M[1])
# This means that we can also index with multiple brackets if we want to type more:
print(M[1][1] == M[1,1])

The same thing can be achieved by using `:` instead of an index:

In [None]:
print(M[1,:]) # row 1
print(M[:,1]) # column 1

We can assign new values to elements, rows, or columns in an array using indexing:

In [None]:
M[0,0] = 1
print(M)
M[:,2] = -1
print(M)

### Index slicing

Index slicing is the name for the syntax `M[lower:upper:step]` to extract a subset of an array.

In [None]:
A = np.arange(1,20)
print(A)
print(A[1:8:2])
print(A[1:8]) #This is the most common usage
print(A[5:])
print(A[-3:])

Array values can also be assigned using slicing:

In [None]:
A[1:3] = [-2,-3]
print(A)

Index slicing works exactly the same way for multidimensional arrays:

In [None]:
R = np.random.rand(10,10,10)
print(R.shape)
subR = R[3:5, 1:4, 0]
print(subR.shape)
print(subR)
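One detail worth knowing is that a slice is a view into the original array rather than a copy. The sketch below uses a small array of consecutive integers (chosen only so the values are easy to follow) to show that writing to the slice also changes the original.

```python
import numpy as np

R = np.arange(24).reshape(2, 3, 4)

# first block, rows 1 and 2, every second column
sub = R[0, 1:3, ::2]
print(sub.shape)
print(sub)

# the slice shares memory with R, so this assignment is visible in R
sub[0, 0] = -1
print(R[0, 1, 0])
```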

### Fancy indexing

Fancy indexing is the name for when an array or list is used in place of an index:

In [None]:
R = np.random.rand(4,4)
print(R)
print('-'*10)
row_indices = [1, 3]
print(R[row_indices])

In [None]:
col_indices = [1, -1] # remember, index -1 means the last element
print(R[row_indices, col_indices])
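When two index lists are given, Numpy pairs them up element by element instead of selecting a block of rows and columns. The sketch below contrasts this with `np.ix_`, which can be used when the full sub-block is wanted. The 4x4 array of consecutive integers is just for illustration.

```python
import numpy as np

R = np.arange(16).reshape(4, 4)
row_indices = [1, 3]
col_indices = [1, -1]

# paired indexing: picks R[1, 1] and R[3, -1]
paired = R[row_indices, col_indices]
print(paired)

# np.ix_ builds an open mesh: rows 1 and 3 crossed with columns 1 and -1
block = R[np.ix_(row_indices, col_indices)]
print(block)
```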

### Transposing arrays

Arrays can easily be transposed with `.T`.

In [None]:
skinny = np.random.rand(8,2)
print(skinny)
print(skinny.shape)
fat = skinny.T
print(fat)
print(fat.shape)

## Linear algebra in Numpy

Formulating your code as matrix-matrix and matrix-vector operations in Numpy will make it much more efficient. We will briefly cover syntax for:

* scalar*vector
* scalar*matrix
* matrix*vector
* matrix*matrix
* inverse
* determinant
* solve Ax=b

### Scalar-array operations

We can use the usual arithmetic operators to multiply, add, subtract, and divide arrays with scalar numbers.

In [None]:
v1 = np.arange(0, 5)
print(v1)
print('-'*10)
print(v1*2)
print('-'*10)
print(v1+2)

Same goes for matrices:

In [None]:
M = np.random.rand(2,2)
print(M)
print('-'*10)
print(M*2)
print('-'*10)
print(M+2)

### Element-wise array-array operations

When we add, subtract, multiply and divide arrays with each other, the default behaviour is **element-wise** operations. This is different from Matlab!

In [None]:
v1 = np.arange(2,6)
print(v1)
print(v1*v1)
print(v1/v1)

print('-'*10)

M = np.array([[1,2],[3,4]])
print(M)
print(M*M)

### Matrix algebra

What about matrix multiplication?

* use the `dot` function (recommended)
* use the `matrix` class (`+`, `*`, `-` use matrix algebra); note that the current Numpy documentation no longer recommends this class

In [None]:
A = np.eye(3,3)
v = np.array([1,2,3])
print(np.dot(A,v))
print(np.dot(A,A))
print(np.dot(v,v))

A = np.matrix(A)
v = np.matrix(v)
print(A*v.T)
print(A*A)
print(v*v.T)
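On Python 3.5 and later there is also the `@` operator, which performs matrix multiplication on ordinary arrays. A minimal sketch comparing it with `*` and `np.dot`, using small made-up values:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
v = np.array([1, 1])

elementwise = A * A      # element-wise product
matmul = A @ A           # matrix-matrix product
matvec = A @ v           # matrix-vector product
inner = v @ v            # inner product of two vectors

print(elementwise)
print(matmul)
print(matvec)
print(inner)
print(np.array_equal(matmul, np.dot(A, A)))
```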

### Common matrix operations

We can easily calculate the inverse and determinant using `inv` and `det` from `np.linalg`.

In [None]:
A = np.array([[-1,2],[3,-1]])
print(A)
print(np.linalg.inv(A))
print(np.linalg.det(A))
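To solve a linear system Ax=b, `np.linalg.solve` is usually preferable to computing the inverse explicitly. Here is a small sketch using the same matrix `A` as above; the right-hand side `b` is made up for illustration.

```python
import numpy as np

A = np.array([[-1, 2], [3, -1]])
b = np.array([1, 2])  # illustrative right-hand side

# solve A x = b directly
x = np.linalg.solve(A, b)
print(x)

# check the solution, and compare with the inverse-based approach
print(np.allclose(A @ x, b))
print(np.allclose(np.linalg.inv(A) @ b, x))
```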

## Data processing with Numpy arrays

Numpy provides a number of functions to calculate statistics of datasets in arrays. 

For example, let's calculate some properties from the Stockholm temperature dataset used above.

In [None]:
# reminder, the temperature dataset is stored in the data variable:
print(data.shape)
print('Y: {}, M: {}, D: {}, Avg: {}, Low: {}, Hi: {}, Loc: {}'.format(*data[0, :]))

We can use numpy to easily calculate:

* mean
* standard deviation
* variance
* min/max

In [None]:
print(np.mean(data))
print(data.mean())
# the mean of the entire dataset is pretty meaningless...

# the temperature data is in column 3
print(data[:,3].mean())

In [None]:
#We can calculate standard deviation, variance, min, and max in the same way:
print('stdev:',np.std(data[:,3]))
print('variance:',np.var(data[:,3]))
print('min',np.min(data[:,3]))
print('max',np.max(data[:,3]))

#note that all of these are also *methods* of the array *class*
print(data[:,3].std())

### Calculations with higher-dimensional data

Sometimes we want to apply an operation across a single dimension. For example, we might want the mean of every column. This is controlled with the `axis` argument:

In [None]:
avgs = np.mean(data,axis=0)
print(data.shape)
print(avgs.shape)
print(avgs)

In [None]:
R = np.random.rand(3,3,3)
print(R.mean())
print(R.mean(axis=2))
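The `axis` argument names the dimension that is collapsed by the operation. A tiny sketch with hand-picked values makes this easier to check by eye:

```python
import numpy as np

X = np.array([[1, 2, 3],
              [4, 5, 6]])

col_means = X.mean(axis=0)  # collapse the rows: one mean per column
row_means = X.mean(axis=1)  # collapse the columns: one mean per row

print(col_means)
print(row_means)
```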

## Reshaping and resizing arrays

The shape of a Numpy array can be modified without copying the underlying data, which makes it a fast operation even for large arrays. There are rules that govern how this reshaping takes place.

In [None]:
print(R.shape)
n,m,p = R.shape
Q = R.reshape((n, m*p))
print(Q.shape)
F = R.flatten() #the "flatten" function turns the whole array into a vector
print(F.shape)

Two common pitfalls in reshaping arrays:

* Elements are laid out in row-major (C) order by default, not column-major order as in Matlab, so the result may not be what you expect
* Reshaping provides a different "view" of the data, but **does not copy it**

In [None]:
print(R[0,0,0])
print(F[0])
print(R[0,1,0])
print(F[1])
print(F[3])

In [None]:
print(R[0,0,0])
Q[0] = 10
print(R[0,0,0]) #reshape does not copy the data
F[0] = 6
print(R[0,0,0]) #flatten makes copies
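If it is unclear whether two arrays share the same data, `np.shares_memory` can answer the question directly. The sketch below also includes `ravel`, which, unlike `flatten`, returns a view when it can. The array of consecutive values is only there to make the row-major ordering visible.

```python
import numpy as np

R = np.arange(27.0).reshape(3, 3, 3)
Q = R.reshape(3, 9)   # view of the same data
F = R.flatten()       # always a copy
V = R.ravel()         # a view here, because R is contiguous

print(np.shares_memory(R, Q))
print(np.shares_memory(R, F))
print(np.shares_memory(R, V))

# row-major order: the last index varies fastest
print(F[1] == R[0, 0, 1], F[3] == R[0, 1, 0])
```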

### Making a "deep copy"

If you really want a copy of an array, use the `np.copy` function:

In [None]:
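A = np.array([[1, 2], [3, 4]])
print(A)
B = A # B is just another name for the same array
B[0,0] = 10
print(A) # A has changed too
Acopy = np.copy(A) # an independent copy of the data
Acopy[1,1] = 6
print(A) # A is unaffected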

## Using arrays in conditions

Using an array directly in an `if` statement or other boolean expression is ambiguous, because a comparison such as `M > 1` yields one boolean per element.

* `any` checks whether at least one element is true
* `all` checks whether every element is true

In [None]:
print(M)
print(M>1)

In [None]:
if (M > 1).any():
    print("at least one element in M is larger than 1")
else:
    print("no element in M is larger than 1")

In [None]:
if (M > 1).all():
    print("all elements in M are larger than 1")
else:
    print("not all elements in M are larger than 1")
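For completeness, here is a sketch of what happens when the array comparison is used in an `if` statement without `any` or `all`: Numpy refuses to guess and raises a `ValueError`. The matrix values are chosen for illustration.

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])

# using the element-wise comparison directly as a condition is an error
raised = False
try:
    if M > 1:
        print("never reached")
except ValueError as err:
    raised = True
    print("ValueError:", err)

# any/all reduce the boolean array to a single True or False
has_any = bool((M > 1).any())
has_all = bool((M > 1).all())
print(has_any, has_all)
```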

## Further reading

* http://github.com/jrjohansson/scientific-python-lectures - Lecture 3 is the more detailed version of this lecture.
* http://numpy.scipy.org
* http://scipy.org/Tentative_NumPy_Tutorial
* http://scipy.org/NumPy_for_Matlab_Users - A Numpy guide for MATLAB users.