# The NumPy library

## Introduction

The `numpy` package (module) is used in almost all numerical computation using Python. It is a package that provides high-performance vector, matrix and higher-dimensional data structures for Python. It is implemented in C and Fortran, so when calculations are vectorized (formulated with vectors and matrices), performance is very good. 

To use `numpy` you need to import the module, using for example:

In [165]:
from numpy import * 

Using the syntax `from <package_name> import *` you import all public names (functions, classes, submodules) from the corresponding package into the current namespace.
In the `numpy` package the terminology used for vectors, matrices and higher-dimensional data sets is *array*.

> **Recommendation** <br/> 
It is recommended to import the numpy package using `import numpy as np` (you can use any other short reference name instead of `np`) and then invoke `numpy` functions using the call 
```
np.<numpy_method>(args,)
```
> So that:
- one can easily check which function belongs to the `numpy` package
- it is easier to **avoid conflicts**: if we define a function with the same name as an existing `numpy` function, the newer definition silently replaces the older one, which can lead to bugs that are hard to track down. 
> Since this is a tutorial on `numpy`, we will nevertheless import all `numpy` names and thereby simplify the code.
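
As a small illustration of the recommendation above, the following sketch uses the namespaced import so that a Python built-in and the NumPy function of the same name remain distinguishable. The variable names are made up for this example.

```python
import numpy as np

values = [[1, 2], [3, 4]]

# The built-in max compares the two inner lists and returns the larger list
builtin_max = max(values)

# np.max treats the nested list as a 2-D array and returns the largest element
numpy_max = np.max(values)

print(builtin_max)  # [3, 4]
print(numpy_max)    # 4
```

With the `np.` prefix it is always clear at the call site which of the two functions is meant.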

## Creating `numpy` arrays

There are a number of ways to initialize new numpy arrays, for example from

* a Python list or tuples
* using functions that are dedicated to generating numpy arrays, such as `arange`, `linspace`, etc.
* reading data from files

### From lists

For example, to create new vector and matrix arrays from Python lists we can use the `numpy.array` function.

In [166]:
# a vector: the argument to the array function is a Python list
v = array([1,2,3,4])

v

array([1, 2, 3, 4])

In [168]:
# a matrix: the argument to the array function is a nested Python list
M = array([[1, 2], [3, 4]])

M

array([[1, 2],
       [3, 4]])

The `v` and `M` objects are both of the type `ndarray` that the `numpy` module provides.

In [169]:
type(v), type(M)

(numpy.ndarray, numpy.ndarray)

The difference between the `v` and `M` arrays is only their shapes. We can get information about the shape of an array by using the `ndarray.shape` property.

In [170]:
v.shape

(4,)

In [171]:
M.shape

(2, 2)

The number of elements in the array is available through the `ndarray.size` property:

In [172]:
M.size

4

Equivalently, we could use the functions `numpy.shape` and `numpy.size`:

In [173]:
shape(M)

(2, 2)

In [174]:
size(M)

4

So far the `numpy.ndarray` looks awfully much like a Python list (or nested list). Why not simply use Python lists for computations instead of creating a new array type? 

There are several reasons:

* **Python lists** are very general. They can contain any kind of object. They are dynamically typed. They **do not support mathematical functions** such as matrix and dot multiplications, etc. Implementing such functions for Python lists would not be very efficient because of the dynamic typing.
* Numpy arrays are **statically typed** and **homogeneous**. The type of the elements is determined when the array is created. Homogeneous means that all elements in the array have the same type. 
* Numpy arrays are memory efficient.
* Because of the static typing, fast implementations of mathematical functions such as multiplication and addition of `numpy` arrays can be written in a compiled language (C and Fortran are used).
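
The points above can be made concrete with a short sketch. It uses the `np.` prefix rather than the star import used in this tutorial, and the variable names are chosen only for illustration.

```python
import numpy as np

py_list = [1, 2, 3]
arr = np.array(py_list)

# For a list, * means repetition; for an array it is element-wise arithmetic
list_times_two = py_list * 2
arr_times_two = arr * 2

print(list_times_two)  # [1, 2, 3, 1, 2, 3]
print(arr_times_two)   # [2 4 6]

# Arrays are homogeneous: mixing ints and floats gives a single float dtype
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)     # float64
```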

Using the `dtype` (data type) property of an `ndarray`, we can see what type the data of an array has:

In [175]:
M.dtype

dtype('int64')

We get an error if we try to assign a value of the wrong type to an element in a numpy array:

In [176]:
M[0,0] = "hello"

ValueError: invalid literal for int() with base 10: 'hello'

If we want, we can explicitly define the type of the array data when we create it, using the `dtype` keyword argument: 

In [177]:
M = array([[1, 2], [3, 4]], dtype=complex)

M

array([[1.+0.j, 2.+0.j],
       [3.+0.j, 4.+0.j]])

Common data types that can be used with `dtype` are: `int`, `float`, `complex`, `bool`, `object`, etc.

We can also explicitly define the bit size of the data types, for example: `int64`, `int16`, `float128`, `complex128` (note that `float128` is only available on some platforms).
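
To see what the bit size means in practice, this minimal sketch compares the memory used by the same values stored with two different integer types (written with the `np.` prefix; names are illustrative).

```python
import numpy as np

small = np.array([1, 2, 3], dtype=np.int16)
large = np.array([1, 2, 3], dtype=np.int64)

# itemsize is the number of bytes per element, nbytes the total for the array
print(small.itemsize, small.nbytes)  # 2 6
print(large.itemsize, large.nbytes)  # 8 24
```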

### Using array-generating functions

For larger arrays it is impractical to initialize the data manually using explicit Python lists. Instead we can use one of the many functions in `numpy` that generate arrays of different forms. Some of the more common are:

-  **`arange`**

In [178]:
# create a range

x = arange(0, 10, 1) # arguments: start, stop, step

x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [179]:
x = arange(-1, 1, 0.1)

x

array([-1.00e+00, -9.00e-01, -8.00e-01, -7.00e-01, -6.00e-01, -5.00e-01,
       -4.00e-01, -3.00e-01, -2.00e-01, -1.00e-01, -2.22e-16,  1.00e-01,
        2.00e-01,  3.00e-01,  4.00e-01,  5.00e-01,  6.00e-01,  7.00e-01,
        8.00e-01,  9.00e-01])

- **`linspace` and `logspace`**

In [180]:
# using linspace, both end points ARE included
linspace(0, 10, 25)

array([ 0.   ,  0.417,  0.833,  1.25 ,  1.667,  2.083,  2.5  ,  2.917,
        3.333,  3.75 ,  4.167,  4.583,  5.   ,  5.417,  5.833,  6.25 ,
        6.667,  7.083,  7.5  ,  7.917,  8.333,  8.75 ,  9.167,  9.583,
       10.   ])

In [181]:
logspace(0, 10, 10, base=e)

array([1.000e+00, 3.038e+00, 9.228e+00, 2.803e+01, 8.515e+01, 2.587e+02,
       7.858e+02, 2.387e+03, 7.251e+03, 2.203e+04])
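
The relation between the two functions can be checked directly. The sketch below (using the `np.` prefix) confirms that `linspace` includes both end points and that `logspace` with base `e` matches the exponential of a `linspace`.

```python
import numpy as np

lin = np.linspace(0, 10, 25)
print(lin[0], lin[-1], len(lin))  # 0.0 10.0 25

# logspace(start, stop, num, base=b) returns b**linspace(start, stop, num)
log_pts = np.logspace(0, 10, 10, base=np.e)
same = np.exp(np.linspace(0, 10, 10))
print(np.allclose(log_pts, same))  # True
```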

- **`mgrid`**

In [184]:
x, y = mgrid[0:5, 0:5] # similar to meshgrid in MATLAB

In [185]:
x

array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4]])

In [186]:
y

array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])
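
A typical use of such coordinate grids is to evaluate a function of two variables on every grid point at once. The function `10*x + y` below is an arbitrary choice for illustration.

```python
import numpy as np

# x varies along the rows, y along the columns
x, y = np.mgrid[0:3, 0:4]

# Evaluate a function of (x, y) on the whole grid without loops
z = 10 * x + y
print(z)
# [[ 0  1  2  3]
#  [10 11 12 13]
#  [20 21 22 23]]
```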

- **random data** : support for random data is provided by the random module

In [187]:
from numpy import random

In [188]:
# uniform random numbers in [0,1)
random.rand(5,5)                         # array has shape (5, 5)

array([[0.856, 0.075, 0.562, 0.164, 0.538],
       [0.265, 0.078, 0.019, 0.763, 0.902],
       [0.594, 0.91 , 0.24 , 0.633, 0.475],
       [0.032, 0.388, 0.778, 0.504, 0.611],
       [0.715, 0.511, 0.386, 0.747, 0.828]])

In [189]:
# standard normal distributed random numbers
random.randn(5,5)

array([[-0.609,  1.242,  0.82 ,  1.389, -1.585],
       [-1.733,  2.217, -0.097,  0.748, -1.076],
       [-1.065,  0.267,  2.469, -0.475,  0.284],
       [ 0.049,  0.817, -1.361,  0.155,  0.04 ],
       [-0.46 , -0.566, -1.016, -0.774,  0.412]])
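
Recent NumPy versions also offer a generator-based interface, `numpy.random.default_rng`, as an alternative to the functions above. The following is a minimal sketch; the seed value is arbitrary and only makes the run reproducible.

```python
import numpy as np

# Create a random generator with a fixed (arbitrary) seed
rng = np.random.default_rng(seed=12345)

uniform = rng.random((5, 5))           # uniform in [0, 1)
normal = rng.standard_normal((5, 5))   # standard normal distribution

print(uniform.shape, normal.shape)  # (5, 5) (5, 5)
```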

- **`diag`**

In [190]:
# a diagonal matrix
diag([1,2,3])

array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])

In [191]:
# diagonal with offset from the main diagonal
diag([1,2,3], k=1) 

array([[0, 1, 0, 0],
       [0, 0, 2, 0],
       [0, 0, 0, 3],
       [0, 0, 0, 0]])

- **`zeros` and `ones`**

In [192]:
zeros((3,3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [194]:
ones((3,3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

## File I/O

### Comma-separated values (CSV)

A very common file format for data files is comma-separated values (CSV), or related formats such as TSV (tab-separated values). To read data from such files into Numpy arrays we can use the `numpy.genfromtxt` function. For example, we can read the temperature dataset `record.dat` which you have been provided together with the tutorial jupyter notebooks. 

In [195]:
!head record.dat

1800  1  1    -6.1    -6.1    -6.1 1
1800  1  2   -15.4   -15.4   -15.4 1
1800  1  3   -15.0   -15.0   -15.0 1
1800  1  4   -19.3   -19.3   -19.3 1
1800  1  5   -16.8   -16.8   -16.8 1
1800  1  6   -11.4   -11.4   -11.4 1
1800  1  7    -7.6    -7.6    -7.6 1
1800  1  8    -7.1    -7.1    -7.1 1
1800  1  9   -10.1   -10.1   -10.1 1
1800  1 10    -9.5    -9.5    -9.5 1


> the `!` prepended in front of a code line means that the jupyter notebook executed the command in the shell. In this case, the command `head <path/to/file>` prints to screen the first 10 lines in the file.

As you can see, the data are in tabular form, with each column holding a different quantity and each row a different time. The function converts the table into a corresponding matrix. 

In [196]:
data = genfromtxt('record.dat')

In [197]:
data.shape

(77431, 7)
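
If `record.dat` is not at hand, the behaviour of `genfromtxt` can be tried on in-memory text. The sketch below feeds it a few lines in the same layout through `io.StringIO`, and then a small comma-separated sample (invented for this example) with one missing entry.

```python
import io
import numpy as np

# Whitespace-separated columns are handled by default
whitespace_text = """1800 1 1 -6.1 -6.1 -6.1 1
1800 1 2 -15.4 -15.4 -15.4 1
1800 1 3 -15.0 -15.0 -15.0 1"""
table = np.genfromtxt(io.StringIO(whitespace_text))
print(table.shape)  # (3, 7)

# For comma-separated data, pass the delimiter; missing entries become nan
csv_text = "1.0,2.0,3.0\n4.0,,6.0"
csv_table = np.genfromtxt(io.StringIO(csv_text), delimiter=",")
print(csv_table)
```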

Using `numpy.savetxt` we can store a Numpy array to a text file (values are separated by spaces by default; pass `delimiter=','` for comma-separated output):

In [198]:
M = random.rand(3,3)

M

array([[0.125, 0.307, 0.302],
       [0.028, 0.704, 0.694],
       [0.814, 0.328, 0.33 ]])

In [199]:
savetxt("random-matrix.csv", M)

In [200]:
!cat random-matrix.csv

1.253778287804843128e-01 3.071388397622902833e-01 3.017315254207740827e-01
2.817738270136160850e-02 7.038468741141026275e-01 6.936843073982236207e-01
8.138099467760314676e-01 3.276830822702246904e-01 3.299924813548145153e-01


> the shell command `cat <path/to/file>` prints to screen the entire content of the file. 

You can specify the format using the _keyword argument_ `fmt`. In the example we specify the format to be a float number with 5 decimal digits.

In [201]:
savetxt("random-matrix.csv", M, fmt='%.5f') # fmt specifies the format

!cat random-matrix.csv

0.12538 0.30714 0.30173
0.02818 0.70385 0.69368
0.81381 0.32768 0.32999
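
A round trip through a text file can be sketched as follows. The example writes comma-separated values to a temporary directory (so no file is left behind) and reads them back with `numpy.loadtxt`; the matrix and file name are made up for illustration.

```python
import os
import tempfile
import numpy as np

M = np.array([[0.125, 0.5], [0.25, 0.75]])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "matrix.csv")

    # Write with 5 decimal digits and a comma between the columns
    np.savetxt(path, M, fmt="%.5f", delimiter=",")

    with open(path) as f:
        text = f.read()

    # Read the file back using the same delimiter
    loaded = np.loadtxt(path, delimiter=",")

print(text)
print(np.allclose(M, loaded))  # True
```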


### Numpy's native file format

Useful when storing and reading back numpy array data. Use the functions `numpy.save` and `numpy.load`:

In [204]:
print("Saved matrix\n", M)
save("random-matrix.npy", M)

# Indexing works as for lists, we will see more later
M[0,0] = 0.0
print("\nModified matrix\n", M)

M = load("random-matrix.npy")

print("\nLoaded matrix\n", M)

Saved matrix
 [[0.125 0.307 0.302]
 [0.028 0.704 0.694]
 [0.814 0.328 0.33 ]]

Modified matrix
 [[0.    0.307 0.302]
 [0.028 0.704 0.694]
 [0.814 0.328 0.33 ]]

Loaded matrix
 [[0.125 0.307 0.302]
 [0.028 0.704 0.694]
 [0.814 0.328 0.33 ]]


## More properties of the numpy arrays

In [206]:
M.itemsize # bytes per element

8

In [207]:
M.nbytes # number of bytes

72

In [208]:
M.ndim # number of dimensions

2

## Manipulating arrays

### Indexing

We can index elements in an array using square brackets and indices:

In [209]:
# v is a vector, and has only one dimension, taking one index
v[0]

1

In [210]:
# M is a matrix, or a 2 dimensional array, taking two indices 
M[1,1]

0.7038468741141026

If we omit an index of a multidimensional array it returns the whole row (or, in general, an (N-1)-dimensional array) 

In [211]:
M

array([[0.125, 0.307, 0.302],
       [0.028, 0.704, 0.694],
       [0.814, 0.328, 0.33 ]])

In [212]:
M[1]

array([0.028, 0.704, 0.694])

The same thing can be achieved by using `:` instead of an index: 

In [213]:
M[1,:] # row 1

array([0.028, 0.704, 0.694])

In [214]:
M[:,1] # column 1

array([0.307, 0.704, 0.328])

We can assign new values to elements in an array using indexing:

In [215]:
M[0,0] = 1

In [216]:
M

array([[1.   , 0.307, 0.302],
       [0.028, 0.704, 0.694],
       [0.814, 0.328, 0.33 ]])

In [217]:
# also works for rows and columns
M[1,:] = 0
M[:,2] = -1

In [218]:
M

array([[ 1.   ,  0.307, -1.   ],
       [ 0.   ,  0.   , -1.   ],
       [ 0.814,  0.328, -1.   ]])

### Index slicing

Index slicing is the technical name for the syntax `M[lower:upper:step]` to extract part of an array:

In [219]:
A = array([1,2,3,4,5])
A

array([1, 2, 3, 4, 5])

In [220]:
A[1:3]

array([2, 3])

Array slices are *mutable*: if they are assigned a new value the original array from which the slice was extracted is modified:

In [221]:
A[1:3] = [-2,-3]

A

array([ 1, -2, -3,  4,  5])

We can omit any of the three parameters in `M[lower:upper:step]`:

In [222]:
A[::] # lower, upper, step all take the default values [0, array_length, 1]

array([ 1, -2, -3,  4,  5])

In [223]:
A[::2] # step is 2, lower and upper default to the beginning and end of the array

array([ 1, -3,  5])

In [224]:
A[:3] # first three elements

array([ 1, -2, -3])

In [225]:
A[3:] # elements from index 3

array([4, 5])

Negative indices count from the end of the array (positive indices from the beginning):

In [226]:
A = array([1,2,3,4,5])

In [227]:
A[-1] # the last element in the array, equivalent to A[len(A)-1]

5

In [228]:
A[-3:] # the last three elements

array([3, 4, 5])

Index slicing works exactly the same way for multidimensional arrays. As an example, using a _nested list comprehension_, we can easily build the multiplication table from 1 to 10. 

In [229]:
A = array([[m*n for n in range(1,11)] for m in range(1,11)])


A

array([[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10],
       [  2,   4,   6,   8,  10,  12,  14,  16,  18,  20],
       [  3,   6,   9,  12,  15,  18,  21,  24,  27,  30],
       [  4,   8,  12,  16,  20,  24,  28,  32,  36,  40],
       [  5,  10,  15,  20,  25,  30,  35,  40,  45,  50],
       [  6,  12,  18,  24,  30,  36,  42,  48,  54,  60],
       [  7,  14,  21,  28,  35,  42,  49,  56,  63,  70],
       [  8,  16,  24,  32,  40,  48,  56,  64,  72,  80],
       [  9,  18,  27,  36,  45,  54,  63,  72,  81,  90],
       [ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100]])

In [230]:
# strides
A[::2, ::2]

array([[ 1,  3,  5,  7,  9],
       [ 3,  9, 15, 21, 27],
       [ 5, 15, 25, 35, 45],
       [ 7, 21, 35, 49, 63],
       [ 9, 27, 45, 63, 81]])
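
Slices of multidimensional arrays are views of the original data, just like the one-dimensional slices shown earlier. A small sketch on a 4x4 array (chosen only for illustration) makes this visible:

```python
import numpy as np

A = np.arange(1, 17).reshape(4, 4)

# A strided slice is a view: it shares memory with A
sub = A[::2, ::2]
sub[0, 0] = -1

print(A[0, 0])                   # -1
print(np.shares_memory(A, sub))  # True
```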

### Fancy indexing

Fancy indexing refers to using an array or list in place of an index.

In [231]:
row_indices = [1, 2, 3]
A[row_indices]

array([[ 2,  4,  6,  8, 10, 12, 14, 16, 18, 20],
       [ 3,  6,  9, 12, 15, 18, 21, 24, 27, 30],
       [ 4,  8, 12, 16, 20, 24, 28, 32, 36, 40]])

In [232]:
col_indices = [1, 2, -1]       # remember, index -1 means the last element
A[row_indices, col_indices]    # A[1,1], A[2,2], A[3,-1]

array([ 4,  9, 40])
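
Two details of fancy indexing are easy to miss: index lists are paired element by element, and the result is a copy rather than a view. The sketch below shows both, and uses `numpy.ix_` to select a full sub-block instead of paired elements. The 4x4 array is an arbitrary example.

```python
import numpy as np

A = np.arange(16).reshape(4, 4)
rows = [1, 2]
cols = [0, 3]

# Paired indices: picks A[1, 0] and A[2, 3]
paired = A[rows, cols]
print(paired)  # [ 4 11]

# np.ix_ builds an open mesh: picks rows 1-2 crossed with columns 0 and 3
block = A[np.ix_(rows, cols)]
print(block)
# [[ 4  7]
#  [ 8 11]]

# Fancy indexing returns a copy, so modifying it leaves A untouched
picked = A[rows]
picked[0, 0] = 99
print(A[1, 0])  # 4
```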

#### Index masks
If the index mask is a Numpy array of data type `bool`, then an element is selected (True) or not (False) depending on the value of the index mask at the position of each element. This is very similar to how logical indexing works in MATLAB. 

In [233]:
B = array([n for n in range(10)])
B

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Generally, the index mask is obtained as the result of some _logical condition_ applied to the original array. For example, we could test which elements are even and select only those using the resulting mask. 

In [234]:
even_mask = B%2 == 0

print("Mask =         ", even_mask)
print("Even numbers = ", B[even_mask])

Mask =          [ True False  True False  True False  True False  True False]
Even numbers =  [0 2 4 6 8]


In [235]:
# same thing
row_mask = array([1,0,1,0,1,0,1,0,1,0], dtype=bool)
B[row_mask]

array([0, 2, 4, 6, 8])

This feature is very useful to conditionally select elements from an array, using for example comparison operators:

In [236]:
x = arange(0, 10, 0.5)
x

array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. ,
       6.5, 7. , 7.5, 8. , 8.5, 9. , 9.5])

In [237]:
mask = (5 < x) * (x < 7.5)   # * works as the logic AND, you could equivalently use &
print(mask)


[False False False False False False False False False False False  True
  True  True  True False False False False False]


In [238]:
x[mask]

array([5.5, 6. , 6.5, 7. ])
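
The same selection can be written with the element-wise operators `&` (and), `|` (or) and `~` (not). Each comparison needs its own parentheses, because these operators bind more tightly than `<` and `>`. A minimal sketch:

```python
import numpy as np

x = np.arange(0, 10, 0.5)

# Element-wise AND of two conditions
mask = (x > 5) & (x < 7.5)
selected = x[mask]
print(selected)  # [5.5 6.  6.5 7. ]

# ~ inverts the mask and selects everything else
outside = x[~mask]
print(len(outside))  # 16
```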

## Functions for extracting data from arrays and creating arrays

### where

The index mask can be converted to position index using the `where` function

In [239]:
indices = where(mask)

indices

(array([11, 12, 13, 14]),)

In [240]:
x[indices] # this indexing is equivalent to the fancy indexing x[mask]

array([5.5, 6. , 6.5, 7. ])
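
`where` also has a three-argument form, `where(condition, a, b)`, which picks from `a` where the condition holds and from `b` elsewhere. The sketch below shows both forms on a small made-up vector.

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])

# One argument: positions where the condition is True
positions = np.where(x > 0)[0]
print(positions)  # [3 4]

# Three arguments: keep positive values, replace the rest by 0
clipped = np.where(x > 0, x, 0.0)
print(clipped)  # [0.  0.  0.  1.5 3. ]
```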

### diag

With the diag function we can also extract the diagonal and subdiagonals of an array:

In [241]:
diag(A)

array([  1,   4,   9,  16,  25,  36,  49,  64,  81, 100])

In [242]:
diag(A, -1)

array([ 2,  6, 12, 20, 30, 42, 56, 72, 90])

### take

The `take` function is similar to fancy indexing described above:

In [243]:
v2 = arange(-3,3)
v2

array([-3, -2, -1,  0,  1,  2])

In [244]:
row_indices = [1, 3, 5]
v2[row_indices] # fancy indexing

array([-2,  0,  2])

In [245]:
v2.take(row_indices)

array([-2,  0,  2])

But `take` also works on lists and other objects, returning a `numpy` array:

In [246]:
take([-3, -2, -1,  0,  1,  2], row_indices)

array([-2,  0,  2])

### choose

Constructs an array by picking elements from several arrays: 

In [247]:
which = [1, 0, 1, 0, 2, 2]
choices = [[-2,-2,-2,-2,-2,-2], [5,5,5,5,5,5], [4,5,6,7,8,9]]

# What will the result be?
# c[0] = choices[1][0] = 5
# c[1] = choices[0][1] = -2
# ...
# c[5] = choices[2][5] = 9
c = choose(which, choices)

print(c)

[ 5 -2  5 -2  8  9]
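
For this simple case, the same result can be obtained with fancy indexing, which may help to understand what `choose` does. The following sketch compares the two approaches.

```python
import numpy as np

which = np.array([1, 0, 1, 0, 2, 2])
choices = np.array([[-2, -2, -2, -2, -2, -2],
                    [5, 5, 5, 5, 5, 5],
                    [4, 5, 6, 7, 8, 9]])

via_choose = np.choose(which, choices)

# Row index from `which`, column index is the position 0..5
via_indexing = choices[which, np.arange(which.size)]

print(via_choose)    # [ 5 -2  5 -2  8  9]
print(via_indexing)  # [ 5 -2  5 -2  8  9]
```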


## Linear algebra

**Vectorizing code** is the key to writing efficient numerical calculation with Python/Numpy. That means that as much as possible of a program should be formulated in terms of matrix and vector operations, like matrix-matrix multiplication.

### Scalar-array operations

We can use the usual arithmetic operators to multiply, add, subtract, and divide arrays with scalar numbers.

In [248]:
v1 = arange(0, 3)

In [249]:
v1 * 2

array([0, 2, 4])

In [250]:
v1 + 2

array([2, 3, 4])

In [251]:
A =  diag([1,2,3])
print("A*2 =\n", A * 2)
print("\nA+2 =\n", A + 2)

A*2 =
 [[2 0 0]
 [0 4 0]
 [0 0 6]]

A+2 =
 [[3 2 2]
 [2 4 2]
 [2 2 5]]


### Element-wise array-array operations

When we add, subtract, multiply and divide arrays with each other, the default behaviour is **element-wise** operations (for Matlab users, this is the ".\*" operator). 

In [252]:
A = array([[1, 1, 1], [2, 2, 2], [1, 1, 1]])
product = A*A

print("A     =\n", A)
print("\nA.*A =\n", product)

A     =
 [[1 1 1]
 [2 2 2]
 [1 1 1]]

A.*A =
 [[1 1 1]
 [4 4 4]
 [1 1 1]]


In [253]:
v1 * v1

array([0, 1, 4])

If we multiply arrays with different but compatible shapes, the smaller array is *broadcast* over the larger one: here the vector `v1` is multiplied element-wise with each row of `A`:

In [254]:
A.shape, v1.shape

((3, 3), (3,))

In [255]:
A * v1

array([[0, 1, 2],
       [0, 2, 4],
       [0, 1, 2]])
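
To scale the rows of `A` instead (one factor per row), the vector can be turned into a column first. The sketch below rebuilds `A` and `v1` from the cells above and compares both cases.

```python
import numpy as np

A = np.array([[1, 1, 1], [2, 2, 2], [1, 1, 1]])
v1 = np.arange(0, 3)

# Shape (3,) is matched against the last axis: each row is multiplied by v1
per_row = A * v1
print(per_row)
# [[0 1 2]
#  [0 2 4]
#  [0 1 2]]

# Shape (3, 1) is matched against the first axis: row i is scaled by v1[i]
per_column = A * v1[:, np.newaxis]
print(per_column)
# [[0 0 0]
#  [2 2 2]
#  [2 2 2]]
```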

### Matrix algebra

What about matrix multiplication? There are two ways. We can either use the `dot` function, which applies a matrix-matrix, matrix-vector, or inner vector multiplication to its two arguments: 

In [256]:
dot(A, A)

array([[4, 4, 4],
       [8, 8, 8],
       [4, 4, 4]])

In [257]:
dot(A, v1)

array([3, 6, 3])

In [258]:
dot(v1, v1)

5
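
In Python 3.5 and later, the `@` operator offers a shorter notation for the same products on plain arrays. A minimal sketch with the `A` and `v1` defined above:

```python
import numpy as np

A = np.array([[1, 1, 1], [2, 2, 2], [1, 1, 1]])
v1 = np.arange(0, 3)

mat_mat = A @ A     # matrix-matrix product
mat_vec = A @ v1    # matrix-vector product
inner = v1 @ v1     # inner product of two vectors

print(mat_vec)  # [3 6 3]
print(inner)    # 5
```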

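Alternatively, we can cast the array objects to the type `matrix`. This changes the behavior of the standard arithmetic operators `+, -, *` to use matrix algebra. Note that the NumPy documentation no longer recommends the `matrix` class for new code; it is shown here because it keeps the notation compact.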

In [259]:
M = matrix(A)
v = matrix(v1).T # make it a column vector (.T is the transpose since the default initialization is as row vector)

In [260]:
v

matrix([[0],
        [1],
        [2]])

In [261]:
M * M

matrix([[4, 4, 4],
        [8, 8, 8],
        [4, 4, 4]])

In [262]:
M * v

matrix([[3],
        [6],
        [3]])

In [263]:
# inner product
v.T * v

matrix([[5]])

In [264]:
# with matrix objects, standard matrix algebra applies
v + M*v

matrix([[3],
        [7],
        [5]])

If we try to add, subtract or multiply objects with incompatible shapes we get an error:

In [265]:
v = matrix([1,2,3,4,5,6]).T

In [266]:
shape(M), shape(v)

((3, 3), (6, 1))

In [267]:
M * v

ValueError: shapes (3,3) and (6,1) not aligned: 3 (dim 1) != 6 (dim 0)

See also the related functions: `inner`, `outer`, `cross`, `kron`, `tensordot`. Try for example `help(kron)`.

### Array/Matrix transformations

Above we have used the `.T` to transpose the matrix object `v`. We could also have used the `transpose` function to accomplish the same thing. 

Other mathematical functions that transform matrix objects are:

In [268]:
C = matrix([[1j, 2j], [3j, 4j]])
C

matrix([[0.+1.j, 0.+2.j],
        [0.+3.j, 0.+4.j]])

In [269]:
conjugate(C)

matrix([[0.-1.j, 0.-2.j],
        [0.-3.j, 0.-4.j]])

> Recall that the complex analogue of the transpose is the __Hermitian conjugate__, computed as the conjugate transpose of the original matrix.

In [270]:
C.H

matrix([[0.-1.j, 0.-3.j],
        [0.-2.j, 0.-4.j]])

We can extract the real and imaginary parts of complex-valued arrays using `real` and `imag`:

In [271]:
real(C) # same as: C.real

matrix([[0., 0.],
        [0., 0.]])

In [272]:
imag(C) # same as: C.imag

matrix([[1., 2.],
        [3., 4.]])

Or the complex argument and absolute value

In [273]:
angle(C+1) # note for MATLAB users: angle is used instead of arg

array([[0.785, 1.107],
       [1.249, 1.326]])

In [274]:
abs(C)

matrix([[1., 2.],
        [3., 4.]])

### Matrix computations

#### Inverse

In [275]:
linalg.inv(C) # equivalent to C.I 

matrix([[0.+2.j , 0.-1.j ],
        [0.-1.5j, 0.+0.5j]])

In [276]:
C.I * C

matrix([[1.00e+00+0.j, 0.00e+00+0.j],
        [2.22e-16+0.j, 1.00e+00+0.j]])

#### Determinant

In [277]:
linalg.det(C)

(2.0000000000000004+0j)

In [278]:
linalg.det(C.I)

(0.49999999999999967+0j)
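
When the inverse is only needed to solve a linear system, `linalg.solve` is usually the more direct route. The sketch below uses a plain array with the same entries as `C` and an arbitrary right-hand side `b` (an assumption made for this example).

```python
import numpy as np

C = np.array([[1j, 2j], [3j, 4j]])
b = np.array([1.0, 2.0])

# Solve C x = b without forming the inverse explicitly
x = np.linalg.solve(C, b)

print(np.allclose(C @ x, b))                 # True
print(np.allclose(x, np.linalg.inv(C) @ b))  # True
```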

### Data processing

Often it is useful to store datasets in Numpy arrays. Numpy provides a number of functions to calculate statistics of datasets in arrays. 

For example, let's calculate some properties from the temperature dataset used above.

In [279]:
# reminder, the temperature dataset is stored in the data variable:
shape(data)

(77431, 7)

- **mean**

In [280]:
# the temperature data is in column 3
mean(data[:,3])

6.197109684751585

- **standard deviations** and **variance**

In [281]:
std(data[:,3]), var(data[:,3])

(8.282271621340573, 68.59602320966341)

- **min** and **max**

In [282]:
# lowest daily average temperature
data[:,3].min()

-25.8

In [283]:
# highest daily average temperature
data[:,3].max()

28.3

- **sum**, **prod**, and **trace**

In [284]:
d = arange(0, 10)
d

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [285]:
# sum up all elements
sum(d)

45

In [286]:
# product of all elements
prod(d+1)

3628800

In [287]:
# cumulative sum
cumsum(d)

array([ 0,  1,  3,  6, 10, 15, 21, 28, 36, 45])

In [288]:
# cumulative product
cumprod(d+1)

array([      1,       2,       6,      24,     120,     720,    5040,
         40320,  362880, 3628800])

In [289]:
# same as: diag(A).sum()
trace(A)

4

### Computations on subsets of arrays

We can compute with subsets of the data in an array using indexing, fancy indexing, and the other methods of extracting data from an array (described above).

For example, let's go back to the temperature dataset (`-n 3` optional argument tells `head` to return only the first 3 lines):

In [290]:
!head -n 3 record.dat

1800  1  1    -6.1    -6.1    -6.1 1
1800  1  2   -15.4   -15.4   -15.4 1
1800  1  3   -15.0   -15.0   -15.0 1


The data format is: year, month, day, daily average temperature, low, high, location.

If we are interested in the average temperature only in a particular month, say February, then we can create an index mask and use it to select only the data for that month. To see which values the month column takes we can use `unique`, which returns a vector with the unique elements of the input array:

In [291]:
unique(data[:,1]) # the month column takes values from 1 to 12

array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.])

In [292]:
mask_feb = data[:,1] == 2

In [293]:
# the temperature data is in column 3
mean(data[mask_feb,3])

-3.212109570736596

With these tools we have very powerful data processing capabilities at our disposal. For example, extracting the average temperature for each month of the year takes only a few lines of code: 

In [294]:
months = arange(1,13)
monthly_mean = array([mean(data[data[:,1] == month, 3]) for month in months])

set_printoptions(precision=3)
print(monthly_mean)

[-3.045 -3.212 -0.832  3.888  9.561 14.66  17.319 16.118 11.819  6.76
  1.946 -1.211]


This is quite a cold place, isn't it?

### Calculations with higher-dimensional data

When functions such as `min`, `max`, etc. are applied to multidimensional arrays, it is sometimes useful to apply the calculation to the entire array, and sometimes only on a row or column basis. Using the `axis` argument we can specify how these functions should behave: 

In [295]:
m = random.rand(3,3)
m

array([[0.692, 0.076, 0.2  ],
       [0.491, 0.011, 0.78 ],
       [0.573, 0.228, 0.233]])

In [296]:
# global max
m.max()

0.7797123181505959

In [297]:
# max in each column
m.max(axis=0)

array([0.692, 0.228, 0.78 ])

In [298]:
# max in each row
m.max(axis=1)

array([0.692, 0.78 , 0.573])

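#### Warning
This is a frequent source of confusion. `axis=0` refers to the first (row) index, so reducing along `axis=0` collapses the rows and returns one value per column; `axis=1` refers to the second (column) index, so reducing along it returns one value per row. This is consistent with the indexing order `m[row_index, col_index]`: the axis number is the position of the index that is eliminated. 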

Many other functions and methods in the `array` and `matrix` classes accept the same (optional) `axis` keyword argument.
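
A non-square array makes the effect of `axis` easier to see, because the shape of the result tells which index was eliminated. The values below are arbitrary.

```python
import numpy as np

m = np.array([[1, 5, 3],
              [4, 2, 6]])   # shape (2, 3)

# axis=0 removes the row index: one value per column
col_max = m.max(axis=0)
print(col_max, col_max.shape)  # [4 5 6] (3,)

# axis=1 removes the column index: one value per row
row_max = m.max(axis=1)
print(row_max, row_max.shape)  # [5 6] (2,)
```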

## Reshaping, resizing and stacking arrays

The shape of a Numpy array can be modified without copying the underlying data, which makes it a fast operation even for large arrays.

In [299]:
A = array([[m + n*m for m in range(5)] for n in range(5)])
A

array([[ 0,  1,  2,  3,  4],
       [ 0,  2,  4,  6,  8],
       [ 0,  3,  6,  9, 12],
       [ 0,  4,  8, 12, 16],
       [ 0,  5, 10, 15, 20]])

In [300]:
n, m = A.shape

In [301]:
B = A.reshape((1,n*m))
B

array([[ 0,  1,  2,  3,  4,  0,  2,  4,  6,  8,  0,  3,  6,  9, 12,  0,
         4,  8, 12, 16,  0,  5, 10, 15, 20]])

In [302]:
B[0,0:5] = 5 # modify the array

B

array([[ 5,  5,  5,  5,  5,  0,  2,  4,  6,  8,  0,  3,  6,  9, 12,  0,
         4,  8, 12, 16,  0,  5, 10, 15, 20]])

In [303]:
A # and the original variable is also changed. B is only a different view of the same data

array([[ 5,  5,  5,  5,  5],
       [ 0,  2,  4,  6,  8],
       [ 0,  3,  6,  9, 12],
       [ 0,  4,  8, 12, 16],
       [ 0,  5, 10, 15, 20]])

We can also use the function `flatten` to make a higher-dimensional array into a vector. But this function creates a copy of the data.

In [304]:
B = A.flatten()

B

array([ 5,  5,  5,  5,  5,  0,  2,  4,  6,  8,  0,  3,  6,  9, 12,  0,  4,
        8, 12, 16,  0,  5, 10, 15, 20])

In [305]:
B[0:5] = 10

B

array([10, 10, 10, 10, 10,  0,  2,  4,  6,  8,  0,  3,  6,  9, 12,  0,  4,
        8, 12, 16,  0,  5, 10, 15, 20])

In [306]:
A # now A has not changed, because B's data is a copy of A's, not referring to the same data

array([[ 5,  5,  5,  5,  5],
       [ 0,  2,  4,  6,  8],
       [ 0,  3,  6,  9, 12],
       [ 0,  4,  8, 12, 16],
       [ 0,  5, 10, 15, 20]])

The take-home message here is that you should use `numpy` methods with care and check whether they return a copy of the array or a view of the same data. 
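
One way to check this, sketched below, is `numpy.shares_memory`, which reports whether two arrays overlap in memory. The small array is only an example.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)

reshaped = A.reshape(3, 2)   # a view of the same data
flat_copy = A.flatten()      # always a copy
raveled = A.ravel()          # a view when the memory layout allows it

print(np.shares_memory(A, reshaped))   # True
print(np.shares_memory(A, flat_copy))  # False
print(np.shares_memory(A, raveled))    # True
```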

## Adding a new dimension: `newaxis` and `expand_dims`

With `newaxis`, we can insert new dimensions in an array, for example converting a vector to a column or row matrix:

In [307]:
v = array([1,2,3])

In [308]:
shape(v)

(3,)

In [309]:
# make a column matrix of the vector v
v[:, newaxis]

array([[1],
       [2],
       [3]])

In [310]:
# column matrix
v[:,newaxis].shape

(3, 1)

In [311]:
# row matrix
v[newaxis,:].shape

(1, 3)

Note that the previous lines did not change the original vector.

In [312]:
v.shape
v[0] = 10
print(v)

[10  2  3]


Alternatively, you can use `expand_dims`, where you explicitly specify which axis is added. As in the previous case, `expand_dims` does not copy the data: all the new variables are different views of the same object. 

In [313]:
v1 = expand_dims(v, axis=0)
print(v1.shape)

v2 = expand_dims(v, axis=1)
print(v2.shape)

v3 = expand_dims(v2, axis=2)
print(v3.shape)

print(v.shape)

# Change the first element of v
v[0] = 10

# Corresponding element in v3 also changes
print("\nv = \n", v)
print("\nv3 = \n",v3)

(1, 3)
(3, 1)
(3, 1, 1)
(3,)

v = 
 [10  2  3]

v3 = 
 [[[10]]

 [[ 2]]

 [[ 3]]]


## Stacking and repeating arrays

Using the functions `repeat`, `tile`, `vstack`, `hstack`, and `concatenate` we can create larger vectors and matrices from smaller ones:

- **`tile`** and **`repeat`**

In [314]:
a = array([[1, 2], [3, 4]])

In [315]:
# repeat each element 3 times
repeat(a, 3)

array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4])

In [316]:
# tile the matrix 3 times 
tile(a, 3)

array([[1, 2, 1, 2, 1, 2],
       [3, 4, 3, 4, 3, 4]])

- **`concatenate`** : the arrays being concatenated must have matching sizes along every axis except the one they are joined along

In [317]:
b = array([[5, 6]])

In [318]:
concatenate((a, b), axis=0)

array([[1, 2],
       [3, 4],
       [5, 6]])

In [319]:
concatenate((a, b.T), axis=1)

array([[1, 2, 5],
       [3, 4, 6]])

- **`hstack`** and **`vstack`** : horizontal and vertical stacking respectively 

In [320]:
vstack((a,b))

array([[1, 2],
       [3, 4],
       [5, 6]])

In [321]:
hstack((a,b.T))

array([[1, 2, 5],
       [3, 4, 6]])

## Copy and "deep copy"

To achieve high performance, assignments in Python usually do not copy the underlying objects. This is important for example when objects are passed between functions, to avoid an excessive amount of memory copying when it is not necessary (technical term: **pass by reference**). 

In [322]:
A = array([[1, 2], [3, 4]])

A

array([[1, 2],
       [3, 4]])

In [323]:
# now B is referring to the same array data as A 
B = A 

In [324]:
# changing B affects A
B[0,0] = 10

B

array([[10,  2],
       [ 3,  4]])

In [325]:
A

array([[10,  2],
       [ 3,  4]])

If we want to avoid this behavior and get a new, completely independent object `B` copied from `A`, then we need to make a so-called "deep copy" using the function `copy`:

In [326]:
B = copy(A)

In [327]:
# now, if we modify B, A is not affected
B[0,0] = -5

B

array([[-5,  2],
       [ 3,  4]])

In [328]:
A

array([[10,  2],
       [ 3,  4]])
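
The same applies when arrays are passed to functions. In the sketch below, the hypothetical helper `zero_first_row` modifies its argument in place; passing a copy protects the caller's array.

```python
import numpy as np

def zero_first_row(arr):
    """Set the first row of arr to zero, modifying arr in place."""
    arr[0, :] = 0
    return arr

A = np.array([[1, 2], [3, 4]])
before = A.copy()

# Passing a copy: the original A is not affected
zero_first_row(A.copy())
unchanged = np.array_equal(A, before)
print(unchanged)  # True

# Passing A itself: the function changes the caller's array
zero_first_row(A)
print(A[0])  # [0 0]
```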

## Iterating over array elements

Generally, we want to avoid iterating over the elements of arrays whenever we can (at all costs). The reason is that in an interpreted language like Python (or MATLAB), iterations are really slow compared to vectorized operations. 

However, sometimes iterations are unavoidable. For such cases, the Python `for` loop is the most convenient way to iterate over an array:

In [329]:
v = array([1,2,3,4])

for element in v:
    print(element)

1
2
3
4


In [330]:
M = array([[1,2], [3,4]])

for row in M:
    print("row", row)
    
    for element in row:
        print('\t>',element)

row [1 2]
	> 1
	> 2
row [3 4]
	> 3
	> 4


When we need to iterate over each element of an array and modify its elements, it is convenient to use the `enumerate` function to obtain both the element and its index in the `for` loop: 

In [331]:
for row_idx, row in enumerate(M):
    print("row_idx", row_idx, "row", row)
    
    for col_idx, element in enumerate(row):
        print("\tcol_idx", col_idx, "element", element)
       
        # update the matrix M: square each element
        M[row_idx, col_idx] = element ** 2

row_idx 0 row [1 2]
	col_idx 0 element 1
	col_idx 1 element 2
row_idx 1 row [3 4]
	col_idx 0 element 3
	col_idx 1 element 4


In [332]:
# each element in M is now squared
M

array([[ 1,  4],
       [ 9, 16]])
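
For this particular task the loop is not needed: squaring is an element-wise operation. When an explicit loop with indices is required, `numpy.ndenumerate` yields the index tuple and the element together. A short sketch of both:

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])

# Vectorized equivalent of the nested loops above
squared = M ** 2
print(squared)
# [[ 1  4]
#  [ 9 16]]

# ndenumerate gives ((row, col), element) pairs for any number of dimensions
visited = []
for (row_idx, col_idx), element in np.ndenumerate(M):
    visited.append((row_idx, col_idx))
print(visited)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```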

## Vectorizing functions: `vectorize`

As mentioned several times by now, to get good performance we should try to avoid looping over elements in our vectors and matrices, and instead use vectorized algorithms. The first step in converting a scalar algorithm to a vectorized algorithm is to make sure that the functions we write work with vector inputs.

In [333]:
def Theta(x):
    """
    Scalar implementation of the Heaviside step function.
    """
    if x >= 0:
        return 1
    else:
        return 0

In [334]:
Theta(array([-3,-2,-1,0,1,2,3]))

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

OK, that didn't work because we didn't write the `Theta` function so that it can handle a vector input... 

To get a vectorized version of Theta we can use the Numpy function `vectorize`. In many cases it can automatically vectorize a function:

In [335]:
Theta_vec = vectorize(Theta)

In [336]:
Theta_vec(array([-3,-2,-1,0,1,2,3]))

array([0, 0, 0, 1, 1, 1, 1])

We can also implement the function to accept a vector input from the beginning (requires more effort but might give better performance):

In [337]:
def Theta(x):
    """
    Vector-aware implementation of the Heaviside step function.
    """
    return 1 * (x >= 0)

In [338]:
Theta(array([-3,-2,-1,0,1,2,3]))

array([0, 0, 0, 1, 1, 1, 1])

In [339]:
# still works for scalars as well
Theta(-1.2), Theta(2.6)

(0, 1)
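
Two further vector-aware alternatives are sketched below: `where` with a condition, and the built-in `numpy.heaviside`, whose second argument is the value returned at `x == 0` (set to 1 here to match `Theta`). Note that `heaviside` returns floating-point values.

```python
import numpy as np

x = np.array([-3, -2, -1, 0, 1, 2, 3])

# Pick 1 where the condition holds and 0 elsewhere
theta_where = np.where(x >= 0, 1, 0)
print(theta_where)  # [0 0 0 1 1 1 1]

# NumPy's own step function; the second argument is the value at x == 0
theta_builtin = np.heaviside(x, 1.0)
print(theta_builtin)  # [0. 0. 0. 1. 1. 1. 1.]
```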

## Using arrays in conditions

When using arrays in conditions, for example `if` statements and other boolean expressions, one needs to use `any` or `all`, which require that any or all elements in the array evaluate to `True`:

In [340]:
M

array([[ 1,  4],
       [ 9, 16]])

In [341]:
if (M > 5).any():
    print("at least one element in M is larger than 5")
else:
    print("no element in M is larger than 5")

at least one element in M is larger than 5


In [342]:
if (M > 5).all():
    print("all elements in M are larger than 5")
else:
    print("not all elements in M are larger than 5")

not all elements in M are larger than 5


## Type casting

Since Numpy arrays are *statically typed*, the type of an array **does not change** once created. But we can explicitly cast an array of some type to another using the `astype` method. By default this creates a new array of the new type:

In [343]:
M.dtype

dtype('int64')

In [344]:
M2 = M.astype(float)

M2

array([[ 1.,  4.],
       [ 9., 16.]])

In [345]:
M2.dtype

dtype('float64')

In [346]:
M3 = M.astype(bool)

M3

array([[ True,  True],
       [ True,  True]])
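
One thing to keep in mind when casting floats to integers is that `astype` truncates toward zero rather than rounding. The sketch below contrasts this with rounding first; the values are arbitrary.

```python
import numpy as np

f = np.array([1.7, -1.7, 2.5])

# astype(int) drops the fractional part (truncation toward zero)
as_int = f.astype(int)
print(as_int)  # [ 1 -1  2]

# Rounding first gives the nearest integer (halves go to the even neighbour)
rounded = np.round(f).astype(int)
print(rounded)  # [ 2 -2  2]
```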

### `asarray`

The major difference between `array` and `asarray` is that `array` makes a copy of the input object by default (unless the `copy` keyword argument is set to `False`), so the original object remains unchanged. `asarray` avoids the copy when the input is already an array of a compatible type, so changes made through its result are reflected in the original object. `asarray` has the following parameters (some optional):
- `a`: an array like object such as tuples, lists etc.
- `dtype`: data type
- `order`: memory representation (save row-wise or column-wise)

#### Example

In [351]:
A = matrix(ones((3,3)))

#use numpy.array to modify A. Doesn't work because you are modifying a copy
array(A)[2]=2
A

matrix([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])

In [352]:
# use numpy.asarray to modify A. It worked because you are modifying A itself
asarray(A)[2]=2
A

matrix([[1., 1., 1.],
        [1., 1., 1.],
        [2., 2., 2.]])
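
The same behaviour can be observed with a plain `ndarray` instead of a `matrix`. In this sketch, `asarray` hands back the very same object, while `array` creates a new one.

```python
import numpy as np

a = np.ones((3, 3))

same_object = np.asarray(a)   # no copy: returns a itself
new_copy = np.array(a)        # copies the data by default

print(same_object is a)  # True
print(new_copy is a)     # False

# Writing through asarray changes a; writing through array does not
np.asarray(a)[2] = 2
np.array(a)[0] = 5
print(a)
# [[1. 1. 1.]
#  [1. 1. 1.]
#  [2. 2. 2.]]
```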

## References

- http://jrjohansson.github.io
- https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html

### Further reading

* http://numpy.scipy.org
* http://scipy.org/Tentative_NumPy_Tutorial
* http://scipy.org/NumPy_for_Matlab_Users - A Numpy guide for MATLAB users.