# Numpy: Numeric python

**Python objects**

- high-level number objects: integers, floating point
- containers: lists (costless insertion and append), dictionaries (fast lookup)

**Numpy provides**

- extension package to Python for multi-dimensional arrays
- closer to hardware (efficiency)
- designed for scientific computation (convenience)

In [None]:
import numpy as np

## creating Numpy arrays

the most basic way to create a numpy array is from a list

- list of numbers &rarr; 1D array
- list of lists of numbers &rarr; 2D array
- ...

one dimensional array

In [None]:
l = [1,2,3,4,5]
a = np.array(l)
a

two dimensional array

In [None]:
l = [[1,2,3],
     [4,5,6],
     [7,8,9]]
a = np.array(l)
a

we see that a 2 dimensional array is created from **a list of lists**, so we can think of a 2 dimensional array as **an array of 1 dimensional arrays**. for example:

In [None]:
a[0]

is the first row of a (remember that indices start from 0, as in IDL). we can access a specific element by 

In [None]:
a[0,0]

and slice a sub-array by `[start : stop : step]`, where `stop` is exclusive (to include the element at index `k`, write `k+1` as the stop)

In [None]:
a[0:-1,0:-1]
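to make the half-open slicing convention concrete, here is a small sketch. the array name `grid` is made up for illustration and is not part of the notebook above.

```python
import numpy as np

grid = np.arange(1, 10).reshape(3, 3)   # [[1,2,3],[4,5,6],[7,8,9]]

# stop is exclusive: rows 0..1 and columns 0..1
top_left = grid[0:2, 0:2]

# a negative stop counts from the end, so 0:-1 drops the last row/column
same_block = grid[0:-1, 0:-1]

# a step of 2 picks every other column
every_other_col = grid[:, ::2]

print(top_left)
print(every_other_col)
```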

as with strings and lists, `[:]` selects all the elements along that dimension, for example in 2D

In [None]:
a[:,:]

![array slicing](./images/array_slicing.png)

as you see above, since a 2D array is simply an array of 1D arrays, from the memory's point of view a 2D array is stored row by row (this is numpy's default layout, called C order). so when you loop over a high dimensional array, put the first dimension in the outermost loop and the last dimension in the innermost loop

In [None]:
n = 250
temp = np.random.randn(n,n,n)

In [None]:
%%timeit
for i in range(n):
    for j in range(n):
        for k in range(n):
            temp[i,j,k] = 2*temp[i,j,k]

In [None]:
temp = np.random.randn(n,n,n)

In [None]:
%%timeit
for i in range(n):
    for j in range(n):
        for k in range(n):
            temp[k,j,i] = 2*temp[k,j,i]

but as you know, the most efficient way is to do it **elementwise** (vectorized operation)

In [None]:
temp = np.random.randn(n,n,n)

In [None]:
%%timeit 
temp[:,:,:] = 2*temp[:,:,:]
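the memory layout discussed above can be inspected directly. this is only a sketch on a small made-up array (`small`), assuming the default float64 dtype with 8 bytes per element.

```python
import numpy as np

small = np.zeros((4, 5, 6))          # float64, 8 bytes per element

# C order (row-major) is the default: the last index is contiguous in memory
print(small.flags['C_CONTIGUOUS'])
print(small.strides)                 # bytes to step along each axis

# in-place vectorized arithmetic, no Python-level loop needed
small += 1.0
small *= 2
```

the last axis has the smallest stride, which is why the innermost loop should run over the last index.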

numpy provides many functions to create arrays efficiently

In [None]:
# given step: start, stop (exclusive), step
np.arange(0, 10, 2)

In [None]:
# given number of points: start, stop (inclusive), number of points
np.linspace(-1, 1, 51)

In [None]:
# all set to one
np.ones(10)

In [None]:
# all to zero
np.zeros(10)

and many more ...
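a short sketch contrasting the constructors above, plus two others that exist in numpy (`np.full` and `np.zeros_like`). the variable names are made up for illustration.

```python
import numpy as np

# arange: fixed step, stop is excluded
steps = np.arange(0, 10, 2)

# linspace: fixed number of points, both endpoints included by default
points = np.linspace(0, 1, 5)

# other common constructors
filled = np.full((2, 3), 7)          # constant value
like_steps = np.zeros_like(steps)    # same shape and dtype as `steps`

print(steps)
print(points)
```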

a numpy array has many attributes. the attributes of a numpy array are like the hair color, height, weight, etc. of a human being. for example:

In [None]:
l = [[1,2,3],
     [4,5,6]]
a = np.array(l)
a

In [None]:
# its data type is
a.dtype

In [None]:
# how many bytes it takes in memory
a.nbytes

In [None]:
# 2 dimensional array
a.ndim

In [None]:
# 2 rows x 3 columns
a.shape

a numpy array also has many methods (like functions, so `()` is needed), each of which **does something with this array**. for example

In [None]:
# max, min, mean, std
a.max(), a.min(), a.mean(), a.std()

In [None]:
# find index of max element
a.argmax()

In [None]:
# cumulative sum, cumulative product
a.cumsum(), a.cumprod()

In [None]:
# change its data type
a.astype(np.float32)

In [None]:
# extract the main diagonal elements
a.diagonal()
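note that `.argmax()` without an axis returns an index into the flattened array. a minimal sketch of how to turn it back into a (row, column) pair with `np.unravel_index`; the name `table` is made up here.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

flat_index = table.argmax()                            # index into the flattened array
row, col = np.unravel_index(flat_index, table.shape)   # back to 2D indices

print(flat_index, row, col)
```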

Ok, we now know a numpy array has an attribute called shape; to get an array with a different shape we just `.reshape()` it. 

the reshaped array follows the memory order: elements are filled in row by row, with the last index varying fastest.

for example:

In [None]:
# original array
a = np.arange(12)
a

In [None]:
# create a 2D array from a 1D array
a.reshape(2,6)

In [None]:
# to a 3D array
a.reshape(2,2,3)

to flatten a high dimensional array

In [None]:
b = a.reshape(2,6)
b

In [None]:
b.flatten()

or we can flatten it by passing -1 as the new shape (numpy infers the length from the number of elements)

In [None]:
b.reshape(-1)

following this idea, we can easily create a column vector by

In [None]:
b.reshape(-1,1)
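a small sketch of the `-1` inference and of the difference between `.reshape()` and `.flatten()` with respect to memory. the variable names are made up for illustration.

```python
import numpy as np

flat = np.arange(12)

# -1 asks numpy to infer that dimension from the total size
three_rows = flat.reshape(3, -1)      # shape (3, 4)
column = flat.reshape(-1, 1)          # shape (12, 1)

# flatten() always copies; reshape returns a view when it can
copied = three_rows.flatten()
print(np.shares_memory(flat, three_rows))
print(np.shares_memory(flat, copied))
```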

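since the one required parameter of these array-creation functions is the **shape**, we can pass it explicitly

In [None]:
# calling np.ones() with no arguments raises a TypeError: shape is required
# (in Jupyter, press shift+tab inside the parentheses to see the signature)
np.ones(3)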

In [None]:
np.ones([2,3])

In [None]:
np.zeros([2,3])

In [None]:
np.identity(3)

this is the same as

In [None]:
a = np.ones(3)
# create a diagonal matrix with the entries of a on the main diagonal
b = np.diag(a)
b

then, if you want to create a **band matrix**, which is used heavily in linear systems (for example in radiative transfer)

In [None]:
a = np.ones(3)
ap = np.ones(4)
np.diag(a,-1)*2 + np.diag(ap) + np.diag(a,1)*3
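the same construction can be wrapped in a function. `tridiagonal` below is a hypothetical helper written for this note, not a numpy function; it assumes constant values along each band.

```python
import numpy as np

def tridiagonal(n, lower, main, upper):
    """Hypothetical helper: n x n matrix with constant sub-, main and super-diagonal."""
    return (np.diag(np.full(n - 1, lower), -1)   # one below the main diagonal
            + np.diag(np.full(n, main))          # main diagonal
            + np.diag(np.full(n - 1, upper), 1)) # one above the main diagonal

band = tridiagonal(4, 2.0, 1.0, 3.0)
print(band)
```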

## copies and views

In [None]:
a = np.arange(10)
a

In [None]:
b = a[::2]
b

In [None]:
np.may_share_memory(a,b)

we see that these 2 variables share the same memory. so a very straightforward consequence is **if we change the value of an element of b, a is also changed**.

In [None]:
b[0] = 12
print("b: ", b)
print("a: ", a)

so we have to be very careful with these **views** (also called **references**). if we want to make the two arrays independent of each other, use the `.copy()` method

In [None]:
a = np.arange(10)
b = a[::2].copy()     # this will create a new object in memory

In [None]:
b[0] = 12
print("b: ", b)
print("a: ", a)
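one way to check whether an array is a view is its `.base` attribute. a minimal sketch with made-up names; it also shows that indexing with a list of integers returns a copy, unlike basic slicing.

```python
import numpy as np

source = np.arange(10)
view = source[::2]             # basic slicing returns a view
picked = source[[0, 2, 4]]     # indexing with a list of integers returns a copy

print(view.base is source)                 # the view keeps a reference to its parent
print(np.shares_memory(source, picked))    # the copy does not share memory
```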

## fancy indexing

fancy indexing means that we 

- create a list of integers to tell which elements we want

In [None]:
# normal indexing
a = np.array([1,3,5,7,9])
a[0]

In [None]:
a[[1,2,3]]

and it does not have to be a list; we can also use an integer array

In [None]:
index = np.arange(1,4)
index

In [None]:
a[index]
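integer indexing does extend to several dimensions, but the pairing rule takes some care. a small sketch on a made-up array `grid`:

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)   # [[0,1,2,3],[4,5,6,7],[8,9,10,11]]

rows = [0, 1, 2]
cols = [3, 2, 1]

# the index lists are paired up: elements (0,3), (1,2), (2,1)
paired = grid[rows, cols]

# a single list selects whole rows
some_rows = grid[[0, 2]]

print(paired)
print(some_rows)
```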

but this is not convenient when dealing with multi-dimensional arrays. so to handle multi-dimensional arrays

- we create a boolean mask from some condition, and use the mask to pick the elements of an array where the mask value is True

In [None]:
a = np.arange(12).reshape(2,6)
a

In [None]:
mask = a > 3
mask

then the fancy indexing gives

In [None]:
a[mask]

for short, we can write it in a way that is very easy to read

In [None]:
a[a>3]

however, notice that the resulting array is flattened: no matter what shape the array a has, `a[a>3]` always returns a 1D array. to keep the original shape we use `numpy.ma`: [numpy masked array](https://docs.scipy.org/doc/numpy/reference/maskedarray.html)

In [None]:
# the same array a
a = np.arange(12).reshape(2,6)
a

In [None]:
# create a mask to tell where we want to mask
# for example, we do not need the elements where a<3, so we ...
mask = a < 3
mask

In [None]:
b = np.ma.MaskedArray(a, mask)
b

In [None]:
# original data
b.data

In [None]:
# mask
b.mask

we can do numeric operations on the masked array b directly, as with a numpy array

In [None]:
b *= 2
b

In [None]:
b.min()
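a short sketch of how a masked array keeps its shape while reductions skip the masked entries. it uses `np.ma.masked_where` and `.filled()`, and compares with `np.where` as a plain-array alternative; the names are made up for illustration.

```python
import numpy as np

values = np.arange(12).reshape(2, 6)
masked = np.ma.masked_where(values < 3, values)   # mask the entries below 3

print(masked.shape)          # the shape is preserved, unlike values[values >= 3]
print(masked.min())          # reductions skip the masked entries
print(masked.filled(-1))     # replace the masked entries by a fill value

# plain-array alternative that also keeps the shape
kept = np.where(values < 3, -1, values)
```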

## elementwise (vectorized) numeric operation 

the same as IDL

In [None]:
a = np.arange(6).reshape(2,3)
b = np.ones((2,3))
b[1,:] *= 2

In [None]:
print("a \n", a)
print("b \n", b)

In [None]:
a * b

In [None]:
np.sin(b)

In [None]:
np.exp(b)

also valid for logical operations

In [None]:
a = np.arange(3)
a

In [None]:
b = np.ones(3)
b

In [None]:
is_a_gt_b = a > b
is_a_gt_b

In [None]:
# and the "AND gate" and "OR gate"
is_a_gt_b.all(), is_a_gt_b.any()
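boolean arrays can also be combined elementwise. a minimal sketch, assuming we want the elements strictly between 1 and 5; `x`, `inside` and `outside` are made-up names.

```python
import numpy as np

x = np.arange(6)

# elementwise operators: & (and), | (or), ~ (not)
# the parentheses are required because & binds tighter than the comparisons
inside = (x > 1) & (x < 5)
outside = ~inside

print(x[inside])
print(np.count_nonzero(inside))
```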

### broadcasting

when our arrays do not match in shape, numpy tries to stretch the smaller one along its length-1 (or missing) dimensions so that the shapes agree.

very interesting but **not** as useful as people think, so only a brief illustration:

![broadcasting](./images/broadcasting.png)

In [None]:
a = np.array([0,0,0,10,10,10,20,20,20,30,30,30]).reshape(4,3)
a

In [None]:
# if b is a row vector
b = np.arange(3)
b

In [None]:
# extend b to shape (4,3) by copying the row
a + b

In [None]:
# if b is a column vector
b = np.array([0,1,2,3]).reshape(4,1)
b

In [None]:
# extend b to shape (4,3) by copying the column
a + b

In [None]:
# if you have a column vector and a row vector
a = np.array([0,10,20,30]).reshape(4,1)
b = np.arange(0, 3)

print("a \n", a)
print("b \n", b)

In [None]:
a + b
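the rule behind these examples is that shapes are compared from the last dimension backwards, and each pair of lengths must be equal or one of them must be 1. a small sketch with made-up names:

```python
import numpy as np

col = np.array([0, 10, 20, 30])[:, np.newaxis]   # shape (4, 1)
row = np.arange(3)                               # shape (3,)

table = col + row                                # broadcast to shape (4, 3)
print(np.broadcast(col, row).shape)              # result shape without doing the sum

# shapes (4, 3) and (4,) do not broadcast: trailing lengths 3 and 4 differ
try:
    np.ones((4, 3)) + np.ones(4)
except ValueError as err:
    print("cannot broadcast:", err)
```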

## axis

for a multi-dimensional array (ndarray), one more important feature is the **axis**. for example a 4D array has 4 axes (axis=0, axis=1, axis=2, axis=3)

In [None]:
# if you have a 2D array
a = np.arange(6).reshape(2,3)
a

In [None]:
a.sum()         # show its axis arg by shift+tab

In [None]:
# summation along axis 0 
a.sum(axis=0)

In [None]:
# summation along axis 1
a.sum(axis=1)

In [None]:
# mean, the same
a.mean(axis=1)

axis is a very important feature when dealing with high dimensional arrays.

In [None]:
a = np.ones((2,3,4))
a

In [None]:
# sum over axis 1 and axis 2; only axis 0 (length 2) remains
a.sum(axis=(1,2))

In [None]:
# transpose by specifying the new order of the axes
a.transpose(1,2,0).shape

In [None]:
a.transpose(2,1,0).shape
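a sketch of which dimension disappears for each `axis`, and of the `keepdims=True` option that keeps a length-1 axis so the result can be broadcast back. the array `data` is made up for illustration.

```python
import numpy as np

data = np.arange(6, dtype=float).reshape(2, 3)   # [[0,1,2],[3,4,5]]

col_sums = data.sum(axis=0)                      # collapses the rows    -> shape (3,)
row_sums = data.sum(axis=1)                      # collapses the columns -> shape (2,)

# keepdims=True leaves a length-1 axis, so the result broadcasts against data
row_means = data.mean(axis=1, keepdims=True)     # shape (2, 1)
centered = data - row_means

print(col_sums, row_sums)
print(centered)
```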

## data type

In [None]:
# if we don't specify the data type, default data type is np.float64
np.ones( (2,3) ).dtype

In [None]:
# we are allowed to declare its data type at the beginning
a = np.ones( (2,3), dtype=np.int16 )             # show the dtype arg by shift+tab
a

we can also change its type after creation using the `.astype()` method, which returns a converted copy

In [None]:
a = np.ones( (2,3) ).astype(np.int16)
a
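one thing to keep in mind with `.astype()`: converting floats to integers drops the fractional part rather than rounding. a minimal sketch with made-up values:

```python
import numpy as np

reals = np.array([1.7, -1.7, 2.5])

truncated = reals.astype(np.int16)          # fractional part is dropped (toward zero)
rounded = np.rint(reals).astype(np.int16)   # round first (halves go to the even integer)

print(truncated, rounded)
print(np.iinfo(np.int16).max)               # largest value an int16 can hold
```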

### the most useful application of dtype is creating a structured numpy array

[structured numpy array](https://docs.scipy.org/doc/numpy/user/basics.rec.html)

- list of tuples as dtype

In [None]:
# each data point has 3 fields; we specify their names and dtypes
dtype = [("x", np.int16), ("y", np.int16), ("z", np.int16)]

# create three points
points = np.array([(1,2,3),
                   (3,2,1),
                   (-1,-2,1)], dtype=dtype)

In [None]:
points

In [None]:
points[0]['x']

In [None]:
# sort points by their x field
np.sort(points, order='x')

- dictionary + `numpy.dtype()`

In [None]:
# particle has position(x,y,z) and mass m
particle_dtype = np.dtype({
    'names': ('x', 'y', 'z', 'm'),
    'formats': (np.int16, np.int16, np.int16, np.float64)
})

# initialize three particles
particles = np.ones(3, dtype=particle_dtype)

In [None]:
particles

In [None]:
# the first particle: ('x', 'y', 'z', 'm')
particles[0]

In [None]:
# first particle's mass
particles[0]['m']
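a field name can also be used on the whole array to get or set that field for every record. a sketch under the same particle layout as above, with made-up values:

```python
import numpy as np

particle_dtype = np.dtype([('x', np.int16), ('y', np.int16),
                           ('z', np.int16), ('m', np.float64)])
particles = np.zeros(3, dtype=particle_dtype)

# a field name selects that field for every record
particles['m'] = [1.0, 2.5, 0.5]
particles['x'] = [3, 1, 2]

heavy = particles[particles['m'] > 0.75]     # boolean mask built from one field
by_mass = np.sort(particles, order='m')      # sort the records by mass

print(heavy)
print(by_mass['x'])
```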

## Advanced operation

### numpy.random 

for example:

Return random integers from low (inclusive) to high (exclusive)

```
numpy.random.randint(low, high=None, size=None, dtype='l')
```

In [None]:
# 5 random integers from 0 to 9
np.random.randint(0,10,5)

Randomly permute a sequence x:

```
numpy.random.permutation(x)
```

In [None]:
np.random.permutation(np.arange(10))

Draw random samples from a normal (Gaussian) distribution:

```
numpy.random.normal(loc=0.0, scale=1.0, size=None)
```

- loc: mean
- scale: standard deviation
- size: output shape

In [None]:
np.random.normal(0.0, 1.0, 10)

Generate a random sample from a given 1-D array

```
np.random.choice(a, size)
```

In [None]:
# pick 4 numbers from np.arange(10)
np.random.choice(np.arange(10), 4)

but the results may repeat, because sampling is done with replacement by default. to avoid this we use the `replace=False` keyword argument

In [None]:
np.random.choice(np.arange(10), 4, replace=False)
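for reproducible results, newer numpy versions (1.17 and later, which is an assumption about your installation) offer the `Generator` interface created by `np.random.default_rng`. a sketch of the same operations as above; the seed value is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)      # seed chosen arbitrarily

ints = rng.integers(0, 10, size=5)        # high is exclusive, like randint
normals = rng.normal(loc=0.0, scale=1.0, size=10)
unique_picks = rng.choice(np.arange(10), size=4, replace=False)
shuffled = rng.permutation(np.arange(10))

print(ints)
print(unique_picks)
```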

and many more: [https://docs.scipy.org/doc/numpy/reference/routines.random.html](https://docs.scipy.org/doc/numpy/reference/routines.random.html)

### numpy.linalg

by using `dir()` we can get a first glimpse of the package `numpy.linalg`, a basic linear algebra package provided by numpy.

In [None]:
dir(np.linalg)

In [None]:
# let's create a 2x2 matrix
a = np.array([[2,1],
              [2,2]])

In [None]:
# and a vector
b = np.array([1,2]).reshape(2,1)

the dot product (for 2D arrays, the matrix product) is provided by the `numpy.dot()` function or the `.dot()` method.

In [None]:
np.dot(a,b)

In [None]:
a.dot(b)

and to solve a linear system $ Ax = B $ we use `numpy.linalg.solve()`.

In [None]:
np.linalg.solve(a, a.dot(b))
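a small worked sketch of solving a system and checking the answer through the residual. the right-hand side `rhs` is made up for illustration; `@` is the matrix-multiplication operator (Python 3.5+).

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [2.0, 2.0]])
rhs = np.array([4.0, 6.0])

x = np.linalg.solve(A, rhs)          # solves A x = rhs

# compare with a tolerance instead of testing floats for exact equality
print(x)
print(np.allclose(A @ x, rhs))
```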

to invert a matrix we use `numpy.linalg.inv()`.

In [None]:
# invert a trivial matrix
a = np.array([[1,0],[0,0.5]])
np.linalg.inv(a)

also we can compute the matrix determinant by `numpy.linalg.det()`.

In [None]:
# calculate the determinant of a diagonal matrix
a = np.diag(np.arange(1,4))
print('a : \n', a)
print('the determinant of a is :', np.linalg.det(a))

to calculate the eigenvector and eigenvalues of a matrix, we use `numpy.linalg.eig()`

In [None]:
a = np.diag(np.arange(1,4))
print('a : \n', a)
eigen_values, eigen_vectors = np.linalg.eig(a)
for i in range(3):
    print("{} : eigenvector {} of eigenvalue {}".format(i, eigen_vectors[:,i], eigen_values[i]))
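the eigenvectors are returned as the **columns** of the second output. a minimal sketch that checks the defining relation $ Mv = \lambda v $ on a made-up symmetric matrix `M`:

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])
values, vectors = np.linalg.eig(M)

# each column vectors[:, i] should satisfy M v = lambda v
checks = []
for i in range(len(values)):
    v = vectors[:, i]
    checks.append(np.allclose(M @ v, values[i] * v))

print(values)
print(checks)
```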

and many more: [https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.linalg.html](https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.linalg.html)

for a more complete linear algebra package, go to `scipy.linalg`:

[https://docs.scipy.org/doc/scipy-0.14.0/reference/linalg.html](https://docs.scipy.org/doc/scipy-0.14.0/reference/linalg.html)

### numpy dataIO

Numpy has its own binary format 
- `numpy.save()` as .npy file : saving one array
- `numpy.savez()` as .npz file : saving multiple arrays, uncompressed (use `numpy.savez_compressed()` for a compressed archive)

In [None]:
# create an array and save it to hard disk
data = np.arange(6)
np.save('./data.npy', data)

In [None]:
# delete our variable "data"
del data

In [None]:
data = np.load('./data.npy')

In [None]:
data

`numpy.save()` does not let us keep the variable's name. to do this we use `numpy.savez()`.

In [None]:
data1 = np.arange(6)
data2 = np.arange(6,12)
np.savez('./data_1_2.npz', data1=data1, data2=data2)

In [None]:
del data1, data2

In [None]:
data = np.load('./data_1_2.npz')

In [None]:
data.items()

the loaded object behaves like a dictionary, so we can access the data by 

In [None]:
data['data1']
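a sketch of the compressed variant, `np.savez_compressed`, written into a temporary directory so nothing is left on disk. the file name `pair.npz` and the array names are made up for illustration.

```python
import os
import tempfile
import numpy as np

first = np.arange(6)
second = np.arange(6, 12)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'pair.npz')          # hypothetical file name
    np.savez_compressed(path, first=first, second=second)

    # the loaded archive can be used as a context manager, which closes the file
    with np.load(path) as archive:
        names = sorted(archive.files)             # the keyword names used when saving
        restored = archive['second']

print(names)
print(restored)
```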

for those who prefer .txt files, numpy provides `np.loadtxt()` and `np.savetxt()`

In [None]:
data = np.arange(6).astype(np.int16)
np.savetxt('./data.txt', data, fmt='%3d')

In [None]:
%cat ./data.txt

In [None]:
del data

In [None]:
data = np.loadtxt('./data.txt', dtype=np.int16)
data
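the text routines also handle 2D arrays, delimiters and a header line. a sketch using a temporary directory; the file name `table.csv` and the column names `a,b` are made up for illustration.

```python
import os
import tempfile
import numpy as np

table = np.arange(6).reshape(3, 2)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'table.csv')          # hypothetical file name
    # the header is written as a comment line starting with '# '
    np.savetxt(path, table, fmt='%d', delimiter=',', header='a,b')
    # loadtxt skips lines starting with '#' by default
    loaded = np.loadtxt(path, dtype=np.int16, delimiter=',')

print(loaded)
```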

third party packages for well-known file formats

- HDF5 : `h5py`, `PyTables`
- NetCDF : `scipy.io.netcdf_file`, `netcdf4-python`, ...
- Matlab : `scipy.io.loadmat`, `scipy.io.savemat`
- MatrixMarket : `scipy.io.mmread`, `scipy.io.mmwrite`
- IDL : `scipy.io.readsav`