<img src="images/inmas.png" width=130x align='right' />

# Notebook 15 - Intermediate NumPy
Material covered in this notebook:

- How to generate arrays from functions
- Performing I/O with arrays
- Manipulating arrays
- Slicing and masking

### Prerequisite
Notebook 14


### Housekeeping for matplotlib

Let's first import matplotlib and ensure that the plots are embedded nicely in the notebook

In [None]:
%matplotlib inline 

import matplotlib.pyplot as plt
import numpy as np

### Mathematical functions in NumPy
As with operators, NumPy provides mathematical functions that operate on each element of an array

For example:

In [None]:
v0 = np.array([4, 3, 2, 1])
v0 = 1/v0 * np.pi/2
v1 = np.sin(v0)
v1

All the usual math functions are available. They operate element-wise on an array (returning an array) and also accept a scalar
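
As a small sketch of this element-wise behaviour (the array `angles` is only an illustration and is not used elsewhere in the notebook):

```python
import numpy as np

angles = np.array([0.0, np.pi / 6, np.pi / 2])

# np.sin accepts an array and returns an array of the same shape
sines = np.sin(angles)
print(sines)

# the same function also accepts a plain scalar
print(np.sin(np.pi / 2))

# functions combine element-wise: sin^2 + cos^2 is 1 for every element
identity = np.sin(angles)**2 + np.cos(angles)**2
print(identity)
```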

### Creating NumPy arrays using array-generating functions
- For large arrays it is impractical to initialize the data manually using explicit Python lists.
- Instead we can use one of the many functions in `numpy` that generate arrays of different forms. 

- Some of the more common are:
    - `arange(start, stop, step)`: similar to `range()` but can also generate floats
    - `linspace(start, stop, num)`: similar to `arange()` except that the third argument is the number of points and the end point is included
    - `logspace(start, stop, num)`: similar to `linspace()` but points are regularly spaced on a log scale
    - `zeros(shape)` and `ones(shape)`: fill the array with 0 or 1

### The `arange` array generators
The `arange()` function is similar to the `range()` function and creates values using arguments start, stop, and step

In [None]:
x = np.arange(0, 10, 1)
print(x.shape)
print(x)

Unlike its cousin `range()` which only accepts integer arguments, `arange()` understands floats

In [None]:
y = np.arange(0.1, 2.1, 0.1)
y

This function is convenient for generating graphs of mathematical functions

### The following displays a graph of a *sin* function

In [None]:
x = np.arange(-np.pi, np.pi, 0.1)   # start,stop,step
y = np.sin(x)
plt.plot(x,y)
plt.axhline(y=0, linestyle='dotted', color='gray')
plt.xlabel('x-values')
plt.ylabel('sin(x)')
plt.xticks((-np.pi, 0, np.pi), (r'-$\pi$', '0', r'$\pi$'))   # raw strings keep the backslash in \pi
plt.title('Plot of the Sine Function')
plt.show()

### A few matplotlib tricks
Our sinusoid graph is more instructive when using $\pi$ units
- The small bars on the $x$-axis are called ticks, which can be associated with labels by using the function `plt.xticks()`
    - `xticks()` takes two tuples (or lists) of matching length for values and labels
- The $x$-axis is added by drawing a horizontal line at $y=0$
- One could also have used `plt.grid()`

Notice on the previous graph how the last data point falls just short of $\pi$, because `arange()` excludes the stop value

### The `linspace` and `logspace` array-generating functions
- The `linspace()` and `logspace()` functions generate regularly-spaced points in their domains
- Unlike `arange()`, these functions accept (start, end, N) where N is the number of elements
- Unlike `arange()`, these functions include both end points

Let's look at specific examples:

In [None]:
np.linspace(0, 10, 5)
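
The difference in how `arange()` and `linspace()` treat the end point can be sketched as follows. The `endpoint` keyword of `linspace()` is not discussed in this notebook; it is shown here only to make the comparison explicit.

```python
import numpy as np

# arange: step-based, the stop value is excluded
a = np.arange(0, 1, 0.25)
print(a)          # 4 points, the last one is 0.75

# linspace: count-based, the stop value is included
b = np.linspace(0, 1, 5)
print(b)          # 5 points, the last one is 1.0

# endpoint=False makes linspace drop the stop value, like arange
c = np.linspace(0, 1, 4, endpoint=False)
print(np.allclose(a, c))
```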

`logspace` interprets `start` and `end` as exponents and accepts an optional `base` argument (default 10)

In [None]:
print(np.logspace(0, 3, 4))
print(np.logspace(0, 3, 4, base=np.e))
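
One way to read `logspace()` is as `base` raised to the points produced by `linspace()`. The following sketch checks that relation for the two calls above (the name `exponents` is illustrative):

```python
import numpy as np

exponents = np.linspace(0, 3, 4)          # 0, 1, 2, 3
decades = np.logspace(0, 3, 4)            # 10**0 ... 10**3
print(decades)

# logspace(start, stop, num, base=b) matches b ** linspace(start, stop, num)
print(np.allclose(decades, 10.0 ** exponents))
natural = np.logspace(0, 3, 4, base=np.e)
print(np.allclose(natural, np.exp(exponents)))
```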

### Generating arrays with random data
NumPy has different functions for generating random series through the `random` submodule
- For a uniform distribution, the `random.rand()` function is used for populating an array of a specified size with numbers uniformly distributed in the $[0,1)$ interval
- `random.rand()` does not accept a tuple for the shape, but rather integers for dimensions

In [None]:
np.random.rand(2, 5)

- The `random.randn()` function (*n* for Normal), on the other hand, is used for populating an array of a given size with numbers from a Gaussian distribution of mean 0 and variance 1

In [None]:
np.random.randn(2, 4)
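
Random arrays differ on every run, which can make results hard to reproduce. A minimal sketch, assuming the legacy `np.random.seed()` function (not covered in this notebook) is used to fix the seed:

```python
import numpy as np

np.random.seed(42)                 # fix the seed so the run is repeatable
u = np.random.rand(1000)           # uniform on [0, 1)
g = np.random.randn(1000)          # standard normal

print(u.min() >= 0.0, u.max() < 1.0)
print(g.mean(), g.std())           # close to 0 and 1, not exactly equal

np.random.seed(42)                 # same seed, same sequence of numbers
repeat = np.random.rand(1000)
print(np.array_equal(u, repeat))
```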

### Creating diagonal arrays
Square diagonal arrays can be created directly using the `diag()` constructor

- It takes a list as an argument for specifying the values of the diagonal elements

For example:

In [None]:
x = np.diag([1,2,3])
print('x is of type %r and dimensions %s.'%(type(x), x.shape))
x

### Creating diagonal matrices
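Square diagonal matrices are created from square arrays through an additional `matrix()` constructor. Note that the NumPy documentation no longer recommends the `matrix` class; regular arrays are preferred in new code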

- `diag()` can also accept an additional offset argument using the `k` keyword which defaults to 0

For example, we build a diagonal **matrix** with an offset of 1

In [None]:
x = np.matrix(np.diag([1, 2, 3], k=1))
print('x is of type %r and dimensions %s.'%(type(x), x.shape))
x

Notice how the offset of 1 makes the matrix one size larger in each dimension: 4x4 instead of 3x3
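
The effect of the offset on the shape can be sketched with plain arrays. The names `values`, `shapes`, and `m` are illustrative only.

```python
import numpy as np

values = [1, 2, 3]

# the result needs len(values) + |k| rows and columns
shapes = {k: np.diag(values, k=k).shape for k in (-1, 0, 1, 2)}
print(shapes)

# given a 2-D array, diag() extracts a diagonal instead of building one
m = np.diag(values, k=1)
extracted = np.diag(m, k=1)
print(extracted)
```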

### `zeros` and `ones` constructors
- It is good practice to initialize the elements in an array when it is created, and often with 0, or 1
- For this purpose, the `zeros()` and `ones()` functions are used
- Unlike `rand()`, the required argument is a **tuple** representing the shape of the desired array, thus the double parentheses

In [None]:
np.zeros((2, 4))

In [None]:
np.ones((3, 5))

### Shortcuts to copy the shape of an array
To use the shape of another object, the `ones_like()` and the `zeros_like()` functions can be used

In [None]:
y = np.random.randn(2, 5)
z = np.ones_like(y)
z

In [None]:
z = np.zeros_like(y)
z
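
Besides the shape, `zeros_like()` and `ones_like()` also copy the data type of their argument. A short sketch, with the hypothetical array `counts`:

```python
import numpy as np

counts = np.array([[1, 2, 3], [4, 5, 6]])      # integer array

z = np.zeros_like(counts)
print(z.shape, z.dtype)                        # same shape, integer dtype

zf = np.zeros_like(counts, dtype=float)        # override the dtype explicitly
print(zf.dtype)
```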

###  Comma-separated values (CSV)
- CSV is a common text format for data files, along with related formats such as TSV (tab-separated values)

Here is such a file (scroll to see more) that contains the temperature (°C) in Stockholm from the 1800s:

In [None]:
datafile = 'data/stockholm_td_adj2.csv'   # the path must be a quoted string
# On macOS and Linux uncomment the following line
# !head data/stockholm_td_adj2.csv
# On Windows use the following command
!type data\stockholm_td_adj2.csv

### Reading comma-separated values (CSV) in NumPy
- To read data from such files into NumPy arrays we can use the `numpy.genfromtxt()` function

In [None]:
data = np.genfromtxt('./data/stockholm_td_adj2.csv', delimiter=',')
data.shape

We can see that it read the 4 columns for year, month, day, and temperature values

- The function `genfromtxt()` converted the ASCII text it read into IEEE floating-point numbers
- It also recognized the delimiters (comma) to separate the values

### Plotting the data we just read
We will now use matplotlib to display the content of this data file

In [None]:
fig, ax = plt.subplots(figsize=(14,4))
ax.plot(data[:,0]+data[:,1]/12.0+data[:,2]/365, data[:,3])
ax.axis('tight')
ax.set_title('temperatures in Stockholm')
ax.set_xlabel('year')
ax.set_ylabel('temperature (C)');

### Doing more with `genfromtxt()`
The first line of the data file contains a text description of the columns

Let's look at how NumPy read the first two lines: 

In [None]:
data[0:2,:]

Oops! The header line could not be converted to floats, so `genfromtxt()` stored it as a row of `nan` values, which matplotlib silently ignored while plotting the data

A proper way to read the file would be to skip the header as follows:

In [None]:
data = np.genfromtxt('./data/stockholm_td_adj2.csv', delimiter=',', skip_header=1)
data[0:2, :]

### Options for `genfromtxt()`

- the `delimiter` keyword is used to define how the splitting should take place
    - By default, `genfromtxt` assumes `delimiter=None`, meaning that the line is split along white spaces (including tabs)
    - Common delimiters include a comma (,), a semicolon (;), or tab ('\t')

- `dtype` is used to define the data type of the resulting array. `float` is the default value
    - If `dtype = None`, the dtypes will be determined by the contents of each column, individually

- `skip_header` is the number of lines to skip at the beginning of the file. Default is `skip_header=0`

- `names` is used to define the field names in a structured dtype. It can be `True`, a comma-separated string, or a sequence of strings
    - If `names=True`, the field names are read from the first line after the first `skip_header` lines
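
These options can be tried without a file on disk by reading from an in-memory string. The sketch below uses a hypothetical two-row stand-in for the data file; its column names and values are made up for illustration.

```python
import io
import numpy as np

# hypothetical stand-in for a small CSV file with a header line
text = "year,month,day,temp\n1800,1,1,-6.1\n1800,1,2,-15.4\n"

# default settings: the header cannot be parsed as floats and becomes nan
raw = np.genfromtxt(io.StringIO(text), delimiter=',')
print(raw.shape, np.isnan(raw[0]).all())

# skip_header=1 drops the header line before parsing
clean = np.genfromtxt(io.StringIO(text), delimiter=',', skip_header=1)
print(clean)
```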

### Take two
Let's read the file again using the structured data type

In [None]:
data2 = np.genfromtxt(
    './data/stockholm_td_adj2.csv', 
    delimiter=',', # comma separated values
    names = True, # the field names are read from the first valid line
    dtype = None) # guess the dtype of each column
data2

The array above is said to have a structured data type; such arrays are called structured arrays. Indexing with the field name allows us to access (and modify) individual fields of a structured array:

In [None]:
data2['Year']
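
A self-contained sketch of field access on a structured array, again using a hypothetical in-memory file whose field names (`year`, `temp`, and so on) are assumptions and need not match the real data file:

```python
import io
import numpy as np

text = "year,month,day,temp\n1800,1,1,-6.1\n1800,1,2,-15.4\n"
rec = np.genfromtxt(io.StringIO(text), delimiter=',', names=True, dtype=None)

print(rec.dtype.names)        # field names taken from the header line
print(rec['temp'])            # one column, selected by name
print(rec[0])                 # one record, selected by position

rec['temp'] += 1.0            # fields can be modified in place
print(rec['temp'])
```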

### Writing data to a file with NumPy
The function `numpy.savetxt()` stores a NumPy array in a text file. The default delimiter is a space; pass `delimiter=','` to write comma-separated values:

In [None]:
M = np.random.rand(3, 3)
print('Random matrix M:', M)
print('---- default options:')
np.savetxt('random-matrix.csv', M)             # savetxt() calls overwrite any existing file. Be cautious!
!cat random-matrix.csv
print('---- using five decimal floats:')
np.savetxt('random-matrix.csv', M, fmt='%.5f') # fmt specifies the float string format (5 decimal places)
!cat random-matrix.csv

### Saving data using NumPy's native file format
- As the previous example shows, precision can be lost when data are written in text format and read back later
- For this reason (and to save disk space), NumPy has functions for writing and reading data in its native binary format
    - These functions are `numpy.save` and `numpy.load`

In [None]:
np.save("random-matrix.npy", M)     # The file created is a binary file
!file random-matrix.npy

In [None]:
M2 = np.load("random-matrix.npy")
np.sum(np.abs(M2 - M))              # 0.0: the binary round trip reproduces the data exactly
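
The loss of precision can be measured directly. The following sketch writes the same matrix in both formats to a temporary directory and compares the round trips; `np.loadtxt()` and the `tempfile` module are not used elsewhere in this notebook and appear here only for the comparison.

```python
import os
import tempfile
import numpy as np

M = np.random.rand(3, 3)

with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, 'm.csv')
    npy_path = os.path.join(tmp, 'm.npy')

    # text round trip, keeping only 5 decimal places
    np.savetxt(txt_path, M, fmt='%.5f', delimiter=',')
    from_text = np.loadtxt(txt_path, delimiter=',')

    # binary round trip in NumPy's native format
    np.save(npy_path, M)
    from_binary = np.load(npy_path)

text_error = np.abs(from_text - M).max()
print(text_error)                          # small, bounded by the rounding in fmt
print(np.array_equal(from_binary, M))      # True
```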

### Manipulating arrays with index slicing
- We have seen index slicing with lists, strings, and tuples

- The same syntax can be used with arrays wherever an index is expected: `M[..., lower:upper:step, ...]`
- For a one-dimensional array, it works exactly like slicing in native Python:

In [None]:
v = np.array([1, 2, 3, 4, 5])
print(v[1:3])

We can omit any of the three parameters in `M[lower:upper:step]`:

In [None]:
print(v[:3])        # first 3 elements
print(v[3:])        # elements after the third element
print(v[::])        # lower, upper, step all take the default values

### Slicing with negative indices
Let's see some examples of slicing with negative indices:

In [None]:
v = np.array([1, 2, 3, 4, 5])
print(v[:-3])             # everything except the last 3 elements
print(v[-3:])             # last 3 elements
print(v[::-1])            # reverse order


### Assigning with slices
Array slices are *mutable*, i.e., they can appear on the left-hand side of an assignment

In [None]:
v = np.array([1, 2, 3, 4, 5])
v[1:3] = [-2,-3]
v
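
A related point worth keeping in mind: a basic slice of a NumPy array is a view on the original data, not a copy. A minimal sketch (the names `s`, `w`, and `c` are illustrative):

```python
import numpy as np

v = np.array([1, 2, 3, 4, 5])
s = v[1:4]            # basic slicing returns a view, not a copy
s[:] = 0              # writing through the view ...
print(v)              # ... changes v as well

w = np.array([1, 2, 3, 4, 5])
c = w[1:4].copy()     # an explicit copy is independent of w
c[:] = 0
print(w)              # w is unchanged
```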

### Slicing with multidimensional arrays
Index slicing works exactly the same way for multidimensional arrays

We first build an array using a list comprehension:

In [None]:
A = np.array([[n + m*10 for n in range(5)] for m in range(3)])
print(A)

In [None]:
# a block from the original array
A[1:3, 3:5]

In [None]:
# strides
A[::2, ::2]

### Fancy indexing
Fancy indexing is the name for when an array or list is used in place of an index:

In [None]:
row_indices = [0, 1, 2]
A[row_indices]

In [None]:
col_indices = [1, 2, -1] # remember, index -1 means the last element
A[row_indices, col_indices]
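
When two index lists are given, NumPy pairs them up element by element rather than selecting a block. The sketch below contrasts this with `np.ix_()`, a helper not covered in this notebook that builds the full rows-by-columns selection. The lists `rows` and `cols` are illustrative.

```python
import numpy as np

A = np.array([[n + m*10 for n in range(5)] for m in range(3)])

rows = [0, 2]
picked_rows = A[rows]              # whole rows 0 and 2, shape (2, 5)
print(picked_rows)

cols = [1, -1]
pairs = A[rows, cols]              # elements (0, 1) and (2, -1): 1 and 24
print(pairs)

block = A[np.ix_(rows, cols)]      # 2x2 block: rows 0 and 2, columns 1 and -1
print(block)
```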

### Index masks
We can also use index masks. If the index mask is a NumPy array of data type `bool`, an element is selected (True) or not (False) depending on the value of the mask at that element's position:

In [None]:
B = np.array([n for n in range(5)])
B

In [None]:
row_mask = np.array([True, False, True, False, False])
B[row_mask]

In [None]:
# same thing
row_mask = np.array([1,0,1,0,0], dtype=bool)
B[row_mask]

### Creating index masks from element-wise comparisons
This feature is very useful to conditionally select elements from an array, using for example comparison operators:

In [None]:
x = np.arange(0, 10, 0.5)
x

In [None]:
mask = (5 < x) & (x < 7.5)   # & is the element-wise AND for boolean arrays
mask

In [None]:
x[mask]
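
Masks can be combined with the element-wise operators `&` (and), `|` (or), and `~` (not). A short sketch on the same array; the names `inside`, `outside`, and `edges` are illustrative:

```python
import numpy as np

x = np.arange(0, 10, 0.5)

inside = (5 < x) & (x < 7.5)       # element-wise AND
outside = ~inside                  # element-wise NOT
edges = (x < 1) | (x > 9)          # element-wise OR

print(x[inside])                   # 5.5, 6.0, 6.5, 7.0
print(inside.sum())                # number of selected elements: 4
print(x[edges])                    # 0.0, 0.5, 9.5
```

The parentheses around each comparison are needed because `&` and `|` bind more tightly than `<` and `>`.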

### Key Points
- Use the NumPy math functions (such as `np.sin`), which operate element-wise on whole arrays
- Save data in NumPy's native format to avoid losing precision and other unforeseen errors

### Further Readings
- For more details on `genfromtxt()`, see the [NumPy reference documentation](https://numpy.org/doc/stable/reference/generated/numpy.genfromtxt.html).