<a href="https://www.hydroffice.org/epom/"><img src="images/000_000_epom_logo.png" alt="ePOM" title="Open ePOM home page" align="center" width="12%"></a>

<a href="https://piazza.com/e-learning_python_for_ocean_mapping/fall2019/om100/home"><img src="images/help.png" alt="Piazza.com" title="Ask questions on Piazza.com" align="right" width="10%"></a>
# Introduction to NumPy

The efficient storage and manipulation of numerical arrays is often critical when processing ocean data. This notebook will present a specialized Python package for handling such numerical arrays: [NumPy](https://www.numpy.org/).

Since NumPy arrays are present in (nearly) all the Python scientific packages, time spent learning how to use NumPy effectively helps with many aspects of ocean data science.

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

The [NumPy](https://www.numpy.org/) package plays a key role in scientific computing with Python.

NumPy (short for Numerical Python) is centered around a powerful N-dimensional array object, but it also provides other useful capabilities such as linear algebra, Fourier transforms, and random number generation.

<img align="left" width="6%" style="padding-right:10px;" src="images/info.png">

Detailed documentation, tutorials, and other resources can be found at [https://www.numpy.org](https://www.numpy.org).

Before starting to use `numpy`, you have to execute the following cell. Together with code introduced in the previous notebooks, this imports `numpy` and assigns it the commonly adopted short name `np`:

In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import sys
import os
import matplotlib.pyplot as plt
import numpy as np

sys.path.append(os.getcwd())

Similarly to what was done in the [Introduction to Matplotlib notebook](VIS_000_Intro_to_Matplotlib.ipynb#Introduction-to-Matplotlib), the cell below retrieves and prints the NumPy version:

In [None]:
print("NumPy version: %s" % (np.__version__, ))

***

## Creating an Array Filled with Zeros 

The first `numpy` function that we introduce is [`zeros()`](https://www.numpy.org/devdocs/reference/generated/numpy.zeros.html?#numpy.zeros). When a single integer is passed, this function creates a 1D array with a number of zeros equal to the passed value:

In [None]:
arr = np.zeros(8)
print(arr)

It is also possible to create a multi-dimensional array by passing a [`tuple`](https://docs.python.org/3.6/library/stdtypes.html?#tuples). Each value in the `tuple` defines the array size along the corresponding dimension.

Thus, the following code creates a 2D array with 8 rows and 2 columns by passing `(8, 2)`:

In [None]:
arr = np.zeros((8, 2))
print(arr)

Similarly, for 3D (and higher) arrays:

In [None]:
arr = np.zeros((3, 8, 2))
print(arr)

<img align="left" width="6%" style="padding-right:10px;" src="images/info.png">

You can define how many decimal digits are displayed by `print()` by using [set_printoptions()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.set_printoptions.html).

In [None]:
np.set_printoptions(precision=2, floatmode='fixed')

arr = np.zeros(8)
print(arr)
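
Note that `set_printoptions()` changes how arrays are displayed for the rest of the session, while the stored values are not modified. The cell below is a minimal sketch of an alternative, assuming NumPy 1.15 or later (where the `printoptions()` context manager is available). The values in `vals` are illustrative only:

```python
import numpy as np

vals = np.array([1.23456, 20.5, 300.0])  # illustrative values

# the formatting options apply only inside the `with` block
with np.printoptions(precision=2, floatmode='fixed'):
    formatted = str(vals)
    print(formatted)

# outside the block, the previous print options are back in use
print(vals)

# only the display changes: the stored value is untouched
print(vals[0] == 1.23456)
```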

## Common NumPy Terminology

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

Each dimension of a `numpy` array is called an **axis**.

The number of axes of a `numpy` array can be retrieved using [`ndim`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.ndim.html):

In [None]:
arr = np.zeros((3, 8, 2))
print("Array dimensions: %d" % arr.ndim)

The number of elements along an axis is the **length** of that axis.

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

The lengths of all the axes, taken together, form the **shape** of an array. You can retrieve the array shape using [`shape`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.shape.html).

In [None]:
arr = np.zeros((3, 8, 2))
print("Array shape: %s" % (arr.shape, ))

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

The **size** of an array is represented by the total number of elements. You can retrieve the array size using [`size`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.size.html).

In [None]:
arr = np.zeros((3, 8, 2))
print("Array size: %s" % (arr.size, ))  # = 3 x 8 x 2
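
The three attributes introduced so far are related to each other. As a short sketch (the helper variable `n_elements` is introduced here only for illustration), the cell below checks that `ndim` matches the number of entries in `shape`, and that `size` is the product of the axis lengths:

```python
import numpy as np

arr = np.zeros((3, 8, 2))

# the number of axes equals the number of entries in the shape tuple
print("ndim == len(shape): %s" % (arr.ndim == len(arr.shape), ))

# the size equals the product of the lengths of all the axes
n_elements = 1
for axis_length in arr.shape:
    n_elements *= axis_length
print("size == product of axis lengths: %s" % (arr.size == n_elements, ))
```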

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

A NumPy array is homogeneous. This means that all the elements of an array have the same **data type**. 

You can retrieve an array's data type using [`dtype`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.dtype.html).

In [None]:
arr = np.zeros((3, 8, 2))
print("Array data type: %s" % (arr.dtype, ))

The above cell prints `float64` for the array data type. The `zeros()` function uses this data type by default.

The table below provides some of the many data types available in NumPy:

| Data Type   | Description                                          |
|-------------|------------------------------------------------------|
| `bool`      | Boolean (True or False)                              |
| `int8`      | Integer (-128 to 127)                                | 
| `int16`     | Integer (-32768 to 32767)                            |
| `int32`     | Integer (-2147483648 to 2147483647)                  |
| `int64`     | Integer (-9223372036854775808 to 9223372036854775807)| 
| `uint8`     | Unsigned integer (0 to 255)                          | 
| `uint16`    | Unsigned integer (0 to 65535)                        | 
| `uint32`    | Unsigned integer (0 to 4294967295)                   | 
| `uint64`    | Unsigned integer (0 to 18446744073709551615)         | 
| `intp`      | Integer used for indexing                            | 
| `float16`   | Half precision float                                 | 
| `float32`   | Single precision float                               | 
| `float64`   | Double precision float                               | 
| `complex64` | Complex number, represented by two `float32`         | 
| `complex128`| Complex number, represented by two `float64`         | 

You can explicitly define the data type by passing the `dtype` parameter. 

In [None]:
arr = np.zeros((3, 8, 2), dtype=np.int32)
print("Array data type: %s" % (arr.dtype, ))

<img align="left" width="6%" style="padding-right:10px;" src="images/info.png">

To learn more about data types in NumPy, read the [Data type objects (dtype) page](https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html).
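
If you want to verify the integer ranges listed in the table, one possible approach is NumPy's `iinfo()` function. The sketch below also uses the `itemsize` attribute to show that the data type determines how many bytes each element occupies. The variable names are chosen here only for illustration:

```python
import numpy as np

# value range of a few integer data types
for int_type in (np.int8, np.uint8, np.int16):
    info = np.iinfo(int_type)
    print("%s: %d to %d" % (info.dtype, info.min, info.max))

# the data type also determines the memory used by each element
arr_f64 = np.zeros(8, dtype=np.float64)
arr_f32 = np.zeros(8, dtype=np.float32)
print("float64: %d bytes per element" % (arr_f64.itemsize, ))
print("float32: %d bytes per element" % (arr_f32.itemsize, ))
```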

## Other Mechanisms to Create NumPy Arrays

The `zeros()` function is just one of the several ways to create an array in `numpy`. 

For instance, to create an array of a given shape, but initialized with a specific value (e.g., `8.0`), you can use the [`full()`](https://www.numpy.org/devdocs/reference/generated/numpy.full.html) function:

In [None]:
arr = np.full((3, 5), 8.0)
print(arr)

If you want to create an array filled with a sequence of values, you can use the [`arange()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html) function, which takes the step between consecutive values, or the [`linspace()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html) function, which takes the number of values to generate:

In [None]:
arr = np.arange(start=0, stop=40, step=8)  # the `stop` value is not included
print(arr)

In [None]:
arr = np.linspace(start=0, stop=40, num=5)  # the `stop` value is included
print(arr)
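
To compare the two functions side by side, the sketch below builds similar sequences with each of them. It also uses the `endpoint` parameter of `linspace()`, which is not covered above: when set to `False`, the `stop` value is excluded. The variable names are illustrative:

```python
import numpy as np

# arange(): you choose the step, the number of elements follows
by_step = np.arange(start=0, stop=40, step=8)

# linspace(): you choose the number of elements, the step follows
by_num = np.linspace(start=0, stop=40, num=5)
by_num_open = np.linspace(start=0, stop=40, num=5, endpoint=False)

print(by_step)      # the `stop` value is not included
print(by_num)       # the `stop` value is included
print(by_num_open)  # the `stop` value is not included (endpoint=False)
```

In this case, `by_num_open` holds the same values as `by_step`, but stored as floating-point numbers.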

It is also possible to convert a `list` or a list of lists into a `numpy` array using the [`array()`](https://www.numpy.org/devdocs/reference/generated/numpy.array.html?#numpy.array) function:

In [None]:
sal_list = [34.4, 34.1, 33.6, 31.7, 31.3, 31.2, 31.0]
sal_arr = np.array(sal_list)
print("array shape: %s" % (sal_arr.shape,))  # nr. of dimensions: 1
print(sal_arr)

In [None]:
sal_list = [34.4, 34.1, 33.6, 31.7, 31.3, 31.2, 31.0]
temp_list = [11.2, 11.0, 13.7, 16.0, 16.1, 16.2, 16.1]
data_arr = np.array([sal_list, temp_list])
print("array shape: %s" % (data_arr.shape,))  # nr. of dimensions: 2
print(data_arr)
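
When converting a list, `array()` selects a data type that can represent all the values in the list. The sketch below (with illustrative values and variable names) shows this behavior, together with the `dtype` parameter to request a specific data type:

```python
import numpy as np

int_arr = np.array([1, 2, 3])      # only integers: an integer data type
mixed_arr = np.array([1, 2.5, 3])  # integers and floats: a float data type
forced_arr = np.array([1, 2, 3], dtype=np.float32)  # explicit data type

print("only integers: %s" % (int_arr.dtype, ))  # exact type is platform-dependent
print("integers and floats: %s" % (mixed_arr.dtype, ))
print("explicit dtype: %s" % (forced_arr.dtype, ))
```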

---

## Matplotlib Support

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

Matplotlib knows how to read `numpy` arrays. 

The next code cell contains a modified version of the example provided in the [Intro to Matplotlib](VIS_000_Intro_to_Matplotlib.ipynb#Customizing-your-plots) notebook:

In [None]:
sal_list = [34.4, 34.1, 33.6, 31.7, 31.3, 31.2, 31.0]
temp_list = [11.2, 11.0, 13.7, 16.0, 16.1, 16.2, 16.1]

# conversion from lists to numpy arrays
sal_arr = np.array(sal_list)
temp_arr = np.array(temp_list)

# plot creation
plt.plot(sal_arr, temp_arr,
         color='green', linewidth=1.5, linestyle='dotted',
         marker='o', markersize=12, markerfacecolor='yellow', markeredgecolor='red')
# axis limits chosen to include all the samples
plt.axis([30.5, 35.0, 10.5, 16.5])
plt.title("T-S Diagram")
plt.xlabel("Salinity [PSU]")
plt.ylabel("Temperature [Celsius]")
plt.grid()
plt.show()

***

## Mathematical and Statistical Methods

NumPy arrays have many efficient mathematical and statistical methods. This is one of the motivations to use them in place of `list` containers.

The cell below shows how to use some of these methods. If you want to learn more about them, visit [this NumPy page](https://www.numpy.org/devdocs/reference/arrays.ndarray.html#calculation).

In [None]:
sal_list = [34.4, 34.1, 33.6, 31.7, 31.3, 31.2, 31.0]
sal_arr = np.array(sal_list)
print("The average value is %f" % sal_arr.mean())
print("The standard deviation is %f" % sal_arr.std())
print("The min value is %f" % sal_arr.min())
print("The max value is %f" % sal_arr.max())
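
These methods also work with multi-dimensional arrays. As a minimal sketch, the cell below uses the `axis` parameter of `mean()` (not covered above) to compute one average per row of a 2D array, and compares the result with the same calculation done on the original `list`. The name `row_means` is illustrative:

```python
import numpy as np

sal_list = [34.4, 34.1, 33.6, 31.7, 31.3, 31.2, 31.0]
temp_list = [11.2, 11.0, 13.7, 16.0, 16.1, 16.2, 16.1]
data_arr = np.array([sal_list, temp_list])

# with axis=1, the mean is computed along each row
row_means = data_arr.mean(axis=1)
print("mean salinity: %f" % (row_means[0], ))
print("mean temperature: %f" % (row_means[1], ))

# the same salinity result, computed from the list with built-in functions
list_mean = sum(sal_list) / len(sal_list)
print("mean salinity from the list: %f" % (list_mean, ))
```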

<img align="left" width="6%" style="padding-right:10px;" src="images/test.png">

Calculate the average value, the minimum, and the maximum for `temp_list`, then superimpose those values on a plot showing the temperature values.

*Hint: read about [`axhline()`](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.axhline.html) and [`legend()`](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.legend.html) in the Matplotlib documentation.*

In [None]:
temp_list = [11.2, 11.0, 13.7, 16.0, 16.1, 16.2, 16.1]

temp_arr = np.array(temp_list)
mean_temp = temp_arr.mean()
min_temp = temp_arr.min()
max_temp = temp_arr.max()
print("The average value is %f" % (mean_temp, ))
print("The minimum is %f" % (min_temp, ))
print("The maximum is %f" % (max_temp, ))

# plot creation
plt.plot(temp_arr, color='orange', marker='o', label='data')
plt.axhline(y=mean_temp, color='green', linestyle='dashed', label='avg')
plt.axhline(y=min_temp, color='blue', linestyle='dotted', label='min')
plt.axhline(y=max_temp, color='red', linestyle='dotted', label='max')
plt.title("Temperature")  
plt.xlabel("Samples") 
plt.ylabel("Temperature[Celsius]")
plt.legend()
plt.grid()
plt.show()

In [None]:
temp_list = [11.2, 11.0, 13.7, 16.0, 16.1, 16.2, 16.1]

***

## Array Indexing and Slicing

### One-dimensional Arrays

The cell below creates a one-dimensional array:

In [None]:
sal_arr = np.array([34.4, 34.1, 33.6, 31.7, 31.3, 31.2, 31.0])
print("1D array: %s" % (sal_arr, ))

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

Once the array is created, you can directly access one of its values by putting the index of that value in square brackets (i.e., `[]`).

In [None]:
index = 3
value = sal_arr[index]
print("value at index %d: %f" % (index, value))

You can also use indexing to change a specific value:

In [None]:
index = 3
sal_arr[index] = 32.1
print("modified 1D array: %s" % (sal_arr, ))

Similarly to [list indexing](../python_basics/002_Lists_of_Variables.ipynb#Creation-of-a-List:-Approach-#1), you can use a negative index to access values from the end of the array:

In [None]:
index = -2
value = sal_arr[index]
print("value at index %d: %f" % (index, value))
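
A negative index is equivalent to counting back from the array size. The sketch below illustrates this equivalence and shows that an index outside the valid range raises an `IndexError`. The variable `last_index` is introduced only for illustration:

```python
import numpy as np

sal_arr = np.array([34.4, 34.1, 33.6, 31.7, 31.3, 31.2, 31.0])

# index -1 refers to the last element, -2 to the second-to-last, and so on
last_index = sal_arr.size - 1
print("same last value: %s" % (sal_arr[-1] == sal_arr[last_index], ))
print("same second-to-last value: %s" % (sal_arr[-2] == sal_arr[last_index - 1], ))

# an index outside the valid range raises an IndexError
try:
    sal_arr[sal_arr.size]
except IndexError as e:
    print("IndexError: %s" % (e, ))
```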

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

You can access sub-arrays using square brackets and the colon character (i.e., `:`). This mechanism is called **slice notation**.

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

The slice notation syntax is: `x[start:stop:step]`. When unspecified, the default values are `start=0`, `stop=size of axis`, `step=1`.

Thus, if you want to select the first 4 elements, you may write:

In [None]:
start = 0
stop = 4
step = 1
values = sal_arr[start:stop:step]
print("sub-array[%d:%d:%d] = %s" % (start, stop, step, values))

However, given that `start=0` and `step=1` are also the default values, you can obtain the same sub-array by writing:

In [None]:
stop = 4
values = sal_arr[:stop]
print("sub-array[:%d] = %s" % (stop, values))

You can combine the three elements of the slice notation to obtain a number of different results. 

For instance, the sub-array in the cell below contains only every third element, starting at index 1:

In [None]:
start = 1
step = 3
values = sal_arr[start::step]
print("sub-array[%d::%d] = %s" % (start, step, values))
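
Two further aspects of slicing are worth a short sketch: a negative `step` walks the array backwards, and a slice of a NumPy array is a *view* on the original data rather than an independent copy. The cell below illustrates both, using the `copy()` method when independent data is needed. The names `view` and `independent` are illustrative:

```python
import numpy as np

sal_arr = np.array([34.4, 34.1, 33.6, 31.7, 31.3, 31.2, 31.0])

# a negative step walks the array backwards
print("reversed: %s" % (sal_arr[::-1], ))

# a slice is a view: changing it also changes the original array
view = sal_arr[:2]
view[0] = 0.0
print("original after changing the view: %s" % (sal_arr, ))

# copy() creates independent data: the original array is not affected
independent = sal_arr[:2].copy()
independent[1] = -1.0
print("original after changing the copy: %s" % (sal_arr, ))
```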

### Multi-dimensional Arrays

The cell below creates a two-dimensional array:

In [None]:
sal_list = [34.4, 34.1, 33.6, 31.7, 31.3, 31.2, 31.0]
temp_list = [11.2, 11.0, 13.7, 16.0, 16.1, 16.2, 16.1]
data_arr = np.array([sal_list, temp_list])

print("array shape: %s" % (data_arr.shape,))  # nr. of dimensions: 2
print(data_arr)

If you apply to `data_arr` the indexing approach discussed for one-dimensional arrays, you do not get back a single value, but a whole row (in this case, all the salinity values):

In [None]:
index = 0
values = data_arr[index]
print("sub-array[%d] = %s" % (index, values))

<img align="left" width="6%" style="padding-right:10px;" src="images/key.png">

The indexing for accessing single values in multi-dimensional arrays uses several comma-separated indices.

The first value for salinity is accessed in the cell below:

In [None]:
row = 0
col = 0
value = data_arr[row, col]
print("value[%d, %d] = %s" % (row, col, value))

Similarly, we can access the last value of the temperature values:

In [None]:
row = -1
col = -1
value = data_arr[row, col]
print("value[%d, %d] = %s" % (row, col, value))

Slicing for multi-dimensional arrays works in the same way, with multiple slices separated by commas.

For example, to access the first 4 values of salinity:

In [None]:
row = 0
col_stop = 4
values = data_arr[row, :col_stop]
print("sub-array[%d, :%d] = %s" % (row, col_stop, values))

At the beginning of this section, we have seen a way to access a specific row in an array (i.e., `values = data_arr[index]`). Another way to obtain the same result is using slicing and putting a single colon (`:`) for the column dimension:

In [None]:
print("first row = %s" % (data_arr[0, :], ))

By putting a single colon (`:`) in the row dimension, you can retrieve a specific column:

In [None]:
print("third column = %s" % (data_arr[:, 2], ))

In the output above, `33.6` is the third salinity value, while `13.7` is the third temperature value.
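
Slices can be used on both dimensions at the same time. As a minimal sketch, the cell below selects the second, third, and fourth samples of both variables, and then assigns each row of the array to its own variable. The name `sub_arr` is illustrative:

```python
import numpy as np

sal_list = [34.4, 34.1, 33.6, 31.7, 31.3, 31.2, 31.0]
temp_list = [11.2, 11.0, 13.7, 16.0, 16.1, 16.2, 16.1]
data_arr = np.array([sal_list, temp_list])

# rows: all of them; columns: from index 1 up to (but excluding) index 4
sub_arr = data_arr[:, 1:4]
print("sub-array shape: %s" % (sub_arr.shape, ))
print(sub_arr)

# each row can be assigned to its own variable
sal_arr, temp_arr = data_arr
print("temperature values: %s" % (temp_arr, ))
```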

***

<img align="left" width="6%" style="padding-right:10px; padding-top:10px;" src="images/refs.png">

## Useful References

* [The official Python 3.6 documentation](https://docs.python.org/3.6/index.html)
  * [Tuple](https://docs.python.org/3.6/library/stdtypes.html?#tuples)
* [Programming Basics with Python](https://github.com/hydroffice/python_basics)
* [Introduction to Ocean Data Science - Scientific Computing slides](https://bitbucket.org/hydroffice/hyo_epom/downloads/ePOM.Intro_to_Ocean_Data_Science.Scientific_Computing.pdf)
* The NumPy Package:
  * [Website](https://www.numpy.org/)
  * [`zeros()`](https://www.numpy.org/devdocs/reference/generated/numpy.zeros.html?#numpy.zeros)
  * [`full()`](https://www.numpy.org/devdocs/reference/generated/numpy.full.html?#numpy.full)
  * [`array()`](https://www.numpy.org/devdocs/reference/generated/numpy.array.html?#numpy.array)
  * [`arange()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html)
  * [`linspace()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html)
  * [`copy()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.copy.html)

<img align="left" width="5%" style="padding-right:10px;" src="images/email.png">

*For issues or suggestions related to this notebook, write to: epom@ccom.unh.edu*

<!--NAVIGATION-->
| [Contents](index.ipynb) | [Adopting NumPy in Class Methods >](COMP_001_Adopting_NumPy_in_Class_Methods.ipynb)