# W5 - September 20 - NumPy: Basics

NumPy is a third-party library (not in the Standard Library) that is included with the Anaconda Distribution. Hence, you don't need to install it in the default Anaconda environment (`base`), and can import it as follows:

In [1]:
import numpy as np

`np` is the recommended alias for `numpy`, and it's best to stick to convention.

**Why use NumPy?**

NumPy provides memory-efficient containers that support fast numerical operations on multi-dimensional arrays. It's specifically designed for scientific computation.
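
As a rough illustration of the "fast numerical operations" point, the sketch below squares the same values once with a plain Python list and once with a NumPy array. The variable names and the array size are arbitrary choices made for this example, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np

n = 10_000
values_list = list(range(n))
values_array = np.arange(n)

# Element-wise square with a Python-level loop
squares_list = [v * v for v in values_list]

# Same operation, vectorized: the loop runs in compiled code
squares_array = values_array * values_array

# Both approaches give the same numbers
print(np.array_equal(squares_list, squares_array))

# Compare run times (results vary by machine)
t_list = timeit.timeit(lambda: [v * v for v in values_list], number=20)
t_array = timeit.timeit(lambda: values_array * values_array, number=20)
print(f"list: {t_list:.4f} s, array: {t_array:.4f} s")
```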

https://numpy.org/doc/

https://scipy-lectures.org/

## The NumPy Array object

The NumPy array is the basic NumPy object, and can be used to store a variety of engineering data.

### Creating arrays

Arrays can be 1D

In [None]:
array_1d = np.array([10, 20, 30])
array_1d

In [None]:
type(array_1d)

In [None]:
array_1d.shape

In [None]:
len(array_1d)

Arrays can also be multi-dimensional. To create one, pass a list of lists to the `np.array()` function

In [None]:
array_2d = np.array([[10, 20, 30], [100, 200, 300]])
array_2d

In [None]:
array_2d.ndim

In [None]:
array_2d.shape

In [None]:
len(array_2d) # only returns the size of the first dimension
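
To keep the attributes above apart, this small sketch prints them side by side for the same 2D array. The name `array_2d_demo` is only used here, and `size` is an additional attribute that gives the total number of elements.

```python
import numpy as np

array_2d_demo = np.array([[10, 20, 30], [100, 200, 300]])

print(array_2d_demo.ndim)   # number of dimensions: 2
print(array_2d_demo.shape)  # size along each dimension: (2, 3)
print(len(array_2d_demo))   # size of the first dimension only: 2
print(array_2d_demo.size)   # total number of elements: 6
```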

There are some helpful functions to create ordered arrays, which may feel familiar (as there are similar ones in MATLAB)

In [None]:
array_linear = np.arange(10) # Similar to Python's range function
array_linear

In [None]:
array_spaced = np.linspace(10, 50, 9) # start, end, num-points
array_spaced

In [None]:
array_spaced = np.linspace(10, 50, 8, endpoint=False)
array_spaced
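
The effect of `endpoint` is easier to see when the spacing is printed too. The sketch below uses the `retstep=True` option of `np.linspace()`, which returns the step along with the array; the variable names are chosen for this example only.

```python
import numpy as np

# retstep=True also returns the spacing between consecutive points
with_end, step_with_end = np.linspace(10, 50, 9, retstep=True)
without_end, step_without_end = np.linspace(
    10, 50, 8, endpoint=False, retstep=True
)

print(with_end)      # 9 points from 10 to 50, inclusive
print(without_end)   # 8 points from 10 up to, but excluding, 50
print(step_with_end, step_without_end)  # both spacings are 5.0
```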

Some other useful functions

In [None]:
array_ones = np.ones(8)
array_ones

In [None]:
array_zeros = np.zeros((4, 4)) # Note that the shape is passed as a tuple, not as two arguments
array_zeros

In [None]:
array_eye = np.eye(3)
array_eye

In [None]:
array_diag = np.diag(np.array([55, 66, 77, 88]))
array_diag

#### NumPy data types

NumPy automatically detects the data type used

In [None]:
array_diag.dtype

In [None]:
array_decimals = np.array([1, 3.14, 7.42, 0.58])
array_decimals.dtype

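But for functions that create arrays without input values, such as `np.ones()`, `np.zeros()` and `np.eye()`, the default is float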

In [None]:
array_eye.dtype

You can explicitly specify if you want a different data type

In [None]:
array_eye_int = np.eye(3, dtype=int)
array_eye_int.dtype
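
The dtype rules above can be checked in one go. This is a minimal sketch with made-up variable names; the exact integer type (`int32` or `int64`) depends on the platform and NumPy version, so only the kind of dtype is commented on.

```python
import numpy as np

from_ints = np.array([1, 2, 3])      # all integers -> integer dtype
from_mixed = np.array([1, 3.14])     # any float -> float dtype
ones_default = np.ones(3)            # no input data -> float by default
ones_int = np.ones(3, dtype=int)     # explicit request for integers

for arr in (from_ints, from_mixed, ones_default, ones_int):
    print(arr.dtype)
```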

### Indexing and slicing

These work the same as in Python, with zero-based indexing

In [None]:
array_linear = np.arange(10,21)
array_linear

In [None]:
array_linear[4]

In [None]:
array_linear[2:8:2] # start, end (exclusive), step

Defaults: start=0, end=length of the array, step=1

In [None]:
array_linear[:5]

In [None]:
array_linear[::-1]
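
The slicing defaults can be verified by writing them out explicitly. The sketch below uses a separate array named `values`, a name chosen only for this example.

```python
import numpy as np

values = np.arange(10, 21)  # 10, 11, ..., 20

# Omitted parts fall back to their defaults
print(np.array_equal(values[:5], values[0:5:1]))             # True
print(np.array_equal(values[::2], values[0:len(values):2]))  # True

# The end index is exclusive: indexes 2, 4 and 6 are selected, 8 is not
print(values[2:8:2])   # [12 14 16]

# Negative indexes count from the end
print(values[-1])      # 20
```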

For multidimensional arrays, multiple indexes are required

In [None]:
array_random = np.random.random((3,3))
array_random

In [None]:
array_random[1, 1]

In [None]:
# The first index corresponds to the row, and the second to the column
array_random[0, 2] # first row, third column
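
Because random values are hard to check by eye, here is the same idea on a small fixed array (named `grid` for this example). It also combines indexes with slices to select whole rows, columns and sub-blocks.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

print(grid[0, 2])    # first row, third column: 3
print(grid[1])       # a single index selects a whole row: [4 5 6]
print(grid[:, 0])    # all rows, first column: [1 4 7]
print(grid[:2, 1:])  # first two rows, last two columns
```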

New values can be assigned using any of these techniques

In [None]:
array_linear = np.arange(10,21)
array_linear

In [None]:
array_linear[0] = 100
array_linear

You can also index with an array of integers

In [None]:
array_linear[[3, 6, 3, 8, 2, 3]] # indexes can be repeated

In [None]:
array_linear[[3, 6, 8]] = -100
array_linear

You can even create a new array from indexes. The new array has the same shape as the array of integers

In [None]:
array_linear = np.arange(10,21)
array_linear

In [None]:
indexes = np.array([[2, 8], [7, 3]])
array_linear[indexes]
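
A short check of the shape rule, using hypothetical names `source`, `picks` and `selected`. It also shows that indexing with an integer array returns a new array, so changing the result leaves the source untouched.

```python
import numpy as np

source = np.arange(10, 21)
picks = np.array([[2, 8], [7, 3]])

selected = source[picks]
print(selected)                       # rows: [12 18] and [17 13]
print(selected.shape == picks.shape)  # True

# The result is a copy, so this does not change `source`
selected[0, 0] = -1
print(source[2])                      # still 12
```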

#### Casting arrays

**Assignment will never change the dtype!** To add an element of a different data type, you must first ***cast*** the elements of the array to the new data type.

In [None]:
array_linear = np.arange(10,21)
array_linear

In [None]:
array_linear[0] = 111.111
array_linear

The float is truncated to an integer.

To get around this, you need to use the `astype()` method

In [None]:
array_linear = array_linear.astype(float)
array_linear

In [None]:
array_linear[0] = 1234.5678
array_linear

To go back to `int`, remember to round first, or it will truncate every value, as seen below

In [None]:
array_linear = array_linear.astype(int)
array_linear

So instead, we use the `np.around()` function, which rounds every element in the array:

In [None]:
# Re-create the array
array_linear = np.arange(10,21, dtype=float)
array_linear[0] = 1234.5678
array_linear

In [None]:
array_linear = np.around(array_linear).astype(int)
array_linear
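
The difference between truncating and rounding is clearest on a few hand-picked values, as in this sketch (the `samples` array is made up for illustration). Note that `np.around()` rounds exact halves to the nearest even number, so both 1.5 and 2.5 become 2.

```python
import numpy as np

samples = np.array([1.5, 2.5, -1.7, 3.2])

truncated = samples.astype(int)            # drops the fractional part (toward zero)
rounded = np.around(samples).astype(int)   # rounds to the nearest integer first

print(truncated)  # [ 1  2 -1  3]
print(rounded)    # [ 2  2 -2  3]
```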

#### Copies and views

Unlike slicing a list, slicing an array creates a ***view*** of the original array. This means that the data is not copied in the computer memory.

In [None]:
array_original = np.arange(10)
array_original

In [None]:
array_sliced = array_original[:5]
array_sliced

In [None]:
np.may_share_memory(array_original, array_sliced)

The `np.may_share_memory()` function checks whether the two arrays might share computer memory.

In [None]:
array_sliced[0] = 100
array_sliced

When modifying the view, the original array is modified as well

In [None]:
array_original
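
If you need an independent array, one option is to call the `copy()` method on the slice. The sketch below contrasts a view with an explicit copy; the names `original`, `view` and `copy` are only for this example, and `np.shares_memory()` is used as an exact (but potentially slower) counterpart to `np.may_share_memory()`.

```python
import numpy as np

original = np.arange(10)

view = original[:5]          # slice: shares memory with `original`
copy = original[:5].copy()   # explicit copy: independent data

print(np.shares_memory(original, view))  # True
print(np.shares_memory(original, copy))  # False

copy[0] = 100
print(original[0])  # 0, the original is untouched

view[0] = 100
print(original[0])  # 100, changed through the view
```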

#### Boolean masks

*Fancy indexing* with Boolean masks creates ***copies*** of the original array

In [2]:
array_original = np.arange(10)
array_original

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

A boolean mask is an array with `dtype('bool')`

In [None]:
mask = (array_original % 2 == 0)
mask

array([ True, False,  True, False,  True, False,  True, False,  True,
       False])

In [None]:
mask.dtype

With this mask, you can extract values that correspond to `True`

In [None]:
array_masked = array_original[mask]
array_masked

array([0, 2, 4, 6, 8])
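
The copy behaviour has one practical consequence worth seeing: changing the extracted array does not affect the original, but assigning through the mask does. A minimal sketch, with the names `numbers`, `is_even` and `evens` chosen for this example:

```python
import numpy as np

numbers = np.arange(10)
is_even = (numbers % 2 == 0)

evens = numbers[is_even]   # a copy of the selected values
evens[0] = -1
print(numbers[0])          # 0, the original is unchanged

# Assigning through the mask does modify the original array
numbers[is_even] = 0
print(numbers)             # [0 1 0 3 0 5 0 7 0 9]
```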

### Array shape manipulation

**Flattening**

In [None]:
a = np.array([[1, 2, 3], [4, 5, 6]])
a

In [None]:
b = a.ravel()
b

**Reshaping**

In [None]:
b.reshape(3,2)

`-1` can be used for one dimension, whose size is then inferred from the total number of elements

In [None]:
b.reshape(2,-1)
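
The `-1` placeholder works in any one position, as long as the total number of elements fits the requested shape. The sketch below (using an example array named `flat`) shows both cases, plus the error raised when the sizes don't match.

```python
import numpy as np

flat = np.arange(1, 7)   # 6 elements

print(flat.reshape(2, -1).shape)   # (2, 3): the column count is inferred
print(flat.reshape(-1, 2).shape)   # (3, 2): the row count is inferred

# 6 elements cannot be split into 4 rows, so NumPy raises a ValueError
try:
    flat.reshape(4, -1)
except ValueError as err:
    print("cannot reshape:", err)
```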

**Adding a dimension**

In [None]:
a = np.array([1, 2, 3])
a

In [None]:
a[:, np.newaxis]

In [None]:
a[np.newaxis, :]
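
Printing the shapes makes it clear what `np.newaxis` does: the position of `np.newaxis` decides where the new dimension of length 1 is inserted. The names `vector`, `column` and `row` are chosen for this sketch.

```python
import numpy as np

vector = np.array([1, 2, 3])

column = vector[:, np.newaxis]
row = vector[np.newaxis, :]

print(vector.shape)  # (3,)
print(column.shape)  # (3, 1)
print(row.shape)     # (1, 3)

# reshape gives the same result as np.newaxis here
print(np.array_equal(column, vector.reshape(-1, 1)))  # True
```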

### NumPy with data files

**CSV files**

In [None]:
data = np.genfromtxt("data.csv", delimiter=",", skip_header=1)
data

In [None]:
type(data)

**Text files**

You can write a NumPy array to file

In [None]:
np.savetxt("data_saved.txt", data)

And load it using the `np.loadtxt()` function

In [None]:
data = np.loadtxt("data_saved.txt")
data

But the `np.genfromtxt()` function is generally a better option for imperfect data, as it can handle missing values (`missing_values`, `filling_values`) and lets you specify `dtype`, `delimiter`, `skip_header`, `usecols`, etc.

In [None]:
data = np.genfromtxt("data.txt", skip_header=1)
data
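
To try the missing-value handling without a file on disk, the sketch below feeds `np.genfromtxt()` a small, made-up CSV string through `io.StringIO`. The content of `csv_text` is hypothetical and only serves to produce one empty field.

```python
import io

import numpy as np

# Hypothetical CSV content; the second data row has an empty field
csv_text = "x,y,z\n1,2,3\n4,,6\n"

data_nan = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)
print(data_nan)     # the missing entry is read as nan

data_filled = np.genfromtxt(
    io.StringIO(csv_text), delimiter=",", skip_header=1, filling_values=0
)
print(data_filled)  # the missing entry is replaced by 0
```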

**NumPy's own format**

NumPy has its own binary format (`.npy`). It is not readable by most other software, but offers efficient I/O:

In [None]:
np.save("data_saved.npy", data)

In [None]:
data = np.load("data_saved.npy")
data
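
One practical difference between the two formats is that `.npy` files store the dtype, while a text file read with `np.loadtxt()` comes back as float by default. The sketch below round-trips a small integer array through both formats in a temporary directory; the file names are placeholders.

```python
import os
import tempfile

import numpy as np

values = np.array([[1, 2, 3], [4, 5, 6]])  # integer array

with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, "values.npy")
    txt_path = os.path.join(tmp_dir, "values.txt")

    np.save(npy_path, values)
    np.savetxt(txt_path, values)

    from_npy = np.load(npy_path)
    from_txt = np.loadtxt(txt_path)

print(from_npy.dtype == values.dtype)      # True: .npy stores the dtype
print(from_txt.dtype)                      # float64: loadtxt defaults to float
print(np.array_equal(from_npy, from_txt))  # True: the values agree
```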