## CS102-4 - Further Computing

Prof. Götz Pfeiffer<br>
School of Mathematics, Statistics and Applied Mathematics<br>
NUI Galway

### 1. Aspects of Scientific Computing

# Week 3: `numpy` arrays: Attributes, Indexing, Reshaping

* `numpy` arrays improve on `python` lists in many ways.
* `numpy` arrays are **homogeneous** **multi-dimensional** collections of data.
* As such, a `numpy` array has:
    * a **shape**, specifying its size in each dimension;
    * a common **data type** for all its elements.
* These (and related) **attributes** of an array can be directly accessed.
* Basic manipulation of the data in an array involves **indexing** to access single elements
  and **slicing** to access subarrays.
* `numpy` extends `python`'s set of indexing and slicing operators.
* The **shape** of a `numpy` array can be **modified**, without affecting the data it contains.
* Multiple arrays can be **combined** into one.
* Conversely, an array can be **split** into multiple parts.

In [None]:
import numpy as np

## Array Attributes

* a `numpy` array is a **multi-dimensional** **homogeneous** collection of data.
* in **mathematics** and **physics**, such an object is often called a **tensor**.
* a `numpy` array has a **shape** and a **dtype**.
* let's investigate these in some simple examples.
* start with a single random integer, followed by three random arrays: a one-dimensional, a two-dimensional, and a three-dimensional array.
* `np.random.randint` constructs such random arrays of integers in a given range.
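As a quick sketch of the range convention of `np.random.randint`: the upper bound is exclusive, so `np.random.randint(10, size=...)` draws from $0, \dots, 9$, and `np.random.randint(low, high, size=...)` draws from `low` up to `high - 1`. The seed value `0` below is an arbitrary choice, used only to make the run reproducible.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only for reproducibility

# values drawn from 0, 1, ..., 9 (the upper bound 10 is excluded)
a = np.random.randint(10, size=1000)
print(a.min(), a.max())

# values drawn from 5, 6, ..., 9 (low=5 included, high=10 excluded)
b = np.random.randint(5, 10, size=(3, 4))
print(b)
print(b.shape)
```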

In [None]:
x0 = np.random.randint(10) # A single random integer
x0

In [None]:
x1 = np.random.randint(10, size=4)  # One-dimensional array; size means shape
x1

In [None]:
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array: shape 3 x 4
x2

In [None]:
x3 = np.random.randint(10, size=(2, 3, 4))  # Three-dimensional: shape 2 x 3 x 4
x3

In [None]:
print(x3)

Each array has the attributes
* `dtype`: the **data type** of the array, and
* `shape`: the **size in each dimension**.

In [None]:
print("x2 dtype:", x2.dtype)
print("x2 shape:", x2.shape)

In [None]:
print("x3 dtype:", x3.dtype)
print("x3 shape:", x3.shape)

Further attributes of interest are
* `ndim`: the **number of dimensions**, and
* `size`: the **total number** of elements.

In [None]:
print("x3 ndim: ", x3.ndim)
print("x3 size: ", x3.size)

Obviously, `size` is the product of the numbers in the tuple `shape`, and `ndim` is the length of that tuple.

In [None]:
len(x3.shape) == x3.ndim

In [None]:
# alternatively: from math import prod; prod(x3.shape)  (python 3.8+)
x3.size == np.prod(x3.shape)

Other attributes include 
* `itemsize`, the size (in bytes) of each array element, and
* `nbytes`, the total size (in bytes) of the array.

In [None]:
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")

* `nbytes` is equal to `itemsize` times `size`.
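The relation between `itemsize`, `size` and `nbytes` is easiest to see with explicitly chosen dtypes, whose item sizes are known in advance: `int8` uses 1 byte per element and `float64` uses 8. A small sketch:

```python
import numpy as np

# explicitly chosen dtypes with known item sizes
small = np.zeros(5, dtype=np.int8)        # 1 byte per element
big = np.zeros((2, 3), dtype=np.float64)  # 8 bytes per element

for arr in (small, big):
    print(arr.dtype, arr.itemsize, arr.size, arr.nbytes)
    # nbytes is itemsize times the number of elements
    assert arr.nbytes == arr.itemsize * arr.size
```

This prints `int8 1 5 5` and `float64 8 6 48`.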

## Indexing: Accessing Single Elements

* In a one-dimensional array, the $i^{th}$ value (counting from **zero**) can be accessed by specifying the desired index in square brackets, just as with `python` lists:

In [None]:
x1

In [None]:
x1[0]

In [None]:
x1[3]

* To index from the end of the array, you can use **negative** indices:

In [None]:
x1[-1]

In [None]:
x1[-2]
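A negative index `-k` refers to the same element as the index `len(x) - k`. A small check, on an array whose values are chosen here just for illustration:

```python
import numpy as np

x = np.array([5, 0, 3, 3, 7])  # arbitrary example values

# x[-k] is the same element as x[len(x) - k]
for k in range(1, len(x) + 1):
    assert x[-k] == x[len(x) - k]

print(x[-1], x[-2])  # last and second-to-last element: 7 3
```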

* **NEW:** In a **multi-dimensional** array, items can be accessed using **comma-separated indices**:

In [None]:
x2

In [None]:
x2[0, 0]

In [None]:
x2[2, -1]
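The comma-separated form `x2[i, j]` selects the same element as the chained form `x2[i][j]`, which first extracts row `i` and then element `j` of that row. The comma form is the idiomatic one, since it performs a single indexing step. A sketch with a fixed example matrix `m` (not the random `x2`):

```python
import numpy as np

m = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])

print(m[2, -1])   # row 2, last column: 11
print(m[2][-1])   # same element, via two indexing steps
```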

* Values can also be **modified** using the above index notation:

In [None]:
x2[0, 0] = 12
x2

* **NOTE:** `numpy` arrays have a **fixed type**.
* Assigning objects of a different type into an array will either result in an **error**, or
  in a **silent type conversion**.

In [None]:
x1[0] = 3.14159  # this will be truncated!
x1

In [None]:
a = np.array(["ABC", "CDE"])

In [None]:
a[0] = 1

In [None]:
a

The command
```python
x1[1] = "abc"
```
would raise a `ValueError`, since the string `"abc"` cannot be converted to an integer.
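Both outcomes described above, a silent conversion and an error, can be seen in a small sketch. Assigning a float into an integer array truncates towards zero (so `-2.7` becomes `-2`), while a string that cannot be read as an integer raises a `ValueError`. The array `y` is a hypothetical example, separate from `x1`:

```python
import numpy as np

y = np.zeros(3, dtype=int)

y[0] = 3.9    # silently truncated to 3
y[1] = -2.7   # truncated towards zero: -2
print(y)      # [ 3 -2  0]

raised = False
try:
    y[2] = "abc"   # cannot be converted to an integer
except ValueError as err:
    raised = True
    print("ValueError:", err)
```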

## Slicing: Accessing Subarrays

* The `numpy` slicing syntax follows that of the standard `python` list.
* To access a slice of an array ``x``, use
``` python
x[start:stop:step]
```
where the `:step` part is optional.
* If any of these are unspecified, they default to the values 
$0$ for `start`, the size (of the dimension) for `stop`, and $1$ for `step`.
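Under the hood, `x[start:stop:step]` passes a built-in `slice` object (see the `python` references below) to the array, so the two forms in this sketch are interchangeable. Omitted parts become `None`, which stands for the defaults just described.

```python
import numpy as np

x = np.arange(10)

s = slice(2, 8, 3)    # the same as writing 2:8:3
print(x[2:8:3], x[s])  # [2 5] [2 5]

# omitted values are None and take their defaults
print(x[::2], x[slice(None, None, 2)])
```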

### One-dimensional slicing

In [None]:
x = np.arange(10)
x

In [None]:
x[:5]  # first five elements

In [None]:
x[5:]  # elements from index 5 onwards

In [None]:
x[4:7]  # middle sub-array

In [None]:
x[::2]  # every other element

In [None]:
x[1::2]  # every other element, starting at index 1

* **Note:** When the `step` value is **negative**, the defaults for `start` and `stop` are **swapped**: the slice starts at the last element and runs to the beginning of the array.
* This gives a convenient way to reverse an array:

In [None]:
x[::-1]  # all elements, reversed

In [None]:
x[7::-2]  # every other element, from index 7 backwards (7, 5, 3, 1)
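Since `numpy` follows the slicing rules of `python` lists, each slice of `np.arange(10)` should agree with the same slice of `list(range(10))`. A sketch checking the one-dimensional slices used above, including the negative-step ones, written as `slice` objects where `None` means "use the default":

```python
import numpy as np

x = np.arange(10)
l = list(range(10))

# the slices from above: :5, 5:, 4:7, ::2, 1::2, ::-1, 7::-2
slices = [slice(None, 5), slice(5, None), slice(4, 7),
          slice(None, None, 2), slice(1, None, 2),
          slice(None, None, -1), slice(7, None, -2)]

for s in slices:
    assert list(x[s]) == l[s]

print(x[7::-2])  # [7 5 3 1]: stops at index 1, index 0 is not reached
```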

### Multi-dimensional slicing

* **NEW:** Multi-dimensional slices work similarly, with multiple **slices separated by commas**.

In [None]:
x2

In [None]:
x2[:2, :3]  # two rows, three columns

In [None]:
x2[:, ::2]  # all rows, every other column

In [None]:
x2[::-1, ::-1]  # reversing both rows and cols

### Accessing array rows and columns

* Single rows or columns of an array can be accessed by **combining indexing and slicing**.

In [None]:
x2[:, 0]  # first column of x2

In [None]:
x2[0, :]  # first row of x2

* Trailing full slices `:` can be omitted.

In [None]:
x2[0]  # equivalent to x2[0, :]
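Indexing with a single integer drops that dimension, while a slice of length one keeps it. So `x2[:, 0]` is a one-dimensional array, whereas `x2[:, 0:1]` is a $3 \times 1$ column matrix. A sketch on a fixed $3 \times 4$ example `m`:

```python
import numpy as np

m = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])

print(m[:, 0].shape)    # (3,)   integer index: dimension dropped
print(m[:, 0:1].shape)  # (3, 1) slice of length 1: dimension kept
print(m[0].shape)       # (4,)   same as m[0, :]
```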

### Subarrays are no-copy views!

* Recall that, for a `python` list `l`, the slice `l[:]` is a convenient way of making a copy of the list `l`.
* **CAUTION:** Array slices are **views** rather than **copies** of the array data.
* This means that they refer to (and modify) the same underlying data as the original array.

In [None]:
print(x2)

* Let's extract a $2 \times 2$ subarray from this:

In [None]:
x2_sub = x2[:2, :2]
print(x2_sub)

* Now we modify this subarray.

In [None]:
x2_sub[0, 0] = 99
print(x2_sub)

* As a side-effect, the original array is changed, too!

In [None]:
print(x2)

* When working with **large datasets**, this behaviour allows us to access and process pieces of these datasets without the need to copy the entire underlying data buffer.
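The contrast with `python` lists can be seen side by side: modifying the slice `l[:2]` of a list leaves `l` alone, while modifying the slice `a[:2]` of an array changes `a`.

```python
import numpy as np

l = [1, 2, 3, 4]
l_slice = l[:2]     # a new list (copy)
l_slice[0] = 99
print(l)            # [1, 2, 3, 4]: unchanged

a = np.array([1, 2, 3, 4])
a_slice = a[:2]     # a view on the data of a
a_slice[0] = 99
print(a)            # [99  2  3  4]: changed
```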

### Creating copies of arrays

* To make an explicit copy of the data within an array or a subarray use the `copy()` method:

In [None]:
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)

* Modify this copied subarray:

In [None]:
x2_sub_copy[0, 0] = 42
print(x2_sub_copy)

* Now the original array is not affected:

In [None]:
print(x2)
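To check whether two arrays refer to the same underlying data, `np.shares_memory` can be used. A small sketch on a hypothetical array `g`, comparing a slice with an explicit copy:

```python
import numpy as np

g = np.zeros((3, 4))

view = g[:2, :2]          # slicing gives a view
copy = g[:2, :2].copy()   # .copy() gives independent data

print(np.shares_memory(g, view))  # True
print(np.shares_memory(g, copy))  # False
```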

## Reshaping of Arrays

* Another useful type of operation is reshaping of arrays.
* The most flexible way of doing this is with the `reshape` method.
* For example, to put the numbers 1 through 9 into a $3 \times 3$ matrix grid, you can do the following:

In [None]:
grid = np.arange(1, 10).reshape(3, 3)
print(grid)

* Note that for this to work, the **size** of the initial array **must match** the size of the reshaped array. 
* **CAUTION:** Where possible, the ``reshape`` method will use a **no-copy view** of the initial array.
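Both remarks above can be seen in a short sketch: here the reshaped array shares its data with the original, so that changing one changes the other, and a size mismatch raises a `ValueError`.

```python
import numpy as np

flat = np.arange(1, 10)
grid = flat.reshape(3, 3)    # here a view: no data is copied

grid[0, 0] = 100
print(flat[0])               # 100: the original has changed too
print(np.shares_memory(flat, grid))

raised = False
try:
    flat.reshape(2, 4)       # 9 elements cannot fill a 2 x 4 grid
except ValueError as err:
    raised = True
    print("ValueError:", err)
```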

### `newaxis`

* A shape of `(3,)` is **different** from a shape of `(1, 3)` or `(3, 1)`.
* Thus, a common reshaping pattern is the **conversion** of a one-dimensional array into a two-dimensional **row** or **column matrix**.
* This can be done with the `reshape` method, or by using the `np.newaxis` constant within a slice operation:

In [None]:
x = np.array([1, 2, 3])
print(x.shape)
x

In [None]:
# row vector via reshape
x.reshape(1, 3)

In [None]:
# row vector via newaxis
x[np.newaxis, :]

In [None]:
# column vector via reshape
x.reshape(3, 1)

In [None]:
# column vector via newaxis
x[:, np.newaxis]

* We will make frequent use of this type of transformation.
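`np.newaxis` is simply an alias for `None`, so `x[None, :]` is an equivalent spelling. It can also be placed in several positions at once; a sketch comparing the resulting shapes:

```python
import numpy as np

x = np.array([1, 2, 3])

print(np.newaxis is None)                  # True
print(x[np.newaxis, :].shape)              # (1, 3): row matrix
print(x[:, np.newaxis].shape)              # (3, 1): column matrix
print(x[np.newaxis, :, np.newaxis].shape)  # (1, 3, 1)

# the newaxis forms agree with reshape
assert np.array_equal(x[:, np.newaxis], x.reshape(3, 1))
```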

## Array Concatenation and Splitting; Axis

* It's also possible to combine multiple arrays into one.
* Conversely, it's possible to split a single array into multiple arrays.
* Usually, the **direction** of such an operation needs to be specified as an `axis` parameter.

### Concatenation of arrays

* Concatenation, or joining of two arrays in `numpy`, is accomplished by using `np.concatenate`.
* `np.concatenate` takes a **tuple or list of arrays** as its first argument.

In [None]:
x = np.array([1, 2, 3, 4])
y = np.array([3, 2, 1])
np.concatenate([x, y])

* You can also concatenate more than two arrays at once:

In [None]:
z = [99, 99]
np.concatenate([x, y, z])

* Two-dimensional arrays can be concatenated **horizontally** or **vertically**.
* The (optional) `axis` keyword argument specifies the **axis**.

In [None]:
matrix = np.arange(1,7).reshape(2,3)
matrix

In [None]:
row = np.array([[0,0,0]])
row

In [None]:
print("mat: ", matrix.shape)
print("row: ", row.shape)

In [None]:
# concatenate along the first axis (default: 0)
vert = np.concatenate([row, matrix])
vert

* In the **axis dimension**, the size of the resulting matrix is the **sum** of the sizes of the input matrices.
* In all other dimensions, the input matrices must have the **same size**.

In [None]:
vert.shape
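The shape rule can be checked directly: with `matrix` of shape `(2, 3)` and `row` of shape `(1, 3)`, the result along axis 0 has $1 + 2 = 3$ rows. Concatenating a `(2, 1)` column along axis 0 fails, because the sizes along axis 1 ($1$ versus $3$) disagree. The arrays are rebuilt here so that the sketch runs on its own:

```python
import numpy as np

matrix = np.arange(1, 7).reshape(2, 3)
row = np.array([[0, 0, 0]])
col = np.array([[99], [99]])  # shape (2, 1)

vert = np.concatenate([row, matrix])   # axis=0 by default
print(vert.shape)                      # (3, 3)

raised = False
try:
    np.concatenate([col, matrix])      # axis 1 sizes: 1 vs 3
except ValueError as err:
    raised = True
    print("ValueError:", err)
```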

* Horizontally:

In [None]:
col = np.array([99,99])[:, np.newaxis]
col

In [None]:
print("mat: ", matrix.shape)
print("col: ", col.shape)

In [None]:
# concatenate along the second axis (zero-indexed)
horiz = np.concatenate([col, matrix], axis=1)
horiz

In [None]:
horiz.shape

* There are a few related functions with similar functionality in `numpy`, such as `np.vstack` and `np.hstack`.
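For two-dimensional arrays, `np.vstack` and `np.hstack` act as shorthands for concatenation along axis 0 and axis 1, respectively. A sketch, with the example arrays rebuilt:

```python
import numpy as np

matrix = np.arange(1, 7).reshape(2, 3)
row = np.array([[0, 0, 0]])
col = np.array([[99], [99]])

v = np.vstack([row, matrix])   # like np.concatenate(..., axis=0)
h = np.hstack([col, matrix])   # like np.concatenate(..., axis=1)

assert np.array_equal(v, np.concatenate([row, matrix], axis=0))
assert np.array_equal(h, np.concatenate([col, matrix], axis=1))
print(h)
```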

### Splitting of arrays

* The opposite of concatenation is splitting, implemented by the `np.split` function.
* Again, the `axis` parameter (default `0`) indicates the direction of the splitting.
* Additionally, we pass either the number of equal parts, which must divide the size along that axis, or a list of indices giving the split points:

In [None]:
x = np.arange(12)
x

In [None]:
np.split(x, 3)

In [None]:
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

* Notice that $N$ split points lead to $N + 1$ subarrays.
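The split points are the indices where each new piece begins, so `np.split(x, [3, 5])` yields the same pieces as the slices `x[:3]`, `x[3:5]` and `x[5:]`:

```python
import numpy as np

x = np.arange(12)
pieces = np.split(x, [3, 5])     # 2 split points -> 3 pieces

expected = [x[:3], x[3:5], x[5:]]
for p, e in zip(pieces, expected):
    assert np.array_equal(p, e)

print([len(p) for p in pieces])  # [3, 2, 7]
```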

Two-dimensional split:

In [None]:
x = x.reshape(3,4)
x

In [None]:
np.split(x, 3)

In [None]:
np.split(x, 2, axis=1)

* There are a few similar functions in the `numpy` package, such as `np.array_split`, `np.vsplit` and `np.hsplit`.
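When the number of parts does not divide the length evenly, `np.split` raises a `ValueError`, while `np.array_split` allows unequal parts. For two-dimensional arrays, `np.vsplit` and `np.hsplit` split along axis 0 and axis 1, respectively. A sketch:

```python
import numpy as np

x = np.arange(10)

raised = False
try:
    np.split(x, 3)                  # 10 is not divisible by 3
except ValueError as err:
    raised = True
    print("ValueError:", err)

parts = np.array_split(x, 3)        # unequal parts are allowed
print([len(p) for p in parts])      # [4, 3, 3]

m = np.arange(12).reshape(3, 4)
left, right = np.hsplit(m, 2)          # like np.split(m, 2, axis=1)
top, middle, bottom = np.vsplit(m, 3)  # like np.split(m, 3, axis=0)
print(left.shape, top.shape)           # (3, 2) (1, 4)
```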

## References

### `python`

* `l[i]`: indexing [[doc]](https://docs.python.org/3/library/stdtypes.html?highlight=mutable%20sequence#sequence-types-list-tuple-range)
* `l[start:stop:step]`: slicing [[doc]](https://docs.python.org/3/library/stdtypes.html?highlight=mutable%20sequence#sequence-types-list-tuple-range)
* `slice` [[doc]](https://docs.python.org/3/library/functions.html#slice)

### `numpy`

* indexing, slicing and `newaxis`: [[doc]](https://numpy.org/doc/stable/reference/arrays.indexing.html)
* `reshape`: [[doc]](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html)
* `concatenate`: [[doc]](https://numpy.org/doc/stable/reference/generated/numpy.concatenate.html)
* `split`: [[doc]](https://numpy.org/doc/stable/reference/generated/numpy.split.html)

## Exercises

1. Make a few `numpy` arrays, with random entries
   or ranges of integers, of varying dimensions.
2. Determine the basic attributes of these arrays.
3. Apply some indexing and slicing to the arrays.
4. Starting with a $1$-dimensional array of length $60$,
   reshape it into a $3$-dimensional array with dimensions
   of sizes $5$, $4$ and $3$, respectively.
5. Then split the array along the second dimension,
   the one of size $4$, into two halves.
6. What does `np.newaxis` mean, and what is it used for?