## CS102-4 - Further Computing

Prof. Götz Pfeiffer<br>
School of Mathematics, Statistics and Applied Mathematics<br>
NUI Galway

### 1. Aspects of Scientific Computing

# Week 3: `NumPy` arrays: Attributes, Indexing, Reshaping

* `numpy` arrays are **homogeneous** **multi-dimensional** collections of data.
* Some **attributes** of such an array can be directly accessed.
* Basic manipulation of the data involves **indexing** to access single elements
  and **slicing** to access subarrays.
* `numpy` extends `python`'s set of indexing and slicing operators.
* The **shape** of a `numpy` array can be modified, without affecting the data it contains.

In [None]:
import numpy as np

## Array Attributes

* We start with three random arrays: a one-dimensional, a two-dimensional, and a three-dimensional array.
* `np.random.randint` constructs such random arrays of integers in a given range.

In [None]:
x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array

In [None]:
x1

In [None]:
x2

In [None]:
x3

Each array has the attributes 
* `dtype`: the data type of the array elements,
* `ndim`: the number of dimensions,
* `shape`: the size of each dimension, and
* `size`: the total size of the array.

In [None]:
print("x3 dtype:", x3.dtype)
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

* `size` is the product of the entries of the tuple `shape`, and `ndim` is the length of that tuple.
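These two relations can be checked directly. The following is a minimal sketch; the array `a` is a hypothetical example filled with zeros, not one of the random arrays above.

```python
import math
import numpy as np

# hypothetical example array, used only for illustration
a = np.zeros((3, 4, 5), dtype=np.int64)

print(a.shape)                       # (3, 4, 5)
print(a.ndim == len(a.shape))        # True
print(a.size == math.prod(a.shape))  # True
```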

Other attributes include 
* `itemsize`, which gives the size (in bytes) of each array element, and 
* `nbytes`, which gives the total size (in bytes) of the array.

In [None]:
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")

* `nbytes` is equal to `itemsize` times `size`.
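Both values depend on the `dtype` of the array. The sketch below uses hypothetical arrays with explicitly chosen data types, since the default integer type of `np.random.randint` can differ between platforms.

```python
import numpy as np

# the same shape with three different element types (illustrative choice)
for dt in (np.int8, np.int32, np.float64):
    arr = np.zeros((3, 4, 5), dtype=dt)
    # prints: int8 1 60, then int32 4 240, then float64 8 480
    print(arr.dtype, arr.itemsize, arr.nbytes)
    assert arr.nbytes == arr.itemsize * arr.size
```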

## Indexing: Accessing Single Elements

* In a one-dimensional array, the $i^{th}$ value (counting from zero) can be accessed by specifying the desired index in square brackets, just as with `python` lists:

In [None]:
x1

In [None]:
x1[0]

In [None]:
x1[4]

* To index from the end of the array, you can use **negative** indices:

In [None]:
x1[-1]

In [None]:
x1[-2]

* In a **multi-dimensional** array, items can be accessed using a comma-separated tuple of indices:

In [None]:
x2

In [None]:
x2[0, 0]

In [None]:
x2[2, 0]

In [None]:
x2[2, -1]

* Values can also be modified using any of the above index notations:

In [None]:
x2[0, 0] = 12
x2

* `NumPy` arrays have a fixed type.
* Assigning an object of a different type into an array will result either in an error or
  in a silent type conversion.

In [None]:
x1[0] = 3.14159  # this will be truncated!
x1

In [None]:
a = np.array(["ABC", "CDE"])

In [None]:
a[0] = 1

In [None]:
a

The command
```python
x1[1] = "abc"
```
would raise a `ValueError`, because the string `"abc"` cannot be converted to an integer.
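The failing assignment can be observed safely by catching the exception. This is a minimal sketch; the array `v` is a hypothetical stand-in for an integer array such as `x1`.

```python
import numpy as np

v = np.array([1, 2, 3])  # hypothetical integer array

try:
    v[1] = "abc"         # cannot be converted to an integer
except ValueError as err:
    print("ValueError:", err)

print(v)                 # the array is unchanged: [1 2 3]
```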

## Slicing: Accessing Subarrays

* The `NumPy` slicing syntax follows that of the standard `python` list.
* To access a slice of an array ``x``, use
``` python
x[start:stop:step]
```
where the `:step` part is optional.
* If any of these are unspecified, they default to the values 
$0$ for `start`, the size of the dimension for `stop`, and $1$ for `step`.

### One-dimensional slicing

In [None]:
x = np.arange(10)
x

In [None]:
x[:5]  # first five elements

In [None]:
x[5:]  # elements from index 5 onwards

In [None]:
x[4:7]  # middle sub-array

In [None]:
x[::2]  # every other element

In [None]:
x[1::2]  # every other element, starting at index 1

* When the `step` value is negative, the defaults for `start` and `stop` are swapped: the slice starts at the last element and runs down to, and including, the first.
* This gives a convenient way to reverse an array:

In [None]:
x[::-1]  # all elements, reversed

In [None]:
x[7::-2]  # reversed every other from index 7
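The default values can be made visible with `python`'s built-in `slice` objects. As a sketch: `slice.indices(n)` resolves omitted values for a sequence of length `n`. For a negative step it reports a `stop` of `-1`, which here means "just before index 0" and not the last element.

```python
import numpy as np

x = np.arange(10)

# resolve omitted start/stop/step for a sequence of length 10
print(slice(None, None, None).indices(10))  # (0, 10, 1)
print(slice(None, None, -1).indices(10))    # (9, -1, -1)
print(slice(7, None, -2).indices(10))       # (7, -1, -2)

# x[::-1] is shorthand for indexing with an explicit slice object
print(np.array_equal(x[::-1], x[slice(None, None, -1)]))  # True
```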

### Multi-dimensional slicing

* Multi-dimensional slices work similarly, with multiple slices separated by commas.

In [None]:
x2

In [None]:
x2[:2, :3]  # two rows, three columns

In [None]:
x2[:, ::2]  # all rows, every other column

* Subarray dimensions can even be reversed together:

In [None]:
x2[::-1, ::-1]

#### Accessing array rows and columns

* Single rows or columns of an array can be accessed by combining indexing and slicing.

In [None]:
print(x2[:, 0])  # first column of x2

In [None]:
print(x2[0, :])  # first row of x2

* In the case of row access, the trailing `:` slice can be omitted.

In [None]:
print(x2[0])  # equivalent to x2[0, :]

### Subarrays as no-copy views

* Array slices return **views** rather than **copies** of the array data.
* This is in contrast to `python` lists, where `l[:]` gives a copy of the list `l`.

In [None]:
print(x2)

* Let's extract a $2 \times 2$ subarray from this:

In [None]:
x2_sub = x2[:2, :2]
print(x2_sub)

* Now we modify this subarray.

In [None]:
x2_sub[0, 0] = 99
print(x2_sub)

* The original array has changed as well!

In [None]:
print(x2)

* When we work with large datasets, this behaviour allows us to access and process pieces of these datasets without the need to copy the entire underlying data buffer.
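The contrast between `numpy` views and list copies can be checked side by side. This is a minimal sketch with hypothetical names (`a`, `sub`, `l`, `l_copy`); `np.shares_memory` reports whether two arrays use overlapping memory.

```python
import numpy as np

a = np.arange(12).reshape((3, 4))
sub = a[:2, :2]                  # a slice of an array is a view
print(np.shares_memory(a, sub))  # True

sub[0, 0] = 99
print(a[0, 0])                   # 99: the original array has changed

l = [0, 1, 2, 3]
l_copy = l[:]                    # a slice of a list is a copy
l_copy[0] = 99
print(l[0])                      # 0: the original list is unchanged
```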

### Creating copies of arrays

* To make an explicit copy of the data within an array or a subarray use the `copy()` method:

In [None]:
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)

* Now we modify this copied subarray.

In [None]:
x2_sub_copy[0, 0] = 42
print(x2_sub_copy)

* The original array is not affected:

In [None]:
print(x2)

## Reshaping of Arrays

* Another useful type of operation is reshaping of arrays.
* The most flexible way of doing this is with the `reshape` method.
* For example, if you want to put the numbers 1 through 9 in a $3 \times 3$ grid, you can do the following:

In [None]:
grid = np.arange(1, 10).reshape((3, 3))
print(grid)

* Note that for this to work, the size of the initial array must match the size of the reshaped array. 
* Where possible, the ``reshape`` method will use a no-copy view of the initial array, but with non-contiguous memory buffers this is not always the case.
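Both remarks can be illustrated with small arrays. The sketch below uses hypothetical names; the transposed array `t` serves as an example of a non-contiguous array, for which `reshape` has to copy the data.

```python
import numpy as np

a = np.arange(6)
g = a.reshape((2, 3))             # contiguous data: reshape gives a view
print(np.shares_memory(a, g))     # True

t = g.T                           # transposed view, no longer contiguous
flat = t.reshape(6)               # here reshape must copy the data
print(flat)                       # [0 3 1 4 2 5]
print(np.shares_memory(t, flat))  # False

try:
    a.reshape((4, 2))             # 6 elements do not fit into 4 x 2
except ValueError as err:
    print("ValueError:", err)
```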

* Another common reshaping pattern is the conversion of a one-dimensional array into a two-dimensional row or column matrix.
* This can be done with the ``reshape`` method, or more easily by using the constant ``np.newaxis`` within a slice operation:

In [None]:
x = np.array([1, 2, 3])

# row vector via reshape
x.reshape((1, 3))

In [None]:
# row vector via newaxis
x[np.newaxis, :]

In [None]:
# column vector via reshape
x.reshape((3, 1))

In [None]:
# column vector via newaxis
x[:, np.newaxis]

* We will make frequent use of this type of transformation.
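As a brief sketch of what happens here: `np.newaxis` is an alias for `None`, and each occurrence in an index inserts a new axis of length $1$ at that position. Comparing shapes makes this visible.

```python
import numpy as np

x = np.array([1, 2, 3])

print(np.newaxis is None)      # True
print(x.shape)                 # (3,)
print(x[np.newaxis, :].shape)  # (1, 3): row vector
print(x[:, np.newaxis].shape)  # (3, 1): column vector
print(x[None, :].shape)        # (1, 3): same as with np.newaxis
```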

## Array Concatenation and Splitting

* It is also possible to combine multiple arrays into one.
* Conversely, it is possible to split a single array into multiple arrays.

### Concatenation of arrays

* Concatenation, or joining of two arrays in NumPy, is primarily accomplished using the routines `np.concatenate`, `np.vstack`, and `np.hstack`.
* `np.concatenate` takes a tuple or list of arrays as its first argument.

In [None]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

* You can also concatenate more than two arrays at once:

In [None]:
z = [99, 99, 99]
print(np.concatenate([x, y, z]))

* It can also be used for two-dimensional arrays:

In [None]:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

In [None]:
# concatenate along the first axis
np.concatenate([grid, grid])

In [None]:
# concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)

* For working with arrays of mixed dimensions, it can be clearer to use the ``np.vstack`` (vertical stack) and ``np.hstack`` (horizontal stack) functions:

In [None]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
np.vstack([x, grid])

In [None]:
# horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])

* Similarly, ``np.dstack`` will stack arrays along the third axis.
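A minimal sketch of `np.dstack`, using two hypothetical $2 \times 3$ arrays `a` and `b`: the result has a new third axis of length $2$, and each entry along it pairs the corresponding elements of `a` and `b`.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[9, 8, 7],
              [6, 5, 4]])

d = np.dstack([a, b])
print(d.shape)  # (2, 3, 2)
print(d[0, 0])  # [1 9]: a[0, 0] and b[0, 0] side by side
```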

### Splitting of arrays

* The opposite of concatenation is splitting, which is implemented by the functions ``np.split``, ``np.hsplit``, and ``np.vsplit``.
* To each of these, we can pass a list of indices giving the split points:

In [None]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

* Notice that $N$ split points lead to $N + 1$ subarrays.
* The related functions ``np.hsplit`` and ``np.vsplit`` are similar:

In [None]:
grid = np.arange(16).reshape((4, 4))
grid

In [None]:
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)

In [None]:
left, right = np.hsplit(grid, [2])
print(left)
print(right)

* Similarly, ``np.dsplit`` will split arrays along the third axis.
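A minimal sketch of `np.dsplit`, using a hypothetical three-dimensional array `cube`: splitting the third axis at index $2$ gives two pieces, and concatenating them along that axis recovers the original array.

```python
import numpy as np

cube = np.arange(16).reshape((2, 2, 4))

# split along the third axis (axis=2) at index 2
front, back = np.dsplit(cube, [2])
print(front.shape, back.shape)  # (2, 2, 2) (2, 2, 2)

# joining the pieces along the same axis restores the original
restored = np.concatenate([front, back], axis=2)
print(np.array_equal(restored, cube))  # True
```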

## References

### `python`

* `l[i]`: indexing [[doc]](https://docs.python.org/3/library/stdtypes.html?highlight=mutable%20sequence#sequence-types-list-tuple-range)
* `l[start:stop:step]`: slicing [[doc]](https://docs.python.org/3/library/stdtypes.html?highlight=mutable%20sequence#sequence-types-list-tuple-range)
* `slice` [[doc]](https://docs.python.org/3/library/functions.html#slice)

### `numpy`

* indexing, slicing and `newaxis`: [[doc]](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html)

## Exercises

1. Make a few `numpy` arrays, with random entries
   or ranges of integers, of varying dimensions.
2. Determine the basic attributes of these arrays.
3. Apply some indexing and slicing to the arrays.
4. Starting with a $1$-dimensional array of length $60$,
   reshape it into a $3$-dimensional array with dimensions
   of sizes $5$, $4$ and $3$, respectively.
5. Then split the array along the second dimension,
   the one of size $4$, into two halves.
6. What does `np.newaxis` mean, and what is it used for?