In [None]:
import numpy as np
rng = np.random.default_rng()  # Random number generator used in later cells

# Accessing and manipulating arrays

In the previous notebook we explored the creation of NumPy arrays. Now we'll
learn how to access and manipulate elements within arrays, and also
how to extract specific elements out of arrays.

## $ \S 1 $ Accessing and modifying individual array elements

Recall that lists in Python are __mutable__, meaning that we may modify
the individual elements of a list:

In [None]:
primes = [199, 1_999, 19_999]
primes[2] = 19
print(primes)

In contrast, a Python tuple is __immutable__. We can still access its
elements through `[]`, but we can't modify them:

In [None]:
fruits = ('🍎', '🍊', '🍍')
print(fruits[0])

In [None]:
fruits[0] = '🍉'

NumPy arrays are also mutable, like lists. Consider the following vector $ \mathbf{a} $:

In [None]:
a = np.array([1, 2, 3])
print(a)

To access or modify the $ 0 $-th element of $ \mathbf a $ (recall that we always
count from $ 0 $ in Python), we use the same syntax as we would if it were a
list:

In [None]:
print(a[0])  # Access 0-th element of `a`
a[0] = -1    # Modify this element
print(a)     # Print the result

If we are dealing with a $ 2D $ array, we use `[i, j]` to access its $ (i, j)
$-th entry, that is, the element in row $ i $ and column $ j $.
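
As a minimal sketch of this notation, consider the following small matrix (the name `G` and its entries are made up for illustration and are not used elsewhere in this notebook):

```python
import numpy as np

# Hypothetical 2x3 matrix used only for illustration
G = np.array([[10, 20, 30],
              [40, 50, 60]])

print(G[0, 2])   # Entry in row 0, column 2
G[1, 0] = -40    # Modify the entry in row 1, column 0
print(G)
```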

__Exercise:__ Modify the off-diagonal elements in the following matrix $ A $ to
turn it into the $ 2 \times 2 $ identity matrix.

In [None]:
A = np.ones((2, 2))
print("Before modifications:")
print(A, '\n')

# ...
# ...
print("After modifications:")
print(A)

In general, when dealing with an $ n $-dimensional array, use
`[k_1, k_2, ..., k_n]` to access the element having indices
$ k_1, k_2, \cdots, k_n $, respectively.
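
For instance, a $ 3D $ array takes three indices. The sketch below uses a hypothetical array `T` built with `arange` and `reshape`, chosen only so that the entries are easy to predict:

```python
import numpy as np

# Hypothetical 3D array of shape (2, 3, 4) holding the integers 0, ..., 23
T = np.arange(24).reshape(2, 3, 4)

print(T[1, 2, 3])  # One index per dimension
T[0, 0, 0] = 100   # Assignment uses the same syntax
print(T[0, 0, 0])
```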

__Exercise:__ Build a $ 3D $ identity array $ M $ of shape $ (5, 5, 5) $ by
first populating it with zeros, then setting all elements whose indices
have the form $ (i, i, i) $ to $ 1 $ in the following two ways: 

(a) Using a `for` loop.

(b) With the single call `np.fill_diagonal(M, 1)`.

In [None]:
# Populate M with zeros:
# M = ...

# Set diagonal elements equal to 1:
# ...

# Print the result:
# print(M)


Like Python lists, NumPy arrays support negative indexing. The index $ -1 $
refers to the last element, $ -2 $ to the second-to-last, and so on.
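
A short sketch of negative indexing on a $ 1D $ array (the vector `w` is an illustrative example, not part of the exercises):

```python
import numpy as np

w = np.array([10, 20, 30, 40])  # Illustrative vector

print(w[-1])   # Last element
print(w[-2])   # Second-to-last element
w[-1] = 0      # Negative indices can also be used for assignment
print(w)
```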

__Exercise:__ Modify the last coordinate of the vector
$ \mathbf{v} \in \mathbb{R}^6 $ below so that its length becomes $ 3 $. _Hint:_
Recall that the norm (length) of a vector $ \mathbf{v} = (v_1, v_2, \cdots, v_n) $
in $ \mathbb{R}^n $ is given by $ \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2} $.

In [None]:
v = np.array([1, 1, 1, 1, 1, 1])

print(f"Vector: {v}")
print(f"Norm of v = {np.linalg.norm(v):.3f}")

Negative indexing works with multi-dimensional arrays as well:

In [None]:
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(A[-1, -1])  # Bottom-right element (value: 9)
print(A[-2, -3])  # Element at second-to-last row, third-to-last column (value: 4)

__Exercise:__ Given the $ 3D $ array below, access and print:

(a) The element at position $ (0, 1, 1) $.

(b) The last element of the entire array.

(c) The element at row $ 0 $, column $ 1 $ of "layer" $ 2 $.

In [None]:
B = np.array([[[1, 2], [3, 4]], 
              [[5, 6], [7, 8]], 
              [[9, 10], [11, 12]]])
print(B)
# (a)

# (b)

# (c)

## $ \S 2 $ Slicing $ 1D $ arrays

__Slicing__ is an operation that allows us to extract a subarray from an array.
The syntax for $ 1D $ arrays is exactly the same as for slices of Python lists,
namely, `a[start:stop:step]`, where `a` denotes an array and:

* `start` is the index where the slice begins (included).
* `stop` is the index where the slice ends (excluded).
* `step` is the step size between elements.

If any of these are omitted, they default to their natural boundaries (start of
the array, end of the array, and step size $ 1 $).
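
The following sketch tries out a few combinations of `start`, `stop` and `step` on an illustrative array `s` (a made-up example that is not part of the exercise that follows):

```python
import numpy as np

s = np.array([0, 1, 4, 9, 16, 25, 36])  # Illustrative array of squares

print(s[1:4])     # Indices 1, 2, 3 (stop is excluded)
print(s[:3])      # start omitted: begin at index 0
print(s[4:])      # stop omitted: go to the end of the array
print(s[::3])     # Every third element
print(s[5:1:-2])  # Negative step: walk backwards from index 5, stopping before index 1
```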

__Exercise:__ Let $ \mathbf a = (0, 1, \cdots, 10) $.

(a) Instantiate this array using `arange` or with `linspace` together with the instruction `a.astype(int)`.

(b) Take a slice of $ \mathbf a $ resulting in $ (0, 1, \cdots, 4) $.

(c) Slice $ \mathbf a $ to obtain the array $ (5, 6, \cdots, 10) $.

(d) Construct a slice to retrieve the subarray $ (3, 5, 7, 9) $.

(e) Obtain a new array consisting of all elements of $ \mathbf a $ in reverse order.

(f) Take a full slice of $ \mathbf a $, call it $ \mathbf b $, and modify its $ 0
$-th element. Is $ \mathbf a $ affected?

As seen in the previous exercise, _slicing an array only creates a view of
the original array, not a copy_.  A __view__ is a new array object that refers to
the same data as the original array.  This means that:
* Changes made through the view affect the original array, and vice versa.
* No memory duplication occurs when creating a view.

This behavior is by design, for efficiency reasons.

In [None]:
a = np.array([1, 2, 3, 4, 5])
b = a[1:4]  # This creates a view, not a copy
print("Original a:", a)
print("View b:      ", b)

b[0] = 8  # Modifying the view
print("After modifying b, a becomes:", a)

📝 To create a copy rather than a view in NumPy, use the `copy()` method:

In [None]:
a = np.array([1, 2, 3, 4, 5])
c = a.copy()  # This creates an explicit copy
c[1] = 8

print("Copy c:    ", c)
print("Original a:", a)  # Remains unchanged

__Exercise:__ Create a vector $ \mathbf{a} = (1, 2, 3) $ and a full slice $
\mathbf{b} $ of $ \mathbf{a} $. Check whether $ \mathbf{a} $ and
$ \mathbf{b} $ point to the same data with `np.shares_memory(a, b)`.

⚠️⚡ Note in contrast that slices of Python lists or tuples _are_ independent
copies from their originals. However, these are _shallow_ copies: only
the containers themselves are copies, not their elements. This may lead to
unexpected behavior, as in the following example:

In [None]:
original = [[1, 2, 3], [4, 5, 6]]
shallow_1 = original[:]
shallow_2 = original[:]

shallow_1[0] = [-1, -2, -3]
print(original)  # original is not affected

shallow_2[0][0] = 8
print(original)  # original is affected!
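
If fully independent nested lists are needed, one option is `copy.deepcopy` from the standard library. The sketch below mirrors the previous example; the name `deep` is illustrative:

```python
import copy

original = [[1, 2, 3], [4, 5, 6]]
deep = copy.deepcopy(original)  # Copies the outer list and the inner lists

deep[0][0] = 8
print(original)  # The inner lists of original are untouched
print(deep)
```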

## $ \S 3 $ Slicing general arrays

Slicing becomes more interesting with higher-dimensional arrays. For a $ 2D $
array, in principle we need to specify slices for both dimensions, separated by
a comma. If we use a single slice, then we are indexing into full rows.

In [None]:
M = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
print(M, '\n')

# Accessing a specific row:
print(M[1, :], '\n')  # Row with index 1 (the second row)

# Accessing a specific column:
print(M[:, 2], '\n')  # Column with index 2 (the third column)

# Sub-array slicing:
print(M[0:2, 1:3], '\n')  # Top right 2x2 sub-array

# If we use only one slice, then entire rows are extracted:
print(M[0:2])  # First 2 rows

__Exercise:__ 
Given the $ 3 \times 4 $ matrix below, use slicing to extract:


<img src="notebook_2_slicing.png" alt="Slicing exercise" width="870" height="174">

(a) The first two rows.

(b) The last two columns.

(c) A $ 2 \times 2 $ submatrix containing the elements at the intersection of
    the last two rows and the first two columns.

In [None]:
M = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])
# (a)

# (b)

# (c)

All of the principles we have seen extend to arrays of higher dimensions.
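
As a rough illustration, here is a sketch using a hypothetical $ 3D $ array `T` of shape $ (2, 3, 4) $, with one slice or index per dimension:

```python
import numpy as np

# Hypothetical 3D array of shape (2, 3, 4)
T = np.arange(24).reshape(2, 3, 4)

print(T[0].shape)          # A single index selects a whole 2D block
print(T[:, 1:, :2].shape)  # One slice per dimension
print(T[1, ::2, :].shape)  # An index, a slice with a step, and a full slice
```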

__Exercise:__ Consider a $ 3D $ array representing an RGB image of shape
$ (3, \text{height}, \text{width}) $. Write slices to:

(a) Extract the green channel only (index $ 1 $ in the $ 0 $-th dimension).

(b) Extract the central part of the image (a rectangle with half the height and
    half the width of the original image). _Hint:_ Use the interval
    $$ \texttt{[height // 4 : (3 * height) // 4]} $$
    for slicing the height dimension and similarly for the width.

(c) Downsample the image by taking every second pixel in both the height and
    width dimensions, as in the (exaggerated) example below.

<p align="center">
  <img src="notebook_2_panda_hi.JPG" width="40%" />
  <img src="notebook_2_panda_lo.JPG" width="40%" />
</p>

In [None]:
# Let's create a random image with 3 channels (R, G and B)
# with 8 bits = 256 values per channel (color intensities between 0 and 255):
image = rng.integers(0, 256, size=(3, 1280, 720))
# (a)

# (b)

# (c)

## $ \S 4 $ Other types of indexing

Let's now consider a couple of additional indexing mechanisms that allow for
more flexible selections and manipulations of arrays.

### $ 4.1 $ Integer array indexing

__Integer array indexing__ (also known as __fancy indexing__) is very simple
despite its name;  it allows us to use arrays of indices to
select elements:

In [None]:
a = np.array([0, 10, 20, 30, 40])
indices = np.array([0, 2, 3])  # Select elements at indices 0, 2, and 3
print(a[indices])
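
Two further properties are worth noting: the indices may repeat and appear in any order, and, unlike slicing, integer array indexing returns a copy rather than a view. A small sketch (the name `picked` is illustrative):

```python
import numpy as np

a = np.array([0, 10, 20, 30, 40])

picked = a[np.array([3, 3, 0])]  # Indices may repeat and appear in any order
print(picked)

picked[0] = -1                       # Modifying the result...
print(a)                             # ...leaves a unchanged
print(np.shares_memory(a, picked))   # The result does not share data with a
```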

We can also supply the indices to be selected in the form of lists or tuples
(instead of arrays). Moreover, for multi-dimensional arrays, we can select
specific combinations of rows and columns. In the following example
we illustrate both of these points:

In [None]:
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Select diagonal elements:
rows = [0, 1, 2]  # here we are using a _list_ (not array) of indices
cols = (0, 1, 2)  # here we use a _tuple_ of indices
print(A[rows, cols])  # diagonal elements of A

Note how the indices for the rows and columns are paired in order, instead of
being combined in all possible ways.
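
If all combinations of the given rows and columns are wanted instead, one option is the helper `np.ix_`. The sketch below contrasts the two behaviors; the names `paired` and `grid` are illustrative:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

paired = A[[0, 1], [1, 2]]        # Elements (0, 1) and (1, 2), paired in order
grid = A[np.ix_([0, 1], [1, 2])]  # All combinations of rows 0, 1 with columns 1, 2

print(paired)
print(grid)
```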

__Exercise:__ For the same matrix $ A $ as above, use integer array indexing
to select the elements in the anti-diagonal.

In [None]:
# Select elements (0,2), (1,1), (2,0) (the anti-diagonal):
# rows = ...
# cols = ...
print(A[rows, cols])

__Exercise:__ Given the $ 4 \times 4 $ matrix below, use integer array indexing
to:

(a) Extract the four corner elements.

(b) Extract the elements on both diagonals (main diagonal and anti-diagonal).

In [None]:
M = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12],
              [13, 14, 15, 16]])
print(M)
# (a)

# (b)

### $ 4.2 $ Combining different types of indices

We can mix different types of indices (slices, integers, arrays) to create more
complex selections. Consider the following matrix $ M $:

In [None]:
M = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12],
              [13, 14, 15, 16]])
print(M)

Alternatively, for some of the dimensions we can specify a list of the indices
that we want to select.  This allows us to capture sets of indices that do
not follow the pattern of a slice.

In [None]:
# Select rows 0, 1 and 3, all columns:
print(M[[0, 1, 3], :])

In [None]:
# Select all rows, columns 1 and 3:
print(M[:, [1, 3]])

If instead of using a slice or list of indices, we specify a single index for
some dimension, then that dimension "collapses". In particular, the resulting
array will have fewer dimensions. This is illustrated by the following example:

In [None]:
# Extracting row 1 through a double slice:
print(M[1:2, :])  # The result is still a 2D array

# Extracting row 1 by indexing into it:
print(M[1, :])  # The result is a 1D array
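
The difference can be made explicit by inspecting the `shape` and `ndim` attributes. This is only a sketch; it rebuilds a matrix with the same entries as $ M $ using `arange` and `reshape`:

```python
import numpy as np

M = np.arange(1, 17).reshape(4, 4)  # Same entries as the matrix M above

print(M[1:2, :].shape)  # (1, 4): a slice keeps the dimension
print(M[1, :].shape)    # (4,): an integer index removes it
print(M[1, 2].ndim)     # 0: indexing every dimension yields a scalar
```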

__Exercise:__ Create a $ 2D $ array of shape $ (4, 4) $ filled with
random integers and print it for reference.

(a) Extract the last row to produce a $ 1D $ array.

(b) Extract the last column, as a $ 2D $ array.

(c) Extract a $ 2\times 2 $ sub-array from the center of this array.

(d) Extract the $ 3 \times 2 $ lower-left corner of the array.

(e) Extract the subarray consisting of columns indexed by $ 1 $ and $ 3 $
    in two ways: by using a slice with step size $ 2 $ and by using a list of
    indices.

## $ \S 5 $ Boolean indexing

### $ 5.1 $ Creating boolean masks and filtering arrays with masks

__Boolean indexing__ is a powerful feature that allows us to select elements
from an array based on _conditions_ instead of indices.
* First we create a __boolean mask__, i.e., an array of `True` and `False`
  values, having the same shape as the original array.
* Then we use this mask to select those elements of our array where the mask
  is `True`.

Here's an example where we wish to extract all the components of an array that
are greater than $ 2 $:

<img src="notebook_2_boolean_indexing.png" alt="Boolean indexing" width="796" height="131">

In [None]:
v = np.array([5, 1, 3, 2, 4])
print("Original array:", v)

# Create a boolean mask:
mask = v > 2
print("Mask: ", mask)

# Use boolean indexing to filter the elements that are greater than 2:
selected_elements = v[mask]
print("Selected elements:", selected_elements)

📝 In NumPy, and Python in general, `True` is treated as
equivalent to $ 1 $ in numerical contexts and `False` is equivalent to
$ 0 $. Thus, in the preceding example, we can compute the number of elements
greater than $ 2 $ by taking the sum of the entries in the mask (we will
return to the `np.sum` function later):

In [None]:
print("# of elements greater than two:", np.sum(mask))

Boolean indexing is particularly useful for data cleaning.  We can also filter
elements in one step; this syntax is very common in the NumPy ecosystem,
although it may seem somewhat strange when one first encounters it:

In [None]:
u = np.array([-3, 3, -2, 2, -1, 1, 0])
result = u[u <= 0]
print(result)
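
A boolean mask can also appear on the left-hand side of an assignment, which is one common way to clean data in place. The sketch below assumes a made-up array `readings` in which negative values mark missing measurements:

```python
import numpy as np

# Hypothetical sensor readings where -1 marks a missing value
readings = np.array([3.5, -1.0, 4.2, -1.0, 5.1])

readings[readings < 0] = np.nan  # Assign to every position where the mask is True
print(readings)
```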

__Exercise:__ Consider the matrix $ A $ given below.

(a) Extract its elements that are $ \ge 2 $ using boolean indexing.

(b) What is the shape of your mask? What is the shape of the resulting array? In
    what order are the elements filtered?

In [None]:
A = rng.integers(0, 5, size=(3, 4))
print(A)

📝 When using boolean indexing on a multi-dimensional array with a mask of the
same shape, the result is always a $ 1D $ array containing the elements where the mask is true.

⚠️ Boolean indexing always generates a new array, even when the result is otherwise identical to the original array. Therefore, it must be used with caution on very large data.

In [None]:
v = np.array([1, 2, 3, 4, 5])
u = v[v > 0]  # Filters the elements of v > 0, which happens to be all of v

# Checking whether u and v share the same underlying data
# (id() would differ even for a view, so it cannot tell us this):
print(np.shares_memory(u, v))  # False: u is a new array

### $ 5.2 $ Combining boolean operations

We can craft more complex masks by using the familiar boolean operators
__negation__, __and__, __or__ and __exclusive or__ (__xor__). However, Python's
keywords `not`, `and` and `or` do not operate element-wise on
boolean arrays. Instead we should use `~`, `&` and `|`, respectively. The
xor operator is denoted by `^`. Their truth tables are given below.

<img src="notebook_2_truth_tables.png" alt="Truth tables" width="752" height="168">

In [None]:
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Select elements greater than 2 AND less than 7:
print(a[(a > 2) & (a < 7)])  # Result: [3, 4, 5, 6]

# Select elements less than 3 OR greater than 7:
print(a[(a < 3) | (a > 7)])  # Result: [1, 2, 8, 9]

# Negate a condition with ~:
print(a[~(a % 3 == 0)])  # Select numbers NOT divisible by 3: [1, 2, 4, 5, 7, 8]

# Extract all numbers that are multiples of 2 OR 3, but NOT multiples of 6:
print(a[(a % 2 == 0) ^ (a % 3 == 0)])

📝 Make sure to put parentheses around each condition when combining conditions: `&`, `|` and `^` have higher precedence than comparison operators such as `<` and `==`.
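
To see why the parentheses matter, here is a sketch of what happens when they are omitted. The flag `failed` is an illustrative name used only to record the outcome:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Without parentheses, Python parses `a > 2 & a < 7` as `a > (2 & a) < 7`,
# a chained comparison that needs the truth value of a whole array:
try:
    a[a > 2 & a < 7]
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)

print(a[(a > 2) & (a < 7)])  # With parentheses the mask is built as intended
```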

__Exercise:__ Given the array below, use boolean indexing to extract:

(a) Numbers that are both positive and even.

(b) Numbers that are either negative or greater than $ 20 $.

(c) Numbers that are either divisible by $ 3 $ or positive, but not both.

In [None]:
data = np.array([15, 23, -10, 0, 42, -7, 8, 12, -25, 30])
# (a)

# (b)

# (c)

### $ 5.3 $ The `where` function

The `np.where` function serves two primary purposes.  When called on a
single argument as `np.where(condition)`, the result consists of the
_indices_ where this condition is true.  This provides a powerful way to locate
elements of interest within an array.

In [None]:
A = np.array([[1, 2, 4], 
              [5, -3, 9],
              [-7, 6, -8]])

negative_indices = np.where(A < 0)
print("Indices of negative elements:", negative_indices)

Indices of negative elements: (array([1, 2, 2]), array([1, 0, 2]))


In this example, note that the first array in the tuple yields the row indices and
the second the column indices of the negative elements in $ A $. When we
pair these indices in order, we get the coordinates of these elements inside
$ A $, e.g., the entry `A[2, 0]` is $ -7 $. If we want the elements themselves
for which the condition holds, we can then use these indices exactly as in $ \S 4.1 $:

In [9]:
print("Negative elements in A: ", A[negative_indices])

Negative elements in A:  [-3 -7 -8]


📝 In the single-parameter version of `where`, the argument is interpreted as a Boolean array (nonzero entries count as `True`), and the result
will simply be the positions (indices) of the elements that are `True`.
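
If explicit (row, column) pairs are more convenient than a tuple of index arrays, they can be assembled with `zip`, or obtained directly with `np.argwhere`. A sketch using the same matrix $ A $ (the name `coords` is illustrative):

```python
import numpy as np

A = np.array([[1, 2, 4],
              [5, -3, 9],
              [-7, 6, -8]])

rows, cols = np.where(A < 0)  # One index array per dimension
coords = list(zip(rows.tolist(), cols.tolist()))
print(coords)                 # (row, column) pairs

print(np.argwhere(A < 0))     # Same positions, one row per element
```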

__Exercise:__ Referring to the same matrix $ A $ as above:

(a) Determine the row and column indices of the elements in $ A $
that leave a remainder of $ 1 $ when divided by $ 3 $.

(b) Use the indices obtained in (a) to construct an array consisting of the
elements of $ A $ that satisfy the stated condition.

To illustrate the second use of the `where` function, consider the following simple array.

In [2]:
numbers = np.array([1, 2, -3, 4, -5, 6, -7, 8])
print(numbers)

[ 1  2 -3  4 -5  6 -7  8]


Suppose that we want to replace those elements in `numbers` that are negative by $ 0 $.
We can do this as follows:

In [4]:
# Replace negative values with zeros, keep positive values:
result = np.where(numbers > 0, numbers, 0)
print("After replacing negatives with zeros:", result)

After replacing negatives with zeros: [1 2 0 4 0 6 0 8]


Thus, in the form `np.where(condition, x, y)`, this function can work as a
vectorized ternary operator (similar to `? :` in C), allowing us to make
element-wise choices between two arrays based on a condition. 
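
As a further sketch of the three-argument form, the example below chooses between two full arrays, and then between two scalars that are broadcast against the condition. The arrays `x` and `y` are made up for illustration:

```python
import numpy as np

x = np.array([1, 8, 3, 6])
y = np.array([5, 2, 7, 4])

larger = np.where(x > y, x, y)  # Take from x where x > y, otherwise from y
print(larger)

labels = np.where(x % 2 == 0, "even", "odd")  # Scalars are broadcast
print(labels)
```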

__Exercise:__ Suppose that we are building a weather forecast app. We need
to be able to display the temperatures in Fahrenheit or Celsius degrees
according to whether the user is US-based or not. Use `where` to accomplish
this for the following set of $ 10 $ temperatures:

In [None]:
# Array of temperatures in Celsius:
celsius = np.array([20, 25, 30, 35, 40, 15, 10, 5, 0, -5])
# Corresponding array of temperatures in Fahrenheit (F = C * 9/5 + 32):
fahrenheit = celsius * 9/5 + 32
# Boolean mask that tells whether the user is US-based (True) or not (False):
is_us_visitor = np.array([True, True, False, False, True, False, True, True, False, True])

# Use np.where to display temperatures in appropriate units based on visitor location:
# display_temps = ???

print("Temperatures to display (F for US, C for others):\n", np.round(display_temps))

Temperatures to display (F for US, C for others):
 [ 68.  77.  30.  35. 104.  15.  50.  41.   0.  23.]
