In [1]:
import numpy as np

# Array attributes and methods

## $ \S 1 $ Attributes of arrays

### $ 1.1 $ Shape and number of dimensions

Recall from the previous notebook that, just as in Linear Algebra, a central
property of a $ 2D $ array is its **shape**, which is the element
count along each of its **axes** (vertical and horizontal).  Referring to the
example below, the shape of our matrix $ A $ is $ (3, 4) $, or $ 3 \times 4 $,
since it has three rows and four columns:

In [2]:
A = np.array([[1., 2., 3., 4.],
              [1., 4., 9., 16.],
              [1., 8., 27., 64.]])
print(A)

[[ 1.  2.  3.  4.]
 [ 1.  4.  9. 16.]
 [ 1.  8. 27. 64.]]


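The __number of dimensions__ of an arbitrary array, also known as its __rank__ (not to be
confused with the rank of a matrix in linear algebra), is stored in its `ndim` attribute,
while its shape is stored in its `shape` attribute: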

In [3]:
print(A.shape)   # Print the shape of A
print(A.ndim)    # Print the number of dimensions of A

(3, 4)
2


The number of dimensions of an array is a non-negative integer, while its shape is always a tuple, even when the array is one-dimensional:

In [4]:
a = np.array([11, 13, 17])
print(a.shape)

print("Note that the shape is not '3', but rather the tuple '(3,)'")
print(type(a.shape))

(3,)
Note that the shape is not '3', but rather the tuple '(3,)'
<class 'tuple'>
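A scalar wrapped in an array is a _0-dimensional_ array: its `ndim` is $ 0 $ and its shape is the empty tuple. The name `s` below is only illustrative.

```python
import numpy as np

s = np.array(5)  # A 0-dimensional (scalar) array
print(s.ndim)    # 0
print(s.shape)   # (): the empty tuple
```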


__Exercise:__ What are the rank and the shape of an empty $ 1D $ array? An empty
$ 2D $ array?

### $ 1.2 $ Attributes and related terminology

An instance, or object, of a specific class, such as the array $ A $ of type `ndarray`,
is equipped with a set of predefined **attributes**. Attributes are
_properties inherent to every instance of the class_ (in this case, the class
`ndarray`). The **state** of an object is _the set of current values of all
of its attributes._

📝 To access an attribute `a` of an object `x`, the syntax is `x.a`.

For example, suppose that we want to design a Python class to represent cars.
An instance of this class would then correspond to one specific car
in the real world.  Some plausible attributes of this class could be:
* Its color (say, `color`, of type `str`).
* The year in which it was manufactured (say, `year`, of type `int`).
* Whether it is electric or not (say, `electric`, of type `bool`).
* The fuel efficiency of the car (say, `kilometers_per_liter`, of type `float`).

And so on for any other relevant property of cars that we might want to include
in our model. Note that the values of these attributes for different car
instances will vary, in general. However, from this example we can easily imagine
a situation where two cars have exactly the same state, as defined by their
set of attribute values, and yet they are distinct objects, i.e., have different
_identities_.
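To make the distinction between state and identity concrete, here is a minimal sketch of such a class. The class name `Car` and its attributes are hypothetical, chosen only to mirror the list above. The built-in `vars(obj)` returns the dictionary of an object's attributes, and the `is` operator tests whether two names refer to the very same object.

```python
class Car:
    """Hypothetical class modeling a car, used only to illustrate attributes."""

    def __init__(self, color, year, electric, kilometers_per_liter):
        self.color = color                                # str
        self.year = year                                  # int
        self.electric = electric                          # bool
        self.kilometers_per_liter = kilometers_per_liter  # float


car1 = Car("red", 2020, False, 15.0)
car2 = Car("red", 2020, False, 15.0)

print(car1.color)                      # red: attribute access with x.a
same_state = vars(car1) == vars(car2)
print(same_state)                      # True: identical attribute values
print(car1 is car2)                    # False: two distinct objects
```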

### $ 1.3 $ The main attributes of arrays

Although arrays come with several attributes, most of them relate to the array's
internal representation or low-level utilities. The five most frequently used and
conceptually important are:

| Attribute   | Description                                    | Type      |
|-------------|------------------------------------------------|-----------|
| `ndim`      | Number of dimensions (rank) of the array  | `int`   |
| `shape`     | Number of elements that lie along each axis | `tuple`     |
| `size`      | Total number of elements in the array     | `int`   |
| `dtype`     | Data type of the _elements_ of the array    | `dtype`  |
| `T`         | Transpose of the array                    | `ndarray`    |
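As a quick illustration of these five attributes on a small, made-up $ 2 \times 3 $ array of floats (the name `M` is arbitrary):

```python
import numpy as np

M = np.array([[1., 2., 3.],
              [4., 5., 6.]])

print(M.ndim)    # 2
print(M.shape)   # (2, 3)
print(M.size)    # 6
print(M.dtype)   # float64
print(M.T)       # The 3 x 2 transpose of M
```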


__Exercise:__ For the following arrays $ A $, $ B $ and $ \mathbf v $:

(a) Check their attributes. Is the datatype of $ \mathbf v $ what you expected?

(b) Verify that the type of each attribute matches the one described in the table
using the Python built-in function `type`.

In [None]:
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

B = np.array([[True, False, True],
              [False, True, False]])

v = np.array([1, 2, 3.])

NumPy arrays are designed to store elements of a single datatype, and _these
datatypes are specific to NumPy_. Many of them correspond to C's native types,
which gives NumPy high performance and a smaller memory footprint. As an example,
on most 64-bit platforms the datatype of the matrix $ A $ above is `int64`, which
represents integers using a fixed width of $ 64 $ bits. In contrast, the built-in
Python type `int` can hold arbitrarily large integers.
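To see what a fixed width of $ 64 $ bits means in practice, the sketch below compares the largest `int64` value with a Python `int`, and shows that `int64` array arithmetic wraps around (modulo $ 2^{64} $) instead of growing. The name `big` is only illustrative.

```python
import numpy as np

# Largest value representable by NumPy's 64-bit signed integer type
print(np.iinfo(np.int64).max)   # 9223372036854775807, i.e. 2**63 - 1

# Python's built-in int has no such limit
print(2**100)

# int64 array arithmetic wraps around on overflow instead of raising an error
big = np.array([2**62], dtype=np.int64)
wrapped = big * 4               # 2**64 does not fit, so the result wraps to 0
print(wrapped)                  # [0]
```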

It is also possible to work
with arrays of arbitrary Python objects by using the datatype `object`. Because
in this case what is stored in the array is only a reference to each object (not
the object itself), the entries don't even need to have the same Python type:

In [13]:
complex_array = np.array(["pandas",  [1, 2, 3], {"name": "Alice"}], dtype=object)

for i, item in enumerate(complex_array):
    print(f"Item {i}: {item} (type: {type(item)})")

Item 0: pandas (type: <class 'str'>)
Item 1: [1, 2, 3] (type: <class 'list'>)
Item 2: {'name': 'Alice'} (type: <class 'dict'>)


__Exercise:__ If an array has shape $ (2, 3, 4) $, what is its size? More generally, what is the size of an array of shape $ (n_1, n_2, \cdots, n_d) $?

__Exercise:__ A _square matrix_ has the same number of rows and columns.

(a) Write a function `is_square(matrix)` that accepts a $ 2D $ array as its
argument and returns `True` or `False` depending on whether the given matrix is
square or not.  _Hint:_ Use the transpose of the matrix.

(b) How would you generalize to multidimensional arrays?

⚠️ When applied to two arrays of the same shape, the `==` operator performs an
element-wise comparison and returns a Boolean array of that shape, not a single
`True` or `False`. To check whether two arrays $ A $ and $ B $ have the same
shape and the same elements, use `np.array_equal(A, B)` instead.

In [18]:
a = np.array([1, 2, 3])
b = np.array([1, 2, 3])

print(a == b)
print(np.array_equal(a, b))

[ True  True  True]
True
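The difference also matters when the shapes differ: `==` may still succeed through NumPy's broadcasting rules, whereas `np.array_equal` compares the shapes as well. A small sketch with made-up arrays:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([[1, 2, 3]])     # Same elements, but shape (1, 3) instead of (3,)

elementwise = a == b          # Broadcasting yields a (1, 3) Boolean array
print(elementwise)            # [[ True  True  True]]

print(np.array_equal(a, b))   # False: the shapes differ
```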


__Exercise:__ Write a Python function to determine whether a given matrix $ A $ is
symmetric (i.e., whether $ A^T = A $), anti-symmetric ($ A^T = -A $) or neither.
Does your function work correctly when $ A $ is not square?

__Exercise:__ What is the transpose of a $ 1D $ array? What about $ 3D $ arrays?
Test your conjectures on some examples. More generally, what would be the most
reasonable definition of the transpose of an $ n $-dimensional array?

## $ \S 2 $ Array methods

### $ 2.1 $ Methods and related terminology

While attributes describe the properties of an object, **methods** define what
an object can do: they are functions that are bound to the object and can access
or modify its state. Methods encapsulate behavior that is appropriate for the
class of objects they belong to, just as attributes encapsulate state.

📝 To invoke a method `m` on an object `x`, the syntax is `x.m(<arguments>)`.

Continuing the example from $ \S 1.2 $, we could think about implementing
the following methods for our class representing cars:
* Starting the engine (say, `start_engine()`, which returns nothing but changes the car's state).
* Computing the range based on current fuel level (say, `get_range()`, which returns a `float`).
* Checking if maintenance is needed (say, `needs_maintenance()`, which returns a `bool`).
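A minimal, self-contained sketch of a car class with these three methods is shown below. Everything in it is hypothetical: the attributes `fuel_liters`, `km_since_service` and `engine_running`, as well as the default threshold of $ 15000 $ km, are assumptions made only for illustration.

```python
class Car:
    """Hypothetical car class illustrating methods that read or change state."""

    def __init__(self, kilometers_per_liter, fuel_liters, km_since_service):
        self.kilometers_per_liter = kilometers_per_liter
        self.fuel_liters = fuel_liters            # hypothetical attribute
        self.km_since_service = km_since_service  # hypothetical attribute
        self.engine_running = False               # part of the car's state

    def start_engine(self):
        """Change the car's state; returns nothing (i.e., None)."""
        self.engine_running = True

    def get_range(self):
        """Kilometers the car can travel with its current fuel (a float)."""
        return self.kilometers_per_liter * self.fuel_liters

    def needs_maintenance(self, threshold_km=15_000):
        """True if more than threshold_km have passed since the last service."""
        return self.km_since_service > threshold_km


car = Car(kilometers_per_liter=12.5, fuel_liters=40.0, km_since_service=16_000)
car.start_engine()
print(car.engine_running)       # True
print(car.get_range())          # 500.0
print(car.needs_maintenance())  # True
```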

In the context of NumPy arrays, methods like `max()` or `mean()` perform
calculations on the array's data and return results, while methods like `sort()`
modify the array in place, without returning anything (they return `None`).

In [13]:
C = np.array([[2, 1, 3],
              [3, 2, 1]])
print(f"Sum of the values in C: {C.sum()}")

C.sort()
print("C after sorting:\n", C)

Sum of the values in C: 12
C after sorting:
 [[1 2 3]
 [1 2 3]]


As an optional argument to `sum`, we can designate an axis __along which__ the
sum should take place. As always in Python, indexing is zero-based: for the
matrix $ C $ above, axis $ 0 $ is indexed by the row number and axis $ 1 $ by
the column number.

In [15]:
C = np.array([[2, 1, 3],
              [3, 2, 1]])
print(C.sum(axis=0))

[5 3 4]


If we think of $ C $ as the matrix $ C = (c_{ij}) $, where $ i $ is the index
for axis $ 0 $ (i.e., the index of rows), then taking the sum along this axis
means that, for each $ j $, NumPy computes $ \sum_{i} c_{ij} $. The result is the
vector above, with one entry per column of $ C $ (three in total). To put it
another way, _the axis specified as the argument to `sum` is the one that gets
collapsed_, in this case by the summation.
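Comparing the two possible axes for the same matrix $ C $ makes the "collapsed axis" rule visible in the shapes of the results. The sketch also uses the `keepdims` parameter of `sum`, which keeps the collapsed axis with length one.

```python
import numpy as np

C = np.array([[2, 1, 3],
              [3, 2, 1]])

col_sums = C.sum(axis=0)   # Collapse axis 0: one sum per column
row_sums = C.sum(axis=1)   # Collapse axis 1: one sum per row
print(col_sums, col_sums.shape)   # [5 3 4] (3,)
print(row_sums, row_sums.shape)   # [6 6] (2,)

# keepdims=True keeps the collapsed axis, now of length 1
kept = C.sum(axis=0, keepdims=True)
print(kept, kept.shape)           # [[5 3 4]] (1, 3)
```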

### $ 2.2 $ Reduction and accumulation methods

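The main array methods (and two closely related functions) that involve
mathematical operations are the following:

| Method       | Description of its output                                          |
|--------------|--------------------------------------------------------------------|
| `min`        | Minimum value of the array                                         |
| `max`        | Maximum value of the array                                         |
| `argmin`     | Index of the minimum value (in the flattened array, unless an axis is given) |
| `argmax`     | Index of the maximum value (in the flattened array, unless an axis is given) |
| `sum`        | Sum of all the elements of the array                               |
| `prod`       | Product of all the elements of the array                           |
| `mean`       | Arithmetic mean of the array                                       |
| `median`     | Median of the array (function `np.median` only)                    |
| `percentile` | $ q $-th percentile of the array (function `np.percentile` only)   |
| `std`        | Standard deviation of the array                                    |
| `var`        | Variance of the array                                              |

All of these can be applied along a specific axis by setting the `axis`
parameter, and all except `argmin` and `argmax` also accept a tuple of axes.
Note that `median` and `percentile` are not array methods: they must be called
as the functions `np.median(A)` and `np.percentile(A, q)`, where $ q $ is a
percentage between $ 0 $ and $ 100 $.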
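The sketch below, on a small made-up vector, contrasts methods such as `mean` and `argmax` with the functions `np.median` and `np.percentile`, and then reduces a $ 3D $ array over a tuple of axes. The percentile value assumes NumPy's default linear interpolation.

```python
import numpy as np

v = np.array([3, 1, 4, 1, 5, 9, 2, 6])

print(v.mean())              # 3.875
print(v.argmax())            # 5: index of the value 9
print(v.argmin())            # 1: index of the first occurrence of the minimum

# median and percentile are functions, not methods
print(np.median(v))          # 3.5
print(np.percentile(v, 25))  # 1.75
print(hasattr(v, "median"))  # False

# Reducing over a tuple of axes collapses both of them at once
T = np.ones((2, 3, 4))
print(T.sum(axis=(0, 2)))    # [8. 8. 8.]
```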

__Exercise:__ Understanding how the `axis` parameter works is crucial, and
this exercise will help you master it.
Given the matrix $ A $ in the code cell below:

(a) Find the maximum value in each row (i.e., reduce along axis $ 1 $).

(b) Calculate the mean (average) of each column (i.e., reduce along axis $ 0 $).

(c) Determine the index of the maximum element in each row.

(d) Compute the standard deviation of values in each row.

(e) Find the cumulative sums for each row using the `cumsum` method. Unlike all
of the methods in the table, when an axis is specified its result has the same
shape as the original array. It is a type of _accumulation_ method.

In [43]:
A = np.array([[1, 2, 3, 4],
              [3, 3, 3, 3]])
print(A)

[[1 2 3 4]
 [3 3 3 3]]
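Another accumulation method is `cumprod`, which computes running products. The sketch below, on a made-up $ 2 \times 2 $ matrix, shows that the result keeps the original shape when an axis is given, but runs over the flattened array when no axis is given.

```python
import numpy as np

M = np.array([[1, 2],
              [3, 4]])

by_row = M.cumprod(axis=1)   # Running products within each row: [[1, 2], [3, 12]]
by_col = M.cumprod(axis=0)   # Running products down each column: [[1, 2], [3, 8]]
flat = M.cumprod()           # No axis: runs over the flattened array: [1, 2, 6, 24]

print(by_row.shape, by_col.shape, flat.shape)   # (2, 2) (2, 2) (4,)
```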


📝 Arrays have several more methods besides the ones listed above. To get a
listing of all attributes and methods associated with an object $ a $ (not
necessarily an array), you can run the Python command `dir(a)`.
Also, in Jupyter notebooks, typing `?` before or after an object or function name
(e.g., `A?` or `?np.array`) displays detailed documentation about it:

In [32]:
A.sum?

[0;31mDocstring:[0m
a.sum(axis=None, dtype=None, out=None, keepdims=False, initial=0, where=True)

Return the sum of the array elements over the given axis.

Refer to `numpy.sum` for full documentation.

See Also
--------
numpy.sum : equivalent function
[0;31mType:[0m      builtin_function_or_method

__Exercise:__ Given a $ 2D $ array `data` of shape $ 100 \times 5 $ representing a
dataset where each row is an observation and each column corresponds to a different variable:

(a) Calculate the mean and standard deviation of each variable.

(b) Using `argmin`, identify the variable with the smallest variance, i.e., the one
whose values have the least spread.

In [37]:
rng = np.random.default_rng(seed=0)
data = rng.normal(size=(100, 5))

print(data[:5])  # Display first five rows of the table

[[ 0.12573022 -0.13210486  0.64042265  0.10490012 -0.53566937]
 [ 0.36159505  1.30400005  0.94708096 -0.70373524 -1.26542147]
 [-0.62327446  0.04132598 -2.32503077 -0.21879166 -1.24591095]
 [-0.73226735 -0.54425898 -0.31630016  0.41163054  1.04251337]
 [-0.12853466  1.36646347 -0.66519467  0.35151007  0.90347018]]


__Exercise:__ Given the $ 1D $ array below, calculate the mean of all numbers in
it that are greater than $ 50 $.

In [40]:
v = rng.integers(1, 101, size=100)
# First 20 elements:
print(v[:20], "...")

[94 12 13 43 59 63 87 38 95 71 58 24 75 15  6 75 25 67 57 43] ...


__Exercise (moving averages):__ A _moving average_ is obtained by calculating the
average of subsets of data for each point in time over a "window" of fixed
length moving along the array. This is useful for smoothing out time-series
data. Using a `for` loop, calculate the moving average for the following $ 1D $
array consisting of $ 1000 $ elements, where the window has length $ 50 $.

In [41]:
time_series = np.random.rand(1000)
window_size = 50
# The result will be an array of length 1000 - 50 + 1 = 951:
moving_averages = np.zeros(len(time_series) - window_size + 1)
# for ...

## $ \S 3 $ Reshaping and changing the datatype of arrays

### $ 3.1 $ Setting and changing the type

Recall that the `dtype` attribute of an array stores its datatype. We can create
a new array with a different datatype from a given array by using the method
`astype`, provided that every element can be converted to the target datatype:

In [58]:
# Let's begin by creating an array of strings:
A = np.array([["1", "-2"],
              ["3", "-4"]])
print(A, '\n')

# Convert the datatype to double-precision (64 bit) floating-point numbers:
A_double = A.astype("float64")
print(A_double.dtype, '\n', A_double)

[['1' '-2']
 ['3' '-4']] 

float64 
 [[ 1. -2.]
 [ 3. -4.]]


📝 Here the datatype was passed as a string. Strings such as `"float64"` and
`"bool"` are convenient aliases for NumPy's type objects, such as `np.float64`
and `np.bool_`, which can also be passed directly.
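Two behaviors of `astype` are worth knowing: converting floats to integers truncates the fractional part toward zero, and strings that do not represent numbers cannot be converted to a numeric type, in which case NumPy raises a `ValueError`. A short sketch with made-up arrays (the flag `conversion_failed` is just for illustration):

```python
import numpy as np

x = np.array([1.7, -1.7, 2.0])
x_int = x.astype(np.int64)
print(x_int)   # [ 1 -1  2]: fractional parts are truncated toward zero

s = np.array(["3.5", "abc"])
conversion_failed = False
try:
    s.astype(np.float64)
except ValueError as err:       # "abc" cannot be interpreted as a number
    conversion_failed = True
    print("ValueError:", err)
```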


__Exercise:__ What happens if you try to convert a numeric array to one whose
datatype is `bool`? What about the converse, i.e., from `bool` to, say, `int32`?

📝 Although we have been speaking about the "conversion" from one type to another,
`astype` doesn't actually modify the original array; instead, by default (its
`copy` parameter defaults to `True`) it creates a _new copy_ of the original
array having the prescribed datatype. This occurs even when the target datatype
is the same as the original.

__Exercise:__ Verify that the two arrays $ \mathbf{u} $ and $ \mathbf{v} $ below
are independent by modifying one of them and checking that the other one
is unaffected.

In [None]:
u = np.array([1, 2, 3])
print(f"Original u: {u} of type {u.dtype}")

v = u.astype("int64")
print(f"Original v: {v} of type {v.dtype} \n")

# ...

print(f"u after modification of v: {u}")
print(f"v after modification of v: {v}")

Original u: [1 2 3] of type int64
Original v: [1 2 3] of type int64 

u after modification of v: [1 2 3]
v after modification of v: [1 2 3]


To prescribe the datatype of an array at the moment of its creation, we can
use the `dtype` argument available in almost all array creation functions:

In [60]:
A = np.array([1, 2, 3, 4], dtype="float64")
print(A)

B = np.ones((2, 2), dtype="bool")
print(B)

[1. 2. 3. 4.]
[[ True  True]
 [ True  True]]


__Exercise:__ Create arrays 
$$ \mathbf a = (2, 2, 2, 2)\,, \quad \mathbf b = (2, 4, 6, 8) \quad \text{and} \quad
\mathbf c = (2, 3, 4, 5) $$
of type `float64` using `full`, `arange` and `linspace`, respectively.

### $ 3.2 $ Modifying the shape of arrays

Reshaping arrays is a common and fundamental operation in NumPy. There is both
a function and a method named `reshape` that can accomplish this:

In [62]:
a = np.array([1, 2, 3, 4, 5, 6])
print(a, end='\n\n')

A = np.reshape(a, (3, 2))  # Here we use the _function_ `reshape`

B = a.reshape((2, 3))   # Here we use the `reshape` _method_

print(A, end="\n\n")  # Here a has been reshaped into a 3 by 2 matrix
print(B, end="\n\n")  # Here a has been reshaped into a 2 by 3 matrix

[1 2 3 4 5 6]

[[1 2]
 [3 4]
 [5 6]]

[[1 2 3]
 [4 5 6]]



Note that when we reshape an array, the new shape must be compatible with the
size of the original array. For example, the following results in an error (why?):

In [None]:
C = np.reshape(a, (2, 2))

When reshaping, we may also specify $ -1 $ in a dimension to instruct
NumPy to infer the number of elements along that dimension from the size of the
array and that of the remaining dimensions. This is especially useful when an
array is passed to us by the user as an argument in a function call, but we do
not know in advance how many entries it has:

In [None]:
a = np.array([[1, 2],
              [3, 4]])
A = a.reshape((-1, 1))  # Reshape into a column vector
print(A)

In this example we wanted to reshape our array so that the result would have
one column, but didn't want to figure out how many rows it should have for that
to happen. Here's another example, in which we reshape a $ 1D $ array into a
matrix and then into a row vector:

In [None]:
x = np.arange(1, 13)
X = x.reshape((3, -1))
x_row = X.reshape((1, -1))

print(x, end='\n\n')
print(X, end='\n\n')
print(x_row, end='\n\n')
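The function-call scenario mentioned above can be sketched with a hypothetical helper `to_column` that accepts an array of any size. Note that at most one dimension may be given as $ -1 $; asking NumPy to infer two dimensions is ambiguous and raises a `ValueError`.

```python
import numpy as np


def to_column(arr):
    """Hypothetical helper: reshape an array of any size into a column vector."""
    return np.asarray(arr).reshape((-1, 1))


print(to_column(np.arange(4)).shape)     # (4, 1)
print(to_column(np.ones((2, 3))).shape)  # (6, 1)

# At most one dimension can be inferred; two -1's are ambiguous
two_unknowns_failed = False
try:
    np.arange(12).reshape((-1, -1))
except ValueError as err:
    two_unknowns_failed = True
    print("ValueError:", err)
```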

📝 There is no essential difference between the function and the method versions
of `reshape`. In both cases, NumPy returns a _new_ array object, while the original
array object remains unchanged. However, whenever possible the new array is a
_view_ of the original array's data, meaning that the data is not copied unless
necessary. Thus, modifications to the data in the reshaped array can affect the
original array and vice versa. Let's use the previous example to illustrate this:

In [None]:
X[0, 0] = -23  # Modify the top left element of X
print(x)  # The 0th element of x has also been affected!

To create an independent copy of a NumPy array, we can use the `copy` method.
This method generates a new array object with the same data as the original
array, but stored in a separate memory location.

In [None]:
y = np.arange(3)  # y is the 1D array with entries 0, 1, 2
Y = y.copy().reshape((1, -1))  # Reshape y into an independent 2D row vector
y[0] = 10  # Modify 0th element of y
print(y)
print(Y)  # The 0th element of Y is not affected, since Y is an independent copy

There is also a _function_ `np.copy` that is essentially equivalent to the `copy` method.
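Besides modifying entries, one can ask NumPy directly whether two arrays share data, using `np.shares_memory`, or inspect the `base` attribute of a view, which refers to the array whose data it borrows. A sketch reusing the arrays from the previous examples:

```python
import numpy as np

x = np.arange(1, 13)
X = x.reshape((3, -1))         # A view: shares x's data
Y = x.copy().reshape((3, -1))  # Built from an independent copy of x

print(np.shares_memory(x, X))  # True
print(np.shares_memory(x, Y))  # False
print(X.base is x)             # True: X borrows its data from x
```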

The `flatten` method takes a multi-dimensional array and returns a new,
independent _one-dimensional_ array containing all of the elements of the original
array. 

In [None]:
A = np.array([[1, 2],
              [3, 4]])
a = A.flatten()
print(a)

By default, the elements are placed in the flattened array according to the
lexicographic ordering of their indices in the original array (also known as
row-major or "C" order). For example, if we are dealing with a $ 3D $ array,
then the entry at position $ (0, 0, 2) $ comes before the entry at $ (0, 1, 0) $,
which in turn comes before the entry at $ (1, 0, 0) $.

In [None]:
A = np.arange(6).reshape((2, 3))
print(A)
a = A.flatten()
print(a)
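Two related options may be useful here: `flatten` accepts `order="F"` to use column-major ("Fortran") order instead, and the `ravel` method also produces a $ 1D $ array but returns a view whenever possible rather than always copying. A sketch comparing them:

```python
import numpy as np

A = np.arange(6).reshape((2, 3))   # [[0 1 2]
                                   #  [3 4 5]]

print(A.flatten())            # [0 1 2 3 4 5]: row-major ("C") order, the default
print(A.flatten(order="F"))   # [0 3 1 4 2 5]: column-major ("F") order

# ravel avoids copying when it can; flatten always copies
print(np.shares_memory(A, A.ravel()))    # True
print(np.shares_memory(A, A.flatten()))  # False
```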

__Exercise:__ The $ 2D $ array of shape $ 28 \times 28 $ in the code cell below
represents a grayscale image. Write a function to rotate this image $ 90 $ degrees
clockwise.

_Hint_: This can be accomplished by first transposing the array and then
flipping it horizontally (i.e., reversing the order of the entries in each
row) with an appropriate slice. This is illustrated in the $ 3 \times 3 $
example below, but the same idea works for arrays of any size.
$$
\begin{bmatrix}
a & b & c \\
d & e & f \\
g & h & i \\
\end{bmatrix}
\overset{\text{transpose}}{\longrightarrow}
\begin{bmatrix}
a & d & g \\
b & e & h \\
c & f & i \\
\end{bmatrix}
\overset{\text{flip hor.}}{\longrightarrow}
\begin{bmatrix}
g & d & a \\
h & e & b \\
i & f & c \\
\end{bmatrix}
$$

In [None]:
image = np.random.randint(0, 256, size=(28, 28))  # Random 28 x 28 "image"
print(image[:5, :5])

__Exercise:__ Given a $ 1D $ array of $ 100 $ elements, reshape it into a $ 10 \times 10 $
matrix. Then, normalize the matrix so that all values are scaled to
lie between $ 0 $ and $ 1 $ (inclusive). _Hint:_ Determine the maximum $ M $ and
minimum $ m $ of all entries, write a linear function $ f $ that takes $ [m, M] $ into $ [0, 1] $
and then apply it to the entire array.

### $ 3.3 $ Other array methods

Some additional methods and functions for rearranging arrays and their axes are:
* `A.sort()` method: Sorts the array in place, along the last axis by default.
* `A.squeeze()` method: Removes axes of length one from array $ A $. For example, an array
  of shape $ (1, 3, 1) $ becomes $ (3,) $ after squeezing.
* `A.swapaxes(axis1, axis2)` method: Swaps two axes of an array. It returns a view
  (not a copy) of the original array.
* `A.transpose(*axes)` method: Reorders the axes of an array according to `axes`.
  If no axes are provided, it reverses the order of the axes.
* `np.expand_dims(A, axis=i)` function: Inserts a new axis of length one at position
  $ i $, increasing the number of dimensions of the array by one.


In [None]:
A = np.array([[9, 2, 1],
              [4, 8, 6],
              [3, 5, 7]])

A.sort()  # Sort A in place along its last axis (axis 1), i.e., sort each row
print(A)

In [None]:
a = np.array([1, 2, 3])
print(a, '\n')

B = np.expand_dims(a, axis=1)  # Insert a new axis at position 1
print(B)
print(B.shape)

In [None]:
A = np.random.randn(2, 3, 4)  # Array of shape 2x3x4 having random elements
print(A.shape)
print(A, '\n') 

# Reorder the axes (0, 1, 2) into the new order (1, 2, 0):
B = A.transpose(1, 2, 0)  # The new shape is 3x4x2
print(B.shape)
print(B)
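The remaining items of the list, `squeeze` and `swapaxes`, together with `transpose` called without arguments, can be checked through the shapes they produce. A sketch on made-up arrays:

```python
import numpy as np

a = np.zeros((1, 3, 1))
print(a.squeeze().shape)        # (3,): both axes of length one are removed

M = np.arange(6).reshape((2, 3))
S = M.swapaxes(0, 1)
print(S.shape)                  # (3, 2)
print(np.array_equal(S, M.T))   # True: for 2D arrays, swapping axes 0 and 1 transposes

T = np.zeros((2, 3, 4))
print(T.transpose().shape)      # (4, 3, 2): axes reversed when none are given
```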

__Exercise:__ For each of these methods/functions, verify whether the result is a view or a copy of the original
array by applying it to an array $ A $, assigning the result to $ B $, modifying an entry of $ A $ and then
checking whether $ B $ was affected.
