# Array attributes and methods

## $ \S 1 $ Attributes of arrays

### $ 1.1 $ Shape and number of dimensions

Recall from the previous notebook that, just as in Linear Algebra, an extremely
important property of a $ 2D $ array is its **shape**, which is the element
count along each of its **axes** (vertical and horizontal).  Referring to the
example below, the shape of our matrix $ A $ is $ (3, 4) $, or $ 3 \times 4 $,
since it has three rows and four columns:

In [19]:
import numpy as np

A = np.array([[1., 2., 3., 4.],
              [1., 4., 9., 16.],
              [1., 8., 27., 64.]])
print(A)

[[ 1.  2.  3.  4.]
 [ 1.  4.  9. 16.]
 [ 1.  8. 27. 64.]]


The number of dimensions and shape of an arbitrary array are stored in its `ndim` and `shape` attributes, respectively:

In [20]:
print(A.ndim)    # Print the number of dimensions of A
print(A.shape)   # Print the shape of A

2
(3, 4)


The number of dimensions of an array is a non-negative integer, while its shape is always a tuple, even when the array is one-dimensional:

In [21]:
a = np.array([11, 13, 17])
print(a.shape, "<-- Note that the shape is not '3', but rather the tuple '(3,)'")
print(type(a.shape))

(3,) <-- Note that the shape is not '3', but rather the tuple '(3,)'
<class 'tuple'>
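
A scalar wrapped in an array is a useful edge case: it has zero dimensions, and its shape is the empty tuple. The sketch below uses an arbitrary value, `7.5`, purely for illustration:

```python
import numpy as np

s = np.array(7.5)      # a zero-dimensional array holding a single number
print(s.ndim)          # 0
print(s.shape)         # ()
print(type(s.shape))   # <class 'tuple'>
```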


__Exercise:__ What are the number of dimensions and the shape of an empty array, such as `np.array([])`?

### $ 1.2 $ Attributes and related terminology

An instance (object) of a specific class, such as the array $ A $ of type `ndarray`,
is equipped with a set of predefined **attributes**. Attributes are simply
_properties inherent to every instance of the class_ (in this case, that of
ndarrays). The **state** of an object is _the set of current values of all
of its attributes._

📝 To access an attribute of an object `x`, the syntax is `x.<attribute name>`.

For example, suppose that we want to design a Python class to represent cars.
An instance of this class would then be a representation of one specific car
in the real world.  Some plausible attributes of this class could be:
* Its color (say, `color`, of type `str`).
* The year in which it was manufactured (say, `year`, of type `int`).
* Whether it is electric or not (say, `electric`, of type `bool`).
* The fuel efficiency of the car (say, `kilometers_per_liter`, of type `float`).

And so on for any other relevant property of cars that we might want to include
in our model. Note that the values of these attributes for different car
instances will vary, in general. Moreover, from this example we can easily imagine
a situation where two cars have exactly the same state, as defined by their
set of attribute values, and yet they are distinct objects, i.e., have different
_identities_.
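
The car example can be sketched in plain Python. The class `Car` below is hypothetical: it simply stores the four attributes listed above, and it shows that two instances can share the same state while remaining distinct objects:

```python
class Car:
    """Hypothetical class, used only to illustrate attributes, state and identity."""

    def __init__(self, color, year, electric, kilometers_per_liter):
        self.color = color
        self.year = year
        self.electric = electric
        self.kilometers_per_liter = kilometers_per_liter


car1 = Car("red", 2020, False, 14.5)
car2 = Car("red", 2020, False, 14.5)

print(car1.color)                # access an attribute: red
print(vars(car1) == vars(car2))  # same state (same attribute values): True
print(car1 is car2)              # same identity? False
```

Here `vars(obj)` returns the dictionary of an instance's attributes, so comparing these dictionaries is one simple way to compare states, while the `is` operator compares identities.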

### $ 1.3 $ The main attributes of arrays

Although ndarrays come with several attributes, most of them relate to the array's
internal representation or low-level utilities. The five most frequently used and
conceptually important are:

| Attribute   | Description                                    | Type      |
|-------------|------------------------------------------------|-----------|
| `ndim`      | The number of dimensions (axes) of the array  | `int`   |
| `shape`     | The number of elements that lie along each axis | `tuple`     |
| `size`      | The total number of elements in the array     | `int`   |
| `dtype`     | The data type of the elements of the array    | `dtype`  |
| `T`         | The transpose of the array                    | `ndarray`    |
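
For a concrete illustration, here are the five attributes of a small $ 2 \times 3 $ matrix chosen only as an example. It is created with `dtype=np.float64` so that its datatype does not depend on the platform:

```python
import numpy as np

M = np.arange(6, dtype=np.float64).reshape((2, 3))  # 2x3 matrix of floats

print(M.ndim)     # 2
print(M.shape)    # (2, 3)
print(M.size)     # 6
print(M.dtype)    # float64
print(M.T.shape)  # (3, 2): the transpose swaps the two axes
```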



__Exercise:__ For the following arrays $ A $, $ B $ and $ \mathbf v $:

(a) Check their attributes. Is the datatype of $ \mathbf v $ what you expected?

(b) Verify that the type of each attribute matches the one described in the table
using the Python built-in function `type`.

In [22]:
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

B = np.array([[True, False, True],
              [False, True, False]])

v = np.array([1, 2, 3.])

📝 NumPy arrays are designed to store elements of a single datatype, and _these
datatypes are specific to NumPy_; this ensures high performance and a lighter
memory footprint. As an example, on most 64-bit platforms the datatype of the matrix $ A $ above is
`int64`, which represents integers using $ 64 $ bits. In contrast, the built-in
Python type `int` can hold arbitrarily large integers. It is also possible to work
with arrays of arbitrary Python objects by using the datatype `object`. Because
in this case what is stored in the array is only a reference to each object (not
the object itself), the entries don't even need to have the same Python type:

In [23]:
array_of_objects = np.array([1, 'two', 3.0], dtype=object)

print(type(array_of_objects[0]))
print(type(array_of_objects[1]))
print(type(array_of_objects[2]))

<class 'int'>
<class 'str'>
<class 'float'>
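
The fixed width of NumPy's integer types can be inspected with `np.iinfo` and with the `itemsize` attribute of an array. This short sketch contrasts the bounded range of `int64` with the unbounded built-in `int`:

```python
import numpy as np

info = np.iinfo(np.int64)
print(info.max)                # 9223372036854775807
print(info.max == 2**63 - 1)   # True

# Each int64 element occupies exactly 8 bytes:
print(np.array([1, 2, 3], dtype=np.int64).itemsize)   # 8

# A built-in Python int is not limited to 64 bits:
big = 2**100
print(big)                     # 1267650600228229401496703205376
```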


__Exercise:__ If an array has shape $ (2, 3, 4) $, what is its size? More generally, what is the size of an array of shape $ (n_1, n_2, \cdots, n_d) $?

__Exercise:__ A _square matrix_ has the same number of rows and columns. Write a
function `is_square(matrix)` that accepts a 2D array as its argument and returns
`True` or `False` depending on whether the given matrix is square or not. How
would you generalize to multidimensional arrays?

__Exercise:__ Write a Python function to determine whether a given matrix $ A $ is
symmetric (i.e., whether $ A^T = A $), anti-symmetric ($ A^T = -A $) or neither.
Does your function work correctly when $ A $ is not square?

_Hint:_ When applied to two arrays of
the same shape, the `==` operator performs an element-wise comparison and
returns a Boolean array of the same shape, rather than a single `True` or `False`.
To check whether two arrays $ A $ and $ B $ have the same shape and the same
elements, use `np.array_equal(A, B)` instead.

In [24]:
a = np.array([1, 2])
b = np.array([1, 2])
a == b

array([ True,  True])
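
To see why `np.array_equal` is the safer test, the sketch below compares it with the element-wise `==`. The extra array `c` (same entries, different shape) is introduced only for illustration:

```python
import numpy as np

a = np.array([1, 2])
b = np.array([1, 2])
c = np.array([[1, 2]])        # same entries, but shape (1, 2)

print(np.array_equal(a, b))   # True: same shape and same elements
print(np.array_equal(a, c))   # False: the shapes differ
print((a == c).all())         # True: broadcasting hides the shape mismatch

# Using the element-wise result directly as a condition is ambiguous:
raised = False
try:
    if a == b:
        pass
except ValueError as err:
    raised = True
    print("ValueError:", err)
```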

__Exercise:__ What is the transpose of a $ 1D $ array? What about $ 3D $ arrays?
In general, what would be the most reasonable definition of the transpose of an
$ n $-dimensional array?

## $ \S 2 $ Array methods

Besides attributes, objects of a certain class usually come with predefined
**methods**. _Methods are functions associated with each instance of the class which
have direct access to that instance's state_. The
syntax for calling a method named `f` of object `x` is `x.f(<arguments>)`. For
instance, the `sum` method of an array returns the sum of all of
its entries:

In [25]:
C = np.array([[-1.0, 2.3, 3.7],
              [-4.5, 2.7, -0.7]])
print(C.sum())

2.5


As an optional argument to `sum`, we can designate an axis over which the sum should take place. As always in Python, indexing is zero-based: for the matrix $ C $ above, axis $ 0 $ is indexed by the row number and runs vertically, while axis $ 1 $ is indexed by the column number and runs horizontally.

In [26]:
print(C.sum(axis=0))

[-5.5  5.   3. ]


If we think of $ C $ as the matrix $ C = (c_{ij}) $, where $ i $ is the index
for axis $ 0 $ (i.e., the row index) and $ j $ is the index for axis $ 1 $, then
taking the sum along axis $ 0 $ means that for each $ j $, NumPy computes
$ \sum_{i} c_{ij} $. The result is the vector above, which has three entries
because $ C $ has three columns. To put it another way, the axis specified as the
argument to `sum` is the one that gets collapsed, in this case by the summation.
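
The idea that the chosen axis is collapsed holds for arrays with any number of dimensions. The sketch below, using an arbitrary $ 3D $ array of ones, checks only the resulting shapes; the `keepdims` argument keeps the collapsed axis with length one instead of removing it:

```python
import numpy as np

T = np.ones((2, 3, 4))                      # an arbitrary 3D array of ones

print(T.sum(axis=0).shape)                  # (3, 4): axis 0 is collapsed
print(T.sum(axis=1).shape)                  # (2, 4): axis 1 is collapsed
print(T.sum(axis=(0, 2)).shape)             # (3,): several axes at once
print(T.sum(axis=1, keepdims=True).shape)   # (2, 1, 4): kept with length one
```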

__Exercise:__ Compute the sum of the entries of $ C $ along the column index.

The main array methods that involve mathematical operations are:

| Method    | Description                                          |
|-----------|------------------------------------------------------|
| `sum`     | Returns the sum of all the elements in the array.   |
| `prod`    | Returns the product of all the elements in the array.|
| `mean`    | Returns the arithmetic mean of the array.           |
| `std`     | Returns the standard deviation of the array.        |
| `var`     | Returns the variance of the array.                  |
| `min`     | Returns the minimum value of the array.             |
| `max`     | Returns the maximum value of the array.             |
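| `argmin`  | Returns the index of the minimum value in the flattened array, or the indices of the minima along a given axis. |
| `argmax`  | Returns the index of the maximum value in the flattened array, or the indices of the maxima along a given axis. |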
| `cumsum`  | Returns the cumulative sum of the elements along a given axis. |
| `cumprod` | Returns the cumulative product of the elements along a given axis. |


_All of these accept an `axis` parameter specifying the axis along which to
operate_ (for `sum`, `prod`, `mean`, `std`, `var`, `min` and `max`, a tuple of
axes is also allowed). They can also be applied to subarrays selected by slicing.

In [27]:
A = np.array([[1, 2, 9],
              [4, 8, 6],
              [7, 5, 3]])

# Maximum of each row, i.e., along axis 1:
max_rows = A.max(axis=1)
print(f"Maximum of each row: {max_rows}\n")

# Mean (average) of each column, i.e., along axis 0:
mean_cols = A.mean(axis=0)
print(f"Mean (average) of each column: {mean_cols}\n")

# Standard deviation of each row, i.e., along axis 1:
std_rows = A.std(axis=1)
print(f"Standard deviation of each row: {std_rows}\n")

# Cumulative sum along each row, i.e., along axis 1:
cumulative_sum = A.cumsum(axis=1)
print(f"Cumulative sum along each row:\n{cumulative_sum}\n")

# Index of the maximum element along each row, i.e., along axis 1:
argmax_row = A.argmax(axis=1)
print(f"Index of the maximum element along each row: {argmax_row}")

Maximum of each row: [9 8 7]

Mean (average) of each column: [4. 5. 6.]

Standard deviation of each row: [3.55902608 1.63299316 1.63299316]

Cumulative sum along each row:
[[ 1  3 12]
 [ 4 12 18]
 [ 7 12 15]]

Index of the maximum element along each row: [2 1 0]
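
When no `axis` is given, `argmin` and `argmax` return a position in the flattened array. This sketch, reusing the matrix $ A $ above, recovers the row and column with `np.unravel_index` and then applies `sum` to a slice:

```python
import numpy as np

A = np.array([[1, 2, 9],
              [4, 8, 6],
              [7, 5, 3]])

# Without `axis`, argmax returns an index into the flattened array:
flat_index = A.argmax()
print(flat_index)           # 2

# np.unravel_index converts it back to a (row, column) pair:
row, col = np.unravel_index(flat_index, A.shape)
print(int(row), int(col))   # 0 2

# The same methods work on subarrays selected by slicing:
print(A[:2, 1:].sum())      # 2 + 9 + 8 + 6 = 25
```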


__Exercise:__ Given a $ 2D $ array `data` of shape $ 100 \times 5 $ representing a
dataset where each row is an entry and each column is a different variable:

(a) Calculate the mean and standard deviation of each variable.

(b) Using `argmin`, identify the variable with the smallest variance, i.e., the one
whose values have the least spread.

In [28]:
np.random.seed(12345)  # Fix the seed for reproducibility (a function call, not an assignment)
data = np.random.randn(100, 5)
print(data[:5])


__Exercise (rolling window):__ A _rolling window average_ (or _moving average_)
is obtained by sliding a "window" of fixed length along the data and computing
the average of the values inside the window at each position. This is useful for
smoothing out time-series data. Using a for loop, calculate the rolling window
average of the following $ 1D $ array of $ 1000 $ elements, with a window of
length $ 50 $.

In [29]:
time_series = np.random.rand(1000)
window_size = 50

## $ \S 3 $ Converting and reshaping arrays

Recall that the `dtype` attribute of an array stores its datatype. We can create
a new array with a different datatype from a given array by using the method
`astype`, provided that the datatype conversion makes sense:

In [31]:
# Let's begin by creating an array of strings:
A = np.array([['1', '-2'],
              ['3', '-4']])
print(A, '\n')

# Convert the datatype to 32 bit integers:
A_int = A.astype("int32")
print(A_int.dtype, '\n', A_int, '\n')

# Convert the datatype to double-precision (64 bit) floating-point numbers:
A_double = A.astype("float64")
print(A_double.dtype, '\n', A_double, '\n')

[['1' '-2']
 ['3' '-4']] 

int32 
 [[ 1 -2]
 [ 3 -4]] 

float64 
 [[ 1. -2.]
 [ 3. -4.]] 
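
What "makes sense" deserves a word of caution. The sketch below, with arrays chosen only for illustration, shows that converting floats to integers silently drops the fractional part (truncating toward zero), while converting a non-numeric string raises an error:

```python
import numpy as np

# Float-to-integer conversion discards the fractional part:
x = np.array([1.7, -1.7, 2.5])
print(x.astype("int32"))   # [ 1 -1  2]

# A string that does not represent a number cannot be converted:
words = np.array(['1', 'two'])
raised = False
try:
    words.astype("int32")
except ValueError as err:
    raised = True
    print("ValueError:", err)
```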



__Exercise:__ What happens if you try to convert a numeric array to one whose
datatype is `bool` (NumPy's type to represent boolean values)? What about the
converse, i.e., from `bool` to, say, `float32`?

📝 Although we have been speaking about the "conversion" from one type to another,
`astype` doesn't actually modify the original array; instead, by default it creates a
_new copy_ of the original array having the prescribed datatype. This happens
even when the target datatype is the same as the original one:

In [40]:
u = np.array([1, 2, 3])
print(f"Original u: {u} of type {u.dtype}")

v = u.astype("int64")
print(f"Original v: {v} of type {v.dtype} \n")

v[0] = -10

print(f"u after modification of v: {u}")
print(f"v after modification of v: {v}")

Original u: [1 2 3] of type int64
Original v: [1 2 3] of type int64 

u after modification of v: [1 2 3]
v after modification of v: [-10   2   3]
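
If a copy is not wanted when the datatype already matches, `astype` accepts the argument `copy=False`. The sketch below uses `np.shares_memory` to check whether two arrays share their data:

```python
import numpy as np

u = np.array([1, 2, 3], dtype=np.int64)

v = u.astype(np.int64)               # default copy=True: always a new array
w = u.astype(np.int64, copy=False)   # dtype already matches, so u itself is returned

print(np.shares_memory(u, v))   # False
print(w is u)                   # True
```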


To prescribe the data type of an array at the moment of its creation, we can
use the `dtype` argument available in almost all array creation functions
provided by NumPy.

In [44]:
# Using np.array:
A = np.array([1, 2, 3, 4], dtype=np.float64)
print(A.dtype)  # Output will be an array of float64 type elements.
print(A, '\n')

# Using np.zeros:
B = np.zeros((2, 2), dtype=np.int32)
print(B.dtype)  # Output will be a 2x2 array of int32 type zeros.
print(B, '\n')

# Using np.ones:
C = np.ones((3,), dtype=bool)
print(C.dtype)  # Output will be an array of bool type ones, i.e., True values.
print(C)

float64
[1. 2. 3. 4.] 

int32
[[0 0]
 [0 0]] 

bool
[ True  True  True]


__Exercise:__ Create arrays 
$$ \mathbf a = (2, 2, 2)\,, \quad \mathbf b = (2, 4, 6) \quad \text{and} \quad
\mathbf c = (2, 3, 4) $$
of type `int32` using `full`, `arange` and `linspace`, respectively.

Reshaping arrays is a common and fundamental operation in NumPy. There is both
a function and a method named `reshape` that can accomplish this:

In [9]:
a = np.array([1, 2, 3, 4, 5, 6])
print(a, end='\n\n')

A = np.reshape(a, (3, 2))  # Here we use the _function_ `reshape`
B = a.reshape((2, 3))   # Here we use the `reshape` _method_

print(A, end='\n\n')  # Here a has been reshaped into a 3 by 2 matrix
print(B, end='\n\n')  # Here a has been reshaped into a 2 by 3 matrix


[1 2 3 4 5 6]

[[1 2]
 [3 4]
 [5 6]]

[[1 2 3]
 [4 5 6]]



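Note that when reshaping an array, _the new shape must be compatible with the
size of the original array_, that is, the product of the new dimensions must equal
the number of elements of the original array. For example, the following raises a
`ValueError`, since `a` has $ 6 $ elements but a $ 2 \times 2 $ array holds only $ 4 $: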

In [None]:
C = np.reshape(a, (2, 2))
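
The compatibility condition can be tested before reshaping. As a sketch, the hypothetical helper `can_reshape` below compares the product of the requested dimensions with `a.size` (it assumes the requested shape contains no $ -1 $ entries), and the `try` block catches the resulting `ValueError`:

```python
import numpy as np
from math import prod

def can_reshape(arr, new_shape):
    """Hypothetical helper: True if `arr` can be reshaped to `new_shape`.
    Assumes `new_shape` contains no -1 entries."""
    return prod(new_shape) == arr.size

a = np.array([1, 2, 3, 4, 5, 6])
print(can_reshape(a, (3, 2)))   # True:  3 * 2 == 6
print(can_reshape(a, (2, 2)))   # False: 2 * 2 != 6

raised = False
try:
    np.reshape(a, (2, 2))
except ValueError as err:
    raised = True
    print("ValueError:", err)
```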

When reshaping an array, we may also specify $ -1 $ in one of the dimensions to
instruct NumPy to infer the number of elements along that dimension from the size
of the array and the lengths of the remaining dimensions (at most one dimension
can be $ -1 $). This is especially useful when an array is passed to us as an
argument in a function call and we do not know in advance how many entries it has:

In [None]:
a = np.array([[1, 2],
              [3, 4]])
A = a.reshape((-1, 1))  # Reshape into a column vector
print(A)

[[1]
 [2]
 [3]
 [4]]


In this example we wanted to reshape our array so that the result would have
one column, but didn't want to figure out how many rows it should have for that
to happen. Here's another example, in which we reshape a $ 1D $ array into a
matrix and then to a row vector:

In [None]:
x = np.arange(1, 13)
X = x.reshape((3, -1))
x_row = X.reshape((1, -1))

print(x, end='\n\n')
print(X, end='\n\n')
print(x_row, end='\n\n')

[ 1  2  3  4  5  6  7  8  9 10 11 12]

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

[[ 1  2  3  4  5  6  7  8  9 10 11 12]]
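
A few more cases, sketched on the same $ 12 $-element array: the $ -1 $ may appear in any position, but the inferred length must be a whole number and only one $ -1 $ is allowed:

```python
import numpy as np

x = np.arange(1, 13)   # 12 elements

print(x.reshape((2, -1)).shape)      # (2, 6): NumPy infers 12 / 2 = 6
print(x.reshape((2, -1, 3)).shape)   # (2, 2, 3): -1 in the middle

# 12 is not divisible by 5, and two unknown dimensions are not allowed:
errors = []
for bad_shape in [(5, -1), (-1, -1)]:
    try:
        x.reshape(bad_shape)
    except ValueError as err:
        errors.append(bad_shape)
        print(bad_shape, "-> ValueError:", err)
```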



📝 There is no essential difference between the function and the method versions
of `reshape`. In both cases, NumPy returns a _new_ array object, while the original
array keeps its shape. However, these operations return a view of the
original array's data whenever possible, meaning that they do not copy the
array's data unless necessary. Thus, _modifications to the data in the reshaped
array can affect the original array and vice versa_. Let's use the previous
example to illustrate this:

In [None]:
X[0, 0] = -23  # Modify the top left element of X
print(x)  # The 0th element of x has also been affected!

[-23   2   3   4   5   6   7   8   9  10  11  12]
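
Whether two arrays share data can be checked with `np.shares_memory`, or by inspecting the `base` attribute of a view. The sketch below rebuilds `x` and `X` from scratch and also shows a case where `reshape` cannot return a view:

```python
import numpy as np

x = np.arange(1, 13)
X = x.reshape((3, -1))

print(X.base is x)              # True: X is a view on the data of x
print(np.shares_memory(x, X))   # True

# Reading the transpose of X in row-major order requires rearranging
# the elements, so NumPy has to copy them:
z = X.T.reshape(-1)
print(z)                        # [ 1  5  9  2  6 10  3  7 11  4  8 12]
print(np.shares_memory(x, z))   # False
```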


To create an independent copy of a NumPy array, we can use the `copy` method.
This method generates a new array object with the same data as the original
array, but stored in a separate memory location.

In [None]:
y = np.arange(3)  # y is the 1D array with entries 0, 1, 2
Y = y.copy().reshape((1, -1))  # Reshape y into an independent 2D row vector
y[0] = 10  # Modify 0th element of y
print(y)
print(Y)  # The 0th element of Y is not affected, since Y is an independent copy

[10  1  2]
[[0 1 2]]


📝 There is also a _function_ `np.copy` that is essentially equivalent to the `copy` method.
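
Both forms can be checked with `np.shares_memory`. In this sketch neither copy shares data with the original, and the copy owns its memory (its `base` is `None`):

```python
import numpy as np

y = np.arange(3)
Y1 = y.copy()      # the method
Y2 = np.copy(y)    # the function

print(np.shares_memory(y, Y1), np.shares_memory(y, Y2))   # False False
print(np.array_equal(Y1, Y2))                             # True
print(Y1.base is None)                                    # True: Y1 owns its data
```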

The `flatten` method takes a multi-dimensional array and returns a new,
independent _one-dimensional_ array containing all of the elements of the original
array, while preserving their order. 

In [None]:
A = np.array([[1, 2],
              [3, 4]])
a = A.flatten()
print(a)

[1 2 3 4]
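
NumPy also provides the method `ravel`, which produces the same one-dimensional sequence but, like `reshape`, returns a view whenever it can. The difference matters when the result is modified, as this sketch shows:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

a_flat = A.flatten()   # always an independent copy
a_rav = A.ravel()      # a view whenever possible

print(np.shares_memory(A, a_flat))   # False
print(np.shares_memory(A, a_rav))    # True

a_rav[0] = 100
print(A[0, 0])                       # 100: modifying the view changed A
print(a_flat[0])                     # 1: the copy is unaffected
```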


By default (row-major order, `order='C'`), the elements are placed in the flattened array according to
the lexicographic ordering of their indices in the original array. For example,
if we are dealing with a $ 3D $ array, then the entry at position $ (0, 0, 2) $
comes before the entry at $ (0, 1, 0) $, which will be placed before the entry
at $ (1, 0, 0) $.

In [None]:
A = np.arange(6).reshape((2, 3))
print(A)
a = A.flatten()
print(a)

[[0 1 2]
 [3 4 5]]
[0 1 2 3 4 5]
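
The lexicographic rule can be checked against `np.ndindex`, which iterates over all index tuples of a given shape in that same order. The sketch also uses the optional `order='F'` argument of `flatten`, which instead lets the first index vary fastest (column-major order):

```python
import numpy as np

A = np.arange(6).reshape((2, 3))

# np.ndindex yields (0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2):
lexicographic = [int(A[idx]) for idx in np.ndindex(A.shape)]
print(lexicographic)           # [0, 1, 2, 3, 4, 5]
print(A.flatten().tolist())    # [0, 1, 2, 3, 4, 5]

# Column-major order:
print(A.flatten(order='F'))    # [0 3 1 4 2 5]
```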


__Exercise:__ The $ 2D $ array of shape $ 28 \times 28 $ in the code cell below
represents a grayscale image. Write a function to rotate this image 90 degrees
clockwise without using any special image processing libraries.

_Hint_: This can be accomplished by first transposing the array and then
flipping it horizontally (i.e., reversing the order of
the entries along each row) with an appropriate slice. This is illustrated in the $ 3 \times 3 $ example
below, but the same idea works for matrices of any size.
$$
\begin{bmatrix}
a & b & c \\
d & e & f \\
g & h & i \\
\end{bmatrix}
\overset{\text{transpose}}{\longrightarrow}
\begin{bmatrix}
a & d & g \\
b & e & h \\
c & f & i \\
\end{bmatrix}
\overset{\text{flip hor.}}{\longrightarrow}
\begin{bmatrix}
g & d & a \\
h & e & b \\
i & f & c \\
\end{bmatrix}
$$

In [14]:
image = np.random.randint(0, 256, size=(28, 28))  # Random pixel intensities from 0 to 255
print(image[:5, :5])

[[109  40 228 147 149]
 [127 218  59  33  29]
 [ 82 182  78  36 143]
 [ 39  85  17 189 133]
 [ 52 129 112 134 200]]


__Exercise:__ Given a $ 1D $ array of $ 100 $ elements, reshape it into a $ 10
\times 10 $ matrix. Then, normalize the matrix so that all values are scaled to
lie between $ 0 $ and $ 1 $ (inclusive). _Hint:_ Determine the maximum $ M $ and
minimum $ m $ of all entries, write a linear function $ f $ that maps $ [m, M] $ onto $ [0, 1] $
and then apply it to the entire array.

Some additional methods and functions dealing with array shapes are:
* `A.squeeze()` method: Removes axes of length one from array $ A $. For example, an array
  of shape $ (1, 3, 1) $ becomes $ (3,) $ after squeezing.
* `A.swapaxes(axis1, axis2)` method: Swaps two axes of an array. It returns a
  view (not a copy) of the original array.
* `A.transpose(*axes)` method: Reorders the axes of an array according to
  `axes`. If no axes are provided, it reverses the order of the axes.
* `np.expand_dims(A, axis=i)` function: Inserts a new axis of length one at
  position $ i $, increasing the number of dimensions by one.


In [2]:
a = np.array([1, 2, 3])
print(a, '\n')

B = np.expand_dims(a, axis=1)  # Insert a new axis at position 1
print(B)
print(B.shape)

[1 2 3] 

[[1]
 [2]
 [3]]
(3, 1)
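
Indexing with `np.newaxis` (an alias for `None`) is a common shorthand for `expand_dims`, and `squeeze` reverses the operation. A short sketch:

```python
import numpy as np

a = np.array([1, 2, 3])

B1 = np.expand_dims(a, axis=1)   # shape (3, 1)
B2 = a[:, np.newaxis]            # inserts the same new axis via indexing

print(B1.shape, B2.shape)        # (3, 1) (3, 1)
print(np.array_equal(B1, B2))    # True

# squeeze removes the length-one axis again:
print(B1.squeeze().shape)        # (3,)
```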


In [8]:
A = np.random.randn(2, 3, 4)  # Array of shape 2x3x4 having random elements
print(A.shape)
print(A, '\n') 

# Reorder the axes (0, 1, 2) into the new order (1, 2, 0):
B = A.transpose(1, 2, 0)  # The new shape is 3x4x2
print(B.shape)
print(B)

(2, 3, 4)
[[[-0.39656662  0.8937953   0.72287123 -0.78803847]
  [ 1.37170286 -0.3461183   0.94240061 -1.40695644]
  [-1.9162974  -0.95437456  0.64405515 -0.67456699]]

 [[-1.30497949 -0.05582722 -0.81778443 -0.83087778]
  [-1.12070382  0.62800488 -1.60885764 -0.39301776]
  [-1.00715499  0.20393228 -1.70642381  0.42211023]]] 

(3, 4, 2)
[[[-0.39656662 -1.30497949]
  [ 0.8937953  -0.05582722]
  [ 0.72287123 -0.81778443]
  [-0.78803847 -0.83087778]]

 [[ 1.37170286 -1.12070382]
  [-0.3461183   0.62800488]
  [ 0.94240061 -1.60885764]
  [-1.40695644 -0.39301776]]

 [[-1.9162974  -1.00715499]
  [-0.95437456  0.20393228]
  [ 0.64405515 -1.70642381]
  [-0.67456699  0.42211023]]]
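
In the example above, axis $ 0 $ of `B` is axis $ 1 $ of `A`, axis $ 1 $ of `B` is axis $ 2 $ of `A`, and axis $ 2 $ of `B` is axis $ 0 $ of `A`, so `B[i, j, k] == A[k, i, j]` for all valid indices. The sketch below checks this on a fresh random array and compares `swapaxes` with `transpose` called without arguments:

```python
import numpy as np

A = np.random.randn(2, 3, 4)
B = A.transpose(1, 2, 0)   # shape (3, 4, 2)

# Check B[i, j, k] == A[k, i, j] for every index tuple of B:
matches = all(B[i, j, k] == A[k, i, j] for i, j, k in np.ndindex(B.shape))
print(matches)                  # True

# swapaxes exchanges just two axes; transpose() reverses all of them:
print(A.swapaxes(0, 2).shape)   # (4, 3, 2)
print(A.transpose().shape)      # (4, 3, 2)
```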


__Exercise:__ For each of these methods/functions, verify whether the result is a view or a copy of the original
array by applying it to an array $ A $, assigning the result to $ B $, modifying an entry of $ A $ and then
checking whether $ B $ was affected.
