# Introduction to NumPy arrays

## $ \S 1 $ Motivation

Suppose that we wish to represent vectors such as $ \mathbf{u}
= (1, 2, 3) $ or $ \mathbf{v} = (-1, 0, 1) $ in Python. It is natural to think
that either lists or tuples might be a good choice for this task.

In [None]:
u = [1, 2, 3]   # Create a list whose elements are 1, 2, 3
v = [-1, 0, 1]

However, at some point we will probably wish not only to store, but to manipulate
these vectors. For instance, how can we add $ \mathbf u $ and $ \mathbf v $ or
take a multiple of $ \mathbf v $? It is reasonable to try the following code:

In [None]:
s = u + v
multiple = 3 * v

print(s)
print(multiple)

These unexpected results can be explained by recalling that for either lists or
tuples (or strings), the `+` operator denotes _concatenation_, not addition; and
accordingly, `*` denotes _repetition_, not multiplication. This behavior is not
so strange at all if we take into account that lists and tuples are _generic_
sequential types, capable of holding objects of arbitrary types, for which
addition and multiplication might not make sense.
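
For comparison, here is a minimal sketch of how elementwise addition and scalar multiplication can be done with plain lists, using `zip` and list comprehensions (the variable names are only illustrative). It works, but the loop has to be spelled out by hand every time. That inconvenience is exactly what NumPy removes.

```python
u = [1, 2, 3]
v = [-1, 0, 1]

# Pair up corresponding entries with zip and add them one by one
s = [a + b for a, b in zip(u, v)]

# Multiply every entry of v by the scalar 3
multiple = [3 * b for b in v]

print(s)         # [0, 2, 4]
print(multiple)  # [-3, 0, 3]
```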

__Exercise:__ What happens if you represent $ \mathbf u $ and $ \mathbf v $ as
tuples and try to take their difference $ \mathbf u - \mathbf v $? What if they
are represented as lists?

Vectors and matrices are fundamental objects in engineering, science
and machine learning. There is thus a clear need for a library that
extends Python by providing efficient ways to operate on these objects.

## $ \S 2 $ Arrays

<img src="notebook_1_NumPy.png" width="105" height="38" alt="NumPy">, which stands for _Numerical Python_, is a foundational package
for scientific computing in Python. It is almost universally imported with the
`np` alias, as follows:

In [None]:
import numpy as np

We could also import every object/function in NumPy with:

In [None]:
from numpy import *

We would thereby avoid having to type the `np.` prefix every time. However,
this is not recommended, because it may lead to conflicts with names in pure
Python (for example `max` and `min`) or with those from other modules (such as
`exp` and `sqrt` from the `math` module).

The central feature in NumPy is a new data structure called **ndarray** (an
abbreviation of $n$-dimensional array), or simply **array**. An array is
essentially a multi-dimensional table. For example, a $ 1 $-dimensional array is
much like a list: it is just a row of data. A $ 2 $-dimensional array can be
seen as a spreadsheet or matrix, and a $ 3 $-dimensional array as a stack of
tables, for example one having the shape of a cube.

An ndarray is a grid of values _of the same datatype_; in other words, arrays
must be **homogeneous**. In most applications this datatype is numerical (say, the
elements of the array are all integers or all floating-point numbers). However,
it is also possible to create an array whose elements are booleans or strings,
for example.
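
As a quick preview of homogeneity (the `array` function used here is introduced just below), the sketch creates an array from an integer and a string. NumPy does not store the two types side by side; it converts every element to a common type, here a Unicode string type. The name `mixed` is only illustrative.

```python
import numpy as np

mixed = np.array([1, 'a'])   # an integer and a string
print(mixed)                  # ['1' 'a']: the integer became the string '1'
print(mixed.dtype.kind)       # U (Unicode string)
```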

A $ 1 $-dimensional numerical array is similar to a vector in the sense of
Linear Algebra, as in the discussion in $ \S 1 $.  Arrays can be instantiated
with the `array` function:

In [None]:
u = np.array([1, 2, 3])     # Calling `array` on the list [1, 2, 3]
print(u)
print(f"The type of an array such as u is: {type(u)}")

Note the absence of commas separating the entries of the array when it is displayed
(in contrast to lists).

__Exercise:__ Print the list `primes = [2, 3, 5, 7]` and its type. Then generate
an array `primes_arr` from this list and print it, together with its type.

__Exercise:__ Define the vector
$ \mathbf a = \frac{1}{2} \big(1, 1, 1, 1 \big) \in \mathbb R^4 $ as a NumPy array.
Compute the length (norm) of $ \mathbf a $ by hand and compare your answer to
the result obtained through the function `np.linalg.norm`.

__Exercise:__ Print and determine the types of the following arrays.

In [None]:
b = np.array([True, False, True, False])
s = np.array(list("test"))

📝 To recap, `ndarray` is the official name of the data type provided by NumPy, and `array` is both
the informal name of this data type and the name of the function that we can use
to create ndarrays.

The argument of the `array` function can be a list, a tuple, a range, another
array or any array-like object.

In [None]:
v = np.array((-1, 0, 1))    # The argument of `array` can also be a tuple
print(v)
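
More generally, each kind of argument listed above produces the same array when it holds the same values. A small sketch (variable names are illustrative; `np.array_equal` checks that two arrays have the same shape and elements):

```python
import numpy as np

from_list = np.array([1, 3, 5])
from_tuple = np.array((1, 3, 5))
from_range = np.array(range(1, 6, 2))    # range(1, 6, 2) yields 1, 3, 5
from_array = np.array(from_list)         # builds a new array (a copy) by default

print(np.array_equal(from_list, from_tuple))   # True
print(np.array_equal(from_list, from_range))   # True
print(np.array_equal(from_list, from_array))   # True
```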

__Exercise:__ Can one convert a string such as "hello" directly to an array of characters with `array`? Try it below.

Notice that the type of an array as a whole is always the same (`numpy.ndarray`)
and should not be confused with its __datatype__, which is the type of the
_elements_ it holds. We can determine the datatype of an array $ \mathbf a $ through `a.dtype`:

In [None]:
v.dtype

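As we can see, the array $ \mathbf v = (-1, 0, 1) $ defined above holds $ 64 $-bit integers (`int64`). The default integer size is platform-dependent, however: with NumPy versions before 2.0 on Windows, for instance, it is $ 32 $ bits.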

__Exercise:__ Determine the datatype of the arrays $ \mathbf b $ and
$ \mathbf c $ defined below.

In [None]:
b = np.array([True, False, True, False])
c = np.array((1, 2, 3.))
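
The datatype is normally inferred from the elements, but it can also be requested explicitly through the `dtype` argument of `array`, or changed afterwards with the `astype` method, which returns a new array and leaves the original untouched. A short sketch with illustrative names:

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = np.array([1, 2, 3], dtype=float)   # ask for floating-point entries
converted = ints.astype(np.float64)         # new float array; `ints` is unchanged

print(floats, floats.dtype)        # [1. 2. 3.] float64
print(converted, converted.dtype)  # [1. 2. 3.] float64
print(ints.dtype.kind)             # i (signed integer)
```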

We may also convert an _existing_ list, tuple or range to an
array:

In [None]:
odds = range(1, 11, 2)  # Create a range containing odd numbers from 1 to 9
print(np.array(odds))   # Create an array from `odds` and print it

## $ \S 3 $ Multi-dimensional arrays

The number of dimensions of an array is also called its
**rank**. A $ 2 $-dimensional array, or array of rank $ 2 $, can be seen
as a table or matrix.

In [None]:
A = np.array([[1, 1, 1, 1],   # first row of matrix A
              [2, 2, 2, 2],   # second row
              [3, 3, 3, 3]])  # third row
print(A)
print(type(A))  # Print the type of object A

Notice the use of _double_ brackets here: the first opening bracket `[`
delimits the array itself, while the second one delimits the elements
of the first row. The rows are separated by commas, as are the elements within each row.

📝 In the code of the preceding example, each row of the matrix appears on a
line by itself to improve legibility, but the newline characters do not
delimit the rows (the commas and brackets do).
Hence, the following produces the same result:

In [None]:
A = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]])
print(A)

__Exercise:__ Create and display the matrix
$$
    T = \begin{bmatrix}
    \phantom{-}0 & \phantom{-}1 & \phantom{-}0 \\
    -1 & \phantom{-}0 & \phantom{-}1 \\
    \phantom{-}0 & -1 & \phantom{-}0
    \end{bmatrix}
$$

__Exercise:__ Create a $ 3\times 3 $ array to represent a tic-tac-toe board.
Use the string `' '` (single whitespace) for an empty cell, and the strings
`'X'` and `'O'` for the pieces placed by each of the two players.
Set up the board for the following game state:

<img src="notebook_1_tic_tac_toe.png" width="200">

In [None]:
game_board = np.array(...)

print(game_board)

⚡ Strictly speaking, NumPy arrays are always rectangular, but we can mimic a
_jagged_ array (one whose rows have different lengths) by passing
`dtype=object`, as in the example below; without it, recent versions of NumPy
raise an error for such input. Jagged arrays of this kind lose most of NumPy's
performance benefits, which rely on consistent shapes for efficiency.

In [None]:
jagged = np.array([
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9]
], dtype=object)
print(jagged)

Notice that this array is storing Python lists, rather than a
contiguous block of memory with integers.
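
One way to confirm this is to inspect the shape and datatype. In the sketch below (which repeats the definition above so the cell is self-contained), NumPy sees only one dimension of length $ 3 $, and each element is an ordinary Python list:

```python
import numpy as np

jagged = np.array([[1, 2, 3], [4, 5], [6, 7, 8, 9]], dtype=object)

print(jagged.shape)      # (3,): a 1D array with three entries
print(jagged.dtype)      # object
print(type(jagged[1]))   # <class 'list'>
```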

The __shape__ of an array is a _tuple_ of integers indicating the size of each of
its dimensions. The array $ A $ below has shape $ (3, 4) $ since it has three
rows and four columns.

In [None]:
A = np.array([[42, 17, 99, 3], [-2, -3, -5, -7], [0, 0, 0, 0]])
print(A)
print(A.shape)  # Print the shape of A
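
Two related attributes are often handy: `ndim` gives the rank (the number of entries in the shape tuple) and `size` gives the total number of elements (the product of the entries in the shape). A sketch for the $ 3 \times 4 $ array above:

```python
import numpy as np

A = np.array([[42, 17, 99, 3], [-2, -3, -5, -7], [0, 0, 0, 0]])

print(A.ndim)   # 2, the rank, equal to len(A.shape)
print(A.size)   # 12 = 3 * 4 elements in total
```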

__Exercise:__ What is the shape of a one-dimensional array, for instance the
array $ \mathbf b $ below? Can you explain the result of `b.shape` (why is there
a comma when you print it)?

In [None]:
b = np.array([True, False, False, True, False])

__Exercise:__ How would you create the matrix
$$
\mathbf B = \begin{bmatrix}
b_{11} & b_{12} \\
b_{21} & b_{22} \\
b_{31} & b_{32} \\
b_{41} & b_{42}
\end{bmatrix}
$$
where $ b_{ij} = i \cdot j $?

⚡ __Exercise:__ Recall the matrix $ \mathbf{B} = \big(b_{ij}\big) = (i \cdot j) $ from
the previous exercise. Generate the entries $ b_{ij} $ as a list of lists and
then convert it to an array. _Hint:_ Use a double list comprehension, of the
form `[[... for j in ...] for i in ...]`.

A $ 3D $ array is to a matrix as a solid block is to a rectangle. In other words,
a rank $ 3 $ array is one having three axes instead of just two.

<img src="notebook_1_array_3D.png" alt="drawing" width="400"/>

__Exercise:__ What is the rank and shape of the array depicted above?

Here's a concrete example of a $ 3D $ array of shape $ 2 \times 2 \times 2 $.
Think of it as an array having $ 2 $ layers, each of which is a $ 2 \times 2 $
matrix.

In [None]:
A = np.array([[[1, 2],   # The first layer is a 2x2 matrix
               [3, 4]], 

              [[5, 6],   # The second layer is also a 2x2 matrix
               [7, 8]]])
print(A)

__Exercise:__ What happens if you delete the blank line inside the definition of $ A $ above?
What happens if you insert an additional blank line?

Note that a $ 3D $ array need not be a cube (i.e., have all three dimensions
of the same length) as in the previous example.

__Exercise:__ Build a rank three array of shape $ (2, 3, 4) $. 
_Hint:_ Think of this as a pair of $ 3 \times 4 $ matrices.

There is no bound on the number of dimensions that an array can have. Even
though most applications only require arrays of ranks $ 1 $ and $ 2 $,
higher-dimensional arrays do arise naturally in some areas.  For instance, in
image processing, RGB pictures can be seen as $ 3D $ arrays with dimensions for
height, width, and color channels (red, green, blue). That is, an image such as
the one below can be decomposed into three $ 2D $ arrays, each representing the
intensity of one color at every pixel.

<img src="notebook_1_RGB_separation.jpg" alt="RGB separation" width="200" height="600">


Similarly, video processing uses $ 4D $ arrays (time, channels, height, width)
and machine learning often uses multi-dimensional arrays.
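
Returning to the RGB example, here is a minimal sketch of the decomposition into color channels. It builds a tiny synthetic $ 2 \times 3 $ "image" with shape (height, width, 3), assuming a channels-last layout with red, green, blue in that order and intensities stored as floats in $ [0, 1] $; other conventions (such as channels first) are also common. It uses slicing (`img[:, :, 0]` selects the first channel of every pixel), which we have not covered yet. The point is only that each channel is itself a $ 2D $ array.

```python
import numpy as np

height, width = 2, 3
img = np.zeros((height, width, 3))   # every pixel starts out black

img[:, :, 0] = 1.0   # turn the red channel fully on: every pixel is pure red

red = img[:, :, 0]     # 2D array of red intensities
green = img[:, :, 1]   # 2D array of green intensities

print(img.shape)   # (2, 3, 3)
print(red.shape)   # (2, 3)
print(red)
print(green)
```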

## $ \S 4 $ Other ways to create arrays

### $ 4.1 $ Filling arrays automatically

There are other ways of creating arrays that are often more convenient than through use of the `array` function. For instance,
to instantiate an array of a desired shape filled with $ 0\text{s} $, we can use the function `zeros`:

In [None]:
Z = np.zeros((2, 5))  # The argument of `zeros` is the shape you want
print(Z)

📝 Note the double parentheses in this call. The outer pair
delimits the arguments of the function `zeros`, while the inner pair
specifies the shape, which is a tuple. For $ 1D $ arrays there is a shortcut:
the shape may also be given as a single integer. Thus, if we want a vector having,
say, $ 10 $ coordinates equal to $ 0 $, we can use either `np.zeros((10,))` or `np.zeros(10)`.

In [None]:
origin = np.zeros(3)
print(origin)

Note that by default, the resulting array has floating-point datatype.
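
A short sketch confirming both points: an integer and a one-element tuple give the same shape, the default datatype is `float64`, and the `dtype` argument of `zeros` can request integers instead (variable names are illustrative).

```python
import numpy as np

a = np.zeros(10)       # shape given as a plain integer
b = np.zeros((10,))    # shape given as a one-element tuple
print(a.shape, b.shape)   # (10,) (10,)
print(a.dtype)            # float64, the default

c = np.zeros((2, 3), dtype=int)   # request integer zeros instead
print(c)
print(c.dtype.kind)               # i
```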

__Exercise:__ Create and print a $ 3D $ array of shape $ (3, 4, 5) $ filled with zeros.

Arrays can also be automatically populated with $ 1\text{s} $ by means of the function `ones`: 

In [None]:
U = np.ones((2, 3))
print(U)

Generalizing `zeros` and `ones`, the `full` function creates an array of a
specified shape, filled entirely with a prescribed value:

In [None]:
# Create a 3x5 array where every element equals 3.14:
P = np.full((3, 5), 3.14)
print(P)
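
To make the relationship with `zeros` concrete: filling with `0.0` through `full` gives the same array as `zeros`. Note also that, unless `dtype` is given, the datatype of the result follows the fill value, so an integer fill value produces an integer array. A sketch with illustrative names:

```python
import numpy as np

# Filling with 0.0 reproduces `zeros`
print(np.array_equal(np.full((2, 3), 0.0), np.zeros((2, 3))))   # True

# Without `dtype`, the datatype follows the fill value
sevens = np.full((2, 2), 7)
print(sevens.dtype.kind)   # i: an integer fill value gives integers

halves = np.full((2, 2), 0.5)
print(halves.dtype)        # float64
```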

__Exercise:__ Create a $ 1D $ array having $ 50 $ coordinates, all of them equal to $ 1 $, in two different ways.

We can also create a new array of specified shape and data type, but without
initializing the entries, by means of the function `empty`. The uninitialized
values are whatever happens to be in the allocated memory location at that time,
which can lead to unpredictable results if the array is used without first
assigning proper values. On the other hand, this approach can be more efficient,
since it avoids the computational cost of setting each element to some default
value only to modify it later.

In [None]:
# Create an uninitialized array:
uninitialized_array = np.empty((2, 2), dtype=float)
print("Uninitialized array:")
print(uninitialized_array)

# Initialize the array with new values:
uninitialized_array.fill(5.0)
uninitialized_array[0, 1] = uninitialized_array[1, 0] = 3
print("\nAfter initialization:")
print(uninitialized_array)

Uninitialized array:
[[5. 5.]
 [5. 5.]]

After initialization:
[[5. 3.]
 [3. 5.]]


Finally, we can create a new array _of the same shape and datatype_ as a
given array $ A $ but filled with zeros, ones, some other specified value or
uninitialized garbage values with the functions `zeros_like`, `ones_like`,
`full_like` and `empty_like`, respectively:

In [None]:
A = np.array([[1, 2, 3],
              [4, 5, 6]])

Z = np.zeros_like(A)
print(Z, '\n')

T = np.full_like(A, 3)
print(T)
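
Because these functions copy the datatype of $ A $, the fill value is converted to that datatype. The sketch below shows a consequence for the integer array $ A $ above: filling with `2.5` silently yields `2`s, unless the datatype is overridden with the `dtype` argument.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

truncated = np.full_like(A, 2.5)               # A holds integers, so 2.5 becomes 2
print(truncated)

as_float = np.full_like(A, 2.5, dtype=float)   # override the inherited datatype
print(as_float)

ones_copy = np.ones_like(A)
print(ones_copy.shape, ones_copy.dtype == A.dtype)   # (2, 3) True
```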

### $ 4.2 $ Generating sequences with `arange`

The `arange` function is much like the Python built-in `range`, but it returns an array:

In [None]:
digits = np.arange(10)
print(type(digits), digits)

The full syntax is `arange(<start>, <stop>, <stride>)`. Note that the starting
value is included, but the stopping value is not (exactly as in vanilla `range`).
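
For integer arguments, the parallel with `range` can be checked directly. This sketch compares `arange` with an array built from the corresponding `range` for a few arbitrarily chosen argument tuples:

```python
import numpy as np

for args in [(10,), (4, 10), (4, 10, 2), (1, 20, 3)]:
    same = np.array_equal(np.arange(*args), np.array(range(*args)))
    print(args, same)   # True for each of these integer arguments
```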

In [None]:
y = np.arange(4, 10, 2)
print(y)

One advantage of `arange` over `range` is that _it accepts arguments of type float_. For instance:

In [None]:
x = np.arange(0.1, 1, 0.1)
print(x)

However, this feature must be used with caution, because sometimes rounding
errors may lead to unexpected results, as in the following example:

In [None]:
print(np.arange(1, 1.3, 0.1))
print("Note that the value 1.3 was included!")
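
Why does this happen? The NumPy documentation states that the length of the result of `arange` is $ \lceil (\text{stop} - \text{start}) / \text{step} \rceil $. In floating-point arithmetic that quotient comes out slightly above $ 3 $, so its ceiling is $ 4 $ and a fourth value (approximately $ 1.3 $) is generated. A sketch of the arithmetic, followed by one common workaround (step through integers, then divide):

```python
import math
import numpy as np

quotient = (1.3 - 1) / 0.1
print(quotient)              # slightly more than 3, due to rounding
print(math.ceil(quotient))   # 4, the number of values arange produces

print(len(np.arange(1, 1.3, 0.1)))   # 4

# Workaround: generate integers, then scale them
safe = np.arange(10, 13) / 10
print(safe)                  # [1.  1.1 1.2]
```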

__Exercise:__ For each item, create a $ 1D $ array containing the elements described:

(a) The sequence of even numbers between $ 2 $ and $ 19 $.

(b) All integers from $ 10 $ down to $ 1 $.

(c) All integers from $ 5 $  to $ 15 $ (inclusive), but represented as
floating-point numbers.

(d) The numbers from $ -3.14 $ to $ 2.86 = -3.14 + 6 $, with a stride of $ 2 $.

### $ 4.3 $ Generating sequences with `linspace` and `logspace`

With `linspace` we can construct an array containing evenly spaced
values inside a specified interval. The syntax is similar to that of `arange`,
except that _the stop value in the second argument is included_ in the result, and
_the third argument gives the number of values to be generated_, instead of the
step size:

In [None]:
z = np.linspace(0, 10, 11)
print(z, '\n')

w = np.linspace(0, 10, 10)
print(w)
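
To see exactly how far apart the generated values are, `linspace` can also return the step it used, via its `retstep=True` argument. The sketch compares the two calls above:

```python
import numpy as np

z, z_step = np.linspace(0, 10, 11, retstep=True)
w, w_step = np.linspace(0, 10, 10, retstep=True)

print(z_step)   # 1.0
print(w_step)   # about 1.111, so these values do not land on integers
print(w[-1])    # 10.0: the stop value is included
```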

__Exercise:__ Dividing the interval $ [0, 1] $ into three equal parts, we obtain three subintervals of length $ \frac{1}{3} $. How many subdivision points are needed? Use `linspace` to obtain them.

__Exercise:__ How many bounded intervals are determined by $ n + 1 $ equally spaced points on a line?

To recap, the syntax is `linspace(<start>, <stop (inclusive)>, <# elements>)`.
We can exclude the stop value, making `linspace` behave more like `arange`, by
passing `endpoint=False`:

In [None]:
print(np.linspace(0, 10, 10, endpoint=False))
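
With `endpoint=False`, the interval is split into `num` equal parts and the stop value is dropped. In the example above the step is therefore exactly $ 1 $, and the result agrees with `arange(10)`, except that it holds floats:

```python
import numpy as np

a = np.linspace(0, 10, 10, endpoint=False)
b = np.arange(10)

print(np.allclose(a, b))       # True
print(a.dtype, b.dtype.kind)   # float64 i
```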

__Exercise:__ What happens to the result of `linspace` if the starting value is greater than the stopping value? What if they are equal? And what if the third argument is zero or negative? 

The `logspace` function generates points that are evenly spaced on a
_logarithmic_ scale. For example:

In [None]:
powers = np.logspace(-1, 2, 4)
print(powers)

Formally, `logspace(start, stop, num)` creates an array of values where each
element is calculated as:
$$
    10^{\text{start}} \times 10^{\,i \times \frac{\text{stop}-\text{start}}{\text{num}-1}}
    \quad \text{for $ i = 0, 1, 2, ..., \text{num} - 1 $}\,.
$$
Here `num` is the number of elements in the resulting array.  Thus, it is the
_exponents_ of the numbers in the sequence that are evenly spaced, instead of
the numbers themselves.
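
In other words, `logspace` amounts to applying `linspace` to the exponents and then raising $ 10 $ to those exponents. A quick numerical check for the example above:

```python
import numpy as np

powers = np.logspace(-1, 2, 4)
exponents = np.linspace(-1, 2, 4)   # -1, 0, 1, 2: evenly spaced exponents

print(exponents)
print(np.allclose(powers, 10.0 ** exponents))   # True
```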

__Exercise:__ Create a logarithmically spaced array from $ 10^{-2} $ to $ 10^2 $
with $ 20 $ points. Calculate and print the ratios between consecutive points to
verify that they are constant (a defining property of logarithmic spacing).

This type of sequence is useful for plotting data that follow a power law or
that span several orders of magnitude, for building frequency axes in audio
processing, and for building grids of candidate learning rates when tuning ML
models, among other applications.

While the default base for `logspace` is $ 10 $, we can use any other positive
base by changing the `base` parameter. Base $ 2 $ is particularly useful
in computing applications.
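
For instance, with base $ e $ the exponents are interpreted as natural logarithms. If you would rather specify the endpoints themselves than their exponents, NumPy also provides the related function `geomspace`. Both calls below are sketches for comparison only:

```python
import numpy as np

# Exponents 0, 1, 2 with base e give 1, e, e**2
natural = np.logspace(0, 2, 3, base=np.e)
print(np.allclose(natural, [1, np.e, np.e ** 2]))   # True

# geomspace takes the endpoints directly: 1 and 1000
geo = np.geomspace(1, 1000, 4)
print(geo)   # approximately [1, 10, 100, 1000]
```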

__Exercise:__ Generate all powers of $ 2 $ from $ 1 = 2^0 $ to $ 2^9 = 512 $
by modifying the following code. Print the resulting array to check your answer.

In [None]:
# powers_of_two = np.logspace(<start>, <stop>, <num>, base=2)

## $ \S 5 $ Other NumPy features

As we will see later, arrays are far more efficient and convenient for numerical
computation than Python's built-in data types such as lists, both in memory and
in computational costs. NumPy is used in data analysis, machine learning,
engineering and any other field that requires intensive
numerical computation.  It also serves as the basis for higher-level libraries
such as SciPy, pandas, and scikit-learn.  Other features
supplied by NumPy include (but are not limited to):
* Basic statistical operations;
* Random number generation from various probability distributions;
* Fourier transforms (in the `numpy.fft` module);
* Reading and writing arrays from and to text and binary files.

We will meet and use some of these in other notebooks.
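
As a brief taste of three of these features (a sketch only; each gets proper treatment later): summary statistics via array methods, a random number generator created with `np.random.default_rng` (the seed `42` is an arbitrary choice that makes the draws reproducible), and a discrete Fourier transform computed with `np.fft.fft`.

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])
print(data.mean(), data.std())   # 2.5 and about 1.118

rng = np.random.default_rng(42)                     # seeded generator
samples = rng.normal(loc=0.0, scale=1.0, size=5)    # 5 draws from a standard normal
print(samples.shape)                                # (5,)

# The transform of a constant signal is concentrated at frequency zero
spectrum = np.fft.fft(np.ones(4))
print(spectrum)   # approximately [4, 0, 0, 0], as complex numbers
```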

⚡️ __Exercise:__ Create a function `diamond` that returns a two-dimensional integer
array whose ones form a diamond shape, as in the following examples:

In [None]:
print(diamond(0))
# Should print:
# [[1]]

In [None]:
print(diamond(1))
# Should print:
# [[0 1 0]
#  [1 0 1]
#  [0 1 0]]

In [None]:
print(diamond(2))
# Should print:
# [[0 0 1 0 0]
#  [0 1 0 1 0]
#  [1 0 0 0 1]
#  [0 1 0 1 0]
#  [0 0 1 0 0]]

_Hint:_ Note that for each element equal to $ 1 $, the sum of the distances to
the central row and central column is constant.

In [None]:
def diamond(n):
    """
    Create a 2D array with ones forming a diamond shape.
    
    Parameters
    ----------
    n : int
        Distance from the center of the array to each tip of the diamond;
        the result has shape (2n + 1, 2n + 1).
        
    Returns
    -------
    numpy.ndarray
        2D array with ones forming a diamond shape.
    """
    # Shape of the array will be (2 * n + 1) x (2 * n + 1)