# Introduction to NumPy arrays

## $ \S 1 $ Motivation

Suppose that we wish to represent three-dimensional vectors such as $ \mathbf{u}
= (1, 2, 3) $ or $ \mathbf{v} = (-1, 0, 1) $ in Python. It is natural to think
that either lists or tuples might be a good choice for this task.

In [7]:
u = [1, 2, 3]   # Create a list whose elements are 1, 2, 3
v = [-1, 0, 1]

However, at some point we will probably wish not only to store these vectors but
also to manipulate them. For instance, how can we add $ \mathbf u $ and $ \mathbf v $ or
take a multiple of $ \mathbf v $? It is reasonable to try the following code:

In [8]:
s = u + v
multiple = 3 * v

print(s)
print(multiple)

[1, 2, 3, -1, 0, 1]
[-1, 0, 1, -1, 0, 1, -1, 0, 1]


These unexpected results can be explained by recalling that for either lists or
tuples (or strings), the `+` operator denotes _concatenation_, not addition; and
accordingly, `*` denotes _repetition_, not multiplication. This behavior is not
so strange at all if we take into account that lists and tuples are _generic_
sequential types, capable of holding objects of arbitrary types, for which
addition and multiplication might not make sense.
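For comparison, here is a minimal sketch of how genuine vector addition and scalar multiplication can be carried out with plain lists. We have to visit the coordinates ourselves, here using a list comprehension and the built-in `zip`:

```python
u = [1, 2, 3]
v = [-1, 0, 1]

# Element-wise sum: pair up matching coordinates with zip, then add each pair
s = [a + b for a, b in zip(u, v)]

# Scalar multiple: multiply every coordinate of v by 3
multiple = [3 * b for b in v]

print(s)         # [0, 2, 4]
print(multiple)  # [-3, 0, 3]
```

This works, but it is verbose, and every new operation needs its own loop or comprehension.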

__Exercise:__ What happens if you represent $ \mathbf u $ and $ \mathbf v $ as
tuples and try to take their difference $ \mathbf u - \mathbf v $? What if they
are represented as lists?

Vectors and matrices are fundamental objects in engineering, data
science and machine learning. There is thus a clear need for a library that
extends Python by providing efficient ways to operate on these objects.

## $ \S 2 $ Arrays

**NumPy**, which stands for _Numerical Python_, is a foundational package
for scientific computing in Python. It is almost universally imported with the
`np` alias, as follows:

In [9]:
import numpy as np

📝 Although we could also import every object/function in NumPy with
`from numpy import *` and thereby avoid having to type the `np.` prefix
every time, this is not recommended, because it may lead to conflicts with names
in pure Python (for example `max` and `min`) or with those used by other modules (such
as `exp` and `sqrt` from the `math` module).
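To see why such name clashes matter, compare `sqrt` from the `math` module with NumPy's `sqrt`. They share a name but behave differently, so after two star imports, whichever came last would silently replace the other. The small illustration below passes a list to each function:

```python
import math
import numpy as np

# NumPy's sqrt accepts a list (or array) and works coordinate by coordinate
roots = np.sqrt([4, 9, 16])
print(roots)  # [2. 3. 4.]

# math.sqrt only accepts a single number, so the same call fails
failed = False
try:
    math.sqrt([4, 9, 16])
except TypeError as err:
    failed = True
    print("math.sqrt failed:", err)
```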

The central feature in NumPy is a new data structure called **ndarray** (an
abbreviation of $n$-dimensional array), or simply **array**. An ndarray is a
grid of values _of the same datatype_; in other words, arrays must be
**homogeneous**. In most applications this datatype is numerical (say, the elements
of the array are all integers or all floating-point numbers). However,
it is also possible to create an array whose elements are booleans or strings,
for example.

A $ 1 $-dimensional numerical array is similar to a vector in the sense of
linear algebra, as in the discussion in $ \S 1 $. Arrays can be instantiated
with the `array` function:

In [10]:
u = np.array([1, 2, 3])     # Calling `array` on the list [1, 2, 3]
print(u)
print(f"The type of an array such as u is: {type(u)}")

[1 2 3]
The type of an array such as u is: <class 'numpy.ndarray'>


Note the absence of commas separating the entries of the array when it is displayed
(in contrast to lists).

__Exercise:__ Define the vector
$ \mathbf a = \frac{1}{2} \big(1, 1, 1, 1 \big) \in \mathbb R^4 $ as a NumPy array.
What is the length (norm) of $ \mathbf a $?

__Exercise:__ Print and determine the type of the following array.

In [11]:
b = np.array([True, False, True, False])

The argument of the `array` function can be a list, a tuple, a range, another
array or any array-like object.

In [23]:
v = np.array((-1, 0, 1))    # The argument of `array` can also be a tuple
print(v)

[-1  0  1]


Notice that the type of an array as a whole is always the same (`numpy.ndarray`)
and should not be confused with its _datatype_, which is the type of the
_elements_ it holds. We can determine the datatype of an array $ \mathbf a $ through `a.dtype`:

In [13]:
u.dtype

dtype('int64')

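As we can see, the array $ \mathbf u $ defined above holds $ 64 $-bit integers. (The
default integer size is platform-dependent. On Windows with NumPy versions older than
2.0, for example, you would see `int32` instead.)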
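Instead of letting NumPy infer the datatype from the elements, we can also request it explicitly through the `dtype` parameter of `array`. A minimal sketch:

```python
import numpy as np

# Ask for floating-point elements even though the inputs are integers
w = np.array([1, 2, 3], dtype=np.float64)
print(w)        # [1. 2. 3.]
print(w.dtype)  # float64

# Ask for a specific integer size, regardless of the platform default
k = np.array([1, 2, 3], dtype=np.int32)
print(k.dtype)  # int32
```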

__Exercise:__ Determine the datatype of the array $ \mathbf b $ in the previous exercise
and of the array $ \mathbf c $ defined below.

In [14]:
c = np.array([1, 2, 3.])

## $ \S 3 $ Multi-dimensional arrays

The number of dimensions of an array is also called its
**rank**. A $ 2 $-dimensional array, or array of rank $ 2 $, is just a matrix. 

In [15]:
A = np.array([[1, 1, 1, 1],   # first row of matrix A
              [2, 2, 2, 2],   # second row
              [3, 3, 3, 3]])  # third row
print(A)
print(type(A))  # Print the type of object A

[[1 1 1 1]
 [2 2 2 2]
 [3 3 3 3]]
<class 'numpy.ndarray'>


Notice the use of _double_ brackets here: the first opening bracket `[` delimits
the array itself, while the second one delimits the elements of the first row.
The rows are separated by commas, as are the elements within each row.

📝 In the code of the preceding example, each row of the matrix appears on a line
of its own to improve legibility, but the newline characters do not
actually delimit the rows (the commas and brackets do).
Hence, the following produces the same result:

In [17]:
A = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]])
print(A)

[[1 1 1 1]
 [2 2 2 2]
 [3 3 3 3]]

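We do not have to count brackets to find the rank of an array, because NumPy stores it in the `ndim` attribute. A short sketch using the matrix $ A $ above and a $ 1D $ array:

```python
import numpy as np

A = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]])
u = np.array([1, 2, 3])

# `ndim` holds the number of dimensions (the rank) of an array
print(u.ndim)  # 1
print(A.ndim)  # 2
```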

__Exercise:__ Create and display the matrix
$$
    T = \begin{bmatrix}
    \phantom{-}0 & \phantom{-}1 & \phantom{-}0 \\
    -1 & \phantom{-}0 & \phantom{-}1 \\
    \phantom{-}0 & -1 & \phantom{-}0
    \end{bmatrix}
$$

__Exercise:__ Create a $ 3\times 3 $ array to represent a tic-tac-toe board.
Use the string `' '` (single whitespace) for an empty cell, and the strings
`'X'` and `'O'` for the pieces placed by each of the two players.
Set up the board for the following game state:
~~~
 X |   | O 
---+---+---
   | X |   
---+---+---
 O |   | X 
~~~

In [None]:
game_board = np.array(...)

print(game_board)

The __shape__ of an array is a tuple of integers indicating the size of each of
its dimensions. The preceding array $ A $ has shape $ (3, 4) $ since it
has three rows and four columns.

In [14]:
print(A.shape)  # Print the shape of A

(3, 4)
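Since `shape` is an ordinary tuple, it can be unpacked or indexed like any other tuple. The sketch below also uses the `size` attribute, which holds the total number of elements (the product of the entries of `shape`):

```python
import numpy as np

A = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3]])

rows, cols = A.shape    # unpack the shape tuple
print(rows, cols)       # 3 4
print(A.size)           # 12, the total number of elements (3 * 4)
print(len(A))           # 3, the length along the first axis only
```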


__Exercise:__ What is the shape of a one-dimensional array, for instance the array
$ \mathbf b $ below? Can you explain the result of `b.shape`?

In [19]:
b = np.array([True, False, False, True, False])

__Exercise:__ How would you create the matrix
$$
\mathbf B = \begin{bmatrix}
b_{11} & b_{12} \\
b_{21} & b_{22} \\
b_{31} & b_{32} \\
b_{41} & b_{42}
\end{bmatrix}
$$
where $ b_{ij} = i \cdot j $?

We may also use `array` to convert an existing list, tuple or range to a
one-dimensional array. Sets and dicts are an exception: `array` does not unpack
them, so they should first be converted to a list.

In [21]:
pi = 3.14
e = 2.72
phi = 1.62

constants = [pi, e, phi]      # Create a list containing the values of three important numbers
names = ('pi', 'e', 'phi')    # Create a tuple containing their names and assign it to `names`

print(np.array(constants))    # Convert `constants` to an array and print the result
print(np.array(names))        # Convert `names` to an array and print the result

[3.14 2.72 1.62]
['pi' 'e' 'phi']
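The difference between a range and a set can be seen directly. In the sketch below, a range is converted element by element. A set, however, is wrapped as a single object, which produces a $ 0 $-dimensional array of datatype `object` rather than a vector:

```python
import numpy as np

# A range converts just like a list
print(np.array(range(5)))           # [0 1 2 3 4]

# A set is NOT unpacked: NumPy wraps the whole set as one object
s = {3, 1, 2}
wrapped = np.array(s)
print(wrapped.ndim, wrapped.dtype)  # 0 object

# Converting to a list first gives the expected 1D array
# (sorted here, since sets have no guaranteed order)
print(np.array(sorted(s)))          # [1 2 3]
```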


__Exercise:__ Can you make your solution to the previous exercise more concise
by using a list comprehension to generate the $ b_{ij} $ and then
converting the list to an array? _Hint:_ You will need a nested comprehension, of the form `[[... for j in ...] for i in ...]`.

📝 To recap, `ndarray` is the official name of the class (type) provided by NumPy, while `array` is both
the informal name of this class and the name of the function that we can use
to create ndarrays.


A $ 3D $ array is to a matrix as a solid block is to a rectangle. In other words,
a rank $ 3 $ array is one having three axes instead of only two.

<img src="array_3D.png" alt="drawing" width="400"/>

__Exercise:__ What are the rank and shape of the array depicted above?

Here's a concrete example of a $ 3D $ array of shape $ 2 \times 2 \times 2 $.
Think of it as an array having $ 2 $ "rows",
each of which is a $ 2 \times 2 $ matrix.

In [3]:
A = np.array([[[1, 2],   # The first "row" is a 2x2 matrix
               [3, 4]], 

              [[5, 6],   # The second "row" is also a 2x2 matrix
               [7, 8]]])
print(A)

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
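The attributes met so far confirm the description of this array: three axes, each of length $ 2 $, and $ 2 \cdot 2 \cdot 2 = 8 $ elements in total. A quick check:

```python
import numpy as np

A = np.array([[[1, 2],
               [3, 4]],

              [[5, 6],
               [7, 8]]])

print(A.ndim)   # 3
print(A.shape)  # (2, 2, 2)
print(A.size)   # 8, the total number of elements
```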


__Exercise:__ What happens if you delete the blank line inside the definition of $ A $ above?
What happens if you insert an additional blank line?

Note that a $ 3D $ array need not be a "cube" (i.e., have all three dimensions
of the same length) as in the previous example.

__Exercise:__ Build a three-dimensional array of shape $ (2, 3, 4) $. 

There is no bound on the number of dimensions that an array can have, although
for most applications, arrays of dimension greater than $ 3 $ are rarely used.

## $ \S 4 $ Other ways to create arrays

### $ 4.1 $ Filling arrays automatically

There are other ways of creating arrays that are often more convenient than calling the `array` function directly. For instance,
to instantiate an array of a desired shape filled with $ 0\text{s} $, we can use the function `zeros`:

In [8]:
Z = np.zeros((4, 4))  # The parameter of `zeros` is the shape you want
print(Z)

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]


📝 Note the double parentheses in this call: the outer pair delimits the
arguments of the function, while the inner pair delimits the shape, which is
given as a tuple. However, if you want to create a $ 1D $ array having, say, $ 10 $ coordinates
equal to $ 0 $, then you can use either `np.zeros((10,))` or simply `np.zeros(10)`.
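The dots in the printed output above (`0.` rather than `0`) appear because `zeros` produces floating-point numbers by default. The sketch below checks that the two $ 1D $ spellings agree and uses the `dtype` parameter to request integer zeros instead:

```python
import numpy as np

a = np.zeros(10)      # a single integer is accepted as the length of a 1D array
b = np.zeros((10,))   # the same request spelled as a one-element tuple

print(np.array_equal(a, b))  # True
print(a.dtype)               # float64, the default datatype of `zeros`

# The `dtype` parameter requests integer zeros instead
c = np.zeros((2, 3), dtype=int)
print(c)
```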

__Exercise:__ Create and print a $ 3D $ array of shape $ (3, 4, 5) $ filled with zeros.

Arrays can also be automatically populated with $ 1\text{s} $ by means of the function `ones`: 

In [9]:
U = np.ones((2, 3))
print(U)

[[1. 1. 1.]
 [1. 1. 1.]]


Generalizing `zeros` and `ones`, the `full` function creates an array of a
specified shape, filled entirely with a prescribed value:

In [3]:
import numpy as np
# Create a 3x5 array where every element equals 3.14:
P = np.full((3, 5), 3.14)
print(P)

[[3.14 3.14 3.14 3.14 3.14]
 [3.14 3.14 3.14 3.14 3.14]
 [3.14 3.14 3.14 3.14 3.14]]


__Exercise:__ Create a $ 1D $ array having $ 50 $ coordinates, all of them equal to $ 1 $, in two different ways.

Finally, we can create a new array _of the same shape and datatype_ as a
given array $ A $ but filled with zeros, ones or some other specified value with the
functions `zeros_like`, `ones_like` and `full_like`, respectively:

In [15]:
A = np.array([[1, 2, 3],
              [4, 5, 6]])

Z = np.zeros_like(A)
print(Z)

T = np.full_like(A, 3)
print(T)

[[0 0 0]
 [0 0 0]]
[[3 3 3]
 [3 3 3]]


### $ 4.2 $ Generating sequences with `arange` and `linspace`

The `arange` function is much like the Python built-in `range`, but it returns an ndarray:

In [10]:
digits = np.arange(10)
print(type(digits), digits)


<class 'numpy.ndarray'> [0 1 2 3 4 5 6 7 8 9]


The full syntax is `arange(<start>, <stop>, <step>)`. Note that the starting
value is included, but the stopping value is not (exactly as with the built-in `range`).

In [22]:
y = np.arange(4, 10, 2)
print(y)

[4 6 8]


One advantage of `arange` over `range` is that _it accepts arguments of type float_, for instance:

In [12]:
x = np.arange(0.1, 1, 0.1)
print(x)

[0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]


However, this feature must be used with caution, because floating-point rounding
errors may change the number of elements generated, as in the following example:

In [13]:
print(np.arange(1, 1.3, 0.1))
print("Note that the value 1.3 was included!")

[1.  1.1 1.2 1.3]
Note that the value 1.3 was included!
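The NumPy documentation states that, for floating-point arguments, `arange` returns $ \lceil (\text{stop} - \text{start}) / \text{step} \rceil $ elements. In the example above, this quotient comes out slightly larger than $ 3 $ because of rounding, so it is rounded up to $ 4 $. The sketch below reproduces the computation. It then shows `linspace` (introduced just below) as a safer alternative when the endpoints matter, since there we state the number of points directly:

```python
import math
import numpy as np

start, stop, step = 1, 1.3, 0.1

# Number of elements arange produces for float arguments
quotient = (stop - start) / step
n = math.ceil(quotient)
print(quotient)   # slightly more than 3 because of rounding
print(n)          # 4

x = np.arange(start, stop, step)
print(len(x))     # 4

# Alternative: ask for exactly three points from 1 to 1.2
y = np.linspace(1, 1.2, 3)
print(y)
```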


__Exercise:__ For each item, create a $ 1D $ array containing the elements described:

(a) All integers from $ 5 $  to $ 15 $ (inclusive), but represented as
floating-point numbers.

(b) The sequence of even numbers between $ 2 $ and $ 19 $.

(c) All integers from $ 10 $ down to $ 1 $.

(d) All numbers from $ -3.14 $ to $ 2.86 = -3.14 + 6 $, with a stride of $ 2 $.

Alternatively, with `linspace` we can create an array containing linearly spaced
values inside a specified interval. The syntax is similar to that of `arange`,
except that the stop value in the second argument is included in the result, and
_the third argument gives the number of values to be generated_, instead of the
step size:

In [64]:
z = np.linspace(0, 10, 11)
print(z, '\n')

w = np.linspace(0, 10, 10)
print(w)

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.] 

[ 0.          1.11111111  2.22222222  3.33333333  4.44444444  5.55555556
  6.66666667  7.77777778  8.88888889 10.        ]


__Exercise:__ Dividing the interval $ [0, 1] $ into three equal parts, we obtain three subintervals of length $ \frac{1}{3} $. How many subdivision points are needed? Use `linspace` to obtain them.

__Exercise:__ How many bounded intervals are determined by $ n + 1 $ equally spaced points on a line?

To recap, the syntax is `linspace(<start>, <stop (inclusive)>, <# elements>)`.
Passing `endpoint=False` excludes the stop value, which makes `linspace` behave
more like `arange`:

In [51]:
print(np.linspace(0, 10, 10, endpoint=False))

[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
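To check the claimed similarity, the sketch below compares the previous result with `arange` using a step of $ 1.0 $. With `endpoint=False`, the spacing is $ (\text{stop} - \text{start}) / (\#\text{elements}) $, here $ 10 / 10 = 1 $:

```python
import numpy as np

a = np.linspace(0, 10, 10, endpoint=False)  # 10 points, stop value excluded
b = np.arange(0, 10, 1.0)                     # the same values via a step of 1.0

print(np.allclose(a, b))  # True
```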


__Exercise:__ What happens to the result of `linspace` if the starting value is greater than the stopping value? What if they are equal? And what if the third argument is zero or negative? 

## $ \S 5 $ Accessing and modifying individual array elements

Recall that lists in Python are __mutable__, meaning that we may modify
individual elements of a list:

In [24]:
primes = [199, 1_999, 19_999]
primes[2] = 19

In contrast, tuples are _immutable_. We can still access their elements through
`[]`, but we can't modify them:

In [27]:
fruits = ('🍎', '🍊', '🍍')
print(fruits[0])

🍎


In [28]:
fruits[0] = '🍉'

TypeError: 'tuple' object does not support item assignment

NumPy arrays are mutable, like lists. Consider the following vector $ \mathbf{a} $:

In [37]:
a = np.array([1, 2, 3])
print(a)

[1 2 3]


To access or modify the $ 0 $-th element of $ \mathbf a $ (recall that we always
count from $ 0 $ in Python), we use the same syntax as we would if it were a
list:

In [38]:
print(a[0])  # Access 0-th element of `a`
a[0] = -1    # Modify this element
print(a)     # Print the result


1
[-1  2  3]
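One difference from lists follows from the fact that arrays are homogeneous. A value assigned to an element is converted to the array's datatype, so assigning a float to an element of an integer array discards the fractional part. A list would simply store the float. A minimal sketch:

```python
import numpy as np

a = np.array([1, 2, 3])   # an integer array

# The assigned float is converted to the array's integer datatype
a[0] = 2.9
print(a)        # [2 2 3]
print(a.dtype)  # still an integer type
```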


If we are dealing with a $ 2D $ array, we use `[i, j]` to access its $ (i, j) $-th entry, that is, the element in row $ i $ and column $ j $:

In [10]:
A = np.ones((2, 2))
print("Before modifications:")
print(A, '\n')

A[0, 1] = 0
A[1, 0] = 0 
print("After modifications:")
print(A)

Before modifications:
[[1. 1.]
 [1. 1.]] 

After modifications:
[[1. 0.]
 [0. 1.]]


In general, when dealing with an $ n $-dimensional array, use `[k_1, k_2, ..., k_n]` to access its element having indices $ k_1, k_2, \cdots, k_n $, respectively.
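For instance, with the $ 2 \times 2 \times 2 $ array from $ \S 3 $, three indices pick out a single element. The first index selects the $ 2 \times 2 $ block, the second its row and the third its column:

```python
import numpy as np

A = np.array([[[1, 2],
               [3, 4]],

              [[5, 6],
               [7, 8]]])

print(A[1, 0, 1])  # 6: second 2x2 block, first row, second column
A[0, 1, 0] = 30    # modify a single element using three indices
print(A)
```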

__Exercise:__ Build a "$ 3D $ identity array" $ M $ of shape $ (5, 5, 5) $ by
first populating it with zeros, then setting all elements with indices
of the form $ (i, i, i) $ equal to $ 1 $ in the following two ways: 

(a) Using a `for` loop.

(b) With the single call `np.fill_diagonal(M, 1)`.

In [11]:
# Populate M with zeros:
# M = ...

# Set diagonal elements equal to 1:
# ...

# Print the result:
# print(M)


## $ \S 6 $ Other NumPy features

As we will see later, arrays are far more efficient and convenient for numerical
computation than Python's built-in data types such as lists, both in memory usage
and in speed. NumPy is used in data analysis, machine learning,
engineering and any other field that requires intensive numerical computation.
It also serves as the basis for higher-level scientific computing libraries such
as SciPy, Pandas, and scikit-learn. Other features supplied by NumPy include
(but are not limited to):
* Basic statistical operations;
* Random number generation;
* Fourier transforms and signal processing;
* Reading and writing data in various file formats (for example, plain-text and CSV files, or NumPy's own binary `.npy` format).

We will meet and use some of these in other notebooks.