# Introduction to NumPy

## $ \S 1 $ Motivation

Suppose that we wish to represent three-dimensional vectors such as $ u = (1, 2,
3) $ or $ v = (-1, 0, 1) $ in Python. It is natural to think that either lists
or tuples might be a good choice for this task.

In [20]:
u = [1, 2, 3]   # Create a list whose elements are 1, 2, 3
v = [-1, 0, 1]

However, at some point we will probably wish not only to store, but to manipulate
vectors. For instance, how can we add $ u $ and $ v $ or take a multiple of $ v $? It
is reasonable to try the following code:

In [4]:
s = u + v
multiple = 3 * v
print(s)
print(multiple)

[1, 2, 3, -1, 0, 1]
[-1, 0, 1, -1, 0, 1, -1, 0, 1]


These unexpected results can be explained by recalling that for lists, tuples
and strings, the `+` operator denotes _concatenation_, not addition; similarly,
`*` denotes _repetition_, not multiplication. This behavior is not so strange
if we take into account that lists and tuples are _generic_
sequential types, capable of holding objects of arbitrary types, for which
addition or multiplication might not make sense.
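
With plain lists, the intended vector operations have to be spelled out entry by entry. The following is a minimal sketch of one way to do this, reusing `u` and `v` from above; the names `vector_sum` and `scaled` are illustrative only:

```python
u = [1, 2, 3]
v = [-1, 0, 1]

# Add corresponding entries by pairing them up with zip
vector_sum = [a + b for a, b in zip(u, v)]
# Multiply every entry of v by the scalar 3
scaled = [3 * b for b in v]

print(vector_sum)  # [0, 2, 4]
print(scaled)      # [-3, 0, 3]
```

This gives the expected results, but the code is verbose compared to the mathematical notation $ u + v $ and $ 3v $.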

__Exercise:__ What happens if you represent $ u $ and $ v $ as tuples and try to take their difference $ u - v $? What if they are represented as lists?

## $ \S 2 $ Arrays

### $ 2.1 $ NumPy arrays

**NumPy**, which stands for _Numerical Python_, is a foundational package
for scientific computing in Python. It is almost universally imported with the
`np` alias, as follows:

In [1]:
import numpy as np

The central feature in NumPy is a new data structure called an **ndarray**
(an abbreviation of $n$-dimensional array), or simply **array**. An ndarray is a
grid of values _of the same type_, usually numerical. In other words, arrays
must be **homogeneous**. For example, a $ 1 $-dimensional array is essentially a vector in
the sense of Linear Algebra, as in the discussion in $ \S 1 $.  Arrays can be
instantiated with the `array` function:

In [19]:
u = np.array([1, 2, 3])
print(u)
print("Note the absence of commas (,) separating the entries when an array is displayed.")
print(f"The type of an array such as u is: {type(u)}")

[1 2 3]
Note the absence of commas (,) separating the entries when an array is displayed.
The type of an array such as u is: <class 'numpy.ndarray'>


The number of dimensions of an array is also called its **rank**. A $ 2 $-dimensional
array, or array of rank $ 2 $, is just a matrix.

In [13]:
A = np.array([[1, 1, 1, 1],   # first row of matrix A
              [2, 2, 2, 2],   # second row
              [3, 3, 3, 3]])  # third row
print(A)
print(type(A))

[[1 1 1 1]
 [2 2 2 2]
 [3 3 3 3]]
<class 'numpy.ndarray'>


Note the use of _double_ brackets here: the first opening bracket `[` serves
to delimit the array, while the second one is being used to delimit the elements
of the first row. The rows are separated by commas, as are the elements within each row.


__Exercise:__ How would you create the matrix
$$
B = \begin{bmatrix}
b_{11} & b_{12} \\
b_{21} & b_{22} \\
b_{31} & b_{32} \\
b_{41} & b_{42}
\end{bmatrix}
$$
where $ b_{ij} = i \cdot j $?

📝 To recap, `ndarray` is the official name of the data type provided by NumPy, and `array` is both
the informal name of this data type and the name of the function that we can use
to create ndarrays by explicitly listing all of their entries.

A $ 3D $ array is to a matrix as a solid block is to a rectangle. In other words,
a rank $ 3 $ array is one having three axes, instead of just two. There is no
bound on the number of dimensions that an array can have, although for most
applications, arrays of dimension greater than $ 3 $ are rarely used. 

__Exercise:__ Build a three-dimensional array of shape $ (2, 3, 2) $. _Hint:_ You will need triple opening brackets to start. It may help to think of the array as having $ 2 $ "rows", each of which is a $ 3 \times 2 $ matrix.

There are other ways of creating arrays that are often more convenient than through use of the `array` function. For instance,
to create an array of a desired shape filled with $ 0\text{s} $, we can use `zeros`:

In [33]:
Z = np.zeros((4, 4))
print(Z)

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]


📝 Note the necessary double parentheses in this call, one to delimit the arguments of the function and the other to specify the shape.

Arrays can also be automatically populated with $ 1\text{s} $ by means of the function `ones`: 

In [36]:
U = np.ones((2, 3))
print(U)

[[1. 1. 1.]
 [1. 1. 1.]]


The `arange` function is like the Python built-in `range`, except that it returns an ndarray:

In [52]:
digits = np.arange(10)
print(type(digits), digits)


<class 'numpy.ndarray'> [0 1 2 3 4 5 6 7 8 9]


The full syntax is `arange(<start>, <stop>, <step>)`, just like for `range`:

In [39]:
y = np.arange(4, 10, 2)
print(y)

[4 6 8]


One advantage of `arange` over `range` is that it accepts arguments of type float, for instance:

In [None]:
x = np.arange(0.1, 1, 0.1)
print(x)

[0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]


However, this feature must be used with care, because floating-point rounding errors can change the number of elements that are generated, as in the following example, where the stop value unexpectedly appears in the result:

In [50]:
print(np.arange(1, 1.3, 0.1))
print("Note that the value 1.3 was included!")

[1.  1.1 1.2 1.3]
Note that the value 1.3 was included!


Alternatively, with `linspace` we can create an array containing linearly spaced values inside a specified interval. The syntax is similar, except that the stop value in the second argument is included in the result, and the third argument gives the _number of values_ to be generated, instead of the step size:

In [45]:
z = np.linspace(0, 10, 11)
print(z)
w = np.linspace(0, 10, 10)
print(w)

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]
[ 0.          1.11111111  2.22222222  3.33333333  4.44444444  5.55555556
  6.66666667  7.77777778  8.88888889 10.        ]


We can exclude the stop value in `linspace` using `endpoint=False`:

In [51]:
print(np.linspace(0, 10, 10, endpoint=False))

[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
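
Returning to the rounding issue with `arange`, one way to sidestep it is to fix the _number_ of values rather than the step size. The sketch below shows two possible alternatives for generating the values $ 1, 1.1, 1.2 $; the variable names are illustrative only:

```python
import numpy as np

# Ask for exactly 3 values from 1 to 1.2, with the endpoint included
by_count = np.linspace(1, 1.2, 3)

# Or scale an integer range, whose length is not subject to rounding
by_scaling = 1 + 0.1 * np.arange(3)

print(len(by_count), len(by_scaling))     # 3 3
print(np.allclose(by_count, by_scaling))  # True
```

In both cases the length of the result is determined by an integer argument, so it cannot be altered by rounding.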


### $ 2.2 $ NumPy features

Besides arrays, other features provided by NumPy include (but are not limited to):
* Operations with, functions on and manipulation of arrays;
* Fourier transforms and signal processing;
* Basic statistical operations;
* Random number generation;
* Reading and writing array data in text and binary file formats.

In summary, NumPy provides a high-performance multidimensional array object,
along with tools for working with these arrays. As we will see, arrays
are far more efficient and convenient for numerical computation than Python's
built-in data types, such as lists. NumPy is widely used in data analysis,
machine learning, engineering, and other fields that require intensive numerical
computation. It also serves as the basis for higher-level scientific computing
libraries such as SciPy, Pandas, and scikit-learn. 
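
As a brief illustration of this convenience, the following sketch revisits the vectors from $ \S 1 $, this time stored as arrays. For arrays, `+` and `*` act entry by entry:

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([-1, 0, 1])

print(u + v)  # entrywise sum: [0 2 4]
print(3 * v)  # scalar multiple: [-3  0  3]
```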

### $ 2.3 $ Attributes of arrays

Just like a matrix in Linear Algebra, a $ 2D $ array has a **shape**, which records the
number of elements along each of its **axes** (vertical and horizontal, in that order).
Referring to the preceding example, the shape of our matrix $ A $ is $ 3 \times 4 $,
or $ (3, 4) $, since it has three rows and four columns:

In [14]:
print(A.ndim)    # Print the number of dimensions of A
print(A.shape)   # Print the shape of A

2
(3, 4)


The number of dimensions of an array is always a nonnegative integer, while the shape is always a tuple, even when the array is one-dimensional:

In [35]:
a = np.array([11, 13, 17])
print(a.shape)
print(type(a.shape))

(3,)
<class 'tuple'>


An object such as $ A $ has several **attributes**, which are just certain
properties associated to any element of the same class (in this case, that of
ndarrays). To access an attribute of an object `x`, we use the syntax
`x.attribute`.

The two main attributes of ndarrays are `shape` and `ndim`. Other key
attributes include `size`, i.e., the total number of elements in the array, and
`dtype`, which yields the data type of its elements.
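
A short sketch of these two attributes follows. The array `mixed` is a hypothetical example, added to show how the requirement that arrays be homogeneous is reflected in `dtype`:

```python
import numpy as np

Z = np.zeros((4, 4))
print(Z.size)   # 16, since 4 * 4 = 16
print(Z.dtype)  # float64

# Mixing integers and floats: every entry is converted to a common type
mixed = np.array([1, 2.5, 3])
print(mixed)        # [1.  2.5 3. ]
print(mixed.dtype)  # float64
```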

__Exercise:__ If an array has shape $ (2, 3, 4) $, what is its size? More generally, what is the size of an array of shape $ (n_1, n_2, \cdots, n_d) $? What is the type of the object returned by `size`?

__Exercise:__ Let
$$
B = \begin{bmatrix}
1 & 2 & 3 \\
-4 & -5 & -6 \\
\end{bmatrix}\,.
$$
Use NumPy to create this array and compute its dimension, shape, size and datatype.
Can you build an array whose datatype is `bool`?

### $ 2.4 $ Array methods

Besides attributes, objects of a certain class usually come with predefined
**methods**, which are simply functions associated to objects of that class. The
syntax for calling method `f` of object `x` is `x.f(<arguments>)`. For instance,
the `sum` method associated to each array returns the sum of all of its entries:

In [29]:
C = np.array([[-1.0, 2.3, 3.7],
              [-4.5, 2.7, -0.7]])
print(C.sum())

2.5


As an optional argument to `sum`, we can designate an axis over which the sum should take place. As always in Python, numbering is zero-based, meaning that for matrix $ C $ above, axis $ 0 $ is indexed by the row number and axis $ 1 $ by the column number.

In [28]:
print(C.sum(axis=0))

[-5.5  5.   3. ]


If we think of $ C $ as the matrix $ C = (c_{ij}) $, where $ i $ is the index for axis $ 0 $ (i.e., the row index), then taking the sum along this axis means that for each fixed $ j $, NumPy computes $ \sum_{i} c_{ij} $. The result is the preceding vector with three entries, one for each column of $ C $.
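
To make the formula concrete, the sketch below recomputes the sum along axis $ 0 $ by hand with explicit loops and compares it to the result of `sum`. The name `by_hand` is illustrative only:

```python
import numpy as np

C = np.array([[-1.0, 2.3, 3.7],
              [-4.5, 2.7, -0.7]])
n_rows, n_cols = C.shape

# For each fixed column j, add up c_ij over the row index i
by_hand = [sum(C[i, j] for i in range(n_rows)) for j in range(n_cols)]

print(np.allclose(by_hand, C.sum(axis=0)))  # True
```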

__Exercise:__ Compute the sum of the entries of $ C $ along the column index.

## $ \S 3 $ Reshaping arrays

Reshaping arrays is a common and fundamental operation in NumPy. There is both
a function and a method named `reshape` that can accomplish this:

In [26]:
a = np.array([1, 2, 3, 4, 5, 6])
print(a, end='\n\n')

A = np.reshape(a, (3, 2))  # Here we use the _function_ `reshape`
B = a.reshape((2, 3))   # Here we use the `reshape` method

print(A, end='\n\n')  # Here a has been reshaped into a 3 by 2 matrix
print(B, end='\n\n')  # Here a has been reshaped into a 2 by 3 matrix


[1 2 3 4 5 6]

[[1 2]
 [3 4]
 [5 6]]

[[1 2 3]
 [4 5 6]]



Note that when reshaping an array, the new shape must be compatible with the
size of the original array. For example, the following results in an error:

In [None]:
C = np.reshape(a, (2, 2))  # ValueError: 6 entries cannot fill a 2 by 2 matrix
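
The error raised in this situation is a `ValueError`. If the shape comes from user input, one possible way to handle it gracefully is sketched below; the flag `reshape_failed` is a hypothetical name used only for illustration:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
reshape_failed = False

try:
    C = np.reshape(a, (2, 2))  # 6 entries cannot fill a 2 by 2 matrix
except ValueError as error:
    reshape_failed = True
    print("Reshape failed:", error)
```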

When reshaping an array, we may also specify $ -1 $ in a dimension to instruct
NumPy to infer the number of elements along that dimension from the size of the
array and that of the remaining dimensions. This is especially useful when an
array is passed to us by the user as an argument in a function call, but we do
not know in advance how many entries it has:

In [27]:
a = np.array([[1, 2],
              [3, 4]])
A = a.reshape((-1, 1))  # Reshape into a column vector
print(A)

[[1]
 [2]
 [3]
 [4]]


In this example we wanted to reshape our array so that the result would have
one column, but didn't want to figure out how many rows it should have for that
to happen. Here's another example, in which we reshape a $ 1D $ array into a
matrix and then to a row vector:

In [28]:
x = np.arange(1, 13)
X = x.reshape((3, -1))
x_row = X.reshape((1, -1))  # Reshape into a row vector

print(x, end='\n\n')
print(X, end='\n\n')
print(x_row, end='\n\n')

[ 1  2  3  4  5  6  7  8  9 10 11 12]

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

[[ 1  2  3  4  5  6  7  8  9 10 11 12]]



📝 There is no essential difference between the function and the method versions
of `reshape`. In both cases, NumPy returns a _new_ array object, while the shape
of the original array remains unchanged. However, whenever possible the new
array is a **view** of the original array's data, meaning that the data is not
copied unless necessary. Thus, _modifications to the data in the reshaped
array can affect the original array and vice versa_. Let's use the previous
example to illustrate this:

In [18]:
X[0, 0] = -23  # Modify the top left element of X
print(x)  # The 0th element of x has also been affected!

[-23   2   3   4   5   6   7   8   9  10  11  12]


To create an independent copy of a NumPy array, we can use the `copy` method.
This method generates a new array object with the same data as the original
array, but stored in a separate memory location.

In [25]:
y = np.arange(3)  # y is the 1D array with entries 0, 1, 2
Y = y.copy().reshape((1, -1))  # Reshape y into an independent 2D row vector
y[0] = 10  # Modify 0th element of y
print(y)
print(Y)  # The 0th element of Y is not affected, since Y is an independent copy

[10  1  2]
[[0 1 2]]
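
To check whether two arrays refer to the same underlying data without modifying either one, the NumPy function `shares_memory` (not used elsewhere in this text) can help. A minimal sketch, with illustrative variable names:

```python
import numpy as np

x = np.arange(1, 13)
X_view = x.reshape((3, -1))         # shares its data with x
X_copy = x.copy().reshape((3, -1))  # owns separate data

print(np.shares_memory(x, X_view))  # True
print(np.shares_memory(x, X_copy))  # False
```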


The `flatten` method takes a multi-dimensional array and returns a new,
independent one-dimensional array containing all of the elements of the original
array, while preserving their order. 

In [30]:
A = np.array([[1, 2],
              [3, 4]])
a = A.flatten()
print(a)

[1 2 3 4]


The order in which the elements are placed in the flattened array is based on
the lexicographic ordering of their indices in the original array. For example,
if we are dealing with a $ 3D $ array, then the entry at position $ (0, 0, 2) $
comes before the entry at $ (0, 1, 0) $, which will be placed before the entry
at $ (1, 0, 0) $.

In [37]:
A = np.arange(6).reshape((2, 3))
print(A)
a = A.flatten()
print(a)

[[0 1 2]
 [3 4 5]]
[0 1 2 3 4 5]
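
For a $ 3D $ array of shape $ (2, 3, 4) $, this ordering means that the entry at position $ (i, j, k) $ ends up at position $ 12i + 4j + k $ of the flattened array. The following sketch checks this for one entry and also confirms that the flattened array is independent of the original; the names `T`, `t` and `position` are illustrative only:

```python
import numpy as np

T = np.arange(24).reshape((2, 3, 4))
t = T.flatten()

# Lexicographic order: entry (i, j, k) lands at position i*(3*4) + j*4 + k
i, j, k = 1, 2, 3
position = i * (3 * 4) + j * 4 + k
print(t[position] == T[i, j, k])  # True

# flatten returns a copy, so changing t leaves T untouched
t[0] = -1
print(T[0, 0, 0])  # 0
```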
