In [2]:
import numpy as np

## $ \S 1 $ Array methods

Besides attributes, objects of a certain class usually come with predefined
**methods**, which are simply functions associated to objects of that class. The
syntax for calling method `f` of object `x` is `x.f(<arguments>)`. For instance,
the `sum` method associated to each array returns the sum of all of its entries:

In [None]:
C = np.array([[-1.0, 2.3, 3.7],
              [-4.5, 2.7, -0.7]])
print(C.sum())

2.5


As an optional argument to `sum`, we can designate an axis over which the sum should take place. As always in Python, indexing is zero-based, meaning that for matrix $ C $ above, axis $ 0 $ indexes the rows and axis $ 1 $ indexes the columns.

In [None]:
print(C.sum(axis=0))

[-5.5  5.   3. ]


If we think of $ C $ as the matrix $ C = (c_{ij}) $, where $ i $ is the index for axis $ 0 $ (i.e., the index of rows), then taking the sum along this axis means that for fixed $ j $, NumPy computes $ \sum_{i} c_{ij} $, resulting in the preceding $ 3D $ vector since $ C $ has three columns. To put it another way, the axis specified as the argument to `sum` is the one that gets collapsed, in this case by the summation.
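
To make the "collapsing" rule concrete, here is a small sketch using a hypothetical array `T` with three axes (it is not part of the example above). Summing along an axis removes that axis from the shape of the result:

```python
import numpy as np

# T is a hypothetical 3D array used only for illustration.
T = np.ones((2, 3, 4))

# Summing along a given axis collapses it: that axis disappears from the shape.
shape_axis0 = T.sum(axis=0).shape  # axis 0 (length 2) is collapsed
shape_axis1 = T.sum(axis=1).shape  # axis 1 (length 3) is collapsed
shape_axis2 = T.sum(axis=2).shape  # axis 2 (length 4) is collapsed

print(T.shape)
print(shape_axis0, shape_axis1, shape_axis2)
```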

__Exercise:__ Compute the sum of the entries of $ C $ along the column index.

## $ \S 2 $ Basic terminology of probability theory through an example

Consider an experiment which consists of rolling a six-faced die. Then the set
of all possible _outcomes_ is given by $ \Omega = \{1, 2, \cdots, 6\} $.  This
is our _sample space_ in this case. An _event_, such as rolling an even number,
is simply a subset of $ \Omega $, in this case $ E = \{2, 4, 6\} $.  If the die
is perfectly balanced, so that each outcome is equally likely, then the
_probability distribution_ on our sample space is the function $ f \colon \Omega
\to [0, 1] $ that assigns the value $ \frac{1}{6} $ to each face.  Actually,
this is a very special distribution called the _uniform distribution_.  However,
in practice the die will never be perfectly balanced, so that some outcomes
(faces) may be more probable than others. It is the purpose of the probability
distribution to describe how probabilities are allocated among the possible
outcomes.
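
As a rough illustration of this vocabulary, the sketch below encodes the balanced die in plain Python. The names `omega`, `f`, `E` and `prob_E` are chosen here for illustration only, and the probability of the event is computed by adding up the probabilities of its outcomes, in line with the formal definitions given later:

```python
from fractions import Fraction

# Sample space and uniform distribution for a perfectly balanced die.
omega = {1, 2, 3, 4, 5, 6}
f = {face: Fraction(1, 6) for face in omega}

# The event "rolling an even number" is a subset of the sample space.
E = {2, 4, 6}

# Its probability is the sum of the probabilities of its outcomes.
prob_E = sum(f[face] for face in E)
print(prob_E)
```

Exact fractions are used instead of floats so that the result is not affected by rounding.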

The preceding example illustrates a _finite_ sample space. However, many
interesting phenomena, such as measuring the luminosity of a star or the
height of an individual, are more adequately modeled by continuous (uncountably
infinite) sample spaces. In this case, the probability of any single outcome is
typically zero, and the probability of an event $ A $ is obtained by
_integrating_ the probability distribution over $ A $.  The formal definitions
below will cover both cases.

### ⚡ 2.2 Formal definitions

Let $ \Omega $ be an arbitrary set (not necessarily finite nor discrete). An
element of $ \Omega $ will be called an __outcome__, and a subset of $ \Omega $
will be called an __event__.  A __probability function__ on $ \Omega $ is a real
function defined on the set of events of $ \Omega $ that satisfies the following
three properties (called the __Kolmogorov axioms__):

1. (_Non-negativity_) For any event $ A $ in the sample space $ \Omega $, the
probability of $ A $ is non-negative; in symbols, $ P(A) \ge 0 $.

2. (_Unit Measure_) The probability of the entire sample space is 1, that is,
$ P(\Omega) = 1 $. This axiom expresses that some outcome in the sample space
is certain to occur.

3. (_Countable Additivity_) For any countable or finite sequence of mutually
exclusive events $ A_1, A_2, A_3, \ldots $ (meaning no two events have any
outcomes in common), the probability of the union of these events is equal to
the sum of their individual probabilities. Symbolically, if $A_i \cap A_j =
\emptyset$ for $i \neq j$, then $P\left(\bigcup_{i=1}^{\infty} A_i\right) =
\sum_{i=1}^{\infty} P(A_i)$.

Formally, a __probability space__ is a pair consisting of a set $ \Omega $ (the
__sample space__) and a probability function $ P $ as above. (Actually, our
definition is not entirely correct because in the case where $ \Omega $ is not
discrete, it may not be possible to define $ P $ consistently over the
set of _all_ subsets of $ \Omega $. Because of this, we only require that $ P $
be defined over a so-called $ \sigma $-algebra, but we will ignore these
difficulties here.)

Usually a probability space is described not in terms of the probability
function $ P $ above (which is defined on a set of subsets of $ \Omega $), but
in terms of a simpler function $ f $ defined over $ \Omega $ itself, called a
__probability distribution__.

A __discrete__ (or countable) sample space is one whose outcomes $ \{x_1, x_2,
\cdots, x_n,\cdots \} $ are in one-to-one correspondence with the set of natural
numbers $ \{1, 2, \cdots \} $ or one of its subsets. In this case, $ P $ is
completely determined by a suitable assignment of probabilities to each of the
outcomes $ x_k $, by means of the distribution function $ f \colon \Omega \to
[0, 1] $. In this context, $ f $ is also called a _probability mass function_.
Then for an arbitrary event $ A $, we have
$$
P(A) = \sum_{x_k \in A} f(x_k)\qquad (A \subset \Omega).
$$
Note that this series is necessarily convergent because the total probability
is $ 1 $.
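
The formula above can be sketched in NumPy as follows. The loaded die, its probabilities and the helper `P` are hypothetical choices made for this illustration; the final lines check the three axioms numerically on this small example:

```python
import numpy as np

# Hypothetical loaded die: face 6 is more likely than the others.
faces = np.array([1, 2, 3, 4, 5, 6])
f = np.array([0.1, 0.1, 0.1, 0.1, 0.1, 0.5])  # probability mass function

def P(event):
    """Probability of an event, given as a collection of faces."""
    mask = np.isin(faces, list(event))
    return f[mask].sum()

A = {1, 2}  # two mutually exclusive events
B = {6}

print(np.all(f >= 0))                              # non-negativity
print(np.isclose(P(faces.tolist()), 1.0))          # unit measure
print(np.isclose(P(A.union(B)), P(A) + P(B)))      # additivity
```

The comparisons use `np.isclose` rather than `==` because sums of floating-point numbers are subject to rounding.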

For an uncountable sample space $ \Omega \subset \mathbb R $, we can again
describe the probability function $ P $ by a simpler distribution function $ f
\colon \Omega \to [0, \infty) $. However, this time we _integrate instead of summing_:
$$
P(A) = \int_A f(x)\,dx \qquad (A \subset \Omega).
$$
In this context $ f $ is also called a __probability density function__.
In most cases we assume that $ f $ is continuous, so that the integral above
exists for any reasonable event $ A $ of $ \Omega $.
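
As a numerical sketch of this formula, consider the hypothetical density $ f(x) = x/2 $ on $ \Omega = [0, 2] $ (an example chosen here, not taken from the text). The helper `prob` below approximates the integral over an interval by a midpoint Riemann sum:

```python
import numpy as np

def f(x):
    """Hypothetical density on [0, 2]: f(x) = x / 2."""
    return x / 2

def prob(a, b, n=100_000):
    """Approximate P([a, b]) by a midpoint Riemann sum with n subintervals."""
    width = (b - a) / n
    midpoints = a + (np.arange(n) + 0.5) * width
    return np.sum(f(midpoints)) * width

total = prob(0.0, 2.0)       # total probability, should be close to 1
upper_half = prob(1.0, 2.0)  # exact value: 1 - 1/4 = 0.75
print(total, upper_half)
```

For this density the exact value is $ P([a, b]) = (b^2 - a^2)/4 $, which can be used to check the approximation.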

## $ \S 2 $ Random number generation

A common way to create an array is by populating it randomly. This is useful for
example to initialize weights in a neural network; another common application is
to randomly shuffle a dataset.

NumPy provides a comprehensive toolkit for random number generation through its
`numpy.random` module. These tools are best understood in terms of the basic
terminology from probability theory introduced above.


In [None]:
# To generate random floating-point numbers between 0 and 1, use `rand`:
random_vector = np.random.rand(5)  # 1D array of 5 random floats
random_matrix = np.random.rand(2, 3)  # 2x3 array of random floats

print(random_vector, '\n')
print(random_matrix)

[0.61408305 0.75204832 0.62026719 0.30220243 0.47913732] 

[[0.03751491 0.90405153 0.23338299]
 [0.99608328 0.78971066 0.50382658]]


The function `rand()` without arguments draws a random floating-point number in
$ [0.0, 1.0) $ according to a uniform distribution, meaning that the probability
that the result lies in an interval $ I \subset [0, 1) $ is exactly the length $
\vert{I}\vert $ of $ I $. More generally, the arguments of `rand` describe the
size of the resulting array of random numbers, all drawn from this same distribution.
In particular, note that every call to `rand` results in new numbers.
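
The statement about intervals can be checked empirically. The following sketch draws many samples and measures the fraction that falls in a hypothetical interval $ I = [0.2, 0.5) $, chosen only for illustration; the fraction should be close to the length $ 0.3 $ of $ I $, though it varies slightly from run to run:

```python
import numpy as np

n = 100_000
samples = np.random.rand(n)

# The interval I = [0.2, 0.5) has length 0.3.
a, b = 0.2, 0.5
fraction = np.mean((samples >= a) & (samples < b))
print(fraction)  # close to 0.3, but different on every run
```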

NumPy also provides functions to rearrange arrays randomly, which is especially useful in simulations and algorithms that require random sampling without replacement:

* Shuffle (`shuffle`): Modifies an array in-place by shuffling its contents.
* Permutation (`permutation`): Returns a new array that is a randomly permuted sequence of its input.
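
A minimal sketch of the difference between the two, using a hypothetical array `deck`:

```python
import numpy as np

deck = np.arange(10)

# `permutation` returns a new shuffled array; `deck` is left untouched.
shuffled_copy = np.random.permutation(deck)
print(deck)
print(shuffled_copy)

# `shuffle` rearranges the array in place and returns None.
result = np.random.shuffle(deck)
print(deck)
print(result)
```

Because `shuffle` returns `None`, writing `deck = np.random.shuffle(deck)` would discard the array; this is a common mistake.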

## $ \S X $ Reshaping arrays

Reshaping arrays is a common and fundamental operation in NumPy. There is both
a function and a method named `reshape` that can accomplish this:

In [None]:
a = np.array([1, 2, 3, 4, 5, 6])
print(a, end='\n\n')

A = np.reshape(a, (3, 2))  # Here we use the _function_ `reshape`
B = a.reshape((2, 3))   # Here we use the `reshape` method

print(A, end='\n\n')  # Here a has been reshaped into a 3 by 2 matrix
print(B, end='\n\n')  # Here a has been reshaped into a 2 by 3 matrix


[1 2 3 4 5 6]

[[1 2]
 [3 4]
 [5 6]]

[[1 2 3]
 [4 5 6]]



Note that when reshaping an array, the new shape must be compatible with the
size of the original array. For example, the following results in an error:

In [None]:
C = np.reshape(a, (2, 2))
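
To see the error without interrupting the notebook, one option is to catch it. The sketch below assumes the same six-entry array `a`; the flag `reshape_failed` is a name introduced here for illustration:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

reshape_failed = False
try:
    C = np.reshape(a, (2, 2))  # 6 entries cannot fill a 2 by 2 matrix
except ValueError as error:
    reshape_failed = True
    print("ValueError:", error)
```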

When reshaping an array, we may also specify $ -1 $ in a dimension to instruct
NumPy to infer the number of elements along that dimension from the size of the
array and that of the remaining dimensions. This is especially useful when an
array is passed to us by the user as an argument in a function call, but we do
not know in advance how many entries it has:

In [None]:
a = np.array([[1, 2],
              [3, 4]])
A = a.reshape((-1, 1))  # Reshape into a column vector
print(A)

[[1]
 [2]
 [3]
 [4]]


In this example we wanted to reshape our array so that the result would have
one column, but didn't want to figure out how many rows it should have for that
to happen. Here's another example, in which we reshape a $ 1D $ array into a
matrix and then to a row vector:

In [None]:
x = np.arange(1, 13)
X = x.reshape((3, -1))
x_row = X.reshape((1, -1))

print(x, end='\n\n')
print(X, end='\n\n')
print(x_row, end='\n\n')

[ 1  2  3  4  5  6  7  8  9 10 11 12]

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

[[ 1  2  3  4  5  6  7  8  9 10 11 12]]



📝 There is no essential difference between the function and the method versions
of `reshape`. In both cases, NumPy returns a _new_ array, while the original
array remains unchanged. However, these operations provide a view of the
original array's data whenever possible, meaning that they do not copy the
array's data unless necessary. Thus, _modifications to the data in the reshaped
array can affect the original array and vice versa_. Let's use the previous
example to illustrate this:

In [None]:
X[0, 0] = -23  # Modify the top left element of X
print(x)  # The 0th element of x has also been affected!

[-23   2   3   4   5   6   7   8   9  10  11  12]
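
Whether two arrays share data can be checked with `np.shares_memory`. In the sketch below, the names `base`, `view` and `flat_transpose` are hypothetical; the second part shows a case where a view is not possible, so that NumPy has to copy:

```python
import numpy as np

base = np.arange(1, 13)
view = base.reshape((3, -1))

# Reshaping a contiguous array gives a view: both arrays use the same memory.
is_view = np.shares_memory(base, view)
print(is_view)

# Flattening the transpose requires rearranging the entries in memory,
# so here `reshape` returns a copy instead of a view.
flat_transpose = view.T.reshape(-1)
is_copy = not np.shares_memory(view, flat_transpose)
print(is_copy)
```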


To create an independent copy of a NumPy array, we can use the `copy` method.
This method generates a new array object with the same data as the original
array, but stored in a separate memory location.

In [None]:
y = np.arange(3)  # y is the 1D array with entries 0, 1, 2
Y = y.copy().reshape((1, -1))  # Reshape y into an independent 2D row vector
y[0] = 10  # Modify 0th element of y
print(y)
print(Y)  # The 0th element of Y is not affected, since Y is an independent copy

[10  1  2]
[[0 1 2]]


The `flatten` method takes a multi-dimensional array and returns a new,
independent one-dimensional array containing all of the elements of the original
array, while preserving their order. 

In [None]:
A = np.array([[1, 2],
              [3, 4]])
a = A.flatten()
print(a)

[1 2 3 4]


The order in which the elements are placed in the flattened array is based on
the lexicographic ordering of their indices in the original array. For example,
if we are dealing with a $ 3D $ array, then the entry at position $ (0, 0, 2) $
comes before the entry at $ (0, 1, 0) $, which will be placed before the entry
at $ (1, 0, 0) $.

In [None]:
A = np.arange(6).reshape((2, 3))
print(A)
a = A.flatten()
print(a)

[[0 1 2]
 [3 4 5]]
[0 1 2 3 4 5]
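
The three index triples mentioned above can be checked on a hypothetical $ 3D $ array `T` of shape `(2, 3, 4)`, a shape chosen here for illustration. For this shape, the entry at index $ (i, j, k) $ lands at position $ 12i + 4j + k $ of the flattened array:

```python
import numpy as np

# Hypothetical 3D array of shape (2, 3, 4) holding the numbers 0, ..., 23.
T = np.arange(24).reshape((2, 3, 4))
t = T.flatten()

# Each step along axis 0 skips 3 * 4 = 12 entries, each step along axis 1
# skips 4 entries, and each step along axis 2 skips 1 entry.
for (i, j, k) in [(0, 0, 2), (0, 1, 0), (1, 0, 0)]:
    position = 12 * i + 4 * j + k
    print((i, j, k), position, t[position] == T[i, j, k])
```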


## Vectorization


Suppose that we have three-dimensional vectors $ \mathbf u $ and $ \mathbf v $
and would like to take their inner (dot) product. Here's the most obvious way of doing
this in Python:

In [2]:
import numpy as np

dimension = 3
u = np.array([1, 2, 3])
v = np.array([-1, 0, 1])

product = 0
for i in range(dimension):
    product += u[i] * v[i]
print(product)

2


This approach's limitation is that it performs the required arithmetical
operations *in sequence*. That is, we first compute the product $ 1 \cdot (-1) $
and add it to our cumulative total; then we add $ 2 \cdot 0 $; and finally we add
$ 3 \cdot 1 $ to arrive at the result.

We can speed up computations of this kind significantly by processing the
entire arrays, or at least large chunks of the arrays, at once. This technique
is called **vectorization**. It leverages the optimized implementations of vector and
matrix operations provided by libraries such as NumPy, which run as compiled
code rather than as an interpreted Python loop and can exploit the parallelism
of modern CPUs (other array libraries can even use GPUs, or Graphics Processing
Units) to execute tasks *in parallel* whenever possible.

The vectorized version of the previous example would be:

In [3]:
import numpy as np

u = np.array([1, 2, 3])
v = np.array([-1, 0, 1])

product = np.dot(u, v)  # Compute the dot product of u and v.
print(product)

2


This version is certainly more concise and legible. However, because the vectors
have only $ 3 $ dimensions, the performance boost is not noticeable. To better
appreciate the power of vectorization, let's consider the task of computing
the dot product of two vectors having $ 10 $ million dimensions. Here is the
vectorized version:

In [5]:
import time  # Module that will allow us to time the computations

# Generate two large vectors with random coordinates in [0, 1):
dimension = 10**7
u = np.random.rand(dimension)  
v = np.random.rand(dimension)

tic = time.time()
prod = np.dot(u, v)
toc = time.time()
vect_runtime = toc - tic

print(f"The inner product is {prod}.")
print(f"Runtime for vectorized version: {1000 * vect_runtime} ms.")

The inner product is 2499843.426373514.
Runtime for vectorized version: 4.890680313110352 ms.


📝 The function `time.time()` used above returns a floating point number that
represents the time in seconds since January 1st, 1970, 00:00:00 (UTC). The
duration of the computation was measured by taking the time before (`tic`) and
after (`toc`) it, and then taking the difference. 
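
As an aside, the standard library also provides `time.perf_counter()`, a clock intended for measuring short durations. Its reference point is undefined, so only the difference between two readings is meaningful. A minimal sketch of the same tic/toc pattern, applied to a hypothetical test vector `w`:

```python
import time
import numpy as np

w = np.random.rand(10**6)  # hypothetical test vector

tic = time.perf_counter()
total = w.sum()
toc = time.perf_counter()

elapsed_ms = 1000 * (toc - tic)
print(f"Summing took {elapsed_ms} ms.")
```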

Considering the dimension of the vectors, the computation was pretty fast. Let's
now contrast the performance to that of its non-vectorized counterpart.

In [8]:
import time  # A module that will allow us to time the computations

tic = time.time()
prod = 0
for i in range(dimension):
    prod += u[i] * v[i]
toc = time.time()
non_vect_runtime = toc - tic
print(f"The inner product is {prod}.")
print(f"Runtime for non-vectorized version: {non_vect_runtime} s.")
print(f"Or, in milliseconds: {1000 * (non_vect_runtime)}")

The inner product is 2499843.4263739116.
Runtime for non-vectorized version: 1.4287347793579102 s.
Or, in milliseconds: 1428.7347793579102


📝 The slight discrepancy in the value of the dot product between the two
versions arises because floating-point addition is not associative: the loop
adds the products one at a time from left to right, whereas `np.dot` may group
the additions differently, so the rounding errors accumulate differently.
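
The order in which floating-point numbers are added can change the last digits of the result. A minimal illustration in plain Python, with values chosen here only for the example:

```python
# Floating-point addition is not associative: the grouping matters.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```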

**Exercise:** Referring to the example above, compute the precise
speedup factor relating the two implementations. Why does this value
change every time you compute it?

This discussion shows that vectorization provides a simple way to improve the
performance of our code by orders of magnitude (a factor of roughly $ 300 $ in
the example above). This is especially
crucial in machine learning and computer graphics. These fields frequently
involve dealing with vast datasets and complex numerical computations. In such
cases, code that does not make use of vectorization is simply not viable.

To summarize: *When performing operations on arrays, avoid loops whenever
possible; instead, use the built-in vectorized implementations provided by
NumPy.*

## Vector- and matrix-valued functions

To illustrate another aspect of vectorization, let $ \mathbf u = (1, 2, 3) $ and
suppose that we need to apply the exponential function to each of the
coordinates of $ \mathbf u $. That is, suppose that we wish to compute
$ \exp(\mathbf u) = (e^1, e^2, e^3) $. We could simply use a for-loop:

In [40]:
u = np.array([1, 2, 3])
v = np.zeros(3)  # will hold the result
for i in range(3):
    v[i] = np.exp(u[i])
print(v)

[ 2.71828183  7.3890561  20.08553692]


However, there is an alternative way that is both simpler and more efficient,
namely, to use vectorization, in this case by letting NumPy apply the
exponential to the vector $ \mathbf u $ as a whole:

In [41]:
v = np.exp(u)
print(v)

[ 2.71828183  7.3890561  20.08553692]


This approach also works with basic operations that are built into Python, such as:

In [45]:
squared_u = u**2
print(squared_u)

[1 4 9]


In [46]:
reciprocal_u = 1 / u
print(reciprocal_u)

[1.         0.5        0.33333333]
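
The same functions and operators act entry by entry on matrices as well. The sketch below uses a hypothetical matrix `M`; note in particular that `M**2` squares each entry and is not the matrix product of `M` with itself, and that `np.exp(M)` is the entrywise exponential:

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [3.0, 4.0]])  # hypothetical matrix for illustration

exp_M = np.exp(M)       # exponential of each entry
squared_M = M**2        # square of each entry (not the matrix product)
matrix_product = M @ M  # the actual matrix product, for comparison

print(squared_M)
print(matrix_product)
```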


__Exercise:__ Similarly to what we did for the inner product, compare the difference in performance
between the vectorized and non-vectorized approaches to computing $ e^u $, where $ u $ is a random
vector having a large number of dimensions. What is the speedup factor, approximately?

__Exercise:__ Let $ u = (1, 2, 3) $. Compute:

(a) $ \log(u) $ (that is, the vector whose coordinates are the logarithms of the coordinates of $ u $). *Hint*: Use the `np.log` function.

(b) $ \vert{u}\vert $ (that is, the vector whose coordinates are the absolute values of the coordinates of $ u $). *Hint:* Use the `np.abs` function.

(c) $ \max\{u, 3\} $. *Hint:* Use the `np.maximum` function.

(d) $ \sin\big(\tfrac{\pi u}{2}\big) $. *Hint:* Use the `np.sin` function.

How did you interpret the vectors of items (c) and (d)?