## The NumPy package

### Coding lecture

#### What is NumPy?

NumPy is the reference Python package for numerical computation. It provides the *array* data structure, the Python equivalent of mathematical vectors, matrices, and n-dimensional arrays. On top of this, it defines many mathematical operators and functions, from basic element-wise addition of two vectors up to linear algebra operations such as computing the determinant of a matrix.

#### Importing NumPy

Before using NumPy, we must *import* it. This is the first time we use a Python package in the course, so let's see how this works! We will first import the NumPy package, and then use one of its functions to generate a random number.

In [1]:
# importing the NumPy package... yes, it is that easy!
import numpy

# creating a random number
numpy.random.normal(0, 1)

-0.6094558733737435

The `import numpy` line makes the whole package available at once. After that, we can use the `normal` function of the `random` module to generate a random value from a normal distribution with zero mean and unit standard deviation.

It is common to use an alias so as not to type the full name of the package each time we need it.

In [2]:
# importing NumPy and giving the alias np
import numpy as np

# one more random number!
np.random.normal(0, 1)

-0.632864619469811

Finally, you can import only the portion of the package you need.

In [3]:
# importing only the random module of NumPy
import numpy.random as npr

# one more random number!
npr.normal(0, 1)

0.17816613009516855
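The random numbers above change at every run. As a minimal sketch of how to get repeatable draws, the cell below uses one more import form (`from ... import ...`) together with NumPy's `default_rng` generator. The seed value `42` and the names `rng_a`, `rng_b`, `x_a`, and `x_b` are arbitrary choices made for this illustration.

```python
# importing a single function from the random module of NumPy
from numpy.random import default_rng

# two generators created with the same seed (42 is an arbitrary choice)
rng_a = default_rng(42)
rng_b = default_rng(42)

# each generator draws from a normal distribution with mean 0 and std 1
x_a = rng_a.normal(0, 1)
x_b = rng_b.normal(0, 1)

# same seed, same first draw
print(x_a == x_b)  # True
```

Seeding is useful whenever a result needs to be reproduced exactly, for example when sharing a notebook.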

Now that we know how to import packages, we can start using them!

#### Using NumPy

Let's create our first array!

In [4]:
# first, we will create a list
L1 = [2, 3, 5, 4]
L1

[2, 3, 5, 4]

In [5]:
# and now, we will transform it into an array
v1 = np.array(L1)
v1

array([2, 3, 5, 4])

We can already see that arrays and lists *look* different. How does this reflect on their behaviour?

In [6]:
# let's multiply our vector by 2
v1 * 2

array([ 4,  6, 10,  8])

When we multiply an array by 2, we obtain a new array in which each component is the corresponding component of the original array multiplied by two. This is called *scalar multiplication*. This behaviour is expected, since arrays are meant to represent mathematical vectors.

In [7]:
# what happens when we multiply L1 by 2
L1 * 2

[2, 3, 5, 4, 2, 3, 5, 4]

When the `*` operator is applied to a list, it repeats the list, concatenating it with itself. This is a reminder of how different lists are from arrays: the former are ordered collections of items, not necessarily all of numeric type, while the latter are primarily meant to represent vectors.
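For comparison, here is a minimal sketch of what it takes to double each element of a plain list: an explicit loop, written here as a list comprehension. The name `doubled` is introduced only for this illustration.

```python
L1 = [2, 3, 5, 4]

# the * operator repeats the list
repeated = L1 * 2
print(repeated)  # [2, 3, 5, 4, 2, 3, 5, 4]

# doubling every element requires visiting the elements one by one
doubled = [x * 2 for x in L1]
print(doubled)   # [4, 6, 10, 8]
```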

In [8]:
# one more example: exponentiation
v1 ** 2

array([ 4,  9, 25, 16])

In [9]:
L1 ** 2

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

This is even more apparent with exponentiation: on arrays it is again applied to each element separately, while lists have no built-in behaviour for it and an error is raised.

In general, all basic operations work element by element on arrays.

In [10]:
# let's create one more vector
v2 = np.array([7, 8, 10, 9])

In [11]:
v1

array([2, 3, 5, 4])

In [12]:
# addition
print(v1 + v2)

[ 9 11 15 13]


In [13]:
# subtraction
print(v1 - v2)

[-5 -5 -5 -5]


In [14]:
# multiplication
print(v1 * v2)

[14 24 50 36]


In [15]:
# division
print(v1 / v2)

[0.28571429 0.375      0.5        0.44444444]


In [16]:
# exponentiation
print(v1 ** v2)

[    128    6561 9765625  262144]


In [17]:
# and of course, these operations can be combined. Here we first perform the operation in parentheses and then the division
print((v1 + v2) / v2)

[1.28571429 1.375      1.5        1.44444444]
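Element-wise operations pair up the components of the two operands, so the operands need compatible shapes. The sketch below illustrates two cases: a scalar, which is combined with every element, and a vector of a different length, which cannot be paired up. The vector `short` and the flag `failed` are hypothetical names used only for this example.

```python
import numpy as np

v1 = np.array([2, 3, 5, 4])
short = np.array([1, 2, 3])  # hypothetical vector with only three elements

# a scalar is combined with every element of the vector
shifted = v1 + 10
print(shifted)  # [12 13 15 14]

# vectors with 4 and 3 elements cannot be combined element by element
try:
    v1 + short
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)
```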


Like everything else in Python, NumPy arrays are *objects*. This means that they have *attributes* and *methods*. An important attribute is `shape`.

In [18]:
# shape of v1
v1.shape

(4,)

The result, `(4,)`, indicates that the vector `v1` has only one dimension, with four elements in total. We can also assign to the `shape` attribute to change the number of dimensions of the object.

In [19]:
# adding one more dimension to v1
v1.shape = (2, 2)
print(v1)

[[2 3]
 [5 4]]


We just obtained our first matrix in Python, by reshaping the vector `v1` into a two-dimensional array. Notice that the total number of elements remained the same. We can obtain a similar result by using the `reshape` *method* instead.

In [20]:
# let's make v1 a vector again
v1.shape = (4,)
print(v1)

[2 3 5 4]


In [21]:
# and let's create a new matrix, m1
m1 = v1.reshape(2,2)
print(m1)

[[2 3]
 [5 4]]


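We created a matrix by using the `reshape` method, which can be invoked on any array and returns a *new* array object. Note that, whenever possible, this new array is a *view* that shares its data with the original, rather than an independent copy.

Not all array methods return a new object. Some of them operate *in place*, meaning that they directly modify the array that invokes them.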
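A few related attributes are worth knowing alongside `shape`. The sketch below, which uses the illustrative names `v`, `m`, and `reshape_failed`, shows `ndim` and `size`, the `-1` placeholder that lets NumPy infer one dimension, and what happens when the requested shape does not match the number of elements.

```python
import numpy as np

v = np.array([2, 3, 5, 4])

# -1 asks NumPy to infer that dimension from the number of elements
m = v.reshape(2, -1)

print(m.shape)  # (2, 2)
print(m.ndim)   # 2
print(m.size)   # 4

# four elements cannot fill a 3 x 2 matrix, so this reshape is rejected
try:
    v.reshape(3, 2)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)  # True
```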

In [22]:
# sorting the v1 vector
print(v1)
print('sorting...')
v1.sort()
print(v1)

[2 3 5 4]
sorting...
[2 3 4 5]


The `v1.sort()` statement did not return any value; instead, it changed the order of the elements inside `v1`.
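In-place operations have a side effect that is easy to miss: an array obtained with `reshape` usually shares its data with the original, so sorting one also changes the other. The following sketch illustrates this with fresh arrays; the names `m_view` and `m_copy` are chosen only for this example, and `copy()` is used to obtain an independent array.

```python
import numpy as np

v = np.array([2, 3, 5, 4])
m_view = v.reshape(2, 2)         # shares its data with v
m_copy = v.reshape(2, 2).copy()  # independent copy of the data

print(np.shares_memory(v, m_view))  # True
print(np.shares_memory(v, m_copy))  # False

# sorting v in place also reorders the elements seen through m_view
v.sort()
print(m_view)  # rows are now [2 3] and [4 5]
print(m_copy)  # rows are still [2 3] and [5 4]
```

When the original order matters, working on a copy avoids this kind of surprise.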

***

NumPy provides a truly huge number of methods and functions. In this course we will barely scratch the surface of what NumPy can actually do. If you are wondering whether NumPy offers a specific functionality that you need, you can consult the [official documentation](https://numpy.org/doc/stable/).

***


One important issue that should always be considered in numerical computation is the presence of non-finite values. These can be missing values or infinite quantities. NumPy provides ways to represent and deal with these types of values.

In [23]:
# let's create a new vector with both infinite and missing values
v3 = np.array([2, np.nan, 4, 8, np.inf])
print(v3)

[ 2. nan  4.  8. inf]


Here, `np.nan` stands for "not a number", and is commonly used to mark a value that is missing. `np.inf` instead indicates an infinite quantity. Older code may use the capitalised spellings `np.NaN` and `np.Inf`, which were removed in NumPy 2.0.
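One property of "not a number" deserves a quick illustration: it is not equal to anything, not even to itself, so the `==` operator cannot be used to find it. The sketch below relies on the NumPy functions `np.isnan` and `np.isinf` instead; the names `v`, `nan_mask`, and `inf_mask` are used only for this example.

```python
import numpy as np

# NaN is never equal to anything, including itself
print(np.nan == np.nan)  # False

v = np.array([2, np.nan, 4, 8, np.inf])

# dedicated functions locate the missing and the infinite entries
nan_mask = np.isnan(v)
inf_mask = np.isinf(v)
print(nan_mask)  # [False  True False False False]
print(inf_mask)  # [False False False False  True]
```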

What will happen if we try to compute the mean of such a vector?

In [24]:
# using the mean method on v3
v3.mean()

nan

The presence of non-finite values means that the result is non-finite as well.

A common way to deal with this scenario is to exclude the non-finite values from the computation.

In [25]:
# identify which values are finite (boolean mask)
idx = np.isfinite(v3)
print(idx)

[ True False  True  True False]


The `idx` vector can now be used to single out the finite part of `v3`.

In [26]:
# computing the mean on the finite part of v3
v3[idx].mean()

4.666666666666667
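As an alternative sketch, NumPy also offers `np.nanmean`, which skips missing values on its own. Note that it only ignores NaN: infinite values still take part in the computation, so they have to be removed separately. The names `no_inf` and `finite_mean` are introduced only for this illustration.

```python
import numpy as np

v3 = np.array([2, np.nan, 4, 8, np.inf])

# nanmean skips NaN but keeps the infinite value
print(np.nanmean(v3))  # inf

# dropping the infinite entries first leaves 2, NaN, 4 and 8
no_inf = v3[~np.isinf(v3)]

# nanmean then averages 2, 4 and 8
finite_mean = np.nanmean(no_inf)
print(np.isclose(finite_mean, 14 / 3))  # True
```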

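Finally, let's take a peek at some of NumPy's more advanced functionality. An important module is `linalg`. Keep in mind that `m1` was built by reshaping `v1`, and the two share the same data: after the in-place sort above, `m1` holds `[[2, 3], [4, 5]]`.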

In [27]:
# let's use the linalg module to compute the determinant of m1
np.linalg.det(m1)

-2.0

In [28]:
# and now let's compute the inverse of m1
np.linalg.inv(m1)

array([[-2.5,  1.5],
       [ 2. , -1. ]])
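A quick way to check results like these is to verify their defining properties. The sketch below rebuilds the matrix whose inverse is shown above under the illustrative name `m`, then checks that the matrix times its inverse gives the identity, and that the determinant matches the 2 x 2 formula. Since these are floating-point computations, `np.allclose` and `np.isclose` are used rather than exact equality.

```python
import numpy as np

# the matrix whose inverse is shown above
m = np.array([[2, 3],
              [4, 5]])
m_inv = np.linalg.inv(m)

# the matrix product of m and its inverse should be the identity matrix
product = m @ m_inv
print(np.allclose(product, np.eye(2)))  # True

# for a 2 x 2 matrix the determinant is a*d - b*c
det = np.linalg.det(m)
print(np.isclose(det, 2 * 5 - 3 * 4))   # True
```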

As you can see, complex mathematical operations can be performed in a single line with NumPy. Whenever you need this type of computation in Python, first check the [online documentation](https://numpy.org/doc/stable/) for methods or functions that directly implement the algorithm you need, or that at least allow you to implement it in an easier way.