
# Introduction

In this notebook, we will study how to solve some basic problems of matrix algebra computationally. We cover the following topics:

- Matrix and vector operations
- Solution of linear systems

Before running code in the notebook, please execute the following cell, which imports the modules that we will be using:

In [1]:
import numpy as np

# Matrix and vector operations

Matrices and vectors are represented, as expected, by `numpy` arrays. A matrix is just a two-dimensional array:

In [3]:
A = np.array([[1,-1,3,4],[2,0,-3,1],[1,1,2,-2]], dtype=np.float64)
A

array([[ 1., -1.,  3.,  4.],
       [ 2.,  0., -3.,  1.],
       [ 1.,  1.,  2., -2.]])

Notice that we specify `dtype=np.float64` to force the matrix entries to be floating-point numbers, which is what we usually want when doing numerical computations.

Vectors can be represented as one-dimensional arrays:

In [5]:
u = np.array([1,2,0,-3], dtype=np.float64)
v = np.array([2,-3,1], dtype=np.float64)
u, v

(array([ 1.,  2.,  0., -3.]), array([ 2., -3.,  1.]))

There is a peculiarity about one-dimensional arrays: they are neither row nor column vectors, but can play the role of either, depending on the context in which they are used. If we need to guarantee that we have a column or row vector, we can use the `reshape` method:

In [10]:
u_col = u.reshape(4, 1)
u_col

array([[ 1.],
       [ 2.],
       [ 0.],
       [-3.]])

In the code above, we are asking to reshape the array `u` to be a $4\times 1$ array, which is a column vector. If we want a row vector instead, we can use:

In [11]:
u_row = u.reshape(1, 4)
u_row

array([[ 1.,  2.,  0., -3.]])
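As a small sketch of the same idea, `reshape` also accepts `-1` for one of the dimensions, in which case `numpy` infers that dimension from the size of the array. The names `w`, `w_col` and `w_row` below are only illustrative and are not used elsewhere in the notebook.

```python
import numpy as np

w = np.array([1, 2, 0, -3], dtype=np.float64)

# -1 asks numpy to infer that dimension from the array's size
w_col = w.reshape(-1, 1)   # column vector, shape (4, 1)
w_row = w.reshape(1, -1)   # row vector, shape (1, 4)

print(w.shape, w_col.shape, w_row.shape)
```

This avoids hard-coding the length of the vector when converting it to a row or a column.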

Matrix addition and multiplication by a scalar are performed by the operators `+` and `*`, respectively:

In [12]:
B = np.array([[3,1,-1,0],[2,-3,4,4],[3,-2,1,2]])
B

array([[ 3,  1, -1,  0],
       [ 2, -3,  4,  4],
       [ 3, -2,  1,  2]])

In [13]:
A + B

array([[ 4.,  0.,  2.,  4.],
       [ 4., -3.,  1.,  5.],
       [ 4., -1.,  3.,  0.]])

In [14]:
3 * A

array([[ 3., -3.,  9., 12.],
       [ 6.,  0., -9.,  3.],
       [ 3.,  3.,  6., -6.]])

Matrix multiplication _is not_ represented by `*`. This can be seen clearly if we try to multiply the matrix `A` by the vector `u`:

In [15]:
A * u

array([[  1.,  -2.,   0., -12.],
       [  2.,   0.,  -0.,  -3.],
       [  1.,   2.,   0.,   6.]])

Look carefully at the result. It should be clear that the result _is not_ what we expect when computing the matrix-vector product $Au$. Instead, each _row_ of `A` is multiplied, component by component, by the entries of vector `u`.

Behind the scenes, `numpy` is using what is called _broadcasting_. The rules for broadcasting are complex, so let's see how `numpy` deals with the present example.

- `numpy` realizes that we are trying to multiply the array `A` of shape (3,4) by the array `u` of shape (4,).
- Since the shapes don't match, `numpy` expands the array `u` by interpreting it as a row vector and repeating it to obtain an array of shape (3,4).
- Once this is done, the two arrays match shapes, and can be multiplied.

This could be accomplished explicitly with the following code:

In [17]:
u_stack = np.stack([u,u,u])
u_stack

array([[ 1.,  2.,  0., -3.],
       [ 1.,  2.,  0., -3.],
       [ 1.,  2.,  0., -3.]])

In [18]:
A * u_stack

array([[  1.,  -2.,   0., -12.],
       [  2.,   0.,  -0.,  -3.],
       [  1.,   2.,   0.,   6.]])
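The expansion described above can also be inspected directly. The following is a minimal sketch, assuming the same `A` and `u` as in the cells above; it uses `np.broadcast_to` to build the expanded array and checks that the result agrees with `A * u`. The names `u_expanded` and `same` are illustrative.

```python
import numpy as np

A = np.array([[1, -1, 3, 4], [2, 0, -3, 1], [1, 1, 2, -2]], dtype=np.float64)
u = np.array([1, 2, 0, -3], dtype=np.float64)

# np.broadcast_to shows the (3, 4) array that u is expanded to
u_expanded = np.broadcast_to(u, A.shape)

# The broadcast product matches the product with the explicit expansion
same = np.array_equal(A * u, A * u_expanded)
print(u_expanded.shape, same)
```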

The take-home message is that, in general, _we should not use the `*` operator to multiply matrices_, except in very specific applications.

To compute the matrix-vector product $Au$ we have two options: use either the `dot()` method or the `@` operator. These two methods are illustrated in the following two cells:

In [19]:
A.dot(u)

array([-13.,  -1.,   9.])

In [20]:
A @ u

array([-13.,  -1.,   9.])

Notice that `numpy` is smart enough to realize that `u` should be interpreted as a column vector. Also notice that the returned vector is of shape (3,), which, if interpreted as a column vector, has the right dimension. If we try to compute $uA$ we get an error, because a vector with 4 entries cannot multiply a matrix with 3 rows from the left:

In [21]:
u.dot(A)

ValueError: shapes (4,) and (3,4) not aligned: 4 (dim 0) != 3 (dim 0)

We can, of course, multiply the matrix `A` on the left by an array of shape (3,):

In [22]:
v.dot(A)

array([-3., -1., 17.,  3.])
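The shape rules at work in the last few cells can be summarized in a short sketch, assuming the same `A`, `u` and `v` defined earlier. The product `A @ A.T` is added only as an illustration of a matrix-matrix product and does not appear in the cells above.

```python
import numpy as np

A = np.array([[1, -1, 3, 4], [2, 0, -3, 1], [1, 1, 2, -2]], dtype=np.float64)
u = np.array([1, 2, 0, -3], dtype=np.float64)
v = np.array([2, -3, 1], dtype=np.float64)

print((A @ u).shape)     # (3, 4) @ (4,)   -> (3,)
print((v @ A).shape)     # (3,)   @ (3, 4) -> (4,)
print((A @ A.T).shape)   # (3, 4) @ (4, 3) -> (3, 3)

# Mismatched inner dimensions raise a ValueError
try:
    u @ A
except ValueError as err:
    print("ValueError:", err)
```

In each valid product, the last dimension of the left operand matches the first dimension of the right operand.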

It should come as no surprise that matrix inversion _is not_ computed by raising to the power $-1$, as shown in the example below:

In [28]:
M = np.array([[1,2,-2,1],[2,-3,4,-5],[1,1,2,-1],[-3,2,1,2]], dtype=np.float64)
M

array([[ 1.,  2., -2.,  1.],
       [ 2., -3.,  4., -5.],
       [ 1.,  1.,  2., -1.],
       [-3.,  2.,  1.,  2.]])

In [29]:
M ** (-1)

array([[ 1.        ,  0.5       , -0.5       ,  1.        ],
       [ 0.5       , -0.33333333,  0.25      , -0.2       ],
       [ 1.        ,  1.        ,  0.5       , -1.        ],
       [-0.33333333,  0.5       ,  1.        ,  0.5       ]])

As can be seen, what `numpy` is doing here is raising every entry of the matrix to the power $-1$, which does not correspond to matrix inversion. To compute the inverse of a matrix, we can use the function `np.linalg.inv()`:

In [34]:
M_inv = np.linalg.inv(M)
M_inv

array([[-0.51612903, -0.48387097,  0.74193548, -0.58064516],
       [ 0.67741935,  0.32258065, -0.16129032,  0.38709677],
       [-0.64516129, -0.35483871,  0.67741935, -0.22580645],
       [-1.12903226, -0.87096774,  0.93548387, -0.64516129]])

In [35]:
M.dot(M_inv)

array([[ 1.00000000e+00, -5.55111512e-17,  2.22044605e-16,
        -1.11022302e-16],
       [ 0.00000000e+00,  1.00000000e+00, -8.88178420e-16,
         0.00000000e+00],
       [ 0.00000000e+00,  0.00000000e+00,  1.00000000e+00,
         2.22044605e-16],
       [-3.33066907e-16, -2.22044605e-16,  0.00000000e+00,
         1.00000000e+00]])

Notice that the product is not exactly equal to the identity matrix due to rounding errors.
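Because of these rounding errors, comparing the product with the identity using `==` is unreliable. A minimal sketch of a more robust check, assuming the same matrix `M` as above, uses `np.allclose`, which compares arrays up to a small tolerance:

```python
import numpy as np

M = np.array([[1, 2, -2, 1], [2, -3, 4, -5], [1, 1, 2, -1], [-3, 2, 1, 2]],
             dtype=np.float64)
M_inv = np.linalg.inv(M)

# np.eye(4) is the 4x4 identity matrix
is_identity = np.allclose(M @ M_inv, np.eye(4))
print(is_identity)
```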

Here is a summary of what we learned in this section:

- Matrix addition and scalar-matrix multiplication are done with the operators `+` and `*`, respectively.
- Matrix multiplication is done either with the method `dot()` or the operator `@`.
- Matrix inversion is done with the `np.linalg.inv()` function.

# Solving linear systems

To solve a square linear system, the recommended method is to use the function `np.linalg.solve()`, as illustrated in the example below:

In [36]:
A = np.array([[1,2,1,-3],[2,-1,2,1],[0,1,2,1],[-3,2,1,1]], dtype=np.float64)
A

array([[ 1.,  2.,  1., -3.],
       [ 2., -1.,  2.,  1.],
       [ 0.,  1.,  2.,  1.],
       [-3.,  2.,  1.,  1.]])

In [37]:
b = np.array([1,1,1,1])
b

array([1, 1, 1, 1])

To solve the system $Ax=b$, we use:

In [38]:
x = np.linalg.solve(A, b)
x

array([-0.5, -0.5,  1. , -0.5])

To verify that the solution is correct, we can use matrix-vector multiplication:

In [39]:
A.dot(x)

array([1., 1., 1., 1.])
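For larger systems, inspecting the product by eye is impractical. One possible check, sketched here with the same `A` and `b` as above, is to compute the residual $Ax-b$ and test that it is close to zero. The name `residual` is illustrative.

```python
import numpy as np

A = np.array([[1, 2, 1, -3], [2, -1, 2, 1], [0, 1, 2, 1], [-3, 2, 1, 1]],
             dtype=np.float64)
b = np.array([1, 1, 1, 1])

x = np.linalg.solve(A, b)

# The residual should be zero up to rounding error
residual = A @ x - b
print(np.linalg.norm(residual))
print(np.allclose(A @ x, b))
```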

An important point to notice is that, to solve large systems, _matrix inversion should not be used_. The function `np.linalg.solve()` uses an $LU$ decomposition, a method based on Gaussian elimination that avoids forming the inverse explicitly and is more efficient than inversion followed by multiplication. To see the point, let's create a $100\times 100$ random system and solve it both by inversion and using `np.linalg.solve()`:

In [43]:
A = np.random.rand(100, 100)
b = np.random.rand(100)

In [44]:
%%timeit
x = np.linalg.inv(A).dot(b)

213 µs ± 76.8 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [45]:
%%timeit
x = np.linalg.solve(A, b)

76.3 µs ± 3.35 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


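It can be seen that, in this run, using `np.linalg.solve()` is roughly three times faster (about 76 µs versus 213 µs per solve). The exact timings will vary from machine to machine.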
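The `%%timeit` magic is only available in notebooks. As a rough sketch of the same comparison in plain Python, the standard `timeit` module can be used instead. The seeded generator and the names `A_big`, `b_big`, `x_inv`, `x_solve` and `n_calls` are assumptions made for this example, and the timings it prints depend on the machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)      # seeded generator (an illustrative choice)
A_big = rng.random((100, 100))
b_big = rng.random(100)

x_inv = np.linalg.inv(A_big) @ b_big
x_solve = np.linalg.solve(A_big, b_big)

# Both approaches should agree up to rounding error
print(np.allclose(x_inv, x_solve))

# Residual norms ||Ax - b|| for each approach
print(np.linalg.norm(A_big @ x_inv - b_big))
print(np.linalg.norm(A_big @ x_solve - b_big))

# Average time per call, in seconds, over n_calls calls
n_calls = 200
t_inv = timeit.timeit(lambda: np.linalg.inv(A_big) @ b_big, number=n_calls) / n_calls
t_solve = timeit.timeit(lambda: np.linalg.solve(A_big, b_big), number=n_calls) / n_calls
print(t_inv, t_solve)
```

Printing the residual norms alongside the timings gives a way to compare the accuracy of the two approaches as well as their speed.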

# Exercises



*1.* *Polynomial interpolation*. Suppose we are given the following four points on the plane: 
$$
\begin{matrix}
(x_0,y_0)=(1,-2)\\
(x_1,y_1)=(3,5)\\
(x_2,y_2)=(4,-10)\\
(x_3,y_3)=(-1, -3)
\end{matrix}
$$

We want to find a cubic polynomial:
$$
f(x) = a_0+a_1x+a_2x^2+a_3x^3
$$
such that $f(x_i)=y_i$ for $i=0,1,2,3$. Show that this problem can be formulated as a linear system with unknowns $a_0$, $a_1$, $a_2$ and $a_3$, and solve the system using `numpy`.
Then, plot the given points and the cubic polynomial to verify that the curve actually goes through the given points.