# Numerical Methods 1
### [Gerard Gorman](http://www.imperial.ac.uk/people/g.gorman), [Matthew Piggott](http://www.imperial.ac.uk/people/m.d.piggott), [Christian Jacobs](http://www.christianjacobs.uk)

# Lecture ?: Numerical Linear Algebra III

## Learning objectives:

* Ill-conditioned matrices
* Direct vs iterative/indirect methods
* An example iterative algorithm

## Ill-conditioned matrices

The conditioning (or lack of it, i.e. the ill-conditioning) of the matrices we are trying to invert (to obtain the inverse, or to find the solution to a linear matrix system) is incredibly important for the success of any algorithm.

When we started talking about matrices we noted that as long as the matrix is non-singular, i.e. $\det(A)\ne 0$, an inverse exists and a linear system with that $A$ has a unique solution.

But what happens when we consider a matrix that is nearly singular, i.e. $\det(A)$ is very small?

Well, smallness is a relative term, and so we need to ask how large or small $\det(A)$ is compared to something.

That something is the *norm* of the matrix.

#### Vector norms

For a vector $\pmb{v}$ (assumed to be an $n\times 1$ column vector) we have multiple possible norms to help us quantify its magnitude:

\begin{align}
\|\pmb{v}\|_2 & = \sqrt{v_1^2 + v_2^2 + \ldots + v_n^2} = \left(\sum_{i=1}^n v_i^2 \right)^{1/2}, &\quad{\textrm{the two-norm or Euclidean norm}}\\
\|\pmb{v}\|_1  & = |v_1| + |v_2| + \ldots + |v_n| = \sum_{i=1}^n |v_i|, &\quad{\textrm{the one-norm or taxi-cab norm}}\\
\|\pmb{v}\|_{\infty}  &= \max\{|v_1|,|v_2|, \ldots, |v_n|\} = \max_{i=1}^n |v_i|, &\quad{\textrm{the max-norm or infinity norm}}
\end{align}
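As a quick check of these definitions, the sketch below evaluates each formula directly and compares it with `numpy.linalg.norm`. The helper name `vector_norms` and the example vector are illustrative choices, not part of the lecture material.

```python
import numpy

def vector_norms(v):
    """Return the (two-norm, one-norm, max-norm) of a 1D array v."""
    v = numpy.asarray(v, dtype=float)
    two_norm = numpy.sqrt(numpy.sum(v**2))    # square root of the sum of squares
    one_norm = numpy.sum(numpy.abs(v))        # sum of absolute values
    max_norm = numpy.max(numpy.abs(v))        # largest absolute value
    return two_norm, one_norm, max_norm

v = numpy.array([3., -4., 12.])   # example vector chosen purely for illustration
two_norm, one_norm, max_norm = vector_norms(v)

# each line prints the hand-computed value next to the library value
print(two_norm, numpy.linalg.norm(v, 2))
print(one_norm, numpy.linalg.norm(v, 1))
print(max_norm, numpy.linalg.norm(v, numpy.inf))
```

For this vector the three norms are 13, 19 and 12 respectively, so the choice of norm clearly changes the number we get, even though all three measure the "size" of the same vector.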

#### Matrix norms

We can define measures of the size of matrices, e.g. for $A$ which for complete generality we will assume is of shape $m\times n$:

\begin{align}
\|A\|_F & = \left(\sum_{i=1}^m \sum_{j=1}^n A_{ij}^2 \right)^{1/2}, &\quad{\textrm{the Frobenius (or Euclidean) norm}}\\
\|A\|_{\infty} & = \max_{i=1}^m \sum_{j=1}^n|A_{ij}|, &\quad{\textrm{the maximum absolute row-sum norm}}
\end{align}

Note that while the vector and matrix norms give different results, they are consistent or equivalent in that they are always within a constant factor of one another (a result that is true for finite-dimensional or discrete problems as here). This means we don't really need to worry too much about which norm we're using.
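To make the "constant factor" statement concrete for vectors, the standard chain of inequalities $\|\pmb{v}\|_\infty \le \|\pmb{v}\|_2 \le \|\pmb{v}\|_1 \le \sqrt{n}\,\|\pmb{v}\|_2 \le n\,\|\pmb{v}\|_\infty$ can be checked numerically. This is only a sketch for one illustrative vector (not taken from the lecture), not a proof.

```python
import numpy

v = numpy.array([3., -4., 12.])   # illustrative vector
n = v.size

norm_inf = numpy.linalg.norm(v, numpy.inf)
norm_2 = numpy.linalg.norm(v, 2)
norm_1 = numpy.linalg.norm(v, 1)

# the values should appear in non-decreasing order
print(norm_inf, norm_2, norm_1, numpy.sqrt(n)*norm_2, n*norm_inf)

# check the whole chain of inequalities at once
chain_holds = norm_inf <= norm_2 <= norm_1 <= numpy.sqrt(n)*norm_2 <= n*norm_inf
print(chain_holds)
```

The constants here depend only on $n$, not on the particular vector, which is why a quantity that is small in one norm is also small in the others.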

In [23]:
import numpy
from scipy import linalg
A=numpy.array([[10., 2., 1.],[6., 5., 4.],[1., 4., 7.]])
print(A)
print(linalg.norm(A))
print(linalg.norm(A,'fro'))        # the Frobenius norm - the default
print(linalg.norm(A,numpy.inf))    # the maximum absolute row-sum
print(linalg.norm(A,1))            # the maximum absolute column-sum
print(linalg.norm(A,2))            # the two-norm - note not the same as the Frobenius norm - also termed the spectral norm
print(numpy.sqrt(numpy.real((numpy.max(linalg.eigvals(numpy.dot(A.T,A)))))))

[[ 10.   2.   1.]
 [  6.   5.   4.]
 [  1.   4.   7.]]
15.748015748
15.748015748
15.0
17.0
13.7930910986
13.7930910986


### <span style="color:blue">Exercise: matrix norms</span>

Write some code to explicitly compute the two matrix norms defined mathematically above and compare against the values found above using built-in scipy functions.

Based on the above code and comments, what is the mathematical definition of the 1-norm and the 2-norm?


### Matrix conditioning

The (ill-)conditioning of a matrix is measured with the matrix condition number:

$$\textrm{cond}(A) = \|A\|\|A^{-1}\|$$

If this is close to one then $A$ is well-conditioned, and it increases with the degree of ill-conditioning, reaching infinity for a singular matrix.

In [34]:
import numpy
from scipy import linalg
A=numpy.array([[10., 2., 1.],[6., 5., 4.],[1., 4., 7.]])
print(A)
print(numpy.linalg.cond(A))
print(linalg.norm(A,2)*linalg.norm(linalg.inv(A),2))  # so the default condition number uses the matrix two-norm
print(numpy.linalg.cond(A,'fro'))
print(linalg.norm(A,'fro')*linalg.norm(linalg.inv(A),'fro'))

[[ 10.   2.   1.]
 [  6.   5.   4.]
 [  1.   4.   7.]]
10.7133718813
10.7133718813
12.4636165619
12.4636165619


The condition number is expensive to compute, and so in practice the size of the determinant of the matrix is gauged against the magnitude of the entries of the matrix.

We know that a singular matrix does not result in a unique solution to a corresponding linear matrix system. But what are the consequences of near-singularity (ill-conditioning)?
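One consequence can be seen in a small experiment: for a nearly singular matrix, a tiny change in the right-hand side can produce a large change in the solution. The $2\times 2$ system below is a hypothetical example constructed for illustration (its second row is almost exactly twice the first); it is not taken from the lecture.

```python
import numpy

# Hypothetical nearly singular matrix: row 2 is almost 2 x row 1
A = numpy.array([[1., 2.],
                 [2., 4.001]])
b = numpy.array([3., 6.001])                   # exact solution is x = (1, 1)
b_perturbed = b + numpy.array([0., 0.001])     # tiny change to the right-hand side

x = numpy.linalg.solve(A, b)
x_perturbed = numpy.linalg.solve(A, b_perturbed)

print(numpy.linalg.det(A))     # small compared to the entries of A
print(numpy.linalg.cond(A))    # large condition number
print(x)                       # solution of the original system
print(x_perturbed)             # solution after the small perturbation
```

Here changing one entry of $\pmb{b}$ by $0.001$ moves the solution from approximately $(1, 1)$ to approximately $(-1, 2)$. In practice such small changes arise from measurement or round-off errors, so the computed solution of an ill-conditioned system may be far less accurate than the data it was computed from.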



## Direct vs iterative methods

Two types/families of methods exist to solve matrix systems.  These are termed *direct* methods and *iterative* (or *indirect*) methods.

Direct methods perform operations on the linear equations (the matrix system), e.g. the substitution of one equation into another, which we performed last week for the example $2\times 2$ system considered in MM1. This (and the subsequent Gaussian elimination algorithm) transformed the equations making up the linear system into equivalent ones, with the aim of eliminating unknowns from some of the equations and hence allowing for easy solution through back (or forward) substitution.

In MM1 you learnt Cramer's rule, which gives an explicit formula for the inverse of a matrix, or for the solution of a linear matrix system.  It was pointed out that its computational cost (in terms of arithmetic operations required; also termed complexity) scales like $(n+1)!$, whereas Gaussian elimination (which is basically the substitution method done above) scales like $n^3$.  For large $n$ Gaussian elimination will clearly be more efficient - you considered the case where $n=100$ in MM1 for example. Here $n$ refers to the number of unknowns or equations, sometimes termed the *degrees of freedom* of the problem.
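To get a feel for the gap between these two scalings at $n=100$, the sketch below simply evaluates $(n+1)!$ and $n^3$. These are the scalings quoted above rather than exact operation counts, and the variable names are illustrative.

```python
import math

n = 100
cramer_ops = math.factorial(n + 1)   # scaling quoted for Cramer's rule, (n+1)!
gauss_ops = n**3                     # scaling quoted for Gaussian elimination, n^3

print(gauss_ops)                     # a million operations
print(len(str(cramer_ops)))          # number of decimal digits in (n+1)!
print(cramer_ops // gauss_ops)       # how many times larger the factorial cost is
```

The factorial count has on the order of 160 digits, so no conceivable computer could carry out that many operations, whereas a million operations is trivial.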

However, as pointed out above, $n$ could be in the billions for hard-core applications such as weather forecasting. In this case the $n^3$ operations required by a direct algorithm such as Gaussian elimination are also prohibitive.

In order to reduce this cost, ideally to a level that is (close to) linear in $n$, *iterative* algorithms were devised. 

These start with a guess at the solution ($\pmb{x}_0$) and calculate the residual vector ($A\pmb{x}_0 - \pmb{b}$) and its norm (a scalar measure of a vector's size - e.g. the vector *2-norm* is just the square root of the sum of the squares of the components). The residual norm will not be zero unless you were very lucky with your initial guess, so the algorithm then *iteratively* seeks to improve on the solution in order to drive down this residual norm.  The iteration stops once the residual norm falls below some small (non-zero) tolerance, yielding an approximation to the solution rather than the exact solution we would obtain with direct methods.  The stopping criterion, i.e. the residual norm tolerance, therefore needs to be thought about carefully, e.g. depending on how accurate a solution $\pmb{x}$ we require.
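The guess / residual / improve / stop pattern described above can be sketched with the Jacobi method. Note that Jacobi is chosen here only as a simple, well-known illustration and is not necessarily the algorithm developed in this lecture; the function name `jacobi`, the tolerance, the iteration cap and the (strictly diagonally dominant) test system are all assumptions made for this example.

```python
import numpy

def jacobi(A, b, x0, tol=1.e-8, max_iter=500):
    """Jacobi iteration for A x = b, stopping when ||A x - b||_2 < tol.

    Returns the approximate solution and the number of iterations taken.
    """
    D = numpy.diag(A)               # diagonal entries of A (1D array)
    R = A - numpy.diag(D)           # off-diagonal part of A
    x = x0.copy()
    for k in range(max_iter):
        residual_norm = numpy.linalg.norm(numpy.dot(A, x) - b)
        if residual_norm < tol:     # residual small enough: stop iterating
            return x, k
        x = (b - numpy.dot(R, x)) / D   # Jacobi update of every component
    return x, max_iter              # tolerance not reached within max_iter

# Hypothetical strictly diagonally dominant system, for which Jacobi converges
A = numpy.array([[10., 2., 1.],
                 [1., 8., 2.],
                 [2., 1., 9.]])
b = numpy.array([13., 11., 12.])    # exact solution is x = (1, 1, 1)

x, iterations = jacobi(A, b, numpy.zeros(3))
print(x, iterations)
print(numpy.linalg.norm(numpy.dot(A, x) - b))   # final residual norm
```

Tightening `tol` gives a more accurate answer at the cost of more iterations, which is exactly the trade-off the stopping criterion controls. For matrices that are not diagonally dominant this simple iteration is not guaranteed to converge.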

Last week we considered Gaussian elimination (and back substitution) as examples of direct solution methods. We'll look briefly at iterative methods this week.