# PHYS 325 Scientific Computing -- Fall 2018

# 2. Numerical methods

*Acknowledgements:* My lecture notes for this chapter draw from the "Numerical Mathematics for Physicists" course taught by [Martin Kerscher](https://homepages.physik.uni-muenchen.de/~Martin.Kerscher/), as well as lectures by [Lode Pollet](https://www.theorie.physik.uni-muenchen.de/lsschollwoeck/members/professors/pollet/) and the "Numerical Recipes" books.

## 2.1 Linear algebra

System of linear equations:

$$
\begin{array}{cl}
  a_{11} x_1 + a_{12} x_2 + \cdots + a_{1N} x_N & = b_1 \\
  \vdots & \vdots \\
  a_{M1} x_1 + a_{M2} x_2 + \cdots + a_{MN} x_N & = b_M
\end{array}
$$

$M$ equations for $N$ unknowns

- numerics: typically thousands of variables
- for linear problems or linear approximations to other problems
- needed for many numerical setups (for example splines etc.)
- many highly optimized libraries available

Matrix notation:

$$
A  \mathbf{x} = \mathbf{b} 
$$

where $A \in \mathbb{R}^{M \times N}; \quad \mathbf{x} \in \mathbb{R}^N; \quad \mathbf{b}  \in \mathbb{R}^M $:

$$
\begin{pmatrix}
  a_{11} & \cdots & a_{1N} \\ 
  \vdots &  & \vdots \\ 
  a_{M1} & \cdots & a_{MN}
\end{pmatrix} 
\begin{pmatrix}
  x_{1} \\ 
  \vdots  \\ 
  x_{N} 
\end{pmatrix} 
=
\begin{pmatrix}
  b_{1} \\ 
  \vdots  \\ 
  b_{M} 
\end{pmatrix} 
$$

How to solve a system of linear equations with Gaussian elimination (reminder):

$$
\begin{pmatrix}
  1 & 5 & 7 \\
  3 & 0 & 4 \\
  7 & 5 & 5 
\end{pmatrix}
\begin{pmatrix}
  x_1\\ x_2\\ x_3
\end{pmatrix}
=
\begin{pmatrix}
  1\\ 2\\ 3
\end{pmatrix}
$$

write as

$$
\Big( A^{(0)}|\mathbf{b}^{(0)}\Big)= 
\left(
  \begin{array}{ccc|c}
    1 & 5 & 7  & 1\\
    3 & 0 & 4  & 2\\
    7 & 5 & 5  & 3
  \end{array}
\right)
$$

We are allowed to:
- swap rows
- multiply a row by a (non-zero) constant
- add a multiple of one row to another row

Transform column by column until we arrive at upper triangular matrix:

$$
\begin{align*}
  \mathbf{II}^{(1)} &= \mathbf{II}^{(0)}-\tfrac{3}{1}\ \mathbf{I}^{(0)}\\
  \mathbf{III}^{(1)} &= \mathbf{III}^{(0)}-\tfrac{7}{1}\ \mathbf{I}^{(0)}\\
\end{align*}
$$

implies

$$
\Big(A^{(1)}|\mathbf{b}^{(1)}\Big)= 
\left(
  \begin{array}{ccc|c}
    1 & 5   & 7    & \ \ 1\\
    0 & -15 & -17  & -1\\
    0 & -30 & -44  & -4
  \end{array}
\right). 
$$

Next step:

$$
\begin{align*}
  \mathbf{III}^{(2)} &= \mathbf{III}^{(1)}-\tfrac{30}{15}\ \mathbf{II}^{(1)}\\
\end{align*}
$$

implies

$$
\Big(A^{(2)}|\mathbf{b}^{(2)}\Big)= 
\left(
  \begin{array}{ccc|c}
    1 & 5 & 7      & \ \ 1\\
    0 & -15 & -17  & -1\\
    0 &   0 & -10  & -2
  \end{array}
\right)
$$

This is equivalent to

$$
\begin{array}{llll}
x_1 & + 5\ x_2  & +7\ x_3  & = \ \ 1 \\
    & - 15\ x_2 & -17\ x_3 & =-1 \\
    &          & -10\ x_3 & =-2
\end{array} .
$$

Now it is easy to solve for $\mathbf{x}$:

$$
x_3 = \tfrac{1}{5}, \
x_2 = -\tfrac{4}{25}, \ 
x_1 = \tfrac{2}{5} .
$$
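
The hand calculation above can be cross-checked numerically. The following is a minimal sketch, assuming NumPy is installed; `np.linalg.solve` serves only as an independent check and is not the step-by-step elimination shown above.

```python
import numpy as np

# Coefficient matrix and right-hand side of the worked example
A = np.array([[1., 5., 7.],
              [3., 0., 4.],
              [7., 5., 5.]])
b = np.array([1., 2., 3.])

x = np.linalg.solve(A, b)                # library solver, used as a reference
x_hand = np.array([2/5, -4/25, 1/5])     # result of the hand calculation

print(x)
print(np.allclose(x, x_hand))            # do both results agree?
```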

<br>
<tr>
<td><img src="images/infmanysolutions.png" alt="Infinitely many solutions" align="right"  style="width: 200px;"/></td>
<td><img src="images/uniquesolution.png" alt="Unique solution" align="right"  style="width: 200px;"/></td>
<td><img src="images/nosolution.png" alt="No solution" align="right" style="width: 200px;"/></td>
</tr>


Classification
- no solution
- unique solution (rank($A$)=$N$)
- infinitely many solutions

<br><br><br><br><br><br>
For square matrices:

<center>$A$ is invertible</center>
$$\Leftrightarrow$$
<center>rank($A$)=$N$ </center>
$$\Leftrightarrow$$
<center>det$(A)\neq0$</center>
$$\Leftrightarrow$$
<center>$A\mathbf{x}=0$ has only the solution $\mathbf{x}=0$ </center>

<br>
Formally

$$A^{-1}\mathbf{b}=\mathbf{x}$$

so solving a system of linear equations is related to inverting a matrix!

> never use a direct implementation of matrix inversion => slow and numerically unstable

(but you can do the opposite: solve $A\mathbf{x}_j=\mathbf{e}_j$ for each unit vector $\mathbf{e}_j$ to obtain the inverse)

Tasks of Computational Linear Algebra:
- solving sets of linear equations
- matrix determinants
- matrix inversion
- singular value decomposition of a matrix
- linear least squares => see section on data fitting

Numerical problems:
- equations may formally have a unique solution, but some of the equations may be close to being linearly dependent<br> => roundoff errors in the solution process can make them linearly dependent <br>=> failure of the solution procedure
- large $N$: roundoff errors may swamp the true solution <br>=> numerical instability <br>=> wrong solution (need to check!)

Example:

$$
  \begin{pmatrix}
    10 & 7 & 8 & 7 \\
    7 & 5 & 6 & 5 \\
    8 & 6 & 10 & 9 \\
    7 & 5 & 9 & 10\\
  \end{pmatrix}
  \begin{pmatrix}
    x_1\\ x_2\\ x_3\\ x_4
  \end{pmatrix} = 
  \begin{pmatrix}
    32\\ 23\\ 33\\ 31
  \end{pmatrix}\Rightarrow
  \mathbf{x}=  \begin{pmatrix}
    1\\ 1\\ 1\\ 1
  \end{pmatrix}
$$

Very similar system:

$$
  \begin{pmatrix}
    10 & 7 & 8 & 7 \\
    7 & 5 & 6 & 5 \\
    8 & 6 & 10 & 9 \\
    7 & 5 & 9 & 10\\
  \end{pmatrix}
  \begin{pmatrix}
    x_1\\ x_2\\ x_3\\ x_4
  \end{pmatrix} = 
  \begin{pmatrix}
    32.1\\ 22.9\\ 33.1\\ 30.9
  \end{pmatrix}\Rightarrow
  \mathbf{x}=  \begin{pmatrix}
    9.2\\ -12.6\\ 4.5\\ -1.1
  \end{pmatrix}
$$

- a relative error of $0.1/23 \approx 0.00435$ in $\mathbf{b}$ results in a relative error of $13.6/1$ on the result ($x_2$ changes from $1$ to $-12.6$)
- relative error enhancement factor $\approx3000$.
- general problem in systems of linear equations, independent of numerical method
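
The sensitivity of this example can be reproduced in a few lines. This is a sketch assuming NumPy; the variable names (`b_pert`, `rel_in`, `rel_out`) are chosen here only for illustration.

```python
import numpy as np

A = np.array([[10., 7., 8., 7.],
              [7., 5., 6., 5.],
              [8., 6., 10., 9.],
              [7., 5., 9., 10.]])
b = np.array([32., 23., 33., 31.])
b_pert = np.array([32.1, 22.9, 33.1, 30.9])   # every entry shifted by 0.1

x = np.linalg.solve(A, b)
x_pert = np.linalg.solve(A, b_pert)

# largest relative change of a single component: input vs. output
rel_in = np.max(np.abs(b_pert - b) / np.abs(b))
rel_out = np.max(np.abs(x_pert - x) / np.abs(x))

print(x, x_pert)
print(rel_in, rel_out, rel_out / rel_in)      # last number: enhancement factor
```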

This problem can be quantified: **condition number**

$$\kappa (A) = \Vert A^{-1} \Vert \Vert A \Vert$$

Here $\Vert A \Vert$ is a **matrix norm**. There are many matrix norms:

$$
\begin{array}{ll}
\Vert A \Vert _{\rm rows} &:= \max_i \left( \sum_k \vert a_{ik} \vert \right)\\  
\Vert A \Vert _{\rm cols} &:= \max_k \left( \sum_i \vert a_{ik} \vert \right) \\  
\Vert A \Vert _{\rm Frobenius} &:= \left( \sum_i \sum_k a_{ik}^2 \right) ^{\frac{1}{2}}
\end{array}
$$

Different norms give different condition numbers => only the order of magnitude matters

Relative error on the result:

$$\frac{\Vert \delta_\mathbf{x} \Vert}{\Vert \mathbf{x} \Vert} \lesssim \kappa(A) \left( \frac{\Vert \Delta_A \Vert}{\Vert A \Vert} 
+ \frac{\Vert \Delta_\mathbf{b} \Vert}{\Vert \mathbf{b} \Vert} \right)$$
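
As a rough check of the norm definitions and of the error bound, one can evaluate them for the $4\times4$ example. The sketch below assumes NumPy and uses the row-sum norm together with the maximum norm for vectors (the vector norm that induces it); only $\mathbf{b}$ is perturbed here, so the $\Delta_A$ term vanishes.

```python
import numpy as np

A = np.array([[10., 7., 8., 7.],
              [7., 5., 6., 5.],
              [8., 6., 10., 9.],
              [7., 5., 9., 10.]])
A_inv = np.linalg.inv(A)          # acceptable here: A is only 4 x 4

# the three matrix norms, written out as in the definitions
norm_rows = np.max(np.sum(np.abs(A), axis=1))
norm_cols = np.max(np.sum(np.abs(A), axis=0))
norm_frob = np.sqrt(np.sum(A**2))

# condition number in the row-sum norm
kappa_rows = norm_rows * np.max(np.sum(np.abs(A_inv), axis=1))

# perturb b only and compare both sides of the bound
b = np.array([32., 23., 33., 31.])
db = np.array([0.1, -0.1, 0.1, -0.1])
x = np.linalg.solve(A, b)
dx = np.linalg.solve(A, b + db) - x

lhs = np.max(np.abs(dx)) / np.max(np.abs(x))
rhs = kappa_rows * np.max(np.abs(db)) / np.max(np.abs(b))
print(norm_rows, norm_cols, norm_frob, kappa_rows)
print(lhs, rhs)
```

In NumPy the same norms are available as `np.linalg.norm(A, np.inf)`, `np.linalg.norm(A, 1)` and `np.linalg.norm(A, 'fro')`.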

Using the [NumPy Linear Algebra submodule](https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.linalg.html)

In [1]:
import numpy as np
a = np.array([10, 7, 8, 7, 7, 5, 6, 5, 8, 6, 10, 9, 7, 5, 9, 10])
b = a.reshape((4, 4))
b

array([[10,  7,  8,  7],
       [ 7,  5,  6,  5],
       [ 8,  6, 10,  9],
       [ 7,  5,  9, 10]])

In [2]:
from numpy import linalg as LA
LA.norm(b)          # Frobenius norm by default

30.54504869860253

In [3]:
LA.norm(b)*LA.norm(LA.inv(b))

3009.578708058694

In [4]:
LA.cond(b,'fro')    # condition number directly, specifying Frobenius norm

3009.578708058694

Gaussian elimination step by step:

$\ $ | $\ $  | $\ $ | $\ $ | $\ $
---|---|----|----|----
![start](images/Kap4Gauss0.png) | $\longrightarrow$ | ![step 1](images/Kap4Gauss1.png) | $\longrightarrow$ |![step 2](images/Kap4Gauss2.png) 

Triangular form:

![triangular](images/Kap4GaussN.png)

Step $(n-1)\rightarrow(n)$ explicitly:
- the first $n$ rows are unchanged:

  $$
  a_{ij}^{(n)} = a_{ij}^{(n-1)},\ b_i^{(n)} = b_i^{(n-1)}, 
  \quad \text{for } i=1,\ldots,n, \ j=1,\ldots,N
  $$
- the first $n-1$ columns are unchanged, too (they already contain zeros below the diagonal)
- for the rest:

  $$
  \begin{align*} 
    a_{in}^{(n)} & = 0, \\
    a_{ik}^{(n)} & = a_{ik}^{(n-1)} - l_{in} a_{nk}^{(n-1)}, \
    \qquad \text{for } i,k = n + 1 , \ldots , N \\
    b_i^{(n)} & = b_i^{(n-1)} - l_{in} b_n^{(n-1)}.
  \end{align*}
  $$
  where $$l_{in} = \frac{a_{in}^{(n-1)}}{a_{nn}^{(n-1)}}$$ and under the condition $a_{nn}^{(n-1)}\neq0$.

> careful not to divide by zero or a "small" number!
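
The update rules above translate almost line by line into code. The following is a minimal, unoptimized sketch without pivoting (the function name `gauss_solve` is made up for this example); note that Python indices start at 0, whereas the formulas count from 1.

```python
import numpy as np

def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination without pivoting (sketch)."""
    A = np.array(A, dtype=float)      # work on copies, keep the input intact
    b = np.array(b, dtype=float)
    N = len(b)

    # forward elimination: step n clears column n below the diagonal
    for n in range(N - 1):
        if A[n, n] == 0.0:
            raise ZeroDivisionError("zero pivot: a row swap would be needed")
        for i in range(n + 1, N):
            l_in = A[i, n] / A[n, n]
            A[i, n] = 0.0
            A[i, n + 1:] -= l_in * A[n, n + 1:]
            b[i] -= l_in * b[n]

    # back substitution on the upper triangular system
    x = np.zeros(N)
    for i in range(N - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

x = gauss_solve([[1, 5, 7], [3, 0, 4], [7, 5, 5]], [1, 2, 3])
print(x)
```

The explicit check only catches an exactly vanishing pivot; a very small pivot passes it and can still spoil the result.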

Complexity of Gaussian elimination:

- step $n$ of the process
     - $N-n$ divisions for $l_{in}$
     - $2(N-n)^2$ additions and multiplications for $a_{ik}^{(n)}$
     - $2(N-n)$ additions and multiplications for $b_i^{(n)}$
     
  Altogether ($i=N-n$):
  
  $$
  \begin{align*}
  \#\text{FLOPs} & = 3 \sum_{n = 1}^{N-1} (N - n) + 2 \sum_{n = 1}^{N-1}(N - n)^2 \\
  & = 3 \left[ (N-1)N - \sum_{n = 1}^{N-1} n \right] + 2 \sum_{i = 1}^{N-1}i^2 \\
  & = 3 \left[ (N-1)N - \tfrac{1}{2} (N-1)N \right] + \tfrac{2}{6}(N-1)N(2(N-1)+1)\\
  & = \frac{2}{3} N^3 + \frac{1}{2} N^2 -\frac{7}{6} N 
  \end{align*}
  $$
- solving for each $x_i$ in the end
    - $2(N-(i+1))$ additions and multiplications in the sum
    - one multiplication and one addition outside the sum
    - one multiplication for $x_N$
    
  Altogether:
  
  $$
  \begin{align*}
  \#\text{FLOPs} & =  1 + \sum_{i = 1}^{N-1} ( 2(N-i-1)\ +\ 2) \\
  & = 1 + 2 \sum_{i = 1}^{N-1}  (N - i)\ = 1 + 2N(N-1) - \frac{2}{2} N(N-1) \\
  & =  N^2 -N + 1
  \end{align*}
  $$
- memory and element access is also very important for matrix operations<br>
  => beyond the scope of this lecture

In total:

$$
\#\text{FLOPs} = \frac{2}{3} N^3 + \frac{3}{2} N^2 - \frac{13}{6} N + 1 =\mathcal{O}(N^3)
$$
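
The leading $\tfrac{2}{3}N^3$ term can be checked by simply counting. The sketch below tallies operations with the same bookkeeping as the bullet points above (the helper `count_flops` is hypothetical, written only for this check) and compares the total with $\tfrac{2}{3}N^3$.

```python
def count_flops(N):
    """Count operations of elimination plus back substitution (sketch)."""
    flops = 0
    for n in range(1, N):            # elimination steps n = 1, ..., N-1
        rows = N - n                 # number of rows below the pivot row
        flops += rows                # divisions for l_in
        flops += 2 * rows * rows     # additions and multiplications for a_ik
        flops += 2 * rows            # additions and multiplications for b_i
    flops += 1                       # x_N
    for i in range(1, N):            # x_i for i = 1, ..., N-1
        flops += 2 * (N - i - 1) + 2
    return flops

for N in (10, 100, 1000):
    total = count_flops(N)
    print(N, total, total / (2 * N**3 / 3))   # ratio approaches 1 for large N
```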

### LU decomposition

$$
  A = L U 
  =
  \begin{pmatrix}
    1 &   &  &  0 \\ 
    l_{21} & 1 &  &  \\ 
    \vdots &  \ddots & \ddots &  \\ 
    l_{N1} & \cdots & l_{NN-1} & 1
  \end{pmatrix}
  \begin{pmatrix}
    u_{11} & u_{12} & \cdots & u_{1N} \\ 
    & u_{22} &  & \vdots \\ 
    &  & \ddots & \vdots \\ 
    0&  &  & u_{NN}
  \end{pmatrix}
$$

Solving a set of linear equations:
- first solve $L \mathbf{y} = \mathbf{b}$ for $\mathbf{y}=U \mathbf{x}$ 
- then solve $U \mathbf{x} = \mathbf{y}$ for $\mathbf{x}$ as before
- if we *already have* the LU decomposition we need $2(N^2-N+1)$ FLOPs to solve the set of equations for *any* $\mathbf{b}$, so $\mathcal{O}(N^2)$
- we get the determinant (almost) for free:

  $$\det(A) = \det(LU) = \det(L)\det(U) = \prod_{i=1}^{N}u_{ii}$$

How to compute the LU decomposition:
- same steps as Gaussian elimination for $U$
- we have already computed the elements of $L$ along the way

$$
L = \begin{pmatrix}
  1 &  &  &  \\
  l_{21} & \ddots & \emptyset &  \\
  \vdots & \ddots & \ddots &  \\
  l_{N1} & \cdots & l_{NN-1} & 1
\end{pmatrix}
$$
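
A minimal sketch of this idea, applied to the $3\times3$ example from the beginning of the section: the multipliers $l_{in}$ of Gaussian elimination are stored in $L$, and the system is then solved by one forward and one back substitution. The function names `lu_nopivot` and `lu_solve_nopivot` are made up for this illustration, and no pivoting is done.

```python
import numpy as np

def lu_nopivot(A):
    """LU decomposition A = L U without pivoting (sketch)."""
    U = np.array(A, dtype=float)
    N = U.shape[0]
    L = np.eye(N)
    for n in range(N - 1):
        for i in range(n + 1, N):
            L[i, n] = U[i, n] / U[n, n]            # multiplier l_in goes into L
            U[i, n] = 0.0
            U[i, n + 1:] -= L[i, n] * U[n, n + 1:]  # same row operation as before
    return L, U

def lu_solve_nopivot(L, U, b):
    N = len(b)
    y = np.zeros(N)
    for i in range(N):                    # forward substitution: L y = b
        y[i] = b[i] - L[i, :i] @ y[:i]
    x = np.zeros(N)
    for i in range(N - 1, -1, -1):        # back substitution: U x = y
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

A = np.array([[1., 5., 7.], [3., 0., 4.], [7., 5., 5.]])
L, U = lu_nopivot(A)
x = lu_solve_nopivot(L, U, np.array([1., 2., 3.]))
det_A = np.prod(np.diag(U))               # determinant from the diagonal of U

print(L)
print(U)
print(x, det_A)
```

Once `L` and `U` are known, `lu_solve_nopivot` can be called again for any other right-hand side without repeating the decomposition.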

### Pivoting

- so far we have always assumed $a_{nn}^{(n-1)}\neq0$:

  $$
  a_{ik}^{(n)} = a_{ik}^{(n-1)} - l_{in} a_{nk}^{(n-1)}  \text{ with } \ 
  l_{in} = \frac{a_{in}^{(n-1)}}{a_{nn}^{(n-1)}} .
  $$
- if this is not the case => swap rows!
- swapping rows can be described by a permutation matrix $P$, for example:

  $$
  \begin{pmatrix}
    0&1&0\\
    0&0&1\\
    1&0&0
  \end{pmatrix}
  \begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}
  =
  \begin{pmatrix}x_2\\x_3\\x_1\end{pmatrix}
  $$
- then we need to solve $PA\mathbf{x}=P\mathbf{b}$ and decompose $PA=LU$

The matrix element $a_{nn}^{(n-1)}$ is called **pivot element**
- even though we can choose $a_{nn}^{(n-1)} \neq 0$ it can happen that $0<\vert a_{nn}^{(n-1)} \vert \ll 1$ <br>=> numerical problems
- subtractions in every step, so we can lose significant digits, e.g. in

  $$
  a_{ik}^{(1)}=a_{ik}^{(0)}-l_{i1} a_{1k}^{(0)}
  $$
- for example, if we have $|a_{22}^{(1)}|\ll1$ then $l_{i2}$ is very large and the roundoff error from before is amplified in

  $$
  a_{ik}^{(2)} = a_{ik}^{(1)} -
  \overset{\text{roundoff error}}{\underset{\text{very large}}
    {\underset{\uparrow}{l_{i2}\ } \overset{\downarrow}{a_{2k}^{(1)}}}}
  $$

=> pivoting is **essential** for numerical stability!

**Column maximization strategy**
- in every step of Gaussian elimination swap the remaining rows until

  $$
  \vert a_{nn}^{(n-1)} \vert = \max_{i \geq n} \vert a_{in}^{(n-1)} \vert 
  $$
- further improvements on column maximization possible
- book-keeping over row swaps in terms of the permutation matrix

Extreme example:

$$
A = 
\begin{pmatrix}
  \epsilon & 1 \\
  1        & 1 
\end{pmatrix}
= L U =
\begin{pmatrix}
  1 & 0 \\
  \epsilon^{-1} & 1 
\end{pmatrix}
\begin{pmatrix}
  \epsilon & 1 \\
   0 & 1 - \epsilon^{-1}
\end{pmatrix}
$$

After pivoting:

$$
PA = A' = 
\begin{pmatrix}
  1        & 1 \\
  \epsilon & 1 
\end{pmatrix}
= L' U' =
\begin{pmatrix}
  1 & 0 \\
  \epsilon & 1 
\end{pmatrix}
\begin{pmatrix}
   1 & 1 \\
   0 & 1 - \epsilon
\end{pmatrix} .
$$

Assume $\epsilon<\epsilon_m$ (machine precision), then

$$1-\epsilon^{-1}\rightarrow-\epsilon^{-1}$$
and the component $u_{22}$ of the matrix $U$ is not exact. Instead of $LU$ we get

$$
\begin{pmatrix}
  1 & 0 \\
  \epsilon^{-1} & 1 
\end{pmatrix}
\begin{pmatrix}
   \epsilon & 1 \\
   0 & -\epsilon^{-1}
\end{pmatrix} 
=
\begin{pmatrix}
  \epsilon & 1 \\
  1 & 0 
\end{pmatrix}
\ne A.
$$
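
The effect can be seen directly in double precision. The following sketch assumes $\epsilon = 10^{-20}$ and an arbitrarily chosen right-hand side $\mathbf{b} = (1, 2)^T$, for which the exact solution is very close to $(1, 1)^T$; the unpivoted $LU$ factors are written out by hand, and `np.linalg.solve` (which relies on a LAPACK routine with row pivoting) is used for comparison.

```python
import numpy as np

eps = 1e-20                        # far below machine precision (about 2.2e-16)
A = np.array([[eps, 1.], [1., 1.]])
b = np.array([1., 2.])             # assumed right-hand side for this test

# LU without pivoting, entries written out as in the text
l21 = 1. / eps
u22 = 1. - l21                     # rounds to -1/eps: the "1" is lost

y1 = b[0]                          # forward substitution, L y = b
y2 = b[1] - l21 * y1
x2 = y2 / u22                      # back substitution, U x = y
x1 = (y1 - 1. * x2) / eps
x_nopivot = np.array([x1, x2])

# library solver with partial pivoting
x_pivot = np.linalg.solve(A, b)

print(x_nopivot)
print(x_pivot)
```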

**Special cases**:
- band diagonal matrices (especially tri-diagonal => spline interpolation)
    - only need to save diagonals (memory saving)
    - not solved with LU decomposition, but for each $\mathbf{b}$ with Gaussian elimination ($\mathcal{O}(N)$) for this special case
    - usually no pivoting necessary
- symmetric positive definite matrices => Cholesky decomposition
    - similar to LU decomposition
    - no pivoting necessary
    - also some memory saving
- existence and uniqueness of LU decomposition
    - LU decompositions are (in general) not unique
    - LU decompositions do not always exist
    - a square matrix always has an LU decomposition **with pivoting** (LUP decomposition)
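
For the first two special cases SciPy offers dedicated routines. The sketch below uses an assumed tridiagonal test matrix (4 on the diagonal, -1 next to it), which is also symmetric and positive definite, so the same system can be solved with `scipy.linalg.solve_banded` and with a Cholesky factorization and compared with a dense solve.

```python
import numpy as np
from scipy.linalg import solve_banded, cho_factor, cho_solve

N = 6
main = 4. * np.ones(N)             # main diagonal
off = -1. * np.ones(N - 1)         # upper and lower diagonal

# banded storage: only the three diagonals are kept, one per row
ab = np.zeros((3, N))
ab[0, 1:] = off                    # upper diagonal (first entry unused)
ab[1, :] = main
ab[2, :-1] = off                   # lower diagonal (last entry unused)

b = np.arange(1., N + 1)
x_banded = solve_banded((1, 1), ab, b)   # one lower and one upper diagonal

# dense version of the same matrix, used only for comparison
A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
x_dense = np.linalg.solve(A, b)

# Cholesky route for the symmetric positive definite matrix
c, low = cho_factor(A)
x_chol = cho_solve((c, low), b)

print(x_banded)
```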

### Matrix inversion

> essentially the same as solving sets of linear equations

Assume $\det(A) \neq 0$ and we have a way to solve a system of linear equations $A \mathbf{x} =\mathbf{b}$ (e.g. LU decomposition)

To obtain $A^{-1}$:
- solve the linear equation $A \mathbf{x}_i = \mathbf{e}_i$ for all unit vectors $\mathbf{e}_i$ and obtain the $\mathbf{x}_i$
- then

  $$
  \mathbf{x}_i = A^{-1} \mathbf{e}_i = 
  \begin{pmatrix}
    (a^{-1})_{11} & \cdots & (a^{-1})_{1N} \\
    &&\\ 
    \vdots &  & \vdots \\ 
    &&\\
    (a^{-1})_{N1} & \cdots & (a^{-1})_{NN}
  \end{pmatrix}  
  \begin{pmatrix}
    0 \\ 
    \vdots \\ 
    1 \\ 
    \vdots \\ 
    0
  \end{pmatrix} =
  \begin{pmatrix}
    (a^{-1})_{1i} \\ 
    \\ 
    \vdots \\ 
    \\ 
    (a^{-1})_{Ni}
  \end{pmatrix}  
  $$
  
  meaning that the $\mathbf{x}_i$ are the column vectors of $A^{-1}$:
  
  $$
  A^{-1} = ( \mathbf{x}_1 , \mathbf{x}_2 , \ldots , \mathbf{x}_N ) .
  $$
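
In code this amounts to one decomposition followed by $N$ cheap solves. A minimal sketch, assuming SciPy's `lu_factor` and `lu_solve` as the solver and the $3\times3$ matrix from the beginning of the section as test case:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[1., 5., 7.], [3., 0., 4.], [7., 5., 5.]])
N = A.shape[0]

lu_piv = lu_factor(A)              # one O(N^3) decomposition (with pivoting)

A_inv = np.zeros((N, N))
for i in range(N):
    e_i = np.zeros(N)
    e_i[i] = 1.                    # unit vector e_i
    A_inv[:, i] = lu_solve(lu_piv, e_i)   # x_i becomes column i of the inverse

print(A_inv)
print(np.allclose(A @ A_inv, np.eye(N)))
```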

### Iterative improvement

- let $\mathbf{x}$ be the exact solution of $A \mathbf{x} = \mathbf{b}$ (which we don't know)
- let $\hat{\mathbf{x}} = \mathbf{x} +\delta \mathbf{x}$ be the numerical (non-exact) solution  

Then

$$ 
A \hat{\mathbf{x}} = A \mathbf{x} + A \delta \mathbf{x} = \mathbf{b} + \mathbf{r} 
$$

with the **residual**

$$
\mathbf{r} = A \delta \mathbf{x} = A \hat{\mathbf{x}} - \mathbf{b} 
$$

After calculating the LU decomposition with a **direct method** we know
- $A$
- $LU$ (not exactly the same as $A$ because of numerical errors!)
- $\mathbf{b}$
- $\hat{\mathbf{x}}$ (our numerical non-exact solution)  

Algorithm for iterative improvement:

- compute $\mathbf{r} = A \hat{\mathbf{x}} - \mathbf{b}$
- calculate $\delta \mathbf{x}$ as solution of $A \delta \mathbf{x} = \mathbf{r}$
  (since we already know $A = LU$ this is $\mathcal{O}(N^2)$)
- compute the improved solution

  $$ 
  \mathbf{x}_{\mathrm{new}} = \hat{\mathbf{x}} - \delta \mathbf{x}
  $$
  
- this can be repeated
- (there are also iterative methods to improve the LU decomposition itself => not covered here)
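
The algorithm can be tried out with a deliberately inaccurate direct solve. In the sketch below the LU decomposition is computed in single precision (an assumption made here only to make the error visible), while the residual is evaluated in double precision; the test matrix is random with a strengthened diagonal so that it is well conditioned.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
N = 50
A = rng.standard_normal((N, N)) + N * np.eye(N)   # assumed test matrix
x_true = rng.standard_normal(N)
b = A @ x_true

# direct method in single precision: LU is only approximately equal to A
lu_piv = lu_factor(A.astype(np.float32))
x_hat = lu_solve(lu_piv, b.astype(np.float32)).astype(np.float64)
err_before = np.linalg.norm(x_hat - x_true)

for _ in range(3):
    r = A @ x_hat - b                              # residual, double precision
    dx = lu_solve(lu_piv, r.astype(np.float32))    # reuse the factors: O(N^2)
    x_hat = x_hat - dx                             # improved solution

err_after = np.linalg.norm(x_hat - x_true)
print(err_before, err_after)
```

The improvement relies on computing the residual with the original $A$ and in higher precision than the decomposition.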

Additional cost:

- the residual $\mathbf{r}=A \hat{\mathbf{x}} - \mathbf{b}$ has to be calculated
- both $A$ and $LU$ have to be kept in memory
- additional computation time $\mathcal{O}(N^2)$ per iteration step is negligible compared to the $\mathcal{O}(N^3)$ of the direct method (for sufficiently large matrices)
- iterative methods are useful for very large $N$ and if we can already find a good approximation of $\mathbf{x}$ with  $k \ll N$ iterations (only $k N^2 \ll N^3$ FLOPs)


### Sherman-Morrison formula

- suppose you already have obtained $A^{-1}$ of a square matrix $A$, after $\mathcal{O}(N^3)$ operations
- now you want to make a small change in $A$ (change one element, or one row, or one column)

Do you need to start over and spend another $\mathcal{O}(N^3)$ operations?

> No, if the change in the matrix is of the form

$$
A'=A+u\otimes v,
$$

where $u\otimes v$ is a matrix whose ($i,j$) element is the product $u_iv_j$:

$$
u\otimes v = \begin{pmatrix}u_1\\u_2\\ \vdots\\ u_N \end{pmatrix}\begin{pmatrix}v_1& v_2& \ldots&v_N \end{pmatrix}=
\begin{pmatrix} u_1v_1 & \ldots & u_1v_N\\
\vdots & & \vdots\\
u_Nv_1 & \ldots & u_Nv_N
\end{pmatrix}
$$

Sherman-Morrison formula:

$$
(A+u\otimes v)^{-1}=A^{-1}-\frac{(A^{-1}u)\otimes(v\cdot A^{-1})}{1+v\cdot A^{-1}u}
$$

Derivation (see "Numerical Recipes" books):

$$
\begin{array}{lll}
(A+u\otimes v)^{-1}	&=& (1+A^{-1}\cdot u\otimes v)^{-1}\cdot A^{-1}\\
			&=& (1-A^{-1}\cdot u\otimes v+A^{-1}\cdot u\otimes v\cdot A^{-1}\cdot u\otimes v\mp\ldots)\cdot A^{-1}\\
			&=& A^{-1}-A^{-1}\cdot u\otimes v\cdot A^{-1}+A^{-1}\cdot u\otimes v\cdot A^{-1}\cdot u\otimes v\cdot A^{-1}\mp\ldots\\
			&=& A^{-1}-A^{-1}\cdot u\otimes v\cdot A^{-1}(1-\Lambda+\Lambda^2\mp\ldots)\\
			&=& A^{-1}-\frac{(A^{-1}\cdot u)\otimes(v\cdot A^{-1})}{1+\Lambda},
\end{array}
$$

where 
- $\Lambda\equiv v\cdot(A^{-1}u)$;
- Taylor expansion was used in the second line;
- the scalars $\Lambda$ were factored out in line 4;
- the series was written as $(1+\Lambda)^{-1}$ using Taylor expansion in the last line.

Complexity: $3N^2$ multiplications and $3N^2$ additions (even faster if $\mathbf{u}$ or $\mathbf{v}$ are unit vectors)

Uses:
- classes of sparse matrices
- changing matrices one row/column at a time (do not use if you need to change *every* row!)
- adding or removing one row/column
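
The formula is easy to verify numerically. The sketch below uses a small assumed example ($A$, $u$ and $v$ are arbitrary choices) and compares the Sherman-Morrison update with a direct inversion of $A+u\otimes v$.

```python
import numpy as np

A = np.array([[4., 1., 0.],
              [1., 4., 1.],
              [0., 1., 4.]])
u = np.array([1., 0., 2.])
v = np.array([0., 1., 1.])

A_inv = np.linalg.inv(A)           # assumed to be known already

# Sherman-Morrison update: only matrix-vector and outer products, O(N^2)
z = A_inv @ u                      # A^{-1} u
w = v @ A_inv                      # v A^{-1}
lam = v @ z                        # the scalar Lambda
A_new_inv = A_inv - np.outer(z, w) / (1. + lam)

# reference: invert the modified matrix from scratch, O(N^3)
direct = np.linalg.inv(A + np.outer(u, v))

print(np.allclose(A_new_inv, direct))
```

The update breaks down if $1+\Lambda$ vanishes, which corresponds to a singular modified matrix.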

### Inversion by partitioning

Inverse of a partitioned square matrix $B$,

$$
B=\left(\begin{array}{cc}P&Q\\R&S\end{array}\right),
$$

has the form

$$
B^{-1}=\left(\begin{array}{cc}\tilde{{P}}&\tilde{{Q}}\\\tilde{{R}}&\tilde{{S}}\end{array}\right),
$$

where

$$
\begin{array}{lll}
\tilde{P}&=&P^{-1}+({P}^{-1}\cdot{Q})\cdot({S}-{R}\cdot{P}^{-1}\cdot Q)^{-1}\cdot({R}\cdot P^{-1}),\\
\tilde{Q}&=&-({P}^{-1}\cdot{Q})\cdot({S}-{R}\cdot{P}^{-1}\cdot{Q})^{-1},\\
\tilde{R}&=&-({S}-{R}\cdot{P}^{-1}\cdot{Q})^{-1}\cdot({R}\cdot{P}^{-1}),\\
\tilde{S}&=&({S}-{R}\cdot{P}^{-1}\cdot Q)^{-1}.
\end{array}
$$

(check by multiplying $B$ with its inverse)

Determinant of a partitioned matrix:

$$
\det B=\det{P}\det({S}-{R}\cdot{P}^{-1}\cdot{Q})=\det{S}\det({P}-{Q}\cdot{S}^{-1}\cdot{R})
$$
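
Both the block formulas and the determinant identity can be checked on a small example. The blocks below are arbitrary assumed values ($P$ is $2\times2$, $S$ is $1\times1$); `np.block` assembles the full matrices.

```python
import numpy as np

P = np.array([[4., 1.], [1., 3.]])
Q = np.array([[1.], [2.]])
R = np.array([[0., 1.]])
S = np.array([[5.]])
B = np.block([[P, Q], [R, S]])

P_inv = np.linalg.inv(P)
S_t = np.linalg.inv(S - R @ P_inv @ Q)     # (S - R P^{-1} Q)^{-1}

P_t = P_inv + (P_inv @ Q) @ S_t @ (R @ P_inv)
Q_t = -(P_inv @ Q) @ S_t
R_t = -S_t @ (R @ P_inv)
B_inv = np.block([[P_t, Q_t], [R_t, S_t]])

det_B = np.linalg.det(P) * np.linalg.det(S - R @ P_inv @ Q)

print(np.allclose(B @ B_inv, np.eye(3)))
print(det_B, np.linalg.det(B))
```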

### QR decomposition

$$
A = Q\cdot R
$$

where $R$ is upper triangular and $Q$ is orthogonal: $Q^T\cdot Q=\mathbf{1}$

- used to solve linear least squares => later in the lecture
- basis of QR eigenvalue algorithm

Computed via

- Gram-Schmidt orthonormalization (numerically unstable)
- Householder transformations (mirror transformations)
- Givens rotations (more complicated to implement, but well parallelizable)

Can be used to solve linear equations, but needs twice as many operations as LU decomposition. However:

- LU decomposition is hard to update (=> Sherman-Morrison) because of pivoting
- QR decomposition on the other hand can be updated with a type of Sherman-Morrison formula
- when solving many similar linear systems QR decomposition can be better
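
Solving $A\mathbf{x}=\mathbf{b}$ with a QR decomposition amounts to solving the triangular system $R\mathbf{x}=Q^T\mathbf{b}$. A minimal sketch, assuming `np.linalg.qr` for the decomposition and `scipy.linalg.solve_triangular` for the back substitution, applied to the $3\times3$ example from the beginning of the section:

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[1., 5., 7.], [3., 0., 4.], [7., 5., 5.]])
b = np.array([1., 2., 3.])

Q, R = np.linalg.qr(A)             # A = Q R, Q orthogonal, R upper triangular

# A x = b  =>  R x = Q^T b, solved by back substitution
x = solve_triangular(R, Q.T @ b)

print(np.allclose(Q.T @ Q, np.eye(3)))
print(x)
```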

### Singular Value Decomposition (SVD)

- powerful method for singular or near-singular systems
- can "diagnose" problems with LU decomposition
- needed for data analysis (linear least-squares) => later in the lecture
- useful for "inverting" non-square matrices, determining the matrix rank etc.

Any $M\times N$ matrix $A$ (no matter how singular!) can be written as

$$A=UDV^T$$

with 
- an $M\times N$ column orthogonal matrix $U$,
- an $N\times N$ diagonal matrix $D$ (diagonal elements $d_{i}\geq0$ are the **singular values**)
- an $N\times N$ orthogonal matrix $V^T$

Schematically

$$
\begin{pmatrix}
& & & & \\
& &   & & \\
& & {A}   & & \\
 & &  & & \\
& &   & & \\
& &   & & \\
& &   & &
\end{pmatrix} = 
\begin{pmatrix}
& &  & & \\
& &   & & \\
& & {U}  & & \\
 & &   & & \\
& &   & & \\
& &   & & \\
& &   & &
\end{pmatrix}
\begin{pmatrix}
\!d_1\! &  & & \\
 &\!d_2\!  & & \\
 &   &\!\ddots\! & \\
  &   & &\!d_N\!
\end{pmatrix}
\begin{pmatrix}
& &  & & \\
& &\!{V}^T\!  & & \\
& &   & & \\
 & &   & & 
\end{pmatrix}
$$

Orthogonality conditions, schematically:

$$
\begin{pmatrix}
& & &  & & & & & \\
& & &  & {U}^T & & & \\
 & & & & & & & & \\
& &  & & & &
\end{pmatrix}\cdot\begin{pmatrix}
& &  & & \\
& &   & & \\
& & {U}  & & \\
 & &   & & \\
& &   & & \\
& &   & & \\
& &   & &
\end{pmatrix}=\begin{pmatrix}
& &  & & \\
& &\!{V}^T\! & & \\
& &   & & \\
 & &   & & 
\end{pmatrix}\cdot\begin{pmatrix}
& &  & & \\
& & {V}  & & \\
& &   & & \\
 & &   & & 
\end{pmatrix}=
\begin{pmatrix}
\! 1&  & & \\
 & \!1  & & \\
 &   &\! \ddots& \\
 &   & &\!1 
\end{pmatrix}
$$

Visualization of SVD in 2D:
- start from disc with 2 unit vectors
- in this example, the original matrix distorts the disc to an ellipse (a matrix is a *linear transformation*)
- SVD decomposes the matrix into three simple transformations: 
     - an initial rotation $V^*$ (in our notation this is $V^T$), 
     - a scaling $\Sigma$ along the coordinate axes (in our notation this is $D$)
     - a final rotation $U$
- The lengths $\sigma_1$ and $\sigma_2$ of the semi-axes of the ellipse are the singular values

![SVD illustration](images/Singular_value_decomposition.gif)

Image from [Wikipedia](https://en.wikipedia.org/wiki/Singular-value_decomposition), their notation is $M=U\Sigma V^*$ instead of $A=UDV^T$ and $\sigma_i$ instead of $d_i$

<img src="images/Singular_value_decomposition.png" alt="SVD illustration" align="center"  style="width: 500px;"/>

**SVD**
- can be done for matrices of any shape
- is almost unique
    - up to making the same permutation of the columns of $U$, elements of $D$, and columns of $V$
    - up to forming linear combinations of any columns of $U$ and $V$ whose corresponding elements of $D$ happen to be equal
- is very stable numerically
- allows us to easily pick a "representative" solution (the one with the smallest length) when there are infinitely many<br>=> needed for data analysis!
- allows us to find an "almost" solution when there are none<br>=> needed for data analysis!
- we will not discuss the algorithm, but it is part of all major libraries

**Matrix inversion (of a square matrix) with SVD**:
- for square matrices, $U$ is also square; $U$, $D$, $V$ have the same size
- inverse becomes easy:

$$
A = U\cdot[{\rm diag}(d_i)]\cdot V^T \ \ \Rightarrow\ \ A^{-1}=V\cdot[{\rm diag}(1/d_i)]\cdot U^T
$$

- solution of linear equations becomes easy:

$$A\mathbf{x}=\mathbf{b}\Rightarrow \mathbf{x}=V\cdot[{\rm diag}(1/d_i)]\cdot (U^T\mathbf{b})$$

- problems if some of the $d_i$ are zero (or close to zero...)
- the *condition number* (see previous lecture) is defined as: max$(|d_i|)/$min$(|d_i|)$
     - condition number infinite => matrix singular
     - condition number too large ($\gtrsim 1/\varepsilon_m$) => matrix ill-conditioned
     - in these cases we set (somewhat paradoxically) $1/d_i=0$ (remove singular values)

A nonsingular matrix $A$ maps a vector space into one of the same dimension:

![non-singular matrices](images/NRSVD_nonsing.png)

**Singular matrices**:
- if $A$ is singular, then there is a subspace of $\mathbf{x}$ (**nullspace**) so that $A\mathbf{x}=0$
- there is also some subspace of $\mathbf{b}$ (**range of $A$**) that can be reached by $A$: i.e. there exists $\mathbf{x}$ such that $A\mathbf{x}=\mathbf{b}$
- the dimension of the range is the **rank** of $A$ (remember that for non-singular matrices the rank is $N$)
- for singular matrices the rank is smaller than $N$ and the nullspace has dimension greater than zero

> SVD explicitly constructs orthonormal bases for the *nullspace* and *range* of the matrix

- columns of $U$ belonging to $d_i\neq0$ span the range (orthonormal basis)
- columns of $V$ belonging to $d_i=0$ span the nullspace (orthonormal basis)
- solutions of $A\mathbf{x}=0$ are defined by the nullspace basis => read out from $V$
- solutions of $A\mathbf{x}=\mathbf{b}\neq0$:
    - if $\mathbf{b}$ is not in the range of $A$ => no solution
    - if $\mathbf{b}$ is in the range of $A$ => solution exists (and any linear combination of nullspace vectors can be added to it)
    
A singular matrix $A$ maps a vector space into one of lower dimensionality (the range of $A$): 
    
![SVD of singular matrices](images/NRSVD_sing.png)
    
- the nullspace of $A$ is mapped to zero
- the solutions of $A\cdot\mathbf{x} = \mathbf{d}$ consist of any one particular solution plus any vector in the nullspace
- SVD selects the particular solution closest to zero
- the point $\mathbf{c}$ lies outside of the range of $A$ ($A\cdot\mathbf{x} = \mathbf{c}$ has no solution)
- SVD finds the best "compromise solution", namely a solution of $A\cdot\mathbf{x} = \mathbf{c}'$ => see "linear least-squares" later
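
A small singular example makes these statements concrete. The sketch below uses an assumed $2\times2$ matrix of rank 1 and an assumed relative threshold `tol` for deciding which singular values count as zero; the vectors `b_in` and `b_out` play the roles of $\mathbf{d}$ and $\mathbf{c}$ in the figure.

```python
import numpy as np

A = np.array([[1., 2.],
              [2., 4.]])           # singular: second row = 2 * first row
U, d, Vt = np.linalg.svd(A)

tol = 1e-12 * d.max()              # assumed threshold for "zero"
nonzero = d > tol
rank = int(np.sum(nonzero))

range_basis = U[:, nonzero]        # columns of U with d_i != 0
null_basis = Vt[~nonzero].T        # columns of V with d_i == 0

# "inverse" of D with 1/d_i set to zero for the removed singular values
d_inv = np.zeros_like(d)
d_inv[nonzero] = 1. / d[nonzero]

b_in = np.array([1., 2.])          # lies in the range of A
x_min = Vt.T @ (d_inv * (U.T @ b_in))     # particular solution closest to zero

b_out = np.array([1., 0.])         # lies outside the range of A
x_ls = Vt.T @ (d_inv * (U.T @ b_out))     # "compromise" solution

print(rank)
print(x_min, A @ x_min)
print(x_ls, A @ x_ls)
```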

### Column and row major

$$A=\begin{pmatrix}
1 & 2 & 3 \\
4 & 5 & 6
\end{pmatrix}$$

How is a matrix stored in computer memory?

- row major (C, C++, Python): $\ \ \ \ $ contiguously in memory as ```1 2 3 4 5 6```
- column major (Fortran, Matlab): contiguously in memory as ```1 4 2 5 3 6```

(this generalizes to higher dimensions)
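
In NumPy both layouts are available, which allows a quick look at the memory order. A small sketch, assuming `np.asfortranarray` to create a column-major copy and `ravel(order='K')` to read the elements in the order in which they are stored:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])          # NumPy default: row major ("C order")
A_f = np.asfortranarray(A)         # same matrix, stored column major

print(A.flags['C_CONTIGUOUS'], A_f.flags['F_CONTIGUOUS'])
print(A.ravel(order='K'))          # elements of A in memory order
print(A_f.ravel(order='K'))        # elements of A_f in memory order
print(np.array_equal(A, A_f))      # indexing is unaffected by the layout
```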

### Linear algebra libraries

Equally important:
- small number of operations
- small memory requirement
- clever memory read-out

A basis for many modern libraries is the Basic Linear Algebra Subprograms ([BLAS](http://www.netlib.org/blas/)):
- 3 levels:
    - level 1: scalar, scalar-vector and vector-vector
    - level 2: matrix-vector
    - level 3: matrix-matrix
- huge increase in speed possible
- free implementations: MKL, ATLAS,...
- there are also machine specific optimized BLAS libraries (e.g. AMD: ACML, Apple: Accelerate, Intel: MKL)
- here is the [documentation](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms)

Some libraries and modules:
- [LAPACK](http://www.netlib.org/lapack/) uses BLAS, especially Level 3
- [GSL](https://www.gnu.org/software/gsl/) uses BLAS
- Mathematica, Matlab and Octave use BLAS
- NumPy and [SciPy](https://docs.scipy.org/doc/scipy/reference/linalg.html) use BLAS
- you can also call BLAS functions in SciPy directly
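
As an illustration of the last point, the sketch below calls one double-precision routine from each BLAS level through `scipy.linalg.blas`. The small matrices are arbitrary test data; for everyday work the usual NumPy operators call into BLAS anyway.

```python
import numpy as np
from scipy.linalg import blas

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[0., 1.], [1., 0.]])
x = np.array([1., -1.])

s = blas.ddot(x, x)                # level 1: vector-vector (dot product)
y = blas.dgemv(1.0, A, x)          # level 2: matrix-vector, alpha * A x
C = blas.dgemm(1.0, A, B)          # level 3: matrix-matrix, alpha * A B

print(s)
print(y)
print(C)
```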

```scipy.linalg``` vs ```numpy.linalg```:
- ```scipy.linalg``` contains all functions in ```numpy.linalg``` plus some more advanced ones
- ```scipy.linalg``` is always compiled with BLAS/LAPACK support, while for NumPy this is optional<br> => SciPy might be faster depending on how your NumPy was installed

> Use ```scipy.linalg``` unless you don't want to import ```scipy```

Implementation on GPUs (graphics processing units)
- significant improvement in GPUs, thanks to the gaming industry
- multicore GPUs are now affordable
- massively parallel
- libraries specific for GPUs (cuBLAS, Magma)

In [2]:
import numpy as np
import scipy.linalg as LA

myMatrix = np.array([[1., 2.], [3., 4.]])
myInvMatrix = LA.inv(myMatrix)
print(myInvMatrix)

[[-2.   1. ]
 [ 1.5 -0.5]]


In [3]:
print(np.dot(myMatrix, myInvMatrix))

[[1.0000000e+00 0.0000000e+00]
 [8.8817842e-16 1.0000000e+00]]


In [4]:
np.around(np.dot(myMatrix, myInvMatrix))

array([[1., 0.],
       [0., 1.]])

In [5]:
LA.det(myMatrix)

-2.0

In [6]:
1./LA.det(myInvMatrix)

-2.0000000000000013

In [7]:
# Singular Value Decomposition
m, n = 9, 5
a = np.random.randn(m, n) + 1.j*np.random.randn(m, n)
a

array([[-0.85892947-0.41963151j, -1.36745188+0.15815086j,
         0.77412713+1.69980048j,  0.35429503-0.6534544j ,
         0.26025711+1.01148851j],
       [ 0.03757626-0.43064671j,  1.50268845-1.70888921j,
         0.03250573-0.96992801j,  2.01194838+0.55773684j,
        -0.58813105+0.41700299j],
       [-2.0848362 +0.7576141j , -1.19080974-0.90352191j,
         1.27829808+0.53009j   ,  2.12501411-1.32698354j,
        -0.97281342+0.43931866j],
       [ 1.86450351-0.36182246j, -0.96730906+0.30861901j,
         0.37569513-2.39960954j,  0.4237144 +0.57683613j,
         1.18512586+1.09576293j],
       [ 2.43577561+1.1520762j ,  0.55215916+1.08061376j,
        -0.25474526+1.57588748j, -1.13956064+1.17906452j,
        -0.82017069+0.59983885j],
       [ 1.42485067+2.48962682j, -0.22983815+1.1072615j ,
         2.00736635-1.47809547j,  0.06288505+0.22943667j,
        -2.47592698-0.82358964j],
       [-0.78732941-1.02811623j, -1.15486985+0.41721049j,
         0.47523286+0.30381578j,  1.549562

In [8]:
LA.svd(a)

(array([[-0.26122327+0.03686052j, -0.05687967+0.06268343j,
          0.08722056-0.39214653j, -0.13223121+0.00993957j,
         -0.0878773 -0.26310947j,  0.51077867-0.39078752j,
         -0.10246714+0.03899696j,  0.34383674-0.27022293j,
         -0.00433383+0.22455965j],
        [ 0.09383014-0.21199077j, -0.24661407-0.42603546j,
         -0.0633234 +0.08639314j,  0.21566015-0.24290874j,
          0.09656165-0.04956588j,  0.20333456+0.07707197j,
          0.18375041+0.08819359j, -0.28312972-0.60048707j,
         -0.20503882-0.05955802j],
        [-0.4042696 +0.12852727j, -0.40249632-0.27204704j,
         -0.01771479-0.08913282j,  0.06235376-0.07282562j,
          0.05485501-0.15947323j, -0.11779542+0.20481829j,
         -0.24041152+0.2363504j ,  0.01969783+0.08535063j,
          0.55305991-0.2347544j ],
        [ 0.33967575-0.01888453j, -0.2486293 +0.04152524j,
         -0.23070798-0.09108827j, -0.60598129+0.21021226j,
          0.13988484-0.19766965j, -0.12100528+0.34146871j,
         -

In [9]:
U, s, Vh = LA.svd(a)
U.shape,  s.shape, Vh.shape

((9, 9), (5,), (5, 5))

In [14]:
# Reconstruct the original matrix from the decomposition:
sigma = np.zeros((m, n))
for i in range(min(m, n)):
    sigma[i, i] = s[i]
a1 = np.dot(U, np.dot(sigma, Vh))
a1

array([[-0.85892947-0.41963151j, -1.36745188+0.15815086j,
         0.77412713+1.69980048j,  0.35429503-0.6534544j ,
         0.26025711+1.01148851j],
       [ 0.03757626-0.43064671j,  1.50268845-1.70888921j,
         0.03250573-0.96992801j,  2.01194838+0.55773684j,
        -0.58813105+0.41700299j],
       [-2.0848362 +0.7576141j , -1.19080974-0.90352191j,
         1.27829808+0.53009j   ,  2.12501411-1.32698354j,
        -0.97281342+0.43931866j],
       [ 1.86450351-0.36182246j, -0.96730906+0.30861901j,
         0.37569513-2.39960954j,  0.4237144 +0.57683613j,
         1.18512586+1.09576293j],
       [ 2.43577561+1.1520762j ,  0.55215916+1.08061376j,
        -0.25474526+1.57588748j, -1.13956064+1.17906452j,
        -0.82017069+0.59983885j],
       [ 1.42485067+2.48962682j, -0.22983815+1.1072615j ,
         2.00736635-1.47809547j,  0.06288505+0.22943667j,
        -2.47592698-0.82358964j],
       [-0.78732941-1.02811623j, -1.15486985+0.41721049j,
         0.47523286+0.30381578j,  1.549562

In [16]:
np.allclose(a, a1)

True