# Linear Systems - Iterative Methods

As for nonlinear methods, we start from an educated guess $x^{(0)}$. Each new iterate is obtained by multiplying the previous one by a matrix $B$, called the **iteration matrix**, and adding a vector $g$, i.e. $x^{(k+1)} = Bx^{(k)} + g$. For the method to be consistent, $B$ and $g$ must make the exact solution $x$ a fixed point:

$$x = Bx +g$$

We define the error at step $k$ as

$$e^{(k)} = x - x^{(k)} \implies e^{(k + 1)} = Be^{(k)}$$

(The implication follows by subtracting the iteration $x^{(k+1)} = Bx^{(k)} + g$ from the relation $x = Bx + g$, i.e. equation (2) from equation (1) on slide 2.)
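The error relation can be checked numerically. The following is a minimal sketch in plain Python; the matrix `B`, the vector `x_exact` and the helper names are illustrative assumptions, and `g` is built so that `x_exact` satisfies $x = Bx + g$.

```python
def matvec(M, v):
    """Multiply a dense matrix (list of rows) by a vector (list)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def fixed_point_step(B, g, x):
    """One step of the iteration x_new = B x + g."""
    return [bx_i + g_i for bx_i, g_i in zip(matvec(B, x), g)]

# Illustrative data (assumed): g is chosen so that x_exact = B x_exact + g.
B = [[0.0, -0.25],
     [-1.0 / 3.0, 0.0]]
x_exact = [1.0, 2.0]
g = [x_i - bx_i for x_i, bx_i in zip(x_exact, matvec(B, x_exact))]

x = [0.0, 0.0]  # initial guess x^(0)
for k in range(5):
    e_k = [xe - xk for xe, xk in zip(x_exact, x)]
    x = fixed_point_step(B, g, x)
    e_next = [xe - xk for xe, xk in zip(x_exact, x)]
    Be_k = matvec(B, e_k)
    # e^(k+1) and B e^(k) should agree up to rounding
    print(k, e_next, Be_k)
```

At every step the two printed vectors should coincide up to floating-point rounding, which is exactly the statement $e^{(k+1)} = Be^{(k)}$.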

The fundamental condition for the iteration to reduce the error is that **$B$ has a spectral radius smaller than 1**: since $e^{(k)} = B^k e^{(0)}$, the error tends to zero for every initial guess if and only if $\rho(B) < 1$. The closer the spectral radius is to 1 from below, the slower the convergence.

$$\rho(B) = \max_i |\lambda_i(B)| < 1$$
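To get a feeling for how the spectral radius affects speed, here is a rough sketch of a scalar model. It assumes that the error shrinks by exactly the factor $\rho$ at every step, which is a simplification of the general matrix case; the function name and the tolerance are hypothetical.

```python
def iterations_to_tolerance(rho, tol=1e-6, max_iter=10_000):
    """Count steps until rho**k (error with |e^(0)| = 1) drops below tol."""
    err = 1.0
    for k in range(1, max_iter + 1):
        err *= rho          # scalar analogue of e^(k+1) = B e^(k)
        if err < tol:
            return k
    return None             # tolerance not reached within max_iter steps

for rho in (0.5, 0.9, 0.99):
    print(rho, iterations_to_tolerance(rho))
```

The iteration count grows quickly as `rho` approaches 1, and with `rho = 1` the tolerance is never reached.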

A general way of setting up an iterative method is based on a splitting of $A$ through a nonsingular matrix $P$, called the **preconditioner**:

$$A = P - (P - A)$$

Hence,

$$Ax = b \implies Px = (P - A)x + b$$

Usually, preconditioners exploit some properties of the underlying system (e.g. in fluid dynamics they depend on pressure), and a good one is not known in advance most of the time. Comparing $Px = (P - A)x + b$ with $x = Bx + g$, we can identify $B$ and $g$ as

$$B = P^{-1}(P - A) = I - P^{-1}A$$
$$g = P^{-1}b$$
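As a small worked example (the matrix $A$, the vector $b$ and the diagonal choice of $P$ are illustrative assumptions, not taken from the slides), take

$$A = \begin{pmatrix} 4 & 1 \\ 1 & 3 \end{pmatrix}, \quad b = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad P = \begin{pmatrix} 4 & 0 \\ 0 & 3 \end{pmatrix}$$

Then

$$B = I - P^{-1}A = \begin{pmatrix} 0 & -\frac{1}{4} \\ -\frac{1}{3} & 0 \end{pmatrix}, \quad g = P^{-1}b = \begin{pmatrix} \frac{1}{4} \\ \frac{2}{3} \end{pmatrix}$$

The exact solution of $Ax = b$ is $x = (\frac{1}{11}, \frac{7}{11})^T$, and one can verify that it satisfies $x = Bx + g$: the first component gives $-\frac{1}{4}\cdot\frac{7}{11} + \frac{1}{4} = \frac{1}{11}$ and the second gives $-\frac{1}{3}\cdot\frac{1}{11} + \frac{2}{3} = \frac{7}{11}$.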

The **residual** at iteration k is defined as

$$r^{(k)} = b - Ax^{(k)} = P(x^{(k + 1)}-x^{(k)})$$

If we generalize this formula by scaling the residual with a parameter $\alpha_k$, which can be either fixed (stationary) or updated at each iteration (dynamic), we obtain a family of methods called **Richardson's method**.

$$P(x^{(k + 1)}-x^{(k)}) = \alpha_kr^{(k)}$$

$\alpha_k$ cannot be 0, nor can it change the sign of $r^{(k)}$ (i.e. it must be positive).

## Jacobi method

In the Jacobi method, $P = D = \operatorname{diag}(a_{11}, a_{22}, \dots, a_{nn})$, the diagonal of $A$. The Jacobi method uses $\alpha_k = 1$; it is slow because it does not take into account the components already updated in the current iteration.
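With $P = D$ and $\alpha_k = 1$, the update $D(x^{(k+1)} - x^{(k)}) = r^{(k)}$ can be written component by component as $x_i^{(k+1)} = (b_i - \sum_{j \neq i} a_{ij}x_j^{(k)}) / a_{ii}$. A minimal sketch in plain Python follows; the stopping test on the increment, the tolerance and the test matrix are assumptions made for illustration.

```python
def jacobi(A, b, x0, tol=1e-10, max_iter=500):
    """Jacobi iteration; returns the approximate solution and the step count."""
    n = len(b)
    x = list(x0)
    for k in range(1, max_iter + 1):
        # Every component uses only values from the previous iterate x
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        # Stop when the largest component of the increment is small
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new, k
        x = x_new
    return x, max_iter

# Illustrative system (assumed): strictly diagonally dominant 2x2 matrix
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iters = jacobi(A, b, [0.0, 0.0])
print(x, iters)
```

The whole new vector `x_new` is built before `x` is overwritten, which is what distinguishes this scheme from one that reuses freshly computed components.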

## Gauss-Seidel method

The preconditioner in this case is the lower triangular matrix $P = D - E$, with $\alpha_k = 1$, where $E$ is the strictly lower triangular part of $A$ with its sign changed ($E_{ij} = -a_{ij}$ if $i > j$, $E_{ij} = 0$ otherwise). It is faster because it uses the components already computed in the current iteration (note the $x^{(k+1)}$ terms on the right-hand side of the update formula, which were absent in the Jacobi case).
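Solving with the lower triangular $P = D - E$ amounts to a forward sweep in which each component is overwritten as soon as it is computed. A minimal sketch in plain Python, where the stopping test, the tolerance and the test matrix are illustrative assumptions:

```python
def gauss_seidel(A, b, x0, tol=1e-10, max_iter=500):
    """Gauss-Seidel iteration; returns the approximate solution and the step count."""
    n = len(b)
    x = list(x0)
    for k in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(n):
            # x[j] with j < i already holds the new value x_j^(k+1),
            # x[j] with j > i still holds the old value x_j^(k)
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            new_xi = (b[i] - s) / A[i][i]
            max_change = max(max_change, abs(new_xi - x[i]))
            x[i] = new_xi  # overwrite in place so later rows see the update
        if max_change < tol:
            return x, k
    return x, max_iter

# Illustrative system (assumed): strictly diagonally dominant 2x2 matrix
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iters = gauss_seidel(A, b, [0.0, 0.0])
print(x, iters)
```

The only structural difference from a Jacobi sweep is the in-place assignment `x[i] = new_xi`.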

Both methods converge if $A$ is strictly diagonally dominant by rows. If $A$ is symmetric positive definite, then Gauss-Seidel converges. If $A$ is a tridiagonal non-singular matrix with no zero diagonal elements, then the two methods are **either both divergent or both convergent**; in the convergent case $\rho(B_{GS}) = \rho(B_J)^2$, so Gauss-Seidel converges faster.
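The relation between the two spectral radii can be checked by hand on a small example (the matrix below is an illustrative assumption; any $2 \times 2$ matrix is trivially tridiagonal):

$$A = \begin{pmatrix} 4 & 1 \\ 1 & 3 \end{pmatrix}$$

For Jacobi, $P = D$ gives

$$B_J = I - D^{-1}A = \begin{pmatrix} 0 & -\frac{1}{4} \\ -\frac{1}{3} & 0 \end{pmatrix}, \quad \lambda^2 = \frac{1}{12} \implies \rho(B_J) = \frac{1}{\sqrt{12}} \approx 0.289$$

For Gauss-Seidel, $P = D - E = \begin{pmatrix} 4 & 0 \\ 1 & 3 \end{pmatrix}$ gives

$$B_{GS} = I - P^{-1}A = \begin{pmatrix} 0 & -\frac{1}{4} \\ 0 & \frac{1}{12} \end{pmatrix} \implies \rho(B_{GS}) = \frac{1}{12} \approx 0.083 = \rho(B_J)^2$$

Both radii are below 1, so both methods converge, and one Gauss-Seidel step reduces the error roughly as much as two Jacobi steps.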

## Richardson method

The Richardson method can be either **stationary preconditioned** or **dynamic preconditioned**: in the former case $\alpha_k = \alpha$ is constant, in the latter $\alpha_k$ varies during the iterations.

In order to choose $\alpha$, if A and P are symmetric positive definite, we have two optimal criteria:

* **Stationary case**: $\alpha_k = \frac{2}{\lambda_{min} + \lambda_{max}}$, where $\lambda_{min}$ and $\lambda_{max}$ are the smallest and largest eigenvalues of $P^{-1}A$

* **Dynamic case**: 

$$\alpha_k = \frac{(z^{(k)})^Tr^{(k)}}{(z^{(k)})^TAz^{(k)}}$$

where $z^{(k)} = P^{-1}r^{(k)}$. This is called the **preconditioned gradient method** because the solution of $Ax = b$ is the vector that minimizes the energy $\Phi(x) = \frac{1}{2}x^TAx - x^Tb$, whose gradient $Ax - b$ is equal to 0 exactly at the solution. This shows how numerical analysis has some implications in physics.

If P = I, we replace z with r in the dynamic case.
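The stationary value of $\alpha$ can be motivated with a short derivation. It assumes, as above, that $A$ and $P$ are symmetric positive definite, so that the eigenvalues $\lambda_i$ of $P^{-1}A$ are real and positive; the notation $B_\alpha$ is introduced here only for illustration. The iteration matrix of the stationary method is

$$B_\alpha = I - \alpha P^{-1}A, \quad \lambda_i(B_\alpha) = 1 - \alpha\lambda_i$$

so that

$$\rho(B_\alpha) = \max\left(|1 - \alpha\lambda_{min}|, |1 - \alpha\lambda_{max}|\right)$$

The method converges for $0 < \alpha < 2/\lambda_{max}$, and the maximum is smallest when the two terms are equal with opposite signs:

$$1 - \alpha\lambda_{min} = -(1 - \alpha\lambda_{max}) \implies \alpha = \frac{2}{\lambda_{min} + \lambda_{max}}, \quad \rho(B_\alpha) = \frac{\lambda_{max} - \lambda_{min}}{\lambda_{max} + \lambda_{min}}$$

For instance, with eigenvalues $\lambda_{min} = 1$ and $\lambda_{max} = 3$ we get $\alpha = \frac{1}{2}$ and $\rho(B_\alpha) = \frac{1}{2}$.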

See slide 17 for the steps of the Richardson method. $P$ **should make the system $Pz^{(k)} = r^{(k)}$ easy to solve, so that the whole iterative process stays computationally affordable**.
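The slide is not reproduced here, but one common way to organize a dynamic Richardson step is: solve $Pz^{(k)} = r^{(k)}$, compute $\alpha_k$, update $x$, update $r$. The sketch below assumes a diagonal preconditioner (stored as the list `p_diag`) so that the solve is a simple division; the function names, the tolerance and the test system are hypothetical.

```python
def matvec(M, v):
    """Multiply a dense matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(a * c for a, c in zip(u, v))

def preconditioned_gradient(A, b, p_diag, x0, tol=1e-10, max_iter=1000):
    """Dynamic Richardson with P = diag(p_diag); assumes A and P are SPD."""
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]       # r^(0) = b - A x^(0)
    for k in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                         # stop on small residual norm
            return x, k
        z = [ri / pi for ri, pi in zip(r, p_diag)]         # solve P z = r
        Az = matvec(A, z)
        alpha = dot(z, r) / dot(z, Az)                     # dynamic alpha_k
        x = [xi + alpha * zi for xi, zi in zip(x, z)]      # x^(k+1) = x^(k) + alpha_k z^(k)
        r = [ri - alpha * azi for ri, azi in zip(r, Az)]   # r^(k+1) = r^(k) - alpha_k A z^(k)
    return x, max_iter

# Illustrative SPD system (assumed), preconditioned with the diagonal of A
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iters = preconditioned_gradient(A, b, [4.0, 3.0], [0.0, 0.0])
print(x, iters)
```

Passing `p_diag = [1.0, 1.0]` corresponds to $P = I$, in which case `z` coincides with `r`. Updating the residual with `Az` avoids a second matrix-vector product per iteration.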

Relation (8) on slide 18 summarizes considerations that in the past concerned 8 separate methods, which are now condensed into a single definition.

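$z^{(k)}$ is the **preconditioned residual**, that is the residual after applying the inverse of the preconditioner: since $r^{(k)} = Ae^{(k)}$, we have $z^{(k)} = P^{-1}Ae^{(k)}$, which approximates the error $e^{(k)}$ when $P$ is close to $A$.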

We are able to choose the optimal coefficient $\alpha_k$ even though it is meant to reduce the error, which cannot be computed: the minimization leads to a formula that involves only the computable quantities $r^{(k)}$ and $z^{(k)}$.