## Iterative Methods for Solving Linear Systems

Iterative techniques are rarely used for solving linear systems of small dimension because the computation time required for convergence usually exceeds that required for direct methods such as Gaussian elimination. However, for very large systems, especially sparse systems (systems with a high percentage of 0 entries), these iterative techniques can be very efficient in terms of computational run time and memory usage.

An iterative technique for solving the matrix equation $A\bar{x} = \bar{b}$ starts with an initial approximation $\bar{x}_0$ and generates a sequence of vectors $\{\bar{x}_1,\bar{x}_2,...,\bar{x}_n,...\}$ that converges to $\bar{x}$ as $n\rightarrow\infty$. These techniques involve a process that converts the system $A\bar{x}=\bar{b}$ to an equivalent system of the form $\bar{x}=T\bar{x}+\bar{c}$. The process then proceeds as follows, for an initial guess $\bar{x}_0$:

$$
\begin{aligned}
\bar{x}_1 &= T\bar{x}_0 + \bar{c}\\
\bar{x}_2 &= T\bar{x}_1 + \bar{c}\\
&\;\;\vdots\\
\bar{x}_n &= T\bar{x}_{n-1} + \bar{c}
\end{aligned}
$$

We stop the iteration when some convergence criterion has been met. A popular convergence criterion uses the $L_{\infty}$ norm, which measures the size of a vector by its largest component in absolute value. Expressed mathematically, for

$x = [x_1, x_2, ..., x_n]^T$

$|x|_{\infty} = \max |x_j|$ over all $j = 1, 2, ..., n$
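As a minimal sketch, the $L_{\infty}$ norm and a stopping test built on it might be computed with NumPy as follows. The helper name `relative_change` and the sample vectors are assumptions made for illustration; measuring the relative change between successive iterates is one common choice of criterion, not the only one.

```python
import numpy as np

def relative_change(x_new, x_old):
    """Relative change between successive iterates in the L-infinity norm.

    Hypothetical helper for illustration; assumes x_new is not the zero vector.
    """
    return np.linalg.norm(x_new - x_old, np.inf) / np.linalg.norm(x_new, np.inf)

x = np.array([3.0, -7.5, 2.0])
# The L-infinity norm is the largest absolute component of the vector.
norm_inf = np.linalg.norm(x, np.inf)   # same value as np.max(np.abs(x))

x_old = np.array([1.0, 2.0, -1.0])
x_new = np.array([1.0, 2.0, -1.5])
change = relative_change(x_new, x_old)  # 0.5 / 2.0

print(norm_inf, change)
```

An iteration would stop once `change` falls below a chosen tolerance.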

### An example using an iterative method.

Consider the system:

$$
\begin{matrix} E_1: & 10x_1 & -x_2 & +2x_3 & & =& 6\\E_2: & -x_1 & +11x_2 & -x_3 & +3x_4 & =&25\\E_3: & 2x_1 & -x_2 & + 10x_3 & -x_4 & =&-11\\E_4: &  & 3x_2 & -x_3 & +8x_4 & = &15\end{matrix}
$$

Let us solve each equation, $E_j$, for the variable $x_j$.

$$
\begin{matrix} E_1: & x_1 =&  &\frac{1}{10}x_2& - \frac{1}{5}x_3 &  & +\frac{3}{5}\\
E_2: & x_2 = &\frac{1}{11}x_1 & & +\frac{1}{11}x_3 & -\frac{3}{11}x_4 & +\frac{25}{11}\\
E_3: & x_3 = &-\frac{1}{5}x_1 & +\frac{1}{10}x_2 & & +\frac{1}{10}x_4 & -\frac{11}{10}\\
E_4: & x_4 = & & -\frac{3}{8}x_2 & + \frac{1}{8}x_3 & & +\frac{15}{8} \end{matrix}
$$

Using the notation $\bar{x}_1 = T\bar{x}_0 + \bar{c}$ introduced earlier, we have:

$$
\bar{x}_1 = \begin{pmatrix} 0 & \frac{1}{10} & -\frac{1}{5} & 0\\
\frac{1}{11} & 0 & \frac{1}{11} & -\frac{3}{11}\\
-\frac{1}{5} & \frac{1}{10} & 0 & \frac{1}{10}\\
0 & -\frac{3}{8} & \frac{1}{8} & 0 \end{pmatrix}\bar{x}_0 + \begin{pmatrix}\frac{3}{5}\\\frac{25}{11}\\-\frac{11}{10}\\\frac{15}{8}\end{pmatrix},$$ for $\bar{x}_0 = \begin{pmatrix}0\\0\\0\\0\end{pmatrix}$ then $\bar{x}_1 = \begin{pmatrix}0.6000\\2.2727\\-1.1000\\1.8750\end{pmatrix}$

We repeat this process until the desired convergence has been reached. This technique is called the Jacobi iterative method.
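The repetition described above can be sketched in Python with NumPy. The matrix `T` and vector `c` are copied from the example; the function name `fixed_point_iterate`, the tolerance, and the iteration cap are assumptions chosen for illustration, and the stopping test uses the relative change in the $L_{\infty}$ norm.

```python
import numpy as np

# Iteration matrix T and vector c copied from the worked example.
T = np.array([
    [0.0,   1/10, -1/5,   0.0],
    [1/11,  0.0,   1/11, -3/11],
    [-1/5,  1/10,  0.0,   1/10],
    [0.0,  -3/8,   1/8,   0.0],
])
c = np.array([3/5, 25/11, -11/10, 15/8])

def fixed_point_iterate(T, c, x0, tol=1e-10, max_iter=500):
    """Repeat x_k = T x_{k-1} + c until the relative L-infinity change is below tol.

    tol and max_iter are illustrative defaults, not values from the text.
    """
    x = np.array(x0, dtype=float)
    for k in range(1, max_iter + 1):
        x_new = T @ x + c
        # Written as a product so a zero iterate cannot cause a division by zero.
        if np.linalg.norm(x_new - x, np.inf) <= tol * np.linalg.norm(x_new, np.inf):
            return x_new, k
        x = x_new
    raise RuntimeError("no convergence within max_iter iterations")

x1 = T @ np.zeros(4) + c                       # first iterate from the zero vector
x, iterations = fixed_point_iterate(T, c, np.zeros(4))

print(np.round(x1, 4))   # first iterate, matches the vector shown in the example
print(np.round(x, 4))    # approximately (1, 2, -1, 1)
```

Substituting $(1, 2, -1, 1)^T$ into the four rearranged equations confirms that it is the fixed point the iterates approach.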

## Observations on the Jacobi iterative method.

Let's consider a matrix $A$ that we split into three matrices, $D$, $U$, and $L$, which are diagonal, strictly upper triangular, and strictly lower triangular respectively.

$$
A = \begin{pmatrix}a_{11}&a_{12}&...&a_{1n}\\a_{21}&\ddots& &\vdots\\\vdots& &\ddots&\vdots\\a_{n1}&...&...&a_{nn}\end{pmatrix}, D = \begin{pmatrix}a_{11}&0&...&0\\0&\ddots& &\vdots\\\vdots& &\ddots&\vdots\\0&...&...&a_{nn}\end{pmatrix}, L = \begin{pmatrix}0&0&...&0\\-a_{21}&\ddots& &\vdots\\\vdots& &\ddots&\vdots\\-a_{n1}&...&-a_{n(n-1)}&0\end{pmatrix}, U = \begin{pmatrix}0&-a_{12}&...&-a_{1n}\\\vdots&\ddots& &\vdots\\\vdots& &\ddots&-a_{(n-1)n}\\0&...&...&0\end{pmatrix}
$$

If we let $A = D-L-U$, then the matrix equation $A\bar{x} = \bar{b}$ becomes:

$$(D-L-U)\bar{x} = \bar{b} \quad \text{or} \quad D\bar{x} = (L+U)\bar{x}+\bar{b}$$

If $D^{-1}$ exists, which requires $a_{ii} \neq 0$ for every $i$, then

$$\bar{x} = D^{-1}(L+U)\bar{x}+D^{-1}\bar{b}$$

This results in the matrix form of the Jacobi iteration method:

$$\bar{x}_k = D^{-1}(L+U)\bar{x}_{k-1}+D^{-1}\bar{b}$$

We can see that one requirement for the Jacobi iteration to work is for $a_{ii} \neq 0$. This may involve row exchanges before iterating for some linear systems.
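As a rough sketch of the matrix form, the Jacobi matrices $T = D^{-1}(L+U)$ and $\bar{c} = D^{-1}\bar{b}$ could be assembled with NumPy as below. The function name `jacobi_matrices` is hypothetical, and the coefficients are those of the worked example, entered so that they agree with the iteration matrix shown there.

```python
import numpy as np

def jacobi_matrices(A, b):
    """Return (T, c) with T = D^{-1}(L + U) and c = D^{-1} b, where A = D - L - U.

    Hypothetical helper for illustration only.
    """
    d = np.diag(A)                     # the diagonal entries a_ii
    if np.any(d == 0):
        raise ValueError("zero on the diagonal: exchange rows before iterating")
    L_plus_U = np.diag(d) - A          # off-diagonal part of A with the sign flipped
    T = L_plus_U / d[:, None]          # divide row i by a_ii, i.e. apply D^{-1}
    c = b / d                          # D^{-1} b
    return T, c

# Coefficients of the worked example (row 4 agrees with the last row of T there).
A = np.array([
    [10.0, -1.0,  2.0,  0.0],
    [-1.0, 11.0, -1.0,  3.0],
    [ 2.0, -1.0, 10.0, -1.0],
    [ 0.0,  3.0, -1.0,  8.0],
])
b = np.array([6.0, 25.0, -11.0, 15.0])

T, c = jacobi_matrices(A, b)
print(T)
print(c)
```

Dividing each row by its diagonal entry avoids forming $D^{-1}$ explicitly, which is a design choice of this sketch rather than a requirement of the method.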

## An improvement to the Jacobi iterative method (Gauss-Seidel).

During the Jacobi iteration we always use the components of $\bar{x}_{k-1}$ to compute $x_{(k)_i}$. But for $i > 1$, the components $x_{(k)_1}, ...,x_{(k)_{i-1}}$ have already been computed and are likely to be better approximations of the true solution than the corresponding components of $\bar{x}_{k-1}$. Therefore, we can calculate $x_{(k)_i}$ using the most recently calculated values when available. This technique is called Gauss-Seidel iteration. The pseudocode is as follows:

For the matrix equation $A\bar{x} = \bar{b}$ with an initial guess $\bar{x}_0$.

$$
A=\begin{pmatrix}a_{11}&a_{12}&...&a_{1n}\\a_{21}&\ddots& &\vdots\\\vdots& &\ddots&\vdots\\a_{n1}&...&...&a_{nn}\end{pmatrix}, \bar{b} = \begin{pmatrix}b_1\\b_2\\\vdots\\b_n\end{pmatrix}, \bar{x}_0 = \begin{pmatrix}x_{(0)_1}\\x_{(0)_2}\\\vdots\\x_{(0)_n}\end{pmatrix}
$$


Steps:

  1. While $\frac{|\bar{x}_{k}-\bar{x}_{k-1}|_{\infty}}{|\bar{x}_{k}|_{\infty}} >$ *tolerance* do Steps 2-3.
  1. For $i = 1, 2, ..., n$ do Step 3.
  1. $x_{(k)_i} = \frac{-\sum_{j=1}^{i-1}(a_{ij}x_{(k)_j})-\sum_{j=i+1}^{n}(a_{ij}x_{(k-1)_j})+b_i}{a_{ii}}$
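The steps above can be sketched in Python with NumPy. This is one possible reading of the pseudocode, not a definitive implementation: the function name `gauss_seidel`, the default tolerance, and the iteration cap are assumptions, and the stopping test is evaluated after each sweep because the relative change is undefined before the first one. The test system is the worked example from earlier.

```python
import numpy as np

def gauss_seidel(A, b, x0, tol=1e-10, max_iter=500):
    """Gauss-Seidel sweeps, stopping on the relative L-infinity change.

    tol and max_iter are illustrative defaults, not values from the text.
    """
    n = len(b)
    x = np.array(x0, dtype=float)
    for k in range(1, max_iter + 1):
        x_old = x.copy()
        for i in range(n):
            # x[:i] already holds the new values x_(k); x_old[i+1:] holds x_(k-1).
            s_new = A[i, :i] @ x[:i]
            s_old = A[i, i + 1:] @ x_old[i + 1:]
            x[i] = (b[i] - s_new - s_old) / A[i, i]
        if np.linalg.norm(x - x_old, np.inf) <= tol * np.linalg.norm(x, np.inf):
            return x, k
    raise RuntimeError("no convergence within max_iter iterations")

# Coefficients of the worked example (row 4 agrees with its iteration matrix).
A = np.array([
    [10.0, -1.0,  2.0,  0.0],
    [-1.0, 11.0, -1.0,  3.0],
    [ 2.0, -1.0, 10.0, -1.0],
    [ 0.0,  3.0, -1.0,  8.0],
])
b = np.array([6.0, 25.0, -11.0, 15.0])

x, sweeps = gauss_seidel(A, b, np.zeros(4))
print(np.round(x, 4), sweeps)
```

Updating `x` in place is what distinguishes this from Jacobi: each component is overwritten as soon as it is computed, so later components in the same sweep see the new value.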

## Observations on the Gauss-Seidel iterative method.

We use the same splitting $A = D-L-U$ as for the Jacobi method, where $D$ is the diagonal part of $A$, and $-L$ and $-U$ are its strictly lower triangular and strictly upper triangular parts respectively.

We leave the derivation as an exercise for the student, but the matrix equation for the Gauss-Seidel iteration method is as follows:

$$\bar{x}_k = (D-L)^{-1}U\bar{x}_{k-1}+(D-L)^{-1}\bar{b}$$

In order for the lower triangular matrix $(D-L)$ to be invertible, it is necessary and sufficient that $a_{ii}\neq 0$ for every $i$. As before, this may involve row exchanges before iterating for some linear systems.
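Without giving away the derivation, a quick numerical check can confirm that the matrix form and the component-wise sweep produce the same iterate. The sketch below assumes NumPy and reuses the worked example; the variable names are illustrative, and `np.linalg.solve` is used in place of forming $(D-L)^{-1}$ explicitly.

```python
import numpy as np

# Coefficients of the worked example (row 4 agrees with its iteration matrix).
A = np.array([
    [10.0, -1.0,  2.0,  0.0],
    [-1.0, 11.0, -1.0,  3.0],
    [ 2.0, -1.0, 10.0, -1.0],
    [ 0.0,  3.0, -1.0,  8.0],
])
b = np.array([6.0, 25.0, -11.0, 15.0])

D = np.diag(np.diag(A))
L = -np.tril(A, k=-1)   # strictly lower part with the sign flipped
U = -np.triu(A, k=1)    # strictly upper part with the sign flipped, so A = D - L - U

x_prev = np.zeros(4)

# Matrix form: solve (D - L) x_k = U x_{k-1} + b.
x_matrix = np.linalg.solve(D - L, U @ x_prev + b)

# Component form: one sweep that uses new values as soon as they exist.
x_sweep = x_prev.copy()
for i in range(4):
    x_sweep[i] = (b[i] - A[i, :i] @ x_sweep[:i] - A[i, i + 1:] @ x_sweep[i + 1:]) / A[i, i]

print(np.round(x_matrix, 4))
print(np.round(x_sweep, 4))
```

Both lines print the same vector, which is the first Gauss-Seidel iterate from the zero initial guess.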

## Convergence of iterative methods.

First, a definition: the *spectral radius*, $\rho$, of a matrix $A$ is the maximum of the absolute values of the eigenvalues of $A$.

$$\rho(A) = \max_i|\lambda_i|$$

where the $\lambda_i$'s are the eigenvalues of $A.$
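For a concrete feel, the spectral radius can be computed numerically from the eigenvalues. The sketch below assumes NumPy; the function name `spectral_radius` and the small triangular matrix are made up for illustration.

```python
import numpy as np

def spectral_radius(M):
    """Largest absolute value among the eigenvalues of the square matrix M."""
    return np.max(np.abs(np.linalg.eigvals(M)))

# Illustrative matrix: it is triangular, so its eigenvalues are 0.5 and -0.8.
M = np.array([[0.5,  0.25],
              [0.0, -0.8]])
rho = spectral_radius(M)
print(rho)   # the eigenvalue of largest magnitude is -0.8, so rho is 0.8
```

The absolute value matters here: the eigenvalue that determines $\rho$ is negative.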

Now, if $\rho(A) < 1$, then $(I-A)^{-1}$ exists, and 

$$(I-A)^{-1} = I + A + A^2 + ... = \sum_{j=0}^{\infty}A^j$$

We can prove this, starting with the eigenvalue equation:

$$A\bar{x} = \lambda \bar{x} \rightarrow (I-A)\bar{x} = (1-\lambda)\bar{x}$$

$\lambda$ is an eigenvalue of $A$ exactly when $(1-\lambda)$ is an eigenvalue of $(I-A)$. But $|\lambda|\leq \rho(A) < 1$, therefore $1$ cannot be an eigenvalue of $A$, and $0$ cannot be an eigenvalue of $(I-A)$. A matrix with no zero eigenvalue is always invertible, therefore $(I-A)^{-1}$ exists. A standard result also states that a matrix $A$ is *convergent*, meaning that $\lim_{n\rightarrow \infty}A^n\bar{x} = 0$ for all $\bar{x}$, if $\rho(A) < 1$. Now, let

$$S_m = I + A + A^2 + ... + A^m.$$

Then,

$$(I-A)S_m = (I+A+A^2+...+A^m)-(A+A^2+...+A^{m+1})=I-A^{m+1}.$$

Since $A$ is convergent, taking the limit of both sides gives

$$\lim_{m\rightarrow\infty}(I-A)S_m=\lim_{m\rightarrow\infty}(I-A^{m+1})=I,$$

thus

$$(I-A)^{-1} = \lim_{m\rightarrow\infty}S_m = \sum_{j=0}^{\infty}A^j$$
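The series identity can be checked numerically on a small matrix. The following sketch assumes NumPy; the $2\times 2$ matrix is made up for illustration (its eigenvalues are $0.5$ and $0.1$), and 200 terms is an arbitrary cutoff that is more than enough at this spectral radius.

```python
import numpy as np

# Illustrative matrix with eigenvalues 0.5 and 0.1, so rho(A) = 0.5 < 1.
A = np.array([[0.2, 0.3],
              [0.1, 0.4]])
rho = np.max(np.abs(np.linalg.eigvals(A)))

# Accumulate the partial sum S_m = I + A + A^2 + ... + A^m for m = 200.
S = np.eye(2)
power = np.eye(2)
for _ in range(200):
    power = power @ A      # power now holds the next power of A
    S += power

inverse = np.linalg.inv(np.eye(2) - A)
print(rho)
print(np.max(np.abs(S - inverse)))   # difference between the partial sum and the inverse
```

The printed difference is at the level of floating-point rounding, consistent with $S_m \rightarrow (I-A)^{-1}$.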

Now, for any $\bar{x}_0\in\mathbb{R}^n$, the sequence $\{\bar{x}_k\}_{k=0}^{\infty}$ computed by

$$\bar{x}_k = T\bar{x}_{k-1}+\bar{c}, \quad \text{for each } k\geq1,$$

converges to the unique solution of $\bar{x} = T \bar{x} + \bar{c}$ if and only if $\rho (T) < 1$.

To prove this, we first assume that $\rho(T) < 1$. Then,

$$
\begin{aligned}
\bar{x}_k &= T\bar{x}_{k-1}+\bar{c}\\
&= T(T\bar{x}_{k-2} + \bar{c}) + \bar{c}\\
&= T^2 \bar{x}_{k-2}+(T+I)\bar{c}\\
&\;\;\vdots\\
&= T^k \bar{x}_0 + (T^{k-1} + ... + T + I)\bar{c}
\end{aligned}
$$

Since $\rho(T) < 1$, $T$ is convergent, and from what we showed previously it follows that

$$\lim_{k\rightarrow \infty}\bar{x}_k = \lim_{k\rightarrow \infty}T^k\bar{x}_0 + \left(\sum_{j=0}^{\infty}T^j\right)\bar{c} = 0 + (I-T)^{-1} \bar{c}$$

Therefore, $\bar{x}_k$ converges to $(I-T)^{-1}\bar{c}$, which is the unique solution of $\bar{x} = T\bar{x}+\bar{c}$ because that equation is equivalent to $(I-T)\bar{x} = \bar{c}$.
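To connect this result to the worked example, the sketch below (assuming NumPy) checks that the Jacobi iteration matrix from the example has spectral radius below 1 and that the iterates approach $(I-T)^{-1}\bar{c}$. The count of 100 iterations is an arbitrary choice for illustration.

```python
import numpy as np

# Iteration matrix T and vector c copied from the worked Jacobi example.
T = np.array([
    [0.0,   1/10, -1/5,   0.0],
    [1/11,  0.0,   1/11, -3/11],
    [-1/5,  1/10,  0.0,   1/10],
    [0.0,  -3/8,   1/8,   0.0],
])
c = np.array([3/5, 25/11, -11/10, 15/8])

rho = np.max(np.abs(np.linalg.eigvals(T)))

# (I - T)^{-1} c, obtained by solving (I - T) x = c rather than inverting.
limit = np.linalg.solve(np.eye(4) - T, c)

# Run the fixed-point iteration from the zero vector.
x = np.zeros(4)
for _ in range(100):
    x = T @ x + c

print(rho < 1)
print(np.round(limit, 4))
print(np.max(np.abs(x - limit)))   # distance between the iterate and the limit
```

Every row of `T` has absolute row sum at most $0.5$, which bounds $\rho(T)$ by $0.5$ and is why the iterates settle so quickly here.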