#### Fixed point method

To solve the linear system $Ax=b$ for $x$, where $A\in \mathbf{R}^{m \times m}$ has full rank, we have seen that we can use either Cholesky factorization (when $A$ is symmetric positive definite) or LU factorization with partial pivoting

These methods are `direct`, meaning we transform the problem into a simpler form and compute the `exact solution` (up to rounding error) in a finite number of steps

For `iterative` methods, we start with some `initial guess` $x^0$ and successively improve the `approximation`

$$x^0 \rightarrow x^1 \rightarrow x^2 \rightarrow \cdots$$ until some desired accuracy is reached

$$\|Ax^k-b\|\leq \epsilon$$

This is similar to the iterative methods we used to find eigenvalues, such as power iteration and orthogonal iteration

One of the simplest forms of iterative method is the `fixed-point` method

For a mapping $F$, assuming $x^*$ is the solution to our system of equations, we would like to have

$$x^*=F(x^*)$$

or in a linear form

$$x^*=Gx^*+f$$

where $G$ is a matrix and $f$ is a vector

Then, we can choose an initial guess $x^0$ and repeat

$$x^{k+1}=F(x^k)$$

until convergence
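
The loop above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the notebook: the helper name `fixed_point_linear`, the tolerance, the iteration cap, and the small $2\times 2$ system are all assumptions made for the example, and the diagonal splitting used to build $G$ and $f$ is only one possible choice.

```python
import numpy as np

def fixed_point_linear(G, f, x0, A, b, tol=1e-10, max_iter=500):
    """Iterate x <- G x + f until ||A x - b|| <= tol (hypothetical helper)."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        # Stop as soon as the residual norm is small enough
        if np.linalg.norm(A @ x - b) <= tol:
            return x, k
        x = G @ x + f
    return x, max_iter

# Illustrative 2 x 2 system (made up for this sketch); its solution is (1, 2)
A = np.array([[4.0, 1.0], [2.0, 5.0]])
b = np.array([6.0, 12.0])

# One possible splitting: with d = diag(A), take G = D^{-1}(D - A) and f = D^{-1} b
d = np.diag(A)
G = (np.diag(d) - A) / d[:, None]   # divide row i by d[i]
f = b / d

x, n_iter = fixed_point_linear(G, f, np.zeros(2), A, b)
print(x, n_iter)
```

Whether such an iteration converges depends on $G$, so the cap `max_iter` guards against an endless loop.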

#### `Jacobi` method

For our linear system of equations $Ax=b$

The Jacobi method splits $A$ into $D-E-F$, where $D$ is the diagonal of $A$, $-E$ is the strictly lower triangular part, and $-F$ is the strictly upper triangular part (this matrix $F$ is unrelated to the mapping $F$ above)

We can rewrite

$$Ax=b \Longleftrightarrow (D-E-F)x=b$$

Rearrange

$$Dx=(E+F)x+b$$

and we have the `iterative form`

$$x^{k+1}=D^{-1}\left((E+F)x^k+b\right)$$

In essence, equation $i$ is solved for its diagonal unknown $x_i$ while all other entries of $x$ are kept at their values from the previous iteration

We can see this from a simple example: assume a $3 \times 3$ system with initial guess $x^0=(x_1^0, x_2^0, x_3^0)$ (which is generally not accurate, as reflected by the residual $r^0$)

$$\begin{align*}
a_{11}x_1^0+a_{12}x_2^0+a_{13}x_3^0&=b_1-r_1^0\\
a_{21}x_1^0+a_{22}x_2^0+a_{23}x_3^0&=b_2-r_2^0\\
a_{31}x_1^0+a_{32}x_2^0+a_{33}x_3^0&=b_3-r_3^0
\end{align*}$$

What the Jacobi method does in the first iteration is to solve each equation for its diagonal unknown while keeping the other entries of $x$ at the initial guess

$$\begin{align*}
a_{11}\color{red}{x_1^1}+a_{12}x_2^0+a_{13}x_3^0&=b_1\rightarrow \color{red}{x_1^1}\\
a_{21}x_1^0+a_{22}\color{red}{x_2^1}+a_{23}x_3^0&=b_2\rightarrow \color{red}{x_2^1}\\
a_{31}x_1^0+a_{32}x_2^0+a_{33}\color{red}{x_3^1}&=b_3\rightarrow \color{red}{x_3^1}
\end{align*}$$

Then, it plugs these better approximations into all entries and repeats the process
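
To make the link between the equation-by-equation view and the matrix form $x^{k+1}=D^{-1}\left((E+F)x^k+b\right)$ concrete, here is a small sketch of one sweep written both ways. The helper `jacobi_sweep` and the $3\times 3$ test system are hypothetical and chosen only for illustration. It uses the identity $E+F=D-A$, which follows from the splitting $A=D-E-F$.

```python
import numpy as np

def jacobi_sweep(A, b, x_old):
    """One Jacobi sweep written equation by equation (hypothetical helper)."""
    m = len(b)
    x_new = np.empty(m)
    for i in range(m):
        # Sum of a_ij * x_j over j != i, using only values from the previous iterate
        off_diag = A[i, :] @ x_old - A[i, i] * x_old[i]
        x_new[i] = (b[i] - off_diag) / A[i, i]
    return x_new

# Illustrative system (made up for this sketch)
A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
b = np.array([2.0, 4.0, 10.0])
x0 = np.ones(3)

# Matrix form of the same step, with E + F = D - A
d = np.diag(A)
x_matrix = ((np.diag(d) - A) @ x0 + b) / d

x_loop = jacobi_sweep(A, b, x0)
print(x_loop, x_matrix)   # both give [0.75, 1.5, 2.75]
```

Because every `x_new[i]` depends only on `x_old`, the $m$ updates are independent of each other and could be computed in any order.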

#### Example

We have

$$A=\begin{bmatrix}2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2\end{bmatrix},\, x=\begin{bmatrix}3 \\ 4 \\ 5\end{bmatrix}, \, b=Ax=\begin{bmatrix}2 \\ 0 \\ 6\end{bmatrix}$$

and $x^0=\begin{bmatrix}1 \\ 1\\ 1\end{bmatrix}$

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import time
np.set_printoptions(formatter={'float': '{: 0.4f}'.format})

plt.style.use('dark_background')
# color: https://matplotlib.org/stable/gallery/color/named_colors.htm

In [2]:
A = np.array([[2, -1, 0], [-1, 2, -1], [0, -1, 2]])
b = np.array([2, 0, 6])
x = np.array([1, 1, 1])

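D = np.diag(A)       # 1-D array holding the diagonal entries of A
E = -np.tril(A, -1)  # negated strictly lower triangular part
F = -np.triu(A, 1)   # negated strictly upper triangular part

iter = 20
for i in range(iter):
    # Elementwise division by the diagonal applies D^{-1}
    x = ((E+F)@x + b) / D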
    print(f'# {i+1}: {x}')

# 1: [ 1.5000  1.0000  3.5000]
# 2: [ 1.5000  2.5000  3.5000]
# 3: [ 2.2500  2.5000  4.2500]
# 4: [ 2.2500  3.2500  4.2500]
# 5: [ 2.6250  3.2500  4.6250]
# 6: [ 2.6250  3.6250  4.6250]
# 7: [ 2.8125  3.6250  4.8125]
# 8: [ 2.8125  3.8125  4.8125]
# 9: [ 2.9062  3.8125  4.9062]
# 10: [ 2.9062  3.9062  4.9062]
# 11: [ 2.9531  3.9062  4.9531]
# 12: [ 2.9531  3.9531  4.9531]
# 13: [ 2.9766  3.9531  4.9766]
# 14: [ 2.9766  3.9766  4.9766]
# 15: [ 2.9883  3.9766  4.9883]
# 16: [ 2.9883  3.9883  4.9883]
# 17: [ 2.9941  3.9883  4.9941]
# 18: [ 2.9941  3.9941  4.9941]
# 19: [ 2.9971  3.9941  4.9971]
# 20: [ 2.9971  3.9971  4.9971]


We see that the Jacobi method gradually converges to the true $x=(3, 4, 5)$

#### `Gauss-Seidel` method

The idea of Gauss-Seidel is to use the updated entries of $x^{k+1}$ as soon as they become available, rather than waiting for all entries to be updated

Using the same $3 \times 3$ system as above, in the first iteration Gauss-Seidel solves

$$\begin{align*}
a_{11}\color{red}{x_1^1}+a_{12}x_2^0+a_{13}x_3^0&=b_1\rightarrow \color{red}{x_1^1}\\
a_{21}\color{red}{x_1^1}+a_{22}\color{red}{x_2^1}+a_{23}x_3^0&=b_2\rightarrow \color{red}{x_2^1}\\
a_{31}\color{red}{x_1^1}+a_{32}\color{red}{x_2^1}+a_{33}\color{red}{x_3^1}&=b_3\rightarrow \color{red}{x_3^1}
\end{align*}$$

and the iteration form becomes

$$(D-E)x^{k+1}=\left(Fx^k+b\right)$$

where $D-E$ is a `lower triangular` matrix, so we can solve for $x^{k+1}$ using forward substitution

(If we swap the roles of $E$ and $F$, we can also solve it using back substitution)
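
As a sketch of that swapped variant, the iteration becomes $(D-F)x^{k+1}=Ex^k+b$, where $D-F$ is upper triangular and the unknowns are updated from last to first. The helper name `solve_upper`, the variable `x_back`, and the number of sweeps are assumptions made for this illustration; the system is the $3\times 3$ example from above.

```python
import numpy as np

def solve_upper(U, b):
    """Solve U x = b for an upper triangular U by back substitution (hypothetical helper)."""
    n = U.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

A = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
b = np.array([2.0, 0.0, 6.0])
D = np.diag(np.diag(A))
E = -np.tril(A, -1)
F = -np.triu(A, 1)

x_back = np.ones(3)
for _ in range(60):
    # Sweep from the last unknown to the first: (D - F) x^{k+1} = E x^k + b
    x_back = solve_upper(D - F, E @ x_back + b)
print(x_back)   # approaches [3, 4, 5]
```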

In [3]:
def forward_substitution(L, b):
    m, n = L.shape
    x = np.zeros(n)
    for i in range(n):
        x[i] = (b[i] - np.dot(L[i, :i], x[:i])) / L[i, i]
    return x

In [4]:
x = np.array([1, 1, 1])
# D is now a full diagonal matrix (not a 1-D array), so that D-E is lower triangular
D = np.diag(np.diag(A))

for i in range(iter):
    x = forward_substitution(D-E, F@x+b)
    print(f'# {i+1}: {x}')

# 1: [ 1.5000  1.2500  3.6250]
# 2: [ 1.6250  2.6250  4.3125]
# 3: [ 2.3125  3.3125  4.6562]
# 4: [ 2.6562  3.6562  4.8281]
# 5: [ 2.8281  3.8281  4.9141]
# 6: [ 2.9141  3.9141  4.9570]
# 7: [ 2.9570  3.9570  4.9785]
# 8: [ 2.9785  3.9785  4.9893]
# 9: [ 2.9893  3.9893  4.9946]
# 10: [ 2.9946  3.9946  4.9973]
# 11: [ 2.9973  3.9973  4.9987]
# 12: [ 2.9987  3.9987  4.9993]
# 13: [ 2.9993  3.9993  4.9997]
# 14: [ 2.9997  3.9997  4.9998]
# 15: [ 2.9998  3.9998  4.9999]
# 16: [ 2.9999  3.9999  5.0000]
# 17: [ 3.0000  4.0000  5.0000]
# 18: [ 3.0000  4.0000  5.0000]
# 19: [ 3.0000  4.0000  5.0000]
# 20: [ 3.0000  4.0000  5.0000]


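We see that the Gauss-Seidel method converges noticeably faster than the Jacobi method: it matches $(3, 4, 5)$ to four decimal places after 17 iterations, whereas Jacobi is still at $(2.9971, 3.9971, 4.9971)$ after 20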
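
A standard way to quantify this difference is the spectral radius $\rho(G)$ of the iteration matrix in $x^{k+1}=Gx^k+f$: the error satisfies $e^{k+1}=Ge^k$, so asymptotically it shrinks by roughly a factor $\rho(G)$ per iteration. The sketch below computes $\rho$ for the Jacobi matrix $D^{-1}(E+F)$ and the Gauss-Seidel matrix $(D-E)^{-1}F$ of the $3\times 3$ example. The variable names `G_jacobi`, `G_gs`, `rho_jacobi` and `rho_gs` are illustrative and not part of the cells above.

```python
import numpy as np

A = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
D = np.diag(np.diag(A))
E = -np.tril(A, -1)
F = -np.triu(A, 1)

# Iteration matrices, formed with solves instead of explicit inverses
G_jacobi = np.linalg.solve(D, E + F)   # D^{-1} (E + F)
G_gs = np.linalg.solve(D - E, F)       # (D - E)^{-1} F

# Spectral radius = largest eigenvalue magnitude
rho_jacobi = np.max(np.abs(np.linalg.eigvals(G_jacobi)))
rho_gs = np.max(np.abs(np.linalg.eigvals(G_gs)))
print(rho_jacobi, rho_gs)   # approximately 0.7071 and 0.5
```

These values agree with the printed iterates: the Jacobi error halves every two iterations, while the Gauss-Seidel error halves every iteration.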

#### Note on `convergence`

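Though Jacobi and Gauss-Seidel can be applied to any matrix with nonzero diagonal entries, `convergence` is not guaranteed in general

Both methods are guaranteed to converge if the matrix is `strictly diagonally dominant`, that is

$$|a_{ii}|>\sum_{j\neq i}|a_{ij}|, \forall i$$

and Gauss-Seidel is also guaranteed to converge if the matrix is `symmetric and positive definite` (for Jacobi, symmetric positive definiteness alone is not sufficient)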
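
Both conditions are easy to test numerically for a small matrix. The following is a minimal sketch: the helper `is_strictly_diagonally_dominant` and the variable `A_small` are hypothetical names introduced here, and the dominance check uses the strict inequality $|a_{ii}|>\sum_{j\neq i}|a_{ij}|$. It is applied to the $3\times 3$ example matrix from the earlier section.

```python
import numpy as np

def is_strictly_diagonally_dominant(A):
    """Row-wise check of |a_ii| > sum_{j != i} |a_ij| (hypothetical helper)."""
    abs_A = np.abs(A)
    diag = np.diag(abs_A)
    off_diag_sum = abs_A.sum(axis=1) - diag
    return bool(np.all(diag > off_diag_sum))

A_small = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])

# Second row: |2| is not greater than |-1| + |-1|, so the check fails
print(is_strictly_diagonally_dominant(A_small))        # False

# The matrix is symmetric with all eigenvalues positive, hence positive definite
print(bool(np.all(np.linalg.eigvalsh(A_small) > 0)))   # True
```

So the earlier example is not strictly diagonally dominant but is symmetric positive definite, which illustrates that these are sufficient conditions rather than necessary ones.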

We now try Gauss-Seidel on a large matrix that is strictly diagonally dominant

In [5]:
np.random.seed(42)
m = 2000

A = np.random.randn(m, m)
A = A + m * np.eye(m)
x_gt = np.random.randn(m)
b = A @ x_gt

D = np.diag(np.diag(A))
E = -np.tril(A, -1)
F = -np.triu(A, 1)

x = np.ones_like(x_gt)

start_time = time.time()
iter = 20
for i in range(iter):
    x = forward_substitution(D-E, F@x+b)
    print(f'# {i+1}: {np.linalg.norm(x-x_gt)}')
print(f'Time taken: {time.time()-start_time}')

# 1: 1.0251969427344578
# 2: 0.01271551292339024
# 3: 0.00014406948722738274
# 4: 1.4895913879327367e-06
# 5: 1.3727245751499747e-08
# 6: 1.1765983053825444e-10
# 7: 9.262165253815274e-13
# 8: 3.4872462241548426e-14
# 9: 3.4114192696649624e-14
# 10: 3.4142990482108346e-14
# 11: 3.412798437252336e-14
# 12: 3.4148011409232124e-14
# 13: 3.414289848333577e-14
# 14: 3.414409190540434e-14
# 15: 3.414409190540434e-14
# 16: 3.414409190540434e-14
# 17: 3.414409190540434e-14
# 18: 3.414409190540434e-14
# 19: 3.414409190540434e-14
# 20: 3.414409190540434e-14
Time taken: 1.0531046390533447
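
Since the error stops improving after about 8 sweeps, a fixed count of 20 does more work than needed. A possible alternative is to stop on the residual, in the spirit of the criterion $\|Ax^k-b\|\leq \epsilon$. The sketch below is an assumption-laden illustration rather than the notebook's method: the helper `gauss_seidel`, the relative tolerance (scaled by $\|b\|$), the smaller size `m_small = 300`, and the seed are all choices made for this example. It updates `x` in place, which is equivalent to the forward-substitution form.

```python
import numpy as np

def gauss_seidel(A, b, x0, tol=1e-10, max_iter=100):
    """Gauss-Seidel sweeps until ||A x - b|| <= tol * ||b|| (hypothetical helper)."""
    m = len(b)
    x = np.array(x0, dtype=float)
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        if np.linalg.norm(A @ x - b) <= tol * b_norm:
            return x, k
        for i in range(m):
            # x[:i] already holds this sweep's values, x[i+1:] the previous sweep's
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
    return x, max_iter

rng = np.random.default_rng(0)
m_small = 300
A_test = rng.standard_normal((m_small, m_small)) + m_small * np.eye(m_small)
x_true = rng.standard_normal(m_small)
b_test = A_test @ x_true

x_est, n_sweeps = gauss_seidel(A_test, b_test, np.ones(m_small))
print(n_sweeps, np.linalg.norm(x_est - x_true))
```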


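One `benefit of iterative methods` such as Gauss-Seidel is that we don't need to factorize a potentially large matrix $A$, which can be very time consuming: for a dense $m \times m$ matrix, each Gauss-Seidel sweep costs $O(m^2)$ operations, whereas an LU factorization costs $O(m^3)$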

For example, we can use the LU with partial pivoting to solve the same problem...

In [6]:
def lu_factorization(A):
    m, n = A.shape
    u_mat = A.copy().astype(float)
    l_mat = np.identity(m)
    p_mat = np.identity(m)

    for k in range(m-1):
        # Find pivot
        pivot = np.argmax(np.abs(u_mat[k:, k])) + k

        if pivot != k:
            # Swap rows in u, p, and l
            u_mat[[k, pivot], :] = u_mat[[pivot, k], :]
            p_mat[[k, pivot], :] = p_mat[[pivot, k], :]
            l_mat[[k, pivot], :k] = l_mat[[pivot, k], :k]

        for j in range(k + 1, m):
            l_mat[j, k] = u_mat[j, k] / u_mat[k, k]
            # Subtract a multiple of the kth row from the jth row
            u_mat[j, k:] -= l_mat[j, k] * u_mat[k, k:]

    return p_mat, l_mat, u_mat

def back_substitution(R, b):
    m, n = R.shape
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - np.dot(R[i, i + 1:], x[i + 1:])) / R[i, i]
    return x

In [7]:
start_time = time.time()
p, l, u = lu_factorization(A)
y_lu = forward_substitution(l, p @ b)
x_lu = back_substitution(u, y_lu)

print(np.linalg.norm(x_lu - x_gt))

print(f'Time taken: {time.time()-start_time}')

5.968062548557861e-14
Time taken: 25.044858932495117
