# **Lab 2: Iterative methods**
**Martin Börjeson**

# **Abstract**

This lab is about implementing various iterative methods, which are procedures of no predetermined length, where each iteration improves the result of the previous one. 

I wrote methods for performing Jacobi iteration, Gauss-Seidel iteration, and Newton's method for scalar and vector functions.

All functions passed my limited test suite, though the computational complexity of my Jacobi and Gauss-Seidel iterations is greater than it has to be, which is a result of me directly writing them as a form of left-preconditioned Richardson iteration.

# **About the code**

The code is written by me, Martin Börjeson.

In [2]:
"""DD2363 Methods in Scientific Computing, """
"""KTH Royal Institute of Technology, Stockholm, Sweden."""

# This file is part of the course DD2363 Methods in Scientific Computing
# KTH Royal Institute of Technology, Stockholm, Sweden
#
# Report by Martin Börjeson

'KTH Royal Institute of Technology, Stockholm, Sweden.'

# **Set up environment**

To have access to the necessary modules you have to run this cell. If you need additional modules, this is where you add them.

In [3]:
# Load necessary modules.
from google.colab import files

import time
import numpy as np
import unittest

from matplotlib import pyplot as plt
from matplotlib import tri
from matplotlib import axes
from mpl_toolkits.mplot3d import Axes3D


#Methods from lab1
def modified_gram_schmidt_iteration(A: np.ndarray):
  dims = A.shape
  if(dims[0] != dims[1]):
    raise Exception("Matrix is not square!")
  if(np.linalg.matrix_rank(A)!=A.shape[0]):
    raise Exception("Matrix is singular!")
  A = np.array(A, dtype = float)  # float copy, so the in-place updates of v below also work for integer input
  n = dims[0]
  R = np.zeros(A.shape, dtype = float)
  Q = np.zeros(A.shape, dtype = float)
  for j in range(n):
    v = A[:,j]
    for i in range(j):
      R[i,j] = np.dot(Q[:,i],v)
      v -= R[i,j]*Q[:,i]
    R[j,j] = np.linalg.norm(v)
    Q[:,j] = v/R[j,j]
  return Q,R

def backward_substitution(R: np.ndarray, v: np.ndarray) -> np.ndarray:
  n = v.shape[0]
  x = np.zeros(v.shape, dtype = float)
  x[-1] = v[-1]/R[-1,-1]
  for i in range(n-2,-1,-1):
    sum = 0
    for j in range(i+1,n):
      sum += R[i,j]*x[j]
    x[i] = (v[i]-sum)/R[i,i]
  return x

def solve(A: np.ndarray, b: np.ndarray) -> np.ndarray:
  A = np.copy(A)
  b = np.copy(b)
  Q,R = modified_gram_schmidt_iteration(A)
  return backward_substitution(R,np.dot(np.transpose(Q),b))

# **Introduction**

An iterative method is a procedure which takes an undetermined number of steps, where each step improves upon the result of the previous one. Some mathematical expressions are infeasible to evaluate directly, such as the inverse of a very large matrix. An approximate solution, however, is often sufficient.

In this report, I've implemented an algorithm to perform Jacobi and Gauss-Seidel iterations, as well as Newton's method for scalar and vector functions. 

A class and a few methods from the Numpy library are used. They are the following:

`numpy.ndarray` class

`numpy.zeros()`

`numpy.array()`

`numpy.matmul()`

`numpy.dot()`

`numpy.diag()`

`numpy.linalg.norm()`


# **Method**

Jacobi and Gauss-Seidel iteration are both kinds of left-preconditioned Richardson iteration on the form $BAx = Bb$, where $Ax = b$ is the equation we are really trying to solve, and the error reduction in each step is improved by multiplying both sides by the preconditioner $B \approx A^{-1}$.

In Jacobi iteration, $B = D^{-1}$, where $D$ is the diagonal part of $A$, and in Gauss-Seidel $B = L^{-1}$, where $L$ is the lower triangular part of $A$ (diagonal included).
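
To connect this to the code further down, here is a short derivation of the update that both implementations use. It assumes unit step length in the Richardson iteration and writes the splitting as $A = A_1 + A_2$ (matching the names `A1` and `A2` in the code), with $B = A_1^{-1}$, where $A_1$ is the diagonal part for Jacobi and the lower triangular part for Gauss-Seidel:

$$
\begin{aligned}
x^{(k+1)} &= x^{(k)} + B\,(b - A x^{(k)}) \\
          &= (I - BA)\,x^{(k)} + Bb \\
          &= \left(I - A_1^{-1}(A_1 + A_2)\right) x^{(k)} + A_1^{-1} b \\
          &= -A_1^{-1} A_2\, x^{(k)} + A_1^{-1} b
\end{aligned}
$$

The iteration matrix is therefore $M = -A_1^{-1}A_2$ and the constant vector is $c = A_1^{-1}b$, which is what `Mj`/`Mgs` and `c` hold in the implementations.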

Newton's method is a special case of fixed point iteration $x^{(k+1)}=x^{(k)}+ \alpha f(x^{(k)})$ where $\alpha = -f'(x^{(k)})^{-1}$, which gives quadratic convergence close to a simple root. That is very fast: the number of correct digits roughly doubles at each step. The caveat is that the derivative of $f(x)$ has to be computed at each step, which isn't always easy.
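
As a sketch of where the quadratic rate comes from, assume $f$ is twice continuously differentiable and $x^*$ is a simple root, so $f'(x^*) \ne 0$. With the error $e_k = x^{(k)} - x^*$ (a symbol introduced here for illustration), a Taylor expansion of $f$ and $f'$ around $x^*$ gives

$$
e_{k+1} = e_k - \frac{f(x^{(k)})}{f'(x^{(k)})} \approx \frac{f''(x^*)}{2 f'(x^*)}\, e_k^2 .
$$

So an error of about $10^{-3}$ becomes roughly $C \cdot 10^{-6}$ after one step, with $C = |f''(x^*) / (2 f'(x^*))|$.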

Newton's method can also be used to find roots of vector functions. In that case, the inverse of the derivative of $f(x)$ is replaced with the inverse of the Jacobian matrix in the formula.
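
In practice the Jacobian is not inverted explicitly; each step solves a linear system for the update instead. The following is a minimal sketch of that idea, not the implementation used in this report: it swaps in `np.linalg.solve` for the lab 1 solver, and the names `newton_system_sketch`, `circle_line` and `circle_line_jac` are hypothetical examples.

```python
import numpy as np

def newton_system_sketch(f, jac, x0, maxiter=50, tol=1e-12):
    """Newton's method for f(x) = 0, solving J(x) dx = -f(x) in each step."""
    x = np.array(x0, dtype=float)  # float copy of the initial guess
    for _ in range(maxiter):
        fx = f(x)
        if np.linalg.norm(fx) <= tol:
            break
        # Solve the linear system instead of forming the inverse Jacobian.
        dx = np.linalg.solve(jac(x), -fx)
        x = x + dx
    return x

# Example system (an assumption for illustration): the circle
# x0^2 + x1^2 = 4 intersected with the line x0 = x1.
def circle_line(x):
    return np.array([x[0]**2 + x[1]**2 - 4.0, x[0] - x[1]])

def circle_line_jac(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]],
                     [1.0, -1.0]])

root = newton_system_sketch(circle_line, circle_line_jac, [1.0, 2.0])
print(root)
```

Starting from $(1, 2)$, the iterates approach the intersection point in the first quadrant, where both coordinates equal $\sqrt{2}$.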

My implementations of Newton's method take both the function and its derivative/Jacobian as input, which offloads the hard problem of finding the derivatives to the user of the algorithm.


# Jacobi iteration

Input: matrix $A$, vector $b$

Output: vector $x$ where $x$ is the approximate solution to the equation $b=Ax$ through Jacobi iteration

In [4]:
def jacobi_iter(A: np.ndarray, b: np.ndarray, itermax = 100, tol = 10**(-10)) -> np.ndarray:
  # Split A = A1 + A2, where A1 is the diagonal part of A.
  Dvec = np.diag(A)
  A1 = np.diag(Dvec)
  A1_inv = np.diag(1/Dvec)
  A2 = A - A1
  # Iteration matrix and constant vector: x_new = Mj x + c.
  Mj = np.matmul(-A1_inv,A2)
  c = np.dot(A1_inv,b)
  xprev = np.zeros(len(b))
  x = xprev  # returned unchanged if itermax = 0
  nr = 1

  iter = 0
  # Stop when the step between successive iterates is below tol.
  while iter<itermax and nr > tol:
    iter +=1
    x = np.dot(Mj,xprev) + c
    r = x-xprev
    xprev = x
    nr = np.linalg.norm(r)
  return x

# Gauss-Seidel iteration

Input: matrix $A$, vector $b$

Output: vector $x$ where $x$ is the approximate solution to the equation $b=Ax$ through Gauss-Seidel iteration

In [5]:
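#Inversion of lower triangular matrix
def forward_inversion(L: np.ndarray) -> np.ndarray:
  L = np.array(L, dtype = float)  # float copy, so the caller's matrix is left untouched
  n = L.shape[0]
  X = np.identity(n, dtype = float)
  for i in range(0,n):
    # Normalize row i so that the pivot L[i,i] becomes 1.
    X[i] /= L[i,i]
    L[i] /= L[i,i]
    # Eliminate column i from the rows below; the pivot is already 1.
    for j in range(i+1,n):
      X[j] -= X[i]*L[j,i]
      L[j] -= L[i]*L[j,i]
  return X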

def gauss_seidel_iter(A: np.ndarray, b: np.ndarray, itermax = 100, tol = 10**(-10)) -> np.ndarray:
  A1 = np.tril(A)
  A2 = A-A1
  A1_inv = forward_inversion(A1)
  Mgs = np.matmul(-A1_inv,A2)
  c = np.dot(A1_inv,b)
  xprev = np.zeros(len(b))

  nr = 1
  iter = 0
  while iter<itermax and nr > tol:
    iter +=1
    x = np.dot(Mgs,xprev) + c
    r = x-xprev
    xprev = x
    nr = np.linalg.norm(r)
  return x

# Newton's method for scalar nonlinear equation $f(x) = 0$

Input: scalar function $f(x)$, derivative $f'(x)$, initial guess $x_0$.

Output: real number $x$, where $f(x)\approx0$

In [6]:
def newtons_method(f, df, x0, maxiter = 100, tol = 10**-10):

  x = x0
  iter = 0

  while(iter < maxiter and np.abs(f(x))>tol):
    iter += 1
    x -= f(x)/df(x)
  return x

# Bonus assignment: Newton's method for vector nonlinear equation $f(x) = 0$

Input: Vector function $f(x)$, jacobian matrix $j_f$, initial guess vector $x_0$.

Output: Vector $x$, where $f(x)\approx 0$

In [7]:
def newtons_vector_method(f, jf, x0, maxiter = 100, tol = 10**-10):
  x = np.array(x0, dtype = float)  # float copy, so the in-place update also works for integer guesses
  iter = 0

  while(iter < maxiter and np.linalg.norm(f(x))>tol):
    iter += 1
    x -= solve(jf(x),f(x)) #Ax = b solver from lab1
  return x

# **Results**

Below is a limited set of tests for the various methods.

# Jacobi iteration

The Jacobi iteration converges if $\|I-BA\| < 1$, where $B = D^{-1}$ and $D$ is the diagonal part of $A$. This is the case in the maximum norm if $A$ is strictly diagonally dominant, i.e. $|a_{i,i}| > \sum_{j \ne i}|a_{i,j}|$ for all $i$.

I selected a matrix where that condition holds as a test matrix for the iteration.
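
The condition can also be checked numerically. Below is a minimal sketch that tests strict row diagonal dominance and evaluates the maximum norm of the Jacobi iteration matrix for the test matrix used in this section. The helper name `is_strictly_diagonally_dominant` and the variable names are hypothetical.

```python
import numpy as np

def is_strictly_diagonally_dominant(A):
    """True if |a_ii| > sum_{j != i} |a_ij| holds for every row i."""
    A = np.asarray(A, dtype=float)
    diag = np.abs(np.diag(A))
    off_diag = np.sum(np.abs(A), axis=1) - diag  # row sums without the diagonal
    return bool(np.all(diag > off_diag))

# Test matrix from the Jacobi test in this report.
A_test = np.array([[1, 0.2, 0, 0, 0],
                   [0, 2, 0.1, 0, 0],
                   [0.4, 0, 1, 0, 0],
                   [0.5, 0, 0, 5, 0],
                   [0, 0.1, 0, 0, 1]])

# Jacobi iteration matrix I - D^{-1} A and its maximum (row sum) norm.
D_inv = np.diag(1.0 / np.diag(A_test))
M_jacobi = np.eye(A_test.shape[0]) - D_inv @ A_test
norm_inf = np.linalg.norm(M_jacobi, np.inf)

print(is_strictly_diagonally_dominant(A_test), norm_inf)
```

A norm below 1 here means every Jacobi step shrinks the error in the maximum norm by at least that factor.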

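I then calculate the geometric mean of the estimated order of convergence over all steps, which should give a good estimate of the overall order. The order at step $k$ is estimated from the successive differences $e_k = \|x^{(k)} - x^{(k-1)}\|$ as $q \approx \log(e_{k+1}/e_k) / \log(e_k/e_{k-1})$.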
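
The order estimate can be sketched as a small helper that works on a list of iterates from a single run. This is an illustration under assumptions, not the test code used below: `estimate_orders` and `geometric_mean` are hypothetical names, and the demo sequence $x^{(k)} = 0.5^k$ is chosen because it converges linearly by construction.

```python
import numpy as np

def estimate_orders(iterates):
    """Estimate q from e_k = ||x_k - x_{k-1}|| via log(e2/e1) / log(e1/e0)."""
    steps = [np.linalg.norm(np.atleast_1d(np.subtract(b, a)))
             for a, b in zip(iterates[:-1], iterates[1:])]
    orders = []
    for e0, e1, e2 in zip(steps[:-2], steps[1:-1], steps[2:]):
        if min(e0, e1, e2) > 0.0:  # skip steps where the iterates stopped changing
            orders.append(np.log(e2 / e1) / np.log(e1 / e0))
    return orders

def geometric_mean(values):
    """Geometric mean of positive values, computed through logarithms."""
    return float(np.exp(np.mean(np.log(values))))

# Demo: each step halves the distance to the limit 0 (linear convergence).
iterates = [0.5**k for k in range(8)]
orders = estimate_orders(iterates)
q_mean = geometric_mean(orders)
print(orders, q_mean)
```

Collecting the iterates once avoids rerunning the solver for every iteration count, at the cost of having the solver return its history.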

Test: Convergence of residual



In [15]:
#TESTS

A = np.array([[1,0.2,0,0,0],[0,2,0.1,0,0],[0.4,0,1,0,0],[0.5,0,0,5,0],[0,0.1,0,0,1]])
b = np.array([1,2,3,4,5])
x = jacobi_iter(A,b)

arr = []
for i in range(1,100):
  arr.append(jacobi_iter(A,b,i))

k = 4
i = 0
sum = 1
while(k+2<len(arr) and np.linalg.norm(arr[k+1]) != np.linalg.norm(arr[k+2])):
  k += 1
  i += 1
  sum *= np.log(np.linalg.norm(arr[k]-arr[k-1])/np.linalg.norm(arr[k-1]-arr[k-2]))/np.log(np.linalg.norm(arr[k-1]-arr[k-2])/np.linalg.norm(arr[k-2]-arr[k-3]))
#Linear convergence
print("Convergence q ≈", sum**(1/i))

Convergence q ≈ 1.0000000303404102


# Gauss-Seidel iteration

The same condition for convergence holds for Gauss-Seidel iteration: $\|I-BA\|$ has to be less than 1, though in this case $B = L^{-1}$, where $L$ is the lower triangular part of $A$ (diagonal included). The same matrix used to test Jacobi iteration worked in this case as well.

The geometric mean of the estimated order was used here as well, which was useful as the estimate varied greatly between steps (roughly 0.65 to 1.53).

Test: Convergence of residual

In [48]:
#TESTS

A = np.array([[1,0.2,0,0,0],[0,2,0.1,0,0],[0.4,0,1,0,0],[0.5,0,0,5,0],[0,0.1,0,0,1]])
b = np.array([1,2,3,4,5])
x = gauss_seidel_iter(A,b)

arr = []
for i in range(1,100):
  arr.append(gauss_seidel_iter(A,b,i))

k = 2
i = 0
sum = 1
temp = 0
while(k+2<len(arr) and np.linalg.norm(arr[k+1]) != np.linalg.norm(arr[k+2])):
  k += 1
  i += 1
  temp = np.log(np.linalg.norm(arr[k]-arr[k-1])/np.linalg.norm(arr[k-1]-arr[k-2]))/np.log(np.linalg.norm(arr[k-1]-arr[k-2])/np.linalg.norm(arr[k-2]-arr[k-3]))
  print("q ≈", temp)
  sum *= temp
  
#Linear convergence
print("Convergence q ≈", sum**(1/i))

q ≈ 1.526309990491204
q ≈ 0.6551749030211009
q ≈ 1.5263099904813349
q ≈ 0.6551749030354376
q ≈ 1.5263099924098946
q ≈ 0.6551749091371564
Convergence q ≈ 1.000000001768021


# Newton's method for scalar nonlinear equation $f(x) = 0$

I used a function which has real roots and is easy to differentiate: $f(x) = x^2-x-5$, $f'(x) = 2x-1$. The estimated order of convergence increased at each step, which is why I report the estimate at the greatest value $k$ where $x^{(k+1)} \ne x^{(k)}$.

Test: Convergence of residual

In [38]:
#TESTS
def scalar_function(x):
  return x**2-x-5

def scalar_dfunction(x):
  return 2*x-1


arr = []
for i in range(1,100):
  arr.append(newtons_method(scalar_function, scalar_dfunction, 0, i))

k = 1
i = 0
sum = 0
while(k+2<len(arr) and np.abs(arr[k+1]) != np.abs(arr[k+2])):
  k += 1
sum += np.log(np.abs(arr[k]-arr[k-1])/np.abs(arr[k-1]-arr[k-2]))/np.log(np.abs(arr[k-1]-arr[k-2])/np.abs(arr[k-2]-arr[k-3]))

#Quadratic convergence
print("Convergence q ≈", sum)

Convergence q ≈ 1.968993291218997


# Bonus assignment: Newton's method for vector nonlinear equation $f(x) = 0$

I created the vector function by defining two multivariable scalar functions $f(x,y) = z_1$ and $g(x,y) = z_2$ and plotting them in 3D to check that their surfaces intersect somewhere where both outputs are 0. This means that the vector function $F(x,y) = [f(x,y),g(x,y)]^T$ maps some input to the zero vector.

I then tested the convergence in the same way as I tested Newton's method for scalar functions.

Test: Convergence of residual

In [45]:
#TESTS

#f([x1,x2]) = [(x1)^2 + cos(x2), -x1 + x2^2-8]
def vector_function(x: np.ndarray):
  return np.array([x[0]**2 + np.cos(x[1]), -x[0] + x[1]**2-8])

def vector_dfunction(x: np.ndarray):
  return np.array([[2*x[0],-np.sin(x[1])],[-1,2*x[1]]])

x0 = np.array([50,25],dtype = float)
arr = []
for i in range(1,100):
  arr.append(newtons_vector_method(vector_function, vector_dfunction, x0, i))

k = 6
i = 0
sum = 0
while(k+2<len(arr) and np.linalg.norm(arr[k+1]) != np.linalg.norm(arr[k+2])):
  k += 1
sum += np.log(np.linalg.norm(arr[k]-arr[k-1])/np.linalg.norm(arr[k-1]-arr[k-2]))/np.log(np.linalg.norm(arr[k-1]-arr[k-2])/np.linalg.norm(arr[k-2]-arr[k-3]))

#Quadratic convergence
print("Convergence q ≈", sum)

Convergence q ≈ 1.9951656908053532


# **Discussion**

The limited test results seem to confirm that the methods work as they should. When I wrote the tests for Newton's method for vector functions, I first got linear convergence instead of quadratic. I later discovered that I had made a mistake when writing the Jacobian, which of course messed up the convergence. It's interesting that the method converged despite a wrongly defined Jacobian, at least in my test case. It was also interesting that the estimated order of convergence varied between the iteration steps for the Gauss-Seidel iteration. I am not sure why that was the case.
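
A mistake like the one in the Jacobian can be caught before running Newton's method by comparing the analytic Jacobian with a finite-difference approximation. The sketch below is one possible check and was not part of the tests in this report: `finite_difference_jacobian`, the step size `h` and the evaluation point are assumptions, while the function and its Jacobian are copied from the vector test above.

```python
import numpy as np

def finite_difference_jacobian(f, x, h=1e-6):
    """Approximate the Jacobian of f at x with central differences."""
    x = np.array(x, dtype=float)
    fx = np.asarray(f(x), dtype=float)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        # Column j holds the partial derivatives with respect to x[j].
        J[:, j] = (f(x + step) - f(x - step)) / (2.0 * h)
    return J

# Test function and Jacobian from the vector Newton test in this report.
def vector_function(x):
    return np.array([x[0]**2 + np.cos(x[1]), -x[0] + x[1]**2 - 8])

def vector_dfunction(x):
    return np.array([[2 * x[0], -np.sin(x[1])], [-1, 2 * x[1]]])

# A deliberately wrong Jacobian (sign error), for illustration only.
def wrong_dfunction(x):
    return np.array([[2 * x[0], np.sin(x[1])], [-1, 2 * x[1]]])

x_check = np.array([1.0, 2.0])
J_fd = finite_difference_jacobian(vector_function, x_check)
print(np.allclose(J_fd, vector_dfunction(x_check), atol=1e-6))
print(np.allclose(J_fd, wrong_dfunction(x_check), atol=1e-6))
```

The first comparison should agree and the second should not, since the sign error changes one entry by about $2\sin(2)$.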

My implementations of Jacobi and Gauss-Seidel iteration are both written as a Richardson iteration, which I've now realized might be a mistake, as I use matrix multiplication to create the iteration matrices $M_J$ and $M_{GS}$. With standard algorithms, general matrix multiplication has the same cubic complexity as matrix inversion, which is exactly what we were trying to get away from. Each iteration step in my implementation has quadratic complexity, as it should, but setting up the iteration matrix costs more than that.

If I were to redo my implementations, I would instead write them in terms of the components of the matrix $A$, which gives quadratic computational complexity for each iteration step and requires no setup.
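
A component-wise version could look roughly like the sketch below. It is an untuned illustration of the idea rather than a replacement for the code above: the function names are hypothetical, and it keeps the same stopping rule (the norm of the step between successive iterates) and the same zero initial guess.

```python
import numpy as np

def jacobi_componentwise(A, b, itermax=100, tol=1e-10):
    """Jacobi iteration without forming an iteration matrix."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.diag(A)
    x = np.zeros_like(b)
    for _ in range(itermax):
        # A @ x - d * x applies only the off-diagonal part of A to x.
        x_new = (b - (A @ x - d * x)) / d
        step = np.linalg.norm(x_new - x)
        x = x_new
        if step <= tol:
            break
    return x

def gauss_seidel_componentwise(A, b, itermax=100, tol=1e-10):
    """Gauss-Seidel iteration, updating x in place row by row."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    for _ in range(itermax):
        x_old = x.copy()
        for i in range(len(b)):
            # x[:i] already holds new values, x[i+1:] still holds old ones.
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_old) <= tol:
            break
    return x

# Same test system as in the results section.
A_test = np.array([[1, 0.2, 0, 0, 0], [0, 2, 0.1, 0, 0], [0.4, 0, 1, 0, 0],
                   [0.5, 0, 0, 5, 0], [0, 0.1, 0, 0, 1]])
b_test = np.array([1, 2, 3, 4, 5])
x_jacobi = jacobi_componentwise(A_test, b_test)
x_gs = gauss_seidel_componentwise(A_test, b_test)
print(x_jacobi, x_gs)
```

Each sweep costs one matrix-vector product's worth of work, and nothing is inverted or multiplied up front.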