<a href="https://colab.research.google.com/github/NogginBops/DD2363_VT23/blob/main/Lab1/report_lab_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Lab 1: Matrix Factorization**
**Julius Häger**

# **Abstract**




The goal of this lab is to build procedures for some common linear algebra operations. The procedures presented are a sparse matrix-vector product, QR factorization using Householder reflections, a direct solver for $Ax=b$, and a blocked matrix-matrix product.

# **About the code**

In [1]:
"""This program is a lab report in the course"""
"""DD2363 Methods in Scientific Computing, """
"""KTH Royal Institute of Technology, Stockholm, Sweden."""

# Copyright (C) 2023 Julius Häger (juliusha@kth.se)

# This file is part of the course DD2363 Methods in Scientific Computing
# KTH Royal Institute of Technology, Stockholm, Sweden
#
# This is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.

'KTH Royal Institute of Technology, Stockholm, Sweden.'

# **Set up environment**

To have access to the necessary modules you have to run this cell. If you need additional modules, this is where you add them.

In [2]:
# Load necessary modules.
from google.colab import files

import time
import numpy as np

from IPython.display import display, Math

#try:
#    from dolfin import *; from mshr import *
#except ImportError as e:
#    !apt-get install -y -qq software-properties-common 
#    !add-apt-repository -y ppa:fenics-packages/fenics
#    !apt-get update -qq
#    !apt install -y --no-install-recommends fenics
#    from dolfin import *; from mshr import *
    
#import dolfin.common.plotting as fenicsplot

from matplotlib import pyplot as plt
from matplotlib import tri
from matplotlib import axes
from mpl_toolkits.mplot3d import Axes3D

# Converts a dense matrix into a sparse (CRS) representation
def sparse_from_dense(matrix):
  val = []
  col_idx = []
  row_ptr = [0]

  count = 0
  for r, row in enumerate(matrix):
    for c, v in enumerate(row):
      if (v != 0):
        count += 1
        val.append(v)
        col_idx.append(c)
    row_ptr.append(count)
  return SparseMatrix(np.array(val, dtype=matrix.dtype), col_idx, row_ptr)


# **Introduction**

To represent sparse matrices, compressed row storage (CRS) can be used to reduce the memory footprint of the matrix[TODO]. This introduces a need for a modified matrix-vector product procedure that takes advantage of the CRS form of the matrix. Ideally, this algorithm never touches the zero elements of the matrix, and as such has the potential to be more efficient.

It is known that any real square matrix $A$ can be decomposed as $A=QR$, where $Q$ is an orthogonal matrix and $R$ is an upper triangular matrix. There are three well-known algorithms for calculating this factorization: one that uses the Gram-Schmidt process, one that uses Householder reflections, and one that uses Givens rotations. In this report the method using Householder reflections is presented.

Solving systems of linear equations of the form $Ax=b$ is a common operation in various simulations. A direct solver procedure using QR factorization is presented.

As computers get progressively faster at floating point operations, memory access speed is being left behind. For some algorithms this means that the arithmetic cost no longer dominates the running time; instead the processor is left waiting for reads from main memory. This issue is especially common when multiplying large matrices: a naïve multiplication accesses the same memory many times, and for matrices that do not fit in the processor cache this causes many cache misses, which slow down all of these memory accesses. The problem can be mitigated by increasing the computational intensity of the algorithm, that is, the number of floating point operations performed per memory access. In this report a blocked matrix-matrix multiplication procedure is presented.
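
As a rough illustration of what computational intensity means here, consider a simplified two-level memory model. The model and the symbols are assumptions made for this sketch, not measurements from this lab: $f$ is the number of floating point operations, $m$ the number of words moved between slow and fast memory, $q = f/m$ the computational intensity, $n$ the dimension of the square matrices and $b$ the block size, chosen so that three $b \times b$ blocks fit in fast memory. If the naïve algorithm re-reads every column of $B$ for every row of $A$, while the blocked algorithm reads two blocks per block product, then for $n \gg b$:

$$
\begin{align*}
q_\text{naive} &\approx \frac{2n^3}{n^3 + 3n^2} \approx 2\\
q_\text{blocked} &\approx \frac{2n^3}{2n^3/b + 2n^2} \approx b
\end{align*}
$$

Under these assumptions the blocked algorithm performs roughly $b/2$ times more arithmetic per word moved than the naïve one.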

# **Method**

## Assignment 1: Sparse Matrix-Vector product

To represent compressed row storage of matrices the following definition is used:

In [3]:
class SparseMatrix:
  def __init__(self, val, col_idx, row_ptr):
    self.val = val
    self.col_idx = col_idx
    self.row_ptr = row_ptr

Here `val` is an array of the non-zero elements of the matrix, and `col_idx` is an array of the same length that specifies which column each value in `val` is found in. `row_ptr` contains indices into `val` and `col_idx` that specify the start and end position of each row: the entries of row `i` are stored at positions `row_ptr[i]` up to, but not including, `row_ptr[i+1]`.
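
To make the layout concrete, here is a minimal sketch that builds the three arrays for a small matrix. The matrix `dense_example` is made up for this illustration and is not part of the lab data.

```python
import numpy as np

# Small illustrative matrix (an assumption for this example).
dense_example = np.array([[5.0, 0.0, 0.0],
                          [0.0, 0.0, 2.0],
                          [1.0, 3.0, 0.0]])

val, col_idx, row_ptr = [], [], [0]
for row in dense_example:
    for c, v in enumerate(row):
        if v != 0:
            val.append(float(v))   # the non-zero value itself
            col_idx.append(c)      # the column it was found in
    row_ptr.append(len(val))       # where the next row starts in val/col_idx

print(val)      # [5.0, 2.0, 1.0, 3.0]
print(col_idx)  # [0, 2, 0, 1]
print(row_ptr)  # [0, 1, 2, 4]
```

Row 2, for example, occupies positions `row_ptr[2]` to `row_ptr[3] - 1`, that is, the values `1.0` and `3.0` in columns `0` and `1`.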

To implement the sparse matrix-vector product we go through the non-zero values of each row of the matrix and use `col_idx` to look up the corresponding value in the vector.

In [4]:
## Assignment 1. Function: sparse matrix-vector product

def sparse_matrix_vector_product(matrix, vector):
  # The result has one entry per matrix row; row_ptr holds one more entry than there are rows.
  n_rows = len(matrix.row_ptr) - 1
  res = np.zeros(n_rows, dtype=np.result_type(matrix.val, vector))
  for i in range(n_rows):
    for j in range(matrix.row_ptr[i], matrix.row_ptr[i+1]):
      res[i] = res[i] + matrix.val[j] * vector[matrix.col_idx[j]]

  return res

To verify that this procedure produces correct results we can compare it to NumPy's built-in matrix-vector product, which is widely used and therefore very likely to be correct.

In [5]:
## Assignment 1. Tests

dense = np.array([[3.0, 2, 0, 2, 0, 0],\
                  [0, 2, 1, 0, 0, 0],\
                  [0, 0, 1, 0, 0, 0],\
                  [0, 0, 3, 2, 0, 0],\
                  [0, 0, 0, 0, 1, 0],\
                  [0, 0, 0, 0, 2, 3]])

matrix = sparse_from_dense(dense)

vector = np.array([1, 1, 1, 1, 1, 1])

print(sparse_matrix_vector_product(matrix, vector))

print(dense @ vector)

[7. 3. 1. 5. 1. 5.]
[7. 3. 1. 5. 1. 5.]
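
A single hand-picked matrix only exercises one sparsity pattern. As a complementary sketch, the same comparison can be run on a random rectangular matrix. The helper names `crs_from_dense` and `crs_matvec` are hypothetical stand-ins written for this example so that it is self-contained; they work on plain arrays rather than on the `SparseMatrix` class.

```python
import numpy as np

def crs_from_dense(dense):
    """Return (val, col_idx, row_ptr) for a dense 2D array."""
    val, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for c, v in enumerate(row):
            if v != 0:
                val.append(v)
                col_idx.append(c)
        row_ptr.append(len(val))
    return np.array(val, dtype=dense.dtype), col_idx, row_ptr

def crs_matvec(val, col_idx, row_ptr, x):
    """Multiply a CRS matrix by a vector, touching only stored entries."""
    y = np.zeros(len(row_ptr) - 1, dtype=np.result_type(val, x))
    for i in range(len(y)):
        for j in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += val[j] * x[col_idx[j]]
    return y

rng = np.random.default_rng(0)
dense_rand = rng.random((5, 7))
dense_rand[dense_rand < 0.6] = 0.0   # zero out part of the entries to make it sparse
x_rand = rng.random(7)

y_crs = crs_matvec(*crs_from_dense(dense_rand), x_rand)
print(np.allclose(y_crs, dense_rand @ x_rand))
```

Sizing the result by the number of rows, `len(row_ptr) - 1`, is what allows the check to use a non-square matrix.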


## Assignment 2: QR factorization

The simplicity of the Gram-Schmidt algorithm is very appealing, but in its classical form it is numerically unstable. Householder reflections are considerably more numerically stable, at the cost of a more complicated implementation. Compared to other numerically stable algorithms such as Givens rotations, Householder reflections are not as easily parallelizable and have higher bandwidth requirements. But as the algorithm presented here is neither going to be parallelized nor run on large matrices, I have determined that Householder reflections are sufficient.
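
The stability difference can be illustrated with a small experiment. The sketch below is not part of the lab code: `classical_gram_schmidt` is a hypothetical helper written for this example, the test matrix is a nearly rank-deficient matrix of the kind often called a Läuchli matrix, and NumPy's `np.linalg.qr` (which relies on LAPACK's Householder-based routines) stands in for a Householder implementation.

```python
import numpy as np

def classical_gram_schmidt(A):
    """Return Q with (nominally) orthonormal columns via classical Gram-Schmidt."""
    A = np.asarray(A, dtype=float)
    Q = np.zeros_like(A)
    for j in range(A.shape[1]):
        # Classical variant: all projections use the original column A[:, j].
        v = A[:, j] - Q[:, :j] @ (Q[:, :j].T @ A[:, j])
        Q[:, j] = v / np.linalg.norm(v)
    return Q

def orthogonality_error(Q):
    """Frobenius norm of Q^T Q - I."""
    return np.linalg.norm(Q.T @ Q - np.eye(Q.shape[1]), ord="fro")

eps = 1e-8  # small enough that 1 + eps**2 rounds to 1 in double precision
A_test = np.array([[1.0, 1.0, 1.0],
                   [eps, 0.0, 0.0],
                   [0.0, eps, 0.0],
                   [0.0, 0.0, eps]])

err_cgs = orthogonality_error(classical_gram_schmidt(A_test))
err_householder = orthogonality_error(np.linalg.qr(A_test)[0])

print(f"classical Gram-Schmidt: {err_cgs:.2e}")
print(f"Householder (LAPACK):   {err_householder:.2e}")
```

On this matrix the Gram-Schmidt columns end up far from orthogonal, while the Householder-based factorization stays orthogonal to around machine precision.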

In [6]:
## Assignment 2. Function: QR factorization

# FIXME: Reference the error in the course book algorithm. And reference this also:
# http://mlwiki.org/index.php/Householder_Transformation

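def qr_householder(matrix):
  n = matrix.shape[0]
  A = matrix.copy().astype(float)
  Q = np.identity(n)
  for k in range(n - 1):
    x = A[k:n, k]
    norm = np.linalg.norm(x)
    if norm == 0:
      # The column is already zero on and below the diagonal, nothing to reflect.
      continue
    v_k = x.copy()
    # Pick the sign opposite to x[0] to avoid cancellation. np.sign(0) is 0,
    # which would not give a valid reflection, so copysign is used instead.
    s = -np.copysign(1.0, x[0])
    v_k[0] = v_k[0] - s*norm
    v_k = v_k/np.linalg.norm(v_k)
    # Apply the reflection to the remaining columns of A.
    for m in range(k, n):
      A[k:n,m] = A[k:n,m] - (2 * v_k * np.dot(v_k, A[k:n,m]))

    # Embed the reflector F_k in an n x n matrix and accumulate Q.
    # F_k is symmetric, so it equals its own transpose.
    F_k = np.identity(n - k) - 2 * np.outer(v_k, v_k)
    Z = np.zeros((k, n - k))
    Q_k = np.block([[np.identity(k), Z], [np.transpose(Z), F_k]])
    Q = Q @ Q_k

  return Q, A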
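
Each iteration of the loop in `qr_householder` applies one Householder reflection. As a sketch in the notation of the code (zero-based step $k$, $x \neq 0$ the part of column $k$ on and below the diagonal, $e_1$ the first unit vector, and $\operatorname{sign}(0)$ taken as $1$, which is an assumption of this sketch), the reflector and its effect on $x$ can be written as:

$$
\begin{align*}
s &= -\operatorname{sign}(x_1)\\
v &= x - s\,||x||\,e_1\\
F_k &= I - 2\frac{vv^T}{v^Tv}\\
F_kx &= x - v = s\,||x||\,e_1\quad\text{(since $v^Tv = 2v^Tx$)}
\end{align*}
$$

Choosing $s$ with the opposite sign of $x_1$ avoids cancellation when forming $v$. Embedding $F_k$ as $Q_k = \begin{bmatrix} I_k & 0 \\ 0 & F_k \end{bmatrix}$ gives $R = Q_{n-2}\cdots Q_0A$ and, since every $Q_k$ is symmetric and orthogonal, $Q = Q_0\cdots Q_{n-2}$.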

To verify this algorithm we can check that $R$ is in fact an upper triangular matrix. We can also check the Frobenius norm $|| Q^TQ - I||_F$, which should be $0$ up to rounding error if $Q$ is an orthogonal matrix (note that $Q^T = Q^{-1}$ for orthogonal matrices). Finally, we can look at the Frobenius norm $|| QR - A ||_F$ to verify that composing $Q$ and $R$ does indeed give us $A$.

When checking whether $R$ is upper triangular we use a tolerance $\epsilon = 1\times10^{-15}$, since floating point rounding leaves some of the elements below the diagonal very close to, but not exactly, zero.

In [18]:
## Assignment 2. Tests

def is_upper_triangular(matrix):
  m, n = matrix.shape
  for row in range(m):
    for col in range(n):
      if col < row:
        if abs(matrix[row, col]) > 1e-15:
          return False
  return True

Q, R = qr_householder(dense)

print(f"R is upper triangular: {is_upper_triangular(R)}")

norm_fro = np.linalg.norm(Q@R - dense, ord = 'fro')
display(Math(rf'|| QR - A ||_F = {norm_fro}'))

norm_fro = np.linalg.norm((np.transpose(Q) @ Q) - np.eye(*Q.shape), ord = 'fro')
display(Math(rf'|| Q^TQ - I ||_F = {norm_fro}'))

R is upper triangular: True


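
The three checks described above can also be written in vectorized form. The sketch below is only an illustration: `check_qr` is a hypothetical helper, the tolerance `1e-12` is an arbitrary choice for this example, and `np.linalg.qr` is used to produce the factors so that the block runs on its own.

```python
import numpy as np

def check_qr(Q, R, A, tol=1e-12):
    """Return the three QR sanity checks as a dictionary."""
    below_diagonal = np.tril(R, k=-1)  # strictly lower triangular part of R
    return {
        "upper_triangular": bool(np.all(np.abs(below_diagonal) <= tol)),
        "orthogonality_error": float(
            np.linalg.norm(Q.T @ Q - np.eye(Q.shape[1]), ord="fro")),
        "reconstruction_error": float(np.linalg.norm(Q @ R - A, ord="fro")),
    }

rng = np.random.default_rng(1)
A_rand = rng.standard_normal((6, 6))
Q_ref, R_ref = np.linalg.qr(A_rand)

report = check_qr(Q_ref, R_ref, A_rand)
print(report)
```

Taking the absolute value before comparing with the tolerance matters, since a large negative entry below the diagonal would otherwise pass the check.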

## Assignment 3: Direct solver Ax=b

To solve $Ax = b$ it is possible to calculate the inverse of $A$ and rewrite the equation as $x = A^{-1}b$, from which $x$ can be directly calculated. Calculating the inverse of a general matrix $A$ can be quite involved, so instead we can use the QR factorization we just created to simplify this task. We can rearrange the equation as follows:
$$
\begin{align*}
Ax&=b\\
QRx&=b\quad\text{(using $A=QR$)}\\
Rx&=Q^Tb\quad\text{(using $Q^{-1} = Q^T$ and multiplying from the left)}
\end{align*}
$$

This form is much easier to solve: the matrix-vector product $b' = Q^Tb$ is easy to compute, and what remains is $Rx = b'$, which can be solved with backward substitution since $R$ is an upper triangular matrix.

And so the algorithm consists of first QR-factorizing the matrix $A$, then computing $b' = Q^Tb$, followed by backward substitution to solve $Rx = b'$.

In [8]:
## Assignment 3. Function: direct solver Ax=b

def backward_substitution(U, b):
  n = U.shape[0]
  x = np.zeros(n)
  # Solve from the last row upwards; row i only depends on x[i+1:].
  for i in range(n - 1, -1, -1):
    acc = 0.0
    for j in range(i + 1, n):
      acc += U[i, j] * x[j]
    x[i] = (b[i] - acc) / U[i, i]

  return x

def solve(A, b):
  # Factorize A = QR, then solve Rx = Q^T b.
  Q, R = qr_householder(A)
  b_q = np.transpose(Q) @ b
  x = backward_substitution(R, b_q)
  return x

To test that this procedure works as it should we can verify that $Ax = b$ by checking that $|| Ax - b ||$ is $0$ up to rounding error. We can also check the computed result against an analytically calculated solution. In this case the solution to the following system of linear equations can be calculated to be $\left[ \frac{1}{5}, \frac{3}{35} \right]^T$:

$$
\begin{bmatrix}
2 & 7 \\
5 & 0
\end{bmatrix}
x
=
\begin{bmatrix}
1 \\
1
\end{bmatrix}\\
x=\begin{bmatrix}
\frac{1}{5} \\
\frac{3}{35}
\end{bmatrix}
$$

In [19]:
## Assignment 3. Tests

m = np.array([[2, 7], [5, 0]])
b = np.array([1, 1])

x = solve(m, b)

x_actual = np.array([ 1 / 5.0, 3 / 35.0 ])

diff = np.linalg.norm((m @ x) - b)
display(Math(rf'|| Ax - b || = {diff}'))

diff_actual = np.linalg.norm(x - x_actual)
display(Math(rf'|| x - y || = {diff_actual}'))

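
Beyond a single hand-checked system, the same idea can be cross-checked against NumPy's own solver on a random system. The following is a minimal sketch, not the lab implementation: `solve_via_qr` and `solve_upper_triangular` are hypothetical names, and `np.linalg.qr` replaces the hand-written factorization so that the block is self-contained.

```python
import numpy as np

def solve_upper_triangular(U, c):
    """Backward substitution for Ux = c with U upper triangular."""
    n = U.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # The dot product over an empty slice is 0, which covers the last row.
        x[i] = (c[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

def solve_via_qr(A, b):
    """Solve Ax = b through A = QR and Rx = Q^T b."""
    Q, R = np.linalg.qr(A)
    return solve_upper_triangular(R, Q.T @ b)

rng = np.random.default_rng(2)
A_sys = rng.standard_normal((5, 5))
b_sys = rng.standard_normal(5)

x_qr = solve_via_qr(A_sys, b_sys)
residual = np.linalg.norm(A_sys @ x_qr - b_sys)
print(residual)
print(np.allclose(x_qr, np.linalg.solve(A_sys, b_sys)))
```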

In [10]:
## Extra Assignment. Blocked Matrix-Matrix product

def blocked_matrix_matrix_product(A, B):
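  # Number of blocks along each dimension
  # (rows of A, columns of B, and the shared inner dimension).
  M = 2
  N = 2
  P = 2

  m, p = A.shape
  _, n = B.shape
  # Block sizes, rounded up so that the blocks cover the whole matrix.
  # Slices that run past the end of a matrix are truncated by NumPy.
  bm = int(np.ceil(m / M))
  bn = int(np.ceil(n / N))
  bp = int(np.ceil(p / P))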

  C = np.zeros((m, n))

  for i in range(0, M):
    for j in range(0, N):
      for k in range(0, P):
        ib = i * bm
        jb = j * bn
        kb = k * bp
        A_b = A[ib:ib+bm, kb:kb+bp]
        B_b = B[kb:kb+bp, jb:jb+bn]
        C[ib:ib+bm,jb:jb+bn] = C[ib:ib+bm,jb:jb+bn] + (A_b @ B_b)
        

  return C

In [11]:
## Extra Assignment. Tests

A = np.array([[1.0, 2, 3, 4, 1], \
              [5, 6, 7, 8, 1], \
              [9, 1, 2, 3, 1], \
              [4, 5, 6, 7, 1]])

B = np.array([[1.0, 2], \
              [5, 6], \
              [9, 1], \
              [4, 5], \
              [1, 1]])


print(A)
print(B)

AB = blocked_matrix_matrix_product(A, B)

print("AB:", AB)
print("A@B", A@B)

[[1. 2. 3. 4. 1.]
 [5. 6. 7. 8. 1.]
 [9. 1. 2. 3. 1.]
 [4. 5. 6. 7. 1.]]
[[1. 2.]
 [5. 6.]
 [9. 1.]
 [4. 5.]
 [1. 1.]]
AB: [[ 55.  38.]
 [131.  94.]
 [ 45.  42.]
 [112.  80.]]
A@B [[ 55.  38.]
 [131.  94.]
 [ 45.  42.]
 [112.  80.]]
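
The procedure above fixes the number of blocks per dimension. A common alternative is to fix the block size instead, so that a block can be matched to the cache size. The sketch below illustrates that variant under stated assumptions: the function name `blocked_matmul` and the parameter `block` are hypothetical, and the default of 64 is an arbitrary value, not a tuned one.

```python
import numpy as np

def blocked_matmul(A, B, block=64):
    """Multiply A (m x p) by B (p x n) one block at a time."""
    m, p = A.shape
    p_b, n = B.shape
    if p != p_b:
        raise ValueError("inner dimensions do not match")
    C = np.zeros((m, n), dtype=np.result_type(A, B))
    for i in range(0, m, block):
        for j in range(0, n, block):
            for k in range(0, p, block):
                # Slices past the matrix edge are truncated by NumPy, so
                # dimensions that are not multiples of `block` still work.
                C[i:i + block, j:j + block] += (
                    A[i:i + block, k:k + block] @ B[k:k + block, j:j + block])
    return C

rng = np.random.default_rng(3)
A_blk = rng.random((7, 5))
B_blk = rng.random((5, 3))
C_blk = blocked_matmul(A_blk, B_blk, block=2)
print(np.allclose(C_blk, A_blk @ B_blk))
```

In pure Python the loop overhead dominates, so this shows the access pattern rather than a speedup.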


# **Results**

Present the results. If the result is an algorithm that you have described under the *Methods* section, you can present the data from verification and performance tests in this section. If the result is the output from a computational experiment this is where you present a selection of that data. 

# **Discussion**

Summarize your results and your conclusions. Were the results expected or surprising. Do your results have implications outside the particular problem investigated in this report? 