
# **Lab 1: Matrix Factorization**
**Teo Nordström**

# **Abstract**

This file contains solutions to the three mandatory problems from Lab 1 in DD2363, together with a solution to one of the optional problems. It is based on pseudocode and material from *Methods in Computational Science* by Johan Hoffman (2021).

# **About the code**

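The report template was written by Johan Hoffman and is distributed under the GNU Lesser General Public License. The code for this lab was written by Teo Nordström (2024) and is not distributed under any license.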

In [None]:
"""This file is based on a template for lab reports in the course"""
"""DD2363 Methods in Scientific Computing, """
"""KTH Royal Institute of Technology, Stockholm, Sweden."""

# TEMPLATE INFO:
# Copyright (C) 2020 Johan Hoffman (jhoffman@kth.se)

# This file is part of the course DD2365 Advanced Computation in Fluid Mechanics
# KTH Royal Institute of Technology, Stockholm, Sweden
#
# This is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.

# This template is maintained by Johan Hoffman
# Please report problems to jhoffman@kth.se

# CODE INFO:
# Code written by Teo Nordström 2024, no license.

'KTH Royal Institute of Technology, Stockholm, Sweden.'

# **Set up environment**

These are the necessary modules for everything in this file to work.

In [None]:
from google.colab import files

import numpy as np



# **Introduction**

All solutions will be partially or entirely based upon the book *Methods in Computational Science* by Johan Hoffman (2021). In the text, it will be referred to as the "course book".

# Sparse Matrix-Vector Product
A sparse matrix is a matrix in which a large percentage of the entries are zero. Since $x \cdot 0 = 0$ for any $x$, carrying out all of these multiplications wastes processing time and power. To compute the matrix-vector product efficiently, we do not store the matrix in the standard dense form. Instead we store three lists: the nonzero values, the column index of each value, and the row pointers that mark where each row starts in the other two lists. This saves both processing time and memory, as long as the matrix is sufficiently sparse.
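
As an illustration of the three-list storage, the sketch below converts a small dense matrix into that format. The helper name `dense_to_crs` and the dense matrix are assumptions made for this example; the matrix is chosen so that the result matches the `sm_ex` example used in the Method section.

```python
def dense_to_crs(dense):
    """Convert a dense matrix (list of rows) into [values, column indices, row pointers]."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, val in enumerate(row):
            if val != 0:
                values.append(val)   # store only the nonzero entries
                col_idx.append(j)    # remember which column each one came from
        row_ptr.append(len(values))  # position where the next row starts
    return [values, col_idx, row_ptr]


# Hypothetical 6x6 example with 11 nonzero entries out of 36
dense = [
    [3, 2, 0, 2, 0, 0],
    [0, 2, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 3, 2, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 2, 3],
]

crs = dense_to_crs(dense)
print(crs[0])  # values
print(crs[1])  # column indices
print(crs[2])  # row pointers
```

The row pointer list has one more entry than the number of rows, so the stored values of row $i$ are those at positions `row_ptr[i]` up to, but not including, `row_ptr[i+1]`.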

# QR Factorization
QR factorization splits a matrix $A$ into two factors $Q$ and $R$ such that $A = QR$. Both factors have desirable properties: $Q$ is an orthogonal matrix and $R$ is an upper triangular matrix. There are several methods for computing this factorization; here, modified Gram-Schmidt QR factorization is used.
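
The properties of the two factors can be checked numerically. The sketch below uses NumPy's built-in `np.linalg.qr` purely to illustrate them; it is not the modified Gram-Schmidt implementation developed in this report, and the variable names are chosen for this example only.

```python
import numpy as np

A = np.array([[3.0, 6.0, 1.0], [4.0, 1.0, 2.0], [5.0, 7.0, -4.0]])

# Reference factorization from NumPy (not the Gram-Schmidt code of this report)
Q, R = np.linalg.qr(A)

is_orthogonal = np.allclose(Q.T @ Q, np.eye(3))   # Q^T Q = I
is_upper_triangular = np.allclose(R, np.triu(R))  # nothing below the diagonal
reconstructs_A = np.allclose(Q @ R, A)            # A = QR

print(is_orthogonal, is_upper_triangular, reconstructs_A)
```

The factors may differ in sign from those of another QR method, since negating a column of $Q$ together with the matching row of $R$ leaves the product $QR$ unchanged.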

# Direct Solver
As an extension of QR factorization, a direct solver can be used to solve the equation $Ax = b$, where $A$ is a real, square, non-singular matrix. While computing the inverse of $A$ directly may be difficult, inverting $Q$ is simple: the orthogonality of $Q$ means that $Q^{-1}=Q^T$. Furthermore, since $R$ is upper triangular, we can use backward substitution to solve the remaining system $Rx = b'$, where $b' = Q^{-1}b = Q^Tb$.
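
As a sketch, backward substitution for an $n \times n$ upper triangular system $Rx = b'$ can be written out as follows, where $r_{ij}$ denotes the entries of $R$ (notation introduced here for illustration). The last unknown is found first, and each earlier unknown uses those already computed:

$$
x_n = \frac{b'_n}{r_{nn}}, \qquad x_i = \frac{1}{r_{ii}}\left(b'_i - \sum_{j=i+1}^{n} r_{ij}x_j\right), \quad i = n-1, \dots, 1.
$$

This requires $r_{ii} \neq 0$ for all $i$, which holds when $A$ is invertible.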

# Least-Squares Problem (bonus)
The least-squares problem asks for the vector $\overline{x}$ that minimizes the norm of the residual $||A\overline{x} - b||$ when the system $Ax = b$ has no exact solution. A typical example is fitting a line to data that do not lie exactly on a line, which can still give important information about the data. The matrix $A$ is of size $m \times n$ with $m > n$, meaning there are more equations than unknowns.
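
A minimal sketch of such a line fit is shown below, assuming three made-up data points and using NumPy's `np.linalg.lstsq` as the solver. The data and the variable names are hypothetical and only serve to show what an overdetermined system looks like.

```python
import numpy as np

# Hypothetical data: three points (t, y) that do not lie on one line
t = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 2.0, 4.0])

# Overdetermined system A x = y with unknowns x = (intercept, slope):
# three equations, two unknowns
A = np.column_stack([np.ones_like(t), t])

# Least-squares solution from NumPy
x_bar = np.linalg.lstsq(A, y, rcond=None)[0]

residual = A @ x_bar - y
print("intercept, slope:", x_bar)
print("residual norm:", np.linalg.norm(residual))
print("A^T r:", A.T @ residual)
```

The residual is not zero, but it is orthogonal to the columns of $A$, so $A^T(A\overline{x} - b) = 0$. This is the condition that characterizes the least-squares solution.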



# **Method**

# Sparse Matrix-Vector Product
To calculate the product of a sparse matrix and a vector we use the pseudocode of Algorithm 5.9 from the course book. The algorithm loops over the rows of $A$, each of which corresponds to one entry of the result $b$. For each row, the row pointers delimit which stored values belong to it. Each of those values is multiplied by the entry of $x$ given by its column index, and the product is added to the corresponding entry of $b$. Once all rows have been processed, $b$ is returned.

In [None]:
sm_ex = [                                   # Sparse Matrix Example
    [3, 2, 2, 2, 1, 1, 3, 2, 1, 2, 3],  # Values
    [0, 1, 3, 1, 2, 2, 2, 3, 4, 4, 5],  # Column index
    [0, 3, 5, 6, 8, 9, 11]              # Row pointers
]

def sparse_matrix_vector_product (A, x):
    b = []
    for i in range(len(A[2]) - 1):
        b.append(0)
        for j in range(A[2][i], A[2][i+1]):
            b[i] += A[0][j]*x[A[1][j]]
    return b


print(sparse_matrix_vector_product(sm_ex, [1, 1, 1, 1, 1, 1]))

[7, 3, 1, 5, 1, 5]


# QR Factorization

To factor a matrix $A$ into $Q$ and $R$, modified Gram-Schmidt iteration was used. The code is based on the pseudocode in Algorithm 5.3 of the course book. It takes a square, full-rank matrix $A$ and generates the factors $Q$ and $R$. The columns of the orthogonal matrix $Q$ are constructed by successive projection, and since each step is essentially a multiplication by an upper triangular matrix, we also obtain the upper triangular matrix $R$.

In [None]:
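def modified_gram_schmidt_iteration(A:np.ndarray):
    n = len(A)  # A is assumed to be square and full rank
    Q = np.zeros((n, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j]
        for i in range(j):
            # Remove the component of v along the already computed column Q[:, i]
            R[i, j] = np.dot(Q[:, i], v)
            v = v - R[i, j] * Q[:, i]
        # Normalize what remains to get the next orthonormal column
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R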

mod_gsi_A = np.array([[3, 6, 1], [4, 1, 2], [5, 7, -4]])

Q, R = modified_gram_schmidt_iteration(mod_gsi_A)
print(Q)  # Orthogonal
print()
print(R)  # Upper Triangular

[[ 0.42426407  0.56273425  0.70945765]
 [ 0.56568542 -0.77648602  0.27761386]
 [ 0.70710678  0.28354827 -0.64776568]]

[[ 7.07106781  8.06101731 -1.27279221]
 [ 0.          4.58475735 -2.12443086]
 [ 0.          0.          3.85574812]]


# Direct Solver

The direct solver builds on the function implemented for QR factorization. It uses the fact that $Q^{-1}=Q^T$ to rewrite the problem: $Ax=b \Rightarrow QRx=b \Rightarrow Rx = Q^{-1}b \Rightarrow Rx = Q^Tb$. A matrix-vector multiplication then gives the right-hand side $b' = Q^Tb$, leading to the equation $Rx = b'$. Since $R$ is an upper triangular matrix, backward substitution can be used to find $x$ from $R$ and $b'$. The pseudocode for backward substitution is given in Algorithm 5.2 of the course book.

In [None]:
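def backward_substitution(U:np.ndarray, b:np.ndarray):
    n = U.shape[1]
    x = np.zeros(n)
    # The last unknown follows directly from the last row
    x[n-1] = b[n-1] / U[n-1, n-1]
    # Work upwards, subtracting the contribution of the unknowns already found
    for i in range(n-2, -1, -1):
        total = 0  # named total to avoid shadowing the built-in sum
        for j in range(i+1, n):
            total += U[i, j] * x[j]
        x[i] = (b[i] - total) / U[i, i]
    return x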


def direct_solver(A:np.ndarray, b:np.ndarray):
    Q, R = modified_gram_schmidt_iteration(A)
    y = np.transpose(Q).dot(b)
    x = backward_substitution(R, y)
    return x


bs_A = np.array([[2, -1], [-1, 2]])
bs_b = np.array([1, 2])

print(direct_solver(bs_A, bs_b))

[1.33333333 1.66666667]


# Least-Squares Problem (bonus)

The course book does not give direct pseudocode for least squares, but Example 2.17 contains enough information to develop a working least-squares solver using the pseudoinverse of $A$. The pseudoinverse $A^+_L = (A^TA)^{-1}A^T$ is a left inverse of $A$, since $A^+_LA = I$. The solution to the least-squares problem is then $\overline{x} = A^+_L b$. The inverse $(A^TA)^{-1}$ may be difficult to calculate directly, but we can once again use QR factorization to simplify the process. The function begins by computing the matrix product $A^TA$ and factoring it into $Q$ and $R$. We now have $\overline{x} = (QR)^{-1}A^Tb = R^{-1}Q^{-1}A^Tb$. Since $Q^{-1} = Q^T$, we can compute $b'' = Q^TA^Tb$ and get $\overline{x} = R^{-1}b''$. Multiplying both sides by $R$ gives $R\overline{x} = b''$, which is solved with the previously implemented backward substitution function.

In [None]:
def least_squares_problem(A:np.ndarray, b:np.ndarray):
    Q, R = modified_gram_schmidt_iteration(np.transpose(A) @ A)
    ainv = np.transpose(Q) @ (np.transpose(A) @ b)
    x = backward_substitution(R, ainv)
    return x

lsp_A = np.array([[2, -1], [-1, 2], [2, 1]])
lsp_b = np.array([1, 2, 1])

print(least_squares_problem(lsp_A, lsp_b))

[0.4 0.8]


# **Results**

In this section, tests are performed to verify that the solutions are correct.

# Sparse Matrix-Vector Product
To verify the sparse matrix-vector product we generate a random sparse matrix, convert it to the three-list format used by the implementation, and compare the result to the matrix multiplication implemented in numpy. If the difference is $0$ in every entry for all tests, the function is working.


In [None]:
def smvp_test(iters):
  for test in range(iters):
    m = np.random.randint(5, 10)
    n = np.random.randint(5, 10)
    sparse_matrix = []
    for _ in range(m):
        sparse_matrix.append([(np.random.randint(-10, 10) if np.random.random() < 0.25 else 0) for _ in range(n)])

    sparse_setup = [[], [], []]
    row_pointer_ind = 0
    for i, row in enumerate(sparse_matrix):
        sparse_setup[2].append(row_pointer_ind)
        for j, val in enumerate(row):
            if val != 0:
                sparse_setup[0].append(val)
                sparse_setup[1].append(j)
                row_pointer_ind += 1
    sparse_setup[2].append(row_pointer_ind)

    sparse_matrix = np.array(sparse_matrix)
    test_vector = [np.random.randint(-10, 10) for _ in range(n)]

    test_product = sparse_matrix_vector_product(sparse_setup, test_vector)
    validation_product = sparse_matrix @ np.array(test_vector)

    print(f"Test {test+1}: {np.array(test_product) - validation_product}" )

smvp_test(5)


Test 1: [0 0 0 0 0 0 0 0]
Test 2: [0 0 0 0 0 0 0 0 0]
Test 3: [0 0 0 0 0 0 0]
Test 4: [0 0 0 0 0 0]
Test 5: [0 0 0 0 0 0 0 0]



# QR Factorization

To verify that the QR factorization is working correctly we check that $R$ is an upper triangular matrix. This can be done visually, but we iterate over the entries to check that all values below the diagonal are 0. We also check the Frobenius norms $||Q^TQ-I||$ and $||QR-A||$, which should be close to 0.

We generate matrices until we get an invertible one that can be used for the test.

In [None]:
def mgsi_test(iters):
    for test in range(iters):
        n = np.random.randint(4, 8)
        matrix = np.random.rand(n, n)
        while np.linalg.matrix_rank(matrix) != n:
            matrix = np.random.rand(n, n)

        Q, R = modified_gram_schmidt_iteration(matrix)

        upper_triangular = True
        for i, row in enumerate(R):
            if i != 0:
                for j in range(i):
                    if row[j] != 0:
                        upper_triangular = False
                        break
                if not upper_triangular:
                    break

        frob_qt = np.linalg.norm(Q.T @ Q - np.identity(n))
        frob_qr = np.linalg.norm(Q @ R - matrix)

        print(f"Test {test}: Upper Triangular = {upper_triangular}, ||Q^TQ-I|| = {frob_qt}, ||QR -A|| = {frob_qr}")


mgsi_test(5)

Test 0: Upper Triangular = True, ||Q^TQ-I|| = 1.612827105125927e-15, ||QR -A|| = 1.38989305853506e-16
Test 1: Upper Triangular = True, ||Q^TQ-I|| = 6.761801203493883e-16, ||QR -A|| = 2.34489445204375e-16
Test 2: Upper Triangular = True, ||Q^TQ-I|| = 5.916314568483542e-13, ||QR -A|| = 1.0838898772828417e-16
Test 3: Upper Triangular = True, ||Q^TQ-I|| = 3.708455557512007e-15, ||QR -A|| = 3.9631554108143483e-16
Test 4: Upper Triangular = True, ||Q^TQ-I|| = 2.563575312921917e-15, ||QR -A|| = 3.6424339576766555e-16



# Direct Solver

To test the direct solver we check whether it returns the right answer. This is done by checking that the norm of the residual $||Ax - b||$ is close to zero. We also manufacture a solution: we pick a vector $y$, set $b = Ay$, and check that the answer $x_y$ given by the direct solver matches $y$, using the error norm $||x_y - y||$.

In [None]:
def ds_test(iters):
    for test in range(iters):
        n = np.random.randint(2, 10)
        matrix = np.random.rand(n, n)
        while np.linalg.matrix_rank(matrix) != n:
            matrix = np.random.rand(n, n)
        vector = np.random.rand(n)

        x = direct_solver(matrix, vector)
        res_axb = np.linalg.norm(matrix @ x - vector)

        y = np.random.rand(n)
        x_y = direct_solver(matrix, matrix @ y)
        res_xy = np.linalg.norm(x_y - y)
        print(f"Test {test}: ||Ax-b|| = {res_axb}, ||x_y-y|| = {res_xy}")


ds_test(5)


Test 0: ||Ax-b|| = 8.005932084973442e-16, ||x_y-y|| = 9.226046854752787e-15
Test 1: ||Ax-b|| = 1.7010341408823585e-15, ||x_y-y|| = 6.656664898175294e-14
Test 2: ||Ax-b|| = 1.2412670766236366e-16, ||x_y-y|| = 7.850462293418876e-16
Test 3: ||Ax-b|| = 2.220446049250313e-16, ||x_y-y|| = 1.2456232622454378e-16
Test 4: ||Ax-b|| = 3.780290039183201e-15, ||x_y-y|| = 1.9188705342173193e-14



# Least-Squares Problem (bonus)

To check the least-squares implementation we use the residual $||A\overline{x}-b||$. The residual on its own does not tell us much, since the system has no exact solution and the residual will not be zero. We therefore compare against a known good solution: the numpy function that solves the least-squares problem. If our residual agrees with the residual of the numpy solution, our solution is very likely correct.

In [None]:
def lsp_test(iters):
    for test in range(iters):
        n = np.random.randint(5, 10)
        m = n + np.random.randint(5, 10)
        matrix = np.random.rand(m, n)
        vector = np.random.rand(m)

        x = least_squares_problem(matrix, vector)
        res_axb = np.linalg.norm(matrix @ x - vector)

        x_np = np.linalg.lstsq(matrix, vector, rcond=None)
        res_axb_np = np.linalg.norm(matrix @ x_np[0] - vector)

        print(f"Test {test}: Own = {res_axb}, numpy = {res_axb_np}, diff = {res_axb - res_axb_np}")


lsp_test(5)

Test 0: Own = 0.9660874836445156, numpy = 0.9660874836445157, diff = -1.1102230246251565e-16
Test 1: Own = 1.0409659470914951, numpy = 1.0409659470914951, diff = 0.0
Test 2: Own = 0.750878647739558, numpy = 0.750878647739558, diff = 0.0
Test 3: Own = 0.37202665725379663, numpy = 0.3720266572537966, diff = 5.551115123125783e-17
Test 4: Own = 0.9619465162838096, numpy = 0.9619465162838097, diff = -1.1102230246251565e-16


# **Discussion**

The results were as expected. The implementations follow the example algorithms in the course book closely, so there was no reason to expect them not to work. Only a few of the test cases gave a value of exactly zero, which would be the ideal result, but all of them came close enough to zero, at the level of floating-point rounding error, to be well within tolerance.
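
One way to make "within tolerance" concrete is to compare each norm against an explicit threshold and against machine epsilon. The sketch below does this for the $||Q^TQ-I||$ values reported in the QR factorization test; the threshold `TOL = 1e-10` is an assumption chosen for illustration, not a value taken from the course book.

```python
import numpy as np

TOL = 1e-10  # assumed tolerance, chosen for illustration only

# ||Q^TQ - I|| values copied from the QR factorization test output above
reported = [
    1.612827105125927e-15,
    6.761801203493883e-16,
    5.916314568483542e-13,
    3.708455557512007e-15,
    2.563575312921917e-15,
]

eps = np.finfo(float).eps  # machine epsilon for 64-bit floats

for value in reported:
    # Express each norm as a multiple of machine epsilon
    print(f"{value:.3e} = {value / eps:.1f} eps, within tolerance: {value < TOL}")

all_within_tolerance = all(value < TOL for value in reported)
print(all_within_tolerance)
```

A relative check such as `np.allclose` would serve the same purpose inside the test functions.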