## QR Factorization

### Introduction

QR factorization is another important way to factorize a matrix. Its distinguishing feature is that it imposes no requirements on the matrix: it exists for non-square matrices and for rank-deficient matrices alike. The Gram-Schmidt algorithms below, however, assume full column rank, so in these notes we still avoid rank-deficient matrices.

It can be written as:

$$\textbf{A}=\textbf{QR}$$

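where $\textbf{Q}$ is an orthogonal matrix whose columns form an orthonormal basis, and $\textbf{R}$ is an upper-triangular matrix. If $\textbf{A}$ is a tall, non-square matrix ($m\times n$ with $m>n$), we can also write it in block form, where $\textbf{Q}_1$ ($m\times n$) holds the first $n$ columns of $\textbf{Q}$ and $\textbf{R}_1$ ($n\times n$) is the upper-triangular block of $\textbf{R}$: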

$$
\textbf{A}=
\begin{bmatrix}
\textbf{Q}_1 & \textbf{Q}_2
\end{bmatrix}
\begin{bmatrix}
\textbf{R}_1 \\ 0
\end{bmatrix}=\textbf{Q}_1\textbf{R}_1
$$
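
As a quick illustration of the two forms, NumPy's `np.linalg.qr` can return either one through its `mode` argument: `'complete'` gives the full $m\times m$ $\textbf{Q}$ with an $m\times n$ $\textbf{R}$, while `'reduced'` gives $\textbf{Q}_1$ and $\textbf{R}_1$. The matrix below is an arbitrary example made up for this sketch, not one taken from these notes.

```python
import numpy as np

# Arbitrary tall example matrix (m = 4, n = 2), chosen only for illustration
A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, 0.0],
              [2.0, 1.0]])

# Full form: Q is m x m, R is m x n with zero rows below R_1
Q_full, R_full = np.linalg.qr(A, mode='complete')

# Reduced form: Q_1 is m x n, R_1 is n x n
Q1, R1 = np.linalg.qr(A, mode='reduced')

print(Q_full.shape, R_full.shape)   # (4, 4) (4, 2)
print(Q1.shape, R1.shape)           # (4, 2) (2, 2)

# Both forms reproduce A, and the bottom block of the full R is zero
print(np.allclose(Q_full @ R_full, A), np.allclose(Q1 @ R1, A))
print(np.allclose(R_full[2:], 0.0))
```

The reduced form is the one the Gram-Schmidt routines in these notes produce, since they build exactly $n$ orthonormal columns.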

### Gram-Schmidt Algorithm

We have already introduced Gram-Schmidt orthogonalization in the [linear algebra notes](../basic_knowledges/linear_algebra.md). We now use this idea to compute the QR factorization.

In [39]:
import numpy as np

def qr_gram_schmidt(A):
    '''
    QR Factorization via the classical Gram-Schmidt Algorithm
    Input:
    Full-column-rank matrix A (m x n, m >= n)
    Output:
    Q (m x n, orthonormal columns) and R (n x n, upper triangular)
    '''
    A = np.array(A, dtype=float)
    col = A.shape[1]
    Q = A.copy()  # columns of Q are orthonormalized in place
    R = np.zeros((col, col))
    for i in range(col):
        for j in range(i):
            # Classical variant: the coefficient uses the ORIGINAL column A[:, i]
            R[j, i] = Q[:, j] @ A[:, i]
            Q[:, i] = Q[:, i] - R[j, i] * Q[:, j]
        # The norm of what remains becomes the diagonal entry of R
        R[i, i] = np.linalg.norm(Q[:, i])
        if R[i, i] == 0:
            raise ValueError("Stop: A is not full column rank.")
        Q[:, i] = Q[:, i] / R[i, i]

    return Q, R
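
A factorization like this can be sanity-checked against the three defining properties: orthonormal columns in $\textbf{Q}$, an upper-triangular $\textbf{R}$, and $\textbf{QR}=\textbf{A}$. The sketch below is one possible check. `check_qr` is a hypothetical helper name, the tolerance is an arbitrary choice, and `np.linalg.qr` is used only as a stand-in so that the block runs on its own. The output of `qr_gram_schmidt` can be passed in the same way.

```python
import numpy as np

def check_qr(A, Q, R, tol=1e-10):
    """Return True if (Q, R) behaves like a reduced QR factorization of A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[1]
    orthonormal = np.allclose(Q.T @ Q, np.eye(n), atol=tol)   # Q^T Q = I
    upper = np.allclose(R, np.triu(R), atol=tol)              # nothing below the diagonal
    reconstructs = np.allclose(Q @ R, A, atol=tol)            # Q R = A
    return orthonormal and upper and reconstructs

# Small made-up example with full column rank
A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
Q, R = np.linalg.qr(A)  # stand-in factorization

print(check_qr(A, Q, R))          # a valid factorization passes
print(check_qr(A, A, np.eye(2)))  # A = A I reconstructs A, but A's columns are not orthonormal
```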

### Modified Gram-Schmidt Algorithm

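Instead of subtracting all projections from a column only when that column is processed, we can remove the component along each new basis vector from all remaining columns as soon as that basis vector is computed.

In exact arithmetic, the classical and modified algorithms are equivalent. In floating-point arithmetic, however, the modified algorithm is more stable: the computed columns of $\textbf{Q}$ stay much closer to orthogonal.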

In [17]:
def modified_qr(A):
    '''
    QR Factorization via the modified Gram-Schmidt Algorithm
    Input:
    Full-column-rank matrix A (m x n, m >= n)
    Output:
    Q (m x n, orthonormal columns) and R (n x n, upper triangular)
    '''
    Q = np.array(A, dtype=float)
    col = Q.shape[1]
    R = np.zeros((col, col))
    for i in range(col):
        # Column i has already had all earlier directions removed: normalize it
        R[i, i] = np.linalg.norm(Q[:, i])
        if R[i, i] == 0:
            raise ValueError("Stop: A is not full column rank.")
        Q[:, i] = Q[:, i] / R[i, i]
        # Remove the new direction from every remaining column right away
        for j in range(i + 1, col):
            R[i, j] = Q[:, j] @ Q[:, i]
            Q[:, j] = Q[:, j] - R[i, j] * Q[:, i]
    return Q, R
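
Because each step updates all remaining columns with the same basis vector, the inner loop can also be written as a single matrix-vector product followed by a rank-one update. The following is a minimal sketch of that idea. `modified_qr_vectorized` is a hypothetical name and the test matrix is made up for illustration. It performs the same arithmetic steps as the loop version, just grouped differently.

```python
import numpy as np

def modified_qr_vectorized(A):
    """Modified Gram-Schmidt with the inner loop replaced by array operations."""
    Q = np.array(A, dtype=float)
    n = Q.shape[1]
    R = np.zeros((n, n))
    for i in range(n):
        R[i, i] = np.linalg.norm(Q[:, i])
        if R[i, i] == 0:
            raise ValueError("A is not full column rank.")
        Q[:, i] /= R[i, i]
        # Coefficients of the new basis vector in all remaining columns at once
        R[i, i + 1:] = Q[:, i] @ Q[:, i + 1:]
        # Subtract that component from every remaining column (rank-one update)
        Q[:, i + 1:] -= np.outer(Q[:, i], R[i, i + 1:])
    return Q, R

# Made-up 4 x 3 matrix with linearly independent columns
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 1.0]])
Q, R = modified_qr_vectorized(A)

print(np.allclose(Q @ R, A))             # the factors reproduce A
print(np.allclose(Q.T @ Q, np.eye(3)))   # the columns of Q are orthonormal
```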

### Sensitivity Analysis

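([Reference](https://www.math.uci.edu/~ttrogdon/105A/html/Lecture23.html)) A first-order sensitivity analysis of the two methods explains the difference: in the classical algorithm the errors of all previously computed basis vectors add up, whereas in the modified algorithm error terms cancel at every step instead of piling up. First of all, let's write down the mathematical expressions for the two algorithms.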

* Classical Algorithm

$$\mathbf{v}_j = \mathbf{x}_j - \sum_{i=1}^{j-1} (\mathbf{u}_i^T \mathbf{x}_j) \mathbf{u}_i$$

* Modified Algorithm

$$\mathbf{v}_j^{(k)} = \mathbf{v}_j^{(k-1)} - (\mathbf{u}_k^T \mathbf{v}_j^{(k-1)}) \mathbf{u}_k, \quad k < j, \quad \mathbf{v}_j^{(0)} = \mathbf{x}_j, \quad \mathbf{v}_j^{(j-1)} = \mathbf{v}_j$$
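
To make the difference concrete, here is the case $j = 3$ written out. It is only a worked instance of the two formulas above:

$$
\begin{align*}
\text{Classical:} \quad \mathbf{v}_3 &= \mathbf{x}_3 - (\mathbf{u}_1^T \mathbf{x}_3)\, \mathbf{u}_1 - (\mathbf{u}_2^T \mathbf{x}_3)\, \mathbf{u}_2 \\
\text{Modified:} \quad \mathbf{v}_3^{(1)} &= \mathbf{x}_3 - (\mathbf{u}_1^T \mathbf{x}_3)\, \mathbf{u}_1 \\
\mathbf{v}_3 = \mathbf{v}_3^{(2)} &= \mathbf{v}_3^{(1)} - (\mathbf{u}_2^T \mathbf{v}_3^{(1)})\, \mathbf{u}_2
\end{align*}
$$

In exact arithmetic $\mathbf{u}_2^T \mathbf{u}_1 = 0$, so $\mathbf{u}_2^T \mathbf{v}_3^{(1)} = \mathbf{u}_2^T \mathbf{x}_3$ and both versions give the same $\mathbf{v}_3$. The only difference is which vector the second coefficient is computed from: the original $\mathbf{x}_3$ or the partially orthogonalized $\mathbf{v}_3^{(1)}$.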

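Now suppose the computed basis vectors carry a small perturbation. Let

$$\mathbf{\hat{u}}_i = \mathbf{u}_i + \delta \mathbf{u}_i$$

for the classical algorithm and

$$\mathbf{\hat{u}}_k = \mathbf{u}_k + \delta \mathbf{u}_k$$

for the modified algorithm. In the expansions below, second-order terms in the perturbations are dropped.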

**Classical:** each of the $j-1$ projections uses a perturbed basis vector, so $j-1$ error terms add up.

$$
\begin{align*}
\mathbf{\hat{v}}_j &= \mathbf{x}_j - \sum_{i=1}^{j-1} (\mathbf{u}_i + \delta \mathbf{u}_i)^T \mathbf{x}_j (\mathbf{u}_i + \delta \mathbf{u}_i) \\
&= \mathbf{x}_j - \sum_{i=1}^{j-1} (\mathbf{u}_i^T \mathbf{x}_j) \mathbf{u}_i - \left[ \sum_{i=1}^{j-1} (\delta \mathbf{u}_i^T \mathbf{x}_j \mathbf{u}_i) + \sum_{i=1}^{j-1} (\mathbf{u}_i^T \mathbf{x}_j \delta \mathbf{u}_i) \right] \\
&= \mathbf{v}_j + \mathcal{O}(j)
\end{align*}
$$

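**Modified:** the projections are applied one after another, so each step acts on the already perturbed result of the previous one.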

$$
\begin{align*}
\mathbf{\hat{v}}_j^{(1)} &= \mathbf{x}_j - (\mathbf{u}_1 + \delta \mathbf{u}_1)^T \mathbf{x}_j (\mathbf{u}_1 + \delta \mathbf{u}_1) \\
&= \mathbf{x}_j - \mathbf{u}_1^T \mathbf{x}_j \mathbf{u}_1 - (\delta \mathbf{u}_1^T \mathbf{x}_j \mathbf{u}_1 + \mathbf{u}_1^T \mathbf{x}_j \delta \mathbf{u}_1) \\
&= \mathbf{v}_j^{(1)} - \delta \mathbf{v}_j^{(1)} \\
\mathbf{\hat{v}}_j^{(2)} &= \mathbf{\hat{v}}_j^{(1)} - (\mathbf{u}_2 + \delta \mathbf{u}_2)^T \mathbf{\hat{v}}_j^{(1)} (\mathbf{u}_2 + \delta \mathbf{u}_2) \\
&= \mathbf{\hat{v}}_j^{(1)} - (\mathbf{u}_2^T + \delta \mathbf{u}_2^T) (\mathbf{v}_j^{(1)} - \delta \mathbf{v}_j^{(1)}) (\mathbf{u}_2 + \delta \mathbf{u}_2) \\
&= \mathbf{v}_j^{(1)} - \delta \mathbf{v}_j^{(1)} - \mathbf{u}_2^T \mathbf{v}_j^{(1)} \mathbf{u}_2 - (\delta \mathbf{u}_2^T \mathbf{v}_j^{(1)} \mathbf{u}_2 + \mathbf{u}_2^T \mathbf{v}_j^{(1)} \delta \mathbf{u}_2 - \mathbf{u}_2^T \delta \mathbf{v}_j^{(1)} \mathbf{u}_2)
\end{align*}
$$

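Here $(\mathbf{u}_2^T \delta \mathbf{v}_j^{(1)})\, \mathbf{u}_2$ is the component of $\delta \mathbf{v}_j^{(1)}$ along $\mathbf{u}_2$ (recall $\mathbf{u}_2^T \mathbf{u}_2 = 1$), so it cancels that part of $-\delta \mathbf{v}_j^{(1)}$: the error inherited from step 1 is filtered by the projection instead of being carried forward in full. Collecting the remaining first-order terms into a single $\delta \mathbf{v}_j^{(2)}$ gives: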

$$\mathbf{\hat{v}}_j^{(2)} = \mathbf{v}_j^{(2)} - \delta \mathbf{v}_j^{(2)}$$
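
A small numerical check shows how the step-2 projection acts on an error inherited from step 1. The vectors `u2` and `dv1` are random stand-ins invented for this sketch (their size and scale are arbitrary assumptions). They play the roles of $\mathbf{u}_2$ and $\delta \mathbf{v}_j^{(1)}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unit vector standing in for u_2
u2 = rng.standard_normal(5)
u2 /= np.linalg.norm(u2)

# Hypothetical small error vector standing in for delta v_j^(1)
dv1 = 1e-8 * rng.standard_normal(5)

# Component of the error along u_2, i.e. (u_2^T dv1) u_2
along_u2 = (u2 @ dv1) * u2

# What is left of the step-1 error after the step-2 projection
remaining = dv1 - along_u2

# The remaining error has no component along u_2 (up to rounding) ...
print(np.isclose(u2 @ remaining, 0.0, atol=1e-18))
# ... and it is no larger than the error that entered the step
print(np.linalg.norm(remaining) <= np.linalg.norm(dv1))
```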

Repeating this step for the remaining projections, we finally get:

$$
\begin{align*}
\mathbf{\hat{v}}_j &= \mathbf{v}_j - \delta \mathbf{v}_j^{(j-1)} \\
&= \mathbf{v}_j + \mathcal{O}(1)
\end{align*}
$$

In [44]:
from scipy.linalg import hilbert
n = 200
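# Ill-conditioned test matrix: Hilbert matrix plus a small multiple of the identity
A = 1e-5 * np.eye(n) + hilbert(n)
Q1, R1 = qr_gram_schmidt(A)
Q2, R2 = modified_qr(A)
# Measure the loss of orthogonality as the Frobenius norm of I - Q^T Q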
print(f"Classical: {np.linalg.norm(np.eye(n) - Q1.T @ Q1)}")
print(f"Modified: {np.linalg.norm(np.eye(n) - Q2.T @ Q2)}")

Classical: 181.20932543528096
Modified: 5.962397721037823e-11
