# Cholesky Factorization

## Introduction
Before introducing Cholesky factorization, it helps to recall why matrix structure matters. Matrices can have different structures: lower triangular, upper triangular, diagonal, tridiagonal, pentadiagonal, and so on. As we saw with LU factorization, exploiting the structure of a matrix lets us solve a linear system more efficiently.

## Tridiagonal Matrix
To reinforce this point, let us consider another example: the tridiagonal matrix.

A tridiagonal matrix has the form:

$$
\mathbf{M} = \begin{bmatrix}
y_1    & z_1    & 0      & \cdots & 0      \\
x_2    & y_2    & z_2    & \ddots & \vdots \\
0      & \ddots & \ddots & \ddots & 0      \\
\vdots & \ddots & x_{n-1} & y_{n-1} & z_{n-1} \\
0      & \cdots & 0      & x_n    & y_n
\end{bmatrix}
$$

A linear system with this matrix is cheap to solve, and the whole matrix can be stored in just three vectors (the sub-diagonal, the main diagonal, and the super-diagonal). Such systems can be solved with the Thomas algorithm, a special case of LU factorization whose computational complexity is only $\mathcal{O}(n)$.

In [20]:
import numpy as np

def thomas_algorithm(a, b, c, RHS):
    '''
    Thomas algorithm for solving tridiagonal systems
    Input:
    Three diagonals: a (main), b (super-diagonal), c (sub-diagonal)
    Right hand side: RHS
    Output: solution x as a column vector
    '''
    def thomas_LU(a, b, c):
        '''
        Thomas LU factorization (no pivoting)
        Input:
        Three diagonals: a (main), b (super-diagonal), c (sub-diagonal)
        Output:
        LU factors stored as diagonals: u1 (main diagonal of U),
        u2 (super-diagonal of U), l (sub-diagonal of L; L has a unit diagonal)
        '''
        u1 = np.array(a, dtype=float).reshape(-1)
        u2 = np.array(b, dtype=float).reshape(-1)
        c = np.array(c, dtype=float).reshape(-1)
        l = np.zeros(u2.size)
        n = u1.size
        for i in range(n - 1):
            l[i] = c[i] / u1[i]
            u1[i + 1] = u1[i + 1] - l[i] * u2[i]

        return u1, u2, l                

    def thomas_substitution(mid, side, RHS, is_fwd=1):
        '''
        Forward/backward substitution that only takes two vectors
        Input:
        Two diagonals and RHS
        Forward/backward selector
        Output:
        Solution x
        '''
        mid = np.array(mid, dtype=float).reshape(-1)
        side = np.array(side, dtype=float).reshape(-1)
        x = np.array(RHS, dtype=float).reshape(-1)
        n = mid.size
        if is_fwd == 1:
            start = 1
            end = n
            step = 1
            x[0] = x[0] / mid[0]
        else:
            start = -2
            end = - n - 1
            step = -1
            x[-1] = x[-1] / mid[-1]
        for i in range(start, end, step):
            x[i] = x[i] - side[i - step] * x[i - step]
            x[i] = x[i] / mid[i]
        return x
    
    u1, u2, l = thomas_LU(a, b, c)
    n = u1.size  # use the converted array so that list inputs also work
    l_mid = np.ones(n)

    y = thomas_substitution(l_mid, l, RHS)
    x = thomas_substitution(u1, u2, y, is_fwd=0)

    return x.reshape(-1, 1)


Then, we solve a system with
$$
A = \begin{bmatrix}
4 & 1 & 0 & 0 \\
1 & 4 & 1 & 0 \\
0 & 1 & 4 & 1 \\
0 & 0 & 1 & 4
\end{bmatrix}\quad
b = \begin{bmatrix}
6 \\
12 \\
18 \\
19
\end{bmatrix}
$$

In [21]:
sub_diag = np.array([1, 1, 1])
main_diag = np.array([4, 4, 4, 4])
sup_diag = np.array([1, 1, 1])
b = np.array([6, 12, 18, 19])

x = thomas_algorithm(main_diag, sup_diag, sub_diag, b)

print(x)

[[1.]
 [2.]
 [3.]
 [4.]]
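
As a sanity check on the result above, the same system can be assembled as a dense matrix and handed to a general solver. This is only a minimal sketch for comparison: the names `A_dense`, `rhs`, and `x_ref` are introduced here for illustration, and the dense solve deliberately ignores the tridiagonal structure.

```python
import numpy as np

# Same three diagonals and right-hand side as in the example above.
sub_diag = np.array([1.0, 1.0, 1.0])
main_diag = np.array([4.0, 4.0, 4.0, 4.0])
sup_diag = np.array([1.0, 1.0, 1.0])
rhs = np.array([6.0, 12.0, 18.0, 19.0])

# Assemble the dense matrix: k=-1 places the sub-diagonal, k=1 the super-diagonal.
A_dense = np.diag(main_diag) + np.diag(sub_diag, k=-1) + np.diag(sup_diag, k=1)

# General dense solve, used here only as a reference solution.
x_ref = np.linalg.solve(A_dense, rhs)
print(x_ref)
```

The dense version stores $n^2$ entries, whereas the three-vector storage needs only $3n-2$.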


## Cholesky Factorization
Symmetric positive definite (SPD) matrices are another useful structure, and they appear frequently in mechanical and electrical engineering. Symmetry is easy to understand: $\textbf{A}=\textbf{A}^T$. But what is a positive definite matrix?


### Positive Definite Matrix
A matrix $\textbf{A}\in\mathbb{R}^{n\times n}$ is positive definite if:
$$\textbf{x}^T\textbf{Ax}>0,\quad\forall\textbf{x}\ne\textbf{0}\in\mathbb{R}^n$$
There is also a weaker notion, positive semi-definiteness, where equality is allowed:
$$\textbf{x}^T\textbf{Ax}\ge0,\quad\forall\textbf{x}\in\mathbb{R}^n$$

> **Theorem**
> If all eigenvalues of a symmetric matrix $\textbf{A}$ are positive (none is negative or zero), then $\textbf{A}$ is symmetric positive definite.

**Proof**:

To show positive definiteness, we need to prove that $\textbf{x}^T\textbf{Ax}>0$ for all $\textbf{x}\neq\textbf{0}\in\mathbb{R}^n$. Since $\textbf{A}$ is symmetric, it has an eigendecomposition $\textbf{A}=\textbf{V}\boldsymbol{\Lambda}\textbf{V}^T$, so that $\textbf{V}^T\textbf{AV}=\boldsymbol{\Lambda}$. The eigenvectors of a real symmetric matrix can be chosen orthonormal, so $\textbf{V}^T\textbf{V}=\textbf{I}$ and the columns of $\textbf{V}$ span $\mathbb{R}^n$. Any $\textbf{x}$ can therefore be written in this basis:

$$
\begin{align*}
\textbf{V}&=\begin{bmatrix} \textbf{v}_1 & \textbf{v}_2 & \cdots & \textbf{v}_n \end{bmatrix} \\
\textbf{x}&=a_1\textbf{v}_1+a_2\textbf{v}_2+\cdots+a_n\textbf{v}_n \\
&=\begin{bmatrix} \textbf{v}_1 & \textbf{v}_2 & \cdots & \textbf{v}_n \end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} \\
&= \textbf{V}\textbf{a}
\end{align*}
$$

Substituting this into $\textbf{x}^T\textbf{Ax}$ gives:

$$
\begin{align*}
\textbf{x}^T\textbf{Ax} &= (\textbf{Va})^T\textbf{AVa} \\
&=\textbf{a}^T\textbf{V}^T\textbf{AVa} \\
&=\textbf{a}^T\boldsymbol{\Lambda}\textbf{a} \\
&=a_1^2\lambda_1+a_2^2\lambda_2+\cdots+a_n^2\lambda_n
\end{align*}
$$

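Because $\textbf{V}$ is invertible, $\textbf{x}\neq\textbf{0}$ implies $\textbf{a}\neq\textbf{0}$, so at least one $a_i^2$ is positive. Hence $\textbf{x}^T\textbf{Ax}>0$ whenever every $\lambda_i>0$. Conversely, choosing $\textbf{x}=\textbf{v}_i$ gives $\textbf{x}^T\textbf{Ax}=\lambda_i$, so positive definiteness requires $\lambda_i>0$ for every $i$.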
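
The identity $\textbf{x}^T\textbf{Ax}=\sum_i a_i^2\lambda_i$ can be checked numerically. The sketch below is illustrative only: it reuses the tridiagonal matrix from the earlier example, picks an arbitrary test vector, and the variable names (`lam`, `quad_direct`, `quad_eigen`) are introduced here. It relies on `np.linalg.eigh`, which is intended for symmetric matrices and returns orthonormal eigenvectors as the columns of `V`.

```python
import numpy as np

# Symmetric test matrix (the tridiagonal matrix from the earlier example).
A = np.array([[4.0, 1.0, 0.0, 0.0],
              [1.0, 4.0, 1.0, 0.0],
              [0.0, 1.0, 4.0, 1.0],
              [0.0, 0.0, 1.0, 4.0]])

# Eigenvalues (ascending) and orthonormal eigenvectors of a symmetric matrix.
lam, V = np.linalg.eigh(A)

x = np.array([1.0, -2.0, 0.5, 3.0])  # arbitrary nonzero vector
a = V.T @ x                           # coordinates of x in the eigenvector basis, x = V a

quad_direct = x @ A @ x               # x^T A x computed directly
quad_eigen = np.sum(a**2 * lam)       # sum of a_i^2 * lambda_i

print("all eigenvalues positive:", bool(np.all(lam > 0)))
print("identity holds:", bool(np.isclose(quad_direct, quad_eigen)))
```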

### Congruence Transformation
> **Theorem**
> If $A \in \mathbb{R}^{n \times n}$ is symmetric positive definite and $H \in \mathbb{R}^{n \times q}$ has full column rank, then $H^T AH$ is symmetric positive definite.

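The proof is short. Symmetry follows from $(H^TAH)^T=H^TA^TH=H^TAH$. For any $y\neq 0$, full column rank gives $Hy\neq 0$, so $y^TH^TAHy=(Hy)^TA(Hy)>0$. The more interesting question is: why do we need this theorem?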

Consider a quadratic form $E(\textbf{x})=\textbf{x}^T\textbf{Ax}$. In practice, we often do not work with the variable $\textbf{x}$ directly. Instead, $\textbf{x}$ is expressed through some other variables $\textbf{y}$ via a transformation matrix, $\textbf{x}=\textbf{Hy}$. Substituting gives $E=\textbf{y}^T(\textbf{H}^T\textbf{AH})\textbf{y}$, so the matrix of the quadratic form becomes $\textbf{H}^T\textbf{AH}$.
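
A small numerical illustration of the congruence theorem follows. The matrices `A` and `H` are made up for this sketch (they do not come from the text above); `H` has full column rank, and the check confirms that the transformed matrix `B` is again symmetric with positive eigenvalues.

```python
import numpy as np

# Hypothetical SPD matrix (3 x 3) chosen for illustration.
A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

# Hypothetical transformation x = H y with full column rank (3 x 2).
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

B = H.T @ A @ H  # congruence transformation, a 2 x 2 matrix

print("rank of H:", np.linalg.matrix_rank(H))
print("B =\n", B)
print("B symmetric:", bool(np.allclose(B, B.T)))
print("eigenvalues of B:", np.linalg.eigvalsh(B))
```

If `H` lost column rank (for example, two identical columns), some nonzero $\textbf{y}$ would give $\textbf{Hy}=\textbf{0}$ and the result would only be positive semi-definite.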

There are many other applications of congruence transformations; I will not list all of them here.

### Cholesky Factorization
A symmetric positive definite matrix can be factorized as:

$$\textbf{A}=\textbf{R}^T\textbf{R}$$

where $\textbf{R}$ is an upper triangular matrix. If $\textbf{A}$ is a $2\times 2$ matrix, we can obtain the factorization analytically by writing out the unknown entries of $\textbf{R}$ and matching $\textbf{R}^T\textbf{R}$ against $\textbf{A}$:

$$
\mathbf{A} = \begin{bmatrix}
a & b \\
b & c
\end{bmatrix}, \quad
\mathbf{R} = \begin{bmatrix}
\sqrt{a} & u \\
0 & v
\end{bmatrix}
$$

Expanding $\textbf{R}^T\textbf{R}$ and matching entries gives $\sqrt{a}\,u=b$ and $u^2+v^2=c$, so $u = \frac{b}{\sqrt{a}}$ and $v = \sqrt{c - u^2}$.
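
The $2\times 2$ formulas translate directly into code. This is a minimal sketch: the function name `cholesky_2x2` and the test values are chosen here for illustration, and the input is assumed to be SPD so that both square roots are real.

```python
import numpy as np

def cholesky_2x2(a, b, c):
    """Upper triangular R with R.T @ R = [[a, b], [b, c]].

    Assumes the matrix is symmetric positive definite (a > 0 and c - b**2 / a > 0).
    """
    r11 = np.sqrt(a)
    u = b / r11              # from sqrt(a) * u = b
    v = np.sqrt(c - u**2)    # from u**2 + v**2 = c
    return np.array([[r11, u],
                     [0.0, v]])

# Illustrative values: a = 4, b = 2, c = 5.
R2 = cholesky_2x2(4.0, 2.0, 5.0)
print(R2)
print("R^T R reproduces A:", bool(np.allclose(R2.T @ R2, [[4.0, 2.0], [2.0, 5.0]])))
```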

Now consider the general $n\times n$ case and partition $\textbf{A}$ as:

$$
\mathbf{A} = \left[ 
\begin{array}{c|c}
a_1 & \mathbf{b}_1^\top \\
\hline
\mathbf{b}_1 & \mathbf{C}_1
\end{array} 
\right]
$$

Next, match the block product $\textbf{R}_1^\top\textbf{R}_1$ against $\textbf{A}$:

$$
\underbrace{
\left[ \begin{array}{c|c}
\sqrt{a_1} & \mathbf{0} \\
\hline
\mathbf{u}_1 & \mathbf{V}_1^\top
\end{array} \right]
}_{\mathbf{R}_1^\top}
\underbrace{
\left[ \begin{array}{c|c}
\sqrt{a_1} & \mathbf{u}_1^\top \\
\hline
\mathbf{0} & \mathbf{V}_1
\end{array} \right]
}_{\mathbf{R}_1}
=
\left[ \begin{array}{c|c}
a_1 & \mathbf{b}_1^\top \\
\hline
\mathbf{b}_1 & \mathbf{V}_1^\top \mathbf{V}_1 + \mathbf{u}_1 \mathbf{u}_1^\top
\end{array} \right]
=
\left[ \begin{array}{c|c}
a_1 & \mathbf{b}_1^\top \\
\hline
\mathbf{b}_1 & \mathbf{C}_1
\end{array} \right]
= \mathbf{A}
$$

Where $\textbf{u}_1=\frac{\textbf{b}_1}{\sqrt{a_1}}$ and $\textbf{A}_1=\mathbf{V}_1^\top \mathbf{V}_1=\textbf{C}_1-\mathbf{u}_1 \mathbf{u}_1^\top$.

$\textbf{A}_1=\mathbf{V}_1^\top \mathbf{V}_1$ is itself a Cholesky factorization of a smaller, $(n-1)\times(n-1)$ matrix. We repeat this step on $\textbf{A}_1, \textbf{A}_2, \dots$ until the remaining block $\textbf{A}_{n-2}$ is a $2\times 2$ matrix, which can be factorized analytically as shown before.
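
As a worked instance of one reduction step, take the $3\times 3$ matrix below (the same matrix is factorized numerically later in this section). Partitioning gives:

$$
\mathbf{A} = \begin{bmatrix}
4 & 12 & -16 \\
12 & 37 & -43 \\
-16 & -43 & 98
\end{bmatrix}, \quad
a_1 = 4, \quad
\mathbf{b}_1 = \begin{bmatrix} 12 \\ -16 \end{bmatrix}, \quad
\mathbf{C}_1 = \begin{bmatrix} 37 & -43 \\ -43 & 98 \end{bmatrix}
$$

$$
\mathbf{u}_1 = \frac{\mathbf{b}_1}{\sqrt{a_1}} = \begin{bmatrix} 6 \\ -8 \end{bmatrix}, \quad
\mathbf{A}_1 = \mathbf{C}_1 - \mathbf{u}_1 \mathbf{u}_1^\top
= \begin{bmatrix} 37 - 36 & -43 + 48 \\ -43 + 48 & 98 - 64 \end{bmatrix}
= \begin{bmatrix} 1 & 5 \\ 5 & 34 \end{bmatrix}
$$

The remaining $2\times 2$ block is handled by the analytic formulas: $\sqrt{1}=1$, $u=5/1=5$, and $v=\sqrt{34-5^2}=3$. Collecting the rows gives:

$$
\mathbf{R} = \begin{bmatrix}
2 & 6 & -8 \\
0 & 1 & 5 \\
0 & 0 & 3
\end{bmatrix}
$$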

In [26]:
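def cholesky_factorization(A):
    '''
    Cholesky factorization
    Input:
    Symmetric positive definite matrix A
    Output:
    Upper triangular matrix R with R.T @ R = A
    '''
    A = np.array(A, dtype=float)  # work on a float copy of the input
    n = A.shape[0]
    for i in range(n):
        # Row i of R: square root of the pivot, then scale the rest of the row
        A[i, i] = np.sqrt(A[i, i])
        A[i, i + 1:] = A[i, i + 1:] / A[i, i]
        # Update the trailing block: C - u u^T
        for j in range(i + 1, n):
            for k in range(i + 1, n):
                A[j, k] = A[j, k] - A[i, j] * A[i, k]
    # Entries below the diagonal are leftovers; keep only the upper triangle
    R = np.triu(A, k=0)
    return R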

In [30]:
A = np.array([
    [4, 12, -16],
    [12, 37, -43],
    [-16, -43, 98]])

R = cholesky_factorization(A)
print("Factorized matrix R:")
print(R)
print(f"Did the decomposition process succeed? {np.allclose(R.T @ R, A)}")

Factorized matrix R:
[[ 2.  6. -8.]
 [ 0.  1.  5.]
 [ 0.  0.  3.]]
Did the decomposition process succeed? True
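
Once $\textbf{R}$ is known, solving $\textbf{Ax}=\textbf{b}$ reduces to two triangular solves: $\textbf{R}^T\textbf{y}=\textbf{b}$ followed by $\textbf{Rx}=\textbf{y}$. The sketch below is one possible way to do this and is not part of the original notebook: the helper names and the right-hand side `b` are made up for illustration, and the factor is taken from `np.linalg.cholesky`, which returns the lower triangular factor $\textbf{L}$ with $\textbf{A}=\textbf{LL}^T$, so $\textbf{R}=\textbf{L}^T$.

```python
import numpy as np

def forward_substitution(L, b):
    """Solve L y = b for a lower triangular matrix L."""
    n = L.shape[0]
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def backward_substitution(U, y):
    """Solve U x = y for an upper triangular matrix U."""
    n = U.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

A = np.array([[4.0, 12.0, -16.0],
              [12.0, 37.0, -43.0],
              [-16.0, -43.0, 98.0]])
b = np.array([-20.0, -43.0, 192.0])  # hypothetical right-hand side

R = np.linalg.cholesky(A).T          # NumPy returns lower L; R = L^T is upper
y = forward_substitution(R.T, b)     # R^T y = b
x = backward_substitution(R, y)      # R x = y
print(x)
```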


### Symmetric Indefinite Systems
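If $\textbf{A}$ is symmetric but indefinite, Cholesky factorization is not applicable, because a pivot may be zero or negative and then has no real, nonzero square root. The LDL factorization is used instead:
$$\textbf{PAP}^T=\textbf{LDL}^T$$
Here $\textbf{P}$ is a permutation matrix, $\textbf{L}$ is unit lower triangular, and $\textbf{D}$ is block diagonal with $1\times 1$ and $2\times 2$ blocks.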
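
To give a feel for the idea, here is a deliberately simplified sketch that is not the full algorithm: it assumes $\textbf{P}=\textbf{I}$ (no pivoting), a strictly diagonal $\textbf{D}$, and nonzero pivots throughout. The function name `ldl_no_pivoting` and the test matrix are made up for illustration. Library routines such as `scipy.linalg.ldl` implement the pivoted version and should be preferred in practice.

```python
import numpy as np

def ldl_no_pivoting(A):
    """LDL^T factorization with P = I (assumption: every pivot d[j] is nonzero).

    Returns a unit lower triangular L and the diagonal of D as a vector d.
    """
    A = np.array(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        # Pivot: remove the contribution of the previous columns
        d[j] = A[j, j] - np.sum(L[j, :j] ** 2 * d[:j])
        if d[j] == 0.0:
            raise ZeroDivisionError("zero pivot: pivoting would be required")
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - np.sum(L[i, :j] * L[j, :j] * d[:j])) / d[j]
    return L, d

# Hypothetical symmetric indefinite matrix whose pivots happen to be nonzero.
A_indef = np.array([[1.0, 2.0, 0.0],
                    [2.0, 1.0, 3.0],
                    [0.0, 3.0, 1.0]])

L, d = ldl_no_pivoting(A_indef)
print("L =\n", L)
print("d =", d)
print("reconstruction ok:", bool(np.allclose(L @ np.diag(d) @ L.T, A_indef)))
```

The mixed signs in `d` reflect the indefiniteness of the matrix; no square roots are taken, which is why negative pivots are not a problem here.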

In [None]:
# Code for LDL factorization (Not completed yet)

### Summary Table
<center>

| Matrix structure | Solver | Cost |
| :--- | :--- | :--- |
| Tridiagonal | Thomas algorithm | $\mathcal{O}(n)$ |
| Symmetric and positive definite (SPD) | Cholesky | $\mathcal{O}(n^3)$, about $n^3/3$ flops |
| General dense | LU with pivoting | $\mathcal{O}(n^3)$, about $2n^3/3$ flops |

</center>