In [5]:
using DrWatson;
@quickactivate "MATH361Lectures"
using LinearAlgebra;
import MATH361Lectures;

# Cholesky Factorization

In many practical applications (*e.g.* numerical approximation of linear PDEs) the matrix $A$ in the linear system $Ax=b$ has a special structure. In this lecture, we are interested in the case when $A$ is what is known as **symmetric and positive definite** (SPD). 

A matrix $A$ is symmetric positive definite if

1) $A = A^{T}$ (that is, $A$ is symmetric), and

2) $x^{T}Ax > 0$ for all nonzero vectors $x$
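
As a small illustration of the definition, the following sketch checks both conditions for a $3\times 3$ matrix chosen here for illustration (it does not come from the lecture). Testing a few vectors can only spot-check condition 2, since the condition must hold for *all* nonzero $x$; the call to `isposdef` from `LinearAlgebra` gives a full check.

```julia
using LinearAlgebra

# A small symmetric matrix chosen for illustration (an assumption, not from the lecture).
A = [ 2.0 -1.0  0.0;
     -1.0  2.0 -1.0;
      0.0 -1.0  2.0]

# Condition 1: A equals its transpose.
sym = issymmetric(A)

# Condition 2 (spot check only): x'Ax for a few nonzero vectors.
xs = [[1.0, 0.0, 0.0], [1.0, 1.0, 1.0], [1.0, -2.0, 3.0]]
quadforms = [x' * A * x for x in xs]

println("symmetric:       ", sym)
println("quadratic forms: ", quadforms)
println("isposdef(A):     ", isposdef(A))
```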

Symmetric positive definite matrices satisfy some interesting and useful properties. We state the following theorem but leave the proofs of each part as either an exercise or as part of a linear algebra course. 

If $A$ is SPD, then

i) all the diagonal entries of $A$ are positive, 

ii) all the eigenvalues of $A$ are positive,

iii) the determinant of $A$ is positive, and

iv) every submatrix $B$ of $A$ obtained by deleting any set of rows and the corresponding set of columns from $A$ is SPD. 
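
The four properties can be checked numerically on a concrete case. The sketch below uses a $3\times 3$ SPD matrix chosen for illustration (an assumption, not from the lecture) and, for property iv), deletes row 2 and column 2. A numerical check on one matrix is of course not a proof.

```julia
using LinearAlgebra

# Illustrative SPD matrix (its leading principal minors are 4, 11, and 43).
A = [4.0 1.0 2.0;
     1.0 3.0 0.0;
     2.0 0.0 5.0]

# i) positive diagonal entries
diag_positive = all(diag(A) .> 0)

# ii) positive eigenvalues (Symmetric tells Julia to use a symmetric eigensolver)
eigs_positive = all(eigvals(Symmetric(A)) .> 0)

# iii) positive determinant
det_positive = det(A) > 0

# iv) delete row 2 and the corresponding column 2, then test the submatrix
keep = [1, 3]
B = A[keep, keep]
sub_is_spd = isposdef(B)

println((diag_positive, eigs_positive, det_positive, sub_is_spd))
```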

Our major result is the following: 

> If $A$ is SPD, then there is a unique lower triangular matrix $L$ with positive diagonal entries that satisfies $A=LL^{T}$. 

The factorization $A=LL^{T}$ is called **Cholesky** factorization. 

We will give a proof of the existence of the Cholesky factorization. The proof proceeds by mathematical induction. 

**Question:** What is a proof by induction? 

### Proof of Existence of Cholesky Factorization

We begin with the base case, that is, when $A$ is a $1\times 1$ SPD matrix so that $A=\alpha$ where $\alpha > 0$. In this case, take $L=\sqrt{\alpha}$ so $L^{T}=\sqrt{\alpha}$ and obviously $LL^{T} = A$. 

Now we proceed with the induction step. Our induction hypothesis is that for all $n \leq N-1$, if $A$ is an $n\times n$ SPD matrix then $A$ possesses a Cholesky factorization. We will show that if $A$ is an $N\times N$ SPD matrix, then there is a unique lower triangular matrix $L$ with positive diagonal entries that satisfies $A=LL^{T}$. 

Observe that we may write $A$ as

$$A = \left[\begin{array}{@{}c|c@{}} A_{N-1} & b \\ \hline \\
b^{T} & a_{NN}\end{array}\right].$$

Now, we use the fact stated earlier that if $A$ is SPD, then every submatrix $B$ of $A$ obtained by deleting any set of rows and the corresponding set of columns from $A$ is SPD. In particular, deleting the last row and the last column tells us that $A_{N-1}$ is SPD and of size $(N-1)\times(N-1)$. Therefore, by the induction hypothesis, we have a Cholesky factorization $A_{N-1}=L_{N-1}L_{N-1}^{T}$. 

Thus, we will look for a matrix $L$ of the form

$$L = \left[\begin{array}{@{}c|c@{}} L_{N-1} & {\bf 0} \\ \hline \\
c^{T} & \alpha\end{array}\right],$$

that satisfies $LL^{T} = A$, that is

$$\left[\begin{array}{@{}c|c@{}} L_{N-1} & {\bf 0} \\ \hline \\
c^{T} & \alpha\end{array}\right]\left[\begin{array}{@{}c|c@{}} L_{N-1}^{T} & c \\ \hline \\
{\bf 0} & \alpha\end{array}\right] = \left[\begin{array}{@{}c|c@{}} A_{N-1} & b \\ \hline \\
b^{T} & a_{NN}\end{array}\right].$$

Computing the block matrix multiplication corresponding to $LL^{T}$ gives

$$LL^{T} = \left[\begin{array}{@{}c|c@{}} L_{N-1}L_{N-1}^{T} & L_{N-1}c \\ \hline \\
c^{T}L_{N-1}^{T} & c^{T}c + \alpha^2\end{array}\right] = \left[\begin{array}{@{}c|c@{}} A_{N-1} & b \\ \hline \\
b^{T} &  a_{NN}\end{array}\right].$$

Thus, we have the existence of a Cholesky factorization for $A$ provided

i) $L_{N-1}c = b$ has a unique solution, and

ii) $c^{T}c + \alpha^{2} = a_{NN}$ has a positive solution $\alpha$. 

Now, since $L_{N-1}$ is a lower triangular matrix with positive diagonal entries (by the induction hypothesis), $L_{N-1}c = b$ has a unique solution (which can be computed by forward substitution). Furthermore, $c^{T}c + \alpha^{2} = a_{NN}$ will have a positive solution $\alpha = \sqrt{a_{NN} - c^{T}c}$ provided $\alpha^{2} = a_{NN} - c^{T}c > 0$. We show now that this is the case. 

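Set $\alpha^{2} = a_{NN} - c^{T}c$, allowing $\alpha$ to be complex for the moment. With $L$ as just constructed, the block identity $A=LL^{T}$ holds, so $0 < \det(A) = \det(LL^{T}) = \det(L)\det(L^{T}) = \det(L)^{2}$. Now by the block triangular structure of $L$, we have that $\det(L) = \det(L_{N-1})\alpha$, so $0 < \det(L_{N-1})^{2}\alpha^{2}$, and since $\det(L_{N-1})^2 > 0$ we must have $\alpha^{2} > 0$. We may therefore take $\alpha$ to be the positive square root. 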

This completes the proof of the existence of a Cholesky factorization for any SPD matrix $A$. 
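
The induction step is constructive, so it can be turned directly into code. The following is a minimal sketch under stated assumptions: the function name `cholesky_by_bordering` is hypothetical (it is not part of `MATH361Lectures.jl` or `LinearAlgebra.jl`), the input is assumed to be a real symmetric matrix, and no attempt is made at efficiency. Each pass of the loop solves $L_{m-1}c = b$ by forward substitution and then sets $\alpha = \sqrt{a_{mm} - c^{T}c}$, exactly as in the proof.

```julia
using LinearAlgebra

# Hypothetical helper: builds L one row at a time, following the induction step.
function cholesky_by_bordering(A::AbstractMatrix{<:Real})
    N = size(A, 1)
    L = zeros(float(eltype(A)), N, N)
    L[1, 1] = sqrt(A[1, 1])                       # base case: 1x1 block
    for m in 2:N
        Lprev = LowerTriangular(L[1:m-1, 1:m-1])  # factor of the leading block
        b = A[1:m-1, m]                           # last column above the diagonal
        c = Lprev \ b                             # forward substitution
        alpha2 = A[m, m] - dot(c, c)
        alpha2 > 0 || throw(ArgumentError("matrix is not positive definite"))
        L[m, 1:m-1] .= c                          # new row is [c' alpha]
        L[m, m] = sqrt(alpha2)
    end
    return L
end

# Usage on a small SPD matrix chosen for illustration.
A = [ 2.0 -1.0  0.0;
     -1.0  2.0 -1.0;
      0.0 -1.0  2.0]
L = cholesky_by_bordering(A)
println(L * L' ≈ A)
```

This row-by-row construction is not the column-oriented algorithm derived next; it is only meant to show that the proof itself describes a computation.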

The question now is, how do we actually compute the Cholesky factorization of an SPD matrix $A$? That is, what is an algorithm we can implement on a computer? Our next goal is to derive such an algorithm. 

Consider a factorization for a $3 \times 3$ SPD matrix that looks as follows:

$$A = \left[\begin{array}{ccc} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{array}\right] = \left[\begin{array}{ccc} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{array}\right] \left[\begin{array}{ccc} l_{11} & l_{21} & l_{31} \\ 0 & l_{22} & l_{32} \\ 0 & 0 & l_{33} \end{array}\right] = \left[\begin{array}{ccc} l_{11}^{2} & l_{11}l_{21} & l_{11}l_{31} \\ l_{11}l_{21} & l_{21}^2+l_{22}^2 & l_{21}l_{31}+l_{22}l_{32} \\ l_{11}l_{31} & l_{21}l_{31} + l_{22}l_{32} & l_{31}^2 + l_{32}^2 + l_{33}^2 \end{array}\right] $$

Comparing the far left matrix entries with the far right matrix entries leads to a system of equations:

$$
\begin{align}
l_{11}^2 &= a_{11}  \\
l_{11}l_{21} &= a_{21} \\
l_{11}l_{31} & = a_{31} \\
l_{21}^2 + l_{22}^2 & = a_{22} \\
l_{21}l_{31} + l_{22}l_{32} &= a_{32} \\
l_{31}^2 + l_{32}^2 + l_{33}^2 &= a_{33}
\end{align}
$$

Notice that we've ordered our equations by starting in the first column at the diagonal entry, then moving down the rows in the first column, then moving to the second column at the diagonal entry, and so on. Solving the equations in this order leads to an algorithm for Cholesky factorization. That is, we have

$$
\begin{align}
l_{11} &= \sqrt{a_{11}}  \\
l_{21} &= \frac{a_{21}}{l_{11}} \\
l_{31} & = \frac{a_{31}}{l_{11}} \\
l_{22} & = \sqrt{a_{22} - l_{21}^2} \\
l_{32} &= \frac{a_{32} - l_{21}l_{31}}{l_{22}} \\
l_{33} &= \sqrt{a_{33} - l_{31}^2 - l_{32}^2}
\end{align}
$$
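
To see the six formulas in action, the sketch below evaluates them in order for a $3\times 3$ SPD matrix chosen for illustration (an assumption, not from the lecture). The entries were picked so that every square root and quotient is an integer.

```julia
# Illustrative SPD matrix.
A = [  4.0  12.0 -16.0;
      12.0  37.0 -43.0;
     -16.0 -43.0  98.0]

# First column: diagonal entry, then the entries below it.
l11 = sqrt(A[1, 1])
l21 = A[2, 1] / l11
l31 = A[3, 1] / l11

# Second column.
l22 = sqrt(A[2, 2] - l21^2)
l32 = (A[3, 2] - l21 * l31) / l22

# Third column.
l33 = sqrt(A[3, 3] - l31^2 - l32^2)

L = [l11 0 0; l21 l22 0; l31 l32 l33]
println(L)            # rows: [2 0 0], [6 1 0], [-8 5 3]
println(L * L' == A)  # true
```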

We can describe the algorithm as follows:

Step 1: Input SPD matrix $A$.

Step 2: Initialize a matrix of the same size as $A$ to store $L$.

Step 3: Loop over columns $k=1:n$. For each column, set the diagonal entry to be

$$L_{kk} = \sqrt{A_{kk} - \sum_{j=1}^{k-1}L_{kj}^2}$$

and while in column $k$ loop over rows $i=(k+1):n$ to get entries

$$L_{ik} = \frac{A_{ik} - \sum_{j=1}^{k-1}L_{ij}L_{kj}}{L_{kk}}.$$

Step 4: Return $L$.

We can make the implementation easier if we observe that

1) $\sum_{j=1}^{k-1}L_{kj}^2$ is the dot product of the first $k-1$ entries of row $k$ with itself, and

2) $\sum_{j=1}^{k-1}L_{ij}L_{kj}$ is the dot product of the first $k-1$ entries of row $i$ of $L$ with the first $k-1$ entries of row $k$ of $L$. Both rows were filled in while processing columns $1$ through $k-1$, so these entries are already known when column $k$ is computed. 
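
The two observations can be checked on a concrete lower triangular matrix. In this sketch the matrix `L` and the index choices are illustrative assumptions; the point is only that the explicit sums agree with `dot` applied to row slices.

```julia
using LinearAlgebra

# A lower triangular matrix used only for illustration.
L = [ 2.0 0.0 0.0;
      6.0 1.0 0.0;
     -8.0 5.0 3.0]

# Observation 1 with k = 3: sum of squares of the first k-1 entries of row k.
k = 3
s_loop = sum(L[k, j]^2 for j in 1:k-1)
s_dot  = dot(L[k, 1:k-1], L[k, 1:k-1])

# Observation 2 with i = 3, k2 = 2: first k2-1 entries of row i against row k2.
i, k2 = 3, 2
t_loop = sum(L[i, j] * L[k2, j] for j in 1:k2-1)
t_dot  = dot(L[i, 1:k2-1], L[k2, 1:k2-1])

println((s_loop, s_dot))  # (89.0, 89.0)
println((t_loop, t_dot))  # (-48.0, -48.0)
```

One convenience of the slice form is that for $k=1$ the slices are empty and `dot` returns zero, so the first column needs no special case.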

We leave it as a homework exercise for you to code in Julia your own implementation of the previously described algorithm. We demonstrate Cholesky factorization by calling the implementation `chfact` from the `MATH361Lectures.jl` module and the implementation `cholesky` from `LinearAlgebra.jl`.

In [7]:
# Construct a random 5x5 matrix:
A = rand(5,5);
# Use A to obtain an SPD matrix:
A = A*A'

5×5 Matrix{Float64}:
 2.41122  1.2805    1.49452   1.73048  1.2112
 1.2805   0.84895   0.880233  0.84553  0.566008
 1.49452  0.880233  1.70211   1.4214   1.03861
 1.73048  0.84553   1.4214    1.60412  1.14013
 1.2112   0.566008  1.03861   1.14013  0.932787
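
A short justification of why this construction gives an SPD matrix, writing $B$ for the random matrix before it is overwritten (the symbol $B$ is introduced here only for this remark): the product $BB^{T}$ is symmetric, and for any vector $x$

$$
\begin{align}
(BB^{T})^{T} &= BB^{T}, \\
x^{T}BB^{T}x &= (B^{T}x)^{T}(B^{T}x) = \|B^{T}x\|_{2}^{2} \geq 0,
\end{align}
$$

with equality only when $B^{T}x = 0$. If $B$ is invertible, which is the case with probability one for a matrix of independent random entries such as the one produced by `rand`, then $B^{T}x = 0$ forces $x = 0$, so $x^{T}BB^{T}x > 0$ for all nonzero $x$.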

In [8]:
La = MATH361Lectures.chfact(A)

5×5 Matrix{Float64}:
 1.55281    0.0       0.0       0.0       0.0
 0.824635   0.411008  0.0       0.0       0.0
 0.962463   0.210587  0.855234  0.0       0.0
 1.11442   -0.178733  0.451862  0.355049  0.0
 0.780008  -0.187864  0.382866  0.18107   0.331222

In [11]:
C = cholesky(A);
Lb = C.L

5×5 LowerTriangular{Float64, Matrix{Float64}}:
 1.55281     ⋅         ⋅         ⋅         ⋅ 
 0.824635   0.411008   ⋅         ⋅         ⋅ 
 0.962463   0.210587  0.855234   ⋅         ⋅ 
 1.11442   -0.178733  0.451862  0.355049   ⋅ 
 0.780008  -0.187864  0.382866  0.18107   0.331222

Let's check our results against $A$. 

In [13]:
La*La' - A

5×5 Matrix{Float64}:
 0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0

In [14]:
Lb*Lb' - A

5×5 Matrix{Float64}:
 0.0          0.0  0.0  0.0  2.22045e-16
 0.0          0.0  0.0  0.0  0.0
 0.0          0.0  0.0  0.0  0.0
 0.0          0.0  0.0  0.0  0.0
 2.22045e-16  0.0  0.0  0.0  0.0

Do you agree or disagree that we have successfully obtained a Cholesky factorization for our test example? An interesting question is, what happens if we try to compute the Cholesky factorization of a matrix that is not SPD? You will explore this in the homework exercises. For now, let's check that the matrix $A$ we generated satisfies the properties of an SPD matrix.

In [15]:
eigen(A)

Eigen{Float64, Float64, Matrix{Float64}, Vector{Float64}}
values:
5-element Vector{Float64}:
 0.043223439787548584
 0.07221000047842457
 0.3390762969685256
 0.6045238135368313
 6.440146014170182
vectors:
5×5 Matrix{Float64}:
  0.52953     0.0669776  -0.144481  -0.596917  -0.581314
 -0.633424   -0.243152    0.522474  -0.410348  -0.313508
  0.309818    0.0510467   0.584665   0.588618  -0.461631
 -0.463809    0.584111   -0.405841   0.219978  -0.480206
 -0.0852376  -0.769805   -0.446774   0.283624  -0.346534

Note that $A$ has all positive eigenvalues. 