In [1]:
using DrWatson;
@quickactivate "MATH361Lectures"
using LinearAlgebra;
import MATH361Lectures;

# Cholesky Factorization

In many practical applications (*e.g.* numerical approximation of solutions to linear PDEs) the matrix $A$ in the linear system $Ax=b$ has a special structure. In this lecture, we are interested in the case when $A$ is what is known as **symmetric and positive definite** (SPD). 

A matrix $A$ is symmetric positive definite if

1) $A = A^{T}$ (that is, $A$ is symmetric), and

2) $x^{T}Ax > 0$ for all nonzero vectors $x$.
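As a small worked illustration of the definition (the matrix below is chosen here for illustration and does not come from the lecture), consider a symmetric $2\times 2$ matrix and expand the quadratic form:

$$A = \left[\begin{array}{cc} 2 & -1 \\ -1 & 2 \end{array}\right], \qquad x^{T}Ax = 2x_1^2 - 2x_1x_2 + 2x_2^2 = x_1^2 + x_2^2 + (x_1 - x_2)^2.$$

The right-hand side is a sum of squares that vanishes only when $x_1 = x_2 = 0$, so $x^{T}Ax > 0$ for every nonzero $x$ and this $A$ is SPD.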

Symmetric positive definite matrices satisfy some interesting and useful properties. We state the following theorem but leave the proofs of each part as either an exercise or as part of a linear algebra course. 

If $A$ is SPD, then

i) all the diagonal entries of $A$ are positive, 

ii) all the eigenvalues of $A$ are positive,

iii) the determinant of $A$ is positive, and

iv) every submatrix $B$ of $A$ obtained by deleting any set of rows and the corresponding set of columns from $A$ is SPD. 
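The four properties can be checked numerically on a small example. The following is a minimal sketch, assuming only the standard `LinearAlgebra` library; the matrix `A` and the index set `keep` are illustrative choices, not part of the lecture. A numerical check of this kind illustrates the theorem but does not prove it.

```julia
using LinearAlgebra

# Small SPD test matrix (an illustrative choice, not from the lecture).
A = [4.0 2.0 2.0;
     2.0 5.0 3.0;
     2.0 3.0 6.0]

# The definition: symmetric and positive definite.
@show issymmetric(A)
@show isposdef(A)

# i) all diagonal entries are positive
@show all(diag(A) .> 0)

# ii) all eigenvalues are positive (Symmetric guarantees real eigenvalues)
@show all(eigvals(Symmetric(A)) .> 0)

# iii) the determinant is positive
@show det(A) > 0

# iv) deleting row 2 and column 2 leaves a matrix that is again SPD
keep = [1, 3]
B = A[keep, keep]
@show isposdef(B)
```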

Our major result is the following: 

> If $A$ is SPD, then there is a unique lower triangular matrix $L$ with positive diagonal entries that satisfies $A=LL^{T}$. 

The factorization $A=LL^{T}$ is called the **Cholesky factorization** of $A$. 

We will give a proof of the existence of the Cholesky factorization. The proof will be by mathematical induction on the size of the matrix. 

**Question:** What is a proof by induction? 

### Proof of Existence of Cholesky Factorization

We begin with the base case. If $A$ is a $1\times 1$ SPD matrix, then $A=\alpha$ for some scalar $\alpha > 0$. In this case, take $L=\sqrt{\alpha}$, so that $L^{T}=\sqrt{\alpha}$ and $LL^{T} = \alpha = A$. 

Now we proceed with the induction step. Our induction hypothesis is that, for every $n \leq N-1$, every $n\times n$ SPD matrix possesses a Cholesky factorization, that is, it can be written as $LL^{T}$ with $L$ lower triangular with positive diagonal entries. We will show that this implies that every $N\times N$ SPD matrix $A$ also has such a factorization $A=LL^{T}$. 

Observe that we may write $A$ as

$$A = \left[\begin{array}{@{}c|c@{}} A_{N-1} & b \\ \hline \\
b^{T} & a_{NN}\end{array}\right].$$

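Here $A_{N-1}$ is the leading $(N-1)\times(N-1)$ block of $A$, $b$ is a vector of length $N-1$, and $a_{NN}$ is a scalar. Now, we use the fact stated earlier that if $A$ is SPD, then every submatrix $B$ of $A$ obtained by deleting any set of rows and the corresponding set of columns from $A$ is SPD. In particular, deleting the last row and the last column tells us that $A_{N-1}$ is an $(N-1)\times(N-1)$ SPD matrix. Therefore, by the induction hypothesis, we have a Cholesky factorization $A_{N-1}=L_{N-1}L_{N-1}^{T}$. 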

Thus, we will look for a matrix $L$ of the form

$$L = \left[\begin{array}{@{}c|c@{}} L_{N-1} & {\bf 0} \\ \hline \\
c^{T} & \alpha\end{array}\right],$$

where the vector $c$ of length $N-1$ and the scalar $\alpha$ are to be determined so that $LL^{T} = A$, that is

$$\left[\begin{array}{@{}c|c@{}} L_{N-1} & {\bf 0} \\ \hline \\
c^{T} & \alpha\end{array}\right]\left[\begin{array}{@{}c|c@{}} L_{N-1}^{T} & c \\ \hline \\
{\bf 0} & \alpha\end{array}\right] = \left[\begin{array}{@{}c|c@{}} A_{N-1} & b \\ \hline \\
b^{T} & a_{NN}\end{array}\right].$$

Computing the block matrix multiplication corresponding to $LL^{T}$ gives

$$LL^{T} = \left[\begin{array}{@{}c|c@{}} L_{N-1}L_{N-1}^{T} & L_{N-1}c \\ \hline \\
c^{T}L_{N-1}^{T} & c^{T}c + \alpha^2\end{array}\right] = \left[\begin{array}{@{}c|c@{}} A_{N-1} & b \\ \hline \\
b^{T} &  a_{NN}\end{array}\right].$$

Thus, we have the existence of a Cholesky factorization for $A$ provided

i) $L_{N-1}c = b$ has a unique solution, and

ii) $c^{T}c + \alpha^{2} = a_{NN}$ has a positive solution $\alpha$. 

Now, since $L_{N-1}$ is a lower triangular matrix with positive diagonal entries (by the induction hypothesis), it is invertible, so $L_{N-1}c = b$ has a unique solution (which can be computed by forward substitution). Furthermore, $c^{T}c + \alpha^{2} = a_{NN}$ will have a positive solution $\alpha = \sqrt{a_{NN} - c^{T}c}$ provided $a_{NN} - c^{T}c > 0$. We now demonstrate that this is the case. 

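Let $\alpha$ be any (possibly complex) number satisfying $\alpha^{2} = a_{NN} - c^{T}c$. With this $\alpha$ and with $L$ as just constructed, the block computation above gives $A=LL^{T}$, so $0 < \det(A) = \det(LL^{T}) = \det(L)\det(L^{T}) = \det(L)^{2}$. Now by the structure of $L$, we have that $\det(L) = \det(L_{N-1})\alpha$, so $0 < \det(L_{N-1})^{2}\alpha^{2}$, and since $\det(L_{N-1})^2 > 0$ we must have $\alpha^{2} = a_{NN} - c^{T}c > 0$. Hence we may take $\alpha = \sqrt{a_{NN} - c^{T}c} > 0$. 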

This completes the proof of the existence of a Cholesky factorization for any SPD matrix $A$. 
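To see the induction step in action, here is a small worked example (the matrix is an illustrative choice, not part of the lecture). Take $N=2$ and

$$A = \left[\begin{array}{cc} 4 & 2 \\ 2 & 5 \end{array}\right], \qquad A_{1} = 4, \quad b = 2, \quad a_{22} = 5.$$

The base case gives $L_{1} = \sqrt{4} = 2$. Solving $L_{1}c = b$ gives $c = 1$, and then $\alpha = \sqrt{a_{22} - c^{T}c} = \sqrt{5 - 1} = 2$. Therefore

$$L = \left[\begin{array}{cc} 2 & 0 \\ 1 & 2 \end{array}\right], \qquad LL^{T} = \left[\begin{array}{cc} 2 & 0 \\ 1 & 2 \end{array}\right]\left[\begin{array}{cc} 2 & 1 \\ 0 & 2 \end{array}\right] = \left[\begin{array}{cc} 4 & 2 \\ 2 & 5 \end{array}\right] = A.$$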

The question now is: how do we actually compute the Cholesky factorization of an SPD matrix $A$? That is, what is an algorithm we can implement on a computer? Our next goal is to derive such an algorithm. 

Consider a factorization for a $3 \times 3$ SPD matrix that looks as follows:

$$A = \left[\begin{array}{ccc} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{array}\right] = \left[\begin{array}{ccc} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{array}\right] \left[\begin{array}{ccc} l_{11} & l_{21} & l_{31} \\ 0 & l_{22} & l_{32} \\ 0 & 0 & l_{33} \end{array}\right] = \left[\begin{array}{ccc} l_{11}^{2} & l_{11}l_{21} & l_{11}l_{31} \\ l_{11}l_{21} & l_{21}^2+l_{22}^2 & l_{21}l_{31}+l_{22}l_{32} \\ l_{11}l_{31} & l_{21}l_{31} + l_{22}l_{32} & l_{31}^2 + l_{32}^2 + l_{33}^2 \end{array}\right] $$

Comparing the far left matrix entries with the far right matrix entries leads to a system of equations:

$$
\begin{align}
l_{11}^2 &= a_{11}  \\
l_{11}l_{21} &= a_{21} \\
l_{11}l_{31} & = a_{31} \\
l_{21}^2 + l_{22}^2 & = a_{22} \\
l_{21}l_{31} + l_{22}l_{32} &= a_{32} \\
l_{31}^2 + l_{32}^2 + l_{33}^2 &= a_{33}
\end{align}
$$

Notice that we've ordered our equations by starting in the first column at the diagonal entry, then moving down the rows in the first column, then moving to the second column at the diagonal entry, and so on. Solving the equations in this order leads to an algorithm for Cholesky factorization. That is, we have

$$
\begin{align}
l_{11} &= \sqrt{a_{11}}  \\
l_{21} &= \frac{a_{21}}{l_{11}} \\
l_{31} & = \frac{a_{31}}{l_{11}} \\
l_{22} & = \sqrt{a_{22} - l_{21}^2} \\
l_{32} &= \frac{a_{32} - l_{21}l_{31}}{l_{22}} \\
l_{33} &= \sqrt{a_{33} - l_{31}^2 - l_{32}^2}
\end{align}
$$
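As a numerical illustration of these six formulas (the matrix is an illustrative choice, not part of the lecture), take

$$A = \left[\begin{array}{ccc} 4 & 2 & 2 \\ 2 & 5 & 3 \\ 2 & 3 & 6 \end{array}\right].$$

Working through the formulas in order gives

$$
\begin{align}
l_{11} &= \sqrt{4} = 2 \\
l_{21} &= \frac{2}{2} = 1 \\
l_{31} &= \frac{2}{2} = 1 \\
l_{22} &= \sqrt{5 - 1^2} = 2 \\
l_{32} &= \frac{3 - 1\cdot 1}{2} = 1 \\
l_{33} &= \sqrt{6 - 1^2 - 1^2} = 2
\end{align}
$$

so that

$$L = \left[\begin{array}{ccc} 2 & 0 & 0 \\ 1 & 2 & 0 \\ 1 & 1 & 2 \end{array}\right],$$

and multiplying out $LL^{T}$ recovers $A$.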

We can describe the algorithm as follows:

Step 1: Input SPD matrix $A$.

Step 2: Initialize a matrix of the same size as $A$ to store $L$.

Step 3: Loop over columns $k=1:n$. For each column $k$, set the diagonal entry to be

$$L_{kk} = \sqrt{A_{kk} - \sum_{j=1}^{k-1}L_{kj}^2}$$

and, while still in column $k$, loop over rows $i=(k+1):n$ to compute the entries

$$L_{ik} = \frac{A_{ik} - \sum_{j=1}^{k-1}L_{ij}L_{kj}}{L_{kk}}.$$

Step 4: Return $L$.

We can make the implementation easier if we observe that

1) $\sum_{j=1}^{k-1}L_{kj}^2$ is the dot product of the first $k-1$ entries of row $k$ of $L$ with itself, and

2) $\sum_{j=1}^{k-1}L_{ij}L_{kj}$ is the dot product of the first $k-1$ entries of row $i$ of $L$ with the first $k-1$ entries of row $k$ of $L$ (equivalently, with the first $k-1$ entries of column $k$ of $L^{T}$). 
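The two observations can be checked on a known factor without writing the full algorithm. The sketch below assumes the standard `LinearAlgebra` library and uses its `cholesky` function to supply a reference $L$; the matrix `A` and the indices `k` and `i` are illustrative choices. It only verifies the formulas for one pair of indices and is not a substitute for the loop described in Step 3.

```julia
using LinearAlgebra

# Illustrative SPD matrix (not from the lecture).
A = [4.0 2.0 2.0;
     2.0 5.0 3.0;
     2.0 3.0 6.0]

# Reference factor from LinearAlgebra, converted to a plain matrix.
L = Matrix(cholesky(A).L)

k, i = 2, 3   # one column k and one row i > k

# Observation 1: the sum in the diagonal formula is a dot product of a row slice with itself.
s1 = dot(L[k, 1:k-1], L[k, 1:k-1])
@show L[k, k] ≈ sqrt(A[k, k] - s1)

# Observation 2: the sum in the off-diagonal formula is a dot product of two row slices.
s2 = dot(L[i, 1:k-1], L[k, 1:k-1])
@show L[i, k] ≈ (A[i, k] - s2) / L[k, k]
```

When $k=1$ the range `1:k-1` is empty and `dot` of two empty `Float64` vectors is zero, which matches the empty sums in the formulas.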

We leave it as a homework exercise for you to code in Julia your own implementation of the previously described algorithm. We demonstrate Cholesky factorization by calling the implementation `chfact` from the `MATH361Lectures.jl` module and the implementation `cholesky` from `LinearAlgebra.jl`.

In [2]:
# Construct a random 5x5 matrix:
A = rand(5,5);
# A*A' is always symmetric; it is positive definite whenever A is invertible,
# which holds with probability 1 for a random matrix:
A = A*A'

5×5 Matrix{Float64}:
 1.28795  1.25174  1.60007  1.18293  1.42488
 1.25174  1.6054   1.827    1.64017  1.41157
 1.60007  1.827    2.3785   1.95592  2.0428
 1.18293  1.64017  1.95592  2.1154   1.60528
 1.42488  1.41157  2.0428   1.60528  2.10498

In [3]:
eigvals(A)

5-element Vector{Float64}:
 0.03959190740509841
 0.07999628810651525
 0.3358852440766218
 0.6355436955441254
 8.401207394974998

In [4]:
La = MATH361Lectures.chfact(A)

5×5 Matrix{Float64}:
 1.13488  0.0        0.0       0.0       0.0
 1.10297  0.623581   0.0       0.0       0.0
 1.4099   0.436053   0.447806  0.0       0.0
 1.04234  0.786581   0.320056  0.554771  0.0
 1.25554  0.0428938  0.566996  0.146676  0.428679

In [5]:
C = cholesky(A);
Lb = C.L

5×5 LowerTriangular{Float64, Matrix{Float64}}:
 1.13488   ⋅          ⋅         ⋅         ⋅ 
 1.10297  0.623581    ⋅         ⋅         ⋅ 
 1.4099   0.436053   0.447806   ⋅         ⋅ 
 1.04234  0.786581   0.320056  0.554771   ⋅ 
 1.25554  0.0428938  0.566996  0.146676  0.428679

Let's check our results against $A$. 

In [6]:
La*La' - A

5×5 Matrix{Float64}:
 2.22045e-16  0.0  0.0  0.0  0.0
 0.0          0.0  0.0  0.0  0.0
 0.0          0.0  0.0  0.0  0.0
 0.0          0.0  0.0  0.0  0.0
 0.0          0.0  0.0  0.0  0.0

In [7]:
Lb*Lb' - A

5×5 Matrix{Float64}:
 2.22045e-16  0.0  0.0  0.0  0.0
 0.0          0.0  0.0  0.0  0.0
 0.0          0.0  0.0  0.0  0.0
 0.0          0.0  0.0  0.0  0.0
 0.0          0.0  0.0  0.0  0.0

Do you agree or disagree that we have successfully obtained a Cholesky factorization for our test example? An interesting question is, what happens if we try to compute the Cholesky factorization of a matrix that is not SPD? You will explore this in the homework exercises. For now, let's check that the matrix $A$ we generated satisfies the properties of an SPD matrix.

In [8]:
eigen(A)

Eigen{Float64, Float64, Matrix{Float64}, Vector{Float64}}
values:
5-element Vector{Float64}:
 0.03959190740509846
 0.07999628810651524
 0.33588524407662124
 0.6355436955441245
 8.401207394974998
vectors:
5×5 Matrix{Float64}:
 -0.241251   0.65842    -0.507256  -0.348644   -0.35975
  0.661944  -0.208908   -0.502929   0.306351   -0.414003
 -0.565127  -0.620765   -0.104905  -0.0856245  -0.526256
 -0.160831   0.370502    0.42008    0.671726   -0.45736
  0.397985   0.0148415   0.549801  -0.571007   -0.461581

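Note that all the eigenvalues of $A$ are positive, as expected for an SPD matrix. 

We may use Cholesky factorization to solve a linear system $Ax=b$ whenever $A$ is SPD. Since $Ax = L(L^{T}x) = b$, setting $y = L^{T}x$ splits the problem into two triangular solves: 

1) Factorize $A = LL^T$, 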

2) Solve $Ly=b$ via forward substitution, and

3) Solve $L^T x = y$ via backward substitution. 
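The three steps can also be sketched using only the standard `LinearAlgebra` library, where backslash applied to a triangular factor performs the corresponding substitution. The matrix `A` and right-hand side `b` below are illustrative choices, and this is one possible way to carry out the steps rather than the implementation used in `MATH361Lectures.jl`.

```julia
using LinearAlgebra

# Illustrative SPD matrix and right-hand side (not from the lecture).
A = [4.0 2.0 2.0;
     2.0 5.0 3.0;
     2.0 3.0 6.0]
b = [1.0, 1.0, 1.0]

C = cholesky(A)   # step 1: factorize A = L * L'
y = C.L \ b       # step 2: solve L y = b (forward substitution)
x = C.U \ y       # step 3: solve L' x = y (backward substitution); C.U is L'

# Compare with backslash on A and on the factorization object itself.
@show x ≈ A \ b
@show x ≈ C \ b
```

Applying backslash directly to the factorization object `C` performs steps 2 and 3 in a single call.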

In [9]:
# Take A as before
A

5×5 Matrix{Float64}:
 1.28795  1.25174  1.60007  1.18293  1.42488
 1.25174  1.6054   1.827    1.64017  1.41157
 1.60007  1.827    2.3785   1.95592  2.0428
 1.18293  1.64017  1.95592  2.1154   1.60528
 1.42488  1.41157  2.0428   1.60528  2.10498

In [10]:
# Choose a right hand side vector
b = [1.0, 1.0, 1.0, 1.0, 1.0]

5-element Vector{Float64}:
 1.0
 1.0
 1.0
 1.0
 1.0

In [11]:
# Obtain Cholesky factorization
La = MATH361Lectures.chfact(A);
# Do forward substitution
y = MATH361Lectures.forwardsub(La,b);
# Do backward substitution
x1 = MATH361Lectures.backsub(La',y)

5-element Vector{Float64}:
  1.5263518666581217
  1.3047703270239335
 -2.796779415653074
  0.5253532975460999
  0.8804205440646654

In [12]:
# Check answer
A*x1

5-element Vector{Float64}:
 1.0
 1.0
 0.9999999999999997
 1.0000000000000004
 0.9999999999999996

In [13]:
# Compare with the factor Lb from Julia's cholesky (and, further below, with backslash)
# Do forward substitution
y = MATH361Lectures.forwardsub(Lb,b);
# Do backward substitution
x2 = MATH361Lectures.backsub(Lb',y)

5-element Vector{Float64}:
  1.526351866658122
  1.3047703270239321
 -2.796779415653073
  0.5253532975461003
  0.8804205440646649

In [14]:
A*x2

5-element Vector{Float64}:
 0.9999999999999996
 0.9999999999999998
 0.9999999999999996
 0.9999999999999998
 0.9999999999999998

In [15]:
x3 = A \ b

5-element Vector{Float64}:
  1.5263518666581186
  1.3047703270239308
 -2.7967794156530665
  0.525353297546098
  0.8804205440646633

In [16]:
x1 ≈ x2

true

In [17]:
x1 ≈ x3

true

In [18]:
x2 ≈ x3

true

Starting in the next lecture, we examine the problem of solving a linear system $Ax = b$ where $A$ is a rectangular $m \times n$ matrix with $m > n$. In order to do this, we will use another matrix factorization known as $QR$ factorization. In preparation for this, it is recommended that you watch the following videos on [fitting data by least squares](https://youtu.be/F6RN_X5-sFU), [the normal equation](https://youtu.be/_lQHgJOuy90), and [QR factorization](https://youtu.be/9iA8P1mg170). 