In [1]:
import numpy as np
import matplotlib.pyplot as plt

# for pretty printing
np.set_printoptions(4, linewidth=100, suppress=True)

----

## Chapter 7

### Cholesky Decomposition

For a symmetric positive definite matrix $A \in \mathbb{R}^{n\times n}$, the Cholesky decomposition $A = R^\top R$ is a factorization of $A$ into a product of an upper triangular matrix $R$ and its transpose.
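
As a small sanity check of the definition, the following sketch verifies a hand-picked $2 \times 2$ example. The matrices `A` and `R` below are illustrative choices that do not appear elsewhere in this chapter.

```python
import numpy as np

# A hand-picked symmetric positive definite matrix (illustrative assumption).
A = np.array([[4.0, 2.0],
              [2.0, 5.0]])

# A candidate upper triangular factor with positive diagonal entries.
R = np.array([[2.0, 1.0],
              [0.0, 2.0]])

# If R is a Cholesky factor of A, then R^T R must reproduce A.
product = R.T @ R
is_factorization = np.allclose(product, A)
print(product)
print(is_factorization)
```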



Perhaps the most difficult thing about the Cholesky decomposition is how to pronounce the name *Cholesky*. The pronunciation of the *-olesky* part is quite evident; the problem is the leading consonant *Ch-*.

The English Wikipedia argues that the pronunciation should be something like *show-less-key*.

The [Korean Mathematical Society Dictionary of Mathematics](https://www.kms.or.kr/mathdict/list.html) does not have an entry with the word *Cholesky* in it. (I mean, really...)

The [Korean Statistical Society Dictionary of Statistics](https://kss.or.kr/homepage/custom/statistics) states that 콜레스키(*/kol-les-key/*) is the correct Korean transliteration of *Cholesky*.

The Japanese and Chinese transliterations of the name *Cholesky* are コレスキー(*/ko-re-su-ki/*) and 科列斯基(roughly */kuh-ly'eh-soo-jee/*), respectively.

We will leave it to you to decide for yourself.

&nbsp;

The second most difficult thing might be how one can actually compute a Cholesky decomposition given a positive definite matrix.

To at least overcome the second largest difficulty, let us try implementing a Cholesky factorization from scratch.

The idea is similar to Gaussian elimination: first we eliminate all entries in the first column except the $(1, 1)$-entry $a_{11}$ by multiplying by an appropriate lower triangular matrix (and, by symmetry, the entries in the first row by multiplying by its transpose).

&nbsp;

As $A$ is symmetric, there exists a vector $\mathbf{v} \in \mathbb{R}^{n-1}$ and a symmetric matrix $B_1 \in\mathbb{R}^{(n-1)\times (n-1)}$ such that
\begin{align*}
A = \begin{bmatrix}
a_{11} & \mathbf{v}^\top \\
\mathbf{v} & B_1
\end{bmatrix}.
\end{align*}

By a direct computation one can verify that
\begin{align*}
A = \underbrace{\begin{bmatrix}
\sqrt{a_{11}} & \mathbf{0} \\
\frac{1}{\sqrt{a_{11}}}\mathbf{v} & I
\end{bmatrix}}_{R_1^\top} \
 \begin{bmatrix}
1 & \mathbf{0}^\top \\
\mathbf{0} & B_1 - \frac{1}{a_{11}}\mathbf{v}\mathbf{v}^\top
\end{bmatrix}   \
\underbrace{\begin{bmatrix}
\sqrt{a_{11}} & \frac{1}{\sqrt{a_{11}}}\mathbf{v}^\top \\
\mathbf{0} & I
\end{bmatrix}}_{R_1}.
\end{align*}

Hence we can now recursively move on to a Cholesky factorization of the smaller matrix $B_1 - \frac{1}{a_{11}}\mathbf{v}\mathbf{v}^\top$.

For convenience, let us denote $A_1 = B_1 - \frac{1}{a_{11}}\mathbf{v}\mathbf{v}^\top$.
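
The "direct computation" above can also be checked numerically. The sketch below builds $R_1$ and the middle factor for a randomly generated test matrix and confirms that their product gives back $A$. The seed, the size `n = 5`, the shift by `n * np.eye(n)`, and the names `R1` and `M` are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed for this sketch
n = 5
C = rng.random((n, n))
A = C.T @ C + n * np.eye(n)      # the shift keeps A safely positive definite

a11 = A[0, 0]
v = A[1:, 0]                     # entries of the first column below a11
B1 = A[1:, 1:]

# R_1: first row is [sqrt(a11), v^T / sqrt(a11)], identity elsewhere.
R1 = np.eye(n)
R1[0, 0] = np.sqrt(a11)
R1[0, 1:] = v / np.sqrt(a11)

# Middle factor: 1 in the corner, A_1 = B_1 - v v^T / a11 in the lower-right block.
A1 = B1 - np.outer(v, v) / a11
M = np.eye(n)
M[1:, 1:] = A1

identity_holds = np.allclose(R1.T @ M @ R1, A)
print(identity_holds)
```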

> But to apply a Cholesky factorization to $A_1 = B_1 - \frac{1}{a_{11}}\mathbf{v}\mathbf{v}^\top$, the matrix $A_1$ must itself be positive definite.
>
>It turns out that $A_1$ is indeed positive definite: as $R_1$ is an upper triangular matrix with positive diagonal entries, $R_1$ is invertible, and thus
\begin{align*}
(R_1^\top)^{-1} A R_1^{-1}=  
 \begin{bmatrix}
1 & \mathbf{0}^\top \\
\mathbf{0} & A_1
\end{bmatrix}    
\end{align*}  
> is positive definite. (Why?)
>
> Therefore, $A_1 = B_1 - \frac{1}{a_{11}}\mathbf{v}\mathbf{v}^\top$ must be positive definite. (Why?)
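
The claim in the note above can be observed numerically (this is not a proof, only a check on one example). The sketch below compares the smallest eigenvalues of a randomly generated positive definite matrix and of the corresponding $A_1$. The seed, the size, and the diagonal shift are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed for this sketch
n = 6
C = rng.random((n, n))
A = C.T @ C + n * np.eye(n)      # the shift keeps A safely positive definite

a11 = A[0, 0]
v = A[1:, 0]
A1 = A[1:, 1:] - np.outer(v, v) / a11

# eigvalsh is intended for symmetric matrices and returns
# real eigenvalues in ascending order.
eigs_A = np.linalg.eigvalsh(A)
eigs_A1 = np.linalg.eigvalsh(A1)

print("smallest eigenvalue of A  :", eigs_A[0])
print("smallest eigenvalue of A_1:", eigs_A1[0])
```

Both printed values should be positive, which is consistent with $A$ and $A_1$ being positive definite.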



In [2]:
rng = np.random.default_rng(1)

n = 8
C = rng.random((n, n))
S = C.T @ C

In [3]:
S

array([[3.0601, 3.1416, 2.6488, 2.3415, 1.4219, 2.4014, 2.3462, 1.8511],
       [3.1416, 4.3358, 2.9287, 3.2087, 1.8643, 2.7488, 2.87  , 2.2581],
       [2.6488, 2.9287, 2.8854, 2.4135, 1.4264, 2.4177, 2.007 , 1.8344],
       [2.3415, 3.2087, 2.4135, 3.0863, 1.2495, 2.4203, 2.2281, 1.5786],
       [1.4219, 1.8643, 1.4264, 1.2495, 1.6136, 1.3627, 1.4659, 1.8795],
       [2.4014, 2.7488, 2.4177, 2.4203, 1.3627, 3.003 , 1.8872, 1.4053],
       [2.3462, 2.87  , 2.007 , 2.2281, 1.4659, 1.8872, 2.2043, 1.9057],
       [1.8511, 2.2581, 1.8344, 1.5786, 1.8795, 1.4053, 1.9057, 2.4442]])

So the first step of the Cholesky decomposition would look as follows.

In [4]:
A = np.copy(S)
R = np.zeros(np.shape(A))

R[0, 0] = np.sqrt(A[0, 0])
R[0, 1:] = A[0, 1:] / np.sqrt(A[0, 0])

A[1:, 1:] = A[1:, 1:] - np.outer(A[0, 1:], A[0, 1:]) / A[0, 0]

A[0, 0] = 1   # These three lines are not necessary,
A[0, 1:] = 0  # because we will never revisit the first
A[1:, 0] = 0  # row and column of *A*. Details below.


In [5]:
print("A after first step = ")
print(A)
print()
print("R_1 = ")
print(R)

A after first step = 
[[ 1.      0.      0.      0.      0.      0.      0.      0.    ]
 [ 0.      1.1106  0.2094  0.8049  0.4046  0.2835  0.4614  0.3577]
 [ 0.      0.2094  0.5926  0.3867  0.1956  0.3391 -0.0238  0.2321]
 [ 0.      0.8049  0.3867  1.2947  0.1615  0.5828  0.4329  0.1621]
 [ 0.      0.4046  0.1956  0.1615  0.9529  0.2469  0.3758  1.0193]
 [ 0.      0.2835  0.3391  0.5828  0.2469  1.1186  0.0461 -0.0473]
 [ 0.      0.4614 -0.0238  0.4329  0.3758  0.0461  0.4055  0.4864]
 [ 0.      0.3577  0.2321  0.1621  1.0193 -0.0473  0.4864  1.3245]]

R_1 = 
[[1.7493 1.7959 1.5142 1.3385 0.8129 1.3728 1.3412 1.0582]
 [0.     0.     0.     0.     0.     0.     0.     0.    ]
 [0.     0.     0.     0.     0.     0.     0.     0.    ]
 [0.     0.     0.     0.     0.     0.     0.     0.    ]
 [0.     0.     0.     0.     0.     0.     0.     0.    ]
 [0.     0.     0.     0.     0.     0.     0.     0.    ]
 [0.     0.     0.     0.     0.     0.     0.     0.    ]
 [0.     0.     0.     0.     0.     0.     0.     0.    ]]

Now suppose we (somehow) obtained the next Cholesky decomposition $A_1 = P^\top P$. Then we would have
\begin{align*}
A &= \begin{bmatrix}
\sqrt{a_{11}} & \mathbf{0} \\
\frac{1}{\sqrt{a_{11}}}\mathbf{v} & I
\end{bmatrix}  
 \begin{bmatrix}
1 & \mathbf{0}^\top \\
\mathbf{0} & A_1
\end{bmatrix}   
 \begin{bmatrix}
\sqrt{a_{11}} & \frac{1}{\sqrt{a_{11}}}\mathbf{v}^\top \\
\mathbf{0} & I
\end{bmatrix}  \\[5pt]
&= \begin{bmatrix}
\sqrt{a_{11}} & \mathbf{0} \\
\frac{1}{\sqrt{a_{11}}}\mathbf{v} & I
\end{bmatrix}  
 \begin{bmatrix}
1 & \mathbf{0}^\top \\
\mathbf{0} & P^\top
\end{bmatrix}  
 \begin{bmatrix}
1 & \mathbf{0}^\top \\
\mathbf{0} & P
\end{bmatrix}   
 \begin{bmatrix}
\sqrt{a_{11}} & \frac{1}{\sqrt{a_{11}}}\mathbf{v}^\top \\
\mathbf{0} & I
\end{bmatrix}  \\[5pt]
&= \begin{bmatrix}
\sqrt{a_{11}} & \mathbf{0} \\
\frac{1}{\sqrt{a_{11}}}\mathbf{v} & P^\top
\end{bmatrix}     
 \begin{bmatrix}
\sqrt{a_{11}} & \frac{1}{\sqrt{a_{11}}}\mathbf{v}^\top \\
\mathbf{0} & P
\end{bmatrix} .
\end{align*}





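In other words, the first row of $R_1$ is already the first row of the final factor, and the factor $P$ of $A_1$ fills in the remaining lower-right block. So we can keep writing the subsequent results into the remaining part of the same array, while leaving the first row as is.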
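
To see this assembly in action, the sketch below obtains $P$ "somehow", namely from `numpy.linalg.cholesky`, which returns a lower triangular factor $L$ with $A_1 = LL^\top$, so that $P = L^\top$. It then places $P$ below the first row and checks that the result is a Cholesky factor of $A$. The test matrix, the seed, and the size are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed for this sketch
n = 5
C = rng.random((n, n))
A = C.T @ C + n * np.eye(n)      # the shift keeps A safely positive definite

a11 = A[0, 0]
v = A[1:, 0]
A1 = A[1:, 1:] - np.outer(v, v) / a11

# Upper triangular factor of the smaller matrix: A_1 = P^T P.
P = np.linalg.cholesky(A1).T

# Assemble R: first row from the first elimination step, P in the remaining block.
R = np.zeros((n, n))
R[0, 0] = np.sqrt(a11)
R[0, 1:] = v / np.sqrt(a11)
R[1:, 1:] = P

assembled_ok = np.allclose(R.T @ R, A)
print(assembled_ok)
```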

The next step to take is clear: we do the exact same thing we did with $A$ on $A_1$. For the vector $\mathbf{w} \in \mathbb{R}^{n-2}$ and the symmetric matrix $B_2 \in\mathbb{R}^{(n-2)\times (n-2)}$ that satisfy
\begin{align*}
A_1 = \begin{bmatrix}
a_{22} & \mathbf{w}^\top \\
\mathbf{w} & B_2
\end{bmatrix},
\end{align*}
we focus on the fact that
\begin{align*}
A_1 = \underbrace{\begin{bmatrix}
\sqrt{a_{22}} & \mathbf{0} \\
\frac{1}{\sqrt{a_{22}}}\mathbf{w} & I
\end{bmatrix}}_{R_2^\top} \
 \begin{bmatrix}
1 & \mathbf{0}^\top \\
\mathbf{0} & B_2 - \frac{1}{a_{22}}\mathbf{w}\mathbf{w}^\top
\end{bmatrix}   \
\underbrace{\begin{bmatrix}
\sqrt{a_{22}} & \frac{1}{\sqrt{a_{22}}}\mathbf{w}^\top \\
\mathbf{0} & I
\end{bmatrix}}_{R_2}.
\end{align*}
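Be aware that $a_{22}$ denotes the $(1, 1)$-entry of $A_1$, which is not necessarily the original $(2, 2)$-entry of $A$. In the running example, the original $(2, 2)$-entry of `S` is $4.3358$, whereas $a_{22} = 1.1106$ after the first step.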

In [6]:
R[1, 1] = np.sqrt(A[1, 1])
R[1, 2:] = A[1, 2:] / np.sqrt(A[1, 1])

A[2:, 2:] = A[2:, 2:] - np.outer(A[1, 2:], A[1, 2:]) / A[1, 1]

A[1, 1] = 1   # Again, these three lines are not necessary,
A[1, 2:] = 0  # because we will never revisit the second
A[2:, 1] = 0  # row and column of *A* in the subsequent steps.

In [7]:
print("A after second step = ")
print(A)
print()
print("R after second step = ")
print(R)

A after second step = 
[[ 1.      0.      0.      0.      0.      0.      0.      0.    ]
 [ 0.      1.      0.      0.      0.      0.      0.      0.    ]
 [ 0.      0.      0.5531  0.2349  0.1193  0.2856 -0.1108  0.1646]
 [ 0.      0.      0.2349  0.7114 -0.1317  0.3773  0.0985 -0.0971]
 [ 0.      0.      0.1193 -0.1317  0.8055  0.1436  0.2077  0.889 ]
 [ 0.      0.      0.2856  0.3773  0.1436  1.0462 -0.0717 -0.1386]
 [ 0.      0.     -0.1108  0.0985  0.2077 -0.0717  0.2138  0.3378]
 [ 0.      0.      0.1646 -0.0971  0.889  -0.1386  0.3378  1.2092]]

R after second step = 
[[1.7493 1.7959 1.5142 1.3385 0.8129 1.3728 1.3412 1.0582]
 [0.     1.0539 0.1987 0.7638 0.3839 0.269  0.4378 0.3395]
 [0.     0.     0.     0.     0.     0.     0.     0.    ]
 [0.     0.     0.     0.     0.     0.     0.     0.    ]
 [0.     0.     0.     0.     0.     0.     0.     0.    ]
 [0.     0.     0.     0.     0.     0.     0.     0.    ]
 [0.     0.     0.     0.     0.     0.     0.     0.    ]
 [0.     0.     0.     0.     0.     0.     0.     0.    ]]

Noticing the similarity between the first and second steps, and with some educated reasoning (which you should verify for yourself), we can wrap the step in a loop to obtain code that performs the full Cholesky decomposition.

In [8]:
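def cholesky(S):
    """Return the upper triangular R with S = R^T R for a symmetric positive definite S."""
    # Work on a floating-point copy, so that *S* itself is never overwritten
    # (and so that integer inputs are not truncated during the updates).
    A = np.array(S, dtype=float)
    n, _ = np.shape(A)
    R = np.zeros((n, n))

    for i in range(n):
        if A[i, i] <= 0:
            # a safeguard for when *A* is not positive definite;
            # a zero pivot is rejected as well, because we divide by it below
            raise ValueError("The input matrix seems to be not positive definite.")

        # i-th row of R: the pivot's square root, then the scaled rest of the row
        R[i, i] = np.sqrt(A[i, i])
        R[i, i+1:] = A[i, i+1:] / np.sqrt(A[i, i])

        # update the remaining lower-right block (this is A_{i+1} in the text)
        A[i+1:, i+1:] = A[i+1:, i+1:] - np.outer(A[i, i+1:], A[i, i+1:]) / A[i, i]

    return R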

In [9]:
R = cholesky(S)
print(R)

[[ 1.7493  1.7959  1.5142  1.3385  0.8129  1.3728  1.3412  1.0582]
 [ 0.      1.0539  0.1987  0.7638  0.3839  0.269   0.4378  0.3395]
 [ 0.      0.      0.7437  0.3159  0.1605  0.384  -0.149   0.2214]
 [ 0.      0.      0.      0.782  -0.2332  0.3274  0.1861 -0.2136]
 [ 0.      0.      0.      0.      0.8517  0.1859  0.3229  0.9436]
 [ 0.      0.      0.      0.      0.      0.87   -0.1557 -0.3784]
 [ 0.      0.      0.      0.      0.      0.      0.1689  0.2779]
 [ 0.      0.      0.      0.      0.      0.      0.      0.0611]]


Notice that this implementation overwrites the entries of $A$ during the computation. This is why we keep the generated symmetric positive definite matrix in the variable `S`, and let `cholesky` work on a copy `A` of it.

&nbsp;

With that caveat in mind, we can check that the computed factor indeed satisfies $R^\top R = S$.

In [10]:
np.all(np.isclose(R.T @ R, S))

True
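
Since the derivation was phrased recursively ("recursively move on to a Cholesky factorization of a smaller matrix"), the same computation can also be written as a recursive function. The following is a minimal sketch under that reading; the name `cholesky_recursive` is a hypothetical helper introduced only here, and the loop version above remains the one used in the rest of the chapter.

```python
import numpy as np

def cholesky_recursive(A):
    """Recursive sketch: return upper triangular R with A = R^T R."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 0:
        # base case: nothing left to factor
        return np.zeros((0, 0))

    a11 = A[0, 0]
    if a11 <= 0:
        raise ValueError("The input matrix seems to be not positive definite.")

    v = A[1:, 0]
    R = np.zeros((n, n))
    R[0, 0] = np.sqrt(a11)
    R[0, 1:] = v / np.sqrt(a11)
    # factor A_1 = B_1 - v v^T / a11 and place the result in the lower-right block
    R[1:, 1:] = cholesky_recursive(A[1:, 1:] - np.outer(v, v) / a11)
    return R

# Same construction of the test matrix as in the cells above.
rng = np.random.default_rng(1)
n = 8
C = rng.random((n, n))
S = C.T @ C

R_rec = cholesky_recursive(S)
print(np.allclose(R_rec.T @ R_rec, S))
```

The recursive form mirrors the mathematics more directly, while the loop avoids creating a new array at every level of the recursion.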


&nbsp;

---

>As a side note, be careful that all entries of $A$ being positive does not necessarily imply that $A$ is positive definite.
>A simple counterexample is
\begin{align*}
A = \begin{bmatrix}
1 & 2 \\ 2 & 1
\end{bmatrix}.
\end{align*}
>
>Unless you have a well-founded reason (*e.g.*, a diagonal entry is negative), you should probably avoid making guesses about the positive definiteness of a matrix from how it looks.

In [11]:
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

try:
    # If *A* is positive definite, then *R* will be computed without errors.
    # We first "try" to see whether the code can be executed without errors.
    R = cholesky(A)
except ValueError as msg:
    # If *A* is not positive definite, then the safeguard will kick in,
    # raising an error. This is a technical feature of Python;
    # for now, just understand that this part prints out the error message.
    print(msg)


The input matrix seems to be not positive definite.
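
Independently of our `cholesky` function, one can confirm that this matrix is not positive definite. The sketch below uses two standard facts: a symmetric matrix is positive definite exactly when all of its eigenvalues are positive, and positive definiteness requires $\mathbf{x}^\top A \mathbf{x} > 0$ for every nonzero $\mathbf{x}$. The particular vector `x` is a choice made for this illustration.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

# Eigenvalues of the symmetric matrix A, in ascending order.
eigenvalues = np.linalg.eigvalsh(A)
print(eigenvalues)          # the smallest eigenvalue is negative

# An explicit witness: the quadratic form is negative at x = (1, -1).
x = np.array([1.0, -1.0])
quadratic_form = x @ A @ x
print(quadratic_form)
```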




---

&nbsp;

Of course, in real life, you almost never implement the Cholesky factorization by yourself. For instance, you can use the NumPy function `numpy.linalg.cholesky`.

However, when using such functions, you should always be aware of how they work: `numpy.linalg.cholesky` computes a variant of the Cholesky decomposition we know, $A = LL^\top$, where $L$ is a lower triangular matrix.

In [12]:
L = np.linalg.cholesky(S)
print(L)

[[ 1.7493  0.      0.      0.      0.      0.      0.      0.    ]
 [ 1.7959  1.0539  0.      0.      0.      0.      0.      0.    ]
 [ 1.5142  0.1987  0.7437  0.      0.      0.      0.      0.    ]
 [ 1.3385  0.7638  0.3159  0.782   0.      0.      0.      0.    ]
 [ 0.8129  0.3839  0.1605 -0.2332  0.8517  0.      0.      0.    ]
 [ 1.3728  0.269   0.384   0.3274  0.1859  0.87    0.      0.    ]
 [ 1.3412  0.4378 -0.149   0.1861  0.3229 -0.1557  0.1689  0.    ]
 [ 1.0582  0.3395  0.2214 -0.2136  0.9436 -0.3784  0.2779  0.0611]]


But of course, one can show that $L = R^\top$ always holds. (Can you see why?)
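
In practice this means that the upper triangular convention used in this chapter is one transpose away from NumPy's output. A minimal sketch, in which the name `R_upper` is introduced only for this illustration and the test matrix is built in the same way as `S` above:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
C = rng.random((n, n))
S = C.T @ C

L = np.linalg.cholesky(S)   # lower triangular, S = L L^T
R_upper = L.T               # upper triangular, S = R^T R

print(np.allclose(R_upper, np.triu(R_upper)))   # R_upper is upper triangular
print(np.allclose(R_upper.T @ R_upper, S))      # and it factors S
```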