---
title: LU-Factorization
subject:  Linear Algebraic Systems
subtitle: Peeling the Onion
short_title: Peeling the LU onion
authors:
  - name: Nikolai Matni
    affiliations:
      - Dept. of Electrical and Systems Engineering
      - University of Pennsylvania
    email: nmatni@seas.upenn.edu
license: CC-BY-4.0
keywords: Gaussian Elimination, LU factorization
math:
  '\vv': '\mathbf{#1}'
  '\bm': '\begin{bmatrix}'
  '\em': '\end{bmatrix}'
  '\R': '\mathbb{R}'
---

This section is based on ALA Ch 1.3 and Ch 5.3 of Jessy Grizzle's [ROB 101 notes](https://github.com/michiganrobotics/rob101/blob/main/Fall%202021/Textbook/ROB_101_December_2021_Grizzle.pdf).

## Learning Objectives

By the end of this page, you should know:
- what the $LU$ factorization of a matrix is
- how to apply $LU$ factorization to solve systems of linear equations
- how to represent row operations using elementary matrices
- how to compute an $LU$ factorization using elementary matrices
- how LU factorization relates to Gaussian Elimination (forward elimination and back substitution)

## LU (Lower-Upper) Factorization: Regular Case
In the previous section, we saw a correct and perfectly acceptable way of solving systems of linear equations with regular $A$ matrices.  It turns out that it is very closely related to an approach based on _factorizing_ the coefficient matrix $A$ into a product of a lower triangular matrix $L$ and an upper triangular matrix $U$ such that $A=LU$, which we'll develop in this section. You'll see that the expressions end up being somewhat simpler to work with, and that the pseudocode is a bit "cleaner" (specifically, we won't have nested for loops).  This is also _much_ closer to the way solutions to linear systems are implemented in modern linear algebra computational packages.

We'll start with a worked example, and then define the general algorithm.  

### Column-Row Multiplication
A special case of matrix multiplication is multiplying a $m \times 1$ column vector $\vv c$ and a $1 \times n$ row vector $\vv r$ together.  We'll work with $m=n=3$ here, but the general case is very similar.  Applying the rules of matrix arithmetic, we see that if
$$
\vv c = \bm c_1\\ c_2\\ c_3\em, \quad \vv r = \bm r_1 &  r_2 & r_3\em,$$
then
$$\vv c \vv r = \bm r_1 \vv c & r_2 \vv c & r_3 \vv c\em = \bm c_1r_1 & c_1 r_2 & c_1 r_3 \\c_2r_1 & c_2 r_2 & c_2 r_3\\ c_3r_1 & c_3 r_2 & c_3 r_3\em = \bm c_1 \vv r \\ c_2 \vv r \\ c_3 \vv r\em.
$$
(prop1)=
For this section, the most important property we will use is that the $i$th row of $\vv c \vv r$, given by $c_i \vv r$, is a copy of the row vector $\vv r$ scaled by the corresponding component $c_i$ of the column vector $\vv c$.
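
As a quick numerical check of this property, here is a minimal sketch using NumPy's `np.outer`, which computes the column-row product of two 1-D arrays. The values of `c` and `r` are made up for illustration and do not come from the text.

```python
import numpy as np

# Illustrative values (assumed, not from the text).
c = np.array([1.0, 2.0, 3.0])   # plays the role of the column vector c
r = np.array([4.0, 5.0, 6.0])   # plays the role of the row vector r

# Column-row product: a 3 x 3 matrix whose (i, j) entry is c[i] * r[j].
cr = np.outer(c, r)
print(cr)

# Row i of the product is the row vector r scaled by c[i].
for i in range(3):
    assert np.allclose(cr[i, :], c[i] * r)

# Column j of the product is the column vector c scaled by r[j].
for j in range(3):
    assert np.allclose(cr[:, j], r[j] * c)
```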

### Peeling the Onion
Consider the square matrix
$$
A = \bm 1 & 4 & 5 \\ 2 & 9 & 17 \\ 3 & 18 & 58 \em.
$$
Our goal is to find a column vector $\vv c_1$ and a row vector $\vv r_1$ such that
$$
A - \vv c_1 \vv r_1 = \bm 0 & 0 & 0\\ 0 & \star & \star \\ 0 & \star & \star\em,
$$
where $\star$ means we do not care about the specific values.  Another way of saying this is that we want to zero out the first row and column of $A$ by choosing $\vv c_1$ and $\vv r_1$ so that the first row and first column of the matrix product $\vv c_1 \vv r_1$ match those of $A$.  Can we do this?

In the special case when the $(1,1)$ entry of $A$ is equal to 1, i.e., when $a_{11}=1$, we can do this pretty easily!  We'll do the obvious thing and just define $\vv c_1$ and $\vv r_1$ to be the first column of $A$ and the first row of $A$, respectively, that is, $\vv c_1 = (1,2,3)$ and $\vv r_1 = \bm 1 & 4 & 5\em$.[^brackets]  Then, remembering the [property](#prop1) that we identified earlier, we have that
$$
\vv c_1 \vv r_1 = \bm 1 \\ 2 \\ 3\em \bm 1 & 4 & 5\em=\bm 1 & 4 & 5 \\ 2 & 8 & 10 \\ 3 & 12 & 15\em,
$$
and would you look at that, we met our objective:
$$
A = \bm \underline 1 & \underline 4 & \underline 5 \\ \underline 2 & 9 & 17 \\ \underline 3 & 18 & 58 \em, \quad \vv c_1 \vv r_1 = \bm \underline 1 & \underline 4 & \underline 5 \\ \underline 2 & 8 & 10 \\ \underline 3 & 12 & 15\em.
$$
We can then compute
\begin{eqnarray}
A - \vv c_1 \vv r_1 &= \bm \underline 1 & \underline 4 & \underline 5 \\ \underline 2 & 9 & 17 \\ \underline 3 & 18 & 58 \em - \bm \underline 1 & \underline 4 & \underline 5 \\ \underline 2 & 8 & 10 \\ \underline 3 & 12 & 15\em\\
& = \bm \underline 0 & \underline 0 & \underline 0 \\ \underline 0 & 1 & 7 \\ \underline 0 & 6 & 43\em.
\end{eqnarray}

[^brackets]: Remember that $\vv c_1 = (1,2,3)$ is a $3 \times 1$ column vector because of the round brackets and commas, and $\vv r_1 = \bm 1 & 4 & 5\em$ is a $1 \times 3$ row vector because of the square brackets and no commas.

In doing this, we've taken a $3 \times 3$ matrix and essentially turned it into a $2 \times 2$ matrix.  Let's see if we can do it again: define $\vv c_2$ and $\vv r_2$ to be the second column and second row of $A-\vv c_1\vv r_1$:
$$
\vv c_2 = \bm 0 \\ 1 \\ 6 \em, \quad \vv r_2 = \bm 0 & 1 & 7\em.
$$
We then compute
$$
\vv c_2 \vv r_2 = \bm 0 \\ 1 \\ 6 \em\bm 0 & 1 & 7\em=\bm 0 & 0 & 0 \\ 0 & 1 & 7 \\ 0 & 6 & 42\em,
$$
to obtain
$$
A- \vv c_1 \vv r_1 = \bm  0 & 0 & 0 \\ 0 & \underline 1 & \underline 7 \\ 0 & \underline 6 & 43\em, \quad \vv c_2 \vv r_2 = \bm  0 & 0 & 0 \\ 0 & \underline 1 & \underline 7 \\ 0 & \underline 6 & 42\em.
$$
Next we subtract the latter from the former:
\begin{eqnarray}
(A - \vv c_1\vv r_1) - \vv c_2 \vv r_2 &= \bm  0 & 0 & 0 \\ 0 & \underline 1 & \underline 7 \\ 0 & \underline 6 & 43\em - \bm  0 & 0 & 0 \\ 0 & \underline 1 & \underline 7 \\ 0 & \underline 6 & 42\em \\
& = \bm  \underline 0 & \underline 0 & \underline 0 \\ \underline 0 & \underline 0 & \underline 0 \\ \underline 0 & \underline 0 & 1\em.
\end{eqnarray}

Just like that, we're down to what is essentially a $1 \times 1$ matrix.  We'll quickly note that $\vv c_3 = (0, 0, 1)$ and $\vv r_3 = \bm 0 & 0 & 1\em$ satisfy
$$
\vv c_3 \vv r_3 =  \bm  \underline 0 & \underline 0 & \underline 0 \\ \underline 0 & \underline 0 & \underline 0 \\ \underline 0 & \underline 0 & 1\em,
$$
and we then immediately have that $A - \vv c_1 \vv r_1 - \vv c_2 \vv r_2 -\vv c_3 \vv r_3 = 0$, or equivalently
\begin{equation}
A = \vv c_1 \vv r_1 + \vv c_2 \vv r_2 + \vv c_3 \vv r_3 =\underbrace{\bm \vv c_1 & \vv c_2 & \vv c_3\em}_{L}\underbrace{\bm\vv r_1 \\ \vv r_2 \\ \vv r_3\em}_{U}.
\end{equation}
Here we used the properties of [block matrix multiplication](./022-linsys-matvec.ipynb#blockmatmul) we saw last time.

In the above, we used $L$ and $U$ for two special matrices that were built up from the $\vv c_i$ and $\vv r_i$ we have identified so far:
- $L = \bm \vv c_1 & \vv c_2 & \vv c_3\em = \bm 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 6 & 1\em$  is _lower triangular_,
- $U = \bm\vv r_1 \\ \vv r_2 \\ \vv r_3\em = \bm 1 & 4 & 5 \\ 0 & 1 & 7 \\ 0 & 0 & 1 \em$ is _upper triangular_, and
- $A=LU$, that is, we _factored_ our original matrix $A$ into a product of a lower triangular matrix $L$ and an upper triangular matrix $U$.
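
The peeling steps above can be replayed numerically. The following is a minimal sketch, assuming NumPy; the variable `R` (the part of $A$ that is still left to peel) is a name introduced here for illustration.

```python
import numpy as np

A = np.array([[1.0, 4.0, 5.0],
              [2.0, 9.0, 17.0],
              [3.0, 18.0, 58.0]])

n = A.shape[0]
L = np.zeros((n, n))
U = np.zeros((n, n))
R = A.copy()  # remainder: the part of A that is left to peel

for k in range(n):
    # Every pivot R[k, k] equals 1 in this example, so no scaling is needed.
    L[:, k] = R[:, k]                    # c_k: k-th column of the remainder
    U[k, :] = R[k, :]                    # r_k: k-th row of the remainder
    R = R - np.outer(L[:, k], U[k, :])   # peel off one layer

print("L =\n", L)
print("U =\n", U)
assert np.allclose(R, 0)       # nothing is left after three layers
assert np.allclose(L @ U, A)   # A = LU
```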

### A More General Version
We got lucky in the previous example: at every step of the way, the upper-leftmost entry of the remaining matrix was equal to 1.  In general, this won't be the case, but we'll see that for regular matrices $A$, our trusty friend the _pivot_ will help us out.  Again, we will start with a simple example to work out the main idea, and then state a more general principle.  The key fact we use throughout is that for a regular matrix, _all of the pivots are nonzero._

Let's try to compute an $LU$-factorization for the matrix 
$$
A = \bm 2 & 3 \\ 4 & 5 \em.
$$

If we use our prior strategy of setting $\vv c_1 = (2, 4)$ to be the first column of $A$, and $\vv r_1 = \bm 2 & 3\em$ to be the first row of $A$, we run into trouble, because
$$
A - \vv c_1 \vv r_1 = \bm 2 & 3 \\ 4 & 5 \em - \bm 2 \\ 4 \em\bm 2 & 3\em=\bm 2 & 3 \\ 4 & 5 \em -\bm 4 & 6 \\ 8 & 12 \em = \bm -2 & -3 \\ -4 & -7\em.
$$
This sadly does **not** zero out the first row and column of $A$.  The secret here is to build $\vv c_1$ in a slightly different way: we will instead set $\vv c_1$ to be the first column of $A$ divided by $a_{11}$.  Remember, $a_{11}$ is the _first pivot_ of $A$.  So keeping $\vv r_1$ the same, and instead setting $\vv c_1 = (\frac{2}{2}, \frac{4}{2})=(1,2)$, we then get exactly what we were hoping for:
$$
A - \vv c_1 \vv r_1 = \bm 2 & 3 \\ 4 & 5 \em - \bm \frac{2}{2} \\ \frac{4}{2} \em\bm 2 & 3\em=\bm 2 & 3 \\ 4 & 5 \em -\bm 2 & 3 \\ 4 & 6 \em = \bm 0 & 0\\ 0 & -1\em.
$$

More generally, here is the basic recipe for _peeling the onion_ when matrix pivots are not equal to one (but are also nonzero!):
\begin{eqnarray}
\bm 
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\em
-\frac{1}{a_{11}}\bm a_{11} \\ a_{21} \\ \vdots  \\ a_{n1}\em \bm a_{11} & a_{12} & \cdots & a_{1n} \em \\=
\bm 
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\em - \bm 
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & \star & \cdots & \star \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & \star & \cdots & \star
\em \\ = \bm 
0 & 0 & \cdots & 0 \\
0 & \star & \cdots & \star \\
\vdots & \vdots & \ddots & \vdots \\
0 & \star & \cdots & \star
\em
\end{eqnarray}
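
Repeating this recipe on whatever remains after each step gives the full factorization. Below is a minimal sketch of that loop, assuming NumPy; `peel_lu` is a hypothetical helper name, and it assumes the matrix is regular (it performs no row swaps and simply raises an error on a zero pivot).

```python
import numpy as np

def peel_lu(A):
    """Factor a regular square matrix as A = L @ U by peeling rank-one layers.

    Hypothetical helper for illustration only: no row swaps are performed,
    so a zero pivot raises an error.
    """
    R = np.array(A, dtype=float)  # remainder still to be peeled (a copy of A)
    n = R.shape[0]
    L = np.zeros((n, n))
    U = np.zeros((n, n))
    for k in range(n):
        pivot = R[k, k]
        if pivot == 0:
            raise ValueError("zero pivot: this matrix needs row swaps")
        L[:, k] = R[:, k] / pivot            # c_k: k-th column divided by the pivot
        U[k, :] = R[k, :]                    # r_k: k-th row, left unscaled
        R = R - np.outer(L[:, k], U[k, :])   # zero out row k and column k
    return L, U

# The 2 x 2 example from above.
A = np.array([[2.0, 3.0],
              [4.0, 5.0]])
L, U = peel_lu(A)
print("L =\n", L)
print("U =\n", U)
assert np.allclose(L @ U, A)
```

Dividing only the column (and not the row) by the pivot is what puts ones on the diagonal of $L$ and the pivots on the diagonal of $U$.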

### Solving Linear Equations with $LU$-factorizations
We've made a lot of progress, but we still do not have a way of dealing with cases where the matrix $A$ is not regular, i.e., when it has pivots that are zero.  We'll defer that situation for a little bit longer, and instead look at why we bothered with $LU$-factorizations to begin with!  Suppose we are interested in solving a system of linear equations $A\vv x = \vv b$, and we assume that $A$ is regular so that we can apply the algorithm above to compute an $LU$-factorization of $A$, i.e., we find a lower triangular matrix $L$ and an upper triangular matrix $U$ so that $A=LU$.  Then our linear system becomes $LU\vv x = \vv b$.  

Neat, but so what?  Well, we know how to solve linear systems that look like $U\vv x = \vv z$, for $U$ upper triangular, by Back Substitution, and while we haven't seen it explicitly, hopefully you believe that we can solve linear systems that look like $L\vv y = \vv c$, for $L$ lower triangular, by _Forward Substitution_.[^fsub]  Our strategy will be to turn solving $A\vv x = \vv b$ into solving a lower triangular system by Forward Substitution, and then an upper triangular system by Back Substitution, both things we know how to do easily!  With that in mind, let us introduce a new intermediate variable $\vv z = U \vv x$.  Then we can find the solution to $A\vv x = LU\vv x = \vv b$ by solving
$$
L\vv z = \vv b,
$$
for $\vv z$ via Forward Substitution, and then solving 
$$
U \vv x = \vv z,
$$
for $\vv x$ via Back Substitution.  If we piece everything together, we see that $\vv x$ is indeed a solution to our original equation $A\vv x = \vv b$ because
$$
A\vv x = L\underbrace{U \vv x}_{=\vv z}= \underbrace{L\vv z}_{=\vv b} = \vv b.
$$

A neat algorithm applying $LU$ factorization to solve a system of linear equations is as follows.

[^fsub]: Forward Substitution is exactly what it sounds like.  To solve $L\vv y = \vv c$, when $L$ is lower triangular, we start at the top row and set $y_1 = c_1/l_{11}$. We then move down a row, and solve $l_{21}y_1 + l_{22}y_2 = c_2$ by plugging in $y_1$ and solving for $y_2$.  We continue in this manner until we've solved for $\vv y$. 

1. Solve $L\vv z = \vv b$ via forward substitution.
2. Using $\vv z$, solve $U\vv x = \vv z$ via back substitution.
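
The two steps above can be sketched in a few lines of NumPy. The function names `forward_substitution` and `back_substitution` are hypothetical, the matrices `L` and `U` are the factors of the $3 \times 3$ example from earlier in this section, and the right-hand side `b` is made up for illustration.

```python
import numpy as np

def forward_substitution(L, b):
    """Solve L z = b for lower triangular L with nonzero diagonal entries."""
    n = len(b)
    z = np.zeros(n)
    for i in range(n):  # top row first, then move down
        z[i] = (b[i] - L[i, :i] @ z[:i]) / L[i, i]
    return z

def back_substitution(U, z):
    """Solve U x = z for upper triangular U with nonzero diagonal entries."""
    n = len(z)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):  # bottom row first, then move up
        x[i] = (z[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

# Factors of A = [[1, 4, 5], [2, 9, 17], [3, 18, 58]] found by peeling.
L = np.array([[1.0, 0.0, 0.0], [2.0, 1.0, 0.0], [3.0, 6.0, 1.0]])
U = np.array([[1.0, 4.0, 5.0], [0.0, 1.0, 7.0], [0.0, 0.0, 1.0]])
b = np.array([10.0, 28.0, 79.0])  # assumed right-hand side

z = forward_substitution(L, b)   # step 1: L z = b
x = back_substitution(U, z)      # step 2: U x = z
print("z =", z)
print("x =", x)
assert np.allclose(L @ U @ x, b)  # x solves A x = b
```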

## Elementary Matrices
It turns out that our strategy of adding equations together is in fact an operation that can be realized by matrix multiplication.  Our starting point will be to introduce the idea of an _elementary matrix_, which encodes the corresponding elementary operation.  So far we've only seen the elementary operation of adding equations together, which we recall, is equivalent to adding rows of the augmented matrix $M = [A \, | \, \vv b]$ together, but we'll see other types of operations shortly.  The definition below will work for all of them.

```{prf:definition} Elementary Matrix
:label: elementary
The _elementary matrix_ associated with an elementary row operation for $m$-rowed matrices is the $m \times m$ matrix obtained by applying the row operation to the $m\times m$ identity matrix $I_m$.
```

This definition is a bit abstract, so let's see it in action.  Suppose that we have a system of three equations in three unknowns, encoded by the augmented matrix $M = [ A\, | \, \vv b]$, where $A$ is a $3\times 3$ matrix and $\vv b$ is a $3 \times 1$ vector.  What is the elementary matrix associated with subtracting twice the first row from the second row?  If we start with the identity matrix 
$$
I_3 = \bm 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \em,
$$
and apply this operation to it, we end up with the elementary matrix 
$$
E_1 = \bm 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1\em,
$$
which we named $E_1$ just to keep track of which elementary matrix we are talking about later on.  Let's check if it does what it's supposed to on two different $3 \times 3$ matrices:
$$
\underbrace{\bm 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1\em}_{E_1}\bm 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \em = \bm 1 & 2 & 3 \\ 2 & 1 & 0 \\ 7 & 8 & 9 \em, \quad \underbrace{\bm 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1\em}_{E_1}\bm 1 & 2 & 3 \\ 1 & 2 & 3 \\ 1 & 2 & 3 \em =\bm 1 & 2 & 3 \\ -1 & -2 & -3 \\ 1 & 2 & 3 \em .
$$
Indeed it does, and by playing around with our rules of matrix arithmetic and multiplication, you should be able to convince yourself that left multiplying _any_ 3-row matrix by $E_1$ will subtract twice its first row from its second row.
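
Here is a minimal sketch, assuming NumPy, that builds $E_1$ exactly as the definition says (by applying the row operation to the identity matrix) and compares left multiplication by $E_1$ with performing the row operation directly. The names `B` and `expected` are introduced here for illustration.

```python
import numpy as np

# Build E_1 by applying the row operation to the 3 x 3 identity matrix.
E1 = np.eye(3)
E1[1, :] = E1[1, :] - 2 * E1[0, :]   # subtract twice row 1 from row 2

B = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

# Apply the same row operation directly to a copy of B.
expected = B.copy()
expected[1, :] = expected[1, :] - 2 * expected[0, :]

print(E1 @ B)
assert np.allclose(E1 @ B, expected)   # left multiplying by E_1 does the row operation
```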

We can also use [](#elementary) to reverse engineer what row operation an elementary matrix is encoding.

````{exercise} Reverse Engineering Elementary Matrices
:label: em-ex
What elementary row operations are realized if we left multiply by the following matrices?
$$
E_2 = \bm 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \em, \quad E_3 = \bm 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & \frac{1}{2} & 1 \em.
$$
:::{hint} Click me for a hint!
:class: dropdown
What would I have to do to
$$
I_3 = \bm 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \em,
$$
to get $E_2$ and $E_3$?
:::
```{solution} em-ex
:class: dropdown
Left multiplying by $E_2$ realizes subtracting the first row from the last row of a matrix.  This is true because to get the last row of $E_2$, $\bm -1 & 0 & 1\em$, from $I_3$, we need to subtract the first row $\bm 1 & 0 & 0 \em$ of $I_3$ from the last row $\bm 0 & 0 & 1 \em$ of $I_3$.

Using similar reasoning, we see that left multiplying by $E_3$ realizes adding $1/2$ the second row of a matrix to its last row.
```
````

### Elementary Matrices in Action
Let us use elementary matrices to design a matrix that can help us solve linear equations.  Let's revisit the very first system of equations we encountered:
\begin{eqnarray}
\label{simple-linsys}
x_1+2x_2+x_3 & = 2,\\
2x_1+6x_2+x_3 & =7,\\
x_1+x_2+4x_3 & =3,
\end{eqnarray}
or in matrix-vector form
$$\underbrace{\begin{bmatrix} 1 & 2 & 1 \\
2 & 6 & 1 \\
1 & 1 & 4 \end{bmatrix}}_A\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}}_{\vv x} = \underbrace{\begin{bmatrix} 2 \\ 7 \\ 3 \end{bmatrix}}_{\vv b}
$$
which we solved by first applying the row operation encoded by $E_1$ (subtract twice equation 1 from equation 2), then applying the operation encoded by $E_2$ (subtract the first equation from the last equation), and finally applying the operation encoded by $E_3$ (add 1/2 the second equation to the last equation).  Funny how that conveniently worked out!  Therefore, if we keep careful track of the row operations and their matrix multiplication realizations, we conclude that:
$$
A = \begin{bmatrix} 1 & 2 & 1 \\
2 & 6 & 1 \\
1 & 1 & 4 \end{bmatrix}, \quad E_3 E_2 E_1 A = U = \bm 1 & 2 & 1 \\ 0 & 2 & -1 \\ 0 & 0 & \frac{5}{2} \em.
$$
The way to read the expression $E_3 E_2 E_1 A$ is from right to left: we start with $A$, then apply $E_1$, then $E_2$, and finally $E_3$.  If we give the composition of elementary operations a name, say $E = E_3 E_2 E_1$, then we can check that $E [A \, | \, \vv b] = [U \, | \, \vv c]$, that is to say, left multiplying the augmented matrix $M = [A \, | \, \vv b]$ by the matrix $E$ performs Gaussian Elimination for us!  This is just a first taste of a powerful feature of linear algebra: _we can encode very sophisticated and complex operations via matrix multiplication._
```{important}
The order of the operations here matters!  For example, $E_2 E_1 E_3 A$ is a very different matrix than $E_3 E_2 E_1 A$.
```
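
These claims are easy to check numerically. The sketch below, assuming NumPy, multiplies the augmented matrix by the product $E_3 E_2 E_1$ and also confirms that a different ordering of the same elementary matrices gives a different result. The variable name `Uc` for the reduced augmented matrix is introduced here for illustration.

```python
import numpy as np

A = np.array([[1.0, 2.0, 1.0],
              [2.0, 6.0, 1.0],
              [1.0, 1.0, 4.0]])
b = np.array([2.0, 7.0, 3.0])

E1 = np.array([[1.0, 0.0, 0.0], [-2.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
E2 = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 1.0]])
E3 = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.5, 1.0]])

E = E3 @ E2 @ E1                 # E_1 acts first, E_3 acts last
M = np.column_stack([A, b])      # augmented matrix [A | b]
Uc = E @ M                       # reduced augmented matrix [U | c]
print(Uc)

U = np.array([[1.0, 2.0, 1.0], [0.0, 2.0, -1.0], [0.0, 0.0, 2.5]])
assert np.allclose(Uc[:, :3], U)

# The same operations applied in a different order do not give U.
assert not np.allclose(E2 @ E1 @ E3 @ A, U)
```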

### Inverse Elementary Matrix

We can undo the action of an elementary matrix using the corresponding _inverse elementary matrix_. For example, to undo the action of $E_1$ (subtract twice equation 1 from equation 2), we add twice equation 1 to equation 2. The corresponding matrix is 
$$
L_1 = \begin{bmatrix}
1 & 0 & 0 \\
2 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
$$
We note that $L_1E_1 = I$, since $L_1$ reverses the action of $E_1$, and left multiplying by the identity $I$ leaves a matrix unchanged. Similarly, we can define the appropriate inverses
$$
L_2 = \begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
1 & 0 & 1
\end{bmatrix}, \ L_3 = \begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & -\frac{1}{2} & 1
\end{bmatrix}
$$
We have $U = E_3E_2E_1A$. If we reverse the actions using $L_1L_2L_3$, we have
$$
L_1L_2L_3U = L_1L_2L_3E_3E_2E_1A = L_1L_2E_2E_1A = L_1E_1A = A,
$$
where we used the fact that $L_3E_3 = L_2E_2 = L_1E_1 = I$. Hence, calling $L = L_1L_2L_3$, we showed that $A = LU$. Since $L_1, L_2, L_3$ are all lower triangular, so is their product $L$. This is how we can obtain the $LU$ factorization of $A$ using elementary matrix operations. 
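
As a sanity check, the sketch below (assuming NumPy) verifies that each $L_i$ undoes the corresponding $E_i$, and that the product $L = L_1L_2L_3$ is lower triangular and satisfies $A = LU$.

```python
import numpy as np

A = np.array([[1.0, 2.0, 1.0],
              [2.0, 6.0, 1.0],
              [1.0, 1.0, 4.0]])

E1 = np.array([[1.0, 0.0, 0.0], [-2.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
E2 = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 1.0]])
E3 = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.5, 1.0]])

# Inverse elementary matrices: flip the sign of the off-diagonal entry.
L1 = np.array([[1.0, 0.0, 0.0], [2.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
L2 = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
L3 = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -0.5, 1.0]])

I = np.eye(3)
for Li, Ei in [(L1, E1), (L2, E2), (L3, E3)]:
    assert np.allclose(Li @ Ei, I)   # each L_i undoes E_i

U = E3 @ E2 @ E1 @ A
L = L1 @ L2 @ L3
print("L =\n", L)
print("U =\n", U)

assert np.allclose(L, np.tril(L))    # L is lower triangular
assert np.allclose(L @ U, A)         # A = LU
```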

```python
# Applying LU factorization to solve linear equations.
# Same system as above, with the equations listed in the order 2, 3, 1.

from scipy.linalg import lu
import numpy as np

A = np.array([[2, 6, 1],
              [1, 1, 4],
              [1, 2, 1]])
b = np.array([7, 3, 2])
P, L, U = lu(A)  # SciPy returns P, L, U with A = P @ L @ U
print("P: \n", P, "\nL: \n", L, "\nU: \n", U)

n = A.shape[1]
Pb = P.T @ b  # account for any row swaps; here P = I, so Pb equals b

# forward substitution: solve L z = P^T b
z = np.zeros((n,))
z[0] = Pb[0]/L[0, 0]
for i in range(1, n):
    z[i] = (Pb[i] - np.sum(L[i, :i]*z[:i]))/L[i, i]

# back substitution: solve U x = z
x = np.zeros((n,))
x[-1] = z[-1]/U[-1, -1]
for i in range(n-2, -1, -1):
    x[i] = (z[i] - np.sum(U[i, i+1:]*x[i+1:]))/U[i, i]

print("\nSolution x: \n", x)
```

```
P: 
 [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]] 
L: 
 [[1.  0.  0. ]
 [0.5 1.  0. ]
 [0.5 0.5 1. ]] 
U: 
 [[ 2.    6.    1.  ]
 [ 0.   -2.    3.5 ]
 [ 0.    0.   -1.25]]

Solution x: 
 [-3.  2.  1.]
```


## Optional: $LU$-factorization is Gaussian Elimination!

We have $A = LU$.  Suppose we have applied Gaussian Elimination to bring our system into the upper triangular form $U\vv x = \vv c$.  There is a matrix $M$ such that $MA= U$ and $M\vv b=\vv c$.  Left multiplying $A\vv x = \vv b$ by $M$ and substituting $A = LU$, we have 
$$
MA\vv x = MLU\vv x = M\vv b.
$$
Since $MA = U$ and $A = LU$, we have $MLU = U$, and because $U$ has nonzero pivots on its diagonal, this forces $ML= I$.

Now, if we go back to solving our system using $LU$-factorization, we have that $LU\vv x = \vv b$, which we can turn into $L\vv z = \vv b$ and $U\vv x = \vv z$.  We saw that we can solve $L\vv z = \vv b$ by forward substitution, but notice that we also have that $ML\vv z = \vv z = M \vv b = \vv c$, i.e., the matrix $M$ that reduces our original system of equations to upper triangular form is actually performing this forward substitution.
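
For the running example, this can be checked numerically. The sketch below assumes NumPy and SciPy, takes $M = E_3E_2E_1$ and $L = L_1L_2L_3$ from the elementary matrices section, and uses `scipy.linalg.solve_triangular` to carry out the forward substitution.

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[1.0, 2.0, 1.0],
              [2.0, 6.0, 1.0],
              [1.0, 1.0, 4.0]])
b = np.array([2.0, 7.0, 3.0])

E1 = np.array([[1.0, 0.0, 0.0], [-2.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
E2 = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 1.0]])
E3 = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.5, 1.0]])

M = E3 @ E2 @ E1                      # reduces A to upper triangular form
L = np.array([[1.0, 0.0, 0.0],        # L = L_1 L_2 L_3
              [2.0, 1.0, 0.0],
              [1.0, -0.5, 1.0]])
U = M @ A

assert np.allclose(L @ U, A)          # A = LU
assert np.allclose(M @ L, np.eye(3))  # M L = I

# Forward substitution on L z = b gives the same vector as M b.
z = solve_triangular(L, b, lower=True)
print("z   =", z)
print("M b =", M @ b)
assert np.allclose(z, M @ b)
```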