---
title: Gaussian Elimination Revisited
subject:  Linear Algebraic Systems
subtitle: Gaussian Elimination as Matrix Factorization
short_title: LU Factorization
authors:
  - name: Nikolai Matni
    affiliations:
      - Dept. of Electrical and Systems Engineering
      - University of Pennsylvania
    email: nmatni@seas.upenn.edu
license: CC-BY-4.0
keywords: Gaussian Elimination, LU factorization
math:
  '\vv': '\mathbf{#1}'
  '\bm': '\begin{bmatrix}'
  '\em': '\end{bmatrix}'
  '\R': '\mathbb{R}'
---

## Reading
Material related to this page, as well as additional exercises, can be found in LAA Ch. 2.5, ALA Ch. 1.3, and ILA Ch. 2.6.  This page is mostly based on ALA Ch. 1.3.

## Learning Objectives

By the end of this page, you should know:
- what the $LU$ factorization of a matrix is
- how to apply $LU$ factorization to solve systems of linear equations
- how this approach relates to Gaussian Elimination (forward elimination and back substitution)

## Gaussian Elimination: Regular Case
With basic matrix arithmetic operations in our toolkit, we will develop a systematic method for solving linear systems of equations.  For a linear system $A\vv x = \vv b$, with $A$ an $m\times n$ coefficient matrix, $\vv x$ an $n \times 1$ unknowns vector, and $\vv b$ an $m \times 1$ right-hand side vector, we define the _augmented matrix_:
```{math}
:label: augmat
M = \left[\begin{array}{c|c} A & \vv b \end{array}\right]
=\left[ \begin{array}{cccc|c} 
a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\
a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{array}\right],
```
which is an $m \times (n+1)$ matrix obtained by tacking the right-hand side vector $\vv b$ onto the right of the coefficient matrix $A$.  The extra vertical line is just to remind us that the last column of this matrix plays a special role.  For example, the augmented matrix for our [example system](./021-linsys-gauss.ipynb#simple-linsys) is
\begin{equation}
\label{augmat-ex}
M = \left[ \begin{array}{ccc|c} 1 & 2 & 1 & 2\\ 2 & 6 & 1 & 7 \\ 1 & 1 & 4 & 3 \end{array}\right]
\end{equation}
Note that it is simple to go back and forth between the original linear system and the augmented matrix, but since operations on equations also affect their right-hand sides, it is convenient to keep track of everything together using the augmented matrix.
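To make the bookkeeping concrete, here is a minimal sketch in plain Python (no libraries assumed) that stores a matrix as a list of rows and builds $M = [A \, | \, \vv b]$ for the example system.  The helper name `augment` is hypothetical and used only for illustration.

```python
def augment(A, b):
    """Return the augmented matrix [A | b] as a list of rows.

    Each row of A gets the matching entry of b tacked onto its end, so an
    m x n matrix A and a length-m vector b give an m x (n+1) result.
    """
    if len(A) != len(b):
        raise ValueError("A and b must have the same number of rows")
    return [list(row) + [b_i] for row, b_i in zip(A, b)]

# Coefficient matrix and right-hand side of the example system.
A = [[1, 2, 1],
     [2, 6, 1],
     [1, 1, 4]]
b = [2, 7, 3]

M = augment(A, b)
for row in M:
    print(row)
# [1, 2, 1, 2]
# [2, 6, 1, 7]
# [1, 1, 4, 3]
```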

For the time being, we will concentrate our efforts on linear systems that have the same number, $n$, of equations as unknowns.  The associated coefficient matrix $A$ is square of size $n \times n$, and the corresponding augmented matrix $M = [ A \, | \, \vv b]$ then has size $n \times (n+1)$.

We start with a simple observation connecting [Linear System Operation \#1](./021-linsys-gauss.ipynb#linop1) to its equivalent matrix operation:
```{prf:observation} Elementary Row Operation \#1
:label: rowop1
Adding a scalar multiple of one row of the augmented matrix to another row is equivalent to adding a multiple of one equation to another in the system of linear equations it defines.  As such, it does not change the solution set and leads to an equivalent augmented matrix.
```

For example, when solving [example system](./021-linsys-gauss.ipynb#simple-linsys), our first step was to subtract two times the first equation from the second.  This is equivalently done by subtracting two times the first row of the augmented matrix [](#augmat-ex) from the second row:
$$
-2\bm 1 & 2 & 1 & 2 \em + \bm 2 & 6 & 1 & 7 \em = \bm 0 & 2 & -1 & 3\em.
$$
We recognize this as the second row of the modified augmented matrix
\begin{equation}
\label{pivot1}
\left[ \begin{array}{ccc|c} 1 & 2 & 1 & 2\\ 0 & 2 & -1 & 3\\ 1 & 1 & 4 & 3 \end{array}\right],
\end{equation}
that corresponds to the [first equivalent example system](./021-linsys-gauss.ipynb#simple-linsys0).  When elementary row operation \#1 is performed, it is critical that the result replaces the row being added to and _not_ the row being multiplied by the scalar.  Notice that the elimination of a variable in an equation, in this case the first variable in the second equation, amounts to making its entry in the coefficient matrix equal to zero.
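A small Python sketch of Elementary Row Operation \#1 may help pin down which row gets overwritten.  Here the hypothetical helper `add_row_multiple` uses 0-based row indices (so the "second row" is index 1), and the result replaces the row being added to, `dst`, while the row being scaled, `src`, is left unchanged.

```python
def add_row_multiple(M, src, dst, scale):
    """Elementary Row Operation #1: add `scale` times row `src` to row `dst`.

    The result overwrites row `dst` (the row being added to); row `src`
    is left unchanged.  Indices are 0-based.
    """
    M[dst] = [m_dst + scale * m_src for m_dst, m_src in zip(M[dst], M[src])]

# Augmented matrix of the example system.
M = [[1, 2, 1, 2],
     [2, 6, 1, 7],
     [1, 1, 4, 3]]

# Subtract 2 times the first row from the second row.
add_row_multiple(M, src=0, dst=1, scale=-2)
print(M[1])  # [0, 2, -1, 3]
```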

### Pivots
```{image} ../figures/02-pivot.gif
:alt: Pivot!
:width: 500px
:align: center
```
We will call the $(1,1)$ entry of the coefficient matrix the _first pivot_.  The precise definition of a pivot will become clear as we continue, but one key requirement is that _a pivot must always be nonzero_.  Eliminating the first variable $x_1$ from the second and third equations is the same as making all of the matrix entries in the column below the pivot equal to zero.  We have already done this with the $(2,1)$ entry in [](#pivot1).  To make the $(3,1)$ entry equal to zero, we subtract the first row from the last row, resulting in the augmented matrix
\begin{equation}
\label{pivot2}
\left[ \begin{array}{ccc|c} 1 & 2 & 1 & 2\\ 0 & 2 & -1 & 3\\ 0 & -1 & 3 & 1 \end{array}\right],
\end{equation}
which we again recognize as corresponding to the [second equivalent example system](./021-linsys-gauss.ipynb#simple-linsys1).  The _second pivot_ is the $(2,2)$ entry of this matrix, which is $2$, and is the coefficient of the second variable $x_2$ in the second equation.  Again, the pivot must be nonzero.  We use [](#rowop1), adding $1/2$ of the second row to the third row, to make the entry below the second pivot equal to 0, resulting in the augmented matrix
\begin{equation}
\label{pivot3}
\left[ \begin{array}{ccc|c} 1 & 2 & 1 & 2\\ 0 & 2 & -1 & 3\\ 0 & 0 & \frac{5}{2} & \frac{5}{2} \end{array}\right],
\end{equation}
that corresponds to the [triangular system equivalent to our example system](./021-linsys-gauss.ipynb#simple-linsys2).  We write the final augmented matrix as
$$
N = [U \, | \, \vv c], \quad \text{where} \quad U = \bm 1 & 2 & 1 \\ 0 & 2 & -1 \\ 0 & 0 & \frac{5}{2}\em, \quad \vv c = \bm 2 \\ 3 \\ \frac{5}{2} \em.
$$
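The three row operations above can be replayed in a short Python sketch.  It assumes a hypothetical helper `add_row_multiple` (0-based indices, overwriting the target row) and uses Python's standard `fractions.Fraction` so that the entry $5/2$ stays exact instead of becoming a floating-point number.

```python
from fractions import Fraction

def add_row_multiple(M, src, dst, scale):
    """Add `scale` times row `src` to row `dst`, overwriting row `dst` (0-based)."""
    M[dst] = [m_dst + scale * m_src for m_dst, m_src in zip(M[dst], M[src])]

# Augmented matrix of the example system; Fraction keeps 5/2 exact.
M = [[Fraction(v) for v in row] for row in [[1, 2, 1, 2],
                                             [2, 6, 1, 7],
                                             [1, 1, 4, 3]]]

add_row_multiple(M, src=0, dst=1, scale=-2)              # zero out the (2,1) entry
add_row_multiple(M, src=0, dst=2, scale=-1)              # zero out the (3,1) entry
add_row_multiple(M, src=1, dst=2, scale=Fraction(1, 2))  # zero out the (3,2) entry

U = [row[:-1] for row in M]   # coefficient part of [U | c]
c = [row[-1] for row in M]    # right-hand side part of [U | c]
pivots = [U[i][i] for i in range(3)]
print([str(p) for p in pivots])  # ['1', '2', '5/2']
```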

The corresponding linear system can be written as $U\vv x = \vv c$.  A special feature of this system is that the coefficient matrix $U$ is _upper triangular_[^upper], which means that all entries below the main diagonal are zero, i.e., $u_{ij}=0$ whenever $i>j$.  The three nonzero entries on its diagonal, $1$, $2$, and $5/2$, including the last one in the $(3,3)$ slot, are the three pivots.  Once the system has been reduced to this triangular form, we can easily solve it via Back Substitution.

[^upper]: It is conventional to use the symbol $U$ to remind ourselves that the matrix is upper triangular.

What we just described is an algorithm for solving a linear system of $n$ equations in $n$ unknowns, known as _regular Gaussian Elimination_.  At each step, we use the current pivot row to make all entries lying in the column below the pivot equal to zero through elementary row operations.  We'll call a square matrix $A$ _regular_ if the algorithm successfully reduces it to the upper triangular form $U$ with all nonzero pivots on the diagonal.  If this fails to happen, i.e., if a zero appears in a pivot position on the diagonal, then the matrix is not regular.  For a regular matrix, the solution is then found by applying Back Substitution to the resulting triangular system.  We'll summarize both of these algorithms in _pseudocode_ below.  Later, we'll see how to translate this pseudocode into actual Python code that can be run on a computer.

:::{prf:algorithm} Regular Gaussian Elimination
:label: reg-ge

**Inputs** Augmented matrix $M = [ A \, | \, \vv b]$

**Output** Equivalent upper triangular form $M = [U \, | \, \vv c]$ if $A$ is regular, "$A$ is not regular" token otherwise

for $j=1$ to $n$:\
$\quad$ if $m_{jj}=0$:\
$\quad \quad$ **return** "$A$ is not regular"\
$\quad$ else for $i= j + 1$ to $n$:\
$\quad \quad$ set $l_{ij}\leftarrow m_{ij}/m_{jj}$\
$\quad \quad$ add $-l_{ij}$ times row $j$ of $M$ to row $i$ of $M$\
**return** $M = [U \, | \, \vv c]$ 
:::
Here we use what are called _in place updates_, meaning that the same letter $M$ (with entries $m_{ij}$) denotes the current augmented matrix at each stage in the computation.  We initialize with $M=[A \, | \, \vv b]$, and output (assuming $A$ is regular) the equivalent upper triangular augmented matrix $M = [U \, | \, \vv c]$, where $U$ is the upper triangular matrix whose diagonal entries are the pivots, and $\vv c$ is the resulting vector of right-hand sides of the triangular system $U\vv x = \vv c$.
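As a preview, here is one possible Python translation of the Regular Gaussian Elimination pseudocode; it is a sketch, not necessarily the version developed later.  Assumptions: the augmented matrix is a list of rows with `fractions.Fraction` entries (for exact arithmetic), indices are 0-based, the function name is hypothetical, and the "$A$ is not regular" token is signalled by raising a `ValueError`.

```python
from fractions import Fraction

def regular_gaussian_elimination(M):
    """Reduce the n x (n+1) augmented matrix M = [A | b] to [U | c] in place.

    Mirrors the Regular Gaussian Elimination pseudocode with 0-based
    indices.  Raises ValueError if a zero pivot appears (A is not regular).
    """
    n = len(M)
    for j in range(n):
        if M[j][j] == 0:
            raise ValueError("A is not regular")
        for i in range(j + 1, n):
            l_ij = M[i][j] / M[j][j]
            # Add -l_ij times row j to row i (Elementary Row Operation #1).
            M[i] = [m_i - l_ij * m_j for m_i, m_j in zip(M[i], M[j])]
    return M

M = [[Fraction(v) for v in row] for row in [[1, 2, 1, 2],
                                             [2, 6, 1, 7],
                                             [1, 1, 4, 3]]]
regular_gaussian_elimination(M)
for row in M:
    print(" ".join(str(v) for v in row))
# 1 2 1 2
# 0 2 -1 3
# 0 0 5/2 5/2

# A matrix whose first pivot is zero is not regular.
try:
    regular_gaussian_elimination([[Fraction(0), Fraction(1), Fraction(1)],
                                  [Fraction(1), Fraction(0), Fraction(2)]])
    status = "regular"
except ValueError as err:
    status = str(err)
print(status)  # A is not regular
```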

Next, let's take a look at the pseudocode for Back Substitution.
:::{prf:algorithm} Back Substitution
:label: back-sub

**Inputs** Triangular form augmented matrix $M = [U \, | \, \vv c]$.  $U$ is assumed to have nonzero diagonals $u_{ii}\neq 0$.

**Output** Solution $\vv x$ to $U\vv x = \vv c$.

set $x_n\leftarrow c_n/u_{nn}$\
for $i=n-1$ to $1$: (decrementing by $1$ at each iteration)\
$\quad$ set $x_i \leftarrow \frac{1}{u_{ii}}\left(c_i-\displaystyle\sum_{j=i+1}^{n}u_{ij}x_j\right)$\
**return** solution $\vv x$
:::
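A possible Python sketch of Back Substitution follows, again using 0-based indices, `fractions.Fraction` for exact arithmetic, and a hypothetical function name.  Starting the loop at the last index folds the separate $x_n$ step into the loop, since the inner sum is empty there.  Applied to the triangular system obtained above, it recovers the solution of the example system.

```python
from fractions import Fraction

def back_substitution(U, c):
    """Solve U x = c for upper triangular U with nonzero diagonal entries.

    Works from the last equation upward; row i uses the already-computed
    values x[j] for j > i.  Indices are 0-based.
    """
    n = len(U)
    x = [0] * n
    for i in range(n - 1, -1, -1):
        # Sum over the entries to the right of the diagonal in row i.
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (c[i] - s) / U[i][i]
    return x

# Triangular system U x = c obtained from the example system.
U = [[1, 2, 1],
     [0, 2, -1],
     [0, 0, Fraction(5, 2)]]
c = [2, 3, Fraction(5, 2)]

x = back_substitution(U, c)
print(" ".join(str(v) for v in x))  # -3 2 1

# Check that x also solves the original system A x = b.
A = [[1, 2, 1], [2, 6, 1], [1, 1, 4]]
b = [2, 7, 3]
ok = all(sum(a * x_j for a, x_j in zip(row, x)) == b_i for row, b_i in zip(A, b))
print(ok)  # True
```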

### Worked Examples

````{exercise}  TODO
:label: row-reduce-ex1
Write me
:::{hint} Click me for a hint!
:class: dropdown
Write me

:::
```{solution} row-reduce-ex1
:class: dropdown
Write me
```
````

````{exercise}  TODO
:label: row-reduce-ex2
Write me
:::{hint} Click me for a hint!
:class: dropdown
Write me

:::
```{solution} row-reduce-ex2
:class: dropdown
Write me
```
````

## LU (Lower-Upper) Factorization
The approach we saw above is a correct and perfectly acceptable way of solving systems of linear equations with regular $A$ matrices.  It turns out that it is very closely related to an approach based on _factorizing_ the coefficient matrix $A$ into a product of a lower triangular matrix $L$ and an upper triangular matrix $U$ such that $A=LU$, which we'll develop in this section. You'll see that the expressions end up being somewhat simpler to work with, and that the pseudocode is a bit "cleaner" (specifically, we won't have nested for loops).  This is also _much_ closer to the way solutions to linear systems are implemented in modern linear algebra computational packages.

We'll start with a worked example, and then define the general algorithm.  This section is based on Ch. 5.3 of Jessy Grizzle's [ROB 101 notes](https://github.com/michiganrobotics/rob101/blob/main/Fall%202021/Textbook/ROB_101_December_2021_Grizzle.pdf).

### Column-Row Multiplication
A special case of matrix multiplication is multiplying an $m \times 1$ column vector $\vv c$ and a $1 \times n$ row vector $\vv r$ together.  We'll work with $m=n=3$ here, but the general case is very similar.  Applying the rules of matrix arithmetic, we see that if
$$
\vv c = \bm c_1\\ c_2\\ c_3\em, \quad \vv r = \bm r_1 &  r_2 & r_3\em,$$
then
$$\vv c \vv r = \bm r_1 \vv c & r_2 \vv c & r_3 \vv c\em = \bm c_1r_1 & c_1 r_2 & c_1 r_3 \\c_2r_1 & c_2 r_2 & c_2 r_3\\ c_3r_1 & c_3 r_2 & c_3 r_3\em = \bm c_1 \vv r \\ c_2 \vv r \\ c_3 \vv r\em.
$$
(prop1)=
For this section, the most important property we will use is that the $i$th row of $\vv c \vv r$, given by $c_i \vv r$, is a copy of the row vector $\vv r$ scaled by the corresponding component $c_i$ of the column vector $\vv c$.
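The column-row product is easy to sketch in plain Python with matrices stored as lists of rows; the helper name `column_row_product` is hypothetical.  The vectors below are the ones used in the next example, and each printed row is a copy of $\vv r$ scaled by the matching entry of $\vv c$.

```python
def column_row_product(c, r):
    """Return the m x n matrix c r for a length-m column c and length-n row r.

    Row i of the result is the row vector r scaled by c[i].
    """
    return [[c_i * r_j for r_j in r] for c_i in c]

c = [1, 2, 3]   # column vector (1, 2, 3)
r = [1, 4, 5]   # row vector [1 4 5]
P = column_row_product(c, r)
for row in P:
    print(row)
# [1, 4, 5]
# [2, 8, 10]
# [3, 12, 15]
```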

### Peeling the onion
Consider the square matrix
$$
M = \bm 1 & 4 & 5 \\ 2 & 9 & 17 \\ 3 & 18 & 58 \em.
$$
Our goal is to find a column vector $\vv c_1$ and a row vector $\vv r_1$ such that
$$
M - \vv c_1 \vv r_1 = \bm 0 & 0 & 0\\ 0 & \star & \star \\ 0 & \star & \star\em,
$$
where $\star$ denotes an entry whose specific value we do not care about.  Another way of saying this is that we want to zero out the first row and column of $M$ by choosing $\vv c_1$ and $\vv r_1$ so that the first row and first column of the matrix product $\vv c_1 \vv r_1$ match those of $M$.  Can we do this?

In the special case when the $(1,1)$ entry of $M$ is equal to 1, i.e., when $m_{11}=1$, we can do this pretty easily!  We'll do the obvious thing and just define $\vv c_1$ and $\vv r_1$ to be the first column of $M$ and the first row of $M$, respectively, that is, $\vv c_1 = (1,2,3)$ and $\vv r_1 = \bm 1 & 4 & 5\em$.[^brackets]  Then, remembering the [property](#prop1) that we identified earlier, we have that
$$
\vv c_1 \vv r_1 = \bm 1 \\ 2 \\ 3\em \bm 1 & 4 & 5\em=\bm 1 & 4 & 5 \\ 2 & 8 & 10 \\ 3 & 12 & 15\em,
$$
and would you look at that, we met our objective:
$$
M = \bm \underline 1 & \underline 4 & \underline 5 \\ \underline 2 & 9 & 17 \\ \underline 3 & 18 & 58 \em, \quad \vv c_1 \vv r_1 = \bm \underline 1 & \underline 4 & \underline 5 \\ \underline 2 & 8 & 10 \\ \underline 3 & 12 & 15\em.
$$
We can then compute
\begin{align}
M - \vv c_1 \vv r_1 &= \bm \underline 1 & \underline 4 & \underline 5 \\ \underline 2 & 9 & 17 \\ \underline 3 & 18 & 58 \em - \bm \underline 1 & \underline 4 & \underline 5 \\ \underline 2 & 8 & 10 \\ \underline 3 & 12 & 15\em\\
& = \bm \underline 0 & \underline 0 & \underline 0 \\ \underline 0 & 1 & 7 \\ \underline 0 & 6 & 43\em.
\end{align}
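The first layer of the onion can be peeled in a few lines of Python.  This sketch assumes, as in the text, that the $(1,1)$ entry of $M$ equals 1; the helper names `column_row_product` and `subtract` are hypothetical.

```python
def column_row_product(c, r):
    """Column-row product: row i of the result is r scaled by c[i]."""
    return [[c_i * r_j for r_j in r] for c_i in c]

def subtract(A, B):
    """Entrywise difference A - B of two matrices stored as lists of rows."""
    return [[a - b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

M = [[1, 4, 5],
     [2, 9, 17],
     [3, 18, 58]]

# Because M[0][0] == 1, take c1 = first column of M and r1 = first row of M.
c1 = [row[0] for row in M]
r1 = M[0][:]
M1 = subtract(M, column_row_product(c1, r1))
for row in M1:
    print(row)
# [0, 0, 0]
# [0, 1, 7]
# [0, 6, 43]
```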

[^brackets]: Remember that $\vv c_1 = (1,2,3)$ is a $3 \times 1$ column vector because of the round brackets and commas, and $\vv r_1 = \bm 1 & 4 & 5\em$ is a $1 \times 3$ row vector because of the square brackets and no commas.

In doing this, we've taken a $3 \times 3$ matrix and essentially turned it into a $2 \times 2$ matrix.  Let's see if we can do it again: define $\vv c_2$ and $\vv r_2$ to be the second column and second row of $M-\vv c_1\vv r_1$:
$$
\vv c_2 = \bm 0 \\ 1 \\ 6 \em, \quad \vv r_2 = \bm 0 & 1 & 7\em.
$$
We then compute
$$
\vv c_2 \vv r_2 = \bm 0 \\ 1 \\ 6 \em\bm 0 & 1 & 7\em=\bm 0 & 0 & 0 \\ 0 & 1 & 7 \\ 0 & 6 & 42\em,
$$
to obtain
$$
M - \vv c_1 \vv r_1 = \bm  0 & 0 & 0 \\ 0 & \underline 1 & \underline 7 \\ 0 & \underline 6 & 43\em, \quad \vv c_2 \vv r_2 = \bm  0 & 0 & 0 \\ 0 & \underline 1 & \underline 7 \\ 0 & \underline 6 & 42\em.
$$
Next we subtract the latter from the former:
\begin{align}
(M - \vv c_1\vv r_1) - \vv c_2 \vv r_2 &= \bm  0 & 0 & 0 \\ 0 & \underline 1 & \underline 7 \\ 0 & \underline 6 & 43\em - \bm  0 & 0 & 0 \\ 0 & \underline 1 & \underline 7 \\ 0 & \underline 6 & 42\em \\
& = \bm  \underline 0 & \underline 0 & \underline 0 \\ \underline 0 & \underline 0 & \underline 0 \\ \underline 0 & \underline 0 & 1\em.
\end{align}

Just like that, we're down to what is essentially a $1 \times 1$ matrix.  We'll quickly note that $\vv c_3 = (0, 0, 1)$ and $\vv r_3 = \bm 0 & 0 & 1\em$ satisfy
$$
\vv c_3 \vv r_3 =  \bm  \underline 0 & \underline 0 & \underline 0 \\ \underline 0 & \underline 0 & \underline 0 \\ \underline 0 & \underline 0 & 1\em,
$$
and we then immediately have that $M - \vv c_1 \vv r_1 - \vv c_2 \vv r_2 -\vv c_3 \vv r_3 = 0$, or equivalently
\begin{equation}
M = \vv c_1 \vv r_1 + \vv c_2 \vv r_2 + \vv c_3 \vv r_3 =\underbrace{\bm \vv c_1 & \vv c_2 & \vv c_3\em}_{L}\underbrace{\bm\vv r_1 \\ \vv r_2 \\ \vv r_3\em}_{U}.
\end{equation}
TODO: add reference to property of block matrix multiplication.
In the above, we used $L$ and $U$ for two special matrices built up from the $\vv c_i$ and $\vv r_i$ we have identified so far:
- $L = \bm \vv c_1 & \vv c_2 & \vv c_3\em = \bm 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 6 & 1\em$  is _lower triangular_,
- $U = \bm\vv r_1 \\ \vv r_2 \\ \vv r_3\em = \bm 1 & 4 & 5 \\ 0 & 1 & 7 \\ 0 & 0 & 1 \em$ is _upper triangular_, and
- $M=LU$, that is we _factored_ our original matrix $M$ into a product of a lower triangular matrix $L$ and an upper triangular matrix $U$.
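The whole peeling procedure can be sketched in Python as a loop that, at step $k$, takes the $k$th column and $k$th row of the remaining matrix as $\vv c_k$ and $\vv r_k$.  This sketch only covers the "lucky" case of this example: it assumes the $(k,k)$ entry of the remaining matrix equals 1 at every step and raises an error otherwise.  The function names `peel_lu`, `column_row_product`, `subtract`, and `matmul` are hypothetical.

```python
def column_row_product(c, r):
    """Column-row product: row i of the result is r scaled by c[i]."""
    return [[c_i * r_j for r_j in r] for c_i in c]

def subtract(A, B):
    """Entrywise difference A - B of two matrices stored as lists of rows."""
    return [[a - b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

def matmul(A, B):
    """Ordinary matrix product A B for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def peel_lu(M):
    """Factor M = L U by repeatedly peeling off column-row products.

    Assumes (as in the worked example) that the k-th diagonal entry of the
    remaining matrix equals 1 at every step k.
    """
    n = len(M)
    R = [row[:] for row in M]    # remaining matrix; copied so M is untouched
    cols, rows = [], []
    for k in range(n):
        if R[k][k] != 1:
            raise ValueError("this sketch only handles a leading entry of 1")
        c_k = [R[i][k] for i in range(n)]   # k-th column of the remainder
        r_k = R[k][:]                        # k-th row of the remainder
        R = subtract(R, column_row_product(c_k, r_k))
        cols.append(c_k)
        rows.append(r_k)
    L = [[cols[j][i] for j in range(n)] for i in range(n)]  # columns c_1, ..., c_n
    U = rows                                                # rows r_1, ..., r_n
    return L, U

M = [[1, 4, 5],
     [2, 9, 17],
     [3, 18, 58]]
L, U = peel_lu(M)
print(L)                  # [[1, 0, 0], [2, 1, 0], [3, 6, 1]]
print(U)                  # [[1, 4, 5], [0, 1, 7], [0, 0, 1]]
print(matmul(L, U) == M)  # True
```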

### A More General Version
We got lucky in the previous example: at every step of the way, the upper-left entry of the remaining (smaller) matrix was equal to 1.  In general, this won't be the case, but we'll see that for regular matrices $A$, our trusty friend the _pivot_ will help us out.

TODO: finish this section.  Observe that each step of "peeling the onion" is actually a "batch" version of forward elimination. Highlight connection to regular Gaussian elimination algorithm (basically each step is doing one inner for loop all at once).