## Linear System

### Overview



#### Take-aways

After studying this material, we will be able to
- explain Gaussian elimination in detail in a computation-oriented way,
  - conduct elimination and back substitution for linear systems in a computationally efficient way,
  - derive the complexity of the elimination part and back substitution part,
  - give the leading term of the complexity for the elimination part and back substitution part,
- explain how to solve linear systems using LU decomposition and related facts,
  - prove properties of triangular matrices that are related to LU decomposition,
  - find LU decomposition of a given matrix when possible,
  - solve a linear system when LU decomposition is given,
  - give the complexity of LU decomposition,
  - explain the advantage of LU decomposition compared to plain Gaussian elimination,
- explain error amplification via condition numbers of matrices,
  - explain and use the notion of vector norm,
  - explain and use the notion of matrix norm,
  - find the condition number of a matrix and explain what it means,
  - explain how error amplifies by examining the condition number,
- explain how to solve linear systems using LU decomposition with partial pivoting (PA=LU) and related facts,
  - TBF

#### Notation/Settings/Acronyms

Common settings

| symbol | setting |
|---|---|
| $n$ | a positive integer |
| $A$ | nonsingular $n$-by-$n$ matrix |
| $b$ | (column) vector of length $n$ |
| $x$ | (column) vector of length $n$ |

Acronyms

|Abbreviation| meaning|
|---|---|
| SPD | Symmetric positive definite |

Common convention

| expression | meaning |
|---|---|
| $a_{ij}$, $a_{i,j}$, $A_{ij}$, $A_{i,j}$ | $(i,j)$-component of a matrix $A$ ($i$-th row, $j$-th column) |



**Problem of interest**

Given $A$ and $b$, find $x$ such that

$$ Ax = b. $$




#### Methods

- Methods for general matrices
   1. Direct methods
      - plain Gaussian elimination
      - Gaussian elimination using $PA = LU$ decomposition.
        - Preliminary: $A=LU$ decomposition
   2. Iterative methods
      - Jacobi iteration
      - Gauss-Seidel iteration
- Methods for SPD matrices
   1. Direct methods
      - Cholesky factorization
   2. Iterative methods
      - Conjugate gradient method
- Framework for improvements
   1. Preconditioning

**Remark**

- Direct method: Gives the exact solution in a finite number of steps. 
  - Caveat: rounding errors may destroy this nature in practice.
- Iterative method: Gives an approximate solution at every step. 
  - Theoretically, the true solution is obtained as a limit. 
  - In practice, a reasonable number of iterations can give a very good approximate solution.

### Gaussian elimination


#### Augmented matrix

Compact rearrangement of a system of linear equations in matrix form

$$
\begin{aligned}
x+2 y-z & =3 \\
2 x+y-2 z & =3 \\
-3 x+y+z & =-6 
\end{aligned}
\leftrightarrow
\left[\begin{array}{rrr:r}
1 & 2 & -1 & 3 \\
2 & 1 & -2 & 3 \\
-3 & 1 & 1 & -6
\end{array}\right]
$$


#### Elimination


##### Elementary row operations 

1. Swap one equation (or a row) for another (row): $R_i \leftrightarrow R_j$.
2. Add or subtract a multiple of one equation (or a row) from another (row): $R_i \gets R_i + c R_j$.
3. Multiply an equation (or a row) by a nonzero constant: $R_i \gets c R_i$.


**Example**

Solve the following system of linear equations using the augmented matrix and elementary row operations: 

$$
\begin{aligned}
x+2 y-z & =3 \\
2 x+y-2 z & =3 \\
-3 x+y+z & =-6 .
\end{aligned}
$$

[Example of Gaussian eliminations 1](../images/ex_GaussianElimination1_lp1000.png)

[Example of Gaussian eliminations 2](../images/ex_GaussianElimination2_lp1000.png)

#### Back substitution



**Remark** (back substitution)

- While there can be many creative ways to find the solution, we will follow one single way: we are *developing a systematic method*.
- Let us call, in this class, the first step *elimination*.
- The second step (finding unknowns one by one) is called *back substitution* or *back solving*.
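
The two steps can be sketched in NumPy as follows. This is a minimal illustration, assuming no zero pivot is encountered and no row swaps are needed; the function names `eliminate` and `back_substitute` are hypothetical choices for this sketch, not part of the course material.

```python
import numpy as np

def eliminate(M):
    """Reduce an augmented matrix [A | b] to upper triangular form (no pivoting)."""
    M = M.astype(float)                  # work on a float copy
    n = M.shape[0]
    for j in range(n - 1):               # pivot column
        if M[j, j] == 0:
            raise ZeroDivisionError("zero pivot encountered")
        for i in range(j + 1, n):        # rows below the pivot
            m = M[i, j] / M[j, j]        # multiplier
            M[i, j:] -= m * M[j, j:]     # R_i <- R_i - m R_j
    return M

def back_substitute(M):
    """Solve the upper triangular system stored in the augmented matrix M."""
    n = M.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):       # last unknown first
        x[i] = (M[i, -1] - M[i, i + 1:n] @ x[i + 1:]) / M[i, i]
    return x

# Augmented matrix of the example system
M = np.array([[1, 2, -1, 3],
              [2, 1, -2, 3],
              [-3, 1, 1, -6]])
T = eliminate(M)
x = back_substitute(T)
print("After elimination:\n", T)
print("Solution:", x)
```

For the example system this gives $(x, y, z) = (3, 1, 2)$, which can be confirmed by substituting back into the three equations.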

#### Complexity of Gaussian elimination

| Step | Complexity (precise) | Complexity (order) | 
|------|------------|-----|
| eliminations | $$ \frac 2 3 n^3 + \frac 1 2 n^2 - \frac 7 6 n $$ | $$=\mathcal{O}(n^3) $$ | 
| back substitutions | $$ n^2 $$ |  $$= \mathcal{O}(n^2) $$ |

[Derivation of complexity of eliminations 1](../images/der_ComplexityGaussianEliminations1_lp2000.png)

[Derivation of complexity of eliminations 2](../images/der_ComplexityGaussianEliminations2_lp2000.png)

[Derivation of complexity of back substitutions](../images/der_ComplexityBackSubstitutions_lp2000.png)
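
The counts in the table can be checked by brute force. The sketch below assumes a particular counting convention (each addition, subtraction, multiplication, and division counts as one operation, and the elimination acts on the augmented matrix so that $b$ is updated too); the function names are hypothetical.

```python
def count_elimination_ops(n):
    """Count arithmetic operations of elimination on an n-by-(n+1) augmented matrix."""
    ops = 0
    for j in range(1, n):                 # pivot column j = 1, ..., n-1
        for i in range(j + 1, n + 1):     # rows below the pivot
            ops += 1                      # one division for the multiplier
            ops += 2 * (n - j + 1)        # multiply and subtract on columns j+1..n and on b
    return ops

def count_back_substitution_ops(n):
    """Count arithmetic operations of back substitution on an n-by-n triangular system."""
    ops = 0
    for i in range(1, n + 1):
        ops += 2 * (n - i) + 1            # (n-i) multiplications, (n-i) subtractions, one division
    return ops

for n in (2, 5, 10, 100):
    # (2/3) n^3 + (1/2) n^2 - (7/6) n, written with a common denominator to stay in integers
    formula = (4 * n**3 + 3 * n**2 - 7 * n) // 6
    print(n, count_elimination_ops(n), formula, count_back_substitution_ops(n), n**2)
```

Under this convention, each printed row shows the loop count agreeing with the closed-form expression.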

___

### A = LU decomposition


Intuition: Gaussian elimination can be encapsulated in matrix form (provided no zero pivot is encountered).

- We will see that $L^{-1}$ encodes the elimination while $U$ encodes the result of the elimination.

#### Method

**Algorithm** (LU factorization)


Algorithm is borrowed from Kincaid and Cheney (2002) p. 155.

**Data**

- $A=(a_{ij})$: matrix
- $n$: size of matrix

**Computation**

- **for** $k=1$ to $n$ **do**
  - $\ell_{kk} \gets 1$
  - **for** $j=k$ to $n$ **do**
    - $u_{k j} \gets a_{k j}-\sum_{s=1}^{k-1} \ell_{k s} u_{s j}$
  - **for** $i=k+1$ to $n$ **do**
    - $\ell_{i k} \leftarrow\left(a_{i k}-\sum_{s=1}^{k-1} \ell_{i s} u_{s k}\right) / u_{k k}$

**Output**

- $L=(\ell_{ij})$
- $U=(u_{ij})$

**Remark**

- This algorithm works only if there is no zero pivot encountered.
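
A direct transcription of the algorithm into NumPy might look like the following sketch. It uses 0-based indices, and the function name `lu_doolittle` and the zero-pivot check are assumptions of this illustration rather than part of the cited algorithm.

```python
import numpy as np

def lu_doolittle(A):
    """Return (L, U) with A = L @ U, L unit lower triangular, U upper triangular."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)                         # l_kk = 1
    U = np.zeros((n, n))
    for k in range(n):
        for j in range(k, n):             # row k of U
            U[k, j] = A[k, j] - L[k, :k] @ U[:k, j]
        if k < n - 1 and U[k, k] == 0:
            raise ZeroDivisionError("zero pivot encountered; A = LU fails here")
        for i in range(k + 1, n):         # column k of L
            L[i, k] = (A[i, k] - L[i, :k] @ U[:k, k]) / U[k, k]
    return L, U

A = np.array([[1, 2, -1],
              [2, 1, -2],
              [-3, 1, 1]])
L, U = lu_doolittle(A)
print("L:\n", L)
print("U:\n", U)
print("L @ U:\n", L @ U)
```

The sums $\sum_{s=1}^{k-1}$ become dot products over the first `k` entries, which are empty (and hence zero) when `k = 0`.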
    

#### Preliminaries


##### Triangular matrices



**Definition**

1. An $n$-by-$n$ matrix $L$ is called *lower triangular* if $\ell_{ij}=0$ for $i < j$. In addition, if $\ell_{ij}=1$ for $i = j$, it is called *unit* lower triangular.

$$
\left[\begin{array}{ccccc}
\ell_{1,1} & & & & 0 \\
\ell_{2,1} & \ell_{2,2} & & & \\
\ell_{3,1} & \ell_{3,2} & \ddots & & \\
\vdots & \vdots & \ddots & \ddots & \\
\ell_{n, 1} & \ell_{n, 2} & \ldots & \ell_{n, n-1} & \ell_{n, n}
\end{array}\right]
$$

2. An $n$-by-$n$ matrix $U$ is called *upper triangular* if $u_{ij}=0$ for $i > j$. In addition, if $u_{ij}=1$ for $i = j$, it is called *unit* upper triangular.

$$
U=\left[\begin{array}{ccccc}
u_{1,1} & u_{1,2} & u_{1,3} & \ldots & u_{1, n} \\
& u_{2,2} & u_{2,3} & \ldots & u_{2, n} \\
& & \ddots & \ddots & \vdots \\
& & & \ddots & u_{n-1, n} \\
0 & & & & u_{n, n}
\end{array}\right]
$$

**Properties of triangular matrices**

**Fact** (From linear algebra)

- If an $n$-by-$n$ matrix $A$ is invertible, then the eigenvalues of $A^{-1}$ are precisely the reciprocals of the eigenvalues of $A$. 
- Determinant of a triangular matrix is the product of its diagonal entries.
- The eigenvalues of a lower triangular matrix are precisely its diagonal entries.



**Theorem** (Triangular matrices and their algebraic structure)

Lower triangular shape is preserved under addition, scalar multiplication, matrix multiplication, and inversion. More specifically, 

1. If $L_1$ and $L_2$ are lower triangular matrices of size $n$-by-$n$, then $L_1 + L_2$ is also lower triangular. 
2. If $L_1$ is a lower triangular matrix of size $n$-by-$n$ and $\alpha$ is a scalar, then $\alpha L_1$ is also lower triangular. 
3. If $L_1$ and $L_2$ are lower triangular matrices of size $n$-by-$n$, then $L_1 L_2$ is also lower triangular. Furthermore, $[L_1 L_2]_{ii}=[L_1]_{ii}[L_2]_{ii}$ for $i=1,2,\cdots,n$.
   - If $L_1$ and $L_2$ are unit lower triangular matrices of size $n$-by-$n$, then $L_1 L_2$ is also unit lower triangular. 
4. If $L_1$ is a lower triangular matrix of size $n$-by-$n$ and it is invertible, then $L_1^{-1}$ is also lower triangular. Furthermore, $[L_1^{-1}]_{ii}=[L_1]_{ii}^{-1}$.
   - If $L_1$ is a unit lower triangular matrix of size $n$-by-$n$ and it is invertible, then $L_1^{-1}$ is also unit lower triangular. 

The same is true for upper triangular matrices.

[Proof of properties of triangular matrices 1](../images/pf_PropTriangularMatrices1_lp2000.png)

[Proof of properties of triangular matrices 2](../images/pf_PropTriangularMatrices2_lp2000.png)
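
The theorem can also be spot-checked numerically. The following is only a sanity check on one pair of random matrices (it proves nothing); the helper `is_lower` and the choice of entries in $[1, 2]$, which keeps the diagonals nonzero, are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Random lower triangular matrices with diagonal entries in [1, 2], hence invertible
L1 = np.tril(rng.uniform(1, 2, (n, n)))
L2 = np.tril(rng.uniform(1, 2, (n, n)))

def is_lower(M, tol=1e-12):
    """True if all entries strictly above the diagonal vanish (up to tol)."""
    return np.allclose(np.triu(M, 1), 0.0, atol=tol)

prod = L1 @ L2
inv = np.linalg.inv(L1)

print("sum lower triangular:     ", is_lower(L1 + L2))
print("multiple lower triangular:", is_lower(2.5 * L1))
print("product lower triangular: ", is_lower(prod))
print("inverse lower triangular: ", is_lower(inv))
print("diag of product matches:  ", np.allclose(np.diag(prod), np.diag(L1) * np.diag(L2)))
print("diag of inverse matches:  ", np.allclose(np.diag(inv), 1 / np.diag(L1)))
```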


#### Lemmas for LU decomposition

**Lemma 1 for LU** (matrix of row subtraction)

The elementary row operation $R_{i} \gets R_{i}+(-c)R_{j}$ can be represented by a matrix multiplication by $L_{ij}(-c)$ on the left, where

$$
[L_{ij}(-c)]_{k \ell} = \begin{cases}
1 & (k = \ell) \\
-c & (k = i, \ \ell = j) \\
0 & (\text{otherwise}),
\end{cases}
$$

or, 

$$
L_{i j}(-c)=\left[\begin{array}{ccccccc}
1 & & & & & & \\
& \ddots & & & & & \\
& & 1 & & & & \\
& & & \ddots & & & \\
& & -c & & 1 & & \\
& & & & & \ddots & \\
& & & & & & 1
\end{array}\right]
$$
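
As a quick numerical illustration of Lemma 1, the sketch below builds $L_{ij}(-c)$ and compares left multiplication with the row operation done by hand. The helper name `row_op_matrix` is hypothetical, and the indices in the code are 0-based.

```python
import numpy as np

def row_op_matrix(n, i, j, c):
    """Return L_ij(-c): the identity matrix with -c placed at position (i, j)."""
    E = np.eye(n)
    E[i, j] = -c
    return E

A = np.array([[1.0, 2.0, -1.0],
              [2.0, 1.0, -2.0],
              [-3.0, 1.0, 1.0]])

E = row_op_matrix(3, 1, 0, 2.0)    # R_2 <- R_2 - 2 R_1 in the 1-based notation of the text

B = A.copy()
B[1] = B[1] - 2.0 * B[0]           # the same row operation applied directly

print("E @ A:\n", E @ A)
print("Row operation by hand:\n", B)
```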

**Lemma 2 for LU** (Product of row subtraction)

Let $L_{ij}(c_{ij})$ be defined as above. For a fixed column index $j$, we have 

$$
\left[\prod_{i=j+1}^n L_{ij}(c_{ij})\right]_{k \ell} = \begin{cases}
1 & (k = \ell) \\
c_{kj} & (k > j, \ \ell = j) \\
0 & (\text{otherwise}),
\end{cases}
$$

or, 

$$
\prod_{i=j+1}^n L_{ij}(c_{ij})
=\left[\begin{array}{ccccccc}
1 & & & & & & \\
& \ddots & & & & & \\
& & 1 & & & & \\
& & c_{j+1,j} & \ddots & & & \\
& & c_{j+2,j} & & 1 & & \\
& & \vdots & & & \ddots & \\
& & c_{n,j} & & & & 1
\end{array}\right]
$$

For example in $4$-by-$4$ case with $j=1$, 

$$
\left(\begin{array}{cccc}
1 & 0 & 0 & 0 \\
c_{21} & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
\end{array}\right)
\left(\begin{array}{cccc}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
c_{31} & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
\end{array}\right)
\left(\begin{array}{cccc}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
c_{41} & 0 & 0 & 1 \\
\end{array}\right)
=
\left(\begin{array}{cccc}
1 & 0 & 0 & 0 \\
c_{21} & 1 & 0 & 0 \\
c_{31} & 0 & 1 & 0 \\
c_{41} & 0 & 0 & 1 \\
\end{array}\right)
$$
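
The $4$-by-$4$ case can be reproduced numerically. In this sketch the multiplier values are arbitrary (hypothetical) numbers chosen only for illustration, and indices are 0-based.

```python
import numpy as np

n, j = 4, 0                          # column j = 0 in code, i.e. j = 1 in the text
c = {1: 2.0, 2: -5.0, 3: 0.5}        # hypothetical multipliers c_{ij}, keyed by row i

product = np.eye(n)
expected = np.eye(n)
for i, cij in c.items():
    E = np.eye(n)
    E[i, j] = cij                    # elementary matrix L_ij(c_ij)
    product = product @ E            # accumulate the product from left to right
    expected[i, j] = cij             # simply write c_ij into column j

print("Product of elementary matrices:\n", product)
print("Multipliers written into one column:\n", expected)
```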


**Lemma 3 for LU** (Inverse of row reduction)

Let $L$ be an $n$-by-$n$ lower triangular matrix whose diagonal elements are all 1, and only one column has nonzero elements below the diagonal. Then, $L^{-1}$ is of the same form as $L$, except that the signs of the elements below the diagonal are flipped.

In [None]:
"""This script verifies the inversion of a triangular matrix.

1. If only one column has nonzero element below the diagonal,
    then the matrix inversion is mechanical.
2. If more than one column has nonzero elements below the diagonal,
    then inversion is not that simple. 
"""

import numpy as np

AA = np.eye(3)
AA[1, 0] = 2
AA[2, 0] = -5
# AA[2, 1] = 3 # uncomment this line to see case 2

B = AA.copy()
low_diag_ind = np.tril_indices_from(B, -1)
B[low_diag_ind] = - AA[low_diag_ind]

print("A: \n", AA)
print("\nB: \n", B)
print("\nA*B: \n", AA@B)
print("\nA^-1:\n", np.linalg.inv(AA))

**Lemma 4 for LU** (Product of elementary matrix)

The following pattern generalizes to any size $n$-by-$n$ as long as

1. each matrix is unit lower triangular,
2. each matrix has at most one column that is filled with nonzero entries below diagonal, and
3. the order is kept, namely, the matrix with a column of smaller index is multiplied more to the left.

$$
\left(\begin{array}{cccc}
1 & 0 & 0 & 0 \\
c_{21} & 1 & 0 & 0 \\
c_{31} & 0 & 1 & 0 \\
c_{41} & 0 & 0 & 1 \\
\end{array}\right)
\left(\begin{array}{cccc}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & c_{32} & 1 & 0 \\
0 & c_{42} & 0 & 1 \\
\end{array}\right)
\left(\begin{array}{cccc}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & c_{43} & 1 \\
\end{array}\right)
=
\left(\begin{array}{cccc}
1 & 0 & 0 & 0 \\
c_{21} & 1 & 0 & 0 \\
c_{31} & c_{32} & 1 & 0 \\
c_{41} & c_{42} & c_{43} & 1 \\
\end{array}\right)
$$

In [None]:
"""This script verifies product of elementary triangular matrices.

1. If the matrix with earlier column filled is multiplied more 
    to the left, the product is as easy as writing out.
2. If not, this property is lost.
"""

import numpy as np

AA = np.eye(3)
AA[1, 0] = 2
AA[2, 0] = -5
# AA[2, 1] = 3 # uncomment to fill a second column of AA; the simple pattern is then lost

B = np.eye(3)
B[2, 1] = 3


print("A: \n", AA)
print("\nB: \n", B)
print("\nA*B: \n", AA@B)
print("\nB*A: \n", B@AA)


**Remark** 

- Prove these lemmas very carefully. Or at least verify them carefully. 

#### Finding LU decomposition

**Example** (Finding LU decomposition)

1. Represent the Gaussian elimination of the following system of linear equations using elementary lower triangular matrices, and
2. find the $A=LU$ decomposition, where $L$ is unit lower triangular and $U$ is upper triangular.

$$
\begin{aligned}
x+2 y-z & =3 \\
2 x+y-2 z & =3 \\
-3 x+y+z & =-6 .
\end{aligned}
$$

[Example of LU decomposition 1](../images/ex_LUdecomposition1_lp2000.png)

[Example of LU decomposition 2](../images/ex_LUdecomposition2_lp2000.png)

#### Solving linear system given LU decomposition



Once $A=LU$ is obtained, we solve $Ax=b$ via two steps.

1. Solve $Lc = b$ for $c$, then
2. solve $Ux = c$ for $x$.

Reason

From $Ax = LUx = b$, we have $Ux = L^{-1}b =: c$, hence $x=U^{-1}c$.

**Remark**

- Both steps can be computed efficiently because $L$ and $U$ are both triangular, hence applying their inverses amounts to nothing but a (forward or back) substitution.

**Example** (Solving a linear system given LU)

Given 

$$
A 
= \left[\begin{array}{rrr}
1 & 2 & -1 \\
2 & 1 & -2 \\
-3 & 1 & 1
\end{array}\right]
=\left[\begin{array}{rrr}
1 & 0 & 0 \\
2 & 1 & 0 \\
-3 & -\frac{7}{3} & 1
\end{array}\right]\left[\begin{array}{rrr}
1 & 2 & -1 \\
0 & -3 & 0 \\
0 & 0 & -2
\end{array}\right]
=L U
$$

and 

$$
b = [3, 3, -6]^T,
$$

solve $Ax=b$.

[Example of solving linear system given LU](../images/ex_SolvingLinSystemGivenLU_lp2000.png)
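
The two-step solve for this example can be sketched as follows. The function names `forward_substitute` and `back_substitute` are hypothetical, and the sketch assumes the diagonal entries of $L$ and $U$ are nonzero.

```python
import numpy as np

def forward_substitute(L, b):
    """Solve L c = b for lower triangular L, top row first."""
    n = len(b)
    c = np.zeros(n)
    for i in range(n):
        c[i] = (b[i] - L[i, :i] @ c[:i]) / L[i, i]
    return c

def back_substitute(U, c):
    """Solve U x = c for upper triangular U, bottom row first."""
    n = len(c)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

L = np.array([[1.0, 0.0, 0.0],
              [2.0, 1.0, 0.0],
              [-3.0, -7 / 3, 1.0]])
U = np.array([[1.0, 2.0, -1.0],
              [0.0, -3.0, 0.0],
              [0.0, 0.0, -2.0]])
b = np.array([3.0, 3.0, -6.0])

c = forward_substitute(L, b)   # step 1: L c = b
x = back_substitute(U, c)      # step 2: U x = c
print("c =", c)
print("x =", x)
```

Here $c = (3, -3, -4)$ and $x = (3, 1, 2)$, in agreement with what elimination on the augmented matrix produces.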

#### Complexity comparisons: Gaussian elimination and LU



Although Gaussian elimination and LU decomposition use the same idea, they differ considerably in practice. 

| | Gaussian elimination | LU factorization |
|---|---|---|
| $b$ | included in the augmented matrix | not included |
| approximate complexity <br> for $Ax=b$ | $$ \frac 2 3 n^3 $$ | $$ \frac 2 3 n^3$$ |
| approximate complexity <br> for multiple problems: <br> $Ax_i=b_i$ <br> ($i=1,2,\cdots,k$)| $$ \frac 2 3 k n^3 $$ | $$ \frac 2 3 n^3 + 2 k n^2$$ |



**Remark**

- The above table summarizes approximate complexity rather than precise counts. With LU decomposition, each additional problem costs only two substitutions ($Lc=b_i$ and $Ux_i=c$), about $2n^2$ operations, so $k$ problems add the second-order term $2kn^2$ rather than a third-order one. With Gaussian elimination, on the other hand, we have to redo the whole elimination for each problem, hence the complexity $(2/3) kn^3$.
- The multiple problems scenario ($Ax_i=b_i$ for $i=1,2,\cdots,k$) is common in applications because it is often the case that $A$ comes from discretizing the integral or differential operator that governs the application and $b$ comes from different data. For example, according to Sauer (2017), in structural engineering, $b$ is called the *loading vector* and the solution $x$ gives the *stress*. We may want to see what the stress looks like for many different loading vectors.
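
To get a feel for the difference, the leading-order formulas from the table can be evaluated for some sample sizes. The values of $n$ and $k$ here are arbitrary choices for illustration, and the function names are hypothetical.

```python
def cost_gaussian(n, k):
    """Approximate operation count: k separate Gaussian eliminations."""
    return (2 / 3) * k * n**3

def cost_lu(n, k):
    """Approximate operation count: one LU factorization plus 2k substitutions."""
    return (2 / 3) * n**3 + 2 * k * n**2

n = 1000
for k in (1, 10, 100):
    ratio = cost_gaussian(n, k) / cost_lu(n, k)
    print(f"k = {k:3d}: Gaussian / LU cost ratio is about {ratio:.1f}")
```

For a single right-hand side the two approaches cost about the same; the advantage of LU grows roughly in proportion to $k$ as long as $k$ is small compared with $n$.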

### Source of errors and conditioning

#### Norms

**Motivation**

We want to study how errors amplify in the course of finding solutions. For this, we need to clarify what we mean by the size of a vector. Also, can we extract a single number that summarizes the behavior of a matrix?

**Definition** (Vector norms)

Given a vector space $(V, +, \cdot)$ over a field $\mathbb{F}$ (i.e., $\mathbb{R}$ or $\mathbb{C}$), we call a function $\| \ \cdot \ \|:V \to [0,\infty)$ a *norm* if it satisfies the following. (You may focus only on a more concrete setting, say $\mathbb{R}^{n}$ with usual addition and scalar multiplication over $\mathbb{R}$.)

- $\| x \| \ge 0$ and $\| x \| = 0$ only if $x=0$ (zero vector),
- for each scalar $\alpha$ and a vector $x$, we have $\| \alpha x \| = |\alpha| \| x \|$,
- for any vectors $x,y$, triangle inequality holds: $\|x + y \| \le \| x \| + \| y \|$.

**Definition** (Operator norm)

Let $A:V \to W$ be a linear map, where $V$ and $W$ are (finite dimensional) vector spaces equipped with norms $\|\ \cdot \ \|_V$ and $\|\ \cdot \ \|_W$, respectively. Then, the *operator norm* of $A$ is defined by

$$
\| A \| := \mathrm{max} \{ \frac{\| A x \|_W }{\| x\|_V} \ : \ x \in V, x \neq 0\}.
$$

**Remark** (Operator norm of a matrix)

- Multiplication by an $n$-by-$n$ matrix can be thought of as a map from $\mathbb{R}^{n}$ to itself. Hence, it makes sense that a matrix has a natural norm derived from the operator norm.
- Operator norm depends on the vector norm being used.
- If $V=W$, we usually choose the same norm even though it is possible to couple two different norms for $\| A \|$. In this case, we omit the subscript and write $\| x \|$ for $x \in V$.

**Terminology** (Consistency of matrix norm)

By the definition of the operator norm, we have, for any vector $x$,

$$
\| A x \|_W \le \| A \| \| x\|_V,
$$

or if $A$ maps between the same space $V$ with the same vector norm $\| \ \cdot \ \|$,

$$
\| A x \| \le \| A \| \| x\|.
$$

Referring to this property, we say that the matrix norm $\| A \|$ is *consistent* with the vector norms appearing.

**Definition** (Norms on $\mathbb{R}^{n}$ and square matrices)

Given $x=(x_1, x_2, \cdots, x_n)\in {\mathbb{R}^n}$ and a matrix $A \in {\mathbb{R}^{n\times n}}$, 

- Vector norm

$$
\| x \|_\infty := \mathrm{max}\{ |x_i| \ : \ 1 \le i \le n \},
$$

$$
\| x \|_1 := \sum_{i=1}^n |x_i|,
$$

$$
\| x \|_2 := \sqrt{\sum_{i=1}^n |x_i|^2},
$$

$$
\| x \|_p := \left(\sum_{i=1}^n |x_i|^p \right)^{1/p}, \quad \text{ where } p \ge 1
$$

- Matrix norm

$$
\| A \|_\infty := \mathrm{max}_{1\le i \le n} \{ \| r_i \|_1 \ : \ r_i \text{ is the }i\text{-th row of } A  \}.
$$

$$
\| A \|_1 := \mathrm{max}_{1\le i \le n} \{ \| c_i \|_1 \ : \ c_i \text{ is the }i\text{-th column of } A  \}.
$$

$$
\| A \|_2 := \mathrm{max} \{ \sqrt{\lambda} \ : \ \lambda \text{ is an eigenvalue of } A^T A \}
$$



**Remark** 

- There are many other matrix norms. But we focus on the most basic ones.
- It can be shown that $\| A \|_\infty$ is the operator norm of $A$ when we use $\| x \|_\infty$ for vector norm.
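
These norms are available through `numpy.linalg.norm`. The sketch below compares the library values with the formulas in the definition and checks the consistency inequality for one vector; the sample vectors and matrix are arbitrary choices for illustration.

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])
A = np.array([[1.0, -2.0],
              [3.0, 4.0]])

# Vector norms
print("inf-norm:", np.linalg.norm(x, np.inf))    # max |x_i|
print("1-norm:  ", np.linalg.norm(x, 1))         # sum of |x_i|
print("2-norm:  ", np.linalg.norm(x, 2))         # Euclidean length

# Matrix norms computed from the definitions
max_row_sum = np.abs(A).sum(axis=1).max()                  # ||A||_inf
max_col_sum = np.abs(A).sum(axis=0).max()                  # ||A||_1
spectral = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())      # ||A||_2

print("inf:", max_row_sum, np.linalg.norm(A, np.inf))
print("1:  ", max_col_sum, np.linalg.norm(A, 1))
print("2:  ", spectral, np.linalg.norm(A, 2))

# Consistency: ||A y||_inf <= ||A||_inf ||y||_inf for a sample vector y
y = np.array([1.0, -1.0])
lhs = np.linalg.norm(A @ y, np.inf)
rhs = np.linalg.norm(A, np.inf) * np.linalg.norm(y, np.inf)
print("consistency:", lhs, "<=", rhs)
```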

#### Error magnification and condition number

**Notation** (Temporary abuse of norm notation)

- In the following examples, we are going to use general norm notation, $\| A \|$ and $\| x \|$, since they indeed work in general norm settings. However, when we carry out concrete calculations, we will compute the infinity norms. The reason for this is pedagogical: the infinity norms are easy to compute while still conveying the idea.

**Definition** (Residual, backward error, forward error)

Given a linear system $Ax=b$ and an approximate solution $\tilde x$, 

- $r:=b-A \tilde x$ is called *residual*,
- $\| r \| = \| b-A \tilde x \|$ is called *backward error*,
- $\| x - \tilde x \|$ is called *forward error*.

**Remark** 

- Many people say residual when they really mean backward error, if there is no confusion. 
- Many people simply say *the error* when they refer to the forward error.
- We can measure the forward and backward errors in various norms. But, in this class, we will use infinity norms in the examples. 

**Example** (Forward and backward errors)

Find the forward and backward errors for the approximate solution $[-1, 3.0001]$ of the system

$$
\begin{aligned}
x_1+x_2 & =2 \\
1.0001 x_1+x_2 & =2.0001.
\end{aligned}
$$

[Example: Forward and backward error](../images/ex_ForBackErrLinSystem_lp2000.png)
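
The same computation can be sketched in NumPy with infinity norms. The exact solution $x = (1, 1)$ used here is found by inspection and can be confirmed by substitution; the variable names are choices of this sketch.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0001, 1.0]])
b = np.array([2.0, 2.0001])
x_true = np.array([1.0, 1.0])        # exact solution, confirmed by substitution
x_approx = np.array([-1.0, 3.0001])  # the given approximate solution

r = b - A @ x_approx                                        # residual
backward_error = np.linalg.norm(r, np.inf)                  # ||b - A x~||
forward_error = np.linalg.norm(x_true - x_approx, np.inf)   # ||x - x~||

print("residual:      ", r)
print("backward error:", backward_error)
print("forward error: ", forward_error)
```

The backward error is about $10^{-4}$ while the forward error is about $2.0001$: a tiny residual does not guarantee a small error.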

**Definition** (Relative backward/forward error and error magnification factor)

Given a linear system $Ax=b$, an approximate solution $\tilde x$, the error $e=x-\tilde x$, and the residual $r=b-A\tilde x$, 

- the *relative backward error* is defined by

$$
\frac{\|r\|}{\|b\|}
$$

- the *relative forward error* is defined by

$$
\frac{\left\|e \right\|}{\|x\|}
$$

- the *error magnification factor* for $Ax=b$ is defined by

$$
\frac{\text { relative forward error }}{\text { relative backward error }}
$$


**Definition** (Condition number)

The condition number $\mathbf{cond}(A)$ of a square matrix $A$ is the maximum possible error magnification factor for solving $Ax=b$, over all right-hand sides $b$.

**Theorem** (Condition number)

For a square matrix, we have

$$
\mathbf{cond}(A)=\| A \| \| A^{-1} \|.
$$

[Proof: Condition number 1](../images/pf_CondNumMatrix1_lp3000.png)

[Proof: Condition number 2](../images/pf_CondNumMatrix2_lp3000.png)


**Example** 

Given the following linear system with the approximate solution $[-1, 3.0001]$, find relative backward error, relative forward error, and error magnification factor.
Also, find the largest possible error magnification factor for the following system. 

$$
\begin{aligned}
x_1+x_2 & =2 \\
1.0001 x_1+x_2 & =2.0001.
\end{aligned}
$$

[Example: Relative forward and backward error](../images/ex_RelErrMagnFacLinSystem1_lp2000.png)

[Example: Error magnification factor](../images/ex_RelErrMagnFacLinSystem2_lp2000.png)
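
A numerical version of this example, using infinity norms throughout, might look like the sketch below. The exact solution $x = (1, 1)$ is taken as known, and the computed values carry small rounding errors.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0001, 1.0]])
b = np.array([2.0, 2.0001])
x_true = np.array([1.0, 1.0])
x_approx = np.array([-1.0, 3.0001])

r = b - A @ x_approx
rel_backward = np.linalg.norm(r, np.inf) / np.linalg.norm(b, np.inf)
rel_forward = np.linalg.norm(x_true - x_approx, np.inf) / np.linalg.norm(x_true, np.inf)
emf = rel_forward / rel_backward     # error magnification factor

# Condition number from the theorem: cond(A) = ||A|| ||A^{-1}||
cond = np.linalg.norm(A, np.inf) * np.linalg.norm(np.linalg.inv(A), np.inf)

print("relative backward error:", rel_backward)
print("relative forward error: ", rel_forward)
print("error magnification:    ", emf)
print("condition number:       ", cond)
print("np.linalg.cond(A, inf): ", np.linalg.cond(A, np.inf))
```

In this particular example the error magnification factor and the condition number both come out close to $40004$, so this approximate solution essentially attains the worst case.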

**Remark** (Comments on condition numbers)

- Fortunately, large condition numbers are unusual. (Sauer (2017) p. 94)
- However, in many applications, numerically implemented differential operators often result in a linear system with rapidly growing condition number as the discretization gets finer (on top of larger size of the linear system).
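
As one illustration of the last point, consider the standard second-difference matrix (2 on the diagonal, $-1$ on the off-diagonals), which arises from a finite-difference discretization of the second derivative in one dimension. This particular matrix is an assumption of the sketch, chosen as a familiar example; the grid-spacing factor is omitted because a scalar multiple does not change the condition number.

```python
import numpy as np

def second_difference_matrix(n):
    """Tridiagonal n-by-n matrix with 2 on the diagonal and -1 next to it."""
    return 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

sizes = (10, 20, 40, 80)
conds = [np.linalg.cond(second_difference_matrix(n), np.inf) for n in sizes]
for n, kappa in zip(sizes, conds):
    print(f"n = {n:3d}: cond = {kappa:.1f}")
```

Each doubling of $n$ (a finer grid) increases the condition number markedly, even though the matrix entries themselves stay the same.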

#### Pivoting

<!-- **Motivating example**

Recall the homework problem, where we solved the following linear system by programming 2D version of Gaussian elimination.

$$
\left\{
\begin{aligned}
10^{-20} x_1+x_2 & =1 \\
x_1+2 x_2 & =4
\end{aligned}
\right. 
$$

If we programmed "right," it should produce $(x_1, x_2)\approx(0,1)$, while the package `numpy.linalg.solve` gave $(x_1, x_2)\approx(2,1)$. 

What is going on? -->

<!-- **Remark**

- The source of error is that the first coefficient `A[0,0]` is so small that the computer (using IEEE double precision) "thinks" it is 0. 

$$
\left\{
\begin{aligned}
10^{-20} x_1+x_2 & =1 \\
x_1+2 x_2 & =4
\end{aligned}
\right. 
\Rightarrow
\left\{
\begin{aligned}
0 x_1+x_2 & =1 \\
x_1+2 x_2 & =4
\end{aligned}
\right. 
$$

Then, the first equation leads to $x_2=1$, which in turn, leads to $x_1 = 0$. Hence, we have $(x_1, x_2)\approx(0,1)$. 

- However, if we carry the small coefficient $10^{-20}$ and meticulously solve the equation, we obtain $(x_1, x_2)\approx(2,1)$. -->

<!-- **Idea** (Swapping or partial pivoting)

- Push down a row whose pivot is small, and pull up the one with the largest coefficient in that column.
- This procedure is called *partial pivoting*. -->

### PA = LU decomposition

**Motivation**

1. Not all matrices have $LU$ factorization.
2. One of the main sources of error can be resolved by swapping.

**Example** ($LU$ may not exist)

The following matrix does not have $LU$ factorization.

$$
A 
= \left[\begin{array}{rr}
0 & 1 \\
1 & 0 \\
\end{array}\right]
$$

[Example of impossibility of LU factorization](../images/ex_LUimpossible_lp2000.png)
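
One way to see this is a short argument by contradiction (a sketch, which may overlap with the linked figure). Suppose $A = LU$ with $L$ unit lower triangular and $U$ upper triangular:

$$
\left[\begin{array}{rr}
0 & 1 \\
1 & 0 \\
\end{array}\right]
=
\left[\begin{array}{cc}
1 & 0 \\
\ell_{21} & 1 \\
\end{array}\right]
\left[\begin{array}{cc}
u_{11} & u_{12} \\
0 & u_{22} \\
\end{array}\right]
=
\left[\begin{array}{cc}
u_{11} & u_{12} \\
\ell_{21} u_{11} & \ell_{21} u_{12} + u_{22} \\
\end{array}\right].
$$

Comparing the $(1,1)$ entries gives $u_{11}=0$, while the $(2,1)$ entries require $\ell_{21} u_{11} = 1$. These two conditions cannot hold at the same time, so no such $L$ and $U$ exist.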

**Idea**

Search for a good pivot candidate and swap the rows.

#### Partial pivoting

**Algorithm** (Partial pivoting)

Given an $n$-by-$n$ matrix $A$, the partial pivoting conducted on the $j$-th column reads:

1. find $p\in\{j, j+1, \cdots, n\}$ such that $|a_{pj}| \ge |a_{ij}|$ for all $i\in\{j, j+1, \cdots, n\}$.
2. Exchange the rows $R_j \leftrightarrow R_p$.

**Algorithm** (LU decomposition with partial pivoting)

Given an $n$-by-$n$ matrix $A$,

- **for** $j=1,2,\cdots,n-1$, **do**
  - conduct partial pivoting on $j$-th column (and record the rows exchanged)
  - eliminate subdiagonal entries of $j$-th column


**Remark** (Consequence of partial pivoting)

- The multipliers involved in elimination are always less than or equal to 1 in absolute value: $|c_{ij}|\le 1$ in $L_{ij}(c_{ij})$ notation of row reduction.  
- Equivalently, every entry of $L$ is less than or equal to 1 in absolute value. (Recall that the subdiagonal entries of $L$ are the multipliers of row reduction.)
- Partial pivoting also handles a zero pivot (by swapping with another row that has a nonzero entry in that column).

**Example** ($PA = LU$ decomposition)

In [None]:
import numpy as np

def p_pivot(A, j, verbose=True):
    """ Apply partial pivoting to column j of A.
    
    Input:
        A: a square matrix
        j: the column index to be pivoted. Indexing starts from 0.
    Output:
        None: the matrix is modified in place due to "pass by reference".
    """

    n = A.shape[0]
    p = np.argmax(np.abs(A[j:, j])) + j
    if p != j:
        tmp = A[p].copy()
        A[p] = A[j]
        A[j] = tmp
        if verbose:
            print(f"Rows exchanged: {j} <--> {p}.")
    else:
        if verbose:
            print(f"No row exchange.")

In [None]:
def elim_col(A, j, verbose=True):
    """ Eliminate the j-th column of A.
    
    Input:
        A: a square matrix
        j: the column index to be eliminated. Indexing starts from 0.
    Output:
        None: the matrix is modified in place due to "pass by reference".
    """
    if verbose:
        print(f"Eliminating column {j}.")

    n = A.shape[0]
    for i in range(j+1, n):
        m = A[i, j]/A[j, j]
        A[i] = A[i] - m*A[j]

In [None]:
"""Illustrate Gaussian elimination with partial pivoting."""

A = np.array([[1, -1, 3], 
              [-1, 0, -2], 
              [2, 2, 4]], dtype=np.float64)
b = np.array([-3, 1, 0]).reshape(-1, 1)

# Augmented matrix
M = np.hstack((A, b))
print("Augmented matrix: \n", M)

p_pivot(M, 0)
print(M)
elim_col(M, 0)
print(M)

p_pivot(M, 1)
print(M)
elim_col(M, 1)
print(M)

#### Permutation matrices

**Definition** (Permutation matrix)

A permutation matrix is a square matrix consisting of all zeros, except for a single 1 in every row and column.


**Example** (Permutation matrix)

(2-by-2)

$$
\left[\begin{array}{ll}
1 & 0 \\
0 & 1
\end{array}\right],\left[\begin{array}{ll}
0 & 1 \\
1 & 0
\end{array}\right]
$$

(3-by-3) 

$$
\begin{aligned}
& {\left[\begin{array}{lll}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{array}\right],\left[\begin{array}{lll}
0 & 1 & 0 \\
1 & 0 & 0 \\
0 & 0 & 1
\end{array}\right],\left[\begin{array}{lll}
1 & 0 & 0 \\
0 & 0 & 1 \\
0 & 1 & 0
\end{array}\right]} \\
& {\left[\begin{array}{lll}
0 & 0 & 1 \\
0 & 1 & 0 \\
1 & 0 & 0
\end{array}\right],\left[\begin{array}{lll}
0 & 0 & 1 \\
1 & 0 & 0 \\
0 & 1 & 0
\end{array}\right],\left[\begin{array}{lll}
0 & 1 & 0 \\
0 & 0 & 1 \\
1 & 0 & 0
\end{array}\right]}
\end{aligned}
$$

**Theorem** (Permutation matrix and row exchange)

Let $P$ be the $n$-by-$n$ permutation matrix formed by a particular set of row exchanges applied to the $n$-by-$n$ identity matrix. Then, for any $n$-by-$n$ matrix $A$, $P A$ is the matrix obtained by applying exactly the same set of row exchanges to $A$.

In [None]:
"""Illustrate that left-multiplying by a permutation matrix exchanges rows."""

A = np.arange(1, 10).reshape(3, 3)
I = np.eye(3)

P = I.copy()
P[1] = I[2]
P[2] = I[1]

print("A: \n", A)
print("\nP: \n", P)
print("\nPA: \n", P@A)

#### PA = LU factorization

**Theorem** ($PA =LU$ factorization)

Suppose $A$ is *any* $n$-by-$n$ matrix. Then, there exists a permutation matrix $P$, a unit lower triangular matrix $L$, and an upper triangular matrix $U$ such that

$$
PA = LU,
$$

where all matrices appearing are of size $n$-by-$n$. Furthermore, if $A$ is invertible, then $U$ has nonzero diagonal entries.

**Remark** (Proof of $PA=LU$ factorization; can be skipped in undergraduate class)

- We skip the proof. While some steps are not obvious, the main intuition is explained in the algorithm given above.
- The proof hinges on the following facts.
  - A permutation matrix obtained by exchanging only two rows of the identity matrix is called a *simple permutation*.
  - A simple permutation matrix is its own inverse.
  - Partial pivoting involves only simple permutations of rows on or below the diagonal of the column where the elimination is going on.
  - These two facts ensure that the inversion of row reduction matrices stays as simple as in $LU$.

**Remark** (Implementation of $PA=LU$)

- We only need to store a vector, call it $s$, representing the partial pivoting, not the whole permutation matrix.
- We can store $L$ and $U$ in one matrix $\tilde U = U + L - I$. 
  - Use the empty space of $U$ to put the subdiagonal entries of $L$.
  - We already know the diagonal entries of $L$ are always 1.
- The embedded entries of $L$ end up in the correct positions under subsequent row swaps, provided each multiplier is stored as soon as its row reduction is done. (See Sauer (2017) pp. 102-103)
- We can *read off* $PA=LU$ from $\tilde U$ and $s$ thanks to the previous properties. (How to read off $P$ from $s$ still needs to be checked.)
- We use $s$ to reorder the right-hand side (that is, to form $Pb$) before the substitutions; row exchanges permute the equations, not the unknowns.
- Following these, all bookkeeping of elimination and pivoting are automatic and contained in the matrix equation $PA=LU$. (Sauer (2017) p. 103)

**Example** (Finding $PA=LU$ factorization)

Find the $PA=LU$ factorization of 

$$
A=\left[\begin{array}{rrr}
2 & 1 & 5 \\
4 & 4 & -4 \\
1 & 3 & 1
\end{array}\right]
$$
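
A compact implementation along the lines of the remarks above (pivot vector $s$, and $L$ and $U$ stored together in one working matrix) could look like this sketch. The function name `palu` and the convention that `s[i]` holds the original index of the row now in position `i` are assumptions of this illustration, not Sauer's notation.

```python
import numpy as np

def palu(A):
    """Return (s, L, U) with A[s] = L @ U, using partial pivoting."""
    W = np.array(A, dtype=float)       # working copy; ends up as U + L - I
    n = W.shape[0]
    s = np.arange(n)                   # s[i] = original index of the row now in position i
    for j in range(n - 1):
        p = j + np.argmax(np.abs(W[j:, j]))     # row with the largest entry in column j
        if p != j:
            W[[j, p]] = W[[p, j]]      # swaps the stored multipliers as well
            s[[j, p]] = s[[p, j]]
        if W[j, j] == 0:
            continue                   # column already zero on and below the diagonal
        W[j + 1:, j] /= W[j, j]        # multipliers, stored where zeros would appear
        W[j + 1:, j + 1:] -= np.outer(W[j + 1:, j], W[j, j + 1:])
    L = np.tril(W, -1) + np.eye(n)
    U = np.triu(W)
    return s, L, U

A = np.array([[2, 1, 5],
              [4, 4, -4],
              [1, 3, 1]])
s, L, U = palu(A)
P = np.eye(3)[s]                       # row i of P is the s[i]-th row of the identity
print("s =", s)
print("P:\n", P)
print("L:\n", L)
print("U:\n", U)
```

With this convention, $P$ is read off from $s$ by permuting the rows of the identity matrix, and `P @ A` equals `L @ U` for the matrix of this example.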

#### Solving system from $PA=LU$

We want to solve $Ax=b$ while we have information $PA=LU$. By multiplying $Ax=b$ by $P$ on the left, we have $P A x=P b$, or

$$
L U x=P b.
$$

Hence, we can do the same things as in $LU$ factorization without partial pivoting.

**Algorithm** (Solve $Ax=b$ given $PA=LU$)

1. Solve $Lc=Pb$ for $c$.
2. Solve $Ux=c$ for $x$.

**Remark**

- Forming $Pb$ in the first step applies to $b$ the same row exchanges that were conducted on $A$.

**Example** (Solving a system using $PA=LU$)

Find the solution of $Ax=b$, where

$$
A=\left[\begin{array}{rrr}
2 & 1 & 5 \\
4 & 4 & -4 \\
1 & 3 & 1
\end{array}\right], \quad b=\left[\begin{array}{l}
5 \\
0 \\
6
\end{array}\right] 
$$

and $PA=LU$ decomposition

$$
\left[\begin{array}{lll}
0 & 1 & 0 \\
0 & 0 & 1 \\
1 & 0 & 0
\end{array}\right]\left[\begin{array}{rrr}
2 & 1 & 5 \\
4 & 4 & -4 \\
1 & 3 & 1
\end{array}\right]=\left[\begin{array}{rrr}
1 & 0 & 0 \\
\frac{1}{4} & 1 & 0 \\
\frac{1}{2} & -\frac{1}{2} & 1
\end{array}\right]\left[\begin{array}{rrr}
4 & 4 & -4 \\
0 & 2 & 2 \\
0 & 0 & 8
\end{array}\right]
$$
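
The two steps of the algorithm can be carried out numerically for this example as in the following sketch, which takes $P$, $L$, and $U$ exactly as given above and writes the substitutions as explicit loops.

```python
import numpy as np

A = np.array([[2.0, 1.0, 5.0],
              [4.0, 4.0, -4.0],
              [1.0, 3.0, 1.0]])
b = np.array([5.0, 0.0, 6.0])
P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
L = np.array([[1.0, 0.0, 0.0],
              [0.25, 1.0, 0.0],
              [0.5, -0.5, 1.0]])
U = np.array([[4.0, 4.0, -4.0],
              [0.0, 2.0, 2.0],
              [0.0, 0.0, 8.0]])
n = len(b)

# Step 1: forward substitution for L c = P b (L has unit diagonal)
Pb = P @ b
c = np.zeros(n)
for i in range(n):
    c[i] = Pb[i] - L[i, :i] @ c[:i]

# Step 2: back substitution for U x = c
x = np.zeros(n)
for i in range(n - 1, -1, -1):
    x[i] = (c[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]

print("Pb =", Pb)
print("c  =", c)
print("x  =", x)
print("residual b - A x =", b - A @ x)
```

This yields $Pb = (0, 6, 5)$, $c = (0, 6, 8)$, and $x = (-1, 2, 1)$, which satisfies the original system $Ax=b$.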



___

##### Application of LU decomposition

- Solving systems of linear equations
- Determinant
- Inverting matrices

Reference: [Wikipedia](https://en.wikipedia.org/wiki/LU_decomposition#Applications)
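
As a small illustration of the last two applications, the sketch below reuses the $A=LU$ factors of the earlier $3$-by-$3$ example. Since $L$ is unit lower triangular, $\det A$ is the product of the diagonal entries of $U$ (with $PA=LU$, the sign of the permutation would also enter). For brevity the triangular systems are solved with `np.linalg.solve`, which does not exploit the triangular structure; a forward and a back substitution would be used in practice.

```python
import numpy as np

L = np.array([[1.0, 0.0, 0.0],
              [2.0, 1.0, 0.0],
              [-3.0, -7 / 3, 1.0]])
U = np.array([[1.0, 2.0, -1.0],
              [0.0, -3.0, 0.0],
              [0.0, 0.0, -2.0]])
A = L @ U

# Determinant: det(A) = det(L) det(U) = 1 * (product of the diagonal of U)
det_from_lu = np.prod(np.diag(U))

# Inverse: solve L C = I, then U X = C, so that X = A^{-1}
C = np.linalg.solve(L, np.eye(3))
A_inv = np.linalg.solve(U, C)

print("det from LU:", det_from_lu)
print("np.linalg.det(A):", np.linalg.det(A))
print("A @ A_inv:\n", A @ A_inv)
```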