---
title: "Lecture 8: QR factorizations"
author: "Jamie Haddock"
format: 
    revealjs:
        output-file: Lecture8_slides
        slide-number: true
        chalkboard: 
            buttons: false
        preview-links: auto
        logo: figs/hmc.png
        css: input/slides.css
        incremental: true
        smaller: true
        code-fold: true
    html: 
        code-fold: true
    pdf:
        documentclass: article
        toc: true
        number-sections: true
        geometry:
          - top=1in
          - left=1in
          - bottom=1in
          - right=1in
format-links: false
jupyter: julia-1.9
filters: 
  - input/remove-pause.lua
execute:
  echo: true
  eval: true
---

## Difference of orthogonal vectors

::: {.callout-caution icon=false}
## Exercise: Difference of orthogonal vectors
Suppose that $\mathbf{q}_1$ and $\mathbf{q}_2$ are orthogonal vectors.  Prove then that $$\|\mathbf{q}_1 - \mathbf{q}_2\|^2 = \|\mathbf{q}_1\|^2 + \|\mathbf{q}_2\|^2.$$
:::
<details><summary>Answer:</summary> 

We can expand the norm using the inner product as $$\|\mathbf{q}_1 - \mathbf{q}_2\|^2 = (\mathbf{q}_1 - \mathbf{q}_2)^\top (\mathbf{q}_1 - \mathbf{q}_2) = \mathbf{q}_1^\top \mathbf{q}_1 - 2 \mathbf{q}_1^\top \mathbf{q}_2 + \mathbf{q}_2^\top \mathbf{q}_2 = \|\mathbf{q}_1\|^2 + \|\mathbf{q}_2\|^2.$$ The cross term vanishes because $\mathbf{q}_1^\top \mathbf{q}_2 = 0$ by orthogonality.
</details>

. . .

There is no possibility of subtractive cancellation here.

::: {.callout-warning icon=false}
## Fact: 
Addition and subtraction of vectors are guaranteed to be well-conditioned when the vectors are orthogonal.
:::
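
The identity can be checked numerically. The following is a minimal sketch; the vectors `q1` and `q2` are hypothetical examples chosen to be orthogonal and do not come from the lecture.

```julia
using LinearAlgebra

# Hypothetical example vectors, chosen so that q1'q2 = 0.
q1 = [1.0, 2.0, 2.0]
q2 = [2.0, 1.0, -2.0]
@show dot(q1, q2)                 # 2 + 2 - 4 = 0

lhs = norm(q1 - q2)^2             # ‖q1 - q2‖²
rhs = norm(q1)^2 + norm(q2)^2     # ‖q1‖² + ‖q2‖²
@show lhs rhs;                    # the two values should agree up to rounding
```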

## ONC matrices

::: {.callout-note icon=false}
## Definition: ONC matrix
An **ONC matrix** is one whose columns are an orthonormal set of vectors.
:::

. . .

The following result follows from identifying the possible inner products between columns of an ONC matrix.

::: {.callout-warning icon=false}
## Theorem: ONC matrix
Suppose $\mathbf{Q}$ is a real $n \times k$ ONC matrix.  Then

1. $\mathbf{Q}^\top \mathbf{Q} = \mathbf{I}$ (the $k\times k$ identity matrix)
2. $\|\mathbf{Q}\mathbf{x}\|_2 = \|\mathbf{x}\|_2$ for all $k$-vectors $\mathbf{x}$
3. $\|\mathbf{Q}\|_2 = 1$
:::

<details><summary>Proof:</summary> 

1. The first result follows since $(\mathbf{Q}^\top \mathbf{Q})_{ij} = \mathbf{q}_i^\top \mathbf{q}_j$.
2. $\|\mathbf{Q}\mathbf{x}\|_2^2 = (\mathbf{Q}\mathbf{x})^\top \mathbf{Q}\mathbf{x} = \mathbf{x}^\top \mathbf{Q}^\top \mathbf{Q} \mathbf{x} = \mathbf{x}^\top \mathbf{x} = \|\mathbf{x}\|_2^2$
3. $\|\mathbf{Q}\|_2 = \max_{\|\mathbf{x}\|_2 = 1} \|\mathbf{Q}\mathbf{x}\|_2 = \max_{\|\mathbf{x}\|_2 = 1} \|\mathbf{x}\|_2 = 1$
</details>

## Orthogonal matrix

::: {.callout-note icon=false}
## Definition: Orthogonal matrix
An **orthogonal matrix** is a square matrix with orthonormal columns.
:::

. . .

Orthogonal matrices have even stronger properties than ONC matrices.

::: {.callout-warning icon=false}
## Theorem: Orthogonal matrix
Suppose $\mathbf{Q}$ is an $n \times n$ real orthogonal matrix.  Then

1. $\mathbf{Q}^\top = \mathbf{Q}^{-1}$
2. $\mathbf{Q}^\top$ is also an orthogonal matrix
3. $\kappa(\mathbf{Q}) = 1$ in the 2-norm
4. For any other $n \times n$ matrix $\mathbf{A}$, $\|\mathbf{A}\mathbf{Q}\|_2 = \|\mathbf{A}\|_2$
5. If $\mathbf{U}$ is another $n \times n$ orthogonal matrix, then $\mathbf{Q}\mathbf{U}$ is also orthogonal.
:::
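
These properties can be spot-checked numerically. The sketch below assumes we build orthogonal matrices as the Q factors of random square matrices; this is an illustrative choice, and the names `Q`, `U`, and `A` are local to the example.

```julia
using LinearAlgebra

n = 5
# Orthogonal matrices obtained as Q factors of random square matrices
# (an assumption made for illustration; any orthogonal matrix would do).
Q = Matrix(qr(randn(n, n)).Q)
U = Matrix(qr(randn(n, n)).Q)
A = randn(n, n)

@show opnorm(Q' - inv(Q))              # property 1: Q' ≈ Q⁻¹
@show opnorm(Q * Q' - I)               # property 2: Q' has orthonormal columns
@show cond(Q)                          # property 3: ≈ 1 in the 2-norm
@show opnorm(A * Q) - opnorm(A)        # property 4: ≈ 0
@show opnorm((Q * U)' * (Q * U) - I);  # property 5: QU is orthogonal
```

Each displayed quantity should be zero (or one, for the condition number) up to rounding error.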

# Orthogonal factorization

We now come to another important variant of matrix factorization (like the LU factorization).

## QR factorization

::: {.callout-note icon=false}
## Definition: QR factorization
Every real $m \times n$ matrix $\mathbf{A}$ ($m \ge n$) can be written as $\mathbf{A} = \mathbf{Q} \mathbf{R}$ where $\mathbf{Q}$ is an $m \times m$ orthogonal matrix and $\mathbf{R}$ is an $m \times n$ upper triangular matrix.
:::

. . .

In linear algebra, you may have learned to compute the QR factorization through the Gram-Schmidt process, but it turns out this approach is numerically unstable so we'll learn a different technique!

---

Suppose $m \gg n$ and visualize the QR factorization,
$$\mathbf{A} = \begin{bmatrix} \mathbf{q}_1 & \mathbf{q}_2 & \cdots & \mathbf{q}_m \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ 0 & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & r_{nn} \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots &  & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}.$$

. . .

Note that the rows of zeros at the bottom of $\mathbf{R}$ multiply the columns $\mathbf{q}_{n+1}, \mathbf{q}_{n+2}, \ldots, \mathbf{q}_m$, so these columns do not contribute to the factorization.

. . .

::: {.callout-note icon=false}
## Definition: Thin QR factorization
The thin QR factorization is $\mathbf{A} = \hat{\mathbf{Q}}\hat{\mathbf{R}}$ where $\hat{\mathbf{Q}}$ is an $m \times n$ ONC matrix and $\hat{\mathbf{R}}$ is an $n \times n$ upper triangular matrix.
:::
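
To see why the trailing columns can be dropped, one can partition the full factors into blocks. In this sketch, $\tilde{\mathbf{Q}}$ is notation introduced here for the last $m-n$ columns of $\mathbf{Q}$:

$$\mathbf{A} = \mathbf{Q}\mathbf{R} = \begin{bmatrix} \hat{\mathbf{Q}} & \tilde{\mathbf{Q}} \end{bmatrix} \begin{bmatrix} \hat{\mathbf{R}} \\ \mathbf{0} \end{bmatrix} = \hat{\mathbf{Q}}\hat{\mathbf{R}} + \tilde{\mathbf{Q}}\mathbf{0} = \hat{\mathbf{Q}}\hat{\mathbf{R}}.$$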

---

In [22]:
A = rand(1.:9.,6,4)
@show m,n = size(A);

(m, n) = size(A) = (6, 4)


In [23]:
using LinearAlgebra

Q,R = qr(A);
Q

6×6 LinearAlgebra.QRCompactWYQ{Float64, Matrix{Float64}, Matrix{Float64}}

In [24]:
R

4×4 Matrix{Float64}:
 -16.4012  -10.7919   -10.548    -4.69477
   0.0      -6.74799   -1.65491  -5.97728
   0.0       0.0        5.00011   3.51747
   0.0       0.0        0.0       0.926656

. . .

Strangely, $\mathbf{Q}$ is $6 \times 6$ (full QR) and $\mathbf{R}$ is $4 \times 4$ (thin QR).  However, $\mathbf{Q}$ is given in a nonstandard form, and converting it to a standard matrix recovers the thin QR factor $\hat{\mathbf{Q}}$.

In [25]:
Q̂ = Matrix(Q)

6×4 Matrix{Float64}:
 -0.54874    0.284816   0.136645   -0.382483
 -0.182913  -0.300241   0.914733    0.139313
 -0.182913  -0.89301   -0.281437   -0.222907
 -0.54874    0.136623  -0.112399   -0.393041
 -0.487769   0.039114  -0.216046    0.759469
 -0.304855  -0.105222  -0.0779476   0.230949

---

$\mathbf{Q}$ is an orthogonal matrix and $\hat{\mathbf{Q}}$ is an ONC matrix.

In [26]:
opnorm(Q'*Q - I)

7.451145070743261e-16

In [27]:
opnorm(Q̂'*Q̂ - I)

6.932657863084677e-16
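
As a further sanity check, both forms of the factorization should reproduce $\mathbf{A}$. The sketch below regenerates a random matrix with the same construction as above, so its entries will differ from the ones displayed earlier.

```julia
using LinearAlgebra

A = rand(1.:9., 6, 4)       # same construction as the earlier example
Q, R = qr(A)
Q̂ = Matrix(Q)               # thin factor, 6×4

@show size(Q̂) size(R)
@show opnorm(Q̂ * R - A)                  # thin QR: Q̂R̂ ≈ A
@show opnorm(Q * [R; zeros(2, 4)] - A);  # full QR: R padded with zero rows
```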

## Least squares and QR

Suppose we have a thin QR factorization of $\mathbf{A} = \hat{\mathbf{Q}} \hat{\mathbf{R}}$ and we are solving least-squares via the normal equations:

\begin{align*}
\mathbf{A}^\top \mathbf{A} \mathbf{x} &= \mathbf{A}^\top \mathbf{b} \\
\hat{\mathbf{R}}^\top \hat{\mathbf{Q}}^\top \hat{\mathbf{Q}} \hat{\mathbf{R}} \mathbf{x} &= \hat{\mathbf{R}}^\top \hat{\mathbf{Q}}^\top \mathbf{b} \\
\hat{\mathbf{R}}^\top \hat{\mathbf{R}} \mathbf{x} &= \hat{\mathbf{R}}^\top \hat{\mathbf{Q}}^\top \mathbf{b}
\end{align*}

. . .

Now, if $\mathbf{A}$ has full column rank, then $\hat{\mathbf{R}}$ is nonsingular, so we may multiply both sides by $\hat{\mathbf{R}}^{-\top}$ to obtain $$\hat{\mathbf{R}} \mathbf{x} = \hat{\mathbf{Q}}^\top \mathbf{b}.$$

. . .

Thus, the algorithm for solving least-squares by thin QR is:

1. Compute the thin QR factorization $\mathbf{A} = \hat{\mathbf{Q}}\hat{\mathbf{R}}$.
2. Compute $\mathbf{z} = \hat{\mathbf{Q}}^\top \mathbf{b}$.
3. Solve the $n \times n$ linear system $\hat{\mathbf{R}} \mathbf{x} = \mathbf{z}$ for $\mathbf{x}$ via backsubstitution.  

---

This algorithm is essentially what Julia's `\` operator does when $\mathbf{A}$ is rectangular.


In [28]:
#| echo: false

using FundamentalsNumericalComputation

In [29]:
"""
    lsqrfact(A,b)
Solve a linear least-squares problem by QR factorization.  Returns the minimizer of ||b - Ax||.
"""
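function lsqrfact(A,b)
    Q,R = qr(A)                  # Q acts as an m×m matrix; R is the n×n factor R̂
    z = (Q'*b)[1:size(R,1)]      # the first n entries of Q'b equal Q̂'b
    x = FNC.backsub(R,z)         # solve R̂x = z by backward substitution
    return x
end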

lsqrfact

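Recall that solving least squares through the normal equations can be unstable, because $\kappa(\mathbf{A}^\top\mathbf{A}) = \kappa(\mathbf{A})^2$.  The QR approach cannot remove the ill-conditioning of the least-squares problem itself, but it does deliver a solution whose error is consistent with the bound $\kappa(\mathbf{A})\,\epsilon_{\text{mach}}$.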

In [30]:
t = range(0,3,length=400)
f = [ x->sin(x)^2, x->cos((1+1e-7)*x)^2, x->1. ]
A = [ f(t) for t in t, f in f ]
x = [1., 2, 1]
b = A*x;

In [31]:
observed_error = norm(lsqrfact(A,b) - x)/norm(x);
@show observed_error;
κ = cond(A)
@show error_bound = κ*eps();

observed_error = 4.665273501889628e-9
error_bound = κ * eps() = 4.053030228488391e-9
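
The contrast with the normal equations can be seen by solving the same problem both ways. This is a sketch under stated assumptions: it uses `UpperTriangular(F.R) \ z` from `LinearAlgebra` in place of `FNC.backsub`, and it solves the normal equations with a plain `\` on $\mathbf{A}^\top\mathbf{A}$. One would expect the normal-equations error to be far larger, roughly on the order of $\kappa(\mathbf{A})^2 \epsilon_{\text{mach}}$, though the exact values depend on the machine.

```julia
using LinearAlgebra

# Same nearly rank-deficient test problem as above.
t = range(0, 3, length=400)
f = [x -> sin(x)^2, x -> cos((1 + 1e-7) * x)^2, x -> 1.0]
A = [f(t) for t in t, f in f]
x = [1.0, 2, 1]
b = A * x

# Thin QR: solve R̂x = Q̂ᵀb.
F = qr(A)
x_qr = UpperTriangular(F.R) \ (Matrix(F.Q)' * b)

# Normal equations: solve (AᵀA)x = Aᵀb directly.
x_ne = (A' * A) \ (A' * b)

κ = cond(A)
@show κ
@show err_qr = norm(x_qr - x) / norm(x)
@show err_ne = norm(x_ne - x) / norm(x);
```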


# Computing QR factorizations

One can compute a thin QR factorization using the outer product formula (like we did with LU factorization), which is essentially the Gram-Schmidt process.  However, this algorithm is unstable, and a better approach is to use products of orthogonal matrices to introduce zeros into the lower triangular portion of the matrix.  (We exploit the fact that products of orthogonal matrices are orthogonal.)

## Householder reflections

::: {.callout-note icon=false}
## Definition: Householder reflector
A **Householder reflector** is a matrix of the form $$\mathbf{P} = \mathbf{I} - 2\mathbf{v}\mathbf{v}^\top,$$ where $\mathbf{v}$ is any unit vector (in the 2-norm).
:::

. . .

::: {.callout-warning icon=false}
## Theorem: Householder reflector part 1
A Householder reflector is:

1. symmetric
2. orthogonal
:::
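
Both properties follow from a short computation that uses only $\mathbf{v}^\top\mathbf{v} = 1$:

\begin{align*}
\mathbf{P}^\top &= \mathbf{I}^\top - 2(\mathbf{v}\mathbf{v}^\top)^\top = \mathbf{I} - 2\mathbf{v}\mathbf{v}^\top = \mathbf{P}, \\
\mathbf{P}^\top\mathbf{P} &= (\mathbf{I} - 2\mathbf{v}\mathbf{v}^\top)(\mathbf{I} - 2\mathbf{v}\mathbf{v}^\top) = \mathbf{I} - 4\mathbf{v}\mathbf{v}^\top + 4\mathbf{v}(\mathbf{v}^\top\mathbf{v})\mathbf{v}^\top = \mathbf{I}.
\end{align*}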

---

Note that $\mathbf{P}\mathbf{x} = \mathbf{x} - 2 \mathbf{v}(\mathbf{v}^\top\mathbf{x})$.  Visualizing this equation explains why these are called *reflectors*.

![](figs/householder.png){height=400}

---

Now, we may choose $\mathbf{v}$ so that this reflection $\mathbf{P}\mathbf{z}$ is very sparse.  In fact, we choose $\mathbf{v}$ so that $$\mathbf{P}\mathbf{z} = \begin{bmatrix} \pm \|\mathbf{z}\| \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \pm \|\mathbf{z}\| \mathbf{e}_1.$$

. . .

::: {.callout-warning icon=false}
## Theorem: Householder reflector part 2
If $\mathbf{w} = \|\mathbf{z}\| \mathbf{e}_1 - \mathbf{z}$ and $\mathbf{v} = \mathbf{w}/\|\mathbf{w}\|$ then $$\mathbf{P}\mathbf{z} = \|\mathbf{z}\| \mathbf{e}_1.$$
:::
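
The construction in the theorem can be tried on a small example. This is a minimal sketch: the vector `z` is a hypothetical example with $\|\mathbf{z}\| = 13$, and the reflector is formed explicitly as a dense matrix purely for illustration.

```julia
using LinearAlgebra

z = [3.0, 4.0, 12.0]              # hypothetical example; ‖z‖ = 13
e1 = [1.0, 0.0, 0.0]

w = norm(z) * e1 - z              # w = ‖z‖e₁ - z
v = w / norm(w)                   # unit vector defining the reflector
P = I - 2 * v * v'                # Householder reflector, formed densely

@show P * z                       # ≈ [13, 0, 0] up to rounding
@show opnorm(P' * P - I);         # ≈ 0, so P is orthogonal
```

Note that the construction requires $\mathbf{w} \neq \mathbf{0}$, that is, $\mathbf{z}$ must not already equal $\|\mathbf{z}\|\mathbf{e}_1$.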


