# Gram-Schmidt Procedure and QR Factorization

In [1]:
using DrWatson;
@quickactivate "MATH361Lectures";

In [2]:
import MATH361Lectures
using LinearAlgebra

## Solving Linear Least Squares with Cholesky

The following Julia function uses the Cholesky factorization to solve the linear least squares problem: given $A$ and $b$, we find the $x$ that solves

$$\text{argmin}_{v \in \mathbb{R}^{n}} \|Av - b \|_{2}$$

In [3]:
function lsqcholesky(A,b)
   L,U = cholesky(A'*A);                    # A'A = L*U with U = L'
   w = MATH361Lectures.forwardsub(L,A'*b);  # solve L*w = A'b
   x = MATH361Lectures.backsub(U,w);        # solve U*x = w
   return x
end

lsqcholesky (generic function with 1 method)
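
The function above relies on the normal equations. As a brief sketch of the reasoning, assuming $A$ has linearly independent columns (so that $A^{T}A$ is symmetric positive definite and its Cholesky factorization exists):

$$
\begin{align*}
A^{T}A x &= A^{T} b, & &\text{(normal equations)} \\
A^{T}A &= L L^{T}, & &\text{(Cholesky factorization, with } U = L^{T}\text{)} \\
L w &= A^{T} b, & &\text{(forward substitution)} \\
L^{T} x &= w. & &\text{(backward substitution)}
\end{align*}
$$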

In [4]:
A = [1 2 -4; 3 -1 1; 1 -2 1; 3 -2 -1; 4 2 -1];
b = [-1; 2; -2; 1; 3];
x = lsqcholesky(A,b)

3-element Vector{Float64}:
 0.6296350152682856
 0.6441762396393778
 0.5746691871455578

Compare this with what we obtain using the Julia backslash operator:

In [5]:
x_bs = A \ b

3-element Vector{Float64}:
 0.6296350152682856
 0.6441762396393775
 0.5746691871455577

Additionally, for the solution we have obtained, let's compute $\|Ax - b\|_{2}$.

In [6]:
norm(A*x-b,2)

2.2560728637642335

In [7]:
norm(A*x_bs-b,2)

2.2560728637642335

Observe what happens if we introduce a small perturbation to $x$:

In [6]:
x_pert = x + [0.00001,-0.00002,0.00005]

3-element Vector{Float64}:
 0.6296450152682855
 0.6441562396393777
 0.5747191871455578

In [7]:
norm(A*x_pert - b,2)

2.256072880563336

We see that there is a small increase in the 2-norm of the residual, $\|Ax_{\text{pert}} - b\|_{2}$. To confirm that it is indeed a small perturbation, let's compute $\|x - x_{\text{pert}}\|_{2}$.

In [8]:
norm(x - x_pert,2)

5.4772255750510576e-5
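
One way to see why the increase is so small: at the least squares solution the residual is orthogonal to the columns of $A$, so perturbing $x$ by $\delta$ changes the squared residual norm by $\|A\delta\|_{2}^{2}$, which is second order in $\delta$. The following is a minimal sketch that checks this numerically. It uses the built-in backslash solve rather than `lsqcholesky` so that it runs on its own, and the names `delta`, `r`, and `r_pert` are introduced here only for illustration.

```julia
using LinearAlgebra

A = [1 2 -4; 3 -1 1; 1 -2 1; 3 -2 -1; 4 2 -1];
b = [-1; 2; -2; 1; 3];
x = A \ b;                             # least squares solution
delta = [0.00001, -0.00002, 0.00005];  # same perturbation as above

r = A*x - b;                   # residual at the minimizer
r_pert = A*(x + delta) - b;    # residual at the perturbed point

# The residual is (numerically) orthogonal to every column of A
println(norm(A'*r, 2))

# Pythagoras: ||r_pert||^2 - ||r||^2 should match ||A*delta||^2
println(norm(r_pert, 2)^2 - norm(r, 2)^2)
println(norm(A*delta, 2)^2)
```

The last two printed values should agree to several digits; the agreement is limited by cancellation when subtracting two nearly equal squared norms.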

## Background for QR Factorization

### Orthonormal Vectors

Recall that the dot product of two column vectors ${\bf u} = [u_{1}, u_{2}, \ldots , u_{n}]^{T}$ and ${\bf v} = [v_{1}, v_{2}, \ldots , v_{n}]^{T}$ is

$${\bf u}\cdot {\bf v} = {\bf u}^{T} {\bf v} = u_{1}v_{1} + u_{2}v_{2} + \cdots + u_{n}v_{n}.$$

Observe that if ${\bf u}$ is a vector, then $\|{\bf u}\|_{2}^{2} = {\bf u}^{T} {\bf u}$, and also that ${\bf u}^{T}{\bf v}={\bf v}^{T}{\bf u}$.


Two vectors ${\bf u}$ and ${\bf v}$ are said to be **orthogonal** if their dot product is zero, that is, if ${\bf u}\cdot {\bf v} = {\bf u}^{T} {\bf v} = 0$. We say that a vector ${\bf u}$ is **normalized** (in the 2-norm) if $\|{\bf u}\|_{2} = 1$.

### Orthonormal Set of Vectors

A set of vectors $\{{\bf q}_{1}, {\bf q}_{2}, \ldots, {\bf q}_{n}\}$ is an **orthogonal set** if ${\bf q}_{i}^{T}{\bf q}_{j} = 0$ whenever $i\neq j$. Furthermore, an orthogonal set of vectors is an **orthonormal set** if, in addition, $\|{\bf q}_{i}\|_{2} = 1$ for all $i$.
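
As a small illustration of these definitions, the sketch below starts from two orthogonal vectors and normalizes each one to obtain an orthonormal set. The vectors `u` and `v` are made up for this example.

```julia
using LinearAlgebra

u = [1.0, 1.0, 0.0];
v = [1.0, -1.0, 2.0];

println(dot(u, v))        # u and v are orthogonal when this is zero

# Dividing each vector by its 2-norm turns the orthogonal set {u, v}
# into an orthonormal set {q1, q2}
q1 = u / norm(u, 2);
q2 = v / norm(v, 2);

println(norm(q1, 2), " ", norm(q2, 2))   # both close to 1
println(dot(q1, q2))                     # still close to zero
```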

### Orthogonal Matrices

A matrix $Q$ is **orthogonal** if its columns form an orthogonal set of vectors.

A matrix $Q$ is **ONC** if its columns form an orthonormal set. Equivalently, a matrix $Q$ is ONC if $Q^{T}Q = I$.

As an example, any permutation matrix $P$ is ONC. 
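
A quick check of this claim, as a sketch: the permutation matrix `P` below is built by reordering the rows of the $3\times 3$ identity. The particular row order and the name `Id` are arbitrary choices made for illustration.

```julia
using LinearAlgebra

Id = Matrix{Float64}(I, 3, 3);
P = Id[[2, 3, 1], :];      # reorder the rows of the identity

println(P)
println(P' * P == Id)      # true: the columns of P are orthonormal
```

Because the entries of `P` are only zeros and ones, the product is exact here and no rounding tolerance is needed.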

### The Gram-Schmidt Procedure

Given any set of linearly independent vectors $\{{\bf a}_{1}, {\bf a}_{2}, \ldots , {\bf a}_{n}\}$, the Gram-Schmidt procedure produces an orthonormal set $\{{\bf q}_{1}, {\bf q}_{2}, \ldots , {\bf q}_{n}\}$ with the same span as the original set. The Gram-Schmidt procedure works as follows:

Set

$$
\begin{align*}
{\bf q}_{1} &= \frac{{\bf a}_{1}}{\|{\bf a}_{1}\|_{2}}, \\
{\bf q}_{2} &= \frac{{\bf a}_{2} - ({\bf q}_{1}^{T}{\bf a}_{2}) {\bf q}_{1}}{\|{\bf a}_{2} - ({\bf q}_{1}^{T}{\bf a}_{2}) {\bf q}_{1}\|_{2}}, \\
{\bf q}_{3} &= \frac{{\bf a}_{3} - ({\bf q}_{1}^{T}{\bf a}_{3}) {\bf q}_{1} - ({\bf q}_{2}^{T}{\bf a}_{3}) {\bf q}_{2}}{\|{\bf a}_{3} - ({\bf q}_{1}^{T}{\bf a}_{3}) {\bf q}_{1} - ({\bf q}_{2}^{T}{\bf a}_{3}){\bf q}_{2}\|_{2}}, \\
 &\vdots  \\
{\bf q}_{n} &= \frac{{\bf a}_{n} - ({\bf q}_{1}^{T}{\bf a}_{n}) {\bf q}_{1} - ({\bf q}_{2}^{T}{\bf a}_{n}) {\bf q}_{2} - \cdots - ({\bf q}_{n-1}^{T}{\bf a}_{n}) {\bf q}_{n-1}}{\|{\bf a}_{n} - ({\bf q}_{1}^{T}{\bf a}_{n}) {\bf q}_{1} - ({\bf q}_{2}^{T}{\bf a}_{n}){\bf q}_{2} - \cdots - ({\bf q}_{n-1}^{T}{\bf a}_{n}) {\bf q}_{n-1}\|_{2}}, 
\end{align*}
$$
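
To make the first two steps concrete, here is a minimal sketch that carries them out for two vectors in $\mathbb{R}^{2}$. The vectors `a1` and `a2` are chosen for this example so that the arithmetic is easy to follow by hand: in exact arithmetic, ${\bf q}_{1} = [3/5, 4/5]^{T}$ and ${\bf q}_{2} = [4/5, -3/5]^{T}$.

```julia
using LinearAlgebra

a1 = [3.0, 4.0];
a2 = [1.0, 0.0];

q1 = a1 / norm(a1, 2);            # normalize the first vector

v2 = a2 - dot(q1, a2) * q1;       # remove the component of a2 along q1
q2 = v2 / norm(v2, 2);            # normalize what is left

println(q1)
println(q2)
println(dot(q1, q2))              # close to zero
```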

Now, note that we can "reverse" the result of the Gram Schmidt procedure by solving for the ${\bf a}_{i}$ vectors as linear combinations of the ${\bf q}_{i}$ vectors:

$$
\begin{align*}
{\bf a}_{1} &= \|{\bf a}_{1}\|_{2}{\bf q}_{1}, \\
{\bf a}_{2} &= ({\bf q}_{1}^{T}{\bf a}_{2}){\bf q}_{1} +  \|{\bf a}_{2} - ({\bf q}_{1}^{T}{\bf a}_{2}) {\bf q}_{1}\|_{2}{\bf q}_{2}, \\
{\bf a}_{3} &= ({\bf q}_{1}^{T}{\bf a}_{3}) {\bf q}_{1} + ({\bf q}_{2}^{T}{\bf a}_{3}) {\bf q}_{2} + \|{\bf a}_{3} - ({\bf q}_{1}^{T}{\bf a}_{3}) {\bf q}_{1} - ({\bf q}_{2}^{T}{\bf a}_{3}){\bf q}_{2}\|_{2} {\bf q}_{3}, \\
 &\vdots  \\
{\bf a}_{n} &= ({\bf q}_{1}^{T}{\bf a}_{n}) {\bf q}_{1} + ({\bf q}_{2}^{T}{\bf a}_{n}) {\bf q}_{2} + \cdots + ({\bf q}_{n-1}^{T}{\bf a}_{n}) {\bf q}_{n-1} + \|{\bf a}_{n} - ({\bf q}_{1}^{T}{\bf a}_{n}) {\bf q}_{1} - ({\bf q}_{2}^{T}{\bf a}_{n}){\bf q}_{2} - \cdots - ({\bf q}_{n-1}^{T}{\bf a}_{n}) {\bf q}_{n-1}\|_{2} {\bf q}_{n}, 
\end{align*}
$$

We can simplify the expressions in the last cell by defining

$$
\begin{align*}
r_{kk} &= \|{\bf a}_{k} - ({\bf q}_{1}^{T}{\bf a}_{k}) {\bf q}_{1} - ({\bf q}_{2}^{T}{\bf a}_{k}){\bf q}_{2} - \cdots - ({\bf q}_{k-1}^{T}{\bf a}_{k}) {\bf q}_{k-1}\|_{2}, \\
r_{kl} &= {\bf q}_{k}^{T} {\bf a}_{l}, \ \ \text{ for } l > k 
\end{align*}
$$

Then we obtain:

$$
\begin{align*}
{\bf a}_{1} &= r_{11}{\bf q}_{1}, \\
{\bf a}_{2} &= r_{12}{\bf q}_{1} +  r_{22}{\bf q}_{2}, \\
{\bf a}_{3} &= r_{13}{\bf q}_{1} + r_{23} {\bf q}_{2} + r_{33}{\bf q}_{3}, \\
 &\vdots  \\
{\bf a}_{n} &= r_{1n} {\bf q}_{1} + r_{2n} {\bf q}_{2} + \cdots + r_{n-1 n} {\bf q}_{n-1} + r_{nn} {\bf q}_{n}, 
\end{align*}
$$

This can be written even more compactly as $A = QR$ if we think of the vectors ${\bf a}_{1},{\bf a}_{2},\ldots,{\bf a}_{n}$ as forming the columns of the matrix $A$, the vectors ${\bf q}_{1},{\bf q}_{2},\ldots,{\bf q}_{n}$ as forming the columns of a matrix $Q$, and the scalars $r_{ij}$ as forming the entries of an **upper triangular** matrix $R$.
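
Written out explicitly, the column relations above take the following form, where column $j$ of the product on the right reproduces the expression for ${\bf a}_{j}$:

$$
\begin{bmatrix} {\bf a}_{1} & {\bf a}_{2} & \cdots & {\bf a}_{n} \end{bmatrix}
=
\begin{bmatrix} {\bf q}_{1} & {\bf q}_{2} & \cdots & {\bf q}_{n} \end{bmatrix}
\begin{bmatrix}
r_{11} & r_{12} & \cdots & r_{1n} \\
0 & r_{22} & \cdots & r_{2n} \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & r_{nn}
\end{bmatrix}
$$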

The following Julia function implements the Gram-Schmidt procedure on the columns of an $m\times n$ matrix $A$ to produce an ONC matrix $Q$ and an upper triangular matrix $R$ with $A = QR$:

In [11]:
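function gsqr(A)
    m,n = size(A);
    Q = Matrix{Float64}(A);       # working copy; columns are overwritten by the q vectors
    R = Matrix{Float64}(I,n,n);   # entries on and above the diagonal are filled in below
    R[1,1] = norm(Q[:,1],2);
    Q[:,1] = (1/R[1,1]) * Q[:,1]; # get first column of Q
    for j = 2:n # loop through columns
        for i = 1:j-1
            R[i,j] = dot(Q[:,i],Q[:,j]);      # component of column j along q_i
            Q[:,j] = Q[:,j] .- R[i,j]*Q[:,i]; # subtract that component
        end
        R[j,j] = norm(Q[:,j],2);    # length of what remains
        Q[:,j] = (1/R[j,j])*Q[:,j]; # normalize to get q_j
    end
    return Q, R
end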

gsqr (generic function with 1 method)

Let's look at an example:

In [9]:
A = [1 2 3;-1 1 -1; 0 1 0;2 -1 2]

4×3 Matrix{Int64}:
  1   2   3
 -1   1  -1
  0   1   0
  2  -1   2

In [12]:
Q,R = gsqr(A);

In [13]:
Q

4×3 Matrix{Float64}:
  0.408248   0.82885    0.382546
 -0.408248   0.318788  -0.255031
  0.0        0.382546  -0.82885
  0.816497  -0.255031  -0.318788

In [15]:
R

3×3 Matrix{Float64}:
 2.44949  -0.408248  3.26599
 0.0       2.61406   1.6577
 0.0       0.0       0.765092

We will check manually that indeed $Q$ is ONC:

In [16]:
Q[:,1]'*Q[:,1]

1.0000000000000002

In [17]:
Q[:,2]'*Q[:,2]

0.9999999999999998

In [18]:
Q[:,3]'*Q[:,3]

0.9999999999999997

In [19]:
Q[:,1]'*Q[:,2]

6.873611268194275e-17

In [20]:
Q[:,1]'*Q[:,3]

-1.3590030047989218e-15

In [21]:
Q[:,2]'*Q[:,3]

3.417559741731522e-16

In [14]:
Q'*Q

3×3 Matrix{Float64}:
  1.0          6.87361e-17  -1.359e-15
  6.87361e-17  1.0           3.41756e-16
 -1.359e-15    3.41756e-16   1.0
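
For comparison, the same checks can be sketched with the built-in `qr` function from `LinearAlgebra`. Note that the built-in routine does not use Gram-Schmidt, so its factors may differ from those of `gsqr` in the signs of some columns of $Q$ and rows of $R$. The block repeats the definition of `A` so that it runs on its own.

```julia
using LinearAlgebra

A = [1 2 3; -1 1 -1; 0 1 0; 2 -1 2];

F = qr(A);
Q = Matrix(F.Q);     # "thin" Q factor, 4×3
R = F.R;             # upper triangular factor, 3×3

println(size(Q), " ", size(R))
println(norm(Q*R - A))         # close to zero: A = QR
println(norm(Q'*Q - I))        # close to zero: Q is ONC
```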

Here are a few important points:

1) The Gram-Schmidt procedure results in a reduced QR factorization, in which $Q$ is $m\times n$ and $R$ is $n\times n$. This is all that is needed for solving the linear least squares problem.

2) The Gram-Schmidt method is not the most numerically stable method for QR factorization. In the next lecture, we will look at another approach that uses so-called [Householder reflectors](https://en.wikipedia.org/wiki/Householder_transformation) to obtain a (full) QR factorization. In preparation for the next lecture, please watch the video on [QR factorization](https://youtu.be/9iA8P1mg170).
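
To illustrate the first point: with a reduced factorization $A = QR$, the normal equations $A^{T}Ax = A^{T}b$ become $R^{T}Rx = R^{T}Q^{T}b$ (using $Q^{T}Q = I$), and since $R$ is invertible when the columns of $A$ are linearly independent, this reduces to the triangular system $Rx = Q^{T}b$. A minimal sketch follows, using the built-in `qr` from `LinearAlgebra` in place of `gsqr` so that the block runs on its own; the name `x_qr` is introduced here only for illustration.

```julia
using LinearAlgebra

A = [1 2 -4; 3 -1 1; 1 -2 1; 3 -2 -1; 4 2 -1];
b = [-1; 2; -2; 1; 3];

F = qr(A);
Q = Matrix(F.Q);                       # reduced (thin) Q, 5×3
R = F.R;                               # 3×3 upper triangular

x_qr = UpperTriangular(R) \ (Q' * b);  # back substitution for R*x = Q'b

println(x_qr)
println(norm(x_qr - A \ b, 2))         # close to zero
```

Wrapping `R` in `UpperTriangular` tells Julia to solve the system by back substitution instead of a general-purpose solve.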