---
# Section 3.4: The Gram-Schmidt Process
---

Given a linearly independent set of $m$ vectors in $\mathbb{R}^n$,

$$
\{v_1, v_2, \ldots, v_m\} \subseteq \mathbb{R}^n,
$$

the **Gram-Schmidt process** produces an orthonormal set of vectors,

$$
\{q_1, q_2, \ldots, q_m\} \subseteq \mathbb{R}^n,
$$

such that

$$
\mathrm{span}\{v_1, v_2, \ldots, v_m\} = \mathrm{span}\{q_1, q_2, \ldots, q_m\}.
$$

---

## Orthonormal set of vectors

A set of vectors $\{q_1, q_2, \ldots, q_m\}$ is said to be _orthonormal_ if the vectors are pairwise orthogonal and each vector has Euclidean norm one; that is,

$$
\langle q_i, q_j \rangle =
\begin{cases}
0 & \text{if $i \ne j$} \\
1 & \text{if $i = j$},
\end{cases}
$$

where $\langle q_i, q_j \rangle = q_i^T q_j$ is the inner product (or dot product) of the vectors $q_i$ and $q_j$.
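
In code, orthonormality can be checked by forming the Gram matrix $G$ with entries $G_{ij} = \langle q_i, q_j \rangle$ and comparing it to the identity. The sketch below assumes the vectors are stored as a `Vector` of vectors. The helper name `is_orthonormal` and the tolerance `atol` are illustrative choices, not part of the original notes.

```julia
using LinearAlgebra

"""
`is_orthonormal(qs; atol)` checks, up to the absolute tolerance `atol`, whether
the vectors in `qs` are pairwise orthogonal with unit Euclidean norm.
(Hypothetical helper for illustration.)
"""
function is_orthonormal(qs::AbstractVector{<:AbstractVector}; atol=1e-12)
    m = length(qs)
    # Gram matrix: G[i,j] = <q_i, q_j> = q_i' * q_j
    G = [dot(qs[i], qs[j]) for i in 1:m, j in 1:m]
    return isapprox(G, Matrix{Float64}(I, m, m); atol=atol)
end

q1 = [1.0, 1.0, 0.0] / sqrt(2)
q2 = [1.0, -1.0, 0.0] / sqrt(2)
q3 = [0.0, 0.0, 1.0]

is_orthonormal([q1, q2, q3])        # true
is_orthonormal([q1, 2 * q2, q3])    # false: 2*q2 has norm 2
```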

---

## Orthonormal and isometric matrices

A matrix $Q \in \mathbb{R}^{n \times m}$, with $n \ge m$, is called _isometric_ (or an _isometry_) if its columns are orthonormal; that is, if $Q^T Q = I$.

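If $Q$ is a _square_ isometric matrix, then $Q^T Q = I$ says that $Q^T$ is a left inverse of $Q$. For a square matrix, a left inverse is also a right inverse, so $Q^{-1} = Q^T$ and we also have $Q Q^T = I$; that is, $Q$ is an orthogonal matrix.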

However, if $Q$ is an isometric matrix that has more rows than columns ($n > m$), then $Q$ cannot have an inverse since it is not square. Moreover, $Q Q^T \ne I$, because $Q Q^T$ is $n \times n$ but has rank at most $m < n$.
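
A $2 \times 2$ rotation matrix gives a concrete check of both claims: it is square and isometric, so $Q Q^T = I$. Keeping only its first column gives a $2 \times 1$ isometry $q$ for which $q q^T$ has rank one and therefore cannot equal $I$. The angle `θ` below is an arbitrary choice for the sketch.

```julia
using LinearAlgebra

θ = 0.3                         # arbitrary rotation angle
Q = [cos(θ) -sin(θ);
     sin(θ)  cos(θ)]            # square isometry (an orthogonal matrix)

Q' * Q ≈ I                      # true: the columns are orthonormal
Q * Q' ≈ I                      # true: square, so Q' is also a right inverse

q = Q[:, 1:1]                   # 2×1 isometry (first column only)
q' * q ≈ I                      # true: q'q = [1.0]
q * q' ≈ I                      # false: q*q' has rank one
```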

---

## Example

In [None]:
using LinearAlgebra

In [None]:
"""
`isorand(n, m)` generates a random n-by-m isometric matrix.
"""
function isorand(n, m)
    F = qr(randn(n, m))
    return F.Q[:,1:m]
end

Q = isorand(5, 3)

In [None]:
?isorand

In [None]:
Q'Q

In [None]:
Q*Q'

---

## Exercise

Let $Q \in \mathbb{R}^{n \times m}$ ($n > m$) be an isometry with columns $q_1, q_2, \ldots, q_m$ and let

$$S = \mathrm{span}\{q_1,q_2,\ldots,q_m\}$$

be the subspace spanned by these vectors.

1. Show that $Q Q^T v = 0$ if $v$ is orthogonal to $q_1, q_2, \ldots, q_m$.

2. Show that $Q Q^T w = w$ if $w \in S$. Therefore, $Q Q^T$ behaves like the identity matrix on the subspace $S$.

3. Show that $(Q Q^T)^2 = Q Q^T$. Thus, $Q Q^T$ is a _projector_ (in fact, it is an _orthogonal projector_).

### Part 1

Suppose that $v \perp q_i$ for $i=1,\ldots,m$. In other words, $\langle v, q_i \rangle = 0$ for $i=1,\ldots,m$. Note that

$$
Q^T v =
\begin{bmatrix}
\langle q_1, v \rangle \\
\vdots \\
\langle q_m, v \rangle
\end{bmatrix} = 0.
$$

Thus, $Q Q^T v = Q \cdot 0 = 0$.

### Part 2

Let $w \in S$. Then $w$ is a linear combination of the vectors $q_1,\ldots,q_m$, so

$$
w = c_1 q_1 + \cdots + c_m q_m.
$$

Then, letting $e_i$ denote the $i$th column of the $m \times m$ identity matrix (so that $Q^T q_i = e_i$, since $Q^T q_i$ is the $i$th column of $Q^T Q = I$), we have

$$
\begin{align}
Q Q^T w 
&= Q Q^T (c_1 q_1 + \cdots + c_m q_m) \\
&= Q (c_1 Q^T q_1 + \cdots + c_m Q^T q_m) \\
&= Q (c_1 e_1 + \cdots + c_m e_m) \\
&= Q \begin{bmatrix} c_1 \\ \vdots \\ c_m \end{bmatrix} \\
&= c_1 q_1 + \cdots + c_m q_m \\
&= w.
\end{align}
$$

### Part 3

Since $Q^T Q = I$, we have

$$
(Q Q^T)^2 = Q Q^T Q Q^T = Q I Q^T = Q Q^T.
$$
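
The three properties can also be checked numerically. The sketch below builds an isometry from a thin $QR$ factorization (the same idea as `isorand` above, repeated so the block is self-contained) and tests parts 1 to 3 for $P = Q Q^T$. It also checks two facts not proved above but easily verified: $P$ is symmetric, since $(Q Q^T)^T = Q Q^T$, and $\operatorname{trace}(P) = \operatorname{trace}(Q^T Q) = m$.

```julia
using LinearAlgebra

n, m = 6, 3
Q = Matrix(qr(randn(n, m)).Q)   # n×m isometry (thin Q factor)
P = Q * Q'                      # orthogonal projector onto S = span of the columns of Q

v = randn(n)
v -= Q * (Q' * v)               # remove the component in S, so v ⊥ q_1,...,q_m
w = Q * randn(m)                # an element of S

norm(P * v) < 1e-10             # part 1: P v = 0 (compare with a tolerance, not ≈ 0)
P * w ≈ w                       # part 2: P w = w
P * P ≈ P                       # part 3: P^2 = P
P' ≈ P                          # P is symmetric
tr(P) ≈ m                       # trace equals m = dim S
```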

---

## Example

In [None]:
Q = isorand(5, 3)

In [None]:
x = randn(5)

# Project x onto S = span{q1,...,qm}
w = Q*(Q'x)

In [None]:
Q*(Q'w) ≈ w

In [None]:
# Since Q*Q' is an orthogonal projector,
# x - w is orthogonal to q1,...,qm.

v = x - w

Q'v

---

## Exercise

Show that if $Q \in \mathbb{R}^{n \times m}$ is an isometry, then

1. $\langle Q x, Q y \rangle = \langle x, y \rangle$ for all $x, y \in \mathbb{R}^{m}$,

2. $\|Q x\|_2 = \|x\|_2$ for all $x \in \mathbb{R}^m$.

### Part 1

Since $Q^T Q = I$, we have that 

$$
\langle Q x, Q y \rangle = (Q x)^T (Q y) = x^T Q^T Q y = x^T y = \langle x, y \rangle.
$$

### Part 2

By part 1,

$$
\| Q x \|_2 = \sqrt{\langle Q x, Q x \rangle} = \sqrt{\langle x, x \rangle} = \| x \|_2.
$$
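
A quick numerical check of both parts, using a random isometry built from a thin $QR$ factorization. The sizes and the vectors `x`, `y` are arbitrary choices for this sketch.

```julia
using LinearAlgebra

n, m = 7, 4
Q = Matrix(qr(randn(n, m)).Q)    # n×m isometry
x = [1.0, 2.0, -1.0, 0.5]
y = [0.5, -1.0, 3.0, 2.0]

dot(Q * x, Q * y) ≈ dot(x, y)    # part 1: inner products are preserved (dot(x, y) = -3.5)
norm(Q * x) ≈ norm(x)            # part 2: 2-norms are preserved (norm(x) = 2.5)
```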

---

> ## Theorem (Condensed $QR$ factorization of an $n \times m$ matrix)
>
> Let $A \in \mathbb{R}^{n \times m}$, $n \ge m$. Then there exist an isometric matrix $\hat{Q} \in \mathbb{R}^{n \times m}$ and an upper-triangular matrix $\hat{R} \in \mathbb{R}^{m \times m}$ such that 
> 
> $$ A = \hat{Q}\hat{R}. $$
>
> Moreover, if the columns of $A$ are linearly independent, then this factorization of $A$ is unique under the condition that $\hat{R}$ has positive diagonal entries.
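
The uniqueness statement requires $\hat{R}$ to have positive diagonal entries, but a Householder-based `qr` makes no such promise, and some diagonal entries of its $R$ factor may come out negative. A minimal sketch of the usual fix, assuming the columns of `A` are linearly independent so that no diagonal entry is zero: with $D = \operatorname{diag}(\operatorname{sign}(\hat{r}_{ii}))$ we have $D^2 = I$, so $A = (\hat{Q} D)(D \hat{R})$ is again a condensed $QR$ factorization, now with a positive diagonal. The function name `positive_qr` is hypothetical.

```julia
using LinearAlgebra

"""
`positive_qr(A)` returns a condensed QR factorization A = Q*R in which R has
positive diagonal entries (hypothetical helper; assumes full column rank).
"""
function positive_qr(A)
    F = qr(A)
    Q, R = Matrix(F.Q), Matrix(F.R)    # thin n×m Q and m×m upper-triangular R
    D = Diagonal(sign.(diag(R)))       # ±1 on the diagonal, so D*D = I
    return Q * D, D * R                # A = (Q*D)*(D*R)
end

A = randn(5, 3)
Q, R = positive_qr(A)
all(diag(R) .> 0), A ≈ Q * R, Q' * Q ≈ I
```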

---

## Example

We can obtain the condensed $QR$ factorization of a matrix $A$ using the Julia function `qr` as follows.

In [None]:
function condensed_qr(A)
    F = qr(A)              # Householder-based QR factorization
    Q̂ = Matrix(F.Q)        # thin n×m factor with orthonormal columns
    R̂ = F.R                # m×m upper-triangular factor
    return Q̂, R̂
end

In [None]:
n, m = 5, 3
A = randn(n, m)
Q̂, R̂ = condensed_qr(A);

In [None]:
Q̂

In [None]:
R̂

In [None]:
A ≈ Q̂*R̂

---

## Exercise

Let $q_1,\ldots,q_m$ be orthonormal vectors in $\mathbb{R}^n$. Prove that $q_1,\ldots,q_m$ are linearly independent.

### Proof.

Suppose we have a linear combination of the vectors $q_1,\ldots,q_m$ that gives us the zero vector. That is, $\exists c_1,\ldots,c_m \in \mathbb{R}$ such that

$$
c_1 q_1 + \cdots + c_m q_m = 0.
$$

To show that $q_1,\ldots,q_m$ are linearly independent, we need to show that $c_1 = \cdots = c_m = 0$.

Let's take the inner product of $q_1$ with $c_1 q_1 + \cdots + c_m q_m$, which gives us

$$
\begin{eqnarray}
\langle q_1, c_1 q_1 + \cdots + c_m q_m \rangle &=& \langle q_1, 0 \rangle \\
c_1 \langle q_1, q_1 \rangle + \cdots + c_m \langle q_1, q_m \rangle &=& 0 \\
c_1 \cdot 1 + c_2 \cdot 0 + \cdots + c_m \cdot 0 &=& 0 \\
c_1 &=& 0.
\end{eqnarray}
$$

Thus,

$$
c_2 q_2 + \cdots + c_m q_m = 0.
$$

Taking the inner product of this equation with $q_2$ shows that $c_2 = 0$. Continuing in this way, we conclude that $c_1 = c_2 = \cdots = c_m = 0$.
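
The elimination can also be done for all coefficients at once: taking the inner product of the original relation with $q_i$ and using orthonormality gives, for each $i = 1,\ldots,m$,

$$
0 = \langle q_i, 0 \rangle
  = \Big\langle q_i, \sum_{j=1}^{m} c_j q_j \Big\rangle
  = \sum_{j=1}^{m} c_j \langle q_i, q_j \rangle
  = c_i.
$$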

---

## The Classical Gram-Schmidt Process

Let $S$ be a subspace of $\mathbb{R}^n$ and let $v_1,\ldots,v_m$ be a basis of $S$; that is, $v_1,\ldots,v_m$ are linearly independent and span $S$, which implies that $\dim S = m$.

Gram-Schmidt will produce orthonormal vectors $q_1,\ldots,q_m$ that form a basis of $S$ and satisfy

$$
\begin{eqnarray}
\mathrm{span}\{q_1\} &=& \mathrm{span}\{v_1\} \\
\mathrm{span}\{q_1,q_2\} &=& \mathrm{span}\{v_1,v_2\} \\
\mathrm{span}\{q_1,q_2,q_3\} &=& \mathrm{span}\{v_1,v_2,v_3\} \\
&\vdots& \\
\mathrm{span}\{q_1,q_2,\ldots,q_m\} &=& \mathrm{span}\{v_1,v_2,\ldots,v_m\}. \\
\end{eqnarray}
$$

---

### Step 1

To ensure that $\mathrm{span}\{q_1\} = \mathrm{span}\{v_1\}$, we normalize $v_1$ (which is nonzero because $v_1,\ldots,v_m$ are linearly independent):

$$
r_{11} = \|v_1\|_2, \quad q_1 = \frac{v_1}{r_{11}}.
$$

---

### Step 2

Next, we let $w$ be the orthogonal projection of $v_2$ onto $\mathrm{span}\{q_1\}$:

$$
w = \begin{bmatrix} q_1 \end{bmatrix} \begin{bmatrix} q_1 \end{bmatrix}^T v_2 = \langle q_1, v_2 \rangle q_1.
$$

Then, we let $\hat{q}_2 = v_2 - w$. Notice that $\hat{q}_2$ is orthogonal to $q_1$ since

$$
\begin{align}
\langle q_1, \hat{q}_2 \rangle 
&= \langle q_1, v_2 - \langle q_1, v_2 \rangle q_1 \rangle \\
&= \langle q_1, v_2 \rangle - \langle q_1, v_2 \rangle \langle q_1, q_1 \rangle \\
&= \langle q_1, v_2 \rangle - \langle q_1, v_2 \rangle \\
&= 0.
\end{align}
$$

Therefore, we let

$$
r_{12} = \langle q_1, v_2 \rangle, \quad \hat{q}_2 = v_2 - r_{12} q_1, \quad r_{22} = \|\hat{q}_2\|_2, \quad q_2 = \frac{\hat{q}_2}{r_{22}},
$$

and we have $\mathrm{span}\{q_1, q_2\} = \mathrm{span}\{v_1, v_2\}$ (see text for details).
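
To make Steps 1 and 2 concrete, take $v_1 = (1, 1, 0)^T$ and $v_2 = (1, 0, 1)^T$ (an illustrative choice). Then $r_{11} = \sqrt{2}$, $r_{12} = 1/\sqrt{2}$, $\hat{q}_2 = v_2 - \tfrac{1}{2}(1, 1, 0)^T = (\tfrac{1}{2}, -\tfrac{1}{2}, 1)^T$, and $r_{22} = \sqrt{3/2}$. The sketch below carries out the same arithmetic in floating point:

```julia
using LinearAlgebra

v1 = [1.0, 1.0, 0.0]
v2 = [1.0, 0.0, 1.0]

# Step 1: normalize v1
r11 = norm(v1)               # √2
q1  = v1 / r11

# Step 2: remove the component of v2 along q1, then normalize
r12 = dot(q1, v2)            # ≈ 1/√2
q̂2  = v2 - r12 * q1          # ≈ [0.5, -0.5, 1.0]
r22 = norm(q̂2)               # ≈ √(3/2)
q2  = q̂2 / r22

dot(q1, q2)                  # ≈ 0 up to roundoff
```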

---

### Step $k$

Now suppose we have orthonormal vectors $q_1,\ldots,q_{k-1}$ that satisfy

$$
\mathrm{span}\{q_1,\ldots,q_i\} = \mathrm{span}\{v_1,\ldots,v_i\}, \quad i=1,\ldots,k-1.
$$

We let $w$ be the orthogonal projection of $v_k$ onto $\mathrm{span}\{q_1,\ldots,q_{k-1}\}$,

$$
w =
\begin{bmatrix} q_1 & \cdots & q_{k-1} \end{bmatrix}
\begin{bmatrix} q_1 & \cdots & q_{k-1} \end{bmatrix}^T v_k =
\langle q_1, v_k \rangle q_1 + \cdots + \langle q_{k-1}, v_k \rangle q_{k-1},
$$

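and we let $\hat{q}_k = v_k - w$. Then $\hat{q}_k$ is orthogonal to $q_1,\ldots,q_{k-1}$: for each $j = 1,\ldots,k-1$, orthonormality gives $\langle q_j, w \rangle = \langle q_j, v_k \rangle$, so $\langle q_j, \hat{q}_k \rangle = \langle q_j, v_k \rangle - \langle q_j, w \rangle = 0$.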

Therefore, letting

$$
r_{ik} = \langle q_i, v_k \rangle, \quad i=1,\ldots,k-1,
$$

we have

$$
\hat{q}_k = v_k - \sum_{i=1}^{k-1} r_{ik} q_i.
$$

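Finally, since $v_k \notin \mathrm{span}\{v_1,\ldots,v_{k-1}\} = \mathrm{span}\{q_1,\ldots,q_{k-1}\}$ (by linear independence), we have $\hat{q}_k \ne 0$, and we may let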

$$
r_{kk} = \|\hat{q}_k\|_2, \quad q_k = \frac{\hat{q}_k}{r_{kk}},
$$

and we have $\mathrm{span}\{q_1, \ldots, q_k\} = \mathrm{span}\{v_1, \ldots, v_k\}$.
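
In matrix form, Step $k$ needs only two matrix-vector products: with $Q_{k-1} = \begin{bmatrix} q_1 & \cdots & q_{k-1} \end{bmatrix}$, the coefficients are $r_{1:k-1,k} = Q_{k-1}^T v_k$ and $\hat{q}_k = v_k - Q_{k-1} r_{1:k-1,k}$. The helper `gs_step` below is a hypothetical name for this single step; it assumes the columns of `Qprev` are orthonormal and that `v` is not in their span.

```julia
using LinearAlgebra

"""
`gs_step(Qprev, v)` performs one classical Gram-Schmidt step (hypothetical
helper). `Qprev` is n×(k-1) with orthonormal columns; returns the coefficients
r = [r_1k, ..., r_(k-1)k], the value r_kk, and the new unit vector q_k.
"""
function gs_step(Qprev::AbstractMatrix, v::AbstractVector)
    r = Qprev' * v          # r_ik = <q_i, v_k>, i = 1,...,k-1
    q̂ = v - Qprev * r       # subtract the projection w onto span{q_1,...,q_(k-1)}
    rkk = norm(q̂)           # r_kk = ‖q̂_k‖₂ (> 0 when v is not in the span)
    return r, rkk, q̂ / rkk
end

# Build q_1, q_2, q_3 from three linearly independent vectors
v1, v2, v3 = [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]
q1 = v1 / norm(v1)
_, _, q2 = gs_step(reshape(q1, :, 1), v2)
r3, r33, q3 = gs_step([q1 q2], v3)
Q = [q1 q2 q3]
Q' * Q ≈ I                  # true up to roundoff
```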

---

## An implementation of Gram-Schmidt

In [None]:
function gram_schmidt(V)
    n, m = size(V)
    
    Q = copy(V)
    R = zeros(m, m)

    for k=1:m
        # R[i,k] = dot(Q[:,i],Q[:,k]), for i=1:k-1
        for i=1:k-1
            for j=1:n
                R[i,k] += Q[j,i]*Q[j,k]
            end
        end
        
        # Q[:,k] = Q[:,k] - sum(R[i,k]*Q[:,i] for i=1:k-1)
        for i=1:k-1
            for j=1:n
                Q[j,k] -= R[i,k]*Q[j,i]
            end
        end
        
        R[k,k] = norm(Q[:,k])
        Q[:,k] /= R[k,k]
    end
    
    return Q, R
end

In [None]:
n, m = 5, 3
A = randn(n, m)
Q, R = gram_schmidt(A);

In [None]:
Q

In [None]:
Q'Q

In [None]:
R

In [None]:
A ≈ Q*R

---

## Gram-Schmidt equals $QR$

Note that

$$
\begin{eqnarray}
v_1 &=& r_{11} q_1 \\
v_2 &=& r_{12} q_1 + r_{22} q_2 \\
&\vdots& \\
v_m &=& r_{1m} q_1 + r_{2m} q_2 + \cdots + r_{mm} q_m.
\end{eqnarray}
$$



Thus, if we let

$$
\begin{eqnarray}
V &=& \begin{bmatrix} v_1 & \cdots & v_m \end{bmatrix} \in \mathbb{R}^{n \times m}, \\\\
Q &=& \begin{bmatrix} q_1 & \cdots & q_m \end{bmatrix} \in \mathbb{R}^{n \times m}, \\\\
R &=&
\begin{bmatrix}
r_{11} & r_{12} & \cdots & r_{1m} \\
       & r_{22} & \cdots & r_{2m} \\
       &        & \ddots & \vdots \\
       &        &        & r_{mm}
\end{bmatrix} \in \mathbb{R}^{m \times m},
\end{eqnarray}
$$

we have that

$$
V = Q R.
$$

Therefore, Gram-Schmidt gives another way to compute the condensed $QR$ decomposition of a matrix with linearly independent columns; conversely, any method for computing a $QR$ decomposition provides a way to orthogonalize a set of vectors.
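
Because every $r_{kk} = \|\hat{q}_k\|_2$ is positive, the $R$ produced by Gram-Schmidt has positive diagonal entries, so by the uniqueness part of the condensed $QR$ theorem it should agree with a Householder $QR$ factorization once the signs of the latter are normalized. The sketch below checks this numerically. `cgs` is a hypothetical, compact vectorized restatement of classical Gram-Schmidt (not the loop-based `gram_schmidt` above), and flipping the signs of Julia's `qr` output is our own normalization step.

```julia
using LinearAlgebra

# Compact classical Gram-Schmidt (hypothetical vectorized sketch of the steps above)
function cgs(V)
    n, m = size(V)
    Q, R = zeros(n, m), zeros(m, m)
    for k in 1:m
        R[1:k-1, k] = Q[:, 1:k-1]' * V[:, k]       # r_ik = <q_i, v_k>
        q̂ = V[:, k] - Q[:, 1:k-1] * R[1:k-1, k]    # q̂_k = v_k - Σ r_ik q_i
        R[k, k] = norm(q̂)
        Q[:, k] = q̂ / R[k, k]
    end
    return Q, R
end

V = randn(6, 4)
Qg, Rg = cgs(V)

F = qr(V)                                  # Householder-based QR
D = Diagonal(sign.(diag(F.R)))             # flip signs so that diag(R) > 0
Qh, Rh = Matrix(F.Q) * D, D * F.R

Qg ≈ Qh, Rg ≈ Rh                           # both true up to roundoff
```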

---

## The Gram-Schmidt Process and Roundoff Errors

We expect $Q^T Q = I$ in exact arithmetic. Thus, we can measure the deviation from orthonormality due to roundoff errors by computing

$$
\|I - Q^T Q\|_2.
$$
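
A small helper makes this measurement reusable; `orthogonality_loss` is a hypothetical name, and `opnorm` computes the matrix 2-norm used in the formula above. As a sanity check, a $Q$ with orthonormal columns gives a loss near machine precision, while perturbing one entry gives a loss roughly proportional to the perturbation.

```julia
using LinearAlgebra

# ‖I - Q'Q‖₂: distance of Q'Q from the identity in the matrix 2-norm (hypothetical helper)
orthogonality_loss(Q) = opnorm(I - Q' * Q)

Q = Matrix(qr(randn(8, 4)).Q)     # orthonormal columns up to roundoff
orthogonality_loss(Q)              # tiny, near machine precision

Qp = copy(Q)
Qp[1, 1] += 1e-6                   # small perturbation of one entry
orthogonality_loss(Qp)             # much larger, roughly proportional to 1e-6
```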

In [None]:
# Create a Vandermonde-type matrix: column j is (1, t_j, t_j^2, ..., t_j^(n-1)) with t_j = j/m
vandermonde(n,m) = [(j/m)^(i-1) for i=1:n, j=1:m]

n, m = 5, 3
V = vandermonde(n, m)

In [None]:
using Printf

@printf("%3s %3s %10s %14s %14s\n", "n", "m", "cond(V)", "Gram-Schmidt", "Householder")
for (n, m) in [(6,4), (9,6), (12,8), (15,10), (18,12)]
    V = vandermonde(n,m)
    Q, R = gram_schmidt(V)
    Q̂, R̂ = condensed_qr(V)
    @printf("%3d %3d %10.1e %14.1e %14.1e\n", n, m, cond(V), 
        opnorm(I - Q'Q), opnorm(I - Q̂'Q̂))
end

---

## Modified Gram-Schmidt

The computation

```julia
for i=1:k-1
    for j=1:n
        R[i,k] += Q[j,i]*Q[j,k]
    end
end
for i=1:k-1
    for j=1:n
        Q[j,k] -= R[i,k]*Q[j,i]
    end
end
```

produces the exact same output as the computation

```julia
for i=1:k-1
    for j=1:n
        R[i,k] += Q[j,i]*Q[j,k]
    end
    for j=1:n
        Q[j,k] -= R[i,k]*Q[j,i]
    end
end
```

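in exact arithmetic. In the second version, $r_{ik}$ is computed from the partially orthogonalized vector $v_k - \sum_{j<i} r_{jk} q_j$ rather than from $v_k$ itself, and the two inner products agree because $\langle q_i, q_j \rangle = 0$ for $j < i$. In the presence of roundoff errors, however, the computed $q_j$ are no longer exactly orthogonal, and the second computation, known as _modified_ Gram-Schmidt, typically produces a much more nearly orthonormal $Q$.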
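
The claim that the two loop orders agree in exact arithmetic can be checked with Julia's exact `Rational{BigInt}` numbers. Normalization needs a square root, which is generally not rational, so this hypothetical sketch skips it: it produces orthogonal (not orthonormal) vectors $u_k$ and divides each projection coefficient by $\langle u_i, u_i \rangle$ instead. The two functions differ only in which vector the coefficients are computed from, mirroring the two loops above.

```julia
using LinearAlgebra

# Orthogonalize the columns of V without normalizing (hypothetical sketch).
# Classical order: each coefficient uses the original column v_k.
function cgs_orthogonal(V)
    U = copy(V)
    for k in 2:size(V, 2), i in 1:k-1
        c = (U[:, i]' * V[:, k]) / (U[:, i]' * U[:, i])
        U[:, k] -= c * U[:, i]
    end
    return U
end

# Modified order: each coefficient uses the partially updated column U[:, k].
function mgs_orthogonal(V)
    U = copy(V)
    for k in 2:size(V, 2), i in 1:k-1
        c = (U[:, i]' * U[:, k]) / (U[:, i]' * U[:, i])
        U[:, k] -= c * U[:, i]
    end
    return U
end

V = Rational{BigInt}[1 1 2; 1 2 3; 1 3 5; 1 4 6]
U1, U2 = cgs_orthogonal(V), mgs_orthogonal(V)
U1 == U2             # true: identical results in exact arithmetic
isdiag(U1' * U1)     # true: the columns are mutually orthogonal
```

With `Float64` input instead of rationals, the two functions would generally return slightly different results, which is exactly where the roundoff behavior discussed above comes from.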

In [None]:
function modified_gram_schmidt(V)
    n, m = size(V)
    
    Q = copy(V)
    R = zeros(m, m)

    for k=1:m
        for i=1:k-1
            for j=1:n
                R[i,k] += Q[j,i]*Q[j,k]
            end
            for j=1:n
                Q[j,k] -= R[i,k]*Q[j,i]
            end
        end
        
        R[k,k] = norm(Q[:,k])
        Q[:,k] /= R[k,k]
    end
    
    return Q, R
end

In [None]:
using Printf

@printf("%3s %3s %10s %14s %14s %14s\n", "n", "m", "cond(V)", 
        "Gram-Schmidt", "Modified G-S", "Householder")
for (n, m) in [(6,4), (9,6), (12,8), (15,10), (18,12)]
    V = vandermonde(n,m)
    Q, R = gram_schmidt(V)
    Qm, Rm = modified_gram_schmidt(V)
    Q̂, R̂ = condensed_qr(V)
    @printf("%3d %3d %10.1e %14.1e %14.1e %14.1e\n", n, m, cond(V), 
        opnorm(I - Q'Q), opnorm(I - Qm'Qm), opnorm(I - Q̂'Q̂))
end

---

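## Gram-Schmidt with Reorthogonalization

Another remedy is to orthogonalize each $v_k$ against $q_1,\ldots,q_{k-1}$ more than once, adding the coefficients from each pass to the $k$th column of $R$. In the implementation below, the keyword argument `num_orthog` sets the number of passes: `num_orthog=1` gives classical Gram-Schmidt, and `num_orthog=2` performs one reorthogonalization. This definition of `gram_schmidt` replaces the earlier one.
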
In [None]:
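# Orthogonalize Q[:,k] against Q[:,1],...,Q[:,k-1] (one classical G-S pass),
# storing the projection coefficients in s[1:k-1].
function orthogonalize!(s, Q, k, n)
    # s[i] = dot(Q[:,i], Q[:,k]), for i=1:k-1
    for i=1:k-1
        s[i] = 0
        for j=1:n
            s[i] += Q[j,i]*Q[j,k]
        end
    end
    # Q[:,k] = Q[:,k] - sum(s[i]*Q[:,i] for i=1:k-1)
    for i=1:k-1
        for j=1:n
            Q[j,k] -= s[i]*Q[j,i]
        end
    end
end

function gram_schmidt(A; num_orthog=1)
    n, m = size(A)
    
    s = Vector{Float64}(undef, m-1)
    R = zeros(m, m)
    Q = copy(A)
    
    for k=1:m
        # num_orthog=1: classical Gram-Schmidt; num_orthog=2: one reorthogonalization pass
        for j=1:num_orthog
            orthogonalize!(s, Q, k, n)
            R[1:k-1,k] += s[1:k-1]   # accumulate the coefficients from each pass
        end
        
        R[k,k] = norm(Q[:,k])
        Q[:,k] /= R[k,k]
    end
    
    return Q, UpperTriangular(R)
end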

In [None]:
n, m = 50, 30
A = randn(n, m)
Q, R = gram_schmidt(A, num_orthog=2)
A ≈ Q*R

In [None]:
using Printf

@printf("%3s %3s %10s %14s %14s %14s %14s\n", "n", "m", "cond(V)", 
        "Gram-Schmidt", "Modified G-S", "Reorthog", "Householder")
for (n, m) in [(6,4), (9,6), (12,8), (15,10), (18,12)]
    V = vandermonde(n,m)
    Q, R = gram_schmidt(V)
    Qm, Rm = modified_gram_schmidt(V)
    Qr, Rr = gram_schmidt(V, num_orthog=2)
    Q̂, R̂ = condensed_qr(V)
    @printf("%3d %3d %10.1e %14.1e %14.1e %14.1e %14.1e\n", n, m, cond(V), 
        opnorm(I - Q'Q), opnorm(I - Qm'Qm), opnorm(I - Qr'Qr), opnorm(I - Q̂'Q̂))
end

---

## Flop counts

1. $QR$ by Householder reflectors and forming $Q$ requires about $4nm^2 - 4m^3/3$ flops.

2. Gram-Schmidt with reorthogonalization requires about $4nm^2$ flops.

Thus, the method by reflectors requires about $4m^3/3$ fewer flops than Gram-Schmidt with reorthogonalization; the relative saving, $m/(3n)$, is largest when $m$ is close to $n$.
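
For a sense of scale, the two leading-order counts can be evaluated directly. The function names below are hypothetical, and the formulas are the approximations quoted above, so the outputs are estimates, not measured operation counts.

```julia
# Leading-order flop estimates from the formulas above (hypothetical helpers)
flops_householder(n, m) = 4n*m^2 - 4m^3/3   # Householder QR plus forming Q
flops_gs_reorth(n, m)   = 4n*m^2            # Gram-Schmidt with reorthogonalization

n, m = 1000, 100
flops_householder(n, m)                          # ≈ 3.87e7
flops_gs_reorth(n, m)                            # 40000000
flops_householder(n, m) / flops_gs_reorth(n, m)  # ≈ 0.967
```

For $n = m$ the ratio of the two estimates is exactly $2/3$.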

However, Gram-Schmidt has the advantage that the vectors $v_1,\ldots,v_m$ do not all need to be known from the beginning; an example of this situation is the Arnoldi process for computing eigenvectors (see Chapter 6 in the text) in which the vector $v_k$ cannot be determined until after $q_1,\ldots,q_{k-1}$ have been computed.
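
A sketch of this incremental use: each new vector is generated from the most recent $q$, as in a Krylov sequence $v_k = A q_{k-1}$, so it only becomes available after $q_{k-1}$ has been computed. It is then orthogonalized against the columns computed so far with two classical passes (reorthogonalization). This illustrates the idea only and is not an implementation of the Arnoldi process from the text; the function name `krylov_basis` and the random matrix `A` are assumptions for the example.

```julia
using LinearAlgebra

"""
`krylov_basis(A, b, m)` builds orthonormal q_1,...,q_m spanning
span{b, A b, ..., A^(m-1) b}, one vector at a time (hypothetical sketch;
assumes no breakdown, i.e. each new vector is not in the current span).
"""
function krylov_basis(A, b, m)
    n = length(b)
    Q = zeros(n, m)
    Q[:, 1] = b / norm(b)
    for k in 2:m
        v = A * Q[:, k-1]                 # v_k is only known once q_(k-1) is known
        for pass in 1:2                   # Gram-Schmidt with reorthogonalization
            v -= Q[:, 1:k-1] * (Q[:, 1:k-1]' * v)
        end
        Q[:, k] = v / norm(v)
    end
    return Q
end

A = randn(50, 50)
Q = krylov_basis(A, randn(50), 10)
opnorm(I - Q' * Q)                        # small, near machine precision
```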

---