# Orthogonalisation

Sources:

  1)  `Matrix Methods for Computational Modeling and Data Analytics` author: Mark Embree, Virginia Tech
  
  2)  `Linear Algebra : Theory, Intuition, Code` author: Mike X Cohen, publisher: sincXpress

**Motivation**


---


## Orthogonal Basis

A set of vectors $\left\{\mathbf{q}_1, \ldots,\  \mathbf{q}_n \right\}$ is orthonormal if both of these conditions are met:

1) Vectors are mutually orthogonal : $\mathbf{q}_j^T \cdot \mathbf{q}_k = 0 \ : \ j \neq k$

2) Each vector has unit length : $||\mathbf{q}_j|| = 1 \ : \ j = 1, \ldots,\ n$

Using the set of orthonormal vectors $\left\{\mathbf{q}_1, \ldots,\  \mathbf{q}_n \right\}$ a matrix $\mathbf{Q}$ is defined. 

$$
\mathbf{Q} = \left[\begin{array}{cccc}
\vert & \vert & \cdots & \vert \\
\mathbf{q}_1 & \mathbf{q}_2 & \cdots & \mathbf{q}_n \\
\vert & \vert & \cdots & \vert
\end{array}\right]
$$

The matrix product $\mathbf{Q}^T \cdot \mathbf{Q}$ has elements which are the inner products between vectors $\mathbf{q}_j$ and $\mathbf{q}_k$ .

$$
\mathbf{Q}^T \cdot \mathbf{Q} = \left[\begin{array}{cccc}
\mathbf{q}_1^T \cdot \mathbf{q}_1  & \mathbf{q}_1^T \cdot \mathbf{q}_2  & \cdots & \mathbf{q}_1^T \cdot \mathbf{q}_n \\
\mathbf{q}_2^T \cdot \mathbf{q}_1  & \mathbf{q}_2^T \cdot \mathbf{q}_2  & \ddots & \mathbf{q}_2^T \cdot \mathbf{q}_n \\
\vdots & \ddots & \ddots & \mathbf{q}_{n-1}^T \cdot \mathbf{q}_n \\
\mathbf{q}_n^T \cdot \mathbf{q}_1  & \cdots  & \mathbf{q}_n^T \cdot \mathbf{q}_{n-1}  & \mathbf{q}_n^T \cdot \mathbf{q}_n
\end{array}\right] = \left[\begin{array}{cccc}
1  & 0  & \cdots & 0 \\
0  & 1 & \ddots & 0 \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots  & 0  & 1
\end{array}\right] = \mathbf{I}
$$
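The identity can be checked numerically. The following is a minimal sketch; the matrix `Q` (a 2D rotation with an arbitrarily chosen angle `theta`) is an assumed example and is not taken from the sources.

```python
import numpy as np

# Assumed example: a 2D rotation matrix; its two columns are orthonormal.
theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Entry (j, k) of G is the inner product q_j^T q_k.
G = Q.T @ Q

# Compare with the identity up to floating-point rounding.
print(np.allclose(G, np.eye(2)))
```

`np.allclose` is used instead of `==` because the off-diagonal inner products are only zero up to rounding error.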

**Definition / unitary matrix**

An $n \times n$ matrix $\mathbf{Q}$ is *unitary* if $\mathbf{Q}^T \cdot \mathbf{Q} = \mathbf{I} \in \mathbb{R}^{n \times n}$. (For real matrices the term *orthogonal matrix* is also common.)


**Definition / sub-unitary matrix**

An $m \times n$ matrix $\mathbf{Q}$ with $m \gt n$ is *sub-unitary* if $\mathbf{Q}^T \cdot \mathbf{Q} = \mathbf{I} \in \mathbb{R}^{n \times n}$.

---

**Problem#1**

For $\mathbf{Q} \in \mathbb{R}^{m \times n}$ and $m \lt n$ : is $\mathbf{Q}^T \cdot \mathbf{Q} = \mathbf{I}$ possible ?

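No. The $n$ columns of $\mathbf{Q}$ are vectors in $\mathbb{R}^m$, so at most $m$ of them can be linearly independent. Orthonormal vectors are always linearly independent, so $\mathbf{Q}^T \cdot \mathbf{Q} = \mathbf{I}$ would require $n$ linearly independent columns, which is impossible for $m \lt n$.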
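A small numerical illustration of this answer, as a sketch under assumptions: the wide matrix `Q` below is random (seed and sizes chosen arbitrarily, not from the sources). Its Gram matrix $\mathbf{Q}^T \cdot \mathbf{Q}$ is $n \times n$, but its rank cannot exceed $m$, so it cannot equal the identity.

```python
import numpy as np

rng = np.random.default_rng(0)

m, n = 2, 3                        # fewer rows than columns
Q = rng.standard_normal((m, n))    # assumed random example matrix

G = Q.T @ Q                        # n x n Gram matrix of the columns
rank = np.linalg.matrix_rank(G)

print(rank <= m)                    # rank is bounded by m
print(np.allclose(G, np.eye(n)))    # G cannot be the n x n identity
```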

**Problem#2**

Let $\mathbf{Q} \in \mathbb{R}^{m \times n} \ : \ m \gt n$ be *sub-unitary*. Compute $\mathbf{Q} \cdot \mathbf{Q}^T \in \mathbb{R}^{m \times m}$.

Using the *layer perspective* of matrix multiplication, the product of two matrices $\mathbf{A} \in \mathbb{R}^{m \times n}$ and $\mathbf{B} \in \mathbb{R}^{n \times k}$ can be expressed as the sum of `n` sub-matrices (layers). Each sub-matrix is the outer product of the `j-th` column vector of $\mathbf{A}$ and the `j-th` row vector of $\mathbf{B}$. Applying the layer perspective to $\mathbf{Q} \cdot \mathbf{Q}^T$ yields the following equation:

$$
\mathbf{\Pi}=\mathbf{Q} \cdot \mathbf{Q}^T = \sum_{j=1}^{n} \underbrace{\mathbf{q}_j \cdot \mathbf{q}_j^T}_{j\text{-th layer}}
$$
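The layer perspective itself can be sketched in a few lines. The matrices `A` and `B` below are random placeholders with assumed sizes; they only serve to compare the sum of outer products with the ordinary matrix product.

```python
import numpy as np

rng = np.random.default_rng(1)

A = rng.standard_normal((4, 3))    # m x n, assumed example
B = rng.standard_normal((3, 2))    # n x k, assumed example

# One layer per index j: outer product of column j of A and row j of B.
layers = [np.outer(A[:, j], B[j, :]) for j in range(A.shape[1])]

# Summing the n layers reproduces the matrix product A @ B.
total = sum(layers)
print(np.allclose(total, A @ B))
```

Each layer has the full shape $m \times k$ but rank at most one; only their sum carries the complete product.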

Next we show $\mathbf{\Pi} \cdot \mathbf{\Pi} = \mathbf{\Pi}^2 = \mathbf{\Pi}$:

$$
\mathbf{\Pi} \cdot \mathbf{\Pi} = \mathbf{\Pi}^2 = \mathbf{Q} \cdot \mathbf{Q}^T \cdot \mathbf{Q} \cdot \mathbf{Q}^T = \mathbf{Q} \cdot \underbrace{\left(\mathbf{Q}^T \cdot \mathbf{Q}\right)}_{\mathbf{I}} \cdot \mathbf{Q}^T = \mathbf{Q} \cdot \mathbf{Q}^T = \mathbf{\Pi}
$$

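This property can be generalised to every integer power $p \geq 1$ (by induction, since $\mathbf{\Pi}^{p} = \mathbf{\Pi}^{p-1} \cdot \mathbf{\Pi}$):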

$$
\mathbf{\Pi}^p = \mathbf{\Pi}
$$
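The following sketch checks these properties numerically. As an assumption for illustration, `np.linalg.qr` is used only as a convenient way to obtain a $4 \times 2$ matrix with orthonormal columns; the random input and the exponent `5` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: reduced QR of a random 4 x 2 matrix serves as a source of
# a sub-unitary Q (4 rows, 2 orthonormal columns).
Q, _ = np.linalg.qr(rng.standard_normal((4, 2)))

P = Q @ Q.T                                            # 4 x 4 matrix Pi

print(np.allclose(Q.T @ Q, np.eye(2)))                 # Q is sub-unitary
print(np.allclose(P @ P, P))                           # Pi^2 == Pi
print(np.allclose(np.linalg.matrix_power(P, 5), P))    # Pi^5 == Pi
print(np.allclose(P, np.eye(4)))                       # Pi is not the identity
```

The last check prints `False`: $\mathbf{Q} \cdot \mathbf{Q}^T$ has rank $n = 2$ and therefore cannot be the $4 \times 4$ identity, even though $\mathbf{Q}^T \cdot \mathbf{Q}$ is the $2 \times 2$ identity.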

---

## Constructing an orthonormal basis from some other basis (Gram-Schmidt process)

Starting point is a set of basis vectors (in general not orthogonal) $\left\{\mathbf{a}_1,\ \ldots,\ \mathbf{a}_n \right\} $ for a subspace $V \subset \mathbb{R}^{m}$.
From this basis an orthonormal basis $\left\{\mathbf{q}_1,\ \ldots,\ \mathbf{q}_n \right\} $ for the same subspace shall be constructed.

The construction of the orthonormal basis vectors is a step-wise process. Each step generates a new unit-length vector that is orthogonal to all vectors generated in <ins>previous</ins> steps.

**step#1**

Generating the first vector $\mathbf{q}_1$ is easy. Just take vector $\mathbf{a}_1$ and normalise it:

$$
\mathbf{q}_1 = \frac{\mathbf{a}_1}{||\mathbf{a}_1||}
$$

**step#2**

Vector $\mathbf{a}_2$ is used to generate the next vector $\mathbf{q}_2$, which is orthonormal to $\mathbf{q}_1$. First the part of $\mathbf{a}_2$ in the direction of $\mathbf{q}_1$ is removed from $\mathbf{a}_2$. This results in a vector $\mathbf{u}_2$ which is already orthogonal to $\mathbf{q}_1$ but does not yet have unit length. Thus $\mathbf{u}_2$ is normalised to give $\mathbf{q}_2$.

$$
\mathbf{u}_2 = \mathbf{a}_2 - \frac{\mathbf{a}_2^T \cdot \mathbf{q}_1}{||\mathbf{q}_1||^2} \cdot \mathbf{q}_1
$$

$$
\mathbf{q}_2 = \frac{\mathbf{u}_2}{||\mathbf{u}_2||}
$$

As a check, we verify that the vectors $\mathbf{q}_1$ and $\mathbf{u}_2$ are orthogonal. We compute

$$\begin{gather}
\mathbf{q}_1^T \cdot \mathbf{u}_2 = \mathbf{q}_1^T \cdot \left(\mathbf{a}_2 - \frac{\mathbf{a}_2^T \cdot \mathbf{q}_1}{||\mathbf{q}_1||^2} \cdot \mathbf{q}_1 \right)
\ = \mathbf{q}_1^T \cdot \mathbf{a}_2  - \mathbf{q}_1^T \cdot  \frac{\mathbf{a}_2^T \cdot \mathbf{q}_1}{||\mathbf{q}_1||^2} \cdot \mathbf{q}_1 \\
\ = \mathbf{q}_1^T \cdot \mathbf{a}_2  - \frac{\mathbf{a}_2^T \cdot \mathbf{q}_1}{||\mathbf{q}_1||^2} \cdot ||\mathbf{q}_1||^2 \\
\ = \mathbf{q}_1^T \cdot \mathbf{a}_2  - \mathbf{a}_2^T \cdot \mathbf{q}_1 = \mathbf{q}_1^T \cdot \mathbf{a}_2 -\mathbf{q}_1^T \cdot \mathbf{a}_2 = 0
\end{gather}
$$
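Steps 1 and 2, including the orthogonality check, can be sketched numerically. The vectors `a1` and `a2` are assumed example values and do not come from the sources.

```python
import numpy as np

# Assumed example vectors (linearly independent, not orthogonal).
a1 = np.array([1.0, 1.0, 0.0])
a2 = np.array([1.0, 0.0, 1.0])

# Step 1: normalise a1.
q1 = a1 / np.linalg.norm(a1)

# Step 2: remove the part of a2 in the direction of q1 ...
u2 = a2 - (a2 @ q1) / np.linalg.norm(q1) ** 2 * q1
# ... and normalise the residual.
q2 = u2 / np.linalg.norm(u2)

print(u2)                                   # residual vector
print(np.isclose(q1 @ u2, 0.0))             # u2 is orthogonal to q1
print(np.isclose(np.linalg.norm(q2), 1.0))  # q2 has unit length
```

For these vectors the residual is approximately $(0.5,\ -0.5,\ 1)$. The division by $||\mathbf{q}_1||^2$ is kept to mirror the formula, although it equals 1 once $\mathbf{q}_1$ is normalised.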

**step#k**

Assuming $2 \leq k \leq n$:

Having arrived at this step we have already generated orthonormal basis vectors $\left\{\mathbf{q}_1,\ \mathbf{q}_2, \ldots,\ \mathbf{q}_{k-1}  \right\}$

The procedure goes like this:

1) Eliminate the part of $\mathbf{a}_k$ which is in the direction of $\mathbf{q}_1$. The residual vector is orthogonal to $\mathbf{q}_1$.

2) Eliminate the part of the previous residual vector in the direction of $\mathbf{q}_2$. The residual vector is orthogonal to $\mathbf{q}_2$ and $\mathbf{q}_1$

3) Eliminate the part of the previous residual vector in the direction of $\mathbf{q}_3$. The residual vector is orthogonal to $\mathbf{q}_3$ and  $\mathbf{q}_2$ and $\mathbf{q}_1$.

Repeat ...

4) Finally eliminate the part of the previous residual vector in the direction of $\mathbf{q}_{k-1}$. The residual vector is orthogonal to $\mathbf{q}_{k-1}, \ \ldots, \ \mathbf{q}_2,\ \mathbf{q}_1$.

5) Normalise the residual vector and assign it to $\mathbf{q}_k$.

$$\begin{gather}
\mathbf{u}_k = \mathbf{a}_k - \sum_{l=1}^{k-1} \frac{\mathbf{a}_k^T \cdot \mathbf{q}_l}{||\mathbf{q}_l||^2} \cdot \mathbf{q}_l \\
\to \ \text{normalise} \\
\mathbf{q}_k = \frac{1}{||\mathbf{u}_k||} \cdot \mathbf{u}_k
\end{gather}
$$

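Because the vectors $\mathbf{q}_1, \ldots, \mathbf{q}_{k-1}$ are mutually orthogonal, removing the components one after another from the residual gives (in exact arithmetic) the same $\mathbf{u}_k$ as subtracting all projections of $\mathbf{a}_k$ at once.

The procedure is also known as the `Gram-Schmidt` process.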
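The general step can be sketched as a short function. This is a minimal illustration, not a reference implementation: the function name `gram_schmidt`, the tolerance `1e-12` and the random test matrix are assumptions made here. It takes the inner products with the original column $\mathbf{a}_k$ (often called the classical variant) and omits the denominator because every $\mathbf{q}_l$ already has unit length.

```python
import numpy as np

def gram_schmidt(A, tol=1e-12):
    """Orthonormalise the columns of A (hypothetical helper).

    Assumes the columns of A are linearly independent.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    for k in range(n):
        u = A[:, k].copy()
        for l in range(k):
            # Subtract the part of a_k in the direction of q_l.
            # ||q_l|| = 1, so no division by ||q_l||^2 is needed.
            u -= (A[:, k] @ Q[:, l]) * Q[:, l]
        norm = np.linalg.norm(u)
        if norm < tol:
            raise ValueError("columns are (numerically) linearly dependent")
        Q[:, k] = u / norm          # normalise the residual
    return Q

# Usage with an assumed random 5 x 3 matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
Q = gram_schmidt(A)

print(np.allclose(Q.T @ Q, np.eye(3)))    # columns are orthonormal
print(np.allclose(Q @ (Q.T @ A), A))      # columns of A lie in span(Q)
```

The second check confirms that the new basis spans the same subspace: projecting each $\mathbf{a}_k$ onto the columns of `Q` returns $\mathbf{a}_k$ unchanged.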

---

### Example / Gram-Schmidt process

taken from:  `Matrix Methods for Computational Modeling and Data Analytics` author: Mark Embree, Virginia Tech

Three linearly independent vectors $\left\{\mathbf{a}_1,\ \mathbf{a}_2,\ \mathbf{a}_3 \right\}$ form a basis of a subspace. The basis vectors need not be orthogonal / orthonormal (any kind of basis is sufficient).

Using the `Gram-Schmidt` process we generate an orthonormal basis $\left\{\mathbf{q}_1,\ \mathbf{q}_2,\ \mathbf{q}_3 \right\}$ for the same subspace.

**Computing $\mathbf{q}_1$**

$$\begin{gather}
\mathbf{u}_1 = \mathbf{a}_1 \\
\mathbf{q}_1 = \frac{1}{||\mathbf{u}_1||} \cdot \mathbf{u}_1 
\end{gather}
$$

**Computing $\mathbf{q}_2$**

$$\begin{gather}
\mathbf{u}_2 = \mathbf{a}_2 - \frac{\mathbf{a}_2^T \cdot \mathbf{q}_1}{||\mathbf{q}_1||^2} \cdot \mathbf{q}_1 \\
\mathbf{q}_2 = \frac{1}{||\mathbf{u}_2||} \cdot \mathbf{u}_2 
\end{gather}
$$

**Computing $\mathbf{q}_3$**

$$\begin{gather}
\mathbf{u}_3 = \mathbf{a}_3 - \frac{\mathbf{a}_3^T \cdot \mathbf{q}_1}{||\mathbf{q}_1||^2} \cdot \mathbf{q}_1 - \frac{\mathbf{a}_3^T \cdot \mathbf{q}_2}{||\mathbf{q}_2||^2} \cdot \mathbf{q}_2 \\
\mathbf{q}_3 = \frac{1}{||\mathbf{u}_3||} \cdot \mathbf{u}_3 
\end{gather}
$$
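The three computations can be traced with concrete numbers. The vectors `a1`, `a2`, `a3` below are assumed values chosen for illustration, and `unit` is a hypothetical helper. Each projection is a scalar times the corresponding $\mathbf{q}_l$; no division by the norm appears because every $\mathbf{q}_l$ has unit length.

```python
import numpy as np

# Assumed example basis (linearly independent, not orthogonal).
a1 = np.array([1.0, 1.0, 0.0])
a2 = np.array([1.0, 0.0, 1.0])
a3 = np.array([0.0, 1.0, 1.0])

def unit(v):
    """Scale v to unit length (hypothetical helper)."""
    return v / np.linalg.norm(v)

# Computing q1
u1 = a1
q1 = unit(u1)

# Computing q2: remove the q1-component of a2, then normalise.
u2 = a2 - (a2 @ q1) * q1
q2 = unit(u2)

# Computing q3: remove the q1- and q2-components of a3, then normalise.
u3 = a3 - (a3 @ q1) * q1 - (a3 @ q2) * q2
q3 = unit(u3)

# Collect the result as columns and check orthonormality.
Q = np.column_stack([q1, q2, q3])
print(np.allclose(Q.T @ Q, np.eye(3)))
```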

