# Angle between vectors
The Cauchy-Schwarz inequality implies that, for nonzero real vectors,
$$-1\le \frac{\langle\vec{x},\vec{y}\rangle}{\|\vec{x}\|\|\vec{y}\|}\le 1.$$
This allows us to define an angle between nonzero real vectors:

$$\cos(\theta) = \frac{\langle\vec{x},\vec{y}\rangle}{\|\vec{x}\|\|\vec{y}\|},\;\;\theta = \cos^{-1}\left(\frac{\langle\vec{x},\vec{y}\rangle}{\|\vec{x}\|\|\vec{y}\|}\right).$$

Notice that the angle you compute depends on the inner product you use. The Cauchy-Schwarz inequality guarantees that the right hand side of the first expression is always between -1 and 1, so the inverse cosine is well-defined.
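Here is a minimal NumPy sketch of the formula above. The function name `angle` is hypothetical. So is the weighted inner product $\langle\vec{x},\vec{y}\rangle = \vec{x}^T\mathbf{K}\vec{y}$ with an illustrative positive definite $\mathbf{K}$. Both are assumptions, chosen only to show that the same two vectors can have different angles under different inner products.

```python
import numpy as np

def angle(x, y, K=None):
    """Angle (radians) between nonzero real vectors x and y.

    Uses the dot product by default, or the weighted inner product
    <x, y> = x^T K y when a symmetric positive definite K is given.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if K is None:
        K = np.eye(x.size)
    ip = x @ K @ y
    norm_x = np.sqrt(x @ K @ x)   # norm induced by the same inner product
    norm_y = np.sqrt(y @ K @ y)
    c = ip / (norm_x * norm_y)
    # Cauchy-Schwarz guarantees |c| <= 1; clip only guards against rounding error
    return np.arccos(np.clip(c, -1.0, 1.0))

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])
K = np.array([[2.0, 0.0], [0.0, 1.0]])   # illustrative weight matrix
print(np.degrees(angle(x, y)))      # dot product: 45 degrees (up to rounding)
print(np.degrees(angle(x, y, K)))   # weighted inner product: about 35.26 degrees
```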

Vectors that are perpendicular have a $90^\circ$ angle between them, and the cosine of $90^\circ$ is 0, so we say that vectors are **orthogonal** when

$$\langle\vec{x},\vec{y}\rangle = 0.$$

When the vectors are in $\mathbb{R}^n$ or $\mathbb{C}^n$ we sometimes use the word 'orthogonal' _only_ when the dot product is zero, and use the word 'conjugate' when some other inner product is zero.

If someone uses the word 'orthogonal' when referring to vectors in $\mathbb{R}^n$ or $\mathbb{C}^n$, you should assume they are using the dot product unless explicitly told otherwise.

Inner products on complex vector spaces map from $V\times V$ to $\mathbb{C}$. If you use the foregoing definition of the angle between vectors, the cosine (and therefore the angle) will in general be complex.

There is no standard definition of the angle between complex vectors, but it is common to use

$$\theta = \cos^{-1}\left(\frac{\text{Real}\left\{\langle\vec{x},\vec{y}\rangle\right\}}{\|\vec{x}\|\|\vec{y}\|}\right).$$

We still say that complex vectors are orthogonal **only** when their inner product is 0. An angle of $\pi/2$ is not enough, since it only requires the real part of the inner product to vanish.
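The sketch below uses NumPy and assumes the convention $\langle\vec{x},\vec{y}\rangle=\sum_k x_k\bar{y}_k$. The real part, and hence the angle, is the same under the other convention. The function name `complex_angle` is hypothetical. It shows two complex vectors whose angle is $\pi/2$ but whose inner product is not zero, so they are not orthogonal.

```python
import numpy as np

def complex_angle(x, y):
    """Angle from the real part of <x, y> = sum_k x_k * conj(y_k); also returns <x, y>."""
    ip = np.vdot(y, x)   # np.vdot conjugates its FIRST argument: sum conj(y_k) * x_k
    c = ip.real / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(c, -1.0, 1.0)), ip

x = np.array([1.0 + 0j])
y = np.array([1j])
theta, ip = complex_angle(x, y)
print(theta, ip)   # theta is pi/2, but <x, y> = -1j is not zero
```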

A set of nonzero vectors that are mutually orthogonal is linearly independent. (If mutually orthogonal then linearly independent.)

Assume $\vec{v}_1,\ldots,\vec{v}_n$ are mutually orthogonal and nonzero. Consider a linear combination equal to zero:

$$c_1\vec{v}_1+\ldots+c_n\vec{v}_n=\vec{0}.$$

Now take the inner product of both sides with $\vec{v}_i$ and use the fact that the vectors are mutually orthogonal:

$$\langle\vec{v}_i,c_1\vec{v}_1+\ldots+c_n\vec{v}_n\rangle=c_1\langle\vec{v}_i,\vec{v}_1\rangle + \ldots + c_i\langle\vec{v}_i,\vec{v}_i\rangle + \ldots + c_n\langle\vec{v}_i,\vec{v}_n\rangle = c_i\langle\vec{v}_i,\vec{v}_i\rangle=0.$$

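Since $\vec{v}_i\neq\vec{0}$, we have $\langle\vec{v}_i,\vec{v}_i\rangle=\|\vec{v}_i\|^2>0$, so $c_i=0$. This holds for every $i$, so all the coefficients must be zero, i.e. the vectors are linearly independent.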
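This is a small numerical illustration of the proof, assuming NumPy. The vectors and coefficients are arbitrary choices. Taking the inner product with $\vec{v}_i$ isolates $c_i$, so the coefficients of any combination can be read off directly. In particular, a combination equal to $\vec{0}$ has all coefficients zero.

```python
import numpy as np

# Columns are mutually orthogonal, nonzero vectors in R^3 (chosen for illustration)
V = np.column_stack([[1.0, 1.0, 0.0],
                     [1.0, -1.0, 0.0],
                     [0.0, 0.0, 2.0]])

G = V.T @ V                        # Gram matrix: off-diagonal entries <v_i, v_j> are 0
print(G)
print(np.linalg.matrix_rank(V))    # full rank: the columns are linearly independent

# Key step of the proof: the inner product with v_i isolates c_i
c_true = np.array([3.0, -1.0, 0.5])
x = V @ c_true                           # x = c_1 v_1 + c_2 v_2 + c_3 v_3
c = (V.T @ x) / np.sum(V * V, axis=0)    # c_i = <v_i, x> / <v_i, v_i>
print(np.allclose(c, c_true))            # True; with x = 0 the same formula gives c = 0
```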

A set of vectors that are mutually orthogonal and that are also unit vectors (with respect to the norm derived from the inner product) is called **orthonormal**.

# Orthogonal Matrices

Consider a real matrix $\mathbf{Q}$. What does it mean when

$$\mathbf{Q}^T\mathbf{Q} = \mathbf{I}?$$

The matrix on the left is a Gram matrix whose entries are the dot products of the columns of $\mathbf{Q}$. The matrix equation tells us that these columns are mutually orthogonal (with respect to the dot product) and that each has 2-norm equal to 1.

If $\mathbf{Q}$ is square, then the equation also implies that $\mathbf{Q}^T=\mathbf{Q}^{-1}$, which also implies

$$\mathbf{QQ}^T = \mathbf{I}.$$

Real square matrices with $\mathbf{Q}^T=\mathbf{Q}^{-1}$ are called **orthogonal** matrices. It would make more sense to call them 'orthonormal' matrices, but we don't.
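The following quick NumPy check uses a $2\times 2$ rotation matrix, with an arbitrary angle, as an example of an orthogonal matrix. It also uses a $3\times 2$ matrix with orthonormal columns to show why squareness matters: $\mathbf{Q}^T\mathbf{Q}=\mathbf{I}$ holds, but $\mathbf{QQ}^T=\mathbf{I}$ does not.

```python
import numpy as np

theta = 0.3   # arbitrary rotation angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(Q.T @ Q, np.eye(2)))      # columns are orthonormal
print(np.allclose(Q @ Q.T, np.eye(2)))      # Q is square, so Q Q^T = I as well
print(np.allclose(Q.T, np.linalg.inv(Q)))   # Q^T = Q^{-1}

# Orthonormal columns but not square: Q2^T Q2 = I, yet Q2 Q2^T != I
Q2 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [0.0, 0.0]])
print(np.allclose(Q2.T @ Q2, np.eye(2)), np.allclose(Q2 @ Q2.T, np.eye(3)))   # True False
```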

A _complex_ square matrix $\mathbf{U}$ that satisfies

$$\mathbf{U}^T\bar{\mathbf{U}} = \mathbf{I}$$

is called a **unitary** matrix. The matrix on the left is the Gram matrix formed from the columns of $\mathbf{U}$ using the complex dot product.

The identity matrix on the right is real, so we can take the complex conjugate of both sides. Writing $\mathbf{U}^*=\bar{\mathbf{U}}^T$ for the conjugate transpose, this gives

$$\mathbf{U}^*\mathbf{U} = \mathbf{I}.$$

This is _by far_ the more common way of defining a unitary matrix, but it loses the connection to the complex dot product.
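This sketch assumes NumPy and uses the normalized $4\times 4$ discrete Fourier transform matrix as an example of a unitary matrix. It checks both forms of the definition. It also shows that the plain transpose (no conjugation) does not give the identity.

```python
import numpy as np

n = 4
j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
U = np.exp(-2j * np.pi * j * k / n) / np.sqrt(n)   # normalized DFT matrix

print(np.allclose(U.T @ U.conj(), np.eye(n)))   # Gram matrix using the complex dot product
print(np.allclose(U.conj().T @ U, np.eye(n)))   # the usual form U^* U = I
print(np.allclose(U.T @ U, np.eye(n)))          # plain transpose is not enough: False
```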

# Orthogonal subspaces

Two subspaces $U$ and $V$ of an inner-product space $W$ are orthogonal when every vector in $U$ is orthogonal to every vector in $V$.

Practically speaking, if we have a basis $\vec{u}_1,\ldots,\vec{u}_p$ of $U$ and a basis $\vec{v}_1,\ldots,\vec{v}_q$ of $V$, then the subspaces are orthogonal when $\langle\vec{u}_i,\vec{v}_j\rangle = 0$ $\forall i,j.$

The **orthogonal complement** $U^\perp$ of a subspace $U$ within a space $W$ is the set of vectors

$$U^\perp = \{\vec{w}\in W : \vec{w}\perp U\}.$$

The orthogonal complement is itself a subspace (no proof).

We already know that the linear system $\mathbf{A}\vec{x} = \vec{b}$ only has a solution when the vector $\vec{b}$ is in the range of $\mathbf{A}$.

Since the range is the orthogonal complement of the cokernel, we can now say that the linear system only has a solution when the vector $\vec{b}$ is orthogonal to the cokernel of $\mathbf{A}$.

This is called the 'Fredholm alternative.'
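Here is a minimal sketch of the Fredholm alternative as a solvability test, assuming NumPy. It uses a tool not covered in these notes: the singular value decomposition, which supplies an orthonormal basis for the cokernel. The helper names `cokernel_basis` and `is_solvable` and the tolerance `tol` are hypothetical choices, not a standard API.

```python
import numpy as np

def cokernel_basis(A, tol=1e-10):
    """Orthonormal basis for the cokernel (left null space) of a nonempty matrix A, via the SVD."""
    U, s, _ = np.linalg.svd(A)              # full_matrices=True: U is m x m
    r = int(np.sum(s > tol * s.max()))      # numerical rank
    return U[:, r:]                         # last m - r left singular vectors

def is_solvable(A, b, tol=1e-10):
    """Fredholm alternative: A x = b has a solution iff b is orthogonal to coker(A)."""
    Z = cokernel_basis(A)
    return np.allclose(Z.T @ b, 0.0, atol=tol)

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])   # rank 1, so the cokernel has dimension 2
print(is_solvable(A, np.array([1.0, 2.0, 3.0])))   # b is in the range: True
print(is_solvable(A, np.array([1.0, 0.0, 0.0])))   # b is not in the range: False
```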

# Orthogonal Projections

Suppose that $\vec{v}$ is not in a subspace $W$. The **orthogonal projection** of $\vec{v}$ onto $W$ is the vector $\vec{w}\in W$ that makes the difference $\vec{z} = \vec{v}-\vec{w}$ orthogonal to $W$.

(A vector is orthogonal to a subspace whenever it is orthogonal to every vector in that subspace. Equivalently, when it is orthogonal to every vector in a basis for that subspace.)

![Olver & Shakiban Figure 4.4](OS4p4.png)

How can we compute the projection? Start with a basis $\vec{w}_1,\ldots,\vec{w}_k$ for the subspace.
Then require $\vec{v}-\vec{w}$ to be orthogonal to every one of the basis vectors:

$$0=\langle\vec{w}_i,\vec{v}-\vec{w}\rangle \Rightarrow \langle\vec{w}_i,\vec{v}\rangle = \langle\vec{w}_i,\vec{w}\rangle.$$

If we're using the dot product on $\mathbb{R}^n$, this system can be written as 

$$\mathbf{A}^T\vec{w} = \mathbf{A}^T\vec{v}$$

where the columns of $\mathbf{A}$ are $\vec{w}_1,\ldots,\vec{w}_k$.
Now we want $\vec{w}$ to be in the subspace, i.e.

$$\vec{w} = c_1\vec{w}_1+\ldots+c_k\vec{w}_k = \mathbf{A}\vec{c},\text{ so }\mathbf{A}^T\mathbf{A}\vec{c} = \mathbf{A}^T\vec{v}.$$

Since $\mathbf{A}^T\mathbf{A}$ is invertible (see below), the unique solution is
$$\vec{w} = \mathbf{A}\vec{c} = \mathbf{A}\left(\mathbf{A}^T\mathbf{A}\right)^{-1}\mathbf{A}^T\vec{v}.$$

The matrix $\mathbf{A}^T\mathbf{A}$ is a Gram matrix, and the columns of $\mathbf{A}$ are a basis (so they are linearly independent), so the Gram matrix is invertible. When the columns of $\mathbf{A}$ are linearly independent, the matrix $\mathbf{A}\left(\mathbf{A}^T\mathbf{A}\right)^{-1}\mathbf{A}^T$ is an example of an **orthogonal projection matrix**. Despite the name, it is generally not an orthogonal matrix; it is one only when it equals $\mathbf{I}$.

This orthogonal projection matrix projects onto the range (also called the image or column space) of $\mathbf{A}$.

The projection of $\vec{v}$ onto the cokernel of $\mathbf{A}$ is $\vec{v} - \vec{w} = (\mathbf{I}-\mathbf{A}\left(\mathbf{A}^T\mathbf{A}\right)^{-1}\mathbf{A}^T)\vec{v}$. 

The matrix $\mathbf{I}-\mathbf{A}\left(\mathbf{A}^T\mathbf{A}\right)^{-1}\mathbf{A}^T$ is also an orthogonal projection matrix, but it projects onto the orthogonal complement of the range of $\mathbf{A}$.
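A minimal NumPy sketch of the formulas above follows, with an arbitrary $3\times 2$ matrix $\mathbf{A}$ and vector $\vec{v}$ chosen for illustration. It solves the normal equations $\mathbf{A}^T\mathbf{A}\vec{c}=\mathbf{A}^T\vec{v}$ with `np.linalg.solve` instead of forming the inverse explicitly. It then checks that the residual is orthogonal to the columns, that $\mathbf{P}$ is symmetric and idempotent but not orthogonal, and that $\mathbf{I}-\mathbf{P}$ produces the residual.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])   # linearly independent columns: a basis for W
v = np.array([1.0, 2.0, 3.0])

c = np.linalg.solve(A.T @ A, A.T @ v)   # normal equations A^T A c = A^T v
w = A @ c                               # orthogonal projection of v onto range(A)
z = v - w                               # component of v orthogonal to range(A)

P = A @ np.linalg.solve(A.T @ A, A.T)   # A (A^T A)^{-1} A^T
print(np.allclose(A.T @ z, 0))                       # z is orthogonal to every column of A
print(np.allclose(P @ P, P), np.allclose(P, P.T))    # idempotent and symmetric
print(np.allclose(P.T @ P, np.eye(3)))               # not an orthogonal matrix: False
print(np.allclose((np.eye(3) - P) @ v, z))           # I - P projects onto the complement
```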

If $\vec{w}_1,\ldots,\vec{w}_k$ form an orthonormal basis for $W$, then $\mathbf{A}^T\mathbf{A}=\mathbf{I}$, so $\vec{c}=\mathbf{A}^T\vec{v}$ and the projection is

$$\vec{w} = \mathbf{A}\vec{c} = \mathbf{AA}^T\vec{v}.$$

If $\mathbf{A}$ has orthonormal columns, then $\mathbf{AA}^T$ is called an **orthogonal projection matrix**.

This is the usual formula for an orthogonal projection matrix, where it's assumed that we already have an orthonormal basis available.
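This sketch assumes NumPy and obtains an orthonormal basis for the range with `np.linalg.qr`. That routine calls LAPACK rather than running Gram-Schmidt, but any orthonormal basis works here. It checks that $\mathbf{QQ}^T\vec{v}$ agrees with the general formula $\mathbf{A}(\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\vec{v}$. The matrix and vector are arbitrary.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
v = np.array([1.0, 2.0, 3.0])

Q, _ = np.linalg.qr(A)        # reduced QR: Q is 3 x 2 with orthonormal columns spanning range(A)
w_orth = Q @ (Q.T @ v)        # projection using the orthonormal basis: Q Q^T v
w_gen = A @ np.linalg.solve(A.T @ A, A.T @ v)   # general formula with a non-orthonormal basis

print(np.allclose(Q.T @ Q, np.eye(2)))   # orthonormal columns
print(np.allclose(w_orth, w_gen))        # both formulas give the same projection
```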

Every vector $\vec{v}$ in the space $V$ can be written as the sum of a vector $\vec{w}$ from the subspace $W$ and a vector $\vec{z}$ from $W^\perp$, i.e. the orthogonal complement of $W$.
We say

$$V = W \oplus W^\perp.$$

Proof: let $\vec{w}$ be the orthogonal projection of $\vec{v}$ onto $W$. Then $\vec{z} = \vec{v}-\vec{w}$ is in $W^\perp$ by the definition of the orthogonal projection, and $\vec{v}=\vec{w}+\vec{z}$.

If the dimension of $V$ is $n$ and the dimension of $W$ is $r\le n$, then the dimension of $W^\perp$ must be $n-r$. (No proof, but it's not hard.)

Theorem: The cokernel of $\mathbf{A}$ is the orthogonal complement of the range of $\mathbf{A}$. (Corollary: The kernel of $\mathbf{A}$ is the orthogonal complement of the corange of $\mathbf{A}$.)

Let's show that every vector in the range is orthogonal to the cokernel.
1. Suppose that $\vec{v}\in$ Range$(\mathbf{A})$. This means that there is some $\vec{x}$ (not necessarily unique) such that $\vec{v} = \mathbf{A}\vec{x}$.
2. Suppose that $\vec{w}\in$ Cokernel$(\mathbf{A})$. This means that $\vec{w}^T\mathbf{A} = \vec{0}^T$.
3. Now consider the dot product $\vec{w}^T\vec{v}$. We can write this as $\vec{w}^T\mathbf{A}\vec{x} = \vec{0}^T\vec{x}=0$. 
This shows that every vector in the range is orthogonal to every vector in the cokernel.

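To show that the cokernel is the *whole* orthogonal complement of the range, we compare dimensions.
Let $\mathbf{A}$ be $m\times n$ with rank $r$. By the fundamental theorem, the range has dimension $r$ and the cokernel has dimension $m-r$. The orthogonal complement of the range also has dimension $m-r$, and it contains the cokernel, so the two subspaces are equal. In particular, the range and the cokernel together span $\mathbb{R}^m$.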
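The following is a numerical sanity check of the theorem, assuming NumPy. The sizes $m=5$, $n=4$, $r=2$ and the random seed are arbitrary. The SVD, which is not covered in these notes, supplies orthonormal bases: the first $r$ left singular vectors span the range, and the remaining $m-r$ span the cokernel.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 5, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # m x n matrix of rank r

U, s, _ = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10 * s.max()))
R = U[:, :rank]   # orthonormal basis for the range
N = U[:, rank:]   # orthonormal basis for the cokernel

print(rank, N.shape[1], rank + N.shape[1] == m)   # r, m - r, and they add up to m
print(np.allclose(R @ (R.T @ A), A))              # columns of A lie in span(R)
print(np.allclose(N.T @ A, 0))                    # w^T A = 0 for each cokernel basis vector w
```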

# Gram-Schmidt

So far we only know how to find a basis for the kernel of a matrix, and the basis that we find is not guaranteed to be orthogonal.
Orthogonal bases are very useful, so we now look at a method that starts with a basis and generates an orthogonal basis.

Suppose that we start with a non-orthogonal basis $\vec{x}_1,\ldots,\vec{x}_n$, and we want to find an orthogonal basis for the same subspace.
The Gram-Schmidt process (algorithm) does this.

1. $\vec{v}_1 = \vec{x}_1$.
2. $\vec{v}_2 = \vec{x}_2 - c_{2,1}\vec{v}_1$. We want to choose $c_{2,1}$ in such a way that $\vec{v}_1$ and $\vec{v}_2$ are orthogonal, i.e. $$0=\langle\vec{v}_2,\vec{v}_1\rangle = \langle\vec{x}_2,\vec{v}_1\rangle - c_{2,1}\langle\vec{v}_1,\vec{v}_1\rangle.$$ So we set $$c_{2,1} = \langle\vec{x}_2,\vec{v}_1\rangle/\|\vec{v}_1\|^2.$$
3. $\vec{v}_3 = \vec{x}_3 - c_{3,1}\vec{v}_1 -c_{3,2}\vec{v}_2$. We want to choose the coefficients $c_{3,1}$ and $c_{3,2}$ so that $\vec{v}_3$ is orthogonal to both $\vec{v}_1$ and $\vec{v}_2$, i.e. $$0=\langle\vec{v}_3,\vec{v}_1\rangle = \langle\vec{x}_3,\vec{v}_1\rangle - c_{3,1}\langle\vec{v}_1,\vec{v}_1\rangle -c_{3,2}\langle\vec{v}_2,\vec{v}_1\rangle$$ $$0=\langle\vec{v}_3,\vec{v}_2\rangle = \langle\vec{x}_3,\vec{v}_2\rangle - c_{3,1}\langle\vec{v}_1,\vec{v}_2\rangle -c_{3,2}\langle\vec{v}_2,\vec{v}_2\rangle$$ Since $\vec{v}_1$ and $\vec{v}_2$ are already orthogonal, the cross terms vanish. Solving for the coefficients we get $$c_{3,1} = \langle\vec{x}_3,\vec{v}_1\rangle/\|\vec{v}_1\|^2,\;\;c_{3,2} = \langle\vec{x}_3,\vec{v}_2\rangle/\|\vec{v}_2\|^2.$$

The vectors produced by Gram-Schmidt are orthogonal by construction. They satisfy

$$\vec{v}_1=\vec{x}_1$$

$$\vec{v}_k = \vec{x}_k - \frac{\langle\vec{x}_k,\vec{v}_{1}\rangle}{\|\vec{v}_{1}\|^2}\vec{v}_1-\ldots-\frac{\langle\vec{x}_k,\vec{v}_{k-1}\rangle}{\|\vec{v}_{k-1}\|^2}\vec{v}_{k-1}.$$

The above process only produces **orthogonal** vectors; to get **orthonormal** vectors just set $\vec{u}_k = \vec{v}_k/\|\vec{v}_k\|$.
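Here is a minimal sketch of the process above for the dot product on $\mathbb{R}^n$. It assumes NumPy and that the columns of `X` are linearly independent. The function name `gram_schmidt` is hypothetical. It follows the general formula literally (classical Gram-Schmidt), using $\vec{x}_k$ in every inner product, and then normalizes.

```python
import numpy as np

def gram_schmidt(X):
    """Classical Gram-Schmidt on the columns of X (dot product on R^n).

    Returns V with orthogonal columns v_k and U with orthonormal columns
    u_k = v_k / ||v_k||. Assumes the columns of X are linearly independent.
    """
    X = np.asarray(X, dtype=float)
    V = np.zeros_like(X)
    for k in range(X.shape[1]):
        x_k = X[:, k]
        v_k = x_k.copy()
        for j in range(k):
            v_j = V[:, j]
            # subtract <x_k, v_j> / ||v_j||^2 * v_j
            v_k -= (x_k @ v_j) / (v_j @ v_j) * v_j
        V[:, k] = v_k
    U = V / np.linalg.norm(V, axis=0)   # normalize each column
    return V, U

X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
V, U = gram_schmidt(X)
print(np.round(V.T @ V, 12))             # diagonal Gram matrix: the columns are orthogonal
print(np.allclose(U.T @ U, np.eye(3)))   # orthonormal after normalizing
```

In floating-point arithmetic this classical form can gradually lose orthogonality. The modified Gram-Schmidt variant uses the partially updated $\vec{v}_k$ in place of $\vec{x}_k$ inside each inner product. It gives the same result in exact arithmetic and is usually better behaved numerically.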

Notice that $\vec{v}_k$ is a linear combination of $\vec{x}_1,\ldots,\vec{x}_k$ in which the coefficient of $\vec{x}_k$ is exactly 1, because each $\vec{v}_j$ with $j<k$ involves only $\vec{x}_1,\ldots,\vec{x}_j$. Since the $\vec{x}_i$ are linearly independent, a combination with a nonzero coefficient cannot be zero, so $\vec{v}_k\neq\vec{0}$.

Gram-Schmidt can be used to test whether a set of vectors is linearly independent; if at any point you get $\vec{v}_k=\vec{0}$ then it means that $\vec{x}_1,\ldots,\vec{x}_k$ must be linearly dependent. Of course this only works in exact arithmetic.
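This is a sketch of the dependence test, assuming NumPy. The helper name `first_dependent_index` and the relative tolerance `tol` are hypothetical choices. In floating point $\vec{v}_k$ is rarely exactly zero, so the sketch compares $\|\vec{v}_k\|$ against `tol` times $\|\vec{x}_k\|$; the right tolerance depends on the problem. Indices are 0-based in the code.

```python
import numpy as np

def first_dependent_index(X, tol=1e-10):
    """Run Gram-Schmidt on the columns of X and return the first (0-based) k with v_k ~ 0.

    Returns None if no such k exists. "~ 0" means ||v_k|| <= tol * ||x_k||, an assumed threshold.
    """
    X = np.asarray(X, dtype=float)
    basis = []   # the nonzero v_j found so far
    for k in range(X.shape[1]):
        x_k = X[:, k]
        v_k = x_k.copy()
        for v_j in basis:
            v_k -= (x_k @ v_j) / (v_j @ v_j) * v_j
        if np.linalg.norm(v_k) <= tol * np.linalg.norm(x_k):
            return k
        basis.append(v_k)
    return None

X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0]])   # third column = first + second
print(first_dependent_index(X))   # 2 (0-based), i.e. x_1, x_2, x_3 are linearly dependent
```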

Suppose that we put the vectors $\vec{v}_1,\ldots,\vec{v}_{k-1}$ into the columns of a matrix $\mathbf{A}$. The matrix that projects orthogonally onto the range of $\mathbf{A}$ is

$$\mathbf{A}(\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T.$$

$\mathbf{A}^T\mathbf{A}$ is the Gram matrix formed from $\vec{v}_1,\ldots,\vec{v}_{k-1}$ using the dot product. Since the vectors are orthogonal, this is a diagonal matrix with diagonal entries

$$(\mathbf{A}^T\mathbf{A})_{ii} = \|\vec{v}_i\|^2.$$

The inverse of this matrix is also diagonal with diagonal entries

$$\left((\mathbf{A}^T\mathbf{A})^{-1}\right)_{ii} = \frac{1}{\|\vec{v}_i\|^2}.$$

The columns of $\mathbf{A}(\mathbf{A}^T\mathbf{A})^{-1}$ are $\vec{v}_i/\|\vec{v}_i\|^2$.

Recall from Lecture 1: Any product $\mathbf{BC}$ can also be written as a sum of outer products

$$\mathbf{BC} = \vec{b}_1\otimes\vec{c}_1 + \vec{b}_2\otimes\vec{c}_2 + \ldots + \vec{b}_p\otimes\vec{c}_p = \sum_{k=1}^p\vec{b}_k\otimes\vec{c}_k$$

where $\vec{b}_k$ are the columns of $\mathbf{B}$ and $\vec{c}_k$ are the rows of $\mathbf{C}$. Now apply this using $\mathbf{B} = \mathbf{A}(\mathbf{A}^T\mathbf{A})^{-1}$ and $\mathbf{C} = \mathbf{A}^T$:

$$\mathbf{A}(\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T = \frac{1}{\|\vec{v}_1\|^2}\vec{v}_1\vec{v}_1^T+\ldots+\frac{1}{\|\vec{v}_{k-1}\|^2}\vec{v}_{k-1}\vec{v}_{k-1}^T$$
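A quick NumPy check of this identity follows, using two orthogonal (but not orthonormal) vectors chosen for illustration. The Gram matrix is diagonal, and the projection matrix equals the sum of the scaled outer products $\vec{v}_i\vec{v}_i^T/\|\vec{v}_i\|^2$.

```python
import numpy as np

# Columns v_1 = (1, 1, 0) and v_2 = (1, -1, 2) are orthogonal but not unit vectors
A = np.column_stack([[1.0, 1.0, 0.0],
                     [1.0, -1.0, 2.0]])

print(A.T @ A)   # diagonal, with entries ||v_1||^2 = 2 and ||v_2||^2 = 6

P = A @ np.linalg.solve(A.T @ A, A.T)                  # A (A^T A)^{-1} A^T
P_sum = sum(np.outer(v, v) / (v @ v) for v in A.T)     # sum of v_i v_i^T / ||v_i||^2
print(np.allclose(P, P_sum))                           # True
```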

The matrix that projects onto the cokernel of $\mathbf{A}$ is $\mathbf{I} - \mathbf{A}(\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T$, and applying it to $\vec{x}_k$ is exactly what Gram-Schmidt does:

$$\vec{v}_k = \vec{x}_k - \frac{\langle\vec{x}_k,\vec{v}_{1}\rangle}{\|\vec{v}_{1}\|^2}\vec{v}_1-\ldots-\frac{\langle\vec{x}_k,\vec{v}_{k-1}\rangle}{\|\vec{v}_{k-1}\|^2}\vec{v}_{k-1} = (\mathbf{I} - \mathbf{A}(\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T)\vec{x}_k.$$

$\vec{v}_k$ is the orthogonal projection of $\vec{x}_k$ onto the orthogonal complement of the span of $\vec{v}_1,\ldots,\vec{v}_{k-1}$. In other words, each step of Gram-Schmidt is an orthogonal projection.
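Here is a small NumPy check of this equivalence for $k=3$, with vectors $\vec{x}_1,\vec{x}_2,\vec{x}_3$ chosen for illustration. It computes $\vec{v}_3$ once with the Gram-Schmidt formula and once as $(\mathbf{I} - \mathbf{A}(\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T)\vec{x}_3$, where $\mathbf{A}=[\vec{v}_1\;\vec{v}_2]$.

```python
import numpy as np

x1 = np.array([1.0, 1.0, 0.0])
x2 = np.array([1.0, 0.0, 1.0])
x3 = np.array([0.0, 1.0, 1.0])

# First two Gram-Schmidt steps
v1 = x1
v2 = x2 - (x2 @ v1) / (v1 @ v1) * v1

# Third step as a projection onto the orthogonal complement of span(v1, v2)
A = np.column_stack([v1, v2])
P = A @ np.linalg.solve(A.T @ A, A.T)
v3_proj = (np.eye(3) - P) @ x3

# Third step as in the Gram-Schmidt formula
v3_gs = x3 - (x3 @ v1) / (v1 @ v1) * v1 - (x3 @ v2) / (v2 @ v2) * v2

print(np.allclose(v3_proj, v3_gs))   # True
```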