---
# Section 3.5: Geometric Approach to the Least Squares Problem
---

Let $S$ be a nonempty subset of $\mathbb{R}^n$.

The _orthogonal complement_ of the set $S$ is

$$
S^\perp = \big\{ x \in \mathbb{R}^n : \langle x, y \rangle = 0, \forall y \in S \big\}.
$$

This is the set of vectors $x$ that are orthogonal to every vector $y$ in the set $S$.

---

## Exercise

Prove that $S^\perp$ is a subspace of $\mathbb{R}^n$.

### Proof.

First we show that $S^\perp$ is closed under vector addition. Let $x, z \in S^\perp$; we want to show that $x + z \in S^\perp$. Let $y \in S$ be chosen arbitrarily. Then $\langle x, y \rangle = 0$ and $\langle z, y \rangle = 0$. Thus,

$$
\langle x + z, y \rangle = \langle x, y \rangle + \langle z, y \rangle = 0 + 0 = 0.
$$

Since $y \in S$ was arbitrary, $\langle x + z, y \rangle = 0$ for all $y \in S$, so $x + z \in S^\perp$.


Next we show that $S^\perp$ is closed under scalar multiplication. Let $x \in S^\perp$ and $\alpha \in \mathbb{R}$, and we want to show that $\alpha x \in S^\perp$. Let $y \in S$ be chosen arbitrarily. Then we know that $\langle x, y \rangle = 0$. Thus,

$$
\langle \alpha x, y \rangle = \alpha \langle x, y \rangle = \alpha \cdot 0 = 0.
$$

Since $y \in S$ was arbitrary, $\langle \alpha x, y \rangle = 0$ for all $y \in S$, so $\alpha x \in S^\perp$.

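Finally, $S^\perp$ is nonempty: $\langle 0, y \rangle = 0$ for all $y \in S$, so $0 \in S^\perp$. Since $S^\perp$ is nonempty and closed under vector addition and scalar multiplication, $S^\perp$ is a subspace of $\mathbb{R}^n$. $\blacksquare$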

---

## Example

Let $S$ be the $xy$-plane in $\mathbb{R}^3$. Then

$$
S = \left\{ \begin{bmatrix} x \\ y \\ z \end{bmatrix} \in \mathbb{R}^3 : z = 0 \right\}.
$$

The orthogonal complement of $S$ is

$$
S^\perp = \left\{ \begin{bmatrix} x \\ y \\ z \end{bmatrix} \in \mathbb{R}^3 : x = y = 0 \right\},
$$

which is the $z$-axis in $\mathbb{R}^3$.
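
The example above can be checked numerically. The following is a minimal sketch, assuming $S$ is given as the span of the columns of a matrix (called `B` here purely for illustration): a vector is orthogonal to all of $S$ exactly when it is orthogonal to each spanning vector, that is, when `B'x = 0`.

```julia
using LinearAlgebra

# Columns of B span S, the xy-plane in R^3 (B is an illustrative name).
B = [1.0 0.0;
     0.0 1.0;
     0.0 0.0]

# x is orthogonal to every vector in S exactly when B'x = 0, so an
# orthonormal basis of the orthogonal complement is a basis of the
# null space of B'.
Z = nullspace(Matrix(B'))

@show size(Z)          # one basis vector in R^3
@show norm(B' * Z)     # close to 0: Z is orthogonal to both spanning vectors
@show abs.(Z[:, 1])    # close to [0, 0, 1]: the z-axis, up to sign
```

The sign of the computed basis vector is not determined, which is why the last line compares absolute values.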

---

> ### Theorem: (Subspace Decomposition of $\mathbb{R}^n$)
>
> Let $S$ be a subspace of $\mathbb{R}^n$. Then
>
> $$ \mathbb{R}^n = S \oplus S^\perp. $$
>
> That is, every $x \in \mathbb{R}^n$ can be written **uniquely** as $x = y + z$, where $y \in S$ and $z \in S^\perp$.
> 
> The vector $y$ is the _orthogonal projection_ of $x$ into $S$, and $z$ is the _orthogonal projection_ of $x$ into $S^\perp$.

---

## Exercise

Let $S$ be the $xy$-plane in $\mathbb{R}^3$ and let

$$
u = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}.
$$

Find the unique vectors $v \in S$ and $w \in S^\perp$ such that $u = v + w$.

### Solution

Projecting $u$ orthogonally onto $S$ gives us

$$
v = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}.
$$

Then letting $w = u - v$, we have

$$
w = \begin{bmatrix} 0 \\ 0 \\ 3 \end{bmatrix}.
$$

Note that $w \in S^\perp$, so we are done.
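
The same decomposition can be sketched in Julia. This assumes an orthonormal basis of $S$ is available as the columns of a matrix (named `Q` here for illustration), in which case the projection of $u$ into $S$ is `Q * (Q' * u)`.

```julia
using LinearAlgebra

# Q has orthonormal columns spanning S, the xy-plane (Q is an illustrative name).
Q = [1.0 0.0;
     0.0 1.0;
     0.0 0.0]
u = [1.0, 2.0, 3.0]

v = Q * (Q' * u)   # component of u in S
w = u - v          # remainder, which lies in the orthogonal complement of S

@show v            # [1.0, 2.0, 0.0]
@show w            # [0.0, 0.0, 3.0]
@show Q' * w       # [0.0, 0.0]: w is orthogonal to both basis vectors of S
```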

---

## Two Fundamental Subspaces

Let $A \in \mathbb{R}^{m \times n}$. The **null space** of $A$ is

$$
\mathcal{N}(A) = \big\{ x \in \mathbb{R}^n : A x = 0 \big\}.
$$

The **range** (or **column space**) of $A$ is

$$
\mathcal{R}(A) = \big\{ A x : x \in \mathbb{R}^n \big\}.
$$

Note that $\mathcal{N}(A)$ is a subspace of $\mathbb{R}^n$ and $\mathcal{R}(A)$ is a subspace of $\mathbb{R}^m$.
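
As a small illustration, Julia's `LinearAlgebra` standard library provides `nullspace` and `rank`, which can be used to inspect these two subspaces. The matrix below is an arbitrary choice made for this sketch.

```julia
using LinearAlgebra

# A 2×3 matrix chosen for illustration (m = 2, n = 3).
A = [1.0 1.0 0.0;
     0.0 0.0 1.0]

N = nullspace(A)     # orthonormal basis of N(A), a subspace of R^3
r = rank(A)          # dimension of R(A), a subspace of R^2

@show size(N)        # (3, 1): N(A) is a line in R^3
@show r              # 2: R(A) is all of R^2
@show norm(A * N)    # close to 0: the basis vector satisfies A x = 0
```

Here the dimensions of the range and the null space add up to $n = 3$.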

---

> ### The Fundamental Theorem of Linear Algebra:
>
> Let $A \in \mathbb{R}^{m \times n}$. Then
>
> $$\mathcal{R}(A)^\perp = \mathcal{N}(A^T).$$

---

## Exercise

Prove the Fundamental Theorem of Linear Algebra.

### Proof.

First we want to show that $\mathcal{R}(A)^\perp \subseteq \mathcal{N}(A^T)$.

Let $y \in \mathcal{R}(A)^\perp$. Then $\langle y, A x \rangle = 0$, for all $x \in \mathbb{R}^n$. Since

$$
\langle y, A x \rangle = y^T A x = (A^T y)^T x = \langle A^T y, x \rangle
$$

we have that $\langle A^T y, x \rangle = 0$ for all $x \in \mathbb{R}^n$. Choosing $x = A^T y$ gives $\|A^T y\|_2^2 = 0$, so $A^T y$ must be the zero vector, which implies that $y \in \mathcal{N}(A^T)$.

Next we want to show that $\mathcal{N}(A^T) \subseteq \mathcal{R}(A)^\perp$.

Let $y \in \mathcal{N}(A^T)$. Then $A^T y = 0$, so $\langle A^T y, x \rangle = 0$, for all $x \in \mathbb{R}^n$. Then, by the above observation, $\langle y, A x \rangle = 0$ for all $x \in \mathbb{R}^n$. Therefore, $y \in \mathcal{R}(A)^\perp$.

Therefore, $\mathcal{R}(A)^\perp = \mathcal{N}(A^T)$. $\blacksquare$
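
The theorem can also be checked numerically on a small example. This is only a sketch: the matrix `A` and the vector `x` are arbitrary choices, and `Z` is an illustrative name for a computed basis of $\mathcal{N}(A^T)$.

```julia
using LinearAlgebra

# A 3×2 matrix with linearly independent columns, chosen for illustration.
A = [1.0 0.0;
     1.0 1.0;
     0.0 1.0]

# Orthonormal basis of N(A'), computed from the transposed matrix.
Z = nullspace(Matrix(A'))

@show size(Z)                # (3, 1): dim N(A') = m - rank(A) = 3 - 2
@show norm(A' * Z)           # close to 0: Z is orthogonal to every column of A

# Any vector in R(A) has the form A*x, so it should be orthogonal to Z.
x = [2.0, -1.0]
@show norm(Z' * (A * x))     # close to 0
```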

---

## The Discrete Least Squares Problem

The least squares problem of minimizing $\|b - A x\|_2$ can be written as the problem of finding the orthogonal projection of the vector $b$ into $\mathcal{R}(A)$. That is,

$$
\min_{x \in \mathbb{R}^n} \|b - A x\|_2 = \min_{y \in \mathcal{R}(A)} \|b - y\|_2.
$$
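
One way to see why the minimizer over $\mathcal{R}(A)$ is the orthogonal projection is a Pythagorean argument. As a sketch, let $y$ denote the orthogonal projection of $b$ into $\mathcal{R}(A)$ and let $r = b - y \in \mathcal{R}(A)^\perp$. For any other $\hat{y} \in \mathcal{R}(A)$, the difference $y - \hat{y}$ lies in $\mathcal{R}(A)$, so $\langle r, y - \hat{y} \rangle = 0$ and

$$
\begin{aligned}
\|b - \hat{y}\|_2^2 &= \|r + (y - \hat{y})\|_2^2 \\
&= \|r\|_2^2 + 2 \langle r, y - \hat{y} \rangle + \|y - \hat{y}\|_2^2 \\
&= \|b - y\|_2^2 + \|y - \hat{y}\|_2^2 \\
&\geq \|b - y\|_2^2,
\end{aligned}
$$

with equality only when $\hat{y} = y$.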



---

> ### Theorem: (Normal Equations)
>
> Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Then $x \in \mathbb{R}^n$ solves the least squares problem
>
> $$ \min_{x \in \mathbb{R}^n} \|b - A x\|_2 $$
>
> if and only if $x$ satisfies the **normal equations**
>
> $$ A^T A x = A^T b. $$

### Proof.

Note that $\mathbb{R}^m = \mathcal{R}(A) \oplus \mathcal{R}(A)^\perp = \mathcal{R}(A) \oplus \mathcal{N}(A^T)$. Thus, there exist unique vectors $y \in \mathcal{R}(A)$ and $r \in \mathcal{N}(A^T)$ such that

$$b = y + r,$$

where $y$ is the orthogonal projection of $b$ into $\mathcal{R}(A)$ and $r$ is the orthogonal projection of $b$ into $\mathcal{N}(A^T)$.

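First we assume that $x$ solves the least squares problem. Then $A x$ is the vector in $\mathcal{R}(A)$ closest to $b$, which is the orthogonal projection of $b$ into $\mathcal{R}(A)$, so $y = A x$ and $r = b - A x$. Since $r \in \mathcal{N}(A^T)$, we have that $A^T r = 0$, which implies that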

$$ A^T(b - A x) = 0. $$

Therefore, $x$ satisfies the normal equations $A^T A x = A^T b$.

Next we assume that $x$ satisfies the normal equations $A^T A x = A^T b$. Let $y = A x$ and $r = b - y$. Then $b = y + r$ with $y \in \mathcal{R}(A)$, and $r \in \mathcal{N}(A^T)$ since $A^T r = A^T b - A^T A x = 0$. By the uniqueness of this decomposition, $y$ is the orthogonal projection of $b$ into $\mathcal{R}(A)$, which implies that $x$ solves the least squares problem. $\blacksquare$
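
The equivalence can be illustrated on a small problem. The sketch below uses made-up data (fitting a line $c_1 + c_2 t$ to three points) and compares Julia's backslash operator, which returns a least squares solution for a rectangular matrix, with a direct solve of the normal equations.

```julia
using LinearAlgebra

# Small overdetermined system chosen for illustration: fit c1 + c2*t
# to the points (0, 1), (1, 2), (2, 2).
A = [1.0 0.0;
     1.0 1.0;
     1.0 2.0]
b = [1.0, 2.0, 2.0]

x_ls = A \ b                  # least squares solution via backslash
x_ne = (A' * A) \ (A' * b)    # solution of the normal equations

r = b - A * x_ls              # residual of the least squares solution
@show x_ls                    # approximately [7/6, 1/2]
@show norm(x_ls - x_ne)       # close to 0: both characterizations agree
@show norm(A' * r)            # close to 0: the residual lies in N(A')
```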

---

## Solving the Normal Equations

Suppose that $A \in \mathbb{R}^{m \times n}$ ($m > n$) has linearly independent columns. Then $A^T A$ is symmetric positive definite, since $x^T A^T A x = \|A x\|_2^2 > 0$ for every $x \neq 0$, so we can solve the normal equations

$$ A^T A x = A^T b $$

using **Cholesky's method**.
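
As a minimal sketch of what this involves, the factorization $A^T A = R^T R$ (with $R$ upper triangular) reduces the normal equations to two triangular solves. The data below are made up for illustration, and the names `C` and `z` are not from the text.

```julia
using LinearAlgebra

# Small full-column-rank matrix chosen for illustration.
A = [1.0 0.0;
     1.0 1.0;
     1.0 2.0]
b = [1.0, 2.0, 2.0]

# Factor A'A = R'R with R upper triangular. Wrapping in Symmetric tells
# cholesky to treat the matrix as exactly symmetric.
C = cholesky(Symmetric(A' * A))

z = C.L \ (A' * b)    # forward substitution:  R' z = A' b
x = C.U \ z           # back substitution:     R x = z

@show x                              # approximately [7/6, 1/2]
@show norm(A' * A * x - A' * b)      # close to 0: x satisfies the normal equations
```

Writing `C \ (A' * b)` performs the same two solves in a single call.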

If $A$ is well-conditioned (i.e., the condition number $\kappa_2(A)$ is small), then this approach is safe.

However, if $\kappa_2(A)$ is not small, then

$$ \kappa_2(A^T A) = \kappa_2(A)^2 $$

implies that $A^T A$ will be ill-conditioned, so solving $A^T A x = A^T b$ is not a good idea: the computed solution will likely be less accurate than the one obtained with the $QR$ approach.
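
The identity for $\kappa_2(A^T A)$ can be derived from the singular value decomposition. As a sketch, assume $A$ has linearly independent columns, take the reduced SVD $A = U \Sigma V^T$ with singular values $\sigma_1 \geq \cdots \geq \sigma_n > 0$, and use the convention $\kappa_2(A) = \sigma_1 / \sigma_n$ (the symbols $U$, $\Sigma$, $V$, and $\sigma_i$ are introduced here for illustration). Since $U^T U = I$,

$$
A^T A = V \Sigma U^T U \Sigma V^T = V \Sigma^2 V^T,
$$

so the singular values of $A^T A$ are $\sigma_1^2 \geq \cdots \geq \sigma_n^2$, and

$$
\kappa_2(A^T A) = \frac{\sigma_1^2}{\sigma_n^2} = \left( \frac{\sigma_1}{\sigma_n} \right)^2 = \kappa_2(A)^2.
$$

For instance, $\kappa_2(A) = 10^8$ gives $\kappa_2(A^T A) = 10^{16}$, which exceeds the reciprocal of double-precision machine epsilon (about $4.5 \times 10^{15}$).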

---

## Example

In [None]:
using LinearAlgebra

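# Create an n×m Vandermonde-type matrix: column j holds the powers
# (j/m)^0, (j/m)^1, ..., (j/m)^(n-1) of the node j/m
vandermonde(n,m) = [(j/m)^(i-1) for i=1:n, j=1:m]

# Note: here A has n = 15 rows and m = 10 columns, so the roles of
# m and n are swapped relative to the text, where A is m×n
n, m = 15, 10
A = vandermonde(n, m)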

In [None]:
cond(A)

In [None]:
AtA = A'A

In [None]:
cond(AtA)

In [None]:
cond(A)^2

In [None]:
b = randn(n);

In [None]:
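# Using the QR approach to solve the least squares problem

F = qr(A)          # QR factorization A = Q*R

c = F.Q'b          # apply Q' to b (c has length n)

xqr = F.R\c[1:m]   # solve the m×m triangular system R*x = c[1:m]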

In [None]:
norm(b - A*xqr)

In [None]:
# Using Cholesky's method to solve the normal equations

F = cholesky(AtA)   # factor A'A = U'U with U upper triangular

xne = F\(A'b)       # two triangular solves with the Cholesky factors

In [None]:
norm(b - A*xne)

In [None]:
norm(b - A*xqr) < norm(b - A*xne)

We see that the $QR$ approach produces a solution with a smaller residual than the normal equations approach. Since the least squares solution is by definition the minimizer of $\|b - A x\|_2$, the smaller residual indicates that the $QR$ approach is more accurate for solving this least squares problem.

---