---
# Section 3.5: Geometry and the Normal Equations
---

## Orthogonal complement

Let $S \subseteq \mathbb{R}^n$.

The **orthogonal complement** is

$$
S^\perp = \left\{ x \in \mathbb{R}^n : \langle x, y \rangle = 0, \forall y \in S \right\}.
$$

This is the set of vectors $x$ that are orthogonal to every vector $y$ in $S$.

---

## Example

The $x_1x_2$-plane in $\mathbb{R}^3$ is

$$
S = \left\{ x \in \mathbb{R}^3 : x_3 = 0 \right\}.
$$

Its orthogonal complement is the $x_3$-axis,

$$
S^\perp = \left\{ x \in \mathbb{R}^3 : x_1 = x_2 = 0 \right\}.
$$

---

The orthogonal complement $S^\perp$ is always a **subspace** of $\mathbb{R}^n$ (i.e., it is closed under vector addition and scalar multiplication):

1. If $u, v \in S^\perp$, then $u + v \in S^\perp$.
2. If $\alpha \in \mathbb{R}$ and $x \in S^\perp$, then $\alpha x \in S^\perp$.

---

## Exercise

Prove that $S^\perp$ is a subspace.

---

> ### Theorem: (Subspace Decomposition of $\mathbb{R}^n$)
> Let $S$ be a subspace of $\mathbb{R}^n$. Then $\mathbb{R}^n$ is the **direct sum** of $S$ and $S^\perp$, which we write as
> $$\mathbb{R}^n = S \oplus S^\perp.$$
> That is, every $x \in \mathbb{R}^n$ can be written *uniquely* as $x = y + z$ where $y \in S$ and $z \in S^\perp$.

The vectors $y$ and $z$ satisfy

$$
\begin{align}
y &= \operatorname{proj}_S(x) = \operatorname*{argmin}_{w \in S} \|w - x\|_2, \\
z &= \operatorname{proj}_{S^\perp}(x) = \operatorname*{argmin}_{w \in S^\perp} \|w - x\|_2.
\end{align}
$$

That is, $y$ is the closest vector in $S$ to $x$, and $z$ is the closest vector in $S^\perp$ to $x$.

---

## Example

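With $S$ the $x_1x_2$-plane and $S^\perp$ the $x_3$-axis,

$$
S = \left\{ x \in \mathbb{R}^3 : x_3 = 0 \right\},
\qquad
S^\perp = \left\{ x \in \mathbb{R}^3 : x_1 = x_2 = 0 \right\},
$$

the vector $x$ below decomposes as $x = y + z$ with $y = \operatorname{proj}_S(x)$ and $z = \operatorname{proj}_{S^\perp}(x)$:

$$
x = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \in \mathbb{R}^3, \qquad
y = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} \in S, \qquad
z = \begin{bmatrix} 0 \\ 0 \\ 3 \end{bmatrix} \in S^\perp.
$$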
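
The decomposition in this example can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the matrix `Q` (a name introduced here for illustration) holds an orthonormal basis of $S$ in its columns, so that $\operatorname{proj}_S(x) = QQ^Tx$.

```python
import numpy as np

# Columns of Q form an orthonormal basis for S = {x in R^3 : x_3 = 0}.
Q = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

x = np.array([1.0, 2.0, 3.0])

y = Q @ (Q.T @ x)   # projection of x onto S
z = x - y           # remaining component, which lies in S-perp

print(y)                          # [1. 2. 0.]
print(z)                          # [0. 0. 3.]
print(np.allclose(Q.T @ z, 0.0))  # True: z is orthogonal to the basis of S
```

Computing $z$ as $x - y$ relies on the uniqueness of the decomposition $x = y + z$.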

---

## Two fundamental subspaces

Let $A \in \mathbb{R}^{m \times n}$.

The **null space** of $A$ is 

$$
\mathcal{N}(A) = \left\{ x \in \mathbb{R}^n : Ax = 0 \right\}.
$$

The **range space** of $A$ is 

$$
\mathcal{R}(A) = \left\{ Ax : x \in \mathbb{R}^n \right\}.
$$

$\mathcal{N}(A)$ is a subspace of $\mathbb{R}^n$ and $\mathcal{R}(A)$ is a subspace of $\mathbb{R}^m$.

---

> ### Fundamental Theorem of Linear Algebra
> For any $A \in \mathbb{R}^{m \times n}$,
> $$\mathcal{R}(A)^\perp = \mathcal{N}(A^T).$$
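
One way to see why this holds is sketched here, under the assumption that $\langle \cdot, \cdot \rangle$ is the standard dot product $\langle u, v \rangle = u^Tv$, so that $\langle Ax, y \rangle = \langle x, A^Ty \rangle$. For $y \in \mathbb{R}^m$,

$$
\begin{align}
y \in \mathcal{R}(A)^\perp
&\iff \langle Ax, y \rangle = 0 \text{ for all } x \in \mathbb{R}^n \\
&\iff \langle x, A^Ty \rangle = 0 \text{ for all } x \in \mathbb{R}^n \\
&\iff A^Ty = 0 \\
&\iff y \in \mathcal{N}(A^T).
\end{align}
$$

The third equivalence uses the choice $x = A^Ty$, which gives $\|A^Ty\|_2^2 = 0$.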

---

## The least-squares problem

Notice that

$$
\min_{x \in \mathbb{R}^n} \|b - Ax\|_2 = \min_{y \in \mathcal{R}(A)} \|b - y\|_2.
$$

Thus we are looking for the closest vector $y$ in $\mathcal{R}(A)$ to $b$, so

$$
y = \operatorname{proj}_{\mathcal{R}(A)}(b).
$$
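
The identification of the least-squares problem with a projection can be illustrated numerically. The sketch below assumes NumPy and uses randomly generated data (`A` and `b` here are hypothetical). It obtains an orthonormal basis of $\mathcal{R}(A)$ from a reduced QR factorization, which is only a convenient tool for this check and is not developed in this section.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))   # hypothetical tall matrix with independent columns
b = rng.standard_normal(6)

# Reduced QR: the columns of Q are an orthonormal basis for R(A).
Q, R = np.linalg.qr(A)
y = Q @ (Q.T @ b)                 # projection of b onto R(A)

# A least-squares solution x yields the same closest point, y = A x.
x, *_ = np.linalg.lstsq(A, b, rcond=None)

same_point = np.allclose(A @ x, y)
residual_orthogonal = np.allclose(A.T @ (b - y), 0.0)
print(same_point, residual_orthogonal)
```

The second check shows that the residual $b - y$ is orthogonal to every column of $A$, that is, $b - y \in \mathcal{R}(A)^\perp$.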

---

> ### Theorem: (Normal Equations)
> Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Then $x \in \mathbb{R}^n$ solves the least-squares problem
> $$\min_{x} \|b - Ax\|_2$$
> if and only if
> $$A^TA x = A^Tb.$$

The linear system $A^TAx = A^Tb$ is known as the **normal equations**.
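
A short sketch of why the theorem holds, combining the subspace decomposition of $\mathbb{R}^m$ with $\mathcal{R}(A)^\perp = \mathcal{N}(A^T)$: write $b = y + z$ with $y = \operatorname{proj}_{\mathcal{R}(A)}(b)$ and $z \in \mathcal{R}(A)^\perp$. Since this decomposition is unique, $Ax = y$ exactly when $b - Ax \in \mathcal{R}(A)^\perp$. Therefore

$$
\begin{align}
x \text{ solves } \min_{x} \|b - Ax\|_2
&\iff Ax = \operatorname{proj}_{\mathcal{R}(A)}(b) \\
&\iff b - Ax \in \mathcal{R}(A)^\perp = \mathcal{N}(A^T) \\
&\iff A^T(b - Ax) = 0 \\
&\iff A^TAx = A^Tb.
\end{align}
$$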

---

## Solving the normal equations

Suppose $A \in \mathbb{R}^{m \times n}$, with $m > n$, has linearly independent columns.

Then $A^TA$ is symmetric positive definite, so we can solve $A^TAx = A^Tb$ using Cholesky factorization.
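
A minimal sketch of this approach, assuming NumPy and randomly generated data (`A`, `b`, and the intermediate names `G`, `L`, `w` are introduced here for illustration). It factors $A^TA = LL^T$ and then solves two triangular systems, comparing the result against NumPy's own least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3))   # hypothetical data: m = 8 > n = 3, independent columns
b = rng.standard_normal(8)

G = A.T @ A                       # symmetric positive definite for this A
L = np.linalg.cholesky(G)         # G = L L^T with L lower triangular

# Solve L w = A^T b, then L^T x = w.
# np.linalg.solve is a general solver; a dedicated triangular solver
# (e.g. scipy.linalg.solve_triangular) would exploit the structure.
w = np.linalg.solve(L, A.T @ b)
x = np.linalg.solve(L.T, w)

# Reference solution from NumPy's least-squares routine.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_ref))
```

For a well-conditioned matrix like this one, the two solutions should agree to within rounding error.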

If $A$ is well-conditioned ($\kappa_2(A)$ is small), then this approach is safe.

However, forming $A^TA$ squares the condition number:

$$
\kappa_2(A^TA) = \kappa_2(A)^2.
$$

So if $\kappa_2(A)$ is not small, $A^TA$ can be badly ill-conditioned, and solving $A^TAx = A^Tb$ is not a good idea.
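
The squaring of the condition number can be observed numerically. The sketch below assumes NumPy and uses a hypothetical example matrix, the Vandermonde matrix for fitting a degree-5 polynomial at 20 equally spaced points in $[0, 1]$. The identity itself follows because the singular values of $A^TA$ are the squares of the singular values of $A$.

```python
import numpy as np

# Hypothetical example: degree-5 polynomial fit at 20 equally spaced points.
t = np.linspace(0.0, 1.0, 20)
A = np.vander(t, 6, increasing=True)   # columns 1, t, t^2, ..., t^5

kappa_A = np.linalg.cond(A)            # 2-norm condition number of A
kappa_AtA = np.linalg.cond(A.T @ A)    # 2-norm condition number of A^T A

print(f"kappa_2(A)     = {kappa_A:.3e}")
print(f"kappa_2(A)^2   = {kappa_A**2:.3e}")
print(f"kappa_2(A^T A) = {kappa_AtA:.3e}")
```

The last two printed values should agree closely, while both are far larger than $\kappa_2(A)$ itself.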

---