The minimizer $\hat y$ is the unique vector in $\mathbb{R}^n$ that satisfies
1. $\hat y \in S$
2. $y - \hat y \perp S$
The vector $\hat y$ is called the orthogonal projection of $y$ onto $S$.
The next figure provides some intuition
Proof of sufficiency
We'll omit the full proof.
But we will prove sufficiency of the asserted conditions.
To this end, let $y \in \mathbb{R}^n$ and let $S$ be a linear subspace of $\mathbb{R}^n$.
Let $\hat y$ be a vector in $\mathbb{R}^n$ such that $\hat y \in S$ and $y - \hat y \perp S$.
Let $z$ be any other point in $S$. Since $S$ is a linear subspace, $\hat y - z \in S$, so $y - \hat y \perp \hat y - z$ and the Pythagorean law gives

$$
| y - z |^2
= | (y - \hat y) + (\hat y - z) |^2
= | y - \hat y |^2 + | \hat y - z |^2
$$
Hence $| y - z | \geq | y - \hat y |$, which completes the proof.
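As a quick numerical illustration (not part of the proof), we can take $S$ to be the column span of a concrete matrix, compute $\hat y$ with the projection formula derived later in this lecture, and confirm that no other point of $S$ is closer to $y$:

```julia
using LinearAlgebra

# S = column span of X; the formula X * inv(X'X) * X' * y is derived below
X = [1.0 0.0; 0.0 -6.0; 2.0 2.0]
y = [1.0, 3.0, -3.0]
y_hat = X * inv(X'X) * X' * y

# condition 2: the residual is orthogonal to every column of X, hence to S
println(norm(X' * (y - y_hat)))               # ≈ 0

# any other point z = X b in S is at least as far from y
for b in ([1.0, 1.0], [-2.0, 0.5], [0.0, 3.0])
    z = X * b
    println(norm(y - z) >= norm(y - y_hat))   # true
end
```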
Orthogonal Projection as a Mapping
For a linear space $Y$ and a fixed linear subspace $S$, we have a functional relationship
$$
y \in Y \; \mapsto \text{ its orthogonal projection } \hat y \in S
$$
By the OPT, this is a well-defined mapping or operator from $\mathbb{R}^n$ to $\mathbb{R}^n$.
In what follows we denote this operator by a matrix $P$, so that $P y$ represents the projection $\hat y$.
This is sometimes expressed as $\hat E_S y = P y$, where $\hat E$ denotes a wide-sense expectations operator and the subscript $S$ indicates that we are projecting $y$ onto the linear subspace $S$.
The operator $P$ is called the orthogonal projection mapping onto $S$.
It is immediate from the OPT that for any $y \in \mathbb{R}^n$
$P y \in S$ and
$y - P y \perp S$
From this we can deduce additional useful properties, such as
1. $| y |^2 = | P y |^2 + | y - P y |^2$, and
2. $| P y | \leq | y |$
For example, to prove 1, observe that $y = P y + y - P y$ and apply the Pythagorean law.
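Here is a minimal numerical sketch of these two properties, using a concrete subspace (the column span of an arbitrarily chosen matrix $X$) and the projection matrix $P = X (X' X)^{-1} X'$ introduced later in the lecture:

```julia
using LinearAlgebra

X = [1.0 0.0; 0.0 -6.0; 2.0 2.0]   # S = column span of X (an arbitrary example)
y = [1.0, 3.0, -3.0]
P = X * inv(X'X) * X'
Py = P * y

println(norm(y)^2 ≈ norm(Py)^2 + norm(y - Py)^2)  # property 1
println(norm(Py) <= norm(y))                      # property 2
```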
Orthogonal Complement
Let $S \subset \mathbb{R}^n$.
The orthogonal complement of $S$ is the linear subspace $S^{\perp}$ that satisfies
$x_1 \perp x_2$ for every $x_1 \in S$ and $x_2 \in S^{\perp}$.
Let $Y$ be a linear space with linear subspace $S$ and its orthogonal complement $S^{\perp}$.
We write
$$
Y = S \oplus S^{\perp}
$$
to indicate that for every $y \in Y$ there is unique $x_1 \in S$ and a unique $x_2 \in S^{\perp}$
such that $y = x_1 + x_2$.
Moreover, $x_1 = \hat E_S y$ and $x_2 = y - \hat E_S y$.
This amounts to another version of the OPT:
Theorem. If $S$ is a linear subspace of $\mathbb{R}^n$, $\hat E_S y = P y$ and $\hat E_{S^{\perp}} y = M y$, then
$$
P y \perp M y
\quad \text{and} \quad
y = P y + M y
\quad \text{for all } y \in \mathbb{R}^n
$$
The next figure illustrates
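To make the theorem concrete, here is a small illustrative check in Julia that $P y$ and $M y$ are orthogonal and sum to $y$, with $S$ taken to be the column span of an example matrix:

```julia
using LinearAlgebra

X = [1.0 0.0; 0.0 -6.0; 2.0 2.0]   # S = column span of X
y = [1.0, 3.0, -3.0]

P = X * inv(X'X) * X'              # projection onto S
M = I - P                          # projection onto S-perp (the "annihilator")

println(dot(P * y, M * y))         # ≈ 0, so P y ⊥ M y
println(P * y + M * y ≈ y)         # true
```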
Orthonormal Basis
An orthogonal set of vectors $O \subset \mathbb{R}^n$ is called an orthonormal set if $| u | = 1$ for all $u \in O$.
Let $S$ be a linear subspace of $\mathbb{R}^n$ and let $O \subset S$.
If $O$ is orthonormal and $\mathop{\mathrm{span}} O = S$, then $O$ is called an orthonormal basis of $S$.
$O$ is necessarily a basis of $S$ (being independent by orthogonality and the fact that no element is the zero vector).
One example of an orthonormal set is the canonical basis $\{e_1, \ldots, e_n\}$, which forms an orthonormal basis of $\mathbb{R}^n$, where $e_i$ is the $i$-th unit vector.
If $\{u_1, \ldots, u_k\}$ is an orthonormal basis of the linear subspace $S$, then
$$
x = \sum_{i=1}^k \langle x, u_i \rangle u_i
\quad \text{for all} \quad
x \in S
$$
To see this, observe that since $x \in \mathop{\mathrm{span}}\{u_1, \ldots, u_k\}$, we can find
scalars $\alpha_1, \ldots, \alpha_k$ that verify

$$
x = \sum_{j=1}^k \alpha_j u_j
$$
Taking the inner product with respect to $u_i$ and using orthonormality gives $\langle x, u_i \rangle = \alpha_i$, which verifies the claim.
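As a small illustration, we can manufacture an orthonormal basis for a two-dimensional subspace of $\mathbb{R}^3$ (here via Julia's built-in `qr` routine, discussed later in the lecture) and confirm the expansion for a vector in that subspace:

```julia
using LinearAlgebra

A = [1.0 0.0; 0.0 -6.0; 2.0 2.0]
U = Matrix(qr(A).Q)                # columns u_1, u_2: an orthonormal basis of span(A)

x = A * [2.0, -1.0]                # an arbitrary element of the subspace
expansion = sum(dot(x, U[:, i]) * U[:, i] for i in 1:size(U, 2))
println(x ≈ expansion)             # true
```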
Least Squares Regression

Suppose we have data $y \in \mathbb{R}^N$ and an $N \times K$ matrix of regressors $X$ whose $n$-th row is $x_n'$.

We assume throughout that $N > K$ and $X$ has full column rank.
If you work through the algebra, you will be able to verify that $| y - X b |^2 = \sum_{n=1}^N (y_n - b' x_n)^2$.
Since monotone transforms don't affect minimizers, we have
$$
\mathop{\mathrm{arg\,min}}_{b \in \mathbb{R}^K} \sum_{n=1}^N (y_n - b' x_n)^2
= \mathop{\mathrm{arg\,min}}_{b \in \mathbb{R}^K} | y - X b |
$$
By our results about overdetermined linear systems of equations, the solution is
$$
\hat \beta := (X' X)^{-1} X' y
$$
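As a sanity check, this formula agrees with Julia's built-in least squares solver `X \ y`; the small data set below is made up purely for illustration:

```julia
using LinearAlgebra

X = [1.0 2.0; 3.0 4.0; 5.0 7.0]    # hypothetical regressors, full column rank
y = [1.0, 0.0, 2.0]                # hypothetical observations

beta_hat = inv(X'X) * X' * y       # (X'X)^{-1} X' y
println(beta_hat ≈ X \ y)          # true: backslash also solves least squares
```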
Let $P$ and $M$ be the projection and annihilator associated with $X$:
$$
P := X (X' X)^{-1} X'
\quad \text{and} \quad
M := I - P
$$
The vector of fitted values is
$$
\hat y := X \hat \beta = P y
$$
The vector of residuals is
$$
\hat u := y - \hat y = y - P y = M y
$$
Here are some more standard definitions:
The total sum of squares is $\mathrm{TSS} := | y |^2$.
The sum of squared residuals is $\mathrm{SSR} := | \hat u |^2$.
The explained sum of squares is $\mathrm{ESS} := | \hat y |^2$.
TSS = ESS + SSR.
We can prove this easily using the OPT.
From the OPT we have $y = \hat y + \hat u$ and $\hat u \perp \hat y$.
Applying the Pythagorean law completes the proof.
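A quick numerical confirmation, using an arbitrary example data set, looks like this:

```julia
using LinearAlgebra

X = [1.0 0.0; 0.0 -6.0; 2.0 2.0]
y = [1.0, 3.0, -3.0]

P = X * inv(X'X) * X'
y_hat = P * y                      # fitted values
u_hat = y - y_hat                  # residuals

TSS, ESS, SSR = norm(y)^2, norm(y_hat)^2, norm(u_hat)^2
println(TSS ≈ ESS + SSR)           # true
```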
Orthogonalization and Decomposition
Let's return to the connection between linear independence and orthogonality touched on above.
A result of much interest is a famous algorithm for constructing orthonormal sets from linearly independent sets.
The next section gives details.
(gram_schmidt)=
Gram-Schmidt Orthogonalization
Theorem. For each linearly independent set $\{x_1, \ldots, x_k\} \subset \mathbb{R}^n$, there exists an
orthonormal set $\{u_1, \ldots, u_k\}$ with

$$
\mathop{\mathrm{span}} \{x_1, \ldots, x_i\}
= \mathop{\mathrm{span}} \{u_1, \ldots, u_i\}
\quad \text{for} \quad
i = 1, \ldots, k
$$
The Gram-Schmidt orthogonalization procedure constructs such an orthonormal set $\{u_1, u_2, \ldots, u_k\}$.
One description of this procedure is as follows:
1. For $i = 1, \ldots, k$, form $S_i := \mathop{\mathrm{span}}\{x_1, \ldots, x_i\}$ and its orthogonal complement $S_i^{\perp}$.
2. Set $v_1 := x_1$ and $u_1 := v_1 / | v_1 |$.
3. For $i \geq 2$, set $v_i := \hat E_{S_{i-1}^{\perp}} x_i$ and $u_i := v_i / | v_i |$.
The sequence $u_1, \ldots, u_k$ has the stated properties.
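Equivalently, using the orthonormal expansion derived earlier, the projection step can be written out explicitly (this is the standard textbook form of the same recursion):

$$
v_i = x_i - \sum_{j=1}^{i-1} \langle x_i, u_j \rangle u_j,
\qquad
u_i = \frac{v_i}{| v_i |}
$$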
A Gram-Schmidt orthogonalization construction is a key idea behind the Kalman filter described in {doc}`A First Look at the Kalman filter <../introduction_dynamics/kalman>`.
In some exercises below you are asked to implement this algorithm and test it using projection.
QR Decomposition
The following result uses the preceding algorithm to produce a useful decomposition.
Theorem. If $X$ is $n \times k$ with linearly independent columns, then there exists a factorization $X = Q R$ where

- $R$ is $k \times k$, upper triangular, and nonsingular
- $Q$ is $n \times k$ with orthonormal columns
Proof sketch: Let

- $x_j := \mathop{\mathrm{col}}_j (X)$
- $\{u_1, \ldots, u_k\}$ be orthonormal with the same span as $\{x_1, \ldots, x_k\}$ (to be constructed using Gram-Schmidt)
- $Q$ be formed from the columns $u_i$
Since $x_j \in \mathop{\mathrm{span}}\{u_1, \ldots, u_j\}$, we have

$$
x_j = \sum_{i=1}^{j} \langle u_i, x_j \rangle u_i
\quad \text{for } j = 1, \ldots, k
$$

Rearranging these equations gives $X = Q R$, where $R$ is the upper triangular matrix with $R_{ij} = \langle u_i, x_j \rangle$ for $i \leq j$.

Linear Regression via QR Decomposition
For matrices $X$ and $y$ that overdetermine $\beta$ in the linear
equation system $y = X \beta$, we found the least squares approximator $\hat \beta = (X' X)^{-1} X' y$.
Using the QR decomposition $X = Q R$ gives
$$
\begin{aligned}
\hat \beta
& = (R'Q' Q R)^{-1} R' Q' y \\
& = (R' R)^{-1} R' Q' y \\
& = R^{-1} (R')^{-1} R' Q' y
= R^{-1} Q' y
\end{aligned}
$$
Numerical routines would in this case use the alternative form $R \hat \beta = Q' y$ and back substitution.
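Here is a minimal sketch of that comparison in Julia, using the small data set that reappears in the exercises below:

```julia
using LinearAlgebra

X = [1.0 0.0; 0.0 -6.0; 2.0 2.0]
y = [1.0, 3.0, -3.0]

beta_normal = inv(X'X) * X' * y     # normal-equations formula
F = qr(X)
Q, R = Matrix(F.Q), F.R             # thin Q (orthonormal columns), upper triangular R
beta_qr = R \ (Q' * y)              # solves R β = Q' y

println(beta_normal ≈ beta_qr)      # true
```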
Exercises
Exercise 1
Show that, for any linear subspace $S \subset \mathbb{R}^n$, $S \cap S^{\perp} = \{0\}$.
Exercise 2
Let $P = X (X' X)^{-1} X'$ and let $M = I - P$. Show that
$P$ and $M$ are both idempotent and symmetric. Can you give any
intuition as to why they should be idempotent?
Exercise 3
Using Gram-Schmidt orthogonalization, produce a linear projection of $y$ onto the column space of $X$ and verify this using the projection matrix $P := X (X' X)^{-1} X'$ and also using QR decomposition, where

$$
y :=
\begin{bmatrix}
1 \\
3 \\
-3
\end{bmatrix},
\quad
X :=
\begin{bmatrix}
1 & 0 \\
0 & -6 \\
2 & 2
\end{bmatrix}
$$
Solutions

Exercise 1

Clearly, $0 \in S$ and $0 \in S^\perp$.
If $x \in S$ and $x \in S^\perp$, then we have in particular
that $\langle x, x \rangle = 0$. But then $x = 0$.
Exercise 2
Symmetry and idempotence of $M$ and $P$ can be established
using standard rules for matrix algebra. The intuition behind
idempotence of $M$ and $P$ is that both are orthogonal
projections. After a point is projected into a given subspace, applying
the projection again makes no difference. (A point inside the subspace
is not shifted by orthogonal projection onto that space because it is
already the closest point in the subspace to itself).
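A quick numerical check of these claims for one arbitrary full-column-rank $X$ is sketched below; a formal argument follows from $P' = P$, $P^2 = X (X'X)^{-1} X' X (X'X)^{-1} X' = P$, and the analogous computations for $M = I - P$.

```julia
using LinearAlgebra

X = [1.0 0.0; 0.0 -6.0; 2.0 2.0]
P = X * inv(X'X) * X'
M = I - P

println(P * P ≈ P)   # idempotent
println(M * M ≈ M)   # idempotent
println(P ≈ P')      # symmetric
println(M ≈ M')      # symmetric
```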
Exercise 3
Here's a function that computes the orthonormal vectors using the GS
algorithm given in the lecture.
```julia
using LinearAlgebra, Statistics
```
```julia
function gram_schmidt(X)
    U = similar(X, Float64) # for robustness

    function normalized_orthogonal_projection(b, Z)
        # project onto the orthogonal complement of the col span of Z
        orthogonal = I - Z * inv(Z'Z) * Z'
        projection = orthogonal * b
        # normalize
        return projection / norm(projection)
    end

    for col in 1:size(U, 2)
        # set up
        b = X[:, col]          # vector we're going to project
        Z = X[:, 1:(col - 1)]  # first col - 1 columns of X
        U[:, col] = normalized_orthogonal_projection(b, Z)
    end
    return U
end
```
Here are the arrays we'll work with
```julia
y = [1, 3, -3]
X = [1 0; 0 -6; 2 2];
```
First let's do ordinary projection of $y$ onto the basis spanned
by the columns of $X$.
```julia
Py1 = X * inv(X'X) * X' * y
```
Now let's orthogonalize first, using Gram-Schmidt:

```julia
U = gram_schmidt(X)
```
Now we can project using the orthonormal basis and see if we get the
same thing:
```julia
Py2 = U * U' * y
```
The result is the same. To complete the exercise, we get an orthonormal
basis by QR decomposition and project once more.
```julia
Q, R = qr(X)
Q = Matrix(Q)
Py3 = Q * Q' * y
```