---
title: 10.1 Applications
subject:  SVD
subtitle: 
short_title: 10.1 Applications
authors:
  - name: Nikolai Matni
    affiliations:
      - Dept. of Electrical and Systems Engineering
      - University of Pennsylvania
    email: nmatni@seas.upenn.edu
license: CC-BY-4.0
keywords: 
math:
  '\vv': '\mathbf{#1}'
  '\bm': '\begin{bmatrix}'
  '\em': '\end{bmatrix}'
  '\R': '\mathbb{R}'
---

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/nikolaimatni/ese-2030/HEAD?labpath=/09_Ch_10_SVD_Apps/111-Apps.ipynb)

{doc}`Lecture notes <../lecture_notes/Lecture 18 - Singular Values and the Singular Value Decomposition.pdf>`

## Reading

Material related to this page, as well as additional exercises, can be found in ALA 8.7 and LAA 7.4.

## Learning Objectives

By the end of this page, you should know:
- 

## Warmup

The diagonalization theorems we've seen for complete and symmetric matrices have played a role in many interesting applications. Unfortunately, not all matrices can be factored as $A = PDP^{-1}$ for a diagonal matrix $D$; for example, such a factorization makes no sense if $A$ is not square! Fortunately, a factorization $A = P \Delta Q^T$ is possible for any $m \times n$ matrix $A$! A special factorization of this type, called the _singular value decomposition_, is one of the most useful and widely applicable matrix factorizations in linear algebra.

The singular value decomposition is based on the following key property of matrix diagonalization which we'll show can be captured in general rectangular matrices:

:::{note} Key Observation
The absolute values of the eigenvalues of a symmetric matrix $A$ measure the amounts that $A$ stretches or shrinks certain vectors (the eigenvectors). If $A \vv x = \lambda \vv x$ and $\|\vv x\| = 1$, then

$$
\|A \vv x\| = \|\lambda \vv x\| = |\lambda| \|\vv x\| = |\lambda|.
$$

If $\lambda_1$ is the eigenvalue with the greatest magnitude, i.e., if $|\lambda_1| \geq |\lambda_i|$ for $i=1,\ldots,n$, then a corresponding unit eigenvector $\vv v_1$ identifies the direction in which stretching is greatest. That is, the length of $A \vv x$ is maximized when $\vv x = \vv v_1$, and $\|A\vv v_1\| = |\lambda_1|$.
:::

The above description is reminiscent of the [optimization principle](../08_Ch_9_Symmetric_Matrices/103-opt_princ.ipynb#opt_eig_thm) we saw for characterizing eigenvalues of symmetric matrices, albeit with a focus on maximizing length $\|A\vv x\|$ rather than the quadratic form $\vv x^T A \vv x$. What we'll see next is that this description of $\vv v_1$ and $|\lambda_1|$ has an analogue for rectangular matrices that will lead to the singular value decomposition.

::::{prf:example}
:label: eg_1
The matrix $A = \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix}$ defines a linear map $\vv x \mapsto A \vv x$ from $\mathbb{R}^3$ to $\mathbb{R}^2$. If we consider the effects of this map on the unit sphere $\{\vv x \in \mathbb{R}^3 \mid \|\vv x\| = 1\}$, we observe that multiplication by $A$ transforms this sphere in $\mathbb{R}^3$ into an ellipse in $\mathbb{R}^2$:

:::{figure}../figures/10-sphere_ellipse.jpg
:label:sphere_ellipse
:alt:Sphere to Ellipse
:width: 400px
:align: center
:::

Our task is to find a unit vector $\vv x$ at which the length $\|A\vv x\|$ is maximized, and compute this maximum length. That is, we want to solve the optimization problem:

$$
\text{maximize} \quad \|A \vv x\|
$$

over choices of $\vv x$ satisfying $\|\vv x\|=1$. Our first observation is that the quantity $\|A\vv x\|^2$ is maximized by the same $\vv x$ that maximizes $\|A\vv x\|$, but that $\|A\vv x\|^2$ is easier to work with. Specifically, note that

$$
\|A\vv x\|^2 = \langle A \vv x, A\vv x \rangle = (A\vv x)^T(A\vv x) = \vv x^TA^TA\vv x = \vv x^T(A^TA)\vv x.
$$

So our task is now to find a unit vector $\vv x$ that maximizes the quadratic form $\vv x^T(A^TA)\vv x$ defined by the symmetric (positive semidefinite) matrix $A^{T}A$: we know how to do this. By our [theorem](../08_Ch_9_Symmetric_Matrices/103-opt_princ.ipynb#opt_eig_thm) characterizing eigenvalues of symmetric matrices from an optimization perspective, we know the maximum value is the largest eigenvalue $\lambda_1$ of the matrix $A^TA$, and is attained at the unit eigenvector $\vv v_1$ of $A^TA$ corresponding to $\lambda_1$.

For the matrix in this example:

$$
A^TA = \begin{bmatrix}
4 & 8 \\
11 & 7  \\
14 & -2 
\end{bmatrix}
\begin{bmatrix}
4 & 11 & 14 \\
8 & 7 & -2
\end{bmatrix} = 
\begin{bmatrix}
80 & 100 & 40 \\
100 & 170 & 140 \\
40 & 140 & 200
\end{bmatrix}
$$

and the eigenvalue/vector pairs are:

\begin{align*}
\lambda_1 = 360, \quad \vv v_1 &= \begin{bmatrix} \frac{1}{3} \\ \frac{2}{3} \\ \frac{2}{3} \end{bmatrix}, \quad
\lambda_2 = 90, \quad \vv v_2 &= \begin{bmatrix} -\frac{2}{3} \\ -\frac{1}{3} \\ \frac{2}{3} \end{bmatrix}, \quad
\lambda_3 = 0, \quad \vv v_3 &= \begin{bmatrix} \frac{2}{3} \\ -\frac{2}{3} \\ \frac{1}{3} \end{bmatrix}.
\end{align*}

The maximum value of $\vv x^T(A^TA)\vv x = \|A\vv x\|^2$ is thus $\lambda_1 = 360$, and it is attained when $\vv x = \vv v_1$. The vector $A\vv v_1$ is a point on the ellipse in [](#sphere_ellipse) farthest from the origin, namely

$$
A\vv v_1 = \begin{bmatrix}
4 & 11 & 14 \\
8 & 7 & -2
\end{bmatrix}
 \begin{bmatrix} \frac{1}{3} \\ \frac{2}{3} \\ \frac{2}{3} \end{bmatrix} = 
\begin{bmatrix}
18 \\
6
\end{bmatrix}.
$$

For $\|\vv x\| = 1$, the maximum value of $\|A\vv x\|$ is $\|A\vv v_1\| = \sqrt{360} = 6\sqrt{10}$.

This example suggests that the effect of a matrix $A$ on the unit sphere in $\mathbb{R}^3$ is related to the quadratic form $\vv x^T A^TA \vv x$. What we'll see next is that _the entire geometric behavior of the map $\vv x \mapsto A \vv x$_ is captured by this quadratic form.

::::
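The computation in this example can be checked numerically. The following is a minimal sketch using NumPy; the random sampling of unit vectors is only an illustrative sanity check (an assumption of this sketch, not part of the argument above), and variable names such as `gram` and `sampled_max` are ours.

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])

# A^T A is symmetric, so eigh applies; it returns eigenvalues in ascending order.
gram = A.T @ A
eigvals, eigvecs = np.linalg.eigh(gram)

lam1 = eigvals[-1]       # largest eigenvalue of A^T A
v1 = eigvecs[:, -1]      # a corresponding unit eigenvector (its sign is arbitrary)
max_len = np.linalg.norm(A @ v1)

# Sanity check: no randomly sampled unit vector is stretched more than v1.
rng = np.random.default_rng(0)
samples = rng.standard_normal((3, 10_000))
samples /= np.linalg.norm(samples, axis=0)
sampled_max = np.linalg.norm(A @ samples, axis=0).max()

print(lam1, max_len, sampled_max)
```

Up to sign, `v1` should agree with the eigenvector $\vv v_1$ found by hand, and `max_len` with $\sqrt{360}$.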

## The Singular Values of an m×n Matrix

Consider an $m \times n$ real matrix $A \in \mathbb{R}^{m \times n}$. Then $A^TA$ is an $n \times n$ symmetric matrix, and can be orthogonally diagonalized. Let $V = \bm \vv v_1 & \cdots & \vv v_n\em$ be an orthogonal matrix composed of orthonormal eigenvectors of $A^TA$, and let $\lambda_1, \ldots, \lambda_n$ be the associated eigenvalues of $A^TA$. Then for $i = 1, \ldots, n$,

\begin{align*}
\|A \vv v_i\|^2 &= (A \vv v_i)^T(A \vv v_i) = \vv v_i^T(A^TA \vv v_i) \\
&= \vv v_i^T(\lambda_i \vv v_i) \\
&= \lambda_i \vv v_i^T \vv v_i = \lambda_i \|\vv v_i\|^2 \\
&= \lambda_i.
\end{align*}

This tells us that all of the eigenvalues satisfy $\lambda_i = \|A \vv v_i\|^2 \geq 0$, since norms can only take on nonnegative values; in other words, $A^TA$ is a positive semidefinite matrix. Let's assume that we've ordered our eigenvalues in decreasing order:

$$
\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0.
$$

The _singular values of $A$_ are the positive square roots of the nonzero eigenvalues $\lambda_i > 0$ of $A^TA$ denoted $\sigma_i$. That is, let $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_r > 0$ and $\lambda_{r+1} = \lambda_{r+2} = \cdots = \lambda_n = 0$ be a partition of the eigenvalues such that $\lambda_i > 0$ for $i = 1, \ldots, r$, and $\lambda_i = 0$ for $i = r+1, \ldots, n$. Then $A$ has $r$ singular values, defined as
$$
\sigma_i = \sqrt{\lambda_i}, \quad i = 1, \ldots, r.
$$

:::{warning}
Some texts include the zero eigenvalues $\lambda_{r+1}, \ldots, \lambda_n$ of $A^TA$ as singular values of $A$. This is simply a different convention and is mathematically equivalent. However, we find our definition to be more natural for our purposes.
:::
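As a quick illustration of the definition, the sketch below computes the singular values of the matrix from the first example in two ways: as square roots of the positive eigenvalues of $A^TA$, and with `np.linalg.svd`. The cutoff `tol` used to decide which eigenvalues count as zero is an assumption made for this sketch.

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])

# Eigenvalues of the symmetric matrix A^T A, sorted in decreasing order.
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]

# Treat eigenvalues below an illustrative relative cutoff as zero.
tol = 1e-9 * eigvals[0]
sigma_from_eig = np.sqrt(eigvals[eigvals > tol])

# NumPy returns min(m, n) singular values, including any that are zero.
sigma_from_svd = np.linalg.svd(A, compute_uv=False)

print(sigma_from_eig)
print(sigma_from_svd)
```

Note that `np.linalg.svd` follows the other convention mentioned in the warning: it always returns $\min\{m, n\}$ values, so zero singular values appear explicitly when the matrix is rank deficient.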

::::{prf:example} 
:label: major_minor_eg
Using the same $A = \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix}$ as the previous example,
we have $\sigma_1 = \sqrt{360} = 6\sqrt{10}$, $\sigma_2 = \sqrt{90} = 3\sqrt{10}$. In this case, $A$ only has two singular values as $\lambda_3 = 0$. For this example, $r = 2$, and $\lambda_1 = 360 > \lambda_2 = 90 > \lambda_3 = 0$.

From the previous example, the first singular value of $A$ is the maximum of $\|A\vv x\|$ over all $\|\vv x\| = 1$, attained at $\vv v_1$. Our optimization based characterization of eigenvalues of symmetric matrices tells us that the second singular value of $A$ is the maximum of $\|A\vv x\|$ over all unit vectors _orthogonal_ to $\vv v_1$: this is attained by $\vv v_2$, the second eigenvector of $A^TA$. For $\vv v_2$ from the previous example:

$$
A \vv v_2 = \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix} \begin{bmatrix} -\frac{2}{3} \\ -\frac{1}{3} \\ \frac{2}{3} \end{bmatrix} = \begin{bmatrix} 3 \\ -9 \end{bmatrix}
$$

This point is on the minor axis of the ellipse in [](#sphere_ellipse), just as $A \vv v_1$ is on the major axis (see [](#major_minor) below). The two singular values of $A$ are the lengths of the major and minor semiaxes of the ellipse.

:::{figure}../figures/10-major_minor.jpg
:label:major_minor
:alt:Major and Minor Axes
:width: 300px
:align: center
:::
That $A\vv v_1$ and $A\vv v_2$ are orthogonal is no accident, as the next theorem shows.
::::

::::{prf:theorem}
:label: major_minor_thm
Suppose that $\vv v_1, \ldots, \vv v_n$ is an orthonormal basis for $\mathbb{R}^n$ composed of the eigenvectors of $A^TA$, ordered so that the corresponding eigenvalues of $A^TA$ satisfy
$$
\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_r > \lambda_{r+1} = \cdots = \lambda_n = 0,
$$
where $r$ denotes the number of nonzero eigenvalues of $A^TA$, i.e., the number of singular values $\sigma_i = \sqrt{\lambda_i} > 0$, $i=1,\ldots,r$, of $A$. Then, $A \vv v_1, \ldots, A\vv v_r$ is an orthogonal basis for Col$(A)$, and rank$(A) = r$.

:::{prf:proof} Proof of [](#major_minor_thm)
:label: proof-major_minor_thm
:class: dropdown
Because $\vv v_i$ and $\lambda _j \vv v_j$ are orthogonal for $i \neq j$,
$$
(A\vv v_i)^T(A \vv v_j) = \vv v_i^TA^TA \vv v_j = \vv v_i^T\lambda_j \vv v_j = 0.
$$

Thus, $A\vv v_1, \ldots, A\vv v_r$ are mutually orthogonal, and hence linearly independent. They are also clearly contained in $\text{Col}(A)$. Now, for any $\vv y \in \text{Col}(A)$, there must be an $\vv x \in \mathbb{R}^n$ such that $\vv y = A\vv x$. Expanding $\vv x$ in the basis $\vv v_1, \ldots, \vv v_n$, as $\vv x = c_1\vv v_1 + \cdots + c_n\vv v_n$ for some $c_1, \ldots, c_n \in \mathbb{R}$, we have:

\begin{align*}
\vv y &= A\vv x = A(c_1\vv v_1 + \cdots + c_n\vv v_n) = c_1A\vv v_1 + \cdots + c_rA\vv v_r + c_{r+1}A\vv v_{r+1} + \cdots + c_nA\vv v_n \\
&= c_1A\vv v_1 + \cdots + c_rA\vv v_r.
\end{align*}

We used that $\|A\vv v_i\|^2 = \lambda_i = 0$ for $i=r+1,\ldots,n \Leftrightarrow A\vv v_i = 0$ for $i=r+1,\ldots,n$ in the last equality.

Therefore, we have that $\vv y \in \text{span}\{A\vv v_1, \ldots, A\vv v_r\}$. Thus $A\vv v_1, \ldots, A\vv v_r$ is both linearly independent and a spanning set for Col$(A)$, meaning it is an orthogonal basis for Col$(A)$. Hence, by the Fundamental Theorem of Linear Algebra,

$$
\text{rank}(A) = \text{dim Col}(A) = r.
$$
:::

::::
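The statement of the theorem can be checked numerically on the running example. In this sketch (variable names and the relative cutoff `1e-9` are our own choices), the columns of `AV` are the vectors $A\vv v_i$, so `AV.T @ AV` should be the diagonal matrix of eigenvalues of $A^TA$ if the $A\vv v_i$ are mutually orthogonal.

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])

# Orthonormal eigenvectors of A^T A, reordered so eigenvalues are decreasing.
eigvals, V = np.linalg.eigh(A.T @ A)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# r = number of eigenvalues treated as nonzero (illustrative relative cutoff).
r = int(np.sum(eigvals > 1e-9 * eigvals[0]))

AV = A @ V             # columns are A v_1, ..., A v_n
gram_AV = AV.T @ AV    # entry (i, j) is (A v_i)^T (A v_j)

print(r)
print(np.round(gram_AV, 6))
```

The off-diagonal entries of `gram_AV` vanish up to roundoff, and `r` matches the rank reported by `np.linalg.matrix_rank(A)`.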



:::{note} Numerical Note
In certain cases, the rank of $A$ may be very sensitive to small changes in the entries of $A$. The classic approach of counting the number of pivot columns in $A$ does not work well if $A$ is row reduced by a computer, as roundoff errors often create a row echelon form with full rank. In practice, the most reliable way of computing the rank of a large matrix $A$ is to count the number of singular values larger than a small threshold $\epsilon$ (typically on the order of $10^{-12}$ for very accurate computations). In this case, singular values smaller than $\epsilon$ are treated as zeros for all practical purposes, and the effective rank of $A$ is computed by counting the remaining nonzero singular values.
:::
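A minimal sketch of this thresholding idea is given below. The helper `numerical_rank`, the test matrix `B`, and the perturbation `noise` are all hypothetical choices made for illustration; the absolute threshold `eps=1e-12` mirrors the value quoted above.

```python
import numpy as np

def numerical_rank(A, eps=1e-12):
    """Count the singular values of A that exceed the absolute threshold eps."""
    sigma = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(sigma > eps))

# A rank-1 matrix: every row is a multiple of the first.
B = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [3.0, 6.0, 9.0]])

# A perturbation at roundoff level, standing in for accumulated rounding errors.
noise = 1e-15 * np.array([[1.0, -2.0, 0.5],
                          [0.3, 1.0, -1.0],
                          [-0.7, 0.2, 1.0]])
B_noisy = B + noise

print(np.linalg.svd(B_noisy, compute_uv=False))
print(numerical_rank(B_noisy))
```

The two trailing singular values of `B_noisy` are tiny but not exactly zero, and the threshold discards them. NumPy's own `np.linalg.matrix_rank` applies the same idea, by default with a tolerance that scales with the largest singular value rather than a fixed absolute threshold.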

## The Singular Value Decomposition

The decomposition of $A$ involves an $r \times r$ diagonal matrix $\Sigma$ of the form

$$
\Sigma = \text{diag}(\sigma_1, \ldots, \sigma_r).
$$

We note that because $r = \text{dim Col}(A) = \text{dim Range}(A)$ by the FTLA, we must have that $r \leq \min\{m,n\}$ if $A \in \mathbb{R}^{m \times n}$.

:::{prf:theorem}
Let $A \in \mathbb{R}^{m \times n}$ be an $m \times n$ matrix of rank $r > 0$. Then $A$ can be factored as

$$
A = U \Sigma V^T,
$$

where $U \in \mathbb{R}^{m \times r}$ has orthonormal columns, so $U^TU = I_r$, $\Sigma = \text{diag}(\sigma_1, \ldots, \sigma_r)$ is a diagonal matrix with the singular values $\sigma_i$ of $A$ along the diagonal, and $V \in \mathbb{R}^{n \times r}$ has orthonormal columns, so $V^TV = I_r$.
:::

Such a factorization of $A$ is called its singular value decomposition, and the columns of $U$ are called the left singular vectors of $A$, while the columns of $V$ are called the right singular vectors of $A$.

:::{prf:proof}
Let $\lambda_i$ and $\vv v_i$ be the eigenvalues and eigenvectors of $A^TA$ as described previously, so that $A\vv v_1, \ldots, A\vv v_r$ is an orthogonal basis for $\text{Col}(A)$. Normalize each $A\vv v_i$ to form an orthonormal basis for $\text{Col}(A)$:

$$
\vv u_i := \frac{1}{\|A\vv v_i\|} A\vv v_i = \frac{1}{\sigma_i} A\vv v_i,
$$

and hence $A\vv v_i = \sigma_i \vv u_i$ for $i=1,\ldots,r$. Define the matrices

$$
U = \bm \vv u_1 & \cdots & \vv u_r \em \in \mathbb{R}^{m \times r} \quad \text{and} \quad V = \bm \vv v_1 & \cdots & \vv v_r \em \in \mathbb{R}^{n \times r}.
$$

By construction, the columns of $U$ are orthonormal, so $U^TU = I_r$, and similarly for the columns of $V$, so $V^TV = I_r$.

Let's define the following "full" matrices:

$$
\hat{U} = \bm U & U^\perp \em \in \mathbb{R}^{m \times m} \quad \text{and} \quad \hat{V} = \bm V & V^\perp \em \in \mathbb{R}^{n \times n}.
$$

Here, $V^\perp = \bm \vv v_{r+1} & \cdots & \vv v_n \em$ has orthonormal columns spanning the orthogonal complement of $\text{span}\{\vv v_1, \ldots, \vv v_r\}$, so that the columns of $\hat{V}$ form an orthonormal basis of $\mathbb{R}^n$.

Similarly, let $U^\perp = \bm \vv u_{r+1} & \cdots & \vv u_m \em$ have orthonormal columns spanning the orthogonal complement of $\text{span}\{\vv u_1, \ldots, \vv u_r\}$, so the columns of $\hat{U}$ form an orthonormal basis for $\mathbb{R}^m$.

Finally, define $\hat{\Sigma} = \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{m \times n}$. We first show that

$$
A = \hat{U} \hat{\Sigma} \hat{V}^T, \quad \text{or equivalently (since $\hat{V}$ is orthogonal),} \quad A\hat{V} = \hat{U}\hat{\Sigma}.
$$

On the one hand,

$$
A\hat{V} = \bm A\vv v_1 & \cdots & A\vv v_r & A\vv v_{r+1} & \cdots & A\vv v_n \em = \bm \sigma_1\vv u_1 & \cdots & \sigma_r\vv u_r & \vv 0 & \cdots & \vv 0 \em.
$$

On the other hand,

$$
\hat{U} \hat{\Sigma} = \bm \vv u_1 & \cdots & \vv u_r & \vv u_{r+1} & \cdots & \vv u_m \em
\begin{bmatrix}
\sigma_1 & & & 0 & \cdots & 0 \\
& \ddots & & \vdots & & \vdots \\
& & \sigma_r & 0 & \cdots & 0 \\
0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 0
\end{bmatrix}
= \bm \sigma_1\vv u_1 & \cdots & \sigma_r\vv u_r & \vv 0 & \cdots & \vv 0 \em,
$$

so that $A\hat{V} = \hat{U}\hat{\Sigma}$, or equivalently, $A = \hat{U}\hat{\Sigma}\hat{V}^T$. But now notice that

$$
A = \hat{U} \hat{\Sigma} \hat{V}^T = \bm U & U^\perp \em \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V^T \\ (V^\perp)^T \end{bmatrix} = U \Sigma V^T,
$$

proving our result.
:::

:::{note}
Some textbooks define the singular value decomposition of $A$ as $A = \hat{U} \hat{\Sigma} \hat{V}^T$. This is necessary when allowing for singular values equal to zero. When only considering nonzero singular values, as we do, $A = U\Sigma V^T$ is the appropriate definition. This is sometimes called the _compact SVD_ of $A$, but we will just call it the SVD.
:::
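To make the distinction concrete, the following sketch builds both forms with NumPy for the matrix from our first example. It assumes we obtain the compact factors by truncating the output of `np.linalg.svd` at an illustrative relative cutoff; names such as `Sigma_hat` are ours and simply mirror the notation above.

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])
m, n = A.shape

# Full SVD: U_hat is m x m, Vt_hat is n x n, and s holds min(m, n) singular values.
U_hat, s, Vt_hat = np.linalg.svd(A, full_matrices=True)
Sigma_hat = np.zeros((m, n))
Sigma_hat[:len(s), :len(s)] = np.diag(s)

# Compact SVD: keep only the r singular values above an illustrative cutoff.
r = int(np.sum(s > 1e-10 * s[0]))
U, Sigma, Vt = U_hat[:, :r], np.diag(s[:r]), Vt_hat[:r, :]

print(U_hat.shape, Sigma_hat.shape, Vt_hat.shape)  # shapes of the full factors
print(U.shape, Sigma.shape, Vt.shape)              # shapes of the compact factors
```

Both products `U_hat @ Sigma_hat @ Vt_hat` and `U @ Sigma @ Vt` reproduce `A`; the compact form simply drops the columns that are multiplied by zero.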

:::{prf:example}
Let's use the results of the previous examples to construct the SVD of

$$
A = \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix}.
$$

**Step 1:** Find an orthogonal diagonalization of $A^TA$. In general, for $A$ with many columns, this is done numerically, but here we reuse the data from before:

$$
A^TA = \hat{V} \Lambda \hat{V}^T = \bm \vv v_1 & \vv v_2 & \vv v_3 \em \begin{bmatrix}
\lambda_1 & & \\
& \lambda_2 & \\
& & \lambda_3
\end{bmatrix} \begin{bmatrix}
\vv v_1^T \\
\vv v_2^T \\
\vv v_3^T
\end{bmatrix},
$$

with eigenpairs $(\lambda_i, \vv v_i)$, where $\lambda_1 = 360$, $\lambda_2 = 90$, $\lambda_3 = 0$.

**Step 2:** Set up $V$ and $\Sigma$. Arrange the nonzero eigenvalues of $A^TA$ in decreasing order and compute the singular values. For this example, $\sigma_1 = \sqrt{360} = 6\sqrt{10}$ and $\sigma_2 = \sqrt{90} = 3\sqrt{10}$, and

$$
\Sigma = \text{diag}(\sigma_1, \sigma_2) = \begin{bmatrix}
6\sqrt{10} & 0 \\
0 & 3\sqrt{10}
\end{bmatrix}.
$$

Hence $\text{rank}(A) = 2$, and $V \in \mathbb{R}^{3\times2}$. The corresponding eigenvectors define the columns of $V$:

$$
V = \bm \vv v_1 & \vv v_2 \em = \begin{bmatrix}
\frac{1}{3} & -\frac{2}{3} \\
\frac{2}{3} & -\frac{1}{3} \\
\frac{2}{3} & \frac{2}{3}
\end{bmatrix}.
$$

**Step 3:** Construct $U$. Since $\text{rank}(A) = 2$, $U \in \mathbb{R}^{2\times2}$. The columns of $U$ are obtained by normalizing $A\vv v_1$ and $A\vv v_2$; recall that $\|A\vv v_1\| = \sigma_1$ and $\|A\vv v_2\| = \sigma_2$. So $U = \bm \vv u_1 & \vv u_2 \em$ with

$$
\vv u_1 = \frac{A\vv v_1}{\sigma_1} = \frac{1}{6\sqrt{10}} \begin{bmatrix}
18 \\
6
\end{bmatrix} = \begin{bmatrix}
3/\sqrt{10} \\
1/\sqrt{10}
\end{bmatrix}
\quad \text{and} \quad
\vv u_2 = \frac{A\vv v_2}{\sigma_2} = \frac{1}{3\sqrt{10}} \begin{bmatrix}
3 \\
-9
\end{bmatrix} = \begin{bmatrix}
1/\sqrt{10} \\
-3/\sqrt{10}
\end{bmatrix}.
$$

Finally, the SVD of $A$ is:

$$
A = \underbrace{\begin{bmatrix}
3/\sqrt{10} & 1/\sqrt{10} \\
1/\sqrt{10} & -3/\sqrt{10}
\end{bmatrix}}_{U} \underbrace{\begin{bmatrix}
6\sqrt{10} & 0 \\
0 & 3\sqrt{10}
\end{bmatrix}}_{\Sigma} \underbrace{\begin{bmatrix}
\frac{1}{3} & \frac{2}{3} & \frac{2}{3} \\
-\frac{2}{3} & -\frac{1}{3} & \frac{2}{3}
\end{bmatrix}}_{V^T}.
$$

You can check that indeed $A = U\Sigma V^T$ here, and that $U^TU = V^TV = I_2$.
:::
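The check suggested above is easy to carry out numerically. The sketch below enters the factors built from the eigenpairs $(\lambda_1, \vv v_1) = (360, (\tfrac13, \tfrac23, \tfrac23))$ and $(\lambda_2, \vv v_2) = (90, (-\tfrac23, -\tfrac13, \tfrac23))$ of the first example and multiplies them out; the variable names are ours.

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])

s10 = np.sqrt(10.0)
U = np.array([[3 / s10, 1 / s10],
              [1 / s10, -3 / s10]])
Sigma = np.diag([6 * s10, 3 * s10])
V = np.array([[1 / 3, -2 / 3],
              [2 / 3, -1 / 3],
              [2 / 3, 2 / 3]])

reconstruction = U @ Sigma @ V.T

print(np.round(reconstruction, 10))
print(np.round(U.T @ U, 10))
print(np.round(V.T @ V, 10))
```

The first printed matrix should equal $A$, and the other two should be the $2 \times 2$ identity.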


## Linear Algebra Applications of the SVD

The next few classes will focus on engineering and AI applications of the SVD. For now, we highlight some more technical linear algebra applications: these are all immensely important from a practical perspective and form subroutines for most real-world applications of linear algebra.

### The Condition Number

Most numerical calculations that require solving a linear equation $A\vv x=\vv b$ are as reliable as possible when the SVD of $A$ is used. Since the matrices $U$ and $V$ have orthonormal columns, they do not affect lengths or angles between vectors. For example, for $U\in\mathbb{R}^{m\times r}$ with $U^TU = I_r$, we have:

$$
\langle U\vv x, U\vv y \rangle = \vv x^T U^T U \vv y = \vv x^T \vv y = \langle \vv x, \vv y \rangle
$$

for any $\vv x,\vv y\in\mathbb{R}^r$. Therefore, any numerical issues that arise will be due to the diagonal entries of $\Sigma$, i.e., due to the singular values of $A$.

In particular, if some singular values are much larger than others, this means certain directions are stretched out much more than others, which can lead to rounding errors. A common measure of such uneven stretching is the condition number. If $A\in\mathbb{R}^{m\times n}$ is an $m\times n$ matrix with $r=\text{rank}(A)\leq \min\{m,n\}$, we define the _condition number of $A$_ to be the ratio $\kappa(A) = \frac{\sigma_1}{\sigma_r}$ of the largest to smallest singular values of $A$. If $A$ is square but not invertible, it is conventional to set $\kappa(A)=\infty$, although the ratio $\frac{\sigma_1}{\sigma_r}$ is still a useful measure of the numerical stability of computing with a rectangular matrix $A\in\mathbb{R}^{m\times n}$.

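To see why the condition number matters, here is a small NumPy sketch comparing a well-conditioned and an ill-conditioned invertible matrix when the right-hand side is corrupted by a little measurement noise. The matrices `A_good` and `A_bad`, the noise vector, and the helper names `cond_from_svd` and `relative_error` are all hypothetical choices made for this illustration.

```python
import numpy as np

def cond_from_svd(A):
    """Ratio of largest to smallest singular value (A assumed to have full rank)."""
    sigma = np.linalg.svd(A, compute_uv=False)
    return sigma[0] / sigma[-1]

A_good = np.array([[2.0, 0.0],
                   [0.0, 1.0]])
A_bad = np.array([[1.0, 1.0],
                  [1.0, 1.0 + 1e-8]])   # invertible, but nearly singular

x_true = np.array([1.0, 1.0])
noise = 1e-6 * np.array([1.0, -1.0])    # small, hypothetical measurement noise

def relative_error(A):
    """Solve A x = b + noise and compare against the noise-free solution x_true."""
    b = A @ x_true
    x_noisy = np.linalg.solve(A, b + noise)
    return np.linalg.norm(x_noisy - x_true) / np.linalg.norm(x_true)

err_good = relative_error(A_good)
err_bad = relative_error(A_bad)

print(cond_from_svd(A_good), err_good)
print(cond_from_svd(A_bad), err_bad)
```

With the same tiny perturbation of $\vv b$, the solution for `A_good` barely moves, while the solution for `A_bad` changes by far more than the size of `x_true` itself. For full-rank matrices, `np.linalg.cond(A)` computes the same ratio of singular values by default.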

### Computing Bases of Fundamental Subspaces

Given an SVD for an $m\times n$ matrix $A\in\mathbb{R}^{m\times n}$ with $\text{rank}(A)=r$, let $\vv u_1,\ldots,\vv u_r$ be the left singular vectors, $\vv v_1,\ldots,\vv v_r$ the right singular vectors, and $\sigma_1,\ldots,\sigma_r$ the singular values.

Recall that we showed that $\vv u_1,\ldots,\vv u_r$ form an orthonormal basis for $\text{Col}(A)$. Let $\vv u_{r+1},\ldots,\vv u_m$ be an orthonormal basis for $\text{Col}(A)^\perp$, computed for example using the Gram-Schmidt Process, so that $\vv u_1,\ldots,\vv u_m$ form an orthonormal basis for $\mathbb{R}^m$. Then, by the FTLA, we have that $\text{Col}(A)^\perp = \text{Null}(A^T) = \text{span}\{\vv u_{r+1},\ldots,\vv u_m\}$, i.e., these vectors form an orthonormal basis for $\text{Null}(A^T)=\text{Col}(A)^\perp$.

Next, recall that $\vv v_1,\ldots,\vv v_r,\vv v_{r+1},\ldots,\vv v_n$, the eigenvectors of $A^TA$, form an orthonormal basis of $\mathbb{R}^n$. Since $A\vv v_i = \vv 0$ for $i > r$, we have that $\vv v_{r+1},\ldots,\vv v_n$ span a subspace of $\text{Null}(A)$ of dimension $n-r$. But, by the FTLA, $\dim\text{Null}(A) = n - \text{rank}(A) = n-r$.

Therefore, $\vv v_{r+1},\ldots,\vv v_n$ are an orthonormal basis for $\text{Null}(A)$.

Finally, $\text{Null}(A)^\perp = \text{Col}(A^T) = \text{Row}(A)$. But $\text{Null}(A)^\perp = \text{span}\{\vv v_1,\ldots,\vv v_r\}$ since the $\vv v_i$ are an orthonormal basis for $\mathbb{R}^n$, and thus $\vv v_1,\ldots,\vv v_r$ are an orthonormal basis for $\text{Row}(A)$.

Summarizing, we have:

- $\text{Col}(A) = \text{span}\{\vv u_1,\ldots,\vv u_r\}$
- $\text{Col}(A)^\perp = \text{Null}(A^T) = \text{span}\{\vv u_{r+1},\ldots,\vv u_m\}$
- $\text{Col}(A^T) = \text{Row}(A) = \text{span}\{\vv v_1,\ldots,\vv v_r\}$
- $\text{Col}(A^T)^\perp = \text{Null}(A) = \text{span}\{\vv v_{r+1},\ldots,\vv v_n\}$
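In practice, all four bases can be read off from a full SVD. The sketch below does this with `np.linalg.svd` for a small rank-2 matrix chosen for illustration (the matrix, the rank cutoff, and the variable names are assumptions of this sketch); NumPy's full factors play the role of the extended sets $\vv u_1, \ldots, \vv u_m$ and $\vv v_1, \ldots, \vv v_n$.

```python
import numpy as np

# Illustrative 3 x 3 matrix of rank 2 (the second row is twice the first).
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])

U_full, s, Vt_full = np.linalg.svd(A)        # full SVD: U is m x m, V^T is n x n
r = int(np.sum(s > 1e-10 * s[0]))            # numerical rank (illustrative cutoff)

col_basis = U_full[:, :r]          # orthonormal basis for Col(A)
left_null_basis = U_full[:, r:]    # orthonormal basis for Null(A^T) = Col(A)^perp
row_basis = Vt_full[:r, :].T       # orthonormal basis for Row(A) = Col(A^T)
null_basis = Vt_full[r:, :].T      # orthonormal basis for Null(A)

print(r)
print(np.round(A @ null_basis, 10))
print(np.round(A.T @ left_null_basis, 10))
```

The last two printed arrays are zero up to roundoff, confirming that the trailing singular vectors lie in $\text{Null}(A)$ and $\text{Null}(A^T)$, respectively.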

[Image of four fundamental subspaces and the action of A]

Specializing these observations to square matrices, we have the following theorem summarizing properties of invertible matrices:

:::{prf:theorem}
The following statements are equivalent for a square $n\times n$ matrix $A\in\mathbb{R}^{n\times n}$:

1. $\text{Col}(A)^\perp = \text{Null}(A^T) = \{\vv 0\}$
2. $\text{Null}(A)^\perp = \text{Col}(A^T) = \text{Row}(A) = \mathbb{R}^n$
3. $\text{Col}(A^T)^\perp = \text{Null}(A) = \{\vv 0\}$
4. $\text{Null}(A^T)^\perp = \text{Col}(A) = \mathbb{R}^n$
5. $A$ has rank $n$
6. $A$ has $n$ (nonzero) singular values
:::

### The Pseudoinverse of A

Recall the least squares problem of finding a vector $\hat{\vv x}$ that minimizes the objective $\|A\vv x-\vv b\|^2$. We saw that the least squares solution is given by the solution to the normal equations:

$$
A^TA\hat{\vv x} = A^T\vv b. \qquad \text{(NE)}
$$

Let's rewrite (NE) using the SVD $A = U\Sigma V^T$, for which $A^T = V\Sigma^T U^T = V\Sigma U^T$ (since $\Sigma = \Sigma^T$):

$$
A^TA\hat{\vv x} = V\Sigma \underbrace{U^T U}_{I}\Sigma V^T\hat{\vv x} = \underbrace{V\Sigma^2 V^T\hat{\vv x}}_{(a)} = \underbrace{V\Sigma U^T\vv b}_{(b)}.
$$

Let's start by left multiplying (a) and (b) by $V^T$ to take advantage of $V^TV = I$:

$$
V^T(V\Sigma^2V^T\hat{\vv x}) = V^T(V\Sigma U^T\vv b) \ \Rightarrow \ \Sigma^2V^T\hat{\vv x} = \Sigma U^T\vv b.
$$

Now, let's isolate $V^T\hat{\vv x}$ by multiplying both sides by $\Sigma^{-2}$:

$$
V^T\hat{\vv x} = \Sigma^{-1}U^T\vv b. \qquad (*)
$$

Finally, note that $\hat{\vv x}$ satisfies $(*)$ if and only if

$$
\hat{\vv x} = V\Sigma^{-1}U^T\vv b + \vv n
$$

for some $\vv n \in \text{Null}(V^T) = \text{Col}(V)^\perp$ (again since $V^TV=I$). The special solution $\hat{\vv x}^* = V\Sigma^{-1}U^T\vv b$ can be shown to be the minimum norm least squares solution when several $\hat{\vv x}$ exist that satisfy the normal equations. The matrix

$$
A^+ = V\Sigma^{-1}U^T
$$

is called the _pseudoinverse of $A$_, and is also known as the _Moore-Penrose inverse of $A$_.

If we look at $A\hat{\vv x}^* = AA^+\vv b$, we observe that:

$$
A\hat{\vv x}^* = U\Sigma \underbrace{V^TV}_{I}\Sigma^{-1}U^T\vv b = UU^T\vv b,
$$

i.e., $A\hat{\vv x}^*$ is the orthogonal projection $\hat{\vv b}$ of $\vv b$ onto $\text{Col}(A)$.
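The formula $A^+ = V\Sigma^{-1}U^T$ translates directly into code. The sketch below is one possible implementation under stated assumptions: the helper name `pinv_compact_svd`, the relative cutoff `rel_tol`, and the right-hand side `b` are ours, and the compact factors are obtained by truncating the output of `np.linalg.svd`. We reuse the $2 \times 3$ matrix from the earlier examples, for which $A\vv x = \vv b$ has infinitely many solutions.

```python
import numpy as np

def pinv_compact_svd(A, rel_tol=1e-10):
    """Pseudoinverse V Sigma^{-1} U^T built from the compact SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > rel_tol * s[0]))     # keep only the nonzero singular values
    U, s, Vt = U[:, :r], s[:r], Vt[:r, :]
    return Vt.T @ np.diag(1.0 / s) @ U.T

A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])
b = np.array([1.0, 2.0])

A_plus = pinv_compact_svd(A)
x_star = A_plus @ b

# Adding a null-space vector gives another solution, but with a larger norm.
v3 = np.array([2 / 3, -2 / 3, 1 / 3])       # spans Null(A) for this matrix
x_other = x_star + 0.5 * v3

print(np.round(A @ x_star, 10), np.round(A @ x_other, 10))
print(np.linalg.norm(x_star), np.linalg.norm(x_other))
```

Here $\text{Col}(A) = \mathbb{R}^2$, so the projection $UU^T\vv b$ is $\vv b$ itself and both vectors solve $A\vv x = \vv b$ exactly; `x_star` is the shorter of the two. The result should agree with NumPy's built-in `np.linalg.pinv(A)`.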


## Practice Problem

Let $\mathbf{p}_1 = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}$, 
$\mathbf{p}_2 = \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}$, 
$\mathbf{n}_1 = \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix}$, and 
$\mathbf{n}_2 = \begin{bmatrix} -2 \\ 1 \\ 3 \end{bmatrix}$; 
let $H_1$ be the hyperplane (plane) in $\mathbb{R}^3$ passing through the point $\mathbf{p}_1$ and having normal vector $\mathbf{n}_1$; and let $H_2$ be the hyperplane passing through the point $\mathbf{p}_2$ and having normal vector $\mathbf{n}_2$. Give an explicit description of $H_1 \cap H_2$ by a formula that shows how to generate all points.

## Solution to Practice Problem

First, compute $\mathbf{n}_1 \cdot \mathbf{p}_1 = -3$ and $\mathbf{n}_2 \cdot \mathbf{p}_2 = 7$. The hyperplane $H_1$ is the solution set of the equation $x_1 + x_2 - 2x_3 = -3$, and $H_2$ is the solution set of the equation $-2x_1 + x_2 + 3x_3 = 7$. Then

$$
H_1 \cap H_2 = \{\mathbf{x} : x_1 + x_2 - 2x_3 = -3 \text{ and } -2x_1 + x_2 + 3x_3 = 7\}.
$$

This is an implicit description of $H_1 \cap H_2$. To find an explicit description, solve the system of equations by row reduction:

$$
\begin{bmatrix}
1 & 1 & -2 & -3 \\
-2 & 1 & 3 & 7
\end{bmatrix} \sim
\begin{bmatrix}
1 & 0 & -\frac{5}{3} & -\frac{10}{3} \\
0 & 1 & -\frac{1}{3} & \frac{1}{3}
\end{bmatrix}
$$

Thus $x_1 = -\frac{10}{3} + \frac{5}{3}x_3$, $x_2 = \frac{1}{3} + \frac{1}{3}x_3$, $x_3 = x_3$. Let 
$\mathbf{p} = \begin{bmatrix} -\frac{10}{3} \\ \frac{1}{3} \\ 0 \end{bmatrix}$ and 
$\mathbf{v} = \begin{bmatrix} \frac{5}{3} \\ \frac{1}{3} \\ 1 \end{bmatrix}$. The general solution can be written as $\mathbf{x} = \mathbf{p} + x_3\mathbf{v}$. Thus $H_1 \cap H_2$ is the line through $\mathbf{p}$ in the direction of $\mathbf{v}$. Note that $\mathbf{v}$ is orthogonal to both $\mathbf{n}_1$ and $\mathbf{n}_2$.
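The explicit description can be sanity-checked numerically. This short sketch samples a few points on the line $\mathbf{x} = \mathbf{p} + t\mathbf{v}$ (the sample values of $t$ are arbitrary choices of ours) and confirms that each one satisfies both plane equations.

```python
import numpy as np

n1, n2 = np.array([1.0, 1.0, -2.0]), np.array([-2.0, 1.0, 3.0])
p1, p2 = np.array([1.0, 0.0, 2.0]), np.array([-1.0, 2.0, 1.0])
p = np.array([-10 / 3, 1 / 3, 0.0])
v = np.array([5 / 3, 1 / 3, 1.0])

# Points on the claimed line x = p + t v, for a few illustrative values of t.
ts = np.linspace(-2.0, 2.0, 5)
points = p + ts[:, None] * v

on_H1 = np.allclose(points @ n1, n1 @ p1)   # checks n1 . x = n1 . p1
on_H2 = np.allclose(points @ n2, n2 @ p2)   # checks n2 . x = n2 . p2

print(on_H1, on_H2)
print(v @ n1, v @ n2)
```

Both checks should report `True`, and the two dot products should vanish up to roundoff, matching the observation that $\mathbf{v}$ is orthogonal to both normal vectors.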
