---
title: 10.2 SVD Applications
subject:  Singular Value Decomposition (SVD)
subtitle: 
short_title: 10.2 SVD Applications
authors:
  - name: Nikolai Matni
    affiliations:
      - Dept. of Electrical and Systems Engineering
      - University of Pennsylvania
    email: nmatni@seas.upenn.edu
license: CC-BY-4.0
keywords: 
math:
  '\vv': '\mathbf{#1}'
  '\bm': '\begin{bmatrix}'
  '\em': '\end{bmatrix}'
  '\R': '\mathbb{R}'
---

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/nikolaimatni/ese-2030/HEAD?labpath=/09_Ch_10_SVD_Apps/112-Lin_Alg_Apps.ipynb)

{doc}`Lecture notes <../lecture_notes/Lecture 18 - Singular Values and the Singular Value Decomposition.pdf>`

## Reading

Material related to this page, as well as additional exercises, can be found in ALA 8.7 and LAA 7.4.

## Learning Objectives

By the end of this page, you should know:
- what the condition number of a matrix is
- how to compute bases of the fundamental subspaces
- what the pseudoinverse of a matrix is and how to compute it

## Introduction

The next few lectures will focus on engineering and AI applications of the SVD. For now, we highlight some more technical linear algebra applications: these are all immensely important from a practical perspective and form subroutines for most real-world applications of linear algebra.

### The Condition Number 

Most numerical calculations that require solving a linear equation $A\vv x=\vv b$ are as reliable as possible when the SVD of $A$ is used. Since the matrices $U$ and $V$ have orthonormal columns, they do not affect lengths or angles between vectors. For example, for $U\in\mathbb{R}^{m\times r}$, we have:

$$
\langle U \vv x, U \vv y \rangle = \vv x^T U^T U \vv y = \vv x^T \vv y = \langle \vv x, \vv y \rangle
$$

for any $\vv x,\vv y\in\mathbb{R}^r$. Therefore, any numerical issues that arise will be due to the diagonal entries of $\Sigma$, i.e., due to the singular values of $A$.
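
As a quick numerical sanity check of the identity above, here is a minimal sketch. The matrix `U` is not taken from any particular SVD; it is an illustrative matrix with orthonormal columns obtained from a QR factorization of random data, and the sizes `m` and `r` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
m, r = 5, 3

# Reduced QR of a random m x r matrix gives U with orthonormal columns.
U, _ = np.linalg.qr(rng.standard_normal((m, r)))

x = rng.standard_normal(r)
y = rng.standard_normal(r)

inner_before = x @ y              # <x, y>
inner_after = (U @ x) @ (U @ y)   # <Ux, Uy>

print("U^T U = I:", np.allclose(U.T @ U, np.eye(r)))
print("inner products:", inner_before, inner_after)
print("norms:", np.linalg.norm(x), np.linalg.norm(U @ x))
```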

In particular, if some of the singular values are much larger than others, certain directions are stretched much more than others, which can amplify rounding errors. A natural way to quantify this is to compare the largest and smallest singular values of $A$.

:::{prf:definition} Condition Number
:label:cond_defn
If $A\in\mathbb{R}^{n\times n}$ is an $n\times n$ matrix so that $r=\text{rank}(A) = n$, we define the _condition number of A_ to be the ratio $\kappa(A) = \frac{\sigma_1}{\sigma_n}$ of the largest to smallest singular values of $A$. If $A$ is not invertible, it is conventional to set $\kappa(A)=\infty$, although the ratio $\frac{\sigma_1}{\sigma_r}$ is a useful measure of the numerical stability of computing with a rectangular matrix $A\in\mathbb{R}^{m\times n}$.
:::

**TO DO**: Using numpy, show that ill-conditioned $A$ can lead to bad solutions to $Ax=b$ even if $A$ is invertible. Case 1: assume we use $\sigma_1 = b + n$ for $n$ some small measurement noise. Case 2: just make $A$ super ill-conditioned and show that $x=A^{-1}b$ computed using numpy doesn't actually satisfy $Ax=b$.
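
One possible sketch of the noise-sensitivity experiment is given below. It is only an illustration under stated assumptions: the singular values in `sigma`, the noise level `1e-6`, and the choice to place the noise along the last left singular vector are all made up for the demonstration, and `A` is assembled from random orthogonal factors rather than taken from a real application.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Build A = U diag(sigma) V^T with prescribed singular values (illustrative choice).
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = np.array([1.0, 1e-2, 1e-4, 1e-8])
A = U @ np.diag(sigma) @ V.T

# 2-norm condition number, i.e. sigma_1 / sigma_n.
kappa = np.linalg.cond(A)

# Exact data: x_true = v_1, so b = A v_1 = sigma_1 u_1.
x_true = V[:, 0]
b = A @ x_true

# Small measurement noise along u_n, the direction that A^{-1} amplifies most.
noise = 1e-6 * U[:, -1]
x_noisy = np.linalg.solve(A, b + noise)

rel_err_b = np.linalg.norm(noise) / np.linalg.norm(b)
rel_err_x = np.linalg.norm(x_noisy - x_true) / np.linalg.norm(x_true)
amplification = rel_err_x / rel_err_b

print(f"condition number      : {kappa:.3e}")
print(f"relative error in b   : {rel_err_b:.3e}")
print(f"relative error in x   : {rel_err_x:.3e}")
print(f"error amplification   : {amplification:.3e}")
```

Even though $A$ is invertible, a relative perturbation of about $10^{-6}$ in $\vv b$ produces a relative error in $\vv x$ that is larger by a factor comparable to $\kappa(A)$.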

## Computing Bases of Fundamental Subspaces

Given an SVD for an $m\times n$ matrix $A\in\mathbb{R}^{m\times n}$ with $\text{rank}(A)=r$, let $\vv u_1,\ldots,\vv u_r$ be the left singular vectors, $\vv v_1,\ldots,\vv v_r$ the right singular vectors, and $\sigma_1,\ldots,\sigma_r$ the singular values.

Recall that we showed that $\vv u_1,\ldots,\vv u_r$ form an orthonormal basis for $\text{Col}(A)$. Let $\vv u_{r+1},\ldots,\vv u_m$ be an orthonormal basis for $\text{Col}(A)^\perp$, computed for example using the Gram-Schmidt Process, so that $\vv u_1,\ldots,\vv u_m$ form an orthonormal basis for $\mathbb{R}^m$. Then, by the Fundamental Theorem of Linear Algebra (FTLA), we have that $\text{Col}(A)^\perp = \text{Null}(A^T) = \text{span}\{\vv u_{r+1},\ldots,\vv u_m\}$, i.e., these vectors form an orthonormal basis for $\text{Null}(A^T)=\text{LNull}(A)$.

Next, recall that $\vv v_1,\ldots,\vv v_r,\vv v_{r+1},\ldots,\vv v_n$, the eigenvectors of $A^TA$, form an orthonormal
basis of $\mathbb{R}^n$. Since $A\vv v_i = 0$ for $i = r+1, \ldots, n$, we have that $\vv v_{r+1},\ldots,\vv v_n$ span a subspace of $\text{Null}(A)$ of dimension $n-r$. But, by the FTLA, $\text{dim Null}(A) = n - \text{rank}(A) = n-r$.

Therefore, $\vv v_{r+1},\ldots,\vv v_n$ are an orthonormal basis for $\text{Null}(A)$.

Finally, $\text{Null}(A)^\perp = \text{Col}(A^T) = \text{Row}(A)$. But $\text{Null}(A)^\perp = \text{span}\{\vv v_1,\ldots,\vv v_r\}$ since the $\vv v_i$ are an orthonormal basis for $\mathbb{R}^n$, and thus $\vv v_1,\ldots,\vv v_r$ are an orthonormal basis for $\text{Row}(A)$.

::::{note} Summary
Summarizing the above discussion, we have
- $\text{Col}(A) = \text{span}\{\vv u_1,\ldots,\vv u_r\}$
- $\text{Col}(A)^\perp = \text{LNull}(A) = \text{span}\{\vv u_{r+1},\ldots,\vv u_m\}$
- $\text{Col}(A^T) = \text{Row}(A) = \text{span}\{\vv v_1,\ldots,\vv v_r\}$
- $\text{Col}(A^T)^\perp = \text{Null}(A) = \text{span}\{\vv v_{r+1},\ldots,\vv v_n\}$
:::{figure}../figures/11-fund_sub.jpg
:label:fund_sub
:alt:Fundamental Subspaces
:width: 400px
:align: center
:::
::::
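
The summary above can be sketched in NumPy as follows. This is a minimal illustration, not a prescribed method: the matrix `A` is a made-up $4\times 3$ example of rank 2, and the rank tolerance `tol` is an assumed heuristic. With its default `full_matrices=True`, `np.linalg.svd` returns square `U` and `Vt`, so the extra vectors $\vv u_{r+1},\ldots,\vv u_m$ and $\vv v_{r+1},\ldots,\vv v_n$ are supplied directly.

```python
import numpy as np

# Illustrative 4 x 3 matrix of rank 2 (row 2 = 2 * row 1, column 3 = column 1 + column 2).
A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.],
              [0., 1., 1.]])
m, n = A.shape

# Full SVD: U is m x m, Vt is n x n, s holds min(m, n) singular values.
U, s, Vt = np.linalg.svd(A)

# Numerical rank: count singular values above a tolerance (assumed heuristic).
tol = max(m, n) * np.finfo(float).eps * s[0]
r = int(np.sum(s > tol))

col_basis = U[:, :r]        # orthonormal basis of Col(A)
left_null_basis = U[:, r:]  # orthonormal basis of Null(A^T) = Col(A)^perp
row_basis = Vt[:r].T        # orthonormal basis of Row(A) = Col(A^T)
null_basis = Vt[r:].T       # orthonormal basis of Null(A)

print("rank:", r)
print("A @ null_basis ~ 0       :", np.allclose(A @ null_basis, 0))
print("A.T @ left_null_basis ~ 0:", np.allclose(A.T @ left_null_basis, 0))
```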


Specializing these observations to square matrices, we have the following theorem characterizing invertible matrices:

:::{prf:theorem}
The following statements are equivalent for a square $n\times n$ matrix $A\in\mathbb{R}^{n\times n}$, and each holds if and only if $A$ is invertible:
1. $\text{Col}(A)^\perp = \text{Null}(A^T) = \{\vv 0\}$
2. $\text{Null}(A)^\perp = \text{Col}(A^T) = \text{Row}(A) = \mathbb{R}^n$
3. $\text{Col}(A^T)^\perp = \text{Null}(A) = \{\vv 0\}$
4. $\text{Null}(A^T)^\perp = \text{Col}(A) = \mathbb{R}^n$
5. $A$ has rank $=n$
6. $A$ has $n$ (nonzero) singular values
:::
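
Statement 6 suggests a simple numerical test for invertibility: count the singular values that are not (numerically) zero. The sketch below assumes a hypothetical helper `num_nonzero_singular_values` and a relative tolerance `rtol` chosen for illustration; in floating point, "nonzero" always has to be interpreted relative to some threshold.

```python
import numpy as np

def num_nonzero_singular_values(A, rtol=1e-12):
    """Count singular values larger than rtol * sigma_1 (hypothetical helper)."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > rtol * s[0]))

A_invertible = np.array([[2., 1.],
                         [1., 3.]])
A_singular = np.array([[1., 2.],
                       [2., 4.]])   # second row is twice the first

for name, M in [("A_invertible", A_invertible), ("A_singular", A_singular)]:
    k = num_nonzero_singular_values(M)
    n = M.shape[0]
    print(f"{name}: {k} nonzero singular values, full rank: {k == n}")
```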

## The Pseudoinverse of $A$

Recall the least squares problem of finding a vector $\vv x$ that minimizes the objective $\|A \vv x-\vv b\|^2$. We saw that the least squares solution is given by the solution to the _normal equations_:

\begin{equation}
\label{ne}
A^TA\hat{\vv x} = A^T\vv b \quad \text{(NE)}
\end{equation}

Let's rewrite [(NE)](#ne) using the SVD $A = U\Sigma V^T$, for which $A^T = V\Sigma^T U^T = V\Sigma U^T$ (since $\Sigma$ is diagonal, $\Sigma = \Sigma^T$):

$$
A^TA\hat{\vv x} = V\Sigma \underbrace{U^T U}_{I}\Sigma V^T\hat{\vv x} = \underbrace{V\Sigma^2 V^T\hat{\vv x}}_{(a)} = \underbrace{V\Sigma U^T}_{(b)}\vv b
$$

Let's start by left multiplying (a) and (b) by $V^T$ to take advantage of $V^TV = I_r$:
$$
V^T(V\Sigma^2V^T\hat{\vv x}) = V^T(V\Sigma U^T \vv b) \Rightarrow \Sigma^2V^T\hat{\vv x} = \Sigma U^T \vv b.
$$

Now, let's isolate $V^T\hat{\vv x}$ by multiplying both sides by $\Sigma^{-2}$, which exists because the diagonal entries $\sigma_1,\ldots,\sigma_r$ of $\Sigma$ are all nonzero:

\begin{equation}
\label{eqn_1}
V^T\hat{\vv x} = \Sigma^{-1}U^T\vv b.
\end{equation}

Finally, note that $\hat{\vv x}$ satisfies [](#eqn_1) if $\hat{\vv x} = V\Sigma^{-1}U^T\vv b + \vv n$ (again since $V^TV=I$) for any $\vv n \in \text{Null}(V^T) = \text{Col}(V)^\perp$. The special solution $\hat{\vv x}^* = V\Sigma^{-1}U^T\vv b$ can be shown to be the minimum norm least squares solution when several $\hat{\vv x}$ satisfy the normal equations [(NE)](#ne). The matrix

$$
A^+ = V\Sigma^{-1}U^T
$$

is called the _pseudoinverse of $A$_, and is also known as the _Moore-Penrose inverse of $A$_.
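
The formula $A^+ = V\Sigma^{-1}U^T$ can be sketched in NumPy as follows. The matrix `A` and vector `b` are made-up examples (with `A` deliberately rank deficient so that the least squares solution is not unique), and the rank tolerance `1e-12` is an assumption; the result is compared against NumPy's built-in `np.linalg.pinv` using the same cutoff.

```python
import numpy as np

# Illustrative rank-1 matrix: the second column is twice the first.
A = np.array([[1., 2.],
              [2., 4.],
              [3., 6.]])
b = np.array([1., 0., 1.])

# Keep only the r nonzero singular values and their singular vectors.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-12 * s[0]))   # assumed rank tolerance
U_r, s_r, V_r = U[:, :r], s[:r], Vt[:r].T

# Pseudoinverse A^+ = V Sigma^{-1} U^T and the special solution x* = A^+ b.
A_pinv = V_r @ np.diag(1.0 / s_r) @ U_r.T
x_star = A_pinv @ b

# Another least squares solution: add a vector n from Col(V)^perp = Null(A).
n_vec = Vt[r:].T[:, 0]
x_other = x_star + n_vec

print("matches np.linalg.pinv:", np.allclose(A_pinv, np.linalg.pinv(A, rcond=1e-12)))
print("x* solves (NE)        :", np.allclose(A.T @ A @ x_star, A.T @ b))
print("x_other solves (NE)   :", np.allclose(A.T @ A @ x_other, A.T @ b))
print("norms (x*, x_other)   :", np.linalg.norm(x_star), np.linalg.norm(x_other))
```

Both vectors satisfy the normal equations, but `x_star` has the smaller norm, consistent with the minimum norm property stated above.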

If we look at $A\hat{\vv x}^* = AA^+ \vv b$, we observe that:

$$
A\hat{\vv x}^* = U\Sigma \underbrace{V^TV}_{I}\Sigma^{-1}U^T\vv b = UU^T\vv b,
$$

i.e., $A\hat{\vv x}^*$ is the orthogonal projection $\hat{\vv b}$ of $\vv b$ onto $\text{Col}(A)$.
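
A small numerical check of this projection property is sketched below. The matrix `A` and vector `b` are made-up examples with `A` of full column rank, so the reduced SVD returned by `np.linalg.svd(..., full_matrices=False)` has exactly $r = 2$ columns in `U`.

```python
import numpy as np

A = np.array([[1., 0.],
              [1., 1.],
              [1., 2.]])
b = np.array([6., 0., 0.])

# Reduced SVD: the columns of U form an orthonormal basis of Col(A).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

b_hat_svd = U @ (U.T @ b)               # U U^T b
b_hat_pinv = A @ (np.linalg.pinv(A) @ b)  # A A^+ b

# The residual b - b_hat is orthogonal to every column of A.
residual = b - b_hat_svd

print("U U^T b   =", b_hat_svd)
print("A A^+ b   =", b_hat_pinv)
print("A^T (b - b_hat) ~ 0:", np.allclose(A.T @ residual, 0))
```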

**TO DO**: Practice problems at the end of Ch. 7.4 + solutions

## Practice Problems

1. Given a singular value decomposition, $A = U\Sigma V^T$, find an SVD of $A^T$. How are the singular values of $A$ and $A^T$ related?
2. For any $n \times n$ matrix $A$, use the SVD to show that there is an $n \times n$ orthogonal matrix $Q$ such that $AA^T = Q^T(A^TA)Q$.

_Remark:_ Practice Problem 2 establishes that for any $n \times n$ matrix $A$, the matrices $AA^T$ and $A^TA$ are orthogonally similar.

## Solutions to Practice Problems

1. If $A = U\Sigma V^T$, where $\Sigma$ is $m \times n$, then $A^T = (V^T)^T \Sigma^T U^T = V\Sigma^T U^T$. This is an SVD of $A^T$ because $V$ and $U$ are orthogonal matrices and $\Sigma^T$ is an $n \times m$ "diagonal" matrix. Since $\Sigma$ and $\Sigma^T$ have the same nonzero diagonal entries, $A$ and $A^T$ have the same nonzero singular values. [Note: If $A$ is $2 \times n$, then $AA^T$ is only $2 \times 2$ and its eigenvalues may be easier to compute (by hand) than the eigenvalues of $A^TA$.]

2. Use the SVD to write $A = U\Sigma V^T$, where $U$ and $V$ are $n \times n$ orthogonal matrices and $\Sigma$ is an $n \times n$ diagonal matrix. Notice that $U^TU = I = V^TV$ and $\Sigma^T = \Sigma$, since $U$ and $V$ are orthogonal matrices and $\Sigma$ is a diagonal matrix. Substituting the SVD for $A$ into $AA^T$ and $A^TA$ results in

   $$
   AA^T = U\Sigma V^T(U\Sigma V^T)^T = U\Sigma V^TV\Sigma^T U^T = U\Sigma\Sigma^T U^T = U\Sigma^2 U^T,
   $$

   and

   $$
   A^TA = (U\Sigma V^T)^T U\Sigma V^T = V\Sigma^T U^TU\Sigma V^T = V\Sigma^T\Sigma V^T = V\Sigma^2 V^T.
   $$

   Let $Q = VU^T$. Then

   $$
   Q^T(A^TA)Q = (VU^T)^T(V\Sigma^2 V^T)(VU^T) = UV^TV\Sigma^2 V^TVU^T = U\Sigma^2 U^T = AA^T.
   $$
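
The construction $Q = VU^T$ from the solution to Practice Problem 2 can be checked numerically. The sketch below uses a random $4\times 4$ matrix as an illustrative input; the size and the random seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

# Full SVD of a square matrix: U and V are both 4 x 4 orthogonal matrices.
U, s, Vt = np.linalg.svd(A)
V = Vt.T

Q = V @ U.T

print("Q is orthogonal        :", np.allclose(Q.T @ Q, np.eye(4)))
print("Q^T (A^T A) Q == A A^T :", np.allclose(Q.T @ (A.T @ A) @ Q, A @ A.T))
```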

**Check what is the below practice problem**

Let $\mathbf{p}_1 = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}$, 
$\mathbf{p}_2 = \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}$, 
$\mathbf{n}_1 = \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix}$, and 
$\mathbf{n}_2 = \begin{bmatrix} -2 \\ 1 \\ 3 \end{bmatrix}$; 
let $H_1$ be the hyperplane (plane) in $\mathbb{R}^3$ passing through the point $\mathbf{p}_1$ and having normal vector $\mathbf{n}_1$; and let $H_2$ be the hyperplane passing through the point $\mathbf{p}_2$ and having normal vector $\mathbf{n}_2$. Give an explicit description of $H_1 \cap H_2$ by a formula that shows how to generate all points.

## Solution to Practice Problem

First, compute $\mathbf{n}_1 \cdot \mathbf{p}_1 = -3$ and $\mathbf{n}_2 \cdot \mathbf{p}_2 = 7$. The hyperplane $H_1$ is the solution set of the equation $x_1 + x_2 - 2x_3 = -3$, and $H_2$ is the solution set of the equation $-2x_1 + x_2 + 3x_3 = 7$. Then

$$
H_1 \cap H_2 = \{\mathbf{x} : x_1 + x_2 - 2x_3 = -3 \text{ and } -2x_1 + x_2 + 3x_3 = 7\}
$$

This is an implicit description of $H_1 \cap H_2$. To find an explicit description, solve the system of equations by row reduction:

$$
\begin{bmatrix}
1 & 1 & -2 & -3 \\
-2 & 1 & 3 & 7
\end{bmatrix} \sim
\begin{bmatrix}
1 & 0 & -\frac{5}{3} & -\frac{10}{3} \\
0 & 1 & -\frac{1}{3} & \frac{1}{3}
\end{bmatrix}
$$

Thus $x_1 = -\frac{10}{3} + \frac{5}{3}x_3$, $x_2 = \frac{1}{3} + \frac{1}{3}x_3$, $x_3 = x_3$. Let 
$\mathbf{p} = \begin{bmatrix} -\frac{10}{3} \\ \frac{1}{3} \\ 0 \end{bmatrix}$ and 
$\mathbf{v} = \begin{bmatrix} \frac{5}{3} \\ \frac{1}{3} \\ 1 \end{bmatrix}$. The general solution can be written as $\mathbf{x} = \mathbf{p} + x_3\mathbf{v}$. Thus $H_1 \cap H_2$ is the line through $\mathbf{p}$ in the direction of $\mathbf{v}$. Note that $\mathbf{v}$ is orthogonal to both $\mathbf{n}_1$ and $\mathbf{n}_2$.
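
The explicit description can be verified numerically with a short sketch. The variable names `N` and `d` are introduced here only for illustration: `N` stacks the two normal vectors as rows and `d` holds the right-hand sides $-3$ and $7$.

```python
import numpy as np

n1 = np.array([1., 1., -2.])
n2 = np.array([-2., 1., 3.])
N = np.vstack([n1, n2])      # rows are the normal vectors
d = np.array([-3., 7.])      # n1 . p1 and n2 . p2

p = np.array([-10/3, 1/3, 0.])
v = np.array([5/3, 1/3, 1.])

# Every point p + t v should satisfy both hyperplane equations.
for t in [-2.0, 0.0, 1.0, 3.5]:
    x = p + t * v
    print(t, np.allclose(N @ x, d))

# v is orthogonal to both normals, so it is parallel to their cross product.
print("N @ v ~ 0:", np.allclose(N @ v, 0))
print("cross(n1, n2) =", np.cross(n1, n2))
```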
