---
title: 10.1 Applications
subject:  SVD
subtitle: 
short_title: 10.1 Applications
authors:
  - name: Nikolai Matni
    affiliations:
      - Dept. of Electrical and Systems Engineering
      - University of Pennsylvania
    email: nmatni@seas.upenn.edu
license: CC-BY-4.0
keywords: 
math:
  '\vv': '\mathbf{#1}'
  '\bm': '\begin{bmatrix}'
  '\em': '\end{bmatrix}'
  '\R': '\mathbb{R}'
---

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/nikolaimatni/ese-2030/HEAD?labpath=/08_Ch_9_Symmetric_Matrices/102-Apps.ipynb)

{doc}`Lecture notes <../lecture_notes/Lecture 17 - Introduction to Graph Theory and Consensus Protocols.pdf>`

## Reading

Material related to this page, as well as additional exercises, can be

## Learning Objectives

By the end of this page, you should know:
- 

\section*{Warmup}

The diagonalization theorems we've seen for complete and symmetric matrices have played a role in many interesting applications. Unfortunately, not all matrices can be diagonalized. In particular, if $A$ is not square, then diagonalization does not even make sense, since eigenvalues and eigenvectors are only defined for square matrices. Fortunately, a factorization $A = P \Delta Q^T$ is possible for any matrix $A$, square or not. A special factorization of this type, called the singular value decomposition, is one of the most useful and widely applicable matrix factorizations in linear algebra.

The singular value decomposition is based on the following key property of matrix diagonalization, which we'll show carries over to general rectangular matrices:

\textcolor{magenta}{Key observation: The absolute values of the eigenvalues of a symmetric matrix $A$ measure the amounts that $A$ stretches or shrinks certain vectors (the eigenvectors). If $Ax = \lambda x$ and $\|x\| = 1$, then}

\[
\|Ax\| = \|\lambda x\| = |\lambda| \|x\| = |\lambda|.
\]

\textcolor{magenta}{If $\lambda_1$ is the eigenvalue with the greatest magnitude, i.e., if $|\lambda_1| \geq |\lambda_i|$ for $i=1,\ldots,n$, then a corresponding unit eigenvector $v_1$ identifies the direction in which stretching is greatest. That is, the length of $Ax$ is maximized when $x = v_1$, and $\|Av_1\| = |\lambda_1|$.}

This description is reminiscent of the optimization principle we saw for characterizing eigenvalues of symmetric matrices, albeit with a focus on maximizing length $\|Ax\|$ rather than the quadratic form $x^T A x$. What we'll see next is that this description of $v_1$ and $|\lambda_1|$ has an analogue for rectangular matrices that will lead to the singular value decomposition.
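The key observation can be checked numerically on a small symmetric matrix. The following is a minimal sketch, assuming NumPy is available; the matrix `S` and the names `stretch` and `sampled_max` are illustrative choices, not part of the notes. The matrix is chosen so that its eigenvalue of greatest magnitude is negative, which shows why the absolute value matters.

```python
import numpy as np

# Illustrative symmetric matrix with eigenvalues -4 and 1.
S = np.array([[0.0, 2.0],
              [2.0, -3.0]])

# eigh is intended for symmetric matrices: it returns real eigenvalues in
# ascending order and orthonormal eigenvectors as the columns of eigvecs.
eigvals, eigvecs = np.linalg.eigh(S)

# Length of S v_i for each unit eigenvector v_i (norm of each column).
stretch = np.linalg.norm(S @ eigvecs, axis=0)

# Sample unit vectors on the circle and record the largest stretch observed.
thetas = np.linspace(0.0, 2.0 * np.pi, 1000)
X = np.vstack([np.cos(thetas), np.sin(thetas)])
sampled_max = np.linalg.norm(S @ X, axis=0).max()

print("eigenvalues:", eigvals)
print("||S v_i||  :", stretch)
print("largest sampled ||S x||:", sampled_max)
```

The stretch factors agree with $|\lambda_i|$, and no sampled unit vector is stretched by more than $\max_i |\lambda_i|$.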

Example: The matrix $A = \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix}$ defines a linear map $x \mapsto Ax$ from $\mathbb{R}^3$ to $\mathbb{R}^2$. If we consider the effects of this map on the unit sphere $\{x \in \mathbb{R}^3 \mid \|x\| = 1\}$, we observe that multiplication by $A$ transforms this sphere in $\mathbb{R}^3$ into an ellipse in $\mathbb{R}^2$:

\begin{center}
\includegraphics[width=0.8\textwidth]{figure1.png}
\captionof{figure}{A transformation from $\mathbb{R}^3$ to $\mathbb{R}^2$.}
\end{center}

Our task is to find a unit vector $x$ that maximizes the length $\|Ax\|$, and to compute this maximum length. That is, we want to solve the optimization problem

\[
\max_{x \in \mathbb{R}^3} \|Ax\| \quad \text{subject to} \quad \|x\| = 1.
\]

Our first observation is that the quantity $\|Ax\|^2$ is easier to work with than $\|Ax\|$, and that both are maximized by the same $x$, since squaring is an increasing function on nonnegative numbers. Specifically, note that

\[
\|Ax\|^2 = \langle Ax, Ax \rangle = (Ax)^T(Ax) = x^TA^TAx = x^T(A^TA)x.
\]

So our task is now to find a unit vector $x$ that maximizes the quadratic form $x^T(A^TA)x$. Since $A^TA$ is symmetric, our optimization-based characterization of the eigenpairs of symmetric matrices (the Courant-Fischer theorem) tells us how to do this: the maximum value is the largest eigenvalue $\lambda_1$ of the matrix $A^TA$, and it is attained at a unit eigenvector $v_1$ of $A^TA$ corresponding to $\lambda_1$.

For the matrix in this example:

\[
A^TA = \begin{pmatrix}
4 & 8 \\
11 & 7 \\
14 & -2
\end{pmatrix}
\begin{pmatrix}
4 & 11 & 14 \\
8 & 7 & -2
\end{pmatrix} = 
\begin{pmatrix}
80 & 100 & 40 \\
100 & 170 & 140 \\
40 & 140 & 200
\end{pmatrix}
\]

and the eigenvalue/vector pairs are:

\begin{align*}
\lambda_1 = 360, \quad v_1 &= \begin{pmatrix} 1/3 \\ 2/3 \\ 2/3 \end{pmatrix}, \\
\lambda_2 = 90, \quad v_2 &= \begin{pmatrix} -2/3 \\ -1/3 \\ 2/3 \end{pmatrix}, \\
\lambda_3 = 0, \quad v_3 &= \begin{pmatrix} 2/3 \\ -2/3 \\ 1/3 \end{pmatrix}.
\end{align*}

The maximum value of $x^T(A^TA)x = \|Ax\|^2$ is thus $\lambda_1 = 360$, and it is attained when $x = v_1$. The vector $Av_1$ is a point on the ellipse in Fig. 1 above that is furthest from the origin, namely

\[
Av_1 = \begin{pmatrix}
4 & 11 & 14 \\
8 & 7 & -2
\end{pmatrix}
\begin{pmatrix}
1/3 \\
2/3 \\
2/3
\end{pmatrix} = 
\begin{pmatrix}
18 \\
6
\end{pmatrix}.
\]

For $\|x\| = 1$, the maximum value of $\|Ax\|$ is $\|Av_1\| = \sqrt{360} = 6\sqrt{10}$.
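The numbers in this example can be reproduced numerically. The following is a minimal sketch, assuming NumPy is available; names such as `G` and `sampled_max` are illustrative. It forms $A^TA$, extracts its largest eigenpair, and compares $\|Av_1\|$ against the lengths $\|Ax\|$ for randomly sampled unit vectors $x$.

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])

# Symmetric 3x3 matrix whose quadratic form equals ||Ax||^2.
G = A.T @ A

# eigh returns eigenvalues in ascending order, so the largest one is last.
eigvals, eigvecs = np.linalg.eigh(G)
lam1 = eigvals[-1]
v1 = eigvecs[:, -1]          # unit eigenvector, determined only up to sign
max_len = np.linalg.norm(A @ v1)

# Random unit vectors in R^3: none should be stretched more than v1.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 10000))
X /= np.linalg.norm(X, axis=0)
sampled_max = np.linalg.norm(A @ X, axis=0).max()

print("A^T A =\n", G)
print("eigenvalues (ascending):", eigvals)   # approximately 0, 90, 360
print("||A v1|| =", max_len, " sqrt(360) =", np.sqrt(360.0))
print("largest sampled ||A x|| =", sampled_max)
```

Because eigenvectors are only determined up to sign, the computed `v1` may be the negative of $(1/3, 2/3, 2/3)$; the length $\|Av_1\|$ is unaffected.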

This example suggests that the effect of a matrix $A$ on the unit sphere $\|x\| = 1$ is related to the eigenpairs of $A^TA$. In fact we'll see next that the entire geometric behavior of the map $x \mapsto Ax$ is captured by this quadratic form.

\section*{The Singular Values of an m×n Matrix}

Consider a real matrix $A \in \mathbb{R}^{m \times n}$. Then $A^TA$ is an $n \times n$ symmetric matrix, and can be orthogonally diagonalized. Let $V = [v_1 \cdots v_n]$ be an orthogonal matrix composed of orthonormal eigenvectors of $A^TA$, and let $\lambda_1, \ldots, \lambda_n$ be the associated eigenvalues of $A^TA$. Then for $i = 1, \ldots, n$,

\begin{align*}
\|A v_i\|^2 &= (Av_i)^T(Av_i) = v_i^T(A^TAv_i) \\
&= v_i^T(\lambda_i v_i) \\
&= \lambda_i v_i^Tv_i = \lambda_i \|v_i\|^2 \\
&= \lambda_i.
\end{align*}

This tells us that all of the eigenvalues of $A^TA$ are nonnegative, since $\lambda_i = \|Av_i\|^2 \geq 0$; that is, $A^TA$ is a positive semidefinite matrix. Let's assume that we've ordered our eigenvalues in decreasing order:

\[
\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n \geq 0.
\]
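The identity $\|Av_i\|^2 = \lambda_i$, and the nonnegativity that follows from it, can be sanity-checked on a random matrix. This is a sketch under the assumption that NumPy is available; the $4 \times 3$ size and the random seed are arbitrary choices. Since `numpy.linalg.eigh` returns eigenvalues in ascending order, the code reorders them to match the decreasing convention used here.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))      # arbitrary 4x3 test matrix

# Eigenpairs of the symmetric matrix A^T A (ascending order from eigh).
eigvals, V = np.linalg.eigh(A.T @ A)

# Reorder so that lambda_1 >= lambda_2 >= ... >= lambda_n.
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# Squared lengths ||A v_i||^2, one per column of A V.
sq_lengths = np.sum((A @ V) ** 2, axis=0)

print("eigenvalues of A^T A:", eigvals)
print("||A v_i||^2         :", sq_lengths)
```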

The singular values of $A$, denoted $\sigma_i$, are the positive square roots of the nonzero eigenvalues of $A^TA$. That is, let $r$ be the number of positive eigenvalues of $A^TA$, so that $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_r > 0$ and $\lambda_{r+1} = \cdots = \lambda_n = 0$. Then $A$ has $r$ singular values, defined as

\[
\sigma_i = \sqrt{\lambda_i}, \quad i = 1, \ldots, r.
\]

\textbf{WARNING:} Some texts include the zero eigenvalues $\lambda_{r+1}, \ldots, \lambda_n$ of $A^TA$ as singular values of $A$. This is simply a different convention and is mathematically equivalent. However, we find our definition to be more natural for our purposes.

\textbf{Example:} Using the same $A = \begin{pmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{pmatrix}$ as the previous example,

we have $\sigma_1 = \sqrt{360} = 6\sqrt{10}$, $\sigma_2 = \sqrt{90} = 3\sqrt{10}$. In this case, $A$ only has two singular values as $\lambda_3 = 0$. For this example, $r = 2$, and $\lambda_1 = 360 > \lambda_2 = 90 > \lambda_3 = 0$.
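As a cross-check, the singular values can be computed both from the eigenvalues of $A^TA$ and with a library routine. This is a minimal sketch assuming NumPy; the relative tolerance `rel_tol` used to decide which eigenvalues count as zero is a hypothetical choice. Note that `numpy.linalg.svd` returns $\min\{m, n\}$ values and would include zeros if there were any, i.e., it follows the other convention mentioned in the warning above.

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])

# Eigenvalues of A^T A in decreasing order.
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]

# Keep only eigenvalues that are positive up to round-off (assumed tolerance).
rel_tol = 1e-9
sigma = np.sqrt(eigvals[eigvals > rel_tol * eigvals[0]])

# Library singular values, returned in decreasing order.
sigma_np = np.linalg.svd(A, compute_uv=False)

print("from eigenvalues of A^T A:", sigma)
print("from numpy.linalg.svd    :", sigma_np)
```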

From the previous example, the first singular value of $A$ is the maximum of $\|Ax\|$ over all $\|x\| = 1$, attained at $v_1$. Our optimization-based characterization of eigenvectors of symmetric matrices tells us that the second singular value of $A$ is the maximum of $\|Ax\|$ over all unit vectors orthogonal to $v_1$; this is attained by $v_2$, the second eigenvector of $A^TA$. For $v_2$ from the previous example:

\[
Av_2 = \begin{pmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{pmatrix} \begin{pmatrix} -2/3 \\ -1/3 \\ 2/3 \end{pmatrix} = \begin{pmatrix} 3 \\ -9 \end{pmatrix}
\]

This point is on the minor axis of the ellipse in Fig. 1 above, just as $Av_1$ is on the major axis (see Fig. 2 below). The two singular values of $A$ are the lengths of the major and minor semiaxes of the ellipse.

\begin{figure}[h]
\centering
\includegraphics[width=0.5\textwidth]{figure2}
\caption{An ellipse showing $Av_1$ and $Av_2$ as orthogonal vectors along the major and minor axes.}
\end{figure}

That $Av_1$ and $Av_2$ are orthogonal is no accident, as the next theorem shows:

\begin{theorem}
Suppose that $v_1, \ldots, v_n$ is an orthonormal basis for $\mathbb{R}^n$ composed of the eigenvectors of $A^TA$, ordered so that the corresponding eigenvalues of $A^TA$ satisfy
\[
\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_r > \lambda_{r+1} = \cdots = \lambda_n = 0,
\]
where $r$ denotes the number of nonzero eigenvalues of $A^TA$, i.e. the number of singular values $\sigma_i = \sqrt{\lambda_i} > 0$, $i=1,\ldots,r$ of $A$. Then, $Av_1, \ldots, Av_r$ is an orthogonal basis for $Col(A)$, and $rank(A) = r$.
\end{theorem}

\begin{proof}
Because $v_i$ and $v_j$ are orthogonal for $i \neq j$,
\[
(Av_i)^T(Av_j) = v_i^TA^TAv_j = v_i^T\lambda_jv_j = 0.
\]

Thus, $Av_1, \ldots, Av_r$ are mutually orthogonal. They are also nonzero, since $\|Av_i\| = \sqrt{\lambda_i} = \sigma_i > 0$ for $i = 1, \ldots, r$, and hence they are linearly independent. They are also clearly contained in $Col(A)$. Now, for any $y \in Col(A)$, there must be an $x \in \mathbb{R}^n$ such that $y = Ax$. Expanding $x$ in the basis $v_1, \ldots, v_n$, as $x = c_1v_1 + \cdots + c_nv_n$ for some $c_1, \ldots, c_n \in \mathbb{R}$, we have:

\begin{align*}
y &= Ax = A(c_1v_1 + \cdots + c_nv_n) = c_1Av_1 + \cdots + c_rAv_r + c_{r+1}Av_{r+1} + \cdots + c_nAv_n \\
&= c_1Av_1 + \cdots + c_rAv_r.
\end{align*}

We used that $\|Av_i\|^2 = \lambda_i = 0$ for $i=r+1,\ldots,n \Rightarrow Av_i = 0$ for $i=r+1,\ldots,n$ in the last equality.

Therefore, we have that $y \in span\{Av_1, \ldots, Av_r\}$. Thus $Av_1, \ldots, Av_r$ is both linearly independent and a spanning set for $Col(A)$, meaning it is an orthogonal basis for $Col(A)$. Hence, by the Fundamental Theorem of Linear Algebra,

\[
rank(A) = dim\,Col(A) = r.
\]
\end{proof}
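The theorem can be illustrated on the running example $A = \begin{pmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{pmatrix}$. In the sketch below (assuming NumPy; the tolerance `rel_tol` and variable names are illustrative), the Gram matrix of the vectors $Av_1, Av_2, Av_3$ should be diagonal with entries $\lambda_i$, which shows both the orthogonality and that $Av_3 = 0$.

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])

# Orthonormal eigenvectors of A^T A, reordered so eigenvalues decrease.
eigvals, V = np.linalg.eigh(A.T @ A)
eigvals, V = eigvals[::-1], V[:, ::-1]

# Columns of AV are A v_1, A v_2, A v_3.
AV = A @ V

# Entry (i, j) of the Gram matrix is (A v_i)^T (A v_j).
gram = AV.T @ AV

# Number of eigenvalues that are positive up to round-off (assumed tolerance).
rel_tol = 1e-9
r = int(np.sum(eigvals > rel_tol * eigvals[0]))

print("Gram matrix of A v_i:\n", np.round(gram, 6))
print("r =", r, " numpy rank =", np.linalg.matrix_rank(A))
```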

\textbf{Numerical Note:} In certain cases, the rank of $A$ may be very sensitive to small changes in the entries of $A$. The classic approach of counting the number of pivot columns in $A$ does not work well if $A$ is row reduced by a computer, as roundoff errors often create a row echelon form with full rank. In practice, the most reliable way of computing the rank of a large matrix $A$ is to count the number of singular values larger than a small threshold $\epsilon$ (typically on the order of $10^{-12}$ for very accurate computations). In this case, singular values smaller than $\epsilon$ are treated as zeros for all practical purposes, and the effective rank of $A$ is the number of singular values that remain.
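The thresholding idea in this note can be sketched as follows. This is an illustration, assuming NumPy; the helper `numerical_rank`, the absolute threshold `eps`, and the test matrices are hypothetical choices rather than a prescribed implementation. An absolute threshold like this one implicitly assumes the entries of $A$ are of moderate size; NumPy's own `numpy.linalg.matrix_rank` applies the same idea with a default tolerance scaled relative to the largest singular value.

```python
import numpy as np

def numerical_rank(A, eps=1e-12):
    """Count the singular values of A that exceed the threshold eps."""
    sigma = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(sigma > eps))

rng = np.random.default_rng(0)

# A 6x5 matrix of rank 2 by construction (product of 6x2 and 2x5 factors).
B = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))

# The same matrix with round-off-sized noise added to every entry.
B_noisy = B + 1e-14 * rng.standard_normal(B.shape)

print("singular values of noisy matrix:",
      np.linalg.svd(B_noisy, compute_uv=False))
print("effective rank:", numerical_rank(B_noisy))
```

The noisy matrix has tiny but nonzero trailing singular values; treating everything below `eps` as zero recovers the rank of the underlying matrix.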

\section*{The Singular Value Decomposition}

The decomposition of $A$ involves an $r \times r$ diagonal matrix $\Sigma$ of the form

\[
\Sigma = \text{diag}(\sigma_1, \ldots, \sigma_r).
\]

We note that because $r = \text{dim Col}(A) = \text{dim Range}(A)$ by the FTLA, we must have that $r \leq \min\{m,n\}$ if $A \in \mathbb{R}^{m \times n}$.

\begin{theorem}
Let $A \in \mathbb{R}^{m \times n}$ be an $m \times n$ matrix of rank $r > 0$. Then $A$ can be factored as
\[
A = U \Sigma V^T,
\]
where $U \in \mathbb{R}^{m \times r}$ has orthonormal columns, so $U^TU = I_r$, $\Sigma = \text{diag}(\sigma_1, \ldots, \sigma_r)$ is a diagonal matrix with the singular values $\sigma_i$ of $A$ along the diagonal, and $V \in \mathbb{R}^{n \times r}$ has orthonormal columns, so $V^TV = I_r$.
\end{theorem}

Such a factorization of $A$ is called its singular value decomposition, and the columns of $U$ are called the left singular vectors of $A$, while the columns of $V$ are called the right singular vectors of $A$.

\begin{proof}
Let $\lambda_i$ and $v_i$ be the eigenvalues/vectors of $A^TA$ as described previously, so that $Av_1, \ldots, Av_r$ is an orthogonal basis for $Col(A)$. Normalize each $Av_i$ to form an orthonormal basis for $Col(A)$:

\[
u_i := \frac{1}{\|Av_i\|} Av_i = \frac{1}{\sigma_i} Av_i
\]

and hence $Av_i = \sigma_i u_i$ for $i=1,\ldots,r$. Define the matrices

\[
U = [u_1 \cdots u_r] \in \mathbb{R}^{m \times r} \quad \text{and} \quad V = [v_1 \cdots v_r] \in \mathbb{R}^{n \times r}
\]

By construction, the columns of $U$ are orthonormal: $U^TU = I_r$, and similarly for the columns of $V$: $V^TV = I_r$.

\end{proof}

Let's define the following "full" matrices:

\[
\hat{U} = [U \; U^\perp] \in \mathbb{R}^{m \times m} \quad \text{and} \quad \hat{V} = [V \; V^\perp] \in \mathbb{R}^{n \times n}
\]

Here, $V^\perp = [v_{r+1} \cdots v_n]$ has orthonormal columns spanning the orthogonal complement of span$\{v_1, \ldots, v_r\}$, so that the columns of $\hat{V}$ form an orthonormal basis of $\mathbb{R}^n$.

Similarly, let $U^\perp$ have orthonormal columns spanning the orthogonal complement of span$\{u_1, \ldots, u_r\}$, so the columns of $\hat{U}$ form an orthonormal basis for $\mathbb{R}^m$.

Finally, define $\hat{\Sigma} = \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix}_{m \times n}$. We first show that

\[
A = \hat{U} \hat{\Sigma} \hat{V}^T, \quad \text{or equivalently (since $\hat{V}$ is orthogonal),} \quad A\hat{V} = \hat{U}\hat{\Sigma}.
\]

First,
\[
A\hat{V} = [Av_1 \cdots Av_r \; Av_{r+1} \cdots Av_n] = [\sigma_1u_1 \cdots \sigma_ru_r \; 0 \cdots 0].
\]

Then, notice:

\[
\hat{U} \hat{\Sigma} = [u_1 \cdots u_r \; u_{r+1} \cdots u_m] 
\begin{bmatrix}
\sigma_1 & & 0 & 0 \cdots 0 \\
& \ddots & & \vdots \\
0 & & \sigma_r & 0 \cdots 0 \\
0 & \cdots & 0 & 0 \cdots 0 \\
\vdots & & \vdots & \vdots \\
0 & \cdots & 0 & 0 \cdots 0
\end{bmatrix}
= [\sigma_1u_1 \cdots \sigma_ru_r \; 0 \cdots 0]
\]

It follows that $A\hat{V} = \hat{U}\hat{\Sigma}$, or equivalently, $A = \hat{U}\hat{\Sigma}\hat{V}^T$. Finally, notice that the zero blocks of $\hat{\Sigma}$ eliminate $U^\perp$ and $V^\perp$ from the product:

\[
A = \hat{U} \hat{\Sigma} \hat{V}^T = [U \; U^\perp] \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V^T \\ (V^\perp)^T \end{bmatrix} = U \Sigma V^T, \quad \text{proving our result.}
\]
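The construction used in the proof translates directly into a short routine: take eigenpairs of $A^TA$, keep the $r$ positive eigenvalues, set $\sigma_i = \sqrt{\lambda_i}$, and define $u_i = Av_i / \sigma_i$. The sketch below assumes NumPy; the function name `compact_svd` and the tolerance `rel_tol` are hypothetical. It is meant to mirror the argument, not to replace a library SVD: forming $A^TA$ explicitly loses accuracy for ill-conditioned matrices.

```python
import numpy as np

def compact_svd(A, rel_tol=1e-10):
    """Compact SVD A = U Sigma V^T built from eigenpairs of A^T A."""
    eigvals, eigvecs = np.linalg.eigh(A.T @ A)

    # Sort eigenpairs so that the eigenvalues are decreasing.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # r = number of eigenvalues that are positive up to round-off.
    r = int(np.sum(eigvals > rel_tol * eigvals[0]))

    sigma = np.sqrt(eigvals[:r])     # singular values sigma_1..sigma_r
    V = eigvecs[:, :r]               # right singular vectors v_1..v_r
    U = (A @ V) / sigma              # column i is A v_i / sigma_i
    return U, np.diag(sigma), V

# Test matrix of rank 2 by construction (5x2 times 2x4).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))

U, Sigma, V = compact_svd(A)
print("shapes:", U.shape, Sigma.shape, V.shape)
print("reconstruction error:", np.linalg.norm(U @ Sigma @ V.T - A))
```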

\textbf{NOTE:} Some textbooks define the singular value decomposition of $A$ as $A = \hat{U} \hat{\Sigma} \hat{V}^T$. This is necessary when allowing for singular values equal to zero. When only considering nonzero singular values, as we do, $A = U\Sigma V^T$ is the appropriate definition. This is sometimes called the compact SVD of $A$, but we will just call it the SVD.
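The relationship between the two conventions can be seen with `numpy.linalg.svd`, which with `full_matrices=True` returns the factors of the full decomposition $\hat{U} \hat{\Sigma} \hat{V}^T$ (with $\hat{V}^T$ returned already transposed). The sketch below, assuming NumPy, trims these to the compact factors $U$, $\Sigma$, $V$; the rank-one test matrix and the tolerance `rel_tol` are illustrative choices.

```python
import numpy as np

# Rank-one 4x3 matrix built as an outer product.
A = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0])

# Full SVD: U_hat is 4x4, s holds min(m, n) = 3 values, Vt_hat is 3x3.
U_hat, s, Vt_hat = np.linalg.svd(A, full_matrices=True)

# Embed the singular values in an m x n matrix Sigma_hat.
Sigma_hat = np.zeros(A.shape)
Sigma_hat[:len(s), :len(s)] = np.diag(s)

# Keep only singular values that are nonzero up to round-off.
rel_tol = 1e-10
r = int(np.sum(s > rel_tol * s[0]))
U = U_hat[:, :r]
Sigma = np.diag(s[:r])
V = Vt_hat[:r, :].T

print("full shapes   :", U_hat.shape, Sigma_hat.shape, Vt_hat.shape)
print("compact shapes:", U.shape, Sigma.shape, V.shape)
```

Both products, `U_hat @ Sigma_hat @ Vt_hat` and `U @ Sigma @ V.T`, reproduce `A` up to round-off.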

\textbf{Example:} Let's use the results of the previous example to construct the SVD of
\[
A = \begin{bmatrix} 4 & 11 & 14 \\ 8 & 7 & -2 \end{bmatrix}.
\]
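One way to carry out this construction is sketched below, assuming NumPy. It uses $\sigma_1 = 6\sqrt{10}$, $\sigma_2 = 3\sqrt{10}$ and the unit eigenvectors $v_1 = (1/3, 2/3, 2/3)$ and $v_2 = (-2/3, -1/3, 2/3)$ of $A^TA$ for $\lambda_1 = 360$ and $\lambda_2 = 90$, and then forms $u_i = Av_i/\sigma_i$ as in the proof. The variable names are illustrative.

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])

# Singular values and right singular vectors from the eigenpairs of A^T A.
sigma1, sigma2 = 6 * np.sqrt(10), 3 * np.sqrt(10)
v1 = np.array([1.0, 2.0, 2.0]) / 3
v2 = np.array([-2.0, -1.0, 2.0]) / 3

# Left singular vectors: u_i = A v_i / sigma_i.
u1 = A @ v1 / sigma1     # (18, 6) / (6 sqrt(10)) = (3, 1) / sqrt(10)
u2 = A @ v2 / sigma2     # (3, -9) / (3 sqrt(10)) = (1, -3) / sqrt(10)

U = np.column_stack([u1, u2])
Sigma = np.diag([sigma1, sigma2])
V = np.column_stack([v1, v2])

print("U =\n", U)
print("Sigma =\n", Sigma)
print("V =\n", V)
print("U Sigma V^T =\n", U @ Sigma @ V.T)
```

Here $r = 2$, so $U \in \mathbb{R}^{2 \times 2}$, $\Sigma \in \mathbb{R}^{2 \times 2}$, and $V \in \mathbb{R}^{3 \times 2}$, and the product $U \Sigma V^T$ recovers $A$.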
