# Notes on the SVD (Singular Value Decomposition) #

- Much of this comes from Trefethen's excellent book.

## A geometric interpretation ## 

We start by noting this important fact from Trefethen:

"The image of the unit sphere under any $m \times n$ matrix is a hyperellipse."

Let's unpack that statement:

First, the *unit sphere* in $\mathbb { R } ^ { n}$ is the set of all vectors whose Euclidean norm ($2$-norm) equals $1$.

A hyperellipse in $\mathbb { R } ^ { m }$ is the $m$-dimensional generalization of an ellipse. We can obtain a hyperellipse by stretching the unit sphere in $\mathbb { R } ^ { m}$ by factors $\sigma_1,\ldots,\sigma_m$ in the orthogonal directions $\mathbf{u}_1,\ldots,\mathbf{u}_m \in \mathbb { R } ^ { m}$ .

For convenience, we take the $u_i$ to be unit vectors with $\|u_i\|_2 = 1$. The vectors $\{\sigma_i u_i\}$ are then the *principal semiaxes* of the hyperellipse, each with length equal to $\sigma_i$ (which could equal $0$). In particular, if the hyperellipse is the image of the unit sphere under a matrix $A$ of rank $r$, then exactly $r$ of these principal semiaxes have nonzero length.

Let's now give a simple picture for this geometric interpretation. We denote the unit sphere in $n$-space as $S$. The hyperellipse in $m$-space, which is the image of $S$ under the mapping $A \in \mathbb{R}^{m\times n}$, is then denoted simply as $AS$. A $2\times 2$ example from Trefethen is shown below:

<img src="img/Trefethen_2by2_SVD.png" width="600" height="600">

We assume for simplicity that $A \in \mathbb{R}^{m\times n}$ with $m \geq n$ has full rank $n$. The $n$ *singular values* $\sigma_1,\sigma_2,...,\sigma_n$ of $A$ are then the lengths of the $n$ principal semiaxes of the hyperellipse $AS$. We usually number the singular values such that they are in descending order; thus, $\sigma _ { 1 } \geq \sigma _ { 2 } \geq \cdots \geq \sigma_n > 0$.

Next, we define the $n$ *left singular vectors* of $A$ to be the unit vectors $\left\{ u _ { 1 } , u _ { 2 } , \dots , u _ { n } \right\}$ that point in the directions of the principal semiaxes. These vectors are ordered to correspond with their axis lengths $\left\{ \sigma _ { 1 } , \sigma _ { 2 } , \dots , \sigma _ { n } \right\}$. The vector $\sigma_i u_i$ is thus the $i^{th}$ largest principal semiaxis of $AS$.

Lastly, we have the $n$ *right singular vectors* of $A$, which are unit vectors $\left\{ v _ { 1 } , v _ { 2 } , \ldots , v _ { n } \right\} \in S$ representing the *preimages* of the principal semiaxes of $AS$. This means that

$$
Av_i = \sigma_iu_i \quad 1 \leq i \leq n
$$

Putting this into a more compact matrix form, we arrive at the **Reduced SVD**, $A V = \hat { U } \hat { \Sigma }$:

<img src="img/Trefethen_reduced_SVD_matrices.png" width="600" height="600">

where $V$ is an $n\times n$ matrix with orthonormal columns, $\hat { U }$ is an $m \times n$ matrix with orthonormal columns, and $\hat { \Sigma }$ is a diagonal $n \times n$ matrix with the $\sigma_i$ values on the diagonal. Since $V$ is unitary (the complex equivalent of orthogonal), we can multiply both sides on the right by $V^*$ to obtain

$$
A = \hat { U } \hat { \Sigma } V ^ { * }
$$

The hats on the matrices signify that they are a part of the *reduced singular value decomposition*. This is the form of the SVD factorization of $A$ we usually use in practice.
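As a quick numerical check of the reduced SVD, here is a minimal sketch using NumPy's `np.linalg.svd` with `full_matrices=False`. The random test matrix, its size, and the variable names (`U_hat`, `Sigma_hat`, `Vh`) are assumptions chosen for illustration; note that NumPy returns $V^*$ rather than $V$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))  # random tall matrix (illustrative choice)

# full_matrices=False requests the reduced SVD; Vh holds V^*, not V
U_hat, s, Vh = np.linalg.svd(A, full_matrices=False)
V = Vh.conj().T

print(U_hat.shape, s.shape, Vh.shape)  # (5, 3) (3,) (3, 3)

# Column by column: A v_i = sigma_i u_i
for i in range(n):
    assert np.allclose(A @ V[:, i], s[i] * U_hat[:, i])

# Matrix form: A = U_hat Sigma_hat V^*
Sigma_hat = np.diag(s)
assert np.allclose(A, U_hat @ Sigma_hat @ Vh)

# U_hat has orthonormal columns, but it is not square
assert np.allclose(U_hat.conj().T @ U_hat, np.eye(n))
```

The singular values come back as a 1-D array sorted in descending order, so `np.diag(s)` is needed to form $\hat\Sigma$ explicitly.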

We can turn this reduced SVD into the **Full SVD** by making $\hat U$ into a unitary matrix $U$. A unitary matrix must be square with orthonormal columns, so we append $m-n$ additional orthonormal columns to $\hat U$.

Replacing $\hat U \in \mathbb{R}^{m\times n}$ with $U \in \mathbb{R}^{m \times m}$ means the size of $\hat { \Sigma }$ must change as well. We want the $m-n$ columns appended to $\hat U$ to have no effect on the product, so we add $m-n$ zero rows to $\hat { \Sigma } \in \mathbb{R}^{n\times n}$ to make ${ \Sigma } \in \mathbb{R}^{m\times n}$. We thus arrive at the full SVD

$$
A = { U } { \Sigma } V ^ { * }
$$

where $U$ is $m\times m$ and unitary, $V$ is $n\times n$ and unitary, and $\Sigma$ is $m\times n$ and diagonal with nonnegative real diagonal entries (all positive here, since $A$ has full rank). Let's compare the two approaches pictorially.

<img src="img/Trefethen_reduced_SVD_sizes.png" width="400" height="400">

<img src="img/Trefethen_full_SVD_sizes.png" width="400" height="400">
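The size difference between the two factorizations can also be checked numerically. The sketch below assumes NumPy and a random $5 \times 3$ test matrix; since `np.linalg.svd` returns only the singular values, the $m \times n$ matrix $\Sigma$ is assembled by hand.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
A = rng.standard_normal((m, n))  # illustrative test matrix

# full_matrices=True (the default) gives the full SVD: U is m x m, Vh is n x n
U, s, Vh = np.linalg.svd(A, full_matrices=True)

# Build the m x n Sigma: diag(s) on top, m - n zero rows below
Sigma = np.zeros((m, n))
Sigma[:n, :n] = np.diag(s)

assert np.allclose(U.conj().T @ U, np.eye(m))  # U is unitary
assert np.allclose(A, U @ Sigma @ Vh)          # A = U Sigma V^*

# The last m - n columns of U only multiply zero rows of Sigma,
# so dropping them recovers the reduced SVD without changing the product
assert np.allclose(A, U[:, :n] @ np.diag(s) @ Vh)
```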

As a final note, it is worth mentioning that $A$ need not actually be full rank. If the rank of $A$ is $r < \min(n,m)$, then $\Sigma$ has only $r$ positive entries on the diagonal, with the other diagonal entries equal to $0$. The geometry now determines only $r$ left singular vectors, so we complete $U$ with $m-r$ additional orthonormal columns, which meet the $m-r$ zero rows of $\Sigma$. Likewise, only $r$ right singular vectors are dictated by the geometry, so we must add $n-r$ orthonormal columns to complete $V$.
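To illustrate the rank-deficient case, the following sketch builds a $6 \times 4$ matrix of rank $2$ as a product of two random factors (an assumption made for the example) and counts the singular values above a small threshold. The threshold `tol` is an arbitrary illustrative value; in floating point the "zero" singular values come out as tiny positive numbers rather than exact zeros.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, r = 6, 4, 2
# An (m x r) times (r x n) product has rank at most r
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

U, s, Vh = np.linalg.svd(A)  # full SVD: U is 6 x 6, Vh is 4 x 4
print(s)                     # two values of order 1, two near machine precision

tol = 1e-10                  # illustrative cutoff, not a universal choice
num_positive = int(np.sum(s > tol))
assert num_positive == r

# Only the first r singular triplets contribute to A
A_from_r = U[:, :r] @ np.diag(s[:r]) @ Vh[:r, :]
assert np.allclose(A, A_from_r)
```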

As one last interpretation, imagine $A$ acting on $S$, and break the multiplication up like so:

$$
AS = ({ U } ({ \Sigma } (V ^ { * }S)))
$$

First, the unitary matrix $V ^ { * }$ preserves the sphere: it only rotates or reflects it. Next, $\Sigma$ stretches the sphere along the canonical basis directions by factors equal to the singular values, which produces a hyperellipse aligned with the coordinate axes. Lastly, the unitary matrix $U$ rotates or reflects this hyperellipse into our final hyperellipse.
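The three steps can be traced numerically on the unit circle. This is a minimal sketch assuming NumPy and an arbitrary $2 \times 2$ matrix chosen for illustration; the circle is sampled at a finite number of points, so the comparison of the longest and shortest image vectors with $\sigma_1$ and $\sigma_2$ is only approximate.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 2.0]])  # arbitrary 2 x 2 example
U, s, Vh = np.linalg.svd(A)
Sigma = np.diag(s)

# Columns of S are sample points on the unit circle
theta = np.linspace(0.0, 2.0 * np.pi, 2001)
S = np.vstack([np.cos(theta), np.sin(theta)])

step1 = Vh @ S         # rotate/reflect: still on the unit circle
step2 = Sigma @ step1  # stretch along the coordinate axes
step3 = U @ step2      # rotate/reflect into the final ellipse

assert np.allclose(np.linalg.norm(step1, axis=0), 1.0)
# Axis-aligned ellipse: (x / sigma_1)^2 + (y / sigma_2)^2 = 1
assert np.allclose((step2[0] / s[0]) ** 2 + (step2[1] / s[1]) ** 2, 1.0)
assert np.allclose(step3, A @ S)

# Longest and shortest image vectors approximate sigma_1 and sigma_2
norms = np.linalg.norm(A @ S, axis=0)
print(norms.max(), s[0])
print(norms.min(), s[1])
```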

### Existence and Uniqueness ###

See Trefethen for a proof that every matrix $A \in \mathbb { C } ^ { m \times n }$ has an SVD, and that the singular values are uniquely determined.


## Cool Facts about the SVD ##

**Theorem 5.1 (Trefethen):** *The rank of $A$ is $r$, the number of nonzero singular values*

First we should recall the theorem that $\operatorname { Rank } ( A B ) \leq \min ( \operatorname { Rank } ( A ) , \operatorname { Rank } ( B ) )$. The intuition for this is to note that the $i^{th}$ column of $AB$ is $A$ times the $i^{th}$ column of $B$. Therefore, all columns of $AB$ lie in the column space of $A$. The dimension of the column space (the rank) of $AB$ is thus *at most* the rank of $A$. Applying the same argument to the rows of $AB$, each of which is a row of $A$ times $B$, shows that the rank of $AB$ is also at most the rank of $B$.

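Now back to the SVD $A = { U } { \Sigma } V ^ { * }$. Applying the inequality to this product gives $\operatorname { rank } ( A ) \leq \operatorname { rank } ( \Sigma )$. Since $U$ and $V$ are unitary, and hence invertible, we can also write $\Sigma = U ^ { * } A V$, which gives $\operatorname { rank } ( \Sigma ) \leq \operatorname { rank } ( A )$. Furthermore, the rank of any diagonal matrix is just the number of nonzero diagonal entries. Therefore, $\operatorname { rank } ( A ) = \operatorname { rank } ( \Sigma ) = r$.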
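In floating point, "nonzero" has to be read as "above some tolerance". The sketch below counts the singular values of a small hand-built matrix (an illustrative example whose second row is twice the first) using the default tolerance documented for `np.linalg.matrix_rank`, namely the largest singular value times `max(m, n)` times machine epsilon.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # 2 * row 1, so the rank is 2
              [1.0, 0.0, 1.0]])

s = np.linalg.svd(A, compute_uv=False)  # singular values only

# Same default tolerance that np.linalg.matrix_rank documents
tol = s.max() * max(A.shape) * np.finfo(A.dtype).eps
r = int(np.sum(s > tol))

print(s)
print(r, np.linalg.matrix_rank(A))
assert r == np.linalg.matrix_rank(A)
```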

**Theorem 5.3 (Trefethen):** $\| A \| _ { 2 } = \sigma _ { 1 }$ and $\| A \| _ { F } = \sqrt { \sigma _ { 1 } ^ { 2 } + \sigma _ { 2 } ^ { 2 } + \cdots + \sigma _ { r } ^ { 2 } }$

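The $2$-norm result follows from the geometric picture: $\| A \| _ { 2 }$ is the largest factor by which $A$ stretches a unit vector, which is the length $\sigma_1$ of the longest principal semiaxis of $AS$. For the Frobenius norm of $A \in \mathbb { C } ^ { m \times n }$, we start with the definition: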

$$
\| A \| _ { F } = \left( \sum _ { i = 1 } ^ { m } \sum _ { j = 1 } ^ { n } \left| a _ { i j } \right| ^ { 2 } \right) ^ { 1 / 2 }
$$

We now note that the Frobenius norm can be written more simply in terms of the trace:

$$
\| A \| _ { F } = \sqrt { \operatorname { tr } \left( A ^ { * } A \right) } = \sqrt { \operatorname { tr } \left( A A ^ { * } \right) }
$$

To see this equality, consider first the quantity $\operatorname { tr } \left( A ^ { * } A \right)$. The element $(A ^ { * } A)_{ii}$ is just the inner product of the $i^{th}$ column of $A$ with itself, $\sum_j |a_{ji}|^2$.

Alternatively, the element $(A  A^ { * })_{ii}$ is the inner product of the $i^{th}$ row of $A$ with itself, $\sum_j |a_{ij}|^2$. Summing either quantity over $i$ adds up $|a_{ij}|^2$ over every entry of $A$, so $\sum_i (A ^ { * } A)_{ii} = \sum_i (A  A^ { * })_{ii} = \| A \| _ { F }^2$.

This formulation of the Frobenius norm in terms of the trace makes it easy to see some specific properties of this norm. For example, observe the effect of multiplying $A$ by a unitary matrix $Q \in \mathbb { C } ^ { m \times m }$.

$$
\| Q A \| _ { F } = \sqrt { \operatorname { tr } \left( (QA) ^ { * } QA \right) } = \sqrt { \operatorname { tr } \left( A^ { * }Q^ { * } QA \right) } = \sqrt { \operatorname { tr } \left( A ^ { * } A \right) } = \| A \| _ { F }
$$

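Thus, the Frobenius norm is invariant under left multiplication by unitary matrices. The same argument applied to $\operatorname { tr } \left( A A ^ { * } \right)$ shows that it is also invariant under right multiplication by a unitary matrix.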
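Both the trace identity and the unitary invariance are easy to test numerically. In this sketch the complex test matrix is random, and the unitary matrix `Q` is obtained from the QR factorization of another random complex matrix; both are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 4, 3
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

fro = np.linalg.norm(A, 'fro')

# ||A||_F^2 = tr(A^* A) = tr(A A^*); the traces are real up to rounding
assert np.isclose(fro ** 2, np.trace(A.conj().T @ A).real)
assert np.isclose(fro ** 2, np.trace(A @ A.conj().T).real)

# A unitary Q from the QR factorization of a random complex matrix
Z = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
Q, _ = np.linalg.qr(Z)
assert np.allclose(Q.conj().T @ Q, np.eye(m))

# Multiplying by Q leaves the Frobenius norm unchanged
assert np.isclose(np.linalg.norm(Q @ A, 'fro'), fro)
```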

Finally, we return to the SVD, $A = { U } { \Sigma } V ^ { * }$. Since $U$ and $V$ are unitary, $\| A \| _ { F }=\| \Sigma \| _ { F } = \sqrt { \sigma _ { 1 } ^ { 2 } + \sigma _ { 2 } ^ { 2 } + \cdots + \sigma _ { r } ^ { 2 } }$.
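A short check of both norm formulas, assuming NumPy and a random real test matrix: `np.linalg.norm(A, 2)` computes the matrix $2$-norm and `np.linalg.norm(A, 'fro')` the Frobenius norm.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3))  # illustrative test matrix

s = np.linalg.svd(A, compute_uv=False)  # singular values, descending

two_norm = np.linalg.norm(A, 2)
fro_norm = np.linalg.norm(A, 'fro')

assert np.isclose(two_norm, s[0])                     # ||A||_2 = sigma_1
assert np.isclose(fro_norm, np.sqrt(np.sum(s ** 2)))  # ||A||_F = sqrt(sum sigma_j^2)
```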

**Low Rank Approximations** (see Trefethen Theorem 5.8 for more detail)

A nice result is that $A$ can be written as the sum of $r$ rank-one matrices like so

$$
A = \sum _ { j = 1 } ^ { r } \sigma _ { j } u _ { j } v _ { j } ^ { * }
$$
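The rank-one expansion can be reproduced directly from the output of `np.linalg.svd`. In this sketch (random test matrix assumed), row `j` of `Vh` is $v_j^*$, so `np.outer(U[:, j], Vh[j])` forms $u_j v_j^*$.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 3))  # illustrative test matrix
U, s, Vh = np.linalg.svd(A, full_matrices=False)
r = len(s)  # a random 4 x 3 matrix has full rank 3 with probability 1

# sigma_j * u_j v_j^* for each j
terms = [s[j] * np.outer(U[:, j], Vh[j]) for j in range(r)]

# Each term has rank one, and together they sum to A
assert all(np.linalg.matrix_rank(T) == 1 for T in terms)
A_sum = sum(terms)
assert np.allclose(A, A_sum)
```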

We now define the $\nu^{th}$ partial sum for any $\nu$ with $0 \leq \nu \leq r$ as

$$
A _ { \nu } = \sum _ { j = 1 } ^ { \nu } \sigma _ { j } u _ { j } v _ { j } ^ { * }
$$

It turns out that the $\nu^{th}$ partial sum $A_{\nu}$ of $A$ is the best approximation to $A$ with rank $\leq \nu$ in terms of the Frobenius norm. More formally, we can write:

$$
\left\| A - A _ { \nu } \right\| _ { F } = \inf _ { B \in \mathbb { C } ^ { m \times n } \atop \operatorname { rank } ( B ) \leq \nu } \| A - B \| _ { F } = \sqrt { \sigma _ { \nu + 1 } ^ { 2 } + \cdots + \sigma _ { r } ^ { 2 } }
$$

As one can see, if the first $\nu$ singular values of $A$ are large relative to the remaining singular values, then a low-rank approximation can capture much of the Frobenius norm.
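The error formula can be checked for every $\nu$ at once. The sketch below assumes NumPy and a random $6 \times 5$ matrix; the helper `truncate` is a hypothetical name introduced here. The comparison against one random rank-$2$ matrix `B` is only a sanity check of the inequality, not a proof of optimality.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((6, 5))  # illustrative test matrix
U, s, Vh = np.linalg.svd(A, full_matrices=False)

def truncate(nu):
    """Return A_nu, the sum of the first nu rank-one terms."""
    return U[:, :nu] @ np.diag(s[:nu]) @ Vh[:nu, :]

errors = []
for nu in range(len(s) + 1):
    err = np.linalg.norm(A - truncate(nu), 'fro')
    predicted = np.sqrt(np.sum(s[nu:] ** 2))  # sqrt(sigma_{nu+1}^2 + ... + sigma_r^2)
    assert np.isclose(err, predicted)
    errors.append(err)

# Sanity check: a random matrix of rank <= 2 should not beat A_2
B = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 5))
assert np.linalg.norm(A - B, 'fro') >= errors[2]
```

The error decreases monotonically with $\nu$ and reaches zero (up to rounding) once all singular values are included.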

**Connection to eigenvectors and eigenvalues**

**Theorem 5.4 (Trefethen):** *The nonzero singular values of $A$ are the square roots of the nonzero eigenvalues of $A ^ { * } A$ or $A A ^ { * }$*

The proof for either $A ^ { * } A$ or $A A ^ { * }$ is quite simple. We start with $A ^ { * } A$:

$$
A ^ { * } A = \left( U \Sigma V ^ { * } \right) ^ { * } \left( U \Sigma V ^ { * } \right) = V \Sigma ^ { * } U ^ { * } U \Sigma V ^ { * } = V \left( \Sigma ^ { * } \Sigma \right) V ^ { * }
$$

Thus we see that $A ^ { * } A$ is similar to $\Sigma ^ { * } \Sigma$, and therefore they share the same eigenvalues. The eigenvalues of a diagonal matrix are just the diagonal entries, so the nonzero eigenvalues of $\Sigma ^ { * } \Sigma$ are $\sigma _ { 1 } ^ { 2 } , \sigma _ { 2 } ^ { 2 } , \ldots , \sigma _ { r } ^ { 2 }$ (the remaining $n-r$ eigenvalues are zero).

The same proof for $A A ^ { * }$ is also quite simple.

$$
 AA ^ { * } =  \left( U \Sigma V ^ { * } \right)\left( U \Sigma V ^ { * } \right) ^ { * } =  U \Sigma V ^ { * }V \Sigma ^ { * } U ^ { * } = U \left( \Sigma \Sigma ^ { * } \right) U ^ { * }
$$

Both of these proofs also reveal something interesting about the eigenvectors of $A ^ { * } A$ and $A A ^ { * }$. The columns of $V$, the *right singular vectors*, are the eigenvectors of $A ^ { * }A$, while the columns of $U$, the *left singular vectors*, are the eigenvectors of $AA ^ { * }$.
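Here is a small numerical illustration of the eigenvalue connection, assuming NumPy and a random real $5 \times 3$ matrix (so $A^*$ is just `A.T`). `np.linalg.eigvalsh` is used because $A^*A$ and $AA^*$ are symmetric; it returns eigenvalues in ascending order, so they are reversed before comparing.

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 3))  # illustrative real test matrix
U, s, Vh = np.linalg.svd(A, full_matrices=False)
V = Vh.T

# Eigenvalues of A^T A (3 x 3), sorted in descending order
evals_AtA = np.linalg.eigvalsh(A.T @ A)[::-1]
assert np.allclose(np.sqrt(evals_AtA), s)

# A A^T is 5 x 5: same three nonzero eigenvalues, plus two (numerical) zeros
evals_AAt = np.linalg.eigvalsh(A @ A.T)[::-1]
assert np.allclose(evals_AAt[:3], s ** 2)
assert np.allclose(evals_AAt[3:], 0.0)

# Columns of V are eigenvectors of A^T A; columns of U are eigenvectors of A A^T
assert np.allclose(A.T @ A @ V, V @ np.diag(s ** 2))
assert np.allclose(A @ A.T @ U, U @ np.diag(s ** 2))
```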

One can also easily prove that if $A$ is real symmetric (or, more generally, Hermitian, $A = A ^ { * }$), then the singular values of $A$ are the absolute values of the eigenvalues of $A$.
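This last fact can be sanity-checked on a random symmetric matrix. The construction `(B + B.T) / 2` below is just one convenient way to produce a symmetric test matrix and is an assumption of the example.

```python
import numpy as np

rng = np.random.default_rng(8)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2  # symmetrize a random matrix

evals = np.linalg.eigvalsh(A)           # real eigenvalues, possibly negative
s = np.linalg.svd(A, compute_uv=False)  # singular values, descending

# Sort |eigenvalues| in descending order to match the ordering of s
abs_evals = np.sort(np.abs(evals))[::-1]
assert np.allclose(abs_evals, s)
```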
