# Notes on the SVD (Singular Value Decomposition) #

- Much of this comes from Trefethen's excellent book.

## A geometric interpretation ## 

We start by noting this important fact from Trefethen:

"The image of the unit sphere under any $m \times n$ matrix is a hyperellipse."

Let's unpack that statement:

First, the *unit sphere* is the set of vectors in $\mathbb{R}^{n}$ whose Euclidean norm ($2$-norm) equals $1$.

A hyperellipse in $\mathbb{R}^{m}$ is the $m$-dimensional generalization of an ellipse. We can obtain a hyperellipse by stretching the unit sphere in $\mathbb{R}^{m}$ by factors $\sigma_1,\ldots,\sigma_m$ in the orthogonal directions $u_1,\ldots,u_m \in \mathbb{R}^{m}$.

For convenience, we take the $u_i$ to be unit vectors with $\|u_i\|_2 = 1$. The vectors $\{\sigma_i u_i\}$ are then the *principal semiaxes* of the hyperellipse, each with length equal to $\sigma_i$ (which could equal $0$). In particular, if the hyperellipse is the image of the unit sphere under a matrix $A$ of rank $r$, then exactly $r$ of these principal semiaxes have nonzero length.

Let's now give a simple picture for this geometric interpretation. We denote the unit sphere in $n$-space as $S$. The hyperellipse in $m$-space, which is the image of $S$ under the mapping $A \in \mathbb{R}^{m\times n}$, is then denoted simply as $AS$. A $2\times 2$ example from Trefethen is shown below:

<img src="img/Trefethen_2by2_SVD.png" width="600" height="600">

We assume for simplicity that $A \in \mathbb{R}^{m\times n}$ with $m \geq n$ has full rank $n$. The $n$ *singular values* $\sigma_1,\sigma_2,\ldots,\sigma_n$ of $A$ are then the lengths of the $n$ principal semiaxes of the hyperellipse $AS$. We usually number the singular values in descending order; thus, $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n > 0$.
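To make the geometric claim concrete, here is a minimal numerical sketch. The matrix `A` below is a hypothetical $2\times 2$ example chosen only for illustration (it is not necessarily the one in Trefethen's figure). We sample the unit circle, map it through `A`, and compare the longest and shortest radii of the image against the singular values that NumPy reports.

```python
import numpy as np

# Hypothetical 2x2 example matrix, chosen only for illustration.
A = np.array([[1.0, 2.0],
              [0.0, 2.0]])

# Sample points on the unit circle S in R^2 (each column is a unit vector).
theta = np.linspace(0.0, 2.0 * np.pi, 20001)
S = np.vstack([np.cos(theta), np.sin(theta)])

# Map every sampled point through A; the columns of AS trace out the ellipse.
AS = A @ S
radii = np.linalg.norm(AS, axis=0)

# The longest and shortest radii approximate the two semiaxis lengths.
longest, shortest = radii.max(), radii.min()

# Singular values computed by NumPy, returned in descending order.
sigma = np.linalg.svd(A, compute_uv=False)

print("semiaxis lengths from sampling:", longest, shortest)
print("singular values from NumPy:    ", sigma)
```

Because the circle is only sampled, the two sets of numbers should agree up to a small discretization error rather than exactly.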

Next, we define the $n$ *left singular vectors* of $A$ to be the unit vectors $\left\{ u_1, u_2, \dots, u_n \right\}$ pointing in the directions of the principal semiaxes. These vectors are ordered to correspond with their axis lengths $\left\{ \sigma_1, \sigma_2, \dots, \sigma_n \right\}$. The vector $\sigma_i u_i$ is thus the $i^{th}$ largest principal semiaxis of $AS$.

Lastly, we have the $n$ *right singular vectors* of $A$, which are unit vectors $\left\{ v_1, v_2, \ldots, v_n \right\} \subset S$ that are the *preimages* of the principal semiaxes of $AS$. This then means that

$$
A v_i = \sigma_i u_i, \quad 1 \leq i \leq n.
$$
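As a quick sanity check of this relation, the sketch below uses `np.linalg.svd` on a hypothetical random $5\times 3$ matrix (the sizes and seed are arbitrary assumptions). Note that NumPy returns $V^*$ rather than $V$, so we transpose before picking out the columns $v_i$.

```python
import numpy as np

# Hypothetical example: a random 5x3 matrix (full rank with probability 1).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# np.linalg.svd returns U, the singular values, and V^* (not V).
U, sigma, Vh = np.linalg.svd(A)
V = Vh.conj().T

# Check A v_i = sigma_i u_i for each of the n = 3 singular triplets.
for i in range(A.shape[1]):
    lhs = A @ V[:, i]
    rhs = sigma[i] * U[:, i]
    print(i, np.allclose(lhs, rhs))
```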

Putting this into a more compact matrix form, we arrive at the **Reduced SVD**, $A V = \hat { U } \hat { \Sigma }$:

<img src="img/Trefethen_reduced_SVD_matrices.png" width="600" height="600">

where $V$ is an $n\times n$ matrix with orthonormal columns, $\hat { U }$ is an $m \times n$ matrix with orthonormal columns, and $\hat { \Sigma }$ is a diagonal $n \times n$ matrix with the $\sigma_i$ values on the diagonal. Since $V$ is unitary (the complex equivalent of orthogonal), we can multiply both sides on the right by $V^*$ to obtain

$$
A = \hat { U } \hat { \Sigma } V ^ { * }
$$

The hats on the matrices signify that they are a part of the *reduced singular value decomposition*. This is the form of the SVD factorization of $A$ we usually use in practice.
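In NumPy, one way to obtain the reduced factors is to pass `full_matrices=False` to `np.linalg.svd`. The sketch below, on a hypothetical random $6\times 4$ matrix, checks the shapes described above and confirms that $\hat U \hat\Sigma V^*$ reproduces $A$ to floating-point accuracy.

```python
import numpy as np

# Hypothetical sizes and data, chosen only for illustration.
rng = np.random.default_rng(1)
m, n = 6, 4
A = rng.standard_normal((m, n))

# Reduced SVD: U_hat is m x n, sigma holds the n singular values, Vh is V^*.
U_hat, sigma, Vh = np.linalg.svd(A, full_matrices=False)
Sigma_hat = np.diag(sigma)

print(U_hat.shape, Sigma_hat.shape, Vh.shape)  # (6, 4) (4, 4) (4, 4)

# U_hat has orthonormal columns, and V is unitary (orthogonal here, since A is real).
print(np.allclose(U_hat.T @ U_hat, np.eye(n)))
print(np.allclose(Vh @ Vh.T, np.eye(n)))

# Reassemble A from the three reduced factors.
A_rebuilt = U_hat @ Sigma_hat @ Vh
print(np.allclose(A_rebuilt, A))
```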

We can turn this reduced SVD into the **Full SVD** by making $\hat U$ into a unitary matrix $U$. A unitary matrix must be square with orthonormal columns, so we append $m-n$ additional orthonormal columns to $\hat U$, each orthogonal to the columns already present.

To replace $\hat U \in \mathbb{R}^{m\times n}$ with $U \in \mathbb{R}^{m \times m}$, we must also change the size of $\hat{\Sigma}$. Ideally, we want the $m-n$ columns we added to $\hat U$ to have no effect on the product. We therefore append $m-n$ zero rows to $\hat{\Sigma} \in \mathbb{R}^{n\times n}$ to form $\Sigma \in \mathbb{R}^{m\times n}$. This gives the full SVD

$$
A = { U } { \Sigma } V ^ { * }
$$

where $U$ is $m\times m$ and unitary, $V$ is $n\times n$ and unitary, and $\Sigma$ is $m\times n$ and diagonal with nonnegative real entries on the diagonal. Let's compare the two approaches pictorially.

<img src="img/Trefethen_reduced_SVD_sizes.png" width="400" height="400">

<img src="img/Trefethen_full_SVD_sizes.png" width="400" height="400">
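The passage from the reduced to the full SVD can be sketched numerically. The example below is one possible construction, not the only one: it takes the extra $m-n$ orthonormal columns from a complete QR factorization of $\hat U$ (an assumption made here for convenience), pads $\hat\Sigma$ with zero rows, and checks that the product is unchanged. The sizes and random data are hypothetical.

```python
import numpy as np

# Hypothetical sizes and data, chosen only for illustration.
rng = np.random.default_rng(2)
m, n = 6, 4
A = rng.standard_normal((m, n))

# Start from the reduced SVD.
U_hat, sigma, Vh = np.linalg.svd(A, full_matrices=False)

# A complete QR factorization of U_hat gives an m x m orthogonal Q whose
# last m - n columns are orthonormal and orthogonal to the columns of U_hat.
Q, _ = np.linalg.qr(U_hat, mode="complete")
U = np.hstack([U_hat, Q[:, n:]])

# Pad Sigma_hat with m - n zero rows so the appended columns contribute nothing.
Sigma = np.vstack([np.diag(sigma), np.zeros((m - n, n))])

A_full = U @ Sigma @ Vh

print(U.shape, Sigma.shape)                # shapes of the full factors
print(np.allclose(U.T @ U, np.eye(m)))     # U is orthogonal
print(np.allclose(A_full, A))              # the product still equals A
```

In practice, calling `np.linalg.svd(A)` with its default `full_matrices=True` returns an $m\times m$ matrix $U$ directly; its trailing columns may differ from the ones built here, since the extra columns are not unique.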

As a final note, $A$ need not actually be full rank. If the rank of $A$ is $r < \min(m,n)$, then $\Sigma$ has only $r$ positive entries on its diagonal, and the remaining diagonal entries equal $0$. In this case only $r$ of the left singular vectors are determined by the geometry of the hyperellipse, so we fill out $U$ with $m-r$ additional orthonormal columns (rather than $m-n$). Likewise, we must now add $n-r$ orthonormal columns to $V$ to extend the $r$ columns dictated by the geometry.
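The rank-deficient case can also be seen numerically. In the sketch below we build a hypothetical $6\times 4$ matrix of rank $2$ as a product of two thin random factors, then count the singular values above a threshold. The relative threshold `1e-10` is an arbitrary assumption for this example; in floating point the "zero" singular values come out as tiny nonzero numbers rather than exact zeros.

```python
import numpy as np

# Hypothetical example: an m x n matrix of rank r built from two thin factors.
rng = np.random.default_rng(3)
m, n, r = 6, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

U, sigma, Vh = np.linalg.svd(A)

# Count singular values that are not negligibly small.
# The relative threshold below is an assumption made for this sketch.
tol = 1e-10 * sigma[0]
numerical_rank = int(np.sum(sigma > tol))
print("singular values:", sigma)
print("numerical rank: ", numerical_rank)

# Only the first r singular triplets are needed to reproduce A.
A_r = U[:, :r] @ np.diag(sigma[:r]) @ Vh[:r, :]
print(np.allclose(A_r, A))
```

The remaining $m-r$ columns of `U` and $n-r$ rows of `Vh` are still orthonormal, but they are multiplied by (numerically) zero singular values and so do not affect the product.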