# But what *is* the SVD?

In Lecture 5 of *Numerical Linear Algebra*, Trefethen and Bau offer an interesting geometric insight into what the SVD of a matrix is.

In [1]:
import numpy as np
from plotly import __version__
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go
import plotly.figure_factory as ff

%load_ext autoreload
%autoreload 2
init_notebook_mode(connected=True)
np.set_printoptions(suppress=True)

## Helper functions

In [2]:
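def unit_sphere(n=100, dim=3):
    # Draw n Gaussian samples in R^dim and scale each one to unit length,
    # so that every row is a point on the unit sphere.
    u = np.random.normal(0, 1, (n, dim))
    d = np.linalg.norm(u, ord=2, axis=1)

    return u / d[:,None]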

## Low-Rank Approximations of *A*

Computing the SVD of a matrix takes a single call to `np.linalg.svd`, but taking a closer look at the components of the SVD reveals some interesting properties of the matrix.

For now, let's stick to a 3-by-3 matrix.

In [3]:
A = np.array([[1, 4, 5], [-7, 3, 2], [4, 0, -3]])
u, s, vh = np.linalg.svd(A)

u, s, vh

(array([[ 0.36381461, -0.92988013,  0.05442306],
        [ 0.79676269,  0.34093355,  0.49893239],
        [-0.48250196, -0.13815663,  0.8649304 ]]),
 array([9.45721378, 5.90474739, 2.16681005]),
 array([[-0.75535271,  0.40662573,  0.51390446],
        [-0.65524251, -0.45670369, -0.60173001],
        [ 0.00997685,  0.79125044, -0.61141083]]))

Just looking at the numbers alone, we don't gain much understanding other than that *s* contains the singular values of *A*, while *u* and *vh* contain orthonormal bases (the left and right singular vectors of *A*).
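A quick numerical check can make the roles of the three outputs more concrete. The sketch below is an addition to the original cells; the variable names `u_orthogonal`, `vh_orthogonal`, `reconstructs`, and `matches_eig` are chosen here for illustration. It verifies that `u` and `vh` are orthogonal, that the factors multiply back to `A`, and that the entries of `s` are the square roots of the eigenvalues of $A^T A$.

```python
import numpy as np

A = np.array([[1, 4, 5], [-7, 3, 2], [4, 0, -3]])
u, s, vh = np.linalg.svd(A)

# The columns of u and the rows of vh are orthonormal.
u_orthogonal = np.allclose(u.T @ u, np.eye(3))
vh_orthogonal = np.allclose(vh @ vh.T, np.eye(3))

# Multiplying the three factors back together recovers A.
reconstructs = np.allclose(u @ np.diag(s) @ vh, A)

# The singular values are the square roots of the eigenvalues of A^T A.
# eigvalsh returns eigenvalues in ascending order, so reverse them.
eig = np.linalg.eigvalsh(A.T @ A)[::-1]
matches_eig = np.allclose(s, np.sqrt(eig))

print(u_orthogonal, vh_orthogonal, reconstructs, matches_eig)
```

Note that `s` comes back sorted in descending order, which is what lets the first term of the expansion below carry the largest share of $A$.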

One way to think of $A$ is as the sum of $r$ rank-one matrices:

$$\begin{equation}
A = \sum_{j=1}^r \sigma_j u_j v_j^T
\label{rank_one}
\end{equation} $$

and, for any $v\leq r$, define the $v^{th}$ partial sum (the index $v$ is unrelated to the vectors $v_j$)

$$\begin{equation}
A_v = \sum_{j=1}^v \sigma_j u_j v_j^T
\label{Av}
\end{equation}$$

In this definition, $A_v$ is the $v^{th}$ partial sum, which "captures as much of the energy of $A$ as possible." I will revisit this point later.
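One way to make the "energy" statement concrete, assuming energy is measured by the 2-norm or the Frobenius norm, is to compare the approximation error of each partial sum with the singular values that were left out. The sketch below is an illustration added here, not part of the original cells; `err_2` and `err_fro` are names chosen for this example. For the 2-norm the error should equal the first discarded singular value, and for the Frobenius norm it should equal the root-sum-of-squares of all discarded singular values.

```python
import numpy as np

A = np.array([[1, 4, 5], [-7, 3, 2], [4, 0, -3]], dtype=float)
u, s, vh = np.linalg.svd(A)

err_2, err_fro = [], []
for v in range(1, 3):
    # Partial sum of the first v rank-one terms sigma_j * u_j * v_j^T.
    A_v = (u[:, :v] * s[:v]) @ vh[:v]
    err_2.append(np.linalg.norm(A - A_v, ord=2))
    err_fro.append(np.linalg.norm(A - A_v, ord='fro'))
    # s[v] is the first discarded singular value (0-based indexing).
    print(v, err_2[-1], s[v], err_fro[-1], np.sqrt(np.sum(s[v:] ** 2)))
```

Each printed row pairs a measured error with the value predicted from the discarded singular values, so the two numbers in each pair should agree to rounding error.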

For now, let's take a look at the three rank-one matrices $A_i = \sigma_i u_i v_i^T$, the individual terms of Equation $\ref{rank_one}$.

In [4]:
Ai = [s[i] * np.outer(u[:,i], vh[i]) for i in range(3)]
Ai = np.stack(Ai, axis=0)

Ai

array([[[-2.59892131,  1.39906598,  1.76817693],
        [-5.69169986,  3.06398798,  3.87234979],
        [ 3.44676826, -1.85548374, -2.34500987]],

       [[ 3.5977448 ,  2.50762626,  3.30392334],
        [-1.31908603, -0.91940228, -1.2113586 ],
        [ 0.53453373,  0.37256973,  0.4908793 ]],

       [[ 0.00117651,  0.09330775, -0.07210027],
        [ 0.01078589,  0.8554143 , -0.66099119],
        [ 0.01869801,  1.48291401, -1.14586943]]])

By taking the sum of these matrices as in Equation $\ref{rank_one}$, we get $A$.

In [5]:
np.sum(Ai, axis=0), np.allclose(A, np.sum(Ai, axis=0))

(array([[ 1.,  4.,  5.],
        [-7.,  3.,  2.],
        [ 4.,  0., -3.]]), True)

Geometrically, $A$ maps the unit sphere to a hyperellipsoid, and the partial sums $A_v$ approximate that hyperellipsoid with hyperellipsoids of increasing dimension.

To visualize this, we first need to map a unit sphere in $\mathbb{R}^3$ into a hyperellipsoid described by $A$.

In [6]:
X = unit_sphere(100, 3)
Ax = np.dot(X, A.T)

fig = go.Figure(data=[go.Scatter3d(x=X[:,0], y=X[:,1], z=X[:,2],
                                   mode='markers', name='unit sphere')])
fig.add_trace(go.Scatter3d(x=Ax[:,0], y=Ax[:,1], z=Ax[:,2],
                           mode='markers', name='hyperellipsoid'))
fig.show()

If you've learned about the SVD, it's likely that you've seen this image before. Now, let's try looking at the low-rank approximations of this hyperellipsoid. Here, $A_i x$ denotes the $i^{th}$ rank-one matrix $A_i$ multiplied by $x$.

In [7]:
Aix = np.tensordot(X, Ai, axes=((1), (2)))

fig = go.Figure(data=[go.Scatter3d(x=X[:,0], y=X[:,1], z=X[:,2],
                                   mode='markers', name='unit sphere')])
fig.add_trace(go.Scatter3d(x=Ax[:,0], y=Ax[:,1], z=Ax[:,2],
                           mode='markers', name='hyperellipsoid'))

for i in range(3):
    fig.add_trace(go.Scatter3d(x=Aix[:,i,0], y=Aix[:,i,1], z=Aix[:,i,2],
                           mode='markers', name='A_'+str(i+1)+'x'))

fig.show()

We can see that the points $A_i x$ each trace out one principal axis of the hyperellipsoid, with $A_1 x$ lying along the longest axis.
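The plot can be backed up numerically. Since $A_i x = \sigma_i u_i (v_i^T x)$, every image point should be a multiple of $u_i$ with magnitude at most $\sigma_i$. The following sketch checks both properties on freshly sampled sphere points; the seeded generator `rng` and the lists `on_axis` and `within_sigma` are assumptions introduced for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1, 4, 5], [-7, 3, 2], [4, 0, -3]], dtype=float)
u, s, vh = np.linalg.svd(A)

# Sample points on the unit sphere in R^3.
X = rng.normal(size=(100, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)

on_axis, within_sigma = [], []
for i in range(3):
    A_i = s[i] * np.outer(u[:, i], vh[i])
    Y = X @ A_i.T            # each row is A_i x for one sphere point
    coeff = Y @ u[:, i]      # signed length of A_i x along u_i
    # Every image point is exactly coeff * u_i, i.e. it lies on the axis.
    on_axis.append(np.allclose(Y, np.outer(coeff, u[:, i])))
    # The length along the axis never exceeds sigma_i.
    within_sigma.append(bool(np.all(np.abs(coeff) <= s[i] + 1e-12)))

print(on_axis, within_sigma)
```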

Similarly, we can visualize increasingly better approximations of $A$ using Equation $\ref{Av}$.

In [8]:
fig = go.Figure(data=[go.Scatter3d(x=X[:,0], y=X[:,1], z=X[:,2],
                                   mode='markers', name='unit sphere')])

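# Partial sums A_1, A_1 + A_2, A_1 + A_2 + A_3, stacked along axis 0.
Av = np.stack([np.sum(Ai[:i+1,:,:], axis=0) for i in range(3)], axis=0)
# Avx[n, i] is the i-th partial sum applied to the n-th sphere point.
Avx = np.tensordot(X, Av, axes=((1), (2)))

for i in range(3):
    fig.add_trace(go.Scatter3d(x=Avx[:,i,0], y=Avx[:,i,1], z=Avx[:,i,2],
                           mode='markers', name='Av_'+str(i+1)+'x'))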

fig.show()

The figure very clearly shows how the approximation of $A$ by $A_v$ changes the corresponding hyperellipsoid.

$A_v$ when $v=1$ is simply the major axis of the hyperellipsoid. However, when $v=2$, the line transforms into an ellipse (2D ellipsoid) with the addition of the second-longest axis. And when $v=3$, the addition of the shortest axis transforms the ellipse into an ellipsoid.
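The line, ellipse, and ellipsoid correspond to partial sums of rank 1, 2, and 3. As a small sanity check added here (the tolerance `1e-10` and the name `ranks` are choices made for this sketch), the rank of each partial sum can be computed directly:

```python
import numpy as np

A = np.array([[1, 4, 5], [-7, 3, 2], [4, 0, -3]], dtype=float)
u, s, vh = np.linalg.svd(A)

# Rank-one terms, then their running sums A_1, A_1 + A_2, A_1 + A_2 + A_3.
Ai = np.stack([s[i] * np.outer(u[:, i], vh[i]) for i in range(3)], axis=0)
Av = np.cumsum(Ai, axis=0)

# An explicit tolerance keeps rounding noise from counting as extra rank.
ranks = [int(np.linalg.matrix_rank(M, tol=1e-10)) for M in Av]
print(ranks)
```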

## Image Compression

As pointed out by Trefethen and Bau, one application of this idea of low-rank approximations is image compression.

For this example, we have a binary image of the word "HELLO".
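The image itself is not reproduced here, so the following is only a minimal sketch using a hypothetical stand-in: a tiny 5-by-19 bitmap spelling "HELLO" in block letters (the `rows` strings are invented for this example and are not the original image). It builds rank-$k$ approximations from the SVD and reports the relative Frobenius error alongside the number of values needed to store the factors.

```python
import numpy as np

# Hypothetical stand-in bitmap: 'X' is an "on" pixel, '.' is "off".
rows = [
    "X.X.XXX.X...X...XXX",
    "X.X.X...X...X...X.X",
    "XXX.XX..X...X...X.X",
    "X.X.X...X...X...X.X",
    "X.X.XXX.XXX.XXX.XXX",
]
img = np.array([[c == "X" for c in row] for row in rows], dtype=float)

u, s, vh = np.linalg.svd(img, full_matrices=False)
rank = int(np.sum(s > 1e-10))  # numerical rank of the bitmap

errors = []
for k in range(1, rank + 1):
    # Keep only the k largest singular triplets.
    img_k = (u[:, :k] * s[:k]) @ vh[:k]
    rel_err = np.linalg.norm(img - img_k, ord="fro") / np.linalg.norm(img, ord="fro")
    errors.append(rel_err)
    # Storing k triplets takes k * (rows + cols + 1) numbers.
    stored = k * (img.shape[0] + img.shape[1] + 1)
    print(k, stored, round(float(rel_err), 3))
```

On a bitmap this small the storage savings are negligible; the point is only that the error falls as more singular triplets are kept and reaches zero once $k$ equals the rank.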

## Resources

[Numerical Linear Algebra by Lloyd N. Trefethen and David Bau III](https://www.amazon.com/Numerical-Linear-Algebra-Lloyd-Trefethen/dp/0898713617)