# 7.2.4 Truncated Singular Value Decomposition

In many real applications, it suffices to be able to reconstruct a matrix approximately.

Consider the spectral decomposition of the matrix $D$ based on the discussion in the previous section:

$$ D = Q\Sigma P^{T} = \sum_{r = 1}^{\min\{n,d\}} \sigma_{rr}q_{r}p_{r}^{T} $$

Instead of dropping only the additive components for which $\sigma_{rr} = 0$, we could also drop those components for which $\sigma_{rr}$ is very small.

In other words, we keep only the top-$k$ values of $\sigma_{rr}$ in the decomposition (as in compact SVD), except that $k$ may be smaller than the number of nonzero singular values.

In such a case, we obtain an approximation $D_{k}$ of the original matrix $D$, which is also referred to as the rank-$k$ approximation of the $n \times d$ matrix $D$:

$$D \approx D_{k} = \sum_{r = 1}^{k}\sigma_{rr}q_{r}p_{r}^{T} $$


Note that Equation 7.4 for truncated singular value decomposition is the same as that for compact singular value decomposition (cf. Equation 7.2); the only difference is that the value of $k$ is no longer chosen to ensure zero information loss. 

Consequently, we can express truncated singular value decomposition as a matrix factorization as follows:

$$ D \approx D_{k} = Q_{k}\Sigma_{k}P_{k}^{T} $$


Here, $Q_{k}$ is an $n \times k$ matrix with columns containing the top-$k$ left singular vectors, $\Sigma_{k}$ is a $k \times k$ diagonal matrix containing the top-$k$ singular values, and $P_{k}$ is a $d \times k$ matrix with columns containing the top-$k$ right singular vectors.

It is not difficult to see that the matrix $D_{k}$ has rank $k$, and it is therefore considered a low-rank approximation of $D$.
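The factorization above can be checked numerically. The following is a minimal sketch, assuming NumPy and a small random matrix chosen purely for illustration; the helper name `truncated_svd` and the sizes `n = 8`, `d = 5`, `k = 2` are assumptions, not part of the text. It builds $D_{k}$ both as the product $Q_{k}\Sigma_{k}P_{k}^{T}$ and as the sum of $k$ rank-1 matrices, and confirms that the result has rank $k$.

```python
import numpy as np

def truncated_svd(D, k):
    """Return (Q_k, sigma_k, P_k) holding the top-k singular triplets of D."""
    # full_matrices=False gives the economy SVD; NumPy returns the
    # singular values sorted in descending order.
    Q, sigma, Pt = np.linalg.svd(D, full_matrices=False)
    return Q[:, :k], sigma[:k], Pt[:k, :].T

rng = np.random.default_rng(0)
D = rng.standard_normal((8, 5))   # illustrative n x d matrix (n = 8, d = 5)
k = 2

Q_k, sigma_k, P_k = truncated_svd(D, k)

# Matrix-factorization form: D_k = Q_k Sigma_k P_k^T
D_k = Q_k @ np.diag(sigma_k) @ P_k.T

# Spectral form: sum of k rank-1 matrices sigma_rr * q_r * p_r^T
D_k_sum = sum(sigma_k[r] * np.outer(Q_k[:, r], P_k[:, r]) for r in range(k))

rank_D_k = np.linalg.matrix_rank(D_k)

# Frobenius-norm error of the approximation, compared with the
# discarded singular values.
sigma_all = np.linalg.svd(D, compute_uv=False)
error = np.linalg.norm(D - D_k)
discarded = np.sqrt(np.sum(sigma_all[k:] ** 2))
```

In this sketch the two constructions of $D_{k}$ agree, and the Frobenius-norm error equals the square root of the sum of the squared discarded singular values, which is why dropping only small singular values loses little.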

Almost all forms of matrix factorization, including singular value decomposition, are low-rank approximations of the original matrix. 

Truncated singular value decomposition can retain a surprisingly large level of accuracy using values of $k$ that are much smaller than $\min\{n, d\}$. 

This is because only a very small proportion of the singular values are large in real-world matrices. 

In such cases, $D_{k}$ becomes an excellent approximation of $D$ by retaining only the singular vectors that correspond to the few large singular values.
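The effect of decaying singular values on the approximation quality can be illustrated with a synthetic matrix. The sketch below is an assumption-laden toy: the matrix is constructed to have singular values $0.5^{r}$, a decay profile chosen for illustration rather than taken from real data, and `relative_error` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 60, 40

# Build an n x d matrix with prescribed singular values 0.5**r
# (an assumed, rapidly decaying spectrum).
Q, _ = np.linalg.qr(rng.standard_normal((n, d)))   # n x d, orthonormal columns
P, _ = np.linalg.qr(rng.standard_normal((d, d)))   # d x d, orthogonal
sigma = 0.5 ** np.arange(d)
D = (Q * sigma) @ P.T          # same as Q @ diag(sigma) @ P.T

def relative_error(D, k):
    """Relative Frobenius error of the rank-k truncated SVD of D."""
    Qf, s, Pt = np.linalg.svd(D, full_matrices=False)
    D_k = (Qf[:, :k] * s[:k]) @ Pt[:k, :]
    return np.linalg.norm(D - D_k) / np.linalg.norm(D)

errors = {k: relative_error(D, k) for k in (1, 2, 5, 10)}
for k, e in errors.items():
    print(f"k={k:2d}  relative Frobenius error = {e:.2e}")
```

With this spectrum the relative error shrinks roughly geometrically in $k$, so a value of $k$ far below $\min\{n, d\} = 40$ already reproduces the matrix closely. Real matrices decay less cleanly, so the numbers here should be read as qualitative only.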

A useful property of truncated singular value decomposition is that it is also possible to create a lower dimensional representation of the data by changing the basis to $P_{k}$, so that each $d$-dimensional data point is now represented in only $k$ dimensions. 

In other words, we change the axes so that the basis vectors correspond to the columns of $P_{k}$. 

This transformation is achieved by post-multiplying the data matrix $D$ with $P_{k}$ to obtain the $n \times k$ matrix $U_{k}$. 

By post-multiplying Equation 7.5 with $P_{k}$ and using $P^{T}_{k}P_{k} = I_{k}$, we obtain the following:

$$ U_{k} = DP_{k} = Q_{k}\Sigma_{k} $$


Each row of $U_{k}$ contains a reduced $k$-dimensional representation of the corresponding row in $D$. 

Therefore, we can obtain a reduced representation of the data either by post-multiplying the data matrix with the matrix containing the dominant right singular vectors (i.e., using $DP_{k}$), or we can simply scale the dominant left singular vectors with the singular values (i.e., using $Q_{k}\Sigma_{k}$). 
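The equivalence of the two routes to the reduced representation can be confirmed on a small example. This is a minimal sketch assuming NumPy and an arbitrary random $6 \times 4$ matrix with $k = 2$; the variable names `U_right` and `U_left` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
D = rng.standard_normal((6, 4))   # illustrative n x d data matrix
k = 2

Q, sigma, Pt = np.linalg.svd(D, full_matrices=False)
Q_k, sigma_k, P_k = Q[:, :k], sigma[:k], Pt[:k, :].T

# Route 1: post-multiply the data matrix with the top-k right singular vectors.
U_right = D @ P_k

# Route 2: scale each top-k left singular vector by its singular value.
# Broadcasting multiplies column r of Q_k by sigma_k[r], i.e. Q_k @ diag(sigma_k).
U_left = Q_k * sigma_k

same = np.allclose(U_right, U_left)
```

Both routes yield the same $n \times k$ matrix up to floating-point error. Which one is cheaper in practice depends on which set of singular vectors is already available.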

Both these types of methods are used in real applications, depending on whether $n$ or $d$ is larger.

The reduction in dimensionality can be very significant in some domains such as images and text. 

Image data are often represented by matrices of numbers corresponding to pixels.

For example, an image corresponding to an 807 × 611 matrix of numbers is illustrated in Figure 7.1(a). 

Only the first 75 singular values are represented in Figure 7.1(b). 

The remaining 611 − 75 = 536 singular values are not shown because they are very small. 

*Figure 7.1: (a) an image corresponding to an 807 × 611 matrix of numbers; (b) its first 75 singular values.*

The rapid decay in singular values is quite evident in the figure. 

It is this rapid decay that enables effective truncation without significant loss of accuracy. 

In the text domain, each document is represented as a row in a matrix with as many dimensions as the number of words. 

The value of each entry is the frequency of the word in the corresponding document. 

Note that this matrix is sparse, which is a standard use-case for SVD. 

The word-frequency matrix $D$ might have $n = 10^{6}$ and $d = 10^{5}$. 

In such cases, truncated SVD might often yield excellent approximations of the matrix by using $k \approx 400$. 

This represents a drastic level of reduction in the dimensionality of representation. 
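To put the reduction in numbers, the following back-of-the-envelope sketch assumes $n = 10^{6}$ documents, $d = 10^{5}$ words, and $k = 400$. It counts entries as if the matrices were stored densely, which is an assumption made only for comparison, since the actual word-frequency matrix is sparse.

```python
n, d, k = 10**6, 10**5, 400

# Dimensionality of each document before and after the reduction.
reduction_factor = d // k          # each row shrinks from d to k coordinates

# Entry counts, treating every matrix as dense (an assumption for comparison).
full_entries = n * d               # original n x d matrix
reduced_entries = n * k            # reduced representation U_k = D P_k
factor_entries = k * (n + d + 1)   # Q_k, P_k, and the k singular values

print(reduction_factor, full_entries, reduced_entries, factor_entries)
```

Under these assumptions each document is described by 250 times fewer coordinates than before.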

The use of SVD in text is also referred to as latent semantic analysis because of its ability to discover latent (hidden) topics represented by the rank-1 matrices of the spectral decomposition.
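A toy example can make the idea of latent topics concrete. The sketch below uses a made-up word-frequency matrix with five documents and six words, where the first three documents use only linear-algebra words and the last two use only sports words. The vocabulary, the counts, and $k = 2$ are all illustrative assumptions. Dense NumPy arrays are used for brevity, whereas a realistic corpus would call for a sparse representation.

```python
import numpy as np

words = ["matrix", "vector", "rank", "goal", "team", "score"]

# Rows are documents, columns are word frequencies (made-up toy data).
D = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [1, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
], dtype=float)
k = 2

Q, sigma, Pt = np.linalg.svd(D, full_matrices=False)
P_k = Pt[:k, :].T      # d x k: each column weights the words of one latent topic
U_k = D @ P_k          # n x k: each row is a document in topic coordinates

# For each latent component, list the three words with the largest weight.
# Absolute values are used because singular vectors are defined only up to sign.
topics = []
for r in range(k):
    top = np.argsort(-np.abs(P_k[:, r]))[:3]
    topics.append({words[j] for j in top})

print(topics)
print(np.round(U_k, 3))
```

In this toy setting each of the two retained components loads on the words of a single subject, and each document has a nonzero coordinate only along the component for its own subject. Real corpora mix vocabulary across documents, so the recovered topics are rarely this clean.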