# Singular Value Decomposition - Algorithms and Error Analysis

---

We study only algorithms for real matrices, which are most commonly used in the applications described in this course. 


For more details, see 
[A. Kaylor Cline and I. Dhillon, Computation of the Singular Value Decomposition][Hog14] and the references therein.

[Hog14]: #1 "L. Hogben, ed., 'Handbook of Linear Algebra', pp. 58.1-58.13, CRC Press, Boca Raton, 2014."


## Prerequisites

The reader should be familiar with facts about the singular value decomposition and perturbation theory, and with algorithms for the symmetric eigenvalue decomposition.

 
## Competences 

The reader should be able to apply an appropriate algorithm to a given problem, and to assess the accuracy of the solution.

---

## Basics

### Definitions 

The __singular value decomposition__ (SVD) of $A\in\mathbb{R}^{m\times n}$ is
$A=U\Sigma V^T$, where $U\in\mathbb{R}^{m\times m}$ is orthogonal, $U^TU=UU^T=I_m$, 
$V\in\mathbb{R}^{n\times n}$ is orthogonal, $V^TV=VV^T=I_n$, and 
$\Sigma \in\mathbb{R}^{m\times n}$ is diagonal with singular values 
$\sigma_1,\ldots,\sigma_{\min\{m,n\}}$ on the diagonal. 

If $m>n$, the __thin SVD__ of $A$ is $A=U_{1:m,1:n} \Sigma_{1:n,1:n} V^T$.
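
The difference between the full and the thin SVD can be checked on a small example. The following is a minimal sketch which assumes Julia 1.x with the `LinearAlgebra` standard library (and its `full` keyword of `svd()`), rather than the older syntax used in the cells of this notebook. The matrix `A` is an arbitrary illustration.

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0; 5.0 6.0]      # m = 3, n = 2
m, n = size(A)

# Full SVD: U is m×m, and Σ must be padded to an m×n matrix
F = svd(A; full = true)
Σ = zeros(m, n)
for i in 1:n
    Σ[i, i] = F.S[i]
end
full_residual = norm(A - F.U * Σ * F.Vt)

# Thin SVD (the default): U is m×n, and Σ is the n×n diagonal matrix
G = svd(A)
thin_residual = norm(A - G.U * Diagonal(G.S) * G.Vt)

println(size(F.U), " ", size(G.U))
println(full_residual, " ", thin_residual)
```

Both residuals should be of the order of machine precision. Only the number of columns of $U$ differs between the two factorizations.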

### Facts

1. Algorithms for computing the SVD of $A$ are modifications of algorithms for the symmetric eigenvalue decomposition of the matrices $AA^T$, $A^TA$ and 
$\begin{bmatrix} 0 & A\\ A^T & 0 \end{bmatrix}$.

2. The most commonly used approach is the three-step algorithm:
    1. Reduce $A$ to bidiagonal matrix $B$ by orthogonal transformations, $X^TAY=B$.
    2. Compute the SVD of $B$ with QR iterations, $B=W\Sigma Z^T$.
    3. Multiply $U=XW$ and $V=YZ$.

3. If $m\geq n$, the overall operation count for this algorithm is $O(mn^2)$.

4. __Error bounds__: Let $U\Sigma V^T$ and $\tilde U \tilde \Sigma \tilde V^T$ be the
exact and the computed SVDs of $A$, respectively. The algorithms generally
compute the SVD with errors bounded by
$$
|\sigma_i-\tilde \sigma_i|\leq \phi \epsilon\|A\|_2,
\qquad
\|u_i-\tilde u_i\|_2, \| v_i-\tilde v_i\|_2 \leq \psi\epsilon \frac{\|A\|_2}
{\min_{j\neq i} 
|\sigma_i-\tilde \sigma_j|},
$$
where $\epsilon$ is machine precision and $\phi$ and $\psi$
are slowly growing polynomial functions of
$n$ which depend upon the algorithm used (typically $O(n)$ or $O(n^2)$).
These bounds are obtained by combining perturbation bounds with the floating-point error analysis of the algorithms.
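
The connection from Fact 1 between the SVD and the three symmetric eigenvalue problems can be illustrated numerically. The sketch below assumes Julia 1.x with the `LinearAlgebra` standard library. The test matrix and the variable names (`σ_from_AtA`, `σ_from_J`) are illustrative only.

```julia
using LinearAlgebra

A = [1.0 2.0 3.0; 4.0 5.0 6.0; 7.0 8.0 10.0; 1.0 0.0 1.0]
m, n = size(A)
σ = svdvals(A)                                  # sorted in descending order

# Eigenvalues of A'A are the squared singular values
λ = eigvals(Symmetric(A' * A))                  # sorted in ascending order
σ_from_AtA = sqrt.(max.(reverse(λ), 0.0))

# Eigenvalues of [0 A; A' 0] are ±σ_i together with m-n zeros
J = Symmetric([zeros(m, m) A; A' zeros(n, n)])
μ = eigvals(J)                                  # sorted in ascending order
σ_from_J = reverse(μ[end-n+1:end])

println(norm(σ - σ_from_AtA), " ", norm(σ - σ_from_J))
```

Forming $A^TA$ explicitly squares the condition number, which is one reason why practical algorithms work with $A$ (or its bidiagonal form) directly.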

##  Bidiagonalization


### Facts

1. The reduction of $A$ to a bidiagonal matrix can be performed by applying 
$\min\{m-1,n\}$ Householder reflections $H_L$ from the left and $n-2$ Householder reflections $H_R$ from the right. In the first step, $H_L$ is chosen to annihilate all elements of the first column below the diagonal, and $H_R$ is chosen to annihilate all elements of the first row to the right of the superdiagonal. Applying this procedure recursively yields the bidiagonal matrix.

3. $H_L$ and $H_R$ do not depend on the normalization of the respective Householder 
vectors $v_L$ and $v_R$. With the normalization $[v_L]_1=[v_R]_1=1$, the vectors $v_L$ are stored in the lower-triangular part of $A$, and the vectors $v_R$ are stored in the upper-triangular part of $A$ above the super-diagonal. 

4. The matrices $H_L$ and $H_R$ are not formed explicitly. Given $v_L$ and $v_R$, $A$ is overwritten with $H_L A H_R$ in $O(mn)$ operations by using matrix-vector multiplication and rank-one updates.

5. Instead of performing rank-one updates, $p$ transformations can be accumulated and then applied together. This __block algorithm__ is rich in matrix-matrix multiplications (roughly one
half of the operations are performed using BLAS 3 routines), but
it requires extra workspace.

6. If the matrices $X$ and $Y$ are needed explicitly, they can be computed from the 
stored Householder vectors.
In order to minimize the operation count, the computation
starts from the smallest matrix and the size is gradually
increased.

7. The backward error bounds for the bidiagonalization are as follows: 
The computed matrix $\tilde B$ is equal to the matrix which
would be obtained by exact bidiagonalization of some perturbed matrix $A+\Delta A$, 
where $\|\Delta A\|_2 \leq \psi \varepsilon \|A\|_2$ and $\psi$ is a
slowly increasing function of $n$.
The computed matrices $\tilde X$ and $\tilde Y$ satisfy $\tilde X=X+\Delta X$ and 
$\tilde Y=Y+\Delta Y$, where
$\|\Delta X \|_2,\|\Delta Y\|_2\leq \phi \varepsilon$ 
and $\phi$ is a slowly increasing function of $n$.

12. The bidiagonal reduction is implemented in the 
[LAPACK](http://www.netlib.org/lapack) subroutine 
[DGEBRD](http://www.netlib.org/lapack/explore-html/dd/d9a/group__double_g_ecomputational.html#ga9c735b94f840f927f8085fd23f3ee2e6).
The computation of $X$ and $Y$ is implemented in the subroutine
[DORGBR](http://www.netlib.org/lapack/lapack-3.1.1/html/dorgtr.f.html), which is not yet wrapped in Julia.

8. Bidiagonalization can also be performed using Givens rotations.
Givens rotations act more selectively than Householder reflectors, and
are useful if $A$ has some special structure, for example, if $A$ is a banded matrix. 
Error bounds for function `myBidiagG()` are the same as above, 
but with slightly different functions $\psi$ and $\phi$.
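
The Householder reduction from Facts 1 and 4 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the LAPACK implementation: it assumes Julia 1.x with the `LinearAlgebra` standard library and $m\geq n$, the helper names `reflector` and `bidiag_householder` are hypothetical, the Householder vectors are not normalized or stored in $A$, and $X$ and $Y$ are accumulated explicitly for clarity.

```julia
using LinearAlgebra

# Householder vector v and scalar β such that (I - β v v') x is a multiple of e₁
function reflector(x::Vector{Float64})
    v = copy(x)
    nx = norm(x)
    nx == 0 && return v, 0.0
    v[1] += copysign(nx, x[1])       # sign chosen to avoid cancellation
    return v, 2 / dot(v, v)
end

# Returns X, B, Y with X'*A*Y ≈ B upper bidiagonal (assumes m ≥ n)
function bidiag_householder(A::Matrix{Float64})
    m, n = size(A)
    B = copy(A)
    X = Matrix{Float64}(I, m, m)
    Y = Matrix{Float64}(I, n, n)
    for j = 1:n
        if j < m
            # H_L annihilates column j below the diagonal
            v, β = reflector(B[j:m, j])
            B[j:m, :] -= β * v * (v' * B[j:m, :])     # rank-one update
            X[:, j:m] -= β * (X[:, j:m] * v) * v'
        end
        if j <= n - 2
            # H_R annihilates row j to the right of the superdiagonal
            v, β = reflector(B[j, j+1:n])
            B[:, j+1:n] -= β * (B[:, j+1:n] * v) * v'
            Y[:, j+1:n] -= β * (Y[:, j+1:n] * v) * v'
        end
    end
    return X, B, Y
end

# The 8x5 matrix used in the cells of this notebook
A = [-9.0 -6.0 -9.0 -8.0 -9.0;
      1.0 -4.0 -9.0  2.0  8.0;
      2.0 -3.0  2.0  6.0 -2.0;
      3.0  9.0  3.0 -9.0 -6.0;
      7.0  6.0 -1.0 -6.0 -8.0;
      9.0  1.0 -1.0  5.0  0.0;
     -6.0  2.0  5.0  4.0 -5.0;
      9.0 -2.0  7.0  0.0 -7.0]
X, B, Y = bidiag_householder(A)
println(norm(X' * A * Y - B))
```

The annihilated entries of `B` are not set to exact zeros here, so they contain rounding errors of the order of machine precision. A production code would store the vectors in place of the zeros, as described in Fact 3.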

In [1]:
m=8
n=5
A=map(Float64,rand(-9:9,m,n))

8x5 Array{Float64,2}:
 -9.0  -6.0  -9.0  -8.0  -9.0
  1.0  -4.0  -9.0   2.0   8.0
  2.0  -3.0   2.0   6.0  -2.0
  3.0   9.0   3.0  -9.0  -6.0
  7.0   6.0  -1.0  -6.0  -8.0
  9.0   1.0  -1.0   5.0   0.0
 -6.0   2.0   5.0   4.0  -5.0
  9.0  -2.0   7.0   0.0  -7.0

In [2]:
?LAPACK.gebrd!

```
gebrd!(A) -> (A, d, e, tauq, taup)
```

Reduce `A` in-place to bidiagonal form `A = QBP'`. Returns `A`, containing the bidiagonal matrix `B`; `d`, containing the diagonal elements of `B`; `e`, containing the off-diagonal elements of `B`; `tauq`, containing the elementary reflectors representing `Q`; and `taup`, containing the elementary reflectors representing `P`.


In [3]:
# We need copy()
Out=LAPACK.gebrd!(copy(A))

(
8x5 Array{Float64,2}:
 18.4932      -7.79789       0.431822    0.160875    -0.0931382
 -0.0363726  -16.4195       -5.55048    -0.334302    -0.432763 
 -0.0727451   -0.000921678  16.7671    -11.119        0.48751  
 -0.109118    -0.163374      0.335752  -11.0792      -7.12552  
 -0.254608     0.0751739     0.544518    0.400533     9.13753  
 -0.327353     0.182766      0.10458     0.107658     0.491441 
  0.218235    -0.401176     -0.306529    0.00780384   0.536132 
 -0.327353     0.0519265     0.671579   -0.271868    -0.355809 ,

[18.49324200890693,-16.419500108180916,16.767114699176663,-11.079206500900566,9.137525613844675],[-7.797885453368729,-5.5504772118429,-11.119030325076256,-7.125520306323141,0.0],[1.4866642633922875,1.6268301190913301,1.017736704498735,1.6051494500235832,1.2080563870985905],[1.637966571548049,1.5395958912413656,1.615944291304213,0.0,0.0])

In [4]:
B=Bidiagonal(Out[2],Out[3][1:end-1],true)

5x5 Bidiagonal{Float64}:
 18.4932   -7.79789   0.0        0.0      0.0    
  0.0     -16.4195   -5.55048    0.0      0.0    
  0.0       0.0      16.7671   -11.119    0.0    
  0.0       0.0       0.0      -11.0792  -7.12552
  0.0       0.0       0.0        0.0      9.13753

In [5]:
svdvals(A), svdvals(B)

([23.05690050927775,20.9332244124872,14.552987762575267,12.111135956464679,6.05890977223462],[23.05690050927775,20.9332244124872,14.552987762575267,12.111135956464679,6.05890977223462])

In [6]:
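# Extract X from the Householder vectors stored below the diagonal of H
function myBidiagX{T}(H::Array{T})
    m,n=size(H)
    X = eye(T,m,n)
    v=Array(T,m)
    # Accumulate the reflectors backward, starting from the smallest block
    for j = n : -1 : 1
        # Householder vector with the normalization v[j]=1
        v[j] = one(T)
        v[j+1:m] = H[j+1:m, j]
        γ = -2 / (v[j:m]⋅v[j:m])
        # Rank-one update: X[j:m,j:n] = (I + γ*v*v') * X[j:m,j:n]
        w = γ * X[j:m, j:n]'*v[j:m]
        X[j:m, j:n] = X[j:m, j:n] + v[j:m]*w'
    end
    X
end

# Extract Y from the Householder vectors stored in H,
# where H is the transpose of the output of gebrd!
function myBidiagY{T}(H::Array{T})
    n,m=size(H)
    Y = eye(T,n)
    v=Array(T,n)
    # Only n-2 reflectors are applied from the right
    for j = n-2 : -1 : 1
        # Householder vector with the normalization v[j+1]=1
        v[j+1] = one(T)
        v[j+2:n] = H[j+2:n, j]
        γ = -2 / (v[j+1:n]⋅v[j+1:n])
        # Rank-one update of the trailing block of Y
        w = γ * Y[j+1:n, j+1:n]'*v[j+1:n]
        Y[j+1:n, j+1:n] = Y[j+1:n, j+1:n] + v[j+1:n]*w'
    end
    Y
end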

myBidiagY (generic function with 1 method)

In [7]:
X=myBidiagX(Out[1])

8x5 Array{Float64,2}:
 -0.486664   -0.434465   -0.671704   -0.0178347  -0.269437 
  0.0540738  -0.611028    0.294346   -0.148881   -0.0275076
  0.108148    0.0331046   0.0308777   0.280724   -0.464869 
  0.162221    0.31319    -0.31251    -0.485002    0.304868 
  0.378517   -0.011677   -0.362864   -0.497537   -0.233032 
  0.486664   -0.155106    0.16278    -0.16509    -0.516462 
 -0.324443    0.557829    0.0570937  -0.0420409  -0.539626 
  0.486664    0.0577478  -0.44959     0.622028    0.0732598

In [8]:
Y=myBidiagY(Out[1]')

5x5 Array{Float64,2}:
 1.0   0.0        0.0        0.0        0.0     
 0.0  -0.637967   0.347683   0.619034   0.298182
 0.0  -0.707311  -0.389459  -0.574597   0.133685
 0.0  -0.263508   0.570624  -0.234905  -0.741466
 0.0   0.152557   0.633898  -0.481098   0.586041

In [9]:
# Orthogonality
norm(X'*X-I), norm(Y'*Y-I)

(6.081307256636832e-16,2.647317538811372e-16)

In [10]:
# Error
X'*A*Y-B

5x5 Array{Float64,2}:
  3.55271e-15  -2.66454e-15   5.55112e-16  -1.88738e-15  -2.22045e-16
  1.11022e-16  -3.55271e-15  -8.88178e-16  -8.88178e-16  -1.77636e-15
  3.55271e-15  -1.33227e-15   0.0          -3.55271e-15  -1.77636e-15
 -8.88178e-16  -1.05471e-15  -2.55351e-15  -1.77636e-15  -3.55271e-15
  9.99201e-16  -1.11022e-15   1.77636e-15   0.0           0.0        

In [11]:
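# Bidiagonalization using Givens rotations
function myBidiagG{T}(A::Array{T})
    m,n=size(A)
    X=eye(T,m,m)
    Y=eye(T,n,n)
    for j = 1 : n        
        # Rotations from the left annihilate column j below the diagonal
        for i = j+1 : m
            G,r=givens(A,j,i,j)
            A=G*A
            X=G*X
        end
        # Rotations from the right annihilate row j right of the superdiagonal
        for i=j+2:n
            G,r=givens(A',j+1,i,j)
            A=A*G'
            Y*=G'
        end
    end
    # X was accumulated as the product of left rotations, so return its transpose
    X',Bidiagonal(diag(A),diag(A,1),true), Y
end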

myBidiagG (generic function with 1 method)

In [12]:
X1, B1, Y1 = myBidiagG(A)

(
8x8 Array{Float64,2}:
  0.486664    0.434465   -0.671704   …  -0.0259927  -0.221603   -0.0226884
 -0.0540738   0.611028    0.294346      -0.362203    0.215523    0.580102 
 -0.108148   -0.0331046   0.0308777     -0.648687    0.268326   -0.445592 
 -0.162221   -0.31319    -0.31251       -0.593123   -0.295628    0.102816 
 -0.378517    0.011677   -0.362864       0.274054    0.587808   -0.0494786
 -0.486664    0.155106    0.16278    …   0.14296    -0.63005    -0.0348522
  0.324443   -0.557829    0.0570937      0.0         0.0506539   0.533643 
 -0.486664   -0.0577478  -0.44959        0.0         0.0         0.406702 ,

5x5 Bidiagonal{Float64}:
 diag: -18.4932  16.4195  16.7671  11.0792  9.13753
 super: 7.79789  5.55048  11.119  -7.12552,

5x5 Array{Float64,2}:
 1.0   0.0        0.0        0.0        0.0     
 0.0  -0.637967   0.347683  -0.619034   0.298182
 0.0  -0.707311  -0.389459   0.574597   0.133685
 0.0  -0.263508   0.570624   0.234905  -0.741466
 0.0   0.152557   0.633898   0.481

In [13]:
# Orthogonality
norm(X1'*X1-I), norm(Y1'*Y1-I)

(1.032766086574982e-15,6.521546438536692e-16)

In [14]:
# Error
X1'*A*Y1

8x5 Array{Float64,2}:
 -18.4932        7.79789       1.22125e-15   2.44249e-15  -8.88178e-16
  -7.77156e-16  16.4195        5.55048       1.33227e-15  -2.66454e-15
  -8.88178e-16   1.33227e-15  16.7671       11.119         0.0        
  -8.88178e-16   7.77156e-16  -2.22045e-16  11.0792       -7.12552    
  -2.22045e-16  -1.9984e-15   -8.88178e-16  -8.88178e-16   9.13753    
   2.22045e-16   7.52168e-16   8.28926e-16   1.8577e-16    1.02841e-15
   1.22125e-15   9.68503e-16   3.54737e-16  -1.37317e-15   1.78392e-17
  -1.77636e-15   2.80235e-16   9.57071e-16  -6.21728e-17   1.2162e-15 

In [15]:
# X, Y and B are not unique
B

5x5 Bidiagonal{Float64}:
 18.4932   -7.79789   0.0        0.0      0.0    
  0.0     -16.4195   -5.55048    0.0      0.0    
  0.0       0.0      16.7671   -11.119    0.0    
  0.0       0.0       0.0      -11.0792  -7.12552
  0.0       0.0       0.0        0.0      9.13753

## Bidiagonal QR method

Let $B$ be a real upper-bidiagonal matrix of order $n$ and let 
$B=W\Sigma Z^T$ be its SVD.

All methods for computing the SVD of a bidiagonal matrix are derived from the methods 
for computing the EVD of the tridiagonal matrix $T=B^T B$.


### Facts

1. The shift $\mu$ is the eigenvalue of the $2\times 2$ matrix $T_{n-1:n,n-1:n}$ which is closer to $T_{n,n}$. The first Givens rotation from the right is the one which annihilates 
the element $(1,2)$ of the shifted $2\times 2$ matrix $T_{1:2,1:2}-\mu I$. Applying this rotation to $B$ creates a bulge at the element $B_{2,1}$. This bulge is subsequently chased out by applying suitable Givens rotations, alternating from the left and from the right.
This is the __Golub-Kahan algorithm__.

2. The computed SVD satisfies the error bounds from Fact 4 of the Basics section.

3. A special variant of the zero-shift QR algorithm (the __Demmel-Kahan algorithm__) computes the singular values with high relative accuracy. 

4. The tridiagonal divide-and-conquer method, bisection and inverse iteration, and the MRRR method can also be adapted for bidiagonal matrices. 

5. The zero-shift QR algorithm for bidiagonal matrices is implemented in the LAPACK routine 
[DBDSQR](http://www.netlib.org/lapack/explore-html/db/dcc/dbdsqr_8f.html). It is also used in the function `svdvals()`. The divide-and-conquer algorithm for bidiagonal matrices is implemented in the LAPACK routine 
[DBDSDC](http://www.netlib.org/lapack/explore-html/d9/d08/dbdsdc_8f.html). However, this algorithm also calls the zero-shift QR algorithm to compute singular values.
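
One step of the Golub-Kahan algorithm from Fact 1 can be sketched as follows. This is an illustration under stated assumptions rather than the LAPACK code: it assumes Julia 1.x with the `LinearAlgebra` standard library, stores $B$ as a dense matrix, forms $T=B^TB$ explicitly (which practical implementations avoid), and omits deflation and the handling of zero diagonal entries. The helper names `rot` and `golub_kahan_step` are hypothetical.

```julia
using LinearAlgebra

# Coefficients (c, s) of the plane rotation [c s; -s c] mapping (y, z) to (r, 0)
function rot(y, z)
    r = hypot(y, z)
    r == 0 ? (one(r), zero(r)) : (y / r, z / r)
end

# One implicit shifted QR step on an upper bidiagonal matrix of order n ≥ 2
function golub_kahan_step(B::Matrix{Float64})
    n = size(B, 1)
    B = copy(B)
    T = B' * B
    # Shift: eigenvalue of the trailing 2x2 block of T closer to T[n,n]
    λ = eigvals(Symmetric(T[n-1:n, n-1:n]))
    μ = abs(λ[1] - T[n, n]) < abs(λ[2] - T[n, n]) ? λ[1] : λ[2]
    y, z = T[1, 1] - μ, T[1, 2]
    for k = 1:n-1
        # Rotation from the right in columns k, k+1; creates a bulge at B[k+1,k]
        c, s = rot(y, z)
        B[:, k:k+1] = B[:, k:k+1] * [c -s; s c]
        # Rotation from the left in rows k, k+1; annihilates B[k+1,k]
        c, s = rot(B[k, k], B[k+1, k])
        B[k:k+1, :] = [c s; -s c] * B[k:k+1, :]
        if k < n - 1
            # The bulge has moved to B[k,k+2]
            y, z = B[k, k+1], B[k, k+2]
        end
    end
    return B
end

# Rounded entries of the bidiagonal matrix B from the cells above
d = [18.4932, -16.4195, 16.7671, -11.0792, 9.13753]
e = [-7.79789, -5.55048, -11.119, -7.12552]
B0 = Matrix(Bidiagonal(d, e, :U))
B1 = golub_kahan_step(B0)
println(norm(svdvals(B1) - svdvals(B0)))
```

Since only orthogonal transformations are applied, the singular values are preserved and `B1` is again upper bidiagonal up to rounding errors. Repeating the step typically drives the last superdiagonal element toward zero, at which point the problem can be deflated.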

### Examples

In [16]:
W,σ,Z=svd(B)

(
5x5 Array{Float64,2}:
  0.539491   -0.668158   -0.464621   0.213637   0.0316369
  0.542761   -0.184696    0.693163  -0.427363  -0.0904592
 -0.606223   -0.645776    0.020825  -0.412012  -0.212787 
 -0.214206   -0.31478     0.491036   0.540329   0.567412 
  0.0311227   0.0577809  -0.249201  -0.556786   0.789672 ,

[23.05690050927775,20.933224412487217,14.552987762575265,12.111135956464674,6.058909772234619],
5x5 Array{Float64,2}:
  0.432709   -0.590277  -0.590419   0.326216   0.0965633
 -0.568973    0.393768  -0.533109   0.441838   0.204425 
 -0.571508   -0.468282  -0.240378  -0.374547  -0.505988 
  0.395277    0.509617  -0.389738  -0.116029  -0.647062 
  0.0785325   0.132371  -0.396892  -0.73798    0.523615 )

In [17]:
@which svd(B)

In [18]:
σ1=svdvals(B)

5-element Array{Float64,1}:
 23.0569 
 20.9332 
 14.553  
 12.1111 
  6.05891

In [19]:
@which svdvals(B)

In [20]:
σ-σ1

5-element Array{Float64,1}:
  0.0        
  1.77636e-14
 -1.77636e-15
 -5.32907e-15
 -8.88178e-16

In [23]:
?LAPACK.bdsqr!

```
bdsqr!(uplo, d, e_, Vt, U, C) -> (d, Vt, U, C)
```

Computes the singular value decomposition of a bidiagonal matrix with `d` on the diagonal and `e_` on the off-diagonal. If `uplo = U`, `e_` is the superdiagonal. If `uplo = L`, `e_` is the subdiagonal. Can optionally also compute the product `Q' * C`.

Returns the singular values in `d`, and the matrix `C` overwritten with `Q' * C`.


In [24]:
BV=eye(n)
BU=eye(n)
BC=eye(n)
σ2,Z2,W2,C = LAPACK.bdsqr!('U',copy(B.dv),copy(B.ev),BV,BU,BC)

([23.05690050927775,20.933224412487217,14.552987762575265,12.111135956464674,6.058909772234619],
5x5 Array{Float64,2}:
  0.432709   -0.568973  -0.571508   0.395277   0.0785325
 -0.590277    0.393768  -0.468282   0.509617   0.132371 
 -0.590419   -0.533109  -0.240378  -0.389738  -0.396892 
  0.326216    0.441838  -0.374547  -0.116029  -0.73798  
  0.0965633   0.204425  -0.505988  -0.647062   0.523615 ,

5x5 Array{Float64,2}:
  0.539491   -0.668158   -0.464621   0.213637   0.0316369
  0.542761   -0.184696    0.693163  -0.427363  -0.0904592
 -0.606223   -0.645776    0.020825  -0.412012  -0.212787 
 -0.214206   -0.31478     0.491036   0.540329   0.567412 
  0.0311227   0.0577809  -0.249201  -0.556786   0.789672 ,

5x5 Array{Float64,2}:
  0.539491    0.542761   -0.606223  -0.214206   0.0311227
 -0.668158   -0.184696   -0.645776  -0.31478    0.0577809
 -0.464621    0.693163    0.020825   0.491036  -0.249201 
  0.213637   -0.427363   -0.412012   0.540329  -0.556786 
  0.0316369  -0.0904592  -

In [25]:
W2'*full(B)*Z2'

5x5 Array{Float64,2}:
 23.0569       -1.33227e-15   3.10862e-15  -2.22045e-15  -1.11022e-16
 -6.7446e-15   20.9332       -6.66134e-16  -4.44089e-16   0.0        
  5.55112e-15   3.10862e-15  14.553         0.0           2.22045e-15
 -1.11022e-15   2.22045e-16   2.22045e-15  12.1111        1.86517e-14
 -1.13798e-15  -8.32667e-16   0.0           0.0           6.05891    

In [26]:
?LAPACK.bdsdc!

```
bdsdc!(uplo, compq, d, e_) -> (d, e, u, vt, q, iq)
```

Computes the singular value decomposition of a bidiagonal matrix with `d` on the diagonal and `e_` on the off-diagonal using a divide and conquer method. If `uplo = U`, `e_` is the superdiagonal. If `uplo = L`, `e_` is the subdiagonal. If `compq = N`, only the singular values are found. If `compq = I`, the singular values and vectors are found. If `compq = P`, the singular values and vectors are found in compact form. Only works for real types.

Returns the singular values in `d`, and if `compq = P`, the compact singular vectors in `iq`.


In [27]:
σ3,ee,W3,Z3,rest=LAPACK.bdsdc!('U','I',copy(B.dv),copy(B.ev))

([23.05690050927775,20.933224412487217,14.552987762575265,12.111135956464674,6.058909772234619],e = 2.7182818284590...,
5x5 Array{Float64,2}:
  0.539491   -0.668158   -0.464621   0.213637   0.0316369
  0.542761   -0.184696    0.693163  -0.427363  -0.0904592
 -0.606223   -0.645776    0.020825  -0.412012  -0.212787 
 -0.214206   -0.31478     0.491036   0.540329   0.567412 
  0.0311227   0.0577809  -0.249201  -0.556786   0.789672 ,

5x5 Array{Float64,2}:
  0.432709   -0.568973  -0.571508   0.395277   0.0785325
 -0.590277    0.393768  -0.468282   0.509617   0.132371 
 -0.590419   -0.533109  -0.240378  -0.389738  -0.396892 
  0.326216    0.441838  -0.374547  -0.116029  -0.73798  
  0.0965633   0.204425  -0.505988  -0.647062   0.523615 ,

[6.9130615359347e-310],[139921923212720])

In [28]:
W3'*full(B)*Z3'

5x5 Array{Float64,2}:
 23.0569       -1.33227e-15   3.10862e-15  -2.22045e-15  -1.11022e-16
 -6.7446e-15   20.9332       -6.66134e-16  -4.44089e-16   0.0        
  5.55112e-15   3.10862e-15  14.553         0.0           2.22045e-15
 -1.11022e-15   2.22045e-16   2.22045e-15  12.1111        1.86517e-14
 -1.13798e-15  -8.32667e-16   0.0           0.0           6.05891    

The functions `svd()`, `LAPACK.bdsqr!()` and `LAPACK.bdsdc!()` use the same algorithm to compute singular values.

In [29]:
[σ3-σ2 σ3-σ]

5x2 Array{Float64,2}:
 0.0  0.0
 0.0  0.0
 0.0  0.0
 0.0  0.0
 0.0  0.0

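Let us measure some timings. Doubling $n$ increases the running time of `svd()` by a factor of about $3.6$ in the output below, which is consistent with $O(n^2)$ operations.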

In [30]:
n=1000
Abig=Bidiagonal(rand(n),rand(n-1),true)
Bbig=Bidiagonal(rand(2*n),rand(2*n-1),true)
@time svdvals(Abig);
@time svdvals(Bbig);
@time LAPACK.bdsdc!('U','N',copy(Abig.dv),copy(Abig.ev));
@time svd(Abig);
@time svd(Bbig);

  0.036636 seconds (168 allocations: 151.667 KB)
  0.086415 seconds (24 allocations: 282.297 KB)
  0.025992 seconds (23 allocations: 141.641 KB)
  0.151774 seconds (33 allocations: 45.884 MB, 3.92% gc time)
  0.543586 seconds (33 allocations: 183.320 MB, 2.35% gc time)


## QR method

The final algorithm is obtained by combining bidiagonalization with a bidiagonal SVD method.
The standard method is implemented in the LAPACK routine 
[DGESVD](http://www.netlib.org/lapack/explore-html/d8/d2d/dgesvd_8f.html).
The divide-and-conquer method is implemented in the LAPACK routine 
[DGESDD](http://www.netlib.org/lapack/explore-html/db/db4/dgesdd_8f.html).

The functions `svd()`, `svdvals()`, and `svdvecs()`  use `DGESDD`.
The wrappers for `DGESVD` and `DGESDD` give more control over which singular vectors are computed.
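
In more recent Julia versions the choice between the two methods is also exposed through a keyword argument of `svd()`. The sketch below assumes Julia 1.3 or later, where `svd()` accepts `alg = LinearAlgebra.QRIteration()` and `alg = LinearAlgebra.DivideAndConquer()`. This interface is not available in the older Julia version used for the cells of this notebook, and the small matrix is only an illustration.

```julia
using LinearAlgebra

A = [-9.0 -6.0 -9.0;
      1.0 -4.0 -9.0;
      2.0 -3.0  2.0;
      3.0  9.0  3.0]

# Standard QR iteration and divide-and-conquer (the default)
F_qr = svd(A; alg = LinearAlgebra.QRIteration())
F_dc = svd(A; alg = LinearAlgebra.DivideAndConquer())

# Both factorizations should agree in the singular values up to rounding
println(maximum(abs.(F_qr.S - F_dc.S)))
println(norm(A - F_qr.U * Diagonal(F_qr.S) * F_qr.Vt))
println(norm(A - F_dc.U * Diagonal(F_dc.S) * F_dc.Vt))
```

The singular vectors returned by the two methods may differ in sign, so it is safer to compare singular values and residuals than the factors themselves.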

In [31]:
# The built-in algorithm
U,σA,V=svd(A)

(
8x5 Array{Float64,2}:
 -0.0957244   0.829229   -0.0306424   0.498836   -0.0560516 
 -0.449874   -0.0680817  -0.508787    0.0862792  -0.111848  
 -0.0170071  -0.213541    0.227033    0.40675    -0.213951  
  0.560334    0.205861   -0.178915   -0.402239    0.00884894
  0.517168    0.126727   -0.377755    0.0962755  -0.376083  
  0.098974   -0.379515   -0.282601    0.301544   -0.506719  
  0.0853335   0.0589339   0.65243    -0.0534916  -0.522856  
  0.435484   -0.237068    0.0917332   0.559837    0.516637  ,

[23.05690050927775,20.933224412487217,14.552987762575265,12.111135956464674,6.058909772234619],
5x5 Array{Float64,2}:
  0.432709  -0.590277   -0.590419   0.326216   0.0965633
  0.432389  -0.0590841  -0.103076  -0.70398   -0.55076  
  0.408393  -0.371268    0.641575  -0.198633   0.494269 
 -0.327269  -0.588834    0.389146   0.244289  -0.578842 
 -0.593222  -0.404372   -0.278798  -0.546684   0.328602 )

In [32]:
# With our building blocks
U1=X*W
V1=Y*Z
U1'*A*V1

5x5 Array{Float64,2}:
 23.0569       -4.44089e-15   3.10862e-15  -4.44089e-15  -8.88178e-16
 -8.88178e-15  20.9332        0.0          -4.44089e-15  -1.33227e-15
  7.54952e-15   3.33067e-15  14.553         2.22045e-15   2.88658e-15
 -8.88178e-16   3.10862e-15   4.44089e-15  12.1111        1.64313e-14
 -2.22045e-16  -9.99201e-16   8.88178e-16   1.9984e-15    6.05891    

In [33]:
?LAPACK.gesvd!

```
gesvd!(jobu, jobvt, A) -> (U, S, VT)
```

Finds the singular value decomposition of `A`, `A = U * S * V'`. If `jobu = A`, all the columns of `U` are computed. If `jobvt = A` all the rows of `V'` are computed. If `jobu = N`, no columns of `U` are computed. If `jobvt = N` no rows of `V'` are computed. If `jobu = O`, `A` is overwritten with the columns of (thin) `U`. If `jobvt = O`, `A` is overwritten with the rows of (thin) `V'`. If `jobu = S`, the columns of (thin) `U` are computed and returned separately. If `jobvt = S` the rows of (thin) `V'` are computed and returned separately. `jobu` and `jobvt` can't both be `O`.

Returns `U`, `S`, and `Vt`, where `S` are the singular values of `A`.


In [34]:
# DGESVD
LAPACK.gesvd!('A','A',copy(A))

(
8x8 Array{Float64,2}:
 -0.0957244  -0.829229   -0.0306424   0.498836   …   0.00637886  -0.000941379
 -0.449874    0.0680817  -0.508787    0.0862792      0.64812     -0.179225   
 -0.0170071   0.213541    0.227033    0.40675       -0.219861    -0.789206   
  0.560334   -0.205861   -0.178915   -0.402239       0.322832    -0.479953   
  0.517168   -0.126727   -0.377755    0.0962755     -0.1893       0.158474   
  0.098974    0.379515   -0.282601    0.301544   …  -0.027736     0.209763   
  0.0853335  -0.0589339   0.65243    -0.0534916      0.49558      0.167152   
  0.435484    0.237068    0.0917332   0.559837       0.380969     0.13275    ,

[23.05690050927775,20.933224412487206,14.552987762575265,12.11113595646468,6.058909772234615],
5x5 Array{Float64,2}:
  0.432709    0.432389    0.408393  -0.327269  -0.593222
  0.590277    0.0590841   0.371268   0.588834   0.404372
 -0.590419   -0.103076    0.641575   0.389146  -0.278798
  0.326216   -0.70398    -0.198633   0.244289  -0.546684
  0.0

In [35]:
?LAPACK.gesdd!

```
gesdd!(job, A) -> (U, S, VT)
```

Finds the singular value decomposition of `A`, `A = U * S * V'`, using a divide and conquer approach. If `job = A`, all the columns of `U` and the rows of `V'` are computed. If `job = N`, no columns of `U` or rows of `V'` are computed. If `job = O`, `A` is overwritten with the columns of (thin) `U` and the rows of (thin) `V'`. If `job = S`, the columns of (thin) `U` and the rows of (thin) `V'` are computed and returned separately.


In [36]:
LAPACK.gesdd!('N',copy(A))

(8x0 Array{Float64,2},[23.05690050927775,20.9332244124872,14.552987762575267,12.111135956464679,6.05890977223462],5x0 Array{Float64,2})

Let us perform some timings. We observe $O(n^3)$ operations.

In [37]:
n=1000
Abig=rand(n,n)
Bbig=rand(2*n,2*n)
@time Ubig,σbig,Vbig=svd(Abig);
@time svd(Bbig);
@time LAPACK.gesvd!('A','A',copy(Abig));
@time LAPACK.gesdd!('A',copy(Abig));
@time LAPACK.gesdd!('A',copy(Bbig));

  0.561084 seconds (41 allocations: 53.529 MB, 0.38% gc time)
  6.330250 seconds (35 allocations: 213.868 MB, 1.13% gc time)
  8.577787 seconds (24 allocations: 23.408 MB)
  0.533864 seconds (26 allocations: 45.899 MB)
  6.277768 seconds (26 allocations: 183.350 MB, 1.00% gc time)


In [38]:
# Residual
norm(Abig*Vbig-Ubig*diagm(σbig))

6.275333151332255e-13