# Singular Value Decomposition - Algorithms and Error Analysis

---

We study only algorithms for real matrices, which are most commonly used in the applications described in this course. 


For more details, see 
[A. Kaylor Cline and I. Dhillon, Computation of the Singular Value Decomposition][Hog14] and the references therein.

[Hog14]: #1 "L. Hogben, ed., 'Handbook of Linear Algebra', pp. 58.1-58.13, CRC Press, Boca Raton, 2014."


## Prerequisites

The reader should be familiar with the basic facts about the singular value decomposition and its perturbation theory, and with algorithms for the symmetric eigenvalue decomposition.

 
## Competences 

The reader should be able to apply an adequate algorithm to a given problem, and to assess the accuracy of the solution.

---

## Basics

### Definitions 

The __singular value decomposition__ (SVD) of $A\in\mathbb{R}^{m\times n}$ is
$A=U\Sigma V^T$, where $U\in\mathbb{R}^{m\times m}$ is orthogonal, $U^TU=UU^T=I_m$, 
$V\in\mathbb{R}^{n\times n}$ is orthogonal, $V^TV=VV^T=I_n$, and 
$\Sigma \in\mathbb{R}^{m\times n}$ is diagonal with singular values 
$\sigma_1,\ldots,\sigma_{\min\{m,n\}}$ on the diagonal. 

If $m>n$, the __thin SVD__ of $A$ is $A=U_{1:m,1:n} \Sigma_{1:n,1:n} V^T$.
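
As a small illustration of the two definitions, the sketch below computes the full and the thin SVD of a $4\times 2$ matrix and checks both factorizations. It is written for Julia 1.x with the `LinearAlgebra` standard library (an assumption; the notebook cells in this document use older Julia syntax), and the matrix and the variable names are chosen only for this example.

```julia
using LinearAlgebra

# Small real matrix with m > n (values chosen only for illustration)
A = [1.0 2.0; 3.0 4.0; 5.0 6.0; 7.0 8.0]
m, n = size(A)

# Full SVD: U is m×m, Σ is m×n, V is n×n
F = svd(A; full=true)
Σ = zeros(m, n)
for i in 1:n
    Σ[i, i] = F.S[i]
end
res_full = norm(A - F.U * Σ * F.Vt)

# Thin SVD: only the first n columns of U are kept
Fthin = svd(A)
res_thin = norm(A - Fthin.U * Diagonal(Fthin.S) * Fthin.Vt)

println(size(F.U), " ", size(Fthin.U), " ", res_full, " ", res_thin)
```

Both residuals should be of the order of machine precision times $\|A\|_2$; only the size of $U$ differs between the two variants.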

### Facts

1. Algorithms for computing the SVD of $A$ are modifications of algorithms for the symmetric eigenvalue decomposition of the matrices $AA^T$, $A^TA$ and 
$\begin{bmatrix} 0 & A\\ A^T & 0 \end{bmatrix}$.

2. The most commonly used approach is the three-step algorithm:
    1. Reduce $A$ to bidiagonal matrix $B$ by orthogonal transformations, $X^TAY=B$.
    2. Compute the SVD of $B$ with QR iterations, $B=W\Sigma Z^T$.
    3. Multiply $U=XW$ and $V=YZ$.

3. If $m\geq n$, the overall operation count for this algorithm is $O(mn^2)$.

4. __Error bounds__: Let $U\Sigma V^T$ and $\tilde U \tilde \Sigma \tilde V^T$ be the
exact and the computed SVDs of $A$, respectively. The algorithms generally
compute the SVD with errors bounded by
$$
|\sigma_i-\tilde \sigma_i|\leq \phi \epsilon\|A\|_2,
\qquad
\|u_i-\tilde u_i\|_2, \| v_i-\tilde v_i\|_2 \leq \psi\epsilon \frac{\|A\|_2}
{\min_{j\neq i} 
|\sigma_i-\tilde \sigma_j|},
$$
where $\epsilon$ is machine precision and $\phi$ and $\psi$
are slowly growing polynomial functions of
$n$ which depend upon the algorithm used (typically $O(n)$ or $O(n^2)$).
These bounds are obtained by combining perturbation bounds with the floating-point error analysis of the algorithms.
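
Facts 1 and 4 can be checked numerically. The following sketch, written for Julia 1.x with the `LinearAlgebra` standard library (an assumption; the notebook cells in this document use older syntax), reuses the $8\times 5$ integer matrix from the examples of this document. It compares the singular values with the eigenvalues of $A^TA$ and of the block matrix from Fact 1, and then treats the double precision result as "exact" in order to measure the error of a single precision computation in units of $\epsilon\|A\|_2$. The variable names are illustrative only.

```julia
using LinearAlgebra

# The 8×5 example matrix used in this document
A = [-9.0  8.0  5.0  1.0  1.0
      4.0 -7.0  1.0  9.0  7.0
      3.0  4.0  3.0  1.0  6.0
      4.0 -5.0  8.0 -2.0  0.0
     -1.0 -1.0  2.0 -2.0 -8.0
      3.0  2.0 -4.0 -3.0  7.0
      6.0  4.0  6.0  3.0 -9.0
      8.0  8.0  0.0  2.0  4.0]
m, n = size(A)
σ = svdvals(A)                       # sorted in decreasing order

# Fact 1: the eigenvalues of AᵀA are σᵢ² (eigvals sorts increasingly)
λ = eigvals(Symmetric(A' * A))
err_gram = norm(sqrt.(abs.(reverse(λ))) - σ)

# Fact 1: the eigenvalues of [0 A; Aᵀ 0] are ±σᵢ and m-n zeros
J = [zeros(m, m) A; A' zeros(n, n)]
μ = eigvals(Symmetric(J))
err_jw = norm(μ[end:-1:end-n+1] - σ)

# Fact 4: error of the Float32 singular values in units of ϵ‖A‖₂
# (the integer entries of A are represented exactly in Float32)
σ32 = svdvals(Float32.(A))
ratio = maximum(abs.(σ32 .- σ)) / (eps(Float32) * σ[1])

println(err_gram, " ", err_jw, " ", ratio)
```

The value `ratio` plays the role of $\phi$ in the bound above, so it is expected to be a modest number rather than something that grows with $1/\epsilon$.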

##  Bidiagonalization


### Facts

1. The reduction of $A$ to a bidiagonal matrix can be performed by applying 
$\min\{m-1,n\}$ Householder reflections $H_L$ from the left and $n-2$ Householder reflections $H_R$ from the right. In the first step, $H_L$ is chosen to annihilate all elements of the first column below the diagonal, and $H_R$ is chosen to annihilate all elements of the first row to the right of the superdiagonal. Applying this procedure recursively to the trailing submatrix yields the bidiagonal matrix.

3. $H_L$ and $H_R$ do not depend on the normalization of the respective Householder 
vectors $v_L$ and $v_R$. With the normalization $[v_L]_1=[v_R]_1=1$, the vectors $v_L$ are stored in the strictly lower-triangular part of $A$, and the vectors $v_R$ are stored in the upper-triangular part of $A$ above the super-diagonal. 

4. The matrices $H_L$ and $H_R$ are not formed explicitly. Given $v_L$ and $v_R$, $A$ is overwritten with $H_L A H_R$ in $O(mn)$ operations by using matrix-vector multiplication and rank-one updates.

5. Instead of performing rank-one updates, $p$ transformations can be accumulated and then applied together. This __block algorithm__ is rich in matrix-matrix multiplications (roughly one
half of the operations are performed using BLAS 3 routines), but
it requires extra workspace.

6. If the matrices $X$ and $Y$ are needed explicitly, they can be computed from the 
stored Householder vectors.
In order to minimize the operation count, the computation
starts from the smallest matrix and the size is gradually
increased.

7. The backward error bounds for the bidiagonalization are as follows: 
The computed matrix $\tilde B$ is equal to the matrix which
would be obtained by exact bidiagonalization of some perturbed matrix $A+\Delta A$, 
where $\|\Delta A\|_2 \leq \psi \varepsilon \|A\|_2$ and $\psi$ is a
slowly increasing function of $n$.
The computed matrices $\tilde X$ and $\tilde Y$ satisfy $\tilde X=X+\Delta X$ and 
$\tilde Y=Y+\Delta Y$, where
$\|\Delta X \|_2,\|\Delta Y\|_2\leq \phi \varepsilon$ 
and $\phi$ is a slowly increasing function of $n$.

12. The bidiagonal reduction is implemented in the 
[LAPACK](http://www.netlib.org/lapack) subroutine 
[DGEBRD](http://www.netlib.org/lapack/explore-html/dd/d9a/group__double_g_ecomputational.html#ga9c735b94f840f927f8085fd23f3ee2e6).
The computation of $X$ and $Y$ is implemented in the subroutine
[DORGBR](http://www.netlib.org/lapack/lapack-3.1.1/html/dorgtr.f.html), which is not yet wrapped in Julia.

8. Bidiagonalization can also be performed using Givens rotations.
Givens rotations act more selectively than Householder reflectors, and
are useful if $A$ has some special structure, for example, if $A$ is a banded matrix. 
The error bounds for the function `myBidiagG()` defined in the examples below are the same as above, 
but with slightly different functions $\psi$ and $\phi$.
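
The procedure from Facts 1 and 4 can be sketched without LAPACK as follows. This is a minimal, unblocked illustration written for Julia 1.x with the `LinearAlgebra` standard library (an assumption; the notebook cells below use older syntax and the LAPACK routine instead). The helper names `house` and `bidiag_householder` are hypothetical, $m\geq n$ is assumed, and $X$ and $Y$ are accumulated explicitly for simplicity rather than stored as Householder vectors.

```julia
using LinearAlgebra

# Householder vector v (normalized so that v[1] = 1) and scalar β such that
# (I - β v vᵀ) x is a multiple of the first unit vector.
function house(x::AbstractVector)
    v = float.(collect(x))
    nx = norm(v)
    if nx == 0
        return v, 0.0                 # nothing to annihilate, H = I
    end
    s = v[1] >= 0 ? 1.0 : -1.0        # sign choice avoids cancellation
    v[1] += s * nx
    v ./= v[1]
    return v, 2 / dot(v, v)
end

# Returns X, B, Y with XᵀAY = B upper bidiagonal (assumes m ≥ n).
function bidiag_householder(A::AbstractMatrix)
    B = float.(collect(A))
    m, n = size(B)
    X = Matrix{Float64}(I, m, m)
    Y = Matrix{Float64}(I, n, n)
    for j in 1:n
        if j < m
            # Left reflection: annihilate B[j+1:m, j]
            v, β = house(B[j:m, j])
            B[j:m, j:n] .-= (β * v) * (v' * B[j:m, j:n])
            X[:, j:m] .-= (β * (X[:, j:m] * v)) * v'
        end
        if j <= n - 2
            # Right reflection: annihilate B[j, j+2:n]
            v, β = house(B[j, j+1:n])
            B[j:m, j+1:n] .-= (β * (B[j:m, j+1:n] * v)) * v'
            Y[:, j+1:n] .-= (β * (Y[:, j+1:n] * v)) * v'
        end
    end
    return X, B, Y
end

# The 8×5 example matrix used in the cells below
A = [-9.0  8.0  5.0  1.0  1.0
      4.0 -7.0  1.0  9.0  7.0
      3.0  4.0  3.0  1.0  6.0
      4.0 -5.0  8.0 -2.0  0.0
     -1.0 -1.0  2.0 -2.0 -8.0
      3.0  2.0 -4.0 -3.0  7.0
      6.0  4.0  6.0  3.0 -9.0
      8.0  8.0  0.0  2.0  4.0]
X, B, Y = bidiag_householder(A)
println(norm(X' * A * Y - B), " ", norm(tril(B, -1)), " ", norm(triu(B, 2)))
```

Each update is a matrix-vector product followed by a rank-one update, as described in Fact 4. The annihilated entries of `B` are not set to zero explicitly, so they hold rounding-level values.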

In [1]:
m=8
n=5
A=map(Float64,rand(-9:9,m,n))

8×5 Array{Float64,2}:
 -9.0   8.0   5.0   1.0   1.0
  4.0  -7.0   1.0   9.0   7.0
  3.0   4.0   3.0   1.0   6.0
  4.0  -5.0   8.0  -2.0   0.0
 -1.0  -1.0   2.0  -2.0  -8.0
  3.0   2.0  -4.0  -3.0   7.0
  6.0   4.0   6.0   3.0  -9.0
  8.0   8.0   0.0   2.0   4.0

In [2]:
?LAPACK.gebrd!

```
gebrd!(A) -> (A, d, e, tauq, taup)
```

Reduce `A` in-place to bidiagonal form `A = QBP'`. Returns `A`, containing the bidiagonal matrix `B`; `d`, containing the diagonal elements of `B`; `e`, containing the off-diagonal elements of `B`; `tauq`, containing the elementary reflectors representing `Q`; and `taup`, containing the elementary reflectors representing `P`.


In [3]:
# gebrd! overwrites its argument, so we pass a copy of A
Outg=LAPACK.gebrd!(copy(A))

(
[15.2315 4.63774 … -0.585845 -0.526065; -0.165074 -14.0641 … 0.2933 -0.406048; … ; -0.247611 -0.124113 … -0.0966363 -0.613486; -0.330148 0.0520011 … -0.197481 -0.690267],

[15.2315,-14.0641,-15.0451,-9.25422,14.5361],[4.63774,-5.8756,2.49475,4.43326,1.10367e-314],[1.59088,1.82967,1.00999,1.90166,1.07925],[1.18403,1.59885,1.97532,0.0,0.0])

In [4]:
# Only the first n-1 entries of Outg[3] are off-diagonal elements of B
B=Bidiagonal(Outg[2],Outg[3][1:end-1],true)

5×5 Bidiagonal{Float64}:
 15.2315    4.63774     ⋅        ⋅         ⋅     
   ⋅      -14.0641    -5.8756    ⋅         ⋅     
   ⋅         ⋅       -15.0451   2.49475    ⋅     
   ⋅         ⋅          ⋅      -9.25422   4.43326
   ⋅         ⋅          ⋅        ⋅       14.5361 

In [5]:
svdvals(A), svdvals(B)

([18.6363,15.8358,15.4243,11.3271,8.40835],[18.6363,15.8358,15.4243,11.3271,8.40835])

In [6]:
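# Extract X from the Householder vectors stored below the diagonal of H
function myBidiagX{T}(H::Array{T})
    m,n=size(H)
    X = eye(T,m,n)
    v=Array(T,m)
    # Accumulate backwards, starting from the smallest block
    for j = n : -1 : 1
        # Householder vector with the normalization v[j]=1
        v[j] = one(T)
        v[j+1:m] = H[j+1:m, j]
        # γ equals -tauq[j], except when tauq[j]=0 (not handled here)
        γ = -2 / (v[j:m]⋅v[j:m])
        # Rank-one update X[j:m,j:n] = (I+γ*v*v')*X[j:m,j:n]
        w = γ * X[j:m, j:n]'*v[j:m]
        X[j:m, j:n] = X[j:m, j:n] + v[j:m]*w'
    end
    X
end

# Extract Y from the Householder vectors stored right of the super-diagonal.
# H is the transposed output of gebrd!, so the vectors are in the columns of H
function myBidiagY{T}(H::Array{T})
    n,m=size(H)
    Y = eye(T,n)
    v=Array(T,n)
    for j = n-2 : -1 : 1
        # The j-th right reflector acts on the indices j+1:n
        v[j+1] = one(T)
        v[j+2:n] = H[j+2:n, j]
        γ = -2 / (v[j+1:n]⋅v[j+1:n])
        w = γ * Y[j+1:n, j+1:n]'*v[j+1:n]
        Y[j+1:n, j+1:n] = Y[j+1:n, j+1:n] + v[j+1:n]*w'
    end
    Y
end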

myBidiagY (generic function with 1 method)

In [7]:
X=myBidiagX(Outg[1])

8×5 Array{Float64,2}:
 -0.590879   -0.294496     0.341835     0.0862554   0.578876 
  0.262613   -0.781053     0.207509     0.116875   -0.403696 
  0.19696    -0.264195    -0.00894177  -0.117724    0.452022 
  0.262613   -0.057339     0.236129    -0.873003    0.138908 
 -0.0656532   0.373927     0.243962    -0.0978046  -0.128991 
  0.19696     0.0176408   -0.553621    -0.0846847   0.238308 
  0.393919    0.300006     0.642071     0.242832    0.0855278
  0.525226    0.00208253  -0.0773515    0.356655    0.444486 

In [8]:
Y=myBidiagY(Outg[1]')

5×5 Array{Float64,2}:
 1.0   0.0        0.0        0.0        0.0     
 0.0  -0.184032  -0.107414  -0.381629   0.899419
 0.0   0.311439  -0.570596   0.7011     0.293062
 0.0   0.693659  -0.406015  -0.575551  -0.150767
 0.0   0.622877   0.705715   0.17765    0.287107

In [9]:
# Orthogonality
norm(X'*X-I), norm(Y'*Y-I)

(6.894577218787718e-16,5.295773760145473e-16)

In [10]:
# Error
X'*A*Y-B

5×5 Array{Float64,2}:
  1.77636e-15   8.88178e-16   9.93881e-16   8.7944e-16    6.47863e-16
 -3.33067e-16   3.55271e-15   8.88178e-16  -8.29334e-16  -9.84844e-16
  6.66134e-16   1.27049e-15   1.77636e-15  -1.33227e-15   1.68069e-15
 -4.44089e-16  -6.21219e-17  -5.96347e-16  -5.32907e-15   1.77636e-15
 -4.44089e-16   6.88488e-16   4.57047e-16  -2.61451e-15  -5.32907e-15

In [11]:
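# Bidiagonalization using Givens rotations
function myBidiagG{T}(A::Array{T})
    m,n=size(A)
    X=eye(T,m,m)
    Y=eye(T,n,n)
    for j = 1 : n        
        # Rotations from the left annihilate A[i,j] for i>j
        for i = j+1 : m
            G,r=givens(A,j,i,j)
            A=G*A
            X=G*X
        end
        # Rotations from the right annihilate A[j,i] for i>j+1
        for i=j+2:n
            G,r=givens(A',j+1,i,j)
            A=A*G'
            Y*=G'
        end
    end
    # X holds the product of the left rotations, so its transpose is returned
    X',Bidiagonal(diag(A),diag(A,1),true), Y
end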

myBidiagG (generic function with 1 method)

In [12]:
X1, B1, Y1 = myBidiagG(A)

(
[0.590879 0.294496 … 0.245742 0.0786568; -0.262613 0.781053 … 0.06033 0.027902; … ; -0.393919 -0.300006 … 0.271003 -0.450382; -0.525226 -0.00208253 … 0.0 0.6272],

5×5 Bidiagonal{Float64}:
 diag: -15.231546211727817  -14.064081674194052  …  14.536097745085854
 super: 4.637738747456047  5.875601496038507  …  -4.4332641240721875,

[1.0 0.0 … 0.0 -0.0; 0.0 0.184032 … 0.381629 -0.899419; … ; 0.0 -0.693659 … 0.575551 0.150767; 0.0 -0.622877 … -0.17765 -0.287107])

In [13]:
# Orthogonality
norm(X1'*X1-I), norm(Y1'*Y1-I)

(9.312250125178711e-16,5.300654330857523e-16)

In [14]:
# The first n rows should equal B1, the remaining rows should be zero
X1'*A*Y1

8×5 Array{Float64,2}:
 -15.2315         4.63774       5.43041e-16  -7.74829e-16   1.65112e-15
   5.55112e-17  -14.0641        5.8756       -6.86008e-16  -4.94186e-16
  -4.44089e-16    3.36054e-16  15.0451        2.49475      -1.73562e-15
   8.88178e-16   -3.69143e-16   1.90666e-15   9.25422      -4.43326    
  -8.88178e-16    1.26514e-15  -2.6734e-16    7.79888e-16  14.5361     
   0.0           -1.40978e-15   2.73971e-16   4.56551e-16  -1.27461e-16
   1.44329e-15   -1.74219e-15   6.52064e-16  -1.1761e-15    3.01664e-15
   0.0            1.62824e-15  -1.55392e-15   1.35165e-15   1.09225e-15

In [15]:
# X, Y and B are not unique
B

5×5 Bidiagonal{Float64}:
 15.2315    4.63774     ⋅        ⋅         ⋅     
   ⋅      -14.0641    -5.8756    ⋅         ⋅     
   ⋅         ⋅       -15.0451   2.49475    ⋅     
   ⋅         ⋅          ⋅      -9.25422   4.43326
   ⋅         ⋅          ⋅        ⋅       14.5361 

In [16]:
B1

5×5 Bidiagonal{Float64}:
 -15.2315    4.63774    ⋅       ⋅         ⋅     
    ⋅      -14.0641    5.8756   ⋅         ⋅     
    ⋅         ⋅       15.0451  2.49475    ⋅     
    ⋅         ⋅         ⋅      9.25422  -4.43326
    ⋅         ⋅         ⋅       ⋅       14.5361 

## Bidiagonal QR method

Let $B$ be a real upper-bidiagonal matrix of order $n$ and let 
$B=W\Sigma Z^T$ be its SVD.

All methods for computing the SVD of a bidiagonal matrix are derived from the methods 
for computing the EVD of the tridiagonal matrix $T=B^T B$.
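
The connection between $B$ and $T$ can be verified on a small example. The sketch below is written for Julia 1.x with the `LinearAlgebra` standard library (an assumption; the notebook cells in this document use older syntax). It forms $T=B^TB$ explicitly, which is done here only for illustration, and checks that $T$ is symmetric tridiagonal with eigenvalues $\sigma_i^2$. The entries of $B$ are arbitrary.

```julia
using LinearAlgebra

# Upper-bidiagonal test matrix (entries chosen only for illustration)
B = Bidiagonal([4.0, 3.0, 2.0, 1.0], [1.0, 0.5, 0.25], :U)
Bd = Matrix(B)

# T = BᵀB is symmetric tridiagonal
T = Bd' * Bd
Ttri = SymTridiagonal(diag(T), diag(T, 1))
dev = norm(T - Matrix(Ttri))          # deviation from tridiagonal form

# The eigenvalues of T are the squared singular values of B
λ = eigvals(Ttri)                     # sorted increasingly
err = norm(sqrt.(abs.(reverse(λ))) - svdvals(Bd))

println(dev, " ", err)
```

Practical algorithms avoid forming $T$ and work on $B$ directly, since squaring can lose accuracy in the small singular values.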


### Facts

1. In one step of the implicitly shifted QR method, the shift $\mu$ is the eigenvalue of the $2\times 2$ matrix $T_{n-1:n,n-1:n}$ which is closer to $T_{n,n}$. The first Givens rotation from the right is the one which annihilates 
the element $(1,2)$ of the shifted $2\times 2$ matrix $T_{1:2,1:2}-\mu I$. Applying this rotation to $B$ creates the bulge at the element $B_{2,1}$. This bulge is subsequently chased out by applying suitable Givens rotations alternating from the left and from the right.
This is the __Golub-Kahan algorithm__.

2. The computed SVD satisfies the error bounds from Fact 4 of the Basics section.

3. A special variant of the zero-shift QR algorithm (the __Demmel-Kahan algorithm__) computes the singular values with high relative accuracy. 

4. The tridiagonal divide-and-conquer method, bisection and inverse iteration, and the MRRR method can also be adapted for bidiagonal matrices. 

5. The zero-shift QR algorithm for bidiagonal matrices is implemented in the LAPACK routine 
[DBDSQR](http://www.netlib.org/lapack/explore-html/db/dcc/dbdsqr_8f.html). It is also used in the function `svdvals()`. The divide-and-conquer algorithm for bidiagonal matrices is implemented in the LAPACK routine 
[DBDSDC](http://www.netlib.org/lapack/explore-html/d9/d08/dbdsdc_8f.html). However, this algorithm also calls zero-shift QR to compute singular values.
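
One step of the bulge chasing from Fact 1 can be sketched as follows. This is a didactic version that works on a dense copy of $B$ and applies each rotation to full rows and columns, so it costs $O(n^2)$ per step instead of the $O(n)$ of a production code, and it omits deflation and convergence tests. It is written for Julia 1.x with the `LinearAlgebra` standard library (an assumption; the notebook cells in this document use older syntax), and the names `rot` and `golub_kahan_step` are hypothetical.

```julia
using LinearAlgebra

# c, s such that [c s; -s c] * [f; g] = [r; 0]
function rot(f, g)
    r = hypot(f, g)
    return r == 0 ? (one(r), zero(r)) : (f / r, g / r)
end

# One implicitly shifted QR step on an upper-bidiagonal matrix of order n ≥ 2
function golub_kahan_step(B0::AbstractMatrix)
    B = Matrix{Float64}(B0)
    n = size(B, 1)
    # Trailing 2×2 block of T = BᵀB and the shift μ closer to T[n,n]
    t11 = B[n-1, n-1]^2 + (n > 2 ? B[n-2, n-1]^2 : 0.0)
    t12 = B[n-1, n-1] * B[n-1, n]
    t22 = B[n, n]^2 + B[n-1, n]^2
    λ = eigvals(Symmetric([t11 t12; t12 t22]))
    μ = abs(λ[1] - t22) < abs(λ[2] - t22) ? λ[1] : λ[2]
    # First row of the leading block of T - μI determines the first rotation
    y = B[1, 1]^2 - μ
    z = B[1, 1] * B[1, 2]
    for k in 1:n-1
        # Rotation from the right on columns k, k+1 (creates bulge B[k+1,k])
        c, s = rot(y, z)
        for i in 1:n
            t1, t2 = B[i, k], B[i, k+1]
            B[i, k] = c * t1 + s * t2
            B[i, k+1] = -s * t1 + c * t2
        end
        # Rotation from the left on rows k, k+1 (moves bulge to B[k,k+2])
        c, s = rot(B[k, k], B[k+1, k])
        for j in 1:n
            t1, t2 = B[k, j], B[k+1, j]
            B[k, j] = c * t1 + s * t2
            B[k+1, j] = -s * t1 + c * t2
        end
        if k < n - 1
            y, z = B[k, k+1], B[k, k+2]
        end
    end
    return B
end

B0 = Matrix(Bidiagonal([4.0, 3.0, 2.0, 1.0], [1.0, 1.0, 1.0], :U))
B1 = golub_kahan_step(B0)
println(norm(svdvals(B1) - svdvals(B0)), " ", norm(tril(B1, -1)), " ", norm(triu(B1, 2)))
```

The step is an orthogonal equivalence, so `B1` has the same singular values as `B0` and is again upper bidiagonal up to rounding. Repeating the step typically drives the last super-diagonal element towards zero, after which the problem can be deflated.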

### Examples

In [17]:
W,σ,Z=svd(B)

(
[0.478337 0.669608 … 0.368398 -0.0441381; -0.687919 0.0280734 … 0.707168 -0.123706; … ; 0.059159 -0.286539 … 0.177088 0.883832; 0.028029 -0.46776 … -0.137502 -0.4051],

[18.6363,15.8358,15.4243,11.3271,8.40835],
[0.390947 0.644055 … 0.495384 -0.0799553; 0.638181 0.171171 … -0.727207 0.182571; … ; -0.101921 0.246208 … -0.268084 -0.915302; 0.0359352 -0.509584 … -0.107147 -0.234329])

In [18]:
@which svd(B)

In [19]:
σ1=svdvals(B)

5-element Array{Float64,1}:
 18.6363 
 15.8358 
 15.4243 
 11.3271 
  8.40835

In [20]:
@which svdvals(B)

In [21]:
σ-σ1

5-element Array{Float64,1}:
  3.55271e-15
 -3.55271e-15
 -5.32907e-15
  0.0        
 -1.77636e-15

In [22]:
?LAPACK.bdsqr!

```
bdsqr!(uplo, d, e_, Vt, U, C) -> (d, Vt, U, C)
```

Computes the singular value decomposition of a bidiagonal matrix with `d` on the diagonal and `e_` on the off-diagonal. If `uplo = U`, `e_` is the superdiagonal. If `uplo = L`, `e_` is the subdiagonal. Can optionally also compute the product `Q' * C`.

Returns the singular values in `d`, and the matrix `C` overwritten with `Q' * C`.


In [23]:
BV=eye(n)
BU=eye(n)
BC=eye(n)
σ2,Z2,W2,C = LAPACK.bdsqr!('U',copy(B.dv),copy(B.ev),BV,BU,BC)

([18.6363,15.8358,15.4243,11.3271,8.40835],
[0.390947 0.638181 … -0.101921 0.0359352; 0.644055 0.171171 … 0.246208 -0.509584; … ; 0.495384 -0.727207 … -0.268084 -0.107147; -0.0799553 0.182571 … -0.915302 -0.234329],

[0.478337 0.669608 … 0.368398 -0.0441381; -0.687919 0.0280734 … 0.707168 -0.123706; … ; 0.059159 -0.286539 … 0.177088 0.883832; 0.028029 -0.46776 … -0.137502 -0.4051],

[0.478337 -0.687919 … 0.059159 0.028029; 0.669608 0.0280734 … -0.286539 -0.46776; … ; 0.368398 0.707168 … 0.177088 -0.137502; -0.0441381 -0.123706 … 0.883832 -0.4051])

In [24]:
W2'*full(B)*Z2'

5×5 Array{Float64,2}:
 18.6363       -4.85542e-15   7.39047e-16   2.17683e-16  -3.95118e-17
 -2.4911e-15   15.8358        1.05073e-15  -2.09125e-16   6.20298e-16
 -2.05204e-16  -1.24334e-15  15.4243        1.98729e-15  -1.19704e-15
 -8.14261e-16   8.82293e-16   3.78711e-16  11.3271       -5.65926e-16
  2.15022e-16   1.64004e-15   9.54756e-16   1.10004e-16   8.40835    

In [25]:
?LAPACK.bdsdc!

```
bdsdc!(uplo, compq, d, e_) -> (d, e, u, vt, q, iq)
```

Computes the singular value decomposition of a bidiagonal matrix with `d` on the diagonal and `e_` on the off-diagonal using a divide and conquer method. If `uplo = U`, `e_` is the superdiagonal. If `uplo = L`, `e_` is the subdiagonal. If `compq = N`, only the singular values are found. If `compq = I`, the singular values and vectors are found. If `compq = P`, the singular values and vectors are found in compact form. Only works for real types.

Returns the singular values in `d`, and if `compq = P`, the compact singular vectors in `iq`.


In [26]:
σ3,ee,W3,Z3,rest=LAPACK.bdsdc!('U','I',copy(B.dv),copy(B.ev))

([18.6363,15.8358,15.4243,11.3271,8.40835],e = 2.7182818284590...,
[0.478337 0.669608 … 0.368398 -0.0441381; -0.687919 0.0280734 … 0.707168 -0.123706; … ; 0.059159 -0.286539 … 0.177088 0.883832; 0.028029 -0.46776 … -0.137502 -0.4051],

[0.390947 0.638181 … -0.101921 0.0359352; 0.644055 0.171171 … 0.246208 -0.509584; … ; 0.495384 -0.727207 … -0.268084 -0.107147; -0.0799553 0.182571 … -0.915302 -0.234329],

[0.0],[0])

In [27]:
W3'*full(B)*Z3'

5×5 Array{Float64,2}:
 18.6363       -4.85542e-15   7.39047e-16   2.17683e-16  -3.95118e-17
 -2.4911e-15   15.8358        1.05073e-15  -2.09125e-16   6.20298e-16
 -2.05204e-16  -1.24334e-15  15.4243        1.98729e-15  -1.19704e-15
 -8.14261e-16   8.82293e-16   3.78711e-16  11.3271       -5.65926e-16
  2.15022e-16   1.64004e-15   9.54756e-16   1.10004e-16   8.40835    

Functions `svd()`, `LAPACK.bdsqr!()` and `LAPACK.bdsdc!()` use the same algorithm to compute singular values.

In [28]:
[σ3-σ2 σ3-σ]

5×2 Array{Float64,2}:
 0.0  0.0
 0.0  0.0
 0.0  0.0
 0.0  0.0
 0.0  0.0

Let us compute some timings. We observe $O(n^2)$ operations.

In [29]:
n=1000
Abig=Bidiagonal(rand(n),rand(n-1),true)
Bbig=Bidiagonal(rand(2*n),rand(2*n-1),true)
@time svdvals(Abig);
@time svdvals(Bbig);
@time LAPACK.bdsdc!('U','N',copy(Abig.dv),copy(Abig.ev));
@time svd(Abig);
@time svd(Bbig);

  0.028653 seconds (143 allocations: 165.125 KB)
  0.090468 seconds (19 allocations: 313.844 KB)
  0.029725 seconds (16 allocations: 141.688 KB)
  0.103570 seconds (27 allocations: 45.900 MB, 4.60% gc time)
  0.450427 seconds (27 allocations: 183.351 MB, 39.42% gc time)


## QR method

The final algorithm is obtained by combining bidiagonalization and bidiagonal SVD methods.
The standard method is implemented in the LAPACK routine 
[DGESVD](http://www.netlib.org/lapack/explore-html/d8/d2d/dgesvd_8f.html).
The divide-and-conquer method is implemented in the LAPACK routine 
[DGESDD](http://www.netlib.org/lapack/explore-html/db/db4/dgesdd_8f.html).

The functions `svd()` and `svdvals()` use `DGESDD`.
The wrappers for `DGESVD` and `DGESDD` give more control over which singular vectors are computed.
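
In recent Julia versions the choice between the two LAPACK drivers is also available without calling the wrappers directly. The sketch below assumes Julia 1.3 or later, where `svd` accepts the `alg` keyword with `LinearAlgebra.DivideAndConquer()` (the default) and `LinearAlgebra.QRIteration()`. It is not part of the older notebook session shown in this document, and it reuses the $8\times 5$ example matrix.

```julia
using LinearAlgebra

# The 8×5 example matrix used in this document
A = [-9.0  8.0  5.0  1.0  1.0
      4.0 -7.0  1.0  9.0  7.0
      3.0  4.0  3.0  1.0  6.0
      4.0 -5.0  8.0 -2.0  0.0
     -1.0 -1.0  2.0 -2.0 -8.0
      3.0  2.0 -4.0 -3.0  7.0
      6.0  4.0  6.0  3.0 -9.0
      8.0  8.0  0.0  2.0  4.0]

# Divide-and-conquer driver and QR iteration driver
Fdc = svd(A; alg=LinearAlgebra.DivideAndConquer())
Fqr = svd(A; alg=LinearAlgebra.QRIteration())

# Both should return the same singular values and small residuals
diff_sv = norm(Fdc.S - Fqr.S)
res_dc = norm(A - Fdc.U * Diagonal(Fdc.S) * Fdc.Vt)
res_qr = norm(A - Fqr.U * Diagonal(Fqr.S) * Fqr.Vt)

println(diff_sv, " ", res_dc, " ", res_qr)
```

The singular vectors returned by the two drivers may differ in sign, so the factorizations are compared through their residuals rather than entry by entry.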

In [30]:
# The built-in algorithm
U,σA,V=svd(A)

(
[-0.243969 -0.52852 … -0.681787 -0.0295747; 0.546065 0.413005 … -0.49565 0.39204; … ; -0.349141 0.483599 … 0.0287679 0.249785; 0.325278 0.0029749 … 0.240346 0.0967452],

[18.6363,15.8358,15.4243,11.3271,8.40835],
[0.390947 0.644055 … 0.495384 -0.0799553; -0.116518 -0.531654 … 0.0992321 0.132872; … ; 0.230236 0.250932 … -0.487205 0.794328; 0.851522 -0.338492 … -0.265026 -0.299631])

In [31]:
# With our building blocks
U1=X*W
V1=Y*Z
U1'*A*V1

5×5 Array{Float64,2}:
 18.6363       -3.84783e-15   7.74201e-16   1.23463e-15  -2.37339e-17
  1.69651e-16  15.8358       -1.75722e-15   4.62896e-17  -2.35026e-15
 -3.08733e-15  -2.74897e-15  15.4243       -3.11616e-16  -4.76002e-15
 -8.13619e-16   1.76623e-15   7.19214e-16  11.3271       -7.60204e-16
  1.02094e-15  -1.66047e-15  -2.2886e-15    5.29864e-16   8.40835    

In [32]:
?LAPACK.gesvd!

```
gesvd!(jobu, jobvt, A) -> (U, S, VT)
```

Finds the singular value decomposition of `A`, `A = U * S * V'`. If `jobu = A`, all the columns of `U` are computed. If `jobvt = A` all the rows of `V'` are computed. If `jobu = N`, no columns of `U` are computed. If `jobvt = N` no rows of `V'` are computed. If `jobu = O`, `A` is overwritten with the columns of (thin) `U`. If `jobvt = O`, `A` is overwritten with the rows of (thin) `V'`. If `jobu = S`, the columns of (thin) `U` are computed and returned separately. If `jobvt = S` the rows of (thin) `V'` are computed and returned separately. `jobu` and `jobvt` can't both be `O`.

Returns `U`, `S`, and `Vt`, where `S` are the singular values of `A`.


In [33]:
# DGESVD
LAPACK.gesvd!('A','A',copy(A))

(
[-0.243969 0.52852 … 0.166673 0.20974; 0.546065 -0.413005 … -0.0224094 0.143849; … ; -0.349141 -0.483599 … 0.365209 -0.318246; 0.325278 -0.0029749 … -0.15332 0.560796],

[18.6363,15.8358,15.4243,11.3271,8.40835],
[0.390947 -0.116518 … 0.230236 0.851522; -0.644055 0.531654 … -0.250932 0.338492; … ; 0.495384 0.0992321 … -0.487205 -0.265026; 0.0799553 -0.132872 … -0.794328 0.299631])

In [34]:
?LAPACK.gesdd!

```
gesdd!(job, A) -> (U, S, VT)
```

Finds the singular value decomposition of `A`, `A = U * S * V'`, using a divide and conquer approach. If `job = A`, all the columns of `U` and the rows of `V'` are computed. If `job = N`, no columns of `U` or rows of `V'` are computed. If `job = O`, `A` is overwritten with the columns of (thin) `U` and the rows of (thin) `V'`. If `job = S`, the columns of (thin) `U` and the rows of (thin) `V'` are computed and returned separately.


In [35]:
LAPACK.gesdd!('N',copy(A))

(,[18.6363,15.8358,15.4243,11.3271,8.40835],)

Let us perform some timings. We observe $O(n^3)$ operations.

In [36]:
n=1000
Abig=rand(n,n)
Bbig=rand(2*n,2*n)
@time Ubig,σbig,Vbig=svd(Abig);
@time svd(Bbig);
@time LAPACK.gesvd!('A','A',copy(Abig));
@time LAPACK.gesdd!('A',copy(Abig));
@time LAPACK.gesdd!('A',copy(Bbig));

  1.259852 seconds (30 allocations: 53.529 MB, 0.33% gc time)
  7.907667 seconds (24 allocations: 213.868 MB, 1.76% gc time)
  5.615014 seconds (16 allocations: 23.408 MB, 1.37% gc time)
  0.944407 seconds (18 allocations: 45.899 MB, 0.71% gc time)
  7.598393 seconds (18 allocations: 183.350 MB, 0.34% gc time)


In [37]:
# Residual
norm(Abig*Vbig-Ubig*diagm(σbig))

5.816777316677065e-13