# Singular Value Decomposition - Perturbation Theory

---

## Prerequisites

The reader should be familiar with eigenvalue decomposition, singular value decomposition, and perturbation theory for eigenvalue decomposition.

## Competences 

The reader should be able to understand and check the facts about perturbations of singular values and vectors.

---

## Perturbation bounds

For more details and the proofs of the Facts below, see 
[R.-C. Li, Matrix Perturbation Theory][Hog14], and the references therein.

[Hog14]: #1 "L. Hogben, ed., 'Handbook of Linear Algebra', pp. 21.6-21.8 and 21.16-21.18, CRC Press, Boca Raton, 2014."

### Definitions
Let $A\in\mathbb{C}^{m\times n}$ and let $A=U\Sigma V^*$ be its SVD.

The set of $A$'s singular values is $sv(A)=\{\sigma_1,\sigma_2,\ldots\}$, with 
$\sigma_1\geq \sigma_2\geq \cdots\geq 0$, and let 
$sv_{ext}(A)=sv(A)$ unless $m>n$, for which $sv_{ext}(A)=sv(A)\cup \{0,\ldots,0\}$ ($m-n$ additional zeros).

A triplet $(u,\sigma,v)\in\mathbb{C}^{m}\times\mathbb{R}\times\mathbb{C}^{n}$ is a __singular triplet__ of $A$ if $\|u\|_2=1$, $\|v\|_2=1$, $\sigma\geq 0$, $Av=\sigma u$, and $A^*u=\sigma v$.
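As a quick sanity check of this definition, the following minimal sketch tests whether a triple satisfies the conditions above. It assumes a current Julia release with the `LinearAlgebra` standard library (the notebook cells further down use older syntax); the helper name `is_singular_triplet` and the tolerance `tol` are hypothetical choices made for this illustration.

```julia
using LinearAlgebra

# Hypothetical helper: checks the defining relations of a singular triplet
# up to an absolute tolerance `tol` (the tolerance value is an assumption).
function is_singular_triplet(A, u, σ, v; tol=1e-12)
    return abs(norm(u) - 1) ≤ tol && abs(norm(v) - 1) ≤ tol && σ ≥ 0 &&
           norm(A * v - σ * u) ≤ tol && norm(A' * u - σ * v) ≤ tol
end

A = [3.0 0.0; 4.0 5.0; 0.0 0.0]
F = svd(A)                       # thin SVD: A = F.U * Diagonal(F.S) * F.V'
# Each column pair of the computed SVD should form a singular triplet
ok = [is_singular_triplet(A, F.U[:, j], F.S[j], F.V[:, j]) for j in 1:2]
```

Pairing the first singular vectors with the second singular value, as in `is_singular_triplet(A, F.U[:, 1], F.S[2], F.V[:, 1])`, should fail the check because the residuals no longer vanish.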

$\tilde A=A+\Delta A$ is a __perturbed matrix__, where $\Delta A$ is the __perturbation__.
_The same notation is adopted for $\tilde A$, except that all symbols carry tildes._

The __spectral condition number__ of $A$ is $\kappa_2(A)=\sigma_{\max}(A)/ \sigma_{\min}(A)$.

Let $X,Y\in\mathbb{C}^{n\times k}$ with $\mathop{\mathrm{rank}}(X)=\mathop{\mathrm{rank}}(Y)=k$. The __canonical angles__ between their column spaces are $\theta_i=\cos^{-1}\sigma_i$, where $\sigma_i$ are the singular values of 
$(Y^*Y)^{-1/2}Y^*X(X^*X)^{-1/2}$. The __canonical angle matrix__ between $X$ and $Y$ is 
$$\Theta(X,Y)=\mathop{\mathrm{diag}}(\theta_1,\theta_2,\ldots,\theta_k).
$$
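A minimal sketch of how the canonical angles can be computed, assuming a current Julia release with `LinearAlgebra`. Instead of forming $(Y^*Y)^{-1/2}$ and $(X^*X)^{-1/2}$ explicitly, it takes orthonormal bases $Q_X$ and $Q_Y$ from thin QR factorizations; $Q_Y^*Q_X$ differs from the matrix in the definition only by unitary factors, so the singular values coincide. The function name `canonical_angles` is hypothetical.

```julia
using LinearAlgebra

# Canonical angles between the column spaces of X and Y
# (both assumed to have full column rank k).
function canonical_angles(X::AbstractMatrix, Y::AbstractMatrix)
    QX = Matrix(qr(X).Q)          # thin orthonormal basis of range(X)
    QY = Matrix(qr(Y).Q)          # thin orthonormal basis of range(Y)
    s = svdvals(QY' * QX)         # cosines of the canonical angles
    # clamp guards against cosines slightly above 1 due to rounding
    return acos.(clamp.(s, 0.0, 1.0))
end

X = [1.0 0; 0 1; 0 0]             # span{e1, e2}
Y = [1.0 0; 0 0; 0 1]             # span{e1, e3}
θ = canonical_angles(X, Y)        # one shared direction, one orthogonal one
```

Because `svdvals` returns the cosines in descending order, the angles come out in ascending order.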
    

### Facts

1. _(Mirsky)_ $\|\Sigma-\tilde\Sigma\|_2\leq \|\Delta A\|_2$ and 
$\|\Sigma-\tilde\Sigma\|_F\leq \|\Delta A\|_F$.

2.  _(Residual bounds)_ Let $\|\tilde u\|_2=\|\tilde v\|_2=1$ and 
$\tilde \mu=\tilde u^* A \tilde v$. Let residuals $r=A\tilde v-\tilde \mu \tilde u$ and $s=A^*\tilde u - \tilde \mu \tilde v$, and let 
$\varepsilon=\max\{\|r\|_2,\|s\|_2\}$. Then $|\tilde \mu -\mu|\leq \varepsilon$ for some singular value $\mu$ of $A$. 

3. The smallest error matrix $\Delta A$ for which $(\tilde u, \tilde \mu, \tilde v)$ is a singular triplet of $\tilde A$ satisfies $\| \Delta A\|_2=\varepsilon$.

4. Let $\mu$ be the closest singular value in $sv_{ext}(A)$ to $\tilde \mu$ and $(u,\mu,v)$
be the associated singular triplet, and let
$$\eta=\mathop{\mathrm{gap}}(\tilde\mu)= \min_{\mu\neq\sigma\in sv_{ext}(A)}|\tilde\mu-\sigma|.$$
If $\eta>0$, then
\begin{align*}
|\tilde\mu-\mu |&\leq \frac{\varepsilon^2}{\eta},\\
\sqrt{\sin^2\theta(u,\tilde u)+ \sin^2\theta(v,\tilde v)} & \leq 
\frac{\sqrt{\|r\|_2^2 + \|s\|_2^2}}{\eta}.
\end{align*}

5. Let 
$$
A=\begin{bmatrix} M & E \\ F & H \end{bmatrix}, \quad 
\tilde A=\begin{bmatrix} M & 0 \\ 0 & H \end{bmatrix},
$$ 
where $M\in\mathbb{C}^{k\times k}$, and set $\eta=\min |\mu-\nu|$ over all $\mu\in sv(M)$ and $\nu\in sv_{ext}(H)$, and $\varepsilon =\max \{ \|E\|_2,\|F\|_2 \}$. Then
$$ 
\max |\sigma_j -\tilde\sigma_j| \leq \frac{2\varepsilon^2}{\eta+\sqrt{\eta^2+4\varepsilon^2}}.
$$

6. Let $m\geq n$ and let
$$
\begin{bmatrix} U_1^*\\ U_2^* \end{bmatrix} A \begin{bmatrix} V_1 & V_2 \end{bmatrix}=
\begin{bmatrix} A_1 &  \\ & A_2 \end{bmatrix}, \quad 
\begin{bmatrix} \tilde U_1^*\\ \tilde U_2^* \end{bmatrix} \tilde A \begin{bmatrix} \tilde V_1 & \tilde V_2 \end{bmatrix}=
\begin{bmatrix} \tilde A_1 &  \\ & \tilde A_2 \end{bmatrix},
$$
where $\begin{bmatrix} U_1 & U_2 \end{bmatrix}$, 
$\begin{bmatrix} V_1 & V_2 \end{bmatrix}$,
$\begin{bmatrix} \tilde U_1 & \tilde U_2 \end{bmatrix}$, and
$\begin{bmatrix} \tilde V_1 & \tilde V_2 \end{bmatrix}$ are unitary, and 
$U_1,\tilde U_1\in \mathbb{C}^{m\times k}$, 
$V_1,\tilde V_1\in \mathbb{C}^{n\times k}$. Set
$$
R=A\tilde V_1-\tilde U_1\tilde A_1,\quad 
S=A^*\tilde U_1-\tilde V_1 \tilde A_1.
$$
Let $\eta=\min|\tilde \mu-\nu|$ over all $\tilde \mu\in sv(\tilde A_1)$ and 
$\nu\in sv_{ext}(A_2)$. If $\eta > 0$, then
$$
\sqrt{\|\sin\Theta(U_1,\tilde U_1)\|_F^2 +
\|\sin\Theta(V_1,\tilde V_1)\|_F^2}
\leq \frac{\sqrt{\|R\|_F^2 + \|S\|_F^2 }}{\eta}.
$$
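Fact 6 is not covered by the example that follows, so here is a minimal numerical sketch of it. It assumes a current Julia release with the `LinearAlgebra` and `Random` standard libraries (the notebook cells use older syntax); the sizes, the seed, and all variable names are illustrative assumptions. It uses the identity $\|\sin\Theta(U_1,\tilde U_1)\|_F=\|U_2^*\tilde U_1\|_F$, which holds because $\begin{bmatrix} U_1 & U_2 \end{bmatrix}$ is unitary and $\tilde U_1$ has orthonormal columns.

```julia
using LinearAlgebra, Random

Random.seed!(42)                          # arbitrary seed
m, n, k = 8, 5, 2
A = Float64.(rand(-9:9, m, n))
B = A + rand(m, n) / 100                  # perturbed matrix à = A + ΔA
FA = svd(A, full=true)                    # square U (m×m) and V (n×n)
FB = svd(B, full=true)
U2, V2   = FA.U[:, k+1:m], FA.V[:, k+1:n] # complements of U₁ and V₁
UB1, VB1 = FB.U[:, 1:k], FB.V[:, 1:k]     # Ũ₁ and Ṽ₁
B1 = Diagonal(FB.S[1:k])                  # Ã₁
R = A * VB1 - UB1 * B1
S = A' * UB1 - VB1 * B1
# sv_ext(A₂): A₂ is (m-k)×(n-k) with m > n, so zero is adjoined
svext = [FA.S[k+1:n]; 0.0]
η = minimum(abs.(FB.S[1:k] .- svext'))
# norm of a matrix is the Frobenius norm in current Julia
lhs = sqrt(norm(U2' * UB1)^2 + norm(V2' * VB1)^2)
rhs = sqrt(norm(R)^2 + norm(S)^2) / η
lhs, rhs
```

The pair `(lhs, rhs)` holds the two sides of the bound; `lhs` should not exceed `rhs` whenever `η > 0`.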


### Example

In [1]:
m=8
n=5
k=min(m,n)
A=rand(-9:9,m,n)

8x5 Array{Int64,2}:
  2   2   6  -2   5
  4   4   8  -9   5
 -3   3   6  -9   5
 -3   4   6  -8   8
  2  -5   0   0  -2
 -2  -4  -7  -7   8
  3   9   8   2   1
 -5   0  -5  -7   2

In [2]:
ΔA=rand(m,n)/100
B=A+ΔA

8x5 Array{Float64,2}:
  2.00058   2.0005       6.0047      -1.99218      5.00367
  4.00772   4.00774      8.00807     -8.999        5.00343
 -2.99988   3.00366      6.00738     -8.99619      5.00396
 -2.99219   4.00675      6.00169     -7.99026      8.00219
  2.00056  -4.99949      0.00178869   0.00959485  -1.9976 
 -1.99771  -3.99395     -6.9922      -6.99567      8.0046 
  3.00116   9.00931      8.00103      2.00911      1.00412
 -4.99626   0.00731796  -4.99486     -6.99446      2.00112

In [4]:
U,σ,V=svd(A)
UB,μ,VB=svd(B)

(
8x5 Array{Float64,2}:
  0.295611   0.148122   -0.277612   0.334022    0.324476
  0.531892   0.112603   -0.459605  -0.178585   -0.59823 
  0.485022  -0.101269    0.126796  -0.436214    0.271357
  0.534625  -0.0996925   0.18888    0.0693467   0.453168
 -0.11106   -0.0405325  -0.58006   -0.338042    0.149826
  0.138215  -0.654337   -0.203774   0.598974   -0.139085
  0.252803   0.546959    0.308408   0.328686   -0.293545
  0.115267  -0.464997    0.432242  -0.283491   -0.360388,

[25.05505659439356,19.017697086472655,7.7848822163478415,5.819323566514415,4.1633984829747614],
5x5 Array{Float64,2}:
 -0.0258287   0.343919  -0.684695   0.272128  -0.581543
  0.34339     0.409321   0.67257    0.202264  -0.470403
  0.504376    0.62355   -0.220919  -0.304443   0.464004
 -0.609731    0.490472   0.135578   0.474555   0.37958 
  0.505221   -0.291202  -0.107965   0.753092   0.284866)

In [5]:
# Mirsky's Theorems
maxabs(σ-μ), norm(ΔA), vecnorm(σ-μ), vecnorm(ΔA)

(0.006680552250738714,0.030011731504529342,0.010802022922646285,0.03407314516842115)
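The functions `maxabs` and `vecnorm` used in this cell are not available in current Julia releases. A sketch of the same check, assuming Julia 1.x with the `LinearAlgebra` standard library (where `opnorm` gives the spectral norm of a matrix and `norm` of a matrix gives the Frobenius norm), might look as follows; the seed is arbitrary and the names `mirsky2` and `mirskyF` are made up for this illustration.

```julia
using LinearAlgebra, Random

Random.seed!(123)                 # arbitrary seed, only for reproducibility
m, n = 8, 5
A  = Float64.(rand(-9:9, m, n))
ΔA = rand(m, n) / 100
σ  = svdvals(A)
μ  = svdvals(A + ΔA)
# (left-hand side, right-hand side) of Mirsky's bounds
mirsky2 = (maximum(abs, σ - μ), opnorm(ΔA))   # spectral norm version
mirskyF = (norm(σ - μ), norm(ΔA))             # Frobenius norm version
```

In each tuple the first entry should be no larger than the second.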

In [6]:
# Residual bounds - how close is (x,ζ,y) to (U[:,j],σ[j],V[:,j])
j=rand(2:k-1)
x=round(U[:,j],3)
y=round(V[:,j],3)
x=x/norm(x)
y=y/norm(y)
ζ=(x'*A*y)[]
σ, j, ζ

([25.04837604214282,19.023585936412545,7.781581040692892,5.816379121192239,4.159178548345899],2,19.023579291596192)

In [7]:
# Fact 2
r=A*y-ζ*x
s=A'*x-ζ*y
ϵ=max(norm(r),norm(s))

0.012162012156092849

In [8]:
minimum(abs(σ-ζ)), ϵ

(6.6448163522636605e-6,0.012162012156092849)

In [9]:
# Fact 4
η=min(abs(ζ-σ[j-1]),abs(ζ-σ[j+1]))

6.024796750546628

In [10]:
ζ-σ[j], ϵ^2/η

(-6.6448163522636605e-6,2.4550959278672756e-5)

In [11]:
# Singular vector bound
# cos(θ)
cosθU=dot(x,U[:,j])
cosθV=dot(y,V[:,j])
# Bound
sqrt(1-cosθU^2+1-cosθV^2), sqrt(norm(r)^2+norm(s)^2)/η

(0.0010194834330840695,0.002614345509743883)

In [12]:
# Fact 5 - we create small off-diagonal block perturbation
j=3
M=A[1:j,1:j]
H=A[j+1:m,j+1:n]
B=cat([1,2],M,H)

8x5 Array{Int64,2}:
  2  2  6   0   0
  4  4  8   0   0
 -3  3  6   0   0
  0  0  0  -8   8
  0  0  0   0  -2
  0  0  0  -7   8
  0  0  0   2   1
  0  0  0  -7   2

In [13]:
E=rand(size(A[1:j,j+1:n]))/100
F=rand(size(A[j+1:m,1:j]))/100
C=map(Float64,B)
C[1:j,j+1:n]=E
C[j+1:m,1:j]=F
C

8x5 Array{Float64,2}:
  2.0          2.0          6.0           0.00209259    0.00503404 
  4.0          4.0          8.0           0.00972093    0.000351275
 -3.0          3.0          6.0           0.000416214   0.00489447 
  0.00484784   0.00767216   0.0024805    -8.0           8.0        
  0.00716453   0.00542464   0.00919852    0.0          -2.0        
  0.00778205   0.000118194  0.00162238   -7.0           8.0        
  0.000228019  0.00187979   0.00259237    2.0           1.0        
  0.00666706   0.00348855   0.000665173  -7.0           2.0        

In [15]:
svdvals(M).-svdvals(H)'

3x2 Array{Float64,2}:
  -3.82116   8.71482 
 -12.0255    0.510504
 -16.0998   -3.5638  

In [16]:
ϵ=max(norm(E), norm(F))
β=svdvals(B)
γ=svdvals(C)
η=minabs(svdvals(M).-svdvals(H)')
[β γ], maxabs(β-γ), 2*ϵ^2/(η+sqrt(η^2+4*ϵ^2))

(
5x2 Array{Float64,2}:
 16.861     16.861   
 13.0399    13.0399  
  4.83555    4.83556 
  4.32504    4.32504 
  0.761242   0.761257,

1.5174660087491354e-5,0.0006222763555939372)

## Relative perturbation theory

### Definitions

Matrix $A\in\mathbb{C}^{m\times n}$ is __multiplicatively perturbed__ to $\tilde A$ if
$\tilde A=D_L^* A D_R$ for some $D_L\in\mathbb{C}^{m\times m}$ and 
$D_R\in\mathbb{C}^{n\times n}$. 

Matrix $A$ is (highly) __graded__ if it can be scaled as $A=GS$ such that $G$ is _well-behaved_ (that is, $\kappa_2(G)$ is of modest magnitude), where the __scaling matrix__ $S$ is often diagonal. Interesting cases are when $\kappa_2(G)\ll \kappa_2(A)$.
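As a small illustration of grading, the sketch below scales the columns of a graded matrix to unit norm and compares the two condition numbers. It assumes a current Julia release with `LinearAlgebra`; the matrix `G0` and the scaling factors are made-up test data.

```julia
using LinearAlgebra

G0 = [1.0 0.5 0.2; 0.5 1.0 0.5; 0.2 0.5 1.0; 0.1 0.2 0.3]  # well-behaved factor
A  = G0 * Diagonal([1.0, 1e5, 1e10])                        # strongly graded columns
S  = Diagonal([norm(A[:, j]) for j in 1:size(A, 2)])        # column norms of A
G  = A / S                                                  # A = G*S, unit-norm columns in G
cond(G), cond(A)
```

Here `cond(G)` stays modest while `cond(A)` is many orders of magnitude larger, which is the situation $\kappa_2(G)\ll\kappa_2(A)$ described above.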

__Relative distances__ between two complex numbers $\alpha$ and $\tilde \alpha$ are:
\begin{align*}
\zeta(\alpha,\tilde \alpha)&=\frac{|\alpha-\tilde\alpha|}{\sqrt{|\alpha\tilde \alpha|}}, \quad \textrm{for } \alpha\tilde\alpha\neq 0,\\
\varrho(\alpha,\tilde \alpha)&=\frac{|\alpha-\tilde\alpha|}
{\sqrt{|\alpha|^2 +  |\tilde \alpha|^2}}, \quad \textrm{for } |\alpha|+|\tilde\alpha|> 0.
\end{align*}
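The two relative distances translate directly into code. The following sketch assumes a current Julia release; the function names `ζ` and `ϱ` simply mirror the symbols in the text, and the domain restrictions are stated as comments rather than enforced.

```julia
# Relative distances between two (real or complex) numbers a and b.
ζ(a, b) = abs(a - b) / sqrt(abs(a * b))          # requires a*b ≠ 0
ϱ(a, b) = abs(a - b) / sqrt(abs2(a) + abs2(b))   # requires |a| + |b| > 0

ζ(1.0, 4.0)   # → 1.5
ϱ(3.0, 4.0)   # → 0.2
```

Both distances are unchanged when the two arguments are multiplied by the same nonzero factor, which is what makes them suitable for relative perturbation bounds.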

### Facts

1. If $D_L$ and $D_R$ are non-singular and $m\geq n$, then
\begin{align*}
\frac{\sigma_j}{\|D_L^{-1}\|_2\|D_R^{-1}\|_2}& \leq \tilde\sigma_j \leq
\sigma_j \|D_L\|_2\|D_R\|_2, \quad \textrm{for } j=1,\ldots,n, \\
\| \mathop{\mathrm{diag}}(\zeta(\sigma_1,\tilde \sigma_1),\ldots,
\zeta(\sigma_n,\tilde \sigma_n))\|_{2,F} & \leq
\frac{1}{2}\|D_L^*-D_L^{-1}\|_{2,F} + \frac{1}{2}\|D_R^*-D_R^{-1}\|_{2,F}.
\end{align*}

2. Let $m\geq n$ and let
$$
\begin{bmatrix} U_1^*\\ U_2^* \end{bmatrix} A \begin{bmatrix} V_1 & V_2 \end{bmatrix}=
\begin{bmatrix} A_1 &  \\ & A_2 \end{bmatrix}, \quad 
\begin{bmatrix} \tilde U_1^*\\ \tilde U_2^* \end{bmatrix} \tilde A \begin{bmatrix} \tilde V_1 & \tilde V_2 \end{bmatrix}=
\begin{bmatrix} \tilde A_1 &  \\ & \tilde A_2 \end{bmatrix},
$$
where $\begin{bmatrix} U_1 & U_2 \end{bmatrix}$, 
$\begin{bmatrix} V_1 & V_2 \end{bmatrix}$,
$\begin{bmatrix} \tilde U_1 & \tilde U_2 \end{bmatrix}$, and
$\begin{bmatrix} \tilde V_1 & \tilde V_2 \end{bmatrix}$ are unitary, and 
$U_1,\tilde U_1\in \mathbb{C}^{m\times k}$, 
$V_1,\tilde V_1\in \mathbb{C}^{n\times k}$. Set
$$
R=A\tilde V_1-\tilde U_1\tilde A_1,\quad 
S=A^*\tilde U_1-\tilde V_1 \tilde A_1.
$$
Let $\tilde A=D_L^* A D_R$ with non-singular $D_L$ and $D_R$, and let $\eta=\min \varrho(\mu,\tilde \mu)$ over all $\mu\in sv(A_1)$ and 
$\tilde \mu\in sv_{ext}(\tilde A_2)$. If $\eta > 0$, then
\begin{align*}
& \sqrt{\|\sin\Theta(U_1,\tilde U_1)\|_F^2 +
\|\sin\Theta(V_1,\tilde V_1)\|_F^2} \\
& \leq \frac{1}{\eta}( \|(I-D_L^*)U_1\|_F^2+ \|(I-D_L^{-1})U_1\|_F^2 \\
& \quad +\|(I-D_R^*)V_1\|_F^2+ \|(I-D_R^{-1})V_1\|_F^2 )^{1/2}.
\end{align*}

3. Let $A=GS$ and $\tilde A=\tilde GS$, where $G$ has full column rank, and let 
$\Delta G=\tilde G-G$. Then $\tilde A=DA$, where $D=I+(\Delta G) G^{\dagger}$, and 
Fact 1 applies with $D_L=D^*$, $D_R=I$. If $\|(\Delta G) G^{\dagger}\|_2<1$, then
$$
\frac{1}{2}\|D^*-D^{-1}\|_{2,F} \leq \bigg(1+\frac{1}{1-\|(\Delta G) G^{\dagger}\|_{2}}\bigg)
\frac{\|(\Delta G) G^{\dagger}\|_{2,F}}{2}.
$$
According to the notebook on 
[Jacobi Method and High Relative Accuracy](L4c Symmetric Eigenvalue Decomposition - Jacobi Method and High Relative Accuracy.ipynb), nearly optimal diagonal scaling is such that all columns of $G$ have unit norms, $S=\mathop{\mathrm{diag}} \big( \| A_{:,1}\|_2,\ldots,\|A_{:,n}\|_2 \big)$.

4. Let $A$ be a real upper-bidiagonal matrix with diagonal entries $a_1,a_2,\ldots,a_n$ and 
the super-diagonal entries $b_1,b_2, \ldots,b_{n-1}$. Let the diagonal entries of 
$\tilde A$ be $\alpha_1 a_1,\alpha_2 a_2,\ldots,\alpha_n a_n$, and its super-diagonal entries be
$\beta_1 b_1,\beta_2 b_2,\ldots,\beta_{n-1} b_{n-1}$. Then $\tilde A=D_L^* A D_R$ with 
\begin{align*}
D_L &=\mathop{\mathrm{diag}} \bigg(\alpha_1,\frac{\alpha_1 \alpha_2}{\beta_1},
\frac{\alpha_1 \alpha_2 \alpha_3}{\beta_1 \beta_2},\cdots\bigg),\\
D_R &=\mathop{\mathrm{diag}} \bigg(1, \frac{\beta_1}{\alpha_1},
\frac{\beta_1 \beta_2}{\alpha_1 \alpha_2},\cdots\bigg).
\end{align*}
Let $\alpha=\prod\limits_{j=1}^n \max\{\alpha_j, 1/\alpha_j\}$ and 
$\beta=\prod\limits_{j=1}^{n-1} \max\{\beta_j, 1/\beta_j\}$. Then
$$
(\alpha\beta)^{-1}\leq \big(\| D_L^{-1}\|_2 \|D_R^{-1}\|_2\big)^{-1} \leq
\| D_L\|_2 \|D_R\|_2  \leq \alpha\beta,
$$
and Fact 1 applies.
 
5. Consider the block partitioned matrices
\begin{align*}
A & =\begin{bmatrix} B & C \\ 0 & D\end{bmatrix}, \\
\tilde A & =  \begin{bmatrix} B & 0 \\ 0 & D\end{bmatrix}
=A \begin{bmatrix} I & -B^{-1} C \\ 0 & I \end{bmatrix}\equiv A D_R.
\end{align*}
By Fact 1, $\zeta(\sigma_j,\tilde \sigma_j) \leq \frac{1}{2} \|B^{-1}C\|_2$. This is used as a deflation criterion in the SVD algorithm for bidiagonal matrices.
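The scaling matrices of Fact 4 can be checked numerically. The sketch below assumes a current Julia release with `LinearAlgebra` (where an upper-bidiagonal matrix is built with `Bidiagonal(dv, ev, :U)`); the entries and the factors `αs`, `βs` are made-up test data. It builds $D_L$ and $D_R$ from cumulative products, verifies $\tilde A=D_L^*AD_R$, and tests the two-sided bound of Fact 1 with $\alpha\beta$.

```julia
using LinearAlgebra

a  = [2.0, 3.0, 0.5, 4.0, 1.5]            # diagonal of A
b  = [1.0, 0.25, 2.0, 3.0]                # super-diagonal of A
αs = [1.01, 0.99, 1.02, 0.98, 1.0]        # relative factors on the diagonal
βs = [0.97, 1.03, 1.0, 1.01]              # relative factors on the super-diagonal
A  = Matrix(Bidiagonal(a, b, :U))
B  = Matrix(Bidiagonal(αs .* a, βs .* b, :U))   # perturbed matrix Ã

dR = [1.0; cumprod(βs ./ αs[1:end-1])]    # 1, β₁/α₁, β₁β₂/(α₁α₂), …
dL = αs ./ dR                             # α₁, α₁α₂/β₁, α₁α₂α₃/(β₁β₂), …
# D_L is real and diagonal, so D_L^* = D_L
scaling_error = norm(Diagonal(dL) * A * Diagonal(dR) - B)

α = prod(max.(αs, 1 ./ αs))
β = prod(max.(βs, 1 ./ βs))
σ, μ = svdvals(A), svdvals(B)
# Fact 1 with ‖D_L‖‖D_R‖ ≤ αβ and (‖D_L⁻¹‖‖D_R⁻¹‖)⁻¹ ≥ (αβ)⁻¹
within_bounds = all(σ ./ (α * β) .≤ μ) && all(μ .≤ σ .* (α * β))
```

`scaling_error` should be at rounding level and `within_bounds` should be `true`.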

### Example - Bidiagonal matrix

In order to illustrate Facts 1 to 3, we need an algorithm which computes the singular values with high relative accuracy. Such an algorithm, the one-sided Jacobi method, is discussed in the following notebook. 

The algorithm actually used in the function `svdvals()` for `Bidiagonal` is the zero-shift bidiagonal QR algorithm, which attains the accuracy given by Fact 4: if all
$1-\varepsilon \leq \alpha_i,\beta_j \leq 1+\varepsilon$, then
$$
(1-\varepsilon)^{2n-1} \leq (\alpha\beta)^{-1} \leq \alpha\beta \leq (1-\varepsilon)^{-(2n-1)}.
$$
In other words, relative changes of size $\varepsilon$ in the diagonal and super-diagonal elements cause relative changes of at most about $(2n-1)\varepsilon$ in the singular values.
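
The step from the product bound to the $(2n-1)\varepsilon$ estimate is a first-order expansion, sketched here under the assumptions $0<\varepsilon<1$ and $(2n-1)\varepsilon\ll 1$. Since $1+\varepsilon\leq 1/(1-\varepsilon)$, each of the $2n-1$ factors $\max\{\alpha_j,1/\alpha_j\}$ and $\max\{\beta_j,1/\beta_j\}$ is at most $1/(1-\varepsilon)$, so
$$
\alpha\beta\leq (1-\varepsilon)^{-(2n-1)} = 1+(2n-1)\varepsilon+O(\varepsilon^2),\qquad
(\alpha\beta)^{-1}\geq (1-\varepsilon)^{2n-1} = 1-(2n-1)\varepsilon+O(\varepsilon^2).
$$
Combining this with Facts 1 and 4 gives, to first order in $\varepsilon$ and for fixed $n$, $|\tilde\sigma_j-\sigma_j|\leq (2n-1)\varepsilon\,\sigma_j$.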

__However__, if singular values and vectors are desired, the function `svd()` calls the standard algorithm, described in the next notebook, which __does not attain this accuracy__.

In [21]:
n=50
δ=100000
# The starting matrix
a=exp(50*(rand(n)-0.5))
b=exp(50*(rand(n-1)-0.5))
A=Bidiagonal(a,b, true)
# Multiplicative perturbation
DL=ones(n)+(rand(n)-0.5)/δ
DR=ones(n)+(rand(n)-0.5)/δ
# The perturbed matrix
α=DL.*a.*DR
β=DL[1:end-1].*b.*DR[2:end]
B=Bidiagonal(α,β,true)
(A.dv-B.dv)./A.dv

50-element Array{Float64,1}:
 -4.73991e-6
 -2.5918e-6 
 -2.51681e-7
 -5.06231e-6
  6.46927e-7
 -6.04042e-7
  2.08138e-6
  1.79542e-6
 -3.59043e-6
 -3.40904e-6
 -5.6103e-6 
  7.00961e-6
 -4.18607e-6
  ⋮         
 -9.46796e-6
 -2.47653e-6
 -4.82816e-6
 -6.84491e-6
 -1.06868e-6
  6.2468e-6 
 -2.99296e-6
  7.54174e-6
  5.83866e-6
 -2.47725e-6
 -2.76297e-6
  6.50914e-6

In [22]:
(a-α)./a, (b-β)./b

([-4.73991e-6,-2.5918e-6,-2.51681e-7,-5.06231e-6,6.46927e-7,-6.04042e-7,2.08138e-6,1.79542e-6,-3.59043e-6,-3.40904e-6  …  -4.82816e-6,-6.84491e-6,-1.06868e-6,6.2468e-6,-2.99296e-6,7.54174e-6,5.83866e-6,-2.47725e-6,-2.76297e-6,6.50914e-6],[-5.42114e-7,-7.91381e-6,2.92164e-6,-1.10946e-6,-4.88568e-6,2.15685e-6,3.11763e-6,-3.99115e-7,-9.38052e-7,-8.075e-6  …  -4.44027e-6,-6.4269e-6,-1.06345e-6,5.494e-7,3.22241e-6,-1.13015e-6,5.89168e-6,6.8054e-6,-5.42489e-6,1.14287e-6])

In [23]:
@which svdvals(A)

In [25]:
σ=svdvals(A)
μ=svdvals(B)
[σ (σ-μ)./σ]

50x2 Array{Float64,2}:
 6.34323e10   -2.51681e-7
 5.36533e10   -5.42489e-6
 3.23625e10    1.75479e-7
 2.9451e10    -6.04042e-7
 2.86248e10   -4.53498e-6
 2.26082e10    2.66622e-6
 1.25208e10   -2.39127e-6
 9.88407e9    -3.11189e-7
 9.86353e9    -4.15931e-6
 9.8522e9     -5.06231e-6
 7.64762e9     1.74776e-6
 3.25606e9     1.04312e-8
 3.65121e8     3.80782e-6
 ⋮                       
 1.71773e-6   -4.73991e-6
 8.9644e-7     6.46927e-7
 6.49657e-7    5.84057e-6
 3.67319e-8   -1.35299e-6
 2.49016e-8    1.18976e-6
 5.67467e-9   -4.25005e-6
 3.42868e-9    7.54164e-6
 8.87534e-11  -6.28876e-7
 8.27636e-15  -2.41269e-6
 9.29187e-18   5.50225e-6
 2.57363e-22  -4.09362e-6
 1.48411e-77  -2.29173e-6

In [26]:
cond(A)

4.274097420844119e87

In [27]:
# The standard algorithm
U,ν,V=svd(A);

In [19]:
(σ-ν)./σ

50-element Array{Float64,1}:
 -1.15121e-16
 -1.49498e-16
  1.749e-16  
 -1.88142e-16
  0.0        
 -1.24454e-16
  0.0        
  0.0        
  0.0        
 -1.74204e-16
  0.0        
  0.0        
  0.0        
  ⋮          
 -2.75535e-7 
  0.0        
  0.0        
 -1.84806e-12
 -0.342439   
  1.38957e-16
  0.999008   
  0.70221    
  0.997497   
  0.999982   
  1.0        
 -5.0563e60  