# Singular Value Decomposition - Perturbation Theory

---

## Prerequisites

The reader should be familiar with eigenvalue decomposition, singular value decomposition, and perturbation theory for eigenvalue decomposition.

## Competences 

The reader should be able to understand and check the facts about perturbations of singular values and vectors.

---

## Perturbation bounds

For more details and the proofs of the Facts below, see 
[R.-C. Li, Matrix Perturbation Theory][Hog14], and the references therein.

[Hog14]: #1 "L. Hogben, ed., 'Handbook of Linear Algebra', pp. 21.6-21.8 and 21.16-21.18, CRC Press, Boca Raton, 2014."

### Definitions
Let $A\in\mathbb{C}^{m\times n}$ and let $A=U\Sigma V^*$ be its SVD.

The set of $A$'s singular values is $sv(A)=\{\sigma_1,\sigma_2,\ldots,\sigma_{\min\{m,n\}}\}$, with 
$\sigma_1\geq \sigma_2\geq \cdots\geq 0$, and 
$sv_{ext}(A)=sv(A)$ unless $m>n$, in which case $sv_{ext}(A)=sv(A)\cup \{0,\ldots,0\}$ (with $m-n$ additional zeros).
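A small helper makes the padding in $sv_{ext}$ concrete. This is a sketch in Julia 1.x using the `LinearAlgebra` standard library; the name `svext` is a hypothetical helper, not a library function.

```julia
using LinearAlgebra

# sv_ext(A): the min(m,n) singular values of A (in decreasing order),
# followed by m-n zeros when A has more rows than columns.
function svext(A::AbstractMatrix)
    m, n = size(A)
    σ = svdvals(A)
    return m > n ? vcat(σ, zeros(m - n)) : σ
end

svext([3.0 0.0; 0.0 4.0; 0.0 0.0])   # tall 3×2: singular values 4, 3 plus one zero
svext([3.0 0.0 0.0; 0.0 4.0 0.0])    # wide 2×3: no padding
```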

A triplet $(u,\sigma,v)\in\mathbb{C}^{m}\times\mathbb{R}\times\mathbb{C}^{n}$ is a __singular triplet__ of $A$ if $\|u\|_2=1$, $\|v\|_2=1$, $\sigma\geq 0$, $Av=\sigma u$, and $A^*u=\sigma v$.
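The defining relations of a singular triplet can be checked numerically. The sketch below assumes Julia 1.x with `LinearAlgebra`; the function `is_singular_triplet` and its tolerance `tol` are hypothetical choices. The columns of the thin SVD returned by `svd(A)` should satisfy the relations up to rounding.

```julia
using LinearAlgebra

# Check the defining relations of a singular triplet (u, σ, v) of A:
# unit vectors, σ ≥ 0, A*v = σ*u and A'*u = σ*v (up to a tolerance scaled by ‖A‖₂).
function is_singular_triplet(A, u, σ, v; tol = 1e-10)
    scale = max(1, opnorm(A))
    isapprox(norm(u), 1; atol = tol) &&
    isapprox(norm(v), 1; atol = tol) &&
    σ >= 0 &&
    norm(A * v - σ * u) <= tol * scale &&
    norm(A' * u - σ * v) <= tol * scale
end

A = randn(6, 4)
F = svd(A)                       # thin SVD: F.U is 6×4, F.S has 4 entries, F.V is 4×4
ok = all(is_singular_triplet(A, F.U[:, j], F.S[j], F.V[:, j]) for j in 1:4)
```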

$\tilde A=A+\Delta A$ is a __perturbed matrix__, where $\Delta A$ is the __perturbation__.
_The same notation is used for $\tilde A$, with all symbols carrying tildes._

__Spectral condition number__ of $A$ is $\kappa_2(A)=\sigma_{\max}(A)/ \sigma_{\min}(A)$.

Let $X,Y\in\mathbb{C}^{n\times k}$ with $\mathop{\mathrm{rank}}(X)=\mathop{\mathrm{rank}}(Y)=k$. The __canonical angles__ between their column spaces are $\theta_i=\cos^{-1}\sigma_i$, where $\sigma_i$ are the singular values of 
$(Y^*Y)^{-1/2}Y^*X(X^*X)^{-1/2}$. The __canonical angle matrix__ between $X$ and $Y$ is 
$$\Theta(X,Y)=\mathop{\mathrm{diag}}(\theta_1,\theta_2,\ldots,\theta_k).
$$
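The definition of canonical angles translates directly into code. This is a sketch in Julia 1.x with `LinearAlgebra`; `canonical_angles` is a hypothetical name, and it follows the formula above literally (inverse square roots of the Gram matrices) rather than a more economical QR-based approach.

```julia
using LinearAlgebra

# Canonical angles between the column spaces of X and Y (both n×k, rank k):
# θ_i = acos(σ_i), where σ_i are the singular values of (Y'Y)^(-1/2) Y'X (X'X)^(-1/2).
function canonical_angles(X, Y)
    Wx = inv(sqrt(Hermitian(X' * X)))
    Wy = inv(sqrt(Hermitian(Y' * Y)))
    σ = svdvals(Wy * (Y' * X) * Wx)       # decreasing, so the angles come out increasing
    return acos.(clamp.(σ, 0, 1))         # clamp guards against rounding slightly above 1
end

t = 0.3
X = [1.0 0.0; 0.0 1.0; 0.0 0.0]           # span{e₁, e₂}
Y = [1.0 0.0; 0.0 cos(t); 0.0 sin(t)]     # span{e₁, cos(t)e₂ + sin(t)e₃}
θ = canonical_angles(X, Y)                 # angles 0 and t
```

Since only the column spaces matter, rescaling `X` or `Y` (for example `2 .* X`) leaves the angles unchanged.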
    

### Facts

1. _(Mirsky)_ $\|\Sigma-\tilde\Sigma\|_2\leq \|\Delta A\|_2$ and 
$\|\Sigma-\tilde\Sigma\|_F\leq \|\Delta A\|_F$.

2.  _(Residual bounds)_ Let $\|\tilde u\|_2=\|\tilde v\|_2=1$ and 
$\tilde \mu=\tilde u^* A \tilde v$. Define the residuals $r=A\tilde v-\tilde \mu \tilde u$ and $s=A^*\tilde u - \tilde \mu \tilde v$, and let 
$\varepsilon=\max\{\|r\|_2,\|s\|_2\}$. Then $|\tilde \mu -\mu|\leq \varepsilon$ for some singular value $\mu$ of $A$.

3. The smallest error matrix $\Delta A$ for which $(\tilde u, \tilde \mu, \tilde v)$ is a singular triplet of $\tilde A$ satisfies $\| \Delta A\|_2=\varepsilon$.

4. Let $\mu$ be the closest singular value in $sv_{ext}(A)$ to $\tilde \mu$ and $(u,\mu,v)$
be the associated singular triplet, and let
$$\eta=\mathop{\mathrm{gap}}(\tilde\mu)= \min_{\mu\neq\sigma\in sv_{ext}(A)}|\tilde\mu-\sigma|.$$
If $\eta>0$, then
\begin{align*}
|\tilde\mu-\mu |&\leq \frac{\varepsilon^2}{\eta},\\
\sqrt{\sin^2\theta(u,\tilde u)+ \sin^2\theta(v,\tilde v)} & \leq 
\frac{\sqrt{\|r\|_2^2 + \|s\|_2^2}}{\eta}.
\end{align*}

5. Let 
$$
A=\begin{bmatrix} M & E \\ F & H \end{bmatrix}, \quad 
\tilde A=\begin{bmatrix} M & 0 \\ 0 & H \end{bmatrix},
$$ 
where $M\in\mathbb{C}^{k\times k}$, and set $\eta=\min |\mu-\nu|$ over all $\mu\in sv(M)$ and $\nu\in sv_{ext}(H)$, and $\varepsilon =\max \{ \|E\|_2,\|F\|_2 \}$. Then
$$ 
\max |\sigma_j -\tilde\sigma_j| \leq \frac{2\varepsilon^2}{\eta+\sqrt{\eta^2+4\varepsilon^2}}.
$$

6. Let $m\geq n$ and let
$$
\begin{bmatrix} U_1^*\\ U_2^* \end{bmatrix} A \begin{bmatrix} V_1 & V_2 \end{bmatrix}=
\begin{bmatrix} A_1 &  \\ & A_2 \end{bmatrix}, \quad 
\begin{bmatrix} \tilde U_1^*\\ \tilde U_2^* \end{bmatrix} \tilde A \begin{bmatrix} \tilde V_1 & \tilde V_2 \end{bmatrix}=
\begin{bmatrix} \tilde A_1 &  \\ & \tilde A_2 \end{bmatrix},
$$
where $\begin{bmatrix} U_1 & U_2 \end{bmatrix}$, 
$\begin{bmatrix} V_1 & V_2 \end{bmatrix}$,
$\begin{bmatrix} \tilde U_1 & \tilde U_2 \end{bmatrix}$, and
$\begin{bmatrix} \tilde V_1 & \tilde V_2 \end{bmatrix}$ are unitary, and 
$U_1,\tilde U_1\in \mathbb{C}^{m\times k}$, 
$V_1,\tilde V_1\in \mathbb{C}^{n\times k}$. Set
$$
R=A\tilde V_1-\tilde U_1\tilde A_1,\quad 
S=A^*\tilde U_1-\tilde V_1 \tilde A_1.
$$
Let $\eta=\min|\tilde \mu-\nu|$ over all $\tilde \mu\in sv(\tilde A_1)$ and 
$\nu\in sv_{ext}(A_2)$. If $\eta > 0$, then
$$
\sqrt{\|\sin\Theta(U_1,\tilde U_1)\|_F^2 +
\|\sin\Theta(V_1,\tilde V_1)\|_F^2}
\leq \frac{\sqrt{\|R\|_F^2 + \|S\|_F^2 }}{\eta}.
$$
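The subspace bound in Fact 6 can be checked numerically in the same spirit as the example that follows. This is a minimal sketch in Julia 1.x (`LinearAlgebra`, `Random`); the sizes, the perturbation level `1e-3`, the seed, and the helper `sinF2` are hypothetical choices. It takes $U_1, V_1$ and $\tilde U_1, \tilde V_1$ as the leading $k$ singular vectors of $A$ and $\tilde A$, so that $\tilde A_1$ is diagonal, and uses $\|\sin\Theta(X,Y)\|_F^2 = k-\|X^*Y\|_F^2$ for orthonormal $X,Y$.

```julia
using LinearAlgebra, Random

Random.seed!(1)                          # arbitrary seed, for reproducibility only
m, n, k = 8, 5, 2
A  = randn(m, n)
At = A + 1e-3 * randn(m, n)              # perturbed matrix Ã = A + ΔA

F, Ft = svd(A), svd(At)                  # thin SVDs, singular values in decreasing order
U1,  V1  = F.U[:, 1:k],  F.V[:, 1:k]
Ut1, Vt1 = Ft.U[:, 1:k], Ft.V[:, 1:k]
At1 = Diagonal(Ft.S[1:k])                # Ã₁ = Ũ₁* Ã Ṽ₁

# Residuals of Ũ₁, Ṽ₁ with respect to the unperturbed A
R = A * Vt1 - Ut1 * At1
S = A' * Ut1 - Vt1 * At1

# A₂ is (m-k)×(n-k), so sv_ext(A₂) adds m-n zeros to σ_{k+1}, …, σ_n
svA2 = vcat(F.S[k+1:end], zeros(m - n))
η = minimum(abs(μ - ν) for μ in Ft.S[1:k], ν in svA2)

# For orthonormal X, Y with k columns: ‖sin Θ(X,Y)‖_F² = k - ‖X*Y‖_F²
sinF2(X, Y) = max(0.0, size(X, 2) - norm(X' * Y)^2)

lhs = sqrt(sinF2(U1, Ut1) + sinF2(V1, Vt1))
rhs = sqrt(norm(R)^2 + norm(S)^2) / η
(lhs, rhs)
```

In Julia 1.x, `norm` of a matrix is the Frobenius norm, which is what Fact 6 uses for $R$ and $S$.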


### Example

In [1]:
using LinearAlgebra
m=8
n=5
k=min(m,n)
A=rand(-9:9,m,n)

8×5 Array{Int64,2}:
 -3  -1   8  -8   9
  3  -2  -8  -4   1
  9   2  -1   0   2
 -7  -7  -7  -6   2
  6   6   4   5  -4
 -3   6  -3   0   0
  6  -8  -4  -4   5
 -1   3  -6  -1  -6

In [2]:
ΔA=rand(m,n)/100
B=A+ΔA

8×5 Array{Float64,2}:
 -2.99086  -0.995529   8.00744   -7.99853      9.00246   
  3.00777  -1.99162   -7.99317   -3.99099      1.00549   
  9.00878   2.00729   -0.994105   0.00955933   2.00614   
 -6.99801  -6.99132   -7.0       -5.99156      2.00949   
  6.00106   6.00074    4.00222    5.00747     -3.99324   
 -2.99197   6.00039   -2.99567    0.00721638   0.00229287
  6.00176  -7.99428   -3.99393   -3.99898      5.00369   
 -0.99484   3.00301   -5.99195   -0.996826    -5.99034   

In [3]:
U,σ,V=svd(A)
UB,μ,VB=svd(B)

(
[-0.34865 0.735496 … -0.463624 0.339866; -0.255233 -0.380887 … -0.359283 0.0672928; … ; -0.411064 -0.0811639 … 0.160698 -0.0305385; 0.0835215 -0.467984 … -0.305297 0.677471],

[20.9205,16.6371,14.8149,8.14198,3.51658],
[0.27623 -0.0644051 … -0.0686091 0.163659; 0.592144 0.0327687 … -0.770251 -0.106633; … ; 0.549272 -0.105472 … 0.512743 -0.651117; -0.428469 0.471787 … -0.285341 -0.669502])

In [4]:
# Mirsky's Theorems
maximum(abs, σ-μ), opnorm(ΔA), norm(σ-μ), norm(ΔA)

(0.0075373382873564765,0.03551088858911506,0.010135443641603063,0.03911839366177477)

In [5]:
# Residual bounds - how close is (x,ζ,y) to (U[:,j],σ[j],V[:,j])
j=rand(2:k-1)
x=round.(U[:,j],digits=3)
y=round.(V[:,j],digits=3)
x=x/norm(x)
y=y/norm(y)
ζ=x'*A*y
σ, j, ζ

([20.928,16.6418,14.8116,8.14199,3.52017],2,16.641821209596337)

In [6]:
# Fact 2
r=A*y-ζ*x
s=A'*x-ζ*y
ϵ=max(norm(r),norm(s))

0.01465129735211335

In [7]:
minimum(abs.(σ.-ζ)), ϵ

(7.3049148134884945e-6,0.01465129735211335)

In [8]:
# Fact 4
η=min(abs(ζ-σ[j-1]),abs(ζ-σ[j+1]))

1.8301838235704118

In [9]:
ζ-σ[j], ϵ^2/η

(-7.3049148134884945e-6,0.00011728904568792083)

In [10]:
# Singular vector bound (Fact 4)
# cos(θ)
cosθU=dot(x,U[:,j])
cosθV=dot(y,V[:,j])
# Bound
sqrt(1-cosθU^2+1-cosθV^2), sqrt(norm(r)^2+norm(s)^2)/η

(0.001030338443831892,0.008611871657769964)

In [11]:
# Fact 5 - block-diagonal part of A (small off-diagonal blocks are added next)
j=3
M=A[1:j,1:j]
H=A[j+1:m,j+1:n]
B=cat(M,H,dims=(1,2))

8×5 Array{Int64,2}:
 -3  -1   8   0   0
  3  -2  -8   0   0
  9   2  -1   0   0
  0   0   0  -6   2
  0   0   0   5  -4
  0   0   0   0   0
  0   0   0  -4   5
  0   0   0  -1  -6

In [12]:
E=rand(j,n-j)/100
F=rand(m-j,j)/100
C=Float64.(B)
C[1:j,j+1:n]=E
C[j+1:m,1:j]=F
C

8×5 Array{Float64,2}:
 -3.0         -1.0          8.0          0.00159212   0.0060245  
  3.0         -2.0         -8.0          0.00207048   0.00925301 
  9.0          2.0         -1.0          0.0057695    0.000564695
  0.00854628   9.12756e-5   0.00428263  -6.0          2.0        
  0.00211289   0.0064007    0.00326013   5.0         -4.0        
  0.00257978   0.00964825   0.00707589   0.0          0.0        
  0.00474951   0.0063167    0.00107806  -4.0          5.0        
  0.00993646   0.00231309   0.00144315  -1.0         -6.0        

In [13]:
svdvals(M).-svdvals(H)'

3×2 Array{Float64,2}:
  1.95148   7.36944
 -3.47078   1.94718
 -9.16896  -3.75099

In [14]:
ϵ=max(opnorm(E), opnorm(F))
β=svdvals(B)
γ=svdvals(C)
η=minimum(abs, svdvals(M).-[svdvals(H); zeros(m-n)]')
[β γ], maximum(abs, β-γ), 2*ϵ^2/(η+sqrt(η^2+4*ϵ^2))

(
[13.1552 13.1552; 11.2038 11.2038; … ; 5.78581 5.78581; 2.03481 2.03484],

2.5766814953698258e-5,0.00017800064849323415)

## Relative perturbation theory

### Definitions

Matrix $A\in\mathbb{C}^{m\times n}$ is __multiplicatively perturbed__ to $\tilde A$ if
$\tilde A=D_L^* A D_R$ for some $D_L\in\mathbb{C}^{m\times m}$ and 
$D_R\in\mathbb{C}^{n\times n}$. 

Matrix $A$ is (highly) __graded__ if it can be scaled as $A=GS$ such that $G$ is _well-behaved_ (that is, $\kappa_2(G)$ is of modest magnitude), where the __scaling matrix__ $S$ is often diagonal. The interesting case is $\kappa_2(G)\ll \kappa_2(A)$.
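A small experiment shows what grading looks like. This is a sketch in Julia 1.x (`LinearAlgebra`, `Random`); the factor `G0`, the column scales $1, 10^{-3}, \ldots, 10^{-15}$, and the seed are hypothetical choices. Scaling by the column norms recovers a factor $G$ with unit-norm columns whose condition number is modest, although $\kappa_2(A)$ is huge.

```julia
using LinearAlgebra, Random

Random.seed!(2)                            # arbitrary seed, for reproducibility only
n  = 6
G0 = randn(n, n)                           # hypothetical well-behaved factor
A  = G0 * Diagonal(10.0 .^ (0:-3:-15))     # graded: columns scaled by 1, 1e-3, …, 1e-15

# Scaling S = diag(‖A[:,1]‖₂, …, ‖A[:,n]‖₂), so that G = A S⁻¹ has unit-norm columns
S = Diagonal([norm(A[:, j]) for j in 1:n])
G = A / S

(cond(A), cond(G))                         # expect κ₂(G) ≪ κ₂(A)
```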

__Relative distances__ between two complex numbers $\alpha$ and $\tilde \alpha$ are:
\begin{align*}
\zeta(\alpha,\tilde \alpha)&=\frac{|\alpha-\tilde\alpha|}{\sqrt{|\alpha\tilde \alpha|}}, \quad \textrm{for } \alpha\tilde\alpha\neq 0,\\
\varrho(\alpha,\tilde \alpha)&=\frac{|\alpha-\tilde\alpha|}
{\sqrt{|\alpha|^2 +  |\tilde \alpha|^2}}, \quad \textrm{for } |\alpha|+|\tilde\alpha|> 0.
\end{align*}
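Both relative distances are one-liners. The sketch below is plain Julia 1.x; the names `reldist_zeta` and `reldist_rho` are hypothetical, and the caller is responsible for the domain conditions stated above.

```julia
# ζ(α, α̃) = |α - α̃| / sqrt(|α α̃|), defined for α·α̃ ≠ 0
reldist_zeta(a, b) = abs(a - b) / sqrt(abs(a * b))

# ϱ(α, α̃) = |α - α̃| / sqrt(|α|² + |α̃|²), defined for |α| + |α̃| > 0
reldist_rho(a, b) = abs(a - b) / sqrt(abs(a)^2 + abs(b)^2)

reldist_zeta(1.0, 1.0 + 1e-8), reldist_rho(1.0, 1.0 + 1e-8)
reldist_rho(3.0, 0.0)     # → 1.0 (ϱ equals 1 whenever exactly one argument is zero)
```

Both distances are unchanged when $\alpha$ and $\tilde\alpha$ are multiplied by the same nonzero scalar, and since $|\alpha|^2+|\tilde\alpha|^2\geq 2|\alpha\tilde\alpha|$, we always have $\varrho(\alpha,\tilde\alpha)\leq \zeta(\alpha,\tilde\alpha)/\sqrt{2}$.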

### Facts

1. If $D_L$ and $D_R$ are non-singular and $m\geq n$, then
\begin{align*}
\frac{\sigma_j}{\|D_L^{-1}\|_2\|D_R^{-1}\|_2}& \leq \tilde\sigma_j \leq
\sigma_j \|D_L\|_2\|D_R\|_2, \quad \textrm{for } j=1,\ldots,n, \\
\| \mathop{\mathrm{diag}}(\zeta(\sigma_1,\tilde \sigma_1),\ldots,
\zeta(\sigma_n,\tilde \sigma_n))\|_{2,F} & \leq
\frac{1}{2}\|D_L^*-D_L^{-1}\|_{2,F} + \frac{1}{2}\|D_R^*-D_R^{-1}\|_{2,F}.
\end{align*}

2. Let $m\geq n$, let $\tilde A=D_L^* A D_R$ with $D_L$ and $D_R$ non-singular, and let
$$
\begin{bmatrix} U_1^*\\ U_2^* \end{bmatrix} A \begin{bmatrix} V_1 & V_2 \end{bmatrix}=
\begin{bmatrix} A_1 &  \\ & A_2 \end{bmatrix}, \quad 
\begin{bmatrix} \tilde U_1^*\\ \tilde U_2^* \end{bmatrix} \tilde A \begin{bmatrix} \tilde V_1 & \tilde V_2 \end{bmatrix}=
\begin{bmatrix} \tilde A_1 &  \\ & \tilde A_2 \end{bmatrix},
$$
where $\begin{bmatrix} U_1 & U_2 \end{bmatrix}$, 
$\begin{bmatrix} V_1 & V_2 \end{bmatrix}$,
$\begin{bmatrix} \tilde U_1 & \tilde U_2 \end{bmatrix}$, and
$\begin{bmatrix} \tilde V_1 & \tilde V_2 \end{bmatrix}$ are unitary, and 
$U_1,\tilde U_1\in \mathbb{C}^{m\times k}$, 
$V_1,\tilde V_1\in \mathbb{C}^{n\times k}$. Set
$$
R=A\tilde V_1-\tilde U_1\tilde A_1,\quad 
S=A^*\tilde U_1-\tilde V_1 \tilde A_1.
$$
Let $\eta=\min \varrho(\mu,\tilde \mu)$ over all $\mu\in sv(A_1)$ and 
$\tilde \mu\in sv_{ext}(\tilde A_2)$. If $\eta > 0$, then
\begin{align*}
& \sqrt{\|\sin\Theta(U_1,\tilde U_1)\|_F^2 +
\|\sin\Theta(V_1,\tilde V_1)\|_F^2} \\
& \leq \frac{1}{\eta}( \|(I-D_L^*)U_1\|_F^2+ \|(I-D_L^{-1})U_1\|_F^2 \\
& \quad +\|(I-D_R^*)V_1\|_F^2+ \|(I-D_R^{-1})V_1\|_F^2 )^{1/2}.
\end{align*}

3. Let $A=GS$ and $\tilde A=\tilde GS$, where $G$ has full column rank, and let 
$\Delta G=\tilde G-G$. Then $\tilde A=DA$, where $D=I+(\Delta G) G^{\dagger}$, and 
Fact 1 applies with $D_L=D^*$ and $D_R=I$. If $\|(\Delta G) G^{\dagger}\|_{2}<1$, then
$$
\frac{1}{2}\|D^*-D^{-1}\|_{2,F} \leq \bigg(1+\frac{1}{1-\|(\Delta G) G^{\dagger}\|_{2}}\bigg)
\frac{\|(\Delta G) G^{\dagger}\|_{2,F}}{2}.
$$
According to the notebook on 
[Jacobi Method and High Relative Accuracy](L4c Symmetric Eigenvalue Decomposition - Jacobi Method and High Relative Accuracy.ipynb), a nearly optimal diagonal scaling is the one for which all columns of $G$ have unit norm, that is, $S=\mathop{\mathrm{diag}} \big( \| A_{:,1}\|_2,\ldots,\|A_{:,n}\|_2 \big)$.

4. Let $A$ be a real upper-bidiagonal matrix with diagonal entries $a_1,a_2,\ldots,a_n$ and 
the super-diagonal entries $b_1,b_2, \ldots,b_{n-1}$. Let the diagonal entries of 
$\tilde A$ be $\alpha_1 a_1,\alpha_2 a_2,\ldots,\alpha_n a_n$, and its super-diagonal entries be
$\beta_1 b_1,\beta_2 b_2,\ldots,\beta_{n-1} b_{n-1}$. Then $\tilde A=D_L^* A D_R$ with 
\begin{align*}
D_L &=\mathop{\mathrm{diag}} \bigg(\alpha_1,\frac{\alpha_1 \alpha_2}{\beta_1},
\frac{\alpha_1 \alpha_2 \alpha_3}{\beta_1 \beta_2},\cdots\bigg),\\
D_R &=\mathop{\mathrm{diag}} \bigg(1, \frac{\beta_1}{\alpha_1},
\frac{\beta_1 \beta_2}{\alpha_1 \alpha_2},\cdots\bigg).
\end{align*}
Assuming all $\alpha_j,\beta_j>0$, let $\alpha=\prod\limits_{j=1}^n \max\{\alpha_j, 1/\alpha_j\}$ and 
$\beta=\prod\limits_{j=1}^{n-1} \max\{\beta_j, 1/\beta_j\}$. Then
$$
(\alpha\beta)^{-1}\leq \big(\| D_L^{-1}\|_2 \|D_R^{-1}\|_2\big)^{-1} \leq
\| D_L\|_2 \|D_R\|_2  \leq \alpha\beta,
$$
and Fact 1 applies.
 
5. Consider the block partitioned matrices
\begin{align*}
A & =\begin{bmatrix} B & C \\ 0 & D\end{bmatrix}, \\
\tilde A & =  \begin{bmatrix} B & 0 \\ 0 & D\end{bmatrix}
=A \begin{bmatrix} I & -B^{-1} C \\ 0 & I \end{bmatrix}\equiv A D_R.
\end{align*}
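Here $B$ is assumed non-singular. Since $D_L=I$ and $\|D_R^*-D_R^{-1}\|_2=\|B^{-1}C\|_2$, Fact 1 gives $\zeta(\sigma_j,\tilde \sigma_j) \leq \frac{1}{2} \|B^{-1}C\|_2$ for all $j$. This is used as a deflation criterion in the SVD algorithm for bidiagonal matrices.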
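Fact 4 can be verified on a small random example: build $D_L$ and $D_R$ from the factors $\alpha_j,\beta_j$, confirm $\tilde A=D_L^*AD_R$, and check the chain of inequalities together with the singular value bounds of Fact 1. This is a sketch in Julia 1.x (`LinearAlgebra`, `Random`); the size, the factor range $[0.995, 1.005)$, and the seed are hypothetical choices, and the names `αs`, `βs` hold the factors while `α`, `β` hold the products defined above.

```julia
using LinearAlgebra, Random

Random.seed!(3)                               # arbitrary seed, for reproducibility only
n = 6
a, b = randn(n), randn(n - 1)                 # diagonal and super-diagonal of A
αs = 1 .+ 0.01 .* (rand(n) .- 0.5)            # positive factors α_j in [0.995, 1.005)
βs = 1 .+ 0.01 .* (rand(n - 1) .- 0.5)        # positive factors β_j in [0.995, 1.005)
A  = Bidiagonal(a, b, :U)
At = Bidiagonal(αs .* a, βs .* b, :U)         # the perturbed matrix Ã

# D_L = diag(α₁, α₁α₂/β₁, α₁α₂α₃/(β₁β₂), …),  D_R = diag(1, β₁/α₁, β₁β₂/(α₁α₂), …)
dL = [prod(αs[1:i]) / prod(βs[1:i-1]) for i in 1:n]
dR = [prod(βs[1:i-1]) / prod(αs[1:i-1]) for i in 1:n]

# Ã = D_L* A D_R; D_L is real and diagonal, so D_L* = D_L
err = opnorm(Diagonal(dL) * Matrix(A) * Diagonal(dR) - Matrix(At))

# For diagonal matrices ‖D‖₂ = max|dᵢ| and ‖D⁻¹‖₂ = 1/min|dᵢ|
lo = minimum(abs, dL) * minimum(abs, dR)      # 1/(‖D_L⁻¹‖₂‖D_R⁻¹‖₂)
hi = maximum(abs, dL) * maximum(abs, dR)      # ‖D_L‖₂‖D_R‖₂
α  = prod(max.(αs, 1 ./ αs))
β  = prod(max.(βs, 1 ./ βs))

# Fact 1 then predicts σ_j·lo ≤ σ̃_j ≤ σ_j·hi
σ, σt = svdvals(A), svdvals(At)
(err, 1 / (α * β), lo, hi, α * β)
```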

### Example - Bidiagonal matrix

In order to illustrate Facts 1 to 3, we need an algorithm that computes the singular values with high relative accuracy. Such an algorithm, the one-sided Jacobi method, is discussed in the following notebook. 

The algorithm actually used in the function `svdvals()` for `Bidiagonal` is the zero-shift bidiagonal QR algorithm, which attains the accuracy given by Fact 4: if all
$1-\varepsilon \leq \alpha_i,\beta_j \leq 1+\varepsilon$, then
$$
(1-\varepsilon)^{2n-1} \leq (\alpha\beta)^{-1} \leq \alpha\beta \leq (1-\varepsilon)^{-(2n-1)}.
$$
In other words, relative changes of at most $\varepsilon$ in the diagonal and super-diagonal elements cause relative changes of at most about $(2n-1)\varepsilon$ in the singular values (to first order in $\varepsilon$).
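To see what this bound means for the example below, we can evaluate it for $n=50$ and $\delta=100000$. In that example every entry is multiplied by factors `DL[i]` and `DR[j]` that differ from 1 by at most $0.5/\delta$, so each $\alpha_i,\beta_j$ lies in $[1-\varepsilon,1+\varepsilon]$ with $\varepsilon=(1+0.5/\delta)^2-1$. This plain Julia 1.x sketch compares the exact two-sided bound with the first-order estimate $(2n-1)\varepsilon$.

```julia
# Fact 4 bound for the bidiagonal example: n = 50, δ = 100000.
# |DL[i]-1|, |DR[j]-1| ≤ 0.5/δ, so every α_i, β_j lies in [1-ε, 1+ε].
n, δ = 50, 100000
ε = (1 + 0.5 / δ)^2 - 1                  # ≈ 1e-5
upper = (1 - ε)^(-(2n - 1)) - 1          # worst relative increase, from αβ
lower = 1 - (1 - ε)^(2n - 1)             # worst relative decrease, from (αβ)⁻¹
first_order = (2n - 1) * ε               # the (2n-1)ε estimate
(ε, lower, first_order, upper)
```

All three quantities are close to $10^{-3}$, while the relative changes reported by `svdvals` in the cells below are at most about $10^{-5}$, comfortably inside the bound.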

__However__, if singular values and vectors are desired, the function `svd()` calls the standard algorithm, described in the next notebook, which __does not attain this accuracy__.

In [15]:
n=50
δ=100000
# The starting matrix
a=exp.(50*(rand(n).-0.5))
b=exp.(50*(rand(n-1).-0.5))
A=Bidiagonal(a,b,:U)
# Multiplicative perturbation
DL=ones(n).+(rand(n).-0.5)/δ
DR=ones(n).+(rand(n).-0.5)/δ
# The perturbed matrix
α=DL.*a.*DR
β=DL[1:end-1].*b.*DR[2:end]
B=Bidiagonal(α,β,:U)
(A.dv-B.dv)./A.dv

50-element Array{Float64,1}:
  4.19583e-6
 -2.09241e-6
 -1.39335e-6
 -5.09234e-8
 -1.44286e-6
  1.34773e-6
  2.21681e-6
 -6.43832e-6
  2.44523e-6
 -7.6082e-7 
  5.20543e-7
  4.11967e-6
 -1.5319e-7 
  ⋮         
 -7.72779e-7
 -1.40317e-6
 -3.89109e-6
 -7.25101e-6
  2.79374e-6
 -1.58138e-6
 -1.84248e-6
  2.63186e-6
  8.15129e-6
 -6.1743e-6 
 -3.55404e-6
  2.94443e-6

In [16]:
(a-α)./a, (b-β)./b

([4.19583e-6,-2.09241e-6,-1.39335e-6,-5.09234e-8,-1.44286e-6,1.34773e-6,2.21681e-6,-6.43832e-6,2.44523e-6,-7.6082e-7  …  -3.89109e-6,-7.25101e-6,2.79374e-6,-1.58138e-6,-1.84248e-6,2.63186e-6,8.15129e-6,-6.1743e-6,-3.55404e-6,2.94443e-6],[1.32336e-6,3.29319e-7,-5.89954e-6,5.58612e-6,9.0123e-7,-1.16395e-7,-3.38469e-6,-5.11299e-6,6.07394e-6,1.35835e-6  …  -3.6934e-6,-4.33311e-6,-6.76433e-6,2.44723e-6,2.62953e-7,1.29481e-7,5.62422e-6,2.23457e-6,-9.59318e-6,4.14818e-6])

In [17]:
@which svdvals(A)

In [18]:
σ=svdvals(A)
μ=svdvals(B)
[σ (σ-μ)./σ]

50×2 Array{Float64,2}:
 7.04716e10   -1.40317e-6
 3.52903e10   -1.44278e-6
 2.37267e10    2.44523e-6
 1.3133e10    -1.45838e-6
 1.22502e10    4.26538e-6
 1.20807e10   -4.57784e-7
 5.97298e9    -1.84248e-6
 3.0402e9      1.35835e-6
 2.8041e9      1.10341e-7
 2.62328e9     4.14818e-6
 2.41426e9    -9.59318e-6
 4.97002e8    -7.25101e-6
 2.08498e8     2.44723e-6
 ⋮                       
 0.00038784    5.59419e-6
 0.000137832  -5.89954e-6
 6.08007e-6   -2.10985e-6
 2.33132e-7   -3.51311e-6
 4.41986e-8    2.23511e-6
 1.79833e-10  -1.59863e-6
 1.52181e-11  -4.76893e-6
 4.9063e-17   -3.24597e-6
 1.56358e-26  -4.48218e-7
 1.05726e-33   1.04431e-6
 1.43197e-38  -1.16808e-6
 3.96918e-89  -2.21179e-6

In [19]:
cond(A)

1.775471444108934e99

In [20]:
# The standard algorithm
U,ν,V=svd(A);

In [21]:
(σ-ν)./σ

50-element Array{Float64,1}:
     2.16524e-16
    -2.16189e-16
     0.0        
    -1.45233e-16
     1.55699e-16
     0.0        
    -1.59665e-16
     0.0        
     0.0        
     0.0        
     0.0        
     0.0        
     0.0        
     ⋮          
     6.98871e-16
     4.10611e-13
    -0.528138   
   -29.2046     
    -8.72718    
     0.0        
    -0.680816   
     1.0        
    -1.5429     
 -2654.08       
 -3848.78       
    -9.42275e43 