# Singular Value Decomposition - Perturbation Theory

---

## Prerequisites

The reader should be familiar with eigenvalue decomposition, singular value decomposition, and perturbation theory for eigenvalue decomposition.

## Competences 

The reader should be able to understand and check the facts about perturbations of singular values and vectors.

---

## Perturbation bounds

For more details and the proofs of the Facts below, see 
[R.-C. Li, Matrix Perturbation Theory][Hog14], and the references therein.

[Hog14]: #1 "L. Hogben, ed., 'Handbook of Linear Algebra', pp. 21.6-21.8 and 21.16-21.18, CRC Press, Boca Raton, 2014."

### Definitions
Let $A\in\mathbb{C}^{m\times n}$ and let $A=U\Sigma V^*$ be its SVD.

The set of $A$'s singular values is $sv(A)=\{\sigma_1,\sigma_2,\ldots\}$, with 
$\sigma_1\geq \sigma_2\geq \cdots\geq 0$, and let 
$sv_{ext}(A)=sv(A)$ unless $m>n$, in which case $sv_{ext}(A)=sv(A)\cup \{0,\ldots,0\}$ (additional $|m-n|$ zeros).

A triplet $(u,\sigma,v)\in\mathbb{C}^{m}\times\mathbb{R}\times\mathbb{C}^{n}$ is a __singular triplet__ of $A$ if $\|u\|_2=1$, $\|v\|_2=1$, $\sigma\geq 0$, and $Av=\sigma u$ and $A^*u=\sigma v$.

$\tilde A=A+\Delta A$ is a __perturbed matrix__, where $\Delta A$ is the __perturbation__.
_The same notation is adopted for $\tilde A$, except that all symbols carry tildes._

__Spectral condition number__ of $A$ is $\kappa_2(A)=\sigma_{\max}(A)/ \sigma_{\min}(A)$.

Let $X,Y\in\mathbb{C}^{n\times k}$ with $\mathop{\mathrm{rank}}(X)=\mathop{\mathrm{rank}}(Y)=k$. The __canonical angles__ between their column spaces are $\theta_i=\cos^{-1}\sigma_i$, where $\sigma_i$ are the singular values of 
$(Y^*Y)^{-1/2}Y^*X(X^*X)^{-1/2}$. The __canonical angle matrix__ between $X$ and $Y$ is 
$$\Theta(X,Y)=\mathop{\mathrm{diag}}(\theta_1,\theta_2,\ldots,\theta_k).
$$
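
The definition of canonical angles translates into a few lines of code. The following is a minimal sketch, assuming Julia 1.x with the `LinearAlgebra` standard library (the cells in this notebook use an older syntax); the function name `canonical_angles` is hypothetical. Instead of forming $(X^*X)^{-1/2}$ explicitly, the sketch orthonormalizes the columns by thin QR factorizations, which differ from $X(X^*X)^{-1/2}$ and $Y(Y^*Y)^{-1/2}$ only by unitary factors and therefore give the same singular values.

```julia
using LinearAlgebra

# Canonical angles between the column spaces of X and Y (both n×k, rank k).
# `canonical_angles` is a hypothetical helper name, not a library function.
function canonical_angles(X::AbstractMatrix, Y::AbstractMatrix)
    QX = Matrix(qr(X).Q)                 # orthonormal basis of span(X), n×k
    QY = Matrix(qr(Y).Q)                 # orthonormal basis of span(Y), n×k
    c = svdvals(QY' * QX)                # cosines of the canonical angles
    return acos.(clamp.(c, 0.0, 1.0))    # clamp guards against rounding above 1
end

# Two planes in R^3 sharing one direction and orthogonal in the other
X = [1.0 0.0; 0.0 1.0; 0.0 0.0]
Y = [1.0 0.0; 0.0 0.0; 0.0 1.0]
θ = canonical_angles(X, Y)
```

Since the singular values are returned in decreasing order, the angles come out in increasing order.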
    

### Facts

1. _(Mirsky)_ $\|\Sigma-\tilde\Sigma\|_2\leq \|\Delta A\|_2$ and 
$\|\Sigma-\tilde\Sigma\|_F\leq \|\Delta A\|_F$.

2.  _(Residual bounds)_ Let $\|\tilde u\|_2=\|\tilde v\|_2=1$ and 
$\tilde \mu=\tilde u^* A \tilde v$. Let residuals $r=A\tilde v-\tilde \mu \tilde u$ and $s=A^*\tilde u - \tilde \mu \tilde v$, and let 
$\varepsilon=\max\{\|r\|_2,\|s\|_2\}$. Then $|\tilde \mu -\mu|\leq \varepsilon$ for some singular value $\mu$ of $A$. 

3. The smallest error matrix $\Delta A$ for which $(\tilde u, \tilde \mu, \tilde v)$ is a singular triplet of $\tilde A$ satisfies $\| \Delta A\|_2=\varepsilon$.

4. Let $\mu$ be the closest singular value in $sv_{ext}(A)$ to $\tilde \mu$ and $(u,\mu,v)$
be the associated singular triplet, and let
$$\eta=\mathop{\mathrm{gap}}(\tilde\mu)= \min_{\mu\neq\sigma\in sv_{ext}(A)}|\tilde\mu-\sigma|.$$
If $\eta>0$, then
\begin{align*}
|\tilde\mu-\mu |&\leq \frac{\varepsilon^2}{\eta},\\
\sqrt{\sin^2\theta(u,\tilde u)+ \sin^2\theta(v,\tilde v)} & \leq 
\frac{\sqrt{\|r\|_2^2 + \|s\|_2^2}}{\eta}.
\end{align*}

5. Let 
$$
A=\begin{bmatrix} M & E \\ F & H \end{bmatrix}, \quad 
\tilde A=\begin{bmatrix} M & 0 \\ 0 & H \end{bmatrix},
$$ 
where $M\in\mathbb{C}^{k\times k}$, and set $\eta=\min |\mu-\nu|$ over all $\mu\in sv(M)$ and $\nu\in sv_{ext}(H)$, and $\varepsilon =\max \{ \|E\|_2,\|F\|_2 \}$. Then
$$ 
\max |\sigma_j -\tilde\sigma_j| \leq \frac{2\varepsilon^2}{\eta+\sqrt{\eta^2+4\varepsilon^2}}.
$$

6. Let $m\geq n$ and let
$$
\begin{bmatrix} U_1^*\\ U_2^* \end{bmatrix} A \begin{bmatrix} V_1 & V_2 \end{bmatrix}=
\begin{bmatrix} A_1 &  \\ & A_2 \end{bmatrix}, \quad 
\begin{bmatrix} \tilde U_1^*\\ \tilde U_2^* \end{bmatrix} \tilde A \begin{bmatrix} \tilde V_1 & \tilde V_2 \end{bmatrix}=
\begin{bmatrix} \tilde A_1 &  \\ & \tilde A_2 \end{bmatrix},
$$
where $\begin{bmatrix} U_1 & U_2 \end{bmatrix}$, 
$\begin{bmatrix} V_1 & V_2 \end{bmatrix}$,
$\begin{bmatrix} \tilde U_1 & \tilde U_2 \end{bmatrix}$, and
$\begin{bmatrix} \tilde V_1 & \tilde V_2 \end{bmatrix}$ are unitary, and 
$U_1,\tilde U_1\in \mathbb{C}^{m\times k}$, 
$V_1,\tilde V_1\in \mathbb{C}^{n\times k}$. Set
$$
R=A\tilde V_1-\tilde U_1\tilde A_1,\quad 
S=A^*\tilde U_1-\tilde V_1 \tilde A_1.
$$
Let $\eta=\min|\tilde \mu-\nu|$ over all $\tilde \mu\in sv(\tilde A_1)$ and 
$\nu\in sv_{ext}(A_2)$. If $\eta > 0$, then
$$
\sqrt{\|\sin\Theta(U_1,\tilde U_1)\|_F^2 +
\|\sin\Theta(V_1,\tilde V_1)\|_F^2}
\leq \frac{\sqrt{\|R\|_F^2 + \|S\|_F^2 }}{\eta}.
$$
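
As a small worked illustration of Fact 5 (the numbers are chosen here for illustration and are not taken from the reference), let $k=1$, $M=[2]$, $H=[1]$ and $E=F=[\varepsilon]$ with a small real $\varepsilon>0$. Both $A$ and $\tilde A$ are then Hermitian positive definite, so their singular values equal their eigenvalues, and $\eta=|2-1|=1$:
$$
\sigma_{1,2}=\frac{3\pm\sqrt{1+4\varepsilon^2}}{2},\quad 
\tilde\sigma_1=2,\quad \tilde\sigma_2=1,\quad
|\sigma_j-\tilde\sigma_j|=\frac{\sqrt{1+4\varepsilon^2}-1}{2}
=\frac{2\varepsilon^2}{1+\sqrt{1+4\varepsilon^2}}.
$$
In this case the bound of Fact 5 holds with equality.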


### Example

In [1]:
m=8
n=5
k=min(m,n)
A=rand(-9:9,m,n)

8x5 Array{Int64,2}:
 -5   7  -6   1  -6
 -7   7  -4   5   2
 -2   7  -9  -8   6
  1  -9   8  -9   5
 -4  -1   6  -2  -2
  3  -1   7   3  -1
  3   3  -1  -7  -5
  4  -3   1   3   0

In [2]:
ΔA=rand(m,n)/100
B=A+ΔA

8x5 Array{Float64,2}:
 -4.99651   7.00853   -5.99439    1.0043   -5.99049   
 -6.9933    7.00928   -3.99684    5.00413   2.00102   
 -1.99078   7.00601   -8.99964   -7.99108   6.00944   
  1.00361  -8.9954     8.00587   -8.99752   5.00248   
 -3.99393  -0.992812   6.00234   -1.99216  -1.99733   
  3.0097   -0.993469   7.0057     3.00126  -0.996869  
  3.0008    3.00601   -0.992398  -6.99154  -4.99213   
  4.00897  -2.99841    1.00985    3.0081    0.00310363

In [3]:
U,σ,V=svd(A)
UB,μ,VB=svd(B)

(
8x5 Array{Float64,2}:
  0.463878   -0.10798    -0.382581   -0.291121    0.317776 
  0.427491   -0.181731    0.431732   -0.360747   -0.263873 
  0.414048    0.710101    0.216101    0.124978   -0.323277 
 -0.566599    0.49832     0.170319   -0.313457    0.154328 
 -0.151043   -0.0224838  -0.133424   -0.713355   -0.0720899
 -0.248715   -0.284868   -0.0878237  -0.0366098  -0.789693 
  0.0501945   0.297962   -0.751785    0.0520642  -0.264282 
 -0.149686   -0.179754    0.0288935   0.39788     0.0490616,

[23.296723985600188,16.16869058552629,10.848397890709466,9.528613533804307,4.61967879720757],
5x5 Array{Float64,2}:
 -0.313136    0.0187287  -0.298524    0.829526  -0.352654  
  0.654249    0.01248    -0.165929   -0.122319  -0.727535  
 -0.669695   -0.224843   -0.0603209  -0.486479  -0.510544  
  0.13511    -0.903602    0.356012    0.196054  -0.00815807
 -0.0846072   0.363926    0.867737    0.147734  -0.292585  )

In [4]:
# Mirsky's Theorems
maxabs(σ-μ), norm(ΔA), vecnorm(σ-μ), vecnorm(ΔA)

(0.013372622807068524,0.0359353499241745,0.016621804157901463,0.03976948089570283)

In [5]:
# Residual bounds - how close is (x,ζ,y) to (U[:,j],σ[j],V[:,j])
j=rand(2:k-1)
x=round(U[:,j],3)
y=round(V[:,j],3)
x=x/norm(x)
y=y/norm(y)
ζ=(x'*A*y)[]
σ, j, ζ

([23.292468466143806,16.172273554647937,10.856101762031024,9.531289460898613,4.606306174400501],4,9.53128988552817)

In [6]:
# Fact 2
r=A*y-ζ*x
s=A'*x-ζ*y
ϵ=max(norm(r),norm(s))

0.009236418624467026

In [7]:
minimum(abs(σ-ζ)), ϵ

(4.246295564058755e-7,0.009236418624467026)

In [8]:
# Fact 4
η=min(abs(ζ-σ[j-1]),abs(ζ-σ[j+1]))

1.3248118765028547

In [9]:
ζ-σ[j], ϵ^2/η

(4.246295564058755e-7,6.439512697576388e-5)

In [10]:
# Singular vector bound (second inequality of Fact 4)
# cos(θ)
cosθU=dot(x,U[:,j])
cosθV=dot(y,V[:,j])
# Bound
sqrt(1-cosθU^2+1-cosθV^2), sqrt(norm(r)^2+norm(s)^2)/η

(0.0006863789507620033,0.007841809791517083)

In [11]:
# Fact 5 - we create small off-diagonal block perturbation
j=3
M=A[1:j,1:j]
H=A[j+1:m,j+1:n]
B=cat([1,2],M,H)
E=rand(size(A[1:j,j+1:n]))/100
F=rand(size(A[j+1:m,1:j]))/100
C=map(Float64,B)
C[1:j,j+1:n]=E
C[j+1:m,1:j]=F
C

8x5 Array{Float64,2}:
 -5.0         7.0          -6.0          0.00535031   0.00029281
 -7.0         7.0          -4.0          0.00146746   0.00745401
 -2.0         7.0          -9.0          0.00131762   0.0034951 
  0.00353067  0.000344556   0.00336747  -9.0          5.0       
  0.00575373  0.00183245    0.00666969  -2.0         -2.0       
  0.00128292  0.00936926    0.00168663   3.0         -1.0       
  0.00692802  0.0036979     0.00874577  -7.0         -5.0       
  0.00597927  0.00733959    0.00362242   3.0          0.0       

In [12]:
ϵ=max(norm(E), norm(F))
β=svdvals(B)
γ=svdvals(C)
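# Note: Fact 5 takes ν from sv_ext(H). Here H is 5x2, so sv_ext(H) also contains zeros,
# and M has a numerically zero singular value (see β). The η below omits those zeros,
# so the bound printed last is too optimistic; with sv_ext(H) the bound is about ϵ.
η=minabs(svdvals(M).-svdvals(H)')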
[β γ], maxabs(β-γ), 2*ϵ^2/(η+sqrt(η^2+4*ϵ^2))

(
5x2 Array{Float64,2}:
 18.2486       18.2486   
 12.3624       12.3624   
  7.36016       7.36017  
  4.99903       4.99903  
  7.36571e-16   0.0157183,

0.015718303207464056,0.0001513156165687041)
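
Fact 6 is not covered by the cells above. The following is a minimal sketch of such a check, assuming Julia 1.x with `LinearAlgebra` rather than the older syntax used in this notebook. It reuses the matrix $A$ printed in `In [1]`; the perturbation `ΔA` and the block size `k=2` are arbitrary choices made for this sketch, and $\|\sin\Theta(U_1,\tilde U_1)\|_F$ is evaluated as $\|U_2^*\tilde U_1\|_F$, a standard identity for orthonormal bases.

```julia
using LinearAlgebra

# The matrix A printed in In [1]
A = Float64[
    -5   7  -6   1  -6
    -7   7  -4   5   2
    -2   7  -9  -8   6
     1  -9   8  -9   5
    -4  -1   6  -2  -2
     3  -1   7   3  -1
     3   3  -1  -7  -5
     4  -3   1   3   0
]
m, n = size(A)
k = 2                                    # size of the leading block (arbitrary)
# Small deterministic perturbation, chosen only for this sketch
ΔA = [0.01 * sin(i + 2j) for i in 1:m, j in 1:n]
B = A + ΔA                               # plays the role of the perturbed matrix

FA = svd(A; full=true)                   # FA.U is m×m, FA.V is n×n
FB = svd(B; full=true)
U2, V2 = FA.U[:, k+1:m], FA.V[:, k+1:n]  # complements of U_1 and V_1
UB1, VB1 = FB.U[:, 1:k], FB.V[:, 1:k]    # perturbed leading singular vectors
B1 = Diagonal(FB.S[1:k])                 # perturbed leading block

# Residuals from Fact 6
R = A * VB1 - UB1 * B1
S = A' * UB1 - VB1 * B1

# sv_ext(A_2): trailing singular values of A plus m-n extra zeros
svext = [FA.S[k+1:n]; zeros(m - n)]
η = minimum(abs.(FB.S[1:k] .- svext'))

lhs = sqrt(norm(U2' * UB1)^2 + norm(V2' * VB1)^2)
rhs = sqrt(norm(R)^2 + norm(S)^2) / η
lhs, rhs
```

Here `norm` applied to a matrix is the Frobenius norm, which is the norm used in Fact 6.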

## Relative perturbation theory

### Definitions

Matrix $A\in\mathbb{C}^{m\times n}$ is __multiplicatively perturbed__ to $\tilde A$ if
$\tilde A=D_L^* A D_R$ for some $D_L\in\mathbb{C}^{m\times m}$ and 
$D_R\in\mathbb{C}^{n\times n}$. 

Matrix $A$ is (highly) __graded__ if it can be scaled as $A=GS$ such that $G$ is _well-behaved_ (that is, $\kappa_2(G)$ is of modest magnitude), where the __scaling matrix__ $S$ is often diagonal. Interesting cases are when $\kappa_2(G)\ll \kappa_2(A)$.

__Relative distances__ between two complex numbers $\alpha$ and $\tilde \alpha$ are:
\begin{align*}
\zeta(\alpha,\tilde \alpha)&=\frac{|\alpha-\tilde\alpha|}{\sqrt{|\alpha\tilde \alpha|}}, \quad \textrm{for } \alpha\tilde\alpha\neq 0,\\
\varrho(\alpha,\tilde \alpha)&=\frac{|\alpha-\tilde\alpha|}
{\sqrt{|\alpha|^2 +  |\tilde \alpha|^2}}, \quad \textrm{for } |\alpha|+|\tilde\alpha|> 0.
\end{align*}
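
The two relative distances are straightforward to evaluate. The following is a minimal sketch in Julia; the names `zeta_dist` and `rho_dist` are hypothetical and are chosen to avoid a clash with the variable `ζ` used in the earlier cells.

```julia
# Relative distances ζ and ϱ from the definitions above.
# `zeta_dist` and `rho_dist` are hypothetical helper names.
zeta_dist(a, b) = abs(a - b) / sqrt(abs(a * b))           # requires a*b ≠ 0
rho_dist(a, b)  = abs(a - b) / sqrt(abs(a)^2 + abs(b)^2)  # requires |a|+|b| > 0

z = zeta_dist(1.0, 4.0)    # |1-4| / sqrt(4)  = 1.5
r = rho_dist(3.0, 4.0)     # |3-4| / sqrt(25) = 0.2
r0 = rho_dist(1.0, 0.0)    # ϱ stays defined when one argument is zero
```

Unlike $\zeta$, the distance $\varrho$ is defined when one of the arguments is zero, which is why it can be used with $sv_{ext}$.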

### Facts

1. If $D_L$ and $D_R$ are non-singular and $m\geq n$, then
\begin{align*}
\frac{\sigma_j}{\|D_L^{-1}\|_2\|D_R^{-1}\|_2}& \leq \tilde\sigma_j \leq
\sigma_j \|D_L\|_2\|D_R\|_2, \quad \textrm{for } j=1,\ldots,n, \\
\| \mathop{\mathrm{diag}}(\zeta(\sigma_1,\tilde \sigma_1),\ldots,
\zeta(\sigma_n,\tilde \sigma_n))\|_{2,F} & \leq
\frac{1}{2}\|D_L^*-D_L^{-1}\|_{2,F} + \frac{1}{2}\|D_R^*-D_R^{-1}\|_{2,F}.
\end{align*}

2. Let $m\geq n$ and let
$$
\begin{bmatrix} U_1^*\\ U_2^* \end{bmatrix} A \begin{bmatrix} V_1 & V_2 \end{bmatrix}=
\begin{bmatrix} A_1 &  \\ & A_2 \end{bmatrix}, \quad 
\begin{bmatrix} \tilde U_1^*\\ \tilde U_2^* \end{bmatrix} \tilde A \begin{bmatrix} \tilde V_1 & \tilde V_2 \end{bmatrix}=
\begin{bmatrix} \tilde A_1 &  \\ & \tilde A_2 \end{bmatrix},
$$
where $\begin{bmatrix} U_1 & U_2 \end{bmatrix}$, 
$\begin{bmatrix} V_1 & V_2 \end{bmatrix}$,
$\begin{bmatrix} \tilde U_1 & \tilde U_2 \end{bmatrix}$, and
$\begin{bmatrix} \tilde V_1 & \tilde V_2 \end{bmatrix}$ are unitary, and 
$U_1,\tilde U_1\in \mathbb{C}^{m\times k}$, 
$V_1,\tilde V_1\in \mathbb{C}^{n\times k}$. Set
$$
R=A\tilde V_1-\tilde U_1\tilde A_1,\quad 
S=A^*\tilde U_1-\tilde V_1 \tilde A_1.
$$
Let $\eta=\min \varrho(\mu,\tilde \mu)$ over all $\mu\in sv(A_1)$ and 
$\tilde \mu\in sv_{ext}(\tilde A_2)$. If $\eta > 0$, then
\begin{align*}
& \sqrt{\|\sin\Theta(U_1,\tilde U_1)\|_F^2 +
\|\sin\Theta(V_1,\tilde V_1)\|_F^2} \\
& \leq \frac{1}{\eta}( \|(I-D_L^*)U_1\|_F^2+ \|(I-D_L^{-1})U_1\|_F^2 \\
& \quad +\|(I-D_R^*)V_1\|_F^2+ \|(I-D_R^{-1})V_1\|_F^2 )^{1/2}.
\end{align*}

3. Let $A=GS$ and $\tilde A=\tilde GS$, where $\mathop{\mathrm{rank}}(G)=n$, and let 
$\Delta G=\tilde G-G$. Then $\tilde A=DA$, where $D=I+(\Delta G) G^{\dagger}$, and 
Fact 1 applies with $D_L=D^*$, $D_R=I$, and 
$$
\frac{1}{2}\|D^*-D^{-1}\|_{2,F} \leq \bigg(1+\frac{1}{1-\|(\Delta G) G^{\dagger}\|_{2}}\bigg)
\frac{\|(\Delta G) G^{\dagger}\|_{2,F}}{2}.
$$
According to the notebook on 
[Jacobi Method and High Relative Accuracy](L4c Symmetric Eigenvalue Decomposition - Jacobi Method and High Relative Accuracy.ipynb), nearly optimal diagonal scaling is such that all columns of $G$ have unit norms, $S=\mathop{\mathrm{diag}} \big( \| A_{:,1}\|_2,\ldots,\|A_{:,n}\|_2 \big)$.

4. Let $A$ be a real upper-bidiagonal matrix with diagonal entries $a_1,a_2,\ldots,a_n$ and 
the super-diagonal entries $b_1,b_2, \ldots,b_{n-1}$. Let the diagonal entries of 
$\tilde A$ be $\alpha_1 a_1,\alpha_2 a_2,\ldots,\alpha_n a_n$, and its super-diagonal entries be
$\beta_1 b_1,\beta_2 b_2,\ldots,\beta_{n-1} b_{n-1}$. Then $\tilde A=D_L^* A D_R$ with 
\begin{align*}
D_L &=\mathop{\mathrm{diag}} \bigg(\alpha_1,\frac{\alpha_1 \alpha_2}{\beta_1},
\frac{\alpha_1 \alpha_2 \alpha_3}{\beta_1 \beta_2},\cdots\bigg),\\
D_R &=\mathop{\mathrm{diag}} \bigg(1, \frac{\beta_1}{\alpha_1},
\frac{\beta_1 \beta_2}{\alpha_1 \alpha_2},\cdots\bigg).
\end{align*}
Let $\alpha=\prod\limits_{j=1}^n \max\{\alpha_j, 1/\alpha_j\}$ and 
$\beta=\prod\limits_{j=1}^{n-1} \max\{\beta_j, 1/\beta_j\}$. Then
$$
(\alpha\beta)^{-1}\leq \big(\| D_L^{-1}\|_2 \|D_R^{-1}\|_2\big)^{-1} \leq
\| D_L\|_2 \|D_R\|_2  \leq \alpha\beta,
$$
and Fact 1 applies.
 
5. Consider the block partitioned matrices
\begin{align*}
A & =\begin{bmatrix} B & C \\ 0 & D\end{bmatrix}, \\
\tilde A & =  \begin{bmatrix} B & 0 \\ 0 & D\end{bmatrix}
=A \begin{bmatrix} I & -B^{-1} C \\ 0 & I \end{bmatrix}\equiv A D_R.
\end{align*}
By Fact 1, $\zeta(\sigma_j,\tilde \sigma_j) \leq \frac{1}{2} \|B^{-1}C\|_2$. This is used as a deflation criterion in the SVD algorithm for bidiagonal matrices.
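
The construction in Fact 4 can be checked numerically. The following is a minimal sketch, assuming Julia 1.x with `LinearAlgebra` (not the older syntax of the cells in this notebook); the entries `a`, `b` and the factors `αs`, `βs` are arbitrary values picked for illustration.

```julia
using LinearAlgebra

# Arbitrary small example: bidiagonal entries and their relative factors
a  = [2.0, 3.0, 5.0];    b  = [1.0, 4.0]
αs = [1.01, 0.99, 1.02]; βs = [0.98, 1.03]
n  = length(a)

A = Bidiagonal(a, b, :U)                 # original matrix
B = Bidiagonal(αs .* a, βs .* b, :U)     # entrywise perturbed matrix

# Diagonals of D_L and D_R from Fact 4 (the product over an empty range is 1)
dL = [prod(αs[1:i]) / prod(βs[1:i-1]) for i in 1:n]
dR = [prod(βs[1:i-1]) / prod(αs[1:i-1]) for i in 1:n]

# B should equal D_L * A * D_R (D_L is real diagonal, so D_L^* = D_L)
reconstructed = Diagonal(dL) * Matrix(A) * Diagonal(dR)

# The constants α and β from Fact 4, and the bounds of Fact 1
α = prod(max.(αs, 1 ./ αs))
β = prod(max.(βs, 1 ./ βs))
σ = svdvals(Matrix(A))
μ = svdvals(Matrix(B))
within = all(σ ./ (α * β) .<= μ) && all(μ .<= σ .* (α * β))
```

The flag `within` records whether every perturbed singular value lies in the interval $[\sigma_j/(\alpha\beta),\ \sigma_j\alpha\beta]$ given by Facts 1 and 4.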

### Example - Bidiagonal matrix

In order to illustrate Facts 1 to 3, we need an algorithm which computes the singular values with high relative accuracy. Such an algorithm, the one-sided Jacobi method, is discussed in the following notebook. 

The algorithm actually used in the function `svdvals()` for `Bidiagonal` is the zero-shift bidiagonal QR algorithm, which attains the accuracy given by Fact 4: if all
$1-\varepsilon \leq \alpha_i,\beta_j \leq 1+\varepsilon$, then
$$
(1-\varepsilon)^{2n-1} \leq (\alpha\beta)^{-1} \leq \alpha\beta \leq (1-\varepsilon)^{-(2n-1)}.
$$
In other words, $\varepsilon$ relative changes in diagonal and super-diagonal elements cause at most about $(2n-1)\varepsilon$ relative changes in the singular values.
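
To see where the factor $2n-1$ comes from, combine Facts 1 and 4 and expand to first order in $\varepsilon$ (a short sketch, meaningful when $(2n-1)\varepsilon\ll 1$):
\begin{align*}
(1-\varepsilon)^{2n-1} \leq \frac{1}{\alpha\beta} &\leq \frac{\tilde\sigma_j}{\sigma_j} \leq \alpha\beta \leq (1-\varepsilon)^{-(2n-1)},\\
(1-\varepsilon)^{\pm(2n-1)} &= 1\mp(2n-1)\varepsilon+O(\varepsilon^2),\\
\frac{|\tilde\sigma_j-\sigma_j|}{\sigma_j} &\leq (2n-1)\varepsilon+O(\varepsilon^2).
\end{align*}
For instance, with $n=50$ and $\varepsilon=10^{-5}$ the first-order bound is $99\cdot 10^{-5}\approx 10^{-3}$.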

__However__, if singular values and vectors are desired, the function `svd()` calls the standard algorithm, described in the next notebook, which __does not attain this accuracy__.

In [13]:
n=50
δ=100000
# The starting matrix
a=exp(50*(rand(n)-0.5))
b=exp(50*(rand(n-1)-0.5))
A=Bidiagonal(a,b, true)
# Multiplicative perturbation
DL=ones(n)+(rand(n)-0.5)/δ
DR=ones(n)+(rand(n)-0.5)/δ
# The perturbed matrix
α=DL.*a.*DR
β=DL[1:end-1].*b.*DR[2:end]
B=Bidiagonal(α,β,true)

50x50 Bidiagonal{Float64}:
 2.08477e-9  3979.12       …     0.0       0.0         0.0       
 0.0            0.0160994        0.0       0.0         0.0       
 0.0            0.0              0.0       0.0         0.0       
 0.0            0.0              0.0       0.0         0.0       
 0.0            0.0              0.0       0.0         0.0       
 0.0            0.0        …     0.0       0.0         0.0       
 0.0            0.0              0.0       0.0         0.0       
 0.0            0.0              0.0       0.0         0.0       
 0.0            0.0              0.0       0.0         0.0       
 0.0            0.0              0.0       0.0         0.0       
 0.0            0.0        …     0.0       0.0         0.0       
 0.0            0.0              0.0       0.0         0.0       
 0.0            0.0              0.0       0.0         0.0       
 ⋮                         ⋱                                     
 0.0            0.0              0.0       0.0   

In [14]:
(a-α)./a, (b-β)./b

([4.10308e-6,-7.74756e-6,-4.14313e-7,1.26467e-6,2.91084e-6,5.7071e-7,-1.2077e-6,7.24296e-7,-4.94908e-6,-4.68213e-6  …  4.44446e-6,8.79142e-7,4.93265e-6,-8.27193e-6,-7.28262e-7,-8.85983e-6,3.66958e-7,-1.87447e-7,-9.24902e-6,5.01701e-6],[1.67877e-6,-4.73666e-6,-1.79523e-6,1.87202e-6,2.45085e-6,9.50387e-7,-3.82971e-6,3.27215e-6,-8.44488e-6,-2.76204e-6  …  4.81754e-6,1.71329e-6,7.2777e-6,-2.90432e-6,-8.70757e-6,-1.25286e-6,-7.17766e-6,1.83454e-6,-3.41908e-6,-2.26216e-6])

In [15]:
@which svdvals(A)

In [16]:
σ=svdvals(A)
μ=svdvals(B)
[σ (σ-μ)./σ]

50x2 Array{Float64,2}:
 6.62726e10    -7.17766e-6
 5.10335e10     1.15467e-6
 4.36214e10    -3.80057e-6
 4.05512e10    -8.70757e-6
 3.44475e10    -7.13151e-6
 1.53257e10    -2.0363e-6 
 3.9942e9       2.08668e-6
 3.9053e9      -1.87447e-7
 3.89047e9      5.12241e-6
 2.73723e9      2.40934e-7
 2.19498e9      4.81754e-6
 2.01868e9     -2.26216e-6
 1.20977e9     -1.25286e-6
 ⋮                        
 0.0030481     -4.90262e-6
 0.000173653    6.43197e-7
 8.3336e-5     -6.49589e-6
 9.51231e-6    -3.95159e-6
 4.93278e-6     6.53628e-6
 1.19055e-8    -1.39709e-6
 8.16637e-9    -9.5009e-6 
 2.58052e-11    5.59597e-6
 8.06225e-12    5.24998e-6
 1.96207e-15    3.34629e-6
 1.07145e-18    2.05903e-6
 6.10009e-122   1.50271e-6

In [17]:
cond(A)

1.0864191503808956e132

In [18]:
# The standard algorithm
U,ν,V=svd(A);

In [19]:
(σ-ν)./σ

50-element Array{Float64,1}:
 -1.15121e-16
 -1.49498e-16
  1.749e-16  
 -1.88142e-16
  0.0        
 -1.24454e-16
  0.0        
  0.0        
  0.0        
 -1.74204e-16
  0.0        
  0.0        
  0.0        
  ⋮          
 -2.75535e-7 
  0.0        
  0.0        
 -1.84806e-12
 -0.342439   
  1.38957e-16
  0.999008   
  0.70221    
  0.997497   
  0.999982   
  1.0        
 -5.0563e60  