# Singular Value Decomposition - Perturbation Theory

---

## Prerequisites

The reader should be familiar with eigenvalue decomposition, singular value decomposition, and perturbation theory for eigenvalue decomposition.

## Competences 

The reader should be able to understand and check the facts about perturbations of singular values and vectors.

---

## Perturbation bounds

For more details and the proofs of the Facts below, see 
[R.-C. Li, Matrix Perturbation Theory][Hog14], and the references therein.

[Hog14]: #1 "L. Hogben, ed., 'Handbook of Linear Algebra', pp. 21.6-21.8 and 21.16-21.18, CRC Press, Boca Raton, 2014."

### Definitions
Let $A\in\mathbb{C}^{m\times n}$ and let $A=U\Sigma V^*$ be its SVD.

The set of $A$'s singular values is $sv(A)=\{\sigma_1,\sigma_2,\ldots\}$, with 
$\sigma_1\geq \sigma_2\geq \cdots\geq 0$. Further, 
$sv_{ext}(A)=sv(A)$ unless $m>n$, in which case $sv_{ext}(A)=sv(A)\cup \{0,\ldots,0\}$ (with $m-n$ additional zeros).

A triplet $(u,\sigma,v)\in\mathbb{C}^{m}\times\mathbb{R}\times\mathbb{C}^{n}$ is a __singular triplet__ of $A$ if $\|u\|_2=1$, $\|v\|_2=1$, $\sigma\geq 0$, and $Av=\sigma u$ and $A^*u=\sigma v$.

$\tilde A=A+\Delta A$ is a __perturbed matrix__, where $\Delta A$ is the __perturbation__.
_The same notation is adopted for $\tilde A$, except that all symbols carry tildes._

The __spectral condition number__ of $A$ is $\kappa_2(A)=\sigma_{\max}(A)/ \sigma_{\min}(A)$.

Let $X,Y\in\mathbb{C}^{n\times k}$ with $\mathop{\mathrm{rank}}(X)=\mathop{\mathrm{rank}}(Y)=k$. The __canonical angles__ between their column spaces are $\theta_i=\cos^{-1}\sigma_i$, where $\sigma_i$ are the singular values of 
$(Y^*Y)^{-1/2}Y^*X(X^*X)^{-1/2}$. The __canonical angle matrix__ between $X$ and $Y$ is 
$$\Theta(X,Y)=\mathop{\mathrm{diag}}(\theta_1,\theta_2,\ldots,\theta_k).
$$
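The definition of canonical angles translates into a few lines of Julia. The following is a minimal sketch, assuming Julia 1.x with the `LinearAlgebra` standard library (the cells in this notebook use older syntax). The helper name `canonical_angles` is hypothetical, and orthonormal bases from thin QR factorizations stand in for $X(X^*X)^{-1/2}$ and $Y(Y^*Y)^{-1/2}$, which leaves the singular values unchanged.

```julia
using LinearAlgebra

# Canonical angles between the column spaces of X and Y (hypothetical helper).
# Assumes both matrices have full column rank.
function canonical_angles(X::AbstractMatrix, Y::AbstractMatrix)
    QX = Matrix(qr(X).Q)          # orthonormal basis of the column space of X
    QY = Matrix(qr(Y).Q)          # orthonormal basis of the column space of Y
    c = svdvals(QY' * QX)         # cosines of the canonical angles
    return acos.(clamp.(c, 0, 1)) # clamp guards against rounding slightly above 1
end

X = [1.0 0.0; 0.0 1.0; 0.0 0.0]
Y = [1.0 0.0; 0.0 1.0; 0.0 1.0]
θ = canonical_angles(X, Y)        # angles 0 and π/4, up to rounding
```

Small angles are computed inaccurately from their cosines, so a sine-based formula would be preferable when the subspaces are nearly equal.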
    

### Facts

1. _(Mirsky)_ $\|\Sigma-\tilde\Sigma\|_2\leq \|\Delta A\|_2$ and 
$\|\Sigma-\tilde\Sigma\|_F\leq \|\Delta A\|_F$.

2.  _(Residual bounds)_ Let $\|\tilde u\|_2=\|\tilde v\|_2=1$ and 
$\tilde \mu=\tilde u^* A \tilde v$. Let residuals $r=A\tilde v-\tilde \mu \tilde u$ and $s=A^*\tilde u - \tilde \mu \tilde v$, and let 
$\varepsilon=\max\{\|r\|_2,\|s\|_2\}$. Then $|\tilde \mu -\mu|\leq \varepsilon$ for some singular value $\mu$ of $A$. 

3. The smallest error matrix $\Delta A$ for which $(\tilde u, \tilde \mu, \tilde v)$ is a singular triplet of $\tilde A$ satisfies $\| \Delta A\|_2=\varepsilon$.

4. Let $\mu$ be the closest singular value in $sv_{ext}(A)$ to $\tilde \mu$ and $(u,\mu,v)$
be the associated singular triplet, and let
$$\eta=\mathop{\mathrm{gap}}(\tilde\mu)= \min_{\mu\neq\sigma\in sv_{ext}(A)}|\tilde\mu-\sigma|.$$
If $\eta>0$, then
\begin{align*}
|\tilde\mu-\mu |&\leq \frac{\varepsilon^2}{\eta},\\
\sqrt{\sin^2\theta(u,\tilde u)+ \sin^2\theta(v,\tilde v)} & \leq 
\frac{\sqrt{\|r\|_2^2 + \|s\|_2^2}}{\eta}.
\end{align*}

5. Let 
$$
A=\begin{bmatrix} M & E \\ F & H \end{bmatrix}, \quad 
\tilde A=\begin{bmatrix} M & 0 \\ 0 & H \end{bmatrix},
$$ 
where $M\in\mathbb{C}^{k\times k}$, and set $\eta=\min |\mu-\nu|$ over all $\mu\in sv(M)$ and $\nu\in sv_{ext}(H)$, and $\varepsilon =\max \{ \|E\|_2,\|F\|_2 \}$. Then
$$ 
\max_j |\sigma_j -\tilde\sigma_j| \leq \frac{2\varepsilon^2}{\eta+\sqrt{\eta^2+4\varepsilon^2}}.
$$

6. Let $m\geq n$ and let
$$
\begin{bmatrix} U_1^*\\ U_2^* \end{bmatrix} A \begin{bmatrix} V_1 & V_2 \end{bmatrix}=
\begin{bmatrix} A_1 &  \\ & A_2 \end{bmatrix}, \quad 
\begin{bmatrix} \tilde U_1^*\\ \tilde U_2^* \end{bmatrix} \tilde A \begin{bmatrix} \tilde V_1 & \tilde V_2 \end{bmatrix}=
\begin{bmatrix} \tilde A_1 &  \\ & \tilde A_2 \end{bmatrix},
$$
where $\begin{bmatrix} U_1 & U_2 \end{bmatrix}$, 
$\begin{bmatrix} V_1 & V_2 \end{bmatrix}$,
$\begin{bmatrix} \tilde U_1 & \tilde U_2 \end{bmatrix}$, and
$\begin{bmatrix} \tilde V_1 & \tilde V_2 \end{bmatrix}$ are unitary, and 
$U_1,\tilde U_1\in \mathbb{C}^{m\times k}$, 
$V_1,\tilde V_1\in \mathbb{C}^{n\times k}$. Set
$$
R=A\tilde V_1-\tilde U_1\tilde A_1,\quad 
S=A^*\tilde U_1-\tilde V_1 \tilde A_1^*.
$$
Let $\eta=\min|\tilde \mu-\nu|$ over all $\tilde \mu\in sv(\tilde A_1)$ and 
$\nu\in sv_{ext}(A_2)$. If $\eta > 0$, then
$$
\sqrt{\|\sin\Theta(U_1,\tilde U_1)\|_F^2 +
\|\sin\Theta(V_1,\tilde V_1)\|_F^2}
\leq \frac{\sqrt{\|R\|_F^2 + \|S\|_F^2 }}{\eta}.
$$
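Fact 6 can be checked numerically on a small random matrix. The following is a minimal sketch, assuming Julia 1.x with the `LinearAlgebra` and `Random` standard libraries. The sizes, the seed, the perturbation level, and the helper `sinΘ_F` are illustrative choices, and the blocks $A_1$, $\tilde A_1$ are taken to be the leading $k$ singular values, so that $U_1$, $V_1$, $\tilde U_1$, $\tilde V_1$ are the leading singular vectors.

```julia
using LinearAlgebra, Random

Random.seed!(1)                       # fixed seed, only for reproducibility
m, n, k = 8, 5, 2
A = randn(m, n)
B = A + 1e-3 * randn(m, n)            # perturbed matrix (plays the role of Ã)
FA, FB = svd(A), svd(B)
U1, V1 = FA.U[:, 1:k], FA.V[:, 1:k]
U1t, V1t = FB.U[:, 1:k], FB.V[:, 1:k]
A1t = Diagonal(FB.S[1:k])             # Ã₁

# Residuals of the perturbed singular subspaces with respect to A
R = A * V1t - U1t * A1t
S = A' * U1t - V1t * A1t'

# sv_ext(A₂): trailing singular values of A, plus a zero because m > n
ν = vcat(FA.S[k+1:end], 0.0)
η = minimum(abs(μ - x) for μ in FB.S[1:k], x in ν)

# Frobenius norm of sin Θ(X, Y) for X, Y with orthonormal columns
sinΘ_F(X, Y) = norm(Y - X * (X' * Y))

lhs = sqrt(sinΘ_F(U1, U1t)^2 + sinΘ_F(V1, V1t)^2)
rhs = sqrt(norm(R)^2 + norm(S)^2) / η
lhs, rhs
```

The helper uses the identity $\|\sin\Theta(X,Y)\|_F=\|(I-XX^*)Y\|_F$, valid when both matrices have orthonormal columns, which avoids computing the angles explicitly.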


### Example

In [1]:
m=8
n=5
k=min(m,n)
A=rand(-9:9,m,n)

8×5 Array{Int64,2}:
  1   1   8   2  -4
  9  -8   3   4   9
  8  -5   0   5   4
  1   4   6  -6  -9
  9  -1  -3  -9  -9
 -8   8  -2  -8  -5
  2   4   9   6   7
 -8  -7  -6  -3  -6

In [2]:
ΔA=rand(m,n)/100
B=A+ΔA

8×5 Array{Float64,2}:
  1.00813   1.00314    8.00881      2.00192  -3.99981
  9.00916  -7.99758    3.00845      4.00976   9.00991
  8.0018   -4.99292    0.00209825   5.00316   4.00827
  1.00262   4.00004    6.00322     -5.99305  -8.99495
  9.00901  -0.994723  -2.99043     -8.99813  -8.99222
 -7.99946   8.00806   -1.99559     -7.99358  -4.99306
  2.00386   4.00926    9.00166      6.00149   7.00858
 -7.99971  -6.9918    -5.99644     -2.99264  -5.99133

In [3]:
U,σ,V=svd(A)
UB,μ,VB=svd(B)

(
[-0.0229218 -0.22399 … -0.603119 0.165238; -0.523621 0.288828 … -0.0345646 -0.756694; … ; -0.34694 -0.508048 … 0.000476426 -0.138833; 0.305533 0.321705 … -0.602477 -0.219714],

[27.8002,17.8628,17.2091,9.19816,4.31756],
[-0.433169 0.469585 … 0.311504 0.187403; 0.259402 -0.647535 … 0.584063 0.327155; … ; -0.533443 -0.226914 … -0.310627 0.714567; -0.634372 -0.19729 … 0.473791 -0.450953])

In [4]:
# Mirsky's Theorems
maxabs(σ-μ), norm(ΔA), vecnorm(σ-μ), vecnorm(ΔA)

(0.010258172149566036,0.03538838066450705,0.011361243938886262,0.039581539588805)

In [5]:
# Residual bounds - how close is (x,ζ,y) to (U[:,j],σ[j],V[:,j])
j=rand(2:k-1)
x=round(U[:,j],3)
y=round(V[:,j],3)
x=x/norm(x)
y=y/norm(y)
ζ=(x'*A*y)[]
σ, j, ζ

([27.7981,17.8658,17.1989,9.19837,4.3208],2,17.865761811224893)

In [6]:
# Fact 2
r=A*y-ζ*x
s=A'*x-ζ*y
ϵ=max(norm(r),norm(s))

0.016046687344708613

In [7]:
minimum(abs(σ-ζ)), ϵ

(8.254663086404435e-6,0.016046687344708613)

In [8]:
# Fact 4
η=min(abs(ζ-σ[j-1]),abs(ζ-σ[j+1]))

0.6669012691867202

In [9]:
ζ-σ[j], ϵ^2/η

(-8.254663086404435e-6,0.0003861083891066)

In [10]:
# Singular vector bound
# cos(θ)
cosθU=dot(x,U[:,j])
cosθV=dot(y,V[:,j])
# Bound
sqrt(1-cosθU^2+1-cosθV^2), sqrt(norm(r)^2+norm(s)^2)/η

(0.0009850051588642667,0.028142695164709023)

In [11]:
# Fact 5 - we create small off-diagonal block perturbation
j=3
M=A[1:j,1:j]
H=A[j+1:m,j+1:n]
B=cat([1,2],M,H)

8×5 Array{Int64,2}:
 1   1  8   0   0
 9  -8  3   0   0
 8  -5  0   0   0
 0   0  0  -6  -9
 0   0  0  -9  -9
 0   0  0  -8  -5
 0   0  0   6   7
 0   0  0  -3  -6

In [12]:
E=rand(size(A[1:j,j+1:n]))/100
F=rand(size(A[j+1:m,1:j]))/100
C=map(Float64,B)
C[1:j,j+1:n]=E
C[j+1:m,1:j]=F
C

8×5 Array{Float64,2}:
 1.0           1.0         8.0           0.00746833   0.00512853
 9.0          -8.0         3.0           0.00874766   0.00431362
 8.0          -5.0         0.0           0.00490297   0.00772475
 0.00481472    0.0018486   0.009313     -6.0         -9.0       
 0.00539264    0.00317966  0.000952928  -9.0         -9.0       
 0.00677612    0.00514157  0.00225721   -8.0         -5.0       
 0.00309826    0.00134866  0.00412794    6.0          7.0       
 0.000582293   0.00838642  0.000261748  -3.0         -6.0       

In [13]:
svdvals(M).-svdvals(H)'

3×2 Array{Float64,2}:
  -6.50627  11.9308 
 -13.9105    4.52653
 -20.5089   -2.07186

In [14]:
ϵ=max(norm(E), norm(F))
β=svdvals(B)
γ=svdvals(C)
η=minabs(svdvals(M).-svdvals(H)')
[β γ], maxabs(β-γ), 2*ϵ^2/(η+sqrt(η^2+4*ϵ^2))

(
[22.0255 22.0255; 15.5192 15.5192; … ; 3.58847 3.58848; 1.51661 1.51663],

2.154598866166424e-5,0.00011934031120294469)
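Fact 3 is not exercised in the cells above. The following is a minimal sketch, assuming Julia 1.x with the `LinearAlgebra` and `Random` standard libraries, of one perturbation that attains the norm $\varepsilon$, namely $\Delta A=-(r\tilde v^*+\tilde u s^*)$. The sizes, the seed, and the variable names are illustrative assumptions. The sketch only shows that a perturbation of norm $\varepsilon$ exists; it does not demonstrate minimality.

```julia
using LinearAlgebra, Random

Random.seed!(4)                        # fixed seed, only for reproducibility
m, n = 8, 5
A = randn(m, n)
F = svd(A)

# Approximate singular vectors: exact ones plus small noise, normalized
x = normalize(F.U[:, 2] + 1e-3 * randn(m))
y = normalize(F.V[:, 2] + 1e-3 * randn(n))
μ = dot(x, A * y)                      # μ̃ = ũ* A ṽ

r = A * y - μ * x
s = A' * x - μ * y
ε = max(norm(r), norm(s))

# Since r ⟂ x and s ⟂ y, this perturbation makes (x, μ, y) an exact triplet
ΔA = -(r * y' + x * s')
B = A + ΔA
norm(B * y - μ * x), norm(B' * x - μ * y), opnorm(ΔA), ε
```

The first two returned values should be at rounding level, and the last two should agree.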

## Relative perturbation theory

### Definitions

Matrix $A\in\mathbb{C}^{m\times n}$ is __multiplicatively perturbed__ to $\tilde A$ if
$\tilde A=D_L^* A D_R$ for some $D_L\in\mathbb{C}^{m\times m}$ and 
$D_R\in\mathbb{C}^{n\times n}$. 

Matrix $A$ is (highly) __graded__ if it can be scaled as $A=GS$ such that $G$ is _well-behaved_ (that is, $\kappa_2(G)$ is of modest magnitude), where the __scaling matrix__ $S$ is often diagonal. Interesting cases are when $\kappa_2(G)\ll \kappa_2(A)$.
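As an illustration of grading, the following minimal sketch (Julia 1.x, `LinearAlgebra` and `Random` standard libraries assumed) builds a strongly column-graded matrix and rescales it so that all columns of $G$ have unit norm. The size, the seed, and the grading factors are illustrative assumptions.

```julia
using LinearAlgebra, Random

Random.seed!(5)                               # fixed seed, only for reproducibility
n = 5
A = randn(n, n) * Diagonal(10.0 .^ (0:-3:-12)) # columns scaled by 1, 1e-3, ..., 1e-12

# Scale by the column norms: A = G * S with unit-norm columns in G
S = Diagonal([norm(c) for c in eachcol(A)])
G = A / S
cond(A), cond(G)
```

For such a matrix `cond(G)` is typically many orders of magnitude smaller than `cond(A)`.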

__Relative distances__ between two complex numbers $\alpha$ and $\tilde \alpha$ are:
\begin{align*}
\zeta(\alpha,\tilde \alpha)&=\frac{|\alpha-\tilde\alpha|}{\sqrt{|\alpha\tilde \alpha|}}, \quad \textrm{for } \alpha\tilde\alpha\neq 0,\\
\varrho(\alpha,\tilde \alpha)&=\frac{|\alpha-\tilde\alpha|}
{\sqrt{|\alpha|^2 +  |\tilde \alpha|^2}}, \quad \textrm{for } |\alpha|+|\tilde\alpha|> 0.
\end{align*}
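Both relative distances are easy to evaluate. A minimal sketch in Julia 1.x follows; the function names `rel_zeta` and `rel_rho` are hypothetical.

```julia
# ζ(α, α̃): requires α * α̃ ≠ 0
rel_zeta(a, b) = abs(a - b) / sqrt(abs(a * b))

# ϱ(α, α̃): requires |α| + |α̃| > 0
rel_rho(a, b) = abs(a - b) / hypot(abs(a), abs(b))

# Both distances are scale invariant: tiny and moderate pairs give the same value
rel_zeta(1.0, 1.01), rel_zeta(1e-70, 1.01e-70)
rel_rho(1.0, 1.01), rel_rho(1e-70, 1.01e-70)
```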

### Facts

1. If $D_L$ and $D_R$ are non-singular and $m\geq n$, then
\begin{align*}
\frac{\sigma_j}{\|D_L^{-1}\|_2\|D_R^{-1}\|_2}& \leq \tilde\sigma_j \leq
\sigma_j \|D_L\|_2\|D_R\|_2, \quad \textrm{for } j=1,\ldots,n, \\
\| \mathop{\mathrm{diag}}(\zeta(\sigma_1,\tilde \sigma_1),\ldots,
\zeta(\sigma_n,\tilde \sigma_n))\|_{2,F} & \leq
\frac{1}{2}\|D_L^*-D_L^{-1}\|_{2,F} + \frac{1}{2}\|D_R^*-D_R^{-1}\|_{2,F}.
\end{align*}

2. Let $\tilde A=D_L^* A D_R$ with $m\geq n$, and let
$$
\begin{bmatrix} U_1^*\\ U_2^* \end{bmatrix} A \begin{bmatrix} V_1 & V_2 \end{bmatrix}=
\begin{bmatrix} A_1 &  \\ & A_2 \end{bmatrix}, \quad 
\begin{bmatrix} \tilde U_1^*\\ \tilde U_2^* \end{bmatrix} \tilde A \begin{bmatrix} \tilde V_1 & \tilde V_2 \end{bmatrix}=
\begin{bmatrix} \tilde A_1 &  \\ & \tilde A_2 \end{bmatrix},
$$
where $\begin{bmatrix} U_1 & U_2 \end{bmatrix}$, 
$\begin{bmatrix} V_1 & V_2 \end{bmatrix}$,
$\begin{bmatrix} \tilde U_1 & \tilde U_2 \end{bmatrix}$, and
$\begin{bmatrix} \tilde V_1 & \tilde V_2 \end{bmatrix}$ are unitary, and 
$U_1,\tilde U_1\in \mathbb{C}^{m\times k}$, 
$V_1,\tilde V_1\in \mathbb{C}^{n\times k}$. Set
$$
R=A\tilde V_1-\tilde U_1\tilde A_1,\quad 
S=A^*\tilde U_1-\tilde V_1 \tilde A_1^*.
$$
Let $\eta=\min \varrho(\mu,\tilde \mu)$ over all $\mu\in sv(A_1)$ and 
$\tilde \mu\in sv_{ext}(\tilde A_2)$. If $\eta > 0$, then
\begin{align*}
& \sqrt{\|\sin\Theta(U_1,\tilde U_1)\|_F^2 +
\|\sin\Theta(V_1,\tilde V_1)\|_F^2} \\
& \leq \frac{1}{\eta}( \|(I-D_L^*)U_1\|_F^2+ \|(I-D_L^{-1})U_1\|_F^2 \\
& \quad +\|(I-D_R^*)V_1\|_F^2+ \|(I-D_R^{-1})V_1\|_F^2 )^{1/2}.
\end{align*}

3. Let $A=GS$ and $\tilde A=\tilde GS$, where $G$ has full column rank, and let 
$\Delta G=\tilde G-G$. Then $\tilde A=DA$, where $D=I+(\Delta G) G^{\dagger}$, and 
Fact 1 applies with $D_L=D^*$, $D_R=I$, and, provided $\|(\Delta G) G^{\dagger}\|_{2}<1$, 
$$
\frac{1}{2}\|D^*-D^{-1}\|_{2,F} \leq \bigg(1+\frac{1}{1-\|(\Delta G) G^{\dagger}\|_{2}}\bigg)
\frac{\|(\Delta G) G^{\dagger}\|_{2,F}}{2}.
$$
According to the notebook on 
[Jacobi Method and High Relative Accuracy](L4c Symmetric Eigenvalue Decomposition - Jacobi Method and High Relative Accuracy.ipynb), nearly optimal diagonal scaling is such that all columns of $G$ have unit norms, $S=\mathop{\mathrm{diag}} \big( \| A_{:,1}\|_2,\ldots,\|A_{:,n}\|_2 \big)$.

4. Let $A$ be a real upper-bidiagonal matrix with diagonal entries $a_1,a_2,\ldots,a_n$ and 
the super-diagonal entries $b_1,b_2, \ldots,b_{n-1}$. Let the diagonal entries of 
$\tilde A$ be $\alpha_1 a_1,\alpha_2 a_2,\ldots,\alpha_n a_n$, and its super-diagonal entries be
$\beta_1 b_1,\beta_2 b_2,\ldots,\beta_{n-1} b_{n-1}$. Then $\tilde A=D_L^* A D_R$ with 
\begin{align*}
D_L &=\mathop{\mathrm{diag}} \bigg(\alpha_1,\frac{\alpha_1 \alpha_2}{\beta_1},
\frac{\alpha_1 \alpha_2 \alpha_3}{\beta_1 \beta_2},\cdots\bigg),\\
D_R &=\mathop{\mathrm{diag}} \bigg(1, \frac{\beta_1}{\alpha_1},
\frac{\beta_1 \beta_2}{\alpha_1 \alpha_2},\cdots\bigg).
\end{align*}
Let $\alpha=\prod\limits_{j=1}^n \max\{\alpha_j, 1/\alpha_j\}$ and 
$\beta=\prod\limits_{j=1}^{n-1} \max\{\beta_j, 1/\beta_j\}$. Then
$$
(\alpha\beta)^{-1}\leq \big(\| D_L^{-1}\|_2 \|D_R^{-1}\|_2\big)^{-1} \leq
\| D_L\|_2 \|D_R\|_2  \leq \alpha\beta,
$$
and Fact 1 applies.
 
5. Consider the block partitioned matrices
\begin{align*}
A & =\begin{bmatrix} B & C \\ 0 & D\end{bmatrix}, \\
\tilde A & =  \begin{bmatrix} B & 0 \\ 0 & D\end{bmatrix}
=A \begin{bmatrix} I & -B^{-1} C \\ 0 & I \end{bmatrix}\equiv A D_R.
\end{align*}
By Fact 1, $\zeta(\sigma_j,\tilde \sigma_j) \leq \frac{1}{2} \|B^{-1}C\|_2$. This is used as a deflation criterion in the SVD algorithm for bidiagonal matrices.
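The deflation bound of Fact 5 can be checked on a small block upper-triangular matrix. The following is a minimal sketch, assuming Julia 1.x with the `LinearAlgebra` and `Random` standard libraries; the block sizes, the seed, and the size of the coupling block $C$ are illustrative assumptions.

```julia
using LinearAlgebra, Random

Random.seed!(2)                          # fixed seed, only for reproducibility
p, q = 3, 2
B = randn(p, p) + 3I                     # diagonal shift keeps B safely non-singular
C = 1e-4 * randn(p, q)                   # small coupling block
D = randn(q, q)

A  = [B C; zeros(q, p) D]
At = [B zeros(p, q); zeros(q, p) D]      # deflated matrix (plays the role of Ã)

σ, σt = svdvals(A), svdvals(At)
ζ = abs.(σ - σt) ./ sqrt.(σ .* σt)       # relative distances ζ(σ_j, σ̃_j)
bound = opnorm(B \ C) / 2                # ½ ‖B⁻¹C‖₂
maximum(ζ), bound
```

In this setting, setting $C$ to zero is harmless whenever `bound` is at the level of the desired relative accuracy.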

### Example - Bidiagonal matrix

In order to illustrate Facts 1 to 3, we need an algorithm which computes the singular values with high relative accuracy. Such an algorithm, the one-sided Jacobi method, is discussed in the following notebook. 

The algorithm actually used in the function `svdvals()` for `Bidiagonal` is the zero-shift bidiagonal QR algorithm, which attains the accuracy given by Fact 4: if all
$1-\varepsilon \leq \alpha_i,\beta_j \leq 1+\varepsilon$, then
$$
(1-\varepsilon)^{2n-1} \leq (\alpha\beta)^{-1} \leq \alpha\beta \leq (1-\varepsilon)^{-(2n-1)}.
$$
In other words, relative changes of at most $\varepsilon$ in the diagonal and super-diagonal elements cause relative changes of at most approximately $(2n-1)\varepsilon$ in the singular values.
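The scaling matrices $D_L$ and $D_R$ of Fact 4 and the resulting bounds can be verified directly. The following is a minimal sketch, assuming Julia 1.x with the `LinearAlgebra` and `Random` standard libraries; the size, the seed, and the perturbation level are illustrative assumptions, and the matrices are converted to dense form for the singular value computation, which is adequate here because the entries are of moderate size.

```julia
using LinearAlgebra, Random

Random.seed!(3)                          # fixed seed, only for reproducibility
n = 6
ε = 1e-6
a, b = rand(n) .+ 0.5, rand(n - 1) .+ 0.5
α = 1 .+ ε * (2 * rand(n) .- 1)          # relative factors in [1-ε, 1+ε]
β = 1 .+ ε * (2 * rand(n - 1) .- 1)

A  = Bidiagonal(a, b, :U)
At = Bidiagonal(α .* a, β .* b, :U)      # perturbed matrix (plays the role of Ã)

# D_L[i] = α₁⋯αᵢ / (β₁⋯βᵢ₋₁),  D_R[i] = β₁⋯βᵢ₋₁ / (α₁⋯αᵢ₋₁)
pα, pβ = cumprod(α), cumprod(β)
DL = Diagonal(pα ./ [1; pβ])
DR = Diagonal([1; pβ] ./ [1; pα[1:end-1]])
scaling_ok = DL * Matrix(A) * DR ≈ Matrix(At)

# αβ from Fact 4 and the two-sided bound of Fact 1
κ = prod(max.(α, 1 ./ α)) * prod(max.(β, 1 ./ β))
σ, σt = svdvals(Matrix(A)), svdvals(Matrix(At))
bounds_ok = all(σ ./ κ .<= σt) && all(σt .<= σ .* κ)
scaling_ok, bounds_ok, (1 - ε)^(2n - 1) <= 1 / κ
```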

__However__, if singular values and vectors are desired, the function `svd()` calls the standard algorithm, described in the next notebook, which __does not attain this accuracy__.

In [15]:
n=50
δ=100000
# The starting matrix
a=exp(50*(rand(n)-0.5))
b=exp(50*(rand(n-1)-0.5))
A=Bidiagonal(a,b, true)
# Multiplicative perturbation
DL=ones(n)+(rand(n)-0.5)/δ
DR=ones(n)+(rand(n)-0.5)/δ
# The perturbed matrix
α=DL.*a.*DR
β=DL[1:end-1].*b.*DR[2:end]
B=Bidiagonal(α,β,true)
(A.dv-B.dv)./A.dv

50-element Array{Float64,1}:
 -1.29771e-6
 -1.9658e-6 
 -8.19235e-7
 -4.69741e-6
  4.80814e-7
 -4.93721e-6
 -2.60945e-6
  7.90615e-6
 -9.38764e-7
 -3.08349e-6
 -2.7413e-6 
 -1.55638e-6
 -8.26756e-6
  ⋮         
 -6.21736e-7
 -1.64187e-6
  7.46648e-6
 -4.74445e-6
 -1.18368e-6
  4.49824e-6
  1.45674e-6
 -1.38842e-6
  4.43873e-6
 -2.36052e-6
 -5.22332e-6
  2.85778e-6

In [16]:
(a-α)./a, (b-β)./b

([-1.29771e-6,-1.9658e-6,-8.19235e-7,-4.69741e-6,4.80814e-7,-4.93721e-6,-2.60945e-6,7.90615e-6,-9.38764e-7,-3.08349e-6  …  7.46648e-6,-4.74445e-6,-1.18368e-6,4.49824e-6,1.45674e-6,-1.38842e-6,4.43873e-6,-2.36052e-6,-5.22332e-6,2.85778e-6],[1.5148e-6,-6.49666e-6,-1.10256e-6,-3.66558e-6,-1.73601e-6,-2.23174e-6,3.683e-6,6.62486e-6,-7.98341e-6,3.39545e-7  …  3.10208e-6,1.68647e-6,-6.9782e-6,6.05872e-6,2.2546e-6,-3.37961e-6,6.75735e-6,8.97095e-7,-6.99437e-6,-4.83138e-7])

In [17]:
@which svdvals(A)

In [18]:
σ=svdvals(A)
μ=svdvals(B)
[σ (σ-μ)./σ]

50×2 Array{Float64,2}:
 6.64838e10    3.35004e-6
 4.9375e10    -2.03315e-6
 2.52386e10    3.39545e-7
 1.64459e10    2.12407e-7
 1.23592e10    6.62486e-6
 1.18188e10   -4.02273e-6
 5.81067e9    -5.88549e-6
 4.03938e9    -1.46825e-6
 3.15812e9     4.57643e-6
 2.48578e9     8.03348e-6
 1.52474e8    -4.74445e-6
 1.02738e8    -4.93685e-6
 4.94015e7     6.3923e-6 
 ⋮                       
 1.36358e-6    4.89574e-6
 6.73836e-8    1.07632e-6
 2.40526e-9    3.31439e-7
 2.16628e-9   -2.03496e-6
 1.16732e-9   -1.18368e-6
 7.06124e-10   9.68586e-7
 5.25624e-10  -2.8914e-6 
 4.42392e-10   1.65225e-6
 4.47902e-12   1.39815e-6
 1.84379e-13   2.56721e-6
 1.44007e-39  -1.23348e-6
 2.91827e-70   3.91984e-7

In [19]:
cond(A)

2.278191862600967e80

In [20]:
# The standard algorithm
U,ν,V=svd(A);

In [21]:
(σ-ν)./σ

50-element Array{Float64,1}:
     0.0        
     0.0        
     1.51145e-16
     0.0        
     0.0        
    -1.61382e-16
     0.0        
    -1.18047e-16
     0.0        
     0.0        
     0.0        
     0.0        
    -1.50817e-16
     ⋮          
    -3.87178    
   -97.5857     
 -2760.89       
 -3065.58       
 -1167.13       
  -783.37       
  -127.197      
     0.999795   
     0.997226   
     1.0        
    -9.97029e13 
    -8.90869e29 